6.9 Problems
Write the equations for each of these models: ARIMA(0,0,0), ARIMA(0,1,0), ARIMA(1,0,0), ARIMA(0,0,1), ARIMA(1,0,1). Read the help file for the
Arima()
function (in the forecast package) if you are fuzzy on the arima notation.The MARSS package includes a data set of sharp-tailed grouse in Washington. Load the data to use as follows:
library(MARSS) <- log(grouse[, 2]) dat
Consider these two models for the data:
Model 1 random walk with no drift observed with no error
Model 2 random walk with drift observed with no error
Written as a univariate state-space model, model 1 is \[\begin{equation} \begin{gathered} x_t = x_{t-1}+w_t \text{ where } w_t \sim \,\text{N}(0,q)\\ x_0 = a \\ y_t = x_t \end{gathered} \tag{6.13} \end{equation}\] Model 2 is almost identical except with \(u\) added \[\begin{equation} \begin{gathered} x_t = x_{t-1}+u+w_t \text{ where } w_t \sim \,\text{N}(0,q)\\ x_0 = a \\ y_t = x_t \end{gathered} \tag{6.14} \end{equation}\] \(y\) is the log grouse count in year \(t\).
Plot the data. The year is in column 1 of
grouse
.Fit each model using
MARSS()
.Which one appears better supported given AICc?
Load the forecast package. Use
?auto.arima
to learn what it does. Then useauto.arima(dat)
to fit the data. Next runauto.arima(dat, trace=TRUE)
to see all the ARIMA models that the function compared. Note, ARIMA(0,1,0) is a random walk with b=1. ARIMA(0,1,0) with drift would be a random walk (b=1) with drift (with \(u\)).
Is the difference in the AICc values between a random walk with and without drift comparable between MARSS() and auto.arima()?
Note when using
auto.arima()
, an AR(1) model of the following form will be fit (notice the \(b\)): \(x_t = b x_{t-1}+w_t\).auto.arima()
refers to this model \(x_t = x_{t-1}+w_t\), which is also AR(1) but with \(b=1\), as ARIMA(0,1,0). This says that the first difference of the data (that’s the 1 in the middle) is a ARMA(0,0) process (the 0s in the 1st and 3rd spots). So ARIMA(0,1,0) means this: \(x_t - x_{t-1} = w_t\).Create a random walk with drift time series using
cumsum()
andrnorm()
. Look at thernorm()
help file (?rnorm
) to make sure you know what the arguments to thernorm()
are.<- cumsum(rnorm(100, 0.1, 1)) dat
What is the order of this random walk written as ARIMA(p, d, q)? “what is the order” means “what is \(p\), \(d\), and \(q\). Model”order" is how
arima()
andArima()
specify arima models.Fit that model using
Arima()
in the forecast package. You’ll need to specify the argumentsorder
andinclude.drift
. Use?Arima
to review what that function does if needed.Write out the equation for this random walk as a univariate state-space model. Notice that there is no observation error, but still write this as a state-space model.
Fit that model with
MARSS()
.How are the two estimates from
Arima()
andMARSS()
different?
The first-difference of
dat
used in the previous problem is:<- diff(dat) diff.dat
Use
?diff
to check what thediff()
function does.If \(x_t\) denotes a time series. What is the first difference of \(x\)? What is the second difference?
What is the \(\mathbf{x}\) model for
diff.dat
? Look at your answer to part (a) and the answer to part (e).Fit
diff.dat
usingArima()
. You’ll need to change the argumentsorder
andinclude.mean
.Fit with
MARSS()
. You will need to write the model fordiff.dat
as a state-space model. If you’ve done this right, the estimated parameters usingArima()
andMARSS()
will now be the same.
This question should clue you into the fact that
Arima()
is not exactly fitting Equation (6.1). It’s very similar, but not quite written that way. By the way, Equation (6.1) is how structural time series observed with error are written (state-space models). To recover the estimates that a function likearima()
orArima()
returns, you need to write your state-space model in a specific way (as seen above).Arima()
will also fit what it calls an “AR(1) with drift.” An AR(1) with drift is NOT this model: \[\begin{equation} x_t = b x_{t-1}+u+w_t \text{ where } w_t \sim \,\text{N}(0,q) \tag{6.15} \end{equation}\] In the population dynamics literature, this equation is called the Gompertz model and is a type of density-dependent population model.Write R code to simulate Equation (6.15). Make \(b\) less than 1 and greater than 0. Set \(u\) and \(x_0\) to whatever you want. You can use a for loop.
Plot the trajectories and show that this model does not “drift” upward or downward. It fluctuates about a mean value.
Hold \(b\) constant and change \(u\). How do the trajectories change?
Hold \(u\) constant and change \(b\). Make sure to use a \(b\) close to 1 and another close to 0. How do the trajectories change?
Do 2 simulations each with the same \(w_t\). In one simulation, set \(u=1\) and in the other \(u=2\). For both simulations, set \(x_1 = u/(1-b)\). You can set \(b\) to whatever you want as long as \(0<b<1\). Plot the 2 trajectories on the same plot. What is different?
We will fit what
Arima()
calls “AR(1) with drift” models in the chapter on MARSS models with covariates.The MARSS package includes a data set of gray whales. Load the data to use as follows:
library(MARSS) <- log(graywhales[, 2]) dat
Fit a random walk with drift model observed with error to the data: \[\begin{equation} \begin{gathered} x_t = x_{t-1}+u+w_t \text{ where } w_t \sim \,\text{N}(0,q) \\ y_t = x_t+v_t \text{ where } v_t \sim \,\text{N}(0,r) \\ x_0 = a \end{gathered} \tag{6.16} \end{equation}\] \(y\) is the whale count in year \(t\). \(x\) is interpreted as the ‘true’ unknown population size that we are trying to estimate.
Fit this model with
MARSS()
Plot the estimated \(x\) as a line with the actual counts added as points. \(x\) is in
fit$states
. It is a matrix. To plot usingplot()
, you will need to change it to a vector usingas.vector()
orfit$states[1,]
Simulate 1000 sample gray whale populstion trajectories (the \(x\) in your model) using the estimated \(u\) and \(q\) starting at the estimated \(x\) in 1997. You can do this with a couple for loops or write something terse with
cumsum()
andapply()
.Using these simulated trajectories, what is your estimate of the probability that the grey whale population will be above 50,000 graywhales in 2007?
What kind(s) of uncertainty does your estimate above NOT include?
Fit the following models to the graywhales data using MARSS(). Assume \(b=1\).
Model 1 Process error only model with drift
Model 2 Process error only model without drift
Model 3 Process error with drift and observation error with observation error variance fixed = 0.05.
Model 4 Process error with drift and observation error with observation error variance estimated.
Compute the AICc’s for each model and likelihood or deviance (-2 * log likelihood). Where to find these? Try
names(fit)
.logLik()
is the standard R function to return log-likelihood from fits.Calculate a table of \(\Delta\text{AICc}\) values and AICc weights.
Show the acf of the model and state residuals for the best model. You will need a vector of the residuals to do this. If
fit
is the fit from a fit call likefit = MARSS(dat)
, you get the residuals using this code:residuals(fit)$state.residuals[1, ] residuals(fit)$model.residuals[1, ]
Do the acf’s suggest any problems?
Evaluate the predictive accuracy of forecasts using the forecast package using the
airmiles
dataset. Load the data to use as follows:library(forecast) <- log(airmiles) dat <- length(dat) n <- dat[1:(n - 3)] training.dat <- dat[(n - 2):n] test.dat
This will prepare the training data and set aside the last 3 data points for validation.
Fit the following four models using
Arima()
: ARIMA(0,0,0), ARIMA(1,0,0), ARIMA(0,0,1), ARIMA(1,0,1).Use
forecast()
to make 3 step ahead forecasts from each.Calculate the MASE statistic for each using the
accuracy()
function in the forecast package. Type?accuracy
to learn how to use this function.Present the results in a table.
Which model is best supported based on the MASE statistic?
The WhaleNet Archive of STOP Data has movement data on loggerhead turtles on the east coast of the US from ARGOS tags. The MARSS package
loggerheadNoisy
dataset is lat/lot data on eight individuals, however we have corrupted this data severely by adding random errors in order to create a “bad tag” problem (very noisy). Usehead(loggerheadNoisy)
to get an idea of the data. Then load the data on one turtle, MaryLee. MARSS needs time across the columns to you need to use transpose the data (as shown).<- "MaryLee" turtlename <- loggerheadNoisy[which(loggerheadNoisy$turtle == turtlename), dat 5:6] <- t(dat) dat
Plot MaryLee’s locations (as a line not dots). Put the latitude locations on the y-axis and the longitude on the y-axis. You can use
rownames(dat)
to see which is in which row. You can just useplot()
for the homework. But if you want, you can look at the MARSS Manual chapter on animal movement to see how to plot the turtle locations on a map using the maps package.Analyze the data with a state-space model (movement observed with error) using
<- MARSS(dat) fit0
Look at the output from the above MARSS call. What is the meaning of the parameters output from MARSS in terms of turtle movement? What exactly is the \(u\) estimate for example? Look at the data and think about the model you fit.
What assumption did the default MARSS model make about observation error and process error? What does that assumption mean in terms of how steps in the N-S and E-W directions are related? What does that assumption mean in terms of our assumption about the latitudal and longitudinal observation errors?
Does MaryLee move faster in the latitude direction versus longitude direction?
Add MaryLee’s estimated “true” positions to your plot of her locations. You can use
lines(x, y, col="red")
(with x and y replaced with your x and y data). The true position is the “state.” This is in the states element of an output from MARSSfit0$states
.Fit the following models with different assumptions regarding the movement in the lat/lon direction:
Lat/lon movements are independent but the variance is the same
Lat/lon movements are correlated and lat/lon variances are different
Lat/lon movements are correlated and the lat/lon variances are the same.
You only need to change
Q
specification. Your MARSS call will now look like the following with...
replaced with yourQ
specification.<- MARSS(dat, list(Q = ...)) fit1
Plot your state residuals (true location residuals). What are the problems? Discuss in reference to your plot of the location data. Here is how to get state residuals from
MARSS()
output:<- residuals(fit0)$state.residuals resids
The lon residuals are in row 1 and lat residuals are in row 2 (same order as the data).