6.9 Problems

  1. Write the equations for each of these models: ARIMA(0,0,0), ARIMA(0,1,0), ARIMA(1,0,0), ARIMA(0,0,1), ARIMA(1,0,1). Read the help file for the Arima() function (in the forecast package) if you are fuzzy on the arima notation.

  2. The MARSS package includes a data set of sharp-tailed grouse in Washington. Load the data to use as follows:

    dat <- log(grouse[, 2])

    Consider these two models for the data:

    • Model 1 random walk with no drift observed with no error

    • Model 2 random walk with drift observed with no error

    Written as a univariate state-space model, model 1 is \[\begin{equation} \begin{gathered} x_t = x_{t-1}+w_t \text{ where } w_t \sim \,\text{N}(0,q)\\ x_0 = a \\ y_t = x_t \end{gathered} \tag{6.13} \end{equation}\] Model 2 is almost identical except with \(u\) added \[\begin{equation} \begin{gathered} x_t = x_{t-1}+u+w_t \text{ where } w_t \sim \,\text{N}(0,q)\\ x_0 = a \\ y_t = x_t \end{gathered} \tag{6.14} \end{equation}\] \(y\) is the log grouse count in year \(t\).

    1. Plot the data. The year is in column 1 of grouse.

    2. Fit each model using MARSS().

    3. Which one appears better supported given AICc?

    4. Load the forecast package. Use ?auto.arima to learn what it does. Then use auto.arima(dat) to fit the data. Next run auto.arima(dat, trace=TRUE) to see all the ARIMA models that the function compared. Note, ARIMA(0,1,0) is a random walk with b=1. ARIMA(0,1,0) with drift would be a random walk (b=1) with drift (with \(u\)).

    5. Is the difference in the AICc values between a random walk with and without drift comparable between MARSS() and auto.arima()?

    Note when using auto.arima(), an AR(1) model of the following form will be fit (notice the \(b\)): \(x_t = b x_{t-1}+w_t\). auto.arima() refers to this model \(x_t = x_{t-1}+w_t\), which is also AR(1) but with \(b=1\), as ARIMA(0,1,0). This says that the first difference of the data (that’s the 1 in the middle) is a ARMA(0,0) process (the 0s in the 1st and 3rd spots). So ARIMA(0,1,0) means this: \(x_t - x_{t-1} = w_t\).

  3. Create a random walk with drift time series using cumsum() and rnorm(). Look at the rnorm() help file (?rnorm) to make sure you know what the arguments to the rnorm() are.

    dat <- cumsum(rnorm(100, 0.1, 1))
    1. What is the order of this random walk written as ARIMA(p, d, q)? “what is the order” means “what is \(p\), \(d\), and \(q\). Model”order" is how arima() and Arima() specify arima models.

    2. Fit that model using Arima() in the forecast package. You’ll need to specify the arguments order and include.drift. Use ?Arima to review what that function does if needed.

    3. Write out the equation for this random walk as a univariate state-space model. Notice that there is no observation error, but still write this as a state-space model.

    4. Fit that model with MARSS().

    5. How are the two estimates from Arima() and MARSS() different?

  4. The first-difference of dat used in the previous problem is:

    diff.dat <- diff(dat)

    Use ?diff to check what the diff() function does.

    1. If \(x_t\) denotes a time series. What is the first difference of \(x\)? What is the second difference?

    2. What is the \(\mathbf{x}\) model for diff.dat? Look at your answer to part (a) and the answer to part (e).

    3. Fit diff.dat using Arima(). You’ll need to change the arguments order and include.mean.

    4. Fit with MARSS(). You will need to write the model for diff.dat as a state-space model. If you’ve done this right, the estimated parameters using Arima() and MARSS() will now be the same.

    This question should clue you into the fact that Arima() is not exactly fitting Equation (6.1). It’s very similar, but not quite written that way. By the way, Equation (6.1) is how structural time series observed with error are written (state-space models). To recover the estimates that a function like arima() or Arima() returns, you need to write your state-space model in a specific way (as seen above).

  5. Arima() will also fit what it calls an “AR(1) with drift.” An AR(1) with drift is NOT this model: \[\begin{equation} x_t = b x_{t-1}+u+w_t \text{ where } w_t \sim \,\text{N}(0,q) \tag{6.15} \end{equation}\] In the population dynamics literature, this equation is called the Gompertz model and is a type of density-dependent population model.

    1. Write R code to simulate Equation (6.15). Make \(b\) less than 1 and greater than 0. Set \(u\) and \(x_0\) to whatever you want. You can use a for loop.

    2. Plot the trajectories and show that this model does not “drift” upward or downward. It fluctuates about a mean value.

    3. Hold \(b\) constant and change \(u\). How do the trajectories change?

    4. Hold \(u\) constant and change \(b\). Make sure to use a \(b\) close to 1 and another close to 0. How do the trajectories change?

    5. Do 2 simulations each with the same \(w_t\). In one simulation, set \(u=1\) and in the other \(u=2\). For both simulations, set \(x_1 = u/(1-b)\). You can set \(b\) to whatever you want as long as \(0<b<1\). Plot the 2 trajectories on the same plot. What is different?

    We will fit what Arima() calls “AR(1) with drift” models in the chapter on MARSS models with covariates.

  6. The MARSS package includes a data set of gray whales. Load the data to use as follows:

    dat <- log(graywhales[, 2])

    Fit a random walk with drift model observed with error to the data: \[\begin{equation} \begin{gathered} x_t = x_{t-1}+u+w_t \text{ where } w_t \sim \,\text{N}(0,q) \\ y_t = x_t+v_t \text{ where } v_t \sim \,\text{N}(0,r) \\ x_0 = a \end{gathered} \tag{6.16} \end{equation}\] \(y\) is the whale count in year \(t\). \(x\) is interpreted as the ‘true’ unknown population size that we are trying to estimate.

    1. Fit this model with MARSS()

    2. Plot the estimated \(x\) as a line with the actual counts added as points. \(x\) is in fit$states. It is a matrix. To plot using plot(), you will need to change it to a vector using as.vector() or fit$states[1,]

    3. Simulate 1000 sample gray whale populstion trajectories (the \(x\) in your model) using the estimated \(u\) and \(q\) starting at the estimated \(x\) in 1997. You can do this with a couple for loops or write something terse with cumsum() and apply().

    4. Using these simulated trajectories, what is your estimate of the probability that the grey whale population will be above 50,000 graywhales in 2007?

    5. What kind(s) of uncertainty does your estimate above NOT include?

  7. Fit the following models to the graywhales data using MARSS(). Assume \(b=1\).

    • Model 1 Process error only model with drift

    • Model 2 Process error only model without drift

    • Model 3 Process error with drift and observation error with observation error variance fixed = 0.05.

    • Model 4 Process error with drift and observation error with observation error variance estimated.

    1. Compute the AICc’s for each model and likelihood or deviance (-2 * log likelihood). Where to find these? Try names(fit). logLik() is the standard R function to return log-likelihood from fits.

    2. Calculate a table of \(\Delta\text{AICc}\) values and AICc weights.

    3. Show the acf of the model and state residuals for the best model. You will need a vector of the residuals to do this. If fit is the fit from a fit call like fit = MARSS(dat), you get the residuals using this code:

      residuals(fit)$state.residuals[1, ]
      residuals(fit)$model.residuals[1, ]

      Do the acf’s suggest any problems?

  8. Evaluate the predictive accuracy of forecasts using the forecast package using the airmiles dataset. Load the data to use as follows:

    dat <- log(airmiles)
    n <- length(dat)
    training.dat <- dat[1:(n - 3)]
    test.dat <- dat[(n - 2):n]

    This will prepare the training data and set aside the last 3 data points for validation.

    1. Fit the following four models using Arima(): ARIMA(0,0,0), ARIMA(1,0,0), ARIMA(0,0,1), ARIMA(1,0,1).

    2. Use forecast() to make 3 step ahead forecasts from each.

    3. Calculate the MASE statistic for each using the accuracy() function in the forecast package. Type ?accuracy to learn how to use this function.

    4. Present the results in a table.

    5. Which model is best supported based on the MASE statistic?

  9. The WhaleNet Archive of STOP Data has movement data on loggerhead turtles on the east coast of the US from ARGOS tags. The MARSS package loggerheadNoisy dataset is lat/lot data on eight individuals, however we have corrupted this data severely by adding random errors in order to create a “bad tag” problem (very noisy). Use head(loggerheadNoisy) to get an idea of the data. Then load the data on one turtle, MaryLee. MARSS needs time across the columns to you need to use transpose the data (as shown).

    turtlename <- "MaryLee"
    dat <- loggerheadNoisy[which(loggerheadNoisy$turtle == turtlename), 
    dat <- t(dat)
    1. Plot MaryLee’s locations (as a line not dots). Put the latitude locations on the y-axis and the longitude on the y-axis. You can use rownames(dat) to see which is in which row. You can just use plot() for the homework. But if you want, you can look at the MARSS Manual chapter on animal movement to see how to plot the turtle locations on a map using the maps package.

    2. Analyze the data with a state-space model (movement observed with error) using

      fit0 <- MARSS(dat)

      Look at the output from the above MARSS call. What is the meaning of the parameters output from MARSS in terms of turtle movement? What exactly is the \(u\) estimate for example? Look at the data and think about the model you fit.

    3. What assumption did the default MARSS model make about observation error and process error? What does that assumption mean in terms of how steps in the N-S and E-W directions are related? What does that assumption mean in terms of our assumption about the latitudal and longitudinal observation errors?

    4. Does MaryLee move faster in the latitude direction versus longitude direction?

    5. Add MaryLee’s estimated “true” positions to your plot of her locations. You can use lines(x, y, col="red") (with x and y replaced with your x and y data). The true position is the “state.” This is in the states element of an output from MARSS fit0$states.

    6. Fit the following models with different assumptions regarding the movement in the lat/lon direction:

      • Lat/lon movements are independent but the variance is the same

      • Lat/lon movements are correlated and lat/lon variances are different

      • Lat/lon movements are correlated and the lat/lon variances are the same.

      You only need to change Q specification. Your MARSS call will now look like the following with ... replaced with your Q specification.

      fit1 <- MARSS(dat, list(Q = ...))
    7. Plot your state residuals (true location residuals). What are the problems? Discuss in reference to your plot of the location data. Here is how to get state residuals from MARSS() output:

      resids <- residuals(fit0)$state.residuals

      The lon residuals are in row 1 and lat residuals are in row 2 (same order as the data).