## 6.9 Problems

Write the equations for each of these models: ARIMA(0,0,0), ARIMA(0,1,0), ARIMA(1,0,0), ARIMA(0,0,1), ARIMA(1,0,1). Read the help file for the

`Arima()`

function (in the**forecast**package) if you are fuzzy on the arima notation.The

**MARSS**package includes a data set of sharp-tailed grouse in Washington. Load the data to use as follows:`library(MARSS) <- log(grouse[, 2]) dat`

Consider these two models for the data:

Model 1 random walk with no drift observed with no error

Model 2 random walk with drift observed with no error

Written as a univariate state-space model, model 1 is \[\begin{equation} \begin{gathered} x_t = x_{t-1}+w_t \text{ where } w_t \sim \,\text{N}(0,q)\\ x_0 = a \\ y_t = x_t \end{gathered} \tag{6.13} \end{equation}\] Model 2 is almost identical except with \(u\) added \[\begin{equation} \begin{gathered} x_t = x_{t-1}+u+w_t \text{ where } w_t \sim \,\text{N}(0,q)\\ x_0 = a \\ y_t = x_t \end{gathered} \tag{6.14} \end{equation}\] \(y\) is the log grouse count in year \(t\).

Plot the data. The year is in column 1 of

`grouse`

.Fit each model using

`MARSS()`

.Which one appears better supported given AICc?

Load the

**forecast**package. Use`?auto.arima`

to learn what it does. Then use`auto.arima(dat)`

to fit the data. Next run`auto.arima(dat, trace=TRUE)`

to see all the ARIMA models that the function compared. Note, ARIMA(0,1,0) is a random walk with b=1. ARIMA(0,1,0) with drift would be a random walk (b=1) with drift (with \(u\)).

Is the difference in the AICc values between a random walk with and without drift comparable between MARSS() and auto.arima()?

Note when using

`auto.arima()`

, an AR(1) model of the following form will be fit (notice the \(b\)): \(x_t = b x_{t-1}+w_t\).`auto.arima()`

refers to this model \(x_t = x_{t-1}+w_t\), which is also AR(1) but with \(b=1\), as ARIMA(0,1,0). This says that the first difference of the data (that’s the 1 in the middle) is a ARMA(0,0) process (the 0s in the 1st and 3rd spots). So ARIMA(0,1,0) means this: \(x_t - x_{t-1} = w_t\).Create a random walk with drift time series using

`cumsum()`

and`rnorm()`

. Look at the`rnorm()`

help file (`?rnorm`

) to make sure you know what the arguments to the`rnorm()`

are.`<- cumsum(rnorm(100, 0.1, 1)) dat`

What is the order of this random walk written as ARIMA(p, d, q)? “what is the order” means “what is \(p\), \(d\), and \(q\). Model”order" is how

`arima()`

and`Arima()`

specify arima models.Fit that model using

`Arima()`

in the**forecast**package. You’ll need to specify the arguments`order`

and`include.drift`

. Use`?Arima`

to review what that function does if needed.Write out the equation for this random walk as a univariate state-space model. Notice that there is no observation error, but still write this as a state-space model.

Fit that model with

`MARSS()`

.How are the two estimates from

`Arima()`

and`MARSS()`

different?

The first-difference of

`dat`

used in the previous problem is:`<- diff(dat) diff.dat`

Use

`?diff`

to check what the`diff()`

function does.If \(x_t\) denotes a time series. What is the first difference of \(x\)? What is the second difference?

What is the \(\mathbf{x}\) model for

`diff.dat`

? Look at your answer to part (a) and the answer to part (e).Fit

`diff.dat`

using`Arima()`

. You’ll need to change the arguments`order`

and`include.mean`

.Fit with

`MARSS()`

. You will need to write the model for`diff.dat`

as a state-space model. If you’ve done this right, the estimated parameters using`Arima()`

and`MARSS()`

will now be the same.

This question should clue you into the fact that

`Arima()`

is not exactly fitting Equation (6.1). It’s very similar, but not quite written that way. By the way, Equation (6.1) is how structural time series observed with error are written (state-space models). To recover the estimates that a function like`arima()`

or`Arima()`

returns, you need to write your state-space model in a specific way (as seen above).`Arima()`

will also fit what it calls an “AR(1) with drift.” An AR(1) with drift is NOT this model: \[\begin{equation} x_t = b x_{t-1}+u+w_t \text{ where } w_t \sim \,\text{N}(0,q) \tag{6.15} \end{equation}\] In the population dynamics literature, this equation is called the Gompertz model and is a type of density-dependent population model.Write R code to simulate Equation (6.15). Make \(b\) less than 1 and greater than 0. Set \(u\) and \(x_0\) to whatever you want. You can use a for loop.

Plot the trajectories and show that this model does not “drift” upward or downward. It fluctuates about a mean value.

Hold \(b\) constant and change \(u\). How do the trajectories change?

Hold \(u\) constant and change \(b\). Make sure to use a \(b\) close to 1 and another close to 0. How do the trajectories change?

Do 2 simulations each with the same \(w_t\). In one simulation, set \(u=1\) and in the other \(u=2\). For both simulations, set \(x_1 = u/(1-b)\). You can set \(b\) to whatever you want as long as \(0<b<1\). Plot the 2 trajectories on the same plot. What is different?

We will fit what

`Arima()`

calls “AR(1) with drift” models in the chapter on MARSS models with covariates.The

**MARSS**package includes a data set of gray whales. Load the data to use as follows:`library(MARSS) <- log(graywhales[, 2]) dat`

Fit a random walk with drift model observed with error to the data: \[\begin{equation} \begin{gathered} x_t = x_{t-1}+u+w_t \text{ where } w_t \sim \,\text{N}(0,q) \\ y_t = x_t+v_t \text{ where } v_t \sim \,\text{N}(0,r) \\ x_0 = a \end{gathered} \tag{6.16} \end{equation}\] \(y\) is the whale count in year \(t\). \(x\) is interpreted as the ‘true’ unknown population size that we are trying to estimate.

Fit this model with

`MARSS()`

Plot the estimated \(x\) as a line with the actual counts added as points. \(x\) is in

`fit$states`

. It is a matrix. To plot using`plot()`

, you will need to change it to a vector using`as.vector()`

or`fit$states[1,]`

Simulate 1000 sample gray whale populstion trajectories (the \(x\) in your model) using the estimated \(u\) and \(q\) starting at the estimated \(x\) in 1997. You can do this with a couple for loops or write something terse with

`cumsum()`

and`apply()`

.Using these simulated trajectories, what is your estimate of the probability that the grey whale population will be above 50,000 graywhales in 2007?

What kind(s) of uncertainty does your estimate above NOT include?

Fit the following models to the graywhales data using MARSS(). Assume \(b=1\).

Model 1 Process error only model with drift

Model 2 Process error only model without drift

Model 3 Process error with drift and observation error with observation error variance fixed = 0.05.

Model 4 Process error with drift and observation error with observation error variance estimated.

Compute the AICc’s for each model and likelihood or deviance (-2 * log likelihood). Where to find these? Try

`names(fit)`

.`logLik()`

is the standard R function to return log-likelihood from fits.Calculate a table of \(\Delta\text{AICc}\) values and AICc weights.

Show the acf of the model and state residuals for the best model. You will need a vector of the residuals to do this. If

`fit`

is the fit from a fit call like`fit = MARSS(dat)`

, you get the residuals using this code:`residuals(fit)$state.residuals[1, ] residuals(fit)$model.residuals[1, ]`

Do the acf’s suggest any problems?

Evaluate the predictive accuracy of forecasts using the

**forecast**package using the`airmiles`

dataset. Load the data to use as follows:`library(forecast) <- log(airmiles) dat <- length(dat) n <- dat[1:(n - 3)] training.dat <- dat[(n - 2):n] test.dat`

This will prepare the training data and set aside the last 3 data points for validation.

Fit the following four models using

`Arima()`

: ARIMA(0,0,0), ARIMA(1,0,0), ARIMA(0,0,1), ARIMA(1,0,1).Use

`forecast()`

to make 3 step ahead forecasts from each.Calculate the MASE statistic for each using the

`accuracy()`

function in the**forecast**package. Type`?accuracy`

to learn how to use this function.Present the results in a table.

Which model is best supported based on the MASE statistic?

The WhaleNet Archive of STOP Data has movement data on loggerhead turtles on the east coast of the US from ARGOS tags. The

**MARSS**package`loggerheadNoisy`

dataset is lat/lot data on eight individuals, however we have corrupted this data severely by adding random errors in order to create a “bad tag” problem (very noisy). Use`head(loggerheadNoisy)`

to get an idea of the data. Then load the data on one turtle, MaryLee. MARSS needs time across the columns to you need to use transpose the data (as shown).`<- "MaryLee" turtlename <- loggerheadNoisy[which(loggerheadNoisy$turtle == turtlename), dat 5:6] <- t(dat) dat`

Plot MaryLee’s locations (as a line not dots). Put the latitude locations on the y-axis and the longitude on the y-axis. You can use

`rownames(dat)`

to see which is in which row. You can just use`plot()`

for the homework. But if you want, you can look at the MARSS Manual chapter on animal movement to see how to plot the turtle locations on a map using the**maps**package.Analyze the data with a state-space model (movement observed with error) using

`<- MARSS(dat) fit0`

Look at the output from the above MARSS call. What is the meaning of the parameters output from MARSS in terms of turtle movement? What exactly is the \(u\) estimate for example? Look at the data and think about the model you fit.

What assumption did the default MARSS model make about observation error and process error? What does that assumption mean in terms of how steps in the N-S and E-W directions are related? What does that assumption mean in terms of our assumption about the latitudal and longitudinal observation errors?

Does MaryLee move faster in the latitude direction versus longitude direction?

Add MaryLee’s estimated “true” positions to your plot of her locations. You can use

`lines(x, y, col="red")`

(with x and y replaced with your x and y data). The true position is the “state.” This is in the states element of an output from MARSS`fit0$states`

.Fit the following models with different assumptions regarding the movement in the lat/lon direction:

Lat/lon movements are independent but the variance is the same

Lat/lon movements are correlated and lat/lon variances are different

Lat/lon movements are correlated and the lat/lon variances are the same.

You only need to change

`Q`

specification. Your MARSS call will now look like the following with`...`

replaced with your`Q`

specification.`<- MARSS(dat, list(Q = ...)) fit1`

Plot your state residuals (true location residuals). What are the problems? Discuss in reference to your plot of the location data. Here is how to get state residuals from

`MARSS()`

output:`<- residuals(fit0)$state.residuals resids`

The lon residuals are in row 1 and lat residuals are in row 2 (same order as the data).