## 4.9 Autoregressive moving-average (ARMA) models

ARMA(\(p,q\)) models have a rich history in the time series literature, but they are not nearly as common in ecology as plain AR(\(p\)) models. As we discussed in lecture, both the ACF and PACF are important tools when trying to identify the appropriate order of \(p\) and \(q\). Here we will see how to simulate time series from AR(\(p\)), MA(\(q\)), and ARMA(\(p,q\)) processes, as well as fit time series models to data based on insights gathered from the ACF and PACF.

We can write an ARMA(\(p,q\)) as a mixture of AR(\(p\)) and MA(\(q\)) models, such that

\[\begin{equation} \tag{4.24} x_t = \phi_1x_{t-1} + \phi_2x_{t-2} + \dots + \phi_p x_{t-p} + w_t + \theta_1 w_{t-1} + \theta_2 w_{t-2} + \dots + \theta_q w_{t-q}, \end{equation}\]

and the \(w_t\) are white noise.
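To make Eq. (4.24) concrete, here is a minimal hand-rolled simulation of an ARMA(1,1) process (*i.e.*, \(p = q = 1\)); the values \(\phi = 0.5\) and \(\theta = 0.7\) are chosen arbitrarily for illustration:

```r
set.seed(42)
nn <- 1000
phi <- 0.5    ## AR(1) coefficient (arbitrary)
theta <- 0.7  ## MA(1) coefficient (arbitrary)
## white noise; ww[t + 1] plays the role of w_t so that ww[1] is w_0
ww <- rnorm(nn + 1)
xx <- numeric(nn)
## x_1 = phi * x_0 + w_1 + theta * w_0, taking x_0 = 0
xx[1] <- ww[2] + theta * ww[1]
for (t in 2:nn) {
    ## Eq. (4.24) with p = q = 1
    xx[t] <- phi * xx[t - 1] + ww[t + 1] + theta * ww[t]
}
```

In practice `arima.sim()` does this bookkeeping (plus a burn-in period) for you, which is why we use it below.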

### 4.9.1 Fitting ARMA(\(p,q\)) models with `arima()`

We have already seen how to simulate AR(\(p\)) and MA(\(q\)) models with `arima.sim()`; the same concepts apply to ARMA(\(p,q\)) models, so we will not repeat that here. Instead, we will move on to fitting ARMA(\(p,q\)) models when we only have a realization of the process (*i.e.*, data) and do not know the underlying parameters that generated it.

The function `arima()` accepts a number of arguments, but two of them are most important:

`x`
: a univariate time series

`order`
: a vector of length 3 specifying the order of the ARIMA(\(p,d,q\)) model

In addition, note that by default `arima()` will estimate an underlying mean of the time series unless \(d>0\). For example, an AR(1) process with mean \(\mu\) would be written

\[\begin{equation} \tag{4.25} x_t = \mu + \phi (x_{t-1} - \mu) + w_t. \end{equation}\]
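As a quick check of Eq. (4.25), we can simulate an AR(1) process with a known mean and confirm that `arima()` recovers it as `intercept`; the values \(\mu = 5\) and \(\phi = 0.6\) below are arbitrary:

```r
set.seed(42)
mu <- 5     ## true mean (arbitrary)
phi <- 0.6  ## true AR(1) coefficient (arbitrary)
nn <- 5000
xx <- numeric(nn)
xx[1] <- mu
for (t in 2:nn) {
    ## Eq. (4.25): x_t = mu + phi * (x_{t-1} - mu) + w_t
    xx[t] <- mu + phi * (xx[t - 1] - mu) + rnorm(1)
}
## `intercept` should be close to mu = 5 and `ar1` close to phi = 0.6
coef(arima(xx, order = c(1, 0, 0)))
```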

If you know for a fact that the time series data have a mean of zero (*e.g.*, you already subtracted the mean from them), you should include the argument `include.mean = FALSE`, which is set to `TRUE` by default. Note that ignoring and not estimating a mean in ARMA(\(p,q\)) models when one exists will bias the estimates of all other parameters.
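A small sketch of that bias: if we force a zero mean on data whose true mean is clearly non-zero, the AR coefficient gets pushed toward 1 to absorb the mean (the parameter values below are arbitrary):

```r
set.seed(42)
## AR(1) with phi = 0.7, shifted to a mean of 10
xx <- arima.sim(n = 2000, model = list(ar = 0.7)) + 10
## default: the mean is estimated, so `ar1` is estimated well
coef(arima(xx, order = c(1, 0, 0)))
## forcing a zero mean biases `ar1` toward 1
coef(arima(xx, order = c(1, 0, 0), include.mean = FALSE))
```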

Let’s see an example of how `arima()` works. First we’ll simulate an ARMA(2,2) model and then estimate the parameters to see how well we can recover them. In addition, we’ll add in a constant to create a non-zero mean, which `arima()` reports as `intercept` in its output.

```
set.seed(123)
## ARMA(2,2) description for arima.sim()
ARMA22 <- list(order = c(2, 0, 2), ar = c(-0.7, 0.2), ma = c(0.7, 0.2))
## mean of process
mu <- 5
## simulated process (+ mean)
ARMA_sim <- arima.sim(n = 10000, model = ARMA22) + mu
## estimate parameters
arima(x = ARMA_sim, order = c(2, 0, 2))
```

```
Call:
arima(x = ARMA_sim, order = c(2, 0, 2))

Coefficients:
          ar1     ar2     ma1     ma2  intercept
      -0.7079  0.1924  0.6912  0.2001     4.9975
s.e.   0.0291  0.0284  0.0289  0.0236     0.0125

sigma^2 estimated as 0.9972:  log likelihood = -14175.92,  aic = 28363.84
```

It looks like we were pretty good at estimating the true parameters, but our sample size was admittedly quite large; the estimated variance of the process errors is reported as `sigma^2` below the other coefficients. As an exercise, try decreasing the length of the time series in the `arima.sim()` call above from 10,000 to something like 100 and see what effect it has on the parameter estimates.
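For example, one version of that exercise might look like the following (expect noticeably larger standard errors, and possibly estimates far from the truth):

```r
set.seed(123)
## same ARMA(2,2) process as above, but only 100 observations
ARMA22 <- list(order = c(2, 0, 2), ar = c(-0.7, 0.2), ma = c(0.7, 0.2))
ARMA_sim_short <- arima.sim(n = 100, model = ARMA22) + 5
## estimate parameters from the short realization
arima(x = ARMA_sim_short, order = c(2, 0, 2))
```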

### 4.9.2 Searching over model orders

In an ideal situation, you could examine the ACF and PACF of the time series of interest and immediately decipher what orders of \(p\) and \(q\) must have generated the data, but that doesn’t always work in practice. Instead, we are often left with the task of searching over several possible model forms and seeing which of them provides the most parsimonious fit to the data. There are two easy ways to do this for ARIMA models in R. The first is to write a little script that loops over the possible dimensions of \(p\) and \(q\). Let’s try that for the process we simulated above and search over orders of \(p\) and \(q\) from 0–3 (it will take a few moments to run and will likely report a warning about a “`possible convergence problem`,” which you can ignore).

```
## empty list to store model fits
ARMA_res <- list()
## set counter
cc <- 1
## loop over AR order
for (p in 0:3) {
    ## loop over MA order
    for (q in 0:3) {
        ARMA_res[[cc]] <- arima(x = ARMA_sim, order = c(p, 0, q))
        cc <- cc + 1
    }
}
```

```
Warning in arima(x = ARMA_sim, order = c(p, 0, q)): possible convergence
problem: optim gave code = 1
```

```
## get AIC values for model evaluation
ARMA_AIC <- sapply(ARMA_res, function(x) x$aic)
## model with lowest AIC is the best
ARMA_res[[which(ARMA_AIC == min(ARMA_AIC))]]
```

```
Call:
arima(x = ARMA_sim, order = c(p, 0, q))

Coefficients:
          ar1     ar2     ma1     ma2  intercept
      -0.7079  0.1924  0.6912  0.2001     4.9975
s.e.   0.0291  0.0284  0.0289  0.0236     0.0125

sigma^2 estimated as 0.9972:  log likelihood = -14175.92,  aic = 28363.84
```

It looks like our search worked, so let’s look at the other method for fitting ARIMA models. The `auto.arima()` function in the **forecast** package will conduct an automatic search over all possible orders of ARIMA models that you specify. For details, type `?auto.arima` after loading the package. Let’s repeat our search using the same criteria.

```
## find best ARMA(p,q) model
auto.arima(ARMA_sim, start.p = 0, max.p = 3, start.q = 0, max.q = 3)
```

```
Series: ARMA_sim 
ARIMA(2,0,2) with non-zero mean 

Coefficients:
          ar1     ar2     ma1     ma2    mean
      -0.7079  0.1924  0.6912  0.2001  4.9975
s.e.   0.0291  0.0284  0.0289  0.0236  0.0125

sigma^2 estimated as 0.9977:  log likelihood=-14175.92
AIC=28363.84   AICc=28363.84   BIC=28407.1
```

We get the same results with an increase in speed and less coding, which is nice. If you want to see the form of each model checked by `auto.arima()` and its associated AIC value, include the argument `trace = 1`.