4.9 Autoregressive moving-average (ARMA) models

ARMA(\(p,q\)) models have a rich history in the time series literature, but they are not nearly as common in ecology as plain AR(\(p\)) models. As we discussed in lecture, both the ACF and PACF are important tools when trying to identify the appropriate order of \(p\) and \(q\). Here we will see how to simulate time series from AR(\(p\)), MA(\(q\)), and ARMA(\(p,q\)) processes, as well as fit time series models to data based on insights gathered from the ACF and PACF.

We can write an ARMA(\(p,q\)) as a mixture of AR(\(p\)) and MA(\(q\)) models, such that

\[\begin{equation} \tag{4.24} x_t = \phi_1x_{t-1} + \phi_2x_{t-2} + \dots + \phi_p x_{t-p} + w_t + \theta_1 w_{t-1} + \theta_2 w_{t-2} + \dots + \theta_q w_{t-q}, \end{equation}\]

and the \(w_t\) are white noise.

4.9.1 Fitting ARMA(\(p,q\)) models with `arima()`

We have already seen how to simulate AR(\(p\)) and MA(\(q\)) models with arima.sim(); the same concepts apply to ARMA(\(p,q\)) models and therefore we will not do that here. Instead, we will move on to fitting ARMA(\(p,q\)) models when we only have a realization of the process (i.e., data) and do not know the underlying parameters that generated it.

The function arima() accepts a number of arguments, but two of them are most important:

x: a univariate time series
order: a vector of length 3 specifying the order of ARIMA(p,d,q) model

In addition, note that by default arima() will estimate an underlying mean of the time series unless \(d>0\). For example, an AR(1) process with mean \(\mu\) would be written

\[\begin{equation} \tag{4.25} x_t = \mu + \phi (x_{t-1} - \mu) + w_t. \end{equation}\]

If you know for a fact that the time series data have a mean of zero (e.g., you already subtracted the mean from them), you should include the argument include.mean = FALSE, which is set to TRUE by default. Note that ignoring and not estimating a mean in ARMA(\(p,q\)) models when one exists will bias the estimates of all other parameters.

Let’s see an example of how arima() works. First we’ll simulate an ARMA(2,2) model and then estimate the parameters to see how well we can recover them. In addition, we’ll add in a constant to create a non-zero mean, which arima() reports as intercept in its output.

set.seed(123)
## ARMA(2,2) description for arim.sim()
ARMA22 <- list(order = c(2, 0, 2), ar = c(-0.7, 0.2), ma = c(0.7, 
    0.2))
## mean of process
mu <- 5
## simulated process (+ mean)
ARMA_sim <- arima.sim(n = 10000, model = ARMA22) + mu
## estimate parameters
arima(x = ARMA_sim, order = c(2, 0, 2))


Call:
arima(x = ARMA_sim, order = c(2, 0, 2))

Coefficients:
          ar1     ar2     ma1     ma2  intercept
      -0.7079  0.1924  0.6912  0.2001     4.9975
s.e.   0.0291  0.0284  0.0289  0.0236     0.0125

sigma^2 estimated as 0.9972:  log likelihood = -14175.92,  aic = 28363.84

It looks like we were pretty good at estimating the true parameters, but our sample size was admittedly quite large; the estimate of the variance of the process errors is reported as sigma^2 below the other coefficients. As an exercise, try decreasing the length of time series in the arima.sim() call above from 10,000 to something like 100 and see what effect it has on the parameter estimates.

4.9.2 Searching over model orders

In an ideal situation, you could examine the ACF and PACF of the time series of interest and immediately decipher what orders of \(p\) and \(q\) must have generated the data, but that doesn’t always work in practice. Instead, we are often left with the task of searching over several possible model forms and seeing which of them provides the most parsimonious fit to the data. There are two easy ways to do this for ARIMA models in R. The first is to write a little script that loops ove the possible dimensions of \(p\) and \(q\). Let’s try that for the process we simulated above and search over orders of \(p\) and \(q\) from 0-3 (it will take a few moments to run and will likely report an error about a “possible convergence problem,” which you can ignore).

## empty list to store model fits
ARMA_res <- list()
## set counter
cc <- 1
## loop over AR
for (p in 0:3) {
    ## loop over MA
    for (q in 0:3) {
        ARMA_res[[cc]] <- arima(x = ARMA_sim, order = c(p, 0, 
            q))
        cc <- cc + 1
    }
}

Warning in arima(x = ARMA_sim, order = c(p, 0, q)): possible convergence
problem: optim gave code = 1

## get AIC values for model evaluation
ARMA_AIC <- sapply(ARMA_res, function(x) x$aic)
## model with lowest AIC is the best
ARMA_res[[which(ARMA_AIC == min(ARMA_AIC))]]


Call:
arima(x = ARMA_sim, order = c(p, 0, q))

Coefficients:
          ar1     ar2     ma1     ma2  intercept
      -0.7079  0.1924  0.6912  0.2001     4.9975
s.e.   0.0291  0.0284  0.0289  0.0236     0.0125

sigma^2 estimated as 0.9972:  log likelihood = -14175.92,  aic = 28363.84

It looks like our search worked, so let’s look at the other method for fitting ARIMA models. The auto.arima() function in the forecast package will conduct an automatic search over all possible orders of ARIMA models that you specify. For details, type ?auto.arima after loading the package. Let’s repeat our search using the same criteria.

## find best ARMA(p,q) model
auto.arima(ARMA_sim, start.p = 0, max.p = 3, start.q = 0, max.q = 3)

Series: ARMA_sim 
ARIMA(2,0,2) with non-zero mean 

Coefficients:
          ar1     ar2     ma1     ma2    mean
      -0.7079  0.1924  0.6912  0.2001  4.9975
s.e.   0.0291  0.0284  0.0289  0.0236  0.0125

sigma^2 estimated as 0.9977:  log likelihood=-14175.92
AIC=28363.84   AICc=28363.84   BIC=28407.1

We get the same results with an increase in speed and less coding, which is nice. If you want to see the form for each of the models checked by auto.arima() and their associated AIC values, include the argument trace = 1.