5.8 Estimating the ARMA orders

We will use the auto.arima() function in forecast. This function will estimate the level of differencing needed to make our data stationary and estimate the AR and MA orders using AICc (or BIC if we choose).

5.8.1 Example: model selection for AR(2) data

forecast::auto.arima(ar2)
Series: ar2 
ARIMA(2,0,2) with non-zero mean 

Coefficients:
         ar1     ar2     ma1      ma2    mean
      0.2795  0.5938  0.4861  -0.0943  0.9553
s.e.  1.1261  1.0413  1.1284   0.1887  0.3398

sigma^2 estimated as 0.9848:  log likelihood=-1409.57
AIC=2831.15   AICc=2831.23   BIC=2860.59

Works with missing data too though might not estimate very close to the true model form.

forecast::auto.arima(ar2miss)
Series: ar2miss 
ARIMA(0,1,0) 

sigma^2 estimated as 1.066:  log likelihood=-82.07
AIC=166.15   AICc=166.19   BIC=168.72

5.8.2 Fitting to 100 simulated data sets

Let’s fit to 100 simulated data sets and see how often the true (generating) model form is selected.

save.fits <- rep(NA, 100)
for (i in 1:100) {
    a2 <- arima.sim(n = 100, model = list(ar = c(0.8, 0.1)))
    fit <- auto.arima(a2, seasonal = FALSE, max.d = 0, max.q = 0)
    save.fits[i] <- paste0(fit$arma[1], "-", fit$arma[2])
}
table(save.fits)
save.fits
1-0 2-0 3-0 
 71  22   7 

auto.arima() uses AICc for selection by default. You can change that to AIC or BIC using ic="aic" or ic="bic".

Repeat the simulation using AIC and BIC to see how the choice of the information criteria affects the model that is selected.

5.8.3 Trace=TRUE

We can set Trace=TRUE to see what models auto.arima() fit.

forecast::auto.arima(ar2, trace = TRUE)

 Fitting models using approximations to speed things up...

 ARIMA(2,0,2) with non-zero mean : 2824.88
 ARIMA(0,0,0) with non-zero mean : 4430.868
 ARIMA(1,0,0) with non-zero mean : 2842.785
 ARIMA(0,0,1) with non-zero mean : 3690.512
 ARIMA(0,0,0) with zero mean     : 4602.31
 ARIMA(1,0,2) with non-zero mean : 2827.422
 ARIMA(2,0,1) with non-zero mean : 2825.235
 ARIMA(3,0,2) with non-zero mean : 2830.176
 ARIMA(2,0,3) with non-zero mean : 2826.503
 ARIMA(1,0,1) with non-zero mean : 2825.438
 ARIMA(1,0,3) with non-zero mean : 2829.358
 ARIMA(3,0,1) with non-zero mean : Inf
 ARIMA(3,0,3) with non-zero mean : 2825.766
 ARIMA(2,0,2) with zero mean     : 2829.536

 Now re-fitting the best model(s) without approximations...

 ARIMA(2,0,2) with non-zero mean : 2831.232

 Best model: ARIMA(2,0,2) with non-zero mean 
Series: ar2 
ARIMA(2,0,2) with non-zero mean 

Coefficients:
         ar1     ar2     ma1      ma2    mean
      0.2795  0.5938  0.4861  -0.0943  0.9553
s.e.  1.1261  1.0413  1.1284   0.1887  0.3398

sigma^2 estimated as 0.9848:  log likelihood=-1409.57
AIC=2831.15   AICc=2831.23   BIC=2860.59

5.8.4 stepwise=FALSE

We can set stepwise=FALSE to use an exhaustive search. The model may be different than the result from the non-exhaustive search.

forecast::auto.arima(ar2, trace = TRUE, stepwise = FALSE)

 Fitting models using approximations to speed things up...

 ARIMA(0,0,0) with zero mean     : 4602.31
 ARIMA(0,0,0) with non-zero mean : 4430.868
 ARIMA(0,0,1) with zero mean     : 3815.931
 ARIMA(0,0,1) with non-zero mean : 3690.512
 ARIMA(0,0,2) with zero mean     : 3425.037
 ARIMA(0,0,2) with non-zero mean : 3334.754
 ARIMA(0,0,3) with zero mean     : 3239.347
 ARIMA(0,0,3) with non-zero mean : 3170.541
 ARIMA(0,0,4) with zero mean     : 3114.265
 ARIMA(0,0,4) with non-zero mean : 3059.938
 ARIMA(0,0,5) with zero mean     : 3042.136
 ARIMA(0,0,5) with non-zero mean : 2998.531
 ARIMA(1,0,0) with zero mean     : 2850.655
 ARIMA(1,0,0) with non-zero mean : 2842.785
 ARIMA(1,0,1) with zero mean     : 2830.652
 ARIMA(1,0,1) with non-zero mean : 2825.438
 ARIMA(1,0,2) with zero mean     : 2832.668
 ARIMA(1,0,2) with non-zero mean : 2827.422
 ARIMA(1,0,3) with zero mean     : 2834.675
 ARIMA(1,0,3) with non-zero mean : 2829.358
 ARIMA(1,0,4) with zero mean     : 2835.539
 ARIMA(1,0,4) with non-zero mean : 2829.825
 ARIMA(2,0,0) with zero mean     : 2828.987
 ARIMA(2,0,0) with non-zero mean : 2823.774
 ARIMA(2,0,1) with zero mean     : 2829.952
 ARIMA(2,0,1) with non-zero mean : 2825.235
 ARIMA(2,0,2) with zero mean     : 2829.536
 ARIMA(2,0,2) with non-zero mean : 2824.88
 ARIMA(2,0,3) with zero mean     : 2831.461
 ARIMA(2,0,3) with non-zero mean : 2826.503
 ARIMA(3,0,0) with zero mean     : 2831.057
 ARIMA(3,0,0) with non-zero mean : 2826.236
 ARIMA(3,0,1) with zero mean     : Inf
 ARIMA(3,0,1) with non-zero mean : Inf
 ARIMA(3,0,2) with zero mean     : 2834.788
 ARIMA(3,0,2) with non-zero mean : 2830.176
 ARIMA(4,0,0) with zero mean     : 2833.323
 ARIMA(4,0,0) with non-zero mean : 2828.759
 ARIMA(4,0,1) with zero mean     : 2827.798
 ARIMA(4,0,1) with non-zero mean : 2823.853
 ARIMA(5,0,0) with zero mean     : 2835.315
 ARIMA(5,0,0) with non-zero mean : 2830.501

 Now re-fitting the best model(s) without approximations...




 Best model: ARIMA(2,0,0) with non-zero mean 
Series: ar2 
ARIMA(2,0,0) with non-zero mean 

Coefficients:
         ar1     ar2    mean
      0.7684  0.1387  0.9561
s.e.  0.0314  0.0314  0.3332

sigma^2 estimated as 0.9832:  log likelihood=-1409.77
AIC=2827.54   AICc=2827.58   BIC=2847.17

5.8.5 Fit to the anchovy data

fit <- auto.arima(anchovyts)
fit
Series: anchovyts 
ARIMA(0,1,1) with drift 

Coefficients:
          ma1   drift
      -0.6685  0.0542
s.e.   0.1977  0.0142

sigma^2 estimated as 0.04037:  log likelihood=5.39
AIC=-4.79   AICc=-3.65   BIC=-1.13

Note arima() writes a MA model like:

\[x_t = e_t + b_1 e_{t-1} + b_2 e_{t-2}\]

while many authors use this notation:

\[x_t = e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2}\]

so the MA parameters reported by auto.arima() will be NEGATIVE of that reported in Stergiou and Christou (1996) who analyze these same data. Note, in Stergiou and Christou, the model is written in backshift notation on page 112. To see the model as the equation above, I translated from backshift to non-backshift notation.