## 5.8 Estimating the ARMA orders

We will use the auto.arima() function in forecast. This function will estimate the level of differencing needed to make our data stationary and estimate the AR and MA orders using AICc (or BIC if we choose).

### 5.8.1 Example: model selection for AR(2) data

forecast::auto.arima(ar2)
Series: ar2
ARIMA(2,0,2) with non-zero mean

Coefficients:
ar1     ar2     ma1      ma2    mean
0.2795  0.5938  0.4861  -0.0943  0.9553
s.e.  1.1261  1.0413  1.1284   0.1887  0.3398

sigma^2 estimated as 0.9848:  log likelihood=-1409.57
AIC=2831.15   AICc=2831.23   BIC=2860.59

Works with missing data too though might not estimate very close to the true model form.

forecast::auto.arima(ar2miss)
Series: ar2miss
ARIMA(0,1,0)

sigma^2 estimated as 1.066:  log likelihood=-82.07
AIC=166.15   AICc=166.19   BIC=168.72

### 5.8.2 Fitting to 100 simulated data sets

Let’s fit to 100 simulated data sets and see how often the true (generating) model form is selected.

save.fits <- rep(NA, 100)
for (i in 1:100) {
a2 <- arima.sim(n = 100, model = list(ar = c(0.8, 0.1)))
fit <- auto.arima(a2, seasonal = FALSE, max.d = 0, max.q = 0)
save.fits[i] <- paste0(fit$arma[1], "-", fit$arma[2])
}
table(save.fits)
save.fits
1-0 2-0 3-0
71  22   7 

auto.arima() uses AICc for selection by default. You can change that to AIC or BIC using ic="aic" or ic="bic".

Repeat the simulation using AIC and BIC to see how the choice of the information criteria affects the model that is selected.

### 5.8.3 Trace=TRUE

We can set Trace=TRUE to see what models auto.arima() fit.

forecast::auto.arima(ar2, trace = TRUE)

Fitting models using approximations to speed things up...

ARIMA(2,0,2) with non-zero mean : 2824.88
ARIMA(0,0,0) with non-zero mean : 4430.868
ARIMA(1,0,0) with non-zero mean : 2842.785
ARIMA(0,0,1) with non-zero mean : 3690.512
ARIMA(0,0,0) with zero mean     : 4602.31
ARIMA(1,0,2) with non-zero mean : 2827.422
ARIMA(2,0,1) with non-zero mean : 2825.235
ARIMA(3,0,2) with non-zero mean : 2830.176
ARIMA(2,0,3) with non-zero mean : 2826.503
ARIMA(1,0,1) with non-zero mean : 2825.438
ARIMA(1,0,3) with non-zero mean : 2829.358
ARIMA(3,0,1) with non-zero mean : Inf
ARIMA(3,0,3) with non-zero mean : 2825.766
ARIMA(2,0,2) with zero mean     : 2829.536

Now re-fitting the best model(s) without approximations...

ARIMA(2,0,2) with non-zero mean : 2831.232

Best model: ARIMA(2,0,2) with non-zero mean 
Series: ar2
ARIMA(2,0,2) with non-zero mean

Coefficients:
ar1     ar2     ma1      ma2    mean
0.2795  0.5938  0.4861  -0.0943  0.9553
s.e.  1.1261  1.0413  1.1284   0.1887  0.3398

sigma^2 estimated as 0.9848:  log likelihood=-1409.57
AIC=2831.15   AICc=2831.23   BIC=2860.59

### 5.8.4 stepwise=FALSE

We can set stepwise=FALSE to use an exhaustive search. The model may be different than the result from the non-exhaustive search.

forecast::auto.arima(ar2, trace = TRUE, stepwise = FALSE)

Fitting models using approximations to speed things up...

ARIMA(0,0,0) with zero mean     : 4602.31
ARIMA(0,0,0) with non-zero mean : 4430.868
ARIMA(0,0,1) with zero mean     : 3815.931
ARIMA(0,0,1) with non-zero mean : 3690.512
ARIMA(0,0,2) with zero mean     : 3425.037
ARIMA(0,0,2) with non-zero mean : 3334.754
ARIMA(0,0,3) with zero mean     : 3239.347
ARIMA(0,0,3) with non-zero mean : 3170.541
ARIMA(0,0,4) with zero mean     : 3114.265
ARIMA(0,0,4) with non-zero mean : 3059.938
ARIMA(0,0,5) with zero mean     : 3042.136
ARIMA(0,0,5) with non-zero mean : 2998.531
ARIMA(1,0,0) with zero mean     : 2850.655
ARIMA(1,0,0) with non-zero mean : 2842.785
ARIMA(1,0,1) with zero mean     : 2830.652
ARIMA(1,0,1) with non-zero mean : 2825.438
ARIMA(1,0,2) with zero mean     : 2832.668
ARIMA(1,0,2) with non-zero mean : 2827.422
ARIMA(1,0,3) with zero mean     : 2834.675
ARIMA(1,0,3) with non-zero mean : 2829.358
ARIMA(1,0,4) with zero mean     : 2835.539
ARIMA(1,0,4) with non-zero mean : 2829.825
ARIMA(2,0,0) with zero mean     : 2828.987
ARIMA(2,0,0) with non-zero mean : 2823.774
ARIMA(2,0,1) with zero mean     : 2829.952
ARIMA(2,0,1) with non-zero mean : 2825.235
ARIMA(2,0,2) with zero mean     : 2829.536
ARIMA(2,0,2) with non-zero mean : 2824.88
ARIMA(2,0,3) with zero mean     : 2831.461
ARIMA(2,0,3) with non-zero mean : 2826.503
ARIMA(3,0,0) with zero mean     : 2831.057
ARIMA(3,0,0) with non-zero mean : 2826.236
ARIMA(3,0,1) with zero mean     : Inf
ARIMA(3,0,1) with non-zero mean : Inf
ARIMA(3,0,2) with zero mean     : 2834.788
ARIMA(3,0,2) with non-zero mean : 2830.176
ARIMA(4,0,0) with zero mean     : 2833.323
ARIMA(4,0,0) with non-zero mean : 2828.759
ARIMA(4,0,1) with zero mean     : 2827.798
ARIMA(4,0,1) with non-zero mean : 2823.853
ARIMA(5,0,0) with zero mean     : 2835.315
ARIMA(5,0,0) with non-zero mean : 2830.501

Now re-fitting the best model(s) without approximations...

Best model: ARIMA(2,0,0) with non-zero mean 
Series: ar2
ARIMA(2,0,0) with non-zero mean

Coefficients:
ar1     ar2    mean
0.7684  0.1387  0.9561
s.e.  0.0314  0.0314  0.3332

sigma^2 estimated as 0.9832:  log likelihood=-1409.77
AIC=2827.54   AICc=2827.58   BIC=2847.17

### 5.8.5 Fit to the anchovy data

fit <- auto.arima(anchovyts)
fit
Series: anchovyts
ARIMA(0,1,1) with drift

Coefficients:
ma1   drift
-0.6685  0.0542
s.e.   0.1977  0.0142

sigma^2 estimated as 0.04037:  log likelihood=5.39
AIC=-4.79   AICc=-3.65   BIC=-1.13

Note arima() writes a MA model like:

$x_t = e_t + b_1 e_{t-1} + b_2 e_{t-2}$

while many authors use this notation:

$x_t = e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2}$

so the MA parameters reported by auto.arima() will be NEGATIVE of that reported in Stergiou and Christou (1996) who analyze these same data. Note, in Stergiou and Christou, the model is written in backshift notation on page 112. To see the model as the equation above, I translated from backshift to non-backshift notation.