ARIMA models
- An approach for fitting time series data by differencing the data to produce a new stationary time series
- Works because any stationary time series can be modeled as a ARMA process (Wold Decomposition)
19 Jan 2021
ARIMA models
ARIMA models are one approach for fitting non-stationary time series data.
Other approaches (and what we’ll be mostly cover in this class)
AR(1) data: \(x_t = 0.8 x_{t-1} + e_t\)
Trend: \(z_t = f(t, t^2, t^3)\), some 3rd order polynomial
Data: \(y_t = z_t + x_t\), Trend + AR(1)
t1 <- 1:90 fit <- forecast::auto.arima(y[t1]) fit
## Series: y[t1] ## ARIMA(0,1,0) with drift ## ## Coefficients: ## drift ## 0.2212 ## s.e. 0.1019 ## ## sigma^2 estimated as 0.9339: log likelihood=-122.74 ## AIC=249.47 AICc=249.61 BIC=254.45
plot(forecast::forecast(fit, h=10)) t2 <- 91:100 points(t2, y[t2])
Use xreg
to do this. Fits \(y_t = \alpha + \beta_1 t + \beta_2 t^2 + \beta_3 t^3 + e_t\)
t1=1:90 xreg <- cbind(t1=t, t2=t^2, t3=t^3) # use poly(t, 3) in practice fit <- forecast::auto.arima(y[t1], xreg=xreg[t1,], d=0) fit
## Series: y[t1] ## Regression with ARIMA(1,0,0) errors ## ## Coefficients: ## ar1 intercept t1 t2 t3 ## 0.7634 -13.1034 0.8063 -0.0155 1e-04 ## s.e. 0.0712 1.2643 0.1097 0.0024 0e+00 ## ## sigma^2 estimated as 0.7955: log likelihood=-115.28 ## AIC=242.55 AICc=243.56 BIC=257.55
t2=91:100 plot(forecast::forecast(fit, h=10, xreg=xreg[t2,])) points(t2, y[t2])
It has WA and OR data.
library(atsalibrary) head(chinook.month)
## Year Month Species State log.metric.tons metric.tons Value ## 1 1990 Jan Chinook WA 3.397858 29.9 108515 ## 2 1990 Feb Chinook WA 3.808882 45.1 308707 ## 3 1990 Mar Chinook WA 3.511545 33.5 200535 ## 4 1990 Apr Chinook WA 4.248495 70.0 431136 ## 5 1990 May Chinook WA 5.200705 181.4 859988 ## 6 1990 Jun Chinook WA 4.371976 79.2 419635
The data are monthly and start in January 1990. To make this into a ts object do
chinookts <- ts(chinook.month$log.metric.tons[chinook.month$State=="WA"], start=c(1990,1), frequency=12)
start
is the year and month and frequency is the number of months in the year.
Use ?ts
to see more examples of how to set up ts objects.
A seasonal difference is \(x_t\) minus \(x_{t-s}\), e.g. chinook data January 1990 - chinook data January 1989.
\[z_t = x_t - x_{t-12}\]
This will remove the seasonality.
dts <- diff(chinookts, lag=12) plot(dts)
I = differencing to produce stationarity
SARMA = AR & MA with seasonal lags.
Basic structure of a SARIMA(1) model is AR(non-seasonal) + AR(seasonal)
\[z_t = \phi_1 z_{t-1} + \Phi_1 z_{t-12} - \phi_1\Phi_1 z_{t-13}\]
So feb 90 - feb 89 (\(z_t\)) is, potentially, correlated with jan 90 - jan 89 (\(z_{t-1}\)) and feb 89 - feb 88 (\(z_{t-12}\))
ARIMA (p,d,q)(ps,ds,qs)s
ARIMA (non-seasonal part)(seasonal part)Frequency
ARIMA (non-seasonal) means \(z_t\) correlated with \(z_{t-1}\)
ARIMA (seasonal) means \(z_t\) correlated with \(z_{t-s}\)
ARIMA (1,0,0)(1,1,0)[12]
\[z_t = x_t - x_{t-12}\]
\[z_t = \phi_1 z_{t-1} + \Phi_1 z_{t-12} - \phi_1\Phi_1 z_{t-13} + w_t\]
\[w_t = e_t\]
ARIMA(0,0,0)(0,1,0)[12] with drift
\[z_t = x_t - x_{t-12} + m\]
No AR parts. \(z_t = w_t\)
No MA part, so \(w_t = e_t\) (white noise)
January 1990 = January 1989 + constant mean
ARIMA(0, 1, 1)(0, 1, 1)[12]
Figure out \(z_t\).
\[z_t = (x_t - x_{t-12}) - (x_{t-1} - x_{t-13})\]
Write out the AR parts with \(z_t\). No AR part.
\[z_t = w_t\]
Write out the MA parts, the \(w_t\).
\[w_t = e_t - \theta_1 e_{t-1} - \Theta_1 e_{t-12} + \theta_1\Theta_1 e_{t-13}\]
auto.arima()
for seasonal tsauto.arima()
will recognize that our data has season and fit a seasonal ARIMA model to our data by default. We will fit to 1990 to 2000 data.
dat <- window(chinookts, start=c(1990,1), end=c(2000,12)) fit <- forecast::auto.arima(dat) fit
## Series: dat ## ARIMA(1,0,0)(0,1,2)[12] with drift ## ## Coefficients: ## ar1 sma1 sma2 drift ## 0.3222 -0.2499 -0.2151 -0.0246 ## s.e. 0.1077 0.1253 0.1348 0.0064 ## ## sigma^2 estimated as 0.7523: log likelihood=-126.74 ## AIC=263.49 AICc=264.08 BIC=276.9
ARIMA(1,0,0)(0,1,2)[12] with drift
fr <- forecast::forecast(fit, h=10) plot(fr)
Basic steps for identifying a seasonal model. forecast automates most of this.
Make your data into a ts object with frequency specified.
Plot your data. Look for trend, seasonality and random walks.
Use differencing to remove season and trend.
Examine the ACF and PACF of the differenced data.
Estimate likely models and compare with model selection criteria (or cross-validation). Use TRACE=TRUE
Do residual checks