Let’s imagine that we can describe our data as a combination of the mean trend and error.
\[x_t = m_t + e_t\]
What if our main goal is to predict \(x_{t+1}\)? Note we (fisheries biologists and ecologists) often want to know \(m_t\). In fact, that is often our main and sole goal. But let’s say our goal is to predict \(x_{t+1}\).
How do we do this?
We estimate \(m_t\) from the data. Once we have an estimate of \(m_t\), we have an estimate of \(e_t\). We can model that error (think AR and MA) to help us predict \(x_{t+1}\) from \(x_t\), \(x_{t-1}\), etc.
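In R, this two-step approach might look like the following minimal sketch, assuming xt is a time series object holding our data and using a linear trend purely as an illustrative stand-in for \(m_t\):

tt <- seq_along(xt)
fit.trend <- lm(xt ~ tt)         # a simple estimate of m_t
et <- residuals(fit.trend)       # the implied estimate of e_t
forecast::auto.arima(et, d=0)    # look for AR/MA structure in the error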
Mark introduced filtering (moving averages) as one way to estimate \(m_t\). We will not be doing that in the rest of the course. Instead, we will have a model for \(m_t\) and we will fit \[x_t = m_t + e_t\] directly. We will get an estimate of the model that produced \(m_t\), and we will also estimate the structure (think AR, MA) of \(e_t\). “Directly” means that we will not be transforming our data with differencing in order to produce a statistic (a difference) that is stationary. The \(x_t\) and \(m_t\) models will have a biological interpretation.
So for example, if our model is
\[m_t = m_{t-1}+\mu \\ x_t = m_t + w_t,\] then we fit that model without taking any differences. The above is a trivial example, as it is just a linear deterministic trend, but it gives you the idea.
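If you want to simulate data from this trivial example, here is one way (the values are illustrative, and xt.sim is just a made-up name so as not to clobber the xt used in the fits below):

set.seed(123)
n <- 100; mu <- 0.02
mt <- mu * (1:n)                      # m_t = m_{t-1} + mu means m_t = mu t
xt.sim <- ts(mt + rnorm(n, sd=0.3))   # x_t = m_t + w_t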
The Box-Jenkins approach is different. In this approach to predicting \(x_{t+1}\), we remove \(m_t\) from our data using differencing. We don’t have to worry about a model for \(m_t\) because we have removed it!
And we don’t actually need it to predict \(x_{t+1}\) since we have \(x_{t}\), \(x_{t-1}\), etc. We model \(\Delta^d x_t\), the d-th difference of \(x_t\), and then predict \(\Delta^d x_{t+1}\). Once we have that, we can predict \(x_{t+1}\) from \(x_t\), \(x_{t-1}\), etc. with some algebra.
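A sketch of that algebra for \(d=1\), using an AR(1) for the differences purely as an illustration:

dxt <- diff(xt)                                # the first difference
fit <- forecast::Arima(dxt, order=c(1,0,0))    # model the differences
dx.pred <- forecast::forecast(fit, h=1)$mean   # predicted difference at t+1
x.pred <- tail(xt, 1) + dx.pred                # x_{t+1} = x_t + predicted difference

Fitting forecast::Arima(xt, order=c(1,1,0)) does this algebra for you when forecasting.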
The error structure of \(\Delta^d x_{t+1}\) is NOT the same as \(e_t\). \[\Delta^d x_{t} = \phi_1\Delta^d x_{t-1} + \phi_2\Delta^d x_{t-2} + \dots + z_t\]
\(z_t\) is the error of the differences. And the \(\phi\) in the AR part are for the differences not the original \(x_t\). So it is more difficult to make a biological interpretation of the ARMA model. But remember, the objective was to predict \(x_{t+1}\) not to fit a model with a biological interpretation.
\(m_t\) is linear and deterministic.
\[x_t = \mu t + w_t\]
When we take the first difference, we get \[\Delta x_t = \mu (t - (t-1)) + w_t - w_{t-1} = \mu + w_t - w_{t-1}\] So you would expect that we should estimate an ARIMA(0,1,1) with drift. For some reason auto.arima() is returning Inf for the AICc for ARIMA(0,1,1) with drift, although Arima() returns it. Turn on trace=TRUE to see the problem.
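For example (a sketch; the trace output is not shown here):

forecast::auto.arima(xt, stepwise=FALSE, trace=TRUE)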
forecast::auto.arima(xt, stepwise=FALSE)
## Series: xt
## ARIMA(0,1,3)
##
## Coefficients:
## ma1 ma2 ma3
## -0.7878 -0.1612 0.2303
## s.e. 0.1036 0.1496 0.1103
##
## sigma^2 estimated as 0.1014: log likelihood=-26.19
## AIC=60.39 AICc=60.81 BIC=70.77
But we can recover the parameters with Arima(). Notice that the AICc is much smaller than for the model chosen by auto.arima().
forecast::Arima(xt,order=c(0,1,1),include.drift=TRUE)
## Series: xt
## ARIMA(0,1,1) with drift
##
## Coefficients:
## ma1 drift
## -1.0000 0.0208
## s.e. 0.0295 0.0010
##
## sigma^2 estimated as 0.0845: log likelihood=-19.45
## AIC=44.9 AICc=45.16 BIC=52.69
\(m_t\) is quadratic and deterministic.
\[x_t = \mu t^2 + w_t\]
When we take the first difference, we get \[\Delta x_t = \mu (t^2 - (t-1)^2) + w_t - w_{t-1} = \mu(2t-1) + w_t - w_{t-1}\] When we take another difference, we get \[\Delta^2 x_t = \mu(2t-1) - \mu(2(t-1)-1) + w_t - 2w_{t-1} + w_{t-2} = 2\mu + w_t - 2w_{t-1} + w_{t-2}\] So you would expect that we should estimate an ARIMA(0,2,2) with drift. Because Arima() will not estimate the drift term if we have a 2nd difference, I will fit to the first difference, and we recover the terms as expected.
forecast::Arima(diff(xt),order=c(0,1,2), include.drift=TRUE)
## Series: diff(xt)
## ARIMA(0,1,2) with drift
##
## Coefficients:
## ma1 ma2 drift
## -1.9888 0.9888 0.01
## s.e. 0.0037 0.0035 0.00
##
## sigma^2 estimated as 0.09959: log likelihood=-272.92
## AIC=553.83 AICc=553.87 BIC=573.46
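For comparison, here is a sketch of fitting the second-difference form directly to xt; there is no drift term because, as noted above, Arima() will not estimate drift with a 2nd difference. The output is not shown.

forecast::Arima(xt, order=c(0,2,2))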
auto.arima() struggles, however.
forecast::auto.arima(diff(xt), stepwise=FALSE)
## Series: diff(xt)
## ARIMA(1,1,3) with drift
##
## Coefficients:
## Warning in sqrt(diag(x$var.coef)): NaNs produced
## ar1 ma1 ma2 ma3 drift
## -0.9588 -1.0201 -0.9202 0.9405 0.01
## s.e. NaN NaN NaN NaN 0.00
##
## sigma^2 estimated as 0.1005: log likelihood=-274.79
## AIC=561.59 AICc=561.67 BIC=591.02
\(m_t\) is a random walk.
\[x_t = m_t + w_t\\m_t = m_{t-1}+v_t\] When we take the first difference, we get \[\Delta x_t = m_t - m_{t-1} + w_t - w_{t-1} = v_t + w_t - w_{t-1}\] That’s a moving average (\(w_t-w_{t-1}\)) plus added white noise, which is not an ARMA model. We can fit it as a state-space model, however.
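This model is the classic stochastic level (local level) model, and a minimal sketch of fitting it as a state-space model uses StructTS() in base R's stats package:

fit.ss <- StructTS(xt, type="level")   # local level state-space model
fit.ss$coef                            # estimated variances of v_t (level) and w_t (epsilon)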
We can take the same approach with a seasonal model. Let’s imagine that we can describe our data as a combination of the mean trend, a seasonal term, and error.
\[x_t = \mu t+ s_t + w_t\] Let’s imagine that the seasonal term is just a constant based on month and doesn’t change with time.
\[s_t = f(month)\] We want to remove the \(s_t\) with differencing so that we can model the error. We can then recover \(x_{t+1}\) by using \(x_{t+1-s}\), where \(s\) is the seasonal length (e.g. 12 for monthly data with a yearly season).
When we take the first seasonal difference, we get \[\Delta_s x_t = \mu(t-(t-s)) + s_t - s_{t-s} + w_t - w_{t-s} = \mu s + w_t - w_{t-s}\] The \(s_t-s_{t-s}\) disappears because \(s_t = s_{t-s}\) when the seasonal effect is just a function of the month. Depending on what \(m_t\) is, we might be done or we might have to do a first difference. Notice that the error term is a moving average in the seasonal part.
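In R, a seasonal difference is just a difference at lag \(s\); a sketch, assuming monthly data so \(s=12\):

dxt.s <- diff(xt, lag=12)   # seasonal difference: x_t - x_{t-12}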
auto.arima() identifies an ARIMA(0,0,0)(0,1,2)[12] with drift.
forecast::auto.arima(xt, stepwise=FALSE)
## Series: xt
## ARIMA(0,0,0)(0,1,2)[12] with drift
##
## Coefficients:
## sma1 sma2 drift
## -1.0488 0.1765 0.0496
## s.e. 0.1606 0.1148 0.0007
##
## sigma^2 estimated as 0.08705: log likelihood=-29.56
## AIC=67.12 AICc=67.51 BIC=77.85
But we can recover the model parameters with Arima(). Note the drift term is returned as \(\mu\), not \(\mu s\); the mean of the seasonally differenced series is \(\mu s = 12 \times 0.0496 \approx 0.6\).
forecast::Arima(xt, seasonal=c(0,1,1), include.drift=TRUE)
## Series: xt
## ARIMA(0,0,0)(0,1,1)[12] with drift
##
## Coefficients:
## sma1 drift
## -1.0000 0.0496
## s.e. 0.1845 0.0008
##
## sigma^2 estimated as 0.08162: log likelihood=-30.74
## AIC=67.49 AICc=67.72 BIC=75.53
Let’s imagine that our seasonality is increasing over time.
\[s_t = \beta \times year \times f(month)\] When we take the first seasonal difference, we get \[\Delta_s x_t = \mu(t-(t-s)) + \beta f(month)\times (year - (year-1)) + w_t - w_{t-s} \\ = \mu s + \beta f(month) + w_t - w_{t-s}\] We need to take another seasonal difference to get rid of the \(\beta f(month)\) term, which is not a constant; it is different for different months, as it is our seasonality. \[\Delta^2_{s} x_t = w_t - w_{t-s} - (w_{t-s} - w_{t-2s}) = w_t - 2w_{t-s} + w_{t-2s}\] So our ARIMA model should be ARIMA(0,0,0)(0,2,2).
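The double seasonal difference can be taken directly with the differences argument of diff(); a sketch:

d2xt.s <- diff(xt, lag=12, differences=2)   # second seasonal difference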
auto.arima() again has problems and returns many Infs; turn on trace=TRUE to see the problem.
forecast::auto.arima(xt, stepwise=FALSE)
## Series: xt
## ARIMA(4,0,0)(1,1,0)[12] with drift
##
## Coefficients:
## ar1 ar2 ar3 ar4 sar1 drift
## 0.2548 -0.0245 0.2166 -0.2512 -0.5393 0.0492
## s.e. 0.0945 0.0947 0.0943 0.0966 0.0868 0.0025
##
## sigma^2 estimated as 0.1452: log likelihood=-48.2
## AIC=110.41 AICc=111.53 BIC=129.18
But we can recover the model with Arima(). There is no drift term here because Arima() will not estimate drift when the order of differencing is two or more.
forecast::Arima(xt, seasonal=c(0,2,2))
## Series: xt
## ARIMA(0,0,0)(0,2,2)[12]
##
## Coefficients:
## sma1 sma2
## -1.7745 0.9994
## s.e. 0.4178 0.4624
##
## sigma^2 estimated as 0.1045: log likelihood=-55.1
## AIC=116.2 AICc=116.46 BIC=123.9
Let’s imagine that our seasonality increases and then decreases.
\[s_t = (a y^2-b y+h) f(month)\]
Then we need to take three seasonal differences to get rid of the seasonality. The first difference removes the \(h f(month)\) term and reduces the quadratic \(y^2\) term to a linear one; the second removes the remaining linear \(y\) terms, leaving a constant times \(f(month)\); and the third removes that constant. The seasonal differences will get rid of the linear trend also.
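Explicitly, writing \(y\) for the year, which advances by one at each seasonal lag: \[\Delta_s s_t = (a(2y-1)-b)f(month) \\ \Delta^2_s s_t = 2a f(month) \\ \Delta^3_s s_t = 0\]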
\[\Delta^3_{s} x_t = (w_t - 2w_{t-s} + w_{t-2s})-(w_{t-s}-2w_{t-2s}+w_{t-3s})=w_t - 3w_{t-s} + 3w_{t-2s}-w_{t-3s}\]
So our ARIMA model should be ARIMA(0,0,0)(0,3,3).
auto.arima() again has problems and returns many Infs; turn on trace=TRUE to see the problem.
forecast::auto.arima(xt, stepwise=FALSE)
But we can recover the model parameters, more or less, with Arima().
forecast::Arima(xt, seasonal=c(0,3,3))
## Series: xt
## ARIMA(0,0,0)(0,3,3)[12]
##
## Coefficients:
## sma1 sma2 sma3
## -2.6795 2.4905 -0.7899
## s.e. 2.4995 4.3468 2.0690
##
## sigma^2 estimated as 0.154: log likelihood=-94.67
## AIC=197.34 AICc=197.85 BIC=207.06