Let’s imagine that we can describe our data as a combination of the mean trend and error.

\[x_t = m_t + e_t\]

What if our main goal is to predict \(x_{t+1}\)? Note that we (fisheries biologists and ecologists) often want to know \(m_t\) itself; in fact, that is often our sole goal. But let’s say our goal is to predict \(x_{t+1}\).

How do we do this?

Approach #1, which we will use in the rest of the course

We estimate \(m_t\) from the data. Once we have an estimate of \(m_t\), we have an estimate of \(e_t\). We can model that error (think AR and MA) to help us predict \(x_{t+1}\) from \(x_t\), \(x_{t-1}\), etc.

Mark introduced filtering (moving averages) as one way to estimate \(m_t\). We will not be doing that in the rest of the course. Instead, we will have a model for \(m_t\) and we will fit \[x_t = m_t + e_t\] directly. We will get an estimate of the model that produced \(m_t\), and we will also estimate the structure (think AR, MA) of \(e_t\). “Directly” means that we will not be transforming our data with differencing in order to produce a statistic (a difference) that is stationary. The \(x_t\) and \(m_t\) models will have a biological interpretation.

So, for example, if our model is

\[m_t = m_{t-1}+\mu \\ x_t = m_t + w_t,\] then we fit that model without taking any differences. Note the above is a trivial example, as it is just a linear deterministic trend (\(m_t = m_0 + \mu t\)), but it gives you the idea.
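As a minimal sketch of Approach #1 (simulated data with assumed values, not the course data), we can generate this model and fit the trend directly with a linear regression, since the model implies \(x_t = m_0 + \mu t + w_t\):

set.seed(42)
TT <- 100
mu <- 0.05
mt <- 10 + cumsum(rep(mu, TT))       # m_t = m_{t-1} + mu, with m_0 = 10
xt <- ts(mt + rnorm(TT, sd = 0.5))   # x_t = m_t + w_t
fit <- lm(xt ~ time(xt))             # fit the trend model directly; no differencing
coef(fit)                            # intercept estimates m_0, slope estimates mu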

Approach #2: Box-Jenkins

The Box-Jenkins approach is different. In this approach to predicting \(x_{t+1}\), we remove \(m_t\) from our data using differencing. We don’t have to worry about a model for \(m_t\) because we have removed it!

And we don’t actually need it to predict \(x_{t+1}\) since we have \(x_{t}\), \(x_{t-1}\), etc. We model \(\Delta^d x_t\), the \(d\)-th difference of \(x_t\), and then predict \(\Delta^d x_{t+1}\). Once we have that, we can predict \(x_{t+1}\) by using \(x_t\), \(x_{t-1}\), etc. with some algebra.
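To make that algebra concrete: for \(d=1\), \(\Delta x_{t+1} = x_{t+1} - x_t\), so a prediction of the difference converts directly into a prediction of the next observation, \[x_{t+1} = x_t + \Delta x_{t+1},\] and for \(d=2\), \(x_{t+1} = 2x_t - x_{t-1} + \Delta^2 x_{t+1}\).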

The error structure of \(\Delta^d x_{t+1}\) is NOT the same as \(e_t\). \[\Delta^d x_{t} = \phi_1\Delta^d x_{t-1} + \phi_2\Delta^d x_{t-2} + \dots + z_t\]

\(z_t\) is the error of the differences, and the \(\phi\) in the AR part are for the differences, not the original \(x_t\). So it is more difficult to make a biological interpretation of the ARMA model. But remember, the objective was to predict \(x_{t+1}\), not to fit a model with a biological interpretation.

Examples of differencing, linear trend

\(m_t\) is linear and deterministic.

\[x_t = \mu t + w_t\]
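The series xt used in the fits below is not shown in the notes; a minimal simulation consistent with this model (the values are assumptions) would look like:

set.seed(123)
TT <- 100
mu <- 0.02
xt <- ts(mu * (1:TT) + rnorm(TT, sd = 0.3))  # x_t = mu*t + w_t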

When we take the first difference, we get \[\Delta x_t = \mu (t - (t-1)) + w_t - w_{t-1} = \mu + w_t - w_{t-1}\] So we expect to estimate an ARIMA(0,1,1) with drift. For some reason auto.arima() returns Inf for the AICc of ARIMA(0,1,1) with drift, although Arima() returns it. Turn on trace=TRUE to see the problem.

forecast::auto.arima(xt, stepwise=FALSE)
## Series: xt 
## ARIMA(0,1,3) 
## 
## Coefficients:
##           ma1      ma2     ma3
##       -0.7878  -0.1612  0.2303
## s.e.   0.1036   0.1496  0.1103
## 
## sigma^2 estimated as 0.1014:  log likelihood=-26.19
## AIC=60.39   AICc=60.81   BIC=70.77

But we can recover the parameters with Arima(). Notice that the AICc is much smaller than for the model chosen by auto.arima().

forecast::Arima(xt,order=c(0,1,1),include.drift=TRUE)
## Series: xt 
## ARIMA(0,1,1) with drift 
## 
## Coefficients:
##           ma1   drift
##       -1.0000  0.0208
## s.e.   0.0295  0.0010
## 
## sigma^2 estimated as 0.0845:  log likelihood=-19.45
## AIC=44.9   AICc=45.16   BIC=52.69

Examples of differencing, quadratic trend

\(m_t\) is quadratic and deterministic.

\[x_t = \mu t^2 + w_t\]
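Again, a simulation consistent with this model (assumed values) might be:

set.seed(123)
TT <- 100
mu <- 0.005
xt <- ts(mu * (1:TT)^2 + rnorm(TT, sd = 0.3))  # x_t = mu*t^2 + w_t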

When we take the first difference, we get \[\Delta x_t = \mu (t^2 - (t-1)^2) + w_t - w_{t-1} = \mu(2t-1) + w_t - w_{t-1}\] When we take another difference, we get \[\Delta^2 x_t = \mu(2t-1) - \mu(2(t-1)-1) + w_t - 2w_{t-1} + w_{t-2} = 2\mu + w_t - 2w_{t-1} + w_{t-2}\] So we expect to estimate an ARIMA(0,2,2) with drift. Because Arima() will not estimate the drift term if we have a 2nd difference, I will fit to the first difference, and we recover the terms as expected.

forecast::Arima(diff(xt),order=c(0,1,2), include.drift=TRUE)
## Series: diff(xt) 
## ARIMA(0,1,2) with drift 
## 
## Coefficients:
##           ma1     ma2  drift
##       -1.9888  0.9888   0.01
## s.e.   0.0037  0.0035   0.00
## 
## sigma^2 estimated as 0.09959:  log likelihood=-272.92
## AIC=553.83   AICc=553.87   BIC=573.46

auto.arima() struggles, however.

forecast::auto.arima(diff(xt), stepwise=FALSE)
## Series: diff(xt) 
## ARIMA(1,1,3) with drift 
## 
## Coefficients:
## Warning in sqrt(diag(x$var.coef)): NaNs produced
##           ar1      ma1      ma2     ma3  drift
##       -0.9588  -1.0201  -0.9202  0.9405   0.01
## s.e.      NaN      NaN      NaN     NaN   0.00
## 
## sigma^2 estimated as 0.1005:  log likelihood=-274.79
## AIC=561.59   AICc=561.67   BIC=591.02

Examples of differencing, random walk trend

\(m_t\) is a random walk.

\[x_t = m_t + w_t\\m_t = m_{t-1}+v_t\] When we take the first difference, we get \[\Delta x_t = m_t - m_{t-1} + w_t - w_{t-1} = v_t + w_t - w_{t-1}\] That’s a moving average (\(w_t-w_{t-1}\)) plus added white noise, which is not written as an ARMA model with a single error term. We can fit it as a state-space model, however.
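As a sketch of that state-space fit (simulated data with assumed variances): this is the local level model, which can be fit with base R’s StructTS():

set.seed(123)
TT <- 100
mt <- cumsum(rnorm(TT, sd = 0.3))    # m_t = m_{t-1} + v_t (random walk trend)
xt <- ts(mt + rnorm(TT, sd = 0.5))   # x_t = m_t + w_t
fit <- StructTS(xt, type = "level")  # local level state-space model
fit$coef                             # estimated variances of v_t ("level") and w_t ("epsilon")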

Seasonal models

We can take the same approach with a seasonal model. Let’s imagine that we can describe our data as a combination of the mean trend, a seasonal term, and error.

\[x_t = \mu t+ s_t + w_t\] Let’s imagine that the seasonal term is just a constant based on month and doesn’t change with time.

\[s_t = f(month)\] We want to remove the \(s_t\) with differencing so that we can model the error \(w_t\). We can then recover \(x_{t+1}\) by using \(x_{t+1-s}\), where \(s\) is the season length (e.g., 12 for monthly data with a yearly season).

When we take the first seasonal difference, we get \[\Delta_s x_t = \mu(t-(t-s)) + s_t - s_{t-s} + w_t - w_{t-s} = \mu s + w_t - w_{t-s}\] The \(s_t-s_{t-s}\) disappears because \(s_t = s_{t-s}\) when the seasonal effect is just a function of the month. Depending on what \(m_t\) is, we might be done or we might have to do a first difference. Notice that the error term is a moving average in the seasonal part.
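The xt used below is not shown; a simulation consistent with this seasonal model (assuming, for illustration, a sinusoidal monthly effect for \(f(month)\)) might be:

set.seed(123)
TT <- 120                                      # 10 years of monthly data
mu <- 0.05
st <- rep(sin(2 * pi * (1:12) / 12), TT / 12)  # fixed monthly effect f(month)
xt <- ts(mu * (1:TT) + st + rnorm(TT, sd = 0.3), frequency = 12)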

auto.arima() identifies an ARIMA(0,0,0)(0,1,2)[12] with drift.

forecast::auto.arima(xt, stepwise=FALSE)
## Series: xt 
## ARIMA(0,0,0)(0,1,2)[12] with drift 
## 
## Coefficients:
##          sma1    sma2   drift
##       -1.0488  0.1765  0.0496
## s.e.   0.1606  0.1148  0.0007
## 
## sigma^2 estimated as 0.08705:  log likelihood=-29.56
## AIC=67.12   AICc=67.51   BIC=77.85

But we can recover the model parameters with Arima(). Note the drift term is returned as \(\mu\), not \(\mu s\).

forecast::Arima(xt, seasonal=c(0,1,1), include.drift=TRUE)
## Series: xt 
## ARIMA(0,0,0)(0,1,1)[12] with drift 
## 
## Coefficients:
##          sma1   drift
##       -1.0000  0.0496
## s.e.   0.1845  0.0008
## 
## sigma^2 estimated as 0.08162:  log likelihood=-30.74
## AIC=67.49   AICc=67.72   BIC=75.53

Seasonal model with changing season

Let’s imagine that our seasonality is increasing over time.

\[s_t = \beta \times year \times f(month)\] When we take the first seasonal difference, we get \[\Delta_s x_t = \mu(t-(t-s)) + \beta f(month)\times (year - (year-1)) + w_t - w_{t-s} \\ = \mu s + \beta f(month) + w_t - w_{t-s}\] We need to take another seasonal difference to get rid of the \(f(month)\), which is not a constant; it is different for different months, as it is our seasonality. \[\Delta^2_{s} x_t = (w_t - w_{t-s}) - (w_{t-s} - w_{t-2s}) = w_t - 2w_{t-s} + w_{t-2s}\] So our ARIMA model should be ARIMA(0,0,0)(0,2,2)[12]. auto.arima() again has problems and returns many Infs; turn on trace=TRUE to see the problem.
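Here is a sketch of how such a series might be simulated (the \(\beta\), trend, and \(f(month)\) values are assumptions for illustration):

set.seed(123)
TT <- 120
mu <- 0.05
year <- rep(1:10, each = 12)                   # year index
fmonth <- rep(sin(2 * pi * (1:12) / 12), 10)   # assumed monthly pattern f(month)
st <- 0.2 * year * fmonth                      # s_t = beta * year * f(month), beta = 0.2
xt <- ts(mu * (1:TT) + st + rnorm(TT, sd = 0.3), frequency = 12)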

forecast::auto.arima(xt, stepwise=FALSE)
## Series: xt 
## ARIMA(4,0,0)(1,1,0)[12] with drift 
## 
## Coefficients:
##          ar1      ar2     ar3      ar4     sar1   drift
##       0.2548  -0.0245  0.2166  -0.2512  -0.5393  0.0492
## s.e.  0.0945   0.0947  0.0943   0.0966   0.0868  0.0025
## 
## sigma^2 estimated as 0.1452:  log likelihood=-48.2
## AIC=110.41   AICc=111.53   BIC=129.18

But we can recover the model with Arima(). Note that the second seasonal difference removes the \(\mu s\) term, so there is no drift to estimate.

forecast::Arima(xt, seasonal=c(0,2,2))
## Series: xt 
## ARIMA(0,0,0)(0,2,2)[12] 
## 
## Coefficients:
##          sma1    sma2
##       -1.7745  0.9994
## s.e.   0.4178  0.4624
## 
## sigma^2 estimated as 0.1045:  log likelihood=-55.1
## AIC=116.2   AICc=116.46   BIC=123.9

Seasonal model with changing season #2

Let’s imagine that our seasonality increases and then decreases.

\[s_t = (a y^2-b y+h) f(month)\] where \(y\) is the year.

Then we need to take 3 seasonal differences to get rid of the seasonality. Each seasonal difference reduces the degree of the polynomial in \(y\) by one: the first leaves a seasonal term linear in \(y\), the second leaves a constant seasonal term, and the third removes that. The seasonal differences will get rid of the linear trend also.

\[\Delta^3_{s} x_t = (w_t - 2w_{t-s} + w_{t-2s})-(w_{t-s}-2w_{t-2s}+w_{t-3s})=w_t - 3w_{t-s} + 3w_{t-2s}-w_{t-3s}\]

So our ARIMA model should be ARIMA(0,0,0)(0,3,3)[12].
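A sketch of how such a series might be simulated (the \(a\), \(b\), \(h\), and \(f(month)\) values are assumptions for illustration):

set.seed(123)
TT <- 120
year <- rep(1:10, each = 12)                      # year index
fmonth <- rep(sin(2 * pi * (1:12) / 12), 10)      # assumed monthly pattern f(month)
amp <- -0.03 * year^2 + 0.3 * year + 0.5          # quadratic-in-year amplitude: rises then falls
xt <- ts(0.05 * (1:TT) + amp * fmonth + rnorm(TT, sd = 0.3), frequency = 12)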

auto.arima() again has problems and returns many Infs; turn on trace=TRUE to see the problem.

forecast::auto.arima(xt, stepwise=FALSE)

But we can recover the model parameters, more or less, with Arima().

forecast::Arima(xt, seasonal=c(0,3,3))
## Series: xt 
## ARIMA(0,0,0)(0,3,3)[12] 
## 
## Coefficients:
##          sma1    sma2     sma3
##       -2.6795  2.4905  -0.7899
## s.e.   2.4995  4.3468   2.0690
## 
## sigma^2 estimated as 0.154:  log likelihood=-94.67
## AIC=197.34   AICc=197.85   BIC=207.06