10 Apr 2025

Forecasting with an ARIMA model

The basic idea of forecasting with an ARIMA model to estimate the parameters and forecast forward.

For example, let’s say we want to forecast with a ARIMA(2,1,0) model with drift: \[z_t = \mu + \beta_1 z_{t-1} + \beta_2 z_{t-2} + e_t\] where \(z_t = x_t - x_{t-1}\), the first difference.

Arima() would write this model: \[(z_t-m) = \beta_1 (z_{t-1}-m) + \beta_2 (z_{t-2}-m) + e_t\] The relationship between \(\mu\) and \(m\) is \(\mu = m(1 - \beta_1 - \beta_2)\).

Let’s estimate the \(\beta\)’s for this model from the anchovy data.

fit <- forecast::Arima(anchovyts, order=c(2,1,0), include.constant=TRUE)
coef(fit)
##         ar1         ar2       drift 
## -0.53850433 -0.44732522  0.05367062
mu <- coef(fit)[3]*(1-coef(fit)[1]-coef(fit)[2])
mu
##     drift 
## 0.1065807

So we will forecast with this model:

\[z_t = 0.1065807-0.53850433 z_{t-1} - 0.44732522 z_{t-2} + e_t\]

To get our forecast for 1990, we do this

\[(x_{90}-x_{89}) = 0.106 - 0.538 (x_{89}-x_{88}) - 0.447 (x_{88}-x_{87})\]

Thus

\[x_{90} = x_{89}+0.106 -0.538 (x_{89}-x_{88}) - 0.447 (x_{88}-x_{87})\]

Forecasting with forecast()

forecast(fit, h=h) automates the forecast calculations for us and computes the upper and lower prediction intervals. Prediction intervals include uncertainty in parameter estimates plus the process error uncertainty.

fr <- forecast::forecast(fit, h=5)
fr
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1990       9.962083 9.702309 10.22186 9.564793 10.35937
## 1991       9.990922 9.704819 10.27703 9.553365 10.42848
## 1992       9.920798 9.623984 10.21761 9.466861 10.37473
## 1993      10.052240 9.713327 10.39115 9.533917 10.57056
## 1994      10.119407 9.754101 10.48471 9.560719 10.67809

Plotting our forecasts

plot(fr, xlab="Year")

Missing values

Missing values are allowed for forecast::Arima(). We can produce forecasts with the same code.

anchovy.miss <- anchovyts
anchovy.miss[10:11] <- NA
anchovy.miss[20:21] <- NA
fit <- forecast::Arima(anchovy.miss, order=c(2,1,0), include.constant=TRUE)
fr <- forecast::forecast(fit, h=5)
fr
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1990       9.938269 9.664479 10.21206 9.519543 10.35700
## 1991      10.014686 9.700961 10.32841 9.534885 10.49449
## 1992       9.924208 9.601147 10.24727 9.430129 10.41829
## 1993      10.029988 9.666069 10.39391 9.473421 10.58656
## 1994      10.128066 9.729066 10.52707 9.517848 10.73828

plot(fr)

Using auto.arima()

We can let forecast to select the ARIMA model:

anchovy.miss <- anchovyts
anchovy.miss[10:11] <- NA
anchovy.miss[20:21] <- NA
fit <- forecast::auto.arima(anchovy.miss)
fit
## Series: anchovy.miss 
## ARIMA(0,1,1) with drift 
## 
## Coefficients:
##           ma1   drift
##       -0.7240  0.0548
## s.e.   0.2283  0.0125
## 
## sigma^2 = 0.04355:  log likelihood = 3.52
## AIC=-1.03   AICc=0.11   BIC=2.63

fr <- forecast::forecast(fit, h=5)
plot(fr)

Seasonal ARIMA Models

Load the chinook salmon data set

load("chinook.RData")
head(chinook)
##   Year Month Species log.metric.tons metric.tons
## 1 1990   Jan Chinook        3.397858        29.9
## 2 1990   Feb Chinook        3.808882        45.1
## 3 1990   Mar Chinook        3.511545        33.5
## 4 1990   Apr Chinook        4.248495        70.0
## 5 1990   May Chinook        5.200705       181.4
## 6 1990   Jun Chinook        4.371976        79.2

The data are monthly and start in January 1990. To make this into a ts object do

chinookts <- ts(chinook$log.metric.tons, start=c(1990,1), 
                frequency=12)

start is the year and month and frequency is the number of months in the year.

Use ?ts to see more examples of how to set up ts objects.

Plot seasonal data

plot(chinookts)

Seasonal ARIMA model: SARIMA

Basic structure of a seasonal AR model

differenced data = AR(non-seasonal) + AR(seasonal) + cross-terms + MA

Differenced chinook data

  • No differencing \(x_t\). We are modeling the data.
  • \(x_t - x_{t-1}\). We are modeling the monthly change.
  • \(x_t - x_{t-12}\). We are modeling the seasonal change. Jan vs Jan last year, Feb vs Feb last year
  • \((x_t - x_{t-12}) - (x_{t-1} - x_{t-13})\). We are modeling the seasonal rate of monthly change.
  • \((x_t - x_{t-1}) - (x_{t-12} - x_{t-13})\). We are modeling the seasonal monthly change.

Notation

ARIMA (p,d,q)(ps,ds,qs)S

ARIMA (non-seasonal part)(seasonal part)Frequency

ARIMA (non-seasonal) means \(t\) correlated with \(t-1\)

ARIMA (seasonal) means \(t\) correlated with \(t-s\)

Seasonal autoregressive

ARIMA (1,0,0)(1,0,0)[12]

This month depends on last month plus the same month last year

What data are we modeling? Zero differences, so

\[z_t = x_t\]

Write out the AR parts with \(z_t\)

\[z_t = \phi_1 z_{t-1} + \Phi_1 z_{t-12} - \text{cross-term} + w_t\]

No MA in this model so \(w_t = e_t\) is white noise.

Autocorrelated seasonal differences

ARIMA (1,0,0)(1,1,0)[12]

The seasonal difference this month depends on the seasonal difference last month plus the seasonal difference last year

\(z_t\) is just a seasonal difference.

\[z_t = x_t - x_{t-12}\]

Write out the AR parts with \(z_t\)

\[z_t = \phi_1 z_{t-1} + \Phi_1 z_{t-12} - \text{cross-term} + w_t\]

No MA in this model so \(w_t = e_t\) is white noise.

Seasonal random walk model

ARIMA(0,0,0)(0,1,0)[12]

This month’s value is the same as last year’s, plus some random noise

expected January 1990 = January 1989 + constant mean

\[z_t = x_t - x_{t-12}\]

No AR nor MA

\[z_t = w_t = x_t - x_{t-12}\] \[x_t = x_{t-12} + w_t\]

Seasonal random walk model with random trend

ARIMA(0,1,0)(0,1,0)[12]

expected Feb 1990 = Feb 1989 + (Jan 1990 - Jan 1989)

\[z_t = (x_t - x_{t-12}) - (x_{t-1} - x_{t-13})\]

No AR or MA part.

\[z_t = w_t = (x_t - x_{t-12}) - (x_{t-1} - x_{t-13})\] \[x_t = x_{t-12} + (x_{t-1} - x_{t-13}) + w_t\]

airline model

ARIMA(0, 1, 1)(0, 1, 1)[12]

Seasonal random walk model with random trend + autocorrelated noise

\[z_t = (x_t - x_{t-12}) - (x_{t-1} - x_{t-13})\]

No AR part. Write out the MA parts, the \(w_t\).

\[w_t = e_t - \theta_1 e_{t-1} - \Theta_1 e_{t-12} + \text{cross-term}\]

auto.arima() for seasonal ts

auto.arima() will recognize that our data has season and fit a seasonal ARIMA model to our data by default. We will define the training data up to 1998 and use 1999 as the test data.

traindat <- window(chinookts, c(1990,10), c(1998,12))
testdat <- window(chinookts, c(1999,1), c(1999,12))
fit <- forecast::auto.arima(traindat)
fit
## Series: traindat 
## ARIMA(1,0,0)(0,1,0)[12] with drift 
## 
## Coefficients:
##          ar1    drift
##       0.3676  -0.0320
## s.e.  0.1335   0.0127
## 
## sigma^2 = 0.8053:  log likelihood = -107.37
## AIC=220.73   AICc=221.02   BIC=228.13

Forecasting with a Seasonal model

Load the chinook salmon data

library(atsalibrary)
## Warning: package 'atsalibrary' was built under R version
## 4.4.3
## Loading required package: broom
## Loading required package: corrplot
## corrplot 0.95 loaded
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:gridExtra':
## 
##     combine
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## Loading required package: forecast
## Loading required package: fpp2
## ── Attaching packages ────────────────────────── fpp2 2.5 ──
## ✔ fma       2.5     ✔ expsmooth 2.3
## 
## Loading required package: ggmap
## ℹ Google's Terms of Service: <https://mapsplatform.google.com>
##   Stadia Maps' Terms of Service: <https://stadiamaps.com/terms-of-service/>
##   OpenStreetMap's Tile Usage Policy: <https://operations.osmfoundation.org/policies/tiles/>
## ℹ Please cite ggmap if you use it! Use `citation("ggmap")` for details.
## Loading required package: grid
## 
## Loading required package: Hmisc
## 
## 
## Attaching package: 'Hmisc'
## 
## 
## The following objects are masked from 'package:dplyr':
## 
##     src, summarize
## 
## 
## The following objects are masked from 'package:base':
## 
##     format.pval, units
## 
## 
## Loading required package: kableExtra
## 
## 
## Attaching package: 'kableExtra'
## 
## 
## The following object is masked from 'package:dplyr':
## 
##     group_rows
## 
## 
## Loading required package: knitr
## 
## Loading required package: loo
## 
## This is loo version 2.8.0
## 
## - Online documentation and vignettes at mc-stan.org/loo
## 
## - As of v2.0.0 loo defaults to 1 core but we recommend using as many as possible. Use the 'cores' argument or set options(mc.cores = NUM_CORES) for an entire session. 
## 
## 
## Attaching package: 'loo'
## 
## 
## The following object is masked from 'package:fma':
## 
##     milk
## 
## 
## Loading required package: magrittr
## 
## 
## Attaching package: 'magrittr'
## 
## 
## The following object is masked from 'package:ggmap':
## 
##     inset
## 
## 
## Loading required package: MARSS
## 
## Loading required package: quantmod
## 
## Loading required package: xts
## 
## Loading required package: zoo
## 
## 
## Attaching package: 'zoo'
## 
## 
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## 
## 
## 
## ######################### Warning from 'xts' package ##########################
## #                                                                             #
## # The dplyr lag() function breaks how base R's lag() function is supposed to  #
## # work, which breaks lag(my_xts). Calls to lag(my_xts) that you type or       #
## # source() into this session won't work correctly.                            #
## #                                                                             #
## # Use stats::lag() to make sure you're not using dplyr::lag(), or you can add #
## # conflictRules('dplyr', exclude = 'lag') to your .Rprofile to stop           #
## # dplyr from breaking base R's lag() function.                                #
## #                                                                             #
## # Code in packages is not affected. It's protected by R's namespace mechanism #
## # Set `options(xts.warn_dplyr_breaks_lag = FALSE)` to suppress this warning.  #
## #                                                                             #
## ###############################################################################
## 
## 
## Attaching package: 'xts'
## 
## 
## The following objects are masked from 'package:dplyr':
## 
##     first, last
## 
## 
## Loading required package: TTR
## 
## 
## Attaching package: 'quantmod'
## 
## 
## The following object is masked from 'package:Hmisc':
## 
##     Lag
## 
## 
## Loading required package: RCurl
## 
## Loading required package: readr
## 
## Loading required package: rmarkdown
## 
## Loading required package: stringr
## 
## Loading required package: tidyr
## 
## 
## Attaching package: 'tidyr'
## 
## 
## The following object is masked from 'package:RCurl':
## 
##     complete
## 
## 
## The following object is masked from 'package:magrittr':
## 
##     extract
## 
## 
## The following object is masked from 'package:reshape2':
## 
##     smiths
chinook <- chinook.month %>% subset(State == "WA")
chinookts <- ts(chinook$log.metric.tons, start=c(1990,1), 
                frequency=12)

Plot seasonal data

plot(chinookts)

Fit

Missing values are ok when fitting a seasonal ARIMA model

traindat <- window(chinookts, c(1990,10), c(1998,12))
testdat <- window(chinookts, c(1999,1), c(1999,12))
fit <- forecast::auto.arima(traindat)
fit
## Series: traindat 
## ARIMA(1,0,0)(0,1,0)[12] with drift 
## 
## Coefficients:
##          ar1    drift
##       0.3728  -0.0315
## s.e.  0.1332   0.0127
## 
## sigma^2 = 0.7918:  log likelihood = -106.67
## AIC=219.35   AICc=219.64   BIC=226.75

Forecast using seasonal model

fr <- forecast::forecast(fit, h=12)
plot(fr)
points(testdat)