Lab 1 Forecasting with ARIMA and ETS models

For this lab you will use the material you have learned on ARMA and ETS models to explore features of time series of commercial landings of Chinook salmon in WA, CA, and OR. Then you will use these models to create forecasts and ask a research or forecasting question with those forecasts.

You will find this Rmd file in the Fish550-2025 repository. Clone the repository and then make sure you can Knit the Lab1-ARIMA-2025.Rmd file on your computer. Install any packages that it complains about.

Teams

Team 1: Anna, Brian, Nicole
Team 2: Matheus, Lilac, Mary
Team 3: Ben, Emma, Tina

References

Holmes, E. E., M. D. Scheuerell, and E. J. Ward. Applied Time Series Analysis for Fisheries and Environmental data. Box-Jenkins Method chapter and time series chapter.

Holmes, E. E. (2020) Fisheries Catch Forecasting https://fish-forecast.github.io/Fish-Forecast-Bookdown

Hyndman, R.J., & Athanasopoulos, G. (2018) Forecasting: principles and practice, 2nd edition, OTexts: Melbourne, Australia. https://otexts.com/fpp2/.

Last year’s team write-ups Fish550-2023

Plus the lecture material on the ATSA website.

Type of questions you might ask

“Compare the accuracy of forecasts using ARIMA models versus best fit ETS models. Are some types better than others for forecast accuracy?”

“Compare the accuracy of forecasts trained on log landings versus total landings.”

“Do the accuracy of forecasts change over time?”

“Look at the autoregressive structure of for the 3 states. Are there differences by region (WA vs OR vs CA)?”

“Compare the forecasts using 5, 10, 15, and 20 years of training data. Does forecast accuracy increase with more training data?”

“Compare the forecasts using monthly versus yearly data. Are forecasts better using one or the other?”

“There is something off with the California yearly data. How would you deal with that?”

“Using monthly data, are forecasts for some months better than others?”

“How does forecast accuracy change the farther out you try to forecast?”

“Create 1-year forecasts using all the data (for your state or all if you want). Is forecast error correlated with the PDO or other ocean indices?”

Chinook commercial landings data

The dataset has commercial landings data for WA, OR and CA. There is a monthly and a yearly dataset.

Load the data. The data is in the atsalibrary R package.

# if you need to install
# install.packages('atsalibrary', repos = c('https://atsa-es.r-universe.dev', 'https://cloud.r-project.org'))
library(atsalibrary)

Monthly

The monthly data is chinook.month and has 3 states WA, OR, CA.

Monthly data summed across states.

library(dplyr)
library(lubridate)

chinook.month.wc <- chinook.month %>%
  group_by(Year, Month) %>%
  summarise(
    metric.tons = sum(metric.tons, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(
    Date = as.Date(paste(Year, Month, "01", sep = "-"), format = "%Y-%b-%d"),
    log.metric.tons = log(metric.tons)
  ) %>%
  arrange(Date)

Yearly

The yearly data is chinook.year and has 6 states Alaska, California, Oregon, Washington, Michigan, Pennsylvania.

library(dplyr)
library(ggplot2)

chinook.year.filtered <- chinook.year %>%
  filter(!State %in% c("Michigan", "Pennsylvania"))

ggplot(chinook.year.filtered, aes(x = Year, y = log.metric.tons)) +
  geom_line() +
  facet_wrap(~ State, ncol = 2, scales = "free_y") +
  labs(
    title = "Chinook Landings by State",
    x = "Year",
    y = "Log(Metric Tons)"
  ) +
  theme_minimal()

Example analysis

Get one time series and split into train and test. Each with 10 years.

library(dplyr)

# Create a month number from the abbreviated month name
dat <- chinook.month %>%
  filter(State == "WA") %>%
  mutate(
    lnreturns = log.metric.tons,
    year = Year,
    month = match(Month, month.abb)  # Convert "Jan", "Feb", ... to 1, 2, ...
  ) %>%
  select(year, month, lnreturns)

# Create the time series object with monthly frequency
datts <- ts(dat$lnreturns, start = c(dat$year[1], dat$month[1]), frequency = 12)

# Create 10-year train and test windows
train <- window(datts, start = c(dat$year[1], dat$month[1]), end = c(dat$year[1]+9, dat$month[1]))
test <- window(datts, start = c(dat$year[1]+10, dat$month[1]), end = c(dat$year[1]+19, dat$month[1]))

train

##             Jan        Feb        Mar        Apr        May        Jun
## 1990  3.2619353  3.7773481  3.4688560  4.2341065  5.0894465  4.2752763
## 1991  2.8622009  3.9759363  2.1633230  3.2958369  4.6737630  4.6308379
## 1992  2.9123507  3.2464910  1.9600948  3.3214324  5.1607780  4.6821312
## 1993  1.1314021  1.5892352  1.5686159  2.5257286  4.6539604  4.2959239
## 1994  1.4109870  0.5877867  1.9021075  2.1633230  2.1860513  3.2921263
## 1995  0.1823216  0.0000000 -0.5108256  1.4350845  1.7404662  2.0149030
## 1996         NA  0.7419373 -0.2231436  2.8903718  2.2823824  2.9069011
## 1997 -0.3566749 -0.2231436 -2.3025851  3.8691155  3.6988298  3.6763007
## 1998         NA         NA         NA -0.9162907  4.0360090  3.4843123
## 1999 -0.9162907                                                       
##             Jul        Aug        Sep        Oct        Nov        Dec
## 1990  4.6210435  6.9321552  6.9397383  5.8051350  3.4934727  3.3032170
## 1991  3.9834130  6.2783338  6.5603232  5.6726357  4.1541846  3.3603754
## 1992  4.2326562  6.1180972  6.3205884  5.4773000  2.8449094  1.5892352
## 1993  3.8242841  5.9110676  6.0901780  5.3927180  2.5649494  1.5686159
## 1994  2.4595888  6.0633200  6.0330862  5.2765829  2.0668628  0.2623643
## 1995  2.7278528  5.7791991  6.1595181  5.4676381  2.7343675  0.5877867
## 1996  2.9549103  6.0388251  6.6062447  5.3151745  1.9315214  0.4054651
## 1997  2.6946272  6.1540085  6.5453497  4.9000761  1.7917595         NA
## 1998  2.1860513  5.6333600  5.9781260  4.6718938  2.7472709 -2.3025851
## 1999

Fit a model with auto.arima() in the forecast package.

library(forecast)
mod <- auto.arima(train)
mod

## Series: train 
## ARIMA(1,0,0)(0,1,0)[12] with drift 
## 
## Coefficients:
##          ar1    drift
##       0.3204  -0.0306
## s.e.  0.1162   0.0106
## 
## sigma^2 = 0.7391:  log likelihood = -116.82
## AIC=239.64   AICc=239.9   BIC=247.36

Plot a 10-year forecast against the test data.

library(zoo)
fr <- forecast(mod, h=10*12)
autoplot(fr) + geom_point(aes(x=x, y=y), data=fortify(test))

## Train / test for yearly data

dat <- chinook.year %>%
  filter(State == "Washington") %>%
  mutate(lnreturns = log.metric.tons,
         year = Year) %>%
  select(year, lnreturns)
datts <- ts(dat$lnreturns, start=dat$year[1])
train <- window(datts, dat$year[1], dat$year[1]+9)
test <- window(datts, dat$year[1]+10, dat$year[1]+10+9)

train

## Time Series:
## Start = 1950 
## End = 1959 
## Frequency = 1 
##  [1] 8.294300 8.506678 8.569786 8.500596 8.343839 8.423366 8.232334 8.244702
##  [9] 8.094958 7.889459