For this lab you will use the material you have learned in the first 3 lectures to explore features of time series of salmon in the North Pacific (Alaska and E Asia). Then you will use ARIMA models to create forecasts and ask a research question with those forecasts.
You will find this Rmd file in the Fish550-2023
repository. Clone the repository and then make sure you can Knit the
Lab1-ARIMA.Rmd
file on your computer. Install any packages
that it complains about.
Holmes, E. E. (2020) Fisheries Catch Forecasting https://fish-forecast.github.io/Fish-Forecast-Bookdown
Hyndman, R.J., & Athanasopoulos, G. (2018) Forecasting: principles and practice, 2nd edition, OTexts: Melbourne, Australia. https://otexts.com/fpp2/.
Plus the lecture material on the ATSA website.
“Compare the accuracy of forecasts using best fit ARIMA models for pink salmon using the different regions in the Ruggerone & Irvine data. Is forecast accuracy is different for different regions?”
“Compare the accuracy of total abundance forecasts using ARIMA models for Bristol Bay sockeye rivers and compare to the AKFW and UW FRI forecasts.”
“Compare the accuracy of age-group forecasts using ARIMA models for Bristol Bay sockeye and compare to the AKFW and UW FRI forecasts.”
“Use the Ruggerone & Irvine data and ARIMA models to study the autoregressive structure of pink, chum and sockeye. Are there differences by region (AK verus E Asia)?”
“Compare the forecasts of total North Pacific pink and chum using 5, 10, 15, and 20 years of training data. Does forecast accuracy increase with more training data?”
“Create 1-year forecasts of total North Pacific pink salmon using 20 years of training data for all of the Ruggerone and Irvine data. Is forecast error correlated with the PDO?”
The bristol_bay_data_plus_covariates.rds
file has
Bristol Bay sockeye abundance for 9 rivers for 4 age-groups. The data
are from Ovando et al 2021 Improving forecasts of sockeye salmon
(Oncorhynchus nerka) with parametric and nonparametric models DOI:
10.1139/cjfas-2021-0287. You’ll find a copy in the lab folder. The data
file also has the covariates for year that the smolts enter the ocean as
used in Ovando et al.
Load the data.
bb_data <- readRDS(file.path("Data_Images", "bristol_bay_data_plus_covariates.rds"))
The data you will most likely want are
ret_yr
The year the spawners return to the spawning
groundsret
The returns (number of fish in 1000s)system
The river nameage_group
The age_groupforecast.adfw
The forecast from AK Fish and
Wildlifeforecast.fri
The forecast from UW Fisheries Research
Instituteenv_*
are some covariates at the year the age group
entered the oceanIn the data file, the age group designation is “a.b” where “a” is
number of years in freshwater and “b” is number of years in the ocean.
The age of the spawners in then a+b
.
The data
## colnames: brood_yr ret_yr system fw_age o_age age_group ret forecast.adfw forecast.fri env_pdo env_sst env_slp env_upstr
## system (river): Igushik Wood Nushagak Kvichak Naknek Egegik Ugashik
## age groups:
Some plots of the Bristol Bay data. Hmm there is a NA that was replaced with 0 it looks like.
bb_data %>%
filter(system=="Kvichak") %>%
ggplot(aes(x=ret_yr, y=log(ret))) +
geom_line() +
ggtitle("log abundance by age group") +
facet_wrap(~age_group)
bb_data %>%
group_by(system, ret_yr) %>%
summarize(total = sum(ret, na.rm=TRUE)) %>%
ggplot(aes(x=ret_yr, y=log(total))) +
geom_line() +
ggtitle("log abundance by river") +
facet_wrap(~system)
Here are some subsets of the data that you might want to use.
Log total by age group
subdata <- bb_data %>%
group_by(age_group, ret_yr) %>%
summarize(lntotal = log(sum(ret, na.rm=TRUE)))
head(subdata)
## # A tibble: 6 × 3
## # Groups: age_group [1]
## age_group ret_yr lntotal
## <chr> <dbl> <dbl>
## 1 1.2 1963 7.49
## 2 1.2 1964 8.50
## 3 1.2 1965 7.22
## 4 1.2 1966 7.18
## 5 1.2 1967 6.88
## 6 1.2 1968 8.11
Log total by river
subdata <- bb_data %>%
group_by(system, ret_yr) %>%
summarize(lntotal = log(sum(ret, na.rm=TRUE)))
head(subdata)
## # A tibble: 6 × 3
## # Groups: system [1]
## system ret_yr lntotal
## <chr> <dbl> <dbl>
## 1 Egegik 1963 7.54
## 2 Egegik 1964 7.55
## 3 Egegik 1965 8.55
## 4 Egegik 1966 7.95
## 5 Egegik 1967 7.41
## 6 Egegik 1968 6.88
Compare fish that spend 2 years in ocean versus those that spend 3 years.
subdata <- bb_data %>%
mutate(
ocean_years = case_match(
age_group,
c("2.3", "1.3") ~ "3-yr-ocean",
c("1.2", "2.2") ~ "2-yr-ocean",
.default = age_group
)) %>%
group_by(system, ocean_years, ret_yr) %>%
summarize(lntotal = log(sum(ret, na.rm=TRUE)))
head(subdata)
## # A tibble: 6 × 4
## # Groups: system, ocean_years [1]
## system ocean_years ret_yr lntotal
## <chr> <chr> <dbl> <dbl>
## 1 Egegik 2-yr-ocean 1963 7.02
## 2 Egegik 2-yr-ocean 1964 7.29
## 3 Egegik 2-yr-ocean 1965 8.34
## 4 Egegik 2-yr-ocean 1966 5.96
## 5 Egegik 2-yr-ocean 1967 6.58
## 6 Egegik 2-yr-ocean 1968 6.35
Get one time series and split into train and test. Each with 10 years.
dat <- bb_data %>%
filter(system == "Kvichak", age_group == "1.3") %>%
mutate(lnreturns = log(ret),
year = ret_yr) %>%
select(year, lnreturns)
datts <- ts(dat$lnreturns, start=dat$year[1])
train <- window(datts, dat$year[1], dat$year[1]+9)
test <- window(datts, dat$year[1]+10, dat$year[1]+10+9)
The data set Data_Images/ruggerone_data.rds
has total
abundance of natural spawners (not hatchery) from 15 regions in the N
Pacific. These are data provided with Ruggerone, G. and Irvine, J. 2018.
Numbers and biomass of natural- and hatchery-origin Pink, Chum, and
Sockeye Salmon in the North Pacific Ocean, 1925-2015. Marine and Coastal
Fisheries: Dynamics, Management, and Ecosystem Science 10. DOI: 10.1002/mcf2.10023.
Open Access.
Load the data
ruggerone_data <- readRDS(file.path("Data_Images", "ruggerone_data.rds"))
region in data file | desciption | regions in map |
---|---|---|
japan | Japan & South Korea | 20 and 21 |
m_i | Russian Mainland & Islands | 13, 16, 17 18, 19 |
w_kam | Western Kamchatka | 15 |
e_kam | Eastern Kamchatka | 14 |
wak | Western Alaska | 9, 10, 11, 12 |
s_pen | Southern Alaska Peninsula | 8 |
kod | Kodiak | 7 |
ci | Cook Inlet | 6 |
pws | Prince William Sound | 5 |
seak | Southeast Alaska | 4 |
nbc | Northern British Columbia | 3 |
sbc | Southern British Columbia | 2 |
wa | Washington State | 1 |
wc | West Coast USA | mislabeled on map |
cak (not in data file) | Central Alaska | 5, 6, 7, 8 |
## colnames: year region returns species
## species: pink chum sockeye
## regions: ci e_kam japan kod korea m_i nbc pws sbc seak s_pen wa wak w_kam wc
ruggerone_data %>%
filter(species=="pink") %>%
ggplot(aes(x=year, y=log(returns))) +
geom_line() +
ggtitle("pink salmon log abundance") +
facet_wrap(~region)
ruggerone_data %>%
group_by(species, year) %>%
summarize(total = sum(returns, na.rm=TRUE)) %>%
ggplot(aes(x=year, y=log(total))) +
geom_line() +
ggtitle("log abundance by species") +
facet_wrap(~species)
Here are some subsets of the data that you might want to use.
Log total North Pacific pink, chum, sockeye
subdata <- ruggerone_data %>%
group_by(species, year) %>%
summarize(lntotal = log(sum(returns, na.rm=TRUE)))
head(subdata)
## # A tibble: 6 × 3
## # Groups: species [1]
## species year lntotal
## <chr> <dbl> <dbl>
## 1 chum 1952 3.91
## 2 chum 1953 3.88
## 3 chum 1954 4.20
## 4 chum 1955 4.28
## 5 chum 1956 4.38
## 6 chum 1957 4.08
Log North Pacific pink
subdata <- ruggerone_data %>%
filter(species == "pink") %>%
mutate(lnreturns = log(returns))
head(subdata)
## # A tibble: 6 × 5
## year region returns species lnreturns
## <dbl> <chr> <dbl> <chr> <dbl>
## 1 1952 ci 4.36 pink 1.47
## 2 1953 ci 1.30 pink 0.264
## 3 1954 ci 4.67 pink 1.54
## 4 1955 ci 2.67 pink 0.981
## 5 1956 ci 3.57 pink 1.27
## 6 1957 ci 0.804 pink -0.218
Total in some bigger areas
subdata <- ruggerone_data %>%
mutate(
area = case_match(
region,
c("japan", "korea", "m_i", "e_kam", "w_kam") ~ "East_Asia",
c("wak", "s_pen", "kod", "ci", "pws", "seak") ~ "Alaska",
c("nbc", "sbc", "wa", "wc") ~ "WC",
.default = region
)) %>%
group_by(area, species, year) %>%
summarize(lntotal = log(sum(returns, na.rm=TRUE)))
head(subdata)
## # A tibble: 6 × 4
## # Groups: area, species [1]
## area species year lntotal
## <chr> <chr> <dbl> <dbl>
## 1 Alaska chum 1952 2.83
## 2 Alaska chum 1953 2.77
## 3 Alaska chum 1954 2.92
## 4 Alaska chum 1955 2.47
## 5 Alaska chum 1956 2.81
## 6 Alaska chum 1957 2.89
Get one time series out of ruggerone_data
dat <- ruggerone_data %>%
filter(region == "wak", species == "pink") %>%
mutate(lnreturns = log(returns)) %>%
select(year, lnreturns)
head(dat)
## # A tibble: 6 × 2
## year lnreturns
## <dbl> <dbl>
## 1 1952 1.40
## 2 1953 -1.24
## 3 1954 1.40
## 4 1955 -1.24
## 5 1956 1.40
## 6 1957 -1.24
Make a time series object and divide into train and test data.
datts <- ts(dat$lnreturns, start=dat$year[1])
train <- window(datts, 1952, 1971)
test <- window(datts, 1972, 2001)
Fit a model with auto.arima()
in the forecast
package.
library(forecast)
mod <- auto.arima(train)
mod
## Series: train
## ARIMA(1,0,0) with zero mean
##
## Coefficients:
## ar1
## -0.9293
## s.e. 0.0725
##
## sigma^2 = 0.3815: log likelihood = -19.22
## AIC=42.45 AICc=43.15 BIC=44.44
Plot a 30-year forecast against the test data.
library(zoo)
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
fr <- forecast(mod, h=30)
autoplot(fr) + geom_point(aes(x=x, y=y), data=fortify(test))