Dynamic Linear Models (DLMs) - Time-varying productivity of salmon

Author

Zoe Rand, Terrance Wang

Published

May 11, 2023


Data

Describe the salmon data and any environmental covariates you decide to examine.

Load the data

Code
## library(devtools)
## Windows users will likely need to set this
## Sys.setenv("R_REMOTES_NO_ERRORS_FROM_WARNINGS" = "true")
#devtools::install_github("nwfsc-timeseries/atsalibrary")

## get data
data(KvichakSockeye, package="atsalibrary")
SR_data <- KvichakSockeye

The data are a dataframe with columns for brood year (brood_year), number of spawners (spawners), number of recruits (recruits) and PDO at year \(t-2\) in summer (pdo_summer_t2) and in winter (pdo_winter_t2).

Code
## head of data file
head(SR_data)
  brood_year spawners recruits pdo_summer_t2 pdo_winter_t2
1       1952       NA    20200         -2.79         -1.68
2       1953       NA      593         -1.20         -1.05
3       1954       NA      799         -1.85         -1.25
4       1955       NA     1500         -0.60         -0.68
5       1956     9440    39000         -0.50         -0.31
6       1957     2840     4090         -2.36         -1.78

Wrangle the data

Code
library(tidyverse)
Code
#removing years with no spawner data
#creating log (recruits/spawners)
SR_data<-SR_data %>% filter(!is.na(spawners)) %>%mutate(logSR = log(recruits/spawners))
head(SR_data)
# A tibble: 6 × 6
# Groups:   brood_year [6]
  brood_year spawners recruits pdo_summer_t2 pdo_winter_t2   logSR
       <dbl>    <dbl>    <dbl>         <dbl>         <dbl>   <dbl>
1       1956     9440    39000         -0.5          -0.31  1.42  
2       1957     2840     4090         -2.36         -1.78  0.365 
3       1958      535      289         -1.48         -2.14 -0.616 
4       1959      680      547          0.58          0    -0.218 
5       1960    14600    55300          1.05          0.68  1.33  
6       1961     3710     3500         -0.22          0.77 -0.0583
Code
ggplot(SR_data) + geom_line(aes(x = brood_year, y = logSR))
Warning: Removed 7 rows containing missing values (`geom_line()`).

General tasks

Use the information and data in the previous section to answer the following questions. Note that if any model is not converging, then you will need to increase the maxit parameter in the control argument/list that gets passed to MARSS(). For example, you might try control=list(maxit=2000).

  1. Begin by fitting a reduced form of Ricker’s model that includes only a time-varying level (\(\alpha_t\)) and observation error (\(v_t\)). That is,
\[\begin{align*} \text{log}(R_t) &= \alpha_t + \text{log}(S_t) + v_t \\ \text{log}(R_t/S_t) &= \alpha_t + v_t \end{align*}\]

This model assumes no density-dependent survival in that the number of recruits is an ascending function of spawners. Plot the ts of \(\alpha_t\) and note the AICc for this model. Also plot appropriate model diagnostics.

  1. Fit the full Ricker model with a time-varying intercept and a static effect of spawners. For this model, obtain the time series of \(\alpha_t\), which is an estimate of the stock productivity in the absence of density-dependent effects. How do these estimates of productivity compare to those from the previous question? Plot the ts of \(\alpha_t\) and note the AICc for this model. Also plot appropriate model diagnostics. (\(Hint\): If you don’t want a parameter to vary with time, what does that say about its process variance?)

  2. Fit the expanded model that includes the summer PDO index as the covariate (pdo_summer_t2). What is the mean level of productivity? Plot the ts of \(\delta_t\) and note the AICc for this model. Also plot appropriate model diagnostics.

  3. Fit the expanded model that includes the winter PDO index as the covariate (pdo_winter_t2). What is the mean level of productivity? Plot the ts of \(\delta_t\) and note the AICc for this model. Also plot appropriate model diagnostics.

  4. Based on AICc, which of the models above is the most parsimonious? Is it well behaved (i.e., are the model assumptions met)? Plot the model forecasts for the best model. Is this a good forecast model?

  5. Consider other environmental factors that could influence the time-varying intrinsic productivity of these salmon. For example, are there other large-scale indices of ocean conditions (e.g., sea-surface temperature, El Niño Southern Oscillation)? You could also consider the possible influence of other salmon stocks such as those we examined in Lab 1.

Methods

Please address the following in your methods:

  • What environmental or dummy variables did you include and why?

  • What forms of models did you fit (ie, write them out in matrix form)?

  • What sort of model diagnostics did you use to examine model assumptions?

1. Reduced Ricker Model:

Equations:

\[ \text{log}(R_t/S_t) = \alpha_t + v_t \]

\[ \alpha_t = \alpha_{t-1} + w_t \]

This is a MARSS model where \(Z = 1\), a random walk with observation error:

Set \(y_t = log(R_t/S_t)\)

\[ y_t = Zx_t + v_t \]

\[ x_t = x_{t-1} + w_t \]

MARSS:

Code
library(MARSS)
Code
mod_list1<-list(B = "identity", Z = "identity", U = "zero", A = "zero", Q = matrix("q"), R = matrix("r"))
Code
#fit model with MARSS:
fit1<-MARSS(SR_data$logSR, mod_list1)
Success! abstol and log-log tests passed at 26 iterations.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.

MARSS fit is
Estimation method: kem 
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 26 iterations. 
Log-likelihood: -51.87353 
AIC: 109.7471   AICc: 110.3625   
 
      Estimate
R.r      0.213
Q.q      0.301
x0.x0    0.998
Initial states (x0) defined at t=0

Standard errors have not been calculated. 
Use MARSSparamCIs to compute CIs and bias estimates.
Code
#model diagnostics
autoplot(fit1, plot.type="fitted.ytT")

2. Full Ricker Model:

Equations:

\[ \text{log}(R_t/S_t) = \alpha_t - bS_t + v_t \]

\[ \alpha_t = \alpha_{t-1} + w_t \]

MARSS:

\[ y_t = Zx_t + v_t \]

\[ v_t \sim N(0, r) \]

\[ x_t = Bx_{t-1} + w_t \]

\[ w_t \sim N(0, \begin{bmatrix} q & 0 \\ 0 & 0 \end{bmatrix}) \]

where \(B = \begin{bmatrix}1 & 0\\0 & 1\end{bmatrix}\), \(x_t = \begin{bmatrix} \alpha_t \\ b \end{bmatrix}\), \(Z = \begin{bmatrix} 1 & S_t \end{bmatrix}\)

Code
#setting up model
Bmat <- matrix(c(1,0,0,1), 2, 2, byrow = TRUE)
TT<-nrow(SR_data)
ZZ <- array(NA, c(1, 2, TT))  
ZZ[1,1,] <- rep(1, TT) #intercept
ZZ[1,2,]<-SR_data$spawners #St
QQ<-matrix(list(0), 2, 2)
QQ[1,1]<-"q"
mod_list2<-list(B = Bmat, Z = ZZ, U = "zero", A = "zero", Q = QQ, R = matrix("r"))

inits_list <- list(x0 = matrix(c(0, 0), nrow = 2))
Code
fit2<-MARSS(SR_data$logSR, mod_list2, inits = inits_list)
Success! abstol and log-log tests passed at 29 iterations.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.

MARSS fit is
Estimation method: kem 
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 29 iterations. 
Log-likelihood: -51.81022 
AIC: 111.6204   AICc: 112.6731   
 
      Estimate
R.r   2.19e-01
Q.q   2.92e-01
x0.X1 9.35e-01
x0.X2 7.02e-06
Initial states (x0) defined at t=0

Standard errors have not been calculated. 
Use MARSSparamCIs to compute CIs and bias estimates.
Code
#model diagnostics
autoplot(fit2, plot.type="fitted.ytT")

3. Full Ricker Model with Summer PDO

Equations:

\[ \text{log}(R_t/S_t) = \alpha_t + \delta_t X_t - bS_t + v_t \]

\[ \alpha_t = \alpha_{t-1} + w_t \]

\[ \delta_t = \delta_{t-1} + w_t \]

MARSS:

\[ y_t = Zx_t + v_t \]

\[ v_t \sim N(0, r) \]

\[ x_t = Bx_{t-1} + w_t \]

\[ w_t \sim N(0, \begin{bmatrix} q_\alpha & 0 & 0\\ 0 & q_\delta & 0 \\ 0 & 0 & 0 \end{bmatrix}) \]

where \(B = \begin{bmatrix}1 & 0 & 0 \\0 & 1 & 0 \\ 0&0&1\end{bmatrix}\), \(x_t = \begin{bmatrix} \alpha_t \\ \delta_t \\ b \end{bmatrix}\), \(Z = \begin{bmatrix} 1 & X_t & S_t \end{bmatrix}\)

Code
#setting up model
Bmat <- matrix(c(1,0,0,0,1,0,0,0,1), 3, 3, byrow = TRUE)
TT<-nrow(SR_data)
ZZ <- array(NA, c(1, 3, TT))  
ZZ[1,1,] <- rep(1, TT) #intercept
ZZ[1,2,]<-(SR_data$pdo_summer_t2) #summer PDO
ZZ[1,3,]<-(SR_data$spawners) #St
QQ<-matrix(list(0), 3, 3)
QQ[1,1]<-"q_alpha"
QQ[2,2]<-"q_delta"
mod_list3<-list(B = Bmat, Z = ZZ, U = "zero", A = "zero", Q = QQ, R = matrix("r"))

inits_list <- list(x0 = matrix(c(0, 0, 0), nrow = 3))
Code
fit3<-MARSS(SR_data$logSR, mod_list3, inits = inits_list,control = list(maxit=1000,allow.degen=F))
Warning! Abstol convergence only. Maxit (=1000) reached before log-log convergence.

MARSS fit is
Estimation method: kem 
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
WARNING: Abstol convergence only no log-log convergence.
 maxit (=1000) reached before log-log convergence.
 The likelihood and params might not be at the ML values.
 Try setting control$maxit higher.
Log-likelihood: -51.81633 
AIC: 115.6327   AICc: 117.966   
 
          Estimate
R.r       2.00e-01
Q.q_alpha 3.16e-01
Q.q_delta 7.04e-05
x0.X1     1.02e+00
x0.X2     3.03e-02
x0.X3     5.48e-06
Initial states (x0) defined at t=0

Standard errors have not been calculated. 
Use MARSSparamCIs to compute CIs and bias estimates.

Convergence warnings
 Warning: the  Q.q_delta  parameter value has not converged.
 Type MARSSinfo("convergence") for more info on this warning.

4. Full Ricker Model with Winter PDO

The equations for #4 are identical with #3, with the exception of Winter PDO as \(X_t\)

Code
#setting up model
Bmat <- matrix(c(1,0,0,0,1,0,0,0,1), 3, 3, byrow = TRUE)
TT<-nrow(SR_data)
ZZ <- array(NA, c(1, 3, TT))  
ZZ[1,1,] <- rep(1, TT) #intercept
ZZ[1,2,]<-(SR_data$pdo_winter_t2) #summer PDO
ZZ[1,3,]<-(SR_data$spawners) #St
QQ<-matrix(list(0), 3, 3)
QQ[1,1]<-"q_alpha"
QQ[2,2]<-"q_delta"
mod_list3<-list(B = Bmat, Z = ZZ, U = "zero", A = "zero", Q = QQ, R = matrix("r"))

inits_list <- list(x0 = matrix(c(0, 0, 0), nrow = 3))

fit4<-MARSS(SR_data$logSR, mod_list3, inits = inits_list,control = list(maxit=1000,allow.degen=F))
Warning! Abstol convergence only. Maxit (=1000) reached before log-log convergence.

MARSS fit is
Estimation method: kem 
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
WARNING: Abstol convergence only no log-log convergence.
 maxit (=1000) reached before log-log convergence.
 The likelihood and params might not be at the ML values.
 Try setting control$maxit higher.
Log-likelihood: -51.44443 
AIC: 114.8889   AICc: 117.2222   
 
          Estimate
R.r       1.97e-01
Q.q_alpha 3.11e-01
Q.q_delta 2.27e-04
x0.X1     1.13e+00
x0.X2     1.54e-01
x0.X3     1.38e-06
Initial states (x0) defined at t=0

Standard errors have not been calculated. 
Use MARSSparamCIs to compute CIs and bias estimates.

Convergence warnings
 Warning: the  Q.q_delta  parameter value has not converged.
 Type MARSSinfo("convergence") for more info on this warning.

Results

1. Reduced Ricker Model

Plot of model fit (time series of \(\alpha_t\))

Code
ggplot(data = SR_data) + geom_line(aes(x = brood_year, y = logSR), color = "black") + geom_ribbon(aes(x = brood_year, ymin = t(fit1$states - 2*fit1$states.se), ymax = t(fit1$states + 2*fit1$states.se)), linetype = "dashed", color = "blue", fill = NA) + geom_line(aes(x = brood_year, y = t(fit1$states)), color = 'blue') + theme_classic()
Warning: Removed 7 rows containing missing values (`geom_line()`).

2. Full Ricker Model

Time series of regression parameters

Code
#intercept
int_mn<-fit2$states[1,]
int_se<-fit2$states.se[1,]
ggplot() + geom_ribbon(aes(x = SR_data$brood_year, ymin = int_mn-2*int_se, ymax = int_mn + 2*int_se), linetype = "dashed", fill = NA, color = "blue") + geom_line(aes(x = SR_data$brood_year, y = int_mn), color = "blue") + labs(x = "Brood Year", y = "alpha_t")

Code
#slope results
slp_mn<-fit2$states[2,1]
slp_se<-fit2$states.se[2,1]
print(paste("Mean: ", slp_mn))
[1] "Mean:  7.01979994813666e-06"
Code
print(paste("SE: ", slp_se))
[1] "SE:  0"

3. Full Ricker Model with Summer PDO

Code
#intercept
int_mn<-fit3$states[1,]
int_se<-fit3$states.se[1,]
ggplot() + geom_ribbon(aes(x = SR_data$brood_year, ymin = int_mn-2*int_se, ymax = int_mn + 2*int_se), linetype = "dashed", fill = NA, color = "blue") + geom_line(aes(x = SR_data$brood_year, y = int_mn), color = "blue") + labs(x = "Brood Year", y = "alpha_t")

Code
#summer pdo slope results
slp_mn<-fit3$states[2,]
slp_se<-fit3$states.se[2,]
ggplot() + geom_ribbon(aes(x = SR_data$brood_year, ymin = slp_mn-2*slp_se, ymax = slp_mn + 2*slp_se), linetype = "dashed", fill = NA, color = "blue") + geom_line(aes(x = SR_data$brood_year, y = slp_mn), color = "blue") + labs(x = "Brood Year", y = "delta_t (SummerPDO)")

Time-varying \(\delta_t\) seems unnecessary given its constant nature and very low value.

Code
# spawner slope results
slp_mn<-fit3$states[3,1]
slp_se<-fit3$states.se[3,1]
print(paste("Mean: ", slp_mn))
[1] "Mean:  5.47724897779201e-06"
Code
print(paste("SE: ", slp_se))
[1] "SE:  0"

Time-varying \(\delta_t\) seems unnecessary given its constant nature and very low value.

4. Full Ricker Model with Winter PDO

Code
#intercept
int_mn<-fit4$states[1,]
int_se<-fit4$states.se[1,]
ggplot() + geom_ribbon(aes(x = SR_data$brood_year, ymin = int_mn-2*int_se, ymax = int_mn + 2*int_se), linetype = "dashed", fill = NA, color = "blue") + geom_line(aes(x = SR_data$brood_year, y = int_mn), color = "blue") + labs(x = "Brood Year", y = "alpha_t")

Code
#winter pdo slope results
slp_mn<-fit4$states[2,]
slp_se<-fit4$states.se[2,]
ggplot() + geom_ribbon(aes(x = SR_data$brood_year, ymin = slp_mn-2*slp_se, ymax = slp_mn + 2*slp_se), linetype = "dashed", fill = NA, color = "blue") + geom_line(aes(x = SR_data$brood_year, y = slp_mn), color = "blue") + labs(x = "Brood Year", y = "delta_t (WinterPDO)")

Code
# spawner slope results
slp_mn<-fit4$states[3,1]
slp_se<-fit4$states.se[3,1]
print(paste("Mean: ", slp_mn))
[1] "Mean:  1.38299313195267e-06"
Code
print(paste("SE: ", slp_se))
[1] "SE:  0"

Time-varying \(\delta_t\) seems unnecessary given its constant nature and very low value.

Model Comparison

Code
AIC_tab<-tibble("Model" = c("Reduced Ricker", "Full Ricker", "Full w/ Summer PDO", "Full w/ Winter PDO"), "AICc" = c(fit1$AICc, fit2$AICc, fit3$AICc, fit4$AICc))
AIC_tab<- AIC_tab %>% mutate(DeltaAIC = AICc-min(AICc))
AIC_tab
# A tibble: 4 × 3
  Model               AICc DeltaAIC
  <chr>              <dbl>    <dbl>
1 Reduced Ricker      110.     0   
2 Full Ricker         113.     2.31
3 Full w/ Summer PDO  118.     7.60
4 Full w/ Winter PDO  117.     6.86

Reduced ricker is the best model, which makes sense since the estimate for the b parameter is essentially 0 in the other 3 models. Additionally the covariate parameters for Summer and PDO are both very low as well. All models are basically the same except the full ricker, and full ricker w/ covariates have extra parameter(s).

Forecasting with Reduced Ricker model

Code
## get list of Kalman filter output
kf_out <- MARSSkfss(fit1)
## forecasts of regr parameters; 2xT matrix
eta <- kf_out$xtt1

mod_list1<-list(B = "identity", Z = "identity", U = "zero", A = "zero", Q = matrix("q"), R = matrix("r"))

Z <- array(NA, c(1, TT))
Z[1,] <- rep(1, TT) 
## ts of E(forecasts)
fore_mean <- vector()
for(t in 1:TT) {
  fore_mean[t] <- Z[,t] %*% eta[, t, drop = FALSE]
}

## variance of regr parameters; 1x2xT array
Phi <- kf_out$Vtt1

## obs variance; 1x1 matrix
R_est <- coef(fit1, type="matrix")$R

## ts of Var(forecasts)
fore_var <- vector()
for(t in 1:TT) {
  tZ <- matrix(Z[,t], 1, 1) ## transpose of Z
  fore_var[t] <- Z[, t] %*% Phi[,,t] %*% tZ + R_est
}
ggplot() + geom_line(aes(x = SR_data$brood_year, y=fore_mean)) +
geom_ribbon(aes(x = SR_data$brood_year, ymin = fore_mean-2*fore_var, ymax = fore_mean + 2*fore_var), linetype = "dashed", fill = NA, color = "black") +  geom_point(aes(x=SR_data$brood_year, y=SR_data$logSR),color="blue") + labs(x="Year",y="Log(R/S)") + theme_bw()
Warning: Removed 7 rows containing missing values (`geom_point()`).

Code
ggplot() + geom_line(aes(x = SR_data$brood_year, y=exp(fore_mean))) +
geom_ribbon(aes(x = SR_data$brood_year, ymin = exp(fore_mean-2*fore_var), ymax = exp(fore_mean + 2*fore_var)), linetype = "dashed", fill = NA, color = "black") +  geom_point(aes(x=SR_data$brood_year, y=exp(SR_data$logSR)),color="blue") + labs(x="Year",y="R/S") + theme_bw() + ylim(0,25)
Warning: Removed 7 rows containing missing values (`geom_point()`).

Observed values almost all fall within the prediction intervals. This is a reasonably well fitting model, however there is an AR1 lag between prediction and observation. This AR1 lag seems to be common throughout most time series prediction models.

Discussion

We found that Kvichak sockeye recruitment/spawner ratio can be explained well by the simplest Ricker spawner-recruit model with only a time-varying level and observation error, essentially a simple random walk process. Based on the AICc, the best performing model was the simplest model without any density-dependent or PDO information. This is somewhat surprising, as we expected density dependence or conditions in the environment to have an effect. There may be environmental factors besides PDO that we did not account for this model that might influence productivity, including ENSO, SST, and sea surface air pressure (Ovando et al. 2022).

Team contributions

ZR did the data wrangling and wrote the methods, code, and results for the first two models (reduced and full Ricker). TW wrote the methods, code, and results for the models with covariates and the forecasting. Both team members contributed to the discussion.