Lab 2 MARSS Models

Author

Liz Elmstrom, Terrance Wang, Eric French

Published

April 20, 2023

General Questions

Each group has the same general tasks, but you will adapt them as you work on the data.

  1. Create estimates of spawner abundance for all missing years and provide estimates of the decline from the historical abundance.

  2. Evaluate support for the major population groups. Are the populations in the groups more correlated than outside the groups?

  3. Evaluate the evidence of cycling in the data. We will talk about how to do this on the Tuesday after lab.

We analyzed the Steelhead (Lower Columbia ESU). We decided to exclude the Gorge Tributary winter run because there was only 1 short time series, and focused the 9 remaining runs that belonged to Cascade major population group.

Methods

We tested 5 different population groups hypotheses: one population, individual populations, run timing groups, spatial groups, and correlation groups.

One population \[ \begin{bmatrix} y_{1} \\ y_{2} \\ \vdots \\y_{9} \end{bmatrix}_{t} = \begin{bmatrix}1\\ 1\\ \vdots\\ \\ 1\end{bmatrix} x_{t} + \mathbf{a} + \mathbf{v_t} \text{ where } \mathbf{v_t} \sim MVN(0, \mathbf{R}) \]

\[ \begin{bmatrix} x \end{bmatrix}_{t} = \begin{bmatrix}1\\ \end{bmatrix} x_{t-1} + u + w_t \text{ where } w_t \sim N(0,q) \]

Individual populations \[ \begin{bmatrix} y_{1} \\ y_{2} \\ \vdots \\y_{9} \end{bmatrix}_{t} = I_{9} \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\x_{9} \end{bmatrix}_{t} + \mathbf{a} + \mathbf{v_t} \text{ where } \mathbf{v_t} \sim MVN(0, \mathbf{R}) \] \[ \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{9} \end{bmatrix}_{t} = I_{9} \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{9} \end{bmatrix}_{t-1} + \begin{bmatrix}u_{1}\\u_{2}\\ \vdots \\u_{9} \end{bmatrix} + \mathbf{w_t} \text{ where } \mathbf{w_t} \sim MVN(0, \mathbf{Q}) \] Run Timing Groups, Spatial Groups, Correlation Groups We provide the Z matrices for the 3 groupings below. \[ \begin{equation*} \begin{array}{rcccc} &Run Timing&Spatial&Correlation\\ &\text{summer winter}&\text{Coweeman E-Fork-Lewis Kalama Cowlitz Washougal}&\text{Coweeman Kalama Cowlitz}\\ \hline \begin{array}{r}\text{Coweeman - W}\\ \text{Fork E Lewis - W} \\ \text{Kalama - S} \\ \text{Kalama - W} \\ \text{Lower Cowlitz - W} \\ \text{Toutle - W} \\ \text{Tilton - W} \\ \text{Upper Cowlitz - W} \\ \text{Washougal - W}\end{array}& \begin{bmatrix} 0 & 1 \\ 0 & 1 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \\ \end{bmatrix}& \begin{bmatrix} 1 & 0 & 0 & 0 &0 \\ 0 & 1 & 0 & 0 &0 \\ 0 & 0 & 1 & 0 &0 \\ 0 & 0 & 1 & 0 &0 \\ 0 & 0 & 0 & 1 &0 \\ 1 & 0 & 0 & 0 &0 \\ 0 & 0 & 0 & 1 &0 \\ 1 & 0 & 0 & 0 &0 \\ 0 & 0 & 0 & 0 &1 \\ \end{bmatrix}& \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}& \end{array} \end{equation*} \]

We assume that all models have identical observation errors, thus a diagonal and equal R matrix. \[ \mathbf{R} = \begin{bmatrix} r & 0 & ...& 0\\ 0 & r & ...& 0\\ \vdots & 0 & \ddots & \vdots\\ 0 & 0 & 0 & r \end{bmatrix} \] For each model we test both equal and unequal biases for the states. We also assumed all states were correlated over time and test both equal var-cov and unconstrained process errors for each state.

Equal Bias \[ \mathbf{u} = \begin{bmatrix} u\\ \vdots\\ u\end{bmatrix} \] Unequal Bias \[ \mathbf{u} = \begin{bmatrix} u_{1}\\ \vdots\\ u_{i}\end{bmatrix} \text{where i is # of states} \]

Equal Variance-Covariance \[ \begin{equation} \mathbf{Q}=\begin{bmatrix} q & c & ... & c \\ c & q & \ddots & \vdots\\ \vdots & c & \ddots & c \\ c & c & ... & q \end{bmatrix} \end{equation} \text{where # of rows is equal to # of states} \] Unconstrained \[ \begin{equation} \mathbf{Q}=\begin{bmatrix} q_{1}& c_{1,2} & ... & c_{1,i} \\ c_{1,2} & q_{2} & \ddots & c_{2,i}\\ \vdots & \ddots & \ddots & \vdots \\ c_{1,i} & ... & c_{i-1,i} & q_{i} \end{bmatrix} \end{equation} \text{where # of rows is equal to # of states} \]

Data exploration

Load the data.

Code
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.2     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Code
library(dplyr)
library(janitor)

Attaching package: 'janitor'

The following objects are masked from 'package:stats':

    chisq.test, fisher.test
Code
library(MARSS)
library(forecast)
Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo 
Code
library(broom)

load(here::here("Lab-2", "Data_Images", "columbia-river.rda"))

#Evolutionary Significant Units
esu <- unique(columbia.river$esu_dps)
esu
[1] "Steelhead (Middle Columbia River DPS)"     
[2] "Steelhead (Upper Columbia River DPS)"      
[3] "Steelhead (Lower Columbia River DPS)"      
[4] "Salmon, coho (Lower Columbia River ESU)"   
[5] "Salmon, Chinook (Lower Columbia River ESU)"
Code
df <- columbia.river %>% subset(esu_dps %in% "Steelhead (Lower Columbia River DPS)")

cat("colnames: ", colnames(df), "\n")
colnames:  species esu_dps majorpopgroup esapopname commonpopname run spawningyear value value_type 
Code
unique(df$esapopname)
 [1] "Steelhead (Lower Columbia River DPS) Kalama River - summer"           
 [2] "Steelhead (Lower Columbia River DPS) Coweeman River - winter"         
 [3] "Steelhead (Lower Columbia River DPS) East Fork Lewis River - winter"  
 [4] "Steelhead (Lower Columbia River DPS) Kalama River - winter"           
 [5] "Steelhead (Lower Columbia River DPS) Lower Cowlitz River - winter"    
 [6] "Steelhead (Lower Columbia River DPS) South Fork Toutle River - winter"
 [7] "Steelhead (Lower Columbia River DPS) Tilton River - winter"           
 [8] "Steelhead (Lower Columbia River DPS) Upper Cowlitz River - winter"    
 [9] "Steelhead (Lower Columbia River DPS) Washougal River - winter"        
[10] "Steelhead (Lower Columbia River DPS) Upper Gorge Tributaries - winter"
Code
unique(df$commonpopname)
 [1] "Kalama Summer Steelhead"                  
 [2] "Coweeman Winter Steelhead"                
 [3] "East Fork Lewis Winter Steelhead"         
 [4] "Kalama Winter Steelhead"                  
 [5] "Lower Cowlitz Winter Steelhead"           
 [6] "South Fork Toutle Winter Steelhead"       
 [7] "Tilton Winter Steelhead"                  
 [8] "Upper Cowlitz and Cispus Winter Steelhead"
 [9] "Washougal Winter Steelhead"               
[10] "Upper Gorge (Columbia) Winter Steelhead"  
Code
df$esapopname2 <- stringr::str_replace(df$esapopname, "Steelhead [(]Lower Columbia River DPS[)] ", "")
unique(df$esapopname2)
 [1] "Kalama River - summer"            "Coweeman River - winter"         
 [3] "East Fork Lewis River - winter"   "Kalama River - winter"           
 [5] "Lower Cowlitz River - winter"     "South Fork Toutle River - winter"
 [7] "Tilton River - winter"            "Upper Cowlitz River - winter"    
 [9] "Washougal River - winter"         "Upper Gorge Tributaries - winter"
Code
ggplot(df, aes(x=spawningyear, y=log(value), color=majorpopgroup)) + 
  geom_point(size=2, na.rm = TRUE) + 
  theme(strip.text.x = element_text(size = 3)) +
  theme(axis.text.x = element_text(size = 5, angle = 90)) +
  facet_wrap(~esapopname2) +
  ggtitle('Lower Columbia Steelhead Populations')+
  theme(strip.text.x = element_text(size = 10))

Wrangle the data.

Code
esuname <- esu[3]
dat <- columbia.river %>% 
  subset(esu_dps == esuname) %>% # get only this ESU
  mutate(log.spawner = log(value)) %>% # create a column called log.spawner
  select(esapopname, spawningyear, log.spawner) %>% # get just the columns that I need
  pivot_wider(names_from = "esapopname", values_from = "log.spawner") %>% 
  column_to_rownames(var = "spawningyear") %>% # make the years rownames
  as.matrix() %>% # turn into a matrix with year down the rows
  t() # make time across the columns
dat[is.na(dat)] <- NA

## Fixing row names
tmp <- rownames(dat)
tmp <- stringr::str_replace(tmp, "Steelhead [(]Lower Columbia River DPS[)]", "")
tmp <- make_clean_names(tmp)
rownames(dat) <- tmp

dat <- dat[1:9,]# remove upper gorge
dat <- dat[ order(row.names(dat)), ]## sort

Results

The simplest model

Fitting the most simple model to help determine the need to group populations: One big population with different observations (populations).

Code
mod.list.0 <- list(B = matrix(1), U = matrix("u"), Q = matrix("q"), 
    Z = matrix(1, 9, 1), A = "scaling", R = "diagonal and equal", 
    x0 = matrix("mu"), tinitx = 0)
fit_simple <- MARSS(dat, model=mod.list.0, control=list(maxit=1000))
Success! abstol and log-log tests passed at 38 iterations.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.

MARSS fit is
Estimation method: kem 
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 38 iterations. 
Log-likelihood: -247.9172 
AIC: 519.8343   AICc: 521.1792   
 
                                 Estimate
A.east_fork_lewis_river_winter   -0.51303
A.kalama_river_summer             0.58517
A.kalama_river_winter             0.82974
A.lower_cowlitz_river_winter      0.28048
A.south_fork_toutle_river_winter  0.33119
A.tilton_river_winter            -0.21650
A.upper_cowlitz_river_winter      0.21896
A.washougal_river_winter         -0.38030
R.diag                            0.38503
U.u                              -0.00853
Q.q                               0.04702
x0.mu                             5.81906
Initial states (x0) defined at t=0

Standard errors have not been calculated. 
Use MARSSparamCIs to compute CIs and bias estimates.
Code
plot(fit_simple, plot.type="fitted.ytT")

plot type =  fitted.ytT  Observations with fitted values

We see that the simple model is not the best at fitting the population estimates for many of the populations (e.g., Upper Cowlitz winter, East Fork Lewis winter). Many of the residuals do not appear to be stationary.

The most flexible model

Fitting the most flexible model to help determine spatial Z hypotheses: Individual populations with an unconstrained Q matrix and unequal U

Code
mod.list1 <- list(
  U = "unequal",
  R = "diagonal and equal",
  Q = "unconstrained"
)
fit1 <- MARSS(dat, model=mod.list1, control=list(maxit=800))
Success! abstol and log-log tests passed at 792 iterations.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.

MARSS fit is
Estimation method: kem 
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 792 iterations. 
Log-likelihood: -162.2543 
AIC: 452.5085   AICc: 498.7308   
 
                                    Estimate
R.diag                               0.07004
U.X.coweeman_river_winter            0.05643
U.X.east_fork_lewis_river_winter     0.06603
U.X.kalama_river_summer              0.05382
U.X.kalama_river_winter              0.01357
U.X.lower_cowlitz_river_winter       0.05480
U.X.south_fork_toutle_river_winter   0.05081
U.X.tilton_river_winter              0.03231
U.X.upper_cowlitz_river_winter       0.10266
U.X.washougal_river_winter           0.08353
Q.(1,1)                              0.18254
Q.(2,1)                              0.20888
Q.(3,1)                             -0.06669
Q.(4,1)                             -0.01073
Q.(5,1)                              0.07385
Q.(6,1)                              0.21461
Q.(7,1)                              0.01004
Q.(8,1)                             -0.00336
Q.(9,1)                              0.20057
Q.(2,2)                              0.30487
Q.(3,2)                             -0.16277
Q.(4,2)                             -0.03323
Q.(5,2)                              0.18199
Q.(6,2)                              0.23406
Q.(7,2)                              0.13516
Q.(8,2)                              0.10168
Q.(9,2)                              0.23761
Q.(3,3)                              0.21954
Q.(4,3)                              0.11641
Q.(5,3)                             -0.08252
Q.(6,3)                             -0.01588
Q.(7,3)                             -0.04973
Q.(8,3)                              0.00917
Q.(9,3)                             -0.04761
Q.(4,4)                              0.09583
Q.(5,4)                              0.04185
Q.(6,4)                              0.04127
Q.(7,4)                              0.08378
Q.(8,4)                              0.11758
Q.(9,4)                              0.02286
Q.(5,5)                              0.28695
Q.(6,5)                              0.13628
Q.(7,5)                              0.36189
Q.(8,5)                              0.24063
Q.(9,5)                              0.11017
Q.(6,6)                              0.29428
Q.(7,6)                              0.09380
Q.(8,6)                              0.04003
Q.(9,6)                              0.24779
Q.(7,7)                              0.50421
Q.(8,7)                              0.34139
Q.(9,7)                              0.05553
Q.(8,8)                              0.47369
Q.(9,8)                              0.08841
Q.(9,9)                              0.24231
x0.X.coweeman_river_winter           3.59695
x0.X.east_fork_lewis_river_winter    2.23471
x0.X.kalama_river_summer             6.15197
x0.X.kalama_river_winter             6.56522
x0.X.lower_cowlitz_river_winter      2.71630
x0.X.south_fork_toutle_river_winter  3.85807
x0.X.tilton_river_winter             2.83736
x0.X.upper_cowlitz_river_winter      1.29622
x0.X.washougal_river_winter          2.30745
Initial states (x0) defined at t=0

Standard errors have not been calculated. 
Use MARSSparamCIs to compute CIs and bias estimates.
Code
plot(fit1, plot.type="fitted.ytT")
MARSSresiduals.tT reported warnings. See msg element or attribute of returned residuals object.

plot type =  fitted.ytT  Observations with fitted values
Code
library(corrplot)
corrplot 0.92 loaded
Code
Q <- coef(fit1, type="matrix")$Q
corrmat <- diag(1/sqrt(diag(Q))) %*% Q %*% diag(1/sqrt(diag(Q)))
corrplot(corrmat)

Some of the populations are correlated. This implies it would be reasonable to test for equal variance covariance, along with some other population structures.

We opted to test this in a for loop to compare multiple spatial structures, u options, and two options for Q.

Model Assumptions

Below we setting up population grouping hypotheses (Z matrices), correlation structures, drift or “bias” terms, and the fixed model list.

Code
# Here we evaluated the data support for the following hypotheses about Lower Columbia salmon river trends
# Each Z model is a hypothesis

## Simple run timing groupings
# summer = kalama_river_summer
# winter = remaining population

## Simple spatial groupings
# coweeman = coweeman and sf toutle
# ef_lewis = east fork lewis
# kalama = kalama summer and winter
# cowlitz = lower cowlitz, tilton, and upper cowlitz
# washougal = washougal

## Correlation spatial groupings
# coweeman = coweema, ef_lewis, sf toutle, washougal
# kalama = kalama summer and winter
# cowlitz = lower cowlitz, tilton, and upper cowlitz

Z.models <- list(
  H1 = matrix(1,9,1), #one hidden population state
  H2 = factor(c("coweeman","ef_lewis", "kalama_sum",
                'kalama_win', "low_cowlitz", "sf_toutle",
                "tilton", "up_cowlitz", "washougal")), #states are defined by individual population
  H3 = factor(c("winter","winter", "summer",
                'winter', "winter", "winter",
                "winter", "winter", "winter")),# states defined by running timing grouping (n = 2)
  H4 = factor(c("coweeman","ef_lewis", "kalama",
                'kalama', "cowlitz", "coweeman",
                "cowlitz", "cowlitz", "washougal")),# states defined by spatial grouping (n = 5)
  H5 = factor(c("coweeman","coweeman", "kalama",
                'kalama', "cowlitz", "coweeman",
                "cowlitz", "cowlitz", "coweeman"))# states defined by correlation (n = 3)
)

names(Z.models) <- c("one_population","indiv_population","run_timing_groups",'spatial_groups',"corr_groups")

# Also testing different process error varcovar matrices
Q.models2 <- c("equalvarcov","unconstrained")

# Bias terms
u2 <- c("unequal", 'equal')

# Setting fixed portion of mod list
mod.list = list(
  A = "scaling",
  R = "diagonal and equal")

MARSS Model Selection

Run MARSS models.

Code
out.tab <- NULL
fits <- list()
for(i in 1:length(Z.models)){
    for(Q.model in Q.models2){
      for(U.model in u2){
      fit.model = c(list(Z=Z.models[[i]], Q=Q.model, U=U.model), mod.list)
      fit = MARSS(dat, model=fit.model,
                silent=TRUE, control=list(maxit=3000))
      out=data.frame(H=names(Z.models)[i],Q=Q.model,U=U.model,
                   logLik=fit$logLik, AICc=fit$AICc, num.param=fit$num.params,
                   m=length(unique(Z.models[[i]])),
                   num.iter=fit$numIter, converged=!fit$convergence,
                   stringsAsFactors = FALSE)
      out.tab=rbind(out.tab,out)
      fits=c(fits,list(fit))
      } 
    }
  }

min.AICc <- order(out.tab$AICc)
out.tab.1 <- out.tab[min.AICc, ]
out.tab.1 <- cbind(out.tab.1, delta.AICc = out.tab.1$AICc - out.tab.1$AICc[1])
out.tab.1 <- cbind(out.tab.1, rel.like = exp(-1 * out.tab.1$delta.AICc/2))
out.tab.1 <- cbind(out.tab.1, AIC.weight = out.tab.1$rel.like/sum(out.tab.1$rel.like))
out.tab.1
                   H             Q       U    logLik     AICc num.param m
16    spatial_groups unconstrained   equal -200.3077 459.0557        26 5
15    spatial_groups unconstrained unequal -199.6277 467.9469        30 5
14    spatial_groups   equalvarcov   equal -221.5804 470.7365        13 5
6   indiv_population   equalvarcov   equal -223.3296 474.2350        13 9
13    spatial_groups   equalvarcov unequal -221.2383 479.1726        17 5
8   indiv_population unconstrained   equal -167.4589 480.8753        56 9
5   indiv_population   equalvarcov unequal -221.2639 488.6713        21 9
18       corr_groups   equalvarcov   equal -233.1729 493.9216        13 3
20       corr_groups unconstrained   equal -229.2451 495.1862        17 3
17       corr_groups   equalvarcov unequal -232.9357 497.9674        15 3
7   indiv_population unconstrained unequal -162.2543 498.7308        64 9
19       corr_groups unconstrained unequal -229.1000 499.5778        19 3
10 run_timing_groups   equalvarcov   equal -244.9572 517.4901        13 2
12 run_timing_groups unconstrained   equal -244.8224 519.4708        14 2
9  run_timing_groups   equalvarcov unequal -244.8625 519.5512        14 2
1     one_population   equalvarcov unequal -247.9172 521.1792        12 1
2     one_population   equalvarcov   equal -247.9172 521.1792        12 1
3     one_population unconstrained unequal -247.9172 521.1792        12 1
4     one_population unconstrained   equal -247.9172 521.1792        12 1
11 run_timing_groups unconstrained unequal -244.7254 521.5469        15 2
   num.iter converged delta.AICc     rel.like   AIC.weight
16      378      TRUE   0.000000 1.000000e+00 9.850231e-01
15      374      TRUE   8.891254 1.172975e-02 1.155407e-02
14       63      TRUE  11.680779 2.907709e-03 2.864161e-03
6        77      TRUE  15.179270 5.056657e-04 4.980924e-04
13       75      TRUE  20.116881 4.282276e-05 4.218141e-05
8       797      TRUE  21.819604 1.827819e-05 1.800444e-05
5       683      TRUE  29.615606 3.707258e-07 3.651735e-07
18       69      TRUE  34.865945 2.685073e-08 2.644859e-08
20       60      TRUE  36.130531 1.426773e-08 1.405405e-08
17       87      TRUE  38.911736 3.551599e-09 3.498407e-09
7       792      TRUE  39.675076 2.424749e-09 2.388433e-09
19       60      TRUE  40.522114 1.587577e-09 1.563800e-09
10       31      TRUE  58.434441 2.047021e-13 2.016363e-13
12       57      TRUE  60.415106 7.603724e-14 7.489844e-14
9        31      TRUE  60.495478 7.304222e-14 7.194827e-14
1        38      TRUE  62.123467 3.236387e-14 3.187916e-14
2        38      TRUE  62.123467 3.236387e-14 3.187916e-14
3        38      TRUE  62.123467 3.236387e-14 3.187916e-14
4        38      TRUE  62.123467 3.236387e-14 3.187916e-14
11       57      TRUE  62.491240 2.692773e-14 2.652443e-14

The best model (as supported by AICc) includes 5 states (our “spatial populations” model) that are correlated with unique variance and covariance terms (unconstrained Q) with the same drift term (U)

Inspecting the best model (spatial group, unconstrained Q, equal U)

Code
best_mod <- fits[[16]]
best_mod

MARSS fit is
Estimation method: kem 
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 378 iterations. 
Log-likelihood: -200.3077 
AIC: 452.6153   AICc: 459.0557   
 
                                 Estimate
A.kalama_river_winter             0.24416
A.south_fork_toutle_river_winter  0.38706
A.tilton_river_winter            -0.75459
A.upper_cowlitz_river_winter     -0.19777
R.diag                            0.15851
U.1                               0.03097
Q.(1,1)                           0.22524
Q.(2,1)                           0.17195
Q.(3,1)                          -0.01570
Q.(4,1)                           0.02272
Q.(5,1)                           0.17217
Q.(2,2)                           0.19999
Q.(3,2)                          -0.07064
Q.(4,2)                           0.12119
Q.(5,2)                           0.16006
Q.(3,3)                           0.09252
Q.(4,3)                           0.00657
Q.(5,3)                          -0.01488
Q.(4,4)                           0.38669
Q.(5,4)                           0.11117
Q.(5,5)                           0.15486
x0.coweeman                       3.81220
x0.ef_lewis                       2.92567
x0.kalama                         6.30723
x0.cowlitz                        2.50039
x0.washougal                      3.25935
Initial states (x0) defined at t=0

Standard errors have not been calculated. 
Use MARSSparamCIs to compute CIs and bias estimates.
Code
plot(best_mod, plot.type = "fitted.ytT")
MARSSresiduals.tT reported warnings. See msg element or attribute of returned residuals object.

plot type =  fitted.ytT  Observations with fitted values
Code
plot(best_mod, plot.type = "acf.std.model.resids.ytt1")
MARSSresiduals.tt1 reported warnings. See msg element of returned residuals object.

plot type = acf.std.model.resids.ytt1

In our best model, we do see some autocorrelation in our residuals. In the next section, we will try to include a covariate of season into our best model and see if this improves the model fit.

Comparing best model to the most flexible model

Code
plot(fit1, plot.type="fitted.ytT")#flexible
MARSSresiduals.tT reported warnings. See msg element or attribute of returned residuals object.

plot type =  fitted.ytT  Observations with fitted values
Code
plot(best_mod, plot.type="fitted.ytT")#best
MARSSresiduals.tT reported warnings. See msg element or attribute of returned residuals object.

plot type =  fitted.ytT  Observations with fitted values

Do our estimates differ depending on the assumptions of the structure of the data, i.e. our assumptions about the x’s, Q, and U?

Definitely! The more flexible model (individual states, unequal bias term) produces more variable forecasts (with some populations increasing or decreasing through time). Our spatial model (5 states, equal bias term) provides relatively stable forecasts across populations, likely due to the “equal” bias term.

Seasonality as a covariate

In this next section, we test for age structured cycling of steelhead populations or “season” in our best “spatial” model.

Initial model fitting

Code
Z.mat <-  factor(c("coweeman","ef_lewis", "kalama",
                'kalama', "cowlitz", "coweeman",
                "cowlitz", "cowlitz", "washougal")) #Our best model Z matrix

# Setting up 3-5 year cycles as a covariate
TT <- dim(dat)[2]
covariates <- rbind(
  forecast::fourier(ts(1:TT, freq=3), K=1) |> t(),
  forecast::fourier(ts(1:TT, freq=4), K=1) |> t(),
  forecast::fourier(ts(1:TT, freq=5), K=1) |> t()
)

mod.list2 = list(
  U = "equal",
  Q = "unconstrained", 
  Z = Z.mat,
  A = "scaling",
  R = "diagonal and equal",
  D = "unconstrained", # We include these in the D matrix because the age structure cycling of salmon typically affects our observations rather than the state
  d = covariates)

fit_season <- MARSS(dat, model=mod.list2)
Success! abstol and log-log tests passed at 393 iterations.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.

MARSS fit is
Estimation method: kem 
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 393 iterations. 
Log-likelihood: -171.8029 
AIC: 503.6058   AICc: 582.6302   
 
                                         Estimate
A.kalama_river_winter                    0.252970
A.south_fork_toutle_river_winter         0.343256
A.tilton_river_winter                   -0.903094
A.upper_cowlitz_river_winter            -0.320282
R.diag                                   0.122966
U.1                                      0.031019
Q.(1,1)                                  0.151655
Q.(2,1)                                  0.103304
Q.(3,1)                                 -0.006630
Q.(4,1)                                  0.048822
Q.(5,1)                                  0.076574
Q.(2,2)                                  0.137173
Q.(3,2)                                 -0.063070
Q.(4,2)                                  0.146876
Q.(5,2)                                  0.073604
Q.(3,3)                                  0.094141
Q.(4,3)                                 -0.012518
Q.(5,3)                                  0.003246
Q.(4,4)                                  0.396928
Q.(5,4)                                  0.114513
Q.(5,5)                                  0.060789
x0.coweeman                              4.260305
x0.ef_lewis                              3.498555
x0.kalama                                6.221230
x0.cowlitz                               2.861678
x0.washougal                             3.978254
D.(coweeman_river_winter,S1-3)           0.186021
D.(east_fork_lewis_river_winter,S1-3)    0.157392
D.(kalama_river_summer,S1-3)            -0.072940
D.(kalama_river_winter,S1-3)             0.022498
D.(lower_cowlitz_river_winter,S1-3)     -0.184966
D.(south_fork_toutle_river_winter,S1-3)  0.144443
D.(tilton_river_winter,S1-3)            -0.052278
D.(upper_cowlitz_river_winter,S1-3)     -0.160372
D.(washougal_river_winter,S1-3)          0.197520
D.(coweeman_river_winter,C1-3)          -0.044046
D.(east_fork_lewis_river_winter,C1-3)    0.089743
D.(kalama_river_summer,C1-3)            -0.000429
D.(kalama_river_winter,C1-3)            -0.021383
D.(lower_cowlitz_river_winter,C1-3)     -0.275540
D.(south_fork_toutle_river_winter,C1-3)  0.127018
D.(tilton_river_winter,C1-3)             0.088516
D.(upper_cowlitz_river_winter,C1-3)      0.266990
D.(washougal_river_winter,C1-3)          0.040757
D.(coweeman_river_winter,S1-4)          -0.117197
D.(east_fork_lewis_river_winter,S1-4)   -0.040715
D.(kalama_river_summer,S1-4)            -0.022386
D.(kalama_river_winter,S1-4)             0.071430
D.(lower_cowlitz_river_winter,S1-4)     -0.451392
D.(south_fork_toutle_river_winter,S1-4) -0.068067
D.(tilton_river_winter,S1-4)             0.122578
D.(upper_cowlitz_river_winter,S1-4)      0.059695
D.(washougal_river_winter,S1-4)         -0.219913
D.(coweeman_river_winter,C1-4)           0.124410
D.(east_fork_lewis_river_winter,C1-4)    0.127488
D.(kalama_river_summer,C1-4)            -0.066523
D.(kalama_river_winter,C1-4)             0.155787
D.(lower_cowlitz_river_winter,C1-4)     -0.246952
D.(south_fork_toutle_river_winter,C1-4)  0.068655
D.(tilton_river_winter,C1-4)             0.125492
D.(upper_cowlitz_river_winter,C1-4)      0.103936
D.(washougal_river_winter,C1-4)          0.195852
D.(coweeman_river_winter,S1-5)          -0.163883
D.(east_fork_lewis_river_winter,S1-5)   -0.082063
D.(kalama_river_summer,S1-5)             0.128463
D.(kalama_river_winter,S1-5)             0.068320
D.(lower_cowlitz_river_winter,S1-5)      0.409315
D.(south_fork_toutle_river_winter,S1-5) -0.009286
D.(tilton_river_winter,S1-5)            -0.386147
D.(upper_cowlitz_river_winter,S1-5)      0.087418
D.(washougal_river_winter,S1-5)         -0.069609
D.(coweeman_river_winter,C1-5)          -0.120105
D.(east_fork_lewis_river_winter,C1-5)   -0.286216
D.(kalama_river_summer,C1-5)             0.054608
D.(kalama_river_winter,C1-5)             0.173239
D.(lower_cowlitz_river_winter,C1-5)      0.255956
D.(south_fork_toutle_river_winter,C1-5) -0.358093
D.(tilton_river_winter,C1-5)            -0.154457
D.(upper_cowlitz_river_winter,C1-5)     -0.001424
D.(washougal_river_winter,C1-5)         -0.363120
Initial states (x0) defined at t=0

Standard errors have not been calculated. 
Use MARSSparamCIs to compute CIs and bias estimates.

Evaluating the effect of different cycles

We can then plot the effects of each cycle estimate.

Code
df <- tidy(fit_season) %>%
  subset(stringr::str_sub(term,1,1)=="D")
df$lag <- as.factor(rep(3:5, each=18))
df$river <- as.factor(rep(rownames(dat),3))
df$sc <- rep(rep(c("S","C"), each=9), 3)
df$type <- paste0(df$sc,df$lag)

ggplot(df, aes(x=type, y=estimate, col=lag)) + 
  geom_point() +
  geom_errorbar(aes(ymin=conf.low, ymax=conf.up), width=.2, position=position_dodge(.9)) +
  geom_hline(yintercept = 0) +
  facet_wrap(~river) +
  ggtitle("The cycle estimates with CIs")

These estimates suggest that there is some evidence of cycling at the 5 year interval in the East Fork Lewis, the South Fork Toutle, the Tilton, and the Washougal River steelhead populations.

This, however, increases our AICc quite a bit, from 459.0557 in our original best model to 582.6302. Now let’s try just including 5 year cycles and compare this AICc to our best model.

Comparing models with and without 5 year cycles

Code
mod.list3 = list(
  U = "equal",
  Q = "unconstrained", 
  Z = Z.mat,
  A = "scaling",
  R = "diagonal and equal",
  D = "unconstrained", # We include these in the D matrix because the age structure cycling of salmon typically affects our observations rather than the state
  d = covariates[5:6,]) # Specifying just 5 year cycles 


fit_5season <- MARSS(dat, model=mod.list3)
Success! abstol and log-log tests passed at 386 iterations.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.

MARSS fit is
Estimation method: kem 
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 386 iterations. 
Log-likelihood: -187.2444 
AIC: 462.4888   AICc: 482.2888   
 
                                        Estimate
A.kalama_river_winter                    0.24684
A.south_fork_toutle_river_winter         0.36821
A.tilton_river_winter                   -0.78743
A.upper_cowlitz_river_winter            -0.19833
R.diag                                   0.14171
U.1                                      0.03266
Q.(1,1)                                  0.18084
Q.(2,1)                                  0.12142
Q.(3,1)                                 -0.00192
Q.(4,1)                                  0.00550
Q.(5,1)                                  0.11653
Q.(2,2)                                  0.14378
Q.(3,2)                                 -0.05419
Q.(4,2)                                  0.11406
Q.(5,2)                                  0.10159
Q.(3,3)                                  0.08860
Q.(4,3)                                  0.00342
Q.(5,3)                                  0.00379
Q.(4,4)                                  0.41381
Q.(5,4)                                  0.10054
Q.(5,5)                                  0.09813
x0.coweeman                              4.00994
x0.ef_lewis                              3.25543
x0.kalama                                6.23754
x0.cowlitz                               2.83372
x0.washougal                             3.65518
D.(coweeman_river_winter,S1-5)          -0.12553
D.(east_fork_lewis_river_winter,S1-5)   -0.01908
D.(kalama_river_summer,S1-5)             0.12381
D.(kalama_river_winter,S1-5)             0.06702
D.(lower_cowlitz_river_winter,S1-5)      0.01015
D.(south_fork_toutle_river_winter,S1-5)  0.02201
D.(tilton_river_winter,S1-5)            -0.40277
D.(upper_cowlitz_river_winter,S1-5)      0.07656
D.(washougal_river_winter,S1-5)         -0.04601
D.(coweeman_river_winter,C1-5)          -0.12431
D.(east_fork_lewis_river_winter,C1-5)   -0.27756
D.(kalama_river_summer,C1-5)             0.05680
D.(kalama_river_winter,C1-5)             0.16503
D.(lower_cowlitz_river_winter,C1-5)     -0.10268
D.(south_fork_toutle_river_winter,C1-5) -0.34589
D.(tilton_river_winter,C1-5)            -0.09048
D.(upper_cowlitz_river_winter,C1-5)      0.06995
D.(washougal_river_winter,C1-5)         -0.30581
Initial states (x0) defined at t=0

Standard errors have not been calculated. 
Use MARSSparamCIs to compute CIs and bias estimates.
Code
(mod_AICc <- c(fit_5season$AICc, best_mod$AICc))
[1] 482.2888 459.0557

Interesting– This greatly improves our AICc, but the model without cycles still test as the best model.

Next we compare the fits of the best spatial model and the spatial model with cycling.

Code
plot(best_mod, plot.type="fitted.ytT")
MARSSresiduals.tT reported warnings. See msg element or attribute of returned residuals object.

plot type =  fitted.ytT  Observations with fitted values
Code
plot(fit_5season, plot.type="fitted.ytT")
MARSSresiduals.tT reported warnings. See msg element or attribute of returned residuals object.

plot type =  fitted.ytT  Observations with fitted values

Adding a 5 year cycling component does not have significant changes to the fit during years of data, but it does have a big impact on the hindcast and forecast fits.

Next we compare the residuals of our best model and our model with 5 year cycles.

Code
# Residuals of best model
par(mfrow = c(3, 3))
resids <- MARSSresiduals(best_mod, type = "tt1")
MARSSresiduals.tt1 reported warnings. See msg element of returned residuals object.
Code
for (i in 1:9) {
    plot(resids$model.residuals[i, ], ylab = "model residuals", 
        xlab = "")
    abline(h = 0)
    title(rownames(dat)[i])
}

Code
# Residuals of best model + cycling
par(mfrow = c(3, 3))
resids <- MARSSresiduals(fit_5season, type = "tt1")
MARSSresiduals.tt1 reported warnings. See msg element of returned residuals object.
Code
for (i in 1:9) {
    plot(resids$model.residuals[i, ], ylab = "model residuals", 
        xlab = "")
    abline(h = 0)
    title(rownames(dat)[i])
}

Code
# ACF of residuals
plot(best_mod, plot.type = "acf.std.model.resids.ytt1")
MARSSresiduals.tt1 reported warnings. See msg element of returned residuals object.

plot type = acf.std.model.resids.ytt1
Code
plot(fit_5season, plot.type = "acf.std.model.resids.ytt1")
MARSSresiduals.tt1 reported warnings. See msg element of returned residuals object.

plot type = acf.std.model.resids.ytt1

When comparing the residual and ACF plots of the spatial model w/o cycling and the spatial model w/ cycling, we do see some of the residual and ACF plots improve. For example, around lag 10 in the Kalama River winter population, the autocorrelation reduces relative the spatial model w/o cycling. The residual plots also remove some of the cycling patterns in a few groups (e.g., Tilton River, South Fork Toutle). However, the cyclic covariates in the process does not completely remove the residuals’ cycling patterns; positive values tend to follow positive ones.

Discussion

Our best model was our “spatial” group model, which was designed based on the spatial proximity of the subwatersheds of study and their steelhead populations. This grouped 9 steelhead populations together to form 5 populations or 5 “hidden states” of steelhead abundance dynamics through time. This implies that steelhead populations located closer in space or within the same catchment share temporal variation in their abundance dynamics, suggesting that populations are largely influenced by similar river or watershed characteristics (e.g., habitat quality or availability) and not so much run timing. All run populations seem to have increased since the earliest year of data, though some populations seem to have decreased slightly in recent years (e.g., Washougal, Cowlitz, East Fork Lewis). Our best performing model seems to fit the data well with no major problems in the residuals, with the exception of cycling.

We found some evidence for 5 year cycles, though it did not lower the AIC relative to the spatial model. This makes sense since steelhead (relative to other salmonids) do not exhibit as much cycling. Even within these 9 runs, we see some runs (e.g. Kalama winter) show more cycling patterns than others (e.g. Upper Cowlitz winter) do. A reason why AIC does not support adding 5 year cycles to all runs is that not all runs need cycling and the addition of parameters for these runs is superfluous. Cycling did somewhat improve autocorrelation in our model residuals by reducing the number of statistically significant ACF lags.

Future improvements to our best performing spatial model would be to include cycling in only populations that strong seasonal patterns. This, however, involves “hand-baking” a series of upper C matrices and was outside the scope of this study.

Description of each team member’s contributions

Liz worked on the model with 9 states, the spatial model, correlation model, model selection process, and cycling model. Terrance worked on the methods, matrix notations, simple model with 1 state, and run timing model. Eric worked on adding the cycling component and model selection of cycling period. All group members worked on the discussion and interpretation.