Dynamic Factor Analysis (DFA)

Author

Nick Chambers, Terrance Wang, Zoe Rand

Published

May 2, 2023


Code
library(tidyverse)
library(zoo)
get_DFA_fits <- function(MLEobj, dd = NULL, alpha = 0.05) {
    ## empty list for results
    fits <- list()
    ## extra stuff for var() calcs
    Ey <- MARSS:::MARSShatyt(MLEobj)
    ## model params
    ZZ <- coef(MLEobj, type = "matrix")$Z
    ## number of obs ts
    nn <- dim(Ey$ytT)[1]
    ## number of time steps
    TT <- dim(Ey$ytT)[2]
    ## get the inverse of the rotation matrix
    H_inv <- varimax(ZZ)$rotmat
    ## check for covars
    if (!is.null(dd)) {
        DD <- coef(MLEobj, type = "matrix")$D
        ## model expectation
        fits$ex <- ZZ %*% H_inv %*% MLEobj$states + DD %*% dd
    } else {
        ## model expectation
        fits$ex <- ZZ %*% H_inv %*% MLEobj$states
    }
    ## Var in model fits
    VtT <- MARSSkfss(MLEobj)$VtT
    VV <- NULL
    for (tt in 1:TT) {
        RZVZ <- coef(MLEobj, type = "matrix")$R - ZZ %*% VtT[, 
            , tt] %*% t(ZZ)
        SS <- Ey$yxtT[, , tt] - Ey$ytT[, tt, drop = FALSE] %*% 
            t(MLEobj$states[, tt, drop = FALSE])
        VV <- cbind(VV, diag(RZVZ + SS %*% t(ZZ) + ZZ %*% t(SS)))
    }
    SE <- sqrt(VV)
    ## upper & lower (1-alpha)% CI
    fits$up <- qnorm(1 - alpha/2) * SE + fits$ex
    fits$lo <- qnorm(alpha/2) * SE + fits$ex
    return(fits)
}

Data

We decided to model 4 groups of zooplankton: Cyclops, Diaptomus, Epischura, and Neomysis. We chose to test environmental covariates (temp, pH, TP), as well as phytoplankton abundances as covariates(cryptomonas, diatoms, greens, unicells, other.algae) since the zooplankton we chose are copepods and mysids, and therefore may be impacted by the type of phytoplankton present. We used data from the start of the time series (1962) until 1985, because the Neomysis data gets very patchy after the 1980s and we didn’t want this to cause issues with model fitting.

Load the data

Code
## load MARSS for data and analyses
library(MARSS)

## load the raw data (there are 3 datasets contained here)
data(lakeWAplankton, package = "MARSS")

## we want `lakeWAplanktonTrans`, which has been transformed
## so the 0's are replaced with NA's and the data z-scored
all_dat <- lakeWAplanktonTrans

Wrangle the data

  • You can choose any taxa to include in your analyses, but make sure there are at least 5 groups. You can mix phytoplankton and zooplankton, if you’d like, but justify any choices you make.

  • You will want to subset the data to a time period with adequate observations for the taxa of interest. Your time window should include at lest 5 years (60 months).

Zooplankton

Code
## plotting the taxa of interest
all_dat %>% as_tibble() %>%
  mutate(Date = as.yearmon(paste(Year, Month), "%Y %m")) %>%
  select(c("Date", "Cyclops", "Diaptomus", "Epischura", "Neomysis")) %>% 
  pivot_longer(-c(Date), names_to = "Species", values_to = "Vals") %>%
  ggplot(aes(x = Date, y = Vals, color = Species)) + geom_point() + 
  geom_line() + facet_wrap(~Species) + scale_x_yearmon(format = "%Y") + 
  theme_minimal() + labs(y = "Abundance Index", x = "Year") + theme(legend.position =  "none")

Code
#using data up to 1985
spp<-c("Cyclops", "Diaptomus", "Epischura", "Neomysis")
dat_1985<-all_dat[all_dat[, "Year"] <= 1985,]
dat_zoo_1985<-dat_1985[, spp]
#head(dat_zoo_1985)
dat_zoo<-t(dat_zoo_1985)

yr_frst <- min(dat_1985[,"Year"])
yr_last <- max(dat_1985[,"Year"])
Code
## get number of time series
N_ts <- dim(dat_zoo)[1]

## get length of time series
TT <- dim(dat_zoo)[2] 

## mean of each taxon
y_bar <- apply(dat_zoo, 1, mean, na.rm = TRUE)

## subtract the means
dat <- dat_zoo - y_bar

## assign new row names
rownames(dat) <- spp

Covariates

Code
#temp, pH, TP, cryptomonas, diatoms, greens, unicells, other.algae 
covs<-colnames(dat_1985)[c(3:8,10:11)]
cov_dat<-t(dat_1985[,covs])
#head(cov_dat)
#dummy seasonal affect
cos_t <- cos(2 * pi * seq(TT) / 12)
sin_t <- sin(2 * pi * seq(TT) / 12)
dd <- rbind(cos_t, sin_t)
Code
#looking at any correlation issues for covariates
pairs(cbind(dat_1985[,covs],t(dd)))

Code
library(corrplot)
corrplot 0.92 loaded
Code
corrplot(cor(cbind(dat_1985[,covs],t(dd)), use = "pairwise.complete.obs"), addCoef.col = "blue")

estimating missing values

Code
# Best way is in Ch13 to fill out NA, https://cran.r-project.org/web/packages/MARSS/vignettes/UserGuide.pdf
# Easy way is just to fill in with mean for purposes of this lab
cov_df <- cbind(dat_1985[,covs],t(dd)) %>% as.data.frame()  %>%
  # replace na values with simply mean
         mutate_all(~ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x))  

General tasks

Each group has the same general tasks, but you will adapt them as you work on the data.

  1. Find the most parsimonious model among a set that examines the effects of environmental covariates and/or an indicator of seasonality, varying numbers of trends, and different forms of variance-covariance matrices for the observation errors.

  2. Plot trends and individual loadings for the model identified in task (1) above.

  3. Plot model fits and uncertainty intervals for the model identified in task (1) above.

  4. Describe the effects of environmental or dummy variables on (possibly seasonal) patterns in the data.

Methods

Please address the following in your methods:

  • Which plankton taxa did you choose and how did you choose them?

  • What time period(s) did you examine and why?

  • What environmental or dummy variables did you include and why?

  • What forms of models did you fit (ie, write them out in matrix form)?

  • What sort of model diagnostics did you use to examine model assumptions?

Data:

We chose to model four zooplankton taxa :Cyclops, Diaptomus, Epischura, and Neomysis. We chose these because they are similar types of zooplankton (copepods and mysids) which are known to be selective about the types of phytoplankton they consume (Hansen, Bjørnsen, and Hansen 1997). We used data from 1962-1985, because there is very little Neomysis data after 1985. We tested environmental covariates, including temperature, pH, and phosphorus concentrations as well as a seasonal dummy variable. We were also interested in the affect of phytoplankton abundance on these species, so we tested abundance indexes of diatoms, greens, unicells, and other.algae as potential covariates as well.

Model:

Observation Model

\[ \begin{bmatrix} y_1\\ y_2\\ y_3\\ y_4\\ y_5\\ \end{bmatrix}_t= \begin{bmatrix} Z_{11} & Z_{12} & Z_{13}\\ Z_{21} & Z_{22} & Z_{23}\\ Z_{31} & Z_{32} & Z_{33}\\ Z_{41} & Z_{42} & Z_{43}\\ Z_{51} & Z_{52} & Z_{53}\\ \end{bmatrix}* \begin{bmatrix} x_1\\ x_2\\ x_3\\ \end{bmatrix}_t+ \begin{bmatrix} a_{1} \\ a_{2}\\ a_{3}\\ a_{4}\\ a_{5}\\ \end{bmatrix}* \begin{bmatrix} d_{Temp}\\ d_{TP}\\ d_{pH}\\ d_{cos}\\ d_{sin}\\ \end{bmatrix}_t+ \begin{bmatrix} w_1\\ w_2\\ w_3\\ w_4\\ w_5\\ \end{bmatrix}_t \]

\[ \text{Where }w_i \sim MVN \begin{pmatrix} \text{0,}\begin{bmatrix} R&0&0&0&0\\ 0&R&0&0&0\\ 0&0&R&0&0\\ 0&0&0&R&0\\ 0&0&0&0&R\\ \end{bmatrix} \end{pmatrix} \]

ZZ is loadings matrix

Code
ZZ = NULL
# Define different ZZ hypotheses 
# 1 state
Z_vals <- list("z11",
               "z21", 
               "z31",
               "z41")
ZZ[[1]] <- matrix(Z_vals, nrow = N_ts, ncol = 1, byrow = TRUE)
# 2 states
Z_vals <- list("z11",  0  , 
               "z21","z22", 
               "z31","z32",
               "z41","z42")
ZZ[[2]] <- matrix(Z_vals, nrow = N_ts, ncol = 2, byrow = TRUE)
# 3 states
Z_vals <- list("z11",  0  ,  0  ,
               "z21","z22",  0  ,
               "z31","z32","z33",
               "z41","z42","z43")
ZZ[[3]] <- matrix(Z_vals, nrow = N_ts, ncol = 3, byrow = TRUE)

# 4 states--removing this because we only have 4 sp groups
#Z_vals <- list("z11",  0  ,  0  , 0,
               #"z21","z22",  0  , 0,
               #"z31","z32","z33", 0,
               #"z41","z42","z43","z44")
#ZZ[[4]] <- matrix(Z_vals, nrow = N_ts, ncol = 4, byrow = TRUE)

## 'aa' is the offset/scaling
aa <- "zero"

## 'DD' and 'd' are for covariates
DD <- "zero"  # matrix(0,mm,1)
dd <- "zero"  # matrix(0,1,wk_last)

## 'RR' is var-cov matrix for obs errors
RR <- "diagonal and equal"

Process Model

Process model: \[ \begin{bmatrix} x_1\\ x_2\\ x_3\\ \end{bmatrix}_t= \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\\ \end{bmatrix}* \begin{bmatrix} x_1\\ x_2\\ x_3\\ \end{bmatrix}_{t-1}+ \begin{bmatrix} v_1\\ v_2\\ v_3\\ v_4\\ v_5\\ \end{bmatrix}_t \]

\[ \text{Where }v_i \sim MVN \begin{pmatrix} \text{0,}\begin{bmatrix} 1&0&0\\ 0&1&0\\ 0&0&1\\ \end{bmatrix} \end{pmatrix} \]

Code
## number of processes
mm <- 2 # this is just a placeholder, as we loop through # of states we will update this

## 'BB' is identity: 1's along the diagonal & 0's elsewhere

BB <- "identity"  # diag(mm)

## 'uu' is a column vector of 0's
uu <- "zero"  # matrix(0, mm, 1)

## 'CC' and 'cc' are for covariates
CC <- "zero"  # matrix(0, mm, 1)
cc <- "zero"  # matrix(0, 1, wk_last)

## 'QQ' is identity
QQ <- "identity"  # diag(mm)

Fit with MARSS

Code
## list with specifications for model vectors/matrices
mod_list <- list(Z = ZZ[[2]], A = aa, D = DD, d = dd, R = RR,
                 B = BB, U = uu, C = CC, c = cc, Q = QQ)
## list with model inits
init_list <- list(x0 = matrix(rep(0, mm), mm, 1))
## list with model control parameters
#con_list <- list(maxit = 3000, allow.degen = FALSE) #allow.degen not allowed for BFGS
con_list <- list(maxit = 3000)
Code
Z_AIC <- data.frame(states = NA,AICc=NA)
for (z in 1:length(ZZ)){
## list with specifications for model vectors/matrices
# the first mod_list is the long way of writing it out
# mod_list <- list(Z = ZZ[[z]], A = aa, D = DD, d = dd, R = RR,
#                  B = BB, U = uu, C = CC, c = cc, Q = QQ)  
mm <- z
# model_list specification 
mod_list = list(m = mm, R = "diagonal and equal")
## list with model inits
init_list <- list(x0 = matrix(rep(0, mm), mm, 1))

## list with model control parameters
#con_list <- list(maxit = 3000, allow.degen = TRUE)
con_list <- list(maxit = 3000)

## fit MARSS
dfa <- MARSS(y = dat, model = mod_list, inits = init_list, control = con_list, covariates = t(cov_df),form="dfa",
             method="BFGS")

# get model results
Z_AIC[z,"states"] = z
Z_AIC[z,"AICc"] = dfa$AICc
}
Success! Converged in 249 iterations.
Function MARSSkfas used for likelihood calculation.

MARSS fit is
Estimation method: BFGS 
Estimation converged in 249 iterations. 
Log-likelihood: -1024.354 
AIC: 2138.707   AICc: 2143.636   
 
                           Estimate
Z.11                      -0.409256
Z.21                      -0.085674
Z.31                       0.037819
Z.41                      -0.123284
R.diag                     0.488030
D.(Cyclops,Temp)          -0.065049
D.(Diaptomus,Temp)        -0.523167
D.(Epischura,Temp)         0.316483
D.(Neomysis,Temp)         -0.303331
D.(Cyclops,TP)            -0.095264
D.(Diaptomus,TP)           0.231892
D.(Epischura,TP)           0.033164
D.(Neomysis,TP)            0.131168
D.(Cyclops,pH)             0.056657
D.(Diaptomus,pH)           0.236882
D.(Epischura,pH)           0.029319
D.(Neomysis,pH)            0.160363
D.(Cyclops,Cryptomonas)    0.039220
D.(Diaptomus,Cryptomonas) -0.121052
D.(Epischura,Cryptomonas) -0.112211
D.(Neomysis,Cryptomonas)   0.073532
D.(Cyclops,Diatoms)        0.163895
D.(Diaptomus,Diatoms)     -0.069667
D.(Epischura,Diatoms)     -0.048028
D.(Neomysis,Diatoms)       0.213371
D.(Cyclops,Greens)         0.053764
D.(Diaptomus,Greens)      -0.028448
D.(Epischura,Greens)      -0.113465
D.(Neomysis,Greens)       -0.000335
D.(Cyclops,Unicells)       0.140911
D.(Diaptomus,Unicells)     0.063897
D.(Epischura,Unicells)     0.072608
D.(Neomysis,Unicells)      0.408725
D.(Cyclops,Other.algae)    0.179521
D.(Diaptomus,Other.algae)  0.150101
D.(Epischura,Other.algae)  0.110116
D.(Neomysis,Other.algae)   0.012215
D.(Cyclops,cos_t)         -0.063268
D.(Diaptomus,cos_t)       -0.859093
D.(Epischura,cos_t)       -0.249775
D.(Neomysis,cos_t)         0.164887
D.(Cyclops,sin_t)          0.018376
D.(Diaptomus,sin_t)       -0.166333
D.(Epischura,sin_t)       -0.261342
D.(Neomysis,sin_t)        -0.458475
Initial states (x0) defined at t=0

Standard errors have not been calculated. 
Use MARSSparamCIs to compute CIs and bias estimates.

Success! Converged in 149 iterations.
Function MARSSkfas used for likelihood calculation.

MARSS fit is
Estimation method: BFGS 
Estimation converged in 149 iterations. 
Log-likelihood: -996.0389 
AIC: 2088.078   AICc: 2093.698   
 
                          Estimate
Z.11                       0.51102
Z.21                       0.09199
Z.31                      -0.02876
Z.41                       0.17171
Z.22                       0.21251
Z.32                       0.26970
Z.42                       0.35452
R.diag                     0.34476
D.(Cyclops,Temp)          -0.06169
D.(Diaptomus,Temp)        -0.25739
D.(Epischura,Temp)         0.64454
D.(Neomysis,Temp)         -0.20214
D.(Cyclops,TP)            -0.05364
D.(Diaptomus,TP)           0.15449
D.(Epischura,TP)          -0.09795
D.(Neomysis,TP)           -0.11774
D.(Cyclops,pH)             0.08388
D.(Diaptomus,pH)           0.25524
D.(Epischura,pH)           0.04539
D.(Neomysis,pH)            0.12654
D.(Cyclops,Cryptomonas)    0.01336
D.(Diaptomus,Cryptomonas) -0.04458
D.(Epischura,Cryptomonas) -0.01202
D.(Neomysis,Cryptomonas)   0.14311
D.(Cyclops,Diatoms)        0.17870
D.(Diaptomus,Diatoms)     -0.07581
D.(Epischura,Diatoms)     -0.05458
D.(Neomysis,Diatoms)       0.17580
D.(Cyclops,Greens)         0.04206
D.(Diaptomus,Greens)      -0.04019
D.(Epischura,Greens)      -0.12482
D.(Neomysis,Greens)       -0.03046
D.(Cyclops,Unicells)       0.13016
D.(Diaptomus,Unicells)     0.14890
D.(Epischura,Unicells)     0.18534
D.(Neomysis,Unicells)      0.38306
D.(Cyclops,Other.algae)    0.16850
D.(Diaptomus,Other.algae)  0.11865
D.(Epischura,Other.algae)  0.06715
D.(Neomysis,Other.algae)   0.08263
D.(Cyclops,cos_t)         -0.04527
D.(Diaptomus,cos_t)       -0.63731
D.(Epischura,cos_t)        0.02794
D.(Neomysis,cos_t)         0.27837
D.(Cyclops,sin_t)         -0.00718
D.(Diaptomus,sin_t)        0.13341
D.(Epischura,sin_t)        0.11016
D.(Neomysis,sin_t)        -0.55392
Initial states (x0) defined at t=0

Standard errors have not been calculated. 
Use MARSSparamCIs to compute CIs and bias estimates.

Success! Converged in 98 iterations.
Function MARSSkfas used for likelihood calculation.

MARSS fit is
Estimation method: BFGS 
Estimation converged in 98 iterations. 
Log-likelihood: -987.4889 
AIC: 2074.978   AICc: 2081.086   
 
                          Estimate
Z.11                       0.56026
Z.21                       0.04680
Z.31                      -0.00770
Z.41                      -0.04569
Z.22                       0.25015
Z.32                       0.19845
Z.42                      -0.01738
Z.33                       0.26252
Z.43                      -0.42960
R.diag                     0.27253
D.(Cyclops,Temp)          -0.03206
D.(Diaptomus,Temp)        -0.14084
D.(Epischura,Temp)         0.60999
D.(Neomysis,Temp)         -0.43617
D.(Cyclops,TP)            -0.07321
D.(Diaptomus,TP)           0.07316
D.(Epischura,TP)          -0.32876
D.(Neomysis,TP)            0.59368
D.(Cyclops,pH)             0.10745
D.(Diaptomus,pH)           0.17570
D.(Epischura,pH)           0.05978
D.(Neomysis,pH)           -0.08374
D.(Cyclops,Cryptomonas)   -0.00454
D.(Diaptomus,Cryptomonas) -0.00743
D.(Epischura,Cryptomonas) -0.04024
D.(Neomysis,Cryptomonas)   0.22687
D.(Cyclops,Diatoms)        0.19107
D.(Diaptomus,Diatoms)     -0.08444
D.(Epischura,Diatoms)     -0.01549
D.(Neomysis,Diatoms)      -0.06037
D.(Cyclops,Greens)         0.03182
D.(Diaptomus,Greens)      -0.03804
D.(Epischura,Greens)      -0.12765
D.(Neomysis,Greens)       -0.08265
D.(Cyclops,Unicells)       0.12570
D.(Diaptomus,Unicells)     0.20328
D.(Epischura,Unicells)     0.16806
D.(Neomysis,Unicells)      0.64863
D.(Cyclops,Other.algae)    0.15741
D.(Diaptomus,Other.algae)  0.11430
D.(Epischura,Other.algae)  0.03705
D.(Neomysis,Other.algae)  -0.09653
D.(Cyclops,cos_t)         -0.00337
D.(Diaptomus,cos_t)       -0.60106
D.(Epischura,cos_t)        0.05494
D.(Neomysis,cos_t)        -0.38579
D.(Cyclops,sin_t)          0.00975
D.(Diaptomus,sin_t)        0.30047
D.(Epischura,sin_t)        0.04593
D.(Neomysis,sin_t)        -0.47729
Initial states (x0) defined at t=0

Standard errors have not been calculated. 
Use MARSSparamCIs to compute CIs and bias estimates.
Code
Z_AIC$AICc = Z_AIC$AICc- min(Z_AIC$AICc)
Z_AIC
  states     AICc
1      1 62.55018
2      2 12.61220
3      3  0.00000

Best performing based on AICc is the one with 3 states.

Code
## list with specifications for model vectors/matrices
# mod_list <- list(Z = ZZ[[2]], A = aa, D = DD, d = dd, R = RR,
#                  B = BB, U = uu, C = CC, c = cc, Q = QQ)
mm <- 3
# model_list specification 
mod_list = list(m = mm, R = "diagonal and equal")
## list with model inits
init_list <- list(x0 = matrix(rep(0, mm), mm, 1))


## fit MARSS
dfa_2 <- MARSS(y = dat, model = mod_list, inits = init_list, control = con_list, covariates = t(cov_df),form="dfa",
               method="BFGS")
Success! Converged in 98 iterations.
Function MARSSkfas used for likelihood calculation.

MARSS fit is
Estimation method: BFGS 
Estimation converged in 98 iterations. 
Log-likelihood: -987.4889 
AIC: 2074.978   AICc: 2081.086   
 
                          Estimate
Z.11                       0.56026
Z.21                       0.04680
Z.31                      -0.00770
Z.41                      -0.04569
Z.22                       0.25015
Z.32                       0.19845
Z.42                      -0.01738
Z.33                       0.26252
Z.43                      -0.42960
R.diag                     0.27253
D.(Cyclops,Temp)          -0.03206
D.(Diaptomus,Temp)        -0.14084
D.(Epischura,Temp)         0.60999
D.(Neomysis,Temp)         -0.43617
D.(Cyclops,TP)            -0.07321
D.(Diaptomus,TP)           0.07316
D.(Epischura,TP)          -0.32876
D.(Neomysis,TP)            0.59368
D.(Cyclops,pH)             0.10745
D.(Diaptomus,pH)           0.17570
D.(Epischura,pH)           0.05978
D.(Neomysis,pH)           -0.08374
D.(Cyclops,Cryptomonas)   -0.00454
D.(Diaptomus,Cryptomonas) -0.00743
D.(Epischura,Cryptomonas) -0.04024
D.(Neomysis,Cryptomonas)   0.22687
D.(Cyclops,Diatoms)        0.19107
D.(Diaptomus,Diatoms)     -0.08444
D.(Epischura,Diatoms)     -0.01549
D.(Neomysis,Diatoms)      -0.06037
D.(Cyclops,Greens)         0.03182
D.(Diaptomus,Greens)      -0.03804
D.(Epischura,Greens)      -0.12765
D.(Neomysis,Greens)       -0.08265
D.(Cyclops,Unicells)       0.12570
D.(Diaptomus,Unicells)     0.20328
D.(Epischura,Unicells)     0.16806
D.(Neomysis,Unicells)      0.64863
D.(Cyclops,Other.algae)    0.15741
D.(Diaptomus,Other.algae)  0.11430
D.(Epischura,Other.algae)  0.03705
D.(Neomysis,Other.algae)  -0.09653
D.(Cyclops,cos_t)         -0.00337
D.(Diaptomus,cos_t)       -0.60106
D.(Epischura,cos_t)        0.05494
D.(Neomysis,cos_t)        -0.38579
D.(Cyclops,sin_t)          0.00975
D.(Diaptomus,sin_t)        0.30047
D.(Epischura,sin_t)        0.04593
D.(Neomysis,sin_t)        -0.47729
Initial states (x0) defined at t=0

Standard errors have not been calculated. 
Use MARSSparamCIs to compute CIs and bias estimates.

#Rotating trends and loadings

Code
## get the estimated ZZ
Z_est <- coef(dfa_2, type = "matrix")$Z

## get the inverse of the rotation matrix
H_inv <- varimax(Z_est)$rotmat

#Now rotate both Z and x

## rotate factor loadings
Z_rot = Z_est %*% H_inv   

## rotate processes
proc_rot = solve(H_inv) %*% dfa_2$states

#Estimated States and Loadings of Model with 3 states

Code
## plot labels
zooplankton_names = c("Cyclops", "Diaptomus", "Epischura", "Neomysis")
ylbl <- zooplankton_names
w_ts <- seq(dim(dat)[2])

# plot colors
clr <- c("brown", "blue", "darkgreen", "darkred", "purple")

## set up plot area
layout(matrix(c(1,2,3,4, 5, 6), mm, 2), widths = c(2,1))
par(mai = c(0.5, 0.5, 0.5, 0.1), omi = c(0, 0, 0, 0))

## plot the processes
for(i in 1:mm) {
  ylm <- c(-1, 1) * max(abs(proc_rot[i,]))
  ## set up plot area
    plot(w_ts,proc_rot[i,], type = "n", bty = "L",
         ylim = ylm, xlab = "", ylab = "", xaxt = "n")
    ## draw zero-line
    abline(h = 0, col = "gray")
    ## plot trend line
    lines(w_ts, proc_rot[i,], lwd = 2)
    lines(w_ts, proc_rot[i,], lwd = 2)
    ## add panel labels
    mtext(paste("State",i), side = 3, line = 0.5)
    axis(1, 12 * (0:dim(dat)[2]) + 1, yr_frst + 0:dim(dat)[2])
}
## plot the loadings
minZ <- 0
ylm <- c(-1, 1) * max(abs(Z_rot))
for(i in 1:mm) {
  print(i)
  plot(x = c(1:N_ts)[abs(Z_rot[,i])>minZ],
       y = as.vector(Z_rot[abs(Z_rot[,i])>minZ,i]),
       type = "h",
       lwd = 2, xlab = "", ylab = "", xaxt = "n", ylim = ylm,
       xlim = c(0.5, N_ts + 0.5), col = clr)
    for(j in 1:N_ts) {
      if(Z_rot[j,i] > minZ) {text(j, -0.03, ylbl[j], srt = 90, adj = 1, cex = 1.2, col = clr[j])}
      if(Z_rot[j,i] < -minZ) {text(j, 0.03, ylbl[j], srt = 90, adj = 0, cex = 1.2, col = clr[j])}
      abline(h = 0, lwd = 1.5, col = "gray")
      } 
  mtext(paste("Factor loadings on state", i), side = 3, line = 0.5)
}
[1] 1
[1] 2
[1] 3

Code
## list with specifications for model vectors/matrices
# mod_list <- list(Z = ZZ[[3]], A = aa, D = DD, d = dd, R = RR,
#                  B = BB, U = uu, C = CC, c = cc, Q = QQ)
mm <- 3

mod_list = list(m = mm, R = "diagonal and equal")
## list with model inits
init_list <- list(x0 = matrix(rep(0, mm), mm, 1))
## list with model control parameters
#con_list <- list(maxit = 3000, allow.degen = F)
con_list <- list(maxit = 3000)
 
# Forward stepwise model selection
# Create a table of indicators as to whether that predictor is include
#cov.terms <- c(covs,"seasonality")
cov.terms<-covs #seasonality is causing an issue so running without this
mod.ind <- matrix(F, ncol = length(cov.terms), nrow = 1)
mod.tab <- data.frame(mod.ind)
names(mod.tab) <- cov.terms

# Define a columns for storing model selection
mod.tab$AICc <- NA

# to keep track of the best model so far
k_counter <- 1 

# save the fits in a list
DFA_fits <- NULL

for(i in 1:(length(cov.terms)+1)){
  print(i)
  model_i <- i
  ind_on=as.matrix(mod.tab[i,1:length(cov.terms)])
  # ind_off=as.matrix(!mod.tab[i,1:length(mod.terms)])

  # ind_off <- !mod.ind[model_i, ]
  # ind_on <- mod.ind[model_i, ]
  
  # Set which covariates should be included
  if (length(cov.terms[ind_on])==0){
    cov_data = NULL
  }else{
    cov_data =  cov_df %>% select(matches(cov.terms[ind_on])) %>% t()
  }
  if("seasonality" %in% cov.terms[ind_on]){ # special case where you have 2 columns for seasonality
    cov_data = rbind(cov_data,dd)
  }
  
  # fit the model
  dfa_model = MARSS(y = dat, model = mod_list, form = "dfa",inits = init_list, control = con_list,covariates=cov_data,
                    method="BFGS")
  
  #save thi fit
  DFA_fits[[i]] = get_DFA_fits(dfa_model,dd=cov_data)
  
  # add AICc valueto model
  mod.tab[i,]$AICc <- dfa_model$AICc
  
  # add new row and reset the AICc
  mod.tab[i+1,] = mod.tab[i,]
  mod.tab[i+1,]$AICc <- NA
  # turn the next random effect on for next model
  mod.tab[i+1,i] = T
  
  if (i>1){ # don't perform below to populate the 2nd row and 1st random effect
    
  # if model is better than the best performing model so far, keep the current random effect on for next model
  if ((mod.tab$AICc[i]-mod.tab$AICc[i-k_counter])< -4){
    mod.tab[i+1,i-1] = T # redundant but just set up the code
    k_counter = 1
  } else { # if not better
    mod.tab[i+1,i-1] = F # turn off current random effect for next model
    k_counter=k_counter+1
  }
  }
}
[1] 1
Success! Converged in 44 iterations.
Function MARSSkfas used for likelihood calculation.

MARSS fit is
Estimation method: BFGS 
Estimation converged in 44 iterations. 
Log-likelihood: -1121.846 
AIC: 2263.693   AICc: 2263.944   
 
       Estimate
Z.11    -0.6812
Z.21    -0.1991
Z.31    -0.0863
Z.41    -0.3488
Z.22     0.5621
Z.32     0.1966
Z.42     0.1649
Z.33     0.6705
Z.43     0.7343
R.diag   0.1941
Initial states (x0) defined at t=0

Standard errors have not been calculated. 
Use MARSSparamCIs to compute CIs and bias estimates.

[1] 2
Success! Converged in 38 iterations.
Function MARSSkfas used for likelihood calculation.

MARSS fit is
Estimation method: BFGS 
Estimation converged in 38 iterations. 
Log-likelihood: -1084.73 
AIC: 2197.459   AICc: 2197.942   
 
            Estimate
Z.11         -0.6596
Z.21         -0.2015
Z.31         -0.0140
Z.41         -0.0481
Z.22          0.5552
Z.32          0.1622
Z.42          0.1745
Z.33         -0.4205
Z.43          0.8209
R.diag        0.2209
D.Cyclops     0.1261
D.Diaptomus   0.0688
D.Epischura   0.6571
D.Neomysis   -0.0951
Initial states (x0) defined at t=0

Standard errors have not been calculated. 
Use MARSSparamCIs to compute CIs and bias estimates.

[1] 3
Success! Converged in 43 iterations.
Function MARSSkfas used for likelihood calculation.

MARSS fit is
Estimation method: BFGS 
Estimation converged in 43 iterations. 
Log-likelihood: -1078.97 
AIC: 2193.941   AICc: 2194.73   
 
                    Estimate
Z.11                0.659277
Z.21                0.194425
Z.31                0.000501
Z.41                0.038917
Z.22                0.558411
Z.32                0.147317
Z.42                0.125531
Z.33               -0.421475
Z.43                0.823533
R.diag              0.215517
D.(Cyclops,Temp)    0.110790
D.(Diaptomus,Temp)  0.050490
D.(Epischura,Temp)  0.622253
D.(Neomysis,Temp)  -0.012807
D.(Cyclops,TP)     -0.121674
D.(Diaptomus,TP)   -0.150452
D.(Epischura,TP)   -0.274848
D.(Neomysis,TP)     0.626019
Initial states (x0) defined at t=0

Standard errors have not been calculated. 
Use MARSSparamCIs to compute CIs and bias estimates.

[1] 4
Success! Converged in 65 iterations.
Function MARSSkfas used for likelihood calculation.

MARSS fit is
Estimation method: BFGS 
Estimation converged in 65 iterations. 
Log-likelihood: -1049.945 
AIC: 2135.891   AICc: 2136.68   
 
                   Estimate
Z.11                0.65212
Z.21                0.13828
Z.31                0.00429
Z.41                0.32362
Z.22                0.45997
Z.32                0.18022
Z.42                0.19750
Z.33               -0.44988
Z.43               -0.86136
R.diag              0.20655
D.(Cyclops,Temp)    0.06180
D.(Diaptomus,Temp) -0.06508
D.(Epischura,Temp)  0.61027
D.(Neomysis,Temp)   0.27165
D.(Cyclops,pH)      0.20242
D.(Diaptomus,pH)    0.43172
D.(Epischura,pH)    0.12069
D.(Neomysis,pH)     0.22910
Initial states (x0) defined at t=0

Standard errors have not been calculated. 
Use MARSSparamCIs to compute CIs and bias estimates.

[1] 5
Success! Converged in 73 iterations.
Function MARSSkfas used for likelihood calculation.

MARSS fit is
Estimation method: BFGS 
Estimation converged in 73 iterations. 
Log-likelihood: -1048.094 
AIC: 2140.187   AICc: 2141.36   
 
                          Estimate
Z.11                       0.65214
Z.21                       0.13643
Z.31                       0.00874
Z.41                       0.30205
Z.22                       0.46086
Z.32                       0.17603
Z.42                       0.24398
Z.33                      -0.44613
Z.43                      -0.82999
R.diag                     0.20671
D.(Cyclops,Temp)           0.06537
D.(Diaptomus,Temp)        -0.06469
D.(Epischura,Temp)         0.60517
D.(Neomysis,Temp)          0.27450
D.(Cyclops,pH)             0.20336
D.(Diaptomus,pH)           0.43133
D.(Epischura,pH)           0.12576
D.(Neomysis,pH)            0.20074
D.(Cyclops,Cryptomonas)    0.03145
D.(Diaptomus,Cryptomonas)  0.00741
D.(Epischura,Cryptomonas) -0.04725
D.(Neomysis,Cryptomonas)   0.18123
Initial states (x0) defined at t=0

Standard errors have not been calculated. 
Use MARSSparamCIs to compute CIs and bias estimates.

[1] 6
Success! Converged in 78 iterations.
Function MARSSkfas used for likelihood calculation.

MARSS fit is
Estimation method: BFGS 
Estimation converged in 78 iterations. 
Log-likelihood: -1039.78 
AIC: 2123.56   AICc: 2124.732   
 
                      Estimate
Z.11                    0.6137
Z.21                    0.1038
Z.31                    0.0101
Z.41                    0.3218
Z.22                    0.4403
Z.32                    0.1903
Z.42                    0.2234
Z.33                   -0.4350
Z.43                   -0.8227
R.diag                  0.2154
D.(Cyclops,Temp)        0.1062
D.(Diaptomus,Temp)     -0.0456
D.(Epischura,Temp)      0.6085
D.(Neomysis,Temp)       0.2877
D.(Cyclops,pH)          0.1463
D.(Diaptomus,pH)        0.4120
D.(Epischura,pH)        0.1242
D.(Neomysis,pH)         0.2033
D.(Cyclops,Diatoms)     0.2421
D.(Diaptomus,Diatoms)   0.1223
D.(Epischura,Diatoms)  -0.0158
D.(Neomysis,Diatoms)    0.0878
Initial states (x0) defined at t=0

Standard errors have not been calculated. 
Use MARSSparamCIs to compute CIs and bias estimates.

[1] 7
Success! Converged in 84 iterations.
Function MARSSkfas used for likelihood calculation.

MARSS fit is
Estimation method: BFGS 
Estimation converged in 84 iterations. 
Log-likelihood: -1036.829 
AIC: 2125.658   AICc: 2127.292   
 
                      Estimate
Z.11                   0.60938
Z.21                   0.10170
Z.31                   0.02033
Z.41                   0.32875
Z.22                   0.43355
Z.32                   0.19481
Z.42                   0.23205
Z.33                  -0.41443
Z.43                  -0.77637
R.diag                 0.22083
D.(Cyclops,Temp)       0.09683
D.(Diaptomus,Temp)    -0.05462
D.(Epischura,Temp)     0.65018
D.(Neomysis,Temp)      0.32025
D.(Cyclops,pH)         0.14659
D.(Diaptomus,pH)       0.41604
D.(Epischura,pH)       0.12483
D.(Neomysis,pH)        0.18919
D.(Cyclops,Diatoms)    0.24149
D.(Diaptomus,Diatoms)  0.12213
D.(Epischura,Diatoms) -0.00695
D.(Neomysis,Diatoms)   0.09355
D.(Cyclops,Greens)     0.02451
D.(Diaptomus,Greens)   0.02080
D.(Epischura,Greens)  -0.11787
D.(Neomysis,Greens)   -0.07966
Initial states (x0) defined at t=0

Standard errors have not been calculated. 
Use MARSSparamCIs to compute CIs and bias estimates.

[1] 8
Success! Converged in 55 iterations.
Function MARSSkfas used for likelihood calculation.

MARSS fit is
Estimation method: BFGS 
Estimation converged in 55 iterations. 
Log-likelihood: -1030.414 
AIC: 2112.828   AICc: 2114.462   
 
                       Estimate
Z.11                    0.55888
Z.21                    0.07382
Z.31                   -0.00313
Z.41                   -0.06586
Z.22                    0.31565
Z.32                    0.17463
Z.42                   -0.01113
Z.33                   -0.24226
Z.43                    0.43438
R.diag                  0.30465
D.(Cyclops,Temp)        0.06874
D.(Diaptomus,Temp)     -0.15302
D.(Epischura,Temp)      0.59943
D.(Neomysis,Temp)      -0.27480
D.(Cyclops,pH)          0.10872
D.(Diaptomus,pH)        0.45677
D.(Epischura,pH)        0.03487
D.(Neomysis,pH)        -0.05291
D.(Cyclops,Diatoms)     0.23175
D.(Diaptomus,Diatoms)   0.08356
D.(Epischura,Diatoms)  -0.01452
D.(Neomysis,Diatoms)   -0.03756
D.(Cyclops,Unicells)    0.14242
D.(Diaptomus,Unicells)  0.25199
D.(Epischura,Unicells)  0.14616
D.(Neomysis,Unicells)   0.61045
Initial states (x0) defined at t=0

Standard errors have not been calculated. 
Use MARSSparamCIs to compute CIs and bias estimates.

[1] 9
Success! Converged in 97 iterations.
Function MARSSkfas used for likelihood calculation.

MARSS fit is
Estimation method: BFGS 
Estimation converged in 97 iterations. 
Log-likelihood: -1025.751 
AIC: 2111.502   AICc: 2113.677   
 
                           Estimate
Z.11                       0.546492
Z.21                       0.061655
Z.31                      -0.014601
Z.41                      -0.057123
Z.22                       0.315893
Z.32                       0.166549
Z.42                       0.000764
Z.33                      -0.246129
Z.43                       0.442295
R.diag                     0.302291
D.(Cyclops,Temp)          -0.017451
D.(Diaptomus,Temp)        -0.206436
D.(Epischura,Temp)         0.564237
D.(Neomysis,Temp)         -0.239335
D.(Cyclops,pH)             0.099450
D.(Diaptomus,pH)           0.449551
D.(Epischura,pH)           0.033059
D.(Neomysis,pH)           -0.048971
D.(Cyclops,Diatoms)        0.194209
D.(Diaptomus,Diatoms)      0.058776
D.(Epischura,Diatoms)     -0.029754
D.(Neomysis,Diatoms)      -0.024733
D.(Cyclops,Unicells)       0.121335
D.(Diaptomus,Unicells)     0.238873
D.(Epischura,Unicells)     0.137600
D.(Neomysis,Unicells)      0.630173
D.(Cyclops,Other.algae)    0.165537
D.(Diaptomus,Other.algae)  0.102159
D.(Epischura,Other.algae)  0.065119
D.(Neomysis,Other.algae)  -0.069964
Initial states (x0) defined at t=0

Standard errors have not been calculated. 
Use MARSSparamCIs to compute CIs and bias estimates.
Code
mod.tab
    Temp    TP    pH Cryptomonas Diatoms Greens Unicells Other.algae     AICc
1  FALSE FALSE FALSE       FALSE   FALSE  FALSE    FALSE       FALSE 2263.944
2   TRUE FALSE FALSE       FALSE   FALSE  FALSE    FALSE       FALSE 2197.942
3   TRUE  TRUE FALSE       FALSE   FALSE  FALSE    FALSE       FALSE 2194.730
4   TRUE FALSE  TRUE       FALSE   FALSE  FALSE    FALSE       FALSE 2136.680
5   TRUE FALSE  TRUE        TRUE   FALSE  FALSE    FALSE       FALSE 2141.360
6   TRUE FALSE  TRUE       FALSE    TRUE  FALSE    FALSE       FALSE 2124.732
7   TRUE FALSE  TRUE       FALSE    TRUE   TRUE    FALSE       FALSE 2127.292
8   TRUE FALSE  TRUE       FALSE    TRUE  FALSE     TRUE       FALSE 2114.462
9   TRUE FALSE  TRUE       FALSE    TRUE  FALSE     TRUE        TRUE 2113.677
10  TRUE FALSE  TRUE       FALSE    TRUE  FALSE     TRUE       FALSE    1.000
Code
##ZR:I don't think there should be a model #10 so I'm just removing this...
mod.tab<-mod.tab[-10,]
mod.tab$DeltaAICc<-mod.tab$AICc - min(mod.tab$AICc)
## get model fits & CI's
best_model_i = 8 #number 8 is simpler and similar AICc 
mod_fit <- DFA_fits[[best_model_i]]

## plot the fits
par(mfrow = c(N_ts, 1), mai = c(0.5, 0.7, 0.1, 0.1), omi = c(0, 0, 0, 0))
for(i in 1:N_ts) {
  up <- mod_fit$up[i,]
  mn <- mod_fit$ex[i,]
  lo <- mod_fit$lo[i,]
  plot(w_ts, mn, type = "n",
       xlab = "", ylab = ylbl[i],
       xaxt = "n", cex.lab = 1.2,
       ylim = c(min(lo), max(up)))
  axis(1, 12 * (0:dim(dat)[2]) + 1, yr_frst + 0:dim(dat)[2])
  points(w_ts, dat[i,], pch = 16, col = clr[i])
  lines(w_ts, up, col = "darkgray")
  lines(w_ts, mn, col = "black", lwd = 2)
  lines(w_ts, lo, col = "darkgray")
}

Results

Model results can be found above in the code outputs. Models 8 and 9 had similar AICcs so we chose the simpler model (#8) The best model had 3 states, and included covariates for temperature, pH, diatoms and unicells.

Discussion

Our results suggest that the best DFA model fit to time series of 4 groups of zooplankton(Cyclops, Diaptomus, Epischura, and Neomysis) included 3 states and covariates for temperature pH, diatoms and unicells. Copepods are known to prefer diatoms over other forms of phytoplankton (Kleppel 1993) , so it makes sense that this would be an important covariate. Using a DFA did not greatly reduce the dimensionality of our dataset (we went from 4 species groups to 3), but maybe this affect would have been greater if we had used a larger set of the zooplankton groups.

Team Contributions

All team members helped decide the question and plan for the analysis. ZR did the data wrangling for the zooplankton and the covariates, and checked for correlation issues with the covariates. NC wrote the matrix models in the methods section. NC and TW fit the models for different states using all the covariates. TW did the model selection for the different covariates. ZR helped with debugging the code, and wrote the discussion.

References

Hansen, Per Juel, Peter Koefoed Bjørnsen, and Benni Winding Hansen. 1997. “Zooplankton Grazing and Growth: Scaling Within the 2-2,-Μm Body Size Range.” Limnology and Oceanography 42 (4): 687–704. https://doi.org/10.4319/lo.1997.42.4.0687.
Kleppel, GS. 1993. “On the Diets of Calanoid Copepods.” Marine Ecology Progress Series 99: 183–95. https://doi.org/10.3354/meps099183.