1 Overview

Here we will use MARSS to do Dynamic Factor Analysis (DFA), which allows us to look for a set of common underlying processes among a relatively large set of time series (Zuur et al. 2003). There have been a number of recent applications of DFA to ecological questions surrounding Pacific salmon (Stachura et al. 2014; Jorgensen et al. 2016; Ohlberger et al. 2016) and stream temperatures (Lisi et al. 2015). For a more in-depth treatment of potential applications of MARSS, I direct readers to Chap 9 in the User’s Guide for the package.

2 Background

DFA is conceptually different than what we have been doing in the previous applications. Here we are trying to explain temporal variation in a set of \(n\) observed time series using linear combinations of a set of \(m\) hidden random walks, where \(m << n\). A DFA model is a type of MARSS model with the following structure:

\[ \begin{gathered} \mathbf{x}_t = \mathbf{x}_{t-1}+\mathbf{w}_t \text{ where } \mathbf{w}_t \sim \text{MVN}(0,\mathbf{Q}) \\ \mathbf{y}_t = \mathbf{Z}\mathbf{x}_t+\mathbf{a}+\mathbf{v}_t \text{ where } \mathbf{v}_t \sim \text{MVN}(0,\mathbf{R}) \end{gathered}. \]

This equation should look rather familiar as it is exactly the same form we used for estimating varying number of processes from a set of observations in Lesson II. The difference with DFA is that rather than fixing the elements within \(\mathbf{Z}\) at 1 or 0 to indicate whether an observation does or does not correspond to a trend, we will instead estimate them as “loadings” on each of the states/processes.

2.1 Example of a DFA model

The general idea is that the observations \(\mathbf{y}\) are modeled as a linear combination of hidden processes \((\mathbf{x})\) and factor loadings \((\mathbf{Z})\) plus some offsets \((\mathbf{a})\). Imagine a case where we had a data set with five observed time series \((n=5)\) and we want to fit a model with three hidden processes \((m=3)\). If we write out our DFA model in MARSS matrix form, the process model would look like this:

\[ \begin{bmatrix} x_{1} \\ x_{2} \\ x_{3} \end{bmatrix}_t = \begin{bmatrix} 1&0&0 \\ 0&1&0 \\ 0&0&1 \end{bmatrix} \begin{bmatrix} x_{1}\\ x_{2}\\ x_{3}\end{bmatrix}_{t-1} + \begin{bmatrix} w_{1} \\ w_{2} \\ w_{3} \end{bmatrix}_t \]

And the observation equation would look like

\[ \begin{bmatrix} y_{1} \\ y_{2} \\ y_{3} \\ y_{4} \\ y_{5} \end{bmatrix}_t = \begin{bmatrix} z_{11}&z_{12}&z_{13}\\ z_{21}&z_{22}&z_{23}\\ z_{31}&z_{32}&z_{33}\\ z_{41}&z_{42}&z_{43}\\ z_{51}&z_{52}&z_{53}\end{bmatrix} \begin{bmatrix} x_{1} \\ x_{2} \\ x_{3} \end{bmatrix}_t + \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \\ a_5 \end{bmatrix} + \begin{bmatrix} v_{1} \\ v_{2} \\ v_{3} \\ v_{4} \\ v_{5} \end{bmatrix}_t. \]

The process errors would be

\[ \begin{bmatrix} w_{1} \\ w_{2} \\ w_{3} \end{bmatrix}_t \sim \text{MVN} \begin{pmatrix} \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} q_{11}&q_{12}&q_{13}\\ q_{12}&q_{22}&q_{23}\\ q_{13}&q_{23}&q_{33}\end{bmatrix} \end{pmatrix}. \]

And the observation errors would be

\[ \begin{bmatrix} v_{1} \\ v_{2} \\ v_{3} \\ v_{4} \\ v_{5} \end{bmatrix}_t \sim \text{MVN} \begin{pmatrix} \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} r_{11}&r_{12}&r_{13}&r_{14}&r_{15}\\ r_{12}&r_{22}&r_{23}&r_{24}&r_{25}\\ r_{13}&r_{23}&r_{33}&r_{34}&r_{35}\\ r_{14}&r_{24}&r_{34}&r_{44}&r_{45}\\ r_{15}&r_{25}&r_{35}&r_{45}&r_{55}\end{bmatrix} \end{pmatrix} \]

2.2 Constraining a DFA model

If \(\mathbf{Z}\), \(\mathbf{a}\), and \(\mathbf{Q}\) are not constrained, however, then the DFA model above is unidentifiable. Nevertheless, we can use the following parameter constraints to make the model identifiable:

  • in the first \(m-1\) rows of \(\mathbf{Z}\), the \(z\)-value in the \(j\)-th column and \(i\)-th row is set to zero if \(j > i\);
  • \(\mathbf{a}\) is constrained so that the first \(m\) values are set to zero; and
  • \(\mathbf{Q}\) is set equal to the identity matrix \((\mathbf{I_m})\).

Using these constraints, the process equation for the DFA model above becomes

\[ \begin{bmatrix} x_{1} \\ x_{2} \\ x_{3} \end{bmatrix}_t = \begin{bmatrix} 1&0&0 \\ 0&1&0 \\ 0&0&1 \end{bmatrix} \begin{bmatrix} x_{1}\\ x_{2}\\ x_{3}\end{bmatrix}_{t-1} + \begin{bmatrix} w_{1} \\ w_{2} \\ w_{3} \end{bmatrix}_t, \]

and the observation equation becomes

\[ \begin{bmatrix} y_{1} \\ y_{2} \\ y_{3} \\ y_{4} \\ y_{5} \end{bmatrix}_t = \begin{bmatrix} z_{11}&0&0\\ z_{21}&z_{22}&0\\ z_{31}&z_{32}&z_{33}\\ z_{41}&z_{42}&z_{43}\\ z_{51}&z_{52}&z_{53}\end{bmatrix} \begin{bmatrix} x_{1} \\ x_{2} \\ x_{3} \end{bmatrix}_t + \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} v_{1} \\ v_{2} \\ v_{3} \\ v_{4} \\ v_{5} \end{bmatrix}_t. \]

The process errors would then become

\[ \begin{bmatrix} w_{1} \\ w_{2} \\ w_{3} \end{bmatrix}_t \sim \text{MVN} \begin{pmatrix} \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1&0&0 \\ 0&1&0 \\ 0&0&1 \end{bmatrix} \end{pmatrix}, \]

but the observation errors would stay the same, such that

\[ \begin{bmatrix} v_{1} \\ v_{2} \\ v_{3} \\ v_{4} \\ v_{5} \end{bmatrix}_t \sim \text{MVN} \begin{pmatrix} \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} r_{11}&r_{12}&r_{13}&r_{14}&r_{15}\\ r_{12}&r_{22}&r_{23}&r_{24}&r_{25}\\ r_{13}&r_{23}&r_{33}&r_{34}&r_{35}\\ r_{14}&r_{24}&r_{34}&r_{44}&r_{45}\\ r_{15}&r_{25}&r_{35}&r_{45}&r_{55} \end{bmatrix} \end{pmatrix}. \]

2.3 Different error structures

The example observation equation we used above had what we refer to as an “unconstrained” variance-covariance matrix \(\mathbf{R}\) wherein all of the parameters are unique. In certain applications, however, we may want to change our assumptions about the forms for \(\mathbf{R}\). For example, we might have good reason to believe that all of the observations have different error variances and they were independent of one another (e.g., different methods were used for sampling), in which case

\[ \mathbf{R} = \begin{bmatrix} r_1&0&0&0&0 \\ 0&r_2&0&0&0 \\ 0&0&r_3&0&0 \\ 0&0&0&r_4&0 \\ 0&0&0&0&r_5 \end{bmatrix}. \]

Alternatively, we might have a situation where all of the observation errors had the same variance \(r\), but they were not independent from one another. In that case we would have to include a covariance parameter \(k\), such that

\[ \mathbf{R} = \begin{bmatrix} r&k&k&k&k \\ k&r&k&k&k \\ k&k&r&k&k \\ k&k&k&r&k \\ k&k&k&k&r \end{bmatrix}. \]

Any of these options for \(\mathbf{R}\) (and other custom options as well) are available to us in a DFA model, just as they were in the MARSS models in Lesson II.

3 Data retrieval

For this exercise, we will use the Lake Washington phytoplankton data contained in the MARSS package. Let’s begin by reading in the monthly values for all of the data, including metabolism, chemistry, and climate.

Before proceeding further, we’ll load the MARSS package.

## load MARSS pkg
library("MARSS")
## load the data (there are 3 datasets contained here)
data(lakeWAplankton)
## we want lakeWAplanktonTrans, which has been transformed
## so the 0s are replaced with NAs and the data z-scored
all_dat <- lakeWAplanktonTrans
## use only the 10 years from 1980-1989
yr_frst <- 1980
yr_last <- 1989
plank_dat <- all_dat[all_dat[,"Year"]>=yr_frst & all_dat[,"Year"]<=yr_last,]
## create vector of phytoplankton group names
phytoplankton <- c("Cryptomonas", "Diatoms", "Greens",
                   "Unicells", "Other.algae")
## get only the phytoplankton
dat_1980 <- plank_dat[,phytoplankton]

Next, we transpose the data matrix and calculate the number of time series and their length.

## transpose data so time goes across columns
dat_1980 <- t(dat_1980)
## get number of time series
N_ts <- dim(dat_1980)[1]
## get length of time series
TT <- dim(dat_1980)[2] 

Recall from lecture that it would be easier to estimate the real parameters of interest if we de-mean the data, so let’s do that.

y_bar <- apply(dat_1980, 1, mean, na.rm=TRUE)
dat <- dat_1980 - y_bar
rownames(dat) <- rownames(dat_1980)

3.1 Time series plots

Here are time series plots of ll five phytoplankton functional groups.

spp <- rownames(dat_1980)
clr <- c("brown","blue","darkgreen","darkred","purple")
cnt <- 1
par(mfrow=c(N_ts,1), mai=c(0.5,0.7,0.1,0.1), omi=c(0,0,0,0))
for(i in spp){
  plot(dat[i,],xlab="",ylab="Abundance index", bty="L", xaxt="n", pch=16, col=clr[cnt], type="b")
  axis(1,12*(0:dim(dat_1980)[2])+1,yr_frst+0:dim(dat_1980)[2])
  title(i)
  cnt <- cnt + 1
  }

4 DFA models in MARSS

The MARSS package is designed to work with the fully specified matrix form of the multivariate state-space model we wrote out in Sec 3. Thus, we will need to create a model list with forms for each of the vectors and matrices. Note that even though some of the model elements are scalars and vectors, we will need to specify everything as a matrix (or array for time series of matrices).

Notice that the code below uses some of the MARSS shortcuts for specifying forms of vectors and matrices. We will also use the matrix(list(),nrow,ncol) trick we learned previously.

4.1 The process model

We need to specify the explicit form for all of the vectors and matrices in the full form of the MARSS model we defined in Sec 3.1. Note that we do not have to specify anything for the states \((\mathbf{x})\) – those are elements that MARSS will identify and estimate itself based on our definitions of the other vectors and matrices.

## number of processes
mm <- 3
## 'BB' is identity: 1's along the diagonal & 0's elsewhere
BB <- "identity"  # diag(mm)
## 'uu' is a column vector of 0's
uu <- "zero"  # matrix(0,mm,1)
## 'CC' and 'cc' are for covariates
CC <- "zero"  # matrix(0,mm,1)
cc <- "zero"  # matrix(0,1,wk_last)
## 'QQ' is identity
QQ <- "identity"  # diag(mm)

4.2 The observation model

Here we will fit the DFA model above where we have R N_ts observed time series and we want 3 hidden states. Now we need to set up the observation model for MARSS. Here are the vectors and matrices for our first model where each nutrient follows its own process. Recall that we will need to set the elements in the upper R corner of \(\mathbf{Z}\) to 0. We will assume that the observation errors have different variances and they are independent of one another.

## 'ZZ' is loadings matrix
Z_vals <- list("z11",  0  ,  0  ,
               "z21","z22",  0  ,
               "z31","z32","z33",
               "z41","z42","z43",
               "z51","z52","z53")
ZZ <- matrix(Z_vals, nrow=N_ts, ncol=3, byrow=TRUE)
ZZ
##      [,1]  [,2]  [,3] 
## [1,] "z11" 0     0    
## [2,] "z21" "z22" 0    
## [3,] "z31" "z32" "z33"
## [4,] "z41" "z42" "z43"
## [5,] "z51" "z52" "z53"
## 'aa' is the offset/scaling
aa <- "zero"
## 'DD' and 'd' are for covariates
DD <- "zero"  # matrix(0,mm,1)
dd <- "zero"  # matrix(0,1,wk_last)
## 'RR' is var-cov matrix for obs errors
RR <- "diagonal and unequal"

4.3 Fit the model in MARSS

Now it’s time to fit our first DLM with MARSS. To do so, we need to create a three lists that we will need to pass to the MARSS() function:

  1. A list of specifications for the model’s vectors and matrices;
  2. A list of any initial values – MARSS will pick its own otherwise;
  3. A list of control parameters for the MARSS() function.
## list with specifications for model vectors/matrices
mod_list <- list(B=BB, U=uu, C=CC, c=cc, Q=QQ, Z=ZZ, A=aa, D=DD, d=dd, R=RR)
## list with model inits
init_list <- list(x0=matrix(rep(0,mm),mm,1))
## list with model control parameters
con_list <- list(maxit=3000, allow.degen=TRUE)

Now we can fit the model.

## fit MARSS
dfa_1 <- MARSS(y=dat, model=mod_list, inits=init_list, control=con_list)
## Success! abstol and log-log tests passed at 246 iterations.
## Alert: conv.test.slope.tol is 0.5.
## Test with smaller values (<0.1) to ensure convergence.
## 
## MARSS fit is
## Estimation method: kem 
## Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
## Estimation converged in 246 iterations. 
## Log-likelihood: -692.9795 
## AIC: 1425.959   AICc: 1427.42   
##  
##                             Estimate
## Z.z11                         0.2738
## Z.z21                         0.4487
## Z.z31                         0.3170
## Z.z41                         0.4107
## Z.z51                         0.2553
## Z.z22                         0.3608
## Z.z32                        -0.3690
## Z.z42                        -0.0990
## Z.z52                        -0.3793
## Z.z33                         0.0185
## Z.z43                        -0.1404
## Z.z53                         0.1317
## R.(Cryptomonas,Cryptomonas)   0.1638
## R.(Diatoms,Diatoms)           0.2913
## R.(Greens,Greens)             0.8621
## R.(Unicells,Unicells)         0.3080
## R.(Other.algae,Other.algae)   0.5000
## x0.X1                         0.2218
## x0.X2                         1.8155
## x0.X3                        -4.8097
## 
## Standard errors have not been calculated. 
## Use MARSSparamCIs to compute CIs and bias estimates.

4.3.1 Interpreting the MARSS output

By now the MARSS output should look familiar. The first 12 parameter estimates Z.z## are the loadings of each observed time series on the 3 hidden states. The next 5 estimates R.(,) are the variances of the observation errors \((v_{i,t})\). The last 3 values, x0.X#, are the estimates of the initial states at \(t = 0\).

Recall that the estimates of the processes themselves (i.e., \(\mathbf{x})\) are contained in one of the list elements in our fitted MARSS object. Specifically, they are in mod_fit$states, and their respective standard errors are in mod_fit$states.se. For the names of all of the other objects, type names(dfa_1).

5 Results

Now we can plot the estimated states, factor loadings, and overall model fits to the data.

5.1 Plotting the processes & loadings

Here are plots of the three hidden processes (left column) and the loadings for each of phytoplankton groups (right column).

ylbl <- phytoplankton
w_ts <- seq(dim(dat)[2])
layout(matrix(c(1,2,3,4,5,6),mm,2),widths=c(2,1))
## par(mfcol=c(mm,2), mai=c(0.5,0.5,0.5,0.1), omi=c(0,0,0,0))
par(mai=c(0.5,0.5,0.5,0.1), omi=c(0,0,0,0))
## plot the processes
for(i in 1:mm) {
  ylm <- c(-1,1)*max(abs(proc_rot[i,]))
  ## set up plot area
    plot(w_ts,proc_rot[i,], type="n", bty="L",
         ylim=ylm, xlab="", ylab="", xaxt="n")
    ## draw zero-line
    abline(h=0, col="gray")
    ## plot trend line
    lines(w_ts,proc_rot[i,], lwd=2)
    lines(w_ts,proc_rot[i,], lwd=2)
    ## add panel labels
    mtext(paste("State",i), side=3, line=0.5)
    axis(1,12*(0:dim(dat_1980)[2])+1,yr_frst+0:dim(dat_1980)[2])
}
## plot the loadings
minZ <- 0
ylm <- c(-1,1)*max(abs(Z_rot))
for(i in 1:mm) {
  plot(c(1:N_ts)[abs(Z_rot[,i])>minZ], as.vector(Z_rot[abs(Z_rot[,i])>minZ,i]), type="h",
       lwd=2, xlab="", ylab="", xaxt="n", ylim=ylm, xlim=c(0.5,N_ts+0.5), col=clr)
    for(j in 1:N_ts) {
      if(Z_rot[j,i] > minZ) {text(j, -0.03, ylbl[j], srt=90, adj=1, cex=1.2, col=clr[j])}
      if(Z_rot[j,i] < -minZ) {text(j, 0.03, ylbl[j], srt=90, adj=0, cex=1.2, col=clr[j])}
      abline(h=0, lwd=1.5, col="gray")
      } 
  mtext(paste("Factor loadings on state",i),side=3,line=0.5)
}

It looks like there are strong seasonal cycles in the data, but there is some indication of a phase difference between some of the groups. We can use ccf() to investigate further.

par(mai=c(0.9,0.9,0.1,0.1))
ccf(proc_rot[1,],proc_rot[2,], lag.max = 12, main="")

5.2 Plotting the data & model fits

We can plot the fits for our DFA model along with the data. The following function will return the fitted values ± (1-\(\alpha\))% confidence intervals.

get_DFA_fits <- function(MLEobj,dd=NULL,alpha=0.05) {
  ## empty list for results
  fits <- list()
  ## extra stuff for var() calcs
  Ey <- MARSS:::MARSShatyt(MLEobj)
  ## model params
    ZZ <- coef(MLEobj, type="matrix")$Z
    ## number of obs ts
    nn <- dim(Ey$ytT)[1]
    ## number of time steps
    TT <- dim(Ey$ytT)[2]
    ## get the inverse of the rotation matrix
    H_inv <- varimax(ZZ)$rotmat
    ## check for covars
    if(!is.null(dd)) {
      DD <- coef(MLEobj, type="matrix")$D
      ## model expectation
      fits$ex <- ZZ %*% H_inv %*% MLEobj$states + DD %*% dd
    } else {
      ## model expectation
      fits$ex <- ZZ %*% H_inv %*% MLEobj$states
    }
    ## Var in model fits
    VtT <- MARSSkfss(MLEobj)$VtT
    VV <- NULL
    for(tt in 1:TT) {
      RZVZ <- coef(MLEobj, type="matrix")$R - ZZ%*%VtT[,,tt]%*%t(ZZ)
      SS <- Ey$yxtT[,,tt] - Ey$ytT[,tt,drop=FALSE] %*% t(MLEobj$states[,tt,drop=FALSE])
      VV <- cbind(VV,diag(RZVZ + SS%*%t(ZZ) + ZZ%*%t(SS)))
      }
    SE <- sqrt(VV)
    ## upper & lower (1-alpha)% CI
    fits$up <- qnorm(1-alpha/2)*SE + fits$ex
    fits$lo <- qnorm(alpha/2)*SE + fits$ex
    return(fits)
}

Here are time series of the five phytoplankton groups (points) with the mean of the DFA fits (black line) and the 95% confidence intervals (gray lines).

## get model fits & CI's
mod_fit <- get_DFA_fits(dfa_1)
## plot the fits
ylbl <- phytoplankton
par(mfrow=c(N_ts,1), mai=c(0.5,0.7,0.1,0.1), omi=c(0,0,0,0))
for(i in 1:N_ts) {
  up <- mod_fit$up[i,]
  mn <- mod_fit$ex[i,]
  lo <- mod_fit$lo[i,]
  plot(w_ts,mn,xlab="",ylab=ylbl[i],xaxt="n",type="n", cex.lab=1.2,
       ylim=c(min(lo),max(up)))
  axis(1,12*(0:dim(dat_1980)[2])+1,yr_frst+0:dim(dat_1980)[2])
  points(w_ts,dat[i,], pch=16, col=clr[i])
  lines(w_ts, up, col="darkgray")
  lines(w_ts, mn, col="black", lwd=2)
  lines(w_ts, lo, col="darkgray")
}

6 Covariates in DFA models

It is standard to add covariates to the analysis so that one removes known important drivers. The DFA with covariates is written:

\[ \begin{equation} \begin{gathered} \mathbf{x}_t = \mathbf{x}_{t-1}+\mathbf{w}_t \text{ where } \mathbf{w}_t \sim \text{MVN}(0,\mathbf{Q}) \\ \mathbf{y}_t = \mathbf{Z}\mathbf{x}_t+\mathbf{a}+\mathbf{D}\mathbf{d}_t+\mathbf{v}_t \text{ where } \mathbf{v}_t \sim \text{MVN}(0,\mathbf{R}) \end{gathered} \label{eq:dfa.cov} \end{equation} \]

where the \(q \times 1\) vector \(\mathbf{d}_t\) contains the covariate(s) at time \(t\), and the \(n \times q\) matrix \(\mathbf{D}\) contains the effect(s) of the covariate(s) on the observations. Using form="dfa" and covariates=<covariate name(s)>, we can easily add covariates to our DFA, but this means that the covariates are input, not data, and there can be no missing values (see Chapter 6 for how to include covariates with missing values).

6.1 Example from L Washington

The Lake Washington dataset has two environmental covariates that we might expect to have effects on phytoplankton growth, and hence, abundance: temperature (Temp) and total phosphorous (TP). We need the covariate inputs to have the same number of time steps as the variate data, and thus we limit the covariate data to the years 1980-1994 also.

temp <- t(plank_dat[,"Temp",drop=FALSE])
TP <- t(plank_dat[,"TP",drop=FALSE])

We will now fit 3 different models that each add covariate effects (i.e., Temp, TP, Temp & TP) to our existing model above where \(m\) = 3 and \(\mathbf{R}\) is "diagonal and unequal".

mod_list=list(m=3, R="diagonal and unequal")
dfa_temp <- MARSS(dat, model=mod_list, form="dfa", z.score=FALSE,
                  control=con_list, covariates=temp)
## Success! abstol and log-log tests passed at 356 iterations.
## Alert: conv.test.slope.tol is 0.5.
## Test with smaller values (<0.1) to ensure convergence.
## 
## MARSS fit is
## Estimation method: kem 
## Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
## Estimation converged in 356 iterations. 
## Log-likelihood: -655.1684 
## AIC: 1354.337   AICc: 1356.103   
##  
##                             Estimate
## Z.11                         0.27811
## Z.21                         0.48712
## Z.31                         0.21689
## Z.41                         0.39254
## Z.51                         0.12172
## Z.22                         0.08616
## Z.32                        -0.06627
## Z.42                        -0.00464
## Z.52                         0.06902
## Z.33                         0.00853
## Z.43                         0.17030
## Z.53                        -0.05958
## R.(Cryptomonas,Cryptomonas)  0.14928
## R.(Diatoms,Diatoms)          0.38009
## R.(Greens,Greens)            0.77169
## R.(Unicells,Unicells)        0.31291
## R.(Other.algae,Other.algae)  0.56541
## D.Cryptomonas                0.01255
## D.Diatoms                   -0.28558
## D.Greens                     0.55504
## D.Unicells                   0.11923
## D.Other.algae                0.50401
## 
## Standard errors have not been calculated. 
## Use MARSSparamCIs to compute CIs and bias estimates.
dfa_TP <- MARSS(dat, model=mod_list, form="dfa", z.score=FALSE,
                control=con_list, covariates=TP)
## Success! abstol and log-log tests passed at 532 iterations.
## Alert: conv.test.slope.tol is 0.5.
## Test with smaller values (<0.1) to ensure convergence.
## 
## MARSS fit is
## Estimation method: kem 
## Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
## Estimation converged in 532 iterations. 
## Log-likelihood: -683.9993 
## AIC: 1411.999   AICc: 1413.765   
##  
##                             Estimate
## Z.11                          0.2244
## Z.21                          0.5374
## Z.31                          0.1430
## Z.41                          0.3420
## Z.51                          0.0194
## Z.22                          0.1946
## Z.32                         -0.2901
## Z.42                         -0.1317
## Z.52                         -0.1246
## Z.33                          0.0736
## Z.43                         -0.1121
## Z.53                          0.1314
## R.(Cryptomonas,Cryptomonas)   0.1800
## R.(Diatoms,Diatoms)           0.3004
## R.(Greens,Greens)             0.7209
## R.(Unicells,Unicells)         0.3595
## R.(Other.algae,Other.algae)   0.6454
## D.Cryptomonas                -0.3711
## D.Diatoms                     0.1727
## D.Greens                     -2.2355
## D.Unicells                   -0.4322
## D.Other.algae                -1.6528
## 
## Standard errors have not been calculated. 
## Use MARSSparamCIs to compute CIs and bias estimates.
dfa_both <- MARSS(dat, model=mod_list, form="dfa", z.score=FALSE,
                  control=con_list, covariates=rbind(temp,TP))
## Success! abstol and log-log tests passed at 402 iterations.
## Alert: conv.test.slope.tol is 0.5.
## Test with smaller values (<0.1) to ensure convergence.
## 
## MARSS fit is
## Estimation method: kem 
## Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
## Estimation converged in 402 iterations. 
## Log-likelihood: -652.5771 
## AIC: 1359.154   AICc: 1361.816   
##  
##                             Estimate
## Z.11                         0.26163
## Z.21                         0.45702
## Z.31                         0.22467
## Z.41                         0.41824
## Z.51                         0.09647
## Z.22                         0.08784
## Z.32                        -0.05975
## Z.42                        -0.02996
## Z.52                         0.07791
## Z.33                        -0.00600
## Z.43                         0.19853
## Z.53                        -0.06364
## R.(Cryptomonas,Cryptomonas)  0.15225
## R.(Diatoms,Diatoms)          0.38068
## R.(Greens,Greens)            0.77208
## R.(Unicells,Unicells)        0.27795
## R.(Other.algae,Other.algae)  0.56442
## D.(Cryptomonas,Temp)        -0.05572
## D.(Diatoms,Temp)            -0.43897
## D.(Greens,Temp)              0.51546
## D.(Unicells,Temp)            0.12395
## D.(Other.algae,Temp)         0.42784
## D.(Cryptomonas,TP)          -0.45097
## D.(Diatoms,TP)              -0.99230
## D.(Greens,TP)               -0.24677
## D.(Unicells,TP)              0.00397
## D.(Other.algae,TP)          -0.50023
## 
## Standard errors have not been calculated. 
## Use MARSSparamCIs to compute CIs and bias estimates.

Next we can compare whether the addition of the covariates improves the model fit.

print(cbind(model=c("no covars", "Temp", "TP", "Temp & TP"),
      AICc=round(c(dfa_1$AICc, dfa_temp$AICc, dfa_TP$AICc, dfa_both$AICc))),
      quote=FALSE)
##      model     AICc
## [1,] no covars 1427
## [2,] Temp      1356
## [3,] TP        1414
## [4,] Temp & TP 1362

This suggests that adding temperature or phosphorus to the model, either alone or in combination with one another, does seem to improve overall model fit. If we were truly interested in assessing the “best” model structure that includes covariates, however, we should examine all combinations of 1-4 trends and different structures for \(\mathbf{R}\).

Now let’s try to fit a model with a dummy variable for season, and see how that does.

cos_t <- cos(2 * pi * seq(TT) / 12)
sin_t <- sin(2 * pi * seq(TT) / 12)
dd <- rbind(cos_t,sin_t)
dfa_seas <- MARSS(dat_1980, model=mod_list, form="dfa", z.score=TRUE,
                  control=con_list, covariates=dd)
## Success! abstol and log-log tests passed at 384 iterations.
## Alert: conv.test.slope.tol is 0.5.
## Test with smaller values (<0.1) to ensure convergence.
## 
## MARSS fit is
## Estimation method: kem 
## Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
## Estimation converged in 384 iterations. 
## Log-likelihood: -713.8464 
## AIC: 1481.693   AICc: 1484.355   
##  
##                             Estimate
## Z.11                         0.49562
## Z.21                         0.27206
## Z.31                         0.03354
## Z.41                         0.51692
## Z.51                         0.18981
## Z.22                         0.05290
## Z.32                        -0.08042
## Z.42                         0.06336
## Z.52                         0.06157
## Z.33                         0.02383
## Z.43                         0.19506
## Z.53                        -0.10800
## R.(Cryptomonas,Cryptomonas)  0.51583
## R.(Diatoms,Diatoms)          0.53296
## R.(Greens,Greens)            0.60329
## R.(Unicells,Unicells)        0.19787
## R.(Other.algae,Other.algae)  0.52977
## D.(Cryptomonas,cos_t)       -0.43973
## D.(Diatoms,cos_t)           -0.44836
## D.(Greens,cos_t)            -0.66003
## D.(Unicells,cos_t)          -0.34898
## D.(Other.algae,cos_t)       -0.42773
## D.(Cryptomonas,sin_t)        0.23672
## D.(Diatoms,sin_t)            0.72062
## D.(Greens,sin_t)            -0.46019
## D.(Unicells,sin_t)          -0.00873
## D.(Other.algae,sin_t)       -0.64228
## 
## Standard errors have not been calculated. 
## Use MARSSparamCIs to compute CIs and bias estimates.
dfa_seas$AICc
## [1] 1484.355

The model with a dummy seasonal factor does much better than the covariate models, but still not as well as the model with only 3 trends. The model fits for the seasonal effects model are shown below.

## get model fits & CI's
mod_fit <- get_DFA_fits(dfa_seas,dd=dd)
## plot the fits
ylbl <- phytoplankton
par(mfrow=c(N_ts,1), mai=c(0.5,0.7,0.1,0.1), omi=c(0,0,0,0))
for(i in 1:N_ts) {
  up <- mod_fit$up[i,]
  mn <- mod_fit$ex[i,]
  lo <- mod_fit$lo[i,]
  plot(w_ts,mn,xlab="",ylab=ylbl[i],xaxt="n",type="n", cex.lab=1.2,
       ylim=c(min(lo),max(up)))
  axis(1,12*(0:dim(dat_1980)[2])+1,yr_frst+0:dim(dat_1980)[2])
  points(w_ts,dat[i,], pch=16, col=clr[i])
  lines(w_ts, up, col="darkgray")
  lines(w_ts, mn, col="black", lwd=2)
  lines(w_ts, lo, col="darkgray")
}

7 Homework problems

For your homework this week, we will continue to investigate common trends in the Lake Washington plankton data.

7.1 Question 1

Fit other DFA models to the phytoplankton data with varying numbers of trends from 1-4 (we fit a 3-trend model above). Do not include any covariates in these models. Using R="diagonal and unequal" for the observation errors, which of the DFA models has the most support from the data?

Plot the model states and loadings as in Sec 5.1. Describe the general patterns in the states and the ways the different taxa load onto those trends.

Also plot the the model fits as in Sec 5.2. Do they reasonable? Are there any particular problems or outliers?

7.2 Question 2

How does the best model from Q1 compare to a DFA model with the same number of trends, but with R="unconstrained"?

Plot the model states and loadings as in Sec 5.1. Describe the general patterns in the states and the ways the different taxa load onto those trends.

Also plot the the model fits as in Sec 5.2. Do they reasonable? Are there any particular problems or outliers?

7.3 Question 3

Fit a DFA model that includes temperature as a covariate and 3 trends (as in Sec 6.1), but withR="unconstrained"? How does this model compare to the model with R="diagonal and unequal"? How does it compare to the model in Q2?

Plot the model states and loadings as in Sec 5.1. Describe the general patterns in the states and the ways the different taxa load onto those trends.

Also plot the the model fits as in Sec 5.2. Do they reasonable? Are there any particular problems or outliers?


References

Jorgensen JC, Ward EJ, Scheuerell MD, Zabel RW (2016) Assessing spatial covariance among time series of abundance. Ecology and Evolution 6:2472–2485. doi: 10.1002/ece3.2031

Lisi PJ, Schindler DE, Cline TJ et al (2015) Watershed geomorphology and snowmelt control stream thermal sensitivity to air temperature. Geophysical Research Letters 42:3380–3388. doi: 10.1002/2015gl064083

Ohlberger J, Scheuerell MD, Schindler DE (2016) Population coherence and environmental impacts across spatial scales: a case study of Chinook salmon. Ecosphere 7:e01333. doi: 10.1002/ecs2.1333

Stachura MM, Mantua NJ, Scheuerell MD (2014) Oceanographic influences on patterns in North Pacific salmon abundance. Canadian Journal of Fisheries and Aquatic Sciences 71:226–235. doi: 10.1139/cjfas-2013-0367

Zuur AF, Fryer RJ, Jolliffe IT et al (2003) Estimating common trends in multivariate time series using dynamic factor analysis. Environmetrics 14:665–685. doi: 10.1002/env.611