8.8 Homework data and discussion

For these problems, use the following code to load in 1980-1994 phytoplankton data, covariates, and z-score all the data. Run the code below and use dat and covars directly in your code.

library(MARSS)
spp <- c("Cryptomonas", "Diatoms", "Greens", "Unicells", "Other.algae", 
    "Daphnia")
yrs <- lakeWAplanktonTrans[, "Year"] %in% 1980:1994
dat <- t(lakeWAplanktonTrans[yrs, spp])
# z-score the data
avg <- apply(dat, 1, mean, na.rm = TRUE)
sd <- sqrt(apply(dat, 1, var, na.rm = TRUE))
dat <- (dat - avg)/sd
rownames(dat) = spp
# always check that the mean and variance are 1 after
# z-scoring
apply(dat, 1, mean, na.rm = TRUE)  #this should be 0
apply(dat, 1, var, na.rm = TRUE)  #this should be 1

For the covariates, you’ll use temperature and TP.

covars <- rbind(Temp = lakeWAplanktonTrans[yrs, "Temp"], TP = lakeWAplanktonTrans[yrs, 
    "TP"])
avg <- apply(covars, 1, mean)
sd <- sqrt(apply(covars, 1, var, na.rm = TRUE))
covars <- (covars - avg)/sd
rownames(covars) <- c("Temp", "TP")
# always check that the mean and variance are 1 after
# z-scoring
apply(covars, 1, mean, na.rm = TRUE)  #this should be 0
apply(covars, 1, var, na.rm = TRUE)  #this should be 1

Here are some guidelines to help you answer the questions:

Use a MARSS model that allows for both observation and process error.
Assume that the observation errors are independent and identically distributed with known variance of 0.10.
Assume that the process errors are independent from one another, but the variances differ by taxon.
Assume that each group is an observation of its own process. This means Z="identity".
Use B="diagonal and unequal". This implies that each of the taxa are operating under varying degrees of density-dependence, and they are not allowed to interact.
All the data have been de-meaned and \(\mathbf{Z}\) is identity, therefore use U="zero" and A="zero". Make sure to check that the means of the data are 0 and the variance is 1.
Use tinitx=1. It makes \(\mathbf{B}\) estimation more stable. It goes in your model list.
Include a plot of residuals versus time and acf of residuals for each question. You only need to show these for the top (best) model if the question involves comparing multiple models.
Use AICc to compare models.
Some of the models may not converge, however use for the purpose of the homework, use the unconverged models. Thus use the output from MARSS() without any additional arguments. If you want, you can try using control=list(maxit=1000) to increase the number of iterations. Or you can try method="BFGS" in your MARSS() call. This will use the BFGS optimization method, however it may throw an error for these data.