require(MARSS)
Loading required package: MARSS
require(forecast)
Loading required package: forecast
phytos = c("Cryptomonas", "Diatoms", "Greens", "Unicells", "Other.algae")
yrs = lakeWAplanktonTrans[, "Year"] %in% 1985:1994
dat = t(lakeWAplanktonTrans[yrs, phytos])
# z-score the data
avg = apply(dat, 1, mean, na.rm = TRUE)
sd = sqrt(apply(dat, 1, var, na.rm = TRUE))
dat = (dat - avg)/sd
rownames(dat) = phytos
# z-score the covariates
covars = rbind(Temp = lakeWAplanktonTrans[yrs, "Temp"], TP = lakeWAplanktonTrans[yrs,
"TP"])
avg = apply(covars, 1, mean)
sd = sqrt(apply(covars, 1, var, na.rm = TRUE))
covars = (covars - avg)/sd
rownames(covars) = c("Temp", "TP")
# always check that the mean and variance are 1 after
# z-scoring
apply(dat, 1, mean, na.rm = TRUE) #this should be 0
Cryptomonas Diatoms Greens Unicells Other.algae
2.329499e-17 1.463504e-17 -2.761472e-19 3.546358e-17 1.507041e-18
apply(dat, 1, var, na.rm = TRUE) #this should be 1
Cryptomonas Diatoms Greens Unicells Other.algae
1 1 1 1 1
Set up constants.
TT = dim(dat)[2] # length of time series
m = n = dim(dat)[1] # number of states & obs
period = 12 # data were collected monthly
Set up the model structures that will be used across all the questions.
common = list(B = "diagonal and unequal", U = "zero", Q = "diagonal and unequal",
Z = "identity", A = "zero", R = "diagonal and equal", tinitx = 0)
ctl = list(maxit = 500) #in case we want to compare
Note the models do not converge in the default of maxit=500
. You could either set maxit
higher by passing in control=list(maxit=2000)
or try method="BFGS"
. However, here results are shown with the defaults and in some cases, the models have not converged.
How does month affect the mean phytoplankton population growth rates? Show a plot of mean growth rate versus month. Estimate seasonal effects without any covariate (Temp, TP) effects.
Since the covariate affects growth rate, it appears in the \(\mathbf{x}\) equation not the \(\mathbf{y}\) equation. Thus we want to set up \(\mathbf{C}\) and \(\mathbf{c}\).
: Treat month as a factor and you will estimate a month effect for each month. You need a row for each month (Jan, Feb, …) and a column for each time point. You put a 1 in the Jan row for each Jan time point, repeat for the other months. The following code will create such a \(\mathbf{c}\) matrix.
c.fac = diag(period)
for (i in 2:(ceiling(TT/period))) {
c.fac = cbind(c.fac, diag(period))
}
the.months = month.abb
rownames(c.fac) = the.months
We want each month to have separate effect on each taxon (60 effects), so our \(\mathbf{C}\) matrix can just be specified as "unconstrained"
. Our covariates are only in \(\mathbf{x}\) so \(\mathbf{D}\) and \(\mathbf{d}\) are zero.
We set up the model list and fit:
C = "unconstrained"
c = c.fac
D = "zero"
d = "zero"
model.list = c(common, list(C = C, c = c, D = D, d = d))
q1a = MARSS(dat, model = model.list, control = ctl)
Success! abstol and log-log tests passed at 486 iterations.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 486 iterations.
Log-likelihood: -651.766
AIC: 1455.532 AICc: 1478.04
Estimate
R.diag 0.0958
B.(X.Cryptomonas,X.Cryptomonas) 0.3121
B.(X.Diatoms,X.Diatoms) 0.4657
B.(X.Greens,X.Greens) 0.3204
B.(X.Unicells,X.Unicells) 0.6495
B.(X.Other.algae,X.Other.algae) 0.2015
Q.(X.Cryptomonas,X.Cryptomonas) 0.5537
Q.(X.Diatoms,X.Diatoms) 0.2421
Q.(X.Greens,X.Greens) 0.4664
Q.(X.Unicells,X.Unicells) 0.4141
Q.(X.Other.algae,X.Other.algae) 0.4312
x0.X.Cryptomonas -0.0766
x0.X.Diatoms -1.3587
x0.X.Greens 2.7993
x0.X.Unicells 1.2676
x0.X.Other.algae -1.5274
C.(X.Cryptomonas,Jan) -0.7928
C.(X.Diatoms,Jan) -0.0783
C.(X.Greens,Jan) -0.7265
C.(X.Unicells,Jan) -0.2588
C.(X.Other.algae,Jan) -1.0087
C.(X.Cryptomonas,Feb) 0.2178
C.(X.Diatoms,Feb) 0.3275
C.(X.Greens,Feb) -0.5705
C.(X.Unicells,Feb) 0.2111
C.(X.Other.algae,Feb) -0.7783
C.(X.Cryptomonas,Mar) 0.2678
C.(X.Diatoms,Mar) 0.7709
C.(X.Greens,Mar) -0.3081
C.(X.Unicells,Mar) 0.2141
C.(X.Other.algae,Mar) -0.3459
C.(X.Cryptomonas,Apr) 0.7667
C.(X.Diatoms,Apr) 0.7427
C.(X.Greens,Apr) -0.0183
C.(X.Unicells,Apr) 0.7463
C.(X.Other.algae,Apr) -0.0156
C.(X.Cryptomonas,May) 0.5519
C.(X.Diatoms,May) 0.6458
C.(X.Greens,May) 0.3305
C.(X.Unicells,May) -0.1908
C.(X.Other.algae,May) 0.1830
C.(X.Cryptomonas,Jun) 0.2017
C.(X.Diatoms,Jun) -0.0433
C.(X.Greens,Jun) 1.0091
C.(X.Unicells,Jun) -0.2287
C.(X.Other.algae,Jun) -0.1981
C.(X.Cryptomonas,Jul) -0.7215
C.(X.Diatoms,Jul) -0.2118
C.(X.Greens,Jul) 0.6025
C.(X.Unicells,Jul) 0.2395
C.(X.Other.algae,Jul) 0.5061
C.(X.Cryptomonas,Aug) -0.2006
C.(X.Diatoms,Aug) -0.7375
C.(X.Greens,Aug) 0.1143
C.(X.Unicells,Aug) 0.1380
C.(X.Other.algae,Aug) 0.6597
C.(X.Cryptomonas,Sep) -0.1057
C.(X.Diatoms,Sep) -0.9919
C.(X.Greens,Sep) -0.1624
C.(X.Unicells,Sep) -0.0550
C.(X.Other.algae,Sep) 0.5215
C.(X.Cryptomonas,Oct) 0.2331
C.(X.Diatoms,Oct) -0.1915
C.(X.Greens,Oct) 0.0885
C.(X.Unicells,Oct) -0.2098
C.(X.Other.algae,Oct) 0.7259
C.(X.Cryptomonas,Nov) 0.1383
C.(X.Diatoms,Nov) 0.0206
C.(X.Greens,Nov) 0.0318
C.(X.Unicells,Nov) -0.2794
C.(X.Other.algae,Nov) 0.4489
C.(X.Cryptomonas,Dec) -0.5750
C.(X.Diatoms,Dec) -0.2049
C.(X.Greens,Dec) -0.7270
C.(X.Unicells,Dec) -0.4392
C.(X.Other.algae,Dec) -0.6936
Initial states (x0) defined at t=0
Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
Get the appropriate seasonal effects:
seas.a = coef(q1a, type = "matrix")$C
rownames(seas.a) = phytos
colnames(seas.a) = the.months
We could plot the seasonal effect by species with this code:
matplot(t(seas.a), type = "l", bty = "n", xaxt = "n", ylab = "Fixed monthly",
col = 1:5)
axis(1, labels = the.months, at = 1:12, las = 2, cex.axis = 0.75)
legend("topright", lty = 1:5, legend = phytos, cex = 0.6, col = 1:5)
Note, if we had set U="unequal"
, we would need to set one of the columns of \(\mathbf{C}\) to zero because the we would have created an under-determined problem (infinite solutions). Basically, we have a problem like \(10=y+z\). You can’t solve for both \(y\) and \(z\) in that case.
The factor approach required estimating 60 effects. Another approach is to model the temperature effect as a 3rd order (or higher) polynomial \(b m + c m^2 + d m^3\). This approach has less flexibility but requires only 20 \(\mathbf{C}\) parameters.
First we make month cov matrix with our \(m\), \(m^2\), and \(m^3\) in different rows:
poly.order = 3
month.cov = matrix(1, 1, period)
for (i in 1:poly.order) {
month.cov = rbind(month.cov, (1:12)^i)
}
# for c, month.cov is replicated 10 times (once for each
# year)
c.m.poly = matrix(month.cov, poly.order + 1, TT, byrow = FALSE)
Everything except \(\mathbf{c}\) remains the same.
C = "unconstrained"
c = c.m.poly
D = "zero"
d = "zero"
model.list = c(common, list(C = C, c = c, D = D, d = d))
q1b = MARSS(dat, model = model.list, control = ctl)
Warning! Abstol convergence only. Maxit (=500) reached before log-log convergence.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
WARNING: Abstol convergence only no log-log convergence.
maxit (=500) reached before log-log convergence.
The likelihood and params might not be at the ML values.
Try setting control$maxit higher.
Log-likelihood: -691.2546
AIC: 1454.509 AICc: 1459.266
Estimate
R.diag 0.061143
B.(X.Cryptomonas,X.Cryptomonas) 0.244085
B.(X.Diatoms,X.Diatoms) 0.365194
B.(X.Greens,X.Greens) 0.318704
B.(X.Unicells,X.Unicells) 0.589248
B.(X.Other.algae,X.Other.algae) 0.155030
Q.(X.Cryptomonas,X.Cryptomonas) 0.736705
Q.(X.Diatoms,X.Diatoms) 0.346130
Q.(X.Greens,X.Greens) 0.573557
Q.(X.Unicells,X.Unicells) 0.518047
Q.(X.Other.algae,X.Other.algae) 0.519240
x0.X.Cryptomonas -0.933557
x0.X.Diatoms -1.577440
x0.X.Greens 3.352402
x0.X.Unicells 1.088766
x0.X.Other.algae -2.387367
C.(X.Cryptomonas,1) -1.398504
C.(X.Diatoms,1) -1.219105
C.(X.Greens,1) -1.465674
C.(X.Unicells,1) -0.422665
C.(X.Other.algae,1) -0.961017
C.(X.Cryptomonas,2) 0.970363
C.(X.Diatoms,2) 1.296856
C.(X.Greens,2) 0.587269
C.(X.Unicells,2) 0.338610
C.(X.Other.algae,2) -0.044937
C.(X.Cryptomonas,3) -0.157394
C.(X.Diatoms,3) -0.254391
C.(X.Greens,3) -0.047262
C.(X.Unicells,3) -0.050380
C.(X.Other.algae,3) 0.078025
C.(X.Cryptomonas,4) 0.007135
C.(X.Diatoms,4) 0.012965
C.(X.Greens,4) 0.000351
C.(X.Unicells,4) 0.001860
C.(X.Other.algae,4) -0.005862
Initial states (x0) defined at t=0
Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
Convergence warnings
Warning: the R.diag parameter value has not converged.
Type MARSSinfo("convergence") for more info on this warning.
The effect of month \(m\) \(C_1 m + C_2 m^2 + C_3 m^3\). We can make our seasonal effect matrix (like seas above):
C.b = coef(q1b, type = "matrix")$C
seas.b = C.b %*% month.cov
rownames(seas.b) = phytos
colnames(seas.b) = the.months
The factor approach required estimating 60 effects, and the 3rd order polynomial model was an improvement at only 20 parameters. Another option is using a combination of sine and cosine waves, which would require only 10 parameters.
Begin by defining the seasonal covariates as
cos.t = cos(2 * pi * seq(TT)/period)
sin.t = sin(2 * pi * seq(TT)/period)
c.Four = rbind(cos.t, sin.t)
Everything except \(\mathbf{c}\) remains the same.
C = "unconstrained"
c = c.Four
D = "zero"
d = "zero"
model.list = c(common, list(C = C, c = c, D = D, d = d))
q1c = MARSS(dat, model = model.list, control = ctl)
Warning! Abstol convergence only. Maxit (=500) reached before log-log convergence.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
WARNING: Abstol convergence only no log-log convergence.
maxit (=500) reached before log-log convergence.
The likelihood and params might not be at the ML values.
Try setting control$maxit higher.
Log-likelihood: -698.8021
AIC: 1449.604 AICc: 1452.067
Estimate
R.diag 0.0502
B.(X.Cryptomonas,X.Cryptomonas) 0.2940
B.(X.Diatoms,X.Diatoms) 0.3974
B.(X.Greens,X.Greens) 0.2777
B.(X.Unicells,X.Unicells) 0.5876
B.(X.Other.algae,X.Other.algae) 0.1833
Q.(X.Cryptomonas,X.Cryptomonas) 0.7785
Q.(X.Diatoms,X.Diatoms) 0.3569
Q.(X.Greens,X.Greens) 0.5815
Q.(X.Unicells,X.Unicells) 0.5444
Q.(X.Other.algae,X.Other.algae) 0.5759
x0.X.Cryptomonas -2.5409
x0.X.Diatoms -2.4183
x0.X.Greens 2.7543
x0.X.Unicells 1.0305
x0.X.Other.algae -3.9474
C.(X.Cryptomonas,cos.t) -0.2059
C.(X.Diatoms,cos.t) -0.1414
C.(X.Greens,cos.t) -0.5800
C.(X.Unicells,cos.t) -0.2153
C.(X.Other.algae,cos.t) -0.3124
C.(X.Cryptomonas,sin.t) 0.2351
C.(X.Diatoms,sin.t) 0.6836
C.(X.Greens,sin.t) -0.2524
C.(X.Unicells,sin.t) 0.1505
C.(X.Other.algae,sin.t) -0.6188
Initial states (x0) defined at t=0
Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
Convergence warnings
Warning: the R.diag parameter value has not converged.
Type MARSSinfo("convergence") for more info on this warning.
Now make our seasonal effect matrix (like seas above). The net seasonal effect is then \(C_a \cos(2 \pi t/12) + C_b \sin(2 \pi t/12)\).
C.c = coef(q1c, type = "matrix")$C
seas.c = C.c %*% c.Four[, 1:period]
rownames(seas.c) = phytos
colnames(seas.c) = the.months
There are two ways that you might approach this. One is to fit a model with both covariates and look at the \(\mathbf{C}\) matrix. The other is to fit different models and compare model support.
Fit a model with both covariates and look at the estimated effects (Figure ). We get the estimates but it is not obvious how to compare which is more important. That takes us to option 2.
C = "unconstrained"
c = covars
D = d = "zero"
model.list = c(common, list(C = C, c = c, D = D, d = d))
q2.both = MARSS(dat, model = model.list, control = ctl)
Success! abstol and log-log tests passed at 198 iterations.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 198 iterations.
Log-likelihood: -712.7631
AIC: 1477.526 AICc: 1479.989
Estimate
R.diag 0.1074
B.(X.Cryptomonas,X.Cryptomonas) 0.3367
B.(X.Diatoms,X.Diatoms) 0.6430
B.(X.Greens,X.Greens) 0.2871
B.(X.Unicells,X.Unicells) 0.6487
B.(X.Other.algae,X.Other.algae) 0.2224
Q.(X.Cryptomonas,X.Cryptomonas) 0.7409
Q.(X.Diatoms,X.Diatoms) 0.2964
Q.(X.Greens,X.Greens) 0.5782
Q.(X.Unicells,X.Unicells) 0.5031
Q.(X.Other.algae,X.Other.algae) 0.5187
x0.X.Cryptomonas -2.3631
x0.X.Diatoms -1.6727
x0.X.Greens 2.5002
x0.X.Unicells 0.8301
x0.X.Other.algae -3.2930
C.(X.Cryptomonas,Temp) -0.2489
C.(X.Diatoms,Temp) -0.5425
C.(X.Greens,Temp) 0.1097
C.(X.Unicells,Temp) -0.1152
C.(X.Other.algae,Temp) 0.4555
C.(X.Cryptomonas,TP) -0.2215
C.(X.Diatoms,TP) -0.2057
C.(X.Greens,TP) -0.2926
C.(X.Unicells,TP) -0.0738
C.(X.Other.algae,TP) -0.0174
Initial states (x0) defined at t=0
Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
plot(c(rep(1, 5), rep(2, 5)), coef(q2.both)$C, xlab = "", ylab = "effect",
xaxt = "n", xlim = c(0, 3))
axis(1, at = c(1, 2), labels = c("Temp", "TP"))
abline(h = 0)
title("effect (C) estimates)")
Use model selection to compare support. The idea here is to fit different MARSS models: one where temperature affects growth rates, one where TP affects growth rates, and one where both affect growth rate. The question did not specify what (if any) lags to use for your covariates. We are going to use lag-0 for both temperature and TP. However we will see at the end this is not a a very good model.
Our \(\mathbf{c}\), \(\mathbf{D}\) and \(\mathbf{d}\) matrices will stay the same, so we set-up those up once.
c = covars
D = d = "zero"
Create a \(\mathbf{C}\) matrix where temperature (only) has a linear effect on population growth. That means the effect of TP is 0. Then fit:
C = matrix(list(0), 5, 2)
C[, 1] = paste("Temp", 1:5, sep = "")
C
[,1] [,2]
[1,] "Temp1" 0
[2,] "Temp2" 0
[3,] "Temp3" 0
[4,] "Temp4" 0
[5,] "Temp5" 0
model.list = c(common, list(C = C, c = c, D = D, d = d))
q2.Temp = MARSS(dat, model = model.list, control = ctl)
Success! abstol and log-log tests passed at 176 iterations.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 176 iterations.
Log-likelihood: -718.6454
AIC: 1479.291 AICc: 1480.898
Estimate
R.diag 0.1386
B.(X.Cryptomonas,X.Cryptomonas) 0.3908
B.(X.Diatoms,X.Diatoms) 0.7173
B.(X.Greens,X.Greens) 0.2706
B.(X.Unicells,X.Unicells) 0.6825
B.(X.Other.algae,X.Other.algae) 0.2323
Q.(X.Cryptomonas,X.Cryptomonas) 0.7180
Q.(X.Diatoms,X.Diatoms) 0.2577
Q.(X.Greens,X.Greens) 0.5835
Q.(X.Unicells,X.Unicells) 0.4660
Q.(X.Other.algae,X.Other.algae) 0.4855
x0.X.Cryptomonas -2.3169
x0.X.Diatoms -1.5995
x0.X.Greens 2.2047
x0.X.Unicells 0.7925
x0.X.Other.algae -3.2470
C.Temp1 -0.0778
C.Temp2 -0.3772
C.Temp3 0.3493
C.Temp4 -0.0631
C.Temp5 0.4649
Initial states (x0) defined at t=0
Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
Create a model where TP (only) has a linear effect on population growth and fit:
C = matrix(list(0), 5, 2)
C[, 2] = paste("TP", 1:5, sep = "")
C
[,1] [,2]
[1,] 0 "TP1"
[2,] 0 "TP2"
[3,] 0 "TP3"
[4,] 0 "TP4"
[5,] 0 "TP5"
model.list = c(common, list(C = C, c = c, D = D, d = d))
q2.TP = MARSS(dat, model = model.list, control = ctl)
Success! abstol and log-log tests passed at 249 iterations.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 249 iterations.
Log-likelihood: -734.5589
AIC: 1511.118 AICc: 1512.725
Estimate
R.diag 0.0917
B.(X.Cryptomonas,X.Cryptomonas) 0.3545
B.(X.Diatoms,X.Diatoms) 0.7334
B.(X.Greens,X.Greens) 0.3193
B.(X.Unicells,X.Unicells) 0.6505
B.(X.Other.algae,X.Other.algae) 0.3356
Q.(X.Cryptomonas,X.Cryptomonas) 0.7793
Q.(X.Diatoms,X.Diatoms) 0.4221
Q.(X.Greens,X.Greens) 0.5942
Q.(X.Unicells,X.Unicells) 0.5267
Q.(X.Other.algae,X.Other.algae) 0.5910
x0.X.Cryptomonas -2.2095
x0.X.Diatoms -1.4527
x0.X.Greens 2.1540
x0.X.Unicells 0.8032
x0.X.Other.algae -2.4403
C.TP1 -0.0144
C.TP2 0.2629
C.TP3 -0.3691
C.TP4 0.0227
C.TP5 -0.3520
Initial states (x0) defined at t=0
Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
Create a model where both temperature and TP have linear effects:
C = "unconstrained"
model.list = c(common, list(C = C, c = c, D = D, d = d))
q2.both = MARSS(dat, model = model.list, control = ctl)
Success! abstol and log-log tests passed at 198 iterations.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 198 iterations.
Log-likelihood: -712.7631
AIC: 1477.526 AICc: 1479.989
Estimate
R.diag 0.1074
B.(X.Cryptomonas,X.Cryptomonas) 0.3367
B.(X.Diatoms,X.Diatoms) 0.6430
B.(X.Greens,X.Greens) 0.2871
B.(X.Unicells,X.Unicells) 0.6487
B.(X.Other.algae,X.Other.algae) 0.2224
Q.(X.Cryptomonas,X.Cryptomonas) 0.7409
Q.(X.Diatoms,X.Diatoms) 0.2964
Q.(X.Greens,X.Greens) 0.5782
Q.(X.Unicells,X.Unicells) 0.5031
Q.(X.Other.algae,X.Other.algae) 0.5187
x0.X.Cryptomonas -2.3631
x0.X.Diatoms -1.6727
x0.X.Greens 2.5002
x0.X.Unicells 0.8301
x0.X.Other.algae -3.2930
C.(X.Cryptomonas,Temp) -0.2489
C.(X.Diatoms,Temp) -0.5425
C.(X.Greens,Temp) 0.1097
C.(X.Unicells,Temp) -0.1152
C.(X.Other.algae,Temp) 0.4555
C.(X.Cryptomonas,TP) -0.2215
C.(X.Diatoms,TP) -0.2057
C.(X.Greens,TP) -0.2926
C.(X.Unicells,TP) -0.0738
C.(X.Other.algae,TP) -0.0174
Initial states (x0) defined at t=0
Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
Use AIC to compare the 3 models:
data.frame(Model = c("Temp", "TP", "Both"), AICc = round(c(q2.Temp$AICc,
q2.TP$AICc, q2.both$AICc), 1))
Model AICc
1 Temp 1480.9
2 TP 1512.7
3 Both 1480.0
The AICc for the Temp model is 31.83 units lower than that for TP, offering support for lag-0 temperature being more important than lag-0 TP in explaining phytoplankton community abundance. The model including both Temp and TP is <2 different in AICc units than the Temp-only model, which suggests that adding an extra covariate (and parameter) to the model does not really improve model fit.
However, none of the Temp or TP models are particularly good compared to any of the purely seasonal (month) effect models. It’s easy to compare all of them using AICc.
mod.names = c("Temp", "TP", "Both", "Fixed", "Cubic", "Fourier")
AICc.all = c(q2.Temp$AICc, q2.TP$AICc, q2.both$AICc, q1a$AICc,
q1b$AICc, q1c$AIC)
delta.AICc = round(AICc.all - min(AICc.all), 1)
# model selection results sorted from best (top) to worst
# (bottom)
df <- data.frame(Model = mod.names, delta.AICc = delta.AICc)[order(delta.AICc),
]
kable(df, row.names = FALSE)
Model | delta.AICc |
---|---|
Fourier | 0.0 |
Cubic | 9.7 |
Fixed | 28.4 |
Both | 30.4 |
Temp | 31.3 |
TP | 63.1 |
Diagnostics Temperature and TP model residuals for all taxa except Greens appear to show significant negative autocorrelation at lag=1 (Figure and Figure ), suggesting that both models are inadequate to capture all of the systematic variation in phytoplankton abundance.
par(mfrow = c(5, 2), mai = c(0.25, 0.6, 0.2, 0.2), omi = c(0,
0, 0, 0))
for (i in 1:5) {
plot.ts(residuals(q2.Temp)$model.residuals[i, ], ylab = "Residual",
main = phytos[i])
abline(h = 0, lty = "dashed", xlab = "")
acf(residuals(q2.Temp)$model.residuals[i, ], xlab = "")
}
par(mfrow = c(5, 2), mai = c(0.25, 0.6, 0.2, 0.2), omi = c(0,
0, 0, 0))
for (i in 1:5) {
plot.ts(residuals(q2.TP)$model.residuals[i, ], ylab = "Residual",
main = phytos[i], xlab = "")
abline(h = 0, lty = "dashed")
acf(residuals(q2.TP)$model.residuals[i, ], xlab = "")
}
Evaluate whether the effect of temperature on phytoplankton manifests itself via their underlying physiology (by affecting algal growth rates and thus abundance) or because physical changes in the water stratification makes them easier/harder to sample in some months. Leave out the seasonal covariates from question 1, i.e. only use Temp and TP as covariates.
The idea here is to fit two different MARSS models: one with the Temp effect in the process eqn and the other with it in the observation equation. We already fit the temperature-effect model for the process in the 1st part of question 2, so now fit one with temperature in the observation equation instead. Now we want \(\mathbf{D}\) and \(\mathbf{d}\) instead of \(\mathbf{C}\) and \(\mathbf{c}\) that we used for the process equation.
D = "unconstrained"
d = covars["Temp", , drop = FALSE]
C = c = "zero"
Create appropriate model list and fit the model.
model.list = c(common, list(C = C, c = c, D = D, d = d))
q3.Tobs = MARSS(dat, model = model.list, control = ctl)
Warning! Abstol convergence only. Maxit (=500) reached before log-log convergence.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
WARNING: Abstol convergence only no log-log convergence.
maxit (=500) reached before log-log convergence.
The likelihood and params might not be at the ML values.
Try setting control$maxit higher.
Log-likelihood: -728.5586
AIC: 1499.117 AICc: 1500.724
Estimate
R.diag 0.0276
B.(X.Cryptomonas,X.Cryptomonas) 0.3378
B.(X.Diatoms,X.Diatoms) 0.5965
B.(X.Greens,X.Greens) 0.3415
B.(X.Unicells,X.Unicells) 0.6043
B.(X.Other.algae,X.Other.algae) 0.1790
Q.(X.Cryptomonas,X.Cryptomonas) 0.8485
Q.(X.Diatoms,X.Diatoms) 0.5240
Q.(X.Greens,X.Greens) 0.6475
Q.(X.Unicells,X.Unicells) 0.6067
Q.(X.Other.algae,X.Other.algae) 0.6128
x0.X.Cryptomonas -2.5703
x0.X.Diatoms -2.0283
x0.X.Greens 2.2103
x0.X.Unicells 0.9552
x0.X.Other.algae -3.3616
D.Cryptomonas -0.0505
D.Diatoms -0.3893
D.Greens 0.5072
D.Unicells 0.0923
D.Other.algae 0.5646
Initial states (x0) defined at t=0
Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
Convergence warnings
Warning: the R.diag parameter value has not converged.
Type MARSSinfo("convergence") for more info on this warning.
Now use AIC to compare the 2 models.
df <- data.frame(Model = c("proc", "obs"), AICc = round(c(q2.Temp$AICc,
q3.Tobs$AICc), 1))
kable(df, row.names = FALSE)
Model | AICc |
---|---|
proc | 1480.9 |
obs | 1500.7 |
The AICc for the model with a temperature effect in the process equation is 19.83 units lower than the model with the effect in the observation equation. Thus, we conclude that the data support the hypothesis of a temperature effect on the physiology more than an observation/sampling phenomenon.
Diagnostics Similar to the temperature-effect in the process model, residuals for these models also show significant negative autocorrelation at lag=1 (Figure not shown but code is similar to that used in the previous question).
Is there support for temperature or TP affecting all functional groups’ growth rates the same, or are the effects on one taxon different from another?
Option 1: Analyze temperature and TP effects one at a time. The problem with this is that perhaps including both in the model might change the conclusions. The idea here is to fit four different MARSS models (2 each for temperature and TP). Each one will assume covariates affect the process. Two models will have effects that vary by taxon as in Q2 (ie, C = “unconstrained”) whereas the other 2 will assume the covariates affect all taxa equally.
# 1st: the Temp model
C = "equal"
c = covars["Temp", , drop = FALSE]
D = d = "zero"
model.list = c(common, list(C = C, c = c, D = D, d = d))
q4.Temp = MARSS(dat, model = model.list, control = ctl)
Warning! Abstol convergence only. Maxit (=500) reached before log-log convergence.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
WARNING: Abstol convergence only no log-log convergence.
maxit (=500) reached before log-log convergence.
The likelihood and params might not be at the ML values.
Try setting control$maxit higher.
Log-likelihood: -758.4122
AIC: 1550.824 AICc: 1551.881
Estimate
R.diag 0.02395
B.(X.Cryptomonas,X.Cryptomonas) 0.33549
B.(X.Diatoms,X.Diatoms) 0.63113
B.(X.Greens,X.Greens) 0.46323
B.(X.Unicells,X.Unicells) 0.60355
B.(X.Other.algae,X.Other.algae) 0.41653
Q.(X.Cryptomonas,X.Cryptomonas) 0.85369
Q.(X.Diatoms,X.Diatoms) 0.57679
Q.(X.Greens,X.Greens) 0.77365
Q.(X.Unicells,X.Unicells) 0.61444
Q.(X.Other.algae,X.Other.algae) 0.79059
x0.X.Cryptomonas -2.40741
x0.X.Diatoms -1.15107
x0.X.Greens 0.25745
x0.X.Unicells 0.75183
x0.X.Other.algae -3.14216
C.1 -0.00193
Initial states (x0) defined at t=0
Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
Convergence warnings
Warning: the R.diag parameter value has not converged.
Type MARSSinfo("convergence") for more info on this warning.
Use AIC to compare the 2 Temp models
data.frame(Effect = c("taxon-specific", "equal"), AICc = round(c(q2.Temp$AICc,
q4.Temp$AICc), 1))
Effect AICc
1 taxon-specific 1480.9
2 equal 1551.9
The AICc for the model with taxon-specific temperature effects is greater than 2 lower than for the model that assumes a equal temperature effect, which supports the hypothesis of varying temperature effects by taxon.
# 2nd: the TP model
C = "equal"
c = covars["TP", , drop = FALSE]
D = d = "zero"
model.list = c(common, list(C = C, c = c, D = D, d = d))
q4.TP = MARSS(dat, model = model.list, control = ctl)
Warning! Abstol convergence only. Maxit (=500) reached before log-log convergence.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
WARNING: Abstol convergence only no log-log convergence.
maxit (=500) reached before log-log convergence.
The likelihood and params might not be at the ML values.
Try setting control$maxit higher.
Log-likelihood: -756.5644
AIC: 1547.129 AICc: 1548.186
Estimate
R.diag 0.0200
B.(X.Cryptomonas,X.Cryptomonas) 0.3138
B.(X.Diatoms,X.Diatoms) 0.6121
B.(X.Greens,X.Greens) 0.4291
B.(X.Unicells,X.Unicells) 0.5728
B.(X.Other.algae,X.Other.algae) 0.3899
Q.(X.Cryptomonas,X.Cryptomonas) 0.8613
Q.(X.Diatoms,X.Diatoms) 0.6241
Q.(X.Greens,X.Greens) 0.7360
Q.(X.Unicells,X.Unicells) 0.6274
Q.(X.Other.algae,X.Other.algae) 0.7506
x0.X.Cryptomonas -2.2167
x0.X.Diatoms -1.0087
x0.X.Greens 0.5386
x0.X.Unicells 0.9780
x0.X.Other.algae -3.0629
C.1 -0.0758
Initial states (x0) defined at t=0
Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
Convergence warnings
Warning: the R.diag parameter value has not converged.
Type MARSSinfo("convergence") for more info on this warning.
Now use AIC to compare the 2 TP models
data.frame(Effect = c("taxon-specific", "equal"), AICc = round(c(q2.TP$AICc,
q4.TP$AICc), 1))
Effect AICc
1 taxon-specific 1512.7
2 equal 1548.2
The AICc for the TP model with taxon-specific effects is greater than 2 lower, which strongly supports the hypothesis of non-equal TP effects. Repeating the same diagnostic plot used in question 2 reveals that these models also have autocorrelation in the residuals suggesting that these models are also still inadequate.
*Option 2**: Analyze temperature and TP effects together in one model. The problem with this is that temperature and TP are correlated because they both have strong seasonal cycle. The idea here is to fit four different MARSS models. Each one will assume covariates affect the process. Two models will have effects that vary by taxon as in Q2 for one covariate. The other two models will have unconstrained \(\mathbf{C}\) or \(\mathbf{C}\) equal for both covariates (but not across covariates).
# 1st: unconstrained
C = "unconstrained"
c = covars
D = d = "zero"
model.list = c(common, list(C = C, c = c, D = D, d = d))
q4.unc = MARSS(dat, model = model.list, control = ctl)
Success! abstol and log-log tests passed at 198 iterations.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 198 iterations.
Log-likelihood: -712.7631
AIC: 1477.526 AICc: 1479.989
Estimate
R.diag 0.1074
B.(X.Cryptomonas,X.Cryptomonas) 0.3367
B.(X.Diatoms,X.Diatoms) 0.6430
B.(X.Greens,X.Greens) 0.2871
B.(X.Unicells,X.Unicells) 0.6487
B.(X.Other.algae,X.Other.algae) 0.2224
Q.(X.Cryptomonas,X.Cryptomonas) 0.7409
Q.(X.Diatoms,X.Diatoms) 0.2964
Q.(X.Greens,X.Greens) 0.5782
Q.(X.Unicells,X.Unicells) 0.5031
Q.(X.Other.algae,X.Other.algae) 0.5187
x0.X.Cryptomonas -2.3631
x0.X.Diatoms -1.6727
x0.X.Greens 2.5002
x0.X.Unicells 0.8301
x0.X.Other.algae -3.2930
C.(X.Cryptomonas,Temp) -0.2489
C.(X.Diatoms,Temp) -0.5425
C.(X.Greens,Temp) 0.1097
C.(X.Unicells,Temp) -0.1152
C.(X.Other.algae,Temp) 0.4555
C.(X.Cryptomonas,TP) -0.2215
C.(X.Diatoms,TP) -0.2057
C.(X.Greens,TP) -0.2926
C.(X.Unicells,TP) -0.0738
C.(X.Other.algae,TP) -0.0174
Initial states (x0) defined at t=0
Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
# 2nd: the TP unconstrained, Temp constrained
C = matrix("Temp", 5, 2)
C[, 2] = paste("TP", 1:5)
c = covars
D = d = "zero"
model.list = c(common, list(C = C, c = c, D = D, d = d))
q4.TPunc = MARSS(dat, model = model.list, control = ctl)
Success! abstol and log-log tests passed at 219 iterations.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 219 iterations.
Log-likelihood: -732.5287
AIC: 1509.057 AICc: 1510.82
Estimate
R.diag 0.1449
B.(X.Cryptomonas,X.Cryptomonas) 0.3634
B.(X.Diatoms,X.Diatoms) 0.7502
B.(X.Greens,X.Greens) 0.3762
B.(X.Unicells,X.Unicells) 0.6672
B.(X.Other.algae,X.Other.algae) 0.3999
Q.(X.Cryptomonas,X.Cryptomonas) 0.7032
Q.(X.Diatoms,X.Diatoms) 0.3011
Q.(X.Greens,X.Greens) 0.5516
Q.(X.Unicells,X.Unicells) 0.4603
Q.(X.Other.algae,X.Other.algae) 0.5629
x0.X.Cryptomonas -2.1869
x0.X.Diatoms -1.3909
x0.X.Greens 1.8029
x0.X.Unicells 0.8766
x0.X.Other.algae -2.1783
C.Temp -0.1350
C.TP 1 -0.1229
C.TP 2 0.1613
C.TP 3 -0.4573
C.TP 4 -0.0862
C.TP 5 -0.4469
Initial states (x0) defined at t=0
Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
# 3rd: the TP constrained, Temp unconstrained
C = matrix("TP", 5, 2)
C[, 1] = paste("Temp", 1:5)
c = covars
D = d = "zero"
model.list = c(common, list(C = C, c = c, D = D, d = d))
q4.Tempunc = MARSS(dat, model = model.list, control = ctl)
Success! abstol and log-log tests passed at 182 iterations.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 182 iterations.
Log-likelihood: -714.4338
AIC: 1472.868 AICc: 1474.631
Estimate
R.diag 0.116
B.(X.Cryptomonas,X.Cryptomonas) 0.352
B.(X.Diatoms,X.Diatoms) 0.661
B.(X.Greens,X.Greens) 0.281
B.(X.Unicells,X.Unicells) 0.633
B.(X.Other.algae,X.Other.algae) 0.238
Q.(X.Cryptomonas,X.Cryptomonas) 0.731
Q.(X.Diatoms,X.Diatoms) 0.284
Q.(X.Greens,X.Greens) 0.578
Q.(X.Unicells,X.Unicells) 0.499
Q.(X.Other.algae,X.Other.algae) 0.515
x0.X.Cryptomonas -2.348
x0.X.Diatoms -1.651
x0.X.Greens 2.348
x0.X.Unicells 0.936
x0.X.Other.algae -2.853
C.Temp 1 -0.201
C.Temp 2 -0.505
C.Temp 3 0.218
C.Temp 4 -0.180
C.Temp 5 0.335
C.TP -0.159
Initial states (x0) defined at t=0
Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
# 4th: both constrained
C = matrix("TP", 5, 2)
C[, 1] = "Temp"
c = covars
D = d = "zero"
model.list = c(common, list(C = C, c = c, D = D, d = d))
q4.cons = MARSS(dat, model = model.list, control = ctl)
Warning! Abstol convergence only. Maxit (=500) reached before log-log convergence.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
WARNING: Abstol convergence only no log-log convergence.
maxit (=500) reached before log-log convergence.
The likelihood and params might not be at the ML values.
Try setting control$maxit higher.
Log-likelihood: -754.1381
AIC: 1544.276 AICc: 1545.46
Estimate
R.diag 0.0247
B.(X.Cryptomonas,X.Cryptomonas) 0.3067
B.(X.Diatoms,X.Diatoms) 0.5926
B.(X.Greens,X.Greens) 0.4713
B.(X.Unicells,X.Unicells) 0.5693
B.(X.Other.algae,X.Other.algae) 0.4251
Q.(X.Cryptomonas,X.Cryptomonas) 0.8366
Q.(X.Diatoms,X.Diatoms) 0.5598
Q.(X.Greens,X.Greens) 0.7464
Q.(X.Unicells,X.Unicells) 0.6145
Q.(X.Other.algae,X.Other.algae) 0.7965
x0.X.Cryptomonas -2.3718
x0.X.Diatoms -1.0903
x0.X.Greens 0.4251
x0.X.Unicells 0.9384
x0.X.Other.algae -2.8902
C.Temp -0.1469
C.TP -0.1799
Initial states (x0) defined at t=0
Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
Convergence warnings
Warning: the R.diag parameter value has not converged.
Type MARSSinfo("convergence") for more info on this warning.
df <- data.frame(Effect = c("unconstrained", "TP constrained",
"Temp constrained", "both constrained"), AICc = round(c(q4.unc$AICc,
q4.Tempunc$AICc, q4.TPunc$AICc, q4.cons$AICc), 1))
kable(df, row.names = FALSE)
Effect | AICc |
---|---|
unconstrained | 1480.0 |
TP constrained | 1474.6 |
Temp constrained | 1510.8 |
both constrained | 1545.5 |
Compare your results for questions 2-4 using a observation error only model, by using the lm()
function.
This question is open-ended. We were looking for you to try to answer questions 2-4 using lm()
. There are many approaches you could take.
if you are investigating a temperature (T) and TP effect. Notice I did not include a intercept since all the data are de-meaned.
A process model would try to explain the error between \(y_t\) and \(\beta y_{t-1}\). \[\begin{equation} y(t) = \beta y(t-1) + \alpha_{month} + w(t) \end{equation}\] if you are investigating a month effect or \[\begin{equation} y(t) = \beta y(t-1) + \beta_{temp} T + \beta_{tp}TP + w(t) \end{equation}\]lm()
works best with a dataframe. We will have one column for the response variable (phytoplankton abundance), a factor column for taxon, a factor column for month, and two columns for temperature and TP (which are the same for each taxon). I need the abundance in the previous month as one of my predictors. Always be careful to match your covariate \(t\) to the right response \(t\). How to match them up depends on your problem or question.
dat.frame = data.frame(abund = as.vector(t(dat)), abund.lag1 = as.vector(t(cbind(NA,
dat[, 1:119]))), taxon = rep(phytos, each = 120), month = rep(month.abb,
5 * 10), temp = rep(covars[1, ], 5), tp = rep(covars[2, ],
5))
Take a look at the data frame as it changes to another taxon. Notice how the abundance in the previous month is set to NA as it should be in Jan at the start of the time series.
kable(dat.frame[118:125, 1:5], row.names = FALSE)
abund | abund.lag1 | taxon | month | temp |
---|---|---|---|---|
-0.8623450 | -0.2102360 | Cryptomonas | Oct | 0.8466326 |
0.7092607 | -0.8623450 | Cryptomonas | Nov | 0.0015140 |
-0.7070257 | 0.7092607 | Cryptomonas | Dec | -0.9059344 |
-0.7413776 | NA | Diatoms | Jan | -1.2330640 |
0.2085792 | -0.7413776 | Diatoms | Feb | -1.4642953 |
1.0206142 | 0.2085792 | Diatoms | Mar | -1.3406134 |
0.8039275 | 1.0206142 | Diatoms | Apr | -0.8464835 |
1.5061781 | 0.8039275 | Diatoms | May | -0.1651877 |
Question 1 was not part of Problem 5, but let’s address it. Q1 asks us to estimate the season effect on population growth rate. We will just do this for the case where we treat season as a factor. First let’s set up a holder for the season effects.
seaslm = matrix(NA, 5, 12, dimnames = list(phytos, month.abb))
Here is how we fit an lm() model with a month effect for each taxon. The taxon:abund.lag1
part is the \(b\) estimates and the taxon:month
part is the month effects by taxon.
fit = lm(abund ~ -1 + taxon:month, data = dat.frame)
This gives us the month effects for each taxon but they are in alphabetical order. I’m going to use the stringr package to help get these coefficients out. Also I need to re-order the months from alphabetical to temporal.
require(stringr)
Loading required package: stringr
mon.order = c(5, 4, 8, 1, 9, 7, 6, 2, 12, 11, 10, 3)
for (taxon in phytos) {
taxoncoef = str_detect(names(coef(fit)), paste(taxon, ":month",
sep = ""))
tmp = coef(fit)[taxoncoef]
# order the months
seaslm[taxon, ] = tmp[mon.order]
}
We could plot the seasonal effect by species with this code. Figure compares the estimates with the state-space model.
matplot(t(seaslm), type = "l", bty = "n", xaxt = "n", ylab = "Fixed monthly",
col = 1:5)
axis(1, labels = month.abb, at = 1:12, las = 2, cex.axis = 0.75)
legend("topright", lty = 1:5, legend = phytos, cex = 0.6, col = 1:5)
The results are quite similar to the state-space model.
par(mfrow = c(2, 1), mar = c(2, 2, 2, 2))
matplot(t(seas.a), type = "l", bty = "n", xaxt = "n", ylab = "Fixed monthly",
col = 1:5, ylim = c(-1.1, 1.1))
abline(h = 0)
axis(1, labels = month.abb, at = 1:12, las = 2, cex.axis = 0.75)
legend("topright", lty = 1:5, legend = phytos, cex = 0.6, col = 1:5)
title("MARSS estimates")
matplot(t(seaslm), type = "l", bty = "n", xaxt = "n", ylab = "Fixed monthly",
col = 1:5, ylim = c(-1.1, 1.1))
abline(h = 0)
axis(1, labels = month.abb, at = 1:12, las = 2, cex.axis = 0.75)
title("lm estimates")
For this one, we will fit models with temperature and TP as explanatory variables. First let’s set up holders for the temperature and TP effects.
Clm = matrix(NA, 5, 2, dimnames = list(phytos, c("Temp", "TP")))
Then fit to the differenced data and store the estimates. Figure compares the estimates from the state-space model versus lm()
. They are very similar.
fit = lm(abund ~ -1 + taxon:temp + taxon:tp, data = dat.frame)
require(stringr)
taxoncoef = str_detect(names(coef(fit)), ":temp")
Clm[, 1] = coef(fit)[taxoncoef]
taxoncoef = str_detect(names(coef(fit)), ":tp")
Clm[, 2] = coef(fit)[taxoncoef]
plot(c(rep(1, 5), rep(2, 5)), coef(q2.both)$C, xlab = "", ylab = "effect",
xaxt = "n", xlim = c(0, 3), ylim = c(-1, 1))
points(c(rep(1, 5), rep(2, 5)) + 0.25, Clm, xlab = "", col = "red")
axis(1, at = c(1, 2), labels = c("Temp", "TP"))
abline(h = 0)
title("effect (C) estimates)")
legend("topright", c("state-space", "lm"), pch = 1, col = c("black",
"red"))
Another approach would be to compare AICs. We’ll compare models with taxon specific effects of temperature and TP and non-specific effects.
fit = lm(abund ~ -1 + taxon:temp + taxon:tp, data = dat.frame)
fit.temp = lm(abund ~ -1 + taxon:temp, data = dat.frame)
fit.tp = lm(abund ~ -1 + taxon:tp, data = dat.frame)
fit.common = lm(abund ~ -1 + temp + tp, data = dat.frame)
fit.temp.common = lm(abund ~ -1 + temp, data = dat.frame)
fit.tp.common = lm(abund ~ -1 + tp, data = dat.frame)
df <- data.frame(model = c("taxon-specific both", "taxon-specific temp only",
"taxon-specific TP only", "common both", "common temp only",
"common TP only"), AIC = c(AIC(fit), AIC(fit.temp), AIC(fit.tp),
AIC(fit.common), AIC(fit.temp.common), AIC(fit.tp.common)))
kable(df, row.names = FALSE)
model | AIC |
---|---|
taxon-specific both | 1584.879 |
taxon-specific temp only | 1610.999 |
taxon-specific TP only | 1638.629 |
common both | 1659.863 |
common temp only | 1680.348 |
common TP only | 1660.665 |
We might address this by fitting a model with \(y_{t-1}\) as a covariate and comparing to the model without. We remove the NAs in the data frame so that we make sure to keep the response variable exactly the same between our fits. That allows us to compare AICs. The model with \(y_{t-1}\) as a covariate is much better.
dat.frame.no.NAs = na.omit(dat.frame)
fit.pros = lm(abund ~ -1 + taxon:abund.lag1 + taxon:temp + taxon:tp,
data = dat.frame.no.NAs)
fit.obs = lm(abund ~ -1 + taxon:temp + taxon:tp, data = dat.frame.no.NAs)
c(process = AIC(fit.pros), obs = AIC(fit.obs))
process obs
1459.625 1569.131
To answer this question, we can use AIC again. We’ll use the model with \(y_{t-1}\) as a covariate since it is much better.
fit1 = lm(abund ~ -1 + taxon:abund.lag1 + taxon:temp + taxon:tp,
data = dat.frame)
fit2 = lm(abund ~ -1 + taxon:abund.lag1 + taxon:temp + tp, data = dat.frame)
fit3 = lm(abund ~ -1 + taxon:abund.lag1 + temp + taxon:tp, data = dat.frame)
fit4 = lm(abund ~ -1 + taxon:abund.lag1 + temp + tp, data = dat.frame)
The best model is the same as what we got using the state-space model.
AIC.lm = c(AIC(fit1), AIC(fit2), AIC(fit3), AIC(fit4))
AICc.ss = c(q4.unc$AICc, q4.Tempunc$AICc, q4.TPunc$AICc, q4.cons$AICc)
df <- data.frame(Effect = c("unconstrained", "TP effect shared",
"Temp effect shared", "both effects shared"), del.AIC.lm = round(AIC.lm -
min(AIC.lm), 1), del.AICc.ss = round(AICc.ss - min(AICc.ss),
1))
kable(df, row.names = FALSE)
Effect | del.AIC.lm | del.AICc.ss |
---|---|---|
unconstrained | 4.7 | 5.4 |
TP effect shared | 0.0 | 0.0 |
Temp effect shared | 31.6 | 36.2 |
both effects shared | 60.4 | 70.8 |
This homework is to get you digging into a data set and trying to tackle some semi-open-ended questions. There is no one way to answer the questions. The approaches taken in this answer key build on the material you have been learning so far in the class.
However, there are some problems with the approaches taken here. First, in all the analyses using temperature and TP, we did not account for seasonality in the data and the covariates. This seasonality causes high correlation between the two covariates which will problems in any analysis that includes both.
Here are two common approaches to this problem:
If we use models with one covariate, say temperature only without TP, we don’t have a problem of correlation between our covariates because there is only one of them. However even when we use models with one covariate, there are problems with leaving out the seasonality. You might think that by leaving out a season term in our model, we are exploring whether temperature or TP is more important in driving the seasonality in phytoplankton. But we saw in our seasonality analysis that growth rates are highest in spring and lowest in summer and winter. So growth rates are low when the temperature is both low and high. Clearly a model where temperature (and TP) has a linear effect doesn’t make sense. We could try using a non-linear effect using polynomials of the covariates. What about using a temperature and month effect? In lm notation this would be a month:temperture
term where month is a factor. Now instead of estimating one temperature effect per taxon, we estimate 12 (one for each month). That is worth exploring. That would allow unconstrained non-linear temperature effects.
We only used lag-0 in our models. That is we used temperature at time \(t\) to drive growth rate from \(t-1\) to \(t\). Almost certainly some kind of cumulative temperature (degree days) is a better covariate. It may also that temperature and TP simply have a lagged response, so lag-2 or lag-3 would be better. You could use to explore this (ccf()
), but you would have to deal with the seasonality first. The seasonality will mask other lagged correlations that you are looking for.
The similarity between the lm()
and state-space model suggests some problems and mysteries: