8.6 Including seasonal effects in MARSS models
Time-series data are often collected at intervals with some implicit ``seasonality.’’ For example, quarterly earnings for a business, monthly rainfall totals, or hourly air temperatures. In those cases, it is often helpful to extract any recurring seasonal patterns that might otherwise mask some of the other temporal dynamics we are interested in examining.
Here we show a few approaches for including seasonal effects using the Lake Washington plankton data, which were collected monthly. The following examples will use all five phytoplankton species from Lake Washington. First, let’s set up the data.
<- fulldat[, "Year"] >= 1965 & fulldat[, "Year"] < 1975
years <- c("Diatoms", "Greens", "Bluegreens", "Unicells", "Other.algae")
phytos <- t(fulldat[years, phytos])
dat
# z.score data because we changed the mean when we
# subsampled
<- apply(dat, 1, mean, na.rm = TRUE)
the.mean <- sqrt(apply(dat, 1, var, na.rm = TRUE))
the.sigma <- (dat - the.mean) * (1/the.sigma)
dat # number of time periods/samples
<- dim(dat)[2] TT
8.6.1 Seasonal effects as fixed factors
One common approach for estimating seasonal effects is to treat each one as a fixed factor, such that the number of parameters equals the number of ``seasons’’ (e.g., 24 hours per day, 4 quarters per year). The plankton data are collected monthly, so we will treat each month as a fixed factor. To fit a model with fixed month effects, we create a \(12 \times T\) covariate matrix \(\mathbf{c}\) with one row for each month (Jan, Feb, …) and one column for each time point. We put a 1 in the January row for each column corresponding to a January time point, a 1 in the February row for each column corresponding to a February time point, and so on. All other values of \(\mathbf{c}\) equal 0. The following code will create such a \(\mathbf{c}\) matrix.
# number of 'seasons' (e.g., 12 months per year)
<- 12
period # first 'season' (e.g., Jan = 1, July = 7)
.1st <- 1
per# create factors for seasons
<- diag(period)
c.in for (i in 2:(ceiling(TT/period))) {
<- cbind(c.in, diag(period))
c.in
}# trim c.in to correct start & length
<- c.in[, (1:TT) + (per.1st - 1)]
c.in # better row names
rownames(c.in) <- month.abb
Next we need to set up the form of the \(\mathbf{C}\) matrix which defines any constraints we want to set on the month effects. \(\mathbf{C}\) is a \(5 \times 12\) matrix. Five taxon and 12 month effects. If we wanted each taxon to have the same month effect, i.e. there is a common month effect across all taxon, then we have the same value in each \(\mathbf{C}\) column:
<- matrix(month.abb, 5, 12, byrow = TRUE)
C C
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
[2,] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
[3,] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
[4,] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
[5,] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
Notice, that \(\mathbf{C}\) only has 12 values in it, the 12 common month effects. However, for this example, we will let each taxon have a different month effect thus allowing different seasonality for each taxon. For this model, we want each value in \(\mathbf{C}\) to be unique:
<- "unconstrained" C
Now \(\mathbf{C}\) has 5 \(\times\) 12 = 60 separate effects.
Then we set up the form for the rest of the model parameters. We make the following assumptions:
# Each taxon has unique density-dependence
<- "diagonal and unequal"
B # Assume independent process errors
<- "diagonal and unequal"
Q # We have demeaned the data & are fitting a mean-reverting
# model by estimating a diagonal B, thus
<- "zero"
U # Each obs time series is associated with only one process
<- "identity"
Z # The data are demeaned & fluctuate around a mean
<- "zero"
A # We assume observation errors are independent, but they
# have similar variance due to similar collection methods
<- "diagonal and equal"
R # We are not including covariate effects in the obs
# equation
<- "zero"
D <- "zero" d
Now we can set up the model list for MARSS and fit the model (results are not shown since they are verbose with 60 different month effects).
<- list(B = B, U = U, Q = Q, Z = Z, A = A, R = R,
model.list C = C, c = c.in, D = D, d = d)
.1 <- MARSS(dat, model = model.list, control = list(maxit = 1500))
seas.mod
# Get the estimated seasonal effects rows are taxa, cols
# are seasonal effects
.1 <- coef(seas.mod.1, type = "matrix")$C
seasrownames(seas.1) <- phytos
colnames(seas.1) <- month.abb
The top panel in Figure 8.2 shows the estimated seasonal effects for this model. Note that if we had set U=“unequal,” we would need to set one of the columns of \(\mathbf{C}\) to zero because the model would be under-determined (infinite number of solutions). If we substracted the mean January abundance off each time series, we could set the January column in \(\mathbf{C}\) to 0 and get rid of 5 estimated effects.
8.6.2 Seasonal effects as a polynomial
The fixed factor approach required estimating 60 effects. Another approach is to model the month effect as a 3rd-order (or higher) polynomial: \(a+b\times m + c\times m^2 + d \times m^3\) where \(m\) is the month number. This approach has less flexibility but requires only 20 estimated parameters (i.e., 4 regression parameters times 5 taxa). To do so, we create a \(4 \times T\) covariate matrix \(\mathbf{c}\) with the rows corresponding to 1, \(m\), \(m^2\), and \(m^3\), and the columns again corresponding to the time points. Here is how to set up this matrix:
# number of 'seasons' (e.g., 12 months per year)
<- 12
period # first 'season' (e.g., Jan = 1, July = 7)
.1st <- 1
per# order of polynomial
<- 3
poly.order # create polynomials of months
<- matrix(1, 1, period)
month.cov for (i in 1:poly.order) {
= rbind(month.cov, (1:12)^i)
month.cov
}# our c matrix is month.cov replicated once for each year
<- matrix(month.cov, poly.order + 1, TT + period, byrow = FALSE)
c.m.poly # trim c.in to correct start & length
<- c.m.poly[, (1:TT) + (per.1st - 1)]
c.m.poly
# Everything else remains the same as in the previous
# example
<- list(B = B, U = U, Q = Q, Z = Z, A = A, R = R,
model.list C = C, c = c.m.poly, D = D, d = d)
.2 <- MARSS(dat, model = model.list, control = list(maxit = 1500)) seas.mod
The effect of month \(m\) for taxon \(i\) is \(a_i + b_i \times m + c_i \times m^2 + d_i \times m^3\), where \(a_i\), \(b_i\), \(c_i\) and \(d_i\) are in the \(i\)-th row of \(\mathbf{C}\). We can now calculate the matrix of seasonal effects as follows, where each row is a taxon and each column is a month:
.2 = coef(seas.mod.2, type = "matrix")$C
C.2 = C.2 %*% month.cov
seasrownames(seas.2) <- phytos
colnames(seas.2) <- month.abb
The middle panel in Figure 8.2 shows the estimated seasonal effects for this polynomial model.
Note: Setting the covariates up like this means that our covariates are collinear since \(m\), \(m^2\) and \(m^3\) are correlated, obviously. A better approach is to use the poly()
function to create an orthogonal polynomial covariate matrix c.m.poly.o
:
<- cbind(1, poly(1:period, poly.order))
month.cov.o <- matrix(t(month.cov.o), poly.order + 1, TT + period,
c.m.poly.o byrow = FALSE)
<- c.m.poly.o[, (1:TT) + (per.1st - 1)] c.m.poly.o
8.6.3 Seasonal effects as a Fourier series
The factor approach required estimating 60 effects, and the 3rd order polynomial model was an improvement at only 20 parameters. A third option is to use a discrete Fourier series, which is combination of sine and cosine waves; it would require only 10 parameters. Specifically, the effect of month \(m\) on taxon \(i\) is \(a_i \times \cos(2 \pi m/p) + b_i \times \sin(2 \pi m/p)\), where \(p\) is the period (e.g., 12 months, 4 quarters), and \(a_i\) and \(b_i\) are contained in the \(i\)-th row of \(\mathbf{C}\).
We begin by defining the \(2 \times T\) seasonal covariate matrix \(\mathbf{c}\) as a combination of 1 cosine and 1 sine wave:
<- cos(2 * pi * seq(TT)/period)
cos.t <- sin(2 * pi * seq(TT)/period)
sin.t <- rbind(cos.t, sin.t) c.Four
Everything else remains the same and we can fit this model as follows:
<- list(B = B, U = U, Q = Q, Z = Z, A = A, R = R,
model.list C = C, c = c.Four, D = D, d = d)
.3 <- MARSS(dat, model = model.list, control = list(maxit = 1500)) seas.mod
We make our seasonal effect matrix as follows:
.3 <- coef(seas.mod.3, type = "matrix")$C
C# The time series of net seasonal effects
.3 <- C.3 %*% c.Four[, 1:period]
seasrownames(seas.3) <- phytos
colnames(seas.3) <- month.abb
The bottom panel in Figure 8.2 shows the estimated seasonal effects for this seasonal-effects model based on a discrete Fourier series.
Rather than rely on our eyes to judge model fits, we should formally assess which of the 3 approaches offers the most parsimonious fit to the data. Here is a table of AICc values for the 3 models:
data.frame(Model = c("Fixed", "Cubic", "Fourier"), AICc = round(c(seas.mod.1$AICc,
.2$AICc, seas.mod.3$AICc), 1)) seas.mod
Model AICc
1 Fixed 1188.4
2 Cubic 1144.9
3 Fourier 1127.4
The model selection results indicate that the model with monthly seasonal effects estimated via the discrete Fourier sequence is the best of the 3 models. Its AICc value is much lower than either the polynomial or fixed-effects models.