9.7 Fitting with MARSS()
Now let’s go ahead and analyze the DLM specified in Equations (9.19)–(9.21). We begin by loading the data set (which is in the MARSS package). The data set has 3 columns for 1) the year the salmon smolts migrated to the ocean (year
), 2) logit-transformed survival (logit.s
), and 3) the coastal upwelling index for April (CUI.apr
). There are 42 years of data (1964–2005).
## load the data
data(SalmonSurvCUI, package = "MARSS")
## get time indices
<- SalmonSurvCUI[, 1]
years ## number of years of data
<- length(years)
TT ## get response variable: logit(survival)
<- matrix(SalmonSurvCUI[, 2], nrow = 1) dat
As we have seen in other case studies, standardizing our covariate(s) to have zero-mean and unit-variance can be helpful in model fitting and interpretation. In this case, it’s a good idea because the variance of CUI.apr
is orders of magnitude greater than logit.s
.
## get predictor variable
<- SalmonSurvCUI[, 3]
CUI ## z-score the CUI
<- matrix((CUI - mean(CUI))/sqrt(var(CUI)), nrow = 1)
CUI_z ## number of regr params (slope + intercept)
<- dim(CUI_z)[1] + 1 m
Plots of logit-transformed survival and the \(z\)-scored April upwelling index are shown in Figure 9.1.
Next, we need to set up the appropriate matrices and vectors for MARSS. Let’s begin with those for the process equation because they are straightforward.
## for process eqn
<- diag(m) ## 2x2; Identity
B <- matrix(0, nrow = m, ncol = 1) ## 2x1; both elements = 0
U <- matrix(list(0), m, m) ## 2x2; all 0 for now
Q diag(Q) <- c("q.alpha", "q.beta") ## 2x2; diag = (q1,q2)
Defining the correct form for the observation model is a little more tricky, however, because of how we model the effect(s) of predictor variables. In a DLM, we need to use \(\mathbf{Z}_t\) (instead of \(\mathbf{d}_t\)) as the matrix of predictor variables that affect \(\mathbf{y}_t\), and we use \(\mathbf{x}_t\) (instead of \(\mathbf{D}_t\)) as the regression parameters. Therefore, we need to set \(\mathbf{Z}_t\) equal to an \(n\times m\times T\) array, where \(n\) is the number of response variables (= 1; \(y_t\) is univariate), \(m\) is the number of regression parameters (= intercept + slope = 2), and \(T\) is the length of the time series (= 42).
## for observation eqn
<- array(NA, c(1, m, TT)) ## NxMxT; empty for now
Z 1, 1, ] <- rep(1, TT) ## Nx1; 1's for intercept
Z[1, 2, ] <- CUI_z ## Nx1; predictor variable
Z[<- matrix(0) ## 1x1; scalar = 0
A <- matrix("r") ## 1x1; scalar = r R
Lastly, we need to define our lists of initial starting values and model matrices/vectors.
## only need starting values for regr parameters
<- list(x0 = matrix(c(0, 0), nrow = m))
inits_list
## list of model matrices & vectors
<- list(B = B, U = U, Q = Q, Z = Z, A = A, R = R) mod_list
And now we can fit our DLM with MARSS.
## fit univariate DLM
<- MARSS(dat, inits = inits_list, model = mod_list) dlm_1
Success! abstol and log-log tests passed at 115 iterations.
Alert: conv.test.slope.tol is 0.5.
Test with smaller values (<0.1) to ensure convergence.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
Estimation converged in 115 iterations.
Log-likelihood: -40.03813
AIC: 90.07627 AICc: 91.74293
Estimate
R.r 0.15708
Q.q.alpha 0.11264
Q.q.beta 0.00564
x0.X1 -3.34023
x0.X2 -0.05388
Initial states (x0) defined at t=0
Standard errors have not been calculated.
Use MARSSparamCIs to compute CIs and bias estimates.
Notice that the MARSS output does not list any estimates of the regression parameters themselves. Why not? Remember that in a DLM the matrix of states \((\mathbf{x})\) contains the estimates of the regression parameters \((\boldsymbol{\theta})\). Therefore, we need to look in dlm_1$states
for the MLEs of the regression parameters, and in dlm_1$states.se
for their standard errors.
Time series of the estimated intercept and slope are shown in Figure 9.2. It appears as though the intercept is much more dynamic than the slope, as indicated by a much larger estimate of process variance for the former (Q.q1
). In fact, although the effect of April upwelling appears to be increasing over time, it doesn’t really become important as a predictor variable until about 1990 when the approximate 95% confidence interval for the slope no longer overlaps zero.