Recall from lecture that a DLM is a state-space model that can be written in MARSS form:
\[\begin{equation} y_t = \mathbf{F}^{\top}_t \boldsymbol{\theta}_t + e_t \\ \boldsymbol{\theta}_t = \mathbf{G} \boldsymbol{\theta}_{t-1} + \mathbf{w}_t \\ \Downarrow \\ y_t = \mathbf{Z}_t \mathbf{x}_t + v_t \\ \mathbf{x}_t = \mathbf{B} \mathbf{x}_{t-1} + \mathbf{w}_t \end{equation}\]
The simplest DLM is a stochastic level model, where the level is a random walk without drift, and this level is observed with error. We will write it first using regression notation where the intercept is \(\alpha\) and then in MARSS notation. In the latter, \(x_t = \alpha_t\).
\[\begin{equation} y_t = \alpha_t + e_t \\ \alpha_t = \alpha_{t-1} + w_t \\ \Downarrow \\ y_t = x_t + v_t \\ x_t = x_{t-1} + w_t \end{equation}\]
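To build some intuition, here is a minimal simulation of this stochastic level model; the variances and series length are arbitrary values chosen only for illustration.
## simulate a stochastic level observed with error
set.seed(123)
TT <- 50                                        ## length of time series
q <- 0.1                                        ## process (level) variance
r <- 0.5                                        ## observation variance
xx <- cumsum(rnorm(TT, mean = 0, sd = sqrt(q))) ## random walk for the level
yy <- xx + rnorm(TT, mean = 0, sd = sqrt(r))    ## level observed with error
## plot the observations and the true level
plot.ts(yy, las = 1, ylab = expression(italic(y[t])))
lines(xx, lwd = 2, col = "blue")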
We can model the flow of the Nile River with a stochastic level model fit via MARSS(). Here’s a plot of the data.
## load MARSS package
library(MARSS)
## load Nile flow data
data(Nile, package = "datasets")
## plot the flow
plot.ts(Nile, las = 1, lwd = 2,
xlab = "Year", ylab = "Flow of the River Nile")
To fit the model with MARSS(), we need to define our model list as with other MARSS models. This model is simply a random walk with observation error, so the definition is pretty straightforward. However, because the data have a non-zero mean, we need an observation model where
\[ y_t = x_t + a + v_t. \]
## define model list
mod_list <- list(B = "identity", U = "zero", Q = matrix("q"),
Z = "identity", A = matrix("a"), R = matrix("r"))
## fit the model with MARSS
fit <- MARSS(matrix(Nile, nrow = 1), mod_list)
## Success! abstol and log-log tests passed at 82 iterations.
## Alert: conv.test.slope.tol is 0.5.
## Test with smaller values (<0.1) to ensure convergence.
##
## MARSS fit is
## Estimation method: kem
## Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
## Estimation converged in 82 iterations.
## Log-likelihood: -637.7569
## AIC: 1283.514 AICc: 1283.935
##
## Estimate
## A.a -0.338
## R.r 15135.796
## Q.q 1381.153
## x0.x0 1111.791
## Initial states (x0) defined at t=0
##
## Standard errors have not been calculated.
## Use MARSSparamCIs to compute CIs and bias estimates.
## plot the flow and fit
plot.ts(Nile, las = 1, lwd = 2,
xlab = "Year", ylab = "Flow of the River Nile")
lines(seq(start(Nile)[1], end(Nile)[1]), t(fit$states),
      lwd = 2, col = "blue")
lines(seq(start(Nile)[1], end(Nile)[1]), t(fit$states + 2*fit$states.se),
lwd = 2, lty = "dashed", col = "blue")
lines(seq(start(Nile)[1], end(Nile)[1]), t(fit$states - 2*fit$states.se),
lwd = 2, lty = "dashed", col = "blue")
We can add a drift term to the stochastic level model above to allow the level to tend upward or downward with a deterministic rate \(\eta\). This is simply another name for a random walk with bias.
\[\begin{equation} y_t = \alpha_t + e_t \\ \alpha_t = \alpha_{t-1} + \eta + w_t \\ \Downarrow \\ y_t = x_t + v_t \\ x_t = x_{t-1} + u + w_t \end{equation}\]
In a DLM we can allow the drift term \(\eta\) to evolve over time along with the level. In this case, \(\eta\) is modeled as a random walk along with \(\alpha\). This model is
\[\begin{equation} y_t = \alpha_t + e_t \\ \alpha_t = \alpha_{t-1} + \eta_{t-1} + w_{\alpha,t} \\ \eta_t = \eta_{t-1} + w_{\eta,t} \end{equation}\]
This equation can be written in matrix form as:
\[\begin{equation} y_t = \begin{bmatrix}1&0\end{bmatrix}\begin{bmatrix} \alpha \\ \eta \end{bmatrix}_t + v_t \\ \begin{bmatrix} \alpha \\ \eta \end{bmatrix}_t = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} \alpha \\ \eta \end{bmatrix}_{t-1} + \begin{bmatrix} w_{\alpha} \\ w_{\eta} \end{bmatrix}_t \end{equation}\]
and in the more common MARSS form as
\[\begin{equation} y_t = \mathbf{Z}\mathbf{x}_t + v_t \\ \mathbf{x}_t = \mathbf{B}\mathbf{x}_{t-1} + \mathbf{w}_t \end{equation}\]
where \(\mathbf{B}=\begin{bmatrix} 1 & 1 \\ 0 & 1\end{bmatrix}\), \(\mathbf{x}=\begin{bmatrix}\alpha \\ \eta\end{bmatrix}\), and \(\mathbf{Z}=\begin{bmatrix}1&0\end{bmatrix}\).
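Before fitting anything, it can help to simulate from this model; here is a small sketch where the level and drift variances are arbitrary values chosen only for illustration.
## simulate a stochastic level with a stochastic drift
set.seed(456)
TT <- 40                 ## length of time series
q_a <- 0.1               ## variance of level innovations
q_e <- 0.01              ## variance of drift innovations
r <- 1                   ## observation variance
alpha <- eta <- numeric(TT)
alpha[1] <- 5
eta[1] <- 0.3
for(t in 2:TT) {
  eta[t] <- eta[t-1] + rnorm(1, 0, sqrt(q_e))
  alpha[t] <- alpha[t-1] + eta[t-1] + rnorm(1, 0, sqrt(q_a))
}
yy <- alpha + rnorm(TT, 0, sqrt(r))
## plot the simulated observations
plot.ts(yy, las = 1, ylab = expression(italic(y[t])))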
We can use a model with a stochastic level and drift to fit data with an apparent exponential increase (or decrease) over time. Here are some example data.
## some example data
exp_dat <- c(11.68, 7.68, 7.56, 7.73, 7.34, 8.75, 12.09, 12.77,
6.57, 16.77, 15.45, 15.06, 11.23, 10.58, 14.02, 17.81,
15.8, 15.31, 15.17, 19.87, 20.03, 21.23, 21.41, 21.12,
21.91, 21.33, 28.72, 27.77, 25.68, 28.82, 32.51, 32.14,
35.21, 35.35, 36.19, 37.28, 38.59, 38.58, 43.84, 45.59)
## plot them
plot.ts(exp_dat, las = 1, lwd = 2,
ylab = expression(italic(y[t])))
Fitting this model with MARSS() is a bit tricky in that we can’t use a “canned” form for \(\mathbf{B}\), and we have to specify \(\mathbf{Z}\) as well. The rest is pretty straightforward, though. In addition, we’ll need to give MARSS() some initial conditions in order for it to get started properly.
## B matrix
BB <- matrix(c(1, 0, 1, 1), 2, 2)
## Z matrix (vector)
ZZ <- matrix(c(1, 0), 1, 2)
## define model list
mod_list <- list(B = BB, U = "zero", Q = "diagonal and equal",
Z = ZZ, A = matrix(0), R = matrix("r"))
## initial starting values for parameters
inits_list <- list(x0 = matrix(c(0, 0), 2, 1))
## fit the model with MARSS
fit_2 <- MARSS(matrix(exp_dat, nrow = 1),
model = mod_list,
inits = inits_list)
## Success! abstol and log-log tests passed at 146 iterations.
## Alert: conv.test.slope.tol is 0.5.
## Test with smaller values (<0.1) to ensure convergence.
##
## MARSS fit is
## Estimation method: kem
## Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
## Estimation converged in 146 iterations.
## Log-likelihood: -93.27837
## AIC: 194.5567 AICc: 195.6996
##
## Estimate
## R.r 4.6768
## Q.diag 0.0146
## x0.X1 7.9831
## x0.X2 0.3084
## Initial states (x0) defined at t=0
##
## Standard errors have not been calculated.
## Use MARSSparamCIs to compute CIs and bias estimates.
Interpreting the output from this model is a bit tricky because the first state is the sum of the stochastic level and the drift terms. Here are plots of the estimated level and drift time series.
## get estimates of alpha (remove the drift eta at t-1 from the state at t)
alpha_hat <- c(fit_2$states[1,1],
               fit_2$states[1,-1] - fit_2$states[2,-length(exp_dat)])
## get estimates of eta
eta_hat <- fit_2$states[2,]
## plot the estimated level and drift
par(mfrow = c(2,1), mai = c(0.8, 0.8, 0.2, 0.2), omi = c(0, 0, 0, 0))
## plot alpha
plot.ts(alpha_hat, las = 1, lwd = 2, col = "blue",
ylab = expression(alpha[t]))
## plot eta
plot.ts(eta_hat, las = 1, lwd = 2, col = "blue",
ylab = expression(eta[t]))
Here is a plot of the original data with the model fit and ~95% confidence interval.
## plot the data and fit
plot.ts(exp_dat, las = 1, lwd = 2,
ylab = expression(italic(y[t])))
lines(seq(40), fit_2$states[1,],
lwd = 2, col = "blue")
lines(seq(40), fit_2$states[1,] + 2*fit_2$states.se[1,],
lwd = 2, lty = "dashed", col = "blue")
lines(seq(40), fit_2$states[1,] - 2*fit_2$states.se[1,],
lwd = 2, lty = "dashed", col = "blue")
The stochastic level models above do not have any predictor variables (covariates). Let’s add one predictor variable \(f_t\) and write a simple DLM where the intercept \(\alpha\) and slope \(\beta\) are stochastic. We will specify that \(\alpha\) and \(\beta\) evolve according to a simple random walk. Normally \(x\) is used for the predictor variables in a regression model, but we will avoid that since we are using \(x\) for the state equation in a state-space model. This model is
\[\begin{equation} y_t = \alpha_t + \beta_t f_t + v_t \\ \alpha_t = \alpha_{t-1} + w_{\alpha,t} \\ \beta_t = \beta_{t-1} + w_{\beta,t} \end{equation}\]
Written in matrix form, the model is
\[\begin{equation} y_t = \begin{bmatrix} 1 & f_t \end{bmatrix}\begin{bmatrix} \alpha \\ \beta \end{bmatrix}_t + v_t \\ \begin{bmatrix} \alpha \\ \beta \end{bmatrix}_t = \begin{bmatrix} \alpha \\ \beta \end{bmatrix}_{t-1} + \begin{bmatrix} w_{\alpha} \\ w_{\beta} \end{bmatrix}_t \end{equation}\]
This equation is a MARSS model where
\[\begin{equation} y_t = \mathbf{Z}\mathbf{x}_t + v_t \\ \mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{w}_t \end{equation}\]
and \(\mathbf{x}=\begin{bmatrix}\alpha \\ \beta\end{bmatrix}\) and \(\mathbf{Z}=\begin{bmatrix}1&f_t\end{bmatrix}\).
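As a quick illustration of what such data look like, here is a small simulation of this time-varying regression; the predictor ff and all variances are made-up values, not part of the analysis that follows.
## simulate a regression with a time-varying intercept & slope
set.seed(789)
TT <- 40
ff <- rnorm(TT)                              ## made-up predictor
alpha <- 2 + cumsum(rnorm(TT, 0, 0.1))       ## random walk intercept
beta <- 0.5 + cumsum(rnorm(TT, 0, 0.05))     ## random walk slope
yy <- alpha + beta * ff + rnorm(TT, 0, 0.3)  ## observations
## plot the observations, intercept, and slope
plot.ts(cbind(yy, alpha, beta))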
Let’s see an example of a DLM used to analyze real data from the literature. Scheuerell and Williams (2005) used a DLM to examine the relationship between marine survival of Chinook salmon and an index of ocean upwelling strength along the west coast of the USA. Upwelling brings cool, nutrient-rich waters from the deep ocean to shallower coastal areas. Scheuerell and Williams hypothesized that stronger upwelling in April should create better growing conditions for phytoplankton, which would then translate into more zooplankton. In turn, juvenile salmon (“smolts”) entering the ocean in May and June should find better foraging opportunities. Thus, for smolts entering the ocean in year \(t\),
\[\begin{equation} \text{logit}(s_t) = \alpha_t + \beta_t f_t + v_t ~ \text{ with } ~ v_{t}\sim\text{N}(0,r), \end{equation}\]
where \(s_t\) is the survival for migration year \(t\), and \(f_t\) is the coastal upwelling index (cubic meters of seawater per second per 100 m of coastline) for the month of April in year \(t\).
Both the intercept and slope are time varying, so
\[\begin{align} \alpha_t &= \alpha_{t-1} + w_{\alpha,t} ~ \text{ with } ~ w_{\alpha,t} \sim \text{N}(0,q_{\alpha}) \\ \beta_t &= \beta_{t-1} + w_{\beta,t} ~ \text{ with } ~ w_{\beta,t} \sim \text{N}(0,q_{\beta}). \end{align}\]
Let’s define the following terms so we can write out the full DLM as a state-space model:
\[\begin{equation} \boldsymbol{\theta}_t = \begin{bmatrix}\alpha \\ \beta\end{bmatrix}_t \\ \mathbf{G}_t = \mathbf{I} \\ \mathbf{w}_t = \begin{bmatrix} w_\alpha \\ w_\beta\end{bmatrix}_t \\ \mathbf{Q} = \begin{bmatrix}q_\alpha& 0 \\ 0&q_\beta\end{bmatrix} \\ y_t = \text{logit}(s_t) \\ \mathbf{F}_t = \begin{bmatrix}1 \\ f_t\end{bmatrix} \end{equation}\]
That leaves us with the following form:
\[\begin{equation} y_t = \mathbf{F}_t^{\top}\boldsymbol{\theta}_t + v_t \text{ with } v_t\sim\text{N}(0,r)\\ \boldsymbol{\theta}_t = \boldsymbol{\theta}_{t-1} + \mathbf{w}_t \text{ with } \mathbf{w}_t \sim \text{MVN}(\mathbf{0},\mathbf{Q}) \end{equation}\]
This equation is equivalent to our standard MARSS model:
\[\begin{equation} \mathbf{y}_t = \mathbf{Z}_t\mathbf{x}_t + \mathbf{a} + \mathbf{v}_t \text{ with } \mathbf{v}_t \sim \text{MVN}(0,\mathbf{R}_t)\\ \mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{u} + \mathbf{w}_t \text{ with } \mathbf{w}_t \sim \text{MVN}(0,\mathbf{Q}_t) \end{equation}\]
where \(\mathbf{x}_t = \boldsymbol{\theta}_t\), \(\mathbf{y}_t = y_t\) (i.e., \(\mathbf{y}_t\) is 1 \(\times\) 1), \(\mathbf{Z}_t = \mathbf{F}_t^{\top}\), \(\mathbf{a} = \mathbf{u} = \mathbf{0}\), and \(\mathbf{R}_t = r\) (i.e., \(\mathbf{R}_t\) is 1 \(\times\) 1).
Let’s go ahead and fit the DLM specified above. Begin by loading the data set (which is in the MARSS package). The data set has 3 columns:
year: the year the salmon smolts migrated to the ocean
logit.s: logit-transformed survival
CUI.apr: the coastal upwelling index for April
There are 42 years of data (1964–2005).
## load the data
data(SalmonSurvCUI, package = "MARSS")
## get time indices
years <- SalmonSurvCUI[, 1]
## number of years of data
TT <- length(years)
## get response variable: logit(survival)
dat <- matrix(SalmonSurvCUI[,2], nrow = 1)
As we have seen in other case studies, standardizing our covariate(s) to have zero mean and unit variance can be helpful in model fitting and interpretation. In this case, it’s a good idea because the variance of CUI.apr is orders of magnitude greater than that of logit.s.
## get predictor variable
CUI <- SalmonSurvCUI[,3]
## z-score the CUI
CUI_z <- matrix((CUI - mean(CUI)) / sqrt(var(CUI)), nrow = 1)
## number of regr params (slope + intercept)
m <- dim(CUI_z)[1] + 1
Plots of logit-transformed survival and the \(z\)-scored April upwelling index are shown in the figure below.
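The plotting code is not shown above, but a sketch like the following would produce that figure.
## plot logit(survival) and the z-scored CUI
par(mfrow = c(2, 1), mai = c(0.8, 0.8, 0.2, 0.2), omi = c(0, 0, 0, 0))
plot(years, dat[1, ], type = "l", lwd = 2, las = 1,
     xlab = "Year", ylab = expression(logit(italic(s[t]))))
plot(years, CUI_z[1, ], type = "l", lwd = 2, las = 1,
     xlab = "Year", ylab = "CUI (z-score)")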
Next, we need to set up the appropriate matrices and vectors for MARSS. Let’s begin with those for the process equation because they are straightforward.
## for process eqn
B <- diag(m) ## 2x2; Identity
U <- matrix(0, nrow = m, ncol = 1) ## 2x1; both elements = 0
Q <- matrix(list(0), m, m) ## 2x2; all 0 for now
diag(Q) <- c("q.alpha", "q.beta") ## 2x2; diag = (q1,q2)
Defining the correct form for the observation model is a little more tricky, however, because of how we model the effect(s) of predictor variables. In a DLM, we need to use \(\mathbf{Z}_t\) (instead of \(\mathbf{d}_t\)) as the matrix of predictor variables that affect \(\mathbf{y}_t\), and we use \(\mathbf{x}_t\) (instead of \(\mathbf{D}_t\)) as the regression parameters. Therefore, we need to set \(\mathbf{Z}_t\) equal to an \(n\times m\times T\) array, where \(n\) is the number of response variables (= 1; \(y_t\) is univariate), \(m\) is the number of regression parameters (= intercept + slope = 2), and \(T\) is the length of the time series (= 42).
## for observation eqn
Z <- array(NA, c(1, m, TT)) ## NxMxT; empty for now
Z[1,1,] <- rep(1, TT) ## Nx1; 1's for intercept
Z[1,2,] <- CUI_z ## Nx1; predictor variable
A <- matrix(0) ## 1x1; scalar = 0
R <- matrix("r") ## 1x1; scalar = r
Lastly, we need to define our lists of initial starting values and model matrices/vectors.
## only need starting values for regr parameters
inits_list <- list(x0 = matrix(c(0, 0), nrow = m))
## list of model matrices & vectors
mod_list <- list(B = B, U = U, Q = Q, Z = Z, A = A, R = R)
And now we can fit our DLM with MARSS.
## fit univariate DLM
dlm_s <- MARSS(dat, model = mod_list, inits = inits_list)
## Success! abstol and log-log tests passed at 115 iterations.
## Alert: conv.test.slope.tol is 0.5.
## Test with smaller values (<0.1) to ensure convergence.
##
## MARSS fit is
## Estimation method: kem
## Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
## Estimation converged in 115 iterations.
## Log-likelihood: -40.03813
## AIC: 90.07627 AICc: 91.74293
##
## Estimate
## R.r 0.15708
## Q.q.alpha 0.11264
## Q.q.beta 0.00564
## x0.X1 -3.34023
## x0.X2 -0.05388
## Initial states (x0) defined at t=0
##
## Standard errors have not been calculated.
## Use MARSSparamCIs to compute CIs and bias estimates.
Notice that the MARSS output does not list any estimates of the regression parameters themselves. Why not? Remember that in a DLM the matrix of states \((\mathbf{x})\) contains the estimates of the regression parameters \((\boldsymbol{\theta})\). Therefore, we need to look in dlm_s$states for the MLEs of the regression parameters, and in dlm_s$states.se for their standard errors.
Time series of the estimated intercept and slope are shown in the figure below. It appears as though the intercept is much more dynamic than the slope, as indicated by a much larger estimate of process variance for the former (Q.q.alpha). In fact, although the effect of April upwelling appears to be increasing over time, it doesn’t really become important as a predictor variable until about 1990, when the approximate 95% confidence interval for the slope no longer overlaps zero.
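The plotting code is again not shown, but a sketch like the following, which pulls the states and their standard errors from the fitted object, would produce the figure.
## estimated regression parameters & their SEs (2xT matrices)
theta_hat <- dlm_s$states
theta_se <- dlm_s$states.se
## plot each parameter with +/- 2 SE
ylabs <- c(expression(alpha[t]), expression(beta[t]))
par(mfrow = c(2, 1), mai = c(0.8, 0.8, 0.2, 0.2), omi = c(0, 0, 0, 0))
for(i in 1:2) {
  plot(years, theta_hat[i, ], type = "l", lwd = 2, col = "blue", las = 1,
       ylim = range(theta_hat[i, ] - 2*theta_se[i, ], theta_hat[i, ] + 2*theta_se[i, ]),
       xlab = "Year", ylab = ylabs[i])
  lines(years, theta_hat[i, ] + 2*theta_se[i, ], lty = "dashed", col = "blue")
  lines(years, theta_hat[i, ] - 2*theta_se[i, ], lty = "dashed", col = "blue")
}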
Forecasting from a DLM involves two steps:
Get an estimate of the regression parameters at time \(t\) from data up to time \(t-1\). These are also called the one-step ahead forecast (or prediction) of the regression parameters.
Make a prediction of \(y\) at time \(t\) based on the predictor variables at time \(t\) and the estimate of the regression parameters at time \(t\) (step 1). This is also called the one-step ahead forecast (or prediction) of the observation.
For step 1, we want to compute the distribution of the regression parameters at time \(t\) conditioned on the data up to time \(t-1\), also known as the one-step ahead forecasts of the regression parameters. Let’s denote \(\boldsymbol{\theta}_{t-1}\) conditioned on \(y_{1:t-1}\) as \(\boldsymbol{\theta}_{t-1|t-1}\) and denote \(\boldsymbol{\theta}_{t}\) conditioned on \(y_{1:t-1}\) as \(\boldsymbol{\theta}_{t|t-1}\). We will start by defining the distribution of \(\boldsymbol{\theta}_{t|t}\) as follows
\[\begin{equation} \boldsymbol{\theta}_{t|t} \sim \text{MVN}(\boldsymbol{\pi}_t,\boldsymbol{\Lambda}_t) \end{equation}\] where \(\boldsymbol{\pi}_t = \text{E}(\boldsymbol{\theta}_{t|t})\) and \(\mathbf{\Lambda}_t = \text{Var}(\boldsymbol{\theta}_{t|t})\).
Now we can compute the distribution of \(\boldsymbol{\theta}_{t}\) conditioned on \(y_{1:t-1}\) using the process equation for \(\boldsymbol{\theta}\):
\[\begin{equation} \boldsymbol{\theta}_{t} = \mathbf{G}_t \boldsymbol{\theta}_{t-1} + \mathbf{w}_t ~ \text{with} ~ \mathbf{w}_t \sim \text{MVN}(\mathbf{0}, \mathbf{Q}) \\ \end{equation}\]
The expected value of \(\boldsymbol{\theta}_{t|t-1}\) is thus
\[\begin{equation} \text{E}(\boldsymbol{\theta}_{t|t-1}) = \mathbf{G}_t \text{E}(\boldsymbol{\theta}_{t-1|t-1}) = \mathbf{G}_t \boldsymbol{\pi}_{t-1} \end{equation}\]
The variance of \(\boldsymbol{\theta}_{t|t-1}\) is
\[\begin{equation} \text{Var}(\boldsymbol{\theta}_{t|t-1}) = \mathbf{G}_t \text{Var}(\boldsymbol{\theta}_{t-1|t-1}) \mathbf{G}_t^{\top} + \mathbf{Q} = \mathbf{G}_t \mathbf{\Lambda}_{t-1} \mathbf{G}_t^{\top} + \mathbf{Q} \end{equation}\]
Thus the distribution of \(\boldsymbol{\theta}_{t}\) conditioned on \(y_{1:t-1}\) is
\[\begin{equation} \boldsymbol{\theta}_{t|t-1} \sim \text{MVN}(\mathbf{G}_t \boldsymbol{\pi}_{t-1}, \mathbf{G}_t \mathbf{\Lambda}_{t-1} \mathbf{G}_t^{\top} + \mathbf{Q}) \end{equation}\]
For step 2, we make the prediction of \(y_{t}\) given the predictor variables at time \(t\) and the estimate of the regression parameters at time \(t\). This is called the one-step ahead prediction for the observation at time \(t\). We will denote the prediction of \(y\) as \(\hat{y}\) and we want to compute its distribution (mean and variance). We do this using the equation for \(y_t\) but substituting the expected value of \(\boldsymbol{\theta}_{t|t-1}\) for \(\boldsymbol{\theta}_t\).
\[\begin{equation} \hat{y}_{t|t-1} = \mathbf{F}^{\top}_{t} \text{E}(\boldsymbol{\theta}_{t|t-1}) + e_{t} ~ \text{with} ~ e_{t} \sim \text{N}(0, r) \\ \end{equation}\]
Our prediction of \(y\) at \(t\) has a normal distribution with mean (expected value) and variance. The expected value of \(\hat{y}_{t|t-1}\) is
\[\begin{equation} \text{E}(\hat{y}_{t|t-1}) = \mathbf{F}^{\top}_{t} \text{E}(\boldsymbol{\theta}_{t|t-1}) = \mathbf{F}^{\top}_{t} (\mathbf{G}_t \boldsymbol{\pi}_{t-1}) \\ \end{equation}\]
and the variance of \(\hat{y}_{t|t-1}\) is
\[\begin{align} \text{Var}(\hat{y}_{t|t-1}) &= \mathbf{F}^{\top}_{t} \text{Var}(\boldsymbol{\theta}_{t|t-1}) \mathbf{F}_{t} + r \\ &= \mathbf{F}^{\top}_{t} (\mathbf{G}_t \mathbf{\Lambda}_{t-1} \mathbf{G}_t^{\top} + \mathbf{Q}) \mathbf{F}_{t} + r \\ \end{align}\]
The expectations and variance of \(\boldsymbol{\theta}_t\) conditioned on \(y_{1:t}\) and \(y_{1:t-1}\) are standard output from the Kalman filter. Thus, to produce the predictions, all we need to do is run our DLM state-space model through a Kalman filter to get \(\text{E}(\boldsymbol{\theta}_{t|t-1})\) and \(\text{Var}(\boldsymbol{\theta}_{t|t-1})\) and then use the above equations to compute the mean prediction and its variance. The Kalman filter will need \(\mathbf{F}_t\), \(\mathbf{G}_t\) and estimates of \(\mathbf{Q}\) and \(r\).
Let’s see an example with the salmon survival DLM. We will use the Kalman filter function in the MARSS package and the DLM fit from MARSS().
Scheuerell and Williams (2005) were interested in how well upwelling could be used to actually forecast expected survival of salmon, so let’s look at how well our model does in that context. To do so, we need the predictive distribution for the survival at time \(t\) given the upwelling at time \(t\) and the predicted regression parameters at \(t\).
In the salmon survival DLM, the \(\mathbf{G}_t\) matrix is the identity matrix, so the mean and variance of the one-step ahead predictive distribution for the observation at time \(t\) reduce to
\[\begin{equation} \text{E}(\hat{y}_{t|t-1}) = \mathbf{F}^{\top}_{t} \text{E}(\boldsymbol{\theta}_{t|t-1}) \\ \text{Var}(\hat{y}_{t|t-1}) = \mathbf{F}^{\top}_{t} \text{Var}(\boldsymbol{\theta}_{t|t-1}) \mathbf{F}_{t} + \hat{r} \end{equation}\]
where
\[ \mathbf{F}_{t}=\begin{bmatrix}1 \\ f_{t}\end{bmatrix} \]
and \(f_{t}\) is the upwelling index at time \(t\). \(\hat{r}\) is the estimated observation variance from our model fit.
For the expectation, we need \(\mathbf{F}_{t}^\top\text{E}(\boldsymbol{\theta}_{t|t-1})\). \(\mathbf{F}_t^\top\) is called \(\mathbf{Z}_t\) in MARSS notation. The one-step ahead forecasts of the regression parameters at time \(t\), the \(\text{E}(\boldsymbol{\theta}_{t|t-1})\), are calculated as part of the Kalman filter algorithm—they are termed \(\tilde{x}_t^{t-1}\) in MARSS notation and stored as xtt1 in the list produced by the MARSSkfss() Kalman filter function.
Using the Z array defined above, we compute the mean forecast as follows:
## get list of Kalman filter output
kf_out <- MARSSkfss(dlm_s)
## forecasts of regr parameters; 2xT matrix
eta <- kf_out$xtt1
## ts of E(forecasts)
fore_mean <- vector()
for(t in 1:TT) {
fore_mean[t] <- Z[,,t] %*% eta[, t, drop = FALSE]
}
For the variance of the forecasts, we need \(\mathbf{F}^{\top}_{t} \text{Var}(\boldsymbol{\theta}_{t|t-1}) \mathbf{F}_{t} + \hat{r}\). As with the mean, \(\mathbf{F}^\top_t \equiv \mathbf{Z}_t\). The variances of the one-step ahead forecasts of the regression parameters at time \(t\), \(\text{Var}(\boldsymbol{\theta}_{t|t-1})\), are also calculated as part of the Kalman filter algorithm—they are stored as Vtt1 in the list produced by the MARSSkfss() function. Lastly, the observation variance \(\hat{r}\) was estimated when we fit the DLM to the data using MARSS() and can be extracted from the fitted object dlm_s.
Putting this together, we can compute the forecast variance:
## variances of regr parameters; 2x2xT array
Phi <- kf_out$Vtt1
## obs variance; 1x1 matrix
R_est <- coef(dlm_s, type="matrix")$R
## ts of Var(forecasts)
fore_var <- vector()
for(t in 1:TT) {
tZ <- matrix(Z[, , t], m, 1) ## transpose of Z
fore_var[t] <- Z[, , t] %*% Phi[, , t] %*% tZ + R_est
}
Plots of the model mean forecasts with their estimated uncertainty are shown in the figure below. Nearly all of the observed values fall within the approximate prediction interval. Notice that we have a forecasted value for the first year of the time series (1964), which may seem at odds with our notion of forecasting at time \(t\) based on data available only through time \(t-1\). In this case, however, MARSS is actually estimating the states at \(t=0\) (\(\boldsymbol{\theta}_0\)), which allows us to compute a forecast for the first time point.
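The plotting code for that figure is not shown, but a sketch using the forecast means and variances computed above might look like this.
## approximate prediction intervals: mean +/- 2 SD
fore_lo <- fore_mean - 2*sqrt(fore_var)
fore_hi <- fore_mean + 2*sqrt(fore_var)
## plot the data and the one-step ahead forecasts
plot(years, dat[1, ], type = "p", pch = 16, las = 1,
     ylim = range(fore_lo, fore_hi, dat),
     xlab = "Year", ylab = expression(logit(italic(s[t]))))
lines(years, fore_mean, lwd = 2, col = "blue")
lines(years, fore_lo, lty = "dashed", col = "blue")
lines(years, fore_hi, lty = "dashed", col = "blue")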
Although our model forecasts look reasonable in logit-space, it is worthwhile to examine how well they look when the survival data and forecasts are back-transformed onto the interval [0,1]. In this case, the accuracy does not seem to be affected, but the precision appears much worse, especially during the early and late portions of the time series when survival is changing rapidly.
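Back-transforming the forecasts with the inverse logit (plogis() in R) is straightforward; here is a sketch.
## back-transform the data and forecasts to the [0,1] survival scale
surv_obs <- plogis(dat[1, ])
surv_fore <- plogis(fore_mean)
surv_lo <- plogis(fore_mean - 2*sqrt(fore_var))
surv_hi <- plogis(fore_mean + 2*sqrt(fore_var))
## plot survival and the back-transformed forecasts
plot(years, surv_obs, type = "p", pch = 16, las = 1,
     ylim = range(surv_lo, surv_hi, surv_obs),
     xlab = "Year", ylab = expression(italic(s[t])))
lines(years, surv_fore, lwd = 2, col = "blue")
lines(years, surv_lo, lty = "dashed", col = "blue")
lines(years, surv_hi, lty = "dashed", col = "blue")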
Notice that we passed the DLM that was fit to all of the data to MARSSkfss(). This means the Kalman filter used estimates of \(\mathbf{Q}\) and \(r\) based on all of the data when computing xtt1 and Vtt1. Thus our predictions at time \(t\) are not entirely based on only the data up to time \(t-1\), since the \(\mathbf{Q}\) and \(r\) estimates come from all of the data from 1964 to 2005.
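If strictly out-of-sample forecasts were needed, one option (not part of the original analysis) is to refit the model on an expanding window of data, so that the estimates of \(\mathbf{Q}\) and \(r\) at time \(t\) use only data through \(t-1\). A rough (and slow) sketch, assuming the objects defined above:
## strictly out-of-sample forecasts: refit using only data up to t-1
## (slow; silent = TRUE suppresses the fitting output)
fore_oos <- rep(NA, TT)
for(t in 20:TT) {  ## use some burn-in years to estimate Q and r
  fit_t <- MARSS(dat[, 1:(t-1), drop = FALSE],
                 model = list(B = B, U = U, Q = Q,
                              Z = Z[, , 1:(t-1), drop = FALSE], A = A, R = R),
                 inits = inits_list, silent = TRUE)
  kf_t <- MARSSkfss(fit_t)
  ## one-step ahead parameter forecast; G = I, so E(theta_{t|t-1}) = E(theta_{t-1|t-1})
  fore_oos[t] <- Z[, , t] %*% kf_t$xtt[, t-1]
}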
As with other time series models, evaluation of a DLM should include diagnostics. In a forecasting context, we are often interested in the forecast errors, which are simply the observed data minus the forecasts, \(e_t = y_t - \text{E}(y_t|y_{1:t-1})\). In particular, the following assumptions should hold true for \(e_t\):
1. \(e_t \sim \text{N}(0,\sigma^2)\);
2. \(\text{cov}(e_t, e_{t-k}) = 0\) for all \(k \neq 0\).
In the literature on state-space models, the set of \(e_t\) are commonly referred to as “innovations”. MARSS() calculates the innovations as part of the Kalman filter algorithm—they are stored as Innov in the list produced by the MARSSkfss() function.
## forecast errors
innov <- kf_out$Innov
Let’s see if our innovations meet the model assumptions. We can use a Q-Q plot to see whether the innovations are normally distributed with a mean of zero. We’ll use the qqnorm() function to plot the quantiles of the innovations on the \(y\)-axis versus the theoretical quantiles from a Normal distribution on the \(x\)-axis. If the 2 distributions are similar, the points should fall on the line defined by \(y = x\).
## Q-Q plot of innovations
qqnorm(t(innov), main = "", pch = 16, col = "blue")
## add y=x line for easier interpretation
qqline(t(innov))
The Q-Q plot indicates that the innovations appear to be more-or-less normally distributed (i.e., most points fall on the line). Furthermore, it looks like the mean of the innovations is about 0, but we should use a more reliable test than simple visual inspection. We can formally test whether the mean of the innovations is significantly different from 0 by using a one-sample \(t\)-test based on a null hypothesis of \(\text{E}(e_t)=0\). To do so, we will use the function t.test() and base our inference on a significance level of \(\alpha = 0.05\).
## p-value for t-test of H0: E(innov) = 0
t.test(t(innov), mu = 0)$p.value
## [1] 0.4840901
The \(p\)-value \(\gg\) 0.05, so we cannot reject the null hypothesis that \(\text{E}(e_t) = 0\).
Moving on to assumption (2), we can use the sample autocorrelation function (ACF) to examine whether the innovations covary with a time-lagged version of themselves. Using the acf() function, we can compute and plot the correlations of \(e_t\) and \(e_{t-k}\) for various values of \(k\). Assumption (2) will be met if none of the correlation coefficients exceed the 95% confidence intervals defined by \(\pm \, z_{0.975} / \sqrt{n}\).
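For these data (\(n\) = 42), that bound is roughly \(\pm\) 0.30; a quick check:
## approximate 95% CI for the sample ACF
qnorm(0.975) / sqrt(TT)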
## plot ACF of innovations
acf(t(innov), lag.max = 10)
The ACF plot shows no significant autocorrelation in the innovations at lags 1–10, so it looks like both of our model assumptions have indeed been met.