11.1 Covariates with missing values or observation error

The specific formulation of Equation (8.1) imposes restrictive assumptions on the covariate data. You must assume that your covariate data have no error, which is probably not true. You cannot have missing values in your covariate data, which is also unlikely. You cannot combine time series from different instruments, for example, two temperature recorders with different error rates and biases, or a noisy temperature sensor in the first half of your time series that is replaced by a much better sensor in the second half. All of these problems require pre-analysis massaging of the covariate data: leaving out noisy and gappy covariate time series and making what can feel like arbitrary choices about which ones to include.

To circumvent these potential problems and allow more flexibility in how we incorporate covariate data, one can instead treat the covariates as components of an auto-regressive process by including them in both the process and observation models. Beginning with the process equation, we can write \[\begin{equation} \begin{gathered} \begin{bmatrix}\mathbf{x}^{(v)} \\ \mathbf{x}^{(c)}\end{bmatrix}_t = \begin{bmatrix}\mathbf{B}^{(v)} & \mathbf{C} \\ 0 & \mathbf{B}^{(c)}\end{bmatrix} \begin{bmatrix}\mathbf{x}^{(v)} \\ \mathbf{x}^{(c)}\end{bmatrix}_{t-1} + \begin{bmatrix}\mathbf{u}^{(v)} \\ \mathbf{u}^{(c)} \end{bmatrix} + \mathbf{w}_t,\\ \mathbf{w}_t \sim \,\text{MVN}\begin{pmatrix}0,\begin{bmatrix}\mathbf{Q}^{(v)} & 0 \\ 0 & \mathbf{Q}^{(c)} \end{bmatrix} \end{pmatrix} \end{gathered} \tag{11.1} \end{equation}\] The elements with superscript \({(v)}\) are for the \(k\) variate states and those with superscript \({(c)}\) are for the \(q\) covariate states. The dimension of \(\mathbf{x}^{(c)}\) is \(q \times 1\) and \(q\) is not necessarily equal to \(p\), the number of covariate observation time series in your dataset. Imagine, for example, that you have two temperature sensors and you are combining these data. Then you have two covariate observation time series (\(p=2\)) but only one underlying covariate state time series (\(q=1\)). The matrix \(\mathbf{C}\) is dimension \(k \times q\), and \(\mathbf{B}^{(c)}\) and \(\mathbf{Q}^{(c)}\) are dimension \(q \times q\). The dimension of \(\mathbf{x}^{(v)}\) is \(k \times 1\), and \(\mathbf{B}^{(v)}\) and \(\mathbf{Q}^{(v)}\) are dimension \(k \times k\). The dimension of \(\mathbf{x}\) is always denoted \(m\). If your process model includes only variates, then \(k=m\), but now your process model includes \(k\) variates and \(q\) covariate states so \(m=k+q\).
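
To make the dimension bookkeeping concrete, here is a minimal sketch in R of the block \(\mathbf{B}\) and \(\mathbf{Q}\) in Equation (11.1) for \(k=2\) variate states and \(q=1\) covariate state (so \(m=3\)). The parameter names are illustrative; in the MARSS specification, character entries in a list matrix are estimated and numeric entries are fixed.

# B is m x m = 3 x 3: the upper-left 2 x 2 block is B(v), the upper-right
# 2 x 1 block is C, and the lower-right 1 x 1 block is B(c)
B = matrix(list("b1", 0,    "C1",
                0,    "b2", "C2",
                0,    0,    "b.c"), 3, 3, byrow = TRUE)
# Q is block diagonal: Q(v) for the variate states, Q(c) for the covariate state
Q = matrix(list("q1", 0,    0,
                0,    "q2", 0,
                0,    0,    "q.c"), 3, 3, byrow = TRUE)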

Next, we can write the observation equation in an analogous manner, such that \[\begin{equation} \begin{gathered} \begin{bmatrix} \mathbf{y}^{(v)} \\ \mathbf{y}^{(c)} \end{bmatrix}_t = \begin{bmatrix}\mathbf{Z}^{(v)} & \mathbf{D} \\ 0 & \mathbf{Z}^{(c)} \end{bmatrix} \begin{bmatrix}\mathbf{x}^{(v)} \\ \mathbf{x}^{(c)} \end{bmatrix}_t + \begin{bmatrix} \mathbf{a}^{(v)} \\ \mathbf{a}^{(c)} \end{bmatrix} + \mathbf{v}_t,\\ \mathbf{v}_t \sim \,\text{MVN}\begin{pmatrix}0,\begin{bmatrix}\mathbf{R}^{(v)} & 0 \\ 0 & \mathbf{R}^{(c)} \end{bmatrix} \end{pmatrix} \end{gathered} \tag{11.2} \end{equation}\] The dimension of \(\mathbf{y}^{(c)}\) is \(p \times 1\), where \(p\) is the number of covariate observation time series in your dataset. The dimension of \(\mathbf{y}^{(v)}\) is \(l \times 1\), where \(l\) is the number of variate observation time series. The total dimension of \(\mathbf{y}\) is \((l+p) \times 1\). The matrix \(\mathbf{D}\) is dimension \(l \times q\), \(\mathbf{Z}^{(c)}\) is dimension \(p \times q\), and \(\mathbf{R}^{(c)}\) is dimension \(p \times p\). \(\mathbf{Z}^{(v)}\) is dimension \(l \times k\), and \(\mathbf{R}^{(v)}\) is dimension \(l \times l\).
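
For the two-temperature-sensor example (\(p=2\) observation series of a single covariate state, so \(q=1\)), the covariate blocks of the observation model might be sketched as below. The names r1, r2, and bias2 are illustrative: each sensor gets its own error variance, and the second sensor is allowed a bias relative to the first.

# both sensors observe the same underlying covariate state
Z.c = matrix(1, 2, 1)
# sensor-specific observation error variances
R.c = matrix(list("r1", 0, 0, "r2"), 2, 2)
# anchor sensor 1 at zero and estimate sensor 2's relative bias
a.c = matrix(list(0, "bias2"), 2, 1)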

The \(\mathbf{D}\) matrix would presumably have a number of all-zero rows, as would the \(\mathbf{C}\) matrix, because the covariates that affect the states will often be different from the covariates that affect the observations. For example, mean annual temperature might affect population growth rates for many species while having little or no effect on observability, and turbidity might strongly affect observability in many types of aquatic surveys but have little effect on population growth rate.

Our MARSS model with covariates now looks on the surface like a regular MARSS model: \[\begin{equation} \begin{gathered} \mathbf{x}_t = \mathbf{B}\mathbf{x}_{t-1} + \mathbf{u} + \mathbf{w}_t, \text{ where } \mathbf{w}_t \sim \,\text{MVN}(0,\mathbf{Q}) \\ \mathbf{y}_t = \mathbf{Z}\mathbf{x}_t + \mathbf{a} + \mathbf{v}_t, \text{ where } \mathbf{v}_t \sim \,\text{MVN}(0,\mathbf{R}) \end{gathered} \end{equation}\] with the \(\mathbf{x}_t\), \(\mathbf{y}_t\) and parameter matrices redefined as in Equations (11.1) and (11.2): \[\begin{equation} \begin{gathered} \mathbf{x}=\begin{bmatrix}\mathbf{x}^{(v)}\\ \mathbf{x}^{(c)}\end{bmatrix} \quad \mathbf{B}=\begin{bmatrix}\mathbf{B}^{(v)} & \mathbf{C} \\ 0 & \mathbf{B}^{(c)}\end{bmatrix} \quad \mathbf{u}=\begin{bmatrix}\mathbf{u}^{(v)}\\ \mathbf{u}^{(c)}\end{bmatrix} \quad \mathbf{Q}=\begin{bmatrix}\mathbf{Q}^{(v)} & 0 \\ 0 & \mathbf{Q}^{(c)}\end{bmatrix} \\ \mathbf{y}=\begin{bmatrix}\mathbf{y}^{(v)}\\ \mathbf{y}^{(c)}\end{bmatrix} \quad \mathbf{Z}=\begin{bmatrix}\mathbf{Z}^{(v)} & \mathbf{D} \\ 0 & \mathbf{Z}^{(c)}\end{bmatrix} \quad \mathbf{a}=\begin{bmatrix}\mathbf{a}^{(v)}\\ \mathbf{a}^{(c)}\end{bmatrix} \quad \mathbf{R}=\begin{bmatrix}\mathbf{R}^{(v)} & 0 \\ 0 & \mathbf{R}^{(c)}\end{bmatrix} \end{gathered} \tag{11.3} \end{equation}\] Note \(\mathbf{Q}\) and \(\mathbf{R}\) are written as block diagonal matrices, but you could allow covariances if that made sense. \(\mathbf{u}\) and \(\mathbf{a}\) are column vectors here. We can fit the model (Equation (11.3)) as usual using the MARSS() function.
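
Continuing the illustrative example (\(k=2\), \(q=1\), \(p=2\), and now \(l=2\) variate observation series observed one-to-one, so \(\mathbf{Z}^{(v)}=\mathbf{I}_2\)), a sketch of the model list passed to MARSS() follows. This is one possible specification, not the only one; an all-zero row in \(\mathbf{D}\) corresponds to setting, say, D2 to 0. The object model.aug is used in the fitting code below.

# Z is (l+p) x m = 4 x 3: rows 1-2 hold Z(v) = I2 and D, rows 3-4 hold Z(c)
Z = matrix(list(1, 0, "D1",
                0, 1, "D2",
                0, 0, 1,
                0, 0, 1), 4, 3, byrow = TRUE)
# R is block diagonal: variate observation errors, then the two sensor variances
R = matrix(list("r.v1", 0,      0,    0,
                0,      "r.v2", 0,    0,
                0,      0,      "r1", 0,
                0,      0,      0,    "r2"), 4, 4, byrow = TRUE)
U = matrix(list("u1", "u2", "u.c"), 3, 1)
# B and Q as sketched above; A fixed at zero in this sketch for identifiability
model.aug = list(B = B, U = U, Q = Q, Z = Z, A = "zero", R = R)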

The log-likelihood returned by MARSS() will include the log-likelihood of the covariates under the covariate state model. If you want only the log-likelihood of the non-covariate data, you will need to subtract off the log-likelihood of the covariate model: \[\begin{equation} \begin{gathered} \mathbf{x}^{(c)}_t = \mathbf{B}^{(c)}\mathbf{x}_{t-1}^{(c)} + \mathbf{u}^{(c)} + \mathbf{w}_t, \text{ where } \mathbf{w}_t \sim \,\text{MVN}(0,\mathbf{Q}^{(c)}) \\ \mathbf{y}^{(c)}_t = \mathbf{Z}^{(c)}\mathbf{x}_t^{(c)} + \mathbf{a}^{(c)} + \mathbf{v}_t, \text{ where } \mathbf{v}_t \sim \,\text{MVN}(0,\mathbf{R}^{(c)}) \end{gathered} \tag{11.4} \end{equation}\] An easy way to get the log-likelihood of the covariate data alone is to use the augmented model (Equation (11.2), with terms defined as in Equation (11.3)) but pass in missing values for the non-covariate data. The following code shows how to do this.

# stack the variate data on top of the covariate data
y.aug = rbind(data, covariates)
# fit the augmented model; model.aug specifies the block structure in Equation (11.3)
fit.aug = MARSS(y.aug, model = model.aug)

fit.aug is the MLE object that can be passed to MARSSkf(). You need to make a version of this MLE object with the non-covariate data filled with NAs so that you can compute the log-likelihood without the covariates. This needs to be done in the marss element since that is what is used by MARSSkf(). Below is code to do this.

fit.cov = fit.aug
# replace the non-covariate rows (the first nrow(data) rows) with NAs
fit.cov$marss$data[1:dim(data)[1], ] = NA
# rerun the Kalman filter to get the log-likelihood of the covariate data alone
extra.LL = MARSSkf(fit.cov)$logLik
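
The log-likelihood of the non-covariate data alone is then the difference between the two:

# log-likelihood of the non-covariate data only
data.LL = fit.aug$logLik - extra.LL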

Note that when you fit the augmented model, the estimates of \(\mathbf{C}\) and \(\mathbf{B}^{(c)}\) are affected by the non-covariate data because the models for the non-covariate and covariate data are estimated simultaneously and are not independent (the covariate states affect the non-covariate states). If you want the covariate model to be unaffected by the non-covariate data, you can fit the covariate model separately and then use the estimates of \(\mathbf{B}^{(c)}\) and \(\mathbf{Q}^{(c)}\) as fixed values in your augmented model.
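
Here is a sketch of that two-step approach, assuming a covariate-only model list (model.cov, hypothetical here) that specifies Equation (11.4). The coef() method with type = "matrix" returns the estimated parameter matrices, which can then be inserted as fixed numeric blocks when building \(\mathbf{B}\) and \(\mathbf{Q}\) for the augmented model.

# fit the covariate-only model (Equation (11.4))
fit.c = MARSS(covariates, model = model.cov)
# extract the estimates as numeric matrices
B.c = coef(fit.c, type = "matrix")$B
Q.c = coef(fit.c, type = "matrix")$Q
# use B.c and Q.c as fixed (numeric) blocks in the augmented B and Q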