Deterministic vs stochastic elements
Regression with autocorrelated errors
Regression with temporal random effects
Dynamic Factor Analysis (DFA)
Forms of covariance matrix
Constraints for model fitting
Interpretation of results
28 January 2021
Consider this simple model, consisting of a mean \(\mu\) plus error
\[ y_i = \mu + e_i ~ \text{with} ~ e_i \sim \text{N}(0,\sigma^2) \]
The right-hand side of the equation is composed of deterministic and stochastic pieces
\[ y_i = \underbrace{\mu}_{\text{deterministic}} + \underbrace{e_i}_{\text{stochastic}} \]
Sometimes these pieces are referred to as fixed and random
\[ y_i = \underbrace{\mu}_{\text{fixed}} + \underbrace{e_i}_{\text{random}} \]
This can also be seen by rewriting the model
\[ y_i = \mu + e_i ~ \text{with} ~ e_i \sim \text{N}(0,\sigma^2) \]
as
\[ y_i \sim \text{N}(\mu,\sigma^2) \]
We can expand the deterministic part of the model, as with linear regression
\[ y_i = \underbrace{\alpha + \beta x_i}_{\text{mean}} + e_i ~ \text{with} ~ e_i \sim \text{N}(0,\sigma^2) \]
so
\[ y_i \sim \text{N}(\alpha + \beta x_i,\sigma^2) \]
Consider a simple model with a mean \(\mu\) plus white noise
\[ y_t = \mu + e_t ~ \text{with} ~ e_t \sim \text{N}(0,\sigma^2) \]
We can expand the deterministic part of the model, as before with linear regression
\[ y_t = \underbrace{\alpha + \beta x_t}_{\text{mean}} + e_t ~ \text{with} ~ e_t \sim \text{N}(0,\sigma^2) \]
so
\[ y_t \sim \text{N}(\alpha + \beta x_t,\sigma^2) \]
These residuals do not look like white noise!
There is significant autocorrelation at lag 1
We can expand the stochastic part of the model to have autocorrelated errors
\[ y_t = \alpha + \beta x_t + e_t \\ e_t = \phi e_{t-1} + w_t \]
with \(w_t \sim \text{N}(0,\sigma^2)\)
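As a sketch, this regression with AR(1) errors can be simulated and fit in base R with `arima()` and its `xreg` argument; the parameter values and variable names below are made up for illustration.

```r
## regression with AR(1) errors: y_t = alpha + beta*x_t + e_t,
## e_t = phi*e_{t-1} + w_t; all values here are illustrative
set.seed(123)
TT <- 100
xx <- rnorm(TT)                                    # covariate
ee <- as.vector(arima.sim(list(ar = 0.7), n = TT)) # AR(1) errors, phi = 0.7
yy <- 2 + 0.5 * xx + ee                            # alpha = 2, beta = 0.5

## residuals from a naive lm() fit are autocorrelated at lag 1
fit_lm <- lm(yy ~ xx)
acf(resid(fit_lm), plot = FALSE)$acf[2]

## arima() with xreg fits the full model, including phi
fit_ar <- arima(yy, order = c(1, 0, 0), xreg = xx)
coef(fit_ar)   # "ar1" (phi), "intercept" (alpha), and the slope (beta)
```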
We can write this model as our standard state-space model
\[ \begin{align} y_t &= \alpha + \beta x_t + e_t \\ &= e_t + \alpha + \beta x_t\\ &\Downarrow \\ y_t &= x_t + a + D d_t + v_t\\ \end{align} \]
with
\(x_t = e_t\), \(a = \alpha\), \(D = \beta\), \(d_t = x_t\), \(v_t = 0\) (note that the regression covariate \(x_t\) becomes the state-space covariate \(d_t\), while the state-space \(x_t\) is the error \(e_t\))
\[ \begin{align} e_t &= \phi e_{t-1} + w_t \\ &\Downarrow \\ x_t &= B x_{t-1} + w_t\\ \end{align} \]
with
\(x_t = e_t\) and \(B = \phi\)
\[ y_t = \alpha + \beta x_t + e_t \\ e_t = \phi e_{t-1} + w_t \\ \Downarrow \\ y_t = a + D d_t + x_t\\ x_t = B x_{t-1} + w_t \]
MARSS()
\[ y_t = a + D d_t + x_t \\ \Downarrow \\ y_t = Z x_t + a + D d_t + v_t \]
y = data         ## [1 x T] matrix of data
a = matrix("a")  ## intercept
D = matrix("D")  ## slope
d = covariate    ## [1 x T] matrix of measured covariate
Z = matrix(1)    ## no multiplier on x
R = matrix(0)    ## v_t ~ N(0,R); want v_t = 0 for all t
MARSS()
\[ x_t = B x_{t-1} + w_t \\ \Downarrow \\ x_t = B x_{t-1} + u + C c_t + w_t \]
B = matrix("b")  ## AR(1) coefficient for model errors
Q = matrix("q")  ## w_t ~ N(0,Q); variance of model errors
u = matrix(0)    ## u = 0
C = matrix(0)    ## C = 0
c = matrix(0)    ## c_t = 0 for all t
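Putting the pieces together, the matrices above can be collected into the model list that `MARSS()` expects (`A` and `U` are the MARSS names for \(a\) and \(u\)); the data and covariate below are random placeholders, and the `MARSS()` call itself is left commented out since it requires the MARSS package.

```r
## assemble the model list for MARSS(); dat and covariate are
## placeholder [1 x T] matrices standing in for real data
set.seed(42)
TT <- 30
dat <- matrix(rnorm(TT), nrow = 1)
covariate <- matrix(rnorm(TT), nrow = 1)

mod_list <- list(
  ## observation eqn: y_t = Z x_t + a + D d_t + v_t
  Z = matrix(1), A = matrix("a"), D = matrix("D"), d = covariate, R = matrix(0),
  ## state eqn: x_t = B x_{t-1} + u + C c_t + w_t
  B = matrix("b"), U = matrix(0), C = matrix(0), c = matrix(0), Q = matrix("q")
)
## fit <- MARSS::MARSS(dat, model = mod_list)
```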
MORE RANDOM EFFECTS
Recall our simple model
\[ y_t = \underbrace{\mu}_{\text{fixed}} + \underbrace{e_t}_{\text{random}} \]
We can expand the random portion
\[ y_t = \underbrace{\mu}_{\text{fixed}} + ~ \underbrace{f_t + e_t}_{\text{random}} \]
\[ e_t \sim \text{N}(0, \sigma) \\ f_t \sim \text{N}(f_{t-1}, \gamma) \]
This is simply a random walk observed with error
\[ y_t = \mu + f_t + e_t ~ \text{with} ~ e_t \sim \text{N}(0, \sigma) \\ f_t = f_{t-1} + w_t ~ \text{with} ~ w_t \sim \text{N}(0, \gamma) \\ \Downarrow \\ y_t = a + x_t + v_t ~ \text{with} ~ v_t \sim \text{N}(0, R) \\ x_t = x_{t-1} + w_t ~ \text{with} ~ w_t \sim \text{N}(0, Q) \]
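A quick base-R simulation of this random walk observed with error, treating \(\sigma\) and \(\gamma\) as variances with made-up values:

```r
## y_t = mu + f_t + e_t  with e_t ~ N(0, sigma)
## f_t = f_{t-1} + w_t   with w_t ~ N(0, gamma)
set.seed(1)
TT <- 200
mu <- 3; sigma <- 0.5; gamma <- 0.1          # illustrative values
ff <- cumsum(rnorm(TT, 0, sqrt(gamma)))      # random walk f_t
yy <- mu + ff + rnorm(TT, 0, sqrt(sigma))    # observed with error
## y_t wanders, but its first differences are stationary:
## var(diff(yy)) should be near gamma + 2 * sigma = 1.1
var(diff(yy))
```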
We can expand the fixed portion
\[ y_t = \underbrace{\alpha + \beta x_t}_{\text{fixed}} + ~ \underbrace{f_t + e_t}_{\text{random}} \]
\[ e_t \sim \text{N}(0, \sigma) \\ f_t \sim \text{N}(f_{t-1}, \gamma) \]
\[ y_t = \alpha + \beta x_t + f_t + e_t ~ \text{with} ~ e_t \sim \text{N}(0, \sigma) \\ f_t = f_{t-1} + w_t ~ \text{with} ~ w_t \sim \text{N}(0, \gamma) \\ \Downarrow \\ y_t = a + D d_t + x_t + v_t ~ \text{with} ~ v_t \sim \text{N}(0, R) \\ x_t = x_{t-1} + w_t ~ \text{with} ~ w_t \sim \text{N}(0, Q) \]
MULTIPLE TIME SERIES
\[ y_{i,t} = x_{i,t} + a_i + v_{i,t} \\ x_{i,t} = x_{i,t-1} + w_{i,t} \]
with
\(v_{i,t} \sim \text{N}(0, R)\)
\(w_{i,t} \sim \text{N}(0, Q)\)
\[ y_{1,t} = x_{1,t} + a_1 + v_{1,t} \\ y_{2,t} = x_{2,t} + a_2 + v_{2,t} \\ \vdots \\ y_{n,t} = x_{n,t} + a_n + v_{n,t} \\ \]
\[ x_{1,t} = x_{1,t-1} + w_{1,t} \\ x_{2,t} = x_{2,t-1} + w_{2,t} \\ \vdots \\ x_{n,t} = x_{n,t-1} + w_{n,t} \]
\[ \mathbf{y}_t = \mathbf{x}_t + \mathbf{a} + \mathbf{v}_t \\ \mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{w}_t \]
with
\(\mathbf{v}_t \sim \text{MVN}(\mathbf{0}, \mathbf{R})\)
\(\mathbf{w}_t \sim \text{MVN}(\mathbf{0}, \mathbf{Q})\)
We often observe covariance among environmental time series, especially for those collected close to one another in space
Are there some common patterns here?
\[ \mathbf{y}_t = \mathbf{Z} \mathbf{x}_t + \mathbf{a} + \mathbf{v}_t \\ \mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{w}_t \]
We can make (test) assumptions by specifying different forms for \(\mathbf{Z}\)
\[ \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \end{bmatrix}_t = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \\ \end{bmatrix} \times \begin{bmatrix} x_{JF} \\ x_N \\ x_S \end{bmatrix}_t + \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \\ a_5 \end{bmatrix} + \begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \\ v_5 \end{bmatrix}_t \]
\[ \begin{bmatrix} x_{JF} \\ x_N \\ x_S \end{bmatrix}_t = \begin{bmatrix} x_{JF} \\ x_N \\ x_S \end{bmatrix}_{t-1} + \begin{bmatrix} w_{JF} \\ w_N \\ w_S \end{bmatrix}_t \]
What if our observations were instead a mixture of 2+ states?
For example, we sampled haul-outs located between several breeding sites
\[ \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \end{bmatrix}_t = \begin{bmatrix} 0.8 & 0.2 & 0 \\ 0.2 & 0.7 & 0.1 \\ 0 & 0.9 & 0.1 \\ 0 & 0.3 & 0.7 \\ 0 & 0.1 & 0.9 \\ \end{bmatrix} \times \begin{bmatrix} x_{JF} \\ x_N \\ x_S \end{bmatrix}_t + \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \\ a_5 \end{bmatrix} + \begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \\ v_5 \end{bmatrix}_t \]
\[ \begin{bmatrix} x_{JF} \\ x_N \\ x_S \end{bmatrix}_t = \begin{bmatrix} x_{JF} \\ x_N \\ x_S \end{bmatrix}_{t-1} + \begin{bmatrix} w_{JF} \\ w_N \\ w_S \end{bmatrix}_t \]
What if our observations were a mixture of states, but we didn’t know how many or the weightings?
\[ \mathbf{y}_t = \mathbf{Z} \mathbf{x}_t + \mathbf{a} + \mathbf{v}_t \\ \mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{w}_t \]
What are the dimensions of \(\mathbf{Z}\)?
What are the elements within \(\mathbf{Z}\)?
DFA is a dimension reduction technique, which models \(n\) observed time series as a function of \(m\) hidden states (patterns), where \(n \gg m\)
\[ \mathbf{y}_t = \mathbf{Z} \mathbf{x}_t + \mathbf{a} + \mathbf{v}_t \\ \mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{w}_t \]
data: \(\mathbf{y}_t\) is \(n \times 1\)
loadings: \(\mathbf{Z}\) is \(n \times m\) with \(n > m\)
states: \(\mathbf{x}_t\) is \(m \times 1\)
Goal is to reduce some large number of correlated variates into a few uncorrelated factors
Calculating the principal components requires us to estimate the covariance of the data
\[ \text{PC} = \text{eigenvectors}(\text{cov}(\mathbf{y})) \]
There will be \(n\) principal components (eigenvectors) for an \(n \times T\) matrix \(\mathbf{y}\)
We reduce the dimension by selecting a subset of the components that explain much of the variance (eg, the first 2)
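A minimal sketch of this in base R, using random data in place of real observations:

```r
## PCA by hand: principal components are the eigenvectors of cov(y)
set.seed(7)
nn <- 4; TT <- 50
yy <- matrix(rnorm(nn * TT), nrow = nn)   # n x T data matrix
S  <- cov(t(yy))                          # n x n sample covariance
E  <- eigen(S)
PC <- E$vectors                           # n principal components
## proportion of variance explained by the first 2 components
sum(E$values[1:2]) / sum(E$values)
## agrees with base R's prcomp() up to the sign of each component
P <- prcomp(t(yy))
max(abs(abs(P$rotation) - abs(PC)))       # ~ 0
```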
We need to estimate the covariance in the data \(\mathbf{y}\)
\[ \mathbf{y}_t = \mathbf{Z} \mathbf{x}_t + \mathbf{a} + \mathbf{v}_t, ~ \text{with} ~ \mathbf{v}_t \sim \text{MVN}(\mathbf{0}, \mathbf{R}) \]
so
\[ \text{cov}(\mathbf{y}_t) = \mathbf{Z} \text{cov}(\mathbf{x}_t) \mathbf{Z}^\top + \mathbf{R} \] In PCA, we require \(\mathbf{R}\) to be diagonal, but not so in DFA
\[ \mathbf{R} \stackrel{?}{=} \begin{bmatrix} \sigma & 0 & 0 & 0 \\ 0 & \sigma & 0 & 0 \\ 0 & 0 & \sigma & 0 \\ 0 & 0 & 0 & \sigma \end{bmatrix} ~\text{or}~~ \mathbf{R} \stackrel{?}{=} \begin{bmatrix} \sigma_1 & 0 & 0 & 0 \\ 0 & \sigma_2 & 0 & 0 \\ 0 & 0 & \sigma_3 & 0 \\ 0 & 0 & 0 & \sigma_4 \end{bmatrix} \]
\[ \mathbf{R} \stackrel{?}{=} \begin{bmatrix} \sigma & \gamma & \gamma & \gamma \\ \gamma & \sigma & \gamma & \gamma \\ \gamma & \gamma & \sigma & \gamma \\ \gamma & \gamma & \gamma & \sigma \end{bmatrix} ~\text{or}~~ \mathbf{R} \stackrel{?}{=} \begin{bmatrix} \sigma_1 & 0 & 0 & 0 \\ 0 & \sigma_2 & 0 & \gamma_{2,4} \\ 0 & 0 & \sigma_3 & 0 \\ 0 & \gamma_{2,4} & 0 & \sigma_4 \end{bmatrix} \]
\[ \mathbf{y}_t = \mathbf{Z} \mathbf{x}_t + \mathbf{a} + \mathbf{v}_t \\ \mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{w}_t \]
What form should we use for \(\mathbf{Z}\)?
\[ \mathbf{Z} \stackrel{?}{=} \begin{bmatrix} z_1 \\ z_2 \\ z_3 \\ z_4 \\ z_5 \end{bmatrix} ~\text{or}~~ \mathbf{Z} \stackrel{?}{=} \begin{bmatrix} z_{1,1} & z_{2,1} \\ z_{1,2} & z_{2,2} \\ z_{1,3} & z_{2,3} \\ z_{1,4} & z_{2,4} \\ z_{1,5} & z_{2,5} \end{bmatrix} ~\text{or}~~ \mathbf{Z} \stackrel{?}{=} \begin{bmatrix} z_{1,1} & z_{2,1} & z_{3,1} \\ z_{1,2} & z_{2,2} & z_{3,2} \\ z_{1,3} & z_{2,3} & z_{3,3} \\ z_{1,4} & z_{2,4} & z_{3,4} \\ z_{1,5} & z_{2,5} & z_{3,5} \end{bmatrix} \]
\[ \mathbf{y}_t = \mathbf{Z} \mathbf{x}_t + \mathbf{a} + \mathbf{v}_t \\ \mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{w}_t \]
What form should we use for \(\mathbf{Z}\)?
\[ \mathbf{Z} \stackrel{?}{=} \begin{bmatrix} z_1 \\ z_2 \\ z_3 \\ \vdots \\ z_n \end{bmatrix} ~\text{or}~~ \mathbf{Z} \stackrel{?}{=} \begin{bmatrix} z_{1,1} & z_{2,1} \\ z_{1,2} & z_{2,2} \\ z_{1,3} & z_{2,3} \\ \vdots & \vdots \\ z_{1,n} & z_{2,n} \end{bmatrix} ~\text{or}~~ \mathbf{Z} \stackrel{?}{=} \begin{bmatrix} z_{1,1} & z_{2,1} & z_{3,1} \\ z_{1,2} & z_{2,2} & z_{3,2} \\ z_{1,3} & z_{2,3} & z_{3,3} \\ \vdots & \vdots & \vdots \\ z_{1,n} & z_{2,n} & z_{3,n} \end{bmatrix} \]
We’ll use model selection criteria to choose (eg, AICc)
Unless \(\mathbf{Z}\) is unconstrained in some manner, there are an infinite number of combinations of \(\mathbf{Z}\) and \(\mathbf{x}\) that will equal \(\mathbf{y}\)
Therefore we need to impose some constraints on the model
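The identifiability problem is easy to demonstrate numerically: for any non-singular \(\mathbf{H}\), the pair \((\mathbf{Z}\mathbf{H}^{-1}, \mathbf{H}\mathbf{x})\) gives exactly the same fitted values as \((\mathbf{Z}, \mathbf{x})\). A sketch with arbitrary numbers:

```r
## Z x = (Z H^-1)(H x) for any invertible H, so Z and x are not
## separately identifiable without constraints
set.seed(2)
nn <- 5; mm <- 2
Z <- matrix(rnorm(nn * mm), nn, mm)
x <- rnorm(mm)
H <- matrix(c(2, 1, 0, 3), mm, mm)   # any non-singular m x m matrix
max(abs(Z %*% x - (Z %*% solve(H)) %*% (H %*% x)))   # ~ 0
```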
\[ \mathbf{y}_t = \mathbf{Z} \mathbf{x}_t + \mathbf{a} + \mathbf{v}_t \\ \mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{w}_t \]
\[ \mathbf{a} = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_n \end{bmatrix} \]
We will set the first \(m\) elements of \(\mathbf{a}\) to 0
For example, if \(n = 5\) and \(m = 2\)
\[ \mathbf{a} = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \\ a_5 \end{bmatrix} \Rightarrow \mathbf{a} = \begin{bmatrix} 0 \\ 0 \\ a_3 \\ a_4 \\ a_5 \end{bmatrix} \Rightarrow \mathbf{a} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} \]
Note, however, that this causes problems for the EM algorithm so we will often de-mean the data and set \(a_i = 0\) for all \(i\)
\[ \mathbf{y}_t = \mathbf{Z} \mathbf{x}_t + \mathbf{a} + \mathbf{v}_t \\ \mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{w}_t \]
\[ \mathbf{Z} = \begin{bmatrix} z_{1,1} & z_{2,1} & \dots & z_{m,1} \\ z_{1,2} & z_{2,2} & \dots & z_{m,2} \\ z_{1,3} & z_{2,3} & \dots & z_{m,3} \\ \vdots & \vdots & \ddots & \vdots \\ z_{1,n} & z_{2,n} & \dots & z_{m,n} \end{bmatrix} \]
We will set the upper right triangle of \(\mathbf{Z}\) to 0
For example, if \(n = 5\) and \(m = 3\)
\[ \mathbf{Z} = \begin{bmatrix} z_{1,1} & 0 & 0 \\ z_{1,2} & z_{2,2} & 0 \\ z_{1,3} & z_{2,3} & z_{3,3} \\ z_{1,4} & z_{2,4} & z_{3,4} \\ z_{1,5} & z_{2,5} & z_{3,5} \end{bmatrix} \]
For the first \(m - 1\) rows of \(\mathbf{Z}\), \(z_{i,j} = 0\) if \(j > i\)
An additional constraint is necessary in a Bayesian context
\[ \mathbf{Z} = \begin{bmatrix} \underline{z_{1,1}} & 0 & 0 \\ z_{1,2} & \underline{z_{2,2}} & 0 \\ z_{1,3} & z_{2,3} & \underline{z_{3,3}} \\ z_{1,4} & z_{2,4} & z_{3,4} \\ z_{1,5} & z_{2,5} & z_{3,5} \end{bmatrix} \]
Diagonal of \(\mathbf{Z}\) is positive: \(z_{i,j} > 0\) if \(i = j\)
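One way to build this constrained pattern in R is as a character matrix of parameter names with fixed zeros; MARSS itself expects a list matrix mixing numeric 0s and character names, so treat this as a sketch of the pattern only.

```r
## constrained loadings pattern for n = 5, m = 3:
## z_ij is fixed at 0 whenever j > i; all other elements are free
nn <- 5; mm <- 3
Z_pat <- matrix("0", nn, mm)
for (i in 1:nn) for (j in 1:mm) {
  if (j <= i) Z_pat[i, j] <- paste0("z", i, j)
}
Z_pat   # zeros above the diagonal in the first m - 1 rows
```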
\[ \mathbf{y}_t = \mathbf{Z} \mathbf{x}_t + \mathbf{a} + \mathbf{v}_t \\ \mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{w}_t \]
\[ \mathbf{w}_t \sim \text{MVN}(\mathbf{0}, \mathbf{Q}) \]
We will set \(\mathbf{Q}\) equal to the identity matrix \(\mathbf{I}\)
For example, if \(m = 4\)
\[ \mathbf{Q} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \]
This allows our random walks to have a lot of flexibility
\[ \mathbf{y}_t = \mathbf{Z} \mathbf{x}_t + \mathbf{a} + \underline{\mathbf{D} \mathbf{d}_t} + \mathbf{v}_t \\ \mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{w}_t \]
\(\mathbf{d}_t\) is a \(p \times 1\) vector of covariates at time \(t\)
\(\mathbf{D}\) is an \(n \times p\) matrix of covariate effects
\[ \mathbf{y}_t = \mathbf{Z} \mathbf{x}_t + \mathbf{a} + \underline{\mathbf{D}} \mathbf{d}_t + \mathbf{v}_t \\ \mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{w}_t \]
Careful thought must be given a priori as to the form for \(\mathbf{D}\)
Should the effect(s) vary by site, species, etc?
For example, given 2 covariates, \(\text{Temp}\) and \(\text{Salinity}\)
\[ \mathbf{D} = \underbrace{ \begin{bmatrix} d_{\text{Temp}} & d_{\text{Salinity}} \\ d_{\text{Temp}} & d_{\text{Salinity}} \\ \vdots & \vdots \\ d_{\text{Temp}} & d_{\text{Salinity}} \\ \end{bmatrix} }_{\text{effects same by site/species}} ~~~ \text{or} ~~~ \mathbf{D} = \underbrace{ \begin{bmatrix} d_{\text{Temp}, 1} & d_{\text{Salinity}, 1} \\ d_{\text{Temp}, 2} & d_{\text{Salinity}, 2} \\ \vdots & \vdots \\ d_{\text{Temp}, n} & d_{\text{Salinity}, n} \\ \end{bmatrix} }_{\text{effects differ by site/species}} \]
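These two forms for \(\mathbf{D}\) can be set up as character matrices of parameter names (as MARSS uses), where shared names imply shared effects; the names below are illustrative.

```r
## n sites, p = 2 covariates (Temp, Salinity)
nn <- 3
## same effect for every site: one shared parameter per covariate
D_shared <- matrix(c("Temp", "Salinity"), nn, 2, byrow = TRUE)
## a different effect for each site: n parameters per covariate
D_bysite <- cbind(paste0("Temp", 1:nn), paste0("Salinity", 1:nn))
```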
Earlier we saw that we could use model selection criteria to help us choose among the different forms for \(\mathbf{Z}\)
However, caution must be given when comparing models with and without covariates, and varying numbers of states
Think about the DFA model form
\[ \mathbf{y}_t = \mathbf{Z} \underline{\mathbf{x}_t} + \mathbf{a} + \mathbf{D} \underline{\mathbf{d}_t} + \mathbf{v}_t \\ \]
\(\mathbf{x}_t\) are undetermined random walks
\(\mathbf{d}_t\) are predetermined covariates
Model 1 has 2 trends and no covariates
\[ \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}_t = \begin{bmatrix} z_{1,1} & z_{2,1} \\ z_{1,2} & z_{2,2} \\ z_{1,3} & z_{2,3} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}_t + \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}_t \]
Model 2 has 1 trend and 1 covariate
\[ \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}_t = \begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix} \begin{bmatrix} x \end{bmatrix}_t + \begin{bmatrix} D_1 \\ D_2 \\ D_3 \end{bmatrix} \begin{bmatrix} d \end{bmatrix}_t + \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}_t \]
Unless \(\mathbf{d}\) is highly correlated with \(\mathbf{y}\), Model 1 will be favored
1. fit the most complex model you can envision based on all of your possible covariates and random factors (states)
2. keep the covariates fixed and choose the number of trends (states) using AICc
3. keep the covariates & states fixed and choose the form for \(\mathbf{R}\)
4. sort out the covariates while keeping the states & \(\mathbf{R}\) fixed
Recall that we had to constrain the form of \(\mathbf{Z}\) to fit the model
\[ \mathbf{Z} = \begin{bmatrix} z_{1,1} & 0 & \dots & 0 \\ z_{1,2} & z_{2,2} & \ddots & 0 \\ \vdots & \vdots & \ddots & 0 \\ \vdots & \vdots & \vdots & z_{m,m} \\ \vdots & \vdots & \vdots & \vdots \\ z_{1,n} & z_{2,n} & \dots & z_{m,n} \end{bmatrix} \]
So, the 1st common factor is determined by the 1st variate, the 2nd common factor by the first two variates, etc.
To help with this, we can use a basis rotation to maximize the loadings on a few factors
If \(\mathbf{H}\) is an \(m \times m\) non-singular matrix, these 2 DFA models are equivalent
\[ \mathbf{y}_t = \mathbf{Z} \mathbf{x}_t + \mathbf{a} + \mathbf{D} \mathbf{d}_t + \mathbf{v}_t \\ \mathbf{x}_t = \mathbf{x}_{t-1} + \mathbf{w}_t \]
\[ \mathbf{y}_t = \mathbf{Z} \mathbf{H}^{-1} \mathbf{x}_t + \mathbf{a} + \mathbf{D} \mathbf{d}_t + \mathbf{v}_t \\ \mathbf{H} \mathbf{x}_t = \mathbf{H} \mathbf{x}_{t-1} + \mathbf{H} \mathbf{w}_t \]
How should we choose \(\mathbf{H}\)?
A varimax rotation will maximize the variance of the loadings in \(\mathbf{Z}\) along a few of the factors
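In R, `stats::varimax()` returns the rotation matrix directly; `Z_hat` and `x_hat` below are random stand-ins for loadings and states estimated from a fitted DFA model.

```r
## varimax rotation of the loadings; the rotated model gives
## identical fitted values
set.seed(3)
nn <- 8; mm <- 2
Z_hat <- matrix(rnorm(nn * mm), nn, mm)   # stand-in for estimated loadings
x_hat <- matrix(rnorm(mm * 20), mm, 20)   # stand-in for estimated states
vm <- varimax(Z_hat)                      # stats::varimax
H_inv <- vm$rotmat                        # this is H^-1 in the notation above
Z_rot <- Z_hat %*% H_inv                  # rotated loadings, Z H^-1
x_rot <- solve(H_inv) %*% x_hat           # rotated states, H x_t
## fitted values are unchanged by the rotation
max(abs(Z_hat %*% x_hat - Z_rot %*% x_rot))   # ~ 0
```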