21 Feb 2019

Topics for today

Covariates

  • Why include covariates?

  • Multivariate linear regression on time series data

  • Covariates in MARSS models

    • Seasonality in MARSS models

    • Missing covariates

Why include covariates in a model?

  • We are often interested in knowing the cause of variation
  • Covariates can explain the process that generated the patterns
  • You may want to forecast something using covariates

Covariates -> Forecast

Covariates in time series models

  • Multivariate linear regression for time series data
  • ARIMA models
    • Linear regression with ARIMA errors
    • ARMAX
  • MARSS models with covariates
    • We will focus on these; they are closely related to ARMAX

What about ETS and covariates? Covariates wouldn’t make sense there: an ETS model describes the data’s own level, trend, and seasonal structure rather than relationships to external variables.

Multivariate linear regression for time series data

Can you do a linear regression with time series data (response and predictors)? Yes, but you need to be careful. Read Chapter 5 in Hyndman and Athanasopoulos 2018.

  • Diagnostics that need to be satisfied
    • Residuals are temporally uncorrelated
    • Residuals are not correlated with the predictor variables
  • Be careful regarding spurious correlation if both the response and predictor variables have trends (see the sketch below)
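A classic way to see this is to regress two independent random walks on each other; any "effect" is spurious by construction. A minimal sketch with simulated data:

set.seed(123)
# two independent random walks; neither affects the other
x <- cumsum(rnorm(50))
y <- cumsum(rnorm(50))
summary(lm(y ~ x))  # the slope will often look highly "significant"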

Autocorrelated response and predictor variables

Imagine that your data looked like this, where the line is the data and the color represents your covariate.

  • Do you really have 20 independent data points for estimating the covariate effect? (see the sketch after this list)
  • Why are the data correlated? Is it only because of the covariate?
  • How many covariates did you look at?
  • This can be another type of spurious correlation.
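On the first question: for an AR(1) process with lag-1 autocorrelation rho, a common approximation of the effective sample size is n(1 - rho)/(1 + rho). A quick sketch, where rho = 0.7 is an assumed value:

# approximate effective sample size for an AR(1) process
n <- 20; rho <- 0.7                    # rho = 0.7 assumed for illustration
n_eff <- n * (1 - rho) / (1 + rho)
n_eff                                  # about 3.5, nowhere near 20 independent points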

Linear regression with autocorrelated errors

The xreg argument in Arima() and arima() allows you to fit linear regressions with autocorrelated errors. Read Chapter 9 in Hyndman and Athanasopoulos 2018 on Dynamic Regression.

For example, a linear regression with AR(2) errors:

\[y_t = \alpha + \beta d_t + \nu_t \\ \nu_t = \theta_1 \nu_{t-1} + \theta_2 \nu_{t-2} + e_t\]
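To make this concrete, here is a sketch that simulates from this model and recovers the parameters with Arima(); all parameter values (alpha = 1, beta = 2, theta1 = 0.5, theta2 = 0.3) are invented:

library(forecast)
set.seed(42)
# simulate y_t = alpha + beta*d_t + nu_t with AR(2) errors
d <- rnorm(100)
nu <- arima.sim(list(ar = c(0.5, 0.3)), n = 100)
y <- 1 + 2 * d + nu
Arima(y, xreg = d, order = c(2, 0, 0))  # estimates should be near the truth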

Fitting in R

Arima()

library(forecast)
# linear regression on the covariate d with ARIMA(1,1,0) errors
fit <- Arima(y, xreg=d, order=c(1,1,0))

auto.arima()

# let auto.arima() choose the order of the ARMA errors
fit <- auto.arima(y, xreg=d)

Example from Hyndman and Athanasopoulos 2018

A simple regression has problems

library(fpp2) # loads the uschange data and the forecast package
y <- uschange[,"Consumption"]; d <- uschange[,"Income"]
fit <- lm(y~d)
checkresiduals(fit)

## 
##  Breusch-Godfrey test for serial correlation of order up to 10
## 
## data:  Residuals
## LM test = 27.584, df = 10, p-value = 0.002104

The small p-value means we reject the null of uncorrelated residuals, so the standard errors from lm() cannot be trusted.

Try AR(1) errors

fit <- Arima(y, xreg=d, order=c(1,0,0))
checkresiduals(fit)

## 
##  Ljung-Box test
## 
## data:  Residuals from Regression with ARIMA(1,0,0) errors
## Q* = 20.485, df = 5, p-value = 0.001013
## 
## Model df: 3.   Total lags used: 8

The residuals are still autocorrelated (p = 0.001), so AR(1) errors are not sufficient.

Let auto.arima() find the best ARMA error model

fit <- auto.arima(y, xreg=d) # it finds that ARIMA(1,0,2) errors are best
checkresiduals(fit)

## 
##  Ljung-Box test
## 
## data:  Residuals from Regression with ARIMA(1,0,2) errors
## Q* = 5.8916, df = 3, p-value = 0.117
## 
## Model df: 5.   Total lags used: 8

Now the residuals pass the Ljung-Box test (p = 0.117), so the ARIMA(1,0,2) error structure is adequate.
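To forecast from a dynamic regression you must also supply future values of the covariate. A minimal sketch, assuming (purely for illustration) that future income is held at its historical mean:

# future covariate values are required; holding d at its mean is an assumption
fc <- forecast(fit, xreg = rep(mean(d), 8))
autoplot(fc)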

Collinearity

This is a big issue. If you are thinking about stepwise variable selection, do a literature search on the issue first. Read Chapter 6 in Holmes 2018 on catch forecasting models using multivariate regression for a discussion of the following (a penalized-regression sketch follows the list):

  • Stepwise variable regression in R
  • Cross-validation for regression models
  • Penalized regression in R
    • Lasso
    • Ridge
    • Elastic Net
  • Diagnostics
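As one example of penalized regression in R, here is a cross-validated lasso with the glmnet package; the data are simulated for illustration:

library(glmnet)
set.seed(1)
# 5 predictors; only the first two have real effects (assumed)
X <- matrix(rnorm(100 * 5), ncol = 5)
y <- as.numeric(X %*% c(1, 0.5, 0, 0, 0) + rnorm(100))
fit <- cv.glmnet(X, y, alpha = 1) # alpha = 1 is lasso, alpha = 0 is ridge
coef(fit, s = "lambda.min")       # coefficients at the CV-chosen penalty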

Covariates in MARSS models

We are trying to explain the ERRORS with our covariates.

\[\mathbf{x}_t = \mathbf{B} \mathbf{x}_{t-1} + \mathbf{C} \mathbf{c}_t + \mathbf{w}_t \\ \mathbf{y}_t = \mathbf{Z} \mathbf{x}_{t} + \mathbf{D} \mathbf{d}_t + \mathbf{v}_t\]
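A minimal sketch of fitting a one-state version of this model with the MARSS package; the simulated data and all parameter values are invented for illustration:

library(MARSS)
set.seed(1)
TT <- 60
# one seasonal covariate for the process equation (rows = covariates, cols = time)
covar <- matrix(sin(2 * pi * (1:TT) / 12), nrow = 1)
# simulate x_t = 0.8 x_{t-1} + 0.5 c_t + w_t and y_t = x_t + v_t
x <- numeric(TT)
for (t in 2:TT) x[t] <- 0.8 * x[t - 1] + 0.5 * covar[1, t] + rnorm(1, 0, 0.1)
y <- matrix(x + rnorm(TT, 0, 0.1), nrow = 1)
# C and c put the covariate in the process equation; D and d would put a
# covariate in the observation equation instead
fit <- MARSS(y, model = list(B = matrix("b"), U = "zero", Q = matrix("q"),
                             Z = matrix(1), A = matrix(0), R = matrix("r"),
                             C = matrix("C"), c = covar))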