Hidden Markov Models

Author

Maria Kuruvilla, Miranda Mudge

Published

May 9, 2023

Data

Here we will examine a model fitting 3 different local copepod metrics - copepod richness, northern copepod biomass, and southern copepod biomass - across the entire available time series. After spending some time looking at the time-series patterns prior to model fitting, we would also like to evaluate the effect of Pacific Decadal Oscillation (PDO) from December to March as a covariate on both the state and transition. We will evaluate both a 2 and 3 state model with 1 covariate.

Note

The variables we chose were already de-meaned. We did not zscore the data.

Load the data

Wrangle the data

Code
rownames(stoplight) <- stoplight[,1]

subset_data <- stoplight[c("Copepod_richness","N_copepod","S_copepod","PDO_DecMarch"),3:26]

#column names should be years
colnames(subset_data) <- seq(1998,2021)

par(mfrow = c(2,1), mai = c(0.5, 1, 0.1, 0.1), omi = c(0, 0, 0, 0))
plot(seq(1998,2021),subset_data["Copepod_richness",], type = "l", xlab = "years", ylab ="Copepod richness")
plot(seq(1998,2021),subset_data["N_copepod",], type = "l", xlab = "years", ylab ="N_copepod")

Code
plot(seq(1998,2021),subset_data["S_copepod",], type = "l", xlab = "years", ylab ="S_copepod")
plot(seq(1998,2021),subset_data["PDO_DecMarch",], type = "l", xlab = "years", ylab ="PDO_DecMarch")

Univariate test

We will start by looking at copepod richness.

Code
data <- data.frame(t(subset_data))
fitmod <- depmix(Copepod_richness ~1,
                 nstates = 2,
                 transition = ~ 1,
                 family = gaussian(),
                 data = data)
Code
set.seed(123)
fit1 <- fit(fitmod)
converged at iteration 18 with logLik: -54.68034 
Code
pars <- getpars(fit1)
matrix(pars[3:6], 2, 2, byrow = TRUE)
          [,1]      [,2]
[1,] 0.7536262 0.2463738
[2,] 0.4021802 0.5978198
Code
AIC1 <- AIC(fitmod)

Here we look at the effect of 1 covariate on copepod richness.

Code
fitmod_cov <- depmix(Copepod_richness ~1 + PDO_DecMarch,
                 nstates = 2,
                 transition = ~ 1,
                 family = gaussian(),
                 data = data)
Code
set.seed(123)
fit_cov <- fit(fitmod_cov)
converged at iteration 40 with logLik: -41.20702 
Code
pars_cov <- getpars(fit_cov)
matrix(pars_cov[3:6], 2, 2, byrow = TRUE)
          [,1]      [,2]
[1,] 0.4533026 0.5466974
[2,] 0.3606362 0.6393638
Code
AIC_cov <- AIC(fitmod_cov)

Reframe multivariate dataset

Wrangle data to multivariate subset.

Code
subset_data_l <- subset_data
subset_data_l$ID <- rownames(subset_data_l)

subset_data_long <- subset_data_l %>%
  pivot_longer(!ID, names_to = "variable",
               values_to = "value")


subset_multi <-pivot_wider(subset_data_long, names_from = ID, values_from = value, names_repair = "check_unique")

HMM on 2 states

Basic 2 state model

Code
set.seed(123)
mod2_1 = depmix(list(Copepod_richness ~ 1, N_copepod~1, S_copepod~1), 
             nstates = 2, 
             family = list(gaussian(),gaussian(),gaussian()),
             data=subset_multi)
fitmod2_1 = fit(mod2_1)
converged at iteration 6 with logLik: -36.64126 
Code
AIC2_1 = AIC(fitmod2_1)
Code
prstates = apply(posterior(fitmod2_1)[,c("S1","S2")], 1, which.max)
Warning in .local(object, ...): Argument 'type' not specified and will default
to 'viterbi'. This default may change in future releases of depmixS4. Please
see ?posterior for alternative options.
Code
plot(prstates, type="b", xlab="Time", ylab="State")

Code
plot(subset_multi$variable, scale(apply(posterior(fitmod2_1) [,c("S1", "S2")], 1, which.max)),
     ylab="probability",
frame=FALSE, ylim = c(-1,4), type = "l")
Warning in .local(object, ...): Argument 'type' not specified and will default
to 'viterbi'. This default may change in future releases of depmixS4. Please
see ?posterior for alternative options.
Code
points(subset_multi$variable,scale(subset_multi$Copepod_richness))

HMM with 2 states and covariate

Code
set.seed(123)
mod2_2 = depmix(list(Copepod_richness ~ PDO_DecMarch, N_copepod~ PDO_DecMarch, 
                     S_copepod~PDO_DecMarch), 
             nstates = 2, 
             family = list(gaussian(),gaussian(),gaussian()),
             data=subset_multi)
fitmod2_2 = fit(mod2_2)
converged at iteration 23 with logLik: -16.51386 
Code
AIC2_2 = AIC(fitmod2_2)

Covariate in the transition probabilty

Code
set.seed(123)
mod2_3 = depmix(list(Copepod_richness ~ 1, N_copepod ~ 1, 
                     S_copepod ~ 1), 
             nstates = 2, 
             transition = ~PDO_DecMarch,
             family = list(gaussian(),gaussian(),gaussian()),
             data=subset_multi)
fitmod2_3 = fit(mod2_3)
converged at iteration 6 with logLik: -35.52724 
Code
AIC2_3 = AIC(fitmod2_3)

HMM on 3 states

Basic 3 state model

Code
set.seed(123)
mod3_1 = depmix(list(Copepod_richness ~ 1, N_copepod~1, S_copepod~1), 
             nstates = 3, 
             family = list(gaussian(),gaussian(),gaussian()),
             data=subset_multi)
fitmod3_1 = fit(mod3_1)
converged at iteration 13 with logLik: -14.21542 
Code
AIC3_1 <- AIC(fitmod3_1)

Effect of covariate on transition parameters

3 state model with covariate effects on transition parameters

Code
set.seed(123)
mod3_2 = depmix(list(Copepod_richness ~ 1, N_copepod~1, S_copepod~1), 
             nstates = 3, 
             family = list(gaussian(),gaussian(),gaussian()),
             transition = ~scale(PDO_DecMarch),
             data=subset_multi)
fitmod3_2 = fit(mod3_2)
Warning in em.depmix(object = object, maxit = emcontrol$maxit, tol =
emcontrol$tol, : Log likelihood decreased on iteration 7 from -9.99923565186817
to -28.8461160684358
Code
summary(fitmod3_2, which = "transition")
Transition model for state (component) 1 
Model of type multinomial (mlogit), formula: ~scale(PDO_DecMarch)
Coefficients: 
                    St1       St2       St3
(Intercept)           0 -2.050474 -3.214950
scale(PDO_DecMarch)   0  2.953207 -1.822716
Probalities at zero values of the covariates.
0.8555555 0.1100877 0.03435684 

Transition model for state (component) 2 
Model of type multinomial (mlogit), formula: ~scale(PDO_DecMarch)
Coefficients: 
                    St1       St2       St3
(Intercept)           0 -54.41624  37.84036
scale(PDO_DecMarch)   0  41.09343 -66.67456
Probalities at zero values of the covariates.
3.682484e-17 8.579654e-41 1 

Transition model for state (component) 3 
Model of type multinomial (mlogit), formula: ~scale(PDO_DecMarch)
Coefficients: 
                    St1       St2       St3
(Intercept)           0  168.3578 -23.28767
scale(PDO_DecMarch)   0 -549.4447  22.56400
Probalities at zero values of the covariates.
7.640487e-74 1 5.880508e-84 
Code
summary(fitmod3_2, which = "response")
Response parameters 
Resp 1 : gaussian 
Resp 2 : gaussian 
Resp 3 : gaussian 
    Re1.(Intercept) Re1.sd Re2.(Intercept) Re2.sd Re3.(Intercept) Re3.sd
St1          -1.329  1.066           0.281  0.142          -0.145  0.102
St2           5.458  2.010          -0.682  0.220           0.474  0.103
St3           2.302  0.650           0.018  0.105           0.142  0.063
Code
AIC3_2 <- AIC(fitmod3_2)

3 state model with covariate effects on state parameters

Code
set.seed(123)
mod3_3 = depmix(list(Copepod_richness ~ 1 +PDO_DecMarch, 
                     N_copepod~1 +PDO_DecMarch,
                     S_copepod~1+PDO_DecMarch), 
             nstates = 3, 
             family = list(gaussian(),gaussian(),gaussian()),
             data=subset_multi)
fitmod3_3 = fit(mod3_3)
converged at iteration 11 with logLik: 2.504683 
Code
summary(fitmod3_3, which = "transition")
Transition matrix 
        toS1  toS2 toS3
fromS1 0.666 0.334  0.0
fromS2 0.500 0.000  0.5
fromS3 0.400 0.400  0.2
Code
summary(fitmod3_3, which = "response")
Response parameters 
Resp 1 : gaussian 
Resp 2 : gaussian 
Resp 3 : gaussian 
    Re1.(Intercept) Re1.PDO_DecMarch Re1.sd Re2.(Intercept) Re2.PDO_DecMarch
St1          -1.082            0.449  0.553           0.246           -0.033
St2           1.358            0.623  0.540           0.148           -0.045
St3           2.330            0.622  1.165          -0.385           -0.087
    Re2.sd Re3.(Intercept) Re3.PDO_DecMarch Re3.sd
St1  0.153          -0.108            0.030  0.081
St2  0.055           0.022            0.047  0.073
St3  0.127           0.201            0.063  0.121
Code
AIC3_3 <- AIC(fitmod3_3)

3 state model with covariate effects on state and transition parameters

Code
set.seed(123)
mod3_4 = depmix(list(Copepod_richness ~ 1 +PDO_DecMarch, 
                     N_copepod~1 +PDO_DecMarch,
                     S_copepod~1+PDO_DecMarch), 
             nstates = 3, 
             family = list(gaussian(),gaussian(),gaussian()),
             transition = ~scale(PDO_DecMarch),
             data=subset_multi)
fitmod3_4 = fit(mod3_4)
converged at iteration 9 with logLik: 6.771336 
Code
summary(fitmod3_4, which = "transition")
Transition model for state (component) 1 
Model of type multinomial (mlogit), formula: ~scale(PDO_DecMarch)
Coefficients: 
                    St1       St2       St3
(Intercept)           0 -17.85489 -4.781335
scale(PDO_DecMarch)   0  17.24089 -3.743735
Probalities at zero values of the covariates.
0.9916849 1.7462e-08 0.008315076 

Transition model for state (component) 2 
Model of type multinomial (mlogit), formula: ~scale(PDO_DecMarch)
Coefficients: 
                    St1       St2       St3
(Intercept)           0 8.9955148 10.143213
scale(PDO_DecMarch)   0 0.4125162  0.253309
Probalities at zero values of the covariates.
2.986338e-05 0.2409025 0.7590676 

Transition model for state (component) 3 
Model of type multinomial (mlogit), formula: ~scale(PDO_DecMarch)
Coefficients: 
                    St1       St2        St3
(Intercept)           0 -356.4094 -42.687862
scale(PDO_DecMarch)   0 -235.1345   2.774884
Probalities at zero values of the covariates.
1 1.634432e-155 2.889994e-19 
Code
summary(fitmod3_4, which = "response")
Response parameters 
Resp 1 : gaussian 
Resp 2 : gaussian 
Resp 3 : gaussian 
    Re1.(Intercept) Re1.PDO_DecMarch Re1.sd Re2.(Intercept) Re2.PDO_DecMarch
St1          -0.644            0.643  1.205           0.212           -0.041
St2           0.898            0.987  0.784           0.160           -0.125
St3           1.908            0.536  0.724          -0.331           -0.109
    Re2.sd Re3.(Intercept) Re3.PDO_DecMarch Re3.sd
St1  0.155          -0.100            0.044  0.086
St2  0.203           0.083            0.062  0.004
St3  0.158           0.210            0.075  0.099
Code
AIC3_4 <- AIC(fitmod3_4)
Code
AIC2_1
[1] 103.2825
Code
AIC2_2
[1] 75.02772
Code
AIC2_3
[1] 105.0545
Code
AIC3_1
[1] 80.43083
Code
AIC3_2
[1] 121.6922
Code
AIC3_3 # best model 
[1] 64.99063
Code
AIC3_4
[1] 68.45733
Code
print(fitmod3_3)
Convergence info: Log likelihood converged to within tol. (relative change) 
'log Lik.' 2.504683 (df=35)
AIC:  64.99063 
BIC:  106.2225 

Plotting the 3 state model

Code
prstates = apply(posterior(fitmod3_3)[,c("S1","S2", "S3")], 1, which.max)
Warning in .local(object, ...): Argument 'type' not specified and will default
to 'viterbi'. This default may change in future releases of depmixS4. Please
see ?posterior for alternative options.
Code
plot(prstates, type="b", xlab="Time", ylab="State")

Based on the AIC values, the best model for assessing our subset of copepod data is the multivariate 3 state HMM model with covariate effects on the state parameters.

Results / Discussion

Does it converge?

  • Yes, all our models converge.

How many states seem to be most supported?

  • An HMM model with 3 states is the most supported by AIC comparison but the fit looks better with a 2 state model.

Plot the time series of estimated states.

Code
prstates = apply(posterior(fitmod3_3)[,c("S1","S2", "S3")], 1, which.max)
plot(subset_multi$variable,scale(apply(posterior(fitmod3_3)[,c("S1","S2", "S3")], 1, which.max)), xlab="Time", ylab="State", type = "b")
points(subset_multi$variable,scale(subset_multi$Copepod_richness), pch = 19)

What does this mean? - The 3 state model shows that shows a smooth transition between states 1 - 3. While the transition probabilities below show that it is possible for the model to jump from state 3 to state 1, the opposite is not true (probability of moving from state 1 to state 3 is zero). This is reflected when plotting the likely states. It is more likely to switch smoothly between the states over time, which makes sense when considering the influence of PDO - which indicates an overall warm or cool phase of ocean activity. It is more likely that temperature fluctuations have smooth transitions than drastic jumps over a large time scale, and this is reflected in the 3 state model. It also explains why the 2 state model with the same covariate effects also fits the data well - eliminating the middle state from a time series with smooth transitions shouldn’t drastically effect the model.

What are the transition probabilities? - Our transition matrix for the 3 state model:

\[\begin{bmatrix} 0.66 & 0.33 & 0 \\ 0.5 & 0 & 0.5 \\ 0.4 & 0.4 & 0.2 \\ \end{bmatrix}\]

If included, what are the covariate effects? What do these mean? - We evaluated the effect of the climate covariate Pacific Decadal Oscillation (PDO) from December to March and found that PDO allowed the HMM model to have a better fit to the data when influencing the state process.

Anything else interesting that you’ve discovered? - In both the 2 and 3 state HMM models, the covariate effects were best explained as state effects.

Team contributions

Miranda and Maria both decided on the variables and covariates to test. Maria fit the 2 state model and Miranda fit the 3 state model. Miranda drafted the contributions into a final report. Maria wrote the discussion and finalized the report.