An introduction to hidden markov models for time series

Overview of today’s material

Gentle introduction to HMMs
Theory and notation
Examples of univariate HMMs in R
Examples of multivariate HMMs in R

Overview of today’s material

For additional background, see

Zucchini et al. (2008, 2016) “Hidden Markov Models for Time Series: An Introduction Using R”
Jackson (2011) “Multi-State Models for Panel Data: The msm Package for R”
Visser and Speekenbrink (2010) “depmixS4: An R Package for Hidden Markov Models”
McClintock et al. (2020) “Uncovering ecological state dynamics with hidden Markov models”
Michelot (2022) “hmmTMB: Hidden Markov models with flexible covariate effects in R”

State space models

We’ve already discussed state space models. These models include

a latent process model (we don’t directly observe)
a data or observation model
we’ve generally assumed both models be normal (or multivariate normal)

State space models

Process model: \[{ x }_{ t }={ x }_{ t-1 }+{ \varepsilon }_{ t-1 }\]

Observation model: \[{ y }_{ t }={ x }_{ t }+{ \delta }_{ t }\]

where ${ \varepsilon }_{ t } \sim Normal\left( 0,{ \sigma }_{ \varepsilon } \right)$ and ${ \delta }_{ t } \sim Normal\left( 0,{ \sigma }_{ \delta } \right)$

State space models

Adding AR coefficients can make these models stationary with respect to the mean,

\[{ x }_{ t }={ p\cdot x }_{ t-1 }+{ \varepsilon }_{ t-1 }\]

however they may not be able to explain some datasets very well.

Specifically, these models are not well designed to model regimes

Regimes

Example from Moore (2018): “Predicting tipping points in complex environmental systems”

Regimes / behavioral states

Many examples of time series in ecology & fisheries may alternate between multiple states (2+)

Vert-pre et al. (2013) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3562848/
Francis et al. (2012) https://onlinelibrary.wiley.com/doi/full/10.1111/j.1365-2486.2012.02702.x

Additional ecological exmaples:

Mark recapture models with latent states
Identifying changing behavior or habitat use:
Tennessen et al. (2019) “Hidden Markov models reveal temporal patterns and sex differences in killer whale behavior”
Leos-Barajas et al. (2017) “Analysis of animal accelerometer data using hidden Markov models”
momentuHMM: R package for analysis of telemetry data using generalized multivariate hidden Markov models of animal movement

HMM Examples (this class):

Rand et al. 2024 https://doi.org/10.1016/j.ecolmodel.2024.110800
Hodgson et al. 2020 https://doi.org/10.1016/j.biocon.2020.108685

Regimes

Lots of non-HMM approaches for detecting regimes

STARS algorithm
- Sequential t-test approach for detecting changes in the mean
- Rodionov (2015) https://www.mdpi.com/2225-1154/3/3/474
Brute force model selection approach
- iterate through change points, evaluating data support for each
- how do we do change points with regression? ${ Y }_{ t }=B{ X }_{ t }+{ \varepsilon }_{ t }$

Regimes: Nile River

Breakpoint / segmented regression

time of breaks passed in as data (as can other covariates)
all variance = observation error

State space model

time of breaks + other covariates can be passed in or not
change is smooth
variance separated into observation, process error

Hidden Markov Model

covariates can be passed in as data, time of breaks not
variance separated into observation, process

Regimes: simulating data

Fitting these data with state space models

They can actually fit the data from a regime model quite well.

Via MARSS()

Limitations of state space models with continuous states

What’s lacking is inference about:

What’s the probability of transition between regimes?
How long are regimes expected to last?
What regimes might we expect in the future?

Lots of applications: speech recognition, bioinformatics, animal movement, environmental data (rain), finance

HMM: theory

Markov Process

time may be discrete or continuous (we’ll focus on discrete)
Markov if at each time step, the state $x_{t}$ is only dependent on the previous state $x_{t-1}$
$x_{t}$ does not depend on future values
these ideas should be familiar, with the exception of states changing from continuous to discrete

Entire history can be written as \[\left\{ { x }_{ 1 },{ x }_{ 2 },{ x }_{ 3 },...,{ x }_{ T } \right\}\]

HMM: theory

A key piece of the Markov process are the transition probabilities.

The probability of transitioning from state $j$ to $i$ given the current state is \[P\left( { x }_{ t+1 }=j | { x }_{ t }=i \right) ={ \gamma }_{ ij }\]
And then these probabilities can be summarized in a transition matrix, \[\Gamma =\left[ \begin{matrix} { \gamma }_{ 11 } & { \gamma }_{ 12 } & { \gamma }_{ 13 } \\ { \gamma }_{ 21 } & { \gamma }_{ 22 } & { \gamma }_{ 23 } \\ { \gamma }_{ 31 } & { \gamma }_{ 32 } & { \gamma }_{ 33 } \end{matrix} \right]\]
Elements of $\Gamma$ can be fixed a priori (0, etc)

HMMs theory

Let’s try to simulate this in R

# initial state
x <- sample(1:2, size = 1)
# transition matrix
Gamma <- matrix(c(0.9, 0.1, 0.2, 0.8), 2, 2, byrow = TRUE)
for(t in 2:100) {
  x[t] <- sample(1:2, size = 1, prob = Gamma[x[t-1],])
}

HMMs theory

HMM theory

Matrix $\Gamma$ is the 1-step transitions. However k-step transition probabilities can be generated,

$\Gamma(k) = \Gamma^{k}$

From this, we can also calculate the stationary distribution of the chain
- These are the long term probabilities of being in each state
- See Zucchini et al. (2006) Chapter 2

HMM theory

There are two general flavours of transition matrices:

Homogenous (or stationary)
- transition probabilities don’t depend on $t$
Non-homogeneous
- time-varying transition probabilities
In this class, we’ll mostly consider homogeneous cases

HMM theory

Covariates may enter HMMs in several ways

Effects on the mean
Effects on the transition matrix $\Gamma$
For effects on $\Gamma$, probabilities are constrained (0,1) and constrained to sum to 1
- multivariate logit regression used to relate covariates to elements of $\Gamma$
- reference / base case is fixed at zero (Agresti 2002)
Covariates can also be included on observation model, not discussed in this class

HMMs theory

Returning to our simulated example, let’s include a temporal trend in the transition probabilities
We assume trend is linear in logit space
For simplicity, we assume that slope is same on ${ \gamma }_{ 12 }$ and ${ \gamma }_{ 21 }$

plogis(logit(0.1) + 0.015*1:100)

HMMs theory

# initial state
x <- sample(1:2, size = 1)
# transition matrix
Gamma <- matrix(c(0.9, 0.1, 0.2, 0.8), 2, 2, byrow = TRUE)
for(t in 2:100) {
  # update transition probabilities
  Gamma[1,2] = plogis(logit(0.1) + 0.015*t)
  Gamma[1,1] = 1 - Gamma[1,2]
  Gamma[2,1] = plogis(logit(0.2) + 0.015*t)
  Gamma[2,2] = 1 - Gamma[2,1]
  # simulate next state
  x[t] <- sample(1:2, size = 1,prob = Gamma[x[t-1],])
}

HMMs theory

HMM: theory

Observations: observable data ${Y}_{ i=1,...,N }$
States: latent (discrete) variables that are not directly observed
- ${x}_{ t=1,...,T }$
Transition probabilities: transition matrix representing the probability of transitioning between states in the Markov chain
- $\Gamma$ and ${ \gamma }_{ ij }$
Emission probabilities: how likely the states are at any particular timestep
- ${ \theta }_{ i=1,...,N }$

HMM: theory

Our simulations haven’t yet included an observation component. Let’s add observed data that’s normally distributed

# initial state
x <- sample(1:2, size=1)
# transition matrix
Gamma <- matrix(c(0.9, 0.1, 0.2, 0.8), 2, 2, byrow = TRUE)
for(t in 2:100) {
  x[t] <- sample(1:2, size = 1, prob = Gamma[x[t-1],])
}

u = c(1.2, 5.4)
y = rnorm(length(x), u[x], 1)

HMM: theory

One quantity of interest from HMMs is the marginal distribution, $P(Y_{t})$

Output from the model will give us the conditional distribution, $P(Y_{t}|x_{t}=i)$. But we need to marginalize over all possible states

Solution: \[P(Y_{ t })=\sum _{ i=1 }^{ N }{ P({ x }_{ t }=i)P({ Y }_{ t }|{ x }_{ t }=i) }\] which can also be expressed as \[P(Y_{ t })=\sum _{ i=1 }^{ N }{ { \delta }_{ i }P({ Y }_{ t }|{ x }_{ t }=i) }\] where $\delta$ represents the stationary distribution (Zucchini et al. 2006)

HMM: theory

Estimation of HMM parameters can be quite complicated

Dealing with joint likelihood of observed data and unobserved states

\[P(x,Y)=\prod _{ i=1 }^{ T }{ P({ x }_{ t }|{ x }_{ t-1 })P({ Y }_{ t }|{ x }_{ t }) }\]

HMM: theory

Estimation most commonly done with maximum likelihood

Forward-backward algorithm: calculate probability of observed data sequence
Viterbi algorithm: used to generate most likely states
EM-algorithm: estimate / update parameters

For forward-backward algorithm we can factor conditional state probabilties

\[P\left( { x }_{ t }|{ Y }_{ 1:T } \right) =P({ x }_{ t }|{ Y }_{ 1:t },{ Y }_{ t+1:T })=P({ x }_{ t }|{ Y }_{ 1:t })P({ Y }_{ t+1:T }|{ x }_{ t })\] * Probability of state given the data is the probability of the state given the data up to that time step multiplied by the probability of future data given state

HMM: theory

Forward-backward algorithm has 3 steps (sometimes 2/3 combined):

Forward probabilities: from the last slide, this step calculates $P({ x }_{ t }|{ Y }_{ 1:t })$
Backward probabilities: from the last slide, this step calculates $P({ Y }_{ t+1:T }|{ x }_{ t })$
Smoothing: compute marginal likelihood of sequence of state variables $P(x_{t}|Y)$

Examples of univariate HMMs in R

As a first example, let’s use an example of rainfall data in Seattle from the end of 2018 and beginning of 2019 (accessed from wunderground.com)

Examples of univariate HMMs in R

We could model rainfall directly, but for starters let’s just model whether or not it rained on a given day (0/1).

rain$rained = ifelse(rain$Rainfall > 0, 1, 0)

Examples of univariate HMMs in R

Questions that we might be interested in:

Conditional probabilities
- What’s the probability of rain tomorrow given that it’s raining today?
- What’s the probability of rain tomorrow given that it’s not raining today?
Persistence
- On average, how long might we expect to go without rain?

Examples of univariate HMMs in R

We don’t really need HMMs to address these questions

daily rainfall may be measured with a tiny amount of uncertainty
probably safe to say that there’s almost no uncertainty in whether or not it rained on a given day

Examples of univariate HMMs in R

Transition probabilities can be just calculated directly,

$P(rain_{t+1} | rain_{t})$

\[\frac{\text{# consecutive rainy days}}{\text{# rainy days}}\]

For example, we can create a differenced variable to indicate no change (0), and compute

rain$diff = c(diff(rain$rained), NA)
p_rain_rain = length(which(rain$diff == 0 & 
    rain$rained==1)) / length(which(rain$rained==1))

Examples of univariate HMMs in R

Ok, now let’s assume that there’s some minor observation or measurement error in the rain gauge data. For this case, it’s best to use a HMM.

Let’s start with using the R package hmmTMB. The main package vignette is linked here,

https://github.com/TheoMichelot/hmmTMB

Examples of univariate HMMs in R

There’s three main steps in fitting a HMM with hmmTMB. We’ll step through each of these in the following slides

Hidden state model

MarkovChain$...

Observation model

Observation$...

Construct HMM

HMM$...

Examples of univariate HMMs in R

First we need to specify the number of latent / hidden states, and will start with 2 (rained / didn’t rain)

hidden <- MarkovChain$new(data = rain, n_states = 2)

Examples of univariate HMMs in R

The observation model family represents the distribution of responses conditioned on the states. For this example, we’re modeling rainfall as a binary response

# options for other families in vignette
dists <- list(rained = "binom") 
# named list of starting values -- varies by family
par0 <- list(rained = list(size = c(1,1), prob = c(0.1, 0.9)))
# create observation model
obs_model <- Observation$new(data = rain,
                             dists = dists,
                             n_states = 2,
                             par = par0)

Examples of univariate HMMs in R

Next we can construct the model, bringing the hidden and observation models together

hmm <- HMM$new(obs = obs_model, 
               hid = hidden)

Fitting the model can be done by calling fit (without assigning a new object)

hmm$fit(silent=TRUE)

Examples of univariate HMMs in R

Now that we’ve set up the model and have done the estimation we can examine the output. A basic summary can be printed with

hmm

Examples of univariate HMMs in R

Or a more accessible approach is with

hmm$par()

Here,

hmm$tpm contains the transition matrix

Examples of univariate HMMs in R

If we color observations by the most likely state sequence (Viterbi algorithm), it looks like the observations are being perfectly classified

rain$est_state <- factor(paste0("State", hmm$viterbi()))
ggplot(rain, aes(Time, rained, col = est_state)) + 
  geom_point() + ylab("Rained?") + theme_bw()

Examples of univariate HMMs in R

viterbi() returns the most likely state sequence over the time series

state_probs() returns a matrix time steps x states, with the estimated probabilities of each

sample_states() returns posterior samples of states, with uncertainty

Examples of univariate HMMs in R

The transition matrices (tpm) are also easily accessible

	To State 1	To State 2
From State 1	0.59	0.41
From State 2	0.23	0.77

Examples of univariate HMMs in R

As an aside, a general challenge with HMMs is that the results / best model might be sensitive to

Random number seed used
Starting values for estimated parameters

Examples of univariate HMMs in R

There’s a couple practical ways to try to overcome these issues. Pseudocode:

best = 1.0e10
best_model = NA
for(i in 1:iter) {
  set.seed(i)
  p <- runif(1)# randomize initial values
  par0 <- list(rained = list(size = c(1, 1), 
                             prob = c(p, 1-p)))
  [re-fit the model here]
  if(hmm$AIC_conditional() < best) {
    best_model = fitmod #? best solution?
    best = hmm$AIC_conditional()
  }
}

Examples of univariate HMMs in R

Let’s move on to a more complex example.

We’ll pull some data from the CalCOFI ichthyoplankton cruises in Southern California. Some species have been used to indicate cool versus warm regimes.

http://calcofi.org/publications/calcofireports/v58/Vol58-State_of_the_Current_pages_1-55.pdf

For this example, we’ll focus on the California smoothtongue (Leuroglossus stilbius)

Examples of univariate HMMs in R

Caveat: the survey is spatially gridded, and you’d want to perform index standardization or use spatial models to generate indices. For simplicty, we’re just taking the mean abundance across stations in April-May.

Examples of univariate HMMs in R

We’ll start with fitting a 2-state model with. Assumptions:

Model fit to ln transformed CPUE data, assumed Gaussian

set.seed(123)
calcofi$ln_abund <- log(calcofi$m)
hidden <- MarkovChain$new(data = calcofi, n_states = 2)
dists <- list(ln_abund = "norm") 
par0 <- list(ln_abund = list(mean = c(0,3), sd = c(0.1, 0.1)))
obs_model <- Observation$new(data = calcofi,
                             dists = dists,
                             n_states = 2,
                             par = par0)
hmm <- HMM$new(obs = obs_model, 
               hid = hidden)
hmm$fit(silent=TRUE)

Examples of univariate HMMs in R

This ends up being a case where 2+ states aren’t supported – most observations are classified to a single state

calcofi$est_state <- factor(paste0("State", hmm$viterbi()))
ggplot(calcofi, aes(year, ln_abund, col = est_state)) + 
  geom_point() + ylab("Ln CPUE") + xlab("Year") + theme_bw()

Model selection & diagnostics

In comparing 2 vs 3 state HMM, one could compare AIC with hmm$AIC_marginal() or hmm$conditional()
However, opposite is often true – more states fit better, and result in lower AIC
How to decide? Occam’s razor & metrics of predictive ability
Pohle et al. 2017. “Selecting the Number of States in Hidden Markov Models: Pragmatic Solutions Illustrated Using Animal Movement”

Model selection & diagnostics

We can also look at residual QQ plots as a goodness of fit metric (as you would do with DHARMa or other packages). With hmmTMB

pseudo <- hmm$pseudores()
ggplot(as.data.frame(pseudo), aes(sample = ln_abund)) +
  geom_qq() +
  geom_qq_line(col="red") +
  ggtitle("QQ Plot")

Examples of multivariate HMMs in R

If the univariate examples make sense, it’s not that different to extend these models for multiple time series.

In this setting, the assumption is that the latent state process is the same for all time series, though the time series may differ

their lengths
their error distributions

Examples of multivariate HMMs in R

We’ll extend our CalCOFI example to include 2 species now

Examples of multivariate HMMs in R

Fitting a 2-state model with 2 species as responses. First we have to reshape the data,

calcofi$ln = log(calcofi$m) # ln transform
calcofi <- dcast(melt(calcofi[,c("year","ln","species")], 
            id.vars = c("year", "species")), 
            year ~ species)
names(calcofi)[2:3] = c("L_stilbius","L_ochotensis")
head(calcofi)

##   year  L_stilbius L_ochotensis
## 1 1980  0.85449030   -0.8414039
## 2 1981  3.13964169    1.6941728
## 3 1982  3.70359056    3.5435378
## 4 1983 -0.03344793    0.4947281
## 5 1984  3.00082695    2.4346651
## 6 1985  0.24911445   -0.8471716

Examples of multivariate HMMs in R

Note the only thing we have to change are the named family list dists and the initial parameters par0

set.seed(123)
hidden <- MarkovChain$new(data = calcofi, n_states = 2)
dists <- list(L_stilbius = "norm", L_ochotensis = "norm") 
par0 <- list(L_stilbius = list(mean = c(0,3), sd = c(0.1, 0.1)),
             L_ochotensis = list(mean = c(0,3), sd = c(0.1, 0.1)))
obs_model <- Observation$new(data = calcofi,
                             dists = dists,
                             n_states = 2,
                             par = par0)
hmm <- HMM$new(obs = obs_model, 
               hid = hidden)
hmm$fit(silent=TRUE)

Summary

HMMs are a useful approach at identifying latent state vectors that undergo discrete transitions

Estimation of HMMs for time series in R generally done with ML methods

Fast, but these algorithms may get stuck
Robust solutions = multiple starting values

Bayesian estimation generally beyond the scope of this class

Very straightforward to build these models in BUGS/JAGS
More complicated with newer sofware (Stan), though examples here: https://mc-stan.org/docs/2_26/stan-users-guide/hmms-section.html

JAGS Examples

Why JAGS? Categorical samplers
Let’s re-fit our CalCOFI example in JAGS. We’ll step through this in small chunks to understand what’s going on in JAGS syntax

JAGS model

First we need to specify the initial state. We don’t know it, but assume each state (1/2) is equally likely

model{
for(i in 1:2){initp[i] <- 0.5}
x[1] ~ dcat(initp)

JAGS model

Next we can think about parameterizing the transition matrix. $\Gamma$ is 2 x 2, but only has 2 parameters
We’ll specify uniform(0,1) priors on these

model{
for(i in 1:2){initp[i] <- 0.5}
x[1] ~ dcat(initp)
p ~ dunif(0,1)
q ~ dunif(0,1)

JAGS model

Next we specify $\Gamma$

model{
for(i in 1:2){initp[i] <- 0.5}
x[1] ~ dcat(initp)
gamma12 ~ dunif(0,1)
gamma21 ~ dunif(0,1)
Gamma[1,1] <- 1-gamma12
Gamma[1,2] <- gamma12
Gamma[2,1] <- gamma21
Gamma[2,2] <- 1-gamm21

JAGS model

Next, add in the transition model for the latent state x

model{
for(i in 1:2){initp[i] <- 0.5}
x[1] ~ dcat(initp)
p ~ dunif(0,1)
q ~ dunif(0,1)
Gamma[1,1] <- 1-p
Gamma[1,2] <- p
Gamma[2,1] <- q
Gamma[2,2] <- 1-q
for(i in 2:n){x[i] ~ dcat(Gamma[x[i-1],])}
}

JAGS model

And finally the observation model for our observed data
We need to estimate 2 means (u) and a residual error variance term

model{
for(i in 1:2){initp[i] <- 0.5}
x[1] ~ dcat(initp)
p ~ dunif(0,1)
q ~ dunif(0,1)
Gamma[1,1] <- 1-p
Gamma[1,2] <- p
Gamma[2,1] <- q
Gamma[2,2] <- 1-q
for(i in 2:n){x[i] ~ dcat(Gamma[x[i-1],])}
resid_tau~dgamma(0.001,0.001)
for(i in 1:2) {u[i] ~ dnorm(0,1)}
for(i in 1:n){y[i] ~ dnorm(u[x[i]],resid_tau)}
}

JAGS model

To run the model, save it in a file, e.g. ‘model.txt’
Create data list

n = nrow(calcofi)
y = log(calcofi$m)
jags_data = list("n","y")

Create parameter list

jags_params = c("x","u","Gamma","resid_tau")

JAGS model

Running the model

jagsfit <- jags(data=jags_data, 
parameters.to.save=jags_params,
model.file="model.txt")

JAGS model

Next steps: thus far, we’ve only considered HMMs that are AR(1)
Challenge: try to modify the example above for the CalCOFI data to be AR(2)
[hint: what are the dimensions of $\Gamma$?]