Gentle introduction to HMMs
Theory and notation
Examples of univariate HMMs in R
Examples of multivariate HMMs in R
For additional background, see
Zucchini et al. (2008, 2016) “Hidden Markov Models for Time Series: An Introduction Using R”
Jackson (2011) “Multi-State Models for Panel Data: The msm Package for R”
Visser and Speekenbrink (2010) “depmixS4: An R Package for Hidden Markov Models”
McClintock et al. (2020) “Uncovering ecological state dynamics with hidden Markov models”
Michelot (2022) “hmmTMB: Hidden Markov models with flexible covariate effects in R”
We’ve already discussed state space models. These models include
a latent process model (which we don't directly observe)
a data or observation model
we've generally assumed both models to be normal (or multivariate normal)
Process model: \[{ x }_{ t }={ x }_{ t-1 }+{ \varepsilon }_{ t-1 }\]
Observation model: \[{ y }_{ t }={ x }_{ t }+{ \delta }_{ t }\]
where \({ \varepsilon }_{ t } \sim Normal\left( 0,{ \sigma }_{ \varepsilon } \right)\) and \({ \delta }_{ t } \sim Normal\left( 0,{ \sigma }_{ \delta } \right)\)
Adding AR coefficients can make these models stationary with respect to the mean,
\[{ x }_{ t }={ p\cdot x }_{ t-1 }+{ \varepsilon }_{ t-1 }\]
However, they may not be able to explain some datasets very well.
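To make the structure concrete, here is a minimal simulation sketch of the state space models above (random walk or AR(1) process plus observation error); the variances and the AR coefficient p are arbitrary illustrative values, not estimates from any dataset.
set.seed(1)
n <- 100
sigma_eps <- 1      # process error SD (illustrative)
sigma_delta <- 0.5  # observation error SD (illustrative)
p <- 0.7            # AR coefficient (illustrative); p = 1 recovers the random walk
x <- numeric(n)
for (t in 2:n) {
  # process model: x_t = p * x_{t-1} + eps
  x[t] <- p * x[t - 1] + rnorm(1, 0, sigma_eps)
}
# observation model: y_t = x_t + delta_t
y <- x + rnorm(n, 0, sigma_delta)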
Example from Moore (2018): “Predicting tipping points in complex environmental systems”
Many examples of time series in ecology & fisheries may alternate between multiple states (2+)
Vert-pre et al. (2013) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3562848/
Francis et al. (2012) https://onlinelibrary.wiley.com/doi/full/10.1111/j.1365-2486.2012.02702.x
Mark recapture models with latent states
Identifying changing behavior or habitat use:
Tennessen et al. (2019) “Hidden Markov models reveal temporal patterns and sex differences in killer whale behavior”
Leos-Barajas et al. (2017) “Analysis of animal accelerometer data using hidden Markov models”
momentuHMM: R package for analysis of telemetry data using generalized multivariate hidden Markov models of animal movement
Rand et al. 2024 https://doi.org/10.1016/j.ecolmodel.2024.110800
Hodgson et al. 2020 https://doi.org/10.1016/j.biocon.2020.108685
Lots of non-HMM approaches exist for detecting regimes, e.g. breakpoint / segmented regression and state space models.
These approaches can actually fit data generated by a regime-switching model quite well. What's lacking is inference about the discrete states themselves: which state the system is in at each time step, and the probabilities of transitioning between states. That is what a Hidden Markov Model provides.
Lots of applications: speech recognition, bioinformatics, animal movement, environmental data (rain), finance
Markov Process
Entire history can be written as \[\left\{ { x }_{ 1 },{ x }_{ 2 },{ x }_{ 3 },...,{ x }_{ T } \right\}\]
A key piece of the Markov process are the transition probabilities.
The probability of transitioning from the current state \(i\) to state \(j\) is \[P\left( { x }_{ t+1 }=j | { x }_{ t }=i \right) ={ \gamma }_{ ij }\]
And then these probabilities can be summarized in a transition matrix, \[\Gamma =\left[ \begin{matrix} { \gamma }_{ 11 } & { \gamma }_{ 12 } & { \gamma }_{ 13 } \\ { \gamma }_{ 21 } & { \gamma }_{ 22 } & { \gamma }_{ 23 } \\ { \gamma }_{ 31 } & { \gamma }_{ 32 } & { \gamma }_{ 33 } \end{matrix} \right]\]
Elements of \(\Gamma\) can be fixed a priori (e.g. set to 0 to disallow particular transitions)
# initial state
x <- sample(1:2, size = 1)
# transition matrix
Gamma <- matrix(c(0.9, 0.1, 0.2, 0.8), 2, 2, byrow = TRUE)
for (t in 2:100) {
  x[t] <- sample(1:2, size = 1, prob = Gamma[x[t - 1], ])
}
Matrix \(\Gamma\) contains the 1-step transition probabilities. However, the k-step transition probabilities can be obtained as
\(\Gamma(k) = \Gamma^{k}\)
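As a quick illustration with the 2-state transition matrix from the simulation above, the 2-step probabilities are just the matrix product of \(\Gamma\) with itself.
Gamma <- matrix(c(0.9, 0.1, 0.2, 0.8), 2, 2, byrow = TRUE)
# Gamma(2) = Gamma^2: probability of each state two steps ahead
Gamma2 <- Gamma %*% Gamma
Gamma2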
There are two general flavours of transition matrices: those that are stationary (constant through time) and those that vary through time, e.g. as functions of covariates.
Covariates may enter HMMs in several ways
Effects on the mean
Effects on the transition matrix \(\Gamma\)
For effects on \(\Gamma\), probabilities are constrained to (0, 1), and each row of \(\Gamma\) must sum to 1
so multinomial logit regression is used to relate covariates to the elements of \(\Gamma\)
reference / base case is fixed at zero (Agresti 2002)
Covariates can also be included in the observation model, but this is not discussed in this class
# qlogis() is the logit function in base R
plogis(qlogis(0.1) + 0.015 * 1:100)
# initial state
x <- sample(1:2, size = 1)
# transition matrix
Gamma <- matrix(c(0.9, 0.1, 0.2, 0.8), 2, 2, byrow = TRUE)
for (t in 2:100) {
  # update transition probabilities as a function of time (qlogis = logit in base R)
  Gamma[1, 2] <- plogis(qlogis(0.1) + 0.015 * t)
  Gamma[1, 1] <- 1 - Gamma[1, 2]
  Gamma[2, 1] <- plogis(qlogis(0.2) + 0.015 * t)
  Gamma[2, 2] <- 1 - Gamma[2, 1]
  # simulate next state
  x[t] <- sample(1:2, size = 1, prob = Gamma[x[t - 1], ])
}
# initial state
x <- sample(1:2, size = 1)
# transition matrix
Gamma <- matrix(c(0.9, 0.1, 0.2, 0.8), 2, 2, byrow = TRUE)
for (t in 2:100) {
  x[t] <- sample(1:2, size = 1, prob = Gamma[x[t - 1], ])
}
# state-specific means for the observation model
u <- c(1.2, 5.4)
y <- rnorm(length(x), u[x], 1)
One quantity of interest from HMMs is the marginal distribution, \(P(Y_{t})\)
Output from the model will give us the conditional distribution, \(P(Y_{t}|x_{t}=i)\). But we need to marginalize over all possible states
Solution: \[P(Y_{ t })=\sum _{ i=1 }^{ N }{ P({ x }_{ t }=i)P({ Y }_{ t }|{ x }_{ t }=i) }\] which can also be expressed as \[P(Y_{ t })=\sum _{ i=1 }^{ N }{ { \delta }_{ i }P({ Y }_{ t }|{ x }_{ t }=i) }\] where \(\delta\) represents the stationary distribution (Zucchini et al. 2006)
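As a small illustration (using the transition matrix and state-specific means from the simulations above, with an assumed observation SD of 1), the marginal density is a mixture of the conditional densities weighted by the stationary distribution \(\delta\), which can be found by solving \(\delta(I - \Gamma + U) = \mathbf{1}\), with \(U\) a matrix of ones.
Gamma <- matrix(c(0.9, 0.1, 0.2, 0.8), 2, 2, byrow = TRUE)
# stationary distribution: solve delta %*% (I - Gamma + U) = 1
delta <- solve(t(diag(2) - Gamma + 1), rep(1, 2))
u <- c(1.2, 5.4)  # state-specific means from the simulation above
# marginal density of Y_t = 2, summing over the two states
sum(delta * dnorm(2, u, 1))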
Estimation of HMM parameters can be quite complicated
Dealing with joint likelihood of observed data and unobserved states
\[P(x,Y)=P({ x }_{ 1 })P({ Y }_{ 1 }|{ x }_{ 1 })\prod _{ t=2 }^{ T }{ P({ x }_{ t }|{ x }_{ t-1 })P({ Y }_{ t }|{ x }_{ t }) }\]
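A minimal sketch of evaluating this joint (complete-data) log-likelihood, assuming a 2-state normal observation model; the state sequence, observations, and parameter values below are made up for illustration.
Gamma <- matrix(c(0.9, 0.1, 0.2, 0.8), 2, 2, byrow = TRUE)
delta <- c(0.5, 0.5)               # assumed initial state distribution
u <- c(1.2, 5.4)                   # state-specific means
x <- c(1, 1, 2, 2, 2)              # example state sequence
y <- c(1.0, 1.5, 5.0, 5.6, 5.2)    # example observations
loglik <- log(delta[x[1]]) + dnorm(y[1], u[x[1]], 1, log = TRUE)
for (t in 2:length(y)) {
  loglik <- loglik + log(Gamma[x[t - 1], x[t]]) + dnorm(y[t], u[x[t]], 1, log = TRUE)
}
loglik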
Estimation most commonly done with maximum likelihood
Forward-backward algorithm: calculate probability of observed data sequence
Viterbi algorithm: used to generate most likely states
EM-algorithm: estimate / update parameters
\[P\left( x_{t} \mid Y_{1:T} \right) = P(x_{t} \mid Y_{1:t}, Y_{t+1:T}) \propto P(x_{t} \mid Y_{1:t})P(Y_{t+1:T} \mid x_{t})\] * The probability of a state given all of the data is proportional to the probability of that state given the data up to that time step, multiplied by the probability of the future data given the state
Forward-backward algorithm has 3 steps (steps 2 and 3 are sometimes combined):
a forward pass, computing the probability of the observed data up to time t together with the state at time t (see the sketch below)
a backward pass, computing the probability of the future observations given the state at time t
combining the two to obtain the probability of each state at each time step given all of the data
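As an illustration of the forward pass, here is a minimal sketch of the scaled forward recursion for the marginal log-likelihood, assuming a 2-state HMM with normal observations and known parameters. The function name forward_loglik and the parameter values are purely illustrative, not part of any package.
forward_loglik <- function(y, Gamma, delta, u, sd = 1) {
  # forward probabilities at t = 1, rescaled to avoid numerical underflow
  alpha <- delta * dnorm(y[1], u, sd)
  loglik <- log(sum(alpha))
  alpha <- alpha / sum(alpha)
  for (t in 2:length(y)) {
    alpha <- (alpha %*% Gamma) * dnorm(y[t], u, sd)
    loglik <- loglik + log(sum(alpha))
    alpha <- alpha / sum(alpha)
  }
  loglik
}
# example: marginal log-likelihood of the simulated y from the earlier 2-state example
Gamma <- matrix(c(0.9, 0.1, 0.2, 0.8), 2, 2, byrow = TRUE)
forward_loglik(y = y, Gamma = Gamma, delta = c(0.5, 0.5), u = c(1.2, 5.4))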
As a first example, let's use rainfall data from Seattle from the end of 2018 and beginning of 2019 (accessed from wunderground.com)
We could model rainfall directly, but for starters let’s just model whether or not it rained on a given day (0/1).
rain$rained = ifelse(rain$Rainfall > 0, 1, 0)
Questions that we might be interested in: for example, given that it rained today, what is the probability that it rains tomorrow? And given a dry day, how likely is the next day to stay dry?
We don’t really need HMMs to address these questions
Transition probabilities can be just calculated directly,
\[\frac{\text{# consecutive rainy days}}{\text{# rainy days}}\]
For example, we can create a differenced variable to indicate no change (0), and compute
rain$diff <- c(diff(rain$rained), NA)
p_rain_rain <- length(which(rain$diff == 0 & rain$rained == 1)) /
  length(which(rain$rained == 1))
Ok, now let’s assume that there’s some minor observation or measurement error in the rain gauge data. For this case, it’s best to use a HMM.
Let's start with the R package hmmTMB. The main package vignette provides a more detailed walkthrough.
There are three main steps in fitting a HMM with hmmTMB. We'll step through each of these in the following slides:
1. Hidden state model: MarkovChain$...
2. Observation model: Observation$...
3. Construct HMM: HMM$...
First we need to specify the number of latent / hidden states, and will start with 2 (rained / didn’t rain)
hidden <- MarkovChain$new(data = rain, n_states = 2)
The observation model family represents the distribution of responses conditioned on the states. For this example, we’re modeling rainfall as a binary response
# options for other families in vignette
dists <- list(rained = "binom")
# named list of starting values -- varies by family
par0 <- list(rained = list(size = c(1, 1), prob = c(0.1, 0.9)))
# create observation model
obs_model <- Observation$new(data = rain, dists = dists, n_states = 2, par = par0)
Next we can construct the model, bringing the hidden and observation models together
hmm <- HMM$new(obs = obs_model, hid = hidden)
Fitting the model can be done by calling fit (without assigning a new object)
hmm$fit(silent=TRUE)
Now that we’ve set up the model and have done the estimation we can examine the output. A basic summary can be printed with
hmm
Or a more accessible approach is with hmm$par()
Here, hmm$tpm contains the transition matrix
If we color observations by the most likely state sequence (Viterbi algorithm), it looks like the observations are being perfectly classified
library(ggplot2)
rain$est_state <- factor(paste0("State", hmm$viterbi()))
ggplot(rain, aes(Time, rained, col = est_state)) +
  geom_point() +
  ylab("Rained?") +
  theme_bw()
viterbi() returns the most likely state sequence over the time series
state_probs() returns a matrix (time steps x states) with the estimated probabilities of each state
sample_states() returns posterior samples of states, with uncertainty
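For example, the state probabilities could be placed alongside the Viterbi states for plotting or summaries. A small sketch, assuming state_probs() is called on the fitted object in the same way as viterbi(); the column name p_state2 is just an illustrative choice.
probs <- hmm$state_probs()       # matrix: time steps x states
rain$p_state2 <- probs[, 2]      # probability of being in state 2 at each time step
head(rain[, c("rained", "est_state", "p_state2")])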
The transition matrices (tpm) are also easily accessible:
|              | To State 1 | To State 2 |
|--------------|------------|------------|
| From State 1 | 0.59       | 0.41       |
| From State 2 | 0.23       | 0.77       |
As an aside, a general challenge with HMMs is that the results / best model might be sensitive to
Random number seed used
Starting values for estimated parameters
There are a couple of practical ways to try to overcome these issues. Pseudocode:
best <- 1.0e10
best_model <- NA
for (i in 1:iter) {
  set.seed(i)
  # randomize initial values
  p <- runif(1)
  par0 <- list(rained = list(size = c(1, 1), prob = c(p, 1 - p)))
  # [re-fit the model here, using par0 as starting values]
  if (hmm$AIC_conditional() < best) {
    # keep track of the best solution found so far
    best_model <- hmm
    best <- hmm$AIC_conditional()
  }
}
Let’s move on to a more complex example.
We’ll pull some data from the CalCOFI ichthyoplankton cruises in Southern California. Some species have been used to indicate cool versus warm regimes.
For this example, we’ll focus on the California smoothtongue (Leuroglossus stilbius)
Caveat: the survey is spatially gridded, and you'd want to perform index standardization or use spatial models to generate indices. For simplicity, we're just taking the mean abundance across stations in April-May.
We'll start with fitting a 2-state model with hmmTMB. Assumptions: log abundance is normally distributed within each state, with state-specific means and standard deviations.
set.seed(123)
calcofi$ln_abund <- log(calcofi$m)
hidden <- MarkovChain$new(data = calcofi, n_states = 2)
dists <- list(ln_abund = "norm")
par0 <- list(ln_abund = list(mean = c(0, 3), sd = c(0.1, 0.1)))
obs_model <- Observation$new(data = calcofi, dists = dists, n_states = 2, par = par0)
hmm <- HMM$new(obs = obs_model, hid = hidden)
hmm$fit(silent = TRUE)
This ends up being a case where 2+ states aren’t supported – most observations are classified to a single state
calcofi$est_state <- factor(paste0("State", hmm$viterbi()))
ggplot(calcofi, aes(year, ln_abund, col = est_state)) +
  geom_point() +
  ylab("Ln CPUE") +
  xlab("Year") +
  theme_bw()
In comparing 2- versus 3-state HMMs, one could compare AIC with hmm$AIC_marginal() or hmm$AIC_conditional() (a sketch of this comparison is shown below)
In practice, however, models with more states often fit better and result in lower AIC, so AIC alone tends to favour overly complex models
How to decide? Occam’s razor & metrics of predictive ability
Pohle et al. 2017. “Selecting the Number of States in Hidden Markov Models: Pragmatic Solutions Illustrated Using Animal Movement”
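As a sketch of what such a comparison might look like for the CalCOFI example; the 3-state starting values below are arbitrary guesses, not recommendations.
# fit a 3-state version of the model and compare AIC to the 2-state fit
hidden3 <- MarkovChain$new(data = calcofi, n_states = 3)
par0_3 <- list(ln_abund = list(mean = c(0, 1.5, 3), sd = c(0.1, 0.1, 0.1)))
obs3 <- Observation$new(data = calcofi, dists = dists, n_states = 3, par = par0_3)
hmm3 <- HMM$new(obs = obs3, hid = hidden3)
hmm3$fit(silent = TRUE)
c(AIC_2state = hmm$AIC_marginal(), AIC_3state = hmm3$AIC_marginal())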
We can also look at residual QQ plots as a goodness of fit metric (as you would do with DHARMa or other packages). With hmmTMB:
pseudo <- hmm$pseudores()
ggplot(as.data.frame(pseudo), aes(sample = ln_abund)) +
  geom_qq() +
  geom_qq_line(col = "red") +
  ggtitle("QQ Plot")
If the univariate examples make sense, it’s not that different to extend these models for multiple time series.
In this setting, the assumption is that the latent state process is shared across all time series, though the state-specific observation distributions (e.g. means and variances) may differ among time series
We’ll extend our CalCOFI example to include 2 species now
Fitting a 2-state model with 2 species as responses. First we have to reshape the data,
library(reshape2)
# ln transform
calcofi$ln <- log(calcofi$m)
# reshape from long to wide: one column per species
calcofi <- dcast(melt(calcofi[, c("year", "ln", "species")],
                      id.vars = c("year", "species")),
                 year ~ species)
names(calcofi)[2:3] <- c("L_stilbius", "L_ochotensis")
head(calcofi)
##   year  L_stilbius L_ochotensis
## 1 1980  0.85449030   -0.8414039
## 2 1981  3.13964169    1.6941728
## 3 1982  3.70359056    3.5435378
## 4 1983 -0.03344793    0.4947281
## 5 1984  3.00082695    2.4346651
## 6 1985  0.24911445   -0.8471716
Note that the only things we have to change are the named family list dists and the initial parameters par0
set.seed(123)
hidden <- MarkovChain$new(data = calcofi, n_states = 2)
dists <- list(L_stilbius = "norm", L_ochotensis = "norm")
par0 <- list(L_stilbius = list(mean = c(0, 3), sd = c(0.1, 0.1)),
             L_ochotensis = list(mean = c(0, 3), sd = c(0.1, 0.1)))
obs_model <- Observation$new(data = calcofi, dists = dists, n_states = 2, par = par0)
hmm <- HMM$new(obs = obs_model, hid = hidden)
hmm$fit(silent = TRUE)
HMMs are a useful approach for identifying latent state vectors that undergo discrete transitions
Estimation of HMMs for time series in R is generally done with maximum likelihood methods
Bayesian estimation is generally beyond the scope of this class
Why JAGS? Its categorical samplers (dcat) make the discrete latent states straightforward to sample
Let’s re-fit our CalCOFI example in JAGS. We’ll step through this in small chunks to understand what’s going on in JAGS syntax
model{
  for(i in 1:2) { initp[i] <- 0.5 }
  x[1] ~ dcat(initp)
model{
  for(i in 1:2) { initp[i] <- 0.5 }
  x[1] ~ dcat(initp)
  p ~ dunif(0,1)
  q ~ dunif(0,1)
model{
  for(i in 1:2) { initp[i] <- 0.5 }
  x[1] ~ dcat(initp)
  gamma12 ~ dunif(0,1)
  gamma21 ~ dunif(0,1)
  Gamma[1,1] <- 1 - gamma12
  Gamma[1,2] <- gamma12
  Gamma[2,1] <- gamma21
  Gamma[2,2] <- 1 - gamma21
model{
  for(i in 1:2) { initp[i] <- 0.5 }
  x[1] ~ dcat(initp)
  p ~ dunif(0,1)
  q ~ dunif(0,1)
  Gamma[1,1] <- 1 - p
  Gamma[1,2] <- p
  Gamma[2,1] <- q
  Gamma[2,2] <- 1 - q
  for(i in 2:n) { x[i] ~ dcat(Gamma[x[i-1],]) }
}
model{
  # initial state distribution and first state
  for(i in 1:2) { initp[i] <- 0.5 }
  x[1] ~ dcat(initp)
  # transition probabilities
  p ~ dunif(0,1)
  q ~ dunif(0,1)
  Gamma[1,1] <- 1 - p
  Gamma[1,2] <- p
  Gamma[2,1] <- q
  Gamma[2,2] <- 1 - q
  # latent state process
  for(i in 2:n) { x[i] ~ dcat(Gamma[x[i-1],]) }
  # observation model: state-specific means, shared precision
  resid_tau ~ dgamma(0.001, 0.001)
  for(i in 1:2) { u[i] ~ dnorm(0, 1) }
  for(i in 1:n) { y[i] ~ dnorm(u[x[i]], resid_tau) }
}
To run the model, save it in a file, e.g. ‘model.txt’
Create data list
n <- nrow(calcofi)
y <- log(calcofi$m)
jags_data <- list("n", "y")
jags_params = c("x","u","Gamma","resid_tau")
library(R2jags)
jagsfit <- jags(data = jags_data, parameters.to.save = jags_params,
                model.file = "model.txt")
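Once the sampler finishes, posterior summaries can be inspected; a minimal sketch, assuming the standard R2jags output structure.
# posterior means, quantiles, and Rhat for the monitored parameters
print(jagsfit)
# or access the full summary matrix directly
head(jagsfit$BUGSoutput$summary)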
Next steps: thus far, we've only considered HMMs whose hidden state depends only on the previous state (first order, analogous to AR(1))
Challenge: try to modify the example above for the CalCOFI data to be AR(2)
[hint: what are the dimensions of \(\Gamma\)?]