Gentle introduction to HMMs
Theory and notation
Examples of univariate HMMs in R
Examples of multivariate HMMs in R
4 Feb 2021
For additional background, see
Zucchini et al. (2008, 2016) “Hidden Markov Models for Time Series: An Introduction Using R”
Jackson (2011) “Multi-State Models for Panel Data: The msm Package for R”
Visser and Speekenbrink (2010) “depmixS4: An R Package for Hidden Markov Models”
McClintock et al. (2020) “Uncovering ecological state dynamics with hidden Markov models”
We’ve already discussed state space models. These models include
a latent process model (we don’t directly observe)
a data or observation model
we’ve generally assumed both models to be normal (or multivariate normal)
Process model: \[{ x }_{ t }={ x }_{ t-1 }+{ \varepsilon }_{ t-1 }\]
Observation model: \[{ y }_{ t }={ x }_{ t }+{ \delta }_{ t }\]
where \({ \varepsilon }_{ t } \sim Normal\left( 0,{ \sigma }_{ \varepsilon } \right)\) and \({ \delta }_{ t } \sim Normal\left( 0,{ \sigma }_{ \delta } \right)\)
Adding AR coefficients can make these models stationary with respect to the mean,
\[{ x }_{ t }={ p\cdot x }_{ t-1 }+{ \varepsilon }_{ t-1 }\]
however, they may not be able to explain some datasets very well.
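As a minimal simulation sketch of these two process models (the random walk and the AR(1) version) with the observation model added; the error SDs and AR coefficient below are arbitrary choices for illustration:

set.seed(123)
n = 100
sigma_eps = 1      # process error SD (arbitrary)
sigma_delta = 0.5  # observation error SD (arbitrary)
p = 0.7            # AR coefficient (arbitrary)
x_rw = x_ar = rep(0, n)
for(t in 2:n) {
  x_rw[t] = x_rw[t-1] + rnorm(1, 0, sigma_eps)      # random walk process model
  x_ar[t] = p * x_ar[t-1] + rnorm(1, 0, sigma_eps)  # AR(1) process model
}
y = x_rw + rnorm(n, 0, sigma_delta)                 # observation model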
Many examples of time series in ecology & fisheries may alternate between multiple states (2+)
Vert-pre et al. (2013) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3562848/
Francis et al. (2012) https://onlinelibrary.wiley.com/doi/full/10.1111/j.1365-2486.2012.02702.x
Mark recapture models with latent states
Identifying changing behavior or habitat use:
Tennessen et al. (2019) “Hidden Markov models reveal temporal patterns and sex differences in killer whale behavior”
Leos-Barajas et al. (2017) “Analysis of animal accelerometer data using hidden Markov models”
momentuHMM: R package for analysis of telemetry data using generalized multivariate hidden Markov models of animal movement
Lots of non-HMM approaches for detecting regimes
They can actually fit the data from a regime model quite well.
What’s lacking is inference about the underlying states and the probabilities of transitioning between them.
Lots of applications: speech recognition, bioinformatics, animal movement, environmental data (rain), finance
Markov Process
Entire history can be written as \[\left\{ { x }_{ 1 },{ x }_{ 2 },{ x }_{ 3 },...,{ x }_{ T } \right\}\]
A key piece of the Markov process are the transition probabilities.
The probability of transitioning from state \(i\) to state \(j\), given the current state, is \[P\left( { x }_{ t+1 }=j | { x }_{ t }=i \right) ={ \gamma }_{ ij }\]
And then these probabilities can be summarized in a transition matrix, \[\Gamma =\left[ \begin{matrix} { \gamma }_{ 11 } & { \gamma }_{ 12 } & { \gamma }_{ 13 } \\ { \gamma }_{ 21 } & { \gamma }_{ 22 } & { \gamma }_{ 23 } \\ { \gamma }_{ 31 } & { \gamma }_{ 32 } & { \gamma }_{ 33 } \end{matrix} \right]\]
Elements of \(\Gamma\) can be fixed a priori (0, etc)
# initial state
x <- sample(1:2, size=1)
# transition matrix (byrow=TRUE so each row sums to 1)
Gamma = matrix(c(0.9, 0.1, 0.2, 0.8), 2, 2, byrow=TRUE)
# simulate the Markov chain forward
for(t in 2:100) {
  x[t] <- sample(1:2, size=1, prob = Gamma[x[t-1],])
}
Matrix \(\Gamma\) contains the 1-step transition probabilities. However, k-step transition probabilities can also be generated,
\(\Gamma(k) = \Gamma^{k}\)
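For example, with the 2-state matrix from the simulation above, the 2-step and 10-step transition probabilities are just matrix powers:

Gamma = matrix(c(0.9, 0.1, 0.2, 0.8), 2, 2, byrow=TRUE)
Gamma %*% Gamma                       # Gamma(2): 2-step transition probabilities
Reduce(`%*%`, rep(list(Gamma), 10))   # Gamma(10): 10-step transition probabilities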
There are two general flavours of transition matrices: those that are constant through time (homogeneous) and those that vary through time (e.g. as functions of covariates).
Covariates may enter HMMs in one of two ways
Effects on the mean
Effects on the transition matrix \(\Gamma\)
For effects on \(\Gamma\), probabilities are constrained (0,1) and constrained to sum to 1
plogis(qlogis(0.1) + 0.015*1:100)  # qlogis() is the base R logit function
# initial state
x <- sample(1:2, size=1)
# transition matrix
Gamma = matrix(c(0.9, 0.1, 0.2, 0.8), 2, 2, byrow=TRUE)
for(t in 2:100) {
  # time-varying transition probabilities, modeled on the logit scale
  Gamma[1,2] = plogis(qlogis(0.1) + 0.015*t)
  Gamma[1,1] = 1 - Gamma[1,2]
  Gamma[2,1] = plogis(qlogis(0.2) + 0.015*t)
  Gamma[2,2] = 1 - Gamma[2,1]
  x[t] <- sample(1:2, size=1, prob = Gamma[x[t-1],])
}
Observations: observable data \({Y}_{ t=1,...,T }\)
States: latent (discrete) variables that are not directly observed
Transition probabilities: transition matrix representing the probability of transitioning between states in the Markov chain
Emission probabilities: the probability of the observed data given the underlying state at a particular timestep, \(P(Y_t | x_t)\)
# initial state
x <- sample(1:2, size=1)
# transition matrix
Gamma = matrix(c(0.9, 0.1, 0.2, 0.8), 2, 2, byrow=TRUE)
for(t in 2:100) {
  x[t] <- sample(1:2, size=1, prob = Gamma[x[t-1],])
}
# state-specific means, and normally distributed observations
u = c(1.2, 5.4)
y = rnorm(length(x), u[x], 1)
One quantity of interest from HMMs is the marginal distribution, \(P(Y_{t})\)
Output from the model will give us the conditional distribution, \(P(Y_{t}|x_{t}=i)\). But we need to marginalize over all possible states
Solution: \[P(Y_{ t })=\sum _{ i=1 }^{ N }{ P({ x }_{ t }=i)P({ Y }_{ t }|{ x }_{ t }=i) }\] which can also be expressed as \[P(Y_{ t })=\sum _{ i=1 }^{ N }{ { \delta }_{ i }P({ Y }_{ t }|{ x }_{ t }=i) }\] where \(\delta\) represents the stationary distribution (Zucchini et al. 2006)
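As a sketch of this calculation for the 2-state Gaussian example simulated above (the state means u and observation SD of 1 come from that simulation; everything else follows from \(\Gamma\)):

Gamma = matrix(c(0.9, 0.1, 0.2, 0.8), 2, 2, byrow=TRUE)
# stationary distribution: left eigenvector of Gamma associated with eigenvalue 1
ev = eigen(t(Gamma))
delta = Re(ev$vectors[,1]) / sum(Re(ev$vectors[,1]))
# marginal density of Y_t: a mixture of state-specific densities, weighted by delta
u = c(1.2, 5.4)
marginal = function(y) delta[1]*dnorm(y, u[1], 1) + delta[2]*dnorm(y, u[2], 1)
curve(marginal(x), from = -2, to = 9)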
Estimation of HMM parameters can be quite complicated
Dealing with joint likelihood of observed data and unobserved states
\[P(x,Y)=\prod _{ t=1 }^{ T }{ P({ x }_{ t }|{ x }_{ t-1 })P({ Y }_{ t }|{ x }_{ t }) }\]
Estimation most commonly done with maximum likelihood
\[P\left( { x }_{ t }|{ Y }_{ 1:T } \right) =P({ x }_{ t }|{ Y }_{ 1:t },{ Y }_{ t+1:T })\propto P({ x }_{ t }|{ Y }_{ 1:t })P({ Y }_{ t+1:T }|{ x }_{ t })\]
The probability of the state given all the data is proportional to the probability of the state given the data up to that time step, multiplied by the probability of the future data given the state.
Forward-backward algorithm has 3 steps (sometimes the last two are combined): (1) a forward pass, computing \(P(x_t | Y_{1:t})\); (2) a backward pass, computing \(P(Y_{t+1:T} | x_t)\); (3) combining (smoothing) the two to get \(P(x_t | Y_{1:T})\).
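As a sketch of just the forward (filtering) piece, here is a scaled forward recursion that evaluates the log-likelihood of the 2-state Gaussian HMM simulated above, assuming the parameters are known (y and Gamma come from that simulation, delta from the stationary-distribution sketch):

forward_loglik = function(y, Gamma, delta, u, sigma) {
  ll = 0
  # initialization: P(x_1 = i) * P(y_1 | x_1 = i)
  alpha = delta * dnorm(y[1], u, sigma)
  ll = ll + log(sum(alpha))
  alpha = alpha / sum(alpha)   # rescale to avoid numerical underflow
  for(t in 2:length(y)) {
    # one-step state prediction, then update with the new observation
    alpha = (alpha %*% Gamma) * dnorm(y[t], u, sigma)
    ll = ll + log(sum(alpha))
    alpha = alpha / sum(alpha)
  }
  return(ll)
}
forward_loglik(y, Gamma, delta, u = c(1.2, 5.4), sigma = 1)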
As a first example, let’s use an example of rainfall data in Seattle from the end of 2018 and beginning of 2019 (accessed from wunderground.com)
We could model rainfall directly, but for starters let’s just model whether or not it rained on a given day (0/1).
rain$rained = ifelse(rain$Rainfall > 0, 1, 0)
Questions that we might be interested in: for example, what is the probability that it rains tomorrow, given that today is rainy (or dry)?
We don’t really need HMMs to address these questions
Transition probabilities can be just calculated directly,
\[\frac { \#\text{ consecutive rainy days} }{ \#\text{ rainy days} }\]
For example, we can create a differenced variable to indicate no change (0), and compute
rain$diff = c(diff(rain$rained), NA)
p_rain_rain = length(which(rain$diff == 0 & rain$rained == 1)) / length(which(rain$rained == 1))
Ok, now let’s assume that there’s some minor observation or measurement error in the rain gauge data. For this case, it’s best to use a HMM.
Let’s start with the R package depmixS4. There’s a good vignette on the package that you can refer back to,
https://cran.r-project.org/web/packages/depmixS4/vignettes/depmixS4.pdf
There are two functions we’ll use to set up the HMM with depmixS4: depmix() to specify the model and fit() to estimate it.
First we’ll structure the model using the same formula notation you’re hopefully familiar with,
mod = depmix(rained ~ 1, nstates = 2, transition = ~ 1, family = binomial(), data=rain)
Stepping through this:
rained is our response (yes / no), and rained ~ 1 specifies an intercept-only model for it
nstates is the number of alternative states (2 or more), specified a priori
transition is a formula specifying whether any covariates are included in the transition probabilities. The default (~ 1) includes no covariates, so the transitions aren’t time varying (a sketch of adding a covariate follows at the end of this walkthrough).
family is a family, or a list of families (one per response variable for models with multiple responses), describing the distribution of each response. The majority of common GLM families are supported.
For a complete list, see
?depmixS4::GLMresponse
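As an aside, if we did want time-varying transition probabilities, a covariate can be included in the transition formula. A hypothetical sketch (the standardized time trend here is made up purely for illustration):

rain$time = scale(1:nrow(rain))[,1]   # hypothetical covariate: a standardized time trend
mod_tv = depmix(rained ~ 1, nstates = 2, transition = ~ time,
                family = binomial(), data = rain)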
Ok, now that we’ve set up the model, we can do the estimation and look at the output
set.seed(123)
fitmod = fit(mod)
|        | To_1 | To_2 |
|--------|------|------|
| From_1 | 0.77 | 0.23 |
| From_2 | 0.41 | 0.59 |
As a warning, note that the results from the estimation are a bit sensitive to the initial seed. Look at how much the transition probabilities change,
set.seed(121)
fitmod = fit(mod)
|        | To_1 | To_2 |
|--------|------|------|
| From_1 | 0.59 | 0.41 |
| From_2 | 0.23 | 0.77 |
There are a couple of practical ways to try to overcome this seed issue; one is to re-fit the model many times and keep the best fit (lowest AIC).
Unfortunately, for this example this doesn’t fully solve the issue.
Pseudocode:
best = 1.0e10
best_model = NA
for(i in 1:iter) {
  # fit the model
  fitmod = fit(mod)
  # check to see if this is the best solution so far
  if(AIC(fitmod) < best) {
    best_model = fitmod
    best = AIC(fitmod)
  }
}
Let’s move on to a more complex example.
We’ll pull some data from the CalCOFI ichthyoplankton cruises in Southern California. Some species have been used to indicate cool versus warm regimes.
For this example, we’ll focus on the California smoothtongue (Leuroglossus stilbius)
Caveat: the survey is spatially gridded, and you’d want to perform index standardization or use spatial models to generate indices. For simplicity, we’re just taking the mean abundance across stations in April-May.
We’ll start with fitting a 2-state model with depmix. Assumptions: the log-transformed index is modeled with a Gaussian family (the default), each state has its own mean, and the transition probabilities are constant over time.
set.seed(123)
calcofi$ln = log(calcofi$m)
mod = depmix(ln ~ 1, nstates = 2, data = calcofi)
fitmod = fit(mod)
First let’s look at how to get predictions out of a depmix.fitted object.
We’ll start with the state probabilities. Remember we could work with either the viterbi() function or the posterior() function.
The most probable states are
prstates = apply(posterior(fitmod)[,c("S1","S2")], 1, which.max)
plot(prstates, type="b", xlab="Time", ylab="State")
depmixS4 doesn’t have a predict() or forecast() function, but creating estimated data is pretty straightforward. We can get the means out with
mu = summary(fitmod)[,1]
pred = data.frame("year"=seq(min(calcofi$year), max(calcofi$year)), "fit" = mu[prstates])
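One way to visualize this is to overlay the estimated state-specific means on the observed series; a quick sketch using the objects created above:

plot(calcofi$year, calcofi$ln, type = "o", pch = 16,
     xlab = "Year", ylab = "ln(mean abundance)")
lines(pred$year, pred$fit, col = "red", lwd = 2)   # estimated state-specific means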
Some diagnostics (like autocorrelation) may not look great. What if we compared the previous model to one with 3 states? AIC from the 2-state model was
## [1] 118.994
set.seed(123)
calcofi$ln = log(calcofi$m)
mod = depmix(ln ~ 1, nstates = 3, data = calcofi)
fitmod = fit(mod)
It seems that increasing the number of states doesn’t result in a lower AIC
## [1] 122.5181
In comparing 2 vs 3 state HMM, we found lower AIC for 2-state model
However, the opposite is often true: models with more states often fit better and result in lower AIC
How to decide? Occam’s razor & metrics of predictive ability
Pohle et al. 2017. “Selecting the Number of States in Hidden Markov Models: Pragmatic Solutions Illustrated Using Animal Movement”
If the univariate examples make sense, it’s not that different to extend these models for multiple time series.
In this setting, the assumption is that a single latent state process drives all of the time series, though the observation models (e.g. state-specific means) may differ among series
In depmix, the response and family arguments generally become lists, with one element per time series
We’ll extend our CalCOFI example to include 2 species now
Fitting a 2-state model with 2 species as responses. First we have to reshape the data,
library(reshape2)   # for melt() and dcast()
calcofi$ln = log(calcofi$m) # ln transform
calcofi <- dcast(melt(calcofi[,c("year","ln","species")], id.vars = c("year", "species")), year ~ species)
names(calcofi)[2:3] = c("L_stilbius","L_ochotensis")
head(calcofi)
##   year  L_stilbius L_ochotensis
## 1 1980  0.85449030   -0.8414039
## 2 1981  3.13964169    1.6941728
## 3 1982  3.70359056    3.5435378
## 4 1983 -0.03344793    0.4947281
## 5 1984  3.00082695    2.4346651
## 6 1985  0.24911445   -0.8471716
set.seed(123)
mod = depmix(list(L_stilbius ~ 1, L_ochotensis ~ 1),
             nstates = 2,
             family = list(gaussian(), gaussian()),
             data = calcofi)
fitmod = fit(mod)
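The state-specific parameters and most probable state sequence can be pulled out the same way as in the univariate case, e.g.:

summary(fitmod)   # state-specific means/SDs for each species, plus the transition matrix
prstates = apply(posterior(fitmod)[, c("S1", "S2")], 1, which.max)
plot(prstates, type = "b", xlab = "Time", ylab = "State")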
We could also have situations where the time series are of different length.
For example, the L_ochotensis time series might be missing the first half of its values.
The argument ntimes, a vector giving the length of each individual time series, becomes particularly valuable here; a short sketch of its use follows.
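A hypothetical sketch of ntimes (not using the CalCOFI data): two independent univariate series of different lengths are stacked in a single data frame, and ntimes tells depmix where one series ends and the next begins:

set.seed(1)
dat = data.frame(ln = c(rnorm(30, 1, 1), rnorm(20, 3, 1)))  # made-up series of length 30 and 20
mod = depmix(ln ~ 1, nstates = 2, data = dat,
             ntimes = c(30, 20))   # lengths of the two independent series
fitmod = fit(mod)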
HMMs are a useful approach for identifying latent state vectors that undergo discrete transitions
Estimation of HMMs for time series in R generally done with ML methods
Bayesian estimation generally beyond the scope of this class
Why JAGS? It has built-in categorical samplers (dcat) for the discrete latent states
Let’s re-fit our CalCOFI example in JAGS. We’ll step through this in small chunks to understand what’s going on in JAGS syntax
model{
  for(i in 1:2) { initp[i] <- 0.5 }
  x[1] ~ dcat(initp)
model{
  for(i in 1:2) { initp[i] <- 0.5 }
  x[1] ~ dcat(initp)
  p ~ dunif(0,1)
  q ~ dunif(0,1)
model{
  for(i in 1:2) { initp[i] <- 0.5 }
  x[1] ~ dcat(initp)
  gamma12 ~ dunif(0,1)
  gamma21 ~ dunif(0,1)
  Gamma[1,1] <- 1 - gamma12
  Gamma[1,2] <- gamma12
  Gamma[2,1] <- gamma21
  Gamma[2,2] <- 1 - gamma21
model{
  for(i in 1:2) { initp[i] <- 0.5 }
  x[1] ~ dcat(initp)
  p ~ dunif(0,1)
  q ~ dunif(0,1)
  Gamma[1,1] <- 1 - p
  Gamma[1,2] <- p
  Gamma[2,1] <- q
  Gamma[2,2] <- 1 - q
  for(i in 2:n) { x[i] ~ dcat(Gamma[x[i-1],]) }
}
model{
  # initial state
  for(i in 1:2) { initp[i] <- 0.5 }
  x[1] ~ dcat(initp)
  # transition matrix
  p ~ dunif(0,1)
  q ~ dunif(0,1)
  Gamma[1,1] <- 1 - p
  Gamma[1,2] <- p
  Gamma[2,1] <- q
  Gamma[2,2] <- 1 - q
  # latent state process
  for(i in 2:n) { x[i] ~ dcat(Gamma[x[i-1],]) }
  # observation model: state-specific means, shared precision
  resid_tau ~ dgamma(0.001, 0.001)
  for(i in 1:2) { u[i] ~ dnorm(0, 1) }
  for(i in 1:n) { y[i] ~ dnorm(u[x[i]], resid_tau) }
}
To run the model, save it in a file, e.g. ‘model.txt’
Create data list
n = nrow(calcofi)
y = log(calcofi$m)
jags_data = list("n","y")
jags_params = c("x","u","Gamma","resid_tau")
library(R2jags)   # provides the jags() function
jagsfit <- jags(data=jags_data, parameters.to.save=jags_params, model.file="model.txt")
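The posterior samples can then be summarized; a sketch, assuming the R2jags fit above:

post = jagsfit$BUGSoutput$sims.list
# posterior means of the state-specific means
apply(post$u, 2, mean)
# most frequently sampled (approximate posterior mode) state in each year
apply(post$x, 2, function(z) as.numeric(names(which.max(table(z)))))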
Next steps: thus far, we’ve only considered HMMs that are AR(1)
Homework: try to modify the example above for the CalCOFI data to be AR(2)
[hint: what are the dimensions of \(\Gamma\)?]