Gentle introduction to HMMs
Theory and notation
Examples of univariate HMMs in R
Examples of multivariate HMMs in R
14 Feb 2019
For additional background, see
Zucchini et al. (2008, 2016) “Hidden Markov Models for Time Series: An Introduction Using R”
Jackson (2011) “Multi-State Models for Panel Data: The msm Package for R”
Visser and Speekenbrink (2010) “depmixS4: An R Package for Hidden Markov Models”
We’ve already discussed state space models. These models include
a latent process model (we don’t directly observe)
a data or observation model (we’ve generally assumed this to be normal or multivariate normal)
Process model: \[{ x }_{ t }={ x }_{ t-1 }+{ \varepsilon }_{ t-1 }\]
Observation model: \[{ y }_{ t }={ x }_{ t }+{ \delta }_{ t }\]
where \({ \varepsilon }_{ t } \sim Normal\left( 0,{ \sigma }_{ \varepsilon } \right)\) and \({ \delta }_{ t } \sim Normal\left( 0,{ \sigma }_{ \delta } \right)\)
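As a quick illustration, this state space model can be simulated in a few lines of R (the variance values here are arbitrary choices for illustration, not estimates):

```r
# Simulate the random walk process model and noisy observations
# (sigma_eps and sigma_delta are illustrative values)
set.seed(42)
n <- 100
sigma_eps <- 0.5    # process error SD
sigma_delta <- 0.3  # observation error SD
x <- numeric(n)
for(t in 2:n) {
  x[t] <- x[t-1] + rnorm(1, 0, sigma_eps)  # process model
}
y <- x + rnorm(n, 0, sigma_delta)          # observation model
```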
Adding AR coefficients can make these models stationary with respect to the mean,
\[{ x }_{ t }={ p\cdot x }_{ t-1 }+{ \varepsilon }_{ t-1 }\]
however they may not be able to explain some datasets very well.
Many examples of time series in ecology & fisheries may alternate between multiple states (2+)
Vert-pre et al. (2013) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3562848/
Francis et al. (2012) https://onlinelibrary.wiley.com/doi/full/10.1111/j.1365-2486.2012.02702.x
Lots of non-HMM approaches for detecting regimes
They can actually fit the data from a regime model quite well.
What’s lacking is inference about the underlying state process: the probability of being in each state, and the probabilities of transitioning between states
Lots of applications: speech recognition, bioinformatics, animal movement, environmental data (rain), finance
Markov Process
Entire history can be written as \[\left\{ { x }_{ 1 },{ x }_{ 2 },{ x }_{ 3 },...,{ x }_{ T } \right\}\]
A key piece of the Markov process is the set of transition probabilities, \({ \gamma }_{ ij }=P({ x }_{ t }=j|{ x }_{ t-1 }=i)\).
And then these probabilities can be summarized in a transition matrix, \[\Gamma =\left[ \begin{matrix} { \gamma }_{ 11 } & { \gamma }_{ 12 } & { \gamma }_{ 13 } \\ { \gamma }_{ 21 } & { \gamma }_{ 22 } & { \gamma }_{ 23 } \\ { \gamma }_{ 31 } & { \gamma }_{ 32 } & { \gamma }_{ 33 } \end{matrix} \right]\]
Matrix \(\Gamma\) is the 1-step transitions. However k-step transition probabilities can be generated,
\(\Gamma(k) = \Gamma^{k}\)
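For instance, with a made-up 3-state \(\Gamma\), the k-step probabilities can be computed by repeated matrix multiplication:

```r
# Illustrative 3-state transition matrix (rows sum to 1)
Gamma <- matrix(c(0.8, 0.1, 0.1,
                  0.2, 0.7, 0.1,
                  0.1, 0.2, 0.7), nrow = 3, byrow = TRUE)
# k-step transition probabilities: Gamma(k) = Gamma^k
k <- 5
Gamma_k <- diag(3)
for(i in 1:k) Gamma_k <- Gamma_k %*% Gamma
rowSums(Gamma_k)  # rows of Gamma_k still sum to 1
```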
There are two general flavours of transition matrices: homogeneous (transition probabilities constant over time) and non-homogeneous (transition probabilities varying over time, e.g. with covariates)
Covariates may enter HMMs in one of two ways
Effects on the mean
Effects on the transition matrix \(\Gamma\)
Observations: observable data \({Y}_{t}\), \(t = 1,...,T\)
One quantity of interest from HMMs is the marginal distribution, \(P(Y_{t})\)
Output from the model will give us the conditional distribution, \(P(Y_{t}|x_{t}=i)\). But we need to marginalize over all possible states
Solution: \[P(Y_{ t })=\sum _{ i=1 }^{ N }{ P({ x }_{ t }=i)P({ Y }_{ t }|{ x }_{ t }=i) }\] which can also be expressed as \[P(Y_{ t })=\sum _{ i=1 }^{ N }{ { \delta }_{ i }P({ Y }_{ t }|{ x }_{ t }=i) }\] where \(\delta\) represents the stationary distribution (Zucchini et al. 2006)
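As a sketch (with made-up parameter values), \(\delta\) can be computed as the normalized left eigenvector of \(\Gamma\) associated with eigenvalue 1, then used to evaluate the marginal:

```r
# Illustrative 2-state transition matrix
Gamma <- matrix(c(0.9, 0.1,
                  0.3, 0.7), nrow = 2, byrow = TRUE)
# Stationary distribution solves delta %*% Gamma = delta
e <- eigen(t(Gamma))
delta <- Re(e$vectors[, 1])
delta <- delta / sum(delta)   # normalize to sum to 1
# Marginal density of Y_t if each state has a Normal(mu_i, 1) response
mu <- c(0, 3)
marginal <- function(y) sum(delta * dnorm(y, mean = mu, sd = 1))
```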
Estimation of HMM parameters can be quite complicated
Dealing with joint likelihood of observed data and unobserved states
\[P(x,Y)=\prod _{ t=1 }^{ T }{ P({ x }_{ t }|{ x }_{ t-1 })P({ Y }_{ t }|{ x }_{ t }) }\]
Estimation most commonly done with maximum likelihood
\[P\left( { x }_{ t }|{ Y }_{ 1:T } \right) =P({ x }_{ t }|{ Y }_{ 1:t },{ Y }_{ t+1:T })\propto P({ x }_{ t }|{ Y }_{ 1:t })P({ Y }_{ t+1:T }|{ x }_{ t })\]
The probability of a state given all the data is proportional to the probability of the state given the data up to that time step, multiplied by the probability of the future data given that state
Forward-backward algorithm has 3 steps (sometimes 2/3 combined): a forward pass computing \(P({ x }_{ t }|{ Y }_{ 1:t })\), a backward pass computing \(P({ Y }_{ t+1:T }|{ x }_{ t })\), and a smoothing step combining the two into \(P({ x }_{ t }|{ Y }_{ 1:T })\)
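A minimal base-R sketch of the scaled forward recursion for a 2-state Gaussian HMM; all parameter values below are illustrative assumptions, not estimates:

```r
# Forward algorithm with scaling, returning the log-likelihood
# delta0: initial state probabilities; Gamma: transition matrix
# mu, sd: state-specific Normal parameters
forward_loglik <- function(y, delta0, Gamma, mu, sd) {
  phi <- delta0 * dnorm(y[1], mu, sd)   # initial forward probabilities
  ll <- log(sum(phi))
  phi <- phi / sum(phi)                 # rescale to avoid underflow
  for(t in 2:length(y)) {
    phi <- as.vector(phi %*% Gamma) * dnorm(y[t], mu, sd)
    ll <- ll + log(sum(phi))
    phi <- phi / sum(phi)
  }
  ll
}

Gamma <- matrix(c(0.9, 0.1, 0.2, 0.8), nrow = 2, byrow = TRUE)
ll <- forward_loglik(y = c(0.1, 0.3, 2.1, 1.9), delta0 = c(0.5, 0.5),
                     Gamma = Gamma, mu = c(0, 2), sd = c(1, 1))
```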
As a first example, let’s use an example of rainfall data in Seattle from the end of 2018 and beginning of 2019 (accessed from wunderground.com)
rain = read.csv("seattle_rain.csv")
ggplot(rain, aes(Time, Rainfall)) + 
  geom_line() + 
  xlab("Days since Nov 1") + 
  ylab("Precip (in)")
We could model rainfall directly, but for starters let’s just model whether or not it rained on a given day (0/1).
rain$rained = ifelse(rain$Rainfall > 0, 1, 0)
ggplot(rain, aes(Time, rained)) + 
  geom_line() + 
  xlab("Days since Nov 1") + 
  ylab("Rained?")
Questions that we might be interested in:
We don’t really need HMMs to address these questions
Transition probabilities can just be calculated directly,
\[\frac { \#\ \text{consecutive rainy days} }{ \#\ \text{rainy days} }\]
For example, we can create a differenced variable to indicate no change (0), and compute
rain$diff = c(diff(rain$rained), NA)
p_rain_rain = length(which(rain$diff == 0 & rain$rained==1)) / 
  length(which(rain$rained==1))
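The same direct calculation extends to the full 2×2 empirical transition matrix. A self-contained sketch, using a simulated binary sequence standing in for the rain gauge data (since the seattle_rain.csv file isn't bundled here):

```r
# Empirical transition matrix for a binary (rained / didn't rain) series
# (simulated data standing in for the observed series)
set.seed(1)
rained <- rbinom(200, size = 1, prob = 0.5)
# Cross-tabulate today's state vs tomorrow's state
trans <- table(from = head(rained, -1), to = tail(rained, -1))
trans_prob <- trans / rowSums(trans)  # rows: today; columns: tomorrow
```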
Ok, now let’s assume that there’s some minor observation or measurement error in the rain gauge data. For this case, it’s best to use an HMM.
Let’s start with the R package depmixS4. There’s a good vignette on the package that you can refer back to,
https://cran.r-project.org/web/packages/depmixS4/vignettes/depmixS4.pdf
There are two functions we’ll use to set up the HMM with depmixS4.
First we’ll structure the model using the same formula notation you’re hopefully familiar with,
mod = depmix(rained ~ 1, nstates = 2, transition = ~ 1, family = binomial(), data=rain)
Stepping through this:
rained ~ 1 specifies rained as our response (yes / no)
nstates is the number of alternative states (2 or more), specified a priori
transition is a formula specifying whether any covariates are to be included in the transition probabilities. The default is no covariates, so the transitions aren’t time varying.
family is a family, or a list of families (one per response variable when there are multiple responses). The majority of common families are supported.
For a complete list, see
?depmixS4::GLMresponse
Ok, now that we’ve set up the model, we can do the estimation
set.seed(123)
fitmod = fit(mod)
and look at the output
summary(fitmod)
## Initial state probabilties model 
## pr1 pr2 
##   1   0 
## 
## Transition matrix 
##         toS1  toS2
## fromS1 0.773 0.227
## fromS2 0.410 0.590
## 
## Response parameters 
## Resp 1 : binomial 
##     Re1.(Intercept)
## St1           2.087
## St2         -29.863
As a warning, note that the results from the estimation are a bit sensitive to the initial seed. Look at how much the transition probabilities change,
set.seed(121)
fitmod = fit(mod)
summary(fitmod)
## Initial state probabilties model 
## pr1 pr2 
##   0   1 
## 
## Transition matrix 
##         toS1  toS2
## fromS1 0.590 0.410
## fromS2 0.227 0.773
## 
## Response parameters 
## Resp 1 : binomial 
##     Re1.(Intercept)
## St1         -29.807
## St2           2.087
There are a couple of practical ways to try to overcome this seed issue, such as refitting the model from many random starting values and keeping the best solution. Unfortunately for this example, this doesn’t fully solve the issue
Pseudocode:
best = 1.0e10
best_model = NA
for(i in 1:iter) {
  # fit the model
  fitmod = fit(mod)
  # check to see if this is the best solution so far
  if(AIC(fitmod) < best) {
    best_model = fitmod
    best = AIC(fitmod)
  }
}
Let’s move on to a more complex example.
We’ll pull some data from the CalCOFI ichthyoplankton cruises in Southern California. Some species have been used to indicate cool versus warm regimes.
For this example, we’ll focus on the California smoothtongue (Leuroglossus stilbius)
Caveat: the survey is spatially gridded, and you’d want to perform index standardization or use spatial models to generate indices. For simplicity, we’re just taking the mean abundance across stations in April-May.
We’ll start with fitting a 2-state model with depmix, assuming the log-transformed mean abundance is normally distributed within each state (the Gaussian default).
set.seed(123)
calcofi$ln = log(calcofi$m)
mod = depmix(ln ~ 1, nstates = 2, data=calcofi)
fitmod = fit(mod)
First let’s look at how to get predictions out of a depmix.fitted object
We’ll start with the state probabilities. Remember we could work with either the viterbi function or the posterior function. The most probable states are
prstates = apply(posterior(fitmod)[,c("S1","S2")], 1, which.max)
plot(prstates, type="b", xlab="Time", ylab="State")
depmixS4 doesn’t have a predict() or forecast() function, but creating estimated data is pretty straightforward. We can get the means out with
mu = summary(fitmod)[,1]
pred = data.frame("year"=seq(min(calcofi$year), max(calcofi$year)), "fit" = mu[prstates])
Some diagnostics (like autocorrelation) may not look great. What if we compared the previous model to one with 3 states? AIC from the 2-state model was
## [1] 118.994
set.seed(123)
calcofi$ln = log(calcofi$m)
mod = depmix(ln ~ 1, nstates = 3, data=calcofi)
fitmod = fit(mod)
It seems that increasing the number of states doesn’t result in a lower AIC
## [1] 122.5181
If the univariate examples make sense, it’s not that different to extend these models for multiple time series.
In this setting, the assumption is that the latent state process is the same for all time series, though the time series may differ
In depmix, these arguments generally become lists
We’ll extend our CalCOFI example to include 2 species now
Fitting a 2-state model with 2 species as responses. First we have to reshape the data,
library(reshape2) # for melt() and dcast()
calcofi$ln = log(calcofi$m) # ln transform
calcofi <- dcast(melt(calcofi[,c("year","ln","species")], 
                      id.vars = c("year", "species")), 
                 year ~ species)
names(calcofi)[2:3] = c("L_stilbius","L_ochotensis")
head(calcofi)
##   year  L_stilbius L_ochotensis
## 1 1980  0.85449030   -0.8414039
## 2 1981  3.13964169    1.6941728
## 3 1982  3.70359056    3.5435378
## 4 1983 -0.03344793    0.4947281
## 5 1984  3.00082695    2.4346651
## 6 1985  0.24911445   -0.8471716
set.seed(123)
mod = depmix(list(L_stilbius ~ 1, L_ochotensis ~ 1), 
             nstates = 2, 
             family = list(gaussian(), gaussian()), 
             data=calcofi)
fitmod = fit(mod)
We could also have situations where the time series are of different length.
For example, if L_ochotensis time series was missing first half of values
The argument ntimes, a vector giving the length of each independent time series, becomes particularly valuable here
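A sketch of how concatenated series and ntimes fit together (simulated values standing in for the CalCOFI series):

```r
# Two independent series of different lengths, stacked in one data frame
# (simulated values for illustration)
set.seed(2)
y1 <- rnorm(30)   # e.g. the longer L_stilbius series
y2 <- rnorm(15)   # e.g. a shorter series
stacked <- data.frame(ln = c(y1, y2))
ntimes <- c(length(y1), length(y2))
# This would then be passed as depmix(ln ~ 1, ..., ntimes = ntimes)
```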
HMMs are a useful approach for identifying latent state processes that undergo discrete transitions
Estimation of HMMs for time series in R is generally done with maximum likelihood methods
Bayesian estimation generally beyond the scope of this class