For Part 1 of the homework, we’ll use data from the Pacific Decadal Oscillation (PDO) to ask questions about identifying regimes. This dataset can be accessed via the rpdo
package.
First, let’s grab the data.
library(dplyr)
#install.packages("rsoi")
pdo <- rsoi::download_pdo()
We will look at the winter PDO only. We need to shift the year for Oct-Dec by +1 since Oct-Feb spans 2 calendar years. Then we need to remove the first year since that would only have Jan-Feb.
pdo$Year[which(pdo$Month%in%c("Oct","Nov","Dec"))] <- pdo$Year[which(pdo$Month%in%c("Oct","Nov","Dec"))] + 1
pdo <- dplyr::group_by(pdo, Year) %>%
dplyr::summarize(winter_pdo = mean(PDO[which(Month %in% c("Oct","Nov","Dec","Jan","Feb"))])) %>%
dplyr::select(winter_pdo, Year)
# The first year will be missing Oct-Dec
pdo <- pdo[-1,]
Use the depmixS4
package discussed in the HMM lecture to answer the questions in in Part 1.
library("depmixS4")
mod <- depmix(response = winter_pdo ~ 1, data = pdo, nstates = 2, trstart = runif(4))
fitmod <- fit(mod)
## converged at iteration 25 with logLik: -142.8743
prstates <- apply(posterior(fitmod)[,c("S1","S2")],
1, which.max)
plot(prstates, type="b", xlab="Time", ylab="State")
logLik()
gets you the log-likelihood from model fits in R).ll <- rep(NA,20)
for(i in 1:20){
mod <- depmix(response = winter_pdo ~ 1, data = pdo, nstates = 2)
fitmod <- fit(mod)
ll[i] <- logLik(fitmod)
}
Looks good.
hist(ll, main="logLik")
summary(fitmod, which="transition")
## Transition matrix
## toS1 toS2
## fromS1 0.848 0.152
## fromS2 0.116 0.884
Make a matrix.
GAM <- matrix(getpars(fitmod)[3:6],2,2, byrow=TRUE)
GAM
## [,1] [,2]
## [1,] 0.8483073 0.1516927
## [2,] 0.1163817 0.8836183
Persistence probabilites are 0.8483073 for staying in state 1 andd 0.8836183 for staying state 2.
\[ P[x_{t+1}=j|x_t=i] = \gamma_{i,j} \]
The \(\gamma_{i,j}\) matrix is \(\Gamma\), where the rows are from so \(i\) and the columns are to so \(j\). So looks just like the matrix in Question 3. \[ \Gamma = \begin{bmatrix}\gamma_{1,1}&\gamma_{1,2}\\ \gamma_{2,1}&\gamma_{2,2}\end{bmatrix} = \begin{bmatrix}0.848&0.152\\0.116&0.884\end{bmatrix} \]
prstates = apply(posterior(fitmod)[,c("S1","S2")],
1, which.max)
plot(prstates, type="b", xlab="Time", ylab="State")
plot(pdo$Year, posterior(fitmod)[,c("S1")], type="l")
lines(pdo$Year, posterior(fitmod)[,c("S2")], col="red", lty=2)
for
loop to find it. Let \(p_1\) be the probability that PDO is in state 1 and \(p_2\) be the probability that PDO is in state 2. Note \(p_1 + p_2 = 1\). If \(P\) is the transition matrix (in Q3),\[\begin{bmatrix}p_1&p_2\end{bmatrix}_n = \begin{bmatrix}p_1&p_2\end{bmatrix}_{n-1} P\] Note this is a 1x2 matrix times a 2x2 matrix on the right. Start with \(p_1=1\) and \(p_2=0\), say. Run a for
loop until
\[\begin{bmatrix}p_1&p_2\end{bmatrix}_n \approx \begin{bmatrix}p_1&p_2\end{bmatrix}_{n-1}\]
x.stationary <- matrix(c(0,1), 1, 2)
for(i in 1:1000) x.stationary <- x.stationary %*% GAM
x.stationary
## [,1] [,2]
## [1,] 0.4341395 0.5658605
So about 43.4% of the time, the PDO will be in state 1.
To get the analytical solution, find the stationary distribution of a Markov chain. Watch this youtube video for example. You solve this linear equation: \[
\begin{bmatrix}p_1\\p_2\end{bmatrix} = P^\top \begin{bmatrix}p_1\\p_2\end{bmatrix}
\] with the constraint that \(p_1 + p_2 = 1\). One of the constraints from our transition equation is extra, so we remove that. That’s the -1
in the code.
b <- matrix(c(0,1), ncol=1)
A <- rbind(t(GAM)-diag(1,2), 1)
solve(A[-1,], b)
## [,1]
## [1,] 0.4341395
## [2,] 0.5658605
2-state is best via AIC.
df <- data.frame(model=paste("State", 1:4), AIC=NA)
for(i in 1:4){
mod <- depmix(response = winter_pdo ~ 1, data = pdo, nstates = i)
fit <- fit(mod)
df$AIC[i] <- AIC(fit)
}
knitr::kable(df)
model | AIC |
---|---|
State 1 | 315.9630 |
State 2 | 299.7486 |
State 3 | 304.7102 |
State 4 | 317.3225 |
Looks quite stable.
pars <- c()
for(i in 1:20){
mod <- depmix(response = winter_pdo ~ 1, data = pdo, nstates = 2)
fitmod <- fit(mod)
pars <- rbind(pars, data.frame(name=c("p11", "p12", "p21", "p22"), value=getpars(fitmod)[3:6]))
}
library(ggplot2)
ggplot(pars, aes(x=value)) + geom_histogram() + xlim(c(0,1)) + facet_wrap(~name)
mod <- depmix(response = winter_pdo ~ 1, data = pdo, nstates = 2, trstart = runif(4))
fitmod <- fit(mod)
Hmm, depmixS4
does not appear to return the pseudo-residuals for HMMs.
Let’s include year in the response model.
library("depmixS4")
mod <- depmix(
response = winter_pdo ~ 1 + Year,
data = pdo, nstates = 2, trstart = runif(4))
fitmod <- fit(mod)
## converged at iteration 53 with logLik: -138.3141
This model has higher log likelihood and lower AIC.
logLik(fitmod)
## 'log Lik.' -138.3141 (df=9)
AIC(fitmod)
## [1] 294.6282
Let’s include year in the transition model. This one is not better.
library("depmixS4")
mod <- depmix(
response = winter_pdo ~ 1,
transition = ~ 1 + Year,
data = pdo, nstates = 2,
pstart = runif(4))
fitmod <- fit(mod)
## converged at iteration 72 with logLik: -141.0019
This model has higher log likelihood and lower AIC.
logLik(fitmod)
## 'log Lik.' -141.0019 (df=9)
AIC(fitmod)
## [1] 300.0038