Hidden Markov Models

Author

Dylan Hubl, Eric French, Zoe Rand

Published

May 9, 2023

Make sure to label your final write-up with “final” in the title so we know which one is the final one.

Code
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.3     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.2     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ purrr::%||%()   masks base::%||%()
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Code
library(depmixS4)
Loading required package: nnet
Loading required package: MASS

Attaching package: 'MASS'

The following object is masked from 'package:dplyr':

    select

Loading required package: Rsolnp
Loading required package: nlme

Attaching package: 'nlme'

The following object is masked from 'package:dplyr':

    collapse
Code
library(knitr)

Data

As part of the California Current Integrated Ecosystem Report, NOAA scientists do annual updates of stoplight charts for ecosystem indicators.

We have included the stoplight.csv dataset for this week. One of the columns divides indicators into groups (e.g. Local Physical, Local Biological, etc). Please pick a type of indicators, and develop a 2- or 3-state multivariate HMM. A few tips:

Describe what plankton groups and covariates you used and indicate any temporal subsetting you chose. Also describe any standardization, centering or scaling that was used.

Load the data

One of the columns divides indicators into groups (e.g. Local Physical, Local Biological, etc).

Code
stoplight <- read.csv(here::here("Lab-4", "stoplight.csv"))

Wrangle the data

Code
## add some code here
# we will look at Copepod_richness, N_copepod, S_copepod

df <- rbind.data.frame(stoplight[stoplight$Ecosystem.Indicators == "Copepod_richness",],stoplight[stoplight$Ecosystem.Indicators == "N_copepod",],stoplight[stoplight$Ecosystem.Indicators == "S_copepod",])
Code
newdf<-df%>%pivot_longer(-c(Ecosystem.Indicators, Type), names_to = "Year") %>%
  mutate(Year = as.numeric(gsub("X","",Year)))

ggplot(newdf) + 
  geom_line(aes(x = Year, y = value, color = Ecosystem.Indicators)) + 
  theme_classic()

We see that Copepod Richness and the Southern Copepod biomass measures are in phase with one another across the plot.

Code
#arranging data for model fits
df2<-df %>% dplyr::select(-Type) %>%
  gather(Year, value, -Ecosystem.Indicators) %>% 
  spread(Ecosystem.Indicators, value) 

df2$Year<-as.numeric(gsub("X","",df2$Year))
head(df2)
  Year Copepod_richness N_copepod S_copepod
1 1998             4.54     -0.76      0.60
2 1999            -2.58      0.04     -0.23
3 2000            -3.41      0.15     -0.21
4 2001            -1.03      0.15     -0.22
5 2002            -1.12      0.29     -0.23
6 2003             1.99     -0.13      0.09

2-State HMM

Code
set.seed(123)
#set up 2-state model
mod<-depmix(list(Copepod_richness ~1, N_copepod ~1, S_copepod ~1), 
           nstates = 2, 
           #assume gaussian errors for all
           family = list(gaussian(), gaussian(), gaussian()), 
           data = df2)
#fit the model
fitmod<-fit(mod)
converged at iteration 6 with logLik: -36.64126 
Code
#model results
summary(fitmod)
Initial state probabilities model 
pr1 pr2 
  0   1 

Transition matrix 
        toS1  toS2
fromS1 0.783 0.217
fromS2 0.445 0.555

Response parameters 
Resp 1 : gaussian 
Resp 2 : gaussian 
Resp 3 : gaussian 
    Re1.(Intercept) Re1.sd Re2.(Intercept) Re2.sd Re3.(Intercept) Re3.sd
St1          -1.333  1.066           0.281  0.142          -0.146  0.102
St2           4.037  2.223          -0.368  0.393           0.325  0.187

The model results change slightly depending on the seed, so making sure it found the best model:

Code
iter <-100 #number of times to run the model
seeds<-sample(100:1000, size = iter, replace = F) #get 100 seed values
best <- 1e10 #this is large so first AIC is definitely smaller
best_model <- NA
for(i in 1:iter){
  #set the seed from the ones we picked (makes sure its new each time)
  set.seed(seeds[i]) 
  #fit the model
  fitmod <- fit(mod)
  #is this model better than the last?
  if(AIC(fitmod)< best){
    best_model <- fitmod
    best <- AIC(fitmod)
  }
}
converged at iteration 7 with logLik: -36.64126 
converged at iteration 8 with logLik: -36.64126 
converged at iteration 8 with logLik: -36.64126 
converged at iteration 5 with logLik: -36.64126 
converged at iteration 6 with logLik: -36.64126 
converged at iteration 6 with logLik: -36.64126 
converged at iteration 6 with logLik: -36.64126 
converged at iteration 7 with logLik: -36.64126 
converged at iteration 5 with logLik: -36.64126 
converged at iteration 6 with logLik: -36.64126 
converged at iteration 8 with logLik: -36.64126 
converged at iteration 8 with logLik: -36.64126 
converged at iteration 8 with logLik: -36.64126 
converged at iteration 7 with logLik: -36.64126 
converged at iteration 9 with logLik: -36.64126 
converged at iteration 7 with logLik: -36.64126 
converged at iteration 9 with logLik: -36.64126 
converged at iteration 7 with logLik: -36.64126 
converged at iteration 7 with logLik: -36.64126 
converged at iteration 6 with logLik: -36.64126 
converged at iteration 6 with logLik: -36.64126 
converged at iteration 7 with logLik: -36.64126 
converged at iteration 6 with logLik: -36.64126 
converged at iteration 7 with logLik: -36.64126 
converged at iteration 6 with logLik: -36.64126 
converged at iteration 7 with logLik: -36.64126 
converged at iteration 6 with logLik: -36.64126 
converged at iteration 7 with logLik: -36.64126 
converged at iteration 8 with logLik: -36.64126 
converged at iteration 11 with logLik: -36.64126 
converged at iteration 7 with logLik: -36.64126 
converged at iteration 8 with logLik: -36.64126 
converged at iteration 7 with logLik: -36.64126 
converged at iteration 7 with logLik: -36.64126 
converged at iteration 7 with logLik: -36.64126 
converged at iteration 7 with logLik: -36.64126 
converged at iteration 6 with logLik: -36.64126 
converged at iteration 6 with logLik: -36.64126 
converged at iteration 7 with logLik: -36.64126 
converged at iteration 7 with logLik: -36.64126 
converged at iteration 6 with logLik: -36.64126 
converged at iteration 4 with logLik: -36.64126 
converged at iteration 6 with logLik: -36.64126 
converged at iteration 7 with logLik: -36.64126 
converged at iteration 11 with logLik: -36.64126 
converged at iteration 9 with logLik: -36.64126 
converged at iteration 6 with logLik: -36.64126 
converged at iteration 8 with logLik: -36.64126 
converged at iteration 7 with logLik: -36.64126 
converged at iteration 7 with logLik: -36.64126 
converged at iteration 8 with logLik: -36.64126 
converged at iteration 8 with logLik: -36.64126 
converged at iteration 8 with logLik: -36.64126 
converged at iteration 7 with logLik: -36.64126 
converged at iteration 6 with logLik: -36.64126 
converged at iteration 7 with logLik: -36.64126 
converged at iteration 7 with logLik: -36.64126 
converged at iteration 5 with logLik: -36.64126 
converged at iteration 6 with logLik: -36.64126 
converged at iteration 6 with logLik: -36.64126 
converged at iteration 5 with logLik: -36.64126 
converged at iteration 8 with logLik: -36.64126 
converged at iteration 6 with logLik: -36.64126 
converged at iteration 7 with logLik: -36.64126 
converged at iteration 7 with logLik: -36.64126 
converged at iteration 8 with logLik: -36.64126 
converged at iteration 9 with logLik: -36.64126 
converged at iteration 5 with logLik: -36.64126 
converged at iteration 6 with logLik: -36.64126 
converged at iteration 7 with logLik: -36.64126 
converged at iteration 8 with logLik: -36.64126 
converged at iteration 6 with logLik: -36.64126 
converged at iteration 6 with logLik: -36.64126 
converged at iteration 6 with logLik: -36.64126 
converged at iteration 7 with logLik: -36.64126 
converged at iteration 7 with logLik: -36.64126 
converged at iteration 6 with logLik: -36.64126 
converged at iteration 6 with logLik: -36.64126 
converged at iteration 9 with logLik: -36.64126 
converged at iteration 7 with logLik: -36.64126 
converged at iteration 7 with logLik: -36.64126 
converged at iteration 8 with logLik: -36.64126 
converged at iteration 8 with logLik: -36.64126 
converged at iteration 6 with logLik: -36.64126 
converged at iteration 7 with logLik: -36.64126 
converged at iteration 4 with logLik: -36.64126 
converged at iteration 5 with logLik: -36.64126 
converged at iteration 7 with logLik: -36.64126 
converged at iteration 7 with logLik: -36.64126 
converged at iteration 5 with logLik: -36.64126 
converged at iteration 5 with logLik: -36.64126 
converged at iteration 6 with logLik: -36.64126 
converged at iteration 6 with logLik: -36.64126 
converged at iteration 8 with logLik: -36.64126 
converged at iteration 7 with logLik: -36.64126 
converged at iteration 7 with logLik: -36.64126 
converged at iteration 6 with logLik: -36.64126 
converged at iteration 7 with logLik: -36.64126 
converged at iteration 6 with logLik: -36.64126 
converged at iteration 7 with logLik: -36.64126 
Code
summary(best_model)
Initial state probabilities model 
pr1 pr2 
  1   0 

Transition matrix 
        toS1  toS2
fromS1 0.555 0.445
fromS2 0.217 0.783

Response parameters 
Resp 1 : gaussian 
Resp 2 : gaussian 
Resp 3 : gaussian 
    Re1.(Intercept) Re1.sd Re2.(Intercept) Re2.sd Re3.(Intercept) Re3.sd
St1           4.037  2.223          -0.368  0.393           0.325  0.187
St2          -1.333  1.066           0.281  0.142          -0.146  0.102

The transition probabilities for the best model are in the table below:

Code
FromS1 <- c(0.555, 0.445)
FromS2 <- c(0.217, 0.783)
tab <- rbind.data.frame(FromS1,FromS2)
colnames(tab) <- c("To S1", "To S2")
row.names(tab) <- c("From S1", "From S2")
tab
        To S1 To S2
From S1 0.555 0.445
From S2 0.217 0.783

Plot of states and data from the 2-State HMM

Code
prstates<-apply(posterior(fitmod)[,c("S1", "S2")],1, which.max)
Warning in .local(object, ...): Argument 'type' not specified and will default
to 'viterbi'. This default may change in future releases of depmixS4. Please
see ?posterior for alternative options.
Code
mu<-summary(best_model)[,1]
Initial state probabilities model 
pr1 pr2 
  1   0 

Transition matrix 
        toS1  toS2
fromS1 0.555 0.445
fromS2 0.217 0.783

Response parameters 
Resp 1 : gaussian 
Resp 2 : gaussian 
Resp 3 : gaussian 
    Re1.(Intercept) Re1.sd Re2.(Intercept) Re2.sd Re3.(Intercept) Re3.sd
St1           4.037  2.223          -0.368  0.393           0.325  0.187
St2          -1.333  1.066           0.281  0.142          -0.146  0.102
Code
mu2<-summary(best_model)[,3]
Initial state probabilities model 
pr1 pr2 
  1   0 

Transition matrix 
        toS1  toS2
fromS1 0.555 0.445
fromS2 0.217 0.783

Response parameters 
Resp 1 : gaussian 
Resp 2 : gaussian 
Resp 3 : gaussian 
    Re1.(Intercept) Re1.sd Re2.(Intercept) Re2.sd Re3.(Intercept) Re3.sd
St1           4.037  2.223          -0.368  0.393           0.325  0.187
St2          -1.333  1.066           0.281  0.142          -0.146  0.102
Code
mu3<-summary(best_model)[,5]
Initial state probabilities model 
pr1 pr2 
  1   0 

Transition matrix 
        toS1  toS2
fromS1 0.555 0.445
fromS2 0.217 0.783

Response parameters 
Resp 1 : gaussian 
Resp 2 : gaussian 
Resp 3 : gaussian 
    Re1.(Intercept) Re1.sd Re2.(Intercept) Re2.sd Re3.(Intercept) Re3.sd
St1           4.037  2.223          -0.368  0.393           0.325  0.187
St2          -1.333  1.066           0.281  0.142          -0.146  0.102
Code
pred <- tibble("Year" = df2$Year, "Copepod_richness" = mu[prstates], "N_copepod" = mu2[prstates], "S_copepod" = mu3[prstates])
pred2<-pred %>% pivot_longer(-Year, names_to = "Ecosystem.Indicators", values_to = "Fit")
ggplot() + geom_point(data = newdf, aes(x = Year, y = value, color = Ecosystem.Indicators)) + 
  geom_line(data = pred2, aes(x = Year, y = Fit, group = Ecosystem.Indicators, color = Ecosystem.Indicators)) + theme_classic()

Our best model does converge for the two state model.

We see that the predicted fit lines follow the general trend of the data. Southern Copepod Biomass is in phase with the Copepod Richness measure. The observations for Northern and Southern Copepod Biomass have much less variance than the Copepod Richness measurement and thus the fitted lines are much closer to the observed points.

3-State HMM

Code
set.seed(123)
#set up 3-state model
mod3<-depmix(list(Copepod_richness ~1, N_copepod ~1, S_copepod ~1), 
           nstates = 3, 
           #assume gaussian errors for all
           family = list(gaussian(), gaussian(), gaussian()), 
           data = df2)
#fit the model
fitmod3<-fit(mod3)
converged at iteration 13 with logLik: -14.21542 
Code
#model results
summary(fitmod3)
Initial state probabilities model 
pr1 pr2 pr3 
  0   1   0 

Transition matrix 
        toS1 toS2  toS3
fromS1 0.700 0.00 0.300
fromS2 0.200 0.40 0.400
fromS3 0.375 0.25 0.375

Response parameters 
Resp 1 : gaussian 
Resp 2 : gaussian 
Resp 3 : gaussian 
    Re1.(Intercept) Re1.sd Re2.(Intercept) Re2.sd Re3.(Intercept) Re3.sd
St1          -1.785  0.850           0.290  0.155          -0.201  0.030
St2           5.458  2.010          -0.682  0.220           0.474  0.103
St3           1.112  1.302           0.136  0.155           0.075  0.095

Again, the model results change slightly depending on the seed, so making sure it found the best model:

Code
iter <-100 #number of times to run the model
seeds<-sample(100:1000, size = iter, replace = F) #get 100 seed values
best <- 1e10 #this is large so first AIC is definitely smaller
best_model <- NA
for(i in 1:iter){
  #set the seed from the ones we picked (makes sure its new each time)
  set.seed(seeds[i]) 
  #fit the model
  fitmod3 <- fit(mod3)
  #is this model better than the last?
  if(AIC(fitmod3)< best){
    best_model <- fitmod3
    best <- AIC(fitmod3)
  }
}
converged at iteration 10 with logLik: -14.21542 
converged at iteration 12 with logLik: -14.21542 
converged at iteration 11 with logLik: -14.21542 
converged at iteration 10 with logLik: -14.21542 
converged at iteration 8 with logLik: -14.21542 
converged at iteration 10 with logLik: -14.21542 
converged at iteration 9 with logLik: -21.02429 
converged at iteration 9 with logLik: -14.21542 
converged at iteration 9 with logLik: -14.21542 
converged at iteration 8 with logLik: -21.02429 
converged at iteration 7 with logLik: -14.21542 
converged at iteration 15 with logLik: -29.00928 
converged at iteration 8 with logLik: -14.21542 
converged at iteration 11 with logLik: -14.21542 
converged at iteration 11 with logLik: -14.21542 
converged at iteration 10 with logLik: -14.21542 
converged at iteration 11 with logLik: -14.21542 
converged at iteration 12 with logLik: -14.21542 
converged at iteration 6 with logLik: -14.21542 
converged at iteration 10 with logLik: -14.21542 
converged at iteration 8 with logLik: -14.21542 
converged at iteration 9 with logLik: -22.47665 
converged at iteration 10 with logLik: -22.47665 
converged at iteration 13 with logLik: -14.21542 
converged at iteration 12 with logLik: -14.21542 
converged at iteration 8 with logLik: -14.21542 
converged at iteration 9 with logLik: -14.21542 
converged at iteration 6 with logLik: -14.21542 
converged at iteration 13 with logLik: -14.21542 
converged at iteration 11 with logLik: -14.21542 
converged at iteration 14 with logLik: -21.02429 
converged at iteration 10 with logLik: -14.21542 
converged at iteration 8 with logLik: -14.21542 
converged at iteration 9 with logLik: -14.21542 
converged at iteration 10 with logLik: -14.21542 
converged at iteration 9 with logLik: -14.21542 
converged at iteration 11 with logLik: -14.21542 
converged at iteration 9 with logLik: -21.02429 
converged at iteration 11 with logLik: -14.21542 
converged at iteration 8 with logLik: -21.02429 
converged at iteration 10 with logLik: -21.02429 
converged at iteration 14 with logLik: -28.13896 
converged at iteration 11 with logLik: -14.21542 
converged at iteration 9 with logLik: -14.21542 
converged at iteration 15 with logLik: -14.21542 
converged at iteration 8 with logLik: -14.21542 
converged at iteration 11 with logLik: -14.21542 
converged at iteration 10 with logLik: -14.21542 
converged at iteration 10 with logLik: -14.21542 
converged at iteration 11 with logLik: -14.21542 
converged at iteration 12 with logLik: -14.21542 
converged at iteration 10 with logLik: -14.21542 
converged at iteration 5 with logLik: -14.21542 
converged at iteration 11 with logLik: -14.21542 
converged at iteration 7 with logLik: -21.02429 
converged at iteration 10 with logLik: -14.21542 
converged at iteration 10 with logLik: -14.21542 
converged at iteration 9 with logLik: -14.21542 
converged at iteration 10 with logLik: -14.21542 
converged at iteration 12 with logLik: -14.21542 
converged at iteration 12 with logLik: -14.21542 
converged at iteration 9 with logLik: -14.21542 
converged at iteration 11 with logLik: -14.21542 
converged at iteration 13 with logLik: -14.21542 
converged at iteration 9 with logLik: -21.02429 
converged at iteration 11 with logLik: -14.21542 
converged at iteration 13 with logLik: -14.21542 
converged at iteration 10 with logLik: -14.21542 
converged at iteration 9 with logLik: -21.02429 
converged at iteration 11 with logLik: -14.21542 
converged at iteration 9 with logLik: -21.02429 
converged at iteration 9 with logLik: -14.21542 
converged at iteration 13 with logLik: -14.21542 
converged at iteration 9 with logLik: -14.21542 
converged at iteration 11 with logLik: -14.21542 
converged at iteration 8 with logLik: -21.02429 
converged at iteration 8 with logLik: -14.21542 
converged at iteration 8 with logLik: -21.02429 
converged at iteration 8 with logLik: -14.21542 
converged at iteration 12 with logLik: -14.21542 
converged at iteration 12 with logLik: -14.21542 
converged at iteration 10 with logLik: -14.21542 
converged at iteration 10 with logLik: -21.02429 
converged at iteration 8 with logLik: -14.21542 
converged at iteration 32 with logLik: -32.28304 
converged at iteration 15 with logLik: -14.21542 
converged at iteration 8 with logLik: -14.21542 
converged at iteration 15 with logLik: -24.71321 
converged at iteration 10 with logLik: -14.21542 
converged at iteration 9 with logLik: -14.21542 
converged at iteration 10 with logLik: -14.21542 
converged at iteration 11 with logLik: -14.21542 
converged at iteration 11 with logLik: -14.21542 
converged at iteration 9 with logLik: -14.21542 
converged at iteration 8 with logLik: -14.21542 
converged at iteration 10 with logLik: -14.21542 
converged at iteration 12 with logLik: -14.21542 
converged at iteration 10 with logLik: -14.21542 
converged at iteration 8 with logLik: -14.21542 
converged at iteration 11 with logLik: -22.47665 
Code
summary(best_model)
Initial state probabilities model 
pr1 pr2 pr3 
  1   0   0 

Transition matrix 
       toS1  toS2  toS3
fromS1 0.40 0.200 0.400
fromS2 0.00 0.700 0.300
fromS3 0.25 0.375 0.375

Response parameters 
Resp 1 : gaussian 
Resp 2 : gaussian 
Resp 3 : gaussian 
    Re1.(Intercept) Re1.sd Re2.(Intercept) Re2.sd Re3.(Intercept) Re3.sd
St1           5.458  2.010          -0.682  0.220           0.474  0.103
St2          -1.785  0.850           0.290  0.155          -0.201  0.030
St3           1.112  1.302           0.136  0.155           0.075  0.095

The transition probabilities for the 3-State model are in the table below

Code
FromS1 <- c(0.400, 0.200, 0.400)
FromS2 <- c(0.000, 0.700, 0.300)
FromS3 <- c(0.250, 0.375, 0.375)
tab2 <- rbind.data.frame(FromS1,FromS2,FromS3)
colnames(tab2) <- c("To S1", "To S2", "To S3")
row.names(tab2) <- c("From S1", "From S2", "From S3")
tab2
        To S1 To S2 To S3
From S1  0.40 0.200 0.400
From S2  0.00 0.700 0.300
From S3  0.25 0.375 0.375

Plot of states and data from the 3-State HMM

Code
prstates<-apply(posterior(fitmod3)[,c("S1", "S2","S3")],1, which.max)
Warning in .local(object, ...): Argument 'type' not specified and will default
to 'viterbi'. This default may change in future releases of depmixS4. Please
see ?posterior for alternative options.
Code
mu<-summary(best_model)[,1]
Initial state probabilities model 
pr1 pr2 pr3 
  1   0   0 

Transition matrix 
       toS1  toS2  toS3
fromS1 0.40 0.200 0.400
fromS2 0.00 0.700 0.300
fromS3 0.25 0.375 0.375

Response parameters 
Resp 1 : gaussian 
Resp 2 : gaussian 
Resp 3 : gaussian 
    Re1.(Intercept) Re1.sd Re2.(Intercept) Re2.sd Re3.(Intercept) Re3.sd
St1           5.458  2.010          -0.682  0.220           0.474  0.103
St2          -1.785  0.850           0.290  0.155          -0.201  0.030
St3           1.112  1.302           0.136  0.155           0.075  0.095
Code
mu2<-summary(best_model)[,3]
Initial state probabilities model 
pr1 pr2 pr3 
  1   0   0 

Transition matrix 
       toS1  toS2  toS3
fromS1 0.40 0.200 0.400
fromS2 0.00 0.700 0.300
fromS3 0.25 0.375 0.375

Response parameters 
Resp 1 : gaussian 
Resp 2 : gaussian 
Resp 3 : gaussian 
    Re1.(Intercept) Re1.sd Re2.(Intercept) Re2.sd Re3.(Intercept) Re3.sd
St1           5.458  2.010          -0.682  0.220           0.474  0.103
St2          -1.785  0.850           0.290  0.155          -0.201  0.030
St3           1.112  1.302           0.136  0.155           0.075  0.095
Code
mu3<-summary(best_model)[,5]
Initial state probabilities model 
pr1 pr2 pr3 
  1   0   0 

Transition matrix 
       toS1  toS2  toS3
fromS1 0.40 0.200 0.400
fromS2 0.00 0.700 0.300
fromS3 0.25 0.375 0.375

Response parameters 
Resp 1 : gaussian 
Resp 2 : gaussian 
Resp 3 : gaussian 
    Re1.(Intercept) Re1.sd Re2.(Intercept) Re2.sd Re3.(Intercept) Re3.sd
St1           5.458  2.010          -0.682  0.220           0.474  0.103
St2          -1.785  0.850           0.290  0.155          -0.201  0.030
St3           1.112  1.302           0.136  0.155           0.075  0.095
Code
pred <- tibble("Year" = df2$Year, "Copepod_richness" = mu[prstates], "N_copepod" = mu2[prstates], "S_copepod" = mu3[prstates])
pred2<-pred %>% pivot_longer(-Year, names_to = "Ecosystem.Indicators", values_to = "Fit")
ggplot() + geom_point(data = newdf, aes(x = Year, y = value, color = Ecosystem.Indicators)) + 
  geom_line(data = pred2, aes(x = Year, y = Fit, group = Ecosystem.Indicators, color = Ecosystem.Indicators)) + theme_classic()

Code
# The 3-State HMM seems to fit the data worse than the 2-State HMM

The best model does converge for the 3 state model.

The predicted values do not appear to fit the data well in the three state model. The Northern and Southern Copepod predicted values seem to switch states away from their respective observations in a few places. And the Copepod Richness predicted values do not drop down to the lowest state in a few sections where it seems like it would be appropriate to do so.

One thing to note is that the transition probabilities state that there is a 0.00 chance of switching from state 2 to state 1. State 2 being the lowest level for each of the fitted lines and State 1 being the highest. We do see that the fitted values do make this transition so this is apparently just a very small number that is left at 0.00 due to rounding in the table.

Results

Code
`3-State` <- AIC(fitmod3)
`2-State` <- AIC(fitmod)
d <- c(`3-State`,`2-State`)-min(`3-State`,`2-State`)

e <- cbind.data.frame(rbind(`3-State`,`2-State`),d)
colnames(e) <- c("AIC","delta AIC")
row.names(e) <- c("3-State","2-State")
e
             AIC delta AIC
3-State  96.9533  0.000000
2-State 103.2825  6.329209

The three state model has the lower AIC value despite appearing to not fit the data as well.

Discussion

When looking at the three Copepod measurements, both of our models converge. We did not use any covariates in order to keep our models simple. The AIC score indicates that the 3-state model is better supported than the 2-state. However, visually it appears that the two state model fits the data better. Additionally, when thinking of a biological reason for selecting a 2 vs 3 state model, it seems to make more sense to go with a two state model as we see on the NOAA website that copepods richness gradually increases as the summer starts and then decreases at the end of the summer. There is a very brief plateau in May but the 3 state model doesn’t seem to follow that general pattern, instead it seems to make large random jumps. At one point Copepod Richness drops to the lowest state but there is no observed data there. In contrast, the 2 state model seems to follow the general pattern of the observed data quite well.

We see that Southern Copepod Biomass is in phase with the Copepod Richness measurement in both models. This is because there are more species representing the Southern Copepod group than the Northern so when the Southern Copepod biomass is high the total richness is higher as well due to the increased number of species present. It also makes sense that the Northern and Southern groups are out of phase as two different water currents which are opposed to one another carry the copepods from each group.

All three sets of observations are tied to the same transition probabilities and therefore the model must balance each set of observation data to estimate the probabilities. At each data point the model gives decides based on the probabilities whether the fitted line should change states. Sometimes the model makes a switch that does not seem appropriate based on those probabilities. This is why we see that some of the transition switches in the 3 state model don’t seem to make sense.

Team contributions

Dylan and Zoe did the data wrangling. Zoe coded the 2 state model. Eric adapted this code for the 3 state model. Dylan created tables, results, and led discussion section for group. Entire group contributed thoughts and comments to the discussion section.