Hidden Markov Models

Author

Nick Chambers, Madison Shipley, Karl Veggerby

Published

May 9, 2023

Data set

In this lab, we’re looking at ecological indicators and exploring hidden Markov models and probabilities of shifting between underlying states.

Load the data

We explored the stoplight.csv dataset for this week. One of the columns divides indicators into groups (e.g. Climate, Local Physical, Local Biological, etc) and we wanted to look at some biological indicators and climate/physical indicators.

Code
stoplight <- read.csv(here::here("Lab-4", "stoplight.csv"))
head(stoplight)
  Ecosystem.Indicators           Type X1998 X1999 X2000 X2001 X2002 X2003 X2004
1         PDO_DecMarch        Climate  5.07 -1.75 -4.17  1.86 -1.73  7.45  1.85
2         PDO_MaySept)        Climate -0.37 -5.13 -3.58 -4.22 -0.26  3.42  2.96
3                  ONI        Climate  1.12 -1.07 -1.07 -0.40  0.18  0.27  0.20
4                  SST Local Physical 13.77 13.20 13.33 12.98 13.05 13.56 14.68
5     Upper_20m_NovMar Local Physical 12.30 10.31 10.12 10.22 10.08 10.70 10.86
6    Upper_20m_MaySept Local Physical 10.35 10.07 10.16  9.76  8.99  9.62 11.31
  X2005 X2006 X2007 X2008 X2009 X2010 X2011 X2012 X2013 X2014 X2015 X2016 X2017
1  2.44  1.94 -0.17 -3.06 -5.41  2.17 -3.65 -5.07 -1.67  1.24  9.26  6.69  3.38
2  3.48  0.28  0.91 -7.63 -1.11 -3.53 -6.45 -7.79 -3.47  5.07  8.08  6.60  2.18
3  0.47 -0.30  0.08 -0.98 -0.23  0.55 -0.72 -0.43 -0.30 -0.28  0.67  1.22  0.13
4 13.58 12.73 13.64 12.45 13.44 12.76 13.23 13.34 13.72 14.03 13.89 13.83 13.59
5 10.61 10.59 10.04  9.33 10.19 11.01 10.02  9.62 10.12  9.62 12.73 12.03 11.47
6 10.73  9.97 10.07  9.30  9.90 10.42  9.95  9.92 10.43 10.95 10.69 10.32 10.14
  X2018 X2019 X2020 X2021
1  1.52  2.01 -0.76 -2.77
2  0.45  3.94 -1.35 -6.64
3 -0.45  0.72  0.28 -0.72
4 13.71 14.68 13.40 12.95
5 10.63 10.99  9.37 10.12
6 10.34 11.31 10.87  9.85

Wrangle the data

We initially selected the Pacific Decadal Osculation (PDO) for the Dec/March period, sea surface temperature (SST), and copepod richness and ichthy communitiy index, which we assumed was comperable to copepod richness (though maybe not).

Code
#ecosystem indicators
stoplight[,1]  
 [1] "PDO_DecMarch"                                          
 [2] "PDO_MaySept)"                                          
 [3] "ONI"                                                   
 [4] "SST"                                                   
 [5] "Upper_20m_NovMar"                                      
 [6] "Upper_20m_MaySept"                                     
 [7] "Deep_temperature"                                      
 [8] "Deep_salinity"                                         
 [9] "Copepod_richness"                                      
[10] "N_copepod"                                             
[11] "S_copepod"                                             
[12] "Biological_transition"                                 
[13] "Nearshore_Ichthyoplankton"                             
[14] "Ichthy_community_index"                                
[15] "Chinook_salmon_juv"                                    
[16] "Coho_salmon_juv"                                       
[17] "Principal Component scores (PC1)"                      
[18] "Principal Component scores (PC2)"                      
[19] "Physical Spring Trans.\nUI based (day of year)"        
[20] "Physical Spring Trans. Hydrographic (day of year)"     
[21] "Upwelling Anomaly\n(Sum April-May)"                    
[22] "Length of Upwelling Season\nUI based (days)"           
[23] "Copepod Community Index\n(MDS axis 1 scores; May-Sept)"
Code
years<-seq(1998,2021,1)

#ecosystem indicators to pull
PDO_DecMarch<-unlist(as.vector(stoplight[1,3:26])) 
SST<-unlist(as.vector(stoplight[4,3:26])) 
Copepod_richness<-unlist(as.vector(stoplight[9,3:26])) 
Ichthy_community_index<-unlist(as.vector(stoplight[14,3:26]))

dat<-matrix(nrow=length(years), ncol=5)
dat[,1]<-years
dat[,2]<-PDO_DecMarch
dat[,3]<-SST
dat[,4]<-Copepod_richness
dat[,5]<-Ichthy_community_index
colnames(dat)<-c("years","PDO", "SST", "CopeRich", "FishInx")
rownames(dat)<-seq(1,24,1)

dat<-as.data.frame(dat)
head(dat)
  years   PDO   SST CopeRich FishInx
1  1998  5.07 13.77     4.54   -9.47
2  1999 -1.75 13.20    -2.58  -20.44
3  2000 -4.17 13.33    -3.41  -28.90
4  2001  1.86 12.98    -1.03  -14.48
5  2002 -1.73 13.05    -1.12  -13.98
6  2003  7.45 13.56     1.99   -2.29

Data Visualization

The first rule of science is to plot your data! We started with looking at all our data together.

Code
# Plotting all
ggplot(dat, aes(x = years)) +
  geom_line(aes(y = PDO, color = "PDO"), linewidth = 1) +
  geom_line(aes(y = SST, color = "SST"), linewidth = 1) +
  geom_line(aes(y = CopeRich, color = "CopeRich"), linewidth = 1) +
  geom_line(aes(y = FishInx, color = "FishInx"), linewidth = 1) +
  scale_color_manual(values = c(PDO = "blue", SST = "red", CopeRich = "green", FishInx = "purple")) +
  labs(x = "Year", y = "Value", color = "Variable") +
  theme_minimal()

Sea surface temperature stats pretty consistent, while the fish index had a significant range, possibly following PDO if you squint, but was not immediatly compelling. We narrowed it down and decided to focus on copepod richness and PDO, which seemed to be incredibly correlated, and plotted the time series and their relationship to each other. In a positive PDO index, there is a positive correlation with copepod index.

Code
# Plotting Coperich and env cov
ggplot(dat, aes(x = years)) +
  geom_line(aes(y = PDO, color = "PDO"), size = 1) +
  geom_line(aes(y = CopeRich, color = "CopeRich"), size = 1) +
  scale_color_manual(values = c(PDO = "blue", SST = "red", CopeRich = "green", FishInx = "purple")) +
  labs(x = "Year", y = "Value", color = "Variable") +
  theme_minimal()
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.

Code
#CopeRichness tracks very closely with PDO
#SST seems to have a negligible correlation 

ggplot(dat, aes(x = PDO, y = CopeRich, color = CopeRich < 0)) +
  geom_point() +
  geom_line(aes(x = PDO, y = CopeRich)) +
  geom_hline(yintercept = 0, linetype = "dotted") +
  geom_vline(xintercept = 0, linetype = "dotted") +
  scale_color_manual(values = c("red", "blue"), labels = c("Above 0", "Below 0")) +
  labs(x = "PDO", y = "CopeRich") +
  theme_minimal()

Code
#fish index is all over the place, let's not use it. 

We also wanted to think about our possible hidden states, and while there seems to be a clear two-state sytem with PDO in a negative or positive index, we wondered if there was a third state “weak” index.

Code
# PDO states 
ggplot(dat, aes(x = years)) +
  geom_line(aes(y = PDO, color = "PDO"), size = 1) +
  scale_color_manual(values = c(PDO = "blue", SST = "red", CopeRich = "green", FishInx = "purple")) +
  labs(x = "Year", y = "Value", color = "Variable") +
  geom_hline(yintercept = 0, linetype = "dotted") +
  geom_hline(yintercept = 2.5, linetype = "dotted") +
  geom_hline(yintercept = -2.5, linetype = "dotted") +
  theme_minimal()

Objective tips

Our primary objectives for this lab were to gain a better understanding of hmm options and getting some practice with interpretation by exploring different model options considering copepod richness and PDO. While there was much more we could have done, we decided to explore model specification, and interpretation of AIC for these types of models, which can be misleading depending on the ecological assumptions being made.

General tasks

  • Please pick a type of indicators, and develop a 2- or 3-state multivariate HMM.

Methods

We fit models with different either two or three states to determine whether copepod richness and/or PDO seems to have 2 or 3 states. It’s unclear if there is a ‘neutral’ state that modulates copepod richness in addition to the hot and cool phases.

We also explored the transition option, by assuming either that there is no covariate informing the transition, or that PDO informed the transition.

We assumed the familys were gaussian for all models tested.

We plotted the fit of all models to copepod richness and PDO data from the stoplight dataset. We then also compared model fits with AIC in an attempt to select the best model.

Note that in the report we give probabilities of transitioning to and from states, but these models are very sensitive to starting conditions, so we iterate over many set seeds for each model to find the best fit. That being said, some of the values written in this report may be different upon knitting.

Model 1 Copepods only: 2 states

We started with a very simple model, with copepod richness as the intercept and no covariates informing the transition. We started by assuming a two state system.

Code
# Model 1
mod1 = depmix(CopeRich ~1,
                   nstates = 2, 
                   transition = ~1, 
                   family = gaussian(),
                   data=dat)

fitmod1 = fit(mod1)
converged at iteration 16 with logLik: -54.68034 
Code
#Find best fit 
iter <-100
seeds<-sample(100:1000, size = iter, replace = F)
best1 <- 1e10
best_model1 <- NA
for(i in 1:iter){
  set.seed(seeds[i])
  fitmod1 <- fit(mod1)
  if(AIC(fitmod1)< best1){
    best_model1 <- fitmod1
    best1 <- AIC(fitmod1)
  }
}
converged at iteration 20 with logLik: -54.68034 
converged at iteration 21 with logLik: -54.68034 
converged at iteration 20 with logLik: -54.68034 
converged at iteration 16 with logLik: -54.68034 
converged at iteration 20 with logLik: -54.68034 
converged at iteration 22 with logLik: -54.68034 
converged at iteration 31 with logLik: -54.68034 
converged at iteration 18 with logLik: -54.68034 
converged at iteration 19 with logLik: -54.68034 
converged at iteration 17 with logLik: -54.68034 
converged at iteration 20 with logLik: -54.68034 
converged at iteration 18 with logLik: -54.68034 
converged at iteration 21 with logLik: -54.68034 
converged at iteration 18 with logLik: -54.68034 
converged at iteration 22 with logLik: -54.68034 
converged at iteration 26 with logLik: -54.68034 
converged at iteration 23 with logLik: -54.68034 
converged at iteration 19 with logLik: -54.68034 
converged at iteration 22 with logLik: -54.68034 
converged at iteration 19 with logLik: -54.68034 
converged at iteration 22 with logLik: -54.68034 
converged at iteration 15 with logLik: -54.68034 
converged at iteration 19 with logLik: -54.68034 
converged at iteration 17 with logLik: -54.68034 
converged at iteration 25 with logLik: -54.68034 
converged at iteration 15 with logLik: -54.68034 
converged at iteration 21 with logLik: -54.68034 
converged at iteration 18 with logLik: -54.68034 
converged at iteration 19 with logLik: -54.68034 
converged at iteration 23 with logLik: -54.68034 
converged at iteration 17 with logLik: -54.68034 
converged at iteration 22 with logLik: -54.68034 
converged at iteration 17 with logLik: -54.68034 
converged at iteration 18 with logLik: -54.68034 
converged at iteration 27 with logLik: -54.68034 
converged at iteration 19 with logLik: -54.68034 
converged at iteration 22 with logLik: -54.68034 
converged at iteration 15 with logLik: -54.68034 
converged at iteration 22 with logLik: -54.68034 
converged at iteration 15 with logLik: -54.68034 
converged at iteration 21 with logLik: -54.68034 
converged at iteration 24 with logLik: -54.68034 
converged at iteration 19 with logLik: -54.68034 
converged at iteration 19 with logLik: -54.68034 
converged at iteration 16 with logLik: -54.68034 
converged at iteration 20 with logLik: -54.68034 
converged at iteration 18 with logLik: -54.68034 
converged at iteration 19 with logLik: -54.68034 
converged at iteration 21 with logLik: -54.68034 
converged at iteration 17 with logLik: -54.68034 
converged at iteration 20 with logLik: -54.68034 
converged at iteration 19 with logLik: -54.68034 
converged at iteration 20 with logLik: -54.68034 
converged at iteration 24 with logLik: -54.68034 
converged at iteration 14 with logLik: -54.68034 
converged at iteration 19 with logLik: -54.68034 
converged at iteration 17 with logLik: -54.68034 
converged at iteration 17 with logLik: -54.68034 
converged at iteration 17 with logLik: -54.68034 
converged at iteration 18 with logLik: -54.68034 
converged at iteration 16 with logLik: -54.68034 
converged at iteration 14 with logLik: -54.68034 
converged at iteration 17 with logLik: -54.68034 
converged at iteration 18 with logLik: -54.68034 
converged at iteration 23 with logLik: -54.68034 
converged at iteration 20 with logLik: -54.68034 
converged at iteration 18 with logLik: -54.68034 
converged at iteration 19 with logLik: -54.68034 
converged at iteration 17 with logLik: -54.68034 
converged at iteration 19 with logLik: -54.68034 
converged at iteration 18 with logLik: -54.68034 
converged at iteration 18 with logLik: -54.68034 
converged at iteration 19 with logLik: -54.68034 
converged at iteration 22 with logLik: -54.68034 
converged at iteration 20 with logLik: -54.68034 
converged at iteration 15 with logLik: -54.68034 
converged at iteration 23 with logLik: -54.68034 
converged at iteration 20 with logLik: -54.68034 
converged at iteration 19 with logLik: -54.68034 
converged at iteration 25 with logLik: -54.68034 
converged at iteration 11 with logLik: -54.68034 
converged at iteration 21 with logLik: -54.68034 
converged at iteration 18 with logLik: -54.68034 
converged at iteration 17 with logLik: -54.68034 
converged at iteration 15 with logLik: -54.68034 
converged at iteration 14 with logLik: -54.68034 
converged at iteration 18 with logLik: -54.68034 
converged at iteration 18 with logLik: -54.68034 
converged at iteration 17 with logLik: -54.68034 
converged at iteration 28 with logLik: -54.68034 
converged at iteration 19 with logLik: -54.68034 
converged at iteration 19 with logLik: -54.68034 
converged at iteration 20 with logLik: -54.68034 
converged at iteration 18 with logLik: -54.68034 
converged at iteration 20 with logLik: -54.68034 
converged at iteration 16 with logLik: -54.68034 
converged at iteration 17 with logLik: -54.68034 
converged at iteration 20 with logLik: -54.68034 
converged at iteration 19 with logLik: -54.68034 
converged at iteration 20 with logLik: -54.68034 
Code
summary(best_model1)
Initial state probabilities model 
pr1 pr2 
  1   0 

Transition matrix 
        toS1  toS2
fromS1 0.598 0.402
fromS2 0.246 0.754

Response parameters 
Resp 1 : gaussian 
    Re1.(Intercept) Re1.sd
St1           3.504  2.552
St2          -1.414  1.050

We see that there is a ~75% probability of state one staying in state one, and ~60% of state two staying in state two, with ~25% and 40% probability for transitioning from state 1 to state 2 and state 2 to state 1 respectively, showing that the system generally is more stable in state one.

Code
plot(ts(posterior(best_model1, type="smoothing")[,1], start=c(2003,5), deltat=1/12),ylab="probability",
     main="Posterior Probability of Changing States",
     frame=FALSE)

Code
prstates <- apply(posterior(best_model1)[,c("S1", "S2")],1, which.max)
Warning in .local(object, ...): Argument 'type' not specified and will default
to 'viterbi'. This default may change in future releases of depmixS4. Please
see ?posterior for alternative options.
Code
mu <- summary(best_model1)[,1]
Initial state probabilities model 
pr1 pr2 
  1   0 

Transition matrix 
        toS1  toS2
fromS1 0.598 0.402
fromS2 0.246 0.754

Response parameters 
Resp 1 : gaussian 
    Re1.(Intercept) Re1.sd
St1           3.504  2.552
St2          -1.414  1.050
Code
pred <- tibble("Year" = dat$years, "Copepod Richness" = mu[prstates])

#predicted values
pred1<-pred %>% 
  pivot_longer(-Year, names_to = "Copepod Richness", values_to = "Fit")

ggplot() +
  geom_line(data = pred1, aes(x = years, y = Fit, color = "predicted_states"), size = 1) +
  geom_line(data = dat, aes(x = years, y = CopeRich, color = "CopeRich"), size = 1) +
  geom_line(data = dat, aes(x = years, y = PDO_DecMarch, color = "PDO" ), size = 1)+
  scale_color_manual(values = c(PDO = "blue", SST = "red", CopeRich = "green",
                                FishInx = "purple", predicted_states = "orange")) +
  labs(title = " Model 1 - Two States", x = "Year", y = "Value", color = "Variable") +
  theme_minimal()

Model 1 produces a reasonably good fit to the data.

Model 2 Copepods only: 3 states

We continued by using the same simple model from above, and assuming 3 states.

Code
# Model 2
mod2 = depmix(CopeRich ~1,
                   nstates = 3, 
                   transition = ~1, 
                   family = gaussian(),
                   data=dat)

fitmod2 = fit(mod2)
converged at iteration 46 with logLik: -52.30002 
Code
#Find best fit 
iter <-100
seeds<-sample(100:1000, size = iter, replace = F)
best2 <- 1e10
best_model2 <- NA
for(i in 1:iter){
  set.seed(seeds[i])
  fitmod2 <- fit(mod2)
  if(AIC(fitmod2)< best2){
    best_model2 <- fitmod2
    best2 <- AIC(fitmod2)
  }
}
converged at iteration 34 with logLik: -51.04047 
converged at iteration 85 with logLik: -53.46158 
converged at iteration 29 with logLik: -48.60721 
converged at iteration 38 with logLik: -52.30002 
converged at iteration 39 with logLik: -51.04047 
converged at iteration 36 with logLik: -52.30002 
converged at iteration 50 with logLik: -51.1154 
converged at iteration 26 with logLik: -48.60721 
converged at iteration 34 with logLik: -51.04047 
converged at iteration 34 with logLik: -51.04047 
converged at iteration 78 with logLik: -53.46158 
converged at iteration 73 with logLik: -53.46158 
converged at iteration 34 with logLik: -51.04047 
converged at iteration 39 with logLik: -52.30002 
converged at iteration 33 with logLik: -52.30002 
converged at iteration 53 with logLik: -51.1154 
converged at iteration 31 with logLik: -52.30002 
converged at iteration 30 with logLik: -52.30002 
converged at iteration 33 with logLik: -52.30002 
converged at iteration 50 with logLik: -52.30002 
converged at iteration 35 with logLik: -52.30002 
converged at iteration 37 with logLik: -51.04047 
converged at iteration 48 with logLik: -52.30002 
converged at iteration 37 with logLik: -51.04047 
converged at iteration 64 with logLik: -53.46158 
converged at iteration 43 with logLik: -51.04047 
converged at iteration 32 with logLik: -52.30002 
converged at iteration 81 with logLik: -53.46158 
converged at iteration 35 with logLik: -51.04047 
converged at iteration 63 with logLik: -53.46158 
converged at iteration 67 with logLik: -53.46158 
converged at iteration 51 with logLik: -52.30002 
converged at iteration 35 with logLik: -52.30002 
converged at iteration 36 with logLik: -51.04047 
converged at iteration 32 with logLik: -52.30002 
converged at iteration 39 with logLik: -52.30002 
converged at iteration 41 with logLik: -51.1154 
converged at iteration 64 with logLik: -52.30002 
converged at iteration 20 with logLik: -48.60721 
converged at iteration 35 with logLik: -52.30002 
converged at iteration 39 with logLik: -51.04047 
converged at iteration 40 with logLik: -52.30002 
converged at iteration 34 with logLik: -52.30002 
converged at iteration 32 with logLik: -52.30002 
converged at iteration 33 with logLik: -52.30002 
converged at iteration 37 with logLik: -52.30002 
converged at iteration 26 with logLik: -48.60721 
converged at iteration 42 with logLik: -52.30002 
converged at iteration 30 with logLik: -52.30002 
converged at iteration 96 with logLik: -53.46158 
converged at iteration 38 with logLik: -52.30002 
converged at iteration 62 with logLik: -53.46158 
converged at iteration 56 with logLik: -53.46158 
converged at iteration 40 with logLik: -52.30002 
converged at iteration 63 with logLik: -53.46158 
converged at iteration 32 with logLik: -52.30002 
converged at iteration 47 with logLik: -51.04047 
converged at iteration 58 with logLik: -53.46158 
converged at iteration 33 with logLik: -51.04047 
converged at iteration 34 with logLik: -51.04047 
converged at iteration 34 with logLik: -52.30002 
converged at iteration 34 with logLik: -51.04047 
converged at iteration 33 with logLik: -52.30002 
converged at iteration 34 with logLik: -52.30002 
converged at iteration 33 with logLik: -48.60721 
converged at iteration 37 with logLik: -52.30002 
converged at iteration 62 with logLik: -53.46158 
converged at iteration 37 with logLik: -51.04047 
converged at iteration 45 with logLik: -52.30002 
converged at iteration 38 with logLik: -52.30002 
converged at iteration 34 with logLik: -52.30002 
converged at iteration 36 with logLik: -52.30002 
converged at iteration 89 with logLik: -52.30002 
converged at iteration 52 with logLik: -51.1154 
converged at iteration 36 with logLik: -51.1154 
converged at iteration 71 with logLik: -53.46158 
converged at iteration 45 with logLik: -52.30002 
converged at iteration 30 with logLik: -52.30002 
converged at iteration 48 with logLik: -51.03809 
converged at iteration 95 with logLik: -53.46158 
converged at iteration 43 with logLik: -52.30002 
converged at iteration 32 with logLik: -52.30002 
converged at iteration 62 with logLik: -51.1154 
converged at iteration 37 with logLik: -52.30002 
converged at iteration 33 with logLik: -48.60721 
converged at iteration 31 with logLik: -52.30002 
converged at iteration 32 with logLik: -52.30002 
converged at iteration 38 with logLik: -52.30002 
converged at iteration 31 with logLik: -52.30002 
converged at iteration 32 with logLik: -51.04047 
converged at iteration 42 with logLik: -52.30002 
converged at iteration 48 with logLik: -52.30002 
converged at iteration 32 with logLik: -52.30002 
converged at iteration 41 with logLik: -52.30002 
converged at iteration 25 with logLik: -48.60721 
converged at iteration 88 with logLik: -53.46158 
converged at iteration 29 with logLik: -48.60721 
converged at iteration 31 with logLik: -52.30002 
converged at iteration 76 with logLik: -53.46158 
converged at iteration 35 with logLik: -51.04047 
Code
summary(best_model2)
Initial state probabilities model 
pr1 pr2 pr3 
  0   0   1 

Transition matrix 
        toS1  toS2  toS3
fromS1 0.500 0.000 0.500
fromS2 0.072 0.781 0.147
fromS3 0.000 0.573 0.427

Response parameters 
Resp 1 : gaussian 
    Re1.(Intercept) Re1.sd
St1           7.785  0.365
St2          -1.329  1.079
St3           2.947  1.124

This transition matrix is a little trickier to interpret, but the likelihood of transitioning to state three was low, ~7% for state one and 0% of transitioning from 2 to 3. This may not be a great specification.

Code
# plot(ts(posterior(best_model1, type="smoothing")[,1], start=c(2003,5), deltat=1/12),ylab="probability",
#      main="Posterior probability of state 1 (copepods good?).",
#      frame=FALSE)

prstates <- apply(posterior(best_model2)[,c("S1", "S2")],1, which.max)
Warning in .local(object, ...): Argument 'type' not specified and will default
to 'viterbi'. This default may change in future releases of depmixS4. Please
see ?posterior for alternative options.
Code
mu <- summary(best_model2)[,1]
Initial state probabilities model 
pr1 pr2 pr3 
  0   0   1 

Transition matrix 
        toS1  toS2  toS3
fromS1 0.500 0.000 0.500
fromS2 0.072 0.781 0.147
fromS3 0.000 0.573 0.427

Response parameters 
Resp 1 : gaussian 
    Re1.(Intercept) Re1.sd
St1           7.785  0.365
St2          -1.329  1.079
St3           2.947  1.124
Code
pred <- tibble("Year" = dat$years, "Copepod Richness" = mu[prstates])

#predicted values
pred2 <- pred %>% 
  pivot_longer(-Year, names_to = "Copepod Richness", values_to = "Fit")

ggplot() +
   geom_line(data = pred2, aes(x = years, y = Fit, color = "predicted_states"), size = 1) +
  geom_line(data = dat, aes(x = years, y = CopeRich, color = "CopeRich"), size = 1) +
  geom_line(data = dat, aes(x = years, y = PDO_DecMarch, color = "PDO" ), size = 1)+
  scale_color_manual(values = c(PDO = "blue", SST = "red", CopeRich = "green",
                                FishInx = "purple", predicted_states = "orange")) +
  labs(title = " Model 2 - Three States", x = "Year", y = "Value", color = "Variable") +
  theme_minimal()

Compared to model 1, model 2 produces a poor visual fit to the data, with a narrower peak later in the timeseries. Because of this we assumed 2 states in the other models we tested.

Model 3 Copepods and PDO

Because Copepods and PDO have such a clear relationship visually, we started by looking at a model considering both copepod richness and PDO as intercepts, with no covariates informing the transition.

Code
#Copepods and PDO as intercepts 
mod3 = depmix(list(CopeRich ~1, PDO~1),
             nstates = 2, 
             transition = ~1, 
             family = list(gaussian(),gaussian()),
             data=dat)

fitmod3 = fit(mod3)
converged at iteration 29 with logLik: -110.5438 
Code
#Find best fit 
iter <-100
seeds<-sample(100:1000, size = iter, replace = F)
best3 <- 1e10
best_model3 <- NA
for(i in 1:iter){
  set.seed(seeds[i])
  fitmod3 <- fit(mod3)
  if(AIC(fitmod3)< best3){
    best_model3 <- fitmod3
    best3 <- AIC(fitmod3)
  }
}
converged at iteration 29 with logLik: -110.5438 
converged at iteration 24 with logLik: -110.5438 
converged at iteration 27 with logLik: -110.5438 
converged at iteration 27 with logLik: -110.5438 
converged at iteration 25 with logLik: -110.5438 
converged at iteration 26 with logLik: -110.5438 
converged at iteration 30 with logLik: -110.5438 
converged at iteration 27 with logLik: -110.5438 
converged at iteration 27 with logLik: -110.5438 
converged at iteration 28 with logLik: -110.5438 
converged at iteration 28 with logLik: -110.5438 
converged at iteration 25 with logLik: -110.5438 
converged at iteration 28 with logLik: -110.5438 
converged at iteration 31 with logLik: -110.5438 
converged at iteration 25 with logLik: -110.5438 
converged at iteration 27 with logLik: -110.5438 
converged at iteration 29 with logLik: -110.5438 
converged at iteration 27 with logLik: -110.5438 
converged at iteration 26 with logLik: -110.5438 
converged at iteration 27 with logLik: -110.5438 
converged at iteration 27 with logLik: -110.5438 
converged at iteration 25 with logLik: -110.5438 
converged at iteration 28 with logLik: -110.5438 
converged at iteration 28 with logLik: -110.5438 
converged at iteration 27 with logLik: -110.5438 
converged at iteration 29 with logLik: -110.5438 
converged at iteration 27 with logLik: -110.5438 
converged at iteration 24 with logLik: -110.5438 
converged at iteration 23 with logLik: -110.5438 
converged at iteration 25 with logLik: -110.5438 
converged at iteration 27 with logLik: -110.5438 
converged at iteration 27 with logLik: -110.5438 
converged at iteration 26 with logLik: -110.5438 
converged at iteration 29 with logLik: -110.5438 
converged at iteration 26 with logLik: -110.5438 
converged at iteration 29 with logLik: -110.5438 
converged at iteration 28 with logLik: -110.5438 
converged at iteration 27 with logLik: -110.5438 
converged at iteration 28 with logLik: -110.5438 
converged at iteration 26 with logLik: -110.5438 
converged at iteration 23 with logLik: -110.5438 
converged at iteration 26 with logLik: -110.5438 
converged at iteration 32 with logLik: -110.5438 
converged at iteration 29 with logLik: -110.5438 
converged at iteration 24 with logLik: -110.5438 
converged at iteration 28 with logLik: -110.5438 
converged at iteration 25 with logLik: -110.5438 
converged at iteration 24 with logLik: -110.5438 
converged at iteration 18 with logLik: -110.5438 
converged at iteration 25 with logLik: -110.5438 
converged at iteration 30 with logLik: -110.5438 
converged at iteration 29 with logLik: -110.5438 
converged at iteration 28 with logLik: -110.5438 
converged at iteration 28 with logLik: -110.5438 
converged at iteration 25 with logLik: -110.5438 
converged at iteration 28 with logLik: -110.5438 
converged at iteration 29 with logLik: -110.5438 
converged at iteration 28 with logLik: -110.5438 
converged at iteration 27 with logLik: -110.5438 
converged at iteration 27 with logLik: -110.5438 
converged at iteration 27 with logLik: -110.5438 
converged at iteration 27 with logLik: -110.5438 
converged at iteration 29 with logLik: -110.5438 
converged at iteration 26 with logLik: -110.5438 
converged at iteration 27 with logLik: -110.5438 
converged at iteration 26 with logLik: -110.5438 
converged at iteration 28 with logLik: -110.5438 
converged at iteration 26 with logLik: -110.5438 
converged at iteration 24 with logLik: -110.5438 
converged at iteration 25 with logLik: -110.5438 
converged at iteration 28 with logLik: -110.5438 
converged at iteration 27 with logLik: -110.5438 
converged at iteration 26 with logLik: -110.5438 
converged at iteration 26 with logLik: -110.5438 
converged at iteration 29 with logLik: -110.5438 
converged at iteration 27 with logLik: -110.5438 
converged at iteration 27 with logLik: -110.5438 
converged at iteration 26 with logLik: -110.5438 
converged at iteration 28 with logLik: -110.5438 
converged at iteration 27 with logLik: -110.5438 
converged at iteration 28 with logLik: -110.5438 
converged at iteration 27 with logLik: -110.5438 
converged at iteration 27 with logLik: -110.5438 
converged at iteration 28 with logLik: -110.5438 
converged at iteration 24 with logLik: -110.5438 
converged at iteration 27 with logLik: -110.5438 
converged at iteration 25 with logLik: -110.5438 
converged at iteration 27 with logLik: -110.5438 
converged at iteration 31 with logLik: -110.5438 
converged at iteration 27 with logLik: -110.5438 
converged at iteration 24 with logLik: -110.5438 
converged at iteration 30 with logLik: -110.5438 
converged at iteration 27 with logLik: -110.5438 
converged at iteration 23 with logLik: -110.5438 
converged at iteration 27 with logLik: -110.5438 
converged at iteration 28 with logLik: -110.5438 
converged at iteration 27 with logLik: -110.5438 
converged at iteration 25 with logLik: -110.5438 
converged at iteration 28 with logLik: -110.5438 
converged at iteration 29 with logLik: -110.5438 
Code
summary(best_model3)
Initial state probabilities model 
pr1 pr2 
  1   0 

Transition matrix 
        toS1  toS2
fromS1 0.614 0.386
fromS2 0.250 0.750

Response parameters 
Resp 1 : gaussian 
Resp 2 : gaussian 
    Re1.(Intercept) Re1.sd Re2.(Intercept) Re2.sd
St1           3.460  2.482           4.030  2.625
St2          -1.502  0.989          -1.946  2.262

This model still preferred state one, with a ~75% probability of staying in one and ~25% chance of transitioning to two. There was a ~61% change of state two to stay in state two and a ~39% chance to transition back to state one. Note there are values for the intercepts and sd for Re1 (coperich) and Re2 (PDO) in this model.

Code
plot(ts(posterior(best_model3, type="smoothing")[,1], start=c(2003,5), deltat=1/12),ylab="probability",
     main="Posterior probability of state 1 (copepods good?).",
     frame=FALSE)

Code
prstates<-apply(posterior(best_model3)[,c("S1", "S2")],1, which.max)
Warning in .local(object, ...): Argument 'type' not specified and will default
to 'viterbi'. This default may change in future releases of depmixS4. Please
see ?posterior for alternative options.
Code
mu<-summary(best_model3)[,1]
Initial state probabilities model 
pr1 pr2 
  1   0 

Transition matrix 
        toS1  toS2
fromS1 0.614 0.386
fromS2 0.250 0.750

Response parameters 
Resp 1 : gaussian 
Resp 2 : gaussian 
    Re1.(Intercept) Re1.sd Re2.(Intercept) Re2.sd
St1           3.460  2.482           4.030  2.625
St2          -1.502  0.989          -1.946  2.262
Code
mu3<-summary(best_model3)[,3]
Initial state probabilities model 
pr1 pr2 
  1   0 

Transition matrix 
        toS1  toS2
fromS1 0.614 0.386
fromS2 0.250 0.750

Response parameters 
Resp 1 : gaussian 
Resp 2 : gaussian 
    Re1.(Intercept) Re1.sd Re2.(Intercept) Re2.sd
St1           3.460  2.482           4.030  2.625
St2          -1.502  0.989          -1.946  2.262
Code
pred <- tibble("Year" = dat$years, "Copepod Richness" = mu[prstates],"PDO" = mu3[prstates])
pred3<-pred %>% pivot_longer(-Year, names_to = "Covariates", values_to = "Fit")


ggplot() +
  geom_line(data = pred3, aes(x = Year, y = Fit, group = Covariates, color = Covariates)) +
  geom_line(data = dat, aes(x = years, y = CopeRich, color = "CopeRich"), linewidth = 1) +
  geom_line(data = dat, aes(x = years, y = PDO_DecMarch, color = "PDO"), linewidth = 1) +
  scale_color_manual(values = c("orange", "lightgreen", "blue")) +  # Assign distinct colors for each line in pred3
  labs(title = "Model 3 - Transition ~ 1", x = "Year", y = "Value", color = "Variable") +
  theme_minimal()

Model 4 Copepods and PDO with transition informed by PDO

Next we wanted to look at copepod richness and PDO with the transition informed by the PDO.

Code
#Copepods and PDO as intercepts where the transition is informed by PDO
mod4 = depmix(list(CopeRich ~1, PDO~1),
              nstates = 2, 
              transition = ~PDO, 
              family = list(gaussian(),gaussian()),
              data=dat)
fitmod4 = fit(mod4)
converged at iteration 15 with logLik: -109.5888 
Code
#Find best fit 
iter <-100
seeds<-sample(100:1000, size = iter, replace = F)
best4 <- 1e10
best_model4 <- NA
for(i in 1:iter){
  set.seed(seeds[i])
  fitmod4 <- fit(mod4)
  if(AIC(fitmod4)< best4){
    best_model4 <- fitmod4
    best4 <- AIC(fitmod4)
  }
}
converged at iteration 16 with logLik: -109.5888 
converged at iteration 13 with logLik: -109.5888 
converged at iteration 16 with logLik: -109.5888 
converged at iteration 16 with logLik: -109.5888 
converged at iteration 16 with logLik: -109.5888 
converged at iteration 17 with logLik: -109.5888 
converged at iteration 14 with logLik: -109.5888 
converged at iteration 15 with logLik: -109.5888 
converged at iteration 18 with logLik: -109.5888 
converged at iteration 17 with logLik: -109.5888 
converged at iteration 17 with logLik: -109.5888 
converged at iteration 17 with logLik: -109.5888 
converged at iteration 14 with logLik: -109.5888 
converged at iteration 16 with logLik: -109.5888 
converged at iteration 8 with logLik: -109.5888 
converged at iteration 15 with logLik: -109.5888 
converged at iteration 22 with logLik: -109.5888 
converged at iteration 16 with logLik: -109.5888 
converged at iteration 11 with logLik: -109.5888 
converged at iteration 18 with logLik: -109.5888 
converged at iteration 18 with logLik: -109.5888 
converged at iteration 17 with logLik: -109.5888 
converged at iteration 17 with logLik: -109.5888 
converged at iteration 14 with logLik: -109.5888 
converged at iteration 16 with logLik: -109.5888 
converged at iteration 13 with logLik: -109.5888 
converged at iteration 18 with logLik: -109.5888 
converged at iteration 15 with logLik: -109.5888 
converged at iteration 18 with logLik: -109.5888 
converged at iteration 15 with logLik: -109.5888 
converged at iteration 15 with logLik: -109.5888 
converged at iteration 16 with logLik: -109.5888 
converged at iteration 18 with logLik: -109.5888 
converged at iteration 20 with logLik: -109.5888 
converged at iteration 16 with logLik: -109.5888 
converged at iteration 15 with logLik: -109.5888 
converged at iteration 17 with logLik: -109.5888 
converged at iteration 16 with logLik: -109.5888 
converged at iteration 15 with logLik: -109.5888 
converged at iteration 15 with logLik: -109.5888 
converged at iteration 13 with logLik: -109.5888 
converged at iteration 16 with logLik: -109.5888 
converged at iteration 16 with logLik: -109.5888 
converged at iteration 17 with logLik: -109.5888 
converged at iteration 17 with logLik: -109.5888 
converged at iteration 16 with logLik: -109.5888 
converged at iteration 11 with logLik: -109.5888 
converged at iteration 14 with logLik: -109.5888 
converged at iteration 15 with logLik: -109.5888 
converged at iteration 16 with logLik: -109.5888 
converged at iteration 18 with logLik: -109.5888 
converged at iteration 17 with logLik: -109.5888 
converged at iteration 14 with logLik: -109.5888 
converged at iteration 14 with logLik: -109.5888 
converged at iteration 17 with logLik: -109.5888 
converged at iteration 13 with logLik: -109.5888 
converged at iteration 18 with logLik: -109.5888 
converged at iteration 18 with logLik: -109.5888 
converged at iteration 19 with logLik: -109.5888 
converged at iteration 7 with logLik: -109.5888 
converged at iteration 16 with logLik: -109.5888 
converged at iteration 18 with logLik: -109.5888 
converged at iteration 16 with logLik: -109.5888 
converged at iteration 18 with logLik: -109.5888 
converged at iteration 10 with logLik: -109.5888 
converged at iteration 14 with logLik: -109.5888 
converged at iteration 17 with logLik: -109.5888 
converged at iteration 17 with logLik: -109.5888 
converged at iteration 11 with logLik: -109.5888 
converged at iteration 18 with logLik: -109.5888 
converged at iteration 18 with logLik: -109.5888 
converged at iteration 15 with logLik: -109.5888 
converged at iteration 16 with logLik: -109.5888 
converged at iteration 11 with logLik: -109.5888 
converged at iteration 15 with logLik: -109.5888 
converged at iteration 15 with logLik: -109.5888 
converged at iteration 16 with logLik: -109.5888 
converged at iteration 18 with logLik: -109.5888 
converged at iteration 18 with logLik: -109.5888 
converged at iteration 17 with logLik: -109.5888 
converged at iteration 16 with logLik: -109.5888 
converged at iteration 17 with logLik: -109.5888 
converged at iteration 10 with logLik: -109.5888 
converged at iteration 14 with logLik: -109.5888 
converged at iteration 24 with logLik: -109.5888 
converged at iteration 16 with logLik: -109.5888 
converged at iteration 16 with logLik: -109.5888 
converged at iteration 14 with logLik: -109.5888 
converged at iteration 15 with logLik: -109.5888 
converged at iteration 16 with logLik: -109.5888 
converged at iteration 19 with logLik: -109.5888 
converged at iteration 14 with logLik: -109.5888 
converged at iteration 9 with logLik: -109.5888 
converged at iteration 16 with logLik: -109.5888 
converged at iteration 19 with logLik: -109.5888 
converged at iteration 16 with logLik: -109.5888 
converged at iteration 16 with logLik: -109.5888 
converged at iteration 16 with logLik: -109.5888 
converged at iteration 13 with logLik: -109.5888 
converged at iteration 16 with logLik: -109.5888 
Code
summary(best_model4)
Initial state probabilities model 
pr1 pr2 
  0   1 

Transition model for state (component) 1 
Model of type multinomial (mlogit), formula: ~PDO
Coefficients: 
            St1         St2
(Intercept)   0 -1.29578981
PDO           0 -0.07649563
Probalities at zero values of the covariates.
0.7851256 0.2148744 

Transition model for state (component) 2 
Model of type multinomial (mlogit), formula: ~PDO
Coefficients: 
            St1        St2
(Intercept)   0 -1.1892398
PDO           0  0.4097314
Probalities at zero values of the covariates.
0.7666051 0.2333949 


Response parameters 
Resp 1 : gaussian 
Resp 2 : gaussian 
    Re1.(Intercept) Re1.sd Re2.(Intercept) Re2.sd
St1          -1.438  1.019          -1.803  2.325
St2           3.650  2.414           4.168  2.628

This transition matrix isn’t interpretable as is, but we’ll come back to that later. One question is whether or not this is appropriate, to model PDO as an intercept and use it also as a covariate. Is this confounding? Poor practice?

Code
plot(ts(posterior(best_model4, type="smoothing")[,1], start=c(2003,5), deltat=1/12),ylab="probability",
     main="Posterior probability of state 1 (copepods good?).",
     frame=FALSE)

Code
prstates<-apply(posterior(best_model4)[,c("S1", "S2")],1, which.max)
Warning in .local(object, ...): Argument 'type' not specified and will default
to 'viterbi'. This default may change in future releases of depmixS4. Please
see ?posterior for alternative options.
Code
mu<-summary(best_model4)[,1]
Initial state probabilities model 
pr1 pr2 
  0   1 

Transition model for state (component) 1 
Model of type multinomial (mlogit), formula: ~PDO
Coefficients: 
            St1         St2
(Intercept)   0 -1.29578981
PDO           0 -0.07649563
Probalities at zero values of the covariates.
0.7851256 0.2148744 

Transition model for state (component) 2 
Model of type multinomial (mlogit), formula: ~PDO
Coefficients: 
            St1        St2
(Intercept)   0 -1.1892398
PDO           0  0.4097314
Probalities at zero values of the covariates.
0.7666051 0.2333949 


Response parameters 
Resp 1 : gaussian 
Resp 2 : gaussian 
    Re1.(Intercept) Re1.sd Re2.(Intercept) Re2.sd
St1          -1.438  1.019          -1.803  2.325
St2           3.650  2.414           4.168  2.628
Code
mu4<-summary(best_model4)[,3]
Initial state probabilities model 
pr1 pr2 
  0   1 

Transition model for state (component) 1 
Model of type multinomial (mlogit), formula: ~PDO
Coefficients: 
            St1         St2
(Intercept)   0 -1.29578981
PDO           0 -0.07649563
Probalities at zero values of the covariates.
0.7851256 0.2148744 

Transition model for state (component) 2 
Model of type multinomial (mlogit), formula: ~PDO
Coefficients: 
            St1        St2
(Intercept)   0 -1.1892398
PDO           0  0.4097314
Probalities at zero values of the covariates.
0.7666051 0.2333949 


Response parameters 
Resp 1 : gaussian 
Resp 2 : gaussian 
    Re1.(Intercept) Re1.sd Re2.(Intercept) Re2.sd
St1          -1.438  1.019          -1.803  2.325
St2           3.650  2.414           4.168  2.628
Code
pred <- tibble("Year" = dat$years, "Copepod Richness" = mu[prstates],"PDO" = mu4[prstates])
pred4<-pred %>% pivot_longer(-Year, names_to = "Covariates", values_to = "Fit")

ggplot() +
  geom_line(data = pred4, aes(x = Year, y = Fit, group = Covariates, color = Covariates)) +
  geom_line(data = dat, aes(x = years, y = CopeRich, color = "CopeRich"), linewidth = 1) +
  geom_line(data = dat, aes(x = years, y = PDO_DecMarch, color = "PDO"), linewidth = 1) +
  scale_color_manual(values = c("orange", "lightgreen", "blue")) +  # Assign distinct colors for each line in pred3
  labs(title = " Model 4 - Transition ~ PDO", x = "Year", y = "Value", color = "Variable") +
  theme_minimal()

Models 3 and 4 produce similar fits to the data, the only obvious difference is that model 4 spends less time in the higher state than model 3.

Model 5 Copepods informed by PDO

In this model, we wanted to relate copepod richness to PDO directly, with no transtion.

Code
#Copepods as they relate to PDO
mod5 = depmix(list(CopeRich ~PDO),
              nstates = 2, 
              transition = ~1, 
              family = list(gaussian()),
              data=dat)
fitmod5 = fit(mod5)
converged at iteration 43 with logLik: -41.20702 
Code
#Find best fit 
iter <-100
seeds<-sample(100:1000, size = iter, replace = F)
best5 <- 1e10
best_model5 <- NA
for(i in 1:iter){
  set.seed(seeds[i])
  fitmod5 <- fit(mod5)
  if(AIC(fitmod5)< best5){
    best_model5 <- fitmod5
    best5 <- AIC(fitmod5)
  }
}
converged at iteration 38 with logLik: -41.20702 
converged at iteration 44 with logLik: -41.20702 
converged at iteration 42 with logLik: -41.20702 
converged at iteration 42 with logLik: -41.20702 
converged at iteration 41 with logLik: -41.20702 
converged at iteration 46 with logLik: -41.20702 
converged at iteration 23 with logLik: -39.88307 
converged at iteration 38 with logLik: -41.20702 
converged at iteration 41 with logLik: -41.20702 
converged at iteration 41 with logLik: -41.20702 
converged at iteration 43 with logLik: -41.20702 
converged at iteration 45 with logLik: -41.20702 
converged at iteration 30 with logLik: -39.88307 
converged at iteration 41 with logLik: -41.20702 
converged at iteration 42 with logLik: -41.20702 
converged at iteration 43 with logLik: -41.20702 
converged at iteration 40 with logLik: -41.20702 
converged at iteration 22 with logLik: -39.88307 
converged at iteration 43 with logLik: -41.20702 
converged at iteration 41 with logLik: -41.20702 
converged at iteration 39 with logLik: -41.20702 
converged at iteration 45 with logLik: -41.20702 
converged at iteration 40 with logLik: -41.20702 
converged at iteration 41 with logLik: -41.20702 
converged at iteration 41 with logLik: -41.20702 
converged at iteration 40 with logLik: -41.20702 
converged at iteration 45 with logLik: -41.20702 
converged at iteration 49 with logLik: -41.20702 
converged at iteration 46 with logLik: -41.20702 
converged at iteration 42 with logLik: -41.20702 
converged at iteration 42 with logLik: -41.20702 
converged at iteration 43 with logLik: -43.79142 
converged at iteration 28 with logLik: -39.88307 
converged at iteration 42 with logLik: -41.20702 
converged at iteration 41 with logLik: -41.20702 
converged at iteration 43 with logLik: -43.79142 
converged at iteration 41 with logLik: -41.20702 
converged at iteration 44 with logLik: -41.20702 
converged at iteration 42 with logLik: -41.20702 
converged at iteration 39 with logLik: -41.20702 
converged at iteration 38 with logLik: -41.20702 
converged at iteration 25 with logLik: -39.88307 
converged at iteration 40 with logLik: -41.20702 
converged at iteration 40 with logLik: -41.20702 
converged at iteration 26 with logLik: -39.88307 
converged at iteration 42 with logLik: -41.20702 
converged at iteration 41 with logLik: -41.20702 
converged at iteration 41 with logLik: -41.20702 
converged at iteration 48 with logLik: -41.20702 
converged at iteration 41 with logLik: -41.20702 
converged at iteration 40 with logLik: -41.20702 
converged at iteration 42 with logLik: -41.20702 
converged at iteration 42 with logLik: -41.20702 
converged at iteration 41 with logLik: -41.20702 
converged at iteration 28 with logLik: -44.05687 
converged at iteration 41 with logLik: -41.20702 
converged at iteration 22 with logLik: -39.88307 
converged at iteration 43 with logLik: -41.20702 
converged at iteration 43 with logLik: -41.20702 
converged at iteration 34 with logLik: -41.20702 
converged at iteration 42 with logLik: -41.20702 
converged at iteration 41 with logLik: -41.20702 
converged at iteration 47 with logLik: -41.20702 
converged at iteration 43 with logLik: -41.20702 
converged at iteration 22 with logLik: -39.88307 
converged at iteration 45 with logLik: -41.20702 
converged at iteration 44 with logLik: -41.20702 
converged at iteration 42 with logLik: -41.20702 
converged at iteration 45 with logLik: -41.20702 
converged at iteration 51 with logLik: -43.79142 
converged at iteration 40 with logLik: -41.20702 
converged at iteration 40 with logLik: -41.20702 
converged at iteration 42 with logLik: -41.20702 
converged at iteration 44 with logLik: -41.20702 
converged at iteration 40 with logLik: -41.20702 
converged at iteration 43 with logLik: -41.20702 
converged at iteration 51 with logLik: -41.20702 
converged at iteration 39 with logLik: -41.20702 
converged at iteration 42 with logLik: -41.20702 
converged at iteration 21 with logLik: -39.88307 
converged at iteration 51 with logLik: -41.20702 
converged at iteration 48 with logLik: -41.20702 
converged at iteration 44 with logLik: -41.20702 
converged at iteration 40 with logLik: -41.20702 
converged at iteration 42 with logLik: -41.20702 
converged at iteration 24 with logLik: -39.88307 
converged at iteration 41 with logLik: -41.20702 
converged at iteration 42 with logLik: -41.20702 
converged at iteration 39 with logLik: -43.79142 
converged at iteration 21 with logLik: -39.88307 
converged at iteration 43 with logLik: -41.20702 
converged at iteration 45 with logLik: -41.20702 
converged at iteration 41 with logLik: -41.20702 
converged at iteration 49 with logLik: -41.20702 
converged at iteration 45 with logLik: -41.20702 
converged at iteration 41 with logLik: -41.20702 
converged at iteration 46 with logLik: -41.20702 
converged at iteration 45 with logLik: -41.20702 
converged at iteration 45 with logLik: -41.20702 
converged at iteration 20 with logLik: -39.88307 
Code
summary(best_model5)
Initial state probabilities model 
pr1 pr2 
  0   1 

Transition matrix 
        toS1  toS2
fromS1 0.730 0.270
fromS2 0.269 0.731

Response parameters 
Resp 1 : gaussian 
    Re1.(Intercept) Re1.PDO Re1.sd
St1          -1.184   0.463  0.534
St2           1.344   0.684  1.219

This model has an interpreatable transition matrix, where the probability of staying or leaving each respective state is about ~70-30. Note that we now have PDO as informing our copepod richness in the response parameters.

Code
plot(ts(posterior(best_model5, type="smoothing")[,1], start=c(2003,5), deltat=1/12),ylab="probability",
     main="Posterior probability of state 1 (copepods good?).",
     frame=FALSE)

Code
prstates<-apply(posterior(best_model5)[,c("S1", "S2")],1, which.max)
Warning in .local(object, ...): Argument 'type' not specified and will default
to 'viterbi'. This default may change in future releases of depmixS4. Please
see ?posterior for alternative options.
Code
mu<-summary(best_model5)[,1]
Initial state probabilities model 
pr1 pr2 
  0   1 

Transition matrix 
        toS1  toS2
fromS1 0.730 0.270
fromS2 0.269 0.731

Response parameters 
Resp 1 : gaussian 
    Re1.(Intercept) Re1.PDO Re1.sd
St1          -1.184   0.463  0.534
St2           1.344   0.684  1.219
Code
pred <- tibble("Year" = dat$years, "Copepod Richness" = mu[prstates])
pred5<-pred %>% pivot_longer(-Year, names_to = "Copepod Richness", values_to = "Fit")

ggplot() +
   geom_line(data = pred5, aes(x = years, y = Fit, color = "predicted_states"), size = 1) +
  geom_line(data = dat, aes(x = years, y = CopeRich, color = "CopeRich"), size = 1) +
  geom_line(data = dat, aes(x = years, y = PDO_DecMarch, color = "PDO" ), size = 1)+
  scale_color_manual(values = c(PDO = "blue", SST = "red", CopeRich = "green",
                                FishInx = "purple", predicted_states = "orange")) +
  labs(title = " Model 5 - Copepod Richness ~ PDO , Transition ~1", x = "Year", y = "Value", color = "Variable") +
  theme_minimal()

Copepod richness and PDO data have the closest correlation, and it seemed reasonable to relate them to each other and model the variation of Copepod Richness as a function of PDO. Admittedly, we are not sure if this is a appropriate entry to the depmix() function but the results for models 5 and 6 were interesting so we included them.

Model 6 Copepods informed by PDO with PDO informing transition

This model is the same as model 5, but using PDO as a covariate for transition, which could be inappropriate for hmm model specification. But, we wanted to explore it.

For the intercept we chose to model Copepod Richness ~ PDO. These data sets have the closest correlation and so this was done to test of the fit could be improved by modeling the variation of Copepod Richness as a function of PDO. Admittedly, we are not sure if this is a appropriate entry to the depmix() function but the results for models 5 and 6 were interesting so we included them.

Code
#Copepods as they relate to PDO where the transition is informed by PDO
mod6 = depmix(list(CopeRich ~PDO),
              nstates = 2, 
              transition = ~PDO, 
              family = list(gaussian()),
              data=dat)
fitmod6 = fit(mod6)
converged at iteration 26 with logLik: -36.69908 
Code
#Find best fit 
iter <-100
seeds<-sample(100:1000, size = iter, replace = F)
best6 <- 1e10
best_model6 <- NA
for(i in 1:iter){
  set.seed(seeds[i])
  fitmod6 <- fit(mod6)
  if(AIC(fitmod6)< best6){
    best_model6 <- fitmod6
    best6 <- AIC(fitmod6)
  }
}
converged at iteration 27 with logLik: -36.69921 
converged at iteration 30 with logLik: -36.69921 
converged at iteration 28 with logLik: -36.69923 
converged at iteration 31 with logLik: -36.6992 
converged at iteration 29 with logLik: -36.69921 
converged at iteration 26 with logLik: -36.69944 
converged at iteration 27 with logLik: -36.69907 
converged at iteration 30 with logLik: -36.69923 
converged at iteration 30 with logLik: -36.69923 
converged at iteration 31 with logLik: -36.69907 
converged at iteration 28 with logLik: -36.6992 
converged at iteration 28 with logLik: -36.69907 
converged at iteration 30 with logLik: -36.69921 
converged at iteration 30 with logLik: -36.69907 
converged at iteration 29 with logLik: -36.69943 
converged at iteration 16 with logLik: -38.06323 
converged at iteration 30 with logLik: -36.69908 
converged at iteration 28 with logLik: -36.69922 
converged at iteration 32 with logLik: -36.69923 
converged at iteration 31 with logLik: -36.69908 
converged at iteration 29 with logLik: -36.69922 
converged at iteration 29 with logLik: -36.69908 
converged at iteration 27 with logLik: -36.69918 
converged at iteration 28 with logLik: -36.69922 
converged at iteration 30 with logLik: -36.69922 
converged at iteration 27 with logLik: -36.6992 
converged at iteration 29 with logLik: -36.69921 
converged at iteration 27 with logLik: -36.69922 
converged at iteration 33 with logLik: -36.69921 
converged at iteration 34 with logLik: -36.69907 
converged at iteration 27 with logLik: -36.69907 
converged at iteration 13 with logLik: -38.06323 
converged at iteration 31 with logLik: -36.69921 
converged at iteration 28 with logLik: -36.69923 
converged at iteration 25 with logLik: -36.6992 
converged at iteration 29 with logLik: -36.69907 
converged at iteration 26 with logLik: -36.69922 
converged at iteration 26 with logLik: -36.6992 
converged at iteration 26 with logLik: -36.69912 
converged at iteration 29 with logLik: -36.69923 
converged at iteration 26 with logLik: -36.69944 
converged at iteration 26 with logLik: -36.69922 
converged at iteration 28 with logLik: -36.69908 
converged at iteration 26 with logLik: -36.69922 
converged at iteration 26 with logLik: -36.69912 
converged at iteration 34 with logLik: -36.69909 
converged at iteration 64 with logLik: -41.46754 
converged at iteration 31 with logLik: -36.69907 
converged at iteration 29 with logLik: -36.69919 
converged at iteration 28 with logLik: -36.69921 
converged at iteration 28 with logLik: -36.69907 
converged at iteration 26 with logLik: -36.69908 
converged at iteration 29 with logLik: -36.6992 
converged at iteration 16 with logLik: -38.06323 
converged at iteration 26 with logLik: -36.69921 
converged at iteration 28 with logLik: -36.69921 
converged at iteration 30 with logLik: -36.69907 
converged at iteration 29 with logLik: -36.69907 
converged at iteration 31 with logLik: -36.6992 
converged at iteration 32 with logLik: -36.6992 
converged at iteration 27 with logLik: -36.69921 
converged at iteration 29 with logLik: -36.69908 
converged at iteration 27 with logLik: -36.6992 
converged at iteration 27 with logLik: -36.6992 
converged at iteration 28 with logLik: -36.6992 
converged at iteration 27 with logLik: -36.69907 
converged at iteration 27 with logLik: -36.69908 
converged at iteration 28 with logLik: -36.6992 
converged at iteration 30 with logLik: -36.69907 
converged at iteration 28 with logLik: -36.69908 
converged at iteration 28 with logLik: -36.6992 
converged at iteration 26 with logLik: -36.6992 
converged at iteration 27 with logLik: -36.69921 
converged at iteration 26 with logLik: -36.69923 
converged at iteration 28 with logLik: -36.69921 
converged at iteration 29 with logLik: -36.69923 
converged at iteration 28 with logLik: -36.69919 
converged at iteration 29 with logLik: -36.6992 
converged at iteration 28 with logLik: -36.69907 
converged at iteration 28 with logLik: -36.69944 
converged at iteration 29 with logLik: -36.69907 
converged at iteration 27 with logLik: -36.69922 
converged at iteration 28 with logLik: -36.69921 
converged at iteration 31 with logLik: -36.69922 
converged at iteration 29 with logLik: -36.69907 
converged at iteration 32 with logLik: -36.6992 
converged at iteration 16 with logLik: -38.06323 
converged at iteration 27 with logLik: -36.69907 
converged at iteration 28 with logLik: -36.69921 
converged at iteration 25 with logLik: -36.69919 
converged at iteration 27 with logLik: -36.69913 
converged at iteration 31 with logLik: -36.69922 
converged at iteration 28 with logLik: -36.69922 
converged at iteration 28 with logLik: -36.69921 
converged at iteration 31 with logLik: -36.69912 
converged at iteration 31 with logLik: -36.69921 
converged at iteration 27 with logLik: -36.69922 
converged at iteration 30 with logLik: -36.69923 
converged at iteration 27 with logLik: -36.69944 
converged at iteration 26 with logLik: -36.69921 
Code
summary(best_model6)
Initial state probabilities model 
pr1 pr2 
  0   1 

Transition model for state (component) 1 
Model of type multinomial (mlogit), formula: ~PDO
Coefficients: 
            St1       St2
(Intercept)   0 12.511030
PDO           0 -0.150226
Probalities at zero values of the covariates.
3.685761e-06 0.9999963 

Transition model for state (component) 2 
Model of type multinomial (mlogit), formula: ~PDO
Coefficients: 
            St1      St2
(Intercept)   0 6.846489
PDO           0 4.202840
Probalities at zero values of the covariates.
0.001062053 0.9989379 


Response parameters 
Resp 1 : gaussian 
    Re1.(Intercept) Re1.PDO Re1.sd
St1          -0.220   0.199  0.704
St2           0.136   0.892  1.151

Again, given that we included PDO as a transition covariate, the matrix was not interpretable as is in the model summary, but we address this in the More Model Diagnostics Section.

Code
plot(ts(posterior(best_model6, type="smoothing")[,1], start=c(2003,5), deltat=1/12),ylab="probability",
     main="Posterior probability of state 1 (copepods good?).",
     frame=FALSE)

Code
prstates<-apply(posterior(best_model6)[,c("S1", "S2")],1, which.max)
Warning in .local(object, ...): Argument 'type' not specified and will default
to 'viterbi'. This default may change in future releases of depmixS4. Please
see ?posterior for alternative options.
Code
mu<-summary(best_model6)[,1]
Initial state probabilities model 
pr1 pr2 
  0   1 

Transition model for state (component) 1 
Model of type multinomial (mlogit), formula: ~PDO
Coefficients: 
            St1       St2
(Intercept)   0 12.511030
PDO           0 -0.150226
Probalities at zero values of the covariates.
3.685761e-06 0.9999963 

Transition model for state (component) 2 
Model of type multinomial (mlogit), formula: ~PDO
Coefficients: 
            St1      St2
(Intercept)   0 6.846489
PDO           0 4.202840
Probalities at zero values of the covariates.
0.001062053 0.9989379 


Response parameters 
Resp 1 : gaussian 
    Re1.(Intercept) Re1.PDO Re1.sd
St1          -0.220   0.199  0.704
St2           0.136   0.892  1.151
Code
pred <- tibble("Year" = dat$years, "Copepod Richness" = mu[prstates])
pred6<-pred %>% pivot_longer(-Year, names_to = "Copepod Richness", values_to = "Fit")

ggplot() +
   geom_line(data = pred6, aes(x = years, y = Fit, color = "predicted_states"), size = 1) +
  geom_line(data = dat, aes(x = years, y = CopeRich, color = "CopeRich"), size = 1) +
  geom_line(data = dat, aes(x = years, y = PDO_DecMarch, color = "PDO" ), size = 1)+
  scale_color_manual(values = c(PDO = "blue", SST = "red", CopeRich = "green",
                                FishInx = "purple", predicted_states = "orange")) +
  labs(title = " Model 6 - Copepod Richness ~ PDO, Transition ~ PDO", x = "Year", y = "Value", color = "Variable") +
  theme_minimal()

Visually, model 6 produced the worst fit to the data. This model suggests there is a dominant state near the mean of the data and the second state is only marginally different. The dampening of the oscillation must be from the transition being informed directly by PDO.

Results

Summarize the model you’ve created. Specifically,

  • Does it converge?

All the models converged even when running the for loop over various seeds, however visual inspection of model fits revealed that several fits were quite poor, including our top model ranked by AICc.

  • How many states seem to be most supported?

Compared to model 1, model two with 3 states produced a poor visual fit to the data. Because of this we assumed 2 states in the other models we tested. This aligned with our visual estimates of two states.

  • Plot the time series of estimated states. What does this mean?

The plots of estimated states show the underlying pattern that may exist if there is in fact a shift between states. The variation in the data from this pattern would be related to unexplained error related to covariates other than the underlying states. Visually there appears to be good support for two underlying states, however, as shown below, the estimated states in our top AICc ranked model do not appear to fit the data well, indicating that AICc may not be the best metric to use here. It definitely highlights the dangers of blindly trusting AIC without further model checks.

Model Selection with AIC

Code
mods <- c("1","2","3","4","5","6")
aic <- c(AIC(best_model1), AIC(best_model2), AIC(best_model3), AIC(best_model4), AIC(best_model5),AIC(best_model6))
delta.aic <- aic-min(aic)
tab <- cbind.data.frame(mods, aic, delta.aic)

tab
  mods       aic  delta.aic
1    1 123.36068  27.962542
2    2 125.21442  29.816290
3    3 243.08766 147.689531
4    4 245.17760 149.779466
5    5  97.76613   2.367998
6    6  95.39813   0.000000

AIC tells us that the best model would be model 6, although it is just marginally better than model 5 with a delta AIC of just over 2. The delta AIC values are only about 2 different between models 1-2 and 3-4 and 5-6 respectively. Each set of models are the most similar to each other within the models tested and these AIC values indicate little to no support for one approach within each group to be significantly better than the other. AIC values do indicate significantly better performance between these sets of models, however this support is not supported by looking at visual fits of the data. The two best models, 5 and 6, produce the worst visual fits to the data of any models tested. It is unclear why AIC is selecting these models as the best, but it does seem that AIC favors the simple models with only one intercept rather than models with two intercepts. Because of the apparent poor fit of models 5 and 6, it seems other methods of model selection such as Cross Validation or leave One Out may be more appropriate here than AIC.

More Model Diagnostics

The following function allows us to interpret the transition matrix when there is a covariate informing the transition.

Code
#create a funciton to help build transition matrices
convert_probs <- function(fit, coef_value = 0) {
  # extract coefficients
  p1 <- fit@transition[[1]]@parameters$coefficients
  p2 <- fit@transition[[2]]@parameters$coefficients
  
  # assemble transition matrix in mlogit space
  m <- matrix(0, 2, 2)
  m[1,] <- p1[1,] + p1[2,] * coef_value #intercept + slope * coefficient
  m[2,] <- p2[1,] + p2[2,] * coef_value
  
  # exponentiate
  m <- exp(m)
  
  # normalize
  m[1,] <- m[1,] / sum(m[1,])
  m[2,] <- m[2,] / sum(m[2,])
  return(m)
}

Let’s look at all the models together. Note that the above function was applied to models 4 and 6, which did not have an interpretable transition matrix.

Code
#Look at the transition matrices and variance
summary(fitmod1)
Initial state probabilities model 
pr1 pr2 
  0   1 

Transition matrix 
        toS1  toS2
fromS1 0.754 0.246
fromS2 0.402 0.598

Response parameters 
Resp 1 : gaussian 
    Re1.(Intercept) Re1.sd
St1          -1.414  1.050
St2           3.504  2.552
Code
summary(fitmod2)
Initial state probabilities model 
pr1 pr2 pr3 
  1   0   0 

Transition matrix 
        toS1  toS2  toS3
fromS1 0.000 1.000 0.000
fromS2 0.067 0.779 0.154
fromS3 0.411 0.000 0.589

Response parameters 
Resp 1 : gaussian 
    Re1.(Intercept) Re1.sd
St1           3.268  0.754
St2          -1.320  1.101
St3           4.431  2.951
Code
summary(fitmod3)
Initial state probabilities model 
pr1 pr2 
  0   1 

Transition matrix 
        toS1  toS2
fromS1 0.750 0.250
fromS2 0.386 0.614

Response parameters 
Resp 1 : gaussian 
Resp 2 : gaussian 
    Re1.(Intercept) Re1.sd Re2.(Intercept) Re2.sd
St1          -1.502  0.990          -1.945  2.262
St2           3.460  2.482           4.030  2.625
Code
summary(fitmod4) # no transition matrix
Initial state probabilities model 
pr1 pr2 
  1   0 

Transition model for state (component) 1 
Model of type multinomial (mlogit), formula: ~PDO
Coefficients: 
            St1        St2
(Intercept)   0  1.1898714
PDO           0 -0.4098372
Probalities at zero values of the covariates.
0.2332819 0.7667181 

Transition model for state (component) 2 
Model of type multinomial (mlogit), formula: ~PDO
Coefficients: 
            St1        St2
(Intercept)   0 1.29650280
PDO           0 0.07664089
Probalities at zero values of the covariates.
0.2147542 0.7852458 


Response parameters 
Resp 1 : gaussian 
Resp 2 : gaussian 
    Re1.(Intercept) Re1.sd Re2.(Intercept) Re2.sd
St1           3.650  2.414           4.168  2.628
St2          -1.438  1.019          -1.803  2.325
Code
summary(fitmod5)
Initial state probabilities model 
pr1 pr2 
  1   0 

Transition matrix 
        toS1  toS2
fromS1 0.731 0.269
fromS2 0.270 0.730

Response parameters 
Resp 1 : gaussian 
    Re1.(Intercept) Re1.PDO Re1.sd
St1           1.344   0.684  1.219
St2          -1.184   0.463  0.534
Code
summary(fitmod6) # no transition matrix
Initial state probabilities model 
pr1 pr2 
  0   1 

Transition model for state (component) 1 
Model of type multinomial (mlogit), formula: ~PDO
Coefficients: 
            St1         St2
(Intercept)   0 10.22711258
PDO           0 -0.03208095
Probalities at zero values of the covariates.
3.617477e-05 0.9999638 

Transition model for state (component) 2 
Model of type multinomial (mlogit), formula: ~PDO
Coefficients: 
            St1      St2
(Intercept)   0 6.846492
PDO           0 4.202830
Probalities at zero values of the covariates.
0.00106205 0.9989379 


Response parameters 
Resp 1 : gaussian 
    Re1.(Intercept) Re1.PDO Re1.sd
St1          -0.220   0.199  0.704
St2           0.136   0.892  1.151
Code
#get the transition matrices
convert_probs(fitmod4)
          [,1]      [,2]
[1,] 0.2332819 0.7667181
[2,] 0.2147542 0.7852458
Code
convert_probs(fitmod6)
             [,1]      [,2]
[1,] 3.617477e-05 0.9999638
[2,] 1.062050e-03 0.9989379

State 3 in model 2 appears to be fairly unstable as its state 3 to state 3 transition probability is 0, and it has a fairly low probability of other states transitioning to state 3, outside of state 2 to state 3. Our best model run did not generate any observations in state 3 so it is unclear if this is a middle state as we hypothesized or not.

The state 1 transition probabilities in models 3 and 4 showed an opposite pattern, yet produced very similar fits to the data. The difference appeared to only be shown in the most recent years, where in model 4 there was a more rapid shift out of the higher state. This aligns with what we would expect with the change in transition probabilities from state 1 to state 2.

Model 5 appeared to have smaller intercept values which makes sense as the predicted states were closer to 0 than in the other models.

The transition matrix for model 6 was the most unique with a high likelihood that the model will transition to and stay in state 2. This is another good indication that the model is a poor fit to the data as the data visually appears to have at least two states, and the other models produced good visual fits with a two state approach.

Discussion

PDO is an index of ocean conditions that is generally thought to act as a driver for abundance and richness of organisms in the North Pacific. Here we predicted that PDO would be a driver of copepod richness. The two time series of data appear to closely co-vary and seem to show a pattern of a high and low state with rapid swings between the two. Our models using 2 states produced good visual fits and we found little support for three states. AIC appears to be a relatively poor choice for model selection in this instance although it is unclear if that is specific to HMM models or if there was a programming error. Other types of model selection would likely increase our understanding of model performance. We saw no obvious difference between a rapid shift in states , transition = ~1 and a shift proportional to the PDO data set. This suggests that there is in fact a very rapid shift between states, which aligns with our expected result based on the general lack of observations near neutral conditions.

What we’re still confused about

One question we had is that our ‘best’ model based on AIC values did not appear to be a good fit to the data visually. When this is the case, what should our next steps be? Visually confirming model fit seems to overrule AIC values when those diagnostics are contradictory, is this true?

We didn’t have a good handle on how the ‘nstates’ and ‘transitions’ inputs to the models interacted and what type of model diagnostics would help us quantify underlying relationships. After we were in pretty deep we realized having PDO in both the model list (related to copepod richness or as an independent intercept) and as a covarate for transition may be inappropriate, but are unsure and would like more information.

PDO has a very strong correlative pattern with copepod richness. We ran several combinations of model to examine the strength of this relationship. We produced some models with decent looking fits, but these models were not as highly ranked by AIC as other, seemingly less well fitting models. Is AIC a poor choice? We have a suspicion this question will be answered in the next lecture(s)…

Team contributions

Madison took point on coding, Nick helped with coding and compiled it into the final rmd, and Nick and Karl did writing and Madison proofread and organized the .Rmd document.