In this lab, we’re looking at ecological indicators and exploring hidden Markov models and probabilities of shifting between underlying states.
Load the data
We explored the stoplight.csv dataset for this week. One of the columns divides indicators into groups (e.g. Climate, Local Physical, Local Biological, etc) and we wanted to look at some biological indicators and climate/physical indicators.
We initially selected the Pacific Decadal Osculation (PDO) for the Dec/March period, sea surface temperature (SST), and copepod richness and ichthy communitiy index, which we assumed was comperable to copepod richness (though maybe not).
Code
#ecosystem indicatorsstoplight[,1]
[1] "PDO_DecMarch"
[2] "PDO_MaySept)"
[3] "ONI"
[4] "SST"
[5] "Upper_20m_NovMar"
[6] "Upper_20m_MaySept"
[7] "Deep_temperature"
[8] "Deep_salinity"
[9] "Copepod_richness"
[10] "N_copepod"
[11] "S_copepod"
[12] "Biological_transition"
[13] "Nearshore_Ichthyoplankton"
[14] "Ichthy_community_index"
[15] "Chinook_salmon_juv"
[16] "Coho_salmon_juv"
[17] "Principal Component scores (PC1)"
[18] "Principal Component scores (PC2)"
[19] "Physical Spring Trans.\nUI based (day of year)"
[20] "Physical Spring Trans. Hydrographic (day of year)"
[21] "Upwelling Anomaly\n(Sum April-May)"
[22] "Length of Upwelling Season\nUI based (days)"
[23] "Copepod Community Index\n(MDS axis 1 scores; May-Sept)"
The first rule of science is to plot your data! We started with looking at all our data together.
Code
# Plotting allggplot(dat, aes(x = years)) +geom_line(aes(y = PDO, color ="PDO"), linewidth =1) +geom_line(aes(y = SST, color ="SST"), linewidth =1) +geom_line(aes(y = CopeRich, color ="CopeRich"), linewidth =1) +geom_line(aes(y = FishInx, color ="FishInx"), linewidth =1) +scale_color_manual(values =c(PDO ="blue", SST ="red", CopeRich ="green", FishInx ="purple")) +labs(x ="Year", y ="Value", color ="Variable") +theme_minimal()
Sea surface temperature stats pretty consistent, while the fish index had a significant range, possibly following PDO if you squint, but was not immediatly compelling. We narrowed it down and decided to focus on copepod richness and PDO, which seemed to be incredibly correlated, and plotted the time series and their relationship to each other. In a positive PDO index, there is a positive correlation with copepod index.
Code
# Plotting Coperich and env covggplot(dat, aes(x = years)) +geom_line(aes(y = PDO, color ="PDO"), size =1) +geom_line(aes(y = CopeRich, color ="CopeRich"), size =1) +scale_color_manual(values =c(PDO ="blue", SST ="red", CopeRich ="green", FishInx ="purple")) +labs(x ="Year", y ="Value", color ="Variable") +theme_minimal()
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
Code
#CopeRichness tracks very closely with PDO#SST seems to have a negligible correlation ggplot(dat, aes(x = PDO, y = CopeRich, color = CopeRich <0)) +geom_point() +geom_line(aes(x = PDO, y = CopeRich)) +geom_hline(yintercept =0, linetype ="dotted") +geom_vline(xintercept =0, linetype ="dotted") +scale_color_manual(values =c("red", "blue"), labels =c("Above 0", "Below 0")) +labs(x ="PDO", y ="CopeRich") +theme_minimal()
Code
#fish index is all over the place, let's not use it.
We also wanted to think about our possible hidden states, and while there seems to be a clear two-state sytem with PDO in a negative or positive index, we wondered if there was a third state “weak” index.
Our primary objectives for this lab were to gain a better understanding of hmm options and getting some practice with interpretation by exploring different model options considering copepod richness and PDO. While there was much more we could have done, we decided to explore model specification, and interpretation of AIC for these types of models, which can be misleading depending on the ecological assumptions being made.
General tasks
Please pick a type of indicators, and develop a 2- or 3-state multivariate HMM.
Methods
We fit models with different either two or three states to determine whether copepod richness and/or PDO seems to have 2 or 3 states. It’s unclear if there is a ‘neutral’ state that modulates copepod richness in addition to the hot and cool phases.
We also explored the transition option, by assuming either that there is no covariate informing the transition, or that PDO informed the transition.
We assumed the familys were gaussian for all models tested.
We plotted the fit of all models to copepod richness and PDO data from the stoplight dataset. We then also compared model fits with AIC in an attempt to select the best model.
Note that in the report we give probabilities of transitioning to and from states, but these models are very sensitive to starting conditions, so we iterate over many set seeds for each model to find the best fit. That being said, some of the values written in this report may be different upon knitting.
Model 1 Copepods only: 2 states
We started with a very simple model, with copepod richness as the intercept and no covariates informing the transition. We started by assuming a two state system.
Code
# Model 1mod1 =depmix(CopeRich ~1,nstates =2, transition =~1, family =gaussian(),data=dat)fitmod1 =fit(mod1)
converged at iteration 16 with logLik: -54.68034
Code
#Find best fit iter <-100seeds<-sample(100:1000, size = iter, replace = F)best1 <-1e10best_model1 <-NAfor(i in1:iter){set.seed(seeds[i]) fitmod1 <-fit(mod1)if(AIC(fitmod1)< best1){ best_model1 <- fitmod1 best1 <-AIC(fitmod1) }}
converged at iteration 20 with logLik: -54.68034
converged at iteration 21 with logLik: -54.68034
converged at iteration 20 with logLik: -54.68034
converged at iteration 16 with logLik: -54.68034
converged at iteration 20 with logLik: -54.68034
converged at iteration 22 with logLik: -54.68034
converged at iteration 31 with logLik: -54.68034
converged at iteration 18 with logLik: -54.68034
converged at iteration 19 with logLik: -54.68034
converged at iteration 17 with logLik: -54.68034
converged at iteration 20 with logLik: -54.68034
converged at iteration 18 with logLik: -54.68034
converged at iteration 21 with logLik: -54.68034
converged at iteration 18 with logLik: -54.68034
converged at iteration 22 with logLik: -54.68034
converged at iteration 26 with logLik: -54.68034
converged at iteration 23 with logLik: -54.68034
converged at iteration 19 with logLik: -54.68034
converged at iteration 22 with logLik: -54.68034
converged at iteration 19 with logLik: -54.68034
converged at iteration 22 with logLik: -54.68034
converged at iteration 15 with logLik: -54.68034
converged at iteration 19 with logLik: -54.68034
converged at iteration 17 with logLik: -54.68034
converged at iteration 25 with logLik: -54.68034
converged at iteration 15 with logLik: -54.68034
converged at iteration 21 with logLik: -54.68034
converged at iteration 18 with logLik: -54.68034
converged at iteration 19 with logLik: -54.68034
converged at iteration 23 with logLik: -54.68034
converged at iteration 17 with logLik: -54.68034
converged at iteration 22 with logLik: -54.68034
converged at iteration 17 with logLik: -54.68034
converged at iteration 18 with logLik: -54.68034
converged at iteration 27 with logLik: -54.68034
converged at iteration 19 with logLik: -54.68034
converged at iteration 22 with logLik: -54.68034
converged at iteration 15 with logLik: -54.68034
converged at iteration 22 with logLik: -54.68034
converged at iteration 15 with logLik: -54.68034
converged at iteration 21 with logLik: -54.68034
converged at iteration 24 with logLik: -54.68034
converged at iteration 19 with logLik: -54.68034
converged at iteration 19 with logLik: -54.68034
converged at iteration 16 with logLik: -54.68034
converged at iteration 20 with logLik: -54.68034
converged at iteration 18 with logLik: -54.68034
converged at iteration 19 with logLik: -54.68034
converged at iteration 21 with logLik: -54.68034
converged at iteration 17 with logLik: -54.68034
converged at iteration 20 with logLik: -54.68034
converged at iteration 19 with logLik: -54.68034
converged at iteration 20 with logLik: -54.68034
converged at iteration 24 with logLik: -54.68034
converged at iteration 14 with logLik: -54.68034
converged at iteration 19 with logLik: -54.68034
converged at iteration 17 with logLik: -54.68034
converged at iteration 17 with logLik: -54.68034
converged at iteration 17 with logLik: -54.68034
converged at iteration 18 with logLik: -54.68034
converged at iteration 16 with logLik: -54.68034
converged at iteration 14 with logLik: -54.68034
converged at iteration 17 with logLik: -54.68034
converged at iteration 18 with logLik: -54.68034
converged at iteration 23 with logLik: -54.68034
converged at iteration 20 with logLik: -54.68034
converged at iteration 18 with logLik: -54.68034
converged at iteration 19 with logLik: -54.68034
converged at iteration 17 with logLik: -54.68034
converged at iteration 19 with logLik: -54.68034
converged at iteration 18 with logLik: -54.68034
converged at iteration 18 with logLik: -54.68034
converged at iteration 19 with logLik: -54.68034
converged at iteration 22 with logLik: -54.68034
converged at iteration 20 with logLik: -54.68034
converged at iteration 15 with logLik: -54.68034
converged at iteration 23 with logLik: -54.68034
converged at iteration 20 with logLik: -54.68034
converged at iteration 19 with logLik: -54.68034
converged at iteration 25 with logLik: -54.68034
converged at iteration 11 with logLik: -54.68034
converged at iteration 21 with logLik: -54.68034
converged at iteration 18 with logLik: -54.68034
converged at iteration 17 with logLik: -54.68034
converged at iteration 15 with logLik: -54.68034
converged at iteration 14 with logLik: -54.68034
converged at iteration 18 with logLik: -54.68034
converged at iteration 18 with logLik: -54.68034
converged at iteration 17 with logLik: -54.68034
converged at iteration 28 with logLik: -54.68034
converged at iteration 19 with logLik: -54.68034
converged at iteration 19 with logLik: -54.68034
converged at iteration 20 with logLik: -54.68034
converged at iteration 18 with logLik: -54.68034
converged at iteration 20 with logLik: -54.68034
converged at iteration 16 with logLik: -54.68034
converged at iteration 17 with logLik: -54.68034
converged at iteration 20 with logLik: -54.68034
converged at iteration 19 with logLik: -54.68034
converged at iteration 20 with logLik: -54.68034
We see that there is a ~75% probability of state one staying in state one, and ~60% of state two staying in state two, with ~25% and 40% probability for transitioning from state 1 to state 2 and state 2 to state 1 respectively, showing that the system generally is more stable in state one.
Code
plot(ts(posterior(best_model1, type="smoothing")[,1], start=c(2003,5), deltat=1/12),ylab="probability",main="Posterior Probability of Changing States",frame=FALSE)
Warning in .local(object, ...): Argument 'type' not specified and will default
to 'viterbi'. This default may change in future releases of depmixS4. Please
see ?posterior for alternative options.
pred <-tibble("Year"= dat$years, "Copepod Richness"= mu[prstates])#predicted valuespred1<-pred %>%pivot_longer(-Year, names_to ="Copepod Richness", values_to ="Fit")ggplot() +geom_line(data = pred1, aes(x = years, y = Fit, color ="predicted_states"), size =1) +geom_line(data = dat, aes(x = years, y = CopeRich, color ="CopeRich"), size =1) +geom_line(data = dat, aes(x = years, y = PDO_DecMarch, color ="PDO" ), size =1)+scale_color_manual(values =c(PDO ="blue", SST ="red", CopeRich ="green",FishInx ="purple", predicted_states ="orange")) +labs(title =" Model 1 - Two States", x ="Year", y ="Value", color ="Variable") +theme_minimal()
Model 1 produces a reasonably good fit to the data.
Model 2 Copepods only: 3 states
We continued by using the same simple model from above, and assuming 3 states.
Code
# Model 2mod2 =depmix(CopeRich ~1,nstates =3, transition =~1, family =gaussian(),data=dat)fitmod2 =fit(mod2)
converged at iteration 46 with logLik: -52.30002
Code
#Find best fit iter <-100seeds<-sample(100:1000, size = iter, replace = F)best2 <-1e10best_model2 <-NAfor(i in1:iter){set.seed(seeds[i]) fitmod2 <-fit(mod2)if(AIC(fitmod2)< best2){ best_model2 <- fitmod2 best2 <-AIC(fitmod2) }}
converged at iteration 34 with logLik: -51.04047
converged at iteration 85 with logLik: -53.46158
converged at iteration 29 with logLik: -48.60721
converged at iteration 38 with logLik: -52.30002
converged at iteration 39 with logLik: -51.04047
converged at iteration 36 with logLik: -52.30002
converged at iteration 50 with logLik: -51.1154
converged at iteration 26 with logLik: -48.60721
converged at iteration 34 with logLik: -51.04047
converged at iteration 34 with logLik: -51.04047
converged at iteration 78 with logLik: -53.46158
converged at iteration 73 with logLik: -53.46158
converged at iteration 34 with logLik: -51.04047
converged at iteration 39 with logLik: -52.30002
converged at iteration 33 with logLik: -52.30002
converged at iteration 53 with logLik: -51.1154
converged at iteration 31 with logLik: -52.30002
converged at iteration 30 with logLik: -52.30002
converged at iteration 33 with logLik: -52.30002
converged at iteration 50 with logLik: -52.30002
converged at iteration 35 with logLik: -52.30002
converged at iteration 37 with logLik: -51.04047
converged at iteration 48 with logLik: -52.30002
converged at iteration 37 with logLik: -51.04047
converged at iteration 64 with logLik: -53.46158
converged at iteration 43 with logLik: -51.04047
converged at iteration 32 with logLik: -52.30002
converged at iteration 81 with logLik: -53.46158
converged at iteration 35 with logLik: -51.04047
converged at iteration 63 with logLik: -53.46158
converged at iteration 67 with logLik: -53.46158
converged at iteration 51 with logLik: -52.30002
converged at iteration 35 with logLik: -52.30002
converged at iteration 36 with logLik: -51.04047
converged at iteration 32 with logLik: -52.30002
converged at iteration 39 with logLik: -52.30002
converged at iteration 41 with logLik: -51.1154
converged at iteration 64 with logLik: -52.30002
converged at iteration 20 with logLik: -48.60721
converged at iteration 35 with logLik: -52.30002
converged at iteration 39 with logLik: -51.04047
converged at iteration 40 with logLik: -52.30002
converged at iteration 34 with logLik: -52.30002
converged at iteration 32 with logLik: -52.30002
converged at iteration 33 with logLik: -52.30002
converged at iteration 37 with logLik: -52.30002
converged at iteration 26 with logLik: -48.60721
converged at iteration 42 with logLik: -52.30002
converged at iteration 30 with logLik: -52.30002
converged at iteration 96 with logLik: -53.46158
converged at iteration 38 with logLik: -52.30002
converged at iteration 62 with logLik: -53.46158
converged at iteration 56 with logLik: -53.46158
converged at iteration 40 with logLik: -52.30002
converged at iteration 63 with logLik: -53.46158
converged at iteration 32 with logLik: -52.30002
converged at iteration 47 with logLik: -51.04047
converged at iteration 58 with logLik: -53.46158
converged at iteration 33 with logLik: -51.04047
converged at iteration 34 with logLik: -51.04047
converged at iteration 34 with logLik: -52.30002
converged at iteration 34 with logLik: -51.04047
converged at iteration 33 with logLik: -52.30002
converged at iteration 34 with logLik: -52.30002
converged at iteration 33 with logLik: -48.60721
converged at iteration 37 with logLik: -52.30002
converged at iteration 62 with logLik: -53.46158
converged at iteration 37 with logLik: -51.04047
converged at iteration 45 with logLik: -52.30002
converged at iteration 38 with logLik: -52.30002
converged at iteration 34 with logLik: -52.30002
converged at iteration 36 with logLik: -52.30002
converged at iteration 89 with logLik: -52.30002
converged at iteration 52 with logLik: -51.1154
converged at iteration 36 with logLik: -51.1154
converged at iteration 71 with logLik: -53.46158
converged at iteration 45 with logLik: -52.30002
converged at iteration 30 with logLik: -52.30002
converged at iteration 48 with logLik: -51.03809
converged at iteration 95 with logLik: -53.46158
converged at iteration 43 with logLik: -52.30002
converged at iteration 32 with logLik: -52.30002
converged at iteration 62 with logLik: -51.1154
converged at iteration 37 with logLik: -52.30002
converged at iteration 33 with logLik: -48.60721
converged at iteration 31 with logLik: -52.30002
converged at iteration 32 with logLik: -52.30002
converged at iteration 38 with logLik: -52.30002
converged at iteration 31 with logLik: -52.30002
converged at iteration 32 with logLik: -51.04047
converged at iteration 42 with logLik: -52.30002
converged at iteration 48 with logLik: -52.30002
converged at iteration 32 with logLik: -52.30002
converged at iteration 41 with logLik: -52.30002
converged at iteration 25 with logLik: -48.60721
converged at iteration 88 with logLik: -53.46158
converged at iteration 29 with logLik: -48.60721
converged at iteration 31 with logLik: -52.30002
converged at iteration 76 with logLik: -53.46158
converged at iteration 35 with logLik: -51.04047
This transition matrix is a little trickier to interpret, but the likelihood of transitioning to state three was low, ~7% for state one and 0% of transitioning from 2 to 3. This may not be a great specification.
Code
# plot(ts(posterior(best_model1, type="smoothing")[,1], start=c(2003,5), deltat=1/12),ylab="probability",# main="Posterior probability of state 1 (copepods good?).",# frame=FALSE)prstates <-apply(posterior(best_model2)[,c("S1", "S2")],1, which.max)
Warning in .local(object, ...): Argument 'type' not specified and will default
to 'viterbi'. This default may change in future releases of depmixS4. Please
see ?posterior for alternative options.
pred <-tibble("Year"= dat$years, "Copepod Richness"= mu[prstates])#predicted valuespred2 <- pred %>%pivot_longer(-Year, names_to ="Copepod Richness", values_to ="Fit")ggplot() +geom_line(data = pred2, aes(x = years, y = Fit, color ="predicted_states"), size =1) +geom_line(data = dat, aes(x = years, y = CopeRich, color ="CopeRich"), size =1) +geom_line(data = dat, aes(x = years, y = PDO_DecMarch, color ="PDO" ), size =1)+scale_color_manual(values =c(PDO ="blue", SST ="red", CopeRich ="green",FishInx ="purple", predicted_states ="orange")) +labs(title =" Model 2 - Three States", x ="Year", y ="Value", color ="Variable") +theme_minimal()
Compared to model 1, model 2 produces a poor visual fit to the data, with a narrower peak later in the timeseries. Because of this we assumed 2 states in the other models we tested.
Model 3 Copepods and PDO
Because Copepods and PDO have such a clear relationship visually, we started by looking at a model considering both copepod richness and PDO as intercepts, with no covariates informing the transition.
Code
#Copepods and PDO as intercepts mod3 =depmix(list(CopeRich ~1, PDO~1),nstates =2, transition =~1, family =list(gaussian(),gaussian()),data=dat)fitmod3 =fit(mod3)
converged at iteration 29 with logLik: -110.5438
Code
#Find best fit iter <-100seeds<-sample(100:1000, size = iter, replace = F)best3 <-1e10best_model3 <-NAfor(i in1:iter){set.seed(seeds[i]) fitmod3 <-fit(mod3)if(AIC(fitmod3)< best3){ best_model3 <- fitmod3 best3 <-AIC(fitmod3) }}
converged at iteration 29 with logLik: -110.5438
converged at iteration 24 with logLik: -110.5438
converged at iteration 27 with logLik: -110.5438
converged at iteration 27 with logLik: -110.5438
converged at iteration 25 with logLik: -110.5438
converged at iteration 26 with logLik: -110.5438
converged at iteration 30 with logLik: -110.5438
converged at iteration 27 with logLik: -110.5438
converged at iteration 27 with logLik: -110.5438
converged at iteration 28 with logLik: -110.5438
converged at iteration 28 with logLik: -110.5438
converged at iteration 25 with logLik: -110.5438
converged at iteration 28 with logLik: -110.5438
converged at iteration 31 with logLik: -110.5438
converged at iteration 25 with logLik: -110.5438
converged at iteration 27 with logLik: -110.5438
converged at iteration 29 with logLik: -110.5438
converged at iteration 27 with logLik: -110.5438
converged at iteration 26 with logLik: -110.5438
converged at iteration 27 with logLik: -110.5438
converged at iteration 27 with logLik: -110.5438
converged at iteration 25 with logLik: -110.5438
converged at iteration 28 with logLik: -110.5438
converged at iteration 28 with logLik: -110.5438
converged at iteration 27 with logLik: -110.5438
converged at iteration 29 with logLik: -110.5438
converged at iteration 27 with logLik: -110.5438
converged at iteration 24 with logLik: -110.5438
converged at iteration 23 with logLik: -110.5438
converged at iteration 25 with logLik: -110.5438
converged at iteration 27 with logLik: -110.5438
converged at iteration 27 with logLik: -110.5438
converged at iteration 26 with logLik: -110.5438
converged at iteration 29 with logLik: -110.5438
converged at iteration 26 with logLik: -110.5438
converged at iteration 29 with logLik: -110.5438
converged at iteration 28 with logLik: -110.5438
converged at iteration 27 with logLik: -110.5438
converged at iteration 28 with logLik: -110.5438
converged at iteration 26 with logLik: -110.5438
converged at iteration 23 with logLik: -110.5438
converged at iteration 26 with logLik: -110.5438
converged at iteration 32 with logLik: -110.5438
converged at iteration 29 with logLik: -110.5438
converged at iteration 24 with logLik: -110.5438
converged at iteration 28 with logLik: -110.5438
converged at iteration 25 with logLik: -110.5438
converged at iteration 24 with logLik: -110.5438
converged at iteration 18 with logLik: -110.5438
converged at iteration 25 with logLik: -110.5438
converged at iteration 30 with logLik: -110.5438
converged at iteration 29 with logLik: -110.5438
converged at iteration 28 with logLik: -110.5438
converged at iteration 28 with logLik: -110.5438
converged at iteration 25 with logLik: -110.5438
converged at iteration 28 with logLik: -110.5438
converged at iteration 29 with logLik: -110.5438
converged at iteration 28 with logLik: -110.5438
converged at iteration 27 with logLik: -110.5438
converged at iteration 27 with logLik: -110.5438
converged at iteration 27 with logLik: -110.5438
converged at iteration 27 with logLik: -110.5438
converged at iteration 29 with logLik: -110.5438
converged at iteration 26 with logLik: -110.5438
converged at iteration 27 with logLik: -110.5438
converged at iteration 26 with logLik: -110.5438
converged at iteration 28 with logLik: -110.5438
converged at iteration 26 with logLik: -110.5438
converged at iteration 24 with logLik: -110.5438
converged at iteration 25 with logLik: -110.5438
converged at iteration 28 with logLik: -110.5438
converged at iteration 27 with logLik: -110.5438
converged at iteration 26 with logLik: -110.5438
converged at iteration 26 with logLik: -110.5438
converged at iteration 29 with logLik: -110.5438
converged at iteration 27 with logLik: -110.5438
converged at iteration 27 with logLik: -110.5438
converged at iteration 26 with logLik: -110.5438
converged at iteration 28 with logLik: -110.5438
converged at iteration 27 with logLik: -110.5438
converged at iteration 28 with logLik: -110.5438
converged at iteration 27 with logLik: -110.5438
converged at iteration 27 with logLik: -110.5438
converged at iteration 28 with logLik: -110.5438
converged at iteration 24 with logLik: -110.5438
converged at iteration 27 with logLik: -110.5438
converged at iteration 25 with logLik: -110.5438
converged at iteration 27 with logLik: -110.5438
converged at iteration 31 with logLik: -110.5438
converged at iteration 27 with logLik: -110.5438
converged at iteration 24 with logLik: -110.5438
converged at iteration 30 with logLik: -110.5438
converged at iteration 27 with logLik: -110.5438
converged at iteration 23 with logLik: -110.5438
converged at iteration 27 with logLik: -110.5438
converged at iteration 28 with logLik: -110.5438
converged at iteration 27 with logLik: -110.5438
converged at iteration 25 with logLik: -110.5438
converged at iteration 28 with logLik: -110.5438
converged at iteration 29 with logLik: -110.5438
This model still preferred state one, with a ~75% probability of staying in one and ~25% chance of transitioning to two. There was a ~61% change of state two to stay in state two and a ~39% chance to transition back to state one. Note there are values for the intercepts and sd for Re1 (coperich) and Re2 (PDO) in this model.
Code
plot(ts(posterior(best_model3, type="smoothing")[,1], start=c(2003,5), deltat=1/12),ylab="probability",main="Posterior probability of state 1 (copepods good?).",frame=FALSE)
Warning in .local(object, ...): Argument 'type' not specified and will default
to 'viterbi'. This default may change in future releases of depmixS4. Please
see ?posterior for alternative options.
pred <-tibble("Year"= dat$years, "Copepod Richness"= mu[prstates],"PDO"= mu3[prstates])pred3<-pred %>%pivot_longer(-Year, names_to ="Covariates", values_to ="Fit")ggplot() +geom_line(data = pred3, aes(x = Year, y = Fit, group = Covariates, color = Covariates)) +geom_line(data = dat, aes(x = years, y = CopeRich, color ="CopeRich"), linewidth =1) +geom_line(data = dat, aes(x = years, y = PDO_DecMarch, color ="PDO"), linewidth =1) +scale_color_manual(values =c("orange", "lightgreen", "blue")) +# Assign distinct colors for each line in pred3labs(title ="Model 3 - Transition ~ 1", x ="Year", y ="Value", color ="Variable") +theme_minimal()
Model 4 Copepods and PDO with transition informed by PDO
Next we wanted to look at copepod richness and PDO with the transition informed by the PDO.
Code
#Copepods and PDO as intercepts where the transition is informed by PDOmod4 =depmix(list(CopeRich ~1, PDO~1),nstates =2, transition =~PDO, family =list(gaussian(),gaussian()),data=dat)fitmod4 =fit(mod4)
converged at iteration 15 with logLik: -109.5888
Code
#Find best fit iter <-100seeds<-sample(100:1000, size = iter, replace = F)best4 <-1e10best_model4 <-NAfor(i in1:iter){set.seed(seeds[i]) fitmod4 <-fit(mod4)if(AIC(fitmod4)< best4){ best_model4 <- fitmod4 best4 <-AIC(fitmod4) }}
converged at iteration 16 with logLik: -109.5888
converged at iteration 13 with logLik: -109.5888
converged at iteration 16 with logLik: -109.5888
converged at iteration 16 with logLik: -109.5888
converged at iteration 16 with logLik: -109.5888
converged at iteration 17 with logLik: -109.5888
converged at iteration 14 with logLik: -109.5888
converged at iteration 15 with logLik: -109.5888
converged at iteration 18 with logLik: -109.5888
converged at iteration 17 with logLik: -109.5888
converged at iteration 17 with logLik: -109.5888
converged at iteration 17 with logLik: -109.5888
converged at iteration 14 with logLik: -109.5888
converged at iteration 16 with logLik: -109.5888
converged at iteration 8 with logLik: -109.5888
converged at iteration 15 with logLik: -109.5888
converged at iteration 22 with logLik: -109.5888
converged at iteration 16 with logLik: -109.5888
converged at iteration 11 with logLik: -109.5888
converged at iteration 18 with logLik: -109.5888
converged at iteration 18 with logLik: -109.5888
converged at iteration 17 with logLik: -109.5888
converged at iteration 17 with logLik: -109.5888
converged at iteration 14 with logLik: -109.5888
converged at iteration 16 with logLik: -109.5888
converged at iteration 13 with logLik: -109.5888
converged at iteration 18 with logLik: -109.5888
converged at iteration 15 with logLik: -109.5888
converged at iteration 18 with logLik: -109.5888
converged at iteration 15 with logLik: -109.5888
converged at iteration 15 with logLik: -109.5888
converged at iteration 16 with logLik: -109.5888
converged at iteration 18 with logLik: -109.5888
converged at iteration 20 with logLik: -109.5888
converged at iteration 16 with logLik: -109.5888
converged at iteration 15 with logLik: -109.5888
converged at iteration 17 with logLik: -109.5888
converged at iteration 16 with logLik: -109.5888
converged at iteration 15 with logLik: -109.5888
converged at iteration 15 with logLik: -109.5888
converged at iteration 13 with logLik: -109.5888
converged at iteration 16 with logLik: -109.5888
converged at iteration 16 with logLik: -109.5888
converged at iteration 17 with logLik: -109.5888
converged at iteration 17 with logLik: -109.5888
converged at iteration 16 with logLik: -109.5888
converged at iteration 11 with logLik: -109.5888
converged at iteration 14 with logLik: -109.5888
converged at iteration 15 with logLik: -109.5888
converged at iteration 16 with logLik: -109.5888
converged at iteration 18 with logLik: -109.5888
converged at iteration 17 with logLik: -109.5888
converged at iteration 14 with logLik: -109.5888
converged at iteration 14 with logLik: -109.5888
converged at iteration 17 with logLik: -109.5888
converged at iteration 13 with logLik: -109.5888
converged at iteration 18 with logLik: -109.5888
converged at iteration 18 with logLik: -109.5888
converged at iteration 19 with logLik: -109.5888
converged at iteration 7 with logLik: -109.5888
converged at iteration 16 with logLik: -109.5888
converged at iteration 18 with logLik: -109.5888
converged at iteration 16 with logLik: -109.5888
converged at iteration 18 with logLik: -109.5888
converged at iteration 10 with logLik: -109.5888
converged at iteration 14 with logLik: -109.5888
converged at iteration 17 with logLik: -109.5888
converged at iteration 17 with logLik: -109.5888
converged at iteration 11 with logLik: -109.5888
converged at iteration 18 with logLik: -109.5888
converged at iteration 18 with logLik: -109.5888
converged at iteration 15 with logLik: -109.5888
converged at iteration 16 with logLik: -109.5888
converged at iteration 11 with logLik: -109.5888
converged at iteration 15 with logLik: -109.5888
converged at iteration 15 with logLik: -109.5888
converged at iteration 16 with logLik: -109.5888
converged at iteration 18 with logLik: -109.5888
converged at iteration 18 with logLik: -109.5888
converged at iteration 17 with logLik: -109.5888
converged at iteration 16 with logLik: -109.5888
converged at iteration 17 with logLik: -109.5888
converged at iteration 10 with logLik: -109.5888
converged at iteration 14 with logLik: -109.5888
converged at iteration 24 with logLik: -109.5888
converged at iteration 16 with logLik: -109.5888
converged at iteration 16 with logLik: -109.5888
converged at iteration 14 with logLik: -109.5888
converged at iteration 15 with logLik: -109.5888
converged at iteration 16 with logLik: -109.5888
converged at iteration 19 with logLik: -109.5888
converged at iteration 14 with logLik: -109.5888
converged at iteration 9 with logLik: -109.5888
converged at iteration 16 with logLik: -109.5888
converged at iteration 19 with logLik: -109.5888
converged at iteration 16 with logLik: -109.5888
converged at iteration 16 with logLik: -109.5888
converged at iteration 16 with logLik: -109.5888
converged at iteration 13 with logLik: -109.5888
converged at iteration 16 with logLik: -109.5888
Code
summary(best_model4)
Initial state probabilities model
pr1 pr2
0 1
Transition model for state (component) 1
Model of type multinomial (mlogit), formula: ~PDO
Coefficients:
St1 St2
(Intercept) 0 -1.29578981
PDO 0 -0.07649563
Probalities at zero values of the covariates.
0.7851256 0.2148744
Transition model for state (component) 2
Model of type multinomial (mlogit), formula: ~PDO
Coefficients:
St1 St2
(Intercept) 0 -1.1892398
PDO 0 0.4097314
Probalities at zero values of the covariates.
0.7666051 0.2333949
Response parameters
Resp 1 : gaussian
Resp 2 : gaussian
Re1.(Intercept) Re1.sd Re2.(Intercept) Re2.sd
St1 -1.438 1.019 -1.803 2.325
St2 3.650 2.414 4.168 2.628
This transition matrix isn’t interpretable as is, but we’ll come back to that later. One question is whether or not this is appropriate, to model PDO as an intercept and use it also as a covariate. Is this confounding? Poor practice?
Code
plot(ts(posterior(best_model4, type="smoothing")[,1], start=c(2003,5), deltat=1/12),ylab="probability",main="Posterior probability of state 1 (copepods good?).",frame=FALSE)
Warning in .local(object, ...): Argument 'type' not specified and will default
to 'viterbi'. This default may change in future releases of depmixS4. Please
see ?posterior for alternative options.
Code
mu<-summary(best_model4)[,1]
Initial state probabilities model
pr1 pr2
0 1
Transition model for state (component) 1
Model of type multinomial (mlogit), formula: ~PDO
Coefficients:
St1 St2
(Intercept) 0 -1.29578981
PDO 0 -0.07649563
Probalities at zero values of the covariates.
0.7851256 0.2148744
Transition model for state (component) 2
Model of type multinomial (mlogit), formula: ~PDO
Coefficients:
St1 St2
(Intercept) 0 -1.1892398
PDO 0 0.4097314
Probalities at zero values of the covariates.
0.7666051 0.2333949
Response parameters
Resp 1 : gaussian
Resp 2 : gaussian
Re1.(Intercept) Re1.sd Re2.(Intercept) Re2.sd
St1 -1.438 1.019 -1.803 2.325
St2 3.650 2.414 4.168 2.628
Code
mu4<-summary(best_model4)[,3]
Initial state probabilities model
pr1 pr2
0 1
Transition model for state (component) 1
Model of type multinomial (mlogit), formula: ~PDO
Coefficients:
St1 St2
(Intercept) 0 -1.29578981
PDO 0 -0.07649563
Probalities at zero values of the covariates.
0.7851256 0.2148744
Transition model for state (component) 2
Model of type multinomial (mlogit), formula: ~PDO
Coefficients:
St1 St2
(Intercept) 0 -1.1892398
PDO 0 0.4097314
Probalities at zero values of the covariates.
0.7666051 0.2333949
Response parameters
Resp 1 : gaussian
Resp 2 : gaussian
Re1.(Intercept) Re1.sd Re2.(Intercept) Re2.sd
St1 -1.438 1.019 -1.803 2.325
St2 3.650 2.414 4.168 2.628
Code
pred <-tibble("Year"= dat$years, "Copepod Richness"= mu[prstates],"PDO"= mu4[prstates])pred4<-pred %>%pivot_longer(-Year, names_to ="Covariates", values_to ="Fit")ggplot() +geom_line(data = pred4, aes(x = Year, y = Fit, group = Covariates, color = Covariates)) +geom_line(data = dat, aes(x = years, y = CopeRich, color ="CopeRich"), linewidth =1) +geom_line(data = dat, aes(x = years, y = PDO_DecMarch, color ="PDO"), linewidth =1) +scale_color_manual(values =c("orange", "lightgreen", "blue")) +# Assign distinct colors for each line in pred3labs(title =" Model 4 - Transition ~ PDO", x ="Year", y ="Value", color ="Variable") +theme_minimal()
Models 3 and 4 produce similar fits to the data, the only obvious difference is that model 4 spends less time in the higher state than model 3.
Model 5 Copepods informed by PDO
In this model, we wanted to relate copepod richness to PDO directly, with no transtion.
Code
#Copepods as they relate to PDOmod5 =depmix(list(CopeRich ~PDO),nstates =2, transition =~1, family =list(gaussian()),data=dat)fitmod5 =fit(mod5)
converged at iteration 43 with logLik: -41.20702
Code
#Find best fit iter <-100seeds<-sample(100:1000, size = iter, replace = F)best5 <-1e10best_model5 <-NAfor(i in1:iter){set.seed(seeds[i]) fitmod5 <-fit(mod5)if(AIC(fitmod5)< best5){ best_model5 <- fitmod5 best5 <-AIC(fitmod5) }}
converged at iteration 38 with logLik: -41.20702
converged at iteration 44 with logLik: -41.20702
converged at iteration 42 with logLik: -41.20702
converged at iteration 42 with logLik: -41.20702
converged at iteration 41 with logLik: -41.20702
converged at iteration 46 with logLik: -41.20702
converged at iteration 23 with logLik: -39.88307
converged at iteration 38 with logLik: -41.20702
converged at iteration 41 with logLik: -41.20702
converged at iteration 41 with logLik: -41.20702
converged at iteration 43 with logLik: -41.20702
converged at iteration 45 with logLik: -41.20702
converged at iteration 30 with logLik: -39.88307
converged at iteration 41 with logLik: -41.20702
converged at iteration 42 with logLik: -41.20702
converged at iteration 43 with logLik: -41.20702
converged at iteration 40 with logLik: -41.20702
converged at iteration 22 with logLik: -39.88307
converged at iteration 43 with logLik: -41.20702
converged at iteration 41 with logLik: -41.20702
converged at iteration 39 with logLik: -41.20702
converged at iteration 45 with logLik: -41.20702
converged at iteration 40 with logLik: -41.20702
converged at iteration 41 with logLik: -41.20702
converged at iteration 41 with logLik: -41.20702
converged at iteration 40 with logLik: -41.20702
converged at iteration 45 with logLik: -41.20702
converged at iteration 49 with logLik: -41.20702
converged at iteration 46 with logLik: -41.20702
converged at iteration 42 with logLik: -41.20702
converged at iteration 42 with logLik: -41.20702
converged at iteration 43 with logLik: -43.79142
converged at iteration 28 with logLik: -39.88307
converged at iteration 42 with logLik: -41.20702
converged at iteration 41 with logLik: -41.20702
converged at iteration 43 with logLik: -43.79142
converged at iteration 41 with logLik: -41.20702
converged at iteration 44 with logLik: -41.20702
converged at iteration 42 with logLik: -41.20702
converged at iteration 39 with logLik: -41.20702
converged at iteration 38 with logLik: -41.20702
converged at iteration 25 with logLik: -39.88307
converged at iteration 40 with logLik: -41.20702
converged at iteration 40 with logLik: -41.20702
converged at iteration 26 with logLik: -39.88307
converged at iteration 42 with logLik: -41.20702
converged at iteration 41 with logLik: -41.20702
converged at iteration 41 with logLik: -41.20702
converged at iteration 48 with logLik: -41.20702
converged at iteration 41 with logLik: -41.20702
converged at iteration 40 with logLik: -41.20702
converged at iteration 42 with logLik: -41.20702
converged at iteration 42 with logLik: -41.20702
converged at iteration 41 with logLik: -41.20702
converged at iteration 28 with logLik: -44.05687
converged at iteration 41 with logLik: -41.20702
converged at iteration 22 with logLik: -39.88307
converged at iteration 43 with logLik: -41.20702
converged at iteration 43 with logLik: -41.20702
converged at iteration 34 with logLik: -41.20702
converged at iteration 42 with logLik: -41.20702
converged at iteration 41 with logLik: -41.20702
converged at iteration 47 with logLik: -41.20702
converged at iteration 43 with logLik: -41.20702
converged at iteration 22 with logLik: -39.88307
converged at iteration 45 with logLik: -41.20702
converged at iteration 44 with logLik: -41.20702
converged at iteration 42 with logLik: -41.20702
converged at iteration 45 with logLik: -41.20702
converged at iteration 51 with logLik: -43.79142
converged at iteration 40 with logLik: -41.20702
converged at iteration 40 with logLik: -41.20702
converged at iteration 42 with logLik: -41.20702
converged at iteration 44 with logLik: -41.20702
converged at iteration 40 with logLik: -41.20702
converged at iteration 43 with logLik: -41.20702
converged at iteration 51 with logLik: -41.20702
converged at iteration 39 with logLik: -41.20702
converged at iteration 42 with logLik: -41.20702
converged at iteration 21 with logLik: -39.88307
converged at iteration 51 with logLik: -41.20702
converged at iteration 48 with logLik: -41.20702
converged at iteration 44 with logLik: -41.20702
converged at iteration 40 with logLik: -41.20702
converged at iteration 42 with logLik: -41.20702
converged at iteration 24 with logLik: -39.88307
converged at iteration 41 with logLik: -41.20702
converged at iteration 42 with logLik: -41.20702
converged at iteration 39 with logLik: -43.79142
converged at iteration 21 with logLik: -39.88307
converged at iteration 43 with logLik: -41.20702
converged at iteration 45 with logLik: -41.20702
converged at iteration 41 with logLik: -41.20702
converged at iteration 49 with logLik: -41.20702
converged at iteration 45 with logLik: -41.20702
converged at iteration 41 with logLik: -41.20702
converged at iteration 46 with logLik: -41.20702
converged at iteration 45 with logLik: -41.20702
converged at iteration 45 with logLik: -41.20702
converged at iteration 20 with logLik: -39.88307
This model has an interpreatable transition matrix, where the probability of staying or leaving each respective state is about ~70-30. Note that we now have PDO as informing our copepod richness in the response parameters.
Code
plot(ts(posterior(best_model5, type="smoothing")[,1], start=c(2003,5), deltat=1/12),ylab="probability",main="Posterior probability of state 1 (copepods good?).",frame=FALSE)
Warning in .local(object, ...): Argument 'type' not specified and will default
to 'viterbi'. This default may change in future releases of depmixS4. Please
see ?posterior for alternative options.
pred <-tibble("Year"= dat$years, "Copepod Richness"= mu[prstates])pred5<-pred %>%pivot_longer(-Year, names_to ="Copepod Richness", values_to ="Fit")ggplot() +geom_line(data = pred5, aes(x = years, y = Fit, color ="predicted_states"), size =1) +geom_line(data = dat, aes(x = years, y = CopeRich, color ="CopeRich"), size =1) +geom_line(data = dat, aes(x = years, y = PDO_DecMarch, color ="PDO" ), size =1)+scale_color_manual(values =c(PDO ="blue", SST ="red", CopeRich ="green",FishInx ="purple", predicted_states ="orange")) +labs(title =" Model 5 - Copepod Richness ~ PDO , Transition ~1", x ="Year", y ="Value", color ="Variable") +theme_minimal()
Copepod richness and PDO data have the closest correlation, and it seemed reasonable to relate them to each other and model the variation of Copepod Richness as a function of PDO. Admittedly, we are not sure if this is a appropriate entry to the depmix() function but the results for models 5 and 6 were interesting so we included them.
Model 6 Copepods informed by PDO with PDO informing transition
This model is the same as model 5, but using PDO as a covariate for transition, which could be inappropriate for hmm model specification. But, we wanted to explore it.
For the intercept we chose to model Copepod Richness ~ PDO. These data sets have the closest correlation and so this was done to test of the fit could be improved by modeling the variation of Copepod Richness as a function of PDO. Admittedly, we are not sure if this is a appropriate entry to the depmix() function but the results for models 5 and 6 were interesting so we included them.
Code
#Copepods as they relate to PDO where the transition is informed by PDOmod6 =depmix(list(CopeRich ~PDO),nstates =2, transition =~PDO, family =list(gaussian()),data=dat)fitmod6 =fit(mod6)
converged at iteration 26 with logLik: -36.69908
Code
#Find best fit iter <-100seeds<-sample(100:1000, size = iter, replace = F)best6 <-1e10best_model6 <-NAfor(i in1:iter){set.seed(seeds[i]) fitmod6 <-fit(mod6)if(AIC(fitmod6)< best6){ best_model6 <- fitmod6 best6 <-AIC(fitmod6) }}
converged at iteration 27 with logLik: -36.69921
converged at iteration 30 with logLik: -36.69921
converged at iteration 28 with logLik: -36.69923
converged at iteration 31 with logLik: -36.6992
converged at iteration 29 with logLik: -36.69921
converged at iteration 26 with logLik: -36.69944
converged at iteration 27 with logLik: -36.69907
converged at iteration 30 with logLik: -36.69923
converged at iteration 30 with logLik: -36.69923
converged at iteration 31 with logLik: -36.69907
converged at iteration 28 with logLik: -36.6992
converged at iteration 28 with logLik: -36.69907
converged at iteration 30 with logLik: -36.69921
converged at iteration 30 with logLik: -36.69907
converged at iteration 29 with logLik: -36.69943
converged at iteration 16 with logLik: -38.06323
converged at iteration 30 with logLik: -36.69908
converged at iteration 28 with logLik: -36.69922
converged at iteration 32 with logLik: -36.69923
converged at iteration 31 with logLik: -36.69908
converged at iteration 29 with logLik: -36.69922
converged at iteration 29 with logLik: -36.69908
converged at iteration 27 with logLik: -36.69918
converged at iteration 28 with logLik: -36.69922
converged at iteration 30 with logLik: -36.69922
converged at iteration 27 with logLik: -36.6992
converged at iteration 29 with logLik: -36.69921
converged at iteration 27 with logLik: -36.69922
converged at iteration 33 with logLik: -36.69921
converged at iteration 34 with logLik: -36.69907
converged at iteration 27 with logLik: -36.69907
converged at iteration 13 with logLik: -38.06323
converged at iteration 31 with logLik: -36.69921
converged at iteration 28 with logLik: -36.69923
converged at iteration 25 with logLik: -36.6992
converged at iteration 29 with logLik: -36.69907
converged at iteration 26 with logLik: -36.69922
converged at iteration 26 with logLik: -36.6992
converged at iteration 26 with logLik: -36.69912
converged at iteration 29 with logLik: -36.69923
converged at iteration 26 with logLik: -36.69944
converged at iteration 26 with logLik: -36.69922
converged at iteration 28 with logLik: -36.69908
converged at iteration 26 with logLik: -36.69922
converged at iteration 26 with logLik: -36.69912
converged at iteration 34 with logLik: -36.69909
converged at iteration 64 with logLik: -41.46754
converged at iteration 31 with logLik: -36.69907
converged at iteration 29 with logLik: -36.69919
converged at iteration 28 with logLik: -36.69921
converged at iteration 28 with logLik: -36.69907
converged at iteration 26 with logLik: -36.69908
converged at iteration 29 with logLik: -36.6992
converged at iteration 16 with logLik: -38.06323
converged at iteration 26 with logLik: -36.69921
converged at iteration 28 with logLik: -36.69921
converged at iteration 30 with logLik: -36.69907
converged at iteration 29 with logLik: -36.69907
converged at iteration 31 with logLik: -36.6992
converged at iteration 32 with logLik: -36.6992
converged at iteration 27 with logLik: -36.69921
converged at iteration 29 with logLik: -36.69908
converged at iteration 27 with logLik: -36.6992
converged at iteration 27 with logLik: -36.6992
converged at iteration 28 with logLik: -36.6992
converged at iteration 27 with logLik: -36.69907
converged at iteration 27 with logLik: -36.69908
converged at iteration 28 with logLik: -36.6992
converged at iteration 30 with logLik: -36.69907
converged at iteration 28 with logLik: -36.69908
converged at iteration 28 with logLik: -36.6992
converged at iteration 26 with logLik: -36.6992
converged at iteration 27 with logLik: -36.69921
converged at iteration 26 with logLik: -36.69923
converged at iteration 28 with logLik: -36.69921
converged at iteration 29 with logLik: -36.69923
converged at iteration 28 with logLik: -36.69919
converged at iteration 29 with logLik: -36.6992
converged at iteration 28 with logLik: -36.69907
converged at iteration 28 with logLik: -36.69944
converged at iteration 29 with logLik: -36.69907
converged at iteration 27 with logLik: -36.69922
converged at iteration 28 with logLik: -36.69921
converged at iteration 31 with logLik: -36.69922
converged at iteration 29 with logLik: -36.69907
converged at iteration 32 with logLik: -36.6992
converged at iteration 16 with logLik: -38.06323
converged at iteration 27 with logLik: -36.69907
converged at iteration 28 with logLik: -36.69921
converged at iteration 25 with logLik: -36.69919
converged at iteration 27 with logLik: -36.69913
converged at iteration 31 with logLik: -36.69922
converged at iteration 28 with logLik: -36.69922
converged at iteration 28 with logLik: -36.69921
converged at iteration 31 with logLik: -36.69912
converged at iteration 31 with logLik: -36.69921
converged at iteration 27 with logLik: -36.69922
converged at iteration 30 with logLik: -36.69923
converged at iteration 27 with logLik: -36.69944
converged at iteration 26 with logLik: -36.69921
Code
summary(best_model6)
Initial state probabilities model
pr1 pr2
0 1
Transition model for state (component) 1
Model of type multinomial (mlogit), formula: ~PDO
Coefficients:
St1 St2
(Intercept) 0 12.511030
PDO 0 -0.150226
Probalities at zero values of the covariates.
3.685761e-06 0.9999963
Transition model for state (component) 2
Model of type multinomial (mlogit), formula: ~PDO
Coefficients:
St1 St2
(Intercept) 0 6.846489
PDO 0 4.202840
Probalities at zero values of the covariates.
0.001062053 0.9989379
Response parameters
Resp 1 : gaussian
Re1.(Intercept) Re1.PDO Re1.sd
St1 -0.220 0.199 0.704
St2 0.136 0.892 1.151
Again, given that we included PDO as a transition covariate, the matrix was not interpretable as is in the model summary, but we address this in the More Model Diagnostics Section.
Code
plot(ts(posterior(best_model6, type="smoothing")[,1], start=c(2003,5), deltat=1/12),ylab="probability",main="Posterior probability of state 1 (copepods good?).",frame=FALSE)
Warning in .local(object, ...): Argument 'type' not specified and will default
to 'viterbi'. This default may change in future releases of depmixS4. Please
see ?posterior for alternative options.
Code
mu<-summary(best_model6)[,1]
Initial state probabilities model
pr1 pr2
0 1
Transition model for state (component) 1
Model of type multinomial (mlogit), formula: ~PDO
Coefficients:
St1 St2
(Intercept) 0 12.511030
PDO 0 -0.150226
Probalities at zero values of the covariates.
3.685761e-06 0.9999963
Transition model for state (component) 2
Model of type multinomial (mlogit), formula: ~PDO
Coefficients:
St1 St2
(Intercept) 0 6.846489
PDO 0 4.202840
Probalities at zero values of the covariates.
0.001062053 0.9989379
Response parameters
Resp 1 : gaussian
Re1.(Intercept) Re1.PDO Re1.sd
St1 -0.220 0.199 0.704
St2 0.136 0.892 1.151
Code
pred <-tibble("Year"= dat$years, "Copepod Richness"= mu[prstates])pred6<-pred %>%pivot_longer(-Year, names_to ="Copepod Richness", values_to ="Fit")ggplot() +geom_line(data = pred6, aes(x = years, y = Fit, color ="predicted_states"), size =1) +geom_line(data = dat, aes(x = years, y = CopeRich, color ="CopeRich"), size =1) +geom_line(data = dat, aes(x = years, y = PDO_DecMarch, color ="PDO" ), size =1)+scale_color_manual(values =c(PDO ="blue", SST ="red", CopeRich ="green",FishInx ="purple", predicted_states ="orange")) +labs(title =" Model 6 - Copepod Richness ~ PDO, Transition ~ PDO", x ="Year", y ="Value", color ="Variable") +theme_minimal()
Visually, model 6 produced the worst fit to the data. This model suggests there is a dominant state near the mean of the data and the second state is only marginally different. The dampening of the oscillation must be from the transition being informed directly by PDO.
Results
Summarize the model you’ve created. Specifically,
Does it converge?
All the models converged even when running the for loop over various seeds, however visual inspection of model fits revealed that several fits were quite poor, including our top model ranked by AICc.
How many states seem to be most supported?
Compared to model 1, model two with 3 states produced a poor visual fit to the data. Because of this we assumed 2 states in the other models we tested. This aligned with our visual estimates of two states.
Plot the time series of estimated states. What does this mean?
The plots of estimated states show the underlying pattern that may exist if there is in fact a shift between states. The variation in the data from this pattern would be related to unexplained error related to covariates other than the underlying states. Visually there appears to be good support for two underlying states, however, as shown below, the estimated states in our top AICc ranked model do not appear to fit the data well, indicating that AICc may not be the best metric to use here. It definitely highlights the dangers of blindly trusting AIC without further model checks.
AIC tells us that the best model would be model 6, although it is just marginally better than model 5 with a delta AIC of just over 2. The delta AIC values are only about 2 different between models 1-2 and 3-4 and 5-6 respectively. Each set of models are the most similar to each other within the models tested and these AIC values indicate little to no support for one approach within each group to be significantly better than the other. AIC values do indicate significantly better performance between these sets of models, however this support is not supported by looking at visual fits of the data. The two best models, 5 and 6, produce the worst visual fits to the data of any models tested. It is unclear why AIC is selecting these models as the best, but it does seem that AIC favors the simple models with only one intercept rather than models with two intercepts. Because of the apparent poor fit of models 5 and 6, it seems other methods of model selection such as Cross Validation or leave One Out may be more appropriate here than AIC.
More Model Diagnostics
The following function allows us to interpret the transition matrix when there is a covariate informing the transition.
Code
#create a funciton to help build transition matricesconvert_probs <-function(fit, coef_value =0) {# extract coefficients p1 <- fit@transition[[1]]@parameters$coefficients p2 <- fit@transition[[2]]@parameters$coefficients# assemble transition matrix in mlogit space m <-matrix(0, 2, 2) m[1,] <- p1[1,] + p1[2,] * coef_value #intercept + slope * coefficient m[2,] <- p2[1,] + p2[2,] * coef_value# exponentiate m <-exp(m)# normalize m[1,] <- m[1,] /sum(m[1,]) m[2,] <- m[2,] /sum(m[2,])return(m)}
Let’s look at all the models together. Note that the above function was applied to models 4 and 6, which did not have an interpretable transition matrix.
Code
#Look at the transition matrices and variancesummary(fitmod1)
Initial state probabilities model
pr1 pr2
1 0
Transition model for state (component) 1
Model of type multinomial (mlogit), formula: ~PDO
Coefficients:
St1 St2
(Intercept) 0 1.1898714
PDO 0 -0.4098372
Probalities at zero values of the covariates.
0.2332819 0.7667181
Transition model for state (component) 2
Model of type multinomial (mlogit), formula: ~PDO
Coefficients:
St1 St2
(Intercept) 0 1.29650280
PDO 0 0.07664089
Probalities at zero values of the covariates.
0.2147542 0.7852458
Response parameters
Resp 1 : gaussian
Resp 2 : gaussian
Re1.(Intercept) Re1.sd Re2.(Intercept) Re2.sd
St1 3.650 2.414 4.168 2.628
St2 -1.438 1.019 -1.803 2.325
Initial state probabilities model
pr1 pr2
0 1
Transition model for state (component) 1
Model of type multinomial (mlogit), formula: ~PDO
Coefficients:
St1 St2
(Intercept) 0 10.22711258
PDO 0 -0.03208095
Probalities at zero values of the covariates.
3.617477e-05 0.9999638
Transition model for state (component) 2
Model of type multinomial (mlogit), formula: ~PDO
Coefficients:
St1 St2
(Intercept) 0 6.846492
PDO 0 4.202830
Probalities at zero values of the covariates.
0.00106205 0.9989379
Response parameters
Resp 1 : gaussian
Re1.(Intercept) Re1.PDO Re1.sd
St1 -0.220 0.199 0.704
St2 0.136 0.892 1.151
Code
#get the transition matricesconvert_probs(fitmod4)
State 3 in model 2 appears to be fairly unstable as its state 3 to state 3 transition probability is 0, and it has a fairly low probability of other states transitioning to state 3, outside of state 2 to state 3. Our best model run did not generate any observations in state 3 so it is unclear if this is a middle state as we hypothesized or not.
The state 1 transition probabilities in models 3 and 4 showed an opposite pattern, yet produced very similar fits to the data. The difference appeared to only be shown in the most recent years, where in model 4 there was a more rapid shift out of the higher state. This aligns with what we would expect with the change in transition probabilities from state 1 to state 2.
Model 5 appeared to have smaller intercept values which makes sense as the predicted states were closer to 0 than in the other models.
The transition matrix for model 6 was the most unique with a high likelihood that the model will transition to and stay in state 2. This is another good indication that the model is a poor fit to the data as the data visually appears to have at least two states, and the other models produced good visual fits with a two state approach.
Discussion
PDO is an index of ocean conditions that is generally thought to act as a driver for abundance and richness of organisms in the North Pacific. Here we predicted that PDO would be a driver of copepod richness. The two time series of data appear to closely co-vary and seem to show a pattern of a high and low state with rapid swings between the two. Our models using 2 states produced good visual fits and we found little support for three states. AIC appears to be a relatively poor choice for model selection in this instance although it is unclear if that is specific to HMM models or if there was a programming error. Other types of model selection would likely increase our understanding of model performance. We saw no obvious difference between a rapid shift in states , transition = ~1 and a shift proportional to the PDO data set. This suggests that there is in fact a very rapid shift between states, which aligns with our expected result based on the general lack of observations near neutral conditions.
What we’re still confused about
One question we had is that our ‘best’ model based on AIC values did not appear to be a good fit to the data visually. When this is the case, what should our next steps be? Visually confirming model fit seems to overrule AIC values when those diagnostics are contradictory, is this true?
We didn’t have a good handle on how the ‘nstates’ and ‘transitions’ inputs to the models interacted and what type of model diagnostics would help us quantify underlying relationships. After we were in pretty deep we realized having PDO in both the model list (related to copepod richness or as an independent intercept) and as a covarate for transition may be inappropriate, but are unsure and would like more information.
PDO has a very strong correlative pattern with copepod richness. We ran several combinations of model to examine the strength of this relationship. We produced some models with decent looking fits, but these models were not as highly ranked by AIC as other, seemingly less well fitting models. Is AIC a poor choice? We have a suspicion this question will be answered in the next lecture(s)…
Team contributions
Madison took point on coding, Nick helped with coding and compiled it into the final rmd, and Nick and Karl did writing and Madison proofread and organized the .Rmd document.
---title: "Team 3 - Lab 4"subtitle: "Hidden Markov Models"author: "Nick Chambers, Madison Shipley, Karl Veggerby"date: May 9, 2023output: html_document: code-folding: true toc: true toc_float: true---```{r setup, include = FALSE}options(dplyr.summarise.inform = FALSE)library(depmixS4)library(tidyverse)library(kableExtra)library(hmmTMB)library(depmixS4)```# Data set In this lab, we're looking at ecological indicators and exploring hidden Markov models and probabilities of shifting between underlying states. ## Load the dataWe explored the `stoplight.csv` dataset for this week. One of the columns divides indicators into groups (e.g. Climate, Local Physical, Local Biological, etc) and we wanted to look at some biological indicators and climate/physical indicators. ```{r load_data}stoplight <- read.csv(here::here("Lab-4", "stoplight.csv"))head(stoplight)```## Wrangle the dataWe initially selected the Pacific Decadal Osculation (PDO) for the Dec/March period, sea surface temperature (SST), and copepod richness and ichthy communitiy index, which we assumed was comperable to copepod richness (though maybe not).```{r}#ecosystem indicatorsstoplight[,1] years<-seq(1998,2021,1)#ecosystem indicators to pullPDO_DecMarch<-unlist(as.vector(stoplight[1,3:26])) SST<-unlist(as.vector(stoplight[4,3:26])) Copepod_richness<-unlist(as.vector(stoplight[9,3:26])) Ichthy_community_index<-unlist(as.vector(stoplight[14,3:26]))dat<-matrix(nrow=length(years), ncol=5)dat[,1]<-yearsdat[,2]<-PDO_DecMarchdat[,3]<-SSTdat[,4]<-Copepod_richnessdat[,5]<-Ichthy_community_indexcolnames(dat)<-c("years","PDO", "SST", "CopeRich", "FishInx")rownames(dat)<-seq(1,24,1)dat<-as.data.frame(dat)head(dat)```## Data VisualizationThe first rule of science is to plot your data! We started with looking at all our data together. ```{r}# Plotting allggplot(dat, aes(x = years)) +geom_line(aes(y = PDO, color ="PDO"), linewidth =1) +geom_line(aes(y = SST, color ="SST"), linewidth =1) +geom_line(aes(y = CopeRich, color ="CopeRich"), linewidth =1) +geom_line(aes(y = FishInx, color ="FishInx"), linewidth =1) +scale_color_manual(values =c(PDO ="blue", SST ="red", CopeRich ="green", FishInx ="purple")) +labs(x ="Year", y ="Value", color ="Variable") +theme_minimal()```Sea surface temperature stats pretty consistent, while the fish index had a significant range, possibly following PDO if you squint, but was not immediatly compelling. We narrowed it down and decided to focus on copepod richness and PDO, which seemed to be incredibly correlated, and plotted the time series and their relationship to each other. In a positive PDO index, there is a positive correlation with copepod index. ```{r}# Plotting Coperich and env covggplot(dat, aes(x = years)) +geom_line(aes(y = PDO, color ="PDO"), size =1) +geom_line(aes(y = CopeRich, color ="CopeRich"), size =1) +scale_color_manual(values =c(PDO ="blue", SST ="red", CopeRich ="green", FishInx ="purple")) +labs(x ="Year", y ="Value", color ="Variable") +theme_minimal()#CopeRichness tracks very closely with PDO#SST seems to have a negligible correlation ggplot(dat, aes(x = PDO, y = CopeRich, color = CopeRich <0)) +geom_point() +geom_line(aes(x = PDO, y = CopeRich)) +geom_hline(yintercept =0, linetype ="dotted") +geom_vline(xintercept =0, linetype ="dotted") +scale_color_manual(values =c("red", "blue"), labels =c("Above 0", "Below 0")) +labs(x ="PDO", y ="CopeRich") +theme_minimal()#fish index is all over the place, let's not use it. ```We also wanted to think about our possible hidden states, and while there seems to be a clear two-state sytem with PDO in a negative or positive index, we wondered if there was a third state "weak" index. ```{r}# PDO states ggplot(dat, aes(x = years)) +geom_line(aes(y = PDO, color ="PDO"), size =1) +scale_color_manual(values =c(PDO ="blue", SST ="red", CopeRich ="green", FishInx ="purple")) +labs(x ="Year", y ="Value", color ="Variable") +geom_hline(yintercept =0, linetype ="dotted") +geom_hline(yintercept =2.5, linetype ="dotted") +geom_hline(yintercept =-2.5, linetype ="dotted") +theme_minimal()```### Objective tips Our primary objectives for this lab were to gain a better understanding of hmm options and getting some practice with interpretation by exploring different model options considering copepod richness and PDO. While there was much more we could have done, we decided to explore model specification, and interpretation of AIC for these types of models, which can be misleading depending on the ecological assumptions being made. # General tasks* Please pick a type of indicators, and develop a 2- or 3-state multivariate HMM. # MethodsWe fit models with different either two or three states to determine whether copepod richness and/or PDO seems to have 2 or 3 states. It's unclear if there is a 'neutral' state that modulates copepod richness in addition to the hot and cool phases. We also explored the transition option, by assuming either that there is no covariate informing the transition, or that PDO informed the transition.We assumed the familys were gaussian for all models tested. We plotted the fit of all models to copepod richness and PDO data from the stoplight dataset. We then also compared model fits with AIC in an attempt to select the best model. Note that in the report we give probabilities of transitioning to and from states, but these models are very sensitive to starting conditions, so we iterate over many set seeds for each model to find the best fit. That being said, some of the values written in this report may be different upon knitting. ## Model 1 Copepods only: 2 states We started with a very simple model, with copepod richness as the intercept and no covariates informing the transition. We started by assuming a two state system. ```{r}# Model 1mod1 =depmix(CopeRich ~1,nstates =2, transition =~1, family =gaussian(),data=dat)fitmod1 =fit(mod1)#Find best fit iter <-100seeds<-sample(100:1000, size = iter, replace = F)best1 <-1e10best_model1 <-NAfor(i in1:iter){set.seed(seeds[i]) fitmod1 <-fit(mod1)if(AIC(fitmod1)< best1){ best_model1 <- fitmod1 best1 <-AIC(fitmod1) }}summary(best_model1)```We see that there is a ~75% probability of state one staying in state one, and ~60% of state two staying in state two, with ~25% and 40% probability for transitioning from state 1 to state 2 and state 2 to state 1 respectively, showing that the system generally is more stable in state one.```{r}plot(ts(posterior(best_model1, type="smoothing")[,1], start=c(2003,5), deltat=1/12),ylab="probability",main="Posterior Probability of Changing States",frame=FALSE)prstates <-apply(posterior(best_model1)[,c("S1", "S2")],1, which.max)mu <-summary(best_model1)[,1]pred <-tibble("Year"= dat$years, "Copepod Richness"= mu[prstates])#predicted valuespred1<-pred %>%pivot_longer(-Year, names_to ="Copepod Richness", values_to ="Fit")ggplot() +geom_line(data = pred1, aes(x = years, y = Fit, color ="predicted_states"), size =1) +geom_line(data = dat, aes(x = years, y = CopeRich, color ="CopeRich"), size =1) +geom_line(data = dat, aes(x = years, y = PDO_DecMarch, color ="PDO" ), size =1)+scale_color_manual(values =c(PDO ="blue", SST ="red", CopeRich ="green",FishInx ="purple", predicted_states ="orange")) +labs(title =" Model 1 - Two States", x ="Year", y ="Value", color ="Variable") +theme_minimal()```Model 1 produces a reasonably good fit to the data. ## Model 2 Copepods only: 3 states We continued by using the same simple model from above, and assuming 3 states. ```{r}# Model 2mod2 =depmix(CopeRich ~1,nstates =3, transition =~1, family =gaussian(),data=dat)fitmod2 =fit(mod2)#Find best fit iter <-100seeds<-sample(100:1000, size = iter, replace = F)best2 <-1e10best_model2 <-NAfor(i in1:iter){set.seed(seeds[i]) fitmod2 <-fit(mod2)if(AIC(fitmod2)< best2){ best_model2 <- fitmod2 best2 <-AIC(fitmod2) }}summary(best_model2)```This transition matrix is a little trickier to interpret, but the likelihood of transitioning to state three was low, ~7% for state one and 0% of transitioning from 2 to 3. This may not be a great specification. ```{r}# plot(ts(posterior(best_model1, type="smoothing")[,1], start=c(2003,5), deltat=1/12),ylab="probability",# main="Posterior probability of state 1 (copepods good?).",# frame=FALSE)prstates <-apply(posterior(best_model2)[,c("S1", "S2")],1, which.max)mu <-summary(best_model2)[,1]pred <-tibble("Year"= dat$years, "Copepod Richness"= mu[prstates])#predicted valuespred2 <- pred %>%pivot_longer(-Year, names_to ="Copepod Richness", values_to ="Fit")ggplot() +geom_line(data = pred2, aes(x = years, y = Fit, color ="predicted_states"), size =1) +geom_line(data = dat, aes(x = years, y = CopeRich, color ="CopeRich"), size =1) +geom_line(data = dat, aes(x = years, y = PDO_DecMarch, color ="PDO" ), size =1)+scale_color_manual(values =c(PDO ="blue", SST ="red", CopeRich ="green",FishInx ="purple", predicted_states ="orange")) +labs(title =" Model 2 - Three States", x ="Year", y ="Value", color ="Variable") +theme_minimal()```Compared to model 1, model 2 produces a poor visual fit to the data, with a narrower peak later in the timeseries. Because of this we assumed 2 states in the other models we tested. ## Model 3 Copepods and PDO Because Copepods and PDO have such a clear relationship visually, we started by looking at a model considering both copepod richness and PDO as intercepts, with no covariates informing the transition.```{r}#Copepods and PDO as intercepts mod3 =depmix(list(CopeRich ~1, PDO~1),nstates =2, transition =~1, family =list(gaussian(),gaussian()),data=dat)fitmod3 =fit(mod3)#Find best fit iter <-100seeds<-sample(100:1000, size = iter, replace = F)best3 <-1e10best_model3 <-NAfor(i in1:iter){set.seed(seeds[i]) fitmod3 <-fit(mod3)if(AIC(fitmod3)< best3){ best_model3 <- fitmod3 best3 <-AIC(fitmod3) }}summary(best_model3)```This model still preferred state one, with a ~75% probability of staying in one and ~25% chance of transitioning to two. There was a ~61% change of state two to stay in state two and a ~39% chance to transition back to state one. Note there are values for the intercepts and sd for Re1 (coperich) and Re2 (PDO) in this model. ```{r}plot(ts(posterior(best_model3, type="smoothing")[,1], start=c(2003,5), deltat=1/12),ylab="probability",main="Posterior probability of state 1 (copepods good?).",frame=FALSE)prstates<-apply(posterior(best_model3)[,c("S1", "S2")],1, which.max)mu<-summary(best_model3)[,1]mu3<-summary(best_model3)[,3]pred <-tibble("Year"= dat$years, "Copepod Richness"= mu[prstates],"PDO"= mu3[prstates])pred3<-pred %>%pivot_longer(-Year, names_to ="Covariates", values_to ="Fit")ggplot() +geom_line(data = pred3, aes(x = Year, y = Fit, group = Covariates, color = Covariates)) +geom_line(data = dat, aes(x = years, y = CopeRich, color ="CopeRich"), linewidth =1) +geom_line(data = dat, aes(x = years, y = PDO_DecMarch, color ="PDO"), linewidth =1) +scale_color_manual(values =c("orange", "lightgreen", "blue")) +# Assign distinct colors for each line in pred3labs(title ="Model 3 - Transition ~ 1", x ="Year", y ="Value", color ="Variable") +theme_minimal()```## Model 4 Copepods and PDO with transition informed by PDO Next we wanted to look at copepod richness and PDO with the transition informed by the PDO.```{r}#Copepods and PDO as intercepts where the transition is informed by PDOmod4 =depmix(list(CopeRich ~1, PDO~1),nstates =2, transition =~PDO, family =list(gaussian(),gaussian()),data=dat)fitmod4 =fit(mod4)#Find best fit iter <-100seeds<-sample(100:1000, size = iter, replace = F)best4 <-1e10best_model4 <-NAfor(i in1:iter){set.seed(seeds[i]) fitmod4 <-fit(mod4)if(AIC(fitmod4)< best4){ best_model4 <- fitmod4 best4 <-AIC(fitmod4) }}summary(best_model4)```This transition matrix isn't interpretable as is, but we'll come back to that later. One question is whether or not this is appropriate, to model PDO as an intercept and use it also as a covariate. Is this confounding? Poor practice? ```{r}plot(ts(posterior(best_model4, type="smoothing")[,1], start=c(2003,5), deltat=1/12),ylab="probability",main="Posterior probability of state 1 (copepods good?).",frame=FALSE)prstates<-apply(posterior(best_model4)[,c("S1", "S2")],1, which.max)mu<-summary(best_model4)[,1]mu4<-summary(best_model4)[,3]pred <-tibble("Year"= dat$years, "Copepod Richness"= mu[prstates],"PDO"= mu4[prstates])pred4<-pred %>%pivot_longer(-Year, names_to ="Covariates", values_to ="Fit")ggplot() +geom_line(data = pred4, aes(x = Year, y = Fit, group = Covariates, color = Covariates)) +geom_line(data = dat, aes(x = years, y = CopeRich, color ="CopeRich"), linewidth =1) +geom_line(data = dat, aes(x = years, y = PDO_DecMarch, color ="PDO"), linewidth =1) +scale_color_manual(values =c("orange", "lightgreen", "blue")) +# Assign distinct colors for each line in pred3labs(title =" Model 4 - Transition ~ PDO", x ="Year", y ="Value", color ="Variable") +theme_minimal()```Models 3 and 4 produce similar fits to the data, the only obvious difference is that model 4 spends less time in the higher state than model 3. ## Model 5 Copepods informed by PDO In this model, we wanted to relate copepod richness to PDO directly, with no transtion. ```{r}#Copepods as they relate to PDOmod5 =depmix(list(CopeRich ~PDO),nstates =2, transition =~1, family =list(gaussian()),data=dat)fitmod5 =fit(mod5)#Find best fit iter <-100seeds<-sample(100:1000, size = iter, replace = F)best5 <-1e10best_model5 <-NAfor(i in1:iter){set.seed(seeds[i]) fitmod5 <-fit(mod5)if(AIC(fitmod5)< best5){ best_model5 <- fitmod5 best5 <-AIC(fitmod5) }}summary(best_model5)```This model has an interpreatable transition matrix, where the probability of staying or leaving each respective state is about ~70-30. Note that we now have PDO as informing our copepod richness in the response parameters. ```{r}plot(ts(posterior(best_model5, type="smoothing")[,1], start=c(2003,5), deltat=1/12),ylab="probability",main="Posterior probability of state 1 (copepods good?).",frame=FALSE)prstates<-apply(posterior(best_model5)[,c("S1", "S2")],1, which.max)mu<-summary(best_model5)[,1]pred <-tibble("Year"= dat$years, "Copepod Richness"= mu[prstates])pred5<-pred %>%pivot_longer(-Year, names_to ="Copepod Richness", values_to ="Fit")ggplot() +geom_line(data = pred5, aes(x = years, y = Fit, color ="predicted_states"), size =1) +geom_line(data = dat, aes(x = years, y = CopeRich, color ="CopeRich"), size =1) +geom_line(data = dat, aes(x = years, y = PDO_DecMarch, color ="PDO" ), size =1)+scale_color_manual(values =c(PDO ="blue", SST ="red", CopeRich ="green",FishInx ="purple", predicted_states ="orange")) +labs(title =" Model 5 - Copepod Richness ~ PDO , Transition ~1", x ="Year", y ="Value", color ="Variable") +theme_minimal()```Copepod richness and PDO data have the closest correlation, and it seemed reasonable to relate them to each other and model the variation of Copepod Richness as a function of PDO. Admittedly, we are not sure if this is a appropriate entry to the `depmix()` function but the results for models 5 and 6 were interesting so we included them. ## Model 6 Copepods informed by PDO with PDO informing transitionThis model is the same as model 5, but using PDO as a covariate for transition, which could be inappropriate for hmm model specification. But, we wanted to explore it. For the intercept we chose to model Copepod Richness ~ PDO. These data sets have the closest correlation and so this was done to test of the fit could be improved by modeling the variation of Copepod Richness as a function of PDO. Admittedly, we are not sure if this is a appropriate entry to the `depmix()` function but the results for models 5 and 6 were interesting so we included them. ```{r}#Copepods as they relate to PDO where the transition is informed by PDOmod6 =depmix(list(CopeRich ~PDO),nstates =2, transition =~PDO, family =list(gaussian()),data=dat)fitmod6 =fit(mod6)#Find best fit iter <-100seeds<-sample(100:1000, size = iter, replace = F)best6 <-1e10best_model6 <-NAfor(i in1:iter){set.seed(seeds[i]) fitmod6 <-fit(mod6)if(AIC(fitmod6)< best6){ best_model6 <- fitmod6 best6 <-AIC(fitmod6) }}summary(best_model6)```Again, given that we included PDO as a transition covariate, the matrix was not interpretable as is in the model summary, but we address this in the More Model Diagnostics Section.```{r}plot(ts(posterior(best_model6, type="smoothing")[,1], start=c(2003,5), deltat=1/12),ylab="probability",main="Posterior probability of state 1 (copepods good?).",frame=FALSE)prstates<-apply(posterior(best_model6)[,c("S1", "S2")],1, which.max)mu<-summary(best_model6)[,1]pred <-tibble("Year"= dat$years, "Copepod Richness"= mu[prstates])pred6<-pred %>%pivot_longer(-Year, names_to ="Copepod Richness", values_to ="Fit")ggplot() +geom_line(data = pred6, aes(x = years, y = Fit, color ="predicted_states"), size =1) +geom_line(data = dat, aes(x = years, y = CopeRich, color ="CopeRich"), size =1) +geom_line(data = dat, aes(x = years, y = PDO_DecMarch, color ="PDO" ), size =1)+scale_color_manual(values =c(PDO ="blue", SST ="red", CopeRich ="green",FishInx ="purple", predicted_states ="orange")) +labs(title =" Model 6 - Copepod Richness ~ PDO, Transition ~ PDO", x ="Year", y ="Value", color ="Variable") +theme_minimal()```Visually, model 6 produced the worst fit to the data. This model suggests there is a dominant state near the mean of the data and the second state is only marginally different. The dampening of the oscillation must be from the transition being informed directly by PDO. # ResultsSummarize the model you've created. Specifically,- Does it converge?All the models converged even when running the for loop over various seeds, however visual inspection of model fits revealed that several fits were quite poor, including our top model ranked by AICc.- How many states seem to be most supported?Compared to model 1, model two with 3 states produced a poor visual fit to the data. Because of this we assumed 2 states in the other models we tested. This aligned with our visual estimates of two states.- Plot the time series of estimated states. What does this mean?The plots of estimated states show the underlying pattern that may exist if there is in fact a shift between states. The variation in the data from this pattern would be related to unexplained error related to covariates other than the underlying states. Visually there appears to be good support for two underlying states, however, as shown below, the estimated states in our top AICc ranked model do not appear to fit the data well, indicating that AICc may not be the best metric to use here. It definitely highlights the dangers of blindly trusting AIC without further model checks.## Model Selection with AIC```{r}mods <-c("1","2","3","4","5","6")aic <-c(AIC(best_model1), AIC(best_model2), AIC(best_model3), AIC(best_model4), AIC(best_model5),AIC(best_model6))delta.aic <- aic-min(aic)tab <-cbind.data.frame(mods, aic, delta.aic)tab```AIC tells us that the best model would be model 6, although it is just marginally better than model 5 with a delta AIC of just over 2. The delta AIC values are only about 2 different between models 1-2 and 3-4 and 5-6 respectively. Each set of models are the most similar to each other within the models tested and these AIC values indicate little to no support for one approach within each group to be significantly better than the other. AIC values do indicate significantly better performance between these sets of models, however this support is not supported by looking at visual fits of the data. The two best models, 5 and 6, produce the worst visual fits to the data of any models tested. It is unclear why AIC is selecting these models as the best, but it does seem that AIC favors the simple models with only one intercept rather than models with two intercepts. Because of the apparent poor fit of models 5 and 6, it seems other methods of model selection such as Cross Validation or leave One Out may be more appropriate here than AIC. ## More Model DiagnosticsThe following function allows us to interpret the transition matrix when there is a covariate informing the transition. ```{r}#create a funciton to help build transition matricesconvert_probs <-function(fit, coef_value =0) {# extract coefficients p1 <- fit@transition[[1]]@parameters$coefficients p2 <- fit@transition[[2]]@parameters$coefficients# assemble transition matrix in mlogit space m <-matrix(0, 2, 2) m[1,] <- p1[1,] + p1[2,] * coef_value #intercept + slope * coefficient m[2,] <- p2[1,] + p2[2,] * coef_value# exponentiate m <-exp(m)# normalize m[1,] <- m[1,] /sum(m[1,]) m[2,] <- m[2,] /sum(m[2,])return(m)}```Let's look at all the models together. Note that the above function was applied to models 4 and 6, which did not have an interpretable transition matrix. ```{r}#Look at the transition matrices and variancesummary(fitmod1)summary(fitmod2)summary(fitmod3)summary(fitmod4) # no transition matrixsummary(fitmod5)summary(fitmod6) # no transition matrix#get the transition matricesconvert_probs(fitmod4)convert_probs(fitmod6)```State 3 in model 2 appears to be fairly unstable as its state 3 to state 3 transition probability is 0, and it has a fairly low probability of other states transitioning to state 3, outside of state 2 to state 3. Our best model run did not generate any observations in state 3 so it is unclear if this is a middle state as we hypothesized or not. The state 1 transition probabilities in models 3 and 4 showed an opposite pattern, yet produced very similar fits to the data. The difference appeared to only be shown in the most recent years, where in model 4 there was a more rapid shift out of the higher state. This aligns with what we would expect with the change in transition probabilities from state 1 to state 2. Model 5 appeared to have smaller intercept values which makes sense as the predicted states were closer to 0 than in the other models. The transition matrix for model 6 was the most unique with a high likelihood that the model will transition to and stay in state 2. This is another good indication that the model is a poor fit to the data as the data visually appears to have at least two states, and the other models produced good visual fits with a two state approach.# DiscussionPDO is an index of ocean conditions that is generally thought to act as a driver for abundance and richness of organisms in the North Pacific. Here we predicted that PDO would be a driver of copepod richness. The two time series of data appear to closely co-vary and seem to show a pattern of a high and low state with rapid swings between the two. Our models using 2 states produced good visual fits and we found little support for three states. AIC appears to be a relatively poor choice for model selection in this instance although it is unclear if that is specific to HMM models or if there was a programming error. Other types of model selection would likely increase our understanding of model performance. We saw no obvious difference between a rapid shift in states , `transition = ~1` and a shift proportional to the PDO data set. This suggests that there is in fact a very rapid shift between states, which aligns with our expected result based on the general lack of observations near neutral conditions. ## What we're still confused aboutOne question we had is that our 'best' model based on AIC values did not appear to be a good fit to the data visually. When this is the case, what should our next steps be? Visually confirming model fit seems to overrule AIC values when those diagnostics are contradictory, is this true?We didn't have a good handle on how the 'nstates' and 'transitions' inputs to the models interacted and what type of model diagnostics would help us quantify underlying relationships. After we were in pretty deep we realized having PDO in both the model list (related to copepod richness or as an independent intercept) and as a covarate for transition may be inappropriate, but are unsure and would like more information. PDO has a very strong correlative pattern with copepod richness. We ran several combinations of model to examine the strength of this relationship. We produced some models with decent looking fits, but these models were not as highly ranked by AIC as other, seemingly less well fitting models. Is AIC a poor choice? We have a suspicion this question will be answered in the next lecture(s)...# Team contributionsMadison took point on coding, Nick helped with coding and compiled it into the final rmd, and Nick and Karl did writing and Madison proofread and organized the .Rmd document.