13.9 NEON EFI Aquatics Challenge
Before modeling temperature and oxygen jointly, we’ll start working with just the oxygen data alone.
We’ll just with just a AR(p) state space model. The Lake Barco data has associated standard errors with the oxygen data, so we’ll use that as a known observation error (rather than estimating that variance parameter, in our previous work).
A script that describes the model is called model_01.stan
. You can download this file here.
This should be familiar, following from the atsar
code with a couple modifications. First, Stan doesn’t like NAs being passed in as data, and we’ve had to do some indexing to avoid that. Second, notice that the model is flexible and can be used to fit models with any numbers of lags.
To get the data prepped for Stan, we need to do a few things. First, specify the lag and forecast horizon
<- neon_barc
data $indx <- seq(1, nrow(data))
data<- 7
n_forecast <- 1 n_lag
Next, we’ll drop observations with missing oxygen. If you wanted to do some validation, we could also split the data into a training and test set.
# As a first model, we'll just work with modeling oxygen
<- dplyr::filter(data, !is.na(oxygen))
o2_dat
# split the test and training data
<- max(data$indx) - n_forecast
last_obs <- dplyr::filter(o2_dat, indx <= last_obs)
o2_train <- dplyr::filter(data, indx > last_obs)
test
<- o2_train$indx
o2_x <- o2_train$oxygen
o2_y <- o2_train$oxygen_sd
o2_sd <- nrow(o2_train) n_o2
Remember that the Stan data needs to be in a list,
<- list(n = last_obs, n_o2 = n_o2, n_lag = n_lag, n_forecast = n_forecast,
stan_data o2_x = o2_x, o2_y = o2_y, o2_sd = o2_sd)
Finally we can compile the model. If we wanted to do fully Bayesian estimates, we could do that with
<- stan(file = "model_01.stan", data = stan_data) fit
Try fitting the Bayesian model with a short chain length (maybe 1000 iterations) and 1 MCMC chain.
But because we’re interested in doing this quickly, and running the model a bunch of times, we’ll try Stan’s optimizing function for MAP estimation.
<- stan_model(file = "model_01.stan")
m <- rstan::optimizing(m, data = stan_data, hessian = TRUE) o2_model
Let’s extract predictions from the fitted object,
$pred <- o2_model$par[grep("pred", names(o2_model$par))]
dataggplot(data, aes(date, pred)) + geom_line() + geom_point(aes(date,
col = "red", alpha = 0.5) oxygen),
Question how do we evaluate predictions? We’ll talk about that more next week, but for today we can think about it as RMSE(observations,predictions)
Question: modify the code above (just on the R side, not Stan) to fit a lag-2 or lag-3 model. Are the predictions similar?