13.9 NEON EFI Aquatics Challenge

Before modeling temperature and oxygen jointly, we’ll start working with just the oxygen data alone.

We’ll just with just a AR(p) state space model. The Lake Barco data has associated standard errors with the oxygen data, so we’ll use that as a known observation error (rather than estimating that variance parameter, in our previous work).

A script that describes the model is called model_01.stan. You can download this file here. This should be familiar, following from the atsar code with a couple modifications. First, Stan doesn’t like NAs being passed in as data, and we’ve had to do some indexing to avoid that. Second, notice that the model is flexible and can be used to fit models with any numbers of lags.

To get the data prepped for Stan, we need to do a few things. First, specify the lag and forecast horizon

data <- neon_barc
data$indx <- seq(1, nrow(data))
n_forecast <- 7
n_lag <- 1

Next, we’ll drop observations with missing oxygen. If you wanted to do some validation, we could also split the data into a training and test set.

# As a first model, we'll just work with modeling oxygen
o2_dat <- dplyr::filter(data, !is.na(oxygen))

# split the test and training data
last_obs <- max(data$indx) - n_forecast
o2_train <- dplyr::filter(o2_dat, indx <= last_obs)
test <- dplyr::filter(data, indx > last_obs)

o2_x <- o2_train$indx
o2_y <- o2_train$oxygen
o2_sd <- o2_train$oxygen_sd
n_o2 <- nrow(o2_train)

Remember that the Stan data needs to be in a list,

stan_data <- list(n = last_obs, n_o2 = n_o2, n_lag = n_lag, n_forecast = n_forecast, 
    o2_x = o2_x, o2_y = o2_y, o2_sd = o2_sd)

Finally we can compile the model. If we wanted to do fully Bayesian estimates, we could do that with

fit <- stan(file = "model_01.stan", data = stan_data)

Try fitting the Bayesian model with a short chain length (maybe 1000 iterations) and 1 MCMC chain.

But because we’re interested in doing this quickly, and running the model a bunch of times, we’ll try Stan’s optimizing function for MAP estimation.

m <- stan_model(file = "model_01.stan")
o2_model <- rstan::optimizing(m, data = stan_data, hessian = TRUE)

Let’s extract predictions from the fitted object,

data$pred <- o2_model$par[grep("pred", names(o2_model$par))]
ggplot(data, aes(date, pred)) + geom_line() + geom_point(aes(date, 
    oxygen), col = "red", alpha = 0.5)

Question how do we evaluate predictions? We’ll talk about that more next week, but for today we can think about it as RMSE(observations,predictions)

Question: modify the code above (just on the R side, not Stan) to fit a lag-2 or lag-3 model. Are the predictions similar?