13.11 NEON EFI Aquatics Challenge
So far, these models have only included oxygen. We can easily bring in the temperature data, which has it’s own standard error associated with it. We can set up a null model modeling temperature and oxygen independently, with different lags allowed
First we have to modify our data function to also generate temperature data.
create_stan_data <- function(data, last_obs, n_forecast, n_lag_o2,
n_lag_temp) {
# create test data
o2_test <- dplyr::filter(data, indx %in% seq(last_obs + 1,
(last_obs + n_forecast)))
temp_test <- dplyr::filter(data, indx %in% seq(last_obs +
1, (last_obs + n_forecast)))
o2_train <- dplyr::filter(data, indx <= last_obs, !is.na(oxygen))
o2_x <- o2_train$indx
o2_y <- o2_train$oxygen
o2_sd <- o2_train$oxygen_sd
n_o2 <- nrow(o2_train)
temp_train <- dplyr::filter(data, indx <= last_obs, !is.na(temperature))
temp_x <- temp_train$indx
temp_y <- temp_train$temperature
temp_sd <- temp_train$temperature_sd
n_temp <- nrow(temp_train)
stan_data <- list(n = last_obs, n_lag_o2 = n_lag_o2, n_lag_temp = n_lag_temp,
n_forecast = n_forecast, n_o2 = n_o2, o2_x = o2_x, o2_y = o2_y,
o2_sd = o2_sd, n_temp = n_temp, temp_x = temp_x, temp_y = temp_y,
temp_sd = temp_sd)
return(list(o2_train = o2_train, temp_train = temp_train,
stan_data = stan_data, o2_test = o2_test, temp_test = temp_test))
}m <- stan_model(file = "model_02.stan")where model_02.stan is the Stan model file that you can download here.
# Now we can try iterating over sets of data with a lag 1
# model
n_forecast <- 7
n_lag <- 1
rmse <- NA
for (i in 500:(nrow(data) - n_lag)) {
dat_list <- create_stan_data(data, last_obs = i, n_forecast = n_forecast,
n_lag_o2 = n_lag, n_lag_temp = n_lag)
# fit the model. opimizing can be sensitive to starting
# values, so let's try
best_map <- -1e+100
for (j in 1:10) {
test_fit <- rstan::optimizing(m, data = dat_list$stan_data)
if (test_fit$value > best_map) {
fit <- test_fit
best_map <- test_fit$value
}
}
# extract forecasts
o2_pred <- fit$par[grep("o2_forecast", names(fit$par))]
temp_pred <- fit$par[grep("temp_forecast", names(fit$par))]
pred <- c(o2_pred, temp_pred)
obs <- c(dat_list$o2_test$oxygen, dat_list$temp_test$temperature)
# evaluate predictions
rmse[i] <- sqrt(mean((obs - pred)^2, na.rm = T))
print(rmse)
}Question: why do the future predictions of either temp or oxygen sometimes fall to 0, and appear to do poorly?
Question: modify the script above to loop over various oxygen and temperature lags. Using a subset of data, say observations 500:800, which combination yields the best predictions?