13.11 NEON EFI Aquatics Challenge
So far, these models have only included oxygen. We can easily bring in the temperature data, which has it’s own standard error associated with it. We can set up a null model modeling temperature and oxygen independently, with different lags allowed
First we have to modify our data function to also generate temperature data.
<- function(data, last_obs, n_forecast, n_lag_o2,
create_stan_data
n_lag_temp) {# create test data
<- dplyr::filter(data, indx %in% seq(last_obs + 1,
o2_test + n_forecast)))
(last_obs <- dplyr::filter(data, indx %in% seq(last_obs +
temp_test 1, (last_obs + n_forecast)))
<- dplyr::filter(data, indx <= last_obs, !is.na(oxygen))
o2_train <- o2_train$indx
o2_x <- o2_train$oxygen
o2_y <- o2_train$oxygen_sd
o2_sd <- nrow(o2_train)
n_o2
<- dplyr::filter(data, indx <= last_obs, !is.na(temperature))
temp_train <- temp_train$indx
temp_x <- temp_train$temperature
temp_y <- temp_train$temperature_sd
temp_sd <- nrow(temp_train)
n_temp
<- list(n = last_obs, n_lag_o2 = n_lag_o2, n_lag_temp = n_lag_temp,
stan_data n_forecast = n_forecast, n_o2 = n_o2, o2_x = o2_x, o2_y = o2_y,
o2_sd = o2_sd, n_temp = n_temp, temp_x = temp_x, temp_y = temp_y,
temp_sd = temp_sd)
return(list(o2_train = o2_train, temp_train = temp_train,
stan_data = stan_data, o2_test = o2_test, temp_test = temp_test))
}
<- stan_model(file = "model_02.stan") m
where model_02.stan
is the Stan model file that you can download here.
# Now we can try iterating over sets of data with a lag 1
# model
<- 7
n_forecast <- 1
n_lag <- NA
rmse for (i in 500:(nrow(data) - n_lag)) {
<- create_stan_data(data, last_obs = i, n_forecast = n_forecast,
dat_list n_lag_o2 = n_lag, n_lag_temp = n_lag)
# fit the model. opimizing can be sensitive to starting
# values, so let's try
<- -1e+100
best_map for (j in 1:10) {
<- rstan::optimizing(m, data = dat_list$stan_data)
test_fit if (test_fit$value > best_map) {
<- test_fit
fit <- test_fit$value
best_map
}
}
# extract forecasts
<- fit$par[grep("o2_forecast", names(fit$par))]
o2_pred <- fit$par[grep("temp_forecast", names(fit$par))]
temp_pred
<- c(o2_pred, temp_pred)
pred <- c(dat_list$o2_test$oxygen, dat_list$temp_test$temperature)
obs # evaluate predictions
<- sqrt(mean((obs - pred)^2, na.rm = T))
rmse[i] print(rmse)
}
Question: why do the future predictions of either temp or oxygen sometimes fall to 0, and appear to do poorly?
Question: modify the script above to loop over various oxygen and temperature lags. Using a subset of data, say observations 500:800, which combination yields the best predictions?