Generalized Additive Models & Time Series

FISH 550 – Applied Time Series Analysis

Overview of today’s material

What is a Generalized Additive Model (GAM)?

GAM Formula

Why GAMs (mgcv) are very powerful

Smooth Functions in GAMs

Smooth Functions vs. Linear Models

Smooth Functions in mgcv

Very powerful R package for fitting GAMs

Univariate smooth terms are expressed with s()

gam(y ~ s(x1, k = 10, bs = "cr") + 
      s(x2, k = 10, bs = "cr"), data = data)

We’ll cover more complicated smooths later

Splines and Their Role in GAMs

What Are Knots in Splines?

Visualizing B-Splines and Their Role in GAMs

To better understand how B-splines work, let’s visualize basis functions and see how they combine to form a smooth function

Basis Functions and B-Splines

How Basis Functions Build Splines

How Basis Functions Build Splines

To understand how the weighted sum of basis functions creates a smooth function, let’s visualize how different basis functions combine to form the final curve.

weights <- c(0.4, -0.5, 0.3, 0.1, 0.6, -0.2)

What if we make the basis dimension smaller?

weights <- c(0.2, -0.1, 0.3)

What if we make the basis dimension smaller?

weights <- c(0.6, -0.1, 0.01)

What if we make the basis dimension larger?

Adjusting in mgcv

Setting k controls the number of basis functions, and number of knots is generally closely related (e.g. k-2 for cr)

gam(y ~ s(x1, k = 10, bs = "cr") + 
      s(x2, k = 10, bs = "cr"), data = data)

Extracting from mgcv

x <- seq(0, 1, length.out = 100)
sc <- smoothCon(s(x, bs = "cr", k = 10), 
      data = data.frame(x = x), 
      knots = NULL)
X_spline <- sc[[1]]$X

Extracting from mgcv

# simulate sinusoidal data + obs error
y <- sin(2 * pi * x) + rnorm(100, sd = 0.2)
# spline regression
fit <- lm(y ~ X_spline)

Key Take Homes

What are Knots in Splines?

Example: Effect of Knot Placement on Spline Fit

mgcv default is to distribute them based on data

mgcv default is to distribute them based on data

When does knot placement matter?

Common Smooth Types in GAMs

GAMs in Time Series Analysis

GAMs and DLMs:

GAMs and DLMs:

fit <- gam(log(shad) ~ as.factor(year) + s(jday, bs = "cr"),
           data = shad22_24)

GAMs and DLMs:

economics dataset

data("economics")
economics$time_num <- as.numeric(economics$date)
economics$ln_unemploy <- log(economics$unemploy)

GAMs and DLMs:

GAMs and DLMs:

# personal savings = s(time)
fit <- gam(psavert ~ s(time_num, bs = "cr"),
           data = economics)

GAMs and DLMs:

plot(fit)

GAMs and DLMs:

# savings rate ~ s(unemployment)
fit <- gam(psavert ~ s(ln_unemploy, bs = "cr"),
           data = economics)

GAMs and DLMs:

plot(fit)

GAMs and DLMs:

fit <- gam(psavert ~ s(ln_unemploy, time_num),
           data = economics)

GAMs and DLMs:

vis.gam(fit, view = c("time_num", "ln_unemploy"), plot.type = "contour", color = "heat")

GAMs and DLMs

# Simulate predictors
set.seed(123)
n <- 400
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 10)

# Create a response with interaction
# The response surface is non-additive: x1's effect depends on x2
y <- sin(x1) * cos(x2) + rnorm(n, sd = 0.3)

dat <- data.frame(x1 = x1, x2 = x2, y = y)

model <- gam(y ~ s(x1, x2), data = dat)

vis.gam(model, view = c("x1", "x2"), plot.type = "contour", color = "topo")

GAMs and DLMs

GAMs and DLMs:

s(predictor, by = covariate)

GAMs and DLMs:

fit <- gam(psavert ~ s(ln_unemploy, by = time_num),
           data = economics)

GAMs and DLMs:

GAMs and DLMs:

plot(fit)

GAMs and DLMs:

fit <- gam(psavert ~ s(time_num, by = ln_unemploy),
           data = economics)

GAMs and DLMs:

GAMs and DLMs:

plot(fit)

Forecasts

Non-linearity vs non-stationarity

Non-linearity vs non-stationarity

Non-linearity vs non-stationarity

Non-linearity vs non-stationarity

Non-linearity vs non-stationarity

Non-linearity vs non-stationarity

k <- compare_gams(fit_nonlinear, fit_dlm, response="psavert", data = economics)
print(knitr::kable(k, digits = 3, caption = "Comparison of GAMs"))
Comparison of GAMs
Model AIC GCV R2 RMSE Shapiro_p
GCV.Cp Non-linear GAM 1421.12 0.759 0.918 1.146 0
GCV.Cp1 Time-varying GAM 1495.46 0.868 0.906 1.194 0

Time-varying slope vs intercept

Example: Time-varying slope vs intercept

data("SalmonSurvCUI")
# time varying intercept model
g1 <- gam(logit.s ~ s(year, k = 10), 
          data = SalmonSurvCUI)
# time varying slope model
g2 <- gam(logit.s ~ s(year, by = CUI.apr, k = 10), 
          data = SalmonSurvCUI)

Example: Time-varying slope vs intercept

Example: Time-varying slope vs intercept

Example: Time-varying slope vs intercept

Summary: Time-varying slope vs intercept

Lots of references on GAMs

There’s lots of great references out there, including: