The expected value of the response variable is modeled as:
\[ E[Y] = \beta_0 + f_1(x_1) + f_2(x_2) + \dots + f_k(x_k) \]
where \(\beta_0\) is the intercept and each \(f_j\) is a smooth function of predictor \(x_j\).
Linear Models: Assume a linear relationship between predictors and the response: \[ E[Y] = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k \]
GAMs: Allow for flexible, non-linear relationships by fitting smooth functions: \[ E[Y] = \beta_0 + f_1(x_1) + f_2(x_2) + \dots + f_k(x_k) \]
Key Advantage: GAMs can capture complex trends in data that a simple linear relationship cannot adequately model, and they support flexible response families
`mgcv`

- A very powerful R package for fitting GAMs
- Univariate smooth terms are expressed with `s()`
- We’ll cover more complicated smooths later
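As a minimal sketch (simulated data; `s()` uses its default thin plate basis), a univariate smooth is specified like this:

```r
# Minimal sketch: fit a GAM with one univariate smooth term using s()
library(mgcv)

set.seed(1)
n <- 200
x <- runif(n, 0, 10)
y <- sin(x) + rnorm(n, sd = 0.2)
dat <- data.frame(x = x, y = y)

fit <- gam(y ~ s(x), data = dat)
summary(fit)  # reports the effective degrees of freedom of s(x)
plot(fit)     # visualize the estimated smooth
```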
To better understand how B-splines work, let’s visualize basis functions and see how they combine to form a smooth function
Basis Functions: Functions that form the building blocks of splines (used to construct the overall smooth function)
B-splines are defined using a set of basis functions, each associated with a knot. The smooth curve is formed by a weighted sum of these basis functions.
B-spline Basis: The basis functions for B-splines have the following properties: each is a piecewise polynomial, and each is non-zero only over a limited range of \(x\) (local support), so each coefficient influences the curve only locally.
Weighted Sum of Basis Functions: The final spline is created by summing these basis functions, each weighted by a coefficient. This allows the spline to adapt to the data and capture non-linear relationships.
\[ f(x) = \sum_{i=1}^{n} w_i \cdot B_i(x) \]
To understand how the weighted sum of basis functions creates a smooth function, let’s visualize how different basis functions combine to form the final curve.
```r
# Example weight vectors from the slides: each choice produces a different curve
weights <- c(0.4, -0.5, 0.3, 0.1, 0.6, -0.2)
weights <- c(0.2, -0.1, 0.3)
weights <- c(0.6, -0.1, 0.01)
```
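A sketch of how such a weight vector combines with a basis matrix to form \(f(x)\), using base R’s `splines::bs()` (the `df = 6` basis matching the six-element weight vector is an illustrative choice):

```r
# Sketch: build a B-spline basis with splines::bs() and form
# f(x) = sum_i w_i * B_i(x) as a weighted sum of the basis columns
library(splines)

x <- seq(0, 1, length.out = 200)
B <- bs(x, df = 6, intercept = TRUE)           # 6 basis functions B_1..B_6
weights <- c(0.4, -0.5, 0.3, 0.1, 0.6, -0.2)   # one coefficient per basis function

f <- as.vector(B %*% weights)                  # the resulting smooth curve

matplot(x, B, type = "l", lty = 1, ylab = "basis / f(x)")
lines(x, f, lwd = 3)                           # overlay the weighted sum
```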
`mgcv`

- Setting `k` controls the number of basis functions, and the number of knots is generally closely related (e.g. `k-2` for `cr`)
- It doesn’t take many basis functions to create a flexible spline
- Flexibility is controlled by the basis dimension `k` and by the smoothing penalty
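A sketch of these controls on simulated data (the specific `k` value is illustrative):

```r
# Sketch: bs= chooses the basis type, k= its dimension (simulated data)
library(mgcv)

set.seed(2)
dat <- data.frame(x = runif(300, 0, 10))
dat$y <- sin(dat$x) + rnorm(300, sd = 0.3)

fit_tp <- gam(y ~ s(x, bs = "tp", k = 10), data = dat)  # thin plate (the default)
fit_cr <- gam(y ~ s(x, bs = "cr", k = 10), data = dat)  # cubic regression spline

# The penalty shrinks the effective degrees of freedom well below k
summary(fit_tp)$edf
```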
Important Considerations:

- Knots and Basis Functions: the `mgcv` default is to distribute them based on the data
- Thin plate regression splines (`tp`): flexible, non-linear smooths for complex, irregular data
- Cubic regression splines (`cr`): piecewise cubic polynomials, flexible with knots for general trends
- Cyclic cubic splines (`cc`): for periodic data (e.g., angles, time-of-day); handles cyclical patterns
- Gaussian process smooths (`gp`): smooths with uncertainty estimates, used for spatial / time-series models
- B-splines (`bs`): piecewise polynomials, flexible but computationally efficient
- P-splines (`ps`): a combination of B-splines and a penalty for smoothness, good for large datasets

`cor(obs, pred)` ~ 0.86

`economics` dataset
- `psavert`: personal savings rate
- `unemploy`: number of unemployed (in thousands)
- `date`: date of observation
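The `economics` data ship with ggplot2; a quick look at the columns used here:

```r
# Quick look at the economics data from ggplot2
library(ggplot2)

data(economics)
head(economics[, c("date", "psavert", "unemploy")])
range(economics$date)  # span of the monthly series
```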
DLMs are flexible and allow a random walk in the intercept, the covariates, or both. Presumably we can do the same with a GAM.
What about a smooth (random walk) on the covariate? Is this correct, and why or why not?
This is fitting a flexible non-linear model
However, a non-linear smooth of unemployment totally ignores the time aspect
If you thought we needed to include time, you are correct
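A sketch of that non-linear model (the variable name `ln_unemploy` is assumed to be constructed as log unemployment):

```r
# Sketch: smooth of log unemployment, with no time component at all
library(mgcv)
library(ggplot2)

econ <- economics
econ$ln_unemploy <- log(econ$unemploy)  # assumed transformation

fit_nonlinear <- gam(psavert ~ s(ln_unemploy), data = econ)
summary(fit_nonlinear)
```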
The prior model was fitting a non-linear smooth of the covariate
What about a smooth of the covariate and time?
Here we add a 2D smooth, `s(ln_unemploy, time_num)`
What is the 2D smooth of ln_unemploy and time doing?
There’s maybe some slight non-linearity here
```r
# Simulate predictors
library(mgcv)
set.seed(123)
n <- 400
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 10)

# Create a response with an interaction:
# the response surface is non-additive -- x1's effect depends on x2
y <- sin(x1) * cos(x2) + rnorm(n, sd = 0.3)
dat <- data.frame(x1 = x1, x2 = x2, y = y)

model <- gam(y ~ s(x1, x2), data = dat)
vis.gam(model, view = c("x1", "x2"), plot.type = "contour", color = "topo")
```
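Applying the same idea to the economics data (a sketch; `ln_unemploy` and `time_num` are assumed to be log unemployment and a numeric version of the date):

```r
# Sketch: 2D smooth of the covariate and time on the economics data
library(mgcv)
library(ggplot2)

econ <- economics
econ$ln_unemploy <- log(econ$unemploy)     # assumed transformation
econ$time_num    <- as.numeric(econ$date)  # assumed time index

fit_2d <- gam(psavert ~ s(ln_unemploy, time_num), data = econ)
vis.gam(fit_2d, view = c("ln_unemploy", "time_num"), plot.type = "contour")
```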
- `s(ln_unemploy, time_num)` is fitting a 2D smooth that includes the interaction between the covariate and time
- This is a flexible model, but not the same as a time-varying slope
- Another way to model the interaction is with the `by` argument
- This lets the smooth vary / creates separate smooths for different values of the `by` covariate
Why?

- `s(ln_unemploy, by = time_num)` is letting the shape / smooth effect of `ln_unemploy` vary by time step
- But the ordering of years isn’t preserved
- Neighboring years should be similar
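The `by=` mechanism is easiest to see with a factor, where mgcv fits a separate smooth per level; with a numeric `by` such as `time_num`, the smooth is instead multiplied by that covariate (a varying-coefficient term). A sketch on simulated data:

```r
# Sketch: by= with a factor gives one smooth per group (simulated data)
library(mgcv)

set.seed(3)
dat <- data.frame(x = runif(300, 0, 10),
                  g = factor(sample(c("a", "b"), 300, replace = TRUE)))
dat$y <- ifelse(dat$g == "a", sin(dat$x), cos(dat$x)) + rnorm(300, sd = 0.3)

fit_by <- gam(y ~ g + s(x, by = g), data = dat)
summary(fit_by)  # separate smooth terms, one per level of g
```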
[Plot: `psavert` against `time_num`]
How to decide between these two approaches?

- Diagnostics: R2, RMSE, AIC, etc.
- `compare_gams` (available in the slide Rmd)
```r
k <- compare_gams(fit_nonlinear, fit_dlm, response = "psavert", data = economics)
print(knitr::kable(k, digits = 3, caption = "Comparison of GAMs"))
```
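`compare_gams` itself is defined in the slide Rmd; a hypothetical sketch of what such a helper might compute:

```r
# Hypothetical sketch of a compare_gams-style helper (the real one lives in the
# slide Rmd): collect AIC, GCV, adjusted R2, and RMSE for each fitted GAM
compare_gams_sketch <- function(fits, response, data) {
  obs <- data[[response]]
  do.call(rbind, lapply(fits, function(fit) {
    pred <- as.numeric(predict(fit))
    data.frame(AIC  = AIC(fit),
               GCV  = as.numeric(fit$gcv.ubre),  # GCV score under default method
               R2   = summary(fit)$r.sq,         # adjusted R-squared
               RMSE = sqrt(mean((obs - pred)^2)))
  }))
}
```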
| Model | AIC | GCV | R2 | RMSE | Shapiro_p |
|---|---|---|---|---|---|
| Non-linear GAM | 1421.12 | 0.759 | 0.918 | 1.146 | 0 |
| Time-varying GAM | 1495.46 | 0.868 | 0.906 | 1.194 | 0 |
Time-varying intercepts are more often supported than time-varying slopes. Why?

- They capture the overall trend (level) in the data
- The covariate may not be variable enough
- Intercepts may not be identifiable without priors, etc.
- This will vary case by case, and in general you want to be careful when comparing / constructing models