Characteristics of time series (ts)
What is a ts?
Classifying ts
Trends
Seasonality (periodicity)
5 Jan 2021
\[ \{ x_1,x_2,x_3,\dots,x_n \} \]
\[ \{ 10,31,27,42,53,15 \} \]
Interval across real time; \(x(t)\)
Discrete time; \(x_t\)
Discrete (eg, total # of fish caught per trawl)
Continuous (eg, salinity, temperature)
Univariate/scalar (eg, total # of fish caught)
Multivariate/vector (eg, # of each spp of fish caught)
Integer (eg, # of fish in 5 min trawl = 2413)
Rational (eg, fraction of unclipped fish = 47/951)
Real (eg, fish mass = 10.2 g)
Complex (eg, \(\cos(2\pi 2.43) + i \sin(2\pi 2.43)\))
Univariate \((x_t)\)
Multivariate \(\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}_t\)
Time series objects have a special designation in R: ts
ts(data, start, end, frequency )
data should be a vector (univariate) or a data frame or matrix (multivariate)
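As a minimal sketch of the multivariate case, the block below builds a ts object from a matrix with one column per series. The species names and Poisson counts are made up for illustration and do not come from any real survey.

```r
set.seed(507)
## hypothetical annual counts for 3 fish species over 10 years
counts <- matrix(rpois(30, lambda = 20), nrow = 10, ncol = 3,
                 dimnames = list(NULL, c("spp_A", "spp_B", "spp_C")))
## each column becomes one series; rows are time points
dat_mv <- ts(counts, start = 2011, frequency = 1)
dim(dat_mv)
inherits(dat_mv, "mts")
```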
start and end give the first and last time indices
For monthly series, specify them as c(year, month)
frequency is the number of observations per unit time
For annual series, frequency = 1
For monthly series, frequency = 12
ts(data, start, end, deltat )
deltat is the fraction of the sampling period between successive observations (the reciprocal of frequency)
For annual series, deltat = 1
For monthly series, deltat = 1/12
set.seed(507)
## annual data
dat_1 <- rnorm(30)
dat_yr <- ts(dat_1, start = 1991, end = 2020, frequency = 1)
## monthly data
dat_2 <- rnorm(30*12)
dat_mo <- ts(dat_2, start = c(1991, 1), end = c(2020, 12), frequency = 12)
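To check that frequency and deltat describe the same thing, the sketch below builds the monthly series both ways and compares the time attributes. The object names mo_freq and mo_delt are introduced here for illustration only.

```r
set.seed(507)
dat_2 <- rnorm(30 * 12)
## same monthly series, specified two ways
mo_freq <- ts(dat_2, start = c(1991, 1), frequency = 12)
mo_delt <- ts(dat_2, start = c(1991, 1), deltat = 1/12)
## inspect the time attributes
frequency(mo_freq)
deltat(mo_freq)
start(mo_freq)
end(mo_freq)
## tsp() holds (start, end, frequency); it should match across both objects
same_tsp <- all.equal(tsp(mo_freq), tsp(mo_delt))
same_tsp
```

Because the data have 360 values starting in January 1991, end does not need to be given; it is implied by start and frequency.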
There is a designated function for plotting ts objects: plot.ts()
plot.ts(ts_object)
We can specify some additional arguments to plot.ts
plot.ts(dat_yr, ylab = expression(italic(x[t])), las = 1, col = "blue", lwd = 2)
Most statistical analyses are concerned with estimating properties of a population from a sample
For example, we use fish caught in a seine to infer the mean size of fish in a lake
Time series analysis, however, presents a different situation:
For example, one can’t observe today’s closing price of Microsoft stock more than once
Thus, conventional statistical procedures, based on large sample estimates, are inappropriate
Number of users connected to the internet
Number of lynx trapped in Canada from 1821-1934
A time series model for \(\{x_t\}\) is a specification of the joint distributions of a sequence of random variables \(\{X_t\}\), of which \(\{x_t\}\) is thought to be a realization
White noise: \(x_t \sim N(0,1)\)
Random walk: \(x_t = x_{t-1} + w_t,~\text{with}~w_t \sim N(0,1)\)
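Both models are easy to simulate, which can help build intuition for how they differ. The sketch below assumes the random walk starts from \(x_0 = 0\) (the definition above leaves the starting value unspecified) and uses a series length of 100 chosen arbitrarily.

```r
set.seed(507)
n <- 100
## white noise: independent draws from N(0,1)
w <- rnorm(n)
## random walk: cumulative sum of the white noise, assuming x_0 = 0
x <- cumsum(w)
## the same random walk written as the explicit recursion x_t = x_{t-1} + w_t
x_loop <- numeric(n)
x_loop[1] <- w[1]
for (i in 2:n) {
  x_loop[i] <- x_loop[i - 1] + w[i]
}
all.equal(x, x_loop)
```

Passing w and x to plot.ts() shows the contrast: the white noise stays near zero, while the random walk can wander far from where it started.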
\(x_t = m_t + s_t + e_t\)
We need a way to extract the so-called signal from the noise
One common method is via “linear filters”
Linear filters can be thought of as “smoothing” the data
Linear filters typically take the form
\[ \hat{m}_t = \sum_{i=-\infty}^{\infty} \lambda_i x_{t+i} \]
For example, a moving average
\[ \hat{m}_t = \sum_{i=-a}^{a} \frac{1}{2a + 1} x_{t+i} \]
If \(a = 1\), then
\[ \hat{m}_t = \frac{1}{3}(x_{t-1} + x_t + x_{t+1}) \]
As \(a\) increases, the estimated trend becomes more smooth
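One way to compute this moving average in R is stats::filter() with equal weights and sides = 2, which centers the window on \(t\). The sketch below assumes the built-in AirPassengers dataset (monthly airline passengers, 1949-1960) as the example series; the object names are illustrative.

```r
x <- AirPassengers
## a = 1: three equal weights of 1/3
a <- 1
lambda <- rep(1 / (2 * a + 1), 2 * a + 1)
m_a1 <- stats::filter(x, filter = lambda, method = "convolution", sides = 2)
## a = 4: nine equal weights of 1/9, giving a smoother trend
m_a4 <- stats::filter(x, filter = rep(1/9, 9), method = "convolution", sides = 2)
## check the a = 1 case against the explicit formula at t = 2
c(m_a1[2], (x[1] + x[2] + x[3]) / 3)
## the first and last a values are NA because the window runs off the ends
sum(is.na(m_a1))
sum(is.na(m_a4))
```

The cost of a wider window is that more trend estimates are lost at each end of the series.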
Monthly airline passengers from 1949-1960
Once we have an estimate of the trend \(\hat{m}_t\), we can estimate \(\hat{s}_t\) simply by subtraction:
\[ \hat{s}_t = x_t - \hat{m}_t \]
Seasonal effect (\(\hat{s}_t\)), assuming \(\lambda = 1/9\)
But, \(\hat{s}_t\) really includes the remainder \(e_t\) as well
\[ \begin{align} \hat{s}_t &= x_t - \hat{m}_t \\ (s_t + e_t) &= x_t - m_t \end{align} \]
So we need to estimate the mean seasonal effect as
\[ \begin{align} \hat{s}_{Jan} &= \sum \frac{1}{(N/12)} \{s_1, s_{13}, s_{25}, \dots \} \\ \hat{s}_{Feb} &= \sum \frac{1}{(N/12)} \{s_2, s_{14}, s_{26}, \dots \} \\ &\;\;\vdots \\ \hat{s}_{Dec} &= \sum \frac{1}{(N/12)} \{s_{12}, s_{24}, s_{36}, \dots \} \end{align} \]
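The monthly averaging can be sketched with tapply(), grouping by calendar month via cycle(). This assumes the built-in AirPassengers data and a 9-point moving average for the trend (\(\lambda = 1/9\)). Note one deviation from the formula above: the moving average leaves NA at both ends, so each mean here divides by the number of non-missing values for that month rather than by \(N/12\).

```r
x <- AirPassengers
## trend from a 9-point moving average (lambda = 1/9)
m_hat <- stats::filter(x, filter = rep(1/9, 9), sides = 2)
## detrended series: seasonal effect plus remainder
s_plus_e <- as.numeric(x - m_hat)
## calendar month (1-12) of each observation
month <- as.integer(cycle(x))
## mean seasonal effect for each month, skipping the NA at the ends
s_month <- tapply(s_plus_e, month, mean, na.rm = TRUE)
s_month
```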
Now we can estimate \(e_t\) via subtraction:
\[ \hat{e}_t = x_t - \hat{m}_t - \hat{s}_t \]
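Putting the three steps together gives a by-hand additive decomposition. The sketch below again assumes AirPassengers and a 9-point moving average. It repeats the 12 monthly means across all years to form \(\hat{s}_t\), then checks that the pieces add back up to the data wherever the trend is defined.

```r
x <- AirPassengers
## step 1: trend via a 9-point moving average
m_hat <- stats::filter(x, filter = rep(1/9, 9), sides = 2)
## step 2: mean seasonal effect by calendar month
month <- as.integer(cycle(x))
s_month <- tapply(as.numeric(x - m_hat), month, mean, na.rm = TRUE)
## repeat the 12 monthly means over the full series
s_hat <- ts(unname(s_month[month]), start = start(x), frequency = frequency(x))
## step 3: remainder via subtraction
e_hat <- x - m_hat - s_hat
## trend + season + remainder should reproduce x where m_hat is not NA
ok <- !is.na(m_hat)
recon <- as.numeric(m_hat + s_hat + e_hat)
all.equal(as.numeric(x)[ok], recon[ok])
```

R's built-in decompose() follows the same general recipe, though it chooses its own moving-average window and centers the seasonal effects, so its output will not match this sketch exactly.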
Log-transform data
Linear trend
Monthly airline passengers from 1949-1960
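A minimal sketch of this approach, assuming the built-in AirPassengers data: log-transform the series, then fit a straight line against time with lm(). The names log_x, tt, and fit are illustrative.

```r
x <- AirPassengers
## log-transform so the seasonal swings are roughly constant in size
log_x <- as.numeric(log(x))
## time index in decimal years
tt <- as.numeric(time(x))
## linear trend: log(x_t) = b0 + b1 * t + error
fit <- lm(log_x ~ tt)
coef(fit)
## fitted trend as a ts object on the log scale
m_hat <- ts(unname(fitted(fit)), start = start(x), frequency = frequency(x))
## what remains after removing the linear trend
resid_log <- ts(log_x, start = start(x), frequency = frequency(x)) - m_hat
```

On the log scale the slope is the approximate proportional growth per year, and the detrended series resid_log still contains the seasonal pattern.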