2.6 Seasonal effect as a factor

Let’s imagine that the data were taken consecutively in time by quarter. We want to model the seasonal effect as an intercept change. We will drop all other effects for now. If the data were collected in quarter 1, the model is \[\begin{equation} \tag{2.19} stack.loss_i = \alpha_1 + e_i, \text{ where } e_i \sim \text{N}(0,\sigma^2) \end{equation}\] If collected in quarter 2, the model is \[\begin{equation} \tag{2.20} stack.loss_i = \alpha_2 + e_i, \text{ where } e_i \sim \text{N}(0,\sigma^2) \end{equation}\] etc.

We add a column to our dataframe to account for season:

dat = cbind(dat, qtr = paste(rep("qtr", 4), 1:4, sep = ""))
dat
  Air.Flow Water.Temp Acid.Conc. stack.loss reg owner  qtr
1       80         27         89         42   n     s qtr1
2       80         27         88         37   s     a qtr2
3       75         25         90         37   n     s qtr3
4       62         24         87         28   s     a qtr4

And we can easily fit this model with lm().

coef(lm(stack.loss ~ -1 + qtr, data = dat))
qtrqtr1 qtrqtr2 qtrqtr3 qtrqtr4 
     42      37      37      28 

The -1 is added to the lm() call to get rid of \(\alpha\). We just want the \(\alpha_1\), \(\alpha_2\), etc. intercepts coming from our quarters.

For comparison look at

coef(lm(stack.loss ~ qtr, data = dat))
(Intercept)     qtrqtr2     qtrqtr3     qtrqtr4 
         42          -5          -5         -14 

Why does it look like that when -1 is missing from the lm() call? Where did the intercept for quarter 1 go and why are the other intercepts so much smaller?

2.6.1 Seasonal intercepts written in Form 1

Remembering that lm() puts models in Form 1, look at the \(\mathbf{Z}\) matrix for Form 1:

fit4 = lm(stack.loss ~ -1 + qtr, data = dat)
Z = model.matrix(fit4)
Z[1:4, ]
  qtrqtr1 qtrqtr2 qtrqtr3 qtrqtr4
1       1       0       0       0
2       0       1       0       0
3       0       0       1       0
4       0       0       0       1

Written in Form 1, this model is \[\begin{equation} \tag{2.21} \begin{bmatrix}stack.loss_1\\ stack.loss_2\\ stack.loss_3\\ stack.loss_4\end{bmatrix} = \begin{bmatrix}1&0&0&0\\ 0&1&0&0 \\ 0&0&1&0\\ 0&0&0&1\end{bmatrix} \begin{bmatrix}\alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \alpha_4 \end{bmatrix} + \begin{bmatrix}e_1\\e_2\\e_3\\e_4\end{bmatrix}=\mathbf{Z}\mathbf{x}+\mathbf{e} \end{equation}\]

Compare to the model that lm() is using when the intercept included. What does this model look like written in matrix form?

fit5 = lm(stack.loss ~ qtr, data = dat)
Z = model.matrix(fit5)
Z[1:4, ]
  (Intercept) qtrqtr2 qtrqtr3 qtrqtr4
1           1       0       0       0
2           1       1       0       0
3           1       0       1       0
4           1       0       0       1

2.6.2 Seasonal intercepts written in Form 2

We do not need to add 1s and 0s to our \(\mathbf{Z}\) matrix in Form 2; we just add subscripts to our intercepts matrix like we did when we had north-south intercepts. In this model, we do not have any explanatory variables so \(\mathbf{Z}\mathbf{x}\) does not appear. \[\begin{equation} \tag{2.22} \begin{bmatrix}stack.loss_1\\ stack.loss_2\\ stack.loss_3\\ stack.loss_4\end{bmatrix} = \begin{bmatrix} \alpha_1\\ \alpha_2\\ \alpha_3\\ \alpha_4 \end{bmatrix} + \begin{bmatrix}e_1\\e_2\\e_3\\e_4\end{bmatrix}=\mathbf{a}+\mathbf{e} \end{equation}\]