2.4 Groups of intercepts

Let’s say that the odd numbered plants are in the north and the even numbered are in the south. We want to include this as a factor in our model that affects the intercept. Let’s go back to just having air flow be our explanatory variable. Now if the plant is in the north our model is \[\begin{equation} \tag{2.11} stack.loss_i = \alpha_n + \beta air_i + e_i, \text{ where } e_i \sim \text{N}(0,\sigma^2) \end{equation}\] If the plant is in the south, our model is \[\begin{equation} \tag{2.12} stack.loss_i = \alpha_s + \beta air_i + e_i, \text{ where } e_i \sim \text{N}(0,\sigma^2) \end{equation}\] We’ll add north/south as a factor called `reg’ (region) to our dataframe:

dat = cbind(dat, reg = rep(c("n", "s"), 4)[1:4])
dat
  Air.Flow Water.Temp Acid.Conc. stack.loss reg
1       80         27         89         42   n
2       80         27         88         37   s
3       75         25         90         37   n
4       62         24         87         28   s

And we can easily fit this model with lm().

fit2 = lm(stack.loss ~ -1 + Air.Flow + reg, data = dat)
coef(fit2)
  Air.Flow       regn       regs 
 0.5358166 -2.0257880 -5.5429799 

The -1 is added to the lm() call to get rid of \(\alpha\). We just want the \(\alpha_n\) and \(\alpha_s\) intercepts coming from our regions.

2.4.1 North/South intercepts in Form 1

Written in matrix form, Form 1 for this model is \[\begin{equation} \tag{2.13} \begin{bmatrix}stack.loss_1\\ stack.loss_2\\ stack.loss_3\\ stack.loss_4\end{bmatrix} = \begin{bmatrix}air_1&1&0\\ air_2&0&1 \\air_3&1&0\\air_4&0&1\end{bmatrix} \begin{bmatrix}\beta \\ \alpha_n \\ \alpha_s \end{bmatrix} + \begin{bmatrix}e_1\\e_2\\e_3\\e_4\end{bmatrix} \end{equation}\] Notice that odd plants get \(\alpha_n\) and even plants get \(\alpha_s\). Use model.matrix() to see that this is the \(\mathbf{Z}\) matrix that lm() formed. Notice the matrix output by model.matrix() looks exactly like \(\mathbf{Z}\) in Equation (2.13).

Z = model.matrix(fit2)
Z[1:4, ]
  Air.Flow regn regs
1       80    1    0
2       80    0    1
3       75    1    0
4       62    0    1

We can solve for the parameters using \(\mathbf{x} = (\mathbf{Z}^\top\mathbf{Z})^{-1}\mathbf{Z}^\top\mathbf{y}\) as we did for Form 1 before by adding on the 1s and 0s columns we see in the \(\mathbf{Z}\) matrix in Equation (2.13). We could build this \(\mathbf{Z}\) using the following R code:

Z = cbind(dat$Air.Flow, c(1, 0, 1, 0), c(0, 1, 0, 1))
colnames(Z) = c("beta", "regn", "regs")

Or just use model.matrix(). This will save time when models are more complex.

Z = model.matrix(fit2)
Z[1:4, ]
  Air.Flow regn regs
1       80    1    0
2       80    0    1
3       75    1    0
4       62    0    1

Now we can solve for the parameters:

y = matrix(dat$stack.loss, ncol = 1)
solve(t(Z) %*% Z) %*% t(Z) %*% y
               [,1]
Air.Flow  0.5358166
regn     -2.0257880
regs     -5.5429799

Compare to the output from lm() and you will see it is the same.

coef(fit2)
  Air.Flow       regn       regs 
 0.5358166 -2.0257880 -5.5429799 

2.4.2 North/South intercepts in Form 2

We would write this model in Form 2 as \[\begin{equation} \tag{2.14} \begin{bmatrix}stack.loss_1\\ stack.loss_2\\ stack.loss_3\\ stack.loss_4\end{bmatrix} = \begin{bmatrix} \beta&0&0&0\\ 0&\beta&0&0\\ 0&0&\beta&0\\ 0&0&0&\beta \end{bmatrix}\begin{bmatrix}air_1\\air_2\\air_3\\air_4\end{bmatrix} + \begin{bmatrix} \alpha_n\\ \alpha_s\\ \alpha_n\\ \alpha_s \end{bmatrix} + \begin{bmatrix}e_1\\e_2\\e_3\\e_4\end{bmatrix}=\mathbf{Z}\mathbf{x}+\mathbf{a}+\mathbf{e} \end{equation}\]