2.4 Groups of intercepts
Let’s say that the odd numbered plants are in the north and the even numbered are in the south. We want to include this as a factor in our model that affects the intercept. Let’s go back to just having air flow be our explanatory variable. Now if the plant is in the north our model is \[\begin{equation} \tag{2.11} stack.loss_i = \alpha_n + \beta air_i + e_i, \text{ where } e_i \sim \text{N}(0,\sigma^2) \end{equation}\] If the plant is in the south, our model is \[\begin{equation} \tag{2.12} stack.loss_i = \alpha_s + \beta air_i + e_i, \text{ where } e_i \sim \text{N}(0,\sigma^2) \end{equation}\] We’ll add north/south as a factor called `reg’ (region) to our dataframe:
= cbind(dat, reg = rep(c("n", "s"), 4)[1:4])
dat dat
Air.Flow Water.Temp Acid.Conc. stack.loss reg
1 80 27 89 42 n
2 80 27 88 37 s
3 75 25 90 37 n
4 62 24 87 28 s
And we can easily fit this model with lm()
.
= lm(stack.loss ~ -1 + Air.Flow + reg, data = dat)
fit2 coef(fit2)
Air.Flow regn regs
0.5358166 -2.0257880 -5.5429799
The -1 is added to the lm()
call to get rid of \(\alpha\). We just want the \(\alpha_n\) and \(\alpha_s\) intercepts coming from our regions.
2.4.1 North/South intercepts in Form 1
Written in matrix form, Form 1 for this model is
\[\begin{equation}
\tag{2.13}
\begin{bmatrix}stack.loss_1\\ stack.loss_2\\ stack.loss_3\\ stack.loss_4\end{bmatrix}
=
\begin{bmatrix}air_1&1&0\\ air_2&0&1 \\air_3&1&0\\air_4&0&1\end{bmatrix}
\begin{bmatrix}\beta \\ \alpha_n \\ \alpha_s \end{bmatrix}
+
\begin{bmatrix}e_1\\e_2\\e_3\\e_4\end{bmatrix}
\end{equation}\]
Notice that odd plants get \(\alpha_n\) and even plants get \(\alpha_s\). Use model.matrix()
to see that this is the \(\mathbf{Z}\) matrix that lm()
formed. Notice the matrix output by model.matrix()
looks exactly like \(\mathbf{Z}\) in Equation (2.13).
= model.matrix(fit2)
Z 1:4, ] Z[
Air.Flow regn regs
1 80 1 0
2 80 0 1
3 75 1 0
4 62 0 1
We can solve for the parameters using \(\mathbf{x} = (\mathbf{Z}^\top\mathbf{Z})^{-1}\mathbf{Z}^\top\mathbf{y}\) as we did for Form 1 before by adding on the 1s and 0s columns we see in the \(\mathbf{Z}\) matrix in Equation (2.13). We could build this \(\mathbf{Z}\) using the following R code:
= cbind(dat$Air.Flow, c(1, 0, 1, 0), c(0, 1, 0, 1))
Z colnames(Z) = c("beta", "regn", "regs")
Or just use model.matrix()
. This will save time when models are more complex.
= model.matrix(fit2)
Z 1:4, ] Z[
Air.Flow regn regs
1 80 1 0
2 80 0 1
3 75 1 0
4 62 0 1
Now we can solve for the parameters:
= matrix(dat$stack.loss, ncol = 1)
y solve(t(Z) %*% Z) %*% t(Z) %*% y
[,1]
Air.Flow 0.5358166
regn -2.0257880
regs -5.5429799
Compare to the output from lm()
and you will see it is the same.
coef(fit2)
Air.Flow regn regs
0.5358166 -2.0257880 -5.5429799
2.4.2 North/South intercepts in Form 2
We would write this model in Form 2 as \[\begin{equation} \tag{2.14} \begin{bmatrix}stack.loss_1\\ stack.loss_2\\ stack.loss_3\\ stack.loss_4\end{bmatrix} = \begin{bmatrix} \beta&0&0&0\\ 0&\beta&0&0\\ 0&0&\beta&0\\ 0&0&0&\beta \end{bmatrix}\begin{bmatrix}air_1\\air_2\\air_3\\air_4\end{bmatrix} + \begin{bmatrix} \alpha_n\\ \alpha_s\\ \alpha_n\\ \alpha_s \end{bmatrix} + \begin{bmatrix}e_1\\e_2\\e_3\\e_4\end{bmatrix}=\mathbf{Z}\mathbf{x}+\mathbf{a}+\mathbf{e} \end{equation}\]