2.5 Groups of \(\beta\)’s

Now let’s say that the plants have different owners, Sue and Aneesh, and we want to have \(\beta\) for the air flow effect vary by owner. If the plant is in the north and owned by Sue, the model is \[\begin{equation} \tag{2.15} stack.loss_i = \alpha_n + \beta_s air_i + e_i, \text{ where } e_i \sim \text{N}(0,\sigma^2) \end{equation}\] If it is in the south and owned by Aneesh, the model is \[\begin{equation} \tag{2.16} stack.loss_i = \alpha_s + \beta_a air_i + e_i, \text{ where } e_i \sim \text{N}(0,\sigma^2) \end{equation}\] You get the idea.

Now we need to add an operator variable as a factor in our stackloss dataframe. Plants 1,3 are run by Sue and plants 2,4 are run by Aneesh.

dat = cbind(dat, owner = c("s", "a"))
dat
  Air.Flow Water.Temp Acid.Conc. stack.loss reg owner
1       80         27         89         42   n     s
2       80         27         88         37   s     a
3       75         25         90         37   n     s
4       62         24         87         28   s     a

Since the operator names can be replicated the length of our data set, R fills in the operator colmun by replicating our string of operator names to the right length, conveniently (or alarmingly).

We can easily fit this model with lm() using the “:” notation.

coef(lm(stack.loss ~ -1 + Air.Flow:owner + reg, data = dat))
           regn            regs Air.Flow:ownera Air.Flow:owners 
          -38.0            -3.0             0.5             1.0 

Notice that we have 4 datapoints and are estimating 4 parameters. We are not going to be able to estimate any more parameters than data points. If we want to estimate any more, we’ll need to use the fuller stackflow dataset (which has 21 data points).

2.5.1 Owner \(\beta\)’s in Form 1

Written in Form 1, this model is \[\begin{equation} \tag{2.17} \begin{bmatrix}stack.loss_1\\ stack.loss_2\\ stack.loss_3\\ stack.loss_4\end{bmatrix} = \begin{bmatrix}1&0&0&air_1\\ 0&1&air_2&0 \\ 1&0&0&air_3\\ 0&1&air_4&0\end{bmatrix} \begin{bmatrix}\alpha_n \\ \alpha_s \\ \beta_a \\ \beta_s \end{bmatrix} + \begin{bmatrix}e_1\\e_2\\e_3\\e_4\end{bmatrix}=\mathbf{Z}\mathbf{x}+\mathbf{e} \end{equation}\] The air data have been written to the right of the 1s and 0s for north/south intercepts because that is how lm() writes this model in Form 1 and I want to duplicate that (for teaching purposes). Also the \(\beta\)’s are ordered to be alphabetical because lm() writes the \(\mathbf{Z}\) matrix like that.

Now our model is more complicated and using model.matrix() to get our \(\mathbf{Z}\) saves us a lot tedious matrix building.

fit3 = lm(stack.loss ~ -1 + Air.Flow:owner + reg, data = dat)
Z = model.matrix(fit3)
Z[1:4, ]
  regn regs Air.Flow:ownera Air.Flow:owners
1    1    0               0              80
2    0    1              80               0
3    1    0               0              75
4    0    1              62               0

Notice the matrix output by model.matrix() looks exactly like \(\mathbf{Z}\) in Equation (2.17) (ignore the attributes info). Now we can solve for the parameters:

y = matrix(dat$stack.loss, ncol = 1)
solve(t(Z) %*% Z) %*% t(Z) %*% y
                 [,1]
regn            -38.0
regs             -3.0
Air.Flow:ownera   0.5
Air.Flow:owners   1.0

Compare to the output from lm() and you will see it is the same.

2.5.2 Owner \(\beta\)’s in Form 2

To write this model in Form 2, we just add subscripts to the \(\beta\)’s in our Form 2 \(\mathbf{Z}\) matrix: \[\begin{equation} \tag{2.18} \begin{bmatrix}stack.loss_1\\ stack.loss_2\\ stack.loss_3\\ stack.loss_4\end{bmatrix} = \begin{bmatrix} \beta_s&0&0&0\\ 0&\beta_a&0&0\\ 0&0&\beta_s&0\\ 0&0&0&\beta_a \end{bmatrix}\begin{bmatrix}air_1\\air_2\\air_3\\air_4\end{bmatrix} + \begin{bmatrix} \alpha_n\\ \alpha_s\\ \alpha_n\\ \alpha_s \end{bmatrix} + \begin{bmatrix}e_1\\e_2\\e_3\\e_4\end{bmatrix}=\mathbf{Z}\mathbf{x}+\mathbf{a}+\mathbf{e} \end{equation}\]