2.2 Matrix Form 1

In this form, we have the explanatory variables in a matrix on the left of our parameter matrix: \[\begin{equation} \tag{2.2} \begin{bmatrix}stack.loss_1\\stack.loss_2\\stack.loss_3\\stack.loss_4\end{bmatrix} = \begin{bmatrix}1&air_1\\1&air_2\\1&air_3\\1&air_4\end{bmatrix} \begin{bmatrix}\alpha\\ \beta\end{bmatrix} + \begin{bmatrix}e_1\\e_2\\e_3\\e_4\end{bmatrix} \end{equation}\] You should work through the matrix algebra to make sure you understand why Equation (2.2) is Equation (2.1) for all the \(i\) data points together.

We can write the first line of Equation (2.2) succinctly as \[\begin{equation} \tag{2.3} \mathbf{y} = \mathbf{Z}\mathbf{x} + \mathbf{e} \end{equation}\] where \(\mathbf{x}\) are our parameters, \(\mathbf{y}\) are our response variables, and \(\mathbf{Z}\) are our explanatory variables (with a 1 column for the intercept). The lm() function uses Form 1, and we can recover the \(\mathbf{Z}\) matrix for Form 1 by using the model.matrix() function on the output from a lm() call:

fit = lm(stack.loss ~ Air.Flow, data = dat)
Z = model.matrix(fit)
Z[1:4, ]

  (Intercept) Air.Flow
1           1       80
2           1       80
3           1       75
4           1       62

2.2.1 Solving for the parameters

Note: You will not need to know how to solve linear matrix equations for this course. This section just shows you what the lm() function is doing to estimate the parameters.

Notice that \(\mathbf{Z}\) is not a square matrix and its inverse does not exist but the inverse of \(\mathbf{Z}^\top\mathbf{Z}\) exists—if this is a solveable problem. We can go through the following steps to solve for \(\mathbf{x}\), our parameters \(\alpha\) and \(\beta\).

Start with \(\mathbf{y} = \mathbf{Z}\mathbf{x} + \mathbf{e}\) and multiply by \(\mathbf{Z}^\top\) on the left to get \[\begin{equation*} \mathbf{Z}^\top\mathbf{y} = \mathbf{Z}^\top\mathbf{Z}\mathbf{x} + \mathbf{Z}^\top\mathbf{e} \end{equation*}\] Multiply that by \((\mathbf{Z}^\top\mathbf{Z})^{-1}\) on the left to get \[\begin{equation*} (\mathbf{Z}^\top\mathbf{Z})^{-1}\mathbf{Z}^\top\mathbf{y} = (\mathbf{Z}^\top\mathbf{Z})^{-1}\mathbf{Z}^\top\mathbf{Z}\mathbf{x} + (\mathbf{Z}^\top\mathbf{Z})^{-1}\mathbf{Z}^\top\mathbf{e} \end{equation*}\] \((\mathbf{Z}^\top\mathbf{Z})^{-1}\mathbf{Z}^\top\mathbf{Z}\) equals the identity matrix, thus \[\begin{equation*} (\mathbf{Z}^\top\mathbf{Z})^{-1}\mathbf{Z}^\top\mathbf{y} = \mathbf{x} + (\mathbf{Z}^\top\mathbf{Z})^{-1}\mathbf{Z}^\top\mathbf{e}\\ \end{equation*}\] Move \(\mathbf{x}\) to the right by itself, to get \[\begin{equation*} (\mathbf{Z}^\top\mathbf{Z})^{-1}\mathbf{Z}^\top\mathbf{y} - (\mathbf{Z}^\top\mathbf{Z})^{-1}\mathbf{Z}^\top\mathbf{e} = \mathbf{x} \end{equation*}\]

Let’s assume our errors, the \(\mathbf{e}\), are i.i.d. which means that \[\begin{equation*} \mathbf{e} \sim \text{MVN}\begin{pmatrix}0, \begin{bmatrix} \sigma^2&0&0&0\\ 0&\sigma^2&0&0\\ 0&0&\sigma^2&0\\ 0&0&0&\sigma^2 \end{bmatrix} \end{pmatrix} \end{equation*}\] This equation means \(\mathbf{e}\) is drawn from a multivariate normal distribution with a variance-covariance matrix that is diagonal with equal variances. Under that assumption, the expected value of \((\mathbf{Z}^\top\mathbf{Z})^{-1}\mathbf{Z}^\top\mathbf{e}\) is zero. So we can solve for \(\mathbf{x}\) as \[\begin{equation*} \mathbf{x} = (\mathbf{Z}^\top\mathbf{Z})^{-1}\mathbf{Z}^\top\mathbf{y} \end{equation*}\]

Let’s try that with R and compare to what you get with lm():

y = matrix(dat$stack.loss, ncol = 1)
Z = cbind(1, dat$Air.Flow)  #or use model.matrix() to get Z
solve(t(Z) %*% Z) %*% t(Z) %*% y

            [,1]
[1,] -11.6159170
[2,]   0.6412918

coef(lm(stack.loss ~ Air.Flow, data = dat))

(Intercept)    Air.Flow 
-11.6159170   0.6412918

As you see, you get the same values.

2.2.2 Form 1 with multiple explanatory variables

We can easily extend Form 1 to multiple explanatory variables. Let’s say we wanted to fit this model: \[\begin{equation} \tag{2.4} stack.loss_i = \alpha + \beta_1 air_i + \beta_2 water_i + \beta_3 acid_i + e_i \end{equation}\] With lm(), we can fit this with

fit1.mult = lm(stack.loss ~ Air.Flow + Water.Temp + Acid.Conc., 
    data = dat)

Written in matrix form (Form 1), this is \[\begin{equation} \tag{2.5} \begin{bmatrix}stack.loss_1\\stack.loss_2\\stack.loss_3\\stack.loss_4\end{bmatrix} = \begin{bmatrix}1&air_1&water_1&acid_1\\1&air_2&water_2&acid_2\\1&air_3&water_3&acid_3\\1&air_4&water_4&acid_4\end{bmatrix} \begin{bmatrix}\alpha\\ \beta_1 \\ \beta_2 \\ \beta_3\end{bmatrix} + \begin{bmatrix}e_1\\e_2\\e_3\\e_4\end{bmatrix} \end{equation}\] Now \(\mathbf{Z}\) is a matrix with 4 columns and \(\mathbf{x}\) is a column vector with 4 rows. We can show the \(\mathbf{Z}\) matrix again directly from our lm() fit:

Z = model.matrix(fit1.mult)
Z

  (Intercept) Air.Flow Water.Temp Acid.Conc.
1           1       80         27         89
2           1       80         27         88
3           1       75         25         90
4           1       62         24         87
attr(,"assign")
[1] 0 1 2 3

We can solve for \(\mathbf{x}\) just like before and compare to what we get with lm():

y = matrix(dat$stack.loss, ncol = 1)
Z = cbind(1, dat$Air.Flow, dat$Water.Temp, dat$Acid.Conc)
# or Z=model.matrix(fit2)
solve(t(Z) %*% Z) %*% t(Z) %*% y

            [,1]
[1,] -524.904762
[2,]   -1.047619
[3,]    7.619048
[4,]    5.000000

coef(fit1.mult)

(Intercept)    Air.Flow  Water.Temp  Acid.Conc. 
-524.904762   -1.047619    7.619048    5.000000

Take a look at the \(\mathbf{Z}\) we made in R. It looks exactly like what is in our model written in matrix form (Equation (2.5)).

2.2.3 When does Form 1 arise?

This form of writing a regression model will come up when you work with dynamic linear models (DLMs). With DLMs, you will be fitting models of the form \(\mathbf{y}_t=\mathbf{Z}_t\mathbf{x}_t+\mathbf{e}_t\). In these models you have multiple \(\mathbf{y}\) at regular time points and you allow your regression parameters, the \(\mathbf{x}\), to evolve through time as a random walk.

2.2.4 Matrix Form 1b: The transpose of Form 1

We could also write Form 1 as follows: \[\begin{equation} \tag{2.6} \begin{split} \begin{bmatrix}stack.loss_1&stack.loss_2&stack.loss_3 &stack.loss_4\end{bmatrix} = \\ \begin{bmatrix}\alpha& \beta_1 & \beta_2 & \beta_3 \end{bmatrix} \begin{bmatrix}1&1&1&1\\air_1&air_2&air_3&air_4\\wind_1&wind_2&wind_3&wind_4\\acid_1&acid_2&acid_3&acid_4\end{bmatrix} + \begin{bmatrix}e_1&e_2&e_3&e_4\end{bmatrix} \end{split} \end{equation}\] This is just the transpose of Form 1. Work through the matrix algebra to make sure you understand why Equation (2.6) is Equation (2.1) for all the \(i\) data points together and why it is equal to the transpose of Equation (2.2). You’ll need the relationship \((\mathbf{A}\mathbf{B})^\top=\mathbf{B}^\top \mathbf{A}^\top\).

Let’s write Equation (2.6) as \(\mathbf{y} = \mathbf{D}\mathbf{d}\), where \(\mathbf{D}\) contains our parameters. Then we can solve for \(\mathbf{D}\) following the steps in Section 2.2.1 but multiplying from the right instead of from the left. Work through the steps to show that \(\mathbf{d} = \mathbf{y}\mathbf{d}^\top(\mathbf{d}\mathbf{d}^\top)^{-1}\).

y = matrix(dat$stack.loss, nrow = 1)
d = rbind(1, dat$Air.Flow, dat$Water.Temp, dat$Acid.Conc)
y %*% t(d) %*% solve(d %*% t(d))

          [,1]      [,2]     [,3] [,4]
[1,] -524.9048 -1.047619 7.619048    5

coef(fit1.mult)

(Intercept)    Air.Flow  Water.Temp  Acid.Conc. 
-524.904762   -1.047619    7.619048    5.000000