## 2.8 Models with confounded parameters*

Try adding region as another factor in your model along with quarter and fit with lm():

coef(lm(stack.loss ~ -1 + Air.Flow + reg + qtr, data = fulldat))
  Air.Flow       regn       regs    qtrqtr2    qtrqtr3    qtrqtr4
1.066524 -49.024320 -44.831760  -3.066094   3.499428         NA 

The estimate for quarter 1 is gone (actually it was set to 0) and the estimate for quarter 4 is NA. Look at the $$\mathbf{Z}$$ matrix for Form 1 and see if you can figure out the problem. Also try writing out the model for the 1st plant; you’ll see part of the problem and why the estimate for quarter 1 is fixed at 0.

fit = lm(stack.loss ~ -1 + Air.Flow + reg + qtr, data = fulldat)
Z = model.matrix(fit)
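One way to look at $$\mathbf{Z}$$ is to compare its number of columns with its rank. The sketch below is only illustrative: it assumes `fulldat` is the built-in `stackloss` data with regions alternating n, s and quarters cycling through qtr1 to qtr4 across the 21 plants, and the names `rank_Z` and `dep` are made up here.

```r
# Assumption: fulldat is rebuilt from the built-in stackloss data, with regions
# alternating n, s and quarters cycling qtr1 to qtr4 across the 21 plants.
fulldat = datasets::stackloss
n = nrow(fulldat)
fulldat$reg = rep(c("n", "s"), n)[1:n]
fulldat$qtr = rep(paste0("qtr", 1:4), n)[1:n]

fit = lm(stack.loss ~ -1 + Air.Flow + reg + qtr, data = fulldat)
Z = model.matrix(fit)

head(Z, 4)            # rows of Z for plants 1 to 4
ncol(Z)               # number of parameters we asked for
rank_Z = qr(Z)$rank   # number of linearly independent columns
rank_Z

# Candidate dependency among columns (hypothetical helper vector):
# the south column minus the quarter 2 and quarter 4 columns
dep = Z[, "regs"] - Z[, "qtrqtr2"] - Z[, "qtrqtr4"]
all(dep == 0)
```

If the rank is smaller than the number of columns, at least one column is a linear combination of the others, and the corresponding parameter cannot be estimated separately.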

But why is the estimate for quarter 4 equal to NA? What if the ordering of north and south regions were different, say plants 1 through 4 north, 5 through 8 south, 9 through 12 north, etc.?

fulldat2 = fulldat
fulldat2$reg2 = rep(c("n", "n", "n", "n", "s", "s", "s", "s"), 3)[1:21]
fit = lm(stack.loss ~ Air.Flow + reg2 + qtr, data = fulldat2)
coef(fit)
(Intercept)    Air.Flow       reg2s     qtrqtr2     qtrqtr3     qtrqtr4
-45.6158421   1.0407975  -3.5754722   0.7329027   3.0389763   3.6960928 

Now an estimate for quarter 4 appears.
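To see why the two layouts behave differently, it can help to cross-tabulate region against quarter and compare the rank of each design matrix. This is a sketch under the same assumption about how `fulldat` was built (alternating regions, cycling quarters); `fit1`, `fit2`, `rank1`, `rank2`, `na1` and `na2` are names introduced here for illustration.

```r
# Assumption: same reconstruction of fulldat as described in the lead-in.
fulldat = datasets::stackloss
n = nrow(fulldat)
fulldat$reg = rep(c("n", "s"), n)[1:n]
fulldat$qtr = rep(paste0("qtr", 1:4), n)[1:n]
fulldat$reg2 = rep(c("n", "n", "n", "n", "s", "s", "s", "s"), 3)[1:n]

# Which region/quarter combinations actually occur in each layout?
tab1 = table(fulldat$reg, fulldat$qtr)
tab2 = table(fulldat$reg2, fulldat$qtr)
tab1
tab2

fit1 = lm(stack.loss ~ -1 + Air.Flow + reg + qtr, data = fulldat)
fit2 = lm(stack.loss ~ Air.Flow + reg2 + qtr, data = fulldat)

# Rank of each design matrix versus its number of columns
rank1 = qr(model.matrix(fit1))$rank
rank2 = qr(model.matrix(fit2))$rank
c(columns = ncol(model.matrix(fit1)), rank = rank1)
c(columns = ncol(model.matrix(fit2)), rank = rank2)

# Coefficients that lm() could not estimate
na1 = names(coef(fit1))[is.na(coef(fit1))]
na2 = names(coef(fit2))[is.na(coef(fit2))]
na1
na2

# alias() in the stats package reports linearly dependent terms in a fit
alias(fit1)
```

Empty cells in the cross-tabulation are the warning sign: when a region never occurs in some quarters, the region and quarter effects may not be separable.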

The problem is two-fold. First, by having both region and quarter intercepts, we created models where 2 intercepts appear in the model for each plant $$i$$, and we cannot estimate both. lm() helps us out by setting one of the factor effects to 0; it drops the first level alphabetically (here quarter 1). Second, as we saw with the model where odd-numbered plants were north and even-numbered plants were south, we can still have a situation where one of the intercepts is non-identifiable, because in that layout the region is completely determined by the quarter. lm() alerts us to the problem by setting one of the estimates to NA.
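The first part of the problem can be seen by writing out the model for plant 1, which is in the north region and in quarter 1. The symbols $$\beta$$, $$\alpha_n$$, $$\alpha_{q1}$$ and $$e_1$$ are notation assumed here for illustration:

$$
\text{stack.loss}_1 = \beta\,\text{air}_1 + \alpha_n + \alpha_{q1} + e_1
= \beta\,\text{air}_1 + (\alpha_n + c) + (\alpha_{q1} - c) + e_1 \quad \text{for any constant } c.
$$

Every plant's model contains exactly one region intercept and one quarter intercept, so adding $$c$$ to all region intercepts and subtracting $$c$$ from all quarter intercepts leaves every fitted value unchanged. Only the sums are determined by the data. Fixing $$\alpha_{q1} = 0$$, as lm() does, removes this ambiguity.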

Once you start developing your own models, you will need to make sure that all your parameters are identifiable. If they are not, your code will simply "chase its tail". The code will generally take forever to converge or, if you do not try different starting conditions, it may look like it converged when in fact the estimates for the confounded parameters are meaningless. So you will need to think carefully about the model you are fitting and consider whether there are multiple parameters measuring the same thing (for example, 2 intercept parameters).
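A toy illustration of this, not taken from the stack loss example: the sketch below fits a deliberately confounded model with two intercepts, `a1 + a2`, to simulated data using optim(). The data `y`, the function `sse` and the starting values are all made up for this illustration.

```r
# Toy data (simulated, an assumption for illustration): a single mean level
set.seed(1)
y = rnorm(50, mean = 3, sd = 1)

# Sum of squared errors for a model with two confounded intercepts:
# only p[1] + p[2] affects the fit, not p[1] and p[2] separately
sse = function(p) sum((y - (p[1] + p[2]))^2)

# Fit from two different starting conditions
fit1 = optim(c(0, 0), sse, method = "BFGS")
fit2 = optim(c(10, -5), sse, method = "BFGS")

est1 = fit1$par
est2 = fit2$par

# Both runs can report convergence (code 0)
c(fit1$convergence, fit2$convergence)

# Compare the individual estimates and their sums across the two runs
rbind(start_0_0 = est1, start_10_m5 = est2)
c(sum1 = sum(est1), sum2 = sum(est2), mean_y = mean(y))
```

Comparing the two runs shows what to look for: the sum of the two intercepts agrees across starting values and matches the sample mean, while the individual intercepts depend on where the optimizer started. Refitting from several starting conditions is a cheap check for this kind of confounding.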