3 Linear Regression: Estimation and Distribution Theory
3.1
LEAST SQUARES ESTIMATION
Let Y be a random variable that fluctuates about an unknown parameter 1}i that is, Y = r} + c, where c is the fluctuation or error. For example, c may be a "natural" fluctuation inherent in the experiment which gives rise to 1}, or it may represent the error in measuring r}, so that r} is the true response and Y is the observed response. As noted in Chapter 1, our focus is on linear models, so we assume that r} can be expressed in the form r}
=
(30
+ (31Xl + ... + (3p-1X p - l ,
where the explanatory variables Xl,X2, ••• ,Xp - l are known constants (e.g., experimental variables that are controlled by the experimenter and are measured with negligible error), and the (3j (j = O,I, ... ,p - 1) are unknown parameters to be estimated. If the Xj are varied and n values, Y1 , Y2 , ... , Yn , of Yare observed, then
(i = 1,2, . .. ,n), where have
Xij
(3.1)
is'the ith value of Xj. Writing these n equations in matrix form, we
Y1 Y2
XlO
Xu
X12
Xl,p-l
(30
Cl
X20
X2l
X22
X2,p-1
(31
c2
Yn
]XnO
Xnl
Xn2
Xn,p-l
(3p-l
+
cn
or
Y = X{3+c:,
(3.2) 35