4 Hypothesis Testing
4.1 INTRODUCTION

In this chapter we develop a procedure for testing a linear hypothesis for a linear regression model. To motivate the general theory given below, we consider several examples.

EXAMPLE 4.1 From (1.1) we have the model $\log F = \log c - \beta \log d$, representing the force of gravity between two bodies distance $d$ apart. Setting $Y = \log F$ and $x = -\log d$, we have the usual linear model $Y = \beta_0 + \beta_1 x + \varepsilon$, where an error term $\varepsilon$ has been added to allow for uncontrolled fluctuations in the experiment. The inverse square law states that $\beta = 2$, and we can test this by taking $n$ pairs of observations $(x_i, Y_i)$ and seeing if the least squares line has a slope close enough to 2, given the variability in the data. □

Testing whether a particular $\beta$ in a regression model takes a value other than zero is not common; such tests generally arise in models constructed from some underlying theory rather than from empirical considerations.

EXAMPLE 4.2 From (1.2) we have the following model for comparing two straight lines:
\[
E[Y] = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3,
\]
where $\beta_0 = \alpha_1$, $\beta_1 = \gamma_1$, $\beta_2 = \alpha_2 - \alpha_1$, and $\beta_3 = \gamma_2 - \gamma_1$. To test whether the two lines have the same slope, we test $\beta_3 = 0$; to test whether the two lines are identical, we test $\beta_2 = \beta_3 = 0$. Here we are interested in testing whether certain prespecified $\beta_i$ are zero. □
EXAMPLE 4.3 Consider the full model
\[
G: Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1}x_{i,p-1} + \varepsilon_i,
\]
or $Y = X\beta + \varepsilon$. When $p$ is large, we will usually be interested in considering whether we can set some of the $\beta_i$ equal to zero; this is the problem of model selection discussed in Chapter 12. If we test the hypothesis $\beta_r = \beta_{r+1} = \cdots = \beta_{p-1} = 0$, then our model becomes
\[
H: Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{r-1}x_{i,r-1} + \varepsilon_i,
\]
or $Y = X_r\beta_r + \varepsilon$. Here $X_r$ consists of the first $r$ columns of $X$. □
Examples 4.1 and 4.2 are special cases of Example 4.3, whereby we wish to test a submodel $H$ against the full model $G$. The same computer package used to fit $G$ and obtain RSS can also be used to fit $H$ and obtain $\mathrm{RSS}_H = \|Y - X_r\hat\beta_H\|^2$. We can also express the hypothesis constraints in the matrix form
\[
0 =
\begin{pmatrix}
0 & \cdots & 0 & 1 & 0 & \cdots & 0 \\
0 & \cdots & 0 & 0 & 1 & \cdots & 0 \\
\vdots & & \vdots & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & 0 & 0 & \cdots & 1
\end{pmatrix}
\beta = A\beta,
\]
where the rows of $A$ are linearly independent. Combining the three examples above, a general hypothesis can be expressed in the form $H: A\beta = c$. In the next section we develop a likelihood ratio test for testing $H$.
4.2 LIKELIHOOD RATIO TEST
Given the linear model $G: Y = X\beta + \varepsilon$, where $X$ is $n \times p$ of rank $p$ and $\varepsilon \sim N_n(0, \sigma^2 I_n)$, we wish to test the hypothesis $H: A\beta = c$, where $A$ is $q \times p$ of rank $q$. The likelihood function for $G$ is
\[
L(\beta, \sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\left[-\frac{1}{2\sigma^2}\|Y - X\beta\|^2\right].
\]
In Section 3.5 we showed that the maximum likelihood estimates of $\beta$ and $\sigma^2$ are $\hat\beta = (X'X)^{-1}X'Y$, the least squares estimate, and $\hat\sigma^2 = \|Y - X\hat\beta\|^2/n$. The maximum value of the likelihood is given by [see equation (3.18)]
\[
L(\hat\beta, \hat\sigma^2) = (2\pi\hat\sigma^2)^{-n/2}e^{-n/2}.
\]
The next step is to find the maximum likelihood estimates subject to the constraints $H$. This requires the Lagrange multiplier approach of Section 3.8, where we now consider
\[
r = \log L(\beta, \sigma^2) + (\beta'A' - c')\lambda
= \text{constant} - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\|Y - X\beta\|^2 + (\beta'A' - c')\lambda.
\]
Using algebra almost identical to that which led to $\hat\beta_H$ of (3.38), we find that the maximum likelihood estimates are $\hat\beta_H$ and $\hat\sigma_H^2 = \|Y - X\hat\beta_H\|^2/n$, with a maximum of
\[
L(\hat\beta_H, \hat\sigma_H^2) = (2\pi\hat\sigma_H^2)^{-n/2}e^{-n/2}. \tag{4.1}
\]
The likelihood ratio test of $H$ is given by
\[
\Lambda = \frac{L(\hat\beta_H, \hat\sigma_H^2)}{L(\hat\beta, \hat\sigma^2)} = \left(\frac{\hat\sigma^2}{\hat\sigma_H^2}\right)^{n/2}, \tag{4.2}
\]
and according to the likelihood principle, we reject $H$ if $\Lambda$ is too small. Unfortunately, $\Lambda$ is not a convenient test statistic, and we show in the next section that
\[
F = \frac{n-p}{q}\left(\Lambda^{-2/n} - 1\right)
\]
has an $F_{q,n-p}$ distribution when $H$ is true. We then reject $H$ when $F$ is too large.
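The relationship between $\Lambda$ and $F$ is easy to check numerically. The following sketch (not from the original text; the data, seed, and variable names are invented, and numpy and scipy are assumed available) fits the full and restricted models by least squares, forms $\Lambda$ from the two residual sums of squares, and confirms that the monotone transform above reproduces the usual $F$-statistic:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p, q = 50, 4, 2                        # H sets the last q coefficients to zero
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, 0.0, 0.0]) + rng.normal(size=n)   # H is true here

def rss(Xmat, y):
    """Residual sum of squares from a least squares fit."""
    resid = y - Xmat @ np.linalg.lstsq(Xmat, y, rcond=None)[0]
    return resid @ resid

RSS = rss(X, Y)                 # full model G
RSSH = rss(X[:, :p - q], Y)     # restricted model H

Lam = (RSS / RSSH) ** (n / 2)   # likelihood ratio (4.2), since sigma_hat^2 = RSS/n
F_from_Lam = (n - p) / q * (Lam ** (-2 / n) - 1)
F_direct = ((RSSH - RSS) / q) / (RSS / (n - p))

print(F_from_Lam, F_direct)                       # agree up to rounding
print("p-value:", stats.f.sf(F_direct, q, n - p))
```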
4.3 F-TEST

4.3.1 Motivation
Since we want to test $H: A\beta = c$, a natural statistic for testing this is $A\hat\beta - c$; $H$ will be rejected if $A\hat\beta$ is sufficiently different from $c$. However, not every element of $A\hat\beta$ should be treated the same, as they have different precisions. One way of incorporating the precision of each element into a suitable distance measure is to use the quadratic
\[
(A\hat\beta - c)'(\mathrm{Var}[A\hat\beta])^{-1}(A\hat\beta - c),
\]
where $\mathrm{Var}[A\hat\beta] = \sigma^2 A(X'X)^{-1}A'$. If we estimate $\sigma^2$ by its unbiased estimate $S^2 = \mathrm{RSS}/(n-p)$, we arrive at
\[
(A\hat\beta - c)'[A(X'X)^{-1}A']^{-1}(A\hat\beta - c)/S^2.
\]
We will now derive a test statistic which is a constant times this quadratic measure.
4.3.2 Derivation
Before we derive our main theorem, we recall some notation. We have
\[
\mathrm{RSS} = \|Y - X\hat\beta\|^2 = \|Y - \hat Y\|^2
\]
and $\mathrm{RSS}_H = \|Y - X\hat\beta_H\|^2 = \|Y - \hat Y_H\|^2$, where, from (3.38),
\[
\hat\beta_H = \hat\beta - (X'X)^{-1}A'[A(X'X)^{-1}A']^{-1}(A\hat\beta - c). \tag{4.3}
\]
Here $\mathrm{RSS}_H$ is the minimum value of $\varepsilon'\varepsilon$ subject to $A\beta = c$. An $F$-statistic for testing $H$ is described in the following theorem.

THEOREM 4.1

(i) $\mathrm{RSS}_H - \mathrm{RSS} = \|\hat Y - \hat Y_H\|^2 = (A\hat\beta - c)'[A(X'X)^{-1}A']^{-1}(A\hat\beta - c)$.

(ii) $E[\mathrm{RSS}_H - \mathrm{RSS}] = \sigma^2 q + (A\beta - c)'[A(X'X)^{-1}A']^{-1}(A\beta - c) = \sigma^2 q + (\mathrm{RSS}_H - \mathrm{RSS})\big|_{Y = E[Y]}$.

(iii) When $H$ is true,
\[
F = \frac{(\mathrm{RSS}_H - \mathrm{RSS})/q}{\mathrm{RSS}/(n-p)} = \frac{(A\hat\beta - c)'[A(X'X)^{-1}A']^{-1}(A\hat\beta - c)}{qS^2}
\]
is distributed as $F_{q,n-p}$ (the $F$-distribution with $q$ and $n-p$ degrees of freedom, respectively).

(iv) When $c = 0$, $F$ can be expressed in the form
\[
F = \frac{n-p}{q}\,\frac{Y'(P - P_H)Y}{Y'(I_n - P)Y},
\]
where $P_H$ is symmetric and idempotent, and $P_H P = P P_H = P_H$.

Proof. (i) From (3.43) and (3.44) in Section 3.8.1 we have
\[
\mathrm{RSS}_H - \mathrm{RSS} = \|\hat Y - \hat Y_H\|^2 = (\hat\beta - \hat\beta_H)'X'X(\hat\beta - \hat\beta_H),
\]
and substituting for $\hat\beta - \hat\beta_H$ using equation (4.3) leads to the required result.

(ii) The rows of $A$ are linearly independent and $\hat\beta \sim N_p(\beta, \sigma^2(X'X)^{-1})$, so from Theorem 2.2 in Section 2.2 we get $A\hat\beta \sim N_q(A\beta, \sigma^2 A(X'X)^{-1}A')$. Let $Z = A\hat\beta - c$ and $B = A(X'X)^{-1}A'$; then $E[Z] = A\beta - c$ and
\[
\mathrm{Var}[Z] = \mathrm{Var}[A\hat\beta] = \sigma^2 B.
\]
Hence, using Theorem 1.5 in Section 1.5,
\[
E[\mathrm{RSS}_H - \mathrm{RSS}] = E[Z'B^{-1}Z] \quad \text{[by (i)]}
= \mathrm{tr}(\sigma^2 B^{-1}B) + (A\beta - c)'B^{-1}(A\beta - c)
= \sigma^2 q + (A\beta - c)'B^{-1}(A\beta - c). \tag{4.4}
\]
(iii) From (i), $\mathrm{RSS}_H - \mathrm{RSS}$ is a continuous function of $\hat\beta$ and is therefore independent of RSS [by Theorem 3.5(iii) in Section 3.4 and Example 1.11 in Section 1.5]. Also, when $H$ is true, $A\hat\beta \sim N_q(c, \sigma^2 A(X'X)^{-1}A')$, so that by Theorem 2.9,
\[
\frac{\mathrm{RSS}_H - \mathrm{RSS}}{\sigma^2} = (A\hat\beta - c)'(\mathrm{Var}[A\hat\beta])^{-1}(A\hat\beta - c)
\]
is $\chi^2_q$. Finally, since $\mathrm{RSS}/\sigma^2 \sim \chi^2_{n-p}$ [Theorem 3.5(iv)], we have that
\[
F = \frac{(\mathrm{RSS}_H - \mathrm{RSS})/(\sigma^2 q)}{\mathrm{RSS}/\{\sigma^2(n-p)\}}
\]
is of the form $[\chi^2_q/q]/[\chi^2_{n-p}/(n-p)]$, with independent numerator and denominator, when $H$ is true. Hence $F \sim F_{q,n-p}$ when $H$ is true.

(iv) Using equation (4.3) with $c = 0$, we have
\[
\hat Y_H = X\hat\beta_H
= \{X(X'X)^{-1}X' - X(X'X)^{-1}A'[A(X'X)^{-1}A']^{-1}A(X'X)^{-1}X'\}Y \tag{4.5}
\]
\[
= (P - P_1)Y = P_H Y, \tag{4.6}
\]
say, where $P_H$ is symmetric. Multiplying the matrices together and canceling matrices with their inverses where possible, we find that $P_1$ is symmetric and idempotent and $P_1 P = P P_1 = P_1$. Hence
\[
P_H^2 = P^2 - P_1 P - P P_1 + P_1^2 = P - 2P_1 + P_1 = P - P_1 = P_H \tag{4.7}
\]
and
\[
P_H P = (P - P_1)P = P - P_1 = P_H, \tag{4.8}
\]
and taking transposes, $P P_H = P_H$. To complete the proof, we recall that $\mathrm{RSS} = Y'(I_n - P)Y$ and, in a similar fashion, obtain
\[
\|Y - X\hat\beta_H\|^2 = Y'(I_n - P_H)^2Y = Y'(I_n - P_H)Y.
\]
Thus $\mathrm{RSS}_H - \mathrm{RSS} = Y'(P - P_H)Y$. □

We note that if $S_H^2 = (\mathrm{RSS}_H - \mathrm{RSS})/q$, then from Theorem 4.1(ii),
\[
E[S_H^2] = \sigma^2 + \frac{(A\beta - c)'[A(X'X)^{-1}A']^{-1}(A\beta - c)}{q} = \sigma^2 + \delta, \quad \text{say}, \tag{4.9}
\]
102
HYPOTHESIS TESTING
where $\delta \ge 0$ [since $A(X'X)^{-1}A' = \mathrm{Var}[A\hat\beta]/\sigma^2$ is positive-definite]. Also, $E[S^2] = \sigma^2$ (Theorem 3.3, Section 3.3). When $H$ is true, $\delta = 0$ and $S_H^2$ and $S^2$ are both unbiased estimates of $\sigma^2$, so that $F = S_H^2/S^2 \approx 1$. When $H$ is false, $\delta > 0$ and $E[S_H^2] > E[S^2]$, so that
\[
E[F] = E[S_H^2]\,E\left[\frac{1}{S^2}\right] > E[S_H^2]/E[S^2] > 1
\]
(by the independence of $S_H^2$ and $S^2$, and A.13.3). Thus $F$ gives some indication as to the "true state of affairs"; $H$ is rejected if $F$ is significantly large.

When $q > 2$ it is usually more convenient to obtain RSS and $\mathrm{RSS}_H$ by finding the unrestricted and restricted minimum values of $\varepsilon'\varepsilon$ directly. However, if $q \le 2$, $F$ can usually be found most readily by applying the general matrix theory above; the matrix $[A(X'X)^{-1}A']$ to be inverted is only of order one or two. It can also be found directly using the fact that $A(X'X)^{-1}A' = \mathrm{Var}[A\hat\beta]/\sigma^2$. Examples are given in Section 4.3.3. It should be noted that since $\mathrm{RSS}_H$ is unique, it does not matter what method we use for obtaining it. We could, for example, use the constraints $A\beta = c$ to eliminate some of the $\beta_j$ and then minimize $\varepsilon'\varepsilon$ with respect to the remaining $\beta_j$'s.
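The computational remark above can be illustrated directly. In the sketch below (invented data; the single constraint $H: \beta_1 = \beta_2$ is purely hypothetical), $F$ is computed twice: once from the quadratic form of Theorem 4.1(iii), which requires inverting only the $q \times q$ matrix $A(X'X)^{-1}A'$, and once by forming the restricted estimate (4.3) and differencing the two residual sums of squares:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, 2.0]) + rng.normal(size=n)

A = np.array([[0.0, 1.0, -1.0]])      # H: beta_1 - beta_2 = 0, so q = 1
c = np.array([0.0])
q = A.shape[0]

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
RSS = np.sum((Y - X @ beta_hat) ** 2)
S2 = RSS / (n - p)

B = A @ XtX_inv @ A.T                 # q x q; only this needs inverting
d = A @ beta_hat - c
F1 = (d @ np.linalg.solve(B, d)) / (q * S2)                 # Theorem 4.1(iii)

beta_H = beta_hat - XtX_inv @ A.T @ np.linalg.solve(B, d)   # (4.3)
RSSH = np.sum((Y - X @ beta_H) ** 2)
F2 = ((RSSH - RSS) / q) / S2          # direct route via RSS_H - RSS

print(F1, F2)    # equal, as Theorem 4.1(i) guarantees
```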
Part (iv) of Theorem 4.1 highlights the geometry underlying the $F$-test. This geometry can be used to extend the theory to the less-than-full-rank case (cf. Theorem 4.3 in Section 4.7). From $\hat\sigma_H^2 = \mathrm{RSS}_H/n$ and $\hat\sigma^2 = \mathrm{RSS}/n$ we see that
\[
F = \frac{n-p}{q}\cdot\frac{\hat\sigma_H^2 - \hat\sigma^2}{\hat\sigma^2}
= \frac{n-p}{q}\left(\frac{\hat\sigma_H^2}{\hat\sigma^2} - 1\right)
= \frac{n-p}{q}\left(\Lambda^{-2/n} - 1\right),
\]
where $\Lambda$ is the likelihood ratio test statistic (4.2).

EXERCISES 4a
1. Prove that $\mathrm{RSS}_H - \mathrm{RSS} \ge 0$.

2. If $H: A\beta = c$ is true, show that $F$ can be expressed in the form
\[
F = \frac{n-p}{q} \cdot \frac{\varepsilon'(P - P_H)\varepsilon}{\varepsilon'(I_n - P)\varepsilon}.
\]
3. If $\hat\lambda_H$ is the least squares estimate of the Lagrange multiplier associated with the constraints $A\beta = c$ (cf. Section 3.8), show that
\[
\mathrm{RSS}_H - \mathrm{RSS} = \sigma^2\hat\lambda_H'(\mathrm{Var}[\hat\lambda_H])^{-1}\hat\lambda_H.
\]
(This idea is used to construct Lagrange multiplier tests.)

4. Suppose that we want to test $A\beta = 0$, where $A$ is $q \times p$ of rank $q$. Assume that the last $q$ columns of $A$, $A_2$ say, are linearly independent, so that $A = (A_1, A_2)$, where $A_2$ is a nonsingular matrix. By expressing $\beta_2$ in terms of $\beta_1$, find a matrix $X_A$ so that under $H$ the linear model becomes $E[Y] = X_A\gamma$. Prove that $X_A$ has full rank.

5. Consider the full-rank model with $X\beta = (X_1, X_2)(\beta_1', \beta_2')'$, where $X_2$ is $n \times q$.

(a) Obtain a test statistic for testing $H: \beta_2 = 0$ in the form of the right-hand side of Theorem 4.1(i). Hint: Use A.9.1.

(b) Find $E[\mathrm{RSS}_H - \mathrm{RSS}]$.
4.3.3 Some Examples
EXAMPLE 4.4 Let
\[
\begin{aligned}
Y_1 &= \alpha_1 + \varepsilon_1, \\
Y_2 &= 2\alpha_1 - \alpha_2 + \varepsilon_2, \\
Y_3 &= \alpha_1 + 2\alpha_2 + \varepsilon_3,
\end{aligned}
\]
where $\varepsilon \sim N_3(0, \sigma^2 I_3)$. We now derive the $F$-statistic for testing $H: \alpha_1 = \alpha_2$. We note first that
\[
\begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 2 & -1 \\ 1 & 2 \end{pmatrix}
\begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix}
+ \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \end{pmatrix},
\]
or $Y = X\beta + \varepsilon$, where $X$ is $3 \times 2$ of rank 2. Also, $H$ is equivalent to
\[
(1, -1)\begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} = 0,
\]
or $A\beta = 0$, where $A$ is $1 \times 2$ of rank 1. Hence the theory above applies with $n = 3$, $p = 2$, and $q = 1$. The next step is to find
\[
X'X = \begin{pmatrix} 1 & 2 & 1 \\ 0 & -1 & 2 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 2 & -1 \\ 1 & 2 \end{pmatrix}
= \begin{pmatrix} 6 & 0 \\ 0 & 5 \end{pmatrix}.
\]
Then
\[
\hat\beta = (X'X)^{-1}X'Y
= \begin{pmatrix} (Y_1 + 2Y_2 + Y_3)/6 \\ (-Y_2 + 2Y_3)/5 \end{pmatrix}
= \begin{pmatrix} \hat\alpha_1 \\ \hat\alpha_2 \end{pmatrix},
\]
and from equation (3.9),
\[
\mathrm{RSS} = Y'Y - \hat\beta'X'X\hat\beta = Y_1^2 + Y_2^2 + Y_3^2 - 6\hat\alpha_1^2 - 5\hat\alpha_2^2.
\]
We have at least two methods of finding the $F$-statistic.

Method 1:
\[
A(X'X)^{-1}A' = (1, -1)\begin{pmatrix} \tfrac{1}{6} & 0 \\ 0 & \tfrac{1}{5} \end{pmatrix}\begin{pmatrix} 1 \\ -1 \end{pmatrix}
= \frac{1}{6} + \frac{1}{5} = \frac{11}{30},
\]
and
\[
F = \frac{(A\hat\beta)'[A(X'X)^{-1}A']^{-1}(A\hat\beta)}{S^2} = \frac{30(\hat\alpha_1 - \hat\alpha_2)^2}{11S^2},
\]
where $S^2 = \mathrm{RSS}/(n-p) = \mathrm{RSS}$. When $H$ is true, $F \sim F_{q,n-p} = F_{1,1}$.

Method 2: Let $\alpha_1 = \alpha_2 = \alpha$. When $H$ is true, we have
\[
\varepsilon'\varepsilon = (Y_1 - \alpha)^2 + (Y_2 - \alpha)^2 + (Y_3 - 3\alpha)^2,
\]
and $\partial\varepsilon'\varepsilon/\partial\alpha = 0$ implies that $\hat\alpha_H = \tfrac{1}{11}(Y_1 + Y_2 + 3Y_3)$. Hence
\[
\mathrm{RSS}_H = (Y_1 - \hat\alpha_H)^2 + (Y_2 - \hat\alpha_H)^2 + (Y_3 - 3\hat\alpha_H)^2
\]
and
\[
F = \frac{\mathrm{RSS}_H - \mathrm{RSS}}{\mathrm{RSS}}. \tag{4.10}
\]
□
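A quick numerical check of the example (with invented observations) confirms that the two methods give the same value of $F$:

```python
import numpy as np

Y = np.array([1.3, 0.9, 2.1])             # invented observations

a1 = (Y[0] + 2 * Y[1] + Y[2]) / 6         # alpha_1 hat, since X'X = diag(6, 5)
a2 = (-Y[1] + 2 * Y[2]) / 5               # alpha_2 hat
RSS = Y @ Y - 6 * a1 ** 2 - 5 * a2 ** 2

# Method 1: A(X'X)^{-1}A' = 11/30 and S^2 = RSS (since n - p = 1)
F1 = 30 * (a1 - a2) ** 2 / (11 * RSS)

# Method 2: under H, E[Y] = (alpha, alpha, 3 alpha)'
aH = (Y[0] + Y[1] + 3 * Y[2]) / 11
RSSH = (Y[0] - aH) ** 2 + (Y[1] - aH) ** 2 + (Y[2] - 3 * aH) ** 2
F2 = (RSSH - RSS) / RSS                   # (4.10)

print(F1, F2)                             # identical
```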
EXAMPLE 4.5 Let $U_1, \ldots, U_{n_1}$ be sampled independently from $N(\mu_1, \sigma^2)$, and let $V_1, \ldots, V_{n_2}$ be sampled independently from $N(\mu_2, \sigma^2)$. We now derive a test statistic for $H: \mu_1 = \mu_2$. Writing
\[
U_i = \mu_1 + \varepsilon_i \quad (i = 1, 2, \ldots, n_1)
\]
and
\[
V_j = \mu_2 + \varepsilon_{n_1+j} \quad (j = 1, 2, \ldots, n_2),
\]
we have the matrix representation
\[
\begin{pmatrix} U_1 \\ U_2 \\ \vdots \\ U_{n_1} \\ V_1 \\ V_2 \\ \vdots \\ V_{n_2} \end{pmatrix}
=
\begin{pmatrix} 1 & 0 \\ 1 & 0 \\ \vdots & \vdots \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \\ \vdots & \vdots \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} + \varepsilon, \tag{4.11}
\]
where $n = n_1 + n_2$. Thus our model is of the form $Y = X\beta + \varepsilon$, where $X$ is $n \times 2$ of rank 2 and $\varepsilon \sim N_n(0, \sigma^2 I_n)$. Also, as in Example 4.4, $H$ takes the form $A\beta = 0$ with $A = (1, -1)$, so that our general regression theory applies with $p = 2$ and $q = 1$. Now
\[
X'X = \begin{pmatrix} n_1 & 0 \\ 0 & n_2 \end{pmatrix},
\]
so that
\[
\hat\beta = (X'X)^{-1}X'Y = \begin{pmatrix} \bar U \\ \bar V \end{pmatrix}
= \begin{pmatrix} \hat\mu_1 \\ \hat\mu_2 \end{pmatrix},
\qquad
A\hat\beta = \hat\mu_1 - \hat\mu_2 = \bar U - \bar V,
\]
and
\[
\mathrm{RSS} = Y'Y - \hat\beta'X'X\hat\beta
= \sum_i U_i^2 + \sum_j V_j^2 - n_1\bar U^2 - n_2\bar V^2
= \sum_i (U_i - \bar U)^2 + \sum_j (V_j - \bar V)^2.
\]
Also,
\[
A(X'X)^{-1}A' = \frac{1}{n_1} + \frac{1}{n_2},
\]
so that the $F$-statistic for $H$ is
\[
F = \frac{(A\hat\beta)'[A(X'X)^{-1}A']^{-1}(A\hat\beta)}{qS^2}
= \frac{(\bar U - \bar V)^2}{S^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}, \tag{4.12}
\]
where $S^2 = \mathrm{RSS}/(n-p) = \mathrm{RSS}/(n_1 + n_2 - 2)$. When $H$ is true, $F \sim F_{1,n_1+n_2-2}$. Since, distribution-wise, we have the identity $F_{1,k} = t_k^2$, the $F$-statistic above is the square of the usual $t$-statistic for testing the difference of two normal means (assuming equal variances). □
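The identity $F_{1,k} = t_k^2$ can be verified numerically. The sketch below (invented samples; scipy assumed available) compares (4.12) with the square of the standard equal-variance two-sample $t$-statistic:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
U = rng.normal(0.0, 1.0, size=12)     # invented sample from N(mu_1, sigma^2)
V = rng.normal(0.5, 1.0, size=15)     # invented sample from N(mu_2, sigma^2)
n1, n2 = len(U), len(V)

RSS = np.sum((U - U.mean()) ** 2) + np.sum((V - V.mean()) ** 2)
S2 = RSS / (n1 + n2 - 2)                                     # pooled estimate
F = (U.mean() - V.mean()) ** 2 / (S2 * (1 / n1 + 1 / n2))    # (4.12)

t_stat, _ = stats.ttest_ind(U, V)     # equal-variance two-sample t-test
print(F, t_stat ** 2)                 # F equals t squared
```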
EXAMPLE 4.6 Given the general linear model
\[
Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1}x_{i,p-1} + \varepsilon_i \quad (i = 1, 2, \ldots, n),
\]
we can obtain a test statistic for $H: \beta_j = c$, where $j > 0$. We first need the following partition:
\[
(X'X)^{-1} = \begin{pmatrix} l & m' \\ m & D \end{pmatrix},
\]
where $l$ is $1 \times 1$. Now $H$ is of the form $a'\beta = c$, where $a'$ is the row vector with unity in the $(j+1)$th position and zeros elsewhere. Therefore, using the general matrix theory, $a'(X'X)^{-1}a = d_{jj}$ (the $j$th diagonal element of $D$), $a'\hat\beta - c = \hat\beta_j - c$, and the $F$-statistic is
\[
F = \frac{(\hat\beta_j - c)^2}{S^2 d_{jj}}, \tag{4.13}
\]
which has the $F_{1,n-p}$ distribution when $H$ is true. As in Example 4.5, $F$ is again the square of the usual $t$-statistic.

The matrix $D$ can be identified using the method of A.9 for inverting a partitioned symmetric matrix. Let $1_n$ be an $n \times 1$ column vector of 1's and let $\bar{x}' = (\bar{x}_{\cdot 1}, \bar{x}_{\cdot 2}, \ldots, \bar{x}_{\cdot,p-1})$. Then $X = (1_n, X_1)$,
\[
X'X = \begin{pmatrix} n & n\bar{x}' \\ n\bar{x} & X_1'X_1 \end{pmatrix},
\]
and by A.9.1,
\[
(X'X)^{-1} = \begin{pmatrix} \dfrac{1}{n} + \bar{x}'V^{-1}\bar{x} & -\bar{x}'V^{-1} \\ -V^{-1}\bar{x} & V^{-1} \end{pmatrix}, \tag{4.14}
\]
where $V = (v_{jk}) = X_1'X_1 - n\bar{x}\bar{x}'$ and
\[
v_{jk} = \sum_i (x_{ij} - \bar{x}_{\cdot j})(x_{ik} - \bar{x}_{\cdot k}). \tag{4.15}
\]
Thus $D$ is the inverse of $V$, where $V$ is the matrix of corrected sums of squares and products of the $x$'s. In the notation of Section 3.11, $V = \tilde{X}'\tilde{X}$. Similar examples are considered in Section 9.7. □
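The identification of $D$ is easy to confirm numerically: in the sketch below (invented regressors), the lower-right block of $(X'X)^{-1}$ coincides with the inverse of the corrected sums-of-squares matrix $V$ of (4.15), as (4.14) asserts:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 30, 4
X1 = rng.normal(size=(n, p - 1))             # regressors without the constant
X = np.column_stack([np.ones(n), X1])

D = np.linalg.inv(X.T @ X)[1:, 1:]           # lower-right block of (X'X)^{-1}
Xc = X1 - X1.mean(axis=0)                    # centered regressors
V = Xc.T @ Xc                                # corrected sums of squares and products (4.15)

print(np.allclose(D, np.linalg.inv(V)))      # True: D = V^{-1}, confirming (4.14)
```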
EXAMPLE 4.7 Suppose that in Example 4.6 we want to test $H: a'\beta = c$. Then $q = 1$ and
\[
F = \frac{(a'\beta - c)^2}{S^2 a'(X'X)^{-1}a}\bigg|_{\beta = \hat\beta} = \frac{(a'\hat\beta - c)^2}{S^2 a'(X'X)^{-1}a},
\]
which is distributed as $F_{1,n-p}$ when $H$ is true. Again this is the square of the usual $t$-statistic, which we can also derive directly as follows. By Theorem 2.2, $a'\hat\beta \sim N(a'\beta, \sigma^2 a'(X'X)^{-1}a)$, so that
\[
U = \frac{a'\hat\beta - a'\beta}{\sigma\{a'(X'X)^{-1}a\}^{1/2}} \sim N(0, 1).
\]
Also, by Theorem 3.5 (Section 3.4), $V = (n-p)S^2/\sigma^2 \sim \chi^2_{n-p}$, and since $S^2$ is statistically independent of $\hat\beta$, $V$ is independent of $U$. Hence
\[
T = \frac{U}{\sqrt{V/(n-p)}} = \frac{a'\hat\beta - a'\beta}{S\{a'(X'X)^{-1}a\}^{1/2}} \tag{4.16}
\]
has the $t_{n-p}$ distribution. To test $H: a'\beta = c$ we set $a'\beta$ equal to $c$ in $T$ and reject $H$ at the $\alpha$ level of significance if $|T| > t_{n-p}^{(1/2)\alpha}$; here $t_{n-p}^{(1/2)\alpha}$ is the upper $\alpha/2$ point of the $t_{n-p}$ distribution; that is, $\mathrm{pr}(T > t_{n-p}^{(1/2)\alpha}) = \alpha/2$. Alternatively, we can construct a $100(1-\alpha)\%$ confidence interval for $a'\beta$, namely,
\[
a'\hat\beta \pm t_{n-p}^{(1/2)\alpha} S\{a'(X'X)^{-1}a\}^{1/2}, \tag{4.17}
\]
or, since $S^2\{a'(X'X)^{-1}a\}$ is an unbiased estimate of $\sigma^2 a'(X'X)^{-1}a$ (the variance of $a'\hat\beta$),
\[
a'\hat\beta \pm t_{n-p}^{(1/2)\alpha}\,\hat\sigma_{a'\hat\beta}, \quad \text{say}, \tag{4.18}
\]
and see if the interval above contains $c$. □
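A sketch of this $t$-test and the companion confidence interval (4.17) is given below; the data are invented and the particular choice of $a$ and $c$ is purely hypothetical:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, p = 25, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)

a = np.array([0.0, 1.0, 1.0])        # hypothetical: test a'beta = beta_1 + beta_2 = c
c, alpha = 0.0, 0.05

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
S2 = np.sum((Y - X @ beta_hat) ** 2) / (n - p)
se = np.sqrt(S2 * (a @ XtX_inv @ a))           # estimated standard error of a'beta_hat

T = (a @ beta_hat - c) / se                    # (4.16) with a'beta set equal to c
tcrit = stats.t.ppf(1 - alpha / 2, n - p)      # upper alpha/2 point of t_{n-p}
print("|T| =", abs(T), " reject H:", abs(T) > tcrit)
print("95% CI:", (a @ beta_hat - tcrit * se, a @ beta_hat + tcrit * se))   # (4.17)
```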
4.3.4 The Straight Line
Let $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ ($i = 1, 2, \ldots, n$), and suppose that we wish to test $H: \beta_1 = c$. Then $X = (1_n, x)$,
\[
X'X = \begin{pmatrix} n & n\bar{x} \\ n\bar{x} & \sum x_i^2 \end{pmatrix},
\qquad
(X'X)^{-1} = \frac{1}{n\sum(x_i - \bar{x})^2}\begin{pmatrix} \sum x_i^2 & -n\bar{x} \\ -n\bar{x} & n \end{pmatrix},
\]
and
\[
X'Y = \begin{pmatrix} \sum_i Y_i \\ \sum_i x_i Y_i \end{pmatrix}.
\]
Also, from $\hat\beta = (X'X)^{-1}X'Y$ we have, after some simplification,
\[
\hat\beta_1 = \frac{\sum_i Y_i(x_i - \bar{x})}{\sum_i (x_i - \bar{x})^2}
= \frac{\sum_i (Y_i - \bar{Y})(x_i - \bar{x})}{\sum_i (x_i - \bar{x})^2},
\qquad
\hat\beta_0 = \bar{Y} - \hat\beta_1\bar{x},
\]
and
\[
\hat{Y}_i = \hat\beta_0 + \hat\beta_1 x_i = \bar{Y} + \hat\beta_1(x_i - \bar{x}).
\]
(Actually, $\hat\beta_0$ and $\hat\beta_1$ can be obtained more readily by differentiating $\varepsilon'\varepsilon$ with respect to $\beta_0$ and $\beta_1$.) Finally, from Example 4.6 with $p = 2$, the $F$-statistic for testing $H$ is given by
\[
F = \frac{(\hat\beta_1 - c)^2}{S^2 d_{11}} = \frac{(\hat\beta_1 - c)^2\sum_i (x_i - \bar{x})^2}{S^2}, \tag{4.19}
\]
where
\[
S^2 = \frac{\mathrm{RSS}}{n-2} \tag{4.20}
\]
and
\[
\mathrm{RSS} = \sum_i (Y_i - \hat{Y}_i)^2
= \sum_i \left[(Y_i - \bar{Y}) - \hat\beta_1(x_i - \bar{x})\right]^2
= \sum_i (Y_i - \bar{Y})^2 - \hat\beta_1^2\sum_i (x_i - \bar{x})^2. \tag{4.21}
\]
We note from (4.21) that
\[
\sum_i (Y_i - \bar{Y})^2 \ge \hat\beta_1^2\sum_i (x_i - \bar{x})^2 \tag{4.22}
\]
and that
\[
\mathrm{RSS} = \sum_i (Y_i - \bar{Y})^2(1 - r^2), \tag{4.23}
\]
where
\[
r^2 = \frac{\hat\beta_1^2\sum_i (x_i - \bar{x})^2}{\sum_i (Y_i - \bar{Y})^2}
= \frac{\left[\sum_i (Y_i - \bar{Y})(x_i - \bar{x})\right]^2}{\sum_i (Y_i - \bar{Y})^2\sum_i (x_i - \bar{x})^2} \tag{4.24}
\]
is the square of the sample correlation between $Y$ and $x$. Also, $r$ is a measure of the degree of linearity between $Y$ and $x$ since, from (4.23),
\[
\mathrm{RSS} = \sum_i (Y_i - \hat{Y}_i)^2 = (1 - r^2)\sum_i (Y_i - \bar{Y})^2, \tag{4.25}
\]
so that the larger the value of $r^2$, the smaller the RSS and the better the fit of the estimated regression line to the observations. Although $1 - r^2$ is a useful measure of fit, the correlation $r$ itself is of doubtful use in making inferences. Tukey [1954] makes the provocative but not unreasonable statement that "correlation coefficients are justified in two and only two circumstances, when they are regression coefficients, or when the measurement of one or both variables on a determinate scale is hopeless." The first part of his statement refers to the situation where $X$ and $Y$ have a bivariate normal distribution; there we have (Example 2.9)
\[
E[Y \mid X = x] = \mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X) = \beta_0 + \beta_1 x,
\]
and when $\sigma_X^2 = \sigma_Y^2$, $\beta_1 = \rho$. One area where correlation coefficients are widely used, and determinate scales seem hopeless, is the social sciences. Here the measuring scales are often completely arbitrary, so that observations are essentially only ranks. A helpful discussion of the question of correlation versus regression is given by Warren [1971].

We note that when $c = 0$, the $F$-statistic (4.19) can also be expressed in terms of $r^2$. From equation (4.25) we have
\[
F = \frac{\hat\beta_1^2\sum_i (x_i - \bar{x})^2\,(n-2)}{(1 - r^2)\sum_i (Y_i - \bar{Y})^2} = \frac{r^2(n-2)}{1 - r^2}.
\]
The usual $t$-statistic for testing $\beta_1 = 0$ can also be expressed in the same form, namely,
\[
T = \frac{r}{\sqrt{(1 - r^2)/(n-2)}}. \tag{4.26}
\]
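The equivalence of (4.19) (with $c = 0$) and the $r^2$ form can be confirmed numerically (invented data):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20
x = rng.uniform(0, 10, size=n)
Y = 1.0 + 0.3 * x + rng.normal(size=n)

sxx = np.sum((x - x.mean()) ** 2)
sxy = np.sum((Y - Y.mean()) * (x - x.mean()))
syy = np.sum((Y - Y.mean()) ** 2)

b1 = sxy / sxx                          # least squares slope
RSS = syy - b1 ** 2 * sxx               # (4.21)
F = b1 ** 2 * sxx / (RSS / (n - 2))     # (4.19) with c = 0

r2 = sxy ** 2 / (sxx * syy)             # squared sample correlation (4.24)
print(F, r2 * (n - 2) / (1 - r2))       # the two expressions agree
```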
EXERCISES 4b

1. Let $Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1}x_{i,p-1} + \varepsilon_i$, $i = 1, 2, \ldots, n$, where the $\varepsilon_i$ are independent $N(0, \sigma^2)$. Prove that the $F$-statistic for testing the hypothesis $H: \beta_q = \beta_{q+1} = \cdots = \beta_{p-1} = 0$ ($0 < q < p-1$) is unchanged if a constant, $c$ say, is subtracted from each $Y_i$.

2. Let $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ ($i = 1, 2, \ldots, n$), where the $\varepsilon_i$ are independent $N(0, \sigma^2)$.

(a) Show that the correlation coefficient of $\hat\beta_0$ and $\hat\beta_1$ is $-n\bar{x}/(n\sum_i x_i^2)^{1/2}$.
(b) Derive an $F$-statistic for testing $H: \beta_0 = 0$.

3. Given that $\bar{x} = 0$, derive an $F$-statistic for testing the hypothesis $H: \beta_0 = \beta_1$ in Exercise 2 above. Show that it is equivalent to a certain $t$-test.

4. Let
\[
\begin{aligned}
Y_1 &= \theta_1 + \theta_2 + \varepsilon_1, \\
Y_2 &= 2\theta_2 + \varepsilon_2, \\
Y_3 &= -\theta_1 + \theta_2 + \varepsilon_3,
\end{aligned}
\]
where the $\varepsilon_i$ ($i = 1, 2, 3$) are independent $N(0, \sigma^2)$. Derive an $F$-statistic for testing the hypothesis $H: \theta_1 = 2\theta_2$.

5. Given $Y = \theta + \varepsilon$, where $\varepsilon \sim N_4(0, \sigma^2 I_4)$ and $\theta_1 + \theta_2 + \theta_3 + \theta_4 = 0$, show that the $F$-statistic for testing $H: \theta_1 = \theta_3$ is $(Y_1 - Y_3)^2/(8\bar{Y}^2)$.

4.4 MULTIPLE CORRELATION COEFFICIENT
For a straight line, from equation (4.25) we have
\[
r^2 = 1 - \frac{\mathrm{RSS}}{\sum_i (Y_i - \bar{Y})^2}.
\]
Thus, $r^2$ is a measure of how well the least squares line fits the data. Noting that $\hat{Y}_i = \hat\beta_0 + \hat\beta_1 x_i = \bar{Y} + \hat\beta_1(x_i - \bar{x})$, we have
\[
r = \frac{\hat\beta_1\left[\sum_i (x_i - \bar{x})^2\right]^{1/2}}{\left[\sum_i (Y_i - \bar{Y})^2\right]^{1/2}}
= \frac{\sum_i (Y_i - \bar{Y})(\hat{Y}_i - \bar{Y})}{\left[\sum_i (Y_i - \bar{Y})^2\sum_i (\hat{Y}_i - \bar{Y})^2\right]^{1/2}},
\]
which is the correlation coefficient of the pairs $(Y_i, \hat{Y}_i)$. To demonstrate this, we note that $\sum_i (Y_i - \hat{Y}_i) = \sum_i [Y_i - \bar{Y} - \hat\beta_1(x_i - \bar{x})] = 0$, so that the mean of the $\hat{Y}_i$, $\bar{\hat{Y}}$ say, is the same as $\bar{Y}$.

This reformulation of $r$ suggests how we might generalize this measure from a straight line to a general linear model.
We can now define the sample multiple correlation coefficient $R$ as the correlation coefficient of the pairs $(Y_i, \hat{Y}_i)$, namely,
\[
R = \frac{\sum_i (Y_i - \bar{Y})(\hat{Y}_i - \bar{\hat{Y}})}{\left[\sum_i (Y_i - \bar{Y})^2\sum_i (\hat{Y}_i - \bar{\hat{Y}})^2\right]^{1/2}}. \tag{4.27}
\]
The quantity $R^2$ is commonly called the coefficient of determination. We now prove a useful theorem that generalizes equations (4.22) and (4.24).
THEOREM 4.2

(i) $\sum_i (Y_i - \bar{Y})^2 = \sum_i (Y_i - \hat{Y}_i)^2 + \sum_i (\hat{Y}_i - \bar{Y})^2$.

(ii)
\[
R^2 = \frac{\sum_i (\hat{Y}_i - \bar{Y})^2}{\sum_i (Y_i - \bar{Y})^2} = 1 - \frac{\mathrm{RSS}}{\sum_i (Y_i - \bar{Y})^2}.
\]

Proof. (i) $\hat{Y} = PY$, so that
\[
\hat{Y}'\hat{Y} = Y'P^2Y = Y'PY = Y'\hat{Y}. \tag{4.28}
\]
Also, by differentiating $\sum_i (Y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_{p-1}x_{i,p-1})^2$ with respect to $\beta_0$, we have one of the normal equations for $\hat\beta$, namely, $\sum_i (Y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \cdots - \hat\beta_{p-1}x_{i,p-1}) = 0$, or
\[
\sum_i (Y_i - \hat{Y}_i) = 0. \tag{4.29}
\]
Hence
\[
\sum_i (Y_i - \bar{Y})^2 = \sum_i (Y_i - \hat{Y}_i + \hat{Y}_i - \bar{Y})^2
= \sum_i (Y_i - \hat{Y}_i)^2 + \sum_i (\hat{Y}_i - \bar{Y})^2,
\]
since
\[
\sum_i (Y_i - \hat{Y}_i)(\hat{Y}_i - \bar{Y})
= \sum_i (Y_i - \hat{Y}_i)\hat{Y}_i \quad \text{[by equation (4.29)]}
= (Y - \hat{Y})'\hat{Y} = 0 \quad \text{[by equation (4.28)]}.
\]
(ii) From equation (4.29), we get $\bar{\hat{Y}} = \bar{Y}$, so that
\[
\sum_i (Y_i - \bar{Y})(\hat{Y}_i - \bar{Y})
= \sum_i (Y_i - \hat{Y}_i + \hat{Y}_i - \bar{Y})(\hat{Y}_i - \bar{Y})
= \sum_i (\hat{Y}_i - \bar{Y})^2,
\]
and the required expression for $R^2$ follows immediately from (4.27). The second expression for $R^2$ follows from (i). □

From the theorem above, we have a generalization of (4.25), namely,
\[
\mathrm{RSS} = \sum_i (Y_i - \bar{Y})^2(1 - R^2), \tag{4.30}
\]
and the greater the value of $R^2$, the closer the fit of the estimated surface to the observed data; if $\hat{Y}_i = Y_i$ we have a perfect fit and $R^2 = 1$; otherwise $R^2 < 1$. When there is just a single $x$-regressor, $R^2 = r^2$. By writing $P = X(X'X)^-X'$, where $(X'X)^-$ is a generalized inverse of $X'X$, we find that the theorem above still holds even when $X$ is not of full rank. Alternatively, we can write $P = X_1(X_1'X_1)^{-1}X_1'$, where $X_1$ is the matrix of linearly independent columns of $X$.
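Theorem 4.2(ii) is easily checked in practice. The sketch below (invented data; the model contains a constant term, as the proof requires) computes $R^2$ both as the squared correlation of the pairs $(Y_i, \hat{Y}_i)$ and as $1 - \mathrm{RSS}/\sum_i (Y_i - \bar{Y})^2$:

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 1.0, -2.0]) + rng.normal(size=n)

Yhat = X @ np.linalg.lstsq(X, Y, rcond=None)[0]
RSS = np.sum((Y - Yhat) ** 2)
syy = np.sum((Y - Y.mean()) ** 2)

R2_def = np.corrcoef(Y, Yhat)[0, 1] ** 2   # squared correlation of (Y_i, Yhat_i), as in (4.27)
R2_thm = 1 - RSS / syy                     # Theorem 4.2(ii)
print(R2_def, R2_thm)                      # equal
```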
EXAMPLE 4.8 Given the linear model $Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1}x_{i,p-1} + \varepsilon_i$ ($i = 1, 2, \ldots, n$), suppose that we wish to test whether or not the regression on the regressor variables is significant; that is, we test $H: \beta_1 = \beta_2 = \cdots = \beta_{p-1} = 0$. Then $H$ takes the form $A\beta = 0$, where $A = (0, I_{p-1})$ is a $(p-1) \times p$ matrix of rank $p-1$, so that the general regression theory applies with $q = p-1$. We therefore find that
\[
\mathrm{RSS}_H = \min_{\beta_0}\sum_i (Y_i - \beta_0)^2 = \sum_i (Y_i - \bar{Y})^2,
\]
and by Theorem 4.2 and (4.30),
\[
F = \frac{(\mathrm{RSS}_H - \mathrm{RSS})/(p-1)}{\mathrm{RSS}/(n-p)}
= \frac{\sum_i (Y_i - \bar{Y})^2 - \sum_i (Y_i - \hat{Y}_i)^2}{\mathrm{RSS}}\cdot\frac{n-p}{p-1}
= \frac{R^2}{1 - R^2}\cdot\frac{n-p}{p-1}, \tag{4.31}
\]
where $F \sim F_{p-1,n-p}$ when $H$ is true.
The statistic $F$ provides a test for "overall" regression, and we reject $H$ if $F > F^\alpha_{p-1,n-p}$, $F^\alpha_{p-1,n-p}$ being the upper $\alpha$ point of the $F_{p-1,n-p}$ distribution. If we reject $H$, we say that there is a significant regression and the $x_{ij}$ values cannot be totally ignored. However, the rejection of $H$ does not mean that the fitted equation $\hat{Y} = x'\hat\beta$ is necessarily adequate, particularly for predictive purposes. Since a large $R^2$ leads to a large $F$-statistic, a working rule suggested by Draper and Smith [1998: p. 247] for model adequacy is that the observed $F$-ratio must be at least four or five times $F^\alpha_{p-1,n-p}$. □
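A sketch of the overall regression test (invented data), computing $F$ from $R^2$ via (4.31) and comparing it with the upper 5% point of $F_{p-1,n-p}$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, p = 40, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 0.8, 0.0, -0.6]) + rng.normal(size=n)

Yhat = X @ np.linalg.lstsq(X, Y, rcond=None)[0]
RSS = np.sum((Y - Yhat) ** 2)
RSSH = np.sum((Y - Y.mean()) ** 2)       # restricted model retains only beta_0
R2 = 1 - RSS / RSSH

F = R2 / (1 - R2) * (n - p) / (p - 1)    # (4.31)
Fcrit = stats.f.ppf(0.95, p - 1, n - p)
print("F =", F, " upper 5% point =", Fcrit, " significant:", F > Fcrit)
```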
= {32 = ... = (3p-l = O.
Find the distribution of R2
and hence prove that E[R2) = P - 1.
n-1
2. For the general linear full-rank regression model, prove that R2 and the F-statistic for testing H : {3j = 0 (j =I 0) are independent of the units in which the }Ii and the Xij are measured. . 3. Given the full-rank model, suppose that we wish to test H : {3j = 0, j =I O. Let Rk be the coefficient of determination for the model with {3j = o. (a) Prove that the F-statistic for testing H is given by
R2 - Rk F =
1- R2
n-p
.
1
(This result shows that F is a test for a significant reduction in R2.) (b) Deduce that R2 can never increase when a {3 coefficient is set equal to zero.
4.5 CANONICAL FORM FOR H
Suppose that we wish to test $H: A\beta = 0$, where $A$ is $q \times p$ of rank $q$, for the full-rank model $Y = X\beta + \varepsilon$. Since $A$ has $q$ linearly independent columns, we can assume without loss of generality (by relabeling the $\beta_j$ if necessary) that these are the last $q$ columns; thus $A = (A_1, A_2)$, where $A_2$ is a $q \times q$ nonsingular matrix. Partitioning $\beta$ in the same way, we have
\[
A\beta = A_1\beta_1 + A_2\beta_2 = 0,
\]
and multiplying through by $A_2^{-1}$ leads to
\[
\beta_2 = -A_2^{-1}A_1\beta_1. \tag{4.32}
\]
This means that under the hypothesis $H$, the regression model takes the "canonical" form
\[
X\beta = (X_1, X_2)\beta = X_1\beta_1 + X_2\beta_2 = (X_1 - X_2A_2^{-1}A_1)\beta_1 = X_H\gamma, \tag{4.33}
\]
say, where $X_H$ is $n \times (p-q)$ of rank $p-q$ and $\gamma = \beta_1$. The matrix $X_H$ has linearly independent columns, since $X_H\gamma = X\beta$ with $\beta' = (\gamma', -(A_2^{-1}A_1\gamma)')$, and $X\beta = 0$ implies that $\beta = 0$ and hence $\gamma = 0$.

By expressing the hypothesized model $H: E[Y] = X_H\gamma$ in the same form as the original model $E[Y] = X\beta$, we see that the same computer package can be used for calculating both RSS and $\mathrm{RSS}_H$, provided, of course, that $X_H$ can be found easily and accurately. If $X_H$ is not readily found, then the numerator of the $F$-statistic for testing $H$ can be computed directly using the method of Section 11.11. We note that $q = \mathrm{rank}(X) - \mathrm{rank}(X_H)$.

One very simple application of the theory above is to test $H: \beta_2 = 0$; $X_H$ is then simply the first $p-q$ columns of $X$. Further applications are given in Section 6.4, Chapter 8, and Section 4.6.
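A sketch of the canonical-form computation (invented data; the hypothesis of equal coefficients is purely illustrative). It builds $X_H = X_1 - X_2A_2^{-1}A_1$ as in (4.33) and confirms that refitting with $X_H$ reproduces the $\mathrm{RSS}_H$ obtained from the constrained estimate (4.3):

```python
import numpy as np

rng = np.random.default_rng(8)
n, p, q = 30, 4, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 0.5, 0.5, 0.5]) + rng.normal(size=n)

# H: beta_1 = beta_2 and beta_2 = beta_3, i.e. A beta = 0 with A = (A1, A2)
A = np.array([[0.0, 1.0, -1.0, 0.0],
              [0.0, 0.0, 1.0, -1.0]])
A1, A2 = A[:, :p - q], A[:, p - q:]          # A2 is q x q and nonsingular

XH = X[:, :p - q] - X[:, p - q:] @ np.linalg.solve(A2, A1)   # (4.33)

def rss(Xmat, y):
    resid = y - Xmat @ np.linalg.lstsq(Xmat, y, rcond=None)[0]
    return resid @ resid

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
d = A @ beta_hat
beta_H = beta_hat - XtX_inv @ A.T @ np.linalg.solve(A @ XtX_inv @ A.T, d)   # (4.3)

print(rss(XH, Y), np.sum((Y - X @ beta_H) ** 2))   # the two RSS_H values agree
```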
EXERCISES 4d

1. Express the hypotheses in Examples 4.4 and 4.5 in canonical form.

2. Suppose that we have $n_1$ observations on $w_1, w_2, \ldots, w_{p-1}$ and $U$, giving the model
\[
U_i = \gamma_0^{(1)} + \gamma_1^{(1)}w_{i1} + \cdots + \gamma_{p-1}^{(1)}w_{i,p-1} + \eta_i \quad (i = 1, 2, \ldots, n_1).
\]
We are now given $n_2$ ($> p$) additional observations which can be expressed in the same way, namely,
\[
U_i = \gamma_0^{(2)} + \gamma_1^{(2)}w_{i1} + \cdots + \gamma_{p-1}^{(2)}w_{i,p-1} + \eta_i \quad (i = n_1 + 1, n_1 + 2, \ldots, n_1 + n_2).
\]
Derive an $F$-statistic for testing the hypothesis $H$ that the additional observations come from the same model.
4.6 GOODNESS-OF-FIT TEST
Suppose that for each set of values taken by the regressors in the model
\[
Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1}x_{i,p-1} + \varepsilon_i \tag{4.34}
\]
we have repeated observations on $Y$, namely,
\[
Y_{ir} = \phi_i + \varepsilon_{ir}, \tag{4.35}
\]
where $E[\varepsilon_{ir}] = 0$, $\mathrm{var}[\varepsilon_{ir}] = \sigma^2$, $r = 1, 2, \ldots, R_i$, and $i = 1, 2, \ldots, n$. We assume that the $R_i$ repetitions $Y_{ir}$ for a particular set $(x_{i1}, \ldots, x_{i,p-1})$ are genuine replications and not just repetitions of the same reading for $Y_i$ in a given experiment. For example, if $p = 2$, $Y$ is yield, and $x_1$ is temperature, then the replicated observations $Y_{ir}$ ($r = 1, 2, \ldots, R_i$) are obtained by running $R_i$ experiments with $x_1 = x_{i1}$ in each experiment, not by running a single experiment with $x_1 = x_{i1}$ and measuring the yield $R_i$ times. Clearly, the latter method would supply information only on the variance of the device for measuring yield, which is just part of the variance $\sigma^2$; our definition of $\sigma^2$ also includes the variation in yield between experiments at the same temperature. However, given genuine replications, it is possible to test whether the model (4.34) is appropriate using the $F$-statistic derived below.

Writing $Y' = (Y_{11}, Y_{12}, \ldots, Y_{1R_1}, \ldots, Y_{n1}, Y_{n2}, \ldots, Y_{nR_n})$, etc., we have $Y = W\phi + \varepsilon$, where
\[
W\phi =
\begin{pmatrix}
1_{R_1} & 0 & \cdots & 0 \\
0 & 1_{R_2} & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1_{R_n}
\end{pmatrix}
\begin{pmatrix} \phi_1 \\ \phi_2 \\ \vdots \\ \phi_n \end{pmatrix}. \tag{4.36}
\]
Defining $N = \sum_i R_i$, we see that $W$ is an $N \times n$ matrix of rank $n$; we also assume that $\varepsilon \sim N_N(0, \sigma^2 I_N)$. Now testing the adequacy of (4.34) is equivalent to testing the hypothesis
\[
H: \phi_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1}x_{i,p-1} \quad (i = 1, 2, \ldots, n),
\]
or $H: \phi = X\beta$, where $X$ is $n \times p$ of rank $p$. We thus have the canonical form (cf. Section 4.5) $E[Y] = WX\beta$. We note in passing that $H$ can be converted into the more familiar constraint-equation form using the following lemma.

LEMMA $\phi \in \mathcal{C}(X)$ if and only if $A\phi = 0$ for some $(n-p) \times n$ matrix $A$ of rank $n-p$.

Proof. Let $P = X(X'X)^{-1}X'$. If $\phi = X\beta$ for some $\beta$, then $(I_n - P)\phi = (I_n - P)X\beta = 0$ [by Theorem 3.1(iii)]. Conversely, if $(I_n - P)\phi = 0$, then $\phi = P\phi = X(X'X)^{-1}X'\phi = X\gamma$, say, which is in $\mathcal{C}(X)$. Hence
$\phi \in \mathcal{C}(X)$ if and only if $(I_n - P)\phi = 0$. By Theorem 3.1(ii) the $n \times n$ matrix $I_n - P$ has rank $n-p$ and therefore has $n-p$ linearly independent rows, which we can take as our required matrix $A$. □

Using the lemma above or the canonical form, we see that the general regression theory applies to $H$, but with $n$, $p$, and $q$ replaced by $N$, $n$, and $n-p$, respectively; hence
\[
F = \frac{(\mathrm{RSS}_H - \mathrm{RSS})/(n-p)}{\mathrm{RSS}/(N-n)}.
\]
Here RSS is found directly by minimizing $\sum_i\sum_r (Y_{ir} - \phi_i)^2$. Thus, differentiating partially with respect to $\phi_i$, we have
\[
\hat\phi_i = \sum_r Y_{ir}/R_i = \bar{Y}_{i\cdot}
\]
and
\[
\mathrm{RSS} = \sum_i\sum_r (Y_{ir} - \bar{Y}_{i\cdot})^2.
\]
To find $\mathrm{RSS}_H$ we minimize $d = \sum_i\sum_r (Y_{ir} - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_{p-1}x_{i,p-1})^2$. Therefore, setting $\partial d/\partial\beta_0 = 0$ and $\partial d/\partial\beta_j = 0$ ($j \neq 0$), we have
\[
\sum_i R_i(\bar{Y}_{i\cdot} - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_{p-1}x_{i,p-1}) = 0 \tag{4.37}
\]
and
\[
\sum_i\sum_r x_{ij}(Y_{ir} - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_{p-1}x_{i,p-1}) = 0 \quad (j = 1, 2, \ldots, p-1),
\]
that is,
\[
\sum_i R_i x_{ij}(\bar{Y}_{i\cdot} - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_{p-1}x_{i,p-1}) = 0. \tag{4.38}
\]
Since equations (4.37) and (4.38) are identical to the usual normal equations, except that $Y_i$ is replaced by $Z_i = \bar{Y}_{i\cdot}$ and each term is weighted by $R_i$, the restricted estimate $\hat\beta_H$ is the weighted least squares fit of the means $\bar{Y}_{i\cdot}$ on the $x$'s with weights $R_i$, and
\[
\mathrm{RSS}_H = \sum_i\sum_r (Y_{ir} - \hat\beta_{H0} - \hat\beta_{H1}x_{i1} - \cdots - \hat\beta_{H,p-1}x_{i,p-1})^2.
\]
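A sketch of the resulting goodness-of-fit ("pure error" versus "lack of fit") test, with invented replicated data for a straight line ($p = 2$):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
x_levels = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # n = 5 distinct regressor settings
R = np.array([3, 4, 3, 5, 3])                     # R_i genuine replicates at each
x = np.repeat(x_levels, R)
Y = 1.0 + 0.5 * x + rng.normal(0.0, 0.3, size=len(x))   # straight-line data

N, n, p = len(Y), len(x_levels), 2

# Pure-error RSS: deviations of the replicates about their group means
means = np.array([Y[x == xl].mean() for xl in x_levels])
RSS = np.sum((Y - np.repeat(means, R)) ** 2)

# RSS_H: least squares fit of the hypothesized model (4.34) to all N observations
Xmat = np.column_stack([np.ones(N), x])
RSSH = np.sum((Y - Xmat @ np.linalg.lstsq(Xmat, Y, rcond=None)[0]) ** 2)

F = ((RSSH - RSS) / (n - p)) / (RSS / (N - n))
print("F =", F, " p-value =", stats.f.sf(F, n - p, N - n))
```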
4.7 F-TEST AND PROJECTION MATRICES
The theory of Theorem 4.1 can be generalized to the case where $X$ has less than full rank and the rows of $A$ in testing $H: A\beta = 0$ are linearly dependent, so that some of the hypothesis constraints are redundant. However, the algebra involves the use of generalized inverses, and the resulting formulation is not the one used to actually carry out the computations. Theorem 4.1(iv) suggests
that a more elegant approach is to use projection matrices. To set the scene, suppose that we have the model $Y = \theta + \varepsilon$, where $\theta \in \Omega$ (an $r$-dimensional subspace of $\mathbb{R}^n$), and we wish to test $H: \theta \in \omega$, where $\omega$ is an $(r-q)$-dimensional subspace of $\Omega$. Then we have the following theorem.

THEOREM 4.3 When $H$ is true and $\varepsilon \sim N_n(0, \sigma^2 I_n)$,
\[
F = \frac{(\mathrm{RSS}_H - \mathrm{RSS})/q}{\mathrm{RSS}/(n-r)}
= \frac{\varepsilon'(P_\Omega - P_\omega)\varepsilon/q}{\varepsilon'(I_n - P_\Omega)\varepsilon/(n-r)} \sim F_{q,n-r},
\]
where $P_\Omega$ and $P_\omega$ are the symmetric idempotent matrices projecting $\mathbb{R}^n$ onto $\Omega$ and $\omega$, respectively (Appendix B).

Proof. $\hat\theta = P_\Omega Y$ and $\hat\theta_H = P_\omega Y$ are the respective least squares estimates of $\theta$, so that
\[
\mathrm{RSS} = \|Y - \hat\theta\|^2 = Y'(I_n - P_\Omega)Y
\]
and
\[
\mathrm{RSS}_H = Y'(I_n - P_\omega)Y.
\]
Also, $(I_n - P_\Omega)\theta = 0$ (since $\theta \in \Omega$), which implies that
\[
\mathrm{RSS} = (Y - \theta)'(I_n - P_\Omega)(Y - \theta) = \varepsilon'(I_n - P_\Omega)\varepsilon.
\]
Similarly, when $H$ is true, $\theta \in \omega$ and
\[
\mathrm{RSS}_H = \varepsilon'(I_n - P_\omega)\varepsilon.
\]
Now $I_n - P_\Omega$ and $P_\Omega - P_\omega$ project onto $\Omega^\perp$ and $\omega^\perp \cap \Omega$ (by B.1.6 and B.3.2), so that these matrices are symmetric and idempotent (B.1.4) and have ranks $n-r$ and $r - (r-q) = q$ by B.1.5. Since $P_\Omega P_\omega = P_\omega$, we have $(I_n - P_\Omega)(P_\Omega - P_\omega) = 0$. Hence, by Theorem 2.7 and Example 2.12 in Section 2.4, $\varepsilon'(P_\Omega - P_\omega)\varepsilon/\sigma^2$ and $\varepsilon'(I_n - P_\Omega)\varepsilon/\sigma^2$ are independently distributed as $\chi^2_q$ and $\chi^2_{n-r}$, respectively. Thus $F \sim F_{q,n-r}$. □

It is readily seen that Theorem 4.1(iv) is a special case of the above; there $\Omega = \mathcal{C}(X)$ and, when $c = 0$, $\omega = \mathcal{N}(A(X'X)^{-1}X') \cap \Omega$.
MISCELLANEOUS EXERCISES 4

1. Aerial observations $Y_1$, $Y_2$, $Y_3$, and $Y_4$ are made of angles $\theta_1$, $\theta_2$, $\theta_3$, and $\theta_4$, respectively, of a quadrilateral on the ground. If the observations are subject to independent normal errors with zero means and common variance $\sigma^2$, derive a test statistic for the hypothesis that the quadrilateral is a parallelogram with $\theta_1 = \theta_3$ and $\theta_2 = \theta_4$. (Adapted from Silvey [1970].)

2. Given the two regression lines
\[
Y_{ki} = \alpha_k + \beta_k x_{ki} + \varepsilon_{ki} \quad (k = 1, 2;\ i = 1, 2, \ldots, n),
\]
with the $\varepsilon_{ki}$ independent $N(0, \sigma^2)$, derive an $F$-statistic for testing $H: \beta_1 = \beta_2$ (that is, that the two lines have the same slope).
3" Show that the usual full-rank regression model and hypothesis H : Af3 = o can be transformed to the model Z = Jl. + 'lJ, where J.tP+l = J.tp+2 = ... = J.tn = 0 and 'lJ '" Nn(O, (7 2I n), and the hypothesis H : J.t1 = J.t2 = ... = J.tq = O. Hint: Choose an orthonormal basis of p - q vectors
for C(XA)' where XA is defined in Exercises 4a, No.4; extend this to an orthonormal basis {01,02, ... ,Op} for C(X); and then extend once more to an orthonormal basis {01, 02, ... , On} for ~n. Consider the transformation Z = Tty, where T = (01,02, ... , On) is orthogonal. {Oq+l,Oq+2, ... ,Op}
4. A series of n + 1 observations Yi (i = 1,2, ... , n + 1) are taken from a normal distribution with unknown variance (72. After the first n observations it is suspected that there is a sudden change in the mean of the distribution. Derive a test statistic for testing the hypothesis that the (n + 1)th observation has the same population mean as the previous observations.