Appendix A Some Matrix Algebra
A.1 TRACE AND EIGENVALUES
Provided that the matrices are conformable:

A.1.1. $\operatorname{tr}(A + B) = \operatorname{tr}(A) + \operatorname{tr}(B)$.

A.1.2. $\operatorname{tr}(AC) = \operatorname{tr}(CA)$.

The proofs are straightforward.

A.1.3. If $A$ is an $n \times n$ matrix with eigenvalues $\lambda_i$ ($i = 1, 2, \ldots, n$), then
\[ \operatorname{tr}(A) = \sum_{i=1}^{n} \lambda_i \quad\text{and}\quad \det(A) = \prod_{i=1}^{n} \lambda_i. \]
Proof. $\det(\lambda I_n - A) = \prod_i(\lambda - \lambda_i) = \lambda^n - \lambda^{n-1}(\lambda_1 + \lambda_2 + \cdots + \lambda_n) + \cdots + (-1)^n\lambda_1\lambda_2\cdots\lambda_n$. Expanding $\det(\lambda I_n - A)$, we see that the coefficient of $\lambda^{n-1}$ is $-(a_{11} + a_{22} + \cdots + a_{nn})$, and the constant term is $\det(-A) = (-1)^n\det(A)$. Hence the sum of the eigenvalues is $\operatorname{tr}(A)$, and their product is $\det(A)$.

A.1.4. (Principal axis theorem) If $A$ is an $n \times n$ symmetric matrix, then there exists an orthogonal matrix $T = (t_1, t_2, \ldots, t_n)$ such that $T'AT = \Lambda$,
where $\Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$. Here the $\lambda_i$ are the eigenvalues of $A$, and $At_i = \lambda_i t_i$. The eigenvectors $t_i$ form an orthonormal basis for $\mathbb{R}^n$. The factorization $A = T\Lambda T'$ is known as the spectral decomposition of $A$.
In the next three results we assume that $A$ is symmetric.

A.1.5. $\operatorname{tr}(A^s) = \sum_{i=1}^{n} \lambda_i^s$.

A.1.6. If $A$ is nonsingular, the eigenvalues of $A^{-1}$ are $\lambda_i^{-1}$ ($i = 1, \ldots, n$), and hence $\operatorname{tr}(A^{-1}) = \sum_{i=1}^{n} \lambda_i^{-1}$.

A.1.7. The eigenvalues of $I_n + cA$ are $1 + c\lambda_i$ ($i = 1, \ldots, n$).

Proof. Let $A = T\Lambda T'$ be the spectral decomposition of $A$. Then $A^s = T\Lambda T'T\Lambda T'\cdots T\Lambda T' = T\Lambda^s T'$, and we apply A.1.3. When $A$ is nonsingular, $T'A^{-1}T = (T'AT)^{-1} = \Lambda^{-1}$, so the eigenvalues of $A^{-1}$ are $\lambda_i^{-1}$; we again apply A.1.3. Also, $T'(I_n + cA)T = I_n + cT'AT = I_n + c\Lambda$, which is a diagonal matrix with elements $1 + c\lambda_i$.
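These identities are easy to confirm numerically. The following NumPy sketch (illustrative only; the matrix is arbitrary random data) checks A.1.3, A.1.5, A.1.6, and A.1.7 for a symmetric matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = B + B.T                       # a symmetric 4 x 4 matrix
lam, T = np.linalg.eigh(A)        # spectral decomposition A = T diag(lam) T'

print(np.allclose(np.trace(A), lam.sum()))        # A.1.3: tr(A) = sum of eigenvalues
print(np.allclose(np.linalg.det(A), lam.prod()))  # A.1.3: det(A) = product of eigenvalues
print(np.allclose(np.trace(np.linalg.matrix_power(A, 3)), (lam**3).sum()))  # A.1.5, s = 3
print(np.allclose(np.trace(np.linalg.inv(A)), (1 / lam).sum()))             # A.1.6
c = 0.7
print(np.allclose(np.sort(np.linalg.eigvalsh(np.eye(4) + c * A)),
                  np.sort(1 + c * lam)))                                     # A.1.7
```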
A.2 RANK
A.2.1. If $A$ and $B$ are conformable matrices, then $\operatorname{rank}(AB) \le \min\{\operatorname{rank}(A), \operatorname{rank}(B)\}$.

Proof. The rows of $AB$ are linear combinations of the rows of $B$, so that the number of linearly independent rows of $AB$ is less than or equal to that of $B$; thus $\operatorname{rank}(AB) \le \operatorname{rank}(B)$. Similarly, the columns of $AB$ are linear combinations of the columns of $A$, so that $\operatorname{rank}(AB) \le \operatorname{rank}(A)$.
A.2.2. If $A$ is any matrix, and $P$ and $Q$ are any conformable nonsingular matrices, then $\operatorname{rank}(PAQ) = \operatorname{rank}(A)$.

Proof. $\operatorname{rank}(A) \le \operatorname{rank}(AQ) \le \operatorname{rank}(AQQ^{-1}) = \operatorname{rank}(A)$, so that $\operatorname{rank}(A) = \operatorname{rank}(AQ)$, etc.
A.2.3. Let $A$ be any $m \times n$ matrix with $r = \operatorname{rank}(A)$ and $s = \operatorname{nullity}(A)$ [the dimension of $\mathcal{N}(A)$, the null space or kernel of $A$, i.e., the dimension of $\{x : Ax = 0\}$]. Then $r + s = n$.

Proof. Let $\alpha_1, \alpha_2, \ldots, \alpha_s$ be a basis for $\mathcal{N}(A)$. Enlarge this set of vectors to give a basis $\alpha_1, \alpha_2, \ldots, \alpha_s, \beta_1, \beta_2, \ldots, \beta_t$ for $\mathbb{R}^n$, $n$-dimensional
Euclidean space. Every vector in $\mathcal{C}(A)$, the column space of $A$ (the space spanned by the columns of $A$), can be expressed in the form
\[ Ax = A\Bigl(\sum_{i=1}^{s} a_i\alpha_i + \sum_{j=1}^{t} b_j\beta_j\Bigr) = A\sum_{j=1}^{t} b_j\beta_j = \sum_{j=1}^{t} b_j\gamma_j, \]
say, where $\gamma_j = A\beta_j$. Now suppose that
\[ \sum_{j=1}^{t} c_j\gamma_j = 0; \]
then
\[ A\sum_{j=1}^{t} c_j\beta_j = \sum_{j=1}^{t} c_j\gamma_j = 0 \]
and $\sum_j c_j\beta_j \in \mathcal{N}(A)$. This is possible only if $c_1 = c_2 = \cdots = c_t = 0$; that is, $\gamma_1, \gamma_2, \ldots, \gamma_t$ are linearly independent. Since every vector $Ax$ in $\mathcal{C}(A)$ can be expressed in terms of the $\gamma_j$'s, the $\gamma_j$'s form a basis for $\mathcal{C}(A)$; thus $t = r$. Since $s + t = n$, our proof is complete.

A.2.4. $\operatorname{rank}(A) = \operatorname{rank}(A') = \operatorname{rank}(A'A) = \operatorname{rank}(AA')$.

Proof. $Ax = 0 \Rightarrow A'Ax = 0$, and $A'Ax = 0 \Rightarrow x'A'Ax = 0 \Rightarrow Ax = 0$. Hence the null spaces of $A$ and $A'A$ are the same. Since $A$ and $A'A$ have the same number of columns, it follows from A.2.3 that $\operatorname{rank}(A) = \operatorname{rank}(A'A)$. Similarly, $\operatorname{rank}(A') = \operatorname{rank}(AA')$, and the result follows (since $\operatorname{rank}(A) = \operatorname{rank}(A')$).

A.2.5. If $\mathcal{C}(A)$ is the column space of $A$ (the space spanned by the columns of $A$), then $\mathcal{C}(A'A) = \mathcal{C}(A')$.

Proof. $A'Aa = A'b$ for $b = Aa$, so that $\mathcal{C}(A'A) \subseteq \mathcal{C}(A')$. However, by A.2.4, these two spaces must be the same, as they have the same dimension.
A.2.6. If $A$ is symmetric, then $\operatorname{rank}(A)$ is equal to the number of nonzero eigenvalues.

Proof. By A.2.2, $\operatorname{rank}(A) = \operatorname{rank}(T'AT) = \operatorname{rank}(\Lambda)$, and the rank of the diagonal matrix $\Lambda$ is the number of nonzero $\lambda_i$.
A.2.7. Any $n \times n$ symmetric matrix $A$ has a set of $n$ orthonormal eigenvectors, and $\mathcal{C}(A)$ is the space spanned by those eigenvectors corresponding to nonzero eigenvalues.

Proof. From $T'AT = \Lambda$ we have $AT = T\Lambda$, or $At_i = \lambda_i t_i$, where $T = (t_1, \ldots, t_n)$; the $t_i$ are orthonormal, as $T$ is an orthogonal matrix. Suppose that $\lambda_i = 0$ ($i = r+1, r+2, \ldots, n$) and $x = \sum_{i=1}^{n} a_i t_i$. Then
\[ Ax = A\sum_{i=1}^{n} a_i t_i = \sum_{i=1}^{n} a_i At_i = \sum_{i=1}^{r} a_i\lambda_i t_i = \sum_{i=1}^{r} b_i t_i, \]
and $\mathcal{C}(A)$ is spanned by $t_1, t_2, \ldots, t_r$.
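As an illustrative check of A.2.4 and A.2.6 (a NumPy sketch with an arbitrary rank-3 matrix, not part of the original text):

```python
import numpy as np

rng = np.random.default_rng(1)
# Build a 6 x 4 matrix of rank 3 by multiplying a 6 x 3 and a 3 x 4 factor.
A = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 4))

print(np.linalg.matrix_rank(A),
      np.linalg.matrix_rank(A.T),
      np.linalg.matrix_rank(A.T @ A),
      np.linalg.matrix_rank(A @ A.T))            # A.2.4: all equal 3

S = A.T @ A                                      # symmetric, same rank as A
lam = np.linalg.eigvalsh(S)
print(np.sum(~np.isclose(lam, 0.0, atol=1e-8)))  # A.2.6: number of nonzero eigenvalues = 3
```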
A.3 POSITIVE-SEMIDEFINITE MATRICES
A symmetric matrix $A$ is said to be positive-semidefinite¹ (p.s.d.) if and only if $x'Ax \ge 0$ for all $x$.

A.3.1. The eigenvalues of a p.s.d. matrix are nonnegative.

Proof. If $T'AT = \Lambda$, then substituting $x = Ty$, we have $x'Ax = y'T'ATy = \lambda_1 y_1^2 + \cdots + \lambda_n y_n^2 \ge 0$. Setting $y_j = \delta_{ij}$ leads to $0 \le x'Ax = \lambda_i$.
A.3.2. If $A$ is p.s.d., then $\operatorname{tr}(A) \ge 0$. This follows from A.3.1 and A.1.3.

A.3.3. $A$ is p.s.d. of rank $r$ if and only if there exists an $n \times n$ matrix $R$ of rank $r$ such that $A = RR'$.

Proof. Given $A$ p.s.d. of rank $r$, then, by A.2.6 and A.3.1, $\Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_r, 0, \ldots, 0)$, where $\lambda_i > 0$ ($i = 1, 2, \ldots, r$). Let $\Lambda^{1/2} = \operatorname{diag}(\lambda_1^{1/2}, \lambda_2^{1/2}, \ldots, \lambda_r^{1/2}, 0, \ldots, 0)$; then $T'AT = \Lambda$ implies that $A = T\Lambda^{1/2}\Lambda^{1/2}T' = RR'$, where $R = T\Lambda^{1/2}$ and $\operatorname{rank}(R) = \operatorname{rank}(\Lambda^{1/2}) = r$. Conversely, if $A = RR'$, then $\operatorname{rank}(A) = \operatorname{rank}(R) = r$ (A.2.4) and $x'Ax = x'RR'x = y'y \ge 0$, where $y = R'x$.
A.3.4. If $A$ is an $n \times n$ p.s.d. matrix of rank $r$, then there exists an $n \times r$ matrix $S$ of rank $r$ such that $S'AS = I_r$.

Proof. From
\[ T'AT = \begin{pmatrix} \Lambda_r & 0 \\ 0 & 0 \end{pmatrix} \]
we have $T_1'AT_1 = \Lambda_r$, where $T_1$ consists of the first $r$ columns of $T$. Setting $S = T_1\Lambda_r^{-1/2}$ leads to the required result.

¹Some authors use the term nonnegative-definite.
A.3.5. If $A$ is p.s.d., then $X'AX = 0 \Rightarrow AX = 0$.

Proof. From A.3.3, $0 = X'AX = X'RR'X = B'B$ (where $B = R'X$), which implies that $b_i'b_i = 0$; that is, $b_i = 0$ for every column $b_i$ of $B$. Hence $AX = RB = 0$.
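A small NumPy sketch of A.3.1 and A.3.3, using an arbitrary random factor $R$; the reconstruction of a second factor from the spectral decomposition follows the proof of A.3.3.

```python
import numpy as np

rng = np.random.default_rng(2)
R = rng.standard_normal((5, 3))
A = R @ R.T                          # p.s.d. of rank 3 by A.3.3

lam, T = np.linalg.eigh(A)
print(np.all(lam > -1e-10))          # A.3.1: eigenvalues are nonnegative

# Rebuild a (different) factor R2 with A = R2 R2' from the spectral decomposition.
R2 = T @ np.diag(np.sqrt(np.clip(lam, 0.0, None)))
print(np.allclose(A, R2 @ R2.T))

x = rng.standard_normal(5)
print(x @ A @ x >= 0)                # the quadratic form is nonnegative
```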
A.4 POSITIVE-DEFINITE MATRICES
A symmetric matrix $A$ is said to be positive-definite (p.d.) if $x'Ax > 0$ for all $x$, $x \ne 0$. We note that a p.d. matrix is also p.s.d.

A.4.1. The eigenvalues of a p.d. matrix $A$ are all positive (the proof is similar to A.3.1); thus $A$ is also nonsingular (A.2.6).

A.4.2. $A$ is p.d. if and only if there exists a nonsingular $R$ such that $A = RR'$.

Proof. This follows from A.3.3 with $r = n$.

A.4.3. If $A$ is p.d., then so is $A^{-1}$.

Proof. $A^{-1} = (RR')^{-1} = (R')^{-1}R^{-1} = (R^{-1})'R^{-1} = SS'$, where $S$ is nonsingular. The result then follows from A.4.2 above.

A.4.4. If $A$ is p.d., then $\operatorname{rank}(CAC') = \operatorname{rank}(C)$.

Proof.
\[ \operatorname{rank}(CAC') = \operatorname{rank}(CRR'C') = \operatorname{rank}(CR) \;\text{(by A.2.4)}\; = \operatorname{rank}(C) \;\text{(by A.2.2)}. \]

A.4.5. If $A$ is an $n \times n$ p.d. matrix and $C$ is $p \times n$ of rank $p$, then $CAC'$ is p.d.

Proof. $x'CAC'x = y'Ay \ge 0$, where $y = C'x$, with equality $\Leftrightarrow y = 0 \Leftrightarrow C'x = 0 \Leftrightarrow x = 0$ (since the columns of $C'$ are linearly independent). Hence $x'CAC'x > 0$ for all $x \ne 0$.

A.4.6. If $X$ is $n \times p$ of rank $p$, then $X'X$ is p.d.

Proof. $x'X'Xx = y'y \ge 0$, where $y = Xx$, with equality $\Leftrightarrow Xx = 0 \Leftrightarrow x = 0$ (since the columns of $X$ are linearly independent).
A.4.7. $A$ is p.d. if and only if all the leading minor determinants of $A$ [including $\det(A)$ itself] are positive.

Proof. If $A$ is p.d., then
\[ \det(A) = \det(T\Lambda T') = \det(\Lambda) = \prod_i \lambda_i > 0 \quad\text{(by A.4.1).} \]
Let
\[ A_r = \begin{pmatrix} a_{11} & \cdots & a_{1r} \\ \vdots & & \vdots \\ a_{r1} & \cdots & a_{rr} \end{pmatrix} \quad\text{and}\quad x_r = \begin{pmatrix} x_1 \\ \vdots \\ x_r \end{pmatrix}; \]
then
\[ x_r'A_r x_r = (x_r', 0')\,A\begin{pmatrix} x_r \\ 0 \end{pmatrix} > 0 \quad\text{for } x_r \ne 0, \]
and $A_r$ is positive-definite. Hence if $A$ is $n \times n$, it follows from the argument above that $\det(A_r) > 0$ ($r = 1, 2, \ldots, n$).

Conversely, suppose that all the leading minor determinants of $A$ are positive; then we wish to show that $A$ is p.d. Let
\[ A = \begin{pmatrix} A_{n-1} & c \\ c' & a_{nn} \end{pmatrix} \quad\text{and}\quad R = \begin{pmatrix} I_{n-1} & a \\ 0' & -1 \end{pmatrix}, \]
where $a = A_{n-1}^{-1}c$. Then
\[ R'AR = \begin{pmatrix} A_{n-1} & 0 \\ 0' & k \end{pmatrix}, \]
where
\[ k = \det(R'AR)/\det(A_{n-1}) = \det(R)^2\det(A)/\det(A_{n-1}) > 0, \]
since $R$ is nonsingular. We now proceed by induction. The result is trivially true for $n = 1$; assume that it is true for matrices of orders up to $n - 1$. If we set $y = R^{-1}x$ ($x \ne 0$), then $x'Ax = y'R'ARy = y_{n-1}'A_{n-1}y_{n-1} + ky_n^2 > 0$, since $A_{n-1}$ is p.d. by the inductive hypothesis and $y \ne 0$. Hence the result is true for matrices of order $n$.

A.4.8. The diagonal elements of a p.d. matrix are all positive.

Proof. Setting $x_j = \delta_{ij}$ ($j = 1, 2, \ldots, n$), we have $0 < x'Ax = a_{ii}$.
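The leading-minor criterion of A.4.7 can be checked numerically; in this sketch the test matrix is arbitrary and the eigenvalue criterion (A.4.1) is used for comparison.

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4))
A = B @ B.T + 4 * np.eye(4)          # a positive-definite matrix

# A.4.7: all leading principal minors are positive iff A is p.d.
minors = [np.linalg.det(A[:r, :r]) for r in range(1, 5)]
print(all(m > 0 for m in minors))

# Cross-check with A.4.1: all eigenvalues positive.
print(np.all(np.linalg.eigvalsh(A) > 0))
```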
A.4.9. If $A$ is an $n \times n$ p.d. matrix and $B$ is an $n \times n$ symmetric matrix, then $A - tB$ is p.d. for $|t|$ sufficiently small.

Proof. The $i$th leading minor determinant of $A - tB$ is a function of $t$ which is positive when $t = 0$ (by A.4.7 above). Since this function is continuous, it will be positive for $|t| < \delta_i$ with $\delta_i$ sufficiently small. Let $\delta = \min(\delta_1, \delta_2, \ldots, \delta_n)$; then all the leading minor determinants will be positive for $|t| < \delta$, and the result follows from A.4.7.

A.4.10. (Cholesky decomposition) If $A$ is p.d., there exists a unique upper triangular matrix $R$ with positive diagonal elements such that $A = R'R$.
Proof. We proceed by induction and assume that the unique factorization holds for matrices of orders up to $n - 1$. Thus
\[ A = \begin{pmatrix} A_{n-1} & c \\ c' & a_{nn} \end{pmatrix} = \begin{pmatrix} R_{n-1}'R_{n-1} & c \\ c' & a_{nn} \end{pmatrix}, \]
where $R_{n-1}$ is a unique upper triangular matrix of order $n - 1$ with positive diagonal elements. Since the determinant of a triangular matrix is the product of its diagonal elements, $R_{n-1}$ is nonsingular and we can define
\[ R = \begin{pmatrix} R_{n-1} & d \\ 0' & k^{1/2} \end{pmatrix}, \]
where $d = (R_{n-1}')^{-1}c$ and $k = a_{nn} - d'd$. Since $R$ is unique and $A = R'R$, we have the required decomposition of $A$ provided that $k > 0$. Taking determinants, $\det(A) = \det(R'R) = \det(R)^2 = \det(R_{n-1})^2 k$, so that $k$ is positive as $\det(A) > 0$ (A.4.7) and $\det(R_{n-1}) \ne 0$. Thus the factorization also holds for positive-definite matrices of order $n$.

A.4.11. If $L$ is positive-definite, then for any $b$,
\[ \max_{h:\,h \ne 0}\left\{ \frac{(h'b)^2}{h'Lh} \right\} = b'L^{-1}b. \]
Proof. For all $a$,
\[ 0 \le \|v - au\|^2 = a^2\|u\|^2 - 2au'v + \|v\|^2 = \left( a\|u\| - \frac{u'v}{\|u\|} \right)^2 + \|v\|^2 - \frac{(u'v)^2}{\|u\|^2}. \]
Hence given $u \ne 0$ and setting $a = u'v/\|u\|^2$, we have the Cauchy-Schwarz inequality $(u'v)^2 \le \|v\|^2\|u\|^2$, with equality if and only if $v = au$ for some $a$. Hence
\[ \max_{v:\,v \ne 0}\left\{ \frac{(u'v)^2}{v'v} \right\} = u'u. \]
Because $L$ is positive-definite, there exists a nonsingular matrix $R$ such that $L = RR'$ (A.4.2). Setting $v = R'h$ and $u = R^{-1}b$ leads to the required result.
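A brief numerical illustration of A.4.10 and A.4.11 (assuming NumPy; note that numpy.linalg.cholesky returns the lower-triangular factor, so the upper-triangular $R$ of A.4.10 is its transpose):

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((4, 4))
L = B @ B.T + np.eye(4)              # positive-definite

# A.4.10: numpy gives a lower-triangular C with L = C C'; R = C' satisfies L = R'R.
C = np.linalg.cholesky(L)
R = C.T
print(np.allclose(L, R.T @ R), np.all(np.diag(R) > 0))

# A.4.11: the maximum of (h'b)^2 / (h'Lh) over h != 0 is b'L^{-1}b,
# attained at h proportional to L^{-1}b.
b = rng.standard_normal(4)
h = np.linalg.solve(L, b)            # h = L^{-1} b
ratio = (h @ b) ** 2 / (h @ L @ h)
print(np.allclose(ratio, b @ np.linalg.solve(L, b)))
```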
A.4.12. (Square root of a positive-definite matrix) If $A$ is p.d., there exists a p.d. square root $A^{1/2}$ such that $(A^{1/2})^2 = A$.

Proof. Let $A = T\Lambda T'$ be the spectral decomposition of $A$ (A.1.4), where the diagonal elements of $\Lambda$ are positive (by A.4.1). Let $A^{1/2} = T\Lambda^{1/2}T'$; then $A^{1/2}A^{1/2} = T\Lambda^{1/2}T'T\Lambda^{1/2}T' = T\Lambda T' = A$ (since $T'T = I_n$).
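A sketch of the construction in the proof of A.4.12, computing $A^{1/2} = T\Lambda^{1/2}T'$ for an arbitrary p.d. matrix:

```python
import numpy as np

rng = np.random.default_rng(5)
B = rng.standard_normal((3, 3))
A = B @ B.T + np.eye(3)                        # positive-definite

lam, T = np.linalg.eigh(A)                     # spectral decomposition (A.1.4)
A_half = T @ np.diag(np.sqrt(lam)) @ T.T       # A^{1/2} = T Lambda^{1/2} T'

print(np.allclose(A_half @ A_half, A))         # (A^{1/2})^2 = A
print(np.all(np.linalg.eigvalsh(A_half) > 0))  # A^{1/2} is itself p.d.
```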
A.5 PERMUTATION MATRICES
Let $\Pi_{ij}$ be the identity matrix with its $i$th and $j$th rows interchanged. Then $\Pi_{ij}\Pi_{ij} = I$, so that $\Pi_{ij}$ is a symmetric and orthogonal matrix. Premultiplying any matrix by $\Pi_{ij}$ will interchange its $i$th and $j$th rows, so that $\Pi_{ij}$ is an (elementary) permutation matrix. Postmultiplying a matrix by an elementary permutation matrix will interchange two columns. Any reordering of the rows of a matrix can be done using a sequence of elementary permutations, $\Pi = \Pi_{i_K j_K}\cdots\Pi_{i_1 j_1}$. The orthogonal matrix $\Pi$ is called a permutation matrix.
A.6 IDEMPOTENT MATRICES
A matrix $P$ is idempotent if $P^2 = P$. A symmetric idempotent matrix is called a projection matrix.

A.6.1. If $P$ is symmetric, then $P$ is idempotent and of rank $r$ if and only if it has $r$ eigenvalues equal to unity and $n - r$ eigenvalues equal to zero.

Proof. Given $P^2 = P$, then $Px = \lambda x$ ($x \ne 0$) implies that $\lambda x'x = x'Px = x'P^2x = (Px)'(Px) = \lambda^2 x'x$, and $\lambda(\lambda - 1) = 0$. Hence the eigenvalues are 0 or 1 and, by A.2.6, $P$ has $r$ eigenvalues equal to unity and $n - r$ eigenvalues equal to zero. Conversely, if the eigenvalues are 0 or 1, then we can assume without loss of generality that the first $r$ eigenvalues are unity. Hence there exists an orthogonal matrix $T$ such that
\[ T'PT = \begin{pmatrix} I_r & 0 \\ 0' & 0 \end{pmatrix} = \Lambda, \quad\text{or}\quad P = T\Lambda T'. \]
Therefore, $P^2 = T\Lambda T'T\Lambda T' = T\Lambda^2 T' = T\Lambda T' = P$, and $\operatorname{rank}(P) = r$ (A.2.2).
EIGENVALUE APPLICATIONS
A.6.3. If P is idempotent, so is 1 - P. Proof. (I - p)2 = 1 - 2P + p 2 = 1 - 2P + P
=1-
465
P.
A.6.4. Projection matrices are positive-semidefinite. Proof. x/px = X/ p2 X = (PX)' (Px) > O.
= 1,2) is a projection matrix and P l P 1P 2 = P 2P l = P 2,
A.6.5. If Pi (i
(a) (b) P l
-
- P 2 is p.s.d., then
P 2 is a projection matrix.
Proof. (a) Given Plx -'- 0, then 0 < x/(P l - P 2)x = -X / P 2 X. Since P 2 is positive-semidefinite (A.6.4), X / P 2 X = 0 and P 2 x = o. Hence for any y, and x = (I - Pl)Y, P 2(1 - Pdy = 0 as Pl(1 - Pl)y = O. Thus P 2P 1Y = P 2Y, which implies that P 2P l = P 2 (A.l1.1). Taking transposes leads to P 1P 2 = P 2, and (a) is proved.
(b) (P l -P 2Y = Pr-P1P2-P2Pl +P~
A.7 EIGENVALUE APPLICATIONS
A.7.1. For conformable matrices, the nonzero eigenvalues of $AB$ are the same as those of $BA$. The eigenvalues are identical for square matrices.

Proof. Let $\lambda$ be a nonzero eigenvalue of $AB$. Then there exists $u$ ($u \ne 0$) such that $ABu = \lambda u$; that is, $BABu = \lambda Bu$. Hence $BAv = \lambda v$, where $v = Bu \ne 0$ (as $ABu \ne 0$), and $\lambda$ is an eigenvalue of $BA$. The argument reverses by interchanging the roles of $A$ and $B$. For square matrices, $AB$ and $BA$ also have the same number of zero eigenvalues.
A.7.2. Let $A$ be an $n \times n$ symmetric matrix; then
\[ \max_{x:\,x \ne 0}\left\{ \frac{x'Ax}{x'x} \right\} = \lambda_{\max} \quad\text{and}\quad \min_{x:\,x \ne 0}\left\{ \frac{x'Ax}{x'x} \right\} = \lambda_{\min}, \]
where $\lambda_{\min}$ and $\lambda_{\max}$ are the minimum and maximum eigenvalues of $A$. These values occur when $x$ is the eigenvector corresponding to $\lambda_{\min}$ and $\lambda_{\max}$, respectively.

Proof. Let $A = T\Lambda T'$ be the spectral decomposition of $A$ (cf. A.1.4) and suppose that $\lambda_1 = \lambda_{\max}$. If $y \ne 0$ and $x = Ty = (t_1, \ldots, t_n)y$, then
\[ \frac{x'Ax}{x'x} = \frac{y'T'ATy}{y'T'Ty} = \frac{\sum_i \lambda_i y_i^2}{\sum_i y_i^2} \le \lambda_1, \]
with equality when $y_1 = 1$ and $y_2 = y_3 = \cdots = y_n = 0$, that is, when $x = t_1$. The second result follows in a similar fashion. Since the ratio $x'Ax/x'x$ is independent of the scale of $x$, we can set $x'x = 1$, giving us
\[ \max_{x:\,x'x = 1}(x'Ax) = \lambda_{\max}, \]
with a similar result for $\lambda_{\min}$.
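A quick numerical illustration of A.7.2 (arbitrary symmetric matrix; the extreme Rayleigh quotients are attained at the corresponding eigenvectors):

```python
import numpy as np

rng = np.random.default_rng(7)
B = rng.standard_normal((5, 5))
A = B + B.T                                   # symmetric

lam, T = np.linalg.eigh(A)                    # ascending eigenvalues
x = rng.standard_normal(5)
rq = (x @ A @ x) / (x @ x)                    # Rayleigh quotient for an arbitrary x
print(lam[0] <= rq <= lam[-1])                # bounded by the extreme eigenvalues

# The bounds are attained at the corresponding (unit-norm) eigenvectors.
t_min, t_max = T[:, 0], T[:, -1]
print(np.isclose(t_max @ A @ t_max, lam[-1]),
      np.isclose(t_min @ A @ t_min, lam[0]))
```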
A.8 VECTOR DIFFERENTIATION

If $\partial f/\partial\beta$ denotes the vector with $i$th element $\partial f/\partial\beta_i$, then:
A.8.1. $\dfrac{\partial(\beta'a)}{\partial\beta} = a$.

A.8.2. $\dfrac{\partial(\beta'A\beta)}{\partial\beta} = 2A\beta$ ($A$ symmetric).
Proof. (1) is trivial. For (2),
\[ \frac{\partial(\beta'A\beta)}{\partial\beta_i} = \frac{\partial}{\partial\beta_i}\Bigl(\sum_j\sum_k a_{jk}\beta_j\beta_k\Bigr) = 2a_{ii}\beta_i + 2\sum_{j \ne i} a_{ij}\beta_j = 2(A\beta)_i. \]
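A.8.2 can be verified against a finite-difference gradient; the step size and test matrix below are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(8)
B = rng.standard_normal((4, 4))
A = B + B.T                                   # symmetric
beta = rng.standard_normal(4)

f = lambda b: b @ A @ b                       # f(beta) = beta' A beta
grad_formula = 2 * A @ beta                   # A.8.2

# Central-difference approximation of the gradient, one coordinate at a time.
eps = 1e-6
grad_numeric = np.array([
    (f(beta + eps * np.eye(4)[i]) - f(beta - eps * np.eye(4)[i])) / (2 * eps)
    for i in range(4)
])
print(np.allclose(grad_formula, grad_numeric, atol=1e-4))
```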
A.9 PATTERNED MATRICES
A.9.1. If all inverses exist,
\[ \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}^{-1} = \begin{pmatrix} A_{11}^{-1} + B_{12}B_{11}^{-1}B_{21} & -B_{12}B_{11}^{-1} \\ -B_{11}^{-1}B_{21} & B_{11}^{-1} \end{pmatrix} = \begin{pmatrix} C_{11}^{-1} & -C_{11}^{-1}C_{12} \\ -C_{21}C_{11}^{-1} & A_{22}^{-1} + C_{21}C_{11}^{-1}C_{12} \end{pmatrix}, \]
where $B_{11} = A_{22} - A_{21}A_{11}^{-1}A_{12}$, $B_{12} = A_{11}^{-1}A_{12}$, $B_{21} = A_{21}A_{11}^{-1}$, $C_{11} = A_{11} - A_{12}A_{22}^{-1}A_{21}$, $C_{12} = A_{12}A_{22}^{-1}$, and $C_{21} = A_{22}^{-1}A_{21}$.

Proof. As the inverse of a matrix is unique, we only have to check that the matrix times its inverse is the identity matrix.

A.9.2. Let $W$ be an $n \times p$ matrix of rank $p$ with columns $w^{(j)}$ ($j = 1, 2, \ldots, p$). Then
\[ [(W'W)^{-1}]_{jj} = [\,w^{(j)\prime}(I_n - P_j)w^{(j)}\,]^{-1}, \]
where $P_j = W_{(j)}(W_{(j)}'W_{(j)})^{-1}W_{(j)}'$, and $W_{(j)}$ is $W$ with its $j$th column omitted.
Proof. From the first equation of A.9.1 applied to
\[ W'W = \begin{pmatrix} W_{(p)}'W_{(p)} & W_{(p)}'w^{(p)} \\ w^{(p)\prime}W_{(p)} & w^{(p)\prime}w^{(p)} \end{pmatrix}, \]
the bottom right-hand element of $(W'W)^{-1}$ is
\[ h = [(W'W)^{-1}]_{pp} = (w^{(p)\prime}w^{(p)} - w^{(p)\prime}P_p w^{(p)})^{-1}, \]
which is the stated result for $j = p$. Now let $\Pi$ be the permutation matrix $I_p$ with its $j$th and $p$th columns interchanged. Then $\Pi^2 = I_p$, so that $\Pi$ is a symmetric orthogonal matrix and is its own inverse. Hence
\[ (W'W)^{-1} = \Pi\,(\Pi W'W\Pi)^{-1}\,\Pi = \Pi\,[(W\Pi)'(W\Pi)]^{-1}\,\Pi, \]
and the $(j, j)$ element of $(W'W)^{-1}$ equals $h_1$, the $(p, p)$ element of $[(W\Pi)'(W\Pi)]^{-1}$, where $W\Pi$ is $W$ with its $j$th and $p$th columns interchanged. We have thus effectively interchanged $w^{(j)}$ and $w^{(p)}$, and $W_{(j)}$ and $W_{(p)}$. The result then follows.
A.9.3. (Sherman-Morrison-Woodbury formula) Let $A$ and $B$ be nonsingular $m \times m$ and $n \times n$ matrices, respectively, and let $U$ be $m \times n$ and $V$ be $n \times m$. Then
\[ (A + UBV)^{-1} = A^{-1} - A^{-1}UB(B + BVA^{-1}UB)^{-1}BVA^{-1}. \]
Proof. Multiply the right-hand side on the left by $A + UBV$ and simplify to get $I_m$.

A.9.4. Setting $B = I_n$, $U = \pm u$, and $V = v'$ in A.9.3, we have
\[ (A + uv')^{-1} = A^{-1} - \frac{A^{-1}uv'A^{-1}}{1 + v'A^{-1}u} \]
and
\[ (A - uv')^{-1} = A^{-1} + \frac{A^{-1}uv'A^{-1}}{1 - v'A^{-1}u}. \]
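Both identities are easy to verify numerically; the following sketch checks the rank-one form of A.9.4 and the general form of A.9.3 on arbitrary random matrices:

```python
import numpy as np

rng = np.random.default_rng(9)
A = rng.standard_normal((4, 4)) + 4 * np.eye(4)   # nonsingular
u = rng.standard_normal(4)
v = rng.standard_normal(4)

Ainv = np.linalg.inv(A)
# A.9.4 (Sherman-Morrison): rank-one update of an inverse.
sm = Ainv - np.outer(Ainv @ u, v @ Ainv) / (1 + v @ Ainv @ u)
print(np.allclose(sm, np.linalg.inv(A + np.outer(u, v))))

# A.9.3 (Woodbury) with m = 4, n = 2.
U, V = rng.standard_normal((4, 2)), rng.standard_normal((2, 4))
B = rng.standard_normal((2, 2)) + 2 * np.eye(2)
wood = Ainv - Ainv @ U @ B @ np.linalg.inv(B + B @ V @ Ainv @ U @ B) @ B @ V @ Ainv
print(np.allclose(wood, np.linalg.inv(A + U @ B @ V)))
```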
A.10 GENERALIZED INVERSE
A generalized inverse of an $m \times n$ matrix $B$ is defined to be any $n \times m$ matrix $B^-$ that satisfies the condition

(a) $BB^-B = B$.

Such a matrix always exists (Searle [1971: Chapter 1]). The name generalized inverse for $B^-$ defined by (a) is not accepted universally, although it is used fairly widely (e.g., Rao [1973], Rao and Mitra [1971a,b], Pringle and Rayner [1971], Searle [1971], Kruskal [1975]). Other names such as conditional inverse, pseudo inverse, g-inverse, and p-inverse are also found in the literature, sometimes for $B^-$ defined above and sometimes for matrices defined as variants of $B^-$. It should be noted that $B^-$ is called "a" generalized inverse, not "the" generalized inverse, for $B^-$ is not unique. Also, taking the transpose of (a), we have $B' = B'(B^-)'B'$, so that $(B^-)'$ is a generalized inverse of $B'$; we can therefore write
\[ (B^-)' = (B')^- \]
for some $(B')^-$. If $B^-$ also satisfies three more conditions, namely,

(b) $B^-BB^- = B^-$,

(c) $(BB^-)' = BB^-$,

(d) $(B^-B)' = B^-B$,

then $B^-$ is unique and it is called the Moore-Penrose inverse (Albert [1972]); some authors call it the pseudo inverse or the p-inverse. We denote this inverse by $B^+$.

If the regression matrix $X$ is less than full rank, the normal equations $X'X\beta = X'Y$ do not have a unique solution. Setting $B = X'X$ and $c = X'Y$, we have $c = X'Y \in \mathcal{C}(X') = \mathcal{C}(X'X) = \mathcal{C}(B)$ (by A.2.5), so that $c = Bd$ for some $d$. Hence $B(B^-c) = BB^-Bd = Bd = c$, and $B^-c$ is a solution of $B\beta = c$. (In fact, it can be shown that every solution of $B\beta = c$ can be expressed in the form $B^-c$ for some $B^-$.) There are several ways of computing a suitable $B^-$ for the symmetric matrix $B$. One method is as follows:

(i) Delete $p - r$ rows and the corresponding columns so as to leave an $r \times r$ matrix that is nonsingular; this can always be done, as $\operatorname{rank}(X'X) = \operatorname{rank}(X) = r$.
(ii) Invert the $r \times r$ matrix.

(iii) Obtain $B^-$ by inserting zeros into the inverse to correspond to the rows and columns originally deleted.

For example, if
\[ B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix} \]
and $B_{11}$ is an $r \times r$ nonsingular matrix, then
\[ B^- = \begin{pmatrix} B_{11}^{-1} & 0 \\ 0 & 0 \end{pmatrix}. \]
We saw above that
\[ B^-c = (X'X)^-X'Y = X^*Y, \]
say, is a solution of the normal equations. Because
\[ (X^+)'X'X = (XX^+)'X = XX^+X \;\text{[condition (c)]}\; = X \;\text{[condition (a)]}, \]
we can multiply $(X'X)(X'X)^-(X'X) = X'X$ on the left by $(X^+)'$ and obtain
\[ X\,[(X'X)^-X']\,X = X. \]
Thus $X^*$, the matrix in the square brackets, is a generalized inverse of $X$, as it satisfies condition (a); using similar arguments, we find that it also satisfies (b) and (c). In fact, a generalized inverse of $X$ satisfies (a), (b), and (c) if and only if it can be expressed in the form $(X'X)^-X'$ (Pringle and Rayner [1971: p. 26]). However, any $X^-$ satisfying just (a) and (c) will do the trick:
\[ X'(XX^-)Y = X'(XX^-)'Y \;\text{[by (c)]}\; = X'(X^-)'X'Y = X'Y \;\text{[by (a) transposed]}, \]
and $X^-Y$ is a solution of the normal equations. In particular, $X^+Y$ is the unique solution that minimizes $\hat\beta'\hat\beta$ (Peters and Wilkinson [1970]). Finally, we note that $\hat{Y} = X\hat\beta = X(X'X)^-X'Y$, so that by B.1.8, $P = X(X'X)^-X'$ is the unique matrix projecting $\mathbb{R}^n$ onto $\Omega = \mathcal{C}(X)$.
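The following sketch illustrates these facts with NumPy, using the Moore-Penrose inverse (numpy.linalg.pinv) as a convenient $X^-$ for a deliberately rank-deficient design matrix; the data are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(10)
# A rank-deficient design: the third column is the sum of the first two.
X = rng.standard_normal((8, 2))
X = np.column_stack([X, X[:, 0] + X[:, 1]])
Y = rng.standard_normal(8)

Xplus = np.linalg.pinv(X)                     # Moore-Penrose inverse X^+
beta = Xplus @ Y                              # a solution of the normal equations
print(np.allclose(X.T @ X @ beta, X.T @ Y))   # X'X beta = X'Y holds

# P = X (X'X)^- X' is the projection onto C(X); with X^+ it can be written X X^+.
P = X @ Xplus
print(np.allclose(P @ P, P), np.allclose(P, P.T), round(np.trace(P)))  # rank 2
```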
A.11 SOME USEFUL RESULTS
A.11.1. If $Ax = 0$ for all $x$, then $A = 0$.

Proof. Setting $x_k = \delta_{ik}$ ($k = 1, 2, \ldots, n$), we have $Ax = a_i = 0$, where $a_i$ is the $i$th column of $A$.
A.11.2. If $A$ is symmetric and $x'Ax = 0$ for all $x$, then $A = 0$.

Proof. Setting $x_k = \delta_{ik}$ ($k = 1, 2, \ldots, n$) gives $a_{ii} = 0$. If we set $x_k = \delta_{ik} + \delta_{jk}$ ($k = 1, 2, \ldots, n$), then $x'Ax = 0 \Rightarrow a_{ii} + 2a_{ij} + a_{jj} = 0 \Rightarrow a_{ij} = 0$.

A.11.3. If $A$ is symmetric and nonsingular, then
\[ \beta'A\beta - 2b'\beta = (\beta - A^{-1}b)'A(\beta - A^{-1}b) - b'A^{-1}b. \]
A.11.4. For all $a$ and $b$:

(a) $\|a + b\| \le \|a\| + \|b\|$.

(b) $\|a - b\| \ge \bigl|\,\|a\| - \|b\|\,\bigr|$.

A.12 SINGULAR VALUE DECOMPOSITION
Let $X$ be an $n \times p$ matrix. Then $X$ can be expressed in the form $X = U\Sigma V'$, where $U$ is an $n \times p$ matrix consisting of $p$ orthonormalized eigenvectors associated with the $p$ largest eigenvalues of $XX'$, $V$ is a $p \times p$ orthogonal matrix consisting of the orthonormalized eigenvectors of $X'X$, and $\Sigma = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_p)$ is a $p \times p$ diagonal matrix. Here $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_p \ge 0$, called the singular values of $X$, are the square roots of the (nonnegative) eigenvalues of $X'X$.

Proof. Suppose that $\operatorname{rank}(X'X) = \operatorname{rank}(X) = r$ (A.2.4). Then there exists a $p \times p$ orthogonal matrix $T$ such that
\[ X'XT = T\Lambda, \]
where $\Lambda = \operatorname{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_r^2, 0, \ldots, 0)$ with $\sigma_i^2 > 0$. Let
\[ s_i = \sigma_i^{-1}Xt_i \quad (i = 1, 2, \ldots, r); \]
then $X's_i = \sigma_i^{-1}X'Xt_i = \sigma_i t_i$ and $XX's_i = \sigma_i Xt_i = \sigma_i^2 s_i$. Thus the $s_i$ ($i = 1, 2, \ldots, r$) are eigenvectors of $XX'$ corresponding to the eigenvalues $\sigma_i^2$ ($i = 1, 2, \ldots, r$). Now $s_i's_i = 1$, and since the eigenvectors corresponding to different eigenvalues of a symmetric matrix are orthogonal,
the $s_i$ are orthonormal. By A.2.3 and A.2.4 there exists an orthonormal set $\{s_{r+1}, s_{r+2}, \ldots, s_n\}$ spanning $\mathcal{N}(XX')$ $(= \mathcal{N}(X'))$. But $\mathcal{N}(X') \perp \mathcal{C}(X)$ and $s_i \in \mathcal{C}(X)$ ($i = 1, 2, \ldots, r$), so that $S = (s_1, s_2, \ldots, s_n)$ is an $n \times n$ orthogonal matrix. Hence
\[ s_i'Xt_j = \begin{cases} \sigma_j s_i's_j & (i = 1, 2, \ldots, r), \\ 0 & (i = r+1, \ldots, n), \end{cases} \]
and
\[ S'XT = \begin{pmatrix} \Sigma \\ 0 \end{pmatrix}. \]
Finally,
\[ X = S\begin{pmatrix} \Sigma \\ 0 \end{pmatrix}T' = U\Sigma V', \]
where $U$ is the first $p$ columns of $S$ and $V = T$. When $X$ has full rank (i.e., $r = p$), the singular values of $X$ satisfy $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_p > 0$.
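A short numerical illustration of A.12 (the matrix is arbitrary; numpy.linalg.svd returns the factors directly):

```python
import numpy as np

rng = np.random.default_rng(11)
X = rng.standard_normal((7, 3))

U, sing, Vt = np.linalg.svd(X, full_matrices=False)   # X = U diag(sing) V'
print(np.allclose(X, U @ np.diag(sing) @ Vt))

# Singular values are the square roots of the eigenvalues of X'X (A.12).
print(np.allclose(np.sort(sing**2), np.sort(np.linalg.eigvalsh(X.T @ X))))

# Rows of Vt (columns of V) are eigenvectors of X'X.
print(np.allclose(X.T @ X @ Vt[0], sing[0]**2 * Vt[0]))
```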
A.13 SOME MISCELLANEOUS STATISTICAL RESULTS
A.13.1. For any random variable $X$, $\gamma_2 \ge -2$, where $\gamma_2 = (\mu_4/\mu_2^2) - 3$.

Proof. Let $\mu = E[X]$; then
\[ 0 \le \operatorname{var}[(X - \mu)^2] = E[(X - \mu)^4] - \{E[(X - \mu)^2]\}^2 = \mu_4 - \mu_2^2 = \mu_2^2\left(\frac{\mu_4}{\mu_2^2} - 1\right) = \mu_2^2(\gamma_2 + 2), \]
and $\gamma_2 + 2 \ge 0$.
A.13.2. If $X \sim N(0, \sigma^2)$, then $\operatorname{var}[X^2] = 2\sigma^4$. The result follows by setting $\gamma_2 = 0$ in the proof of A.13.1.

A.13.3. Let $X$ be a nonnegative nondegenerate random variable (i.e., not identically equal to a constant). If the expectations exist, then
\[ E[X^{-1}] > (E[X])^{-1}. \]
Proof. Let $f(x) = x^{-1}$ and let $\mu = E[X]$ ($> 0$, since $X$ is not identically zero). Taking a Taylor expansion, we have
\[ f(X) = f(\mu) + (X - \mu)f'(\mu) + \tfrac{1}{2}(X - \mu)^2 f''(X_0), \]
where $X_0$ lies between $X$ and $\mu$. Now $f''(X_0) = 2X_0^{-3} > 0$, so that $E[(X - \mu)^2 f''(X_0)] > 0$. Hence
\[ E[X^{-1}] = E[f(X)] > f(\mu) = (E[X])^{-1}. \]

A.13.4. If $X$ is a random variable, then from A.13.3 we have
\[ f(X) \approx f(\mu) + (X - \mu)f'(\mu) \]
and
\[ \operatorname{var}[f(X)] \approx E[f(X) - f(\mu)]^2 \approx \operatorname{var}[X]\,(f'(\mu))^2. \]
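As an illustrative simulation of the delta-method approximation in A.13.4 (the choice $f = \log$ and the parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(12)
mu, sigma = 5.0, 0.2
x = rng.normal(mu, sigma, 200_000)

f = np.log                                # any smooth f; here f'(mu) = 1/mu
print(np.var(f(x)))                       # simulated var[f(X)]
print(sigma**2 * (1.0 / mu)**2)           # A.13.4: var[X] * f'(mu)^2
```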
A.13.5. (Multivariate t-distribution) An $m \times 1$ vector of random variables $Y = (Y_1, Y_2, \ldots, Y_m)'$ is said to have a multivariate t-distribution if its probability density function is given by
\[ f(y) = \frac{\Gamma[\tfrac{1}{2}(\nu + m)]}{(\pi\nu)^{m/2}\,\Gamma(\tfrac{1}{2}\nu)\,\det(\Sigma)^{1/2}}\left[ 1 + \frac{1}{\nu}(y - \mu)'\Sigma^{-1}(y - \mu) \right]^{-(\nu + m)/2}, \]
where $\Sigma$ is an $m \times m$ positive-definite matrix. We shall write $Y \sim t_m(\nu, \mu, \Sigma)$. This distribution has the following properties:

(a) If $\Sigma = (\sigma_{ij})$, then $(Y_r - \mu_r)/\sqrt{\sigma_{rr}} \sim t_\nu$.

(b) $(Y - \mu)'\Sigma^{-1}(Y - \mu)/m \sim F_{m,\nu}$.
f(x) = where B(a,b)
A.14
1 x a - 1(1_ x)b-l, B(a, b)
0
< X < 1,
= rca)rcb)/f(a + b).
FISHER SCORING
Consider a model with log likelihood lb). Then Fisher's method of scoring for finding i, the maximum-likelihood estimate of "(, is given by the iterative process "Y(m+l)
= "(m)
_
{E [
2
8 1 ) }-l 8"(8"(' -y(=l
(~)
.
8"( -y(=l This algorithm can be regarded as a Newton method for maximizing lb), but with the Hessian replaced by its expected value, the (expected) information matrix. The latter matrix is usually simpler and more likely to be positive definite because of the relationship 2
-E [
8 1 8"( 8"('
J-
E
[~~J 8"( 8"('
which holds under fairly general conditions.
'