Linear Regression Analysis

1 Vectors of Random Variables

1.1 NOTATION

Matrices and vectors are denoted by boldface letters A and a, respectively, and scalars by italics. Random variables are represented by capital letters and their values by lowercase letters (e.g., Y and y, respectively). This use of capitals for random variables, which seems to be widely accepted, is particularly useful in regression when distinguishing between fixed and random regressor (independent) variables. However, it does cause problems because a vector of random variables, Y say, then looks like a matrix. Occasionally, because of a shortage of letters, a boldface lowercase letter represents a vector of random variables. If X and Y are random variables, then the symbols E[Y], var[Y], cov[X, Y], and E[X | Y = y] (or, more briefly, E[X | Y]) represent expectation, variance, covariance, and conditional expectation, respectively. The n x n matrix with diagonal elements $d_1, d_2, \ldots, d_n$ and zeros elsewhere is denoted by $\mathrm{diag}(d_1, d_2, \ldots, d_n)$, and when all the $d_i$'s are unity we have the identity matrix $I_n$. If a is an n x 1 column vector with elements $a_1, a_2, \ldots, a_n$, we write $a = (a_i)$, and the length or norm of a is denoted by $\|a\|$. Thus

$$\|a\| = \sqrt{a'a} = (a_1^2 + a_2^2 + \cdots + a_n^2)^{1/2}.$$

The vector with elements all equal to unity is represented by $1_n$, and the set of all vectors having n elements is denoted by $\mathbb{R}^n$. If the m x n matrix A has elements $a_{ij}$, we write $A = (a_{ij})$, and the sum of the diagonal elements, called the trace of A, is denoted by $\mathrm{tr}(A)$ ($= a_{11} + a_{22} + \cdots + a_{kk}$, where k is the smaller of m and n). The transpose of A is represented by $A' = (a'_{ij})$, where $a'_{ij} = a_{ji}$. If A is square, its determinant is written det(A), and if A is nonsingular its inverse is denoted by $A^{-1}$. The space spanned by the columns of A, called the column space of A, is denoted by $\mathcal{C}(A)$. The null space or kernel of A ($= \{x : Ax = 0\}$) is denoted by $\mathcal{N}(A)$. We say that $Y \sim N(\theta, \sigma^2)$ if Y is normally distributed with mean $\theta$ and variance $\sigma^2$; Y has a standard normal distribution if $\theta = 0$ and $\sigma^2 = 1$. The t- and chi-square distributions with k degrees of freedom are denoted by $t_k$ and $\chi^2_k$, respectively, and the F-distribution with m and n degrees of freedom is denoted by $F_{m,n}$. Finally, we mention the dot and bar notation, representing sum and average, respectively; for example,

$$a_{i\cdot} = \sum_{j=1}^{J} a_{ij} \qquad \text{and} \qquad \bar{a}_{i\cdot} = \frac{a_{i\cdot}}{J}.$$

In the case of a single subscript, we omit the dot. Some knowledge of linear algebra by the reader is assumed, and for a short review course several books are available (see, e.g., Harville [1997]). However, a number of matrix results are included in Appendices A and B at the end of this book, and references to these appendices are denoted by, e.g., A.2.3.
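To make some of this notation concrete, here is a minimal NumPy sketch (not part of the original text; the array values are arbitrary examples) showing the norm, diag, trace, and the dot and bar conventions.

```python
# Small numerical illustration of the notation above (values are arbitrary).
import numpy as np

a = np.array([3.0, 4.0])                 # a = (a_i)
print(np.sqrt(a @ a), np.linalg.norm(a)) # ||a|| = (a'a)^{1/2}, both give 5.0

D = np.diag([1.0, 2.0, 3.0])             # diag(d_1, d_2, d_3)
I3 = np.eye(3)                           # identity matrix I_3

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])          # a 2 x 3 matrix A = (a_ij)
print(np.trace(A))                       # tr(A) = a_11 + a_22 = 6.0, k = min(m, n)

# Dot and bar notation: a_{i.} is a row sum, abar_{i.} a row average.
a_i_dot = A.sum(axis=1)
a_i_bar = A.mean(axis=1)
```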

1.2 STATISTICAL MODELS

A major activity in statistics is the building of statistical models that hopefully reflect the important aspects of the object of study with some degree of realism. In particular, the aim of regression analysis is to construct mathematical models which describe or explain relationships that may exist between variables. The simplest case is when there are just two variables, such as height and weight, income and intelligence quotient (IQ), ages of husband and wife at marriage, population size and time, length and breadth of leaves, temperature and pressure of a certain volume of gas, and so on. If we have n pairs of observations $(x_i, y_i)$ $(i = 1, 2, \ldots, n)$, we can plot these points, giving a scatter diagram, and endeavor to fit a smooth curve through the points in such a way that the points are as close to the curve as possible. Clearly, we would not expect an exact fit, as at least one of the variables is subject to chance fluctuations due to factors outside our control. Even if there is an "exact" relationship between such variables as temperature and pressure, fluctuations would still show up in the scatter diagram because of errors of measurement. The simplest two-variable regression model is the straight line, and it is assumed that the reader has already come across the fitting of such a model.

Statistical models are fitted for a variety of reasons. One important reason is that of trying to uncover causes by studying relationships between variables. Usually, we are interested in just one variable, called the response (or predicted or dependent) variable, and we want to study how it depends on a set of variables called the explanatory variables (or regressors or independent variables). For example, our response variable might be the risk of heart attack, and the explanatory variables could include blood pressure, age, gender, cholesterol level, and so on. We know that statistical relationships do not necessarily imply causal relationships, but the presence of any statistical relationship does give us a starting point for further research. Once we are confident that a statistical relationship exists, we can then try to model this relationship mathematically and then use the model for prediction. For a given person, we can use their values of the explanatory variables to predict their risk of a heart attack. We need, however, to be careful when making predictions outside the usual ranges of the explanatory variables, as the model may not be valid there. A second reason for fitting models, over and above prediction and explanation, is to examine and test scientific hypotheses, as in the following simple examples.

EXAMPLE 1.1 Ohm's law states that $Y = rX$, where X amperes is the current through a resistor of r ohms and Y volts is the voltage across the resistor. This gives us a straight line through the origin, so that a linear scatter diagram will lend support to the law. □

EXAMPLE 1.2 The theory of gravitation states that the force of gravity F between two objects is given by $F = \alpha/d^{\beta}$. Here d is the distance between the objects and $\alpha$ is a constant related to the masses of the two objects. The famous inverse square law states that $\beta = 2$. We might want to test whether this is consistent with experimental measurements. □

EXAMPLE 1.3 Economic theory uses a production function, $Q = \alpha L^{\beta} K^{\gamma}$, to relate Q (production) to L (the quantity of labor) and K (the quantity of capital). Here $\alpha$, $\beta$, and $\gamma$ are constants that depend on the type of goods and the market involved. We might want to estimate these parameters for a particular market and use the relationship to predict the effects of infusions of capital on the behavior of that market. □

From these examples we see that we might use models developed from theoretical considerations to (a) check up on the validity of the theory (as in the Ohm's law example), (b) test whether a parameter has the value predicted from the theory, under the assumption that the model is true (as in the gravitational example and the inverse square law), and (c) estimate the unknown constants, under the assumption of a valid model, and then use the model for prediction purposes (as in the economic example).
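As a hedged illustration of how Example 1.2 might be confronted with data, the following sketch (not from the original text; the constants, noise level, and data are assumptions) simulates noisy force measurements and fits the log-linear form of the gravitational model by least squares, so that the estimated exponent can be compared with the inverse square value 2.

```python
# Illustrative only: simulate F = alpha / d^beta with multiplicative error and
# estimate beta from the straight line log F = log(alpha) - beta * log(d).
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 5.0, 2.0                        # assumed "true" constants for the demo
d = np.linspace(1.0, 10.0, 50)                # distances
F = alpha / d**beta * np.exp(rng.normal(0.0, 0.05, d.size))

X = np.column_stack([np.ones_like(d), np.log(d)])   # design for the log-log line
coef, *_ = np.linalg.lstsq(X, np.log(F), rcond=None)
log_alpha_hat, minus_beta_hat = coef
print("estimated beta:", -minus_beta_hat)     # should be close to 2
```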



1.3 LINEAR REGRESSION MODELS

If we denote the response variable by Y and the explanatory variables by $X_1, X_2, \ldots, X_K$, then a general model relating these variables is

$$E[Y \mid X_1 = x_1, X_2 = x_2, \ldots, X_K = x_K] = \phi(x_1, x_2, \ldots, x_K),$$

although, for brevity, we will usually drop the conditioning part and write E[Y]. In this book we direct our attention to the important class of linear models, that is,

$$E[Y] = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_K x_K,$$

which is linear in the parameters $\beta_j$. This restriction to linearity is not as restrictive as one might think. For example, many functions of several variables are approximately linear over sufficiently small regions, or they may be made linear by a suitable transformation. Using logarithms for the gravitational model, we get the straight line

$$\log F = \log \alpha - \beta \log d. \tag{1.1}$$

For the linear model, the $x_i$ could be functions of other variables z, w, etc.; for example, $x_1 = \sin z$, $x_2 = \log w$, and $x_3 = zw$. We can also have $x_i = x^i$, which leads to a polynomial model; the linearity refers to the parameters, not the variables. Note that "categorical" models can be included under our umbrella by using dummy (indicator) x-variables. For example, suppose that we wish to compare the means of two populations, say, $\mu_i = E[U_i]$ $(i = 1, 2)$. Then we can combine the data into the single model

$$E[Y] = \mu_1 + (\mu_2 - \mu_1)x = \beta_0 + \beta_1 x,$$

where $x = 0$ when Y is a $U_1$ observation and $x = 1$ when Y is a $U_2$ observation. Here $\mu_1 = \beta_0$ and $\mu_2 = \beta_0 + \beta_1$, the difference being $\beta_1$. We can extend this idea to the case of comparing m means using m - 1 dummy variables.

In a similar fashion we can combine two straight lines,

$$E[Y] = \alpha_j + \gamma_j x_1 \qquad (j = 1, 2),$$

using a dummy $x_2$ variable which takes the value 0 if the observation is from the first line, and 1 otherwise. The combined model is

$$E[Y] = \alpha_1 + \gamma_1 x_1 + (\alpha_2 - \alpha_1)x_2 + (\gamma_2 - \gamma_1)x_1 x_2 = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3, \tag{1.2}$$

say, where $x_3 = x_1 x_2$. Here $\alpha_1 = \beta_0$, $\alpha_2 = \beta_0 + \beta_2$, $\gamma_1 = \beta_1$, and $\gamma_2 = \beta_1 + \beta_3$.
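A minimal sketch of the dummy-variable coding in (1.2) follows (illustrative only; the data, line parameters, and noise level are assumed for the demo). It builds the columns $x_1$, $x_2$, and $x_3 = x_1 x_2$ and recovers the two lines from the fitted $\beta$'s via the identities above.

```python
# Two straight lines combined into one linear model via a dummy variable.
import numpy as np

rng = np.random.default_rng(1)
n = 20
x1 = rng.uniform(0, 10, n)                    # common regressor
x2 = (np.arange(n) >= n // 2).astype(float)   # 0 = first line, 1 = second line
x3 = x1 * x2                                  # interaction column

# Data generated from alpha1 = 1, gamma1 = 2 and alpha2 = 3, gamma2 = 1.5.
y = np.where(x2 == 0, 1.0 + 2.0 * x1, 3.0 + 1.5 * x1) + rng.normal(0, 0.1, n)

X = np.column_stack([np.ones(n), x1, x2, x3])
b0, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]
print("alpha1, gamma1:", b0, b1)              # approximately (1, 2)
print("alpha2, gamma2:", b0 + b2, b1 + b3)    # approximately (3, 1.5)
```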



In the various models considered above, the explanatory variables may or may not be random. For example, dummy variables are nonrandom. With random X-variables, we carry out the regression conditionally on their observed values, provided that they are measured exactly (or at least with sufficient accuracy). We effectively proceed as though the X-variables were not random at all. When measurement errors cannot be ignored, the theory has to be modified, as we shall see in Chapter 9.

1.4 EXPECTATION AND COVARIANCE OPERATORS

In this book we focus on vectors and matrices, so we first need to generalize the ideas of expectation, covariance, and variance, which we do in this section. Let $Z_{ij}$ $(i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, n)$ be a set of random variables with expected values $E[Z_{ij}]$. Expressing both the random variables and their expectations in matrix form, we can define the general expectation operator of the matrix $Z = (Z_{ij})$ as follows:

Definition 1.1 E[Z] = (E[Zij]).

THEOREM 1.1 If $A = (a_{ij})$, $B = (b_{ij})$, and $C = (c_{ij})$ are $l \times m$, $n \times p$, and $l \times p$ matrices, respectively, of constants, then

$$E[AZB + C] = A\,E[Z]\,B + C.$$

Proof. Let $W = AZB + C$; then $W_{ij} = \sum_{r=1}^{m}\sum_{s=1}^{n} a_{ir} Z_{rs} b_{sj} + c_{ij}$ and

$$E[AZB + C] = (E[W_{ij}]) = \Big(\sum_r \sum_s a_{ir} E[Z_{rs}] b_{sj} + c_{ij}\Big) = \big((A\,E[Z]\,B)_{ij}\big) + (c_{ij}) = A\,E[Z]\,B + C. \qquad \Box$$

In this proof we note that l, m, n, and p are any positive integers, and the matrices of constants can take any values. For example, if X is an $m \times 1$ vector, then $E[AX] = A\,E[X]$. Using similar algebra, we can prove that if A and B are $m \times n$ matrices of constants, and X and Y are $n \times 1$ vectors of random variables, then

$$E[AX + BY] = A\,E[X] + B\,E[Y].$$
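Theorem 1.1 is easy to check by simulation. The following sketch (not from the book; the dimensions, matrices, and the distribution of Z are arbitrary choices) averages many draws of $AZB + C$ and compares the result with $A\,E[Z]\,B + C$.

```python
# Quick Monte Carlo check of Theorem 1.1 (illustrative only).
import numpy as np

rng = np.random.default_rng(2)
l, m, n, p = 2, 3, 4, 2
A = rng.normal(size=(l, m))
B = rng.normal(size=(n, p))
C = rng.normal(size=(l, p))

mean_Z = rng.normal(size=(m, n))              # E[Z], chosen arbitrarily
def draw_Z():
    return mean_Z + rng.normal(size=(m, n))   # Z with expectation mean_Z

reps = 100_000
avg = np.mean([A @ draw_Z() @ B + C for _ in range(reps)], axis=0)
print(np.max(np.abs(avg - (A @ mean_Z @ B + C))))  # small, of order 1/sqrt(reps)
```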

In a similar manner we can generalize the notions of covariance and variance for vectors. If X and Y are $m \times 1$ and $n \times 1$ vectors of random variables, then we define the generalized covariance operator Cov as follows:



Definition 1.2 $\mathrm{Cov}[X, Y] = (\mathrm{cov}[X_i, Y_j])$.

THEOREM 1.2 If $E[X] = \alpha$ and $E[Y] = \beta$, then

$$\mathrm{Cov}[X, Y] = E[(X - \alpha)(Y - \beta)'].$$

Proof.

$$\mathrm{Cov}[X, Y] = (\mathrm{cov}[X_i, Y_j]) = \big(E[(X_i - \alpha_i)(Y_j - \beta_j)]\big) = E[(X - \alpha)(Y - \beta)']. \qquad \Box$$

Definition 1.3 When $Y = X$, $\mathrm{Cov}[X, X]$, written as $\mathrm{Var}[X]$, is called the variance (variance-covariance or dispersion) matrix of X. Thus

$$\mathrm{Var}[X] = (\mathrm{cov}[X_i, X_j]) = \begin{pmatrix} \mathrm{var}[X_1] & \mathrm{cov}[X_1, X_2] & \cdots & \mathrm{cov}[X_1, X_n] \\ \mathrm{cov}[X_2, X_1] & \mathrm{var}[X_2] & \cdots & \mathrm{cov}[X_2, X_n] \\ \vdots & \vdots & & \vdots \\ \mathrm{cov}[X_n, X_1] & \mathrm{cov}[X_n, X_2] & \cdots & \mathrm{var}[X_n] \end{pmatrix}. \tag{1.3}$$

Since $\mathrm{cov}[X_i, X_j] = \mathrm{cov}[X_j, X_i]$, the matrix above is symmetric. We note that when $X = X_1$ we write $\mathrm{Var}[X] = \mathrm{var}[X_1]$. From Theorem 1.2 with $Y = X$ we have

$$\mathrm{Var}[X] = E[(X - \alpha)(X - \alpha)'], \tag{1.4}$$

which, on expanding, leads to

$$\mathrm{Var}[X] = E[XX'] - \alpha\alpha'. \tag{1.5}$$

These last two equations are natural generalizations of univariate results.
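The identity (1.5) is easy to see numerically. The sketch below (illustrative only; the mean vector and variance matrix are arbitrary choices, not from the book) compares the sample variance matrix with the sample version of $E[XX'] - \alpha\alpha'$.

```python
# Numerical illustration of Var[X] = E[XX'] - alpha alpha'  (equation 1.5).
import numpy as np

rng = np.random.default_rng(3)
alpha = np.array([1.0, -2.0, 0.5])                  # E[X]
L = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.2, -0.3, 1.0]])
Sigma = L @ L.T                                     # a valid variance matrix
X = rng.multivariate_normal(alpha, Sigma, size=200_000)   # rows are draws of X'

var_hat = np.cov(X, rowvar=False)                   # sample Var[X]
EXXt_hat = X.T @ X / X.shape[0]                     # sample E[XX']
print(np.max(np.abs(var_hat - (EXXt_hat - np.outer(alpha, alpha)))))  # small
```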

EXAMPLE 1.4 If a is any $n \times 1$ vector of constants, then $\mathrm{Var}[X - a] = \mathrm{Var}[X]$. This follows from the fact that $X_i - a_i - E[X_i - a_i] = X_i - E[X_i]$, so that

$$\mathrm{cov}[X_i - a_i, X_j - a_j] = \mathrm{cov}[X_i, X_j]. \qquad \Box$$



THEOREM 1.3 If X and Y are $m \times 1$ and $n \times 1$ vectors of random variables, and A and B are $l \times m$ and $p \times n$ matrices of constants, respectively, then

$$\mathrm{Cov}[AX, BY] = A\,\mathrm{Cov}[X, Y]\,B'. \tag{1.6}$$

Proof. Let $U = AX$ and $V = BY$. Then, by Theorems 1.2 and 1.1,

$$\begin{aligned} \mathrm{Cov}[AX, BY] &= \mathrm{Cov}[U, V] \\ &= E[(U - E[U])(V - E[V])'] \\ &= E[(AX - A\alpha)(BY - B\beta)'] \\ &= E[A(X - \alpha)(Y - \beta)'B'] \\ &= A\,E[(X - \alpha)(Y - \beta)']\,B' \\ &= A\,\mathrm{Cov}[X, Y]\,B'. \qquad \Box \end{aligned}$$

From the theorem above we have the special cases

$$\mathrm{Cov}[AX, Y] = A\,\mathrm{Cov}[X, Y] \qquad \text{and} \qquad \mathrm{Cov}[X, BY] = \mathrm{Cov}[X, Y]\,B'.$$

Of particular importance is the following result, obtained by setting $B = A$ and $Y = X$:

$$\mathrm{Var}[AX] = \mathrm{Cov}[AX, AX] = A\,\mathrm{Cov}[X, X]\,A' = A\,\mathrm{Var}[X]\,A'. \tag{1.7}$$
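As a quick sanity check of (1.7), the following sketch (illustrative; the particular A and variance matrix are arbitrary) compares the sample variance matrix of $AX$ with $A\,\mathrm{Var}[X]\,A'$.

```python
# Numerical check of Var[AX] = A Var[X] A'  (equation 1.7).
import numpy as np

rng = np.random.default_rng(4)
Sigma = np.array([[2.0, 0.3, 0.1],
                  [0.3, 1.0, -0.2],
                  [0.1, -0.2, 1.5]])        # Var[X]
A = rng.normal(size=(2, 3))                 # l x m matrix of constants

X = rng.multivariate_normal(np.zeros(3), Sigma, size=300_000)
AX = X @ A.T                                # each row is (Ax)' for one draw
print(np.round(np.cov(AX, rowvar=False), 3))
print(np.round(A @ Sigma @ A.T, 3))         # the two should agree closely
```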

EXAMPLE 1.5 If X, Y, U, and V are any (not necessarily distinct) $n \times 1$ vectors of random variables, then for all real numbers a, b, c, and d (including zero),

$$\mathrm{Cov}[aX + bY, cU + dV] = ac\,\mathrm{Cov}[X, U] + ad\,\mathrm{Cov}[X, V] + bc\,\mathrm{Cov}[Y, U] + bd\,\mathrm{Cov}[Y, V]. \tag{1.8}$$

To prove this result, we simply multiply out

$$E\big[(aX + bY - aE[X] - bE[Y])(cU + dV - cE[U] - dE[V])'\big] = E\big[\big(a(X - E[X]) + b(Y - E[Y])\big)\big(c(U - E[U]) + d(V - E[V])\big)'\big].$$

If we set $U = X$ and $V = Y$, $c = a$ and $d = b$, we get

$$\mathrm{Var}[aX + bY] = \mathrm{Cov}[aX + bY, aX + bY] = a^2\,\mathrm{Var}[X] + ab\big(\mathrm{Cov}[X, Y] + \mathrm{Cov}[Y, X]\big) + b^2\,\mathrm{Var}[Y]. \tag{1.9}$$

□



In Chapter 2 we make frequent use of the following theorem.

THEOREM 1.4 If X is a vector of random variables such that no element of X is a linear combination of the remaining elements [i.e., there do not exist a $(\neq 0)$ and b such that $a'X = b$ for all values of $X = x$], then $\mathrm{Var}[X]$ is a positive-definite matrix (see A.4).

Proof. For any vector c, we have

$$0 \le \mathrm{var}[c'X] = c'\,\mathrm{Var}[X]\,c \qquad [\text{by equation } (1.7)].$$

Now equality holds if and only if $c'X$ is a constant, that is, if and only if $c'X = d$ $(c \neq 0)$ or $c = 0$. Because the former possibility is ruled out, $c = 0$ and $\mathrm{Var}[X]$ is positive-definite. □

EXAMPLE 1.6 If X and Y are $m \times 1$ and $n \times 1$ vectors of random variables such that no element of X is a linear combination of the remaining elements, then there exists an $n \times m$ matrix M such that $\mathrm{Cov}[X, Y - MX] = 0$. To find M, we use the previous results to get

$$\mathrm{Cov}[X, Y - MX] = \mathrm{Cov}[X, Y] - \mathrm{Cov}[X, MX] = \mathrm{Cov}[X, Y] - \mathrm{Cov}[X, X]M' = \mathrm{Cov}[X, Y] - \mathrm{Var}[X]M'. \tag{1.10}$$

By Theorem 1.4, $\mathrm{Var}[X]$ is positive-definite and therefore nonsingular (A.4.1). Hence (1.10) is zero for

$$M' = (\mathrm{Var}[X])^{-1}\,\mathrm{Cov}[X, Y]. \qquad \Box$$
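A small sketch of Example 1.6 follows (illustrative; the joint distribution below is an arbitrary assumption). Using sample covariances in place of the population ones, the matrix $M' = (\mathrm{Var}[X])^{-1}\mathrm{Cov}[X, Y]$ makes the residual $Y - MX$ uncorrelated with X.

```python
# Cov[X, Y - MX] = 0 when M' = Var[X]^{-1} Cov[X, Y]  (Example 1.6).
import numpy as np

rng = np.random.default_rng(5)
m, n, N = 3, 2, 200_000
joint = rng.normal(size=(N, m + n)) @ rng.normal(size=(m + n, m + n))  # correlated draws
X, Y = joint[:, :m], joint[:, m:]

C = np.cov(joint, rowvar=False)
VarX, CovXY = C[:m, :m], C[:m, m:]
M = np.linalg.solve(VarX, CovXY).T          # M' = Var[X]^{-1} Cov[X, Y], so M is n x m

resid = Y - X @ M.T
crosscov = np.cov(np.hstack([X, resid]), rowvar=False)[:m, m:]
print(np.max(np.abs(crosscov)))             # essentially zero
```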

EXAMPLE 1.7 We now give an example of a singular variance matrix by using the two-cell multinomial distribution to represent a binomial distribution as follows:

$$\mathrm{pr}(X_1 = x_1, X_2 = x_2) = \frac{n!}{x_1!\,x_2!}\,p_1^{x_1} p_2^{x_2}, \qquad p_1 + p_2 = 1, \quad x_1 + x_2 = n.$$

If $X = (X_1, X_2)'$, then

$$\mathrm{Var}[X] = \begin{pmatrix} np_1(1 - p_1) & -np_1p_2 \\ -np_1p_2 & np_2(1 - p_2) \end{pmatrix},$$

which has rank 1 as $p_2 = 1 - p_1$. □
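The singularity in Example 1.7 is easy to verify numerically; the sketch below (values of n and $p_1$ assumed for illustration) confirms that the matrix has rank 1 and zero determinant.

```python
# The two-cell multinomial variance matrix is singular because X2 = n - X1.
import numpy as np

n, p1 = 10, 0.3
p2 = 1.0 - p1
V = n * np.array([[p1 * (1 - p1), -p1 * p2],
                  [-p1 * p2,      p2 * (1 - p2)]])
print(np.linalg.matrix_rank(V))     # 1
print(np.linalg.det(V))             # 0 (up to rounding error)
```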

EXERCISES 1a

1. Prove that if a is a vector of constants with the same dimension as the random vector X, then

$$E[(X - a)(X - a)'] = \mathrm{Var}[X] + (E[X] - a)(E[X] - a)'.$$

If $\mathrm{Var}[X] = \Sigma = (\sigma_{ij})$, deduce that

$$E[\|X - a\|^2] = \sum_i \sigma_{ii} + \|E[X] - a\|^2.$$

2. If X and Y are $m \times 1$ and $n \times 1$ vectors of random variables, and a and b are $m \times 1$ and $n \times 1$ vectors of constants, prove that $\mathrm{Cov}[X - a, Y - b] = \mathrm{Cov}[X, Y]$.

3. Let $X = (X_1, X_2, \ldots, X_n)'$ be a vector of random variables, and let $Y_1 = X_1$, $Y_i = X_i - X_{i-1}$ $(i = 2, 3, \ldots, n)$. If the $Y_i$ are mutually independent random variables, each with unit variance, find $\mathrm{Var}[X]$.

4. If $X_1, X_2, \ldots, X_n$ are random variables satisfying $X_{i+1} = \rho X_i$ $(i = 1, 2, \ldots, n - 1)$, where $\rho$ is a constant, and $\mathrm{var}[X_1] = \sigma^2$, find $\mathrm{Var}[X]$.

1.5 MEAN AND VARIANCE OF QUADRATIC FORMS

Quadratic forms play a major role in this book. In particular, we will frequently need to find the expected value of a quadratic form using the following theorem.

THEOREM 1.5 Let $X = (X_i)$ be an $n \times 1$ vector of random variables, and let A be an $n \times n$ symmetric matrix. If $E[X] = \mu$ and $\mathrm{Var}[X] = \Sigma = (\sigma_{ij})$, then

$$E[X'AX] = \mathrm{tr}(A\Sigma) + \mu'A\mu.$$

Proof.

$$\begin{aligned} E[X'AX] &= \mathrm{tr}(E[X'AX]) = E[\mathrm{tr}(X'AX)] \\ &= E[\mathrm{tr}(AXX')] \qquad [\text{by A.1.2}] \\ &= \mathrm{tr}(E[AXX']) = \mathrm{tr}(A\,E[XX']) \\ &= \mathrm{tr}\big[A(\mathrm{Var}[X] + \mu\mu')\big] \qquad [\text{by } (1.5)] \\ &= \mathrm{tr}(A\Sigma) + \mathrm{tr}(A\mu\mu') \\ &= \mathrm{tr}(A\Sigma) + \mu'A\mu \qquad [\text{by A.1.2}]. \qquad \Box \end{aligned}$$

We can deduce two special cases. First, by setting $Y = X - b$ and noting that $\mathrm{Var}[Y] = \mathrm{Var}[X]$ (by Example 1.4), we have

$$E[(X - b)'A(X - b)] = \mathrm{tr}(A\Sigma) + (\mu - b)'A(\mu - b). \tag{1.11}$$

Second, if $\Sigma = \sigma^2 I_n$ (a common situation in this book), then $\mathrm{tr}(A\Sigma) = \sigma^2\,\mathrm{tr}(A)$. Thus in this case we have the simple rule

$$E[X'AX] = \sigma^2(\text{sum of coefficients of } X_i^2) + (X'AX)_{X=\mu}. \tag{1.12}$$

EXAMPLE 1.8 If $X_1, X_2, \ldots, X_n$ are independently and identically distributed with mean $\mu$ and variance $\sigma^2$, then we can use equation (1.12) to find the expected value of

$$Q = (X_1 - X_2)^2 + (X_2 - X_3)^2 + \cdots + (X_{n-1} - X_n)^2.$$

To do so, we first write

$$Q = X'AX = 2\sum_{i=1}^{n} X_i^2 - X_1^2 - X_n^2 - 2\sum_{i=1}^{n-1} X_i X_{i+1}.$$

Then, since $\mathrm{cov}[X_i, X_j] = 0$ $(i \neq j)$, $\Sigma = \sigma^2 I_n$ and, from the squared terms, $\mathrm{tr}(A) = 2n - 2$. Replacing each $X_i$ by $\mu$ in the original expression for Q, we see that the second term of $E[X'AX]$ is zero, so that $E[Q] = \sigma^2(2n - 2)$. □
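A Monte Carlo check of Example 1.8 is straightforward; in the sketch below the distribution and parameter values are assumed purely for illustration.

```python
# For i.i.d. X_i with variance sigma^2, E[Q] = sigma^2 (2n - 2).
import numpy as np

rng = np.random.default_rng(6)
n, mu, sigma, reps = 8, 3.0, 1.5, 200_000
X = rng.normal(mu, sigma, size=(reps, n))
Q = np.sum(np.diff(X, axis=1) ** 2, axis=1)   # (X1-X2)^2 + ... + (X_{n-1}-X_n)^2
print(Q.mean(), sigma**2 * (2 * n - 2))       # the two should be close
```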

EXAMPLE 1.9 Suppose that the elements of $X = (X_1, X_2, \ldots, X_n)'$ have a common mean $\mu$ and X has variance matrix $\Sigma$ with $\sigma_{ii} = \sigma^2$ and $\sigma_{ij} = \rho\sigma^2$ $(i \neq j)$. Then, when $\rho = 0$, we know that $Q = \sum_i (X_i - \bar{X})^2$ has expected value $\sigma^2(n - 1)$. To find its expected value when $\rho \neq 0$, we express Q in the form $X'AX$, where $A = [(\delta_{ij} - n^{-1})]$ and

$$A\Sigma = \sigma^2\begin{pmatrix} 1 - n^{-1} & -n^{-1} & \cdots & -n^{-1} \\ -n^{-1} & 1 - n^{-1} & \cdots & -n^{-1} \\ \vdots & & \ddots & \vdots \\ -n^{-1} & -n^{-1} & \cdots & 1 - n^{-1} \end{pmatrix}\begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix} = \sigma^2(1 - \rho)A.$$

Once again the second term in E[Q] is zero, so that

$$E[Q] = \mathrm{tr}(A\Sigma) = \sigma^2(1 - \rho)\,\mathrm{tr}(A) = \sigma^2(1 - \rho)(n - 1). \qquad \Box$$
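The matrix identity used in Example 1.9 can be verified directly; the sketch below (values of n, $\sigma^2$, and $\rho$ assumed) checks both $A\Sigma = \sigma^2(1-\rho)A$ and the resulting trace.

```python
# E[Q] = tr(A Sigma) = sigma^2 (1 - rho)(n - 1) for the centering matrix A.
import numpy as np

n, sigma2, rho = 6, 2.0, 0.4
A = np.eye(n) - np.ones((n, n)) / n                       # A = (delta_ij - 1/n)
Sigma = sigma2 * ((1 - rho) * np.eye(n) + rho * np.ones((n, n)))
print(np.allclose(A @ Sigma, sigma2 * (1 - rho) * A))     # True
print(np.trace(A @ Sigma), sigma2 * (1 - rho) * (n - 1))  # equal
```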

THEOREM 1.6 Let $X_1, X_2, \ldots, X_n$ be independent random variables with means $\theta_1, \theta_2, \ldots, \theta_n$, common variance $\mu_2$, and common third and fourth moments about their means, $\mu_3$ and $\mu_4$, respectively (i.e., $\mu_r = E[(X_i - \theta_i)^r]$). If A is any $n \times n$ symmetric matrix and a is a column vector of the diagonal elements of A, then

$$\mathrm{var}[X'AX] = (\mu_4 - 3\mu_2^2)\,a'a + 2\mu_2^2\,\mathrm{tr}(A^2) + 4\mu_2\,\theta'A^2\theta + 4\mu_3\,\theta'Aa.$$

(This result is stated without proof in Atiqullah [1962].)

Proof. We note that $E[X] = \theta$, $\mathrm{Var}[X] = \mu_2 I_n$, and

$$\mathrm{var}[X'AX] = E[(X'AX)^2] - (E[X'AX])^2. \tag{1.13}$$



Now

$$X'AX = (X - \theta)'A(X - \theta) + 2\theta'A(X - \theta) + \theta'A\theta,$$

so that squaring gives

$$\begin{aligned} (X'AX)^2 ={}& [(X - \theta)'A(X - \theta)]^2 + 4[\theta'A(X - \theta)]^2 + (\theta'A\theta)^2 \\ &+ 2\theta'A\theta\,[(X - \theta)'A(X - \theta)] + 4\theta'A\theta\,\theta'A(X - \theta) \\ &+ 4\theta'A(X - \theta)(X - \theta)'A(X - \theta). \end{aligned}$$

Setting $Y = X - \theta$, we have $E[Y] = 0$ and, using Theorem 1.5,

$$E[(X'AX)^2] = E[(Y'AY)^2] + 4E[(\theta'AY)^2] + (\theta'A\theta)^2 + 2\theta'A\theta\,\mu_2\,\mathrm{tr}(A) + 4E[\theta'AY\,Y'AY].$$

As a first step in evaluating the expression above we note that

$$(Y'AY)^2 = \sum_i \sum_j \sum_k \sum_l a_{ij} a_{kl} Y_i Y_j Y_k Y_l.$$

Since the $Y_i$ are mutually independent with the same first four moments about the origin, we have

$$E[Y_i Y_j Y_k Y_l] = \begin{cases} \mu_4, & i = j = k = l, \\ \mu_2^2, & i = j,\ k = l;\quad i = k,\ j = l;\quad i = l,\ j = k \ (\text{indices not all equal}), \\ 0, & \text{otherwise}. \end{cases}$$

Hence

$$\begin{aligned} E[(Y'AY)^2] &= \mu_4 \sum_i a_{ii}^2 + \mu_2^2\Big(\sum_i \sum_{k \neq i} a_{ii} a_{kk} + \sum_i \sum_{j \neq i} a_{ij}^2 + \sum_i \sum_{j \neq i} a_{ij} a_{ji}\Big) \\ &= (\mu_4 - 3\mu_2^2)\,a'a + \mu_2^2\big[\mathrm{tr}(A)^2 + 2\,\mathrm{tr}(A^2)\big], \end{aligned}$$

since A is symmetric and $\sum_i \sum_j a_{ij}^2 = \mathrm{tr}(A^2)$. Also,

$$(\theta'AY)^2 = (b'Y)^2 = \sum_i \sum_j b_i b_j Y_i Y_j,$$

say (so that $b = A\theta$), and

$$\theta'AY\,Y'AY = \sum_i \sum_j \sum_k b_i a_{jk} Y_i Y_j Y_k,$$

so that

$$E[(\theta'AY)^2] = \mu_2 \sum_i b_i^2 = \mu_2\,b'b = \mu_2\,\theta'A^2\theta$$

and

$$E[\theta'AY\,Y'AY] = \mu_3 \sum_i b_i a_{ii} = \mu_3\,b'a = \mu_3\,\theta'Aa. \tag{1.14}$$

Finally, collecting all the terms and substituting into equation (1.13) leads to the desired result. □
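Theorem 1.6 can also be checked by simulation. In the sketch below (illustrative only; the distribution of the $X_i$, their means, and the matrix A are arbitrary choices), the $X_i$ are shifted exponentials, whose central moments are known in closed form.

```python
# Simulation check of Theorem 1.6: var[X'AX] versus the stated formula.
import numpy as np

rng = np.random.default_rng(7)
n, reps = 5, 400_000
theta = np.array([1.0, -0.5, 2.0, 0.0, 0.7])
A = rng.normal(size=(n, n)); A = (A + A.T) / 2         # symmetric A
a = np.diag(A)

# X_i = theta_i + E_i, E_i i.i.d. centered Exp(1): mu2 = 1, mu3 = 2, mu4 = 9.
mu2, mu3, mu4 = 1.0, 2.0, 9.0
E = rng.exponential(1.0, size=(reps, n)) - 1.0
X = theta + E

Q = np.einsum('ri,ij,rj->r', X, A, X)                  # X'AX for each replicate
theory = ((mu4 - 3 * mu2**2) * a @ a + 2 * mu2**2 * np.trace(A @ A)
          + 4 * mu2 * theta @ A @ A @ theta + 4 * mu3 * theta @ A @ a)
print(Q.var(), theory)                                 # close to each other
```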

EXERCISES 1b

1. Suppose that $X_1$, $X_2$, and $X_3$ are random variables with common mean $\mu$ and variance matrix

$$\mathrm{Var}[X] = \sigma^2\begin{pmatrix} 1 & \tfrac{1}{2} & 0 \\ \tfrac{1}{2} & 1 & \tfrac{1}{4} \\ 0 & \tfrac{1}{4} & 1 \end{pmatrix}.$$

2. If $X_1, X_2, \ldots, X_n$ are independent random variables with common mean $\mu$ and variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2$, prove that $\sum_i (X_i - \bar{X})^2/[n(n - 1)]$ is an unbiased estimate of $\mathrm{var}[\bar{X}]$.

3. Suppose that in Exercise 2 the variances are known. Let $\bar{X}_w = \sum_i w_i X_i$ be an unbiased estimate of $\mu$ (i.e., $\sum_i w_i = 1$).

(a) Prove that $\mathrm{var}[\bar{X}_w]$ is minimized when $w_i \propto 1/\sigma_i^2$. Find this minimum variance $v_{\min}$.

(b) Let $S_w^2 = \sum_i w_i (X_i - \bar{X}_w)^2/(n - 1)$. If $w_i\sigma_i^2 = a$ $(i = 1, 2, \ldots, n)$, prove that $S_w^2$ is an unbiased estimate of $v_{\min}$.

4. The random variables $X_1, X_2, \ldots, X_n$ have a common nonzero mean $\mu$, a common variance $\sigma^2$, and the correlation between any pair of random variables is $\rho$.

(a) Find $\mathrm{var}[\bar{X}]$ and hence prove that $-1/(n - 1) \le \rho \le 1$.

(b) If

$$Q = a\sum_i X_i^2 + b\Big(\sum_i X_i\Big)^2$$

is an unbiased estimate of $\sigma^2$, find a and b. Hence show that, in this case,

$$Q = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{(1 - \rho)(n - 1)}.$$

5. Let $X_1, X_2, \ldots, X_n$ be independently distributed as $N(\mu, \sigma^2)$. Define

$$S^2 = \frac{1}{n - 1}\sum_{i=1}^{n}(X_i - \bar{X})^2$$

and

$$Q = \frac{1}{2(n - 1)}\sum_{i=1}^{n - 1}(X_{i+1} - X_i)^2.$$

(a) Prove that $\mathrm{var}(S^2) = 2\sigma^4/(n - 1)$.

(b) Show that Q is an unbiased estimate of $\sigma^2$.

(c) Find the variance of Q and hence show that as $n \to \infty$, the efficiency of Q relative to $S^2$ is $\tfrac{2}{3}$.

1.6 MOMENT GENERATING FUNCTIONS AND INDEPENDENCE

If X and t are $n \times 1$ vectors of random variables and constants, respectively, then the moment generating function (m.g.f.) of X is defined to be

$$M_X(t) = E[\exp(t'X)].$$

A key result about m.g.f.'s is that if $M_X(t)$ exists for all $\|t\| < t_0$ $(t_0 > 0)$ (i.e., in an interval containing the origin), then it determines the distribution uniquely. Fortunately, most of the common distributions have m.g.f.'s, one important exception being the t-distribution (with some of its moments being infinite), including the Cauchy distribution (the t-distribution with 1 degree of freedom). We give an example where this uniqueness is usefully exploited. It is assumed that the reader is familiar with the m.g.f. of $\chi^2_r$: namely, $(1 - 2t)^{-r/2}$.

EXAMPLE 1.10 Suppose that $Q_i \sim \chi^2_{r_i}$ for $i = 1, 2$, and $Q = Q_1 - Q_2$ is statistically independent of $Q_2$. We now show that $Q \sim \chi^2_r$, where $r = r_1 - r_2$. Writing

$$\begin{aligned} (1 - 2t)^{-r_1/2} &= E[\exp(tQ_1)] \\ &= E[\exp(tQ + tQ_2)] \\ &= E[\exp(tQ)]\,E[\exp(tQ_2)] \\ &= E[\exp(tQ)](1 - 2t)^{-r_2/2}, \end{aligned}$$

we have

$$E[\exp(tQ)] = (1 - 2t)^{-(r_1 - r_2)/2},$$

which is the m.g.f. of $\chi^2_r$. □
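Example 1.10 can be illustrated by simulation; in the sketch below the degrees of freedom are arbitrary choices. It builds $Q_1 = Q + Q_2$ from independent chi-square pieces and checks its moments and empirical m.g.f. against those of $\chi^2_{r_1}$.

```python
# If Q ~ chi^2_{r1-r2} is independent of Q2 ~ chi^2_{r2}, then Q1 = Q + Q2 ~ chi^2_{r1}.
import numpy as np

rng = np.random.default_rng(8)
r1, r2, reps = 7, 3, 500_000
Q  = rng.chisquare(r1 - r2, reps)
Q2 = rng.chisquare(r2, reps)
Q1 = Q + Q2

print(Q1.mean(), r1)          # chi^2_{r1} has mean r1
print(Q1.var(), 2 * r1)       # and variance 2 r1

# Empirical m.g.f. at a point t < 1/2 versus (1 - 2t)^(-r1/2):
t = 0.2
print(np.exp(t * Q1).mean(), (1 - 2 * t) ** (-r1 / 2))
```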

Moment generating functions also provide a convenient method for proving results about statistical independence. For example, if $M_X(t)$ exists and

$$M_X(t) = M_X(t_1, \ldots, t_r, 0, \ldots, 0)\,M_X(0, \ldots, 0, t_{r+1}, \ldots, t_n),$$

then $X_1 = (X_1, \ldots, X_r)'$ and $X_2 = (X_{r+1}, \ldots, X_n)'$ are statistically independent. An equivalent result is that $X_1$ and $X_2$ are independent if and only if we have the factorization

$$M_X(t) = a(t_1, \ldots, t_r)\,b(t_{r+1}, \ldots, t_n)$$

for some functions $a(\cdot)$ and $b(\cdot)$.

EXAMPLE 1.11 Suppose that the vectors of random variables X and Y have a joint m.g.f. which exists in an interval containing the origin. Then if X and Y are independent, so are any (measurable) functions of them. This follows from the fact that if $c(\cdot)$ and $d(\cdot)$ are suitable vector functions,

$$E[\exp\{s'c(X) + t'd(Y)\}] = E[\exp\{s'c(X)\}]\,E[\exp\{t'd(Y)\}] = a(s)\,b(t),$$

say. This result is, in fact, true for any X and Y, even if their m.g.f.'s do not exist, and can be proved using characteristic functions. □

Another route that we shall use for proving independence is via covariance. It is well known that $\mathrm{cov}[X, Y] = 0$ does not in general imply that X and Y are independent. However, in one important special case, the bivariate normal distribution, X and Y are independent if and only if $\mathrm{cov}[X, Y] = 0$. A generalization of this result applied to the multivariate normal distribution is given in Chapter 2. For more than two variables we find that for multivariate normal distributions, the variables are mutually independent if and only if they are pairwise independent. However, pairwise independence does not necessarily imply mutual independence, as we see in the following example.

EXAMPLE 1.12 Suppose that $X_1$, $X_2$, and $X_3$ have joint density function

$$f(x_1, x_2, x_3) = (2\pi)^{-3/2}\exp\big[-\tfrac{1}{2}(x_1^2 + x_2^2 + x_3^2)\big]\Big\{1 + x_1 x_2 x_3 \exp\big[-\tfrac{1}{2}(x_1^2 + x_2^2 + x_3^2)\big]\Big\}, \qquad -\infty < x_i < \infty \quad (i = 1, 2, 3).$$

Then the second term in the braces above is an odd function of $x_3$, so that its integral over $-\infty < x_3 < \infty$ is zero. Hence

$$f_{12}(x_1, x_2) = (2\pi)^{-1}\exp\big[-\tfrac{1}{2}(x_1^2 + x_2^2)\big] = f_1(x_1)f_2(x_2),$$

and $X_1$ and $X_2$ are independent N(0, 1) variables. Thus although $X_1$, $X_2$, and $X_3$ are pairwise independent, they are not mutually independent, as

$$f(x_1, x_2, x_3) \neq f_1(x_1)f_2(x_2)f_3(x_3). \qquad \Box$$



EXERCISES 1c

1. If X and Y are random variables with the same variance, prove that $\mathrm{cov}[X + Y, X - Y] = 0$. Give a counterexample which shows that zero covariance does not necessarily imply independence.

2. Let X and Y be discrete random variables taking values 0 or 1 only, and let $\mathrm{pr}(X = i, Y = j) = p_{ij}$ $(i = 1, 0;\ j = 1, 0)$. Prove that X and Y are independent if and only if $\mathrm{cov}[X, Y] = 0$.

3. If X is a random variable with a density function symmetric about zero and having zero mean, prove that $\mathrm{cov}[X, X^2] = 0$.

4. If X, Y and Z have joint density function

$$f(x, y, z) = \tfrac{1}{8}(1 + xyz) \qquad (-1 < x, y, z < 1),$$

prove that they are pairwise independent but not mutually independent.

MISCELLANEOUS EXERCISES 1

1. If X and Y are random variables, prove that

$$\mathrm{var}[X] = E_Y\{\mathrm{var}[X \mid Y]\} + \mathrm{var}_Y\{E[X \mid Y]\}.$$

Generalize this result to vectors X and Y of random variables.

2. Let $X = (X_1, X_2, X_3)'$ be a vector of random variables with

$$\mathrm{Var}[X] = \begin{pmatrix} 5 & 2 & 3 \\ 2 & 3 & 0 \\ 3 & 0 & 3 \end{pmatrix}.$$

(a) Find the variance of $X_1 - 2X_2 + X_3$.

(b) Find the variance matrix of $Y = (Y_1, Y_2)'$, where $Y_1 = X_1 + X_2$ and $Y_2 = X_1 + X_2 + X_3$.

3. Let $X_1, X_2, \ldots, X_n$ be random variables with a common mean $\mu$. Suppose that $\mathrm{cov}[X_i, X_j] = 0$ for all i and j such that $j > i + 1$. If

$$Q_1 = \sum_{i=1}^{n} \cdots \qquad \text{and} \qquad Q_2 = \cdots,$$

prove that

$$E[\,\cdots\,] = \mathrm{var}[\bar{X}].$$

4. Given a random sample $X_1, X_2, X_3$ from the distribution with density function $f(x) = \tfrac{1}{2}$ $(-1 < x < 1)$, find the variance of

$$(X_1 - X_2)^2 + (X_2 - X_3)^2 + (X_3 - X_1)^2.$$

5. If $X_1, \ldots, X_n$ are independently and identically distributed as $N(0, \sigma^2)$, and A and B are any $n \times n$ symmetric matrices, prove that

$$\mathrm{Cov}[X'AX, X'BX] = 2\sigma^4\,\mathrm{tr}(AB).$$

