
18.2 Essays and Longer Questions

1) Write an essay on the difference between the OLS estimator and the GLS estimator. Answer: Answers will vary by student, but some of the following points should be made.

The multiple regression model is

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i$, $i = 1, \dots, n$, which, in matrix form, can be written as $Y = X\beta + U$.

The OLS estimator is derived by minimizing the sum of squared prediction mistakes and results in the formula $\hat{\beta} = (X'X)^{-1}X'Y$. There are two GLS estimators. The infeasible GLS estimator is $\hat{\beta}^{GLS} = (X'\Omega^{-1}X)^{-1}(X'\Omega^{-1}Y)$. Since $\Omega$ is typically unknown, the estimator cannot be calculated, hence its name. However, a feasible GLS estimator can be calculated if $\Omega$ is a known function of a number of parameters which can be estimated. Once these parameters have been estimated, they can be used to calculate $\hat{\Omega}$, the estimator of $\Omega$. The feasible GLS estimator is defined as $\hat{\beta}^{GLS} = (X'\hat{\Omega}^{-1}X)^{-1}(X'\hat{\Omega}^{-1}Y)$.
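As an illustration (not part of the original answer), both formulas translate directly into matrix code. The following numpy sketch uses simulated data and an assumed AR(1)-style error covariance, both purely illustrative choices, to compare OLS with the infeasible GLS estimator that treats $\Omega$ as known.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 2

# Simulated data (illustrative only): a constant plus k regressors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta = np.array([1.0, 2.0, -0.5])

# An assumed AR(1)-style error covariance Omega, treated as known.
rho = 0.5
Omega = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
U = np.linalg.cholesky(Omega) @ rng.normal(size=n)
Y = X @ beta + U

# OLS: beta_hat = (X'X)^{-1} X'Y
beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)

# Infeasible GLS: beta_hat_GLS = (X' Omega^{-1} X)^{-1} (X' Omega^{-1} Y)
Oinv = np.linalg.inv(Omega)
beta_gls = np.linalg.solve(X.T @ Oinv @ X, X.T @ Oinv @ Y)

print(beta_ols)
print(beta_gls)
```

Replacing Omega with $\sigma_u^2 I_n$ in the GLS lines reproduces the OLS estimate, which mirrors the point made below about independent sampling.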

There are six extended least squares assumptions:

· $E(u_i \mid X_i) = 0$ ($u_i$ has conditional mean zero);

· $(X_i, Y_i)$, $i = 1, \dots, n$, are independently and identically distributed (i.i.d.) draws from their joint distribution;

· $X_i$ and $u_i$ have nonzero finite fourth moments;

· $X$ has full column rank (there is no perfect multicollinearity);

· $\mathrm{var}(u_i \mid X_i) = \sigma_u^2$ (homoskedasticity);

· the conditional distribution of $u_i$ given $X_i$ is normal (normal errors).

These assumptions imply $E(U \mid X) = 0_n$ and $E(UU' \mid X) = \sigma_u^2 I_n$, the Gauss-Markov conditions for multiple regression. If these hold, then OLS is BLUE. If assumptions 5 and 6 do not hold, but assumptions 1 to 4 still hold, then OLS is consistent and asymptotically normally distributed. Small sample statistics can be derived for the case where the errors are i.i.d. and normally distributed, conditional on $X$.

The GLS assumptions are

1. $E(U \mid X) = 0_n$;

2. $E(UU' \mid X) = \Omega(X)$, where $\Omega(X)$ is an $n \times n$ matrix that can depend on $X$;

3. $X_i$ and $u_i$ have nonzero finite fourth moments;

4. $X$ has full column rank (there is no perfect multicollinearity).

The major differences between the two sets of assumptions relevant to the estimators themselves are that (i) GLS allows homoskedastic errors to be serially correlated (dropping assumption 2 of the OLS list), and (ii) GLS allows the errors to be heteroskedastic (through assumption 2 of the GLS list). For the case of independent sampling, replacing $E(UU' \mid X) = \Omega(X)$ with $E(UU' \mid X) = \sigma_u^2 I_n$ turns the GLS estimator into the OLS estimator.

In the case of the infeasible GLS estimator, the model can be transformed in such a way that the Gauss-Markov assumptions apply to the transformed model, if the four GLS assumptions hold. In that case, GLS is BLUE and therefore more efficient than the OLS estimator. This is of little practical value since the estimator typically cannot be computed. The result also holds asymptotically if a consistent estimator of $\Omega$ exists. However, for the feasible GLS estimator to be consistent, the first GLS assumption must apply, which is much stronger than the first OLS assumption, particularly in time series applications. It is therefore possible for the OLS estimator to be consistent while the GLS estimator is not.

2) Give several economic examples of how to test various joint linear hypotheses using matrix notation. Include specifications of $R\beta = r$ where you test for (i) all coefficients other than the constant being zero, (ii) a subset of coefficients being zero, and (iii) equality of coefficients. Discuss the possible distributions involved in finding critical values for your hypotheses.

Answer: Answers will vary by student. Many restrictions involve the equality of coefficients across different types of entities in cross-sections (“stability”).

Using earnings functions, students may suggest testing for the presence of regional effects, as in the textbook example at the end of Chapter 5 (exercises). The textbook tested jointly for the presence of interaction effects in the student achievement example at the end of Chapter 6. Students may want to test for the equality of returns to education and on-the-job training. The panel chapter allowed for fixed effects, whose presence can be tested for. Testing for constant returns to scale in production functions is also frequently mentioned.

Consider the multiple regression model with $k$ regressors plus the constant. Let $R$ be of order $q \times (k+1)$, where $q$ is the number of restrictions. To test (i) that all coefficients other than the constant are zero, $H_0: \beta_1 = 0, \beta_2 = 0, \dots, \beta_k = 0$ vs. $H_1: \beta_j \neq 0$ for at least one $j$, $j = 1, \dots, k$, you have $R = [\,0_{k \times 1} \;\; I_k\,]$ and $r = 0_{k \times 1}$. In large samples, the test produces the overall regression F-statistic, which has an $F_{k,\infty}$ distribution. In case (ii), reorder the variables so that the regressors with nonzero coefficients appear first, followed by the regressors with coefficients that are hypothesized to be zero. This leads to the following formulation:

$H_0: \beta_{k-q+1} = 0, \dots, \beta_k = 0$ vs. $H_1: \beta_j \neq 0$ for at least one $j$, $j = k-q+1, \dots, k$, with $R = [\,0_{q \times (k-q+1)} \;\; I_q\,]$ and $r = 0_{q \times 1}$. In large samples, the test produces an F-statistic which has an $F_{q,\infty}$ distribution. In (iii), assume that the task at hand is to test the equality of two coefficients, say $H_0: \beta_1 = \beta_2$ vs. $H_1: \beta_1 \neq \beta_2$, as in Section 5.8 of the textbook. Then $R = [\,0 \;\; 1 \;\; -1 \;\; 0 \; \cdots \; 0\,]$, $r = 0$, and $q = 1$. This is a single restriction, and the F-statistic is the square of the corresponding t-statistic. Hence critical values can be found either from $F_{1,\infty}$ or, after taking the square root, from the standard normal table.
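All three cases use the same test statistic, $F = (R\hat{\beta} - r)'[R\,\widehat{\mathrm{var}}(\hat{\beta})\,R']^{-1}(R\hat{\beta} - r)/q$. As a hedged illustration, the sketch below implements case (iii) with a homoskedasticity-only variance estimator; the sample size and coefficient values are made up for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, k = 500, 3

# Simulated data; the true coefficients satisfy H0: beta_1 = beta_2.
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
Y = X @ np.array([1.0, 0.5, 0.5, 0.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
resid = Y - X @ beta_hat
sigma2 = resid @ resid / (n - k - 1)
V = sigma2 * np.linalg.inv(X.T @ X)        # homoskedasticity-only var-cov matrix

# Case (iii): R = [0 1 -1 0], r = 0, q = 1.
R = np.array([[0.0, 1.0, -1.0, 0.0]])
r = np.zeros(1)
q = R.shape[0]

d = R @ beta_hat - r
F = d @ np.linalg.solve(R @ V @ R.T, d) / q
print(F, stats.f.sf(F, q, n - k - 1))      # F-statistic and its p-value
```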

3) Define the GLS estimator and discuss its properties when $\Omega$ is known. Why is this estimator sometimes called infeasible GLS? What happens when $\Omega$ is unknown? What would the $\Omega$ matrix look like for the case of independent sampling with heteroskedastic errors, where $\mathrm{var}(u_i \mid X_i) = c\,h(X_i) = \sigma^2 X_{1i}^2$? Since the inverse of the error variance-covariance matrix is needed to compute the GLS estimator, find $\Omega^{-1}$. The textbook shows that the original model $Y = X\beta + U$ will be transformed into $\tilde{Y} = \tilde{X}\beta + \tilde{U}$, where $\tilde{Y} = FY$, $\tilde{X} = FX$, $\tilde{U} = FU$, and $F'F = \Omega^{-1}$. Find $F$ in the above case, and describe what effect the transformation has on the original data.

Answer: $\hat{\beta}^{GLS} = (X'\Omega^{-1}X)^{-1}(X'\Omega^{-1}Y)$. The key point for the GLS estimator with $\Omega$ known is that $\Omega$ is used to create a transformed regression model such that the resulting error term satisfies the Gauss-Markov conditions. In that case, GLS is BLUE. However, since $\Omega$ is typically unknown, the estimator cannot be calculated, and is therefore sometimes referred to as infeasible GLS. If $\Omega$ is unknown, then a feasible GLS estimator can be calculated if $\Omega$ is a known function of a number of parameters which can be estimated. Once the parameters have been estimated, they can be used to calculate $\hat{\Omega}$, the estimator of $\Omega$. The feasible GLS estimator is then $\hat{\beta}^{GLS} = (X'\hat{\Omega}^{-1}X)^{-1}(X'\hat{\Omega}^{-1}Y)$.

In the above example of heteroskedasticity, $\Omega = \sigma^2 \operatorname{diag}(X_{11}^2, X_{12}^2, \dots, X_{1n}^2)$, so $\Omega^{-1} = \sigma^{-2} \operatorname{diag}(X_{11}^{-2}, \dots, X_{1n}^{-2})$, and $F = \sigma^{-1} \operatorname{diag}(X_{11}^{-1}, \dots, X_{1n}^{-1})$ satisfies $F'F = \Omega^{-1}$.

The transformation in effect scales all variables by $1/X_{1i}$; that is, each observation is divided by its value of $X_1$.
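A minimal sketch of this transformation, assuming simulated data with $X_{1i} > 0$ and illustrative parameter values: OLS on the transformed ("tilde") data reproduces the GLS formula applied to the original data.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
X1 = rng.uniform(1.0, 3.0, size=n)         # keep X1 > 0 so 1/X1 is well defined
X = np.column_stack([np.ones(n), X1])

# Heteroskedastic errors with var(u_i | X_i) = sigma^2 * X1_i^2.
sigma = 1.5
Y = X @ np.array([2.0, -1.0]) + sigma * X1 * rng.normal(size=n)

# F = diag(1/X1_i); the factor 1/sigma cancels in the estimator.
F = np.diag(1.0 / X1)
Xt, Yt = F @ X, F @ Y                      # the transformed ("tilde") data

# OLS on the transformed model ...
beta_tilde = np.linalg.solve(Xt.T @ Xt, Xt.T @ Yt)

# ... equals GLS on the original model with Omega = sigma^2 diag(X1_i^2).
Oinv = np.diag(1.0 / X1**2)
beta_gls = np.linalg.solve(X.T @ Oinv @ X, X.T @ Oinv @ Y)
print(beta_tilde, beta_gls)
```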

4) Consider the multiple regression model from Chapter 5, where $k = 2$ and the assumptions of the multiple regression model hold.

(a) Show what the X matrix and the β vector would look like in this case.

(b) Having collected data for 104 countries of the world from the Penn World Tables, you want to estimate the effect of the population growth rate ($X_{1i}$) and the saving rate ($X_{2i}$) (average investment share of GDP from 1980 to 1990) on GDP per worker (relative to the U.S.) in 1990. What are your expected signs for the regression coefficients? What is the order of $(X'X)$ here?

(c) You are asked to find the OLS estimator for the intercept and slopes in this model using the formula $\hat{\beta} = (X'X)^{-1}X'Y$. Since you are more comfortable inverting a $2 \times 2$ matrix (the inverse of a $2 \times 2$ matrix is $\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$), you decide to write the multiple regression model in deviations from mean form. Show what the $X$ matrix, the $(X'X)$ matrix, and the $X'Y$ matrix would look like now.

(Hint: use small letters to indicate deviations from mean, i.e., $z_i = Z_i - \bar{Z}$, and note that $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$ and $\bar{Y} = \beta_0 + \beta_1 \bar{X}_1 + \beta_2 \bar{X}_2 + \bar{u}$. Subtracting the second equation from the first, you get $y_i = \beta_1 x_{1i} + \beta_2 x_{2i} + (u_i - \bar{u})$.)

(d) Show that the slope for the population growth rate is given by $\hat{\beta}_1 = \dfrac{\sum x_{2i}^2 \sum x_{1i} y_i - \sum x_{1i} x_{2i} \sum x_{2i} y_i}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i} x_{2i}\right)^2}$.

(e) The various sums needed to calculate the OLS estimates are given below:

Find the numerical values for the effect of population growth and the saving rate on per capita income and interpret these.

(f) Indicate how you would find the intercept in the above case. Is this coefficient of interest in the interpretation of the determinants of per capita income? If not, then why estimate it?

(b) You would expect the population growth rate to have a negative coefficient, and the saving rate to have a positive coefficient. The order of $(X'X)$ is $3 \times 3$.

A reduction of the population growth rate by one percent increases per capita income relative to the United States by roughly 0.13. An increase in the saving rate by ten percent increases per capita income relative to the United States by roughly 0.14.

(f) The first order condition for the OLS estimator in the case of $k = 2$ is $\sum_{i=1}^{n}(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i}) = 0$, which, after dividing by $n$, results in $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}_1 - \hat{\beta}_2 \bar{X}_2$. The intercept is only of interest if there are observations close to the origin, which is not the case here. If it is set to zero, then the regression is forced through the origin, instead of being allowed to choose a level.
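The formulas in parts (c) through (f) can be checked numerically. The sketch below uses simulated data with made-up coefficient values, since the Penn World Table sums from part (e) are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 104
X1, X2 = rng.normal(size=n), rng.normal(size=n)
Y = 1.0 - 0.13 * X1 + 1.4 * X2 + rng.normal(size=n)   # made-up coefficients

# Deviations from means (the small letters of the hint).
x1, x2, y = X1 - X1.mean(), X2 - X2.mean(), Y - Y.mean()

# Slopes from part (d), using the 2 x 2 inverse of part (c).
den = (x1 @ x1) * (x2 @ x2) - (x1 @ x2) ** 2
b1 = ((x2 @ x2) * (x1 @ y) - (x1 @ x2) * (x2 @ y)) / den
b2 = ((x1 @ x1) * (x2 @ y) - (x1 @ x2) * (x1 @ y)) / den

# Part (f): recover the intercept from the first order condition.
b0 = Y.mean() - b1 * X1.mean() - b2 * X2.mean()
print(b0, b1, b2)
```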

5) In Chapter 10 of your textbook, panel data estimation was introduced. Panel data consist of observations on the same $n$ entities at two or more time periods $T$. For two variables, you have $(X_{it}, Y_{it})$, $i = 1, \dots, n$ and $t = 1, \dots, T$, where the entities could be the U.S. states. The example in Chapter 10 used annual data from 1982 to 1988 for the fatality rate and beer taxes. Estimation by OLS, in essence, involved “stacking” the data.

(a) What would the variance-covariance matrix of the errors look like in this case if you allowed for homoskedasticity-only standard errors? What is its order? Use an example of a linear regression with one regressor for 4 U.S. states and 3 time periods.

(b) Does it make sense that errors in New Hampshire, say, are uncorrelated with errors in Massachusetts during the same time period (“contemporaneously”)? Give examples why this correlation might not be zero.

(c) If this correlation were known, could you find an estimator which is more efficient than OLS?

Answer: (a) Under the extended least squares assumptions, $E(UU' \mid X) = \sigma_u^2 I_{nT}$.

In the above example of 4 U.S. states and 3 time periods, the identity matrix will be of order $12 \times 12$, or $(nT) \times (nT)$ in general. Specifically, $\Omega = \sigma_u^2 I_{12}$: a diagonal matrix with $\sigma_u^2$ in every diagonal position and zeros everywhere else.

(b) It is reasonable to assume that a shock to a state would have an effect on its neighboring state, particularly when the shock affects the larger of the two, as in the case of Massachusetts. Other examples are Texas and Arkansas, Michigan and Indiana, California and Arizona, New York and New Jersey, etc. A negative oil price shock, which affects the demand for automobiles produced in Michigan, will have repercussions for suppliers located not only in Michigan, but also elsewhere.

(c) In the case of a known variance-covariance matrix of the error terms, the GLS estimator $\hat{\beta}^{GLS} = (X'\Omega^{-1}X)^{-1}(X'\Omega^{-1}Y)$ could be used. The variance-covariance matrix would now contain nonzero contemporaneous covariances, $\mathrm{cov}(u_{it}, u_{jt} \mid X) = \sigma_{ij}$, in its off-diagonal positions.

(There is a subtle issue here for the case of a feasible GLS estimator, where the variances and covariances have to be estimated. It can be shown, in that case, that the GLS estimator does not exist unless $n \leq T$, which is not the case for most panels. It is easier to see that the variance-covariance matrix is singular for $n > T$ if the data are stacked by time period.)
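For concreteness, a sketch of the two variance-covariance matrices for $n = 4$ states and $T = 3$ periods; the contemporaneous covariances in Sigma are made-up illustrative numbers, and the data are assumed to be stacked state by state.

```python
import numpy as np

n_states, T = 4, 3
sigma2_u = 1.0

# (a) Homoskedasticity-only case: Omega = sigma_u^2 I_{nT}, here 12 x 12.
Omega_a = sigma2_u * np.eye(n_states * T)

# (c) Allowing contemporaneous correlation: with the data stacked state by
# state, cov(u_it, u_jt) = sigma_ij links observations T positions apart.
Sigma = np.array([[1.0, 0.4, 0.2, 0.1],    # made-up sigma_ij values
                  [0.4, 1.0, 0.3, 0.2],
                  [0.2, 0.3, 1.0, 0.4],
                  [0.1, 0.2, 0.4, 1.0]])
Omega_c = np.kron(Sigma, np.eye(T))        # block structure via Kronecker product
print(Omega_a.shape, Omega_c.shape)        # both (12, 12)
```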

18.3 Mathematical and Graphical Problems

1) Your textbook derives the OLS estimator as $\hat{\beta} = (X'X)^{-1}X'Y$.

Show that the estimator does not exist if there are fewer observations than the number of explanatory variables, including the constant. What is the rank of X′ X in this case?

Answer: In order for a matrix to be invertible, it must have full rank. Since $X'X$ is of order $(k+1) \times (k+1)$, in order to invert $X'X$ it must have rank $(k+1)$. In the case of a product such as $X'X$, the rank is less than or equal to the rank of $X'$ or $X$, whichever is smaller. $X$ is of order $n \times (k+1)$ and, assuming that there is no perfect multicollinearity, will have either rank $n$ or rank $(k+1)$, whichever is smaller. Hence if there are fewer observations than explanatory variables (including the constant), then the rank of $X$ is $n\,(< k+1)$, and the rank of $X'X$ is also $n\,(< k+1)$. Hence $X'X$ does not have full rank and cannot be inverted. The OLS estimator does not exist as a result.
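A quick numerical illustration of this rank argument, using made-up dimensions $n = 3$ and $k = 5$:

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 3, 5                                # fewer observations than k + 1 = 6
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])

XtX = X.T @ X
print(XtX.shape)                           # (6, 6)
print(np.linalg.matrix_rank(XtX))          # rank n = 3 < k + 1 = 6
print(np.linalg.det(XtX))                  # numerically zero: X'X is singular
```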

2) Assume that the data looks as follows:

Using the formula for the OLS estimator $\hat{\beta} = (X'X)^{-1}X'Y$, derive the formula for $\hat{\beta}_1$, the only slope in this “regression through the origin.” Show that $(A+B)' = A' + B'$ and $(AC)' = C'A'$.
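Although the data table is not reproduced here, the reduction of the matrix formula to $\hat{\beta}_1 = \sum X_i Y_i / \sum X_i^2$ can be verified numerically; the sketch below uses simulated data for a single regressor with no constant.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(20, 1))               # a single regressor, no constant
Y = 2.0 * X[:, 0] + rng.normal(size=20)

# General matrix formula ...
beta_matrix = np.linalg.solve(X.T @ X, X.T @ Y)[0]

# ... collapses to sum(X_i Y_i) / sum(X_i^2) for one regressor.
beta_scalar = (X[:, 0] @ Y) / (X[:, 0] @ X[:, 0])
print(beta_matrix, beta_scalar)
```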

3) Write the following three linear equations in matrix format $Ax = b$, where $x$ is a $3 \times 1$ vector containing $q$, $p$, and $y$, $A$ is a $3 \times 3$ matrix of coefficients, and $b$ is a $3 \times 1$ vector of constants.

6) Can you write the restriction in the same format? Why not?

Answer: The restriction cannot be written in the same format because it is nonlinear.

7) Using the model $Y = X\beta + U$ and the extended least squares assumptions, derive the OLS estimator $\hat{\beta}$. Discuss the conditions under which $X'X$ is invertible.

Answer: The derivation copies the relevant parts of Section 16.1 of the textbook. The model is $Y = X\beta + U$, where $Y$ is the $n \times 1$ dimensional vector of observations on the dependent variable, $X$ is the $n \times (k+1)$ dimensional matrix of $n$ observations on the $k+1$ regressors (including the “constant” regressor for the intercept), $U$ is the $n \times 1$ dimensional vector of the $n$ error terms, and $\beta$ is the $(k+1) \times 1$ dimensional vector of the $k+1$ unknown regression coefficients.

The extended least squares assumptions are:

· $E(u_i \mid X_i) = 0$ ($u_i$ has conditional mean zero);

· $(X_i, Y_i)$, $i = 1, \dots, n$, are independently and identically distributed (i.i.d.) draws from their joint distribution;

· $X_i$ and $u_i$ have nonzero finite fourth moments;

· $X$ has full column rank (there is no perfect multicollinearity);

· $\mathrm{var}(u_i \mid X_i) = \sigma_u^2$ (homoskedasticity);

· the conditional distribution of $u_i$ given $X_i$ is normal (normal errors).

The OLS estimator minimizes the sum of squared prediction mistakes, $\sum_{i=1}^{n}(Y_i - b_0 - b_1 X_{1i} - \cdots - b_k X_{ki})^2$.

The derivative of the sum of squared prediction mistakes with respect to the $j$th regression coefficient, $b_j$, is $-2\sum_{i=1}^{n} X_{ji}(Y_i - b_0 - b_1 X_{1i} - \cdots - b_k X_{ki})$, where, for $j = 0$, $X_{ji} = 1$ for all $i$. The formula for the OLS estimator is obtained by taking the derivative of the sum of squared prediction mistakes with respect to each element of the coefficient vector, setting these derivatives to zero, and solving for the estimator $\hat{\beta}$. The derivative above is the $j$th element of the $k+1$ dimensional vector $-2X'(Y - Xb)$, where $b$ is the $k+1$ dimensional vector consisting of $b_0, \dots, b_k$. There are $k+1$ such derivatives, each corresponding to an element of $b$. Combined, these yield the system of $k+1$ equations that constitute the first order conditions for the OLS estimator and, when set to zero, define the OLS estimator $\hat{\beta}$. That is, $\hat{\beta}$ solves the system of $k+1$ equations,

$X'(Y - X\hat{\beta}) = 0_{k+1}$, or, equivalently, $X'Y = X'X\hat{\beta}$. Solving this system of equations yields the OLS estimator in matrix form: $\hat{\beta} = (X'X)^{-1}X'Y$, where $(X'X)^{-1}$ is the inverse of the matrix $X'X$.

$X'X$ is invertible as long as it has full rank. This requires that there are at least as many observations as regressors (including the constant) and that there is no perfect multicollinearity among the regressors.
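A sketch of the normal equations in code, with simulated data (all values illustrative); solving $X'X\hat{\beta} = X'Y$ directly is numerically preferable to forming the inverse explicitly.

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
Y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)

# The first order conditions X'(Y - X beta_hat) = 0 rearrange to the
# normal equations X'X beta_hat = X'Y, solved here without an explicit inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# The residuals are orthogonal to every regressor, as the FOCs require.
print(X.T @ (Y - X @ beta_hat))            # a numerically zero vector
```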

8) Prove that under the extended least squares assumptions the OLS estimator $\hat{\beta}$ is unbiased and that its variance-covariance matrix is $\sigma_u^2 (X'X)^{-1}$.

Answer: Start the proof by relating the OLS estimator to the errors: $\hat{\beta} = (X'X)^{-1}X'Y = (X'X)^{-1}X'(X\beta + U) = \beta + (X'X)^{-1}X'U$.

To prove the unbiasedness of the OLS estimator, take the conditional expectation of both sides of the expression: $E(\hat{\beta} \mid X) = \beta + (X'X)^{-1}X'E(U \mid X)$.

Since $E(U \mid X) = 0_n$ (from extended least squares assumptions 1 and 2), $E(\hat{\beta} \mid X) = \beta$, so $\hat{\beta}$ is unbiased.

To find the variance-covariance matrix, write $\mathrm{var}(\hat{\beta} \mid X) = E[(\hat{\beta} - \beta)(\hat{\beta} - \beta)' \mid X] = (X'X)^{-1}X'\,E(UU' \mid X)\,X(X'X)^{-1}$, and, following the extended least squares assumptions, substitute $E(UU' \mid X) = \sigma_u^2 I_n$ to obtain $\mathrm{var}(\hat{\beta} \mid X) = \sigma_u^2 (X'X)^{-1}$.
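A small Monte Carlo sketch (simulated design, illustrative parameter values) confirming both results: across repeated samples the mean of $\hat{\beta}$ approaches $\beta$, and its sample covariance approaches $\sigma_u^2 (X'X)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 50, 5000
beta = np.array([1.0, 2.0])
sigma_u = 1.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # X held fixed

XtX_inv = np.linalg.inv(X.T @ X)
draws = np.empty((reps, 2))
for j in range(reps):
    Y = X @ beta + sigma_u * rng.normal(size=n)
    draws[j] = XtX_inv @ (X.T @ Y)

print(draws.mean(axis=0))                  # close to beta: unbiasedness
print(np.cov(draws.T))                     # close to sigma_u^2 (X'X)^{-1} ...
print(sigma_u**2 * XtX_inv)                # ... the theoretical var-cov matrix
```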

9) For the OLS estimator $\hat{\beta} = (X'X)^{-1}X'Y$ to exist, $X'X$ must be invertible. This is the case when $X$ has full rank. What is the rank of a matrix? What is the rank of the product of two matrices? Is it possible that $X$ could have rank $n$? What would be the rank of $X'X$ in the case $n < (k+1)$? Explain intuitively why the OLS estimator does not exist in that situation.

Answer: The rank of a matrix is the maximum number of linearly independent rows or columns; in the case of a rectangular matrix, the maximum number of linearly independent columns equals the maximum number of linearly independent rows. The rank of $X$ can be, at most, either $n$ or $(k+1)$, whichever is smaller. The rank of a product of two matrices is, at most, the minimum of the ranks of the two matrices in the product. In the case of $X'X$, both matrices have, at most, either rank $n$ or $(k+1)$, whichever is smaller. Since $X'X$ is a square matrix of order $(k+1) \times (k+1)$, it must have full rank in order to be invertible. In the absence of perfect multicollinearity, the rank will be $(k+1)$ as long as $(k+1) \leq n$. If there are fewer observations than regressors (including the constant), then the rank will be $n$. Except for the special case where there are exactly as many observations as regressors (including the constant), $X'X$ will not have full rank in this case and cannot be inverted. Intuitively, you need as many independent equations as there are unknowns to find a unique solution. This is not the case when $n < (k+1)$.

10) In order for a matrix A to have an inverse, its determinant cannot be zero. Derive the determinant of the following matrices: A, B, and $X'X$, where $X = (1 \;\; 10)$.

Answer: $\det(A) = 15$, $\det(B) = -10$, and $\det(X'X) = 0$; the last holds since $X'X = \begin{pmatrix} 1 & 10 \\ 10 & 100 \end{pmatrix}$ has linearly dependent columns.

11) Your textbook shows that the following matrix, $M_X = I_n - P_X$, is a symmetric idempotent matrix.

Consider a different matrix A, which is defined as follows: $A = I_n - \frac{1}{n}\iota\iota'$, where $\iota$ is the $n \times 1$ vector of ones. a. Show what the elements of A look like. b. Show that A is a symmetric idempotent matrix. c. Show that $A\iota = 0$. d. Show that $A\hat{U} = \hat{U}$, where $\hat{U}$ is the vector of OLS residuals from a multiple regression. Answer: a. A has diagonal elements $1 - \frac{1}{n}$ and off-diagonal elements $-\frac{1}{n}$.

b. Since $\iota'\iota = n$, $AA = (I_n - \frac{1}{n}\iota\iota')(I_n - \frac{1}{n}\iota\iota') = I_n - \frac{2}{n}\iota\iota' + \frac{1}{n^2}\iota(\iota'\iota)\iota' = I_n - \frac{2}{n}\iota\iota' + \frac{1}{n}\iota\iota'$. This means that the last two terms in the above equation combine to $-\frac{1}{n}\iota\iota'$, and therefore $A \times A = A$; that is, A is idempotent. Symmetry follows because both $I_n$ and $\iota\iota'$ are symmetric. c. $A\iota = \iota - \frac{1}{n}\iota(\iota'\iota) = \iota - \iota = 0$. d. $A\hat{U} = \hat{U} - \frac{1}{n}\iota(\iota'\hat{U}) = \hat{U}$, since the OLS residuals sum to zero when the regression includes a constant.
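These properties are easy to verify numerically; the sketch below uses an arbitrary small $n$ and simulated data.

```python
import numpy as np

n = 6
iota = np.ones((n, 1))
A = np.eye(n) - iota @ iota.T / n          # diagonal 1 - 1/n, off-diagonal -1/n

print(np.allclose(A, A.T))                 # b. symmetric
print(np.allclose(A @ A, A))               # b. idempotent
print(np.allclose(A @ iota, 0))            # c. A iota = 0

# d. A leaves OLS residuals unchanged (they sum to zero with a constant).
rng = np.random.default_rng(8)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = rng.normal(size=n)
resid = Y - X @ np.linalg.solve(X.T @ X, X.T @ Y)
print(np.allclose(A @ resid, resid))
```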

12) Write down, in general, the variance-covariance matrix for the multiple regression error term U. Using the assumptions $\mathrm{cov}(u_i, u_j \mid X_i, X_j) = 0$ and $\mathrm{var}(u_i \mid X_i) = \sigma_u^2$, show that the variance-covariance matrix can be written as $\sigma_u^2 I_n$.

13) Consider the following symmetric and idempotent matrix A: $A = I_n - \frac{1}{n}\iota\iota'$. a. Show that by postmultiplying this matrix by the vector Y (the LHS variable of the OLS regression), you convert all observations of Y into deviations from the mean. b. Derive the expression $Y'AY$. What is the order of this expression? Under what other name have you encountered this expression before?

Answer: a. Note that $\frac{1}{n}\iota'Y = \bar{Y}$. Given this result, if you premultiply Y by A, you get $AY = Y - \frac{1}{n}\iota\iota'Y = Y - \iota\bar{Y}$, the vector of deviations from the mean. b. Note that $Y'A'AY = Y'AAY = Y'AY$, since A is symmetric and idempotent.

This is a scalar, which is called the variation in Y, or the total sum of squares (TSS).
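A short numerical check (with simulated Y) that the quadratic form $Y'AY$ equals the sum of squared deviations:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 10
Y = rng.normal(size=n)
A = np.eye(n) - np.ones((n, n)) / n        # the demeaning matrix

print(Y @ A @ Y)                           # the quadratic form Y'AY (a scalar)
print(np.sum((Y - Y.mean()) ** 2))         # equals the total sum of squares
```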
