Econometrics WS11-12 Course Manual


Course manual Econometrics: Week 1

1 About the course manual

At the beginning of every week we will upload a course manual for the corresponding week. We decided to create a course manual so that there is a single source in which all information about both the content and the organization of the course is available to students. Important information contained in the lecture slides will be repeated in this manual. It is therefore obligatory for all students taking this course to read the manual for every week of the semester. E-mails containing questions that could be answered by reading the manual will not be answered. The most important part of the manual is this week's, which contains background information about the course, some tips on how to study for the course, as well as important dates.

Besides some organizational information, the course manual will contain detailed reading instructions for every week, including both obligatory and voluntary literature. Furthermore, obligatory and additional exercises for each week will be provided. Finally, starting from week 2 the manual will contain the solutions to the obligatory exercises.

2 People

The course is taught by Jun.-Prof. Dr. Hans Manner, who gives both the lectures and the exercise meetings. He studied econometrics at the University of Maastricht from 2001 to 2005, followed by a PhD at the same university from 2006 to 2010. Since April 2010 he has worked at the Department for Social and Economic Statistics, attached to the chair of Prof. Dr. Karl Mosler. He can be reached by e-mail ([email protected]), by phone (0221-4704130) or by coming to his office at Meister-Ekkehart-Str. 9, second floor, during his office hours, which are listed on his webpage.

Walter Orth assists in designing and preparing the course, and may occasionally take over a class in case of emergency. In particular, he is responsible for preparing the exercises and their solutions. He can also be contacted if students have questions regarding the content of the course. His e-mail address is [email protected], and he can be visited during his office hours, which are listed on his webpage. Walter Orth is currently a PhD student of Prof. Mosler. Before that he studied mathematics and economics in Duisburg-Essen.


3 General information about the course

Econometrics is an extremely important topic for all economics students and for many business students as well. Sufficient knowledge of econometrics is often a prerequisite, or at least extremely useful, for taking many economics and finance courses. Additionally, writing a Master's thesis often requires performing an empirical analysis using econometric techniques. The aim of this course is to prepare students for such tasks and to teach the most relevant econometric techniques needed to analyze economic data.

Although it is very useful, most students find econometrics a rather difficult subject. First of all, it requires knowledge and understanding of many topics from mathematics and statistics. Although all students have learned these things, only a few actually remember the relevant ones when needed. Econometrics itself requires understanding theory that is often complex and knowing when certain things need to be applied. There are many subtle issues of when and how different methods may or may not be applied. Therefore, mastering econometrics requires hard work, time, exercise and reflection on what has been learned. We tried to design the course in a way that strikes a good balance between lectures/exercise sessions and self-study. We hope that, by following our recommendations on how to study for this course, students will be able to acquire sufficient knowledge of econometrics not only to pass the exam without difficulty, but also to apply basic econometric techniques in practice.

Students should nevertheless be aware that, for a Master's course in econometrics, the material covered will not be very advanced. Students who have good prior knowledge, for example those who have taken the Profilgruppe Quantitative Methoden der Wirtschafts- und Sozialwissenschaften in Cologne, are advised to skip this course and move right away into an advanced econometrics course such as Time Series Analysis or Advanced Econometrics. Those who do take the course, find it interesting, and want to learn more about econometrics are advised to take one of the follow-up courses offered by our department. There are many interesting and important topics in econometrics that cannot be covered in a one-semester course, and it definitely pays off to have this knowledge.

4 Prior knowledge

As mentioned above, prior knowledge in mathematics and statistics is absolutely crucial. All students who have done their bachelor studies in Cologne should be fine. In general, we expect students to have taken two statistics courses that include an introduction to the linear regression model. We recommend that you review your old textbook or notes whenever you find it necessary. The most important concepts will be covered again in this course, but only very briefly and without much explanation. Having taken an additional course in econometrics at the bachelor level would be very helpful, but is not necessary. Students who have studied econometrics before should decide for themselves whether they want to take this course, which means they will hear many things they have learned before, or whether they want to take a more advanced course right away. This depends on how intensive their former course was and how well they performed. If you have any doubt on this issue, feel free to contact the professor.

5 The textbook and reading

The main textbook for the course is A Guide to Modern Econometrics by Marno Verbeek. The course is based on the third edition of the book, but if you have a different edition there should be no problem. Each week we will give you detailed reading instructions, and we will follow the structure of this book quite closely. This means that the lecture slides and the corresponding book chapters are close to each other in terms of structure, content and notation. This is intended, as we believe that it will make it easier to combine self-study and lectures. Reading the book is obligatory and you are required to know everything from the weekly reading. We highly recommend that you buy this book, as it will not only be extremely useful for studying for this course, but will also serve as a reference in the future. If you are not willing or able to buy this book, there are currently 16 copies available at the Lehrbuchsammlung, as well as further copies at other libraries of the faculty.

There are other textbooks that you may want to consider when studying for this course. If you have studied econometrics in the past, the book you used back then should be useful, not least because it is familiar to you. We recommend three additional books, for which we will give some broad reading indications each week. The books Introductory Econometrics by Jeffrey Wooldridge and Introduction to Econometrics by James Stock and Mark Watson are a little easier than Verbeek. Econometric Analysis by William Greene is much more advanced, but also contains more details and many additional topics.

To those who prefer to consult a German textbook we suggest Einführung in die Ökonometrie by Walter Assenmacher or Ökonometrie. Eine Einführung by Ludwig von Auer.

6 gretl

Econometrics requires using a computer program to analyze data. Many such programs exist and each program has its pros and cons. We have decided to use gretl, because it offers a wide range of econometric techniques while at the same time being free software. Many other programs such as EVIEWS, STATA or Matlab may be more suitable for certain specialized tasks, but they are expensive and students cannot use them at home. Given that this course is too large for going to the computer lab, this would be very problematic.

gretl can be downloaded for free from http://gretl.sourceforge.net/. There you will also find a lot of additional information about the program and many add-ons. In particular, you can directly include the complete data used in the Verbeek book, as well as data from other books, which makes doing the empirical exercises much easier. An introduction to gretl will be given during the second exercise meeting.


7 Studying for the course

From the above it should be rather clear how the ideal student should study for this course. You should read the obligatory literature either before or after the corresponding lecture. If, after both lecture and reading, a topic is still not understood, you may want to look it up in one of the other books. If you find the course difficult and have problems understanding everything, you may decide to read one of the other books continuously.

The exercises should at least be read and thought about before the second class of the week, but ideally students should have tried to solve them by themselves. Just watching the professor present the solution may seem fine to you, but experience has shown that for most students this does not suffice. After the exercise meeting it is recommended to review the solutions of those exercises that appeared difficult. Note that solutions to all obligatory exercises will be provided after the meetings. Again, those who have problems with the material should try the additional exercises and try to answer the review questions (for which we will in general not provide the answers).

8 The exam

There will be two exams offered this semester. The first will take place on 31.01.2012 at 14.00h in Aula 1. This is during the last week of the semester and may be at the same time as other exams or classes. Therefore the second exam date will be on 20.03.2012 at 10.00h in HS C. The exams will last 60 minutes and you are allowed to bring a pocket calculator and one A4 sheet of paper with any notes you desire. However, you will not be given a formula sheet, so all formulas you think you will need must be included in those notes. There are hardly any past exams available to help you study, since this course is relatively new and is being taught by Jun.-Prof. Manner for the first time. However, during the lectures you can expect to get some hints about what may be useful for the exam. The last lecture of the semester will be spent reviewing the most relevant material and should again serve as preparation for the exam. For now it can be said that the main focus will be on understanding the material of the course rather than exclusively on being able to perform mechanical calculations (although that may also be a part). The exam questions will certainly require students to interpret computer output and relate it to theoretical concepts.


9 This week's reading

Obligatory reading: Appendix A and B, as well as Chapter 1 of Verbeek

Additional reading: The appendices of Wooldridge are very detailed about the mathematical and statistical basics. They also contain many examples. Reading all of them is probably a bit too much, but you may want to read the sections corresponding to topics that you find difficult.

Chapter 1 of Wooldridge gives you a nice introduction to what econometrics is about.

Appendices A and B of Greene are very detailed and contain much more information than we need. You may want to consult them if you are looking for more concise secondary reading.

10 Exercises (obligatory)

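1) Calculate the following sums and products (as far as they are defined):

$$A = \begin{pmatrix} 3 & -1 \\ 2 & 0 \end{pmatrix} \begin{pmatrix} 6 & 7 & 3 \\ 2 & 1 & 4 \end{pmatrix}$$

$$B = 6 + \begin{pmatrix} 3 & 7 \\ 1 & 2 \end{pmatrix}$$

$$C = \begin{pmatrix} 1 & 0 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} 1 & 2 \end{pmatrix}$$

$$D = \begin{pmatrix} 3 \\ 7 \end{pmatrix} \begin{pmatrix} 1 & 9 \end{pmatrix}$$

$$E = \begin{pmatrix} 4 & 2 \end{pmatrix} \begin{pmatrix} 3 & 5 \\ 9 & -7 \end{pmatrix} \begin{pmatrix} 3 \\ 9 \end{pmatrix}$$

$$F = \begin{pmatrix} 6 & 3 & 3 \\ 4 & 1 & -2 \end{pmatrix}' \begin{pmatrix} 1 & 8 & 4 \\ 7 & -5 & -2 \end{pmatrix}$$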

2) Let $A$ and $B$ be $n \times n$ matrices with full rank. Calculate

a) $(AB)'(B^{-1}A^{-1})'$

b) $(A(A^{-1} + B^{-1})B)(B + A)^{-1}$

3) Consider the matrix

$$X = \begin{pmatrix} x_1 & x_2 & \cdots & x_n \end{pmatrix}',$$

where $x_1, x_2, \ldots, x_n$ are column vectors with $K$ entries each. Show that

$$\sum_{i=1}^{n} x_i x_i' = X'X.$$

4)

a) Let $X_1, X_2, Y_1, Y_2$ be random variables. Show that

$$\mathrm{cov}\{X_1 + X_2, Y_1 + Y_2\} = \mathrm{cov}\{X_1, Y_1\} + \mathrm{cov}\{X_1, Y_2\} + \mathrm{cov}\{X_2, Y_1\} + \mathrm{cov}\{X_2, Y_2\}$$


b) Now let $X$ be a column vector of random variables with $m$ entries. The covariance matrix of $X$ is defined as $V\{X\} = E\{(X - E\{X\})(X - E\{X\})'\} = E\{XX'\} - E\{X\}E\{X\}'$. Further, let $A$ be a matrix of constants with $m$ columns. Show that

$$V\{AX\} = A\,V\{X\}\,A'$$

11 Additional exercises

1) Show that

$$(AB)^{-1} = B^{-1}A^{-1}$$

for any two $n \times n$ matrices $A$ and $B$ that have full rank (i.e. rank($A$) = rank($B$) = $n$).

2) Show that

$$\mathrm{cov}\{aX_1, bX_2\} = ab\,\mathrm{cov}\{X_1, X_2\}$$

where $a$ and $b$ are constant (non-random) scalars and $X_1$ and $X_2$ are random variables.

3) Show that $E\{(X - E\{X\})(X - E\{X\})'\} = E\{XX'\} - E\{X\}E\{X\}'$.

If you still feel unfamiliar with matrix algebra you may want to work through exercises 2, 3 and 4 in chapter 2 of the textbook Mosler/Dyckerhoff/Scheicher: Mathematische Methoden fuer Oekonomen.



1 General information

This week introduces the linear regression model along with its assumptions, its estimation by ordinary least squares, and the properties of the OLS estimator both in finite samples and asymptotically.

2 Reading

Obligatory reading: Verbeek, Sections 2.1, 2.2, 2.3 and 2.6.

Additional reading: Wooldridge, Chapters 2, 3 and 5. The material that we cover is distributed over these chapters in Wooldridge, and many topics we cover in the coming weeks are treated in between. You therefore have to either look for the relevant sections beforehand or read these chapters completely after we have treated all the subjects.

Greene, Chapter 2, Chapter 3.1 and 3.2, Chapter 4.1-4.6 and 4.9.

3 Exercises

During this week's exercise meeting we will provide an introduction to gretl. You may print and read the document A short introduction to gretl, the content of which will be explained in the meeting. If you can't make it to the meeting, make sure to go through all the steps in the manual and explore some of the features of gretl by clicking through some of the menus.

Additionally, if time permits we will give you a short introduction to alternative programs that can be used for applying econometric techniques. Examples of such programs are EVIEWS, STATA, Matlab and R.


1)

Look at the True/False questions available at http://www.econ.kuleuven.be/gme/. Try to answer 2.1, 2.3, 2.5 and 2.7.


2)

Consider the simple linear regression model

$$y_i = \beta_1 + \beta_2 x_i + \varepsilon_i, \quad i = 1, \ldots, N.$$

Assume that $\varepsilon_i \sim N(0, \sigma^2)$ for $i = 1, \ldots, N$, where $\varepsilon_1, \ldots, \varepsilon_N$ are independent.

You observe the following values for $x_i$ and $y_i$:

    x_i      5     10      0     15      -5
    y_i     -5      5     -5    -12.5    17.5

a) Calculate estimates for $\beta_1$ and $\beta_2$ using the Ordinary Least Squares method.

b) Estimate $\sigma^2$. (Note: The square root of the estimate of $\sigma^2$ is usually called the standard error of the regression.)

c) Write down the estimated regression along with the estimated standard errors of the estimates of $\beta_1$ and $\beta_2$.


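5 Solution to last week's exercises

1)

$$A = \begin{pmatrix} 3 & -1 \\ 2 & 0 \end{pmatrix} \begin{pmatrix} 6 & 7 & 3 \\ 2 & 1 & 4 \end{pmatrix} = \begin{pmatrix} 16 & 20 & 5 \\ 12 & 14 & 6 \end{pmatrix}$$

$$D = \begin{pmatrix} 3 \\ 7 \end{pmatrix} \begin{pmatrix} 1 & 9 \end{pmatrix} = \begin{pmatrix} 3 & 27 \\ 7 & 63 \end{pmatrix}$$

$$E = \begin{pmatrix} 4 & 2 \end{pmatrix} \begin{pmatrix} 3 & 5 \\ 9 & -7 \end{pmatrix} \begin{pmatrix} 3 \\ 9 \end{pmatrix} = \begin{pmatrix} 30 & 6 \end{pmatrix} \begin{pmatrix} 3 \\ 9 \end{pmatrix} = 144$$

$$F = \begin{pmatrix} 6 & 3 & 3 \\ 4 & 1 & -2 \end{pmatrix}' \begin{pmatrix} 1 & 8 & 4 \\ 7 & -5 & -2 \end{pmatrix} = \begin{pmatrix} 6 & 4 \\ 3 & 1 \\ 3 & -2 \end{pmatrix} \begin{pmatrix} 1 & 8 & 4 \\ 7 & -5 & -2 \end{pmatrix} = \begin{pmatrix} 34 & 28 & 16 \\ 10 & 19 & 10 \\ -11 & 34 & 16 \end{pmatrix}$$

The matrices $B$ and $C$ are not defined.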
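The products above can be checked numerically. The following is a minimal sketch in plain Python; the helper names `matmul` and `transpose` are illustrative, and the minus signs in the entries are an assumption, taken as the values that reproduce the printed results for A, E and F.

```python
def matmul(P, Q):
    """Multiply two matrices given as lists of rows; fail if shapes do not conform."""
    if len(P[0]) != len(Q):
        raise ValueError("inner dimensions differ, product is not defined")
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def transpose(P):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*P)]

A = matmul([[3, -1], [2, 0]], [[6, 7, 3], [2, 1, 4]])
D = matmul([[3], [7]], [[1, 9]])
E = matmul(matmul([[4, 2]], [[3, 5], [9, -7]]), [[3], [9]])
F = matmul(transpose([[6, 3, 3], [4, 1, -2]]), [[1, 8, 4], [7, -5, -2]])

# C multiplies a 2x2 matrix by a 1x2 row vector, which does not conform.
try:
    matmul([[1, 0], [1, 2]], [[1, 2]])
    c_is_defined = True
except ValueError:
    c_is_defined = False

print(A, D, E, F, c_is_defined)
```

B fails for a different reason: a scalar cannot be added to a 2x2 matrix, so no product check is needed there.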

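2)

a) $(AB)'(B^{-1}A^{-1})' = B'A'(A^{-1})'(B^{-1})' = B'\,\underbrace{A'(A^{-1})'}_{=I}\,(B^{-1})' = \underbrace{B'(B^{-1})'}_{=I} = I$

b) $(A(A^{-1} + B^{-1})B)(B + A)^{-1} = ((\underbrace{AA^{-1}}_{=I} + AB^{-1})B)(B + A)^{-1} = (B + A\underbrace{B^{-1}B}_{=I})(B + A)^{-1} = (B + A)(B + A)^{-1} = I$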

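3)

$$X' = \begin{pmatrix} x_1 & x_2 & \cdots & x_n \end{pmatrix}, \qquad x_i = \begin{pmatrix} x_{1i} \\ x_{2i} \\ \vdots \\ x_{Ki} \end{pmatrix}, \quad i = 1, \ldots, n$$

$$X'X = \begin{pmatrix} x_1 & x_2 & \cdots & x_n \end{pmatrix} \begin{pmatrix} x_1 & x_2 & \cdots & x_n \end{pmatrix}' = \begin{pmatrix} x_1 & x_2 & \cdots & x_n \end{pmatrix} \begin{pmatrix} x_1' \\ x_2' \\ \vdots \\ x_n' \end{pmatrix} = \underbrace{x_1 x_1'}_{K \times K \text{ matrix}} + x_2 x_2' + \ldots + x_n x_n' = \sum_{i=1}^{n} x_i x_i'$$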
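The identity can be illustrated on a small example. The sketch below uses hypothetical data (three vectors with two entries each, chosen only for illustration) and compares X'X, computed entry by entry, with the sum of the outer products.

```python
def outer(u, v):
    """Outer product u v' of two vectors given as lists."""
    return [[a * b for b in v] for a in u]

# Hypothetical data: n = 3 vectors x_i with K = 2 entries each.
xs = [[1.0, 2.0], [3.0, -1.0], [0.5, 4.0]]
K = len(xs[0])

# X has the x_i' as rows, so entry (j, k) of X'X is sum_i x_ji * x_ki.
XtX = [[sum(x[j] * x[k] for x in xs) for k in range(K)] for j in range(K)]

# Sum of the n outer products x_i x_i', each a K x K matrix.
total = [[0.0] * K for _ in range(K)]
for x in xs:
    o = outer(x, x)
    total = [[total[j][k] + o[j][k] for k in range(K)] for j in range(K)]

print(XtX)
print(total)
```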

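4)

a)

$$\begin{aligned}
\mathrm{cov}\{X_1 + X_2, Y_1 + Y_2\} &= E\{((X_1 + X_2) - E\{X_1 + X_2\})((Y_1 + Y_2) - E\{Y_1 + Y_2\})\} \\
&= E\{(X_1 - E\{X_1\} + X_2 - E\{X_2\})(Y_1 - E\{Y_1\} + Y_2 - E\{Y_2\})\} \\
&= E\{(X_1 - E\{X_1\})(Y_1 - E\{Y_1\}) + (X_1 - E\{X_1\})(Y_2 - E\{Y_2\}) \\
&\qquad + (X_2 - E\{X_2\})(Y_1 - E\{Y_1\}) + (X_2 - E\{X_2\})(Y_2 - E\{Y_2\})\} \\
&= E\{(X_1 - E\{X_1\})(Y_1 - E\{Y_1\})\} + E\{(X_1 - E\{X_1\})(Y_2 - E\{Y_2\})\} \\
&\qquad + E\{(X_2 - E\{X_2\})(Y_1 - E\{Y_1\})\} + E\{(X_2 - E\{X_2\})(Y_2 - E\{Y_2\})\} \\
&= \mathrm{cov}\{X_1, Y_1\} + \mathrm{cov}\{X_1, Y_2\} + \mathrm{cov}\{X_2, Y_1\} + \mathrm{cov}\{X_2, Y_2\}
\end{aligned}$$

b)

$$\begin{aligned}
V\{AX\} &= E\{(AX - E\{AX\})(AX - E\{AX\})'\} \\
&= E\{(AX - AE\{X\})(AX - AE\{X\})'\} \\
&= E\{(A(X - E\{X\}))(A(X - E\{X\}))'\} \\
&= E\{A(X - E\{X\})(X - E\{X\})'A'\} \\
&= A\,E\{(X - E\{X\})(X - E\{X\})'\}\,A' \\
&= A\,V\{X\}\,A'
\end{aligned}$$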
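The result in b) also holds exactly for sample covariance matrices, which gives a simple numerical check. The sketch below is an illustration under assumptions: the simulated draws, the matrix `A` and the helper names are all hypothetical and not part of the exercise.

```python
import random

def matmul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def transpose(P):
    return [list(col) for col in zip(*P)]

def sample_cov(data):
    """Sample covariance matrix of observations stored as rows."""
    n = len(data)
    means = [sum(col) / n for col in zip(*data)]
    centered = [[v - m for v, m in zip(row, means)] for row in data]
    dim = len(means)
    return [[sum(r[j] * r[k] for r in centered) / (n - 1) for k in range(dim)]
            for j in range(dim)]

random.seed(0)
# Hypothetical draws of a random vector X with m = 2 entries (one draw per row).
draws = [[random.gauss(0, 1), random.gauss(0, 2)] for _ in range(500)]
A = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]  # hypothetical 3 x 2 matrix of constants

# Apply A to every draw, then compare V{AX} with A V{X} A'.
transformed = [[sum(a * x for a, x in zip(a_row, row)) for a_row in A] for row in draws]
lhs = sample_cov(transformed)
rhs = matmul(matmul(A, sample_cov(draws)), transpose(A))

max_gap = max(abs(l - r) for l_row, r_row in zip(lhs, rhs) for l, r in zip(l_row, r_row))
print(max_gap)  # differs from zero only by floating-point rounding
```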

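6 Solution to last week's additional exercises

1)

We only have to show that $(AB)(B^{-1}A^{-1}) = I$.

Indeed, it holds that

$$(AB)(B^{-1}A^{-1}) = A\,\underbrace{BB^{-1}}_{=I}\,A^{-1} = AA^{-1} = I.$$

2)

$$\begin{aligned}
\mathrm{cov}\{aX_1, bX_2\} &= E\{aX_1 bX_2\} - E\{aX_1\}E\{bX_2\} \\
&= ab\,E\{X_1 X_2\} - ab\,E\{X_1\}E\{X_2\} \\
&= ab\,(E\{X_1 X_2\} - E\{X_1\}E\{X_2\}) \\
&= ab\,\mathrm{cov}\{X_1, X_2\}
\end{aligned}$$

3)

$$\begin{aligned}
E\{(X - E\{X\})(X - E\{X\})'\} &= E\{XX' - XE\{X\}' - E\{X\}X' + E\{X\}E\{X\}'\} \\
&= E\{XX'\} - E\{XE\{X\}'\} - E\{E\{X\}X'\} + E\{E\{X\}E\{X\}'\} \\
&= E\{XX'\} - E\{X\}E\{X\}' - E\{X\}E\{X\}' + E\{X\}E\{X\}' \\
&= E\{XX'\} - E\{X\}E\{X\}'
\end{aligned}$$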




This week we mainly treat the problem of hypothesis testing in the linear regression model. Problems related to multicollinearity and how to detect it are treated as well. Finally, we look at how to make predictions with the linear regression model.

2 Reading

Obligatory reading: Verbeek, Sections 2.4, 2.5, 2.7, 2.8 and 2.9

Additional reading: Wooldridge, Chapter 4, Section 6.4. The $R^2$ is treated on pages 80-81 and multicollinearity on page 95.

Greene, Section 3.5 and Chapter 5. Section 5.5 can be skipped.

3 Exercises

1)

Suppose you estimate a parameter vector $\beta$ by some estimator $b$ and that your estimator has the following property:

$$\sqrt{N}(b - \beta) \to N(0, A),$$

where $A$ is some matrix. Assume further that there is an estimator $\hat{A}$ which consistently estimates $A$. Now, consider the following empirical results:

$$N = 10000, \quad b = \begin{pmatrix} 2 \\ 3 \end{pmatrix}, \quad \hat{A} = \begin{pmatrix} 400 & 100 \\ 100 & 900 \end{pmatrix}$$

Calculate asymptotic standard errors and t-statistics for $b_1$ and $b_2$.

2)

Consider the following multiple linear regression model:

$$y_i = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3} + \varepsilon_i, \quad i = 1, \ldots, N.$$

a) Explain how one can test the hypothesis that $\beta_3 = 1$.


b) Explain how one can test the hypothesis that $\beta_2 + \beta_3 = 0$. As one alternative, consider rewriting the model in a way that allows applying the standard t-test.

c) Explain how one can test the hypothesis that $\beta_2 = \beta_3 = 0$.

d) Explain how one can test the hypothesis that $\beta_2 = 0$ and $\beta_3 = 1$.

3)

Load the data set dataweek3ex3.gdt into GRETL.

a) Assume you believe that there exists a linear relationship between y and x2, x3, x4, and x5. Estimate a linear regression and interpret the output. What are the most striking findings? What is the most likely explanation for your findings?

b) Use the appropriate tools from the lecture to look for evidence of multicollinearity in your data.

4)

Load the data set hprice1.gdt from the GRETL introduction and consider again OLS estimation of the linear regression model

$$\log(price)_i = \beta_1 + \beta_2 \log(sqrft)_i + \beta_3 \log(lotsize)_i + \beta_4\, bedrooms_i + \varepsilon_i.$$

a) Test the hypothesis that $\beta_4 = 0$.

b) Test the hypothesis that $\beta_2 = 1$.

c) Test the hypothesis that $\beta_2 + \beta_4 = 0$.

d) Test the joint hypothesis that $\beta_4 = 0$ and $\beta_2 = 1$.


1)

Show that in the linear model

$$y = X\beta + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2 I)$$

the Wald test for the general linear hypothesis $H_0: R\beta = q$ is asymptotically equivalent to the F-test.

2)

Look again at the True/False questions available at http://www.econ.kuleuven.be/gme/. Try to answer 2.2, 2.4, 2.6 and 2.8.



2)

a) The matrix containing the regressors is

$$X = \begin{pmatrix} 1 & 5 \\ 1 & 10 \\ 1 & 0 \\ 1 & 15 \\ 1 & -5 \end{pmatrix}.$$

Calculating the OLS estimator $(X'X)^{-1}X'y$ yields $b_1 = 5$ and $b_2 = -1$. (Note that the parameter vector $b$ is also sometimes denoted by $\hat{\beta}$.)

b) For the standard error of regression we have to calculate the residuals

$$e_i = y_i - \hat{y}_i = y_i - (b_1 + b_2 x_i), \quad i = 1, \ldots, N.$$

In our case we have

    ŷ_i      0     -5      5    -10      10
    e_i     -5     10    -10     -2.5     7.5

Thus, the estimator for $\sigma^2$ is

$$s^2 = \frac{1}{N - 2} \sum_{i=1}^{N} e_i^2 = 95.8333,$$

so that the standard error of regression is given by

$$s = \sqrt{95.8333} = 9.7895.$$

c) The standard errors of $b_1$ and $b_2$ are the square roots of the values on the main diagonal of $\hat{V}\{b\} = s^2 (X'X)^{-1}$. Here we have $se(b_1) = 5.3619$ and $se(b_2) = 0.6191$, so that the estimated regression can be written down as

$$\hat{y} = \underset{(5.3619)}{5} - \underset{(0.6191)}{1}\,x.$$
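The calculation in a) to c) can be sketched in a few lines of plain Python using the closed-form expressions for the simple regression. The signs of the data are an assumption: the values below are one sign pattern that reproduces the printed magnitudes of the residuals, of s² and of both standard errors, and the mirrored pattern (all y values with opposite sign) would fit them equally well.

```python
import math

# Assumed sign pattern for the data (see the note above).
x = [5.0, 10.0, 0.0, 15.0, -5.0]
y = [-5.0, 5.0, -5.0, -12.5, 17.5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# OLS estimates for the simple regression y = b1 + b2 * x.
b2 = s_xy / s_xx
b1 = y_bar - b2 * x_bar

# Residuals and the unbiased estimator of the error variance (N - 2 degrees of freedom).
residuals = [yi - (b1 + b2 * xi) for xi, yi in zip(x, y)]
s2 = sum(e ** 2 for e in residuals) / (n - 2)
s = math.sqrt(s2)

# Standard errors: square roots of the diagonal of s^2 (X'X)^{-1}.
se_b1 = math.sqrt(s2 * (1 / n + x_bar ** 2 / s_xx))
se_b2 = math.sqrt(s2 / s_xx)

print(b1, b2, round(s2, 4), round(s, 4), round(se_b1, 4), round(se_b2, 4))
```

For a regression with more than one regressor the same quantities come from the matrix formulas $(X'X)^{-1}X'y$ and $s^2(X'X)^{-1}$; the scalar expressions used here are their special case for K = 2.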




This week is concerned with various rather practical, but extremely important aspects of using the linear regression model. We discuss how to interpret the estimated parameters of the model in different situations. Furthermore, we study how to select the set of regressors and how to test the functional form.

2 Reading

Obligatory reading: Verbeek, read the entire Chapter 3 with the exception of Section 3.2.3

Additional reading: Wooldridge, the relevant material cannot be found in one single place in this book, but is dispersed over Chapters 2, 3 and 6. As mentioned before, it makes sense to read the first 6 chapters of Wooldridge to cover the material of about the first 4 weeks of this course.

Greene, the relevant parts of Chapters 6 and 7.

3 Exercises

1)

Consider the simple regression

$$\log(y_i) = \beta_1 + \beta_2 \log(x_i) + \varepsilon_i, \quad i = 1, \ldots, N. \qquad (1)$$

a) Show that $\beta_2$ can be interpreted as the elasticity of $y_i$ with respect to $x_i$.

b) Calculate the elasticity of $y_i$ with respect to $x_i$ for the alternative model

$$y_i = \beta_1 + \beta_2 x_i + \varepsilon_i, \quad i = 1, \ldots, N. \qquad (2)$$

Explain the essential difference between the elasticity of model (1) and model (2).


c) Consider now a third model:

$$\log(y_i) = \beta_1 + \beta_2 x_i + \varepsilon_i, \quad i = 1, \ldots, N. \qquad (3)$$

Interpret the coefficient $\beta_2$.

2)

a) Suppose you want to investigate whether a beer tax will reduce traffic fatalities. Further assume that you have data on the number of traffic fatalities and the beer tax rate for different regions of a country. Would it be sensible to include i) the amount of beer consumption or ii) the number of miles driven as explanatory variables in your regression?

b) Suppose you want to investigate whether pesticide use by farmers has an effect on the health expenditures of farmers. When regressing health expenditures on pesticide usage amounts, does it make sense to include the number of doctor visits as a control variable?

3) (Adapted from Stock/Watson)

Consider the results from a study comparing total compensation among top executives in a large set of U.S. public corporations in the 1990s. Let Female be a dummy variable that is equal to 1 for females and 0 for males.

a) A simple regression of the logarithm of earnings on Female yields

$$\widehat{\log(Earnings)} = \underset{(0.01)}{6.48} - \underset{(0.05)}{0.44}\,Female$$

Interpret the coefficient of Female.

b) Two new variables, the market value of the firm (a measure of firm size, in millions of dollars) and stock return (a measure of firm performance, in percentage points), are added to the regression:

$$\widehat{\log(Earnings)} = \underset{(0.03)}{3.86} - \underset{(0.04)}{0.28}\,Female + \underset{(0.004)}{0.37}\,\log(MarketValue) + \underset{(0.003)}{0.004}\,Return$$

Interpret the coefficient of log(MarketValue). Further, explain why the coefficient of Female has changed from the regression in a).

c) What would happen to your regression if the market value of firms were measured in billions?


Load the dataset CPS04.gdt into GRETL. The data are from the Current Population Survey of the U.S. Department of Labor.


a) Run a regression of the logarithm of average hourly earnings (AHE) on age (Age), gender (Female) and education (Bachelor). If Age increases from 33 to 34, how are earnings expected to change?

b) Run a regression of log(AHE) on log(Age), Female and Bachelor. If Age increases from 33 to 34, how are earnings expected to change?

c) Run a regression of log(AHE) on Age, Age², Female and Bachelor. If Age increases from 33 to 34, how are earnings expected to change?

d) Do you prefer the regression in b) to the regression in a)? Explain.

e) Do you prefer the regression in c) to the regression in a)? Explain.

f) Do you prefer the regression in c) to the regression in b)? Explain.

g) Plot the regression relation between Age and log(AHE) from c) for females with a Bachelor degree.


1) (Adapted from Verbeek)

Explain why it is inappropriate to drop two explanatory variables from the model at the same time on the basis of their t-statistics only.


Suppose you want to analyze the relationship between class size and student performance as measured by some test score. Talking with a teacher you get the following comment:

"In my experience, students do well when the class size is less than 20 students and do poorly when the class size is greater than 25. There are no gains from reducing class size below 20 students, the relationship is constant in the intermediate region between 20 and 25 students, and there is no loss to increasing class size when it is already greater than 25."

If the teacher is right, how should you choose the functional form of your model?


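5 Solutions to last week's exercises

1)

Given that $V\{\sqrt{N}(b - \beta)\} = V\{\sqrt{N}\,b\} \to A$ and using that $\hat{A}$ is a consistent estimator of $A$, in large samples we have

$$V\{\sqrt{N}\,b\} \approx \hat{A} \quad \Rightarrow \quad \hat{V}\{b\} = \frac{1}{N}\hat{A}$$

$$\hat{V}\{b\} = \frac{1}{10000} \begin{pmatrix} 400 & 100 \\ 100 & 900 \end{pmatrix}$$

$$\hat{V}\{b_1\} = \frac{400}{10000} = 0.04, \qquad \hat{V}\{b_2\} = \frac{900}{10000} = 0.09$$

The standard errors are equal to the square roots of the estimated variances, so that we have $se(b_1) = 0.2$ and $se(b_2) = 0.3$. For the t-statistics we have $t_1 = \frac{b_1}{se(b_1)} = 10$ and $t_2 = \frac{b_2}{se(b_2)} = 10$.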
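The arithmetic of this solution can be reproduced with a short sketch, using the values of N, b and Â stated in the exercise (variable names are illustrative).

```python
import math

N = 10000
b = [2.0, 3.0]
A_hat = [[400.0, 100.0], [100.0, 900.0]]

# Estimated covariance matrix of b: (1/N) * A_hat.
V_b = [[entry / N for entry in row] for row in A_hat]

# Asymptotic standard errors are the square roots of the diagonal elements.
se = [math.sqrt(V_b[i][i]) for i in range(len(b))]

# t-statistics for the null hypotheses that each coefficient equals zero.
t = [b[i] / se[i] for i in range(len(b))]

print(se, t)
```

The off-diagonal element of Â plays no role here; it would only matter for tests involving both coefficients at once.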

2)

a) The hypothesis that $\beta_3 = 1$ can be tested by means of a t-test. The test statistic is

$$t = \frac{b_3 - 1}{se(b_3)},$$

which, under the null hypothesis, has an approximate standard normal distribution in large samples and a t distribution with $N - 3$ degrees of freedom in small samples under the assumption of normality of the error term. At the 5% significance level, we reject the null in large samples if $|t| > 1.96$ (two-tailed test).

b) The hypothesis that $\beta_2 + \beta_3 = 0$ can also be tested by means of a t-test. The test statistic is

$$t = \frac{b_2 + b_3}{se(b_2 + b_3)}.$$

To calculate $se(b_2 + b_3)$ we use the estimated covariance matrix $\hat{V}\{b\}$ and the fact that $\hat{V}\{b_2 + b_3\} = \hat{V}\{b_2\} + \hat{V}\{b_3\} + 2\,\widehat{\mathrm{cov}}\{b_2, b_3\}$. Alternatively, you can rewrite the model as

$$y_i = \beta_1 + (\beta_2 + \beta_3) x_{i2} + \beta_3 (x_{i3} - x_{i2}) + \varepsilon_i$$

$$y_i = \beta_1 + \beta_2^* x_{i2} + \beta_3 (x_{i3} - x_{i2}) + \varepsilon_i$$

with $\beta_2^* = \beta_2 + \beta_3$, and apply the usual t-test for $\beta_2^*$.

c) The joint hypothesis that $\beta_2 = \beta_3 = 0$ can be tested by means of the overall F-test, which is a special case of the general F-test. The test statistic is

$$F = \frac{R^2 / 2}{(1 - R^2)/(N - 3)}.$$

We compare the test statistic with the critical values from an F distribution with 2 (the number of restrictions) and $N - 3$ degrees of freedom.


d) The joint hypothesis that $\beta_2 = 0$ and $\beta_3 = 1$ can be tested by means of the general F-test. For the general joint linear null hypothesis $H_0: R\beta = q$ we have in our case

$$R = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

and $q = (0, 1)'$. The test statistic is

$$F = \frac{(Rb - q)'(R(X'X)^{-1}R')^{-1}(Rb - q)}{J s^2},$$

where one has to insert $R$ and $q$, the estimated parameter vector $b$, the regressor matrix $X$, the number of restrictions $J$ (here $J = 2$) and the estimated error variance $s^2 = \frac{1}{N - K} \sum_{i=1}^{N} e_i^2$. Note that the matrix $R$ has nothing to do with the goodness-of-fit measure $R^2$.
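The overall F-test from part c) generalizes to K parameters with J = K - 1 restrictions. As a hedged illustration, the sketch below applies that formula to the R² of 0.666672 with N = 100 and K = 5 reported in the gretl output for exercise 3 and recovers the F(4, 95) statistic printed there, up to rounding of R². The function name `overall_f` is an assumption made for this example.

```python
def overall_f(r_squared, n_obs, n_params):
    """Overall F statistic for H0: all slope coefficients are zero.

    n_params counts all coefficients including the intercept, so the
    number of restrictions is n_params - 1.
    """
    n_restrictions = n_params - 1
    numerator = r_squared / n_restrictions
    denominator = (1.0 - r_squared) / (n_obs - n_params)
    return numerator / denominator

# Values taken from the gretl output of exercise 3: R^2 = 0.666672, N = 100, K = 5.
f_stat = overall_f(0.666672, 100, 5)
print(round(f_stat, 3))
```

For the model in exercise 2 the same function would be called with `n_params=3`, which gives the two restrictions and N - 3 degrees of freedom used in part c).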

3)

a) Model 1: OLS, using observations 1-100
Dependent variable: Y

               Coefficient    Std. Error    t-ratio    p-value
    const      0.0541118      0.0922596     0.5865     0.5589
    X2         0.621068       0.680645      0.9125     0.3638
    X3         0.402881       0.595978      0.6760     0.5007
    X4         0.497742       0.295766      1.6829     0.0957
    X5         0.601589       0.0897158     6.7055     0.0000

    Mean dependent var    0.000323     S.D. dependent var    1.554198
    Sum squared resid     79.71127     S.E. of regression    0.916005
    R2                    0.666672     Adjusted R2           0.652637
    F(4, 95)              47.50110     P-value(F)            7.09e-22
    Log-likelihood       -130.5559     Akaike criterion      271.1118
    Schwarz criterion     284.1376     Hannan-Quinn          276.3836

The first three regressors (x2, x3 and x4) are not significant at the 5% level, but the $R^2$ is quite large, indicating a good overall fit. The large standard errors may partly be caused by multicollinearity.

b) First of all you compute the correlation matrix between the regressors:

Correlation coefficients, using the observations 1-100
5% critical value (two-tailed) = 0.1966 for n = 100

        X2        X3        X4        X5
    1.0000    0.8816    0.4565    0.0328    X2
              1.0000    0.0123    0.0816    X3
                        1.0000    0.1114    X4
                                  1.0000    X5


The correlation between x2 and x3 is quite large (0.88), indicating multicollinearity. The correlation between x2 and x4 is also notable.

Next, run the auxiliary regressions of each of the regressors on the remaining ones and look at the resulting $R_j^2$. You get $R_2^2 = 0.97$, $R_3^2 = 0.96$, $R_4^2 = 0.89$, and $R_5^2 = 0.024$. This indicates strong multicollinearity among the first three variables, but not involving x5.
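A common way to summarize the auxiliary regressions is the variance inflation factor, VIF_j = 1 / (1 - R_j²), which measures how much the variance of b_j is inflated relative to the case of uncorrelated regressors. The solution above does not compute it; the sketch below simply converts the four reported auxiliary R² values.

```python
# Auxiliary R^2 values as reported in the solution above.
auxiliary_r2 = {"x2": 0.97, "x3": 0.96, "x4": 0.89, "x5": 0.024}

# Variance inflation factor for each regressor.
vif = {name: 1.0 / (1.0 - r2) for name, r2 in auxiliary_r2.items()}

for name, value in vif.items():
    print(name, round(value, 2))
```

The factors for x2, x3 and x4 are far above 1, while the one for x5 is close to 1, which matches the conclusion drawn from the auxiliary regressions.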

4)

a) This test is automatically performed by GRETL. The t-statistic is 1.342 and the associated p-value is 0.1831. Thus, we cannot reject $H_0$ at conventional significance levels.

b) The t-statistic for this test is

$$t = \frac{0.700232 - 1}{0.0928652} = -3.228$$

Using the p-value finder of GRETL (see the GRETL introduction) gives a p-value of 0.00177876 for the two-tailed t-test, so that $H_0$ is clearly rejected. Note that we have to choose 88 - 4 = 84 degrees of freedom.

c) Similar to exercise 2b) you have two options. First, in the model window, you may go to Tests → Linear restrictions and type b[2] + b[4] = 0. Second, rearrange the model by defining a new variable as the difference $bedrooms_i - \log(sqrft)_i$ (this can be done in GRETL via Add → Define new variable → varname = lsqrft - bdrms; the sign of the difference does not affect the test) and re-estimate the model with the regressors $\log(sqrft)_i$, $\log(lotsize)_i$ and varname. The standard t-test for the coefficient of $\log(sqrft)_i$ in the revised model can then be applied. The result is a t-statistic of 8.918, which corresponds to a p-value of 8.68e-014, leading to a very clear rejection of the null hypothesis.

d) Since we have a joint hypothesis we have to use the F-test now. Go again to Tests → Linear restrictions, and type b[2] = 1 and b[4] = 0 (see Help for explanations). The test statistic is F = 5.25722 with a p-value of 0.00706007. Thus, not surprisingly after the result from part b), we reject our joint hypothesis.
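The t-statistic reported by default tests a coefficient against zero, so the test in part b) has to be computed by hand. A minimal sketch, using the estimate and standard error quoted in part b) (the function name `t_statistic` is illustrative):

```python
def t_statistic(estimate, std_error, null_value=0.0):
    """t statistic for H0: coefficient equals null_value."""
    return (estimate - null_value) / std_error

# Estimate and standard error of the coefficient of log(sqrft) from part b).
b2, se_b2 = 0.700232, 0.0928652

t_against_zero = t_statistic(b2, se_b2)                 # what regression output reports
t_against_one = t_statistic(b2, se_b2, null_value=1.0)  # the test asked for in part b)

print(round(t_against_zero, 3), round(t_against_one, 3))
```

The two values differ strongly: the coefficient is clearly different from zero and also clearly different from one, but only the second statement answers part b).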


1)

The statistic for the F-test is given by (see lecture 3)

$$F = \frac{(Rb - q)'(R(X'X)^{-1}R')^{-1}(Rb - q)}{J s^2} = \frac{(Rb - q)'(R s^2 (X'X)^{-1} R')^{-1}(Rb - q)}{J}$$


The corresponding Wald statistic is (see again lecture 3)

$$W = (Rb - q)'(R\hat{V}\{b\}R')^{-1}(Rb - q)$$

Since the standard estimate for the covariance matrix $\hat{V}\{b\}$ under the assumption $\varepsilon \sim N(0, \sigma^2 I)$ is $s^2(X'X)^{-1}$, we see that $F$ and $W$ only differ by the factor $1/J$. Moreover, under $H_0$, $F \sim F^{J}_{N-K}$ and $W \overset{asympt.}{\sim} \chi^2_J$. Using the definition of the F distribution (see lecture 1) we can write $F$ as

$$F = \frac{\chi^2_J / J}{\chi^2_{N-K} / (N - K)},$$

thus as a ratio of $\chi^2$ distributed random variables. For large $N$, we can use the result that, due to the definition of the $\chi^2$ distribution (see lecture 1) and the Law of Large Numbers,

$$\frac{\chi^2_{N-K}}{N - K} = \frac{\sum_{i=1}^{N-K} U_i^2}{N - K} \;\xrightarrow{N \to \infty}\; E\{U_i^2\} = V\{U_i\} = 1$$

so that

$$JF \overset{asympt.}{\sim} \chi^2_J$$

We reject the null hypothesis if $F > F^{J}_{N-K;1-\alpha}$ or, equivalently, if $JF > J F^{J}_{N-K;1-\alpha}$. As we have shown, the null distribution of $JF$ converges to a $\chi^2_J$ distribution as $N \to \infty$, so that asymptotically $J F^{J}_{N-K;1-\alpha} = \chi^2_{J;1-\alpha}$. Since the latter is the critical value of the Wald test and $JF = W$, the F-test and the Wald test are asymptotically equivalent and consequently do not differ much in large samples. In small samples, the F-test can be shown to be more conservative, as there can be cases where the Wald test rejects $H_0$ and the F-test does not, but not vice versa.




This week we treat the problem of heteroscedasticity, that is, a violation of the assumption of constant error variances. We study the consequences for estimation and inference using OLS, more efficient estimation by generalized least squares, and how to test for heteroscedasticity.

2 Reading

Obligatory reading: Verbeek, Sections 4.1-4.5

Additional reading: Wooldridge, Chapter 8.

Greene, Chapter 8.

3 Exercises

1)

Consider the model

$$y_i = \beta_1 + \beta_2 x_{i2} + \varepsilon_i$$

where the error terms are uncorrelated but $V\{\varepsilon_i | X\} = \sigma^2 x_{i2}^2$.

a) Explain how an appropriate Generalized Least Squares estimator can be constructed. Use the notation of the lecture and be specific about your choice of $\Psi$, $P$, $h_i$ and $\sigma^2$.

b) What is the interpretation of $\sigma^2$?

c) How can you test the assumed relation of the error term and $x_{i2}$?

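2)

Consider the model

$$y_i = \beta_1 + \beta_2 x_{i2} + \beta_3 d_i + \varepsilon_i$$

where $d_i$ is a dummy variable taking values 0 and 1. We assume that the error terms are uncorrelated but $V\{\varepsilon_i | X\} = \sigma_1^2$ if $d_i = 1$ and $V\{\varepsilon_i | X\} = \sigma_2^2$ if $d_i = 0$.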


a) Explain how an appropriate Generalized Least Squares estimator can be constructed. Use the notation of the lecture and be specific about your choice of $\Psi$, $P$, $h_i$ and $\sigma^2$.

b) How can you get standard errors for your GLS estimator?

c) How can you graphically inspect your assumption for the error term?

3)

Use the dataset 401ksubs.gdt which contains the variable net total financial assets (nettfa, in $1000s) and several other variables which may explain a person's financial wealth. Consider the regression

$$nettfa_i = \beta_1 + \beta_2\, inc_i + \beta_3\, age_i + \beta_4\, age_i^2 + \beta_5\, marr_i + \varepsilon_i$$

a) Estimate the model by OLS.

b) According to your model, at what age is financial wealth supposed to be lowest?

c) Test for heteroskedasticity using the White test and the Breusch-Pagan test.

d) Given your results from part c), does it make sense to additionally perform the Goldfeld-Quandt test based on subsamples for married and unmarried persons?

e) Use your insights from part c) to construct an appropriate Weighted Least Squares estimator of the model. Compare the parameter estimates and standard errors with the OLS approach.

f) Apply heteroskedasticity-robust standard errors to your OLS estimator and to the WLS estimator as well. Why may the latter be sensible?


1)

Which of the following are consequences of heteroscedasticity?

a) OLS is inconsistent.

b) OLS is biased.

c) OLS is inefficient.

d) $s^2(X'X)^{-1}$ is an inconsistent estimate of the covariance matrix of the OLS estimator.

e) The t- and F-tests as presented in lecture 3 are no longer valid.



1)

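a) Generally, the elasticity of a variable $y$ with respect to $x$ is defined as

$$el := \frac{\partial y / y}{\partial x / x} = \frac{\partial y}{\partial x}\,\frac{x}{y}$$

Hence the elasticity can be interpreted as the approximate percentage change of $y$ for each 1% change of $x$.

For model (1),

$$el_1 = \frac{\partial y_i}{\partial x_i}\,\frac{x_i}{y_i} = \frac{\partial y_i}{\partial x_i}\,\frac{\partial \log y_i / \partial y_i}{\partial \log x_i / \partial x_i} = \frac{\partial \log y_i}{\partial \log x_i}$$

so that we obtain a constant elasticity of $el_1 = \beta_2$. Note that for this interpretation we must use the natural logarithm.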

b) In case of model (2) the elasticity corresponds to

$$el_2 = \frac{\partial y_i}{\partial x_i}\,\frac{x_i}{y_i} = \beta_2\,\frac{x_i}{y_i}$$

That means the elasticity is not constant and instead depends on the current values of $x$ and $y$.

c) For model (3),

$$\beta_2 = \frac{\partial \log(y_i)}{\partial x_i} = \frac{\partial \log(y_i)}{\partial y_i}\,\frac{\partial y_i}{\partial x_i} = \frac{\partial y_i / y_i}{\partial x_i}$$

which is called the semi-elasticity of $y$ with respect to $x$. In words, the semi-elasticity is the approximate percentage change of $y$ given a one unit increase of $x$. (For further practice you might calculate the elasticity for model (3) as well.)
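The three results above can be checked numerically. The following is a minimal sketch, assuming that model (1) is the log-log specification, model (2) the linear one and model (3) the log-linear one, as the derivations suggest. The coefficient values and the helper name `elasticity` are hypothetical and chosen only for illustration.

```python
import math

def elasticity(f, x, rel_step=1e-6):
    """Numerical elasticity (dy/y)/(dx/x) of y = f(x) at x, using central differences."""
    dx = x * rel_step
    dy = f(x + dx) - f(x - dx)
    return (dy / f(x)) / (2 * dx / x)

# Hypothetical coefficients (not taken from any exercise).
B1, B2 = 0.5, 0.8

def model_1(x):
    # log y = B1 + B2 log x  (assumed form of model (1))
    return math.exp(B1 + B2 * math.log(x))

def model_2(x):
    # y = B1 + B2 x  (assumed form of model (2))
    return B1 + B2 * x

def model_3(x):
    # log y = B1 + B2 x  (assumed form of model (3))
    return math.exp(B1 + B2 * x)

for x in (1.0, 2.0, 4.0):
    print(x, elasticity(model_1, x), elasticity(model_2, x), elasticity(model_3, x))
```

Under these assumptions the first column of elasticities stays at B2 for every x, the second equals B2 * x / y, and the third equals B2 * x, which is the elasticity of model (3) left as practice in part c).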

2)

a) It would not be sensible to include the amount of beer consumption as an explanatory variable, although it may be significant and increase the goodness-of-fit of your regression. If you include beer consumption, the coefficient of beer tax gives you the effect of the beer tax on traffic fatalities, holding beer consumption constant. Since a beer tax is only supposed to work if it reduces beer consumption, this does not make sense. In contrast, including the miles driven in a region is sensible. It is likely to be an important explanatory variable and it still allows the desired interpretation of the beer tax coefficient.

b) Including the number of doctor visits as a control variable is not sensible, although doctor visits are likely to be highly significant. This is because you would estimate the effect of pesticide usage on health expenditures, holding the number of doctor visits constant. Thus, you would only estimate the effect of pesticide usage on health expenditures that did not arise together with doctor visits. This is probably not what you are interested in.

3)

a) The earnings of females are estimated to be on average 44% lower than those of males. More formally, -0.44 is the semi-elasticity of earnings with respect to gender (see exercise 1).

b) A 10% increase of the market value of a firm is estimated to increase the earnings of top executives by 3.7%, ceteris paribus. The coefficient of Female is now lower in absolute terms, implying 28% lower earnings for females holding the other variables, especially market value, constant. Note that the regression from part a) suffers from omitted variable bias if i) gender is correlated with the omitted variables like, for instance, log(MarketValue), and ii) the omitted variables have non-zero coefficients. Since log(MarketValue) is highly significant, the omitted variable bias is likely to occur due to a negative correlation of Female and log(MarketValue), i.e. larger firms having fewer female top executives.

c) Since the coefficient of log(MarketValue) is an elasticity it has no dimension, so that the interpretation would not change. Further, it is possible to show that running the regression with market value measured in billions would only change the intercept (here it would change by $0.37 \cdot \log(1000)$) and leave all other coefficients unchanged.

4)

a) The coefficient of Age is 0.0244429. Since it represents a semi-elasticity, this means that increasing Age by one unit, for instance from 33 to 34, is expected to lead to a 2.4% increase in hourly earnings.

b) The coefficient of log(Age) is 0.724697, giving the estimated elasticity of earnings with respect to age. From 33 to 34, age increases by about 3% (more precisely, 1/33), so that the expected increase in earnings is $(1/33) \cdot 0.724697 = 0.02196$, i.e. about 2.2%.

c) When Age increases from 33 to 34, the predicted change in log(AHE) is

$$(0.147045 \cdot 34 - 0.00207056 \cdot 34^2) - (0.147045 \cdot 33 - 0.00207056 \cdot 33^2) = 0.0083,$$

meaning a 0.83% increase in earnings.

d) The regressions from a) and b) contain the same number of parameters and can thus be compared by their goodness-of-fit as measured by the $R^2$. Thus, we would prefer b), which has a slightly higher $R^2$ (0.192685 vs. 0.192372).

e) The regression from c) is the same as a) but augmented with $Age^2$. Thus, the $R^2$ must be higher for c), but this does not necessarily mean that we prefer c). However, since $Age^2$ is highly significant (p-value of 0.0031) and the Akaike Information Criterion decreases from 10163.48 to 10156.74, we indeed prefer model c).

f) Noting that now no model is nested in the other one, we stick again to the AIC. Since 10156.74 < 10160.39, we again prefer model c).

g) For females with a Bachelor degree, the regression relation is

$$\log(AHE) = 0.0587332 - 0.179787 \cdot 1 + 0.405077 \cdot 1 + 0.147045\,Age - 0.00207056\,Age^2$$

$$\log(AHE) = 0.2840232 + 0.147045\,Age - 0.00207056\,Age^2$$

$$AHE = \exp(0.2840232 + 0.147045\,Age - 0.00207056\,Age^2 + 0.5 \cdot 0.456876^2)$$

$$= \exp(0.388391039 + 0.147045\,Age - 0.00207056\,Age^2)$$

Note that we added half the estimated error variance for predicting AHE (see slide 9, lecture 4). In GRETL, go to Tools → Plot a curve, and use the formula given above, i.e. exp(0.388391039 + 0.147045 x - 0.00207056 x^2), and specify a reasonable range for x (age), for instance 20-60.
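The arithmetic in parts c) and g) can be retraced with a few lines of code. This is only a sketch: the coefficients are copied from the solution above, while the function names `log_ahe` and `predicted_ahe` are made up for this illustration.

```python
import math

# Coefficients copied from the solution text.
B_AGE = 0.147045
B_AGE2 = -0.00207056
CONST = 0.0587332 - 0.179787 + 0.405077   # intercept plus the two dummy coefficients
SIGMA_HAT = 0.456876                      # its square is the estimated error variance

def log_ahe(age):
    """Predicted log(AHE) for a female with a Bachelor degree."""
    return CONST + B_AGE * age + B_AGE2 * age ** 2

def predicted_ahe(age):
    """Predicted AHE, adding half the estimated error variance before exponentiating."""
    return math.exp(log_ahe(age) + 0.5 * SIGMA_HAT ** 2)

# Part c): change in predicted log(AHE) when Age goes from 33 to 34.
change = log_ahe(34) - log_ahe(33)
print(round(change, 4))  # 0.0083

# Part g): the constant used in the plotted curve.
curve_constant = CONST + 0.5 * SIGMA_HAT ** 2
print(round(curve_constant, 6))

# The curve that GRETL would plot, evaluated at a few ages.
for age in range(20, 61, 10):
    print(age, round(predicted_ahe(age), 2))
```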


1)

Two explanatory variables may have low t-statistics if they are highly correlated, even if at least one of their true coefficients is nonzero. A joint test (F-test) takes this correlation into account. Put differently, if you drop one of the two variables from the model, the remaining one may become (highly) significant, and you will not observe this if you remove two variables at once. As an example, assume you have both a short-term and a long-term interest rate in a model explaining investments. Given the high correlation between the two interest rates, both may have fairly low t-statistics. However, if you drop one of them, the remaining interest rate will pick up much of the explanatory power of the two, and (probably) will be statistically significant.

2)

The teacher's hypothesis is that there is no linear effect of class size on student performance. Rather, there are three categories (< 20, 20-25, > 25), which have different effects but without any effects of changes within a category. Such a relationship can be modelled using dummy variables. In this case, define

$d_1 = 1$ if $20 \le$ class size $\le 25$, and $d_1 = 0$ otherwise

$d_2 = 1$ if class size $> 25$, and $d_2 = 0$ otherwise

Running a linear regression with these dummy variables gives coefficients with the following interpretation. The coefficient of $d_1$ is the ceteris paribus effect on test scores of increasing class size from below 20 (the base category) to 20-25. The coefficient of $d_2$ is the effect on test scores of increasing class size from below 20 to above 25. In practice, you should compare the goodness-of-fit of such a dummy variable specification with other alternatives like the simple linear specification.
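The dummy coding described above can be written down directly. A minimal sketch, with the function name `class_size_dummies` chosen for illustration only:

```python
def class_size_dummies(class_size):
    """Return (d1, d2) for the three class-size categories; below 20 is the base category."""
    d1 = 1 if 20 <= class_size <= 25 else 0
    d2 = 1 if class_size > 25 else 0
    return d1, d2

for size in (18, 20, 25, 26, 31):
    print(size, class_size_dummies(size))
```

Every class falls into exactly one category, so the two dummies are never both equal to 1, and the base category is the case where both are 0.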




This week we treat a second common violation of the standard assumptions, namely autocorrelation. As last week, we look at consequences for OLS, alternative estimation, tests for autocorrelation and how to compute robust standard errors.

2 Reading

Obligatory reading: Verbeek, Sections 4.6-4.11


Greene, Some parts of Chapter 19.

3 Exercises

1)

The plots on the following page were generated from first-order autoregressive processes of the following form:

$$\varepsilon_t = \rho\,\varepsilon_{t-1} + v_t, \quad t = 1, \dots, 100, \quad v_t \sim N(0, 1)$$

The choices for $\rho$ are -0.9, 0 and 0.9. Which plot refers to which value of $\rho$?

2)

Verify the approximation of the Durbin-Watson statistic given in the lecture,

$$dw \approx 2 - 2\hat\rho$$

where $\hat\rho$ is an estimate of the first-order autocorrelation of the error terms.

3)

a) Explain why autocorrelation may arise because of an omitted variable. Give examples.

b) Imagine you have a linear regression model with the dependent variable being annual stock returns observed at a monthly frequency. What are the consequences?

[Figure: three time-series plots of the simulated processes, each shown for t = 0, ..., 100.]

4)

Load the dataset gasoline.gdt. Consider the following two regressions which both try to explain per capita gasoline consumption in the US for the years 1960-1995:

$$\log(G_t/Pop_t) = \beta_1 + \beta_2 \log(Pg_t) + \beta_3 \log(Y_t) + \varepsilon_t$$

$$\log(G_t/Pop_t) = \beta_1 + \beta_2 \log(Pg_t) + \beta_3 \log(Y_t) + \beta_4 \log(Pnc_t) + \beta_5 \log(Puc_t) + \beta_6 \log(Ppt_t) + \beta_7 \log(Pd_t) + \beta_8 \log(Pn_t) + \beta_9 \log(Ps_t) + \beta_{10}\, t + \varepsilon_t$$

a) Estimate both models by OLS and plot the residuals against time. What do these graphs tell about possible autocorrelation in each of the two models?

b) Test for autocorrelation by i) regressing the residuals on their first lag and applying a t-test, ii) using the Durbin-Watson test and iii) using the Breusch-Godfrey test with up to 3 lags. Do the results fit your interpretation in part a)?

c) Apply Newey-West (HAC) standard errors to the augmented model.

5)

Load the dataset ukrates.gdt, which is a time-series dataset consisting of monthly short-term (variable rs) and long-term interest rates (variable r20) on U.K. government securities. Consider the regression

$$\Delta rs_t = \beta_1 + \beta_2\, \Delta r20_{t-1} + \varepsilon_t$$

where $\Delta rs_t = rs_t - rs_{t-1}$ and $\Delta r20_{t-1} = r20_{t-1} - r20_{t-2}$. The model can be interpreted as a simple monetary policy reaction function.

a) Estimate the model by Ordinary Least Squares and test for first-order autocorrelation using i) a regression of the residuals upon their first lag and ii) the Durbin-Watson test.

b) Re-estimate your model using Newey-West (HAC) standard errors.

c) Re-estimate your model by applying the Feasible Generalized Least Squares estimator by Cochrane and Orcutt and interpret the differences.

d) Re-estimate your model by applying the Prais-Winsten estimator, which is identical to the Cochrane-Orcutt procedure but additionally uses the first observation.


1)

Which of the following are consequences of autocorrelation?

a) OLS is inconsistent.


b) OLS is biased.

c) OLS is inefficient.

d) $s^2(X'X)^{-1}$ is an inconsistent estimate of the covariance matrix of the OLS estimator.

e) The t- and F-tests as presented in lecture 3 are no longer valid.

f) The model is misspecified.



1)

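a) Our assumption is that

$$V\{\varepsilon|X\} = \sigma^2\, \mathrm{Diag}\{x_{i2}^2\} = \sigma^2 \Psi$$

Thus,

$$\Psi = \begin{pmatrix} x_{12}^2 & & 0 \\ & \ddots & \\ 0 & & x_{N2}^2 \end{pmatrix}.$$

Therefore, $h_i = x_{i2}$. Since $\Psi^{-1} = P'P$,

$$P = \mathrm{Diag}\{x_{i2}^{-1}\}.$$

Applying the transformation matrix $P$ to $y$ and $X$ gives the transformed model

$$Py = PX\beta + P\varepsilon$$

$$\frac{y_i}{x_{i2}} = \beta_1 \frac{1}{x_{i2}} + \beta_2 + \frac{\varepsilon_i}{x_{i2}}, \quad i = 1, \dots, N$$

$$y_i^* = \beta_1 x_{i1}^* + \beta_2 x_{i2}^* + \varepsilon_i^*, \quad i = 1, \dots, N$$

with $x_{i1}^* = 1/x_{i2}$ and $x_{i2}^* = 1$. The GLS estimator can be written as

$$\hat\beta = \left( \sum_{i=1}^N \frac{1}{x_{i2}^2}\, x_i x_i' \right)^{-1} \sum_{i=1}^N \frac{1}{x_{i2}^2}\, x_i y_i$$

or

$$\hat\beta = \left( \sum_{i=1}^N x_i^* x_i^{*\prime} \right)^{-1} \sum_{i=1}^N x_i^* y_i^*$$

where $x_i^* = (x_{i1}^*, x_{i2}^*)'$.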

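b) Since

$$V\{\varepsilon_i^*|X\} = V\left\{\frac{\varepsilon_i}{x_{i2}}\Big|X\right\} = \frac{1}{x_{i2}^2}\, V\{\varepsilon_i|X\} = \frac{\sigma^2 x_{i2}^2}{x_{i2}^2} = \sigma^2,$$

$\sigma^2$ is the variance of the error in the transformed model and not equal to the variance in the original model.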

c) Since the assumption is that the variance of the errors is proportional to $x_{i2}^2$,

$$V\{\varepsilon_i|X\} = E\{\varepsilon_i^2|X\} = \sigma^2 x_{i2}^2,$$

we can simply use estimates of $\varepsilon_i^2$, for instance squared residuals $e_i^2$ taken from OLS, to perform the auxiliary regression

$$e_i^2 = \alpha_1 + \alpha_2 x_{i2}^2 + v_i, \quad i = 1, \dots, N.$$

If our assumption is correct, $\alpha_1 = 0$ and $\alpha_2 \neq 0$, which can be tested separately by t-tests.
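The two ways of writing the GLS estimator in part a) can be compared on simulated data. The sketch below uses only the Python standard library; the sample size, true coefficients and seed are arbitrary choices for illustration, and the helper names are not from the lecture.

```python
import random

def solve_2x2(a11, a12, a22, b1, b2):
    """Solve the symmetric system [[a11, a12], [a12, a22]] * beta = [b1, b2]."""
    det = a11 * a22 - a12 * a12
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)

def gls_weighted(x, y):
    """(sum x_i x_i' / x_i2^2)^(-1) sum x_i y_i / x_i2^2, with x_i = (1, x_i2)'."""
    w = [1.0 / xi ** 2 for xi in x]
    a11 = sum(w)
    a12 = sum(wi * xi for wi, xi in zip(w, x))
    a22 = sum(wi * xi * xi for wi, xi in zip(w, x))
    b1 = sum(wi * yi for wi, yi in zip(w, y))
    b2 = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    return solve_2x2(a11, a12, a22, b1, b2)

def gls_transformed(x, y):
    """OLS on y_i/x_i2 = beta1 * (1/x_i2) + beta2 * 1 + error (no extra intercept)."""
    x1_star = [1.0 / xi for xi in x]
    y_star = [yi / xi for xi, yi in zip(x, y)]
    a11 = sum(v * v for v in x1_star)
    a12 = sum(x1_star)              # cross-product with the regressor x2* = 1
    a22 = float(len(x))             # sum of x2*^2
    b1 = sum(v * ys for v, ys in zip(x1_star, y_star))
    b2 = sum(y_star)
    return solve_2x2(a11, a12, a22, b1, b2)

# Simulated data with V{eps_i | X} = sigma^2 * x_i2^2 (all values are illustrative).
rng = random.Random(0)
x = [rng.uniform(1.0, 5.0) for _ in range(2000)]
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 0.5 * xi) for xi in x]

beta_w = gls_weighted(x, y)
beta_t = gls_transformed(x, y)
print(beta_w)
print(beta_t)
```

Both functions build the same normal equations, so the two estimates agree up to rounding error, and both should lie close to the true values (1, 2) used in the simulation.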


2)

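a) Denote the number of observations where $d_i = 1$ as $N_1$ and let $N_2$ be the number of observations with $d_i = 0$. Ordering the observations so that those with $d_i = 1$ come first, our assumption is that

$$V\{\varepsilon|X\} = \sigma^2 \Psi = \sigma^2\, \mathrm{Diag}\Big\{ \underbrace{\tfrac{\sigma_1^2}{\sigma^2}, \dots, \tfrac{\sigma_1^2}{\sigma^2}}_{N_1 \text{ columns}},\ \underbrace{\tfrac{\sigma_2^2}{\sigma^2}, \dots, \tfrac{\sigma_2^2}{\sigma^2}}_{N_2 \text{ columns}} \Big\}$$

Thus $h_i = \sigma_1/\sigma$ for $i = 1, \dots, N_1$ and $h_i = \sigma_2/\sigma$ for $i = N_1 + 1, \dots, N$. The transformation matrix $P$ is

$$P = \mathrm{Diag}\Big\{ \tfrac{\sigma}{\sigma_1}, \dots, \tfrac{\sigma}{\sigma_1},\ \tfrac{\sigma}{\sigma_2}, \dots, \tfrac{\sigma}{\sigma_2} \Big\}$$

The GLS estimator is

$$\hat\beta = \left( \sum_{i=1}^{N_1} \frac{\sigma^2}{\sigma_1^2}\, x_i x_i' + \sum_{j=N_1+1}^{N} \frac{\sigma^2}{\sigma_2^2}\, x_j x_j' \right)^{-1} \left( \sum_{i=1}^{N_1} \frac{\sigma^2}{\sigma_1^2}\, x_i y_i + \sum_{j=N_1+1}^{N} \frac{\sigma^2}{\sigma_2^2}\, x_j y_j \right)$$

$$= \left( \sum_{i=1}^{N_1} \frac{1}{\sigma_1^2}\, x_i x_i' + \sum_{j=N_1+1}^{N} \frac{1}{\sigma_2^2}\, x_j x_j' \right)^{-1} \left( \sum_{i=1}^{N_1} \frac{1}{\sigma_1^2}\, x_i y_i + \sum_{j=N_1+1}^{N} \frac{1}{\sigma_2^2}\, x_j y_j \right)$$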

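b) The covariance matrix of $\hat\beta$ can be estimated as

$$\hat V\{\hat\beta\} = \hat\sigma^2 (X'\Psi^{-1}X)^{-1},$$

where

$$\hat\sigma^2 = \frac{1}{N-3}\, (y - X\hat\beta)'\, \Psi^{-1} (y - X\hat\beta)$$

and

$$\Psi^{-1} = \mathrm{Diag}\Big\{ \tfrac{\sigma^2}{\sigma_1^2}, \dots, \tfrac{\sigma^2}{\sigma_1^2},\ \tfrac{\sigma^2}{\sigma_2^2}, \dots, \tfrac{\sigma^2}{\sigma_2^2} \Big\}$$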

c) Before applying the GLS approach shown above, one should inspect the underlying assumption. This can be done by estimating the model by OLS and plotting the residuals separately for the subsamples with $d_i = 1$ and $d_i = 0$, respectively. If there are too many data points for such a plot, a boxplot of the residuals of both subsamples is preferable.

3)

a) Model 1: OLS, using observations 1-9275
Dependent variable: nettfa

             coefficient    std. error    t-ratio    p-value
  const       9.18133       10.0688        0.9119    0.3619
  inc         1.05132        0.0272112    38.6356    0.0000
  age        -2.29337        0.488324     -4.6964    0.0000
  sq_age      0.0385523      0.00560576    6.8773    0.0000
  marr       10.0488         1.33773       7.5118    0.0000

b) We are looking for a local minimum of our regression function with respect to age. Taking the partial derivative with respect to age and setting it to zero yields the first-order condition:

$$-2.29337 + 2 \cdot 0.0385523 \cdot age = 0$$

Solving this equation leads to $age = 29.74$. It is easily seen that the second (partial) derivative is positive, so that we indeed have a minimum at about age 30.

c) The null hypothesis of homoskedasticity is clearly rejected by both the White test and the Breusch-Pagan test. Age and income especially seem to be sources of heteroskedasticity.

d) No. On the one hand, we have already seen in the Breusch-Pagan test that marriage is likely to be another source of heteroskedasticity. More importantly, the Goldfeld-Quandt test considers only the heteroskedasticity induced by marriage and neglects any other sources like age and income, which are apparently present in our case.

e) The regression from the Breusch-Pagan test seems to be a sensible starting point for a Weighted Least Squares approach. However, if we used it directly we would have no guarantee of getting positive variance estimates and thus could get negative weights. Instead, we might consider the same variables but use the multiplicative model presented in the lecture. The corresponding auxiliary regression is

$$\log(e_i^2) = \alpha_1 + \alpha_2\, inc_i + \alpha_3\, age_i + \alpha_4\, age_i^2 + \alpha_5\, marr_i + v_i$$

To do so in GRETL, go to Save → Squared residuals in the model window, which generates the corresponding new variable. Taking the log of the squared residuals and performing the auxiliary regression, we get the interesting result that marriage is not a significant source of heteroskedasticity anymore. Dropping $marr_i$ gives the following results:


Model 3: OLS, using observations 1-9275
Dependent variable: l_usq2

             coefficient    std. error     t-ratio    p-value
  const       5.24314       0.390843       13.4149    0.0000
  inc         0.0426766     0.000988221    43.1853    0.0000
  age        -0.153820      0.0189863      -8.1016    0.0000
  sq_age      0.00243750    0.000217944    11.1841    0.0000

Using the predictions from this auxiliary regression (Save → Fitted values) and reversing the logarithmic transformation we have

$$\hat h_i^2 = \exp(5.24314 + 0.0426766\, inc_i - 0.153820\, age_i + 0.00243750\, age_i^2)$$

Note that it does not matter if we keep or drop $\hat\alpha_1$, since including it means multiplying all observations by a constant, which does not change the result. For WLS estimation, go in GRETL to Models → Other linear models → Weighted Least Squares and choose $1/\hat h_i^2$ as the weight variable (see Help for details). Doing so gives

Model 4: WLS, using observations 1-9275
Dependent variable: nettfa
Variable used as weight: h_sq_inv

             coefficient    std. error    t-ratio    p-value
  const       0.337574      5.19623        0.0650    0.9482
  inc         0.550563      0.0225816     24.3810    0.0000
  age         0.840889      0.271119       3.1016    0.0019
  sq_age      0.0167648     0.00339816     4.9335    0.0000
  marr        3.93802       0.586034       6.7198    0.0000

The difference in the parameter estimates is remarkably high. As expected from theory, the standard errors are lower under the WLS approach.

f) Go to Edit → Modify Model and choose robust standard errors. Under Configure you get several options for heteroskedasticity-robust standard errors. Option HC0 refers to the original White standard errors which you have seen in the lecture. Applying this option to our models increases the standard errors in the OLS model considerably, while the WLS standard errors remain of a similar magnitude. Nevertheless, using robust standard errors within WLS may be sensible, since the model for the variance of the error term will only be an approximation to reality, so that heteroskedasticity may well still be present.
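To see how the White (HC0) standard error from part f) differs from the conventional one, the following sketch computes both for the slope of a simple regression with an intercept. The data are a stylized, deterministic construction (not the 401ksubs data) in which the size of the disturbance grows with x; the function name `ols_simple` is made up for this illustration.

```python
import math

def ols_simple(x, y):
    """Slope of y on x (with intercept), plus conventional and White (HC0) standard errors."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Conventional: s^2 / Sxx with s^2 = RSS / (n - 2), valid under homoskedasticity.
    s2 = sum(e * e for e in resid) / (n - 2)
    se_conv = math.sqrt(s2 / sxx)
    # HC0: sum (x_i - xbar)^2 e_i^2 / Sxx^2, valid under heteroskedasticity.
    se_hc0 = math.sqrt(sum((xi - xbar) ** 2 * e * e for xi, e in zip(x, resid)) / sxx ** 2)
    return slope, se_conv, se_hc0

# Stylized data: disturbances alternate in sign and grow in magnitude with x.
x = [float(i) for i in range(1, 101)]
y = [1.0 + 2.0 * xi + ((-1) ** i) * xi for i, xi in enumerate(x)]

slope, se_conv, se_hc0 = ols_simple(x, y)
print(slope, se_conv, se_hc0)
```

In this construction the largest disturbances occur at x values far from the mean, and the robust standard error comes out larger than the conventional one, in line with what the solution reports for the OLS model.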

6 Solution to additional exercises

1)


a) False. Homoskedasticity is not required for the consistency of OLS.

b) False. Homoskedasticity is not required for unbiasedness of OLS.

c) True. If the error terms are heteroskedastic, the WLS estimator is more efficient in theory. This, however, only holds if the variances of the error terms are known. In practice, we have to specify a model for the variance of the error term and estimate it. Then, there is no guarantee that WLS is more efficient. However, WLS will often be more efficient, especially if the degree of heteroskedasticity is high.

d) True. In the derivation of this formula we used the assumption of homoskedasticity.

e) True. Although there are certain departures from homoskedasticity where the usual t- and F-tests are still asymptotically valid, they will generally be invalid. However, the t-statistic with heteroskedasticity-robust standard errors is asymptotically normally distributed, and the Wald test with a heteroskedasticity-robust covariance matrix can also be used for inference (see the exercises from week 3 for the relation of F-tests and Wald tests). Note that exact small sample distributions (i.e. t- and F-distributions) are no longer available under heteroskedasticity even if we assume normality of the error term.




This week we review the properties of OLS and the relevant assumptions. We shall see examples where OLS cannot be saved anymore. The instrumental variables estimator will be introduced as a solution for these cases.

2 Reading

Obligatory reading: Verbeek, Sections 5.1-5.3.1


Greene, Chapter 12.

3 Exercises

1) (adapted from Stock/Watson)

The demand for a commodity is given by $Q = \beta_1 + \beta_2 P + u$, where $Q$ denotes the quantity, $P$ denotes the price, and $u$ denotes factors other than price that determine demand. Supply for the same commodity is given by $Q = \gamma_1 + \gamma_2 P + v$, where $v$ denotes factors other than price that determine supply. Assume that $u$ and $v$ both have a mean of zero, variances $\sigma_u^2$ and $\sigma_v^2$, and are uncorrelated, i.e. $cov\{u, v\} = 0$.

a) Solve the two equations to show how $Q$ and $P$ depend on $u$ and $v$.

b) Calculate $cov\{P, u\}$ and $cov\{P, v\}$ and interpret the results.

c) Derive $cov\{P, Q\}$ and $V\{P\}$.

d) A random sample of observations $(Q_i, P_i)$, $i = 1, \dots, N$, is collected, and $Q_i$ is regressed on $P_i$. Use the answer from c) to derive the asymptotic Ordinary Least Squares regression coefficient of $P_i$.

e) Suppose the OLS estimate from d) is used to estimate the slope of the demand function, $\beta_2$. Is the estimated slope (asymptotically) correct, too large or too small? (Hint: Use the fact that demand curves usually slope down and supply curves slope up.)

2)

Consider the instrumental variable regression model

$$y_i = \beta_1 + \beta_2 x_{1i} + \beta_3 x_{2i} + \varepsilon_i$$

where $x_{1i}$ is correlated with $\varepsilon_i$ and $z_{1i}$ is a potential instrument. Which assumption of the instrumental variables estimator is not satisfied if

a) $z_{1i}$ is independent of $(y_i, x_{1i}, x_{2i})$?

b) $z_{1i} = x_{2i}$?

c) $z_{1i} = c\,x_{1i}$ (where $c$ is a constant)?


3)

During the 1880s, a cartel known as the Joint Executive Committee (JEC) controlled the rail transport of grain from the midwest to eastern cities in the United States. The cartel preceded the Sherman Antitrust Act of 1890, and it legally operated to increase the price of grain above what would have been the competitive price. From time to time, cheating by members of the cartel brought about a temporary collapse of the collusive price-setting agreement.

The data file railway.gdt contains weekly observations on the rail shipping price and other factors from 1880 to 1886. Suppose that the demand curve for rail transport of grain is specified as $\log(Q_t) = \beta_1 + \beta_2 \log(P_t) + \beta_3 Ice_t + \varepsilon_t$, where $Q_t$ is the total tonnage of grain shipped in week $t$, $P_t$ is the price of shipping a ton of grain by rail and $Ice_t$ is a binary variable that is equal to 1 if the Great Lakes are not navigable because of ice. Ice is included because grain could also be transported by ship when the Great Lakes were navigable. Further, the variable cartel is a dummy variable for the activity of the cartel.

a) Estimate the demand equation by OLS. What is the estimated value of the demand elasticity and its standard error?

b) In exercise 1 we have seen that the interaction of supply and demand is likely to make the OLS estimator of the elasticity biased. Consider now using the variable cartel as an instrumental variable for $\log(P)$. Use economic reasoning to argue whether cartel plausibly satisfies the two conditions for a valid instrument.

c) Regress $\log(P_t)$ on $cartel_t$ and $Ice_t$. What do the results tell you about the quality of cartel as an instrument?

d) Estimate the demand equation by instrumental variable regression. What is the estimated demand elasticity and its standard error? Compare the results to the OLS estimates.

e) Perform the Durbin-Wu-Hausman test and interpret the results.



1)

Consider the example from slide 3 of lecture 7,

$$y_t = \beta_1 + \beta_2 y_{t-1} + \beta_3 y_{t-2} + \varepsilon_t.$$

Show the result from the lecture, namely that the assumption $E\{\varepsilon|X\} = 0$ cannot hold for this model. ($X$ denotes, as always, the matrix of regressors, which are here lagged values of $y_t$.)


1)

First plot: $\rho = 0$. The second and third plots show the two processes with $|\rho| = 0.9$.

If the autocorrelation is positive and high ($\rho = 0.9$), the process tends to stay above (below) its mean (zero) in the next period if it is above (below) its mean in the current period. For $\rho = -0.9$ the process tends to reverse its sign from one period to the next.
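The behaviour described above can be reproduced by simulating the three processes and counting how often each path changes sign. This is a rough sketch using the standard library only; the seed and the longer sample (1000 instead of 100 observations, to make the pattern clearer) are arbitrary choices.

```python
import random

def simulate_ar1(rho, n=100, seed=0):
    """Simulate eps_t = rho * eps_(t-1) + v_t with v_t ~ N(0, 1), starting from zero."""
    rng = random.Random(seed)
    eps, path = 0.0, []
    for _ in range(n):
        eps = rho * eps + rng.gauss(0.0, 1.0)
        path.append(eps)
    return path

def sign_changes(path):
    """Number of consecutive pairs whose signs differ."""
    return sum(1 for a, b in zip(path, path[1:]) if a * b < 0)

counts = {rho: sign_changes(simulate_ar1(rho, n=1000)) for rho in (-0.9, 0.0, 0.9)}
print(counts)
```

A path with rho = -0.9 crosses zero in most periods, a path with rho = 0.9 only rarely, and the uncorrelated case lies in between, which is the visual criterion for matching the plots to the values of rho.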

2)

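$$dw = \frac{\sum_{t=2}^T (e_t - e_{t-1})^2}{\sum_{t=1}^T e_t^2} = \frac{\sum_{t=2}^T (e_t^2 - 2 e_t e_{t-1} + e_{t-1}^2)}{\sum_{t=1}^T e_t^2}$$

$$= \frac{\sum_{t=2}^T e_t^2}{\sum_{t=1}^T e_t^2} + \frac{\sum_{t=2}^T e_{t-1}^2}{\sum_{t=1}^T e_t^2} - 2\, \frac{\sum_{t=2}^T e_t e_{t-1}}{\sum_{t=1}^T e_t^2} \approx 2 - 2\, \frac{\sum_{t=2}^T e_t e_{t-1}}{\sum_{t=1}^T e_t^2}$$

as the sample size becomes large, because both

$$\frac{\sum_{t=2}^T e_t^2}{\sum_{t=1}^T e_t^2} \quad \text{and} \quad \frac{\sum_{t=2}^T e_{t-1}^2}{\sum_{t=1}^T e_t^2}$$

tend to 1. Since

$$\frac{\sum_{t=2}^T e_t e_{t-1}}{\sum_{t=1}^T e_t^2}$$

is an estimator of $\rho$, $dw$ tends to $2 - 2\rho$.

It can be shown that this estimate of $\rho$ is very close to the estimate which results from regressing $e_t$ on $e_{t-1}$ by OLS.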
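The quality of the approximation can be checked on a simulated series. In the sketch below an AR(1) series stands in for the residuals $e_t$; the value rho = 0.6, the sample size and the seed are arbitrary illustrative choices.

```python
import random

def durbin_watson(e):
    """dw = sum_{t=2}^T (e_t - e_{t-1})^2 / sum_{t=1}^T e_t^2."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(v * v for v in e)

def rho_hat(e):
    """sum_{t=2}^T e_t e_{t-1} / sum_{t=1}^T e_t^2, the estimator used in the derivation."""
    num = sum(e[t] * e[t - 1] for t in range(1, len(e)))
    return num / sum(v * v for v in e)

# Stand-in for regression residuals: an AR(1) series with rho = 0.6.
rng = random.Random(42)
e, last = [], 0.0
for _ in range(2000):
    last = 0.6 * last + rng.gauss(0.0, 1.0)
    e.append(last)

dw = durbin_watson(e)
r = rho_hat(e)
print(dw, 2 - 2 * r)
```

The exact difference between the two printed numbers is $(e_1^2 + e_T^2) / \sum_t e_t^2$, which is why the approximation improves as the sample grows.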

3)

a) See exercise 4. Another example would be the omission of a variable that describes seasonal patterns in monthly or quarterly data. Consider for instance the case that you want to explain the activity in the construction industry based on monthly data. It could then be that the residuals in your model would have a tendency to be negative in the winter months and positive in the summer months. A winter/summer dummy variable could possibly solve this problem. Another typical example is an omitted lagged dependent variable.

b) Let $R_t$ be the annual return of a stock, i.e. the return over the period $[t-12, t]$, where time is measured in months. As $R_{t+1}$ refers to the period $[t-11, t+1]$, $R_t$ and $R_{t+1}$ have 11 months in common and are therefore not stochastically independent and probably heavily autocorrelated. The autocorrelation of $R_t$ will probably translate into autocorrelation of the error term in a regression explaining $R_t$, since unexpected (stock market) events in one month have a direct influence on $R_t$ in the following twelve-month periods. Under autocorrelation, routinely computed standard errors and tests will be incorrect and misleading, so that robust standard errors (HAC) are recommended.
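The overlap argument in part b) can be illustrated with a small simulation. The sketch assumes independent monthly returns and approximates the annual return by the sum of twelve monthly returns; both simplifications, as well as the volatility, the sample length and the seed, are assumptions made only for this example.

```python
import random

def autocorr1(x):
    """First-order sample autocorrelation of a series."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

rng = random.Random(1)
monthly = [rng.gauss(0.0, 0.05) for _ in range(12000)]

# Annual return observed every month: sum of the last 12 monthly returns.
annual = [sum(monthly[t - 12:t]) for t in range(12, len(monthly) + 1)]

print(autocorr1(monthly))  # close to 0
print(autocorr1(annual))   # close to 11/12
```

Although the monthly returns are independent by construction, consecutive annual returns share 11 of their 12 months, so their first-order autocorrelation is close to 11/12.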

4)

[Figure: two panels of regression residuals (= observed - fitted lGpc) plotted against time, 1960-1995.]

a) The graph on the left hand side refers to the first model whereas the graph on the right hand side refers to the augmented model. The model with more regressors seems to suffer less from autocorrelation since the residuals cross the zero line more often. This is an example where the omission of relevant variables leads to autocorrelation.

b) i) Regressing the residuals of our model on their first lag (use the lags... option in the model specification window to add the lagged residual), we get estimated first-order autocorrelations of 0.948469 (p-value 2.49e-014) and 0.268612 (p-value 0.1147). Thus, we clearly reject the null hypothesis of no autocorrelation for the first model, whereas we do not reject the null hypothesis for the second model. Of course, not rejecting the null hypothesis (especially in small samples) does not mean that the null hypothesis is true.

ii) The Durbin-Watson statistics are 0.172878 and 1.373491, respectively. Looking at the bounds for the critical values given in the lecture, we see that we clearly reject the null hypothesis for the first model and that we are in the inconclusive region for the augmented model (the bounds for K=10 are not too far away from the bounds for K=9).

iii) Running auxiliary regressions of the residuals on their first three lags yields $R^2$ values of 0.847827 and 0.107939. Thus, the Breusch-Godfrey test statistics are $32 \cdot 0.847827 = 27.13$ and $32 \cdot 0.107939 = 3.454$. Using the p-value finder of GRETL and applying 3 degrees of freedom, we get p-values of 5.52923e-006 and 0.326771, respectively.

Thus, for each test, the results fit the graphical analysis from part a).

c) Applying Newey-West standard errors yields in our case standard errors which are lower than the default standard errors. More often, it is the other way around.
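The Breusch-Godfrey p-values in part b) iii) can be reproduced without GRETL, because the chi-square distribution with 3 degrees of freedom has a closed-form tail probability. A minimal sketch; the function names are chosen for this illustration, and the closed form only applies to exactly 3 degrees of freedom.

```python
import math

def chi2_sf_3df(x):
    """P(X > x) for X ~ chi-square with 3 degrees of freedom (closed form)."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

def breusch_godfrey_3lags(n_obs, r_squared):
    """Test statistic T * R^2 from the auxiliary regression and its p-value (3 lags)."""
    stat = n_obs * r_squared
    return stat, chi2_sf_3df(stat)

stat1, p1 = breusch_godfrey_3lags(32, 0.847827)   # first model
stat2, p2 = breusch_godfrey_3lags(32, 0.107939)   # augmented model
print(stat1, p1)
print(stat2, p2)
```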

5)

a) A regression of the residuals on their first lag gives a coefficient of 0.148841, which is significantly different from zero with a p-value of 0.0006. The Durbin-Watson statistic is 1.702273. Going to Tests → Durbin-Watson p-value gives 0.000292831, so that we also reject $H_0$.

b) The standard errors increase as expected, since standard errors that ignore autocorrelation are usually (asymptotically) downward biased. Note that HAC standard errors are also not unbiased, but they are at least asymptotically unbiased, in contrast to non-robust standard errors.

c) Going to Model → Time series → Cochrane-Orcutt we can perform the Feasible Generalized Least Squares approach from the lecture. The iterations lead to a slightly increased estimated (first-order) autocorrelation. The coefficient of $\Delta r20_{t-1}$ is now somewhat lower than with OLS estimation. The standard errors tend to be smaller than in part b), pointing to a more efficient estimation than with OLS. Note that a comparison with unadjusted standard errors (part a)) is not meaningful since these standard errors are invalid under autocorrelation.

d) Go to Model → Time series → Prais-Winsten. The results are very similar. This is no surprise, since the information from one additional observation should not make a big difference, especially if the sample size is relatively large, as is the case here.


1)

a) False.

b) False.

c) True. If the error terms are correlated, an appropriate GLS estimator is more efficient in theory. This, however, only holds if the correlations (and the variances) of the error terms are known. In practice, we have to specify a model for the autocorrelations of the error term and estimate it. Then, there is no guarantee that FGLS is more efficient. However, FGLS will often be more efficient, especially if the degree of autocorrelation is high.

d) True. In the derivation of this formula we used the assumption of uncorrelated error terms.

e) True. However, the t-statistic with HAC standard errors is asymptotically normally distributed, and the Wald test with a HAC covariance matrix can also be used for inference. Exact small sample distributions (i.e. t- and F-distributions) are no longer available under autocorrelation even if we assume normality of the error term.

f) Depends. A high degree of autocorrelation may point to misspecification, but there is no general rule that there must be misspecification. In practice, we should just check the functional form and test additional regressors if these are available.




This week we continue studying instrumental variables estimators in the various situations in which they are needed. We then generalize the estimator and look at specification tests.

2 Reading

Obligatory reading: Verbeek, Sections 5.3.2-5.5


Greene, Chapter 12.

3 Exercises

1)

Consider the following two equations:

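$$y_i = \beta_1 + \beta_2 x_i + u_i$$

$$x_i = \gamma_1 + \gamma_2 y_i + v_i$$

$u_i$ and $v_i$ are the error terms of the models and are assumed to be uncorrelated with each other, having variances $\sigma_u^2 > 0$ and $\sigma_v^2 > 0$.

a) Show that $x_i$ is correlated with $u_i$ if $\gamma_2 \neq 0$.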

b) What are the consequences of your finding?

2)

Consider the equations

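$$y_i = \beta_1 + \beta_2 x_{1i} + \beta_3 z_{2i} + \beta_4 z_{3i} + u_i$$

$$x_i = \gamma_1 + \gamma_2 y_i + \gamma_3 z_{2i} + v_i$$

and assume that $cov\{z_{2i}, u_i\} = cov\{z_{3i}, u_i\} = cov\{z_{2i}, v_i\} = cov\{z_{3i}, v_i\} = 0$.

a) What do the assumptions mean?

b) Can we consistently estimate $\beta_2$?

c) Can we consistently estimate $\gamma_2$?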

3)

a) Briefly describe the Two-stage Least Squares (2SLS) approach.

b) Show that the 2SLS estimator derived in the lecture is identical to the (generalized) Instrumental Variables estimator. Consider the case of overidentification as well as the case of exact identification.

4)

Consider again the dataset from last week (railway.gdt) and the corresponding regression

$$\log(Q_t) = \beta_1 + \beta_2 \log(P_t) + \beta_3 Ice_t + \varepsilon_t$$

a) Use $cartel_t$, $cartel_{t-1}$ and $cartel_{t-2}$ as instruments for $\log(P_t)$ and estimate the model with the instrumental variables estimator.

b) Why may it be sensible to use $cartel_{t-1}$ and $cartel_{t-2}$ as additional instruments?

c) Given your reasoning from part b), how do you interpret the results from part a)?

d) What is the risk of using additional instruments in general? What about the specific case in this exercise?

e) Check your reasoning from part d) by performing the specification test from the end of lecture 8.

f) Re-estimate the model by applying Newey-West (HAC) standard errors. (These are easily generalized from the OLS case given in the lecture to IV regressions.)

g) Test that the demand elasticity is equal to -1.


1)

Why does the Instrumental Variable (IV) estimator lead to a smaller $R^2$ than the OLS one? What does this say about the $R^2$ as a measure for the adequacy of the model?


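a)

$$(1) \quad Q = \beta_1 + \beta_2 P + u$$

$$(2) \quad Q = \gamma_1 + \gamma_2 P + v \quad \Rightarrow \quad P = -\frac{\gamma_1}{\gamma_2} + \frac{1}{\gamma_2} Q - \frac{1}{\gamma_2} v$$

Substituting $P$ in (1):

$$Q = \beta_1 + \beta_2 \left( -\frac{\gamma_1}{\gamma_2} + \frac{1}{\gamma_2} Q - \frac{1}{\gamma_2} v \right) + u \qquad | \cdot \gamma_2$$

$$\gamma_2 Q = \gamma_2 \beta_1 - \beta_2 \gamma_1 + \beta_2 Q - \beta_2 v + \gamma_2 u$$

$$Q = \frac{\gamma_2 \beta_1 - \beta_2 \gamma_1}{\gamma_2 - \beta_2} + \frac{\gamma_2 u - \beta_2 v}{\gamma_2 - \beta_2}$$

Substituting $Q$ in (2):

$$P = -\frac{\gamma_1}{\gamma_2} + \frac{1}{\gamma_2} (\beta_1 + \beta_2 P + u) - \frac{1}{\gamma_2} v \qquad | \cdot \gamma_2$$

$$\gamma_2 P = -\gamma_1 + \beta_1 + \beta_2 P + u - v$$

$$P = \frac{\beta_1 - \gamma_1}{\gamma_2 - \beta_2} + \frac{u - v}{\gamma_2 - \beta_2}$$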

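b)

$$cov(P, u) = cov\left( \frac{\beta_1 - \gamma_1}{\gamma_2 - \beta_2} + \frac{u - v}{\gamma_2 - \beta_2},\ u \right) = \frac{1}{\gamma_2 - \beta_2}\, cov(u - v, u)$$

$$= \frac{1}{\gamma_2 - \beta_2} \Big( V(u) - \underbrace{cov(u, v)}_{=0} \Big) = \frac{\sigma_u^2}{\gamma_2 - \beta_2} \neq 0$$

$$cov(P, v) = cov\left( \frac{\beta_1 - \gamma_1}{\gamma_2 - \beta_2} + \frac{u - v}{\gamma_2 - \beta_2},\ v \right) = \frac{1}{\gamma_2 - \beta_2}\, cov(u - v, v)$$

$$= \frac{1}{\gamma_2 - \beta_2} \Big( \underbrace{cov(u, v)}_{=0} - V(v) \Big) = -\frac{\sigma_v^2}{\gamma_2 - \beta_2} \neq 0$$

Interpretation: Since $cov(P, u) \neq 0$ and $cov(P, v) \neq 0$, the regressors are correlated with the error terms, so that the conditions for the consistency of OLS are not met.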

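c)

$$cov(P, Q) = cov\left( \frac{\beta_1 - \gamma_1}{\gamma_2 - \beta_2} + \frac{u - v}{\gamma_2 - \beta_2},\ \frac{\gamma_2 \beta_1 - \gamma_1 \beta_2}{\gamma_2 - \beta_2} + \frac{\gamma_2 u - \beta_2 v}{\gamma_2 - \beta_2} \right) = \frac{1}{(\gamma_2 - \beta_2)^2}\, cov(u - v,\ \gamma_2 u - \beta_2 v)$$

$$= \frac{1}{(\gamma_2 - \beta_2)^2} \Big( cov(u, \gamma_2 u) - \underbrace{cov(u, \beta_2 v)}_{=0} - \underbrace{cov(v, \gamma_2 u)}_{=0} + cov(v, \beta_2 v) \Big) = \frac{\gamma_2 \sigma_u^2 + \beta_2 \sigma_v^2}{(\gamma_2 - \beta_2)^2}$$

$$V(P) = V\left( \frac{\beta_1 - \gamma_1}{\gamma_2 - \beta_2} + \frac{u - v}{\gamma_2 - \beta_2} \right) = \frac{1}{(\gamma_2 - \beta_2)^2} \Big( V(u) + V(v) - 2 \underbrace{cov(u, v)}_{=0} \Big) = \frac{\sigma_u^2 + \sigma_v^2}{(\gamma_2 - \beta_2)^2}$$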

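d)

$$\hat\beta_{2,OLS} = \frac{\sum_{i=1}^n (Q_i - \bar Q)(P_i - \bar P)}{\sum_{i=1}^n (P_i - \bar P)^2} = \frac{\frac{1}{n} \sum_{i=1}^n (Q_i - \bar Q)(P_i - \bar P)}{\frac{1}{n} \sum_{i=1}^n (P_i - \bar P)^2} = \frac{\widehat{cov}(Q, P)}{\hat V(P)}$$

$$\xrightarrow{\ n \to \infty\ } \frac{cov(Q, P)}{V(P)} \overset{c)}{=} \frac{\gamma_2 \sigma_u^2 + \beta_2 \sigma_v^2}{\sigma_u^2 + \sigma_v^2} \neq \beta_2$$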

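e)

$$\hat\beta_{2,OLS} - \beta_2 \xrightarrow{\ n \to \infty\ } \frac{\gamma_2 \sigma_u^2 + \beta_2 \sigma_v^2}{\sigma_u^2 + \sigma_v^2} - \frac{\beta_2 \sigma_u^2 + \beta_2 \sigma_v^2}{\sigma_u^2 + \sigma_v^2} = \frac{(\gamma_2 - \beta_2)\, \sigma_u^2}{\sigma_u^2 + \sigma_v^2} > 0, \quad \text{if } \gamma_2 > 0 \text{ and } \beta_2 < 0.$$

Hence, with a downward-sloping demand curve and an upward-sloping supply curve, the OLS estimate of the demand slope is asymptotically too large.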
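The limit derived in parts d) and e) can be checked by simulating the market equilibrium. The sketch below uses the reduced form for $P$ from part a); all parameter values, the normal distribution of the shocks and the seed are arbitrary choices for illustration.

```python
import random

def simulate_market(n, beta1, beta2, gamma1, gamma2, sd_u, sd_v, seed=0):
    """Draw equilibrium (P, Q) pairs from the demand and supply equations."""
    rng = random.Random(seed)
    prices, quantities = [], []
    for _ in range(n):
        u = rng.gauss(0.0, sd_u)
        v = rng.gauss(0.0, sd_v)
        p = (beta1 - gamma1) / (gamma2 - beta2) + (u - v) / (gamma2 - beta2)
        q = beta1 + beta2 * p + u      # demand equation evaluated at the equilibrium price
        prices.append(p)
        quantities.append(q)
    return prices, quantities

def ols_slope(x, y):
    """Slope coefficient from regressing y on x and a constant."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx

def ols_limit(beta2, gamma2, sd_u, sd_v):
    """Asymptotic OLS coefficient from part d)."""
    return (gamma2 * sd_u ** 2 + beta2 * sd_v ** 2) / (sd_u ** 2 + sd_v ** 2)

P, Q = simulate_market(50000, beta1=10.0, beta2=-1.0, gamma1=2.0, gamma2=2.0, sd_u=1.0, sd_v=2.0)
estimate = ols_slope(P, Q)
limit = ols_limit(-1.0, 2.0, 1.0, 2.0)
print(estimate, limit)
```

With these illustrative values the true demand slope is -1, while OLS settles near the limit of -0.4, i.e. it is too large by $(\gamma_2 - \beta_2)\sigma_u^2 / (\sigma_u^2 + \sigma_v^2) = 0.6$.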



3)

a) The estimated demand elasticity is -0.63433 with a standard error of 0.0819697.

b) The variable cartel can be reasonably assumed to be relevant for the supply side

only. That means, cartel is likely to be uncorrelated with any demand shocks andthus uncorrelated with the error term of the demand equation. Further, cartelshould be correlated with prices since usually cartels use their power to raise prices.Thus, both conditions, exogeneity and relevance, for a valid instrument are prettylikely to be met in this case.

c) With a t-ratio of 14.7 cartel is highly significant and seems to have importantexplanatory power with respect to prices. This result supports our assumptionthat cartel is a relevant instrument.

d) Go to Model → Instrumental variables → Two-stage least squares (which is another name for the IV estimator given in the lecture) and specify $\log(Q_t)$ as the dependent variable, $\log(P_t)$ and $Ice_t$ as independent variables, and $cartel_t$ and $Ice_t$ as instruments. The demand elasticity is now estimated to be -0.872271 with a standard error of 0.131355. In line with the results from exercise 1, this points to an overestimation of the demand elasticity if OLS is applied (in exercise 1 we used neither logarithmic transformations nor an additional covariate, but the results would be similar). Further, the IV estimate is closer to -1, which would be the demand elasticity expected from economic theory in monopolies. Finally, the standard error of the IV estimator is larger than that of OLS. This is also expected from theory and does not tell us that OLS should be preferred.

e) Start with the regression from part c) and save the residuals. Then estimate the original model by OLS as in a), but add the residuals from the auxiliary regression as an additional regressor. Under the null hypothesis that both OLS and IV are consistent, the coefficient of this new variable should be zero. However, we get a coefficient of 0.396247 with a t-ratio of 2.385, so we reject the null hypothesis at the 5% level. Note that in the Durbin-Wu-Hausman test the alternative hypothesis is inconsistency of OLS but consistency of IV. Thus, consistency of IV is assumed and not tested.
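The two regressions of this test can be sketched on simulated data. The data-generating process, the variable names and the helper function ols below are assumptions made for this illustration only; they are not the course dataset and not GRETL output.

```python
import numpy as np

def ols(X, y):
    """Hypothetical helper: OLS coefficients, conventional standard errors, residuals."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    s2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = np.sqrt(s2 * np.diag(XtX_inv))
    return b, se, resid

rng = np.random.default_rng(1)
n = 5_000
z = rng.normal(size=n)                       # instrument (exogenous by construction)
e = rng.normal(size=n)                       # structural error term
x = 0.8 * z + 0.5 * e + rng.normal(size=n)   # regressor correlated with e: endogenous
y = 1.0 - 1.0 * x + e                        # true slope is -1

const = np.ones(n)

# Auxiliary (first-stage) regression of x on the instruments; keep the residuals.
_, _, v_hat = ols(np.column_stack([const, z]), x)

# Original model estimated by OLS, augmented with the first-stage residuals.
b, se, _ = ols(np.column_stack([const, x, v_hat]), y)
t_resid = b[2] / se[2]

print(f"coefficient on residuals: {b[2]:.3f}, t-ratio: {t_resid:.2f}")
print(f"coefficient on x in the augmented regression: {b[1]:.3f}")
```

Because x is endogenous by construction here, the t-ratio on the residuals should be far from zero. The coefficient on x in the augmented regression coincides with the IV estimate, which is why it should lie close to the true slope.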


1)

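Note that $E\{\varepsilon|X\} = 0$ means

$$E\left\{ \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_N \end{pmatrix} \,\middle|\, \begin{pmatrix} x_{11} & \dots & x_{1K} \\ \vdots & & \vdots \\ x_{N1} & \dots & x_{NK} \end{pmatrix} \right\} = 0$$

or $E\{\varepsilon_i|x_{jk}\} = 0$ for every $i, j = 1, \dots, N$ and $k = 1, \dots, K$. Further note that conditional mean independence implies uncorrelatedness, i.e.

$$E\{\varepsilon_i|x_{jk}\} = 0 \;\Rightarrow\; \operatorname{cov}\{\varepsilon_i, x_{jk}\} = 0.$$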


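In our model we have

$$\operatorname{cov}\{y_{t-1}, \varepsilon_{t-1}\} = \operatorname{cov}\{\beta_1 + \beta_2 y_{t-2} + \beta_3 y_{t-3} + \varepsilon_{t-1},\; \varepsilon_{t-1}\} = V\{\varepsilon_{t-1}\} \neq 0$$

if we assume contemporaneous uncorrelatedness of the regressors with the error term, i.e.

$$\operatorname{cov}\{y_{t-1}, \varepsilon_t\} = \operatorname{cov}\{y_{t-2}, \varepsilon_t\} = 0.$$

Consequently, the assumption $E\{\varepsilon|X\} = 0$ (which is necessary for unbiasedness) is not met in this case, whereas $E\{\varepsilon_i x_i\} = 0$ (necessary for consistency) might still hold.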
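A small Monte Carlo experiment illustrates the difference between unbiasedness and consistency when lagged dependent variables are used as regressors. To keep the sketch short it assumes a model with a single lag and a true coefficient of 0.5, which is a simplification of the model above. The function name mean_ols_ar1 and all numerical settings are hypothetical.

```python
import numpy as np

def mean_ols_ar1(T, reps, rho=0.5, seed=0):
    """Average OLS slope from regressing y_t on a constant and y_{t-1}."""
    rng = np.random.default_rng(seed)
    burn = 50                                   # burn-in periods, dropped below
    eps = rng.normal(size=(reps, T + burn))
    y = np.zeros((reps, T + burn))
    for t in range(1, T + burn):
        y[:, t] = rho * y[:, t - 1] + eps[:, t]
    y = y[:, burn:]
    y_lag, y_cur = y[:, :-1], y[:, 1:]
    # OLS slope with intercept, computed separately for every replication.
    y_lag_c = y_lag - y_lag.mean(axis=1, keepdims=True)
    y_cur_c = y_cur - y_cur.mean(axis=1, keepdims=True)
    slopes = (y_lag_c * y_cur_c).sum(axis=1) / (y_lag_c**2).sum(axis=1)
    return slopes.mean()

small = mean_ols_ar1(T=20, reps=5000)    # short time series
large = mean_ols_ar1(T=2000, reps=200)   # long time series

print(f"mean estimate with T = 20:   {small:.3f}")
print(f"mean estimate with T = 2000: {large:.3f}")
```

In short samples the average estimate should lie visibly below 0.5, while in long samples it should be close to 0.5: OLS is biased here, but the bias vanishes as the sample grows.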




This week the method of maximum likelihood estimation is introduced. This estimation technique is used for a wide range of econometric models and is therefore extremely useful. Binary choice models are also treated. Further models that require maximum likelihood estimation are covered in the following week.

2 Reading

Obligatory reading: Verbeek, 6.1, 7.1.1-7.1.4

Additional reading:

Wooldridge, Relevant parts of Chapter 17

Greene, Relevant parts of Chapter 16 and 23

3 Exercises

1)

Consider a random variable X with an exponential distribution. The corresponding density function is given by ($\lambda > 0$):

$$f(x) = \begin{cases} \lambda e^{-\lambda x} & x \geq 0 \\ 0 & x < 0 \end{cases}$$



2)

Use the expressions for the score contributions on slide 16 of lecture 9 to derive the information matrix, the asymptotic covariance matrix and an estimator for the variance of the maximum likelihood estimator in large samples, for the case of the linear regression model with normally, independently and identically distributed errors. (Hint: Use the fact that for a normally distributed random variable $X$ with zero mean it holds that $E\{X^4\} = 3\sigma^4$, where $\sigma^2$ is the variance of $X$.)


3)

The dataset insurance.gdt provides data about the insurance status of U.S. citizens, represented by the binary variable insured, and further covariates.

a) Regress insured on the variables healthy (a binary variable regarding self-reported health status), age, male, married and selfemp (binary variable; 1 = self-employed) using (i) the Logit and (ii) the Probit specification. Compare the results. Which specification would you prefer?

b) Compute the marginal effects of your regression evaluated at the mean values of the regressors. Compare again the Logit and the Probit model.

c) Compute the estimated probabilities that a healthy and married man, aged 40, is insured if he is (i) self-employed and (ii) not self-employed. Interpret the difference between (i) and (ii).

d) Imagine that people who are uninsured tend to start self-employment (maybe because they are jobless). Would it then be correct to interpret the marginal effect from part c) as a causal effect?

e) Augment your model by using $age^2$ as an additional regressor. Does this improve your model? Compute the marginal effect of age.

f) Include the additional regressor $male \times married$. For which persons is the variable equal to one? Is the effect of marriage on the probability to be insured higher for men or for women?



1)

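a)

$$\begin{aligned}
\operatorname{cov}(u_i, x_i) &= \operatorname{cov}(u_i,\; \beta_1 + \beta_2 y_i + v_i) \\
&= \operatorname{cov}(u_i,\; \beta_2 y_i) \\
&= \beta_2 \operatorname{cov}(u_i,\; \alpha_1 + \alpha_2 x_i + u_i) \\
&= \beta_2\big(\alpha_2 \operatorname{cov}(u_i, x_i) + Var(u_i)\big) \\
&= \alpha_2\beta_2 \operatorname{cov}(u_i, x_i) + \beta_2\sigma_u^2
\end{aligned}$$

$$\Rightarrow\; \operatorname{cov}(u_i, x_i) - \alpha_2\beta_2 \operatorname{cov}(u_i, x_i) = \beta_2\sigma_u^2$$

$$\Rightarrow\; \operatorname{cov}(u_i, x_i) = \frac{\beta_2\sigma_u^2}{1 - \alpha_2\beta_2}$$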

b) The result from part a) shows that $x_i$ is endogenous, so that the fundamental assumption for the consistency of Ordinary Least Squares (OLS) is not met. Thus, OLS is not an appropriate estimator in this case.
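The covariance between the error term and the regressor can also be checked numerically. The sketch below assumes the two-equation system y = alpha1 + alpha2·x + u and x = beta1 + beta2·y + v with uncorrelated normal errors. The coefficient names and all numerical values are hypothetical and serve only as an illustration.

```python
import numpy as np

# Hypothetical parameter values for the two simultaneous equations.
alpha1, alpha2 = 1.0, 0.5     # y = alpha1 + alpha2 * x + u
beta1, beta2 = 2.0, 0.4       # x = beta1 + beta2 * y + v
sigma_u = 1.5                 # standard deviation of u

rng = np.random.default_rng(2)
n = 500_000
u = rng.normal(0.0, sigma_u, n)
v = rng.normal(0.0, 1.0, n)

# Reduced form for x, obtained by inserting the first equation into the second.
x = (beta1 + beta2 * alpha1 + beta2 * u + v) / (1.0 - alpha2 * beta2)
y = alpha1 + alpha2 * x + u

sample_cov = np.cov(u, x)[0, 1]
theory = beta2 * sigma_u**2 / (1.0 - alpha2 * beta2)

print(f"sample cov(u, x): {sample_cov:.3f}, theoretical value: {theory:.3f}")
```

The sample covariance should be close to the theoretical value and clearly different from zero, which is exactly the endogeneity problem described in part b).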

2)

a) The assumptions mean that the variables $z_2$ and $z_3$ are exogenous for both equations, i.e. we can think of $z_2$ and $z_3$ as being generated from outside the model.

b) No. OLS would be inconsistent because of simultaneous equations bias, which can be derived similarly to exercise 1. We could estimate the first equation consistently if we had a valid instrument for $x$. However, the variables which are relevant for $x$ according to the second equation, $y$ and $z_2$, cannot serve as instruments, since $y$ is, of course, endogenous and $z_2$ is already included in the first equation, so that it does not provide an additional moment condition.

c) Yes. Although OLS is inconsistent for the same reasons as in b), we could use $z_3$ as an instrument for $y$ and then consistently estimate the second equation.

3)

a) The first stage of the 2SLS approach is an OLS regression of all endogenous regressors on all instruments. For instance, if we have a single endogenous regressor, say $x_k$, there is one first-stage regression where $x_k$ is regressed on the other explanatory variables of the original equation ($1, x_2, \dots, x_{k-1}, x_{k+1}, \dots, x_K$) and the excluded instruments. The vector of predicted values from the first-stage regression, $\hat x_k = Z\hat\pi$ (where $Z$ is the matrix containing the exogenous explanatory variables and the excluded instruments and $\hat\pi$ are the OLS estimates from the first-stage regression), is then substituted for $x_k$ in the original equation, and the modified original equation is then estimated by OLS.


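b) From slide 30 in lecture 8:

$$\begin{aligned}
\hat\beta_{IV} &= (\hat X'\hat X)^{-1}\hat X'y \\
&= \big((Z(Z'Z)^{-1}Z'X)'\,Z(Z'Z)^{-1}Z'X\big)^{-1}(Z(Z'Z)^{-1}Z'X)'y \\
&= \big(X'Z\underbrace{(Z'Z)^{-1}Z'Z}_{=I}(Z'Z)^{-1}Z'X\big)^{-1}X'Z\underbrace{(Z'Z)^{-1}Z'Z}_{=I}(Z'Z)^{-1}Z'y \\
&= \big(X'Z(Z'Z)^{-1}Z'X\big)^{-1}X'Z(Z'Z)^{-1}Z'y
\end{aligned}$$

If $R = K$, $Z'X$ (and $X'Z$) is a square matrix, and if we assume the regularity condition that $Z'X$ is invertible (similar to the required existence of $(X'X)^{-1}$ for OLS), we have

$$\begin{aligned}
\hat\beta_{IV} &= (Z'X)^{-1}Z'Z(X'Z)^{-1}X'Z(Z'Z)^{-1}Z'y \\
&= (Z'X)^{-1}Z'Z(Z'Z)^{-1}Z'y \\
&= (Z'X)^{-1}Z'y
\end{aligned}$$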
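The equivalence of the two-stage procedure and the simple IV formula in the exactly identified case (R = K) can be verified numerically. The simulated data and all variable names below are assumptions for illustration. One regressor is endogenous and one instrument is excluded from the equation.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000
const = np.ones(n)
z = rng.normal(size=n)                        # excluded instrument
e = rng.normal(size=n)                        # structural error term
x = 0.7 * z + 0.5 * e + rng.normal(size=n)    # endogenous regressor
y = 2.0 + 1.5 * x + e

X = np.column_stack([const, x])   # regressors, K = 2
Z = np.column_stack([const, z])   # instruments, R = 2

# First stage: regress every column of X on Z and form the fitted values.
Pi_hat = np.linalg.solve(Z.T @ Z, Z.T @ X)
X_hat = Z @ Pi_hat

# Second stage: OLS of y on the fitted values.
b_2sls = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)

# Simple IV estimator, available because Z'X is square and invertible here.
b_iv = np.linalg.solve(Z.T @ X, Z.T @ y)

print("2SLS:     ", b_2sls)
print("simple IV:", b_iv)
```

Both estimators should agree up to numerical precision. Note that the standard errors printed by a naive second-stage OLS regression would not be correct, which is one reason to use a dedicated 2SLS routine in practice.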

4)

a) The coefficient for the demand elasticity is now -0.985085. (Last week we estimated demand elasticities of -0.872271 by IV without $cartel_{t-1}$ and $cartel_{t-2}$, and of -0.63433 by OLS.)

b) In general, additional instruments can help to improve the efficiency of the IV estimator, i.e. they may reduce the variance of the estimator.

c) As expected from part b), the standard error of the demand elasticity is now 0.119144, compared to 0.131355 when $cartel_{t-1}$ and $cartel_{t-2}$ are not included.

d) The risk of using additional instruments is that the exogeneity assumption is not met for the additional instruments. Then, the IV estimator is not consistent. In our case, it is hard to argue that $cartel_{t-1}$ and $cartel_{t-2}$ are not exogenous if we argue that $cartel_t$ is exogenous. Since such reasoning is often sensible, lagged instruments are quite often applied in practice.

e) Save the residuals from your model and regress them by OLS on the full set of instruments, i.e. a constant, $Ice_t$, $cartel_t$, $cartel_{t-1}$ and $cartel_{t-2}$. The $R^2$ from this regression is 0.016274, so that our test statistic is $326 \cdot 0.016274 = 5.305$. The test statistic is $\chi^2$ distributed with $R - K = 2$ degrees of freedom under the null hypothesis. Using GRETL's p-value finder we get a p-value of 0.0704748, so that we do not reject the null hypothesis at a significance level of 5%. Thus, there is no strong evidence against the exogeneity of our additional instruments.
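The arithmetic of the overidentifying restrictions test in part e) can be reproduced in a few lines. The sketch uses the fact that the survival function of a chi-squared distribution with 2 degrees of freedom is exp(-x/2), so no statistics library is needed. The variable names are hypothetical.

```python
import math

n_obs = 326          # number of observations used in the auxiliary regression
r2_aux = 0.016274    # R-squared of the residuals regressed on all instruments
df = 2               # R - K overidentifying restrictions

stat = n_obs * r2_aux

# For df = 2 the chi-squared survival function has the closed form exp(-x / 2).
p_value = math.exp(-stat / 2.0)

print(f"test statistic: {stat:.3f}, p-value: {p_value:.4f}")
```

The p-value differs from the one reported above only in the fifth decimal place, because the solution evaluates the distribution at the rounded statistic 5.305.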

f) Applying Newey-West standard errors increases the standard error of the demand elasticity substantially to 0.212288. This is not surprising given that the Durbin-Watson statistic is 0.461790, so that the first-order autocorrelation is about $1 - 0.461790/2 = 0.769$ (see exercise 2 from week 6 and lecture 6, slide 15), pointing to considerable autocorrelation.


g) Since we have evidence for autocorrelation, we should not use $t$ or $F$ distributions, but we can use the asymptotic result that under the null hypothesis $t = \frac{\hat\beta_2 + 1}{se(\hat\beta_2)} \sim N(0, 1)$ or, equivalently, $t^2 \sim \chi^2_1$ (the latter is a Wald test). Doing this in GRETL (Tests → Linear restrictions → b[2] = -1) gives a p-value of 0.943987, so that the null hypothesis is not rejected.
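The reported p-value can be reproduced by hand from the estimate in part a) and the Newey-West standard error in part f). The following sketch assumes that these are the two numbers entering the test. The variable names are hypothetical.

```python
import math

b2 = -0.985085     # estimated demand elasticity from part a)
se_b2 = 0.212288   # Newey-West standard error from part f)

# Test of H0: coefficient = -1 against a two-sided alternative.
t = (b2 + 1.0) / se_b2

# Two-sided p-value under the standard normal: 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2)).
p_value = math.erfc(abs(t) / math.sqrt(2.0))

print(f"t = {t:.4f}, p-value = {p_value:.6f}")
```

Since the square of a standard normal variable is chi-squared distributed with one degree of freedom, the Wald test based on t squared gives the same p-value.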


1)

OLS minimizes the residual sum of squares and therefore maximizes the $R^2$. Any other estimator, including instrumental variables, results in a lower $R^2$. Note that we are often not interested in obtaining an $R^2$ that is as high as possible, but in obtaining consistent estimates for the coefficients of interest that are as accurate as possible. The $R^2$ does not tell us which estimator is the preferred one. The $R^2$ tells us how well the model fits the data (in a given sample) and typically is only interpreted in this way when the model is estimated by OLS.
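This point can be illustrated on simulated data with an endogenous regressor. The sketch below assumes that the $R^2$ is computed as one minus the residual sum of squares over the total sum of squares, using the original regressors together with the respective coefficient estimates. Software packages may define the $R^2$ of an IV regression differently. All names and values are hypothetical.

```python
import numpy as np

def r_squared(y, fitted):
    """R-squared defined as 1 - RSS / TSS (an assumption of this sketch)."""
    resid = y - fitted
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

rng = np.random.default_rng(4)
n = 2_000
z = rng.normal(size=n)                        # instrument
e = rng.normal(size=n)                        # structural error term
x = 0.7 * z + 0.6 * e + rng.normal(size=n)    # endogenous regressor
y = 1.0 - 1.0 * x + e                         # true slope is -1

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

b_ols = np.linalg.solve(X.T @ X, X.T @ y)     # OLS estimate
b_iv = np.linalg.solve(Z.T @ X, Z.T @ y)      # simple IV estimate

r2_ols = r_squared(y, X @ b_ols)
r2_iv = r_squared(y, X @ b_iv)

print(f"OLS: slope {b_ols[1]:.3f}, R2 {r2_ols:.3f}")
print(f"IV:  slope {b_iv[1]:.3f}, R2 {r2_iv:.3f}")
```

OLS attains the higher $R^2$ by construction, although in this design its slope estimate is inconsistent, whereas the IV estimate should be close to the true value of -1.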




This week a number of topics will be treated in somewhat less detail. First, binary choice models will be studied in more detail, with emphasis on the measurement of the goodness of fit and the interpretation of an underlying latent variable. This interpretation becomes useful for other models introduced this week. Next, the model is extended to multiple possible outcomes, but we restrict attention to ordered probit and logit models, where one needs to assume a natural ordering of the outcomes. The more general case of multinomial models is not treated, but the interested student may read the relevant section in the book. Furthermore, so-called count data models are treated, which allow modeling the number of certain events. Finally, the important topic of censored regression with the tobit model will be treated briefly.

The treatment of these topics is necessarily somewhat superficial. However, it is important to know that they exist and what the underlying principles are. If you are interested in more detail you may consider reading additional sections in Verbeek or look at the treatment of these models in other textbooks (which may be much better and more detailed).

2 Reading

Obligatory reading: Verbeek, 7.1.5-7.1.6, 7.2.1-7.2.4, 7.3 and 7.4.1-7.4.3. The part on the truncated regression on page 234 may be skipped.

Furthermore, you also have to read page 433 in Appendix B.

Additional reading: As usual, the treatment in the other books is different from Verbeek, so if you are interested read:

Wooldridge, the relevant parts of Chapter 17

Greene, the relevant parts of Chapters 16 and 23


3 Exercises

1) (adapted from Greene)

Consider the number of tickets demanded for events in a certain sports arena. Denote this variable by $Y^*$. What is actually observed is the number of tickets sold ($Y$), which is equal to $Y^*$ if the arena is not sold out and equal to 20000, the maximum capacity, otherwise. Suppose that the mean number of tickets sold is 18000 and that the arena is sold out for 25% of all events. Calculate the mean number of tickets demanded under the assumption that $Y^* \sim N(\mu, \sigma^2)$. (Hints: Use that

$$E[Y^*|Y^* < a] = \mu + \sigma\,\frac{-\phi((a - \mu)/\sigma)}{\Phi((a - \mu)/\sigma)}$$

and that $\phi(0.675)/\Phi(0.675) = 0.424$.)

2)

The dataset credit.gdt (see Verbeek, 7.2.3) is a sample of US firms in 2005 containing their Standard & Poor's credit ratings and several explanatory variables. (There seems to be an error in the formulas in Verbeek, p. 217, where the expressions on the right-hand side lack a $1 -$ in front of them.)

a) Use an Ordered Logit model to regress rating on booklev (book leverage) and ebit (earnings before interest and taxes/total assets). Interpret the coefficients.

b) What is the probability that a firm with 50% book leverage and 10% ebit has a rating of 4?

c) What is the probability that a firm with 50% book leverage and 10% ebit has a rating of 4 or more?

d) Estimate a Logit model using the binary dependent variable invgrade (investment grade rating) and the same regressors as before. Compare the results. What is your answer for c) under this model? (A rating of 4 or more is defined to be investment grade.)

3)

Use the dataset patents.gdt (see Verbeek, 7.3.2), which is a sample of 181 international manufacturing firms containing data on the number of patent applications (patents), the expenditures for research and development (R&D) and several other variables.

a) Regress patents on the logarithm of R&D expenditures, the dummy variables for the different industries (the reference category is food, fuel, metal and others) and the dummy variables for the country (the reference category is Europe) by using (i) the Poisson model and (ii) the negative binomial model (NegBin II). Use the option for robust standard errors.

b) Test for overdispersion in the Poisson model.

c) Interpret the coefficient of log(R&D) in the negative binomial model.

d) Interpret the coefficient of the USA dummy in the negative binomial model.


e) Estimate a linear model instead (by OLS and with patents and not log(patents) as the dependent variable) and use heteroscedasticity-robust standard errors. (i) Show that homoscedasticity can be excluded if the Poisson model is correctly specified. (ii) What is the elasticity of the number of patents with respect to R&D expenditures according to the linear model for a US firm in the computers industry with

R&D expenditures o
