BEE2006
UNIVERSITY OF EXETER
BUSINESS SCHOOL
May/June 2012
STATISTICS AND ECONOMETRICS
Module Convenors: Dr. Paulo M.D.C. Parente, Dr. Ana Fernandes
Duration: TWO HOURS
Answer ONLY ONE question from SECTION A, ONLY ONE question from SECTION B and BOTH questions from SECTION C.
Use a separate answer booklet for each section.
Materials to be supplied: Statistical Tables
Instructions (please read before starting): Write in a clear legible manner in ink/ballpoint. Do not use pencils or erasable pens. Approved calculators are permitted. Only one sheet (2 sides A4) of notes made exclusively by the student may be consulted (no material distributed by the teacher in any form is allowed). Whenever conducting a test use a 5% significance level unless stated otherwise. Also be sure to state null and alternative hypotheses, null distribution (with degrees of freedom), rejection criterion (critical values and rejection region) and outcome. If you are asked to derive something, give all intermediate steps also. Do not answer questions with a “yes” or “no” only, but carefully justify your answer.
Section A - Answer only one question
Question 1
Consider the following model to explain child birth weight in terms of various factors:

$$\text{bwght} = \beta_0 + \beta_1\,\text{cigs} + \beta_2\,\text{parity} + \beta_3\,\text{faminc} + \beta_4\,\text{motheduc} + \beta_5\,\text{fatheduc} + u,$$

where $u \sim N(0, \sigma^2)$ and the variables in the model are:
bwght = birth weight in pounds;
cigs = average number of cigarettes the mother smoked per day during pregnancy;
parity = the birth order of the child;
faminc = annual family income;
motheduc = years of schooling for the mother;
fatheduc = years of schooling for the father.
(a) (6 Marks) Does this regression model necessarily imply a causal relationship between child’s birth weight and the regressors cigs, parity, faminc, motheduc and fatheduc? Justify your answer.
(b) (5 Marks) Interpret $\beta_3$.
(c) (6 Marks) Using data from the US 1988 National Health Interview Survey the following results were obtained:
$$\widehat{\text{bwght}} = \underset{(3.7285)}{114.524} - \underset{(0.1104)}{0.5959}\,\text{cigs} + \underset{(0.6594)}{1.7876}\,\text{parity} + \underset{(0.0366)}{0.0560}\,\text{faminc} - \underset{(0.3199)}{0.3705}\,\text{motheduc} + \underset{(0.2826)}{0.4724}\,\text{fatheduc}, \qquad (1)$$

$$n = 1191, \quad TSS = 482722.355, \quad SSR = 464040.052,$$

where TSS is the Total Sum of Squares, SSR is the Sum of Squared Residuals, and the standard errors of the estimated coefficients are reported in brackets. Test the individual significance of motheduc and fatheduc at the 10% level.
(d) (6 Marks) Test the significance of the overall regression.
(e) (6 Marks) Let $\hat{u}$ denote the residual from regressing bwght on cigs, parity and faminc, and consider the following regression:

$$\hat{u} = -\underset{(3.7285)}{0.9456} - \underset{(0.1104)}{0.0019}\,\text{cigs} - \underset{(0.6594)}{0.0447}\,\text{parity} - \underset{(0.0366)}{0.011}\,\text{faminc} - \underset{(0.3199)}{0.3705}\,\text{motheduc} + \underset{(0.2826)}{0.4724}\,\text{fatheduc},$$

$$R^2 = 0.0024.$$

Are motheduc and fatheduc jointly significant?
(f) (6 Marks) The $R^2$ of the regression of the squared residuals of (1) on cigs, parity, faminc, motheduc and fatheduc and their respective squares is 0.0029. Test for heteroskedasticity.
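
The arithmetic behind parts (d) to (f) of this question can be sketched as follows. This is a minimal illustration rather than a model answer: it assumes the usual $R^2$-form F statistic for overall significance and the LM form $n \cdot R^2$ for the auxiliary regressions in (e) and (f). The helper names `overall_f` and `lm_stat` are made up for this sketch, and the critical values still have to be read from the statistical tables.

```python
# Sketch only: helper names are hypothetical; the numbers are those quoted in the question.

def overall_f(tss, ssr, n, k):
    """F statistic for H0: all k slope coefficients are zero."""
    r2 = 1.0 - ssr / tss                      # R-squared implied by TSS and SSR
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


def lm_stat(n, r2_aux):
    """LM statistic n * R-squared from an auxiliary regression."""
    return n * r2_aux


n = 1191

# (d) overall significance: compare with the F(5, 1185) critical value
f_overall = overall_f(tss=482722.355, ssr=464040.052, n=n, k=5)

# (e) joint significance of motheduc and fatheduc: compare with chi-squared(2)
lm_educ = lm_stat(n, 0.0024)

# (f) heteroskedasticity, five regressors plus their squares: compare with chi-squared(10)
lm_het = lm_stat(n, 0.0029)

print(f"F = {f_overall:.3f}, LM (e) = {lm_educ:.3f}, LM (f) = {lm_het:.3f}")
```

The degrees of freedom in the comments follow from the number of restrictions in each null hypothesis: five slopes in (d), two excluded regressors in (e), and ten regressors in the auxiliary regression of (f).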
Question 2
We are interested in investigating how the price of a house depends on the characteristics of the house in Boston, US. We consider the model

$$\log(\text{price}) = \beta_0 + \beta_1\,\text{sqrft} + \beta_2\,\text{bdrms} + u,$$

where $u \sim N(0, \sigma^2)$ and the variables in the model are:
price = house price, in thousands of dollars;
sqrft = size of house in square feet;
bdrms = number of bedrooms.
(a) (5 Marks) Interpret $\beta_2$.
(b) (6 Marks) Using data collected from the Boston Globe during 1990 the following results were obtained:

$$\widehat{\log(\text{price})} = \underset{(0.09704)}{4.76603} + \underset{(0.000040)}{0.00038}\,\text{sqrft} + \underset{(0.02964)}{0.02888}\,\text{bdrms},$$

$$n = 88, \quad R^2 = 0.5883.$$

(Standard errors of estimated coefficients are reported in brackets.) Test whether the size of house in square feet has a significant positive effect on log(price).
(c) (6 Marks) Test the overall significance of the regression.
(d) (6 Marks) We are interested in estimating and obtaining a confidence interval for the percentage change in price when a 150-square-foot bedroom is added to a house. In decimal form, this is $\theta_1 = 150\beta_1 + \beta_2$. Estimate and construct a 95% confidence interval for $\theta_1$ given that the estimated covariance between the OLS estimators for $\beta_1$ and $\beta_2$ is −0.000000681.
(e) (6 Marks) We now include the square of bdrms in the regression model:

$$\widehat{\log(\text{price})} = \underset{(0.27108)}{5.07139} + \underset{(0.000040)}{0.00038}\,\text{sqrft} - \underset{(0.13573)}{0.13086}\,\text{bdrms} + \underset{(0.01657)}{0.01999}\,\text{bdrms}^2, \qquad (2)$$

$$n = 88, \quad R^2 = 0.5883, \quad SSR = 3.2434.$$

Test whether the number of bedrooms affects the price of the house, taking into account that the $R^2$ of the restricted model is 0.568.
(f) (6 Marks) Now we are interested in studying whether the regression model differs between colonial houses and non-colonial houses. The regression for non-colonial houses yields

$$\widehat{\log(\text{price})} = \underset{(0.63578)}{6.12642} + \underset{(0.00008)}{0.00033}\,\text{sqrft} - \underset{(0.37576)}{0.76368}\,\text{bdrms} + \underset{(0.05902)}{0.11269}\,\text{bdrms}^2,$$

$$n = 27, \quad R^2 = 0.6366, \quad SSR = 0.94035.$$

Running a regression for colonial houses we obtain

$$\widehat{\log(\text{price})} = \underset{(0.39637)}{4.7786} + \underset{(0.00005)}{0.0004}\,\text{sqrft} + \underset{(0.18493)}{0.01041}\,\text{bdrms} + \underset{(0.02126)}{0.00229}\,\text{bdrms}^2,$$

$$n = 61, \quad R^2 = 0.6090, \quad SSR = 2.021.$$

Test whether the regression function is identical for colonial and non-colonial houses.
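
The calculations needed in parts (d) to (f) of this question can be sketched as below. This is a minimal illustration under stated assumptions, not a model answer: the helper names are invented for the sketch, the value 1.988 for the two-sided 5% critical value of a t distribution with 85 degrees of freedom is an assumed table value, and the Chow-type statistic in (f) assumes that the pooled SSR is the one reported with equation (2).

```python
import math

# Sketch only: helper names are hypothetical; the numbers are those quoted in the question.

def linear_combo_ci(b1, b2, se1, se2, cov12, weight, t_crit):
    """Point estimate, standard error and CI for theta = weight * b1 + b2."""
    theta = weight * b1 + b2
    # Var(w*b1 + b2) = w^2 Var(b1) + Var(b2) + 2 w Cov(b1, b2)
    var = weight**2 * se1**2 + se2**2 + 2 * weight * cov12
    se = math.sqrt(var)
    return theta, se, (theta - t_crit * se, theta + t_crit * se)


def f_from_r2(r2_ur, r2_r, q, df_ur):
    """F statistic for q exclusion restrictions, R-squared form."""
    return ((r2_ur - r2_r) / q) / ((1.0 - r2_ur) / df_ur)


def chow_f(ssr_pooled, ssr_1, ssr_2, n, k_params):
    """Chow statistic: pooled regression against two separate regressions."""
    ssr_ur = ssr_1 + ssr_2
    return ((ssr_pooled - ssr_ur) / k_params) / (ssr_ur / (n - 2 * k_params))


# (d) theta1 = 150*beta1 + beta2; T_CRIT is an assumed table value for t with 85 df
T_CRIT = 1.988
theta, se_theta, ci = linear_combo_ci(
    b1=0.00038, b2=0.02888, se1=0.000040, se2=0.02964,
    cov12=-0.000000681, weight=150, t_crit=T_CRIT,
)

# (e) joint test on bdrms and bdrms^2: compare with F(2, 84)
f_bdrms = f_from_r2(r2_ur=0.5883, r2_r=0.568, q=2, df_ur=88 - 4)

# (f) colonial versus non-colonial: compare with F(4, 80)
f_chow = chow_f(ssr_pooled=3.2434, ssr_1=0.94035, ssr_2=2.021, n=88, k_params=4)

print(f"theta1 = {theta:.5f}, se = {se_theta:.5f}, CI = ({ci[0]:.4f}, {ci[1]:.4f})")
print(f"F (e) = {f_bdrms:.3f}, F (f) = {f_chow:.3f}")
```

The covariance term matters in (d): because the estimated covariance is negative, ignoring it would overstate the standard error of the linear combination.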
Section B - Answer only one question
Question 1
(a) To study the effect of women’s education (schooling) on fertility we estimate model (3) below, where the dependent variable, kids, is the number of children born to women aged 35 to 54 and educ denotes years of schooling. We also include as regressors age and its square, agesq; a binary variable, black, that takes the value of one if the individual is black and zero otherwise; a binary variable, othrural, that takes the value of one if the individual lived in a rural area at the age of 16 and zero otherwise; and a binary variable, smcity, that takes the value of one if the individual lived in a small city at the age of 16 and zero otherwise.
One could argue that education, educ, is not an exogenous determinant of fertility. Women’s education could be correlated with unobservable characteristics that are jointly determined with fertility. We have two candidate instrumental variables for education: the individual’s father’s years of education, feduc, and the individual’s mother’s years of education, meduc. We estimate a number of models, reported below, using OLS and Two Stage Least Squares (2SLS).
Model 1: OLS, using observations 1-1129
Dependent variable: kids

Sargan over-identification test
Null hypothesis: all instruments are valid
Test statistic for over-identification: LM = 0.0582575 with p-value = 0.809272
(i) (5 Marks) Specify the equation for educ and explain why the parameters of that equation can be estimated by OLS.
(ii) (6 Marks) Use the relevant output from above to test for instrumental variable relevance and assess whether meduc and feduc are suitable instruments for educ.
(iii) (6 Marks) What do you conclude from the result of Sargan’s over-identification test (provided at the end of the output for Model 4)?
(iv) (6 Marks) Using the relevant output from above, conduct Hausman’s endogeneity test. Provide the null hypothesis, the alternative hypothesis and the numerical value of the test statistic. What do you conclude regarding the endogeneity of educ?
(v) (6 Marks) Since there is no heteroskedasticity, the usual standard errors are reported in all estimated models. Taking this into consideration, and given your decision regarding Hausman’s endogeneity test, which is your preferred estimate of the parameter $\beta_1$? Why?
(b) (6 Marks) Consider a simple model to estimate the effect of computer ownership on the average mark of graduating students at a large UK university:

$$\text{MARK} = \beta_0 + \beta_1\,\text{PC} + u.$$

Is it reasonable to assume that PC ownership is likely to be uncorrelated with $u$? Explain.
Question 2
(a) (5 Marks) Consider the multiple regression model:

$$y_t = \beta_0 + \beta_1 x_{t1} + \ldots + \beta_k x_{tk} + u_t.$$

Assume that the explanatory variables, $x_{tj}$, are strictly exogenous. Further, $u_t$ follows an AR(q) process:

$$u_t = \rho_1 u_{t-1} + \rho_2 u_{t-2} + \ldots + \rho_q u_{t-q} + e_t.$$

Explain how you would test for serial correlation.
(b) (6 Marks) Specify and explain the meaning of the contemporaneous exogeneity assumption for explanatory variables in time series analysis.
(c) Consider the following partial adjustment model:

$$y^*_t = \gamma_0 + \gamma_1 x_t + e_t,$$

$$y_t - y_{t-1} = \lambda (y^*_t - y_{t-1}) + a_t, \quad 0 < \lambda < 1,$$

where $y^*_t$ is the desired growth in firm inventories and $y_t$ is the actual (observed) growth. $x_t$ represents the growth in firm sales. The parameter $\gamma_1$ measures the effect of $x_t$ on $y^*_t$.
(i) (6 Marks) Explain what the second equation describes and how you would interpret the parameter $\lambda$.
(ii) (6 Marks) Show that we can write:

$$y_t = \beta_0 + \beta_1 y_{t-1} + \beta_2 x_t + u_t.$$

In particular, provide the expressions for the $\beta$’s in terms of the $\gamma$’s and $\lambda$, and find $u_t$ in terms of $e_t$ and $a_t$.
(iii) (6 Marks) If $E(e_t \mid x_t, y_{t-1}, x_{t-1}, \ldots) = 0$ and $E(a_t \mid x_t, y_{t-1}, x_{t-1}, \ldots) = 0$, and all series are weakly dependent, how would you estimate the $\beta$’s in the model of part (ii) above? Explain.
(iv) (6 Marks) If $\beta_1 = 0.7$ and $\beta_2 = 0.2$, what are the estimates of $\gamma_1$ and $\lambda$?
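
The mapping used in part (iv) can be sketched as follows. Substituting the first equation of the partial adjustment model into the second gives $y_t = \lambda\gamma_0 + (1 - \lambda) y_{t-1} + \lambda\gamma_1 x_t + (a_t + \lambda e_t)$, so that $\beta_1 = 1 - \lambda$ and $\beta_2 = \lambda\gamma_1$. The short function below inverts these two relations; its name is hypothetical and it is meant only as an illustration of the arithmetic.

```python
# Sketch only: the function name is hypothetical.

def partial_adjustment_params(beta1, beta2):
    """Recover (lambda, gamma1) from beta1 = 1 - lambda and beta2 = lambda * gamma1."""
    lam = 1.0 - beta1
    if not 0.0 < lam < 1.0:
        # the model restricts the adjustment speed to 0 < lambda < 1
        raise ValueError("lambda must lie strictly between 0 and 1")
    gamma1 = beta2 / lam
    return lam, gamma1


lam, gamma1 = partial_adjustment_params(beta1=0.7, beta2=0.2)
print(f"lambda = {lam:.2f}, gamma1 = {gamma1:.4f}")
```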
Section C - Answer both questions
Question 1
Are the following statements correct? (Justify your answers carefully.)
(a) (5 Marks) From asymptotic theory we learn that, under appropriate conditions, the error terms in a regression model will be approximately normally distributed if the sample size is sufficiently large.
(b) (5 Marks) In a random sample under the assumption of homoskedasticity the generalised least squares estimator and the ordinary least squares estimator are identical.
(c) (5 Marks) Suppose that we want to estimate the effect of several variables on annual saving and that we have a panel data set on individuals collected on January 20, 2000, and January 20, 2002. If we include a year dummy for 2002 and use first differencing, we can also include age in the original model.
(d) (5 Marks) We can use first differences when we have independent cross sections in two years.
Question 2 (10 Marks)
Consider the linear regression model

$$y_i = \beta + u_i, \quad i = 1, \ldots, n,$$

$$E(u_i \mid x_i) = 0, \quad \operatorname{var}(u_i \mid x_i) = \sigma^2,$$

where the observations $\{(y_i, x_i),\ i = 1, \ldots, n\}$ are independent. Let