Economics 105: Statistics
• RAP is due via email at 5:15 on the last day of exams. Please save it as a PDF file first, and email the Excel file separately.
Jan 02, 2016
Violations of GM Assumptions
• Assumption: homoskedastic errors
– Violation: heteroskedastic errors
• Assumption: no serial correlation of errors
– Violation: there exists serial correlation in errors
• Assumptions: model is linear in parameters, the betas (4); i.i.d. sample of data (5)
– Violations: wrong functional form; omit relevant variable (include irrelevant variable); errors in variables; sample selection bias; simultaneity bias
I know! We can save the model,
but not until Eco205.
Holy endogeneity,
Batman!
Nature of Serial Correlation
• Violation of (3)
• $\varepsilon_t = \rho\,\varepsilon_{t-1} + u_t$
• Error in period t is a function of error in prior period alone: first-order autocorrelation, denoted AR(1) for “autoregressive” process
• Usual assumptions apply to new error term $u_t$
• $\rho > 0$ is positive serial correlation
• $\rho < 0$ is negative serial correlation
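To make the AR(1) idea concrete, here is a minimal simulation sketch. It assumes the process is written as e_t = rho * e_{t-1} + u_t with normally distributed u_t; the names `simulate_ar1`, `lag1_corr`, `rho`, and `sigma_u` are illustrative choices, not notation from the course.

```python
import numpy as np

def simulate_ar1(rho, n, sigma_u=1.0, seed=0):
    """Simulate e_t = rho * e_{t-1} + u_t, with u_t ~ N(0, sigma_u^2) (assumed)."""
    rng = np.random.default_rng(seed)
    u = rng.normal(0.0, sigma_u, size=n)
    e = np.empty(n)
    e[0] = u[0]  # simple starting value; no burn-in period
    for t in range(1, n):
        e[t] = rho * e[t - 1] + u[t]
    return e

def lag1_corr(e):
    """Sample correlation between e_t and e_{t-1}."""
    return np.corrcoef(e[1:], e[:-1])[0, 1]

# Positive rho, no serial correlation, negative rho
for rho in (0.8, 0.0, -0.8):
    e = simulate_ar1(rho, n=5000)
    print(f"rho = {rho:+.1f}  lag-1 correlation = {lag1_corr(e):+.2f}")
```

With a long series, the lag-1 correlation of the simulated errors should land close to the chosen rho, with the sign telling positive from negative serial correlation.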
Nature of Serial Correlation
• Error in period t can be a function of error in more than one prior period
• Second-order serial correlation: $\varepsilon_t = \rho_1\,\varepsilon_{t-1} + \rho_2\,\varepsilon_{t-2} + u_t$
• Higher orders generated analogously
• Seasonally-based serial correlation
Causes of Serial Correlation
• The error term in the regression captures
– Measurement error
– Omitted variables that are uncorrelated with the included explanatory variables (hopefully)
• Frequently, factors omitted from the model are correlated over time
1. Persistence of shocks
– Effects of random shocks (e.g., earthquake, war, labor strike) often carry over through more than one time period
2. Inertia
– Time series for GNP, (un)employment, output, prices, interest rates, etc. follow cycles, so that successive observations are related
Causes of Serial Correlation
3. Lags
– Past actions have a strong effect on current ones
– Consumption last period predicts consumption this period
4. Misspecified model, incorrect functional form
5. Spatial serial correlation
– In cross-sectional data on regions, a random shock in one region can cause the outcome of interest to change in adjacent regions
– “Keeping up with the Joneses”
Consequences for OLS Estimates
• Using an OLS estimator when the errors are autocorrelated still results in unbiased estimators
• However, the standard errors are estimated incorrectly
– Whether the standard errors are overstated or understated depends on the nature of the autocorrelation
– For positive AR(1), standard errors are too small!
– Any hypothesis tests conducted could yield erroneous results
– For positive AR(1), we may conclude estimated coefficients ARE significantly different from 0 when we shouldn’t!
• OLS is no longer BLUE
– A pattern exists in the errors
– This suggests that an estimator that exploited the pattern would be more efficient
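The two headline claims (OLS stays unbiased, but its reported standard errors are too small under positive AR(1)) can be checked with a small Monte Carlo sketch. Everything in the setup is an assumption made for illustration: a true slope of 2.0, rho = 0.8, n = 100, and a regressor that is itself positively autocorrelated, which is the typical time-series case where the understatement shows up clearly.

```python
import numpy as np

def ar1(rho, n, rng):
    """Draw one AR(1) series: z_t = rho * z_{t-1} + shock_t."""
    shocks = rng.normal(size=n)
    z = np.empty(n)
    z[0] = shocks[0]
    for t in range(1, n):
        z[t] = rho * z[t - 1] + shocks[t]
    return z

def ols_slope_and_se(x, y):
    """Slope and conventional (i.i.d.-error) standard error from simple OLS."""
    n = len(x)
    xd = x - x.mean()
    yd = y - y.mean()
    slope = (xd @ yd) / (xd @ xd)
    resid = yd - slope * xd                 # residuals from the fit with intercept
    s2 = (resid @ resid) / (n - 2)          # usual estimate of the error variance
    return slope, np.sqrt(s2 / (xd @ xd))

def monte_carlo(rho=0.8, n=100, reps=1000, beta=2.0, seed=0):
    rng = np.random.default_rng(seed)
    slopes, ses = [], []
    for _ in range(reps):
        x = ar1(rho, n, rng)                # autocorrelated regressor (assumed)
        e = ar1(rho, n, rng)                # positively autocorrelated error
        b, se = ols_slope_and_se(x, 1.0 + beta * x + e)
        slopes.append(b)
        ses.append(se)
    # average estimate, actual spread of the estimates, average reported s.e.
    return np.mean(slopes), np.std(slopes, ddof=1), np.mean(ses)

mean_b, true_sd, reported_se = monte_carlo()
print(f"mean slope estimate       : {mean_b:.3f} (true value 2.0)")
print(f"actual sd of the slope    : {true_sd:.3f}")
print(f"average OLS-reported s.e. : {reported_se:.3f}")
```

The average slope should sit near the true value, while the spread of the estimates across replications should be noticeably larger than the standard error OLS reports, which is why t-tests reject too often.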
Detection of Serial Correlation
• Graphical
• [Residual plot] No obvious pattern: the errors seem random.
• Sometimes, however, the errors follow a pattern: they are correlated across observations, so the observations are not independent of one another.
• [Residual plot] Here the residuals do not seem random, but rather seem to follow a pattern.
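Detection: The Durbin-Watson Test
• Provides a way to test H0: $\rho = 0$, where $\rho$ is the coefficient on the lagged error in the AR(1) process
• It is a test for the presence of first-order serial correlation
• The alternative hypothesis can be
– $\rho \neq 0$
– $\rho > 0$: positive serial correlation (the most likely alternative in economics)
– $\rho < 0$: negative serial correlation
• DW test statistic is d: $d = \frac{\sum_{t=2}^{n}(e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$, where $e_t$ is the OLS residual in period t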
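As a sketch of how d is computed from a set of residuals, the function below takes the sum of squared successive differences and divides by the sum of squared residuals. The function name and the toy residual series are illustrative assumptions.

```python
import numpy as np

def durbin_watson(resid):
    """d = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2."""
    resid = np.asarray(resid, dtype=float)
    diff = np.diff(resid)                   # e_t - e_{t-1} for t = 2..n
    return float((diff @ diff) / (resid @ resid))

# Toy residuals (made up): neighbours with the same sign pull d below 2,
# neighbours with opposite signs push d above 2.
print(durbin_watson([1, 1, -1, -1]))   # 1.0
print(durbin_watson([1, -1, 1, -1]))   # 3.0
```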
Detection: The Durbin-Watson Test
• To test for positive serial correlation with the Durbin-Watson statistic, under the null we expect d to be near 2
– The smaller d, the more likely the alternative hypothesis
• The sampling distribution of d depends on the values of the explanatory variables. Since every problem has a different set of explanatory variables, Durbin and Watson derived upper and lower limits for the critical value of the test.
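Why d should be near 2 under the null follows from expanding the numerator of the statistic. The short derivation below is a sketch; $e_t$ denotes the OLS residuals and $\hat\rho$ their sample first-order autocorrelation, both notational assumptions.

```latex
\begin{aligned}
d &= \frac{\sum_{t=2}^{n}(e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}
   = \frac{\sum_{t=2}^{n} e_t^2 + \sum_{t=2}^{n} e_{t-1}^2
           - 2\sum_{t=2}^{n} e_t e_{t-1}}{\sum_{t=1}^{n} e_t^2} \\
  &\approx 2\,(1 - \hat\rho),
  \qquad
  \hat\rho = \frac{\sum_{t=2}^{n} e_t e_{t-1}}{\sum_{t=1}^{n} e_t^2}.
\end{aligned}
```

So $\hat\rho \approx 0$ gives d near 2, strong positive serial correlation pushes d toward 0, and strong negative serial correlation pushes d toward 4.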
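Detection: The Durbin-Watson Test
• Durbin and Watson derived lower and upper limits $d_L$ and $d_U$ such that $d_L \le d^* \le d_U$, where $d^*$ is the exact critical value
• They developed the following decision rule (for positive serial correlation)
– Reject H0 if $d < d_L$
– Do not reject H0 if $d > d_U$
– The test is inconclusive if $d_L \le d \le d_U$
Detection: The Durbin-Watson Test
• To test for negative serial correlation the decision rule is
– Reject H0 if $d > 4 - d_L$
– Do not reject H0 if $d < 4 - d_U$
– The test is inconclusive if $4 - d_U \le d \le 4 - d_L$
• Can use a two-tailed test if there is no strong prior belief about whether there is positive or negative serial correlation. The decision rule is
– Reject H0 if $d < d_L$ or $d > 4 - d_L$
– Do not reject H0 if $d_U < d < 4 - d_U$
– The test is inconclusive otherwise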
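The three decision rules can be collected in one small helper. This is a sketch of the standard Durbin-Watson bounds test; the function name, argument names, and return strings are illustrative assumptions, and the bounds must still be looked up in a table for the chosen significance level.

```python
def dw_decision(d, d_lower, d_upper, alternative="positive"):
    """Apply the Durbin-Watson bounds test for H0: no first-order serial correlation.

    alternative: "positive", "negative", or "two-sided" (labels are assumptions).
    """
    if not 0.0 <= d <= 4.0:
        raise ValueError("the Durbin-Watson statistic lies between 0 and 4")
    if alternative == "positive":
        if d < d_lower:
            return "reject H0"
        if d > d_upper:
            return "do not reject H0"
        return "inconclusive"
    if alternative == "negative":
        if d > 4 - d_lower:
            return "reject H0"
        if d < 4 - d_upper:
            return "do not reject H0"
        return "inconclusive"
    if alternative == "two-sided":
        # Using one-tailed bounds on both sides doubles the significance level.
        if d < d_lower or d > 4 - d_lower:
            return "reject H0"
        if d_upper < d < 4 - d_upper:
            return "do not reject H0"
        return "inconclusive"
    raise ValueError("alternative must be 'positive', 'negative', or 'two-sided'")

# Illustrative bounds of 1.55 and 1.62
print(dw_decision(0.19, 1.55, 1.62))                # reject H0
print(dw_decision(1.58, 1.55, 1.62))                # inconclusive
print(dw_decision(2.00, 1.55, 1.62, "two-sided"))   # do not reject H0
print(dw_decision(2.50, 1.55, 1.62, "negative"))    # reject H0
```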
Serial Correlation
• Table of critical values for Durbin-Watson statistic (table E11, page 833 in BLK textbook)
• http://hadm.sph.sc.edu/courses/J716/Dw.html
Serial Correlation Example
• What is the effect of the price of oil on the number of wells drilled in the U.S.?

Year  Total Wells Drilled  Real price per bbl  Average price per bbl  Producer Price Index
1930  21232                7.986577            1.19                   14.9
1931  12432                5.15873             0.65                   12.6
1932  15040                7.767857            0.87                   11.2
1933  12312                5.877193            0.67                   11.4
1934  18917                7.751938            1                      12.9
1935  21420                7.028986            0.97                   13.8
…
1987  35194                14.98054            15.4                   102.8
1988  32479                11.76801            12.58                  106.9
1989  27824                14.13547            15.86                  112.2
1990  27941                17.2227             20.03                  116.3
1991  29960                14.16309            16.5                   116.5
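The real price column appears to be the average (nominal) price deflated by the Producer Price Index and scaled by 100. That relationship is inferred from the rows shown rather than stated on the slide, so the sketch below simply checks it on three of the listed years; `real_price` is a hypothetical helper name.

```python
def real_price(nominal_price, ppi):
    """Deflate a nominal price by a price index (assumed formula: 100 * P / PPI)."""
    return 100.0 * nominal_price / ppi

# (year, average price per bbl, Producer Price Index, real price per bbl as listed)
rows = [
    (1930, 1.19, 14.9, 7.986577),
    (1931, 0.65, 12.6, 5.15873),
    (1990, 20.03, 116.3, 17.2227),
]

for year, nominal, ppi, listed in rows:
    computed = real_price(nominal, ppi)
    print(f"{year}: computed {computed:.5f}, listed {listed}")
```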
Serial Correlation Example
• Analyze residual plots … but be careful …
Serial Correlation Example
• Remember what serial correlation is …
• This plot only “works” if the observation number is in the same order as the unit of time
Serial Correlation Example
• Same graph when plotted versus “year”
• Graphical evidence of serial correlation
Serial Correlation Example
• Calculate DW test statistic
• Compare to critical value at chosen significance level
– Lower bound $d_L$ or upper bound $d_U$ for 1 X-variable and n = 62 is not in the table
– $d_L$ for 1 X-variable and n = 60 is 1.55, $d_U$ = 1.62
• Since 0.192 < 1.55, reject H0: $\rho = 0$ in favor of H1: $\rho > 0$ at α = 5%
Observation  Predicted Total Wells Drilled  Residuals     e(t-1)    e(t) - e(t-1)  (e(t)-e(t-1))^2  e(t)^2      Year
1            31744.01844                    -10512.01844                                            110502532   1930
2            24780.30007                    -12348.30007  -10512    -1836.28       3371930.199      152480515   1931
3            31205.40913                    -16165.40913  -12348.3  -3817.11       14570321.58      261320452   1932
4            26549.55163                    -14237.55163  -16165.4  1927.857       3716634.527      202707876   1933
5            31166.20738                    -12249.20738  -14237.6  1988.344       3953512.848      150043081   1934
6            29385.89982                    -7965.899815  -12249.2  4283.308       18346723.71      63455559.9  1935
…
61           54488.44454                    -26547.44454  -19062    -7485.46       56032054.78      704766811   1990
62           46953.99846                    -16993.99846  -26547.4  9553.446       91268331.83      288795984   1991
SUM                                                                                1257013355       6517936259
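The DW statistic for this example is the ratio of the two column sums in the table. The sketch below reproduces that arithmetic and spot-checks the squared differences using the first three residuals; the variable names are illustrative, and the bounds 1.55 and 1.62 are the n = 60 values quoted in the example.

```python
# Column sums from the residual table
sum_sq_diff = 1257013355     # SUM of (e(t) - e(t-1))^2
sum_sq_resid = 6517936259    # SUM of e(t)^2

d = sum_sq_diff / sum_sq_resid
print(f"d = {d:.2f}")        # roughly 0.19, far below 2

# Spot-check the squared differences with the first three residuals
e = [-10512.01844, -12348.30007, -16165.40913]
sq_diffs = [(e[t] - e[t - 1]) ** 2 for t in range(1, len(e))]
print([round(v, 1) for v in sq_diffs])   # close to 3371930.2 and 14570321.6

# Bounds quoted for 1 X-variable and n = 60 at the 5% level
d_lower, d_upper = 1.55, 1.62
if d < d_lower:
    print("reject H0 of no serial correlation in favor of positive serial correlation")
```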
Do’s and Don’ts
• Do interpret coefficients carefully by keeping in mind the units of X and of Y
• Do discuss separately – and not conflate – statistical significance and economic magnitude, i.e., the size of the estimated effect (of X on Y)
• Do not say one variable is “more significant” or “more important” than another because it has a smaller p-value
• p-values are measures of evidence (against H0)
• p-values do not give us info about the magnitude of the effect (i.e., the “effect size”)
Do’s and Don’ts
• Do not say one variable is “more significant” or “more important” than another because its estimated coefficient is twice as big as the other’s
• Remember the ceteris paribus interpretation
• Don’t compare the magnitudes of coefficients unless they are measured in the same units
• Do not assume that two estimated coefficients are different from one another if one is statistically significant and the other isn’t
• Gelman & Stern (2006), “The Difference Between ‘Significant’ and ‘Not Significant’ is not Itself Statistically Significant,” American Statistician, vol. 60, no. 4
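A small numerical sketch of the last point, using hypothetical estimates: one coefficient of 25 and another of 10, each with a standard error of 10. The sketch assumes the two estimates are independent and approximately normal; `two_sided_p` is an illustrative helper.

```python
from math import erf, sqrt

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    cdf = 0.5 * (1.0 + erf(abs(z) / sqrt(2.0)))
    return 2.0 * (1.0 - cdf)

# Hypothetical estimates and standard errors
b1, se1 = 25.0, 10.0
b2, se2 = 10.0, 10.0

z1 = b1 / se1                         # significant at 5%
z2 = b2 / se2                         # not significant at 5%

# Test of H0: the two coefficients are equal (independence assumed)
se_diff = sqrt(se1 ** 2 + se2 ** 2)
z_diff = (b1 - b2) / se_diff

print(f"coefficient 1: z = {z1:.2f}, p = {two_sided_p(z1):.3f}")
print(f"coefficient 2: z = {z2:.2f}, p = {two_sided_p(z2):.3f}")
print(f"difference   : z = {z_diff:.2f}, p = {two_sided_p(z_diff):.3f}")
```

The first estimate clears the 5% threshold and the second does not, yet the test of their difference is nowhere near significant, so the data do not show that the two effects differ.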