Research Method, Lecture 7 (Ch14): Pooled Cross Sections and Simple Panel Data Methods

Mar 31, 2015


Danny Capes

An independently pooled cross section

This type of data is obtained by sampling randomly from a population at different points in time (usually in different years).

You can pool the data from different years and run regressions.

However, you usually include year dummies.

Panel data

This is cross-sectional data collected at different points in time.

However, these data follow the same individuals over time.

With panel data you can do a bit more than with a pooled cross section.

You usually include year dummies as well.

Pooling independent cross sections across time

As long as the data are collected independently, pooling them over time causes little problem.

However, the distribution of the independent variables may change over time. For example, the distribution of education changes over time.

To account for such changes, you usually need to include a dummy variable for each year (year dummies), except for one year, which serves as the base year.

Often the coefficients on the year dummies are of interest.

Example 1

Suppose you would like to see how the fertility rate changes over time after controlling for various characteristics.

The next slide shows the OLS estimates of the determinants of fertility over time. (Data: FERTIL1.dta)

The data are collected every other year. The base year for the year dummies is 1972.

Dependent variable = # kids per woman

. reg kids educ age agesq black east northcen west farm othrural town smcity y74 y76 y80 y82 y84

      Source |       SS       df       MS              Number of obs =    1129
-------------+------------------------------           F( 16,  1112) =   10.33
       Model |  399.265559    16  24.9540975           Prob > F      =  0.0000
    Residual |  2686.24374  1112  2.41568682           R-squared     =  0.1294
-------------+------------------------------           Adj R-squared =  0.1169
       Total |   3085.5093  1128  2.73538059           Root MSE      =  1.5542

------------------------------------------------------------------------------
        kids |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |  -.1287556   .0183209    -7.03   0.000     -.164703   -.0928081
         age |    .535383   .1380659     3.88   0.000      .264484    .8062821
       agesq |  -.0058384    .001561    -3.74   0.000    -.0089013   -.0027756
       black |   1.077747   .1733806     6.22   0.000     .7375571    1.417937
        east |   .2180929   .1327211     1.64   0.101     -.042319    .4785049
    northcen |   .3616071   .1207846     2.99   0.003     .1246157    .5985984
        west |   .1989796   .1668093     1.19   0.233    -.1283168    .5262761
        farm |  -.0553556    .146947    -0.38   0.706    -.3436803    .2329692
    othrural |  -.1662171   .1751486    -0.95   0.343    -.5098761     .177442
        town |   .0825938    .124396     0.66   0.507    -.1614836    .3266712
      smcity |   .2092197   .1600797     1.31   0.191    -.1048727    .5233121
         y74 |    .301226   .1488953     2.02   0.043     .0090786    .5933735
         y76 |  -.0639849   .1556646    -0.41   0.681    -.3694143    .2414445
         y80 |   -.037886   .1598956    -0.24   0.813    -.3516171    .2758452
         y82 |  -.4892665   .1482989    -3.30   0.001    -.7802437   -.1982893
         y84 |  -.5112715   .1496524    -3.42   0.001    -.8049044   -.2176385
       _cons |  -7.844731   3.038574    -2.58   0.010    -13.80672   -1.882745
------------------------------------------------------------------------------

Holding the other factors fixed, a woman in 1982 has on average 0.49 fewer children than a woman in the base year. A similar result (0.51 fewer) is found for 1984.

The year dummies for 1982 and 1984 show statistically significant drops in the fertility rate relative to the base year.
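The individual t statistics look at one year dummy at a time. A joint test of all the year dummies is a natural complement. The lines below are a minimal sketch, assuming FERTIL1.dta (the file named on the slide) is in the working directory; `test` is Stata's standard post-estimation Wald test.

```stata
* Assumption: FERTIL1.dta is in the current working directory
use FERTIL1.dta, clear

* Same regression as on the slide
reg kids educ age agesq black east northcen west farm othrural town smcity y74 y76 y80 y82 y84

* Joint F test of H0: all year-dummy coefficients are zero
test y74 y76 y80 y82 y84
```

A small p-value from `test` would indicate that fertility differs across years even after controlling for the other characteristics.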

Example 2

CPS78_85.dta has wage data collected in 1978 and 1985.

We estimate an earnings equation that includes education, experience, experience squared, a union dummy, a female dummy and the year dummy for 1985.

Suppose that you want to see whether the gender gap has changed over time. Then you include the interaction between female and the 1985 dummy; that is, you estimate the following.

Log(wage) = β0 + β1(educ) + β2(exper) + β3(expersq) + β4(union) + β5(female) + β6(year85) + β7(year85)(female)

You can check whether the gender wage gap in 1985 is different from the base year (1978) by testing whether β7 is equal to zero.

The gender gap in each period is given by:

- gender gap in the base year (1978) = β5

- gender gap in 1985 = β5 + β7

. reg lwage educ exper expersq union female y85 y85fem

      Source |       SS       df       MS              Number of obs =    1084
-------------+------------------------------           F(  7,  1076) =  113.20
       Model |  135.328704     7   19.332672           Prob > F      =  0.0000
    Residual |  183.762464  1076  .170782959           R-squared     =  0.4241
-------------+------------------------------           Adj R-squared =  0.4204
       Total |  319.091167  1083   .29463635           Root MSE      =  .41326

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0833217   .0050646    16.45   0.000     .0733841    .0932594
       exper |   .0294761   .0035717     8.25   0.000     .0224679    .0364844
     expersq |  -.0003975   .0000776    -5.12   0.000    -.0005498   -.0002451
       union |    .205237   .0302943     6.77   0.000     .1457945    .2646795
      female |  -.3195333   .0366427    -8.72   0.000    -.3914324   -.2476341
         y85 |   .3530916   .0333324    10.59   0.000     .2876877    .4184954
      y85fem |   .0884046   .0513498     1.72   0.085    -.0123524    .1891616
       _cons |   .3522088   .0763137     4.62   0.000     .2024683    .5019493
------------------------------------------------------------------------------

The coefficient for the interaction term (y85)(female) is positive and significant at the 10% significance level. So the gender gap appears to have narrowed over time.

gender gap in 1978 = -0.319

gender gap in 1985 = -0.319 + 0.088 = -0.231
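The 1985 gap is a sum of two coefficients, so its standard error cannot be read directly from the table. One way to obtain it is Stata's `lincom` command. This is a sketch that assumes CPS78_85.dta is in the working directory and contains the variables used on the slide.

```stata
* Assumption: CPS78_85.dta is in the current working directory
use CPS78_85.dta, clear

reg lwage educ exper expersq union female y85 y85fem

* Gender gap in 1985 = coefficient on female + coefficient on y85fem,
* reported with a standard error and confidence interval
lincom female + y85fem
```

The point estimate reported by `lincom` should equal the hand calculation, -0.3195 + 0.0884 = -0.2311.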

Policy analysis with pooled cross sections: The difference-in-difference estimator

I explain a typical policy analysis with pooled cross section data, called difference-in-difference estimation, using an example.

Example: Effects of a garbage incinerator on housing prices

This example is based on studies of housing prices in North Andover, Massachusetts.

The rumor that a garbage incinerator would be built in North Andover began after 1978. Construction of the incinerator began in 1981.

You want to examine whether the incinerator affected housing prices.

Our hypothesis is the following.

Hypothesis: The price of houses located near the incinerator would fall relative to the price of more distant houses.

For illustration, define a house to be near the incinerator if it is within 3 miles.

So create the following dummy variable:

nearinc = 1 if the house is 'near' the incinerator
        = 0 otherwise

The most naïve analysis would be to run the following regression using only the 1981 data.

price = β0 + β1(nearinc) + u

where price is the real price (i.e., deflated using the CPI to express it in constant 1978 dollars).

Using KIELMC.dta, the result is the following.

. reg rprice nearinc if year==1981

      Source |       SS       df       MS              Number of obs =     142
-------------+------------------------------           F(  1,   140) =   27.73
       Model |  2.7059e+10     1  2.7059e+10           Prob > F      =  0.0000
    Residual |  1.3661e+11   140   975815048           R-squared     =  0.1653
-------------+------------------------------           Adj R-squared =  0.1594
       Total |  1.6367e+11   141  1.1608e+09           Root MSE      =   31238

------------------------------------------------------------------------------
      rprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     nearinc |  -30688.27   5827.709    -5.27   0.000    -42209.97   -19166.58
       _cons |   101307.5   3093.027    32.75   0.000     95192.43    107422.6
------------------------------------------------------------------------------

But can we say from this estimate that the incinerator has negatively affected housing prices?

To see this, estimate the same equation using the 1978 data. Note that this is before the rumor about the incinerator began.

. reg rprice nearinc if year==1978

      Source |       SS       df       MS              Number of obs =     179
-------------+------------------------------           F(  1,   177) =   15.74
       Model |  1.3636e+10     1  1.3636e+10           Prob > F      =  0.0001
    Residual |  1.5332e+11   177   866239953           R-squared     =  0.0817
-------------+------------------------------           Adj R-squared =  0.0765
       Total |  1.6696e+11   178   937979126           Root MSE      =   29432

------------------------------------------------------------------------------
      rprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     nearinc |  -18824.37   4744.594    -3.97   0.000    -28187.62   -9461.117
       _cons |   82517.23    2653.79    31.09   0.000     77280.09    87754.37
------------------------------------------------------------------------------

Note that, even in 1978, the price of houses near the place where the incinerator was to be built is lower than the price of houses farther from the location.

So the negative coefficient in the 1981 regression partly reflects the fact that the garbage incinerator was built in a location where housing prices were already low.

Now, compare the two regressions.

Year 1978 regression (. reg rprice nearinc if year==1978, Number of obs = 179)

------------------------------------------------------------------------------
      rprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     nearinc |  -18824.37   4744.594    -3.97   0.000    -28187.62   -9461.117
       _cons |   82517.23    2653.79    31.09   0.000     77280.09    87754.37
------------------------------------------------------------------------------

Year 1981 regression (. reg rprice nearinc if year==1981, Number of obs = 142)

------------------------------------------------------------------------------
      rprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     nearinc |  -30688.27   5827.709    -5.27   0.000    -42209.97   -19166.58
       _cons |   101307.5   3093.027    32.75   0.000     95192.43    107422.6
------------------------------------------------------------------------------

Compared to 1978, the price penalty for houses near the incinerator is greater in 1981.

Perhaps the increase in the price penalty in 1981 is caused by the incinerator.

This is the basic idea of the difference-in-difference estimator.

The difference-in-difference estimator in this example may be computed as follows. I will show you a more general case later on.

The difference-in-difference estimator

= (coefficient for nearinc in 1981) - (coefficient for nearinc in 1978) = -30688.27 - (-18824.37) = -11863.90

So, the incinerator appears to have decreased the prices of nearby houses by $11,863.90 on average.
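The subtraction above can also be done inside Stata by storing the two nearinc coefficients. The sketch below assumes KIELMC.dta is in the working directory; the scalar names b78 and b81 are chosen here for illustration only.

```stata
* Assumption: KIELMC.dta is in the current working directory
use KIELMC.dta, clear

* Price penalty for nearby houses after construction began
quietly reg rprice nearinc if year==1981
scalar b81 = _b[nearinc]

* Price penalty for nearby houses before the rumor
quietly reg rprice nearinc if year==1978
scalar b78 = _b[nearinc]

* Difference in the two differences
display "DiD estimate = " scalar(b81) - scalar(b78)
```

This reproduces the point estimate only. It gives no standard error, which is one reason to prefer the single pooled regression.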

Note that, in this example, the coefficient for (nearinc) in 1978 is equal to

(Average price of houses near the incinerator) - (Average price of houses not near the incinerator)

This is because the regression includes only one dummy variable (just recall Ex.1 of Homework 2).

Therefore the difference-in-difference estimator in this example is written as

$$\hat\delta_1 = \left(\overline{Price}_{1981,near} - \overline{Price}_{1981,far}\right) - \left(\overline{Price}_{1978,near} - \overline{Price}_{1978,far}\right)$$

This is the reason why the estimator is called the difference-in-difference estimator.
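As a worked check, the four group means can be recovered from the two single-year regressions reported earlier: the constant is the mean price of 'far' houses, and the constant plus the nearinc coefficient is the mean price of 'near' houses. The symbol names follow the formula for the estimator, and the numbers are taken from the 1978 and 1981 regression outputs.

```latex
\begin{align*}
\overline{Price}_{1978,far}  &= 82517.23 \\
\overline{Price}_{1978,near} &= 82517.23 - 18824.37 = 63692.86 \\
\overline{Price}_{1981,far}  &= 101307.5 \\
\overline{Price}_{1981,near} &= 101307.5 - 30688.27 = 70619.23 \\
\hat\delta_1 &= (70619.23 - 101307.5) - (63692.86 - 82517.23) \\
             &= -30688.27 + 18824.37 = -11863.90
\end{align*}
```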

Difference-in-difference estimator: More general case

The difference-in-difference estimator can be obtained by running the following single equation on the pooled sample.

price = β0 + β1(nearinc) + β2(year81) + δ1(year81)(nearinc) + u

The estimate of δ1 is the difference-in-difference estimator.

. reg rprice nearinc y81 y81nrinc

      Source |       SS       df       MS              Number of obs =     321
-------------+------------------------------           F(  3,   317) =   22.25
       Model |  6.1055e+10     3  2.0352e+10           Prob > F      =  0.0000
    Residual |  2.8994e+11   317   914632739           R-squared     =  0.1739
-------------+------------------------------           Adj R-squared =  0.1661
       Total |  3.5099e+11   320  1.0969e+09           Root MSE      =   30243

------------------------------------------------------------------------------
      rprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     nearinc |  -18824.37   4875.322    -3.86   0.000    -28416.45   -9232.293
         y81 |   18790.29   4050.065     4.64   0.000     10821.88    26758.69
    y81nrinc |   -11863.9   7456.646    -1.59   0.113    -26534.67    2806.867
       _cons |   82517.23    2726.91    30.26   0.000      77152.1    87882.36
------------------------------------------------------------------------------

The coefficient on y81nrinc is the difference-in-difference estimator. It equals the difference between the nearinc coefficients in the separate 1981 and 1978 regressions, -30688.27 - (-18824.37) = -11863.90, but it is not statistically significant at the 10% level (p = 0.113).

This form is more general since, in addition to the policy dummy (nearinc), you can include more variables that affect the housing price, such as the number of bedrooms. When you include more variables, δ1 can no longer be expressed as a simple difference in differences of group means. However, the interpretation does not change, and therefore it is still called the difference-in-difference estimator.
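The pooled regression can also be written with Stata's factor-variable notation, which builds the interaction automatically, and extra controls can then be appended. This is a sketch only. The control names rooms and baths are assumptions about what KIELMC.dta contains and are not taken from the slides; check them with `describe` before running.

```stata
* Assumption: KIELMC.dta is in the current working directory
use KIELMC.dta, clear

* Same model as the slide; 1.nearinc#1.y81 plays the role of y81nrinc
reg rprice i.nearinc##i.y81

* Adding controls (rooms and baths are assumed variable names)
describe
reg rprice i.nearinc##i.y81 rooms baths
```

With controls included, the coefficient on the interaction is still read as the difference-in-difference estimate, now holding the added house characteristics fixed.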

Natural experiment (or quasi-experiment)

The difference-in-difference estimator is frequently used to evaluate the effect of government policy.

Often a government policy affects one group of people but does not affect another group. This type of policy change is called a natural experiment.

For example, the change in the spousal tax deduction system in Japan, which took place in 1995, affected married couples but did not affect single people.

The group of people who are affected by the policy is called the treatment group.

The group of people who are not affected by the policy is called the control group.

Suppose that you want to know how the change in the spousal tax deduction has affected the hours worked by women. Suppose you have pooled data on workers in 1994 and 1995.

The next slide shows the typical procedure you follow to conduct the difference-in-difference analysis.

Step 1: Create the treatment dummy such that

Dtreat = 1 if the person is affected by the policy change
       = 0 otherwise.

Step 2: Run the following regression.

(Hours worked) = β0 + β1Dtreat + β2(year95) + δ1(year95)(Dtreat) + u

The estimate of δ1 is the difference-in-difference estimator. It shows the effect of the policy change on women's hours worked.
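To see why δ1 is a difference in differences, it helps to write out the expected hours for each of the four group-year cells. The derivation below writes the coefficient on (year95) as β2 and assumes the error u has zero mean within each cell.

```latex
\begin{align*}
E[\text{Hours} \mid Dtreat=0,\ year95=0] &= \beta_0 \\
E[\text{Hours} \mid Dtreat=1,\ year95=0] &= \beta_0 + \beta_1 \\
E[\text{Hours} \mid Dtreat=0,\ year95=1] &= \beta_0 + \beta_2 \\
E[\text{Hours} \mid Dtreat=1,\ year95=1] &= \beta_0 + \beta_1 + \beta_2 + \delta_1 \\[4pt]
\underbrace{[(\beta_0+\beta_1+\beta_2+\delta_1) - (\beta_0+\beta_1)]}_{\text{change for treatment group}}
 - \underbrace{[(\beta_0+\beta_2) - \beta_0]}_{\text{change for control group}} &= \delta_1
\end{align*}
```

The control group's change, β2, removes the part of the treatment group's change that would have happened without the policy.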

Two period panel data analysis

Motivation:

Remember the effects of the employee training grant on the scrap rate. You estimated the following model using the 1988 data.

$$\log(Scrap) = \beta_0 + \beta_1(grant) + \beta_2\log(sales) + \beta_3\log(employment) + v$$

. reg lscrap grant lsales lemploy if year==1988

      Source |       SS       df       MS              Number of obs =      50
-------------+------------------------------           F(  3,    46) =    1.18
       Model |   6.8054029     3  2.26846763           Prob > F      =  0.3270
    Residual |  88.2852083    46  1.91924366           R-squared     =  0.0716
-------------+------------------------------           Adj R-squared =  0.0110
       Total |  95.0906112    49  1.94062472           Root MSE      =  1.3854

------------------------------------------------------------------------------
      lscrap |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       grant |  -.0517781   .4312869    -0.12   0.905    -.9199137    .8163574
      lsales |  -.4548425   .3733152    -1.22   0.229    -1.206287    .2966021
     lemploy |   .6394289   .3651366     1.75   0.087     -.095553    1.374411
       _cons |   4.986779   4.655588     1.07   0.290    -4.384433    14.35799
------------------------------------------------------------------------------

You did not find evidence that receiving the grant reduces the scrap rate.

The reason why we did not find a significant effect is probably the endogeneity problem.

Companies with low-ability workers tend to apply for the grant, which creates a positive bias in the estimate. If you observed the average ability of the workers, you could eliminate the bias by including the ability variable. But since you cannot observe ability, you have the following situation.

$$\log(Scrap) = \beta_0 + \beta_1(grant) + \beta_2\log(sales) + \beta_3\log(employment) + \underbrace{\beta_4(ability) + u}_{v}$$

where ability is in the error term v. v = β4(ability) + u is called the composite error term.

Because ability and grant are correlated (negatively), this causes a bias in the coefficient for (grant).

We predicted the direction of the bias in the following way.

$$\log(Scrap) = \beta_0 + \beta_1(grant) + \beta_2\log(sales) + \beta_3\log(employment) + \underbrace{\beta_4(ability) + u}_{v}$$

$$\tilde\beta_1 = \underbrace{\hat\beta_1}_{(-)} + \underbrace{\hat\beta_4}_{(-)}\,\underbrace{\tilde\delta_1}_{(-)}$$

- β̂1: true effect of grant (negative)
- β̂4 δ̃1: bias term (positive, as the product of two negative terms)
- β̂4: effect of ability on the scrap rate (negative)
- δ̃1: its sign is determined by the correlation between ability and grant (negative)

The true negative effect of the grant is cancelled out by the positive bias term. Thus, the bias makes it difficult to find the effect.

Now you know that there is a bias. Is there anything we can do to correct for the bias?

When you have panel data, you can eliminate the bias.

I will explain the method using this example. I will generalize it later.

Eliminating bias using two period panel data

Now, go back to the equation.

$$\log(Scrap) = \beta_0 + \beta_1(grant) + \beta_2\log(sales) + \beta_3\log(employment) + \underbrace{\beta_4(ability) + u}_{v}$$

The grant is administered in 1988. Suppose that you have panel data on firms for two periods, 1987 and 1988.

Further assume that the average ability of workers does not change over time. So (ability) is interpreted as the innate ability of workers, such as IQ.

When you have two period panel data, the equation can be written as:

$$\log(Scrap)_{it} = \beta_0 + \beta_1(grant)_{it} + \beta_2\log(sales)_{it} + \beta_3\log(employment)_{it} + \beta_5(year88)_{it} + \underbrace{\beta_4(ability)_i + u_{it}}_{v_{it}}$$

i is the index for the ith firm. t is the index for the period.

Since ability is constant over time, ability has only the i index.

Now, I will use a shorthand notation for β4(ability)i. Since (ability) is assumed constant over time, write β4(ability)i = ai. Then the above equation can be written as:

$$\log(Scrap)_{it} = \beta_0 + \beta_1(grant)_{it} + \beta_2\log(sales)_{it} + \beta_3\log(employment)_{it} + \beta_5(year88)_{it} + \underbrace{a_i + u_{it}}_{v_{it}}$$

ai is called the fixed effect, or the unobserved effect. If you want to emphasize that it is an unobserved firm characteristic, you can also call it the firm fixed effect.

uit is called the idiosyncratic error.

Now, the bias in OLS occurs because the fixed effect is correlated with (grant).

So if we can get rid of the fixed effect, we can eliminate the bias. This is the basic idea.

In the next slide, I will show the procedure called first-differenced estimation.

First, for each firm, take the first difference. That is, compute the following.

$$\Delta\log(Scrap)_{it} = \log(Scrap)_{it} - \log(Scrap)_{i,t-1}$$

It follows that

$$\begin{aligned}
\Delta\log(Scrap)_{it} &= \beta_0 + \beta_1(grant)_{it} + \beta_2\log(sales)_{it} + \beta_3\log(employment)_{it} + \beta_5(year88)_{it} + a_i + u_{it} \\
&\quad - [\beta_0 + \beta_1(grant)_{i,t-1} + \beta_2\log(sales)_{i,t-1} + \beta_3\log(employment)_{i,t-1} + \beta_5(year88)_{i,t-1} + a_i + u_{i,t-1}] \\
&= \beta_1\Delta(grant)_{it} + \beta_2\Delta\log(sales)_{it} + \beta_3\Delta\log(employment)_{it} + \beta_5\Delta(year88)_{it} + \Delta u_{it}
\end{aligned}$$

The last line is the first-differenced equation.

So, by taking the first difference, you can eliminate the fixed effect.

$$\Delta\log(Scrap)_{it} = \beta_1\Delta(grant)_{it} + \beta_2\Delta\log(sales)_{it} + \beta_3\Delta\log(employment)_{it} + \beta_5\Delta(year88)_{it} + \Delta u_{it}$$

If ∆uit is not correlated with ∆(grant)it, estimating the first-differenced model by OLS will produce unbiased estimates. If we have controlled for enough time-varying variables, it is reasonable to assume that they are uncorrelated.

Note that this model does not have a constant.

Now, estimate this model using JTRAIN.dta.

. **************************
. * Declare panel          *
. **************************
. tsset fcode year
       panel variable:  fcode (strongly balanced)
        time variable:  year, 1987 to 1989
                delta:  1 unit

. ******************************
. * Generate first differenced *
. * variables                  *
. ******************************
. gen difflscrap=lscrap-L.lscrap
(363 missing values generated)
. gen diffgrant=grant-L.grant
(157 missing values generated)
. gen difflsales=lsales-L.lsales
(226 missing values generated)
. gen difflemploy=lemploy-L.lemploy
(181 missing values generated)
. gen diffd88=d88-L.d88
(157 missing values generated)

. **********************
. * Run the regression *
. **********************
. reg difflscrap diffgrant difflsales difflemploy diffd88 if year<=1988, nocons

      Source |       SS       df       MS              Number of obs =      47
-------------+------------------------------           F(  4,    43) =    1.82
       Model |  2.71885438     4  .679713595           Prob > F      =  0.1428
    Residual |  16.0749657    43  .373836411           R-squared     =  0.1447
-------------+------------------------------           Adj R-squared =  0.0651
       Total |    18.79382    47  .399868511           Root MSE      =  .61142

------------------------------------------------------------------------------
  difflscrap |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   diffgrant |  -.3223172   .1879101    -1.72   0.093     -.701274    .0566396
  difflsales |  -.1733036    .365626    -0.47   0.638    -.9106586    .5640514
 difflemploy |   .0233784   .5064015     0.05   0.963    -.9978775    1.044634
     diffd88 |  -.0272418    .120639    -0.23   0.822    -.2705336    .2160501
------------------------------------------------------------------------------

Now, the coefficient on grant is negative and significant at the 10% level.

When you use the 'nocons' option, Stata omits the constant term.
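The differenced variables do not have to be generated by hand. After `tsset`, Stata's `D.` time-series operator computes the first difference on the fly. The sketch below should be equivalent to the regression on the generated difference variables, assuming JTRAIN.dta is in the working directory.

```stata
* Assumption: JTRAIN.dta is in the current working directory
use JTRAIN.dta, clear

* Declare the panel structure: firm identifier and year
tsset fcode year

* D.x is x minus its value in the previous year for the same firm
reg D.lscrap D.grant D.lsales D.lemploy D.d88 if year<=1988, nocons
```

Using `D.` avoids cluttering the dataset with extra variables, at the cost of slightly less readable coefficient labels in the output.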

Note that, when you use this method in your research, it is a good idea to tell your audience what the potential fixed effect would be and whether it is correlated with the explanatory variables. In this example, unobserved ability is potentially an important source of the fixed effect.

Of course, one can never tell exactly what the fixed effect is, since it is the aggregate of all the unobserved effects. However, if you explain what is contained in the fixed effect, your audience can understand the potential direction of the bias and why you need to use the first-differenced method.

General case

The first-differenced model in a more general situation can be written as follows.

yit = β0 + β1xit1 + β2xit2 + … + βkxitk + ai + uit

Here ai is the fixed effect. If ai is correlated with any of the explanatory variables, the estimated coefficients will be biased. So take the first difference to eliminate ai, then estimate the following model by OLS.

∆yit = β1∆xit1 + β2∆xit2 + … + βk∆xitk + ∆uit
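A small simulation can illustrate why differencing helps. The sketch below uses made-up data, not any dataset from the lecture: the variable names (id, t, a, x, y), the seed and all numerical values are assumptions chosen for illustration. The true slope on x is 2, and x is built to be positively correlated with the fixed effect a.

```stata
clear
set seed 12345

* 500 hypothetical individuals, each with a time-constant fixed effect a
set obs 500
gen id = _n
gen a  = rnormal()

* Two periods per individual
expand 2
bysort id: gen t = _n

* x is correlated with a; the true coefficient on x is 2
gen x = 0.8*a + rnormal()
gen y = 1 + 2*x + a + rnormal()

xtset id t

* Pooled OLS: a is left in the error term, so the slope is biased upward
reg y x

* First differences: a drops out, so the slope should be close to 2
reg D.y D.x, nocons
```

Under this design the pooled OLS slope converges to 2 + 0.8/1.64, roughly 2.49, while the first-differenced slope converges to the true value of 2.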

Note that, when you take the first difference, the constant term is also eliminated. So you should use the 'nocons' option in Stata when you estimate the model.

When some variables are time invariant, these variables are also eliminated. If the treatment variable does not change over time, you cannot use this method.

First differencing for more than two periods

You can use first differencing for more than two periods.

You just have to difference adjacent periods successively.

For example, suppose that you have 3 periods. Then, for the dependent variable, you compute ∆yi2 = yi2 - yi1 and ∆yi3 = yi3 - yi2. Do the same for the x-variables. Then run the regression.

Exercise

The data ezunem.dta contains city-level unemployment claim statistics for the state of Indiana. The data also contain information about whether the city has an enterprise zone or not.

An enterprise zone is an area that encourages businesses and investment through reduced taxes and restrictions. Enterprise zones are usually created in economically depressed areas with the purpose of increasing economic activity and reducing unemployment.

Using the data ezunem.dta, you are asked to estimate the effect of enterprise zones on city-level unemployment claims. Use the log of unemployment claims as the dependent variable.

Ex1. First estimate the following model using OLS.

log(unemployment claims)it = β0 + β1(Enterprise zone)it + β(year dummies)it + vit

Discuss whether the coefficient for enterprise zone is biased or not. If you think it is biased, what is the direction of the bias?

Ex2. Estimate the model using the first difference method.

Did it change the result? Was your prediction of the bias correct?

OLS results

. reg luclms ez d81 d82 d83 d84 d85 d86 d87 d88

      Source |       SS       df       MS              Number of obs =     198
-------------+------------------------------           F(  9,   188) =   11.44
       Model |  35.5700512     9  3.95222791           Prob > F      =  0.0000
    Residual |  64.9262278   188  .345352276           R-squared     =  0.3539
-------------+------------------------------           Adj R-squared =  0.3230
       Total |  100.496279   197  .510133396           Root MSE      =  .58767

------------------------------------------------------------------------------
      luclms |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          ez |  -.0387084   .1148501    -0.34   0.736    -.2652689     .187852
         d81 |  -.3216319   .1771882    -1.82   0.071    -.6711645    .0279007
         d82 |   .1354957   .1771882     0.76   0.445    -.2140369    .4850283
         d83 |  -.2192554   .1771882    -1.24   0.217     -.568788    .1302772
         d84 |  -.5970717   .1799355    -3.32   0.001    -.9520237   -.2421197
         d85 |  -.6216534   .1847186    -3.37   0.001     -.986041   -.2572658
         d86 |  -.6511313   .1847186    -3.52   0.001    -1.015519   -.2867437
         d87 |  -.9188151   .1847186    -4.97   0.000    -1.283203   -.5544275
         d88 |    -1.2575   .1847186    -6.81   0.000    -1.621887    -.893112
       _cons |   11.69439    .125291    93.34   0.000     11.44724    11.94155
------------------------------------------------------------------------------

First differencing

. reg lagluclms lagez lagd81 lagd82 lagd83 lagd84 lagd85 lagd86 lagd87 lagd88, nocons

      Source |       SS       df       MS              Number of obs =     176
-------------+------------------------------           F(  9,   167) =   41.31
       Model |  17.3537634     9  1.92819594           Prob > F      =  0.0000
    Residual |  7.79583815   167  .046681666           R-squared     =  0.6900
-------------+------------------------------           Adj R-squared =  0.6733
       Total |  25.1496016   176  .142895463           Root MSE      =  .21606

------------------------------------------------------------------------------
   lagluclms |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lagez |  -.1818775   .0781862    -2.33   0.021    -.3362382   -.0275169
      lagd81 |  -.3216319    .046064    -6.98   0.000    -.4125748   -.2306891
      lagd82 |   .1354957   .0651444     2.08   0.039     .0068831    .2641083
      lagd83 |  -.2192554   .0797852    -2.75   0.007    -.3767731   -.0617378
      lagd84 |  -.5580256   .0945636    -5.90   0.000    -.7447196   -.3713315
      lagd85 |  -.5565765    .108961    -5.11   0.000    -.7716951   -.3414579
      lagd86 |  -.5860544   .1182979    -4.95   0.000    -.8196066   -.3525023
      lagd87 |  -.8537383   .1269499    -6.72   0.000    -1.104372   -.6031047
      lagd88 |  -1.192423   .1350488    -8.83   0.000    -1.459046   -.9257998
------------------------------------------------------------------------------

The do file used to generate the results. Note that, despite the 'lag' prefix, the generated variables are first differences.

tsset city year

reg luclms ez d81 d82 d83 d84 d85 d86 d87 d88

gen lagluclms =luclms -L.luclms
gen lagez =ez -L.ez
gen lagd81 =d81 -L.d81
gen lagd82 =d82 -L.d82
gen lagd83 =d83 -L.d83
gen lagd84 =d84 -L.d84
gen lagd85 =d85 -L.d85
gen lagd86 =d86 -L.d86
gen lagd87 =d87 -L.d87
gen lagd88 =d88 -L.d88

reg lagluclms lagez lagd81 lagd82 lagd83 lagd84 lagd85 lagd86 lagd87 lagd88, nocons
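The ten `gen` lines in the do file follow one pattern, so they can be written as a loop. The sketch below is an alternative way to build the same variables, assuming ezunem.dta is in the working directory; it keeps the do file's lag prefix so the regression command is unchanged.

```stata
* Assumption: ezunem.dta is in the current working directory
use ezunem.dta, clear
tsset city year

* Build each first-differenced variable with the same naming rule as the do file
foreach v in luclms ez d81 d82 d83 d84 d85 d86 d87 d88 {
    gen lag`v' = `v' - L.`v'
}

reg lagluclms lagez lagd81 lagd82 lagd83 lagd84 lagd85 lagd86 lagd87 lagd88, nocons
```

Reloading the data with `use ..., clear` matters here: running the loop after the original `gen` lines would fail because the variables would already exist.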

The assumptions for the first difference method

Assumption FD1: Linearity

For each i, the model is written as

yit = β0 + β1xit1 + … + βkxitk + ai + uit

Assumption FD2: We have a random sample from the cross section.

Assumption FD3: There is no perfect collinearity. In addition, each explanatory variable changes over time for at least some i in the sample.

Assumption FD4: Strict exogeneity

E(uit|Xi,ai) = 0 for each i,

where Xi is the shorthand notation for 'all the explanatory variables for the ith individual in all time periods'.

This means that uit is uncorrelated with the current year's explanatory variables as well as with other years' explanatory variables.

The unbiasedness of the first difference method

Under FD1 through FD4, the estimated parameters for the first difference method are unbiased.

Assumption FD5: Homoskedasticity

Var(∆uit|Xi) = σ²

Assumption FD6: No serial correlation within the ith individual.

Cov(∆uit,∆uis) = 0 for t≠s

Note that FD2 assumes random sampling across different individuals, but does not assume randomness within each individual. So you need an additional assumption to rule out serial correlation.
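If FD5 or FD6 is in doubt, one common precaution is to report standard errors clustered by the panel unit, which allow for heteroskedasticity and for serial correlation within each unit. This is not part of the lecture's procedure. The sketch below applies Stata's `vce(cluster ...)` option to the enterprise zone exercise, assuming ezunem.dta is in the working directory.

```stata
* Assumption: ezunem.dta is in the current working directory
use ezunem.dta, clear
tsset city year

* First-differenced regression with standard errors clustered by city
reg D.luclms D.ez D.d81 D.d82 D.d83 D.d84 D.d85 D.d86 D.d87 D.d88, nocons vce(cluster city)
```

The coefficient estimates are the same as in the unclustered first-differenced regression; only the standard errors, t statistics and confidence intervals change.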