Pooled and Panel Data Analysis

Panel Data Econometrics
Prof. Alexandre Gori Maia
State University of Campinas

Topics
• Pooled Data
• Fixed Effects – Binary Variables
• Fixed Effects – Within Transformation

Reference
Baltagi, B. Econometric analysis of panel data. Third Edition. John Wiley & Sons. 2005, Chapters 1-4.
Wooldridge, J. M. 2001. Econometric analysis of cross section and panel data. Chapter 10.
Sample Designs

• Cross-sectional data, $Y_i$, $i = 1, 2, \ldots, n$: different units in a specific period of time.
• Time series, $Y_t$, $t = 1, 2, \ldots, T$: the same unit in different periods of time.
• Pooled data, $Y_{it}$, $i = 1, 2, \ldots, n_t$, $t = 1, 2, \ldots, T$: cross-sectional samples (not necessarily the same) are observed in different periods of time.
• Panel data, $Y_{it}$, $i = 1, 2, \ldots, n$, $t = 1, 2, \ldots, T$: the same cross-sectional sample is observed in different periods of time.
Panel Data – Examples

• Balanced panel data: each cross-sectional unit is observed in all periods.
• Unbalanced panel data: some cross-sectional units are not observed in some periods.
• Rotating panel data: groups of cross-sectional units (rotation groups) are brought in and out of the sample in some periods.
• Split panel: combines cross-sectional and panel samples at each period.
Regression with Pooled Data

• Constant intercept and slope coefficients. Assumes that the relation between Y and X is the same in both periods, t = 0 and t = 1:
$$Y = \alpha + \beta X + e$$
• Different intercepts and constant slope coefficients. Assumes that Y varies over time but the relation between Y and X remains constant:
$$Y = \alpha + \beta X + \delta t + e$$
• Different intercepts and slope coefficients. Both the intercept and the marginal impact of X on Y change over time:
$$Y = \alpha + \beta X + \delta t + \theta (t \times X) + e$$
Pooled Data – Definition

• Pooled data present some main advantages when compared to cross-sectional data: i) larger sample size; ii) they allow us to identify changes in the relation over time.
• If we assume that the relation is the same over time:
$$Y_i = \beta_0 + \sum_{j=1}^{k} \beta_j X_{ji} + e_i$$
• If we assume that the expected value of Y varies over time and the relation between Y and X remains constant:
$$Y_i = \beta_0 + \sum_{j=1}^{k} \beta_j X_{ji} + \delta t + e_i$$
• If we assume changes in both the expected value of Y and in the relation between Y and X over time:
$$Y_i = \beta_0 + \sum_{j=1}^{k} \beta_j X_{ji} + \delta t + \sum_{j=1}^{k} \theta_j (t \times X_{ji}) + e_i$$
Example – Stata & R

• Suppose we have pooled data with information on the regressand y and two exogenous variables (x1 and x2) across two periods (t = 0 and 1):
• The equivalent in R:
Example – Python
• The equivalent in Python:
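A minimal sketch of the three pooled specifications in Python. The data are simulated, the names y, x1, x2 and t follow the example above, and the helper `ols` and the use of plain NumPy least squares are assumptions made here to keep the example self-contained; they are not necessarily what the original slides used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated pooled data (hypothetical): two periods, t = 0 and t = 1,
# with independent cross-sections of 500 observations each.
n = 500
t = np.repeat([0.0, 1.0], n)
x1 = rng.normal(size=2 * n)
x2 = rng.normal(size=2 * n)
# True model: in t = 1 the intercept shifts by 0.5 and the slope of x1 by 0.8.
y = 1.0 + 2.0 * x1 - 1.0 * x2 + 0.5 * t + 0.8 * t * x1 + rng.normal(size=2 * n)


def ols(y, *columns):
    """Return OLS coefficients for y on an intercept plus the given columns."""
    X = np.column_stack([np.ones_like(y), *columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# (1) Same intercept and slopes in both periods.
b_same = ols(y, x1, x2)
# (2) Different intercepts, constant slopes: add the period dummy t.
b_shift = ols(y, x1, x2, t)
# (3) Different intercepts and slopes: add t and the interactions t*x1, t*x2.
b_full = ols(y, x1, x2, t, t * x1, t * x2)

print("same relation:      ", b_same.round(2))
print("intercept shift:    ", b_shift.round(2))
print("shift + interaction:", b_full.round(2))
```

In specification (3) the coefficients are ordered as intercept, x1, x2, t, t*x1, t*x2, so the last three entries estimate the change in the intercept and in each slope between t = 0 and t = 1.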
Exercise

1) The dataset Data_AgricultureClimate.csv contains information on agricultural production and climate change in São Paulo, Brazil (GORI MAIA, A., MIYAMOTO, B. C., GARCIA, J. R. Climate change and agriculture: Do environmental preservation and ecosystem services matter? Ecological Economics, v. 152, October 2018):
a) Develop a regression model for pooled data to analyze the relation between the (log of) production value, (log of) area, temperature and precipitation;
b) Consider changes in the relation before and after 2005 (variable periodo).
Unobserved Heterogeneity

• Assume that the relation between y and x ≡ (X1, X2, ..., Xk) is given by:
$$E(y \mid \mathbf{x}, c) = \mathbf{x}\boldsymbol{\beta} + c$$
where c is an unobserved component, also called unobserved effect or unobserved heterogeneity. One main assumption in panel data analysis is that the component c is constant over time. This means:
$$Y_{it} = \mathbf{x}_{it}\boldsymbol{\beta} + c_i + e_{it}$$
where $E(e_{it} \mid \mathbf{x}_{it}, c_i) = 0$.
• The error $e_{it}$ is called the idiosyncratic error, since it varies randomly across all cross-sectional units and periods.
• When c is not correlated with the independent variables, Cov(Xj, c) = 0, omitting c from the model does not generate omitted variable bias. In this case, we can apply OLS using models for pooled data (pooled regression). However, if Cov(Xj, c) ≠ 0, the pooled regression estimates are biased even in large samples.
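The role of Cov(Xj, c) can be illustrated with a small simulation. This is a sketch under assumed values: the panel dimensions, the true slope of 2 and the coefficient 0.8 linking x to c are all hypothetical, and `pooled_slope` is a helper defined here only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

n, T = 400, 5                      # hypothetical panel dimensions
beta = 2.0                         # true slope

c = rng.normal(size=n)             # unobserved heterogeneity c_i, constant over time
c_it = np.repeat(c, T)             # repeat c_i for each of the T periods


def pooled_slope(x, y):
    """Slope of a pooled OLS regression of y on an intercept and x (c omitted)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


# Case 1: Cov(x, c) = 0, so omitting c does not bias the slope.
x_uncorr = rng.normal(size=n * T)
y_uncorr = beta * x_uncorr + c_it + rng.normal(size=n * T)

# Case 2: Cov(x, c) != 0, because x depends on c.
x_corr = 0.8 * c_it + rng.normal(size=n * T)
y_corr = beta * x_corr + c_it + rng.normal(size=n * T)

slope_uncorr = pooled_slope(x_uncorr, y_uncorr)
slope_corr = pooled_slope(x_corr, y_corr)
print(f"Cov(x, c) = 0 : {slope_uncorr:.2f}")
print(f"Cov(x, c) != 0: {slope_corr:.2f}")
```

In the second case the pooled slope does not converge to 2: with these assumed values the bias tends to Cov(x, c)/Var(x) = 0.8/1.64 ≈ 0.49, and it does not shrink as the sample grows.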
Fixed Effects – Binary Variables

• Suppose the model with unobserved heterogeneity is given by:
$$Y_{it} = \mathbf{x}_{it}\boldsymbol{\beta} + c_i + e_{it}$$
• A simple solution to control for the unobserved heterogeneity c is the fixed effects estimator with binary variables. This method assumes that $c_i$ represents a parameter that can be estimated using the coefficient associated with the i-th binary variable:
$$Y_{it} = \alpha + \sum_{j=1}^{k} \beta_j X_{j,it} + c_2 I_{2i} + \cdots + c_n I_{ni} + e_{it}$$
where $I_{ji} = 1$ if j = i and $I_{ji} = 0$ if j ≠ i. The estimators of the $c_j$ are called binary variables estimators. The name "fixed effects" comes from the idea that c is treated as a parameter (a constant value in the population).
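The binary variables estimator can be sketched in a few lines. The example below assumes simulated data with one regressor and hypothetical dimensions (n = 100 units, T = 6 periods); the first unit is taken as the base category, so binary variables are created for units 2 to n only.

```python
import numpy as np

rng = np.random.default_rng(2)

n, T = 100, 6                                  # hypothetical panel dimensions
cs = np.repeat(np.arange(n), T)                # unit identifier for each row
c = rng.normal(size=n)                         # unit effects c_i
x = 0.8 * c[cs] + rng.normal(size=n * T)       # regressor correlated with c_i
y = 1.0 + 2.0 * x + c[cs] + rng.normal(size=n * T)

# Binary variables I_2, ..., I_n (the first unit is the omitted base category).
dummies = (cs[:, None] == np.arange(1, n)[None, :]).astype(float)

# Design matrix: intercept, x, and the n - 1 binary variables.
X = np.column_stack([np.ones(n * T), x, dummies])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# With a base category, the intercept absorbs the effect of the first unit
# and each dummy coefficient measures a unit effect relative to it.
alpha_hat, beta_hat, c_hat = coef[0], coef[1], coef[2:]
print(f"slope estimate: {beta_hat:.2f}")
print(f"number of estimated unit effects: {c_hat.size}")
```

The design matrix has n + 1 columns for a single regressor, which shows how quickly the number of parameters grows with the number of cross-sectional units.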
Within Transformation

• One main limitation of the fixed effects estimator with binary variables is that the number of binary variables may be quite large. Most estimates tend to be insignificant if the sample is not large enough to compensate for the lost degrees of freedom.
• Alternatively, through an algebraic transformation, we can estimate the same coefficients using the within estimator.

Suppose the model with unobserved heterogeneity:
$$Y_{it} = \mathbf{x}_{it}\boldsymbol{\beta} + c_i + e_{it}$$
This relation is also valid for the average values of each cross-sectional unit:
$$\bar{Y}_i = \bar{\mathbf{x}}_i\boldsymbol{\beta} + c_i + \bar{e}_i$$
Since $c_i$ is constant over time, its average is equal to $c_i$. Subtracting the equations, we have:
$$(Y_{it} - \bar{Y}_i) = (\mathbf{x}_{it} - \bar{\mathbf{x}}_i)\boldsymbol{\beta} + (c_i - c_i) + (e_{it} - \bar{e}_i)$$
$$\tilde{Y}_{it} = \tilde{\mathbf{x}}_{it}\boldsymbol{\beta} + \tilde{e}_{it}$$
Example – Stata & R

• Suppose we have a panel with information on the regressand y and two exogenous variables (x1 and x2) across n cross-sectional units (variable cs = 1..n) and T periods (variable time = 1..T). The within estimator is given in Stata by:
• The equivalent in R:
Example – Python

• The equivalent in Python:
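A minimal sketch of the within estimator in Python, assuming simulated data and plain NumPy rather than a dedicated panel library. The function `within_estimator` is a hypothetical helper written for this illustration; the variable names y, x1, x2 and cs follow the example above.

```python
import numpy as np


def within_estimator(y, X, cs):
    """Within (fixed effects) estimator.

    y  : (N,) outcome
    X  : (N, k) regressors, without intercept
    cs : (N,) identifier of the cross-sectional unit of each row
    Returns the slope estimates and their standard errors.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    units, idx = np.unique(cs, return_inverse=True)
    counts = np.bincount(idx)

    def demean(v):
        # Subtract the unit-specific average from each observation.
        means = np.bincount(idx, weights=v) / counts
        return v - means[idx]

    y_til = demean(y)
    X_til = np.column_stack([demean(X[:, j]) for j in range(X.shape[1])])

    beta, *_ = np.linalg.lstsq(X_til, y_til, rcond=None)
    resid = y_til - X_til @ beta
    # Degrees of freedom: N observations minus n unit means minus k slopes.
    dof = y.size - units.size - X.shape[1]
    sigma2 = resid @ resid / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X_til.T @ X_til)))
    return beta, se


# Simulated panel (hypothetical): n units observed over T periods.
rng = np.random.default_rng(3)
n, T = 200, 5
cs = np.repeat(np.arange(1, n + 1), T)
c = rng.normal(size=n)[cs - 1]               # c_i repeated for each period
x1 = 0.5 * c + rng.normal(size=n * T)        # x1 is correlated with c_i
x2 = rng.normal(size=n * T)
y = 1.5 * x1 - 0.7 * x2 + c + rng.normal(size=n * T)

beta_hat, se_hat = within_estimator(y, np.column_stack([x1, x2]), cs)
print("coefficients:   ", beta_hat.round(3))
print("standard errors:", se_hat.round(3))
```

The degrees-of-freedom correction matters: running OLS on the demeaned data with the usual N − k divisor would understate the standard errors, because the n unit averages have also been estimated. Since the unit averages are computed per identifier, the same function also accepts unbalanced panels.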
Two-Way Fixed Effects Estimator

• The model with controls for the heterogeneity across cross-sectional units ($c_i$) is also called the one-way model:
$$Y_{it} = \alpha + \sum_{j=1}^{k} \beta_j X_{j,it} + c_i + e_{it}$$
• We can extend this idea, using binary variables to control for the heterogeneity across periods t. The two-way model is:
$$Y_{it} = \alpha + \sum_{j=1}^{k} \beta_j X_{j,it} + c_i + \mathit{ct}_2 P_{2t} + \cdots + \mathit{ct}_T P_{Tt} + e_{it}$$
where $P_{jt} = 1$ if j = t and $P_{jt} = 0$ if j ≠ t.
Example – Stata, R & Python
• The two-way estimator in Stata:
• The equivalent in R:
• The equivalent in Python:
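A minimal sketch of the two-way estimator in Python, assuming a simulated balanced panel with one regressor and plain NumPy (the dimensions and coefficients are hypothetical). It computes the slope in two ways: with a two-way within transformation, and with binary variables for units and periods.

```python
import numpy as np

rng = np.random.default_rng(4)

n, T = 100, 8                                 # hypothetical balanced panel
unit_fx = rng.normal(size=(n, 1))             # c_i, constant over time
time_fx = rng.normal(size=(1, T))             # period effects, common to all units
x = 0.6 * unit_fx + 0.6 * time_fx + rng.normal(size=(n, T))
y = 2.0 * x + unit_fx + time_fx + rng.normal(size=(n, T))


def two_way_demean(a):
    """Remove unit and period averages from an (n, T) balanced panel."""
    return (a - a.mean(axis=1, keepdims=True)
              - a.mean(axis=0, keepdims=True)
              + a.mean())


# (a) Two-way within transformation followed by OLS without intercept.
x_til = two_way_demean(x).ravel()
y_til = two_way_demean(y).ravel()
beta_within = (x_til @ y_til) / (x_til @ x_til)

# (b) Same model with binary variables for units and periods
#     (first unit and first period are the base categories).
unit_id = np.repeat(np.arange(n), T)
time_id = np.tile(np.arange(T), n)
unit_dummies = (unit_id[:, None] == np.arange(1, n)).astype(float)
time_dummies = (time_id[:, None] == np.arange(1, T)).astype(float)
X = np.column_stack([np.ones(n * T), x.ravel(), unit_dummies, time_dummies])
coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
beta_dummies = coef[1]

print(f"two-way within  : {beta_within:.4f}")
print(f"binary variables: {beta_dummies:.4f}")
```

The two slopes coincide up to numerical precision because the panel is balanced. With an unbalanced panel this simple double demeaning is no longer exact, and the binary variables version (or a dedicated panel routine) would be the safer choice.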
Advantages of Panel Data Analysis

1) Differences across individuals and periods: panel data models allow us to use binary variables to control for differences across cross-sectional units (individuals) and periods. Cross-sectional data do not provide enough degrees of freedom for such an analysis;
2) Degrees of freedom: the sample size of a panel data set is the number of cross-sectional units multiplied by the number of periods. In cross-sectional (time series) data, we only have the number of cross-sectional units (periods);
3) Controlling for omitted variable bias: we can control for unobservables that are related to both the regressors and the regressand (omitted variable bias) using binary variables or the within transformation.
Exercise

1) The dataset Data_AgricultureClimate.csv contains information on agricultural production and climate variables in the state of São Paulo (GORI MAIA, A., MIYAMOTO, B. C., GARCIA, J. R. Climate change and agriculture: Do environmental preservation and ecosystem services matter? Ecological Economics, v. 152, October 2018):
a) Analyze the relation between the (log) value of agricultural production, (log) area, temperature and precipitation using the one-way fixed-effects estimator;
b) Now use the two-way fixed-effects estimator, identifying the main differences in relation to (a).