A short introduction to applied econometrics Part D: Panel Data Analysis presented by Dipl. Volkswirt Gerhard Kling.

Advantages of panel analysis

• More observations: pooling of cross-sectional and time series data

• More degrees of freedom: stems from more observations

• Reduced multicollinearity: especially a problem in distributed lag models

• Improved efficiency (unbiased estimator with smallest variance for all possible true parameter values)

Advantages of panel analysis

• Wider range of problems: you can test new hypotheses on individual behavior or on policy changes that affect several entities

• Causality discussion: the time structure facilitates the discussion

• Dynamics of change, e.g. labor market participation

The importance of the data structure

• Example: 11 countries over 10 years

• General note: cross-sectional dimension should be larger than time dimension

• But: many new models currently developed

• Very fertile field for research!

• I prefer the following data structure

The importance of the data structure

name     code  year  gdp           sav           pop
Albania  ALB   1990  6,75179343    20,9783993    1,6
Albania  ALB   1991  -11,4142038   -13,0284996   -0,2
Albania  ALB   1992  -27,5896031   -75,4131012   -1,6
Albania  ALB   1993  -5,69153612   -33,6716003   -1,4
Albania  ALB   1994  11,1974627    -9,88263035   0,2
Albania  ALB   1995  9,1941036     -3,94799995   1,2
Albania  ALB   1996  7,55757392    -11,8118      1,3
Albania  ALB   1997  7,73893405    -9,25912952   1,2
Albania  ALB   1998  -8,06352119   -6,69585991   1,1
Albania  ALB   1999  -1,66910005 (gdp or sav; the other value is missing)  1,1
Algeria  DZA   1990  2,29575915    27,4666996    2,5
Algeria  DZA   1991  -3,72084675   36,6562004    2,4
Algeria  DZA   1992  -3,55414336   32,3755989    2,4
Algeria  DZA   1993  -0,79384221   27,8384991    2,3
Algeria  DZA   1994  -4,35723136   27,0359993    2,2
Algeria  DZA   1995  -3,31007521   28,4333992    2,2
Algeria  DZA   1996  1,59040861    31,4230003    2,2
Algeria  DZA   1997  1,58921549    32,1985016    2,2
Algeria  DZA   1998  -1,03429441   27,0669003    2,1
Algeria  DZA   1999  1,44857954    31,6912003    2,1

• Rows for Albania: first cross-sectional unit

• Column year: time dimension

• Albania 1999: missing value

Pooled regression

• Combine both dimensions in one data set

• Neglect time and cross-sectional structure

• Run the following regression with POLS/SOLS:

$\text{gdp}_{it} = \alpha + \beta_1\,\text{sav}_{it} + \beta_2\,\text{pop}_{it} + e_{it}$

where i indexes countries and t indexes years

Pooled regression

. reg gdp pop sav

variables      coefficients   t-values   p-values
pop            -1.73028       -1.95      0.055
sav            0.1766935      3.51       0.001

Adjusted R2    0.10
F-test         6.20 (0.003)
Observations   95

Autocorrelation

• Now time dimension; hence, correlation among successive residuals possible

• This affects t and p-values – violates the assumption $E(e_{it}\,e_{it-j}) = 0$ for all $j \neq 0$

• How can we test for this problem?

• What can we do if we detect autocorrelation?

Autocorrelation

• Stata should know that the data set is a panel

• Command: tsset (i) year

• note: i=cross-section

• Normal test commands for autocorrelation do not work; hence, develop own test (several procedures!)
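A minimal sketch of declaring the panel, assuming the country identifier is only available as the string variable code from the data listing and that i is the numeric identifier the slides refer to. tsset needs a numeric panel variable, so the sketch creates one with encode; xtdescribe is added here purely as a convenience for inspecting the panel pattern and is not part of the slides.

```stata
* Sketch only: assumes the data set from the slides is in memory
* and that the variable i does not exist yet.
encode code, generate(i)   // numeric country identifier built from the string code
tsset i year               // declare i as panel variable and year as time variable
xtdescribe                 // optional: report number of countries, years and gaps
```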

Test for Autocorrelation

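• Run the following regression and estimate residuals:

$\text{gdp}_{it} = \alpha + \beta_1\,\text{sav}_{it} + \beta_2\,\text{pop}_{it} + e_{it}$

• Insert lagged residuals in regression:

$\text{gdp}_{it} = \alpha + \beta_1\,\text{sav}_{it} + \beta_2\,\text{pop}_{it} + \rho\,\hat{e}_{it-1} + e_{it}$

• Run t-test for the autocorrelation coefficient $\rho$

• H0: $\rho = 0$ – if rejected, autocorrelation is present

• Note: AR(1) and assumption of strict exogeneity!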

Hint: Construction of Lags with Panel Data

• After regress command – predict r, resid

• Then construct lagged residual

– gen r1=r[_n-1]

• Problem: panel structure; thus, set the lagged value to missing for the first year of each country (1990 in our case)

– replace r1=. if year==1990

• Note: t-value reaches 4.62!
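As an alternative to building the lag by hand, the steps above can be sketched with Stata's time-series lag operator. Once the data are declared with tsset i year, L.r takes the lag within each country, so the first year of every country (and any year following a gap) is set to missing automatically. This is one possible variant, not necessarily the procedure used for the reported t-value.

```stata
* Sketch only: assumes the panel from the slides is in memory,
* with a numeric country identifier i and the variable year.
tsset i year

* Step 1: pooled regression and residuals
regress gdp sav pop
predict r, resid

* Step 2: panel-aware lag; L.r is missing in each country's first year
generate r1 = L.r

* Step 3: re-run the regression with the lagged residual and
* read off the t-test on r1 (H0: coefficient equals zero)
regress gdp sav pop r1
```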

Robust Estimation Procedure

• We estimate a so-called long-run variance using the Newey-West (1987) procedure

• Estimation of variance-covariance matrix is now robust against heteroscedasticity and autocorrelation

• Command: newey2 gdp pop sav, lag(5)

• Number of lags = truncation (can be determined!)
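For orientation, the textbook form of the Newey-West estimator with truncation lag m can be written as below. The notation ($x_t$ for the regressor vector, $\hat{e}_t$ for the OLS residual, $m$ for the truncation lag) is introduced here for illustration and does not appear in the slides. In a panel the cross-products are formed within each cross-sectional unit, and implementations may apply an additional small-sample scaling factor.

```latex
\widehat{\operatorname{Var}}(\hat{\beta}) = (X'X)^{-1}\,\hat{S}\,(X'X)^{-1},
\qquad
\hat{S} = \hat{\Omega}_0 + \sum_{j=1}^{m}\left(1 - \frac{j}{m+1}\right)
          \left(\hat{\Omega}_j + \hat{\Omega}_j'\right),
\qquad
\hat{\Omega}_j = \sum_{t=j+1}^{T} \hat{e}_t\,\hat{e}_{t-j}\,x_t\,x_{t-j}'
```

With lag(5), that is m = 5, the weights on lags 1 to 5 are 5/6, 4/6, 3/6, 2/6 and 1/6, so more distant autocovariances count less.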

Robust Estimation Procedure

. newey2 gdp pop sav, lag(5)

variables      coefficients   t-values   p-values
constant       -0.7222        -0.41      0.679
pop            -1.7303        -2.28      0.025
sav            0.1767         2.62       0.010

Adjusted R2    0.10
F-test         6.20 (0.003)
Observations   95

Note: point estimates are the same!

GLS Estimation Procedure

• Make assumptions regarding heteroscedasticity and autocorrelation

• Note: often called FGLS – feasible!

• Command: xtgls – then different specifications possible

• Can also be used to test for specific heteroscedasticity using log-likelihood ratio tests

• Note: If structure too complicated – loss of degrees of freedom!

GLS Estimation Procedure

. xtgls gdp sav pop, corr(ar1) panels(hetero) force

variables      coefficients   z-values   p-values
constant       -0.2978        -0.22      0.825
pop            -0.3767        -0.76      0.450
sav            0.1012         1.82       0.068

Wald chi2      3.41 (0.182)
Observations   95

Pitfalls of GLS

• Specification of form of autocorrelation and heteroscedasticity important

• If specification bad – estimates are biased

• General: I would prefer this procedure for larger samples because more parameters need to be estimated

• Can be used to test for instance panel-level heteroscedasticity!
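One way such a likelihood ratio test for panel-level heteroscedasticity could be set up is sketched below. It compares a model with a separate error variance per country against a model with one common variance. The stored names hetero and homo are chosen here for illustration, and the degrees of freedom are passed by hand as the number of panels minus one, which is an assumption about how the restriction is counted; the slides do not show the exact commands used.

```stata
* Sketch only: assumes the panel is in memory and tsset i year has been run.
* Unrestricted model: one error variance per country, estimated by
* iterated GLS so that a log likelihood is available
xtgls gdp sav pop, igls panels(heteroskedastic)
estimates store hetero

* Restricted model: a single common error variance
xtgls gdp sav pop
estimates store homo

* The restriction removes (number of panels - 1) variance parameters
local df = e(N_g) - 1
lrtest hetero homo, df(`df')
```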

Fixed Effects Regression

• Assumption: partial impact (slope) stays constant over time and across countries

• Different methods

– Insert time dummies into regression

– Insert dummies for cross-sectional units

– Insert both types of dummies

• Note: Sometimes dummies are not reported if too many!
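The three variants can be written out as follows. This is a sketch in which $\lambda_t$ (year effects) and $\alpha_i$ (country effects) are symbols chosen here for illustration; the slopes $\beta_1$ and $\beta_2$ are common to all countries and years, as the assumption above states.

```latex
\begin{aligned}
\text{time dummies:}    \quad & \text{gdp}_{it} = \alpha + \lambda_t + \beta_1\,\text{sav}_{it} + \beta_2\,\text{pop}_{it} + e_{it} \\
\text{country dummies:} \quad & \text{gdp}_{it} = \alpha_i + \beta_1\,\text{sav}_{it} + \beta_2\,\text{pop}_{it} + e_{it} \\
\text{both:}            \quad & \text{gdp}_{it} = \alpha_i + \lambda_t + \beta_1\,\text{sav}_{it} + \beta_2\,\text{pop}_{it} + e_{it}
\end{aligned}
```

Whenever a constant or a full set of country effects is included, one $\lambda_t$ has to be normalized to zero (a base year) to avoid perfect collinearity.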

Fixed Effects Regression

Useful command: areg – you do not need to construct dummies by hand!

areg gdp sav pop, absorb(i)

areg gdp sav pop, absorb(year)

Absorbing both at once is not possible with areg – instead use xi: reg gdp sav pop i.year i.i

variables      Year dummies      Country dummies   Both
constant       -0.8954 (0.525)   -3.8602 (0.106)   2.2578 (0.582)
pop            -1.5334 (0.099)   -0.7835 (0.654)   -0.6431 (0.728)
sav            0.1705 (0.002)    0.2878 (0.005)    0.2710 (0.017)

Adjusted R2    0.07              0.13              0.10
F-test         5.27 (0.007)      4.60 (0.013)      1.51 (0.102)
Observations   95                95                95

Fixed Effects Regression:

• Joint F-tests indicate that neither time nor country dummies are relevant

• But: For a few countries dummies might be used

• General: You have to estimate lots of additional coefficients

• But: Widely applied and easy to interpret

• Note: Time dummies do not eliminate problems that may arise from stochastic trends!
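A joint F-test on each set of dummies could be run as sketched below. The xi: prefix generates the dummy variables with the prefixes _Iyear_ and _Ii_, and testparm tests whether all coefficients in a group are zero. This shows one way to obtain such tests; the slides do not say which commands produced the reported results.

```stata
* Sketch only: assumes the panel is in memory with identifiers i and year.
* xi: creates the dummies _Iyear_* and _Ii_* and adds them to the regression
xi: regress gdp sav pop i.year i.i

testparm _Iyear_*   // joint F-test: all year dummies equal to zero
testparm _Ii_*      // joint F-test: all country dummies equal to zero
```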

Random Effects Regression

• We assume the following regression:

$\text{gdp}_{it} = \alpha + \beta_1\,\text{sav}_{it} + \beta_2\,\text{pop}_{it} + u_i + e_{it}$

• Individual effects $u_i$ are random

• Estimation with GLS or maximum likelihood procedure

• After estimation: Breusch-Pagan (1980) test or likelihood ratio test whether random effects should be assumed
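To see what the tests after estimation are checking, consider the composite error $v_{it} = u_i + e_{it}$. Under the usual random effects assumptions (taken here as an assumption, since the slides do not spell them out) $u_i$ and $e_{it}$ are uncorrelated with each other, $e_{it}$ is uncorrelated over time, and the variances $\sigma_u^2$ and $\sigma_e^2$ are constant. Then:

```latex
\operatorname{Var}(v_{it}) = \sigma_u^2 + \sigma_e^2,
\qquad
\operatorname{Cov}(v_{it}, v_{is}) = \sigma_u^2 \quad (t \neq s),
\qquad
\operatorname{Corr}(v_{it}, v_{is}) = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}
```

If $\sigma_u^2 = 0$, the errors of one country are uncorrelated over time and the model collapses to the pooled regression, which is the null hypothesis of the test for random effects.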

Random Effects Regression

xtreg gdp pop sav, re – random effects with group variable i (countries)

Postestimation command: xttest0 – carries out an LM test (H0: Var(ui)=0)

xtreg gdp pop sav, mle – maximum likelihood estimation

Note: Likelihood ratio test is reported

variables      GLS               ML
constant       -0.9731 (0.518)   -0.7222 (0.590)
pop            -1.7037 (0.076)   -1.7303 (0.048)
sav            0.1860 (0.001)    0.1767 (0.000)

Wald chi2      11.54 (0.003)     -
LR test        -                 12.01 (0.003)
Observations   95                95

Test whether random effects should be used

LM test        0.11 (0.736)      -
LR test        -                 0.00 (1.000)

Which Procedure should we use?

• Neither fixed nor random effects are superior

• Little evidence that individual effects matter

• Hence: stick to POLS/SOLS pooled regression

• Maybe: use dummies for extreme countries

• Check stability of coefficients over time (goes beyond the scope of the course!)

The Causality Issue

• Note: We assume that current saving rate and population growth rate affect GDP growth rate

• But: Possible that causality goes the other way round!

• Solution: VAR model – test for Granger causality

• Result: Savings and population growth rate Granger-cause GDP growth rate and not vice versa!
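The idea behind a Granger causality test can be sketched with pooled regressions on lagged values. The lag length of one and the pooling over countries are assumptions made here for illustration; the slides do not report the VAR specification behind the stated result.

```stata
* Sketch only: one lag, pooled over countries; assumes tsset i year has been run.
* Do lagged sav and pop help to predict gdp beyond gdp's own lag?
regress gdp L.gdp L.sav L.pop
test L.sav L.pop

* Reverse direction: does lagged gdp help to predict sav?
regress sav L.sav L.gdp L.pop
test L.gdp
```

Rejecting the first null hypothesis but not the second would match the pattern described in the result above. A corresponding regression with pop as the dependent variable completes the check.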

Additional Issues

• Stochastic trends in panel data

– Spurious regressions

– Unit-root tests – panel based; thus, more observations

– First differencing or deviation from common trends

• Long-term equilibriums and cointegration
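A short sketch of how the tools for stochastic trends might look in Stata. It assumes a version that ships xtunitroot (Stata 11 or later), picks one lag arbitrarily, and reads "deviation from common trends" as the deviation from the cross-country mean in each year, which is only one possible interpretation. The new variable names are chosen here for illustration.

```stata
* Sketch only: assumes tsset i year has been run.
* Fisher-type panel unit-root test built on country-by-country ADF tests
xtunitroot fisher gdp, dfuller lags(1)

* First differencing within each country
generate dgdp = D.gdp

* Deviation from the common (cross-country) mean in each year
egen gdp_yearmean = mean(gdp), by(year)
generate gdp_dev = gdp - gdp_yearmean
```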
