NONLINEAR MODELS
Correlated Random Effects Panel Data Models
IZA Summer School in Labor Economics
May 13-19, 2013
Jeffrey M. Wooldridge
Michigan State University
1. Why Nonlinear Models?
2. CRE versus Other Approaches
3. Nonlinear Unobserved Effects Models
4. Assumptions
5. Correlated Random Effects Probit
6. CRE Tobit
7. CRE Count Models
8. Nonparametric and Flexible Parametric Approaches
1. Why Nonlinear Models?
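∙ Suppose $y_{it}$ is binary, $\mathbf{x}_{it}$ is a set of observed explanatory variables, and $c_i$ is heterogeneity. We are interested in the response probability as a function of $(\mathbf{x}_t, c)$:
$$p(\mathbf{x}_t, c) = P(y_{it} = 1 \mid \mathbf{x}_{it} = \mathbf{x}_t, c_i = c).$$
∙ Because $p(\mathbf{x}_t, c)$ is a probability, a linear model, say
$$p(\mathbf{x}_t, c) = \mathbf{x}_t\boldsymbol{\beta} + c,$$
can be a poor approximation.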
∙ Or, suppose $y_{it} \geq 0$. An exponential model such as
$$E(y_{it} \mid \mathbf{x}_{it}, c_i) = c_i \exp(\mathbf{x}_{it}\boldsymbol{\beta}), \quad c_i \geq 0,$$
usually makes more sense than a linear model. Plus, we cannot use $\log(y_{it})$ if $P(y_{it} = 0) > 0$.
∙ The general idea is to use models that are logically consistent with the nature of $y_{it}$.
∙ Not a bad idea to start with a linear model. For example, if $y_{it}$ is binary, we use an unobserved effects linear probability model estimated by fixed effects.
∙ In comparing across models it is important not to get tripped up by
focusing on parameters. Estimating partial effects (magnitudes, not just
directions) should be the focus in most applications.
2. CRE versus Other Approaches
∙ CRE contains traditional random effects as a special case. Can test the
key RE assumption that heterogeneity is independent of time-varying
covariates.
∙ Conditional MLE, which is used to eliminate unobserved
heterogeneity, can be applied only in special cases. Even when it can, it
usually relies on strong independence assumptions.
∙ “Fixed Effects,” where the $c_i$ are treated as parameters to estimate, usually suffers from an incidental parameters problem. Recent work on adjustments for “large” $T$ seems promising but has drawbacks.
                                                     FE        CMLE      CRE
Restricts $D(c_i \mid \mathbf{x}_i)$?                No        No        Yes
Incidental parameters with small $T$?                Yes       No        No
Restricts time series dependence/heterogeneity?      Yes (1)   Yes (2)   No
Only special models?                                 No (3)    Yes       No
APEs identified?                                     Yes (4)   No        Yes
Unbalanced panels?                                   Yes       Yes       Yes (5)
Can estimate $D(c_i)$?                               Yes (4)   No        Yes (6)
Notes:
(1) The large $T$ approximations, including bias adjustments, assume weak dependence and often stationarity.
(2) Usually conditional independence, unless the estimator is inherently fully robust (linear, Poisson).
(3) Need at least one more time period than sources of heterogeneity.
(4) Subject to the incidental parameters problem.
(5) Subject to exchangeability restrictions.
(6) Usually requires conditional independence or some other restriction.
3. Nonlinear Unobserved Effects Models
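∙ Consider an unobserved effects probit model:
$$P(y_{it} = 1 \mid \mathbf{x}_{it}, c_i) = \Phi(\mathbf{x}_{it}\boldsymbol{\beta} + c_i), \quad t = 1, \dots, T,$$
where $\Phi(\cdot)$ is the standard normal cdf and $\mathbf{x}_{it}$ is $1 \times K$.
∙ Logit replaces $\Phi(z)$ with $\Lambda(z) = \exp(z)/[1 + \exp(z)]$.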
∙ What are the quantities of interest? In economics, usually partial effects.
∙ For a continuous $x_{tj}$, the partial effect is
$$\frac{\partial P(y_t = 1 \mid \mathbf{x}_t, c)}{\partial x_{tj}} = \beta_j \phi(\mathbf{x}_t\boldsymbol{\beta} + c),$$
where $\phi(\cdot)$ is the standard normal pdf.
∙ This partial effect (PE) depends on the values of all observed covariates and on the unobserved heterogeneity value $c$.
∙ The sign of the PE is the same as the sign of $\beta_j$, but we usually want the magnitude.
∙ If we have two continuous variables, the ratio of the partial effects is constant and equal to the ratio of coefficients:
$$\frac{\beta_j \phi(\mathbf{x}_t\boldsymbol{\beta} + c)}{\beta_h \phi(\mathbf{x}_t\boldsymbol{\beta} + c)} = \frac{\beta_j}{\beta_h}$$
∙ The ratio still does not tell us the size of the effect of each. And what about discrete covariates or more complicated functional forms (quadratics, interactions)?
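The two points above (the PE moves with $c$, while the ratio of two PEs does not) can be checked numerically. The following is a minimal Python sketch, not part of the original slides; the coefficient and covariate values are hypothetical and chosen only for illustration.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def probit_pe(beta, x, c, j):
    """Partial effect of continuous x_j in a probit: beta_j * phi(x*beta + c)."""
    index = sum(b * v for b, v in zip(beta, x)) + c
    return beta[j] * norm_pdf(index)

# Hypothetical coefficients and covariate values (illustration only).
beta = [0.5, -0.25]
x_t = [1.0, 2.0]

for c in (-1.0, 0.0, 1.0):
    pe_1 = probit_pe(beta, x_t, c, 0)
    pe_2 = probit_pe(beta, x_t, c, 1)
    # The magnitudes change with c; the ratio stays at beta[0] / beta[1].
    print(f"c={c:+.1f}  PE_1={pe_1:.4f}  PE_2={pe_2:.4f}  ratio={pe_1 / pe_2:.4f}")
```

The ratio printed in each row equals `beta[0] / beta[1]`, which is why relative effects are identified even when the level of the PE is not.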
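∙ Discrete changes:
$$\Phi(\mathbf{x}_t^{(1)}\boldsymbol{\beta} + c) - \Phi(\mathbf{x}_t^{(0)}\boldsymbol{\beta} + c),$$
where $\mathbf{x}_t^{(0)}$ and $\mathbf{x}_t^{(1)}$ are set at different values. Again, this partial effect depends on $c$ (as well as the values of the covariates).
∙ Assuming we can consistently estimate $\boldsymbol{\beta}$, what should we do about the unobservable $c$?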
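∙ General Setup: Suppose we are interested in
$$E(y_{it} \mid \mathbf{x}_{it}, \mathbf{c}_i) = m_t(\mathbf{x}_{it}, \mathbf{c}_i),$$
where $\mathbf{c}_i$ can be a vector of unobserved heterogeneity.
∙ Partial effects: If $x_{tj}$ is continuous, then its PE is
$$\theta_j(\mathbf{x}_t, \mathbf{c}) \equiv \frac{\partial m_t(\mathbf{x}_t, \mathbf{c})}{\partial x_{tj}}.$$
∙ Issues for discrete changes are similar.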
∙ How do we account for unobserved $\mathbf{c}_i$? If we know enough about the distribution of $\mathbf{c}_i$ we can insert meaningful values for $\mathbf{c}$. For example, if $\boldsymbol{\mu}_c = E(\mathbf{c}_i)$, then we can compute the partial effect at the average (PEA),
$$PEA_j(\mathbf{x}_t) = \theta_j(\mathbf{x}_t, \boldsymbol{\mu}_c) = \frac{\partial m_t(\mathbf{x}_t, \boldsymbol{\mu}_c)}{\partial x_{tj}}.$$
Of course, we need to estimate the function $m_t$ and $\boldsymbol{\mu}_c$.
∙ If we can estimate other features of the distribution of $\mathbf{c}_i$ we can insert different quantiles, or a certain number of standard deviations from the mean.
∙ An alternative measure is the average partial effect (APE) (or population average effect), obtained by averaging across the distribution of $\mathbf{c}_i$:
$$APE_j(\mathbf{x}_t) = E_{\mathbf{c}_i}[\theta_j(\mathbf{x}_t, \mathbf{c}_i)].$$
∙ The APE is closely related to the notion of the average structural function (ASF) [Blundell and Powell (2003, REStud)]. The ASF is defined as a function of $\mathbf{x}_t$:
$$ASF(\mathbf{x}_t) = E_{\mathbf{c}_i}[m_t(\mathbf{x}_t, \mathbf{c}_i)].$$
∙ Passing the derivative (with respect to $x_{tj}$) through the expectation in the ASF gives an APE.
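∙ If
$$E(y_{it} \mid \mathbf{x}_{it}, c_i) = \Phi(\mathbf{x}_{it}\boldsymbol{\beta} + c_i), \quad c_i \sim \text{Normal}(0, \sigma_c^2),$$
can show that
$$PEA_j(\mathbf{x}_t) = \beta_j \phi(\mathbf{x}_t\boldsymbol{\beta}), \qquad APE_j(\mathbf{x}_t) = \beta_{cj} \phi(\mathbf{x}_t\boldsymbol{\beta}_c),$$
where $\boldsymbol{\beta}_c = \boldsymbol{\beta}/(1 + \sigma_c^2)^{1/2}$.
∙ We can have $PEA_j(\mathbf{x}_t) > APE_j(\mathbf{x}_t)$ or $PEA_j(\mathbf{x}_t) < APE_j(\mathbf{x}_t)$, and the direction of the inequality can change with $\mathbf{x}_t$.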
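As a rough numerical check of the closed-form APE, the sketch below averages $\beta_j \phi(\mathbf{x}_t\boldsymbol{\beta} + c)$ over $c \sim \text{Normal}(0, \sigma_c^2)$ by the trapezoid rule and compares it with $\beta_{cj}\phi(\mathbf{x}_t\boldsymbol{\beta}_c)$. It is a Python illustration, not from the slides; it assumes $c_i$ is independent of the covariates, and the grid settings and numeric values are arbitrary choices.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def pea(beta_j, index):
    """Partial effect at the average: heterogeneity set to its mean, zero."""
    return beta_j * norm_pdf(index)

def ape_closed_form(beta_j, index, sigma_c):
    """APE from the scaled coefficients beta / sqrt(1 + sigma_c^2)."""
    scale = math.sqrt(1.0 + sigma_c ** 2)
    return (beta_j / scale) * norm_pdf(index / scale)

def ape_numeric(beta_j, index, sigma_c, n_grid=4001, width=8.0):
    """Average beta_j * phi(index + c) over c ~ Normal(0, sigma_c^2), trapezoid rule."""
    lo = -width * sigma_c
    step = 2.0 * width * sigma_c / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        c = lo + k * step
        weight = 0.5 if k in (0, n_grid - 1) else 1.0
        density = norm_pdf(c / sigma_c) / sigma_c
        total += weight * beta_j * norm_pdf(index + c) * density
    return total * step

# index stands for x_t * beta; the values are hypothetical.
for index in (0.0, 2.0):
    print(index, pea(1.0, index), ape_closed_form(1.0, index, 1.0),
          ape_numeric(1.0, index, 1.0))
```

With $\sigma_c = 1$, the PEA exceeds the APE at an index of 0 and falls below it at an index of 2, which illustrates how the inequality can flip as $\mathbf{x}_t$ changes.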
∙ If $c_i$ is independent of $\mathbf{x}_{it}$ we cannot estimate $\boldsymbol{\beta}$ but we can estimate the scaled vector, $\boldsymbol{\beta}_c$.
∙ Somewhat counterintuitive, but generally the APE is identified more often than the PEA.
∙ The example reveals that the “problem” of attenuation bias is a red herring. If we can estimate $\boldsymbol{\beta}_c$ we can get the signs of the PEs and relative effects. In addition, we can obtain the average partial effects.
∙ Important: Definitions of partial effects do not depend on whether $\mathbf{x}_{it}$ is correlated with $c_i$. $\mathbf{x}_{it}$ could include contemporaneously endogenous variables or even $y_{i,t-1}$.
∙ Whether we can estimate the PEs certainly does depend on what we assume about the relationship between $c_i$ and $\mathbf{x}_{it}$.
∙ Focus on APEs means very general analyses are available, even nonparametric analyses.
∙ To summarize a partial effect as a single value, we need to deal with the presence of $\mathbf{x}_t$.
∙ We can evaluate $\mathbf{x}_t$ at the sample average (for each $t$, say, or across all $t$). Or, we can average the partial effects across all $i$. More later.
∙ Stata has three commands, mfx, margeff, and (most recently) margins. The latter allows PEA or APE calculations (usually).
Heterogeneity Distributions
∙ With the CRE approach we can, under enough assumptions, identify and consistently estimate the parameters in a conditional distribution $D(c_i \mid \mathbf{w}_i)$ for some observed vector $\mathbf{w}_i$.
∙ Let $f(c \mid \mathbf{w}; \boldsymbol{\gamma})$ denote the identified conditional density and let $g(c)$ be the unconditional density. Then
$$\hat{g}(c) = N^{-1} \sum_{i=1}^{N} f(c \mid \mathbf{w}_i; \hat{\boldsymbol{\gamma}})$$
is a consistent estimator of $g(c)$. See Wooldridge (2011, Economics Letters).
4. Assumptions
∙ The CRE approach typically relies on three kinds of assumptions:
1. How do idiosyncratic (time-varying) shocks (which may be serially correlated) relate to the history of covariates, $\{\mathbf{x}_{it} : t = 1, \dots, T\}$?
2. Conditional independence (which effectively rules out serial correlation in underlying shocks) or some other specific form of dependence.
3. How does unobserved (time-constant) heterogeneity relate to $\{\mathbf{x}_{it} : t = 1, \dots, T\}$?
Assumptions Relating $\{\mathbf{x}_{it} : t = 1, \dots, T\}$ and Shocks
∙ As in the linear case, we cannot get by with just specifying a model for the contemporaneous conditional distribution, $D(y_{it} \mid \mathbf{x}_{it}, c_i)$.
∙ For example, it is not nearly enough to just specify
$$P(y_{it} = 1 \mid \mathbf{x}_{it}, c_i) = \Phi(\mathbf{x}_{it}\boldsymbol{\beta} + c_i).$$
∙ A general definition of strict exogeneity (conditional on the heterogeneity) is
$$D(y_{it} \mid \mathbf{x}_{i1}, \dots, \mathbf{x}_{iT}, c_i) = D(y_{it} \mid \mathbf{x}_{it}, c_i).$$
∙ In some cases strict exogeneity in the conditional mean sense is sufficient.
∙ There is a sequential exogeneity assumption, too. Dynamic models come later.
∙ Neither strict nor sequential exogeneity allows for contemporaneous endogeneity of one or more elements of $\mathbf{x}_{it}$, where, say, $x_{itj}$ is correlated with time-varying unobservables that affect $y_{it}$.
Conditional Independence
∙ In linear models, serial dependence of idiosyncratic shocks is easily dealt with, usually by “cluster robust” inference with RE or FE.
∙ Or, we can use a GLS method. In the linear case with strictly exogenous covariates, serial correlation never results in inconsistent estimation, even if improperly modeled.
∙ The situation is different with nonlinear models estimated by full MLE: when independence is imposed, it is usually needed for consistency.
∙ Conditional independence [where we condition on $\mathbf{x}_i = (\mathbf{x}_{i1}, \dots, \mathbf{x}_{iT})$ and $c_i$]:
$$D(y_{i1}, \dots, y_{iT} \mid \mathbf{x}_i, c_i) = D(y_{i1} \mid \mathbf{x}_i, c_i) \cdots D(y_{iT} \mid \mathbf{x}_i, c_i)$$
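∙ Model for $D(c_i \mid \mathbf{x}_i)$:
$$c_i = \psi + \bar{\mathbf{x}}_i\boldsymbol{\xi} + a_i, \quad a_i \mid \mathbf{x}_i \sim \text{Normal}(0, \sigma_a^2).$$
∙ Chamberlain: Replace $\bar{\mathbf{x}}_i$ with $\mathbf{x}_i = (\mathbf{x}_{i1}, \dots, \mathbf{x}_{iT})$.
∙ Can obtain the first three assumptions from a latent variable model:
$$y_{it} = 1[\mathbf{x}_{it}\boldsymbol{\beta} + c_i + u_{it} > 0]$$
$$u_{it} \mid \mathbf{x}_{it}, c_i \sim \text{Normal}(0, 1)$$
$$D(u_{it} \mid \mathbf{x}_i, c_i) = D(u_{it} \mid \mathbf{x}_{it}, c_i)$$
$$\{u_{it} : t = 1, \dots, T\} \text{ independent}$$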
∙ Can include time dummies in $\mathbf{x}_{it}$ (but omit from $\bar{\mathbf{x}}_i$). Can also include time-constant elements as extra controls.
∙ If $\boldsymbol{\xi} = \mathbf{0}$, get the traditional random effects probit model.
∙ MLE (conditional on $\mathbf{x}_i$) is relatively straightforward. Under the assumption of iid normal shocks it is based on the joint distribution $D(y_{i1}, \dots, y_{iT} \mid \mathbf{x}_i)$.
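To make the joint distribution concrete, the following Python sketch computes one unit's likelihood contribution by integrating the product of period-by-period probit probabilities over $a_i \sim \text{Normal}(0, \sigma_a^2)$. It is an illustration under the conditional independence assumption, not the author's code. The trapezoid rule is a simple stand-in for the quadrature methods that packaged routines typically use, and all argument names and numeric values are hypothetical.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def unit_likelihood(y, xb, mundlak, sigma_a, n_grid=2001, width=8.0):
    """Likelihood of (y_1, ..., y_T) for one unit, conditional on x_i.

    y       : list of 0/1 outcomes, one per period
    xb      : list of x_it * beta, one per period
    mundlak : psi + xbar_i * xi, the same scalar in every period
    sigma_a : standard deviation of a_i
    """
    lo = -width * sigma_a
    step = 2.0 * width * sigma_a / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        a = lo + k * step
        prod = 1.0
        for y_t, xb_t in zip(y, xb):
            p = norm_cdf(xb_t + mundlak + a)
            prod *= p if y_t == 1 else 1.0 - p
        weight = 0.5 if k in (0, n_grid - 1) else 1.0
        total += weight * prod * norm_pdf(a / sigma_a) / sigma_a
    return total * step

# Hypothetical two-period unit.
print(unit_likelihood([1, 0], [0.1, -0.4], 0.2, 1.5))
```

With a single period the integral collapses to $\Phi[(\mathbf{x}_{it}\boldsymbol{\beta} + \psi + \bar{\mathbf{x}}_i\boldsymbol{\xi})/(1 + \sigma_a^2)^{1/2}]$, which gives a convenient sanity check on the integration.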
∙ In Stata:
egen x1bar = mean(x1), by(id)
egen xKbar = mean(xK), by(id)
xtprobit y x1 ... xK x1bar ... xKbar d2 ... dT, re
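For readers who do not use Stata, the time-averaging step that builds the extra regressors can be sketched in plain Python. This is only an illustration of what the `egen ... mean(), by(id)` lines compute; the function name, dictionary keys, and data are hypothetical.

```python
from collections import defaultdict

def add_time_averages(rows, id_key, var):
    """Return copies of rows with a new key '<var>bar' holding the unit-level mean of var."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in rows:
        sums[r[id_key]] += r[var]
        counts[r[id_key]] += 1
    return [dict(r, **{var + "bar": sums[r[id_key]] / counts[r[id_key]]}) for r in rows]

# Hypothetical panel with two units observed in two periods.
panel = [
    {"id": 1, "t": 1, "x1": 2.0},
    {"id": 1, "t": 2, "x1": 4.0},
    {"id": 2, "t": 1, "x1": 1.0},
    {"id": 2, "t": 2, "x1": 1.0},
]
augmented = add_time_averages(panel, "id", "x1")
for row in augmented:
    print(row)
```

Each row keeps its time-varying `x1` and gains `x1bar`, which is constant within a unit, so both can enter the same probit index.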
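∙ With conditional independence we can estimate features of the unconditional distribution of $c_i$.
∙ For example,
$$\hat{\mu}_c = \hat{\psi} + \bar{\mathbf{x}}\hat{\boldsymbol{\xi}}$$
$$\hat{\sigma}_c^2 \equiv \hat{\boldsymbol{\xi}}' \left[ N^{-1} \sum_{i=1}^{N} (\bar{\mathbf{x}}_i - \bar{\mathbf{x}})'(\bar{\mathbf{x}}_i - \bar{\mathbf{x}}) \right] \hat{\boldsymbol{\xi}} + \hat{\sigma}_a^2$$
∙ Can evaluate PEs at, say, the estimated mean value, $\hat{\mu}_c$, or look at $\hat{\mu}_c \pm k\hat{\sigma}_c$ for various $k$. Can plug in mean values of $\mathbf{x}_t$, too, or other specific values.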
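A small Python sketch of these two moment estimates follows, restricted to a single covariate so that the quadratic form reduces to $\hat{\xi}^2$ times the sample variance of the time averages. The function name and the input values are hypothetical.

```python
def heterogeneity_moments(xbars, psi_hat, xi_hat, sigma_a_hat):
    """Estimated mean and variance of c_i in the scalar-covariate case.

    xbars : list of unit time averages xbar_i
    """
    n = len(xbars)
    xbar = sum(xbars) / n                                 # overall average of the xbar_i
    var_xbar = sum((v - xbar) ** 2 for v in xbars) / n    # N^{-1} sum (xbar_i - xbar)^2
    mu_c = psi_hat + xbar * xi_hat
    sigma_c2 = xi_hat ** 2 * var_xbar + sigma_a_hat ** 2
    return mu_c, sigma_c2

# Hypothetical estimates.
mu_c, sigma_c2 = heterogeneity_moments([1.0, 2.0, 3.0], psi_hat=0.5, xi_hat=2.0, sigma_a_hat=1.0)
print(mu_c, sigma_c2)
```

The returned values can then be used to form evaluation points such as the mean plus or minus $k$ standard deviations.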
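∙ As shown in Wooldridge (2011, Economics Letters), the unconditional heterogeneity distribution is consistently estimated as
$$\hat{g}(c) = N^{-1} \sum_{i=1}^{N} \phi[(c - \hat{\psi} - \bar{\mathbf{x}}_i\hat{\boldsymbol{\xi}})/\hat{\sigma}_a]/\hat{\sigma}_a, \quad c \in \mathbb{R}$$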
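The estimator is an equally weighted mixture of normal densities, one per cross-sectional unit. A minimal Python sketch, with hypothetical parameter values and with each unit's $\bar{\mathbf{x}}_i\hat{\boldsymbol{\xi}}$ passed in as a precomputed scalar, might look like this:

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def g_hat(c, xbar_index, psi_hat, sigma_a_hat):
    """Estimated unconditional density of the heterogeneity at the point c.

    xbar_index : list of xbar_i * xi_hat, one scalar per unit
    """
    n = len(xbar_index)
    return sum(
        norm_pdf((c - psi_hat - m) / sigma_a_hat) / sigma_a_hat for m in xbar_index
    ) / n

# Hypothetical estimates for three units.
xbar_index = [-0.4, 0.1, 0.7]
psi_hat = 0.2
sigma_a_hat = 0.8
for c in (-1.0, 0.0, 1.0):
    print(c, g_hat(c, xbar_index, psi_hat, sigma_a_hat))
```

Because each component integrates to one, so does the average, and the result need not be normal even though every component is.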
∙ The APEs are identified from the average structural function, easily estimated as
$$\widehat{ASF}(\mathbf{x}_t) = N^{-1} \sum_{i=1}^{N} \Phi(\mathbf{x}_t\hat{\boldsymbol{\beta}}_a + \hat{\psi}_a + \bar{\mathbf{x}}_i\hat{\boldsymbol{\xi}}_a)$$
∙ The scaled coefficients are, for example, $\hat{\boldsymbol{\beta}}_a = \hat{\boldsymbol{\beta}}/(1 + \hat{\sigma}_a^2)^{1/2}$.
∙ Take derivatives and changes with respect to $\mathbf{x}_t$. Can further average out across $\mathbf{x}_{it}$ to get a single APE.
∙ In Stata, margeff evaluates the heterogeneity at the mean (when the heterogeneity is independent of the covariates) but then averages the partial effects across the covariates.
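The estimated ASF and the APE for a continuous covariate can be sketched in Python as below. This is an illustration only: the scaled estimates are treated as given, each unit's $\hat{\psi}_a + \bar{\mathbf{x}}_i\hat{\boldsymbol{\xi}}_a$ is passed as a precomputed scalar, and all numeric values are hypothetical.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def asf_hat(xt_index, mundlak_terms):
    """Average of Phi(x_t*beta_a + psi_a + xbar_i*xi_a) across units."""
    return sum(norm_cdf(xt_index + m) for m in mundlak_terms) / len(mundlak_terms)

def ape_hat(beta_a_j, xt_index, mundlak_terms):
    """Derivative of the estimated ASF with respect to a continuous x_tj."""
    return beta_a_j * sum(norm_pdf(xt_index + m) for m in mundlak_terms) / len(mundlak_terms)

# Hypothetical scaled estimates and evaluation point.
beta_a = [0.4, -0.2]
x_t = [1.0, 3.0]
mundlak_terms = [-0.3, 0.0, 0.5, 0.9]   # psi_a + xbar_i * xi_a for four units
xt_index = sum(b * v for b, v in zip(beta_a, x_t))

print(asf_hat(xt_index, mundlak_terms))
print(ape_hat(beta_a[0], xt_index, mundlak_terms))
```

For a discrete covariate, the analogous quantity is the difference in `asf_hat` evaluated at the two index values.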
∙ Conditional independence is strong, and the usual RE estimator is not known to be robust to its violation. (Contrast RE estimation of the linear model.)
∙ If we focus on APEs, can just use a pooled probit method and completely drop the serial independence assumption.
∙ Pooled probit estimates the scaled coefficients directly because
$$P(y_{it} = 1 \mid \mathbf{x}_i) = P(y_{it} = 1 \mid \mathbf{x}_{it}, \bar{\mathbf{x}}_i) = \Phi(\mathbf{x}_{it}\boldsymbol{\beta}_a + \psi_a + \bar{\mathbf{x}}_i\boldsymbol{\xi}_a).$$
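The step behind this response probability can be written out as follows. This is a sketch of the standard argument using the latent variable model and the Mundlak equation for $c_i$ stated earlier; it uses only the marginal distribution of $u_{it}$, which is why no restriction on serial dependence is needed.

```latex
\begin{aligned}
y_{it} &= 1[\mathbf{x}_{it}\boldsymbol{\beta} + c_i + u_{it} > 0]
        = 1[\mathbf{x}_{it}\boldsymbol{\beta} + \psi + \bar{\mathbf{x}}_i\boldsymbol{\xi} + a_i + u_{it} > 0], \\
a_i + u_{it} \mid \mathbf{x}_i &\sim \text{Normal}(0,\, 1 + \sigma_a^2), \\
P(y_{it} = 1 \mid \mathbf{x}_i)
  &= \Phi\!\left[\frac{\mathbf{x}_{it}\boldsymbol{\beta} + \psi + \bar{\mathbf{x}}_i\boldsymbol{\xi}}{(1 + \sigma_a^2)^{1/2}}\right]
   = \Phi(\mathbf{x}_{it}\boldsymbol{\beta}_a + \psi_a + \bar{\mathbf{x}}_i\boldsymbol{\xi}_a).
\end{aligned}
```

Here the subscript $a$ denotes division by $(1 + \sigma_a^2)^{1/2}$, matching the scaled coefficients used for the ASF.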
∙ In Stata, pooled probit and obtaining marginal effects are straightforward:
egen x1bar = mean(x1), by(id)
egen xKbar = mean(xK), by(id)
probit y x1 ... xK x1bar ... xKbar d2 ... dT, cluster(id)
margeff
margins, dydx(*)
∙ Pooled probit is inefficient relative to CRE probit.
∙ We can try to get back some of the efficiency loss by using “generalized estimating equations” (GEE), which is essentially multivariate nonlinear least squares.
xtgee y x1 ... xK x1bar ... xKbar d2 ... dT, fam(bin) link(probit) corr(exch) robust
∙ GEE might be more efficient than pooled probit, but there is no guarantee. It is as robust as pooled probit.
∙ GEE is less efficient than full MLE under serial independence, but the latter is less robust.
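∙ As shown in Papke and Wooldridge (2008, Journal of Econometrics), if $y_{it}$ is a fraction we can use either pooled probit or GEE (but not full MLE) without any change to the estimation.
∙ With $0 \leq y_{it} \leq 1$ we start with
$$E(y_{it} \mid \mathbf{x}_{it}, c_i) = \Phi(\mathbf{x}_{it}\boldsymbol{\beta} + c_i).$$
∙ When the heterogeneity is integrated out,
$$E(y_{it} \mid \mathbf{x}_{it}, \bar{\mathbf{x}}_i) = \Phi(\mathbf{x}_{it}\boldsymbol{\beta}_a + \psi_a + \bar{\mathbf{x}}_i\boldsymbol{\xi}_a).$$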
∙ Now exploit that the Bernoulli distribution is in the linear exponential family. Pooled “probit” is now a pooled quasi-MLE. Make inference fully robust, as before. Marginal effects calculations are unchanged.
∙ Can also use GEE with the probit response function as the mean but in a feasible GLS estimation, where the conditional variance-covariance matrix has constant correlations and is clearly misspecified.
glm y x1 ... xK x1bar ... xKbar d2 ... dT, fam(bin) link(probit) cluster(id)
margins, dydx(*)
xtgee y x1 ... xK x1bar ... xKbar d2 ... dT, fam(bin) link(probit) corr(exch) cluster(id)
EXAMPLE: Married Women’s Labor Force Participation, LFP.DTA
. des lfp kids hinc
              storage   display    value
variable name   type    format     label      variable label
------------------------------------------------------------------
lfp             byte    %9.0g                 1 if in labor force
kids            byte    %9.0g                 number children < 18
hinc            float   %9.0g                 husband’s monthly income, $
. tab period
  1 through |
  5, each 4 |
months long |      Freq.     Percent        Cum.
------------+-----------------------------------
∙ Many useful embellishments. For example, we can allow
$$c_i \mid \mathbf{x}_i \sim \text{Normal}(\psi + \bar{\mathbf{x}}_i\boldsymbol{\xi},\ \sigma_a^2 \exp(\bar{\mathbf{x}}_i\boldsymbol{\lambda})),$$
and then use a version of “heteroskedastic probit” (probably pooled, but could use full MLE under conditional independence).
∙ The pooled method applies whether $y_{it}$ is binary or fractional.
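∙ Estimation of APEs, based on
$$E(y_{it} \mid \mathbf{x}_{it}, \bar{\mathbf{x}}_i) = \Phi\{[1 + \sigma_a^2 \exp(\bar{\mathbf{x}}_i\boldsymbol{\lambda})]^{-1/2}(\mathbf{x}_{it}\boldsymbol{\beta} + \psi + \bar{\mathbf{x}}_i\boldsymbol{\xi})\},$$
is still straightforward. For continuous $x_{tj}$,
$$\widehat{APE}_j(\mathbf{x}_t) = \hat{\beta}_j N^{-1} \sum_{i=1}^{N} [1 + \hat{\sigma}_a^2 \exp(\bar{\mathbf{x}}_i\hat{\boldsymbol{\lambda}})]^{-1/2} \phi\{[1 + \hat{\sigma}_a^2 \exp(\bar{\mathbf{x}}_i\hat{\boldsymbol{\lambda}})]^{-1/2}(\mathbf{x}_t\hat{\boldsymbol{\beta}} + \hat{\psi} + \bar{\mathbf{x}}_i\hat{\boldsymbol{\xi}})\}$$
∙ See Wooldridge (2010, Chapter 15).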
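The heteroskedastic APE formula can be sketched in Python as follows. This is an illustration that treats the parameter estimates as given; each unit is summarized by two precomputed scalars, $\hat{\psi} + \bar{\mathbf{x}}_i\hat{\boldsymbol{\xi}}$ and $\bar{\mathbf{x}}_i\hat{\boldsymbol{\lambda}}$, and all names and numbers are hypothetical.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def het_scale(sigma_a2, h):
    """Unit-specific scale factor [1 + sigma_a^2 * exp(h)]^(-1/2), with h = xbar_i * lambda."""
    return 1.0 / math.sqrt(1.0 + sigma_a2 * math.exp(h))

def het_asf(xt_index, units, sigma_a2):
    """Average response probability; units is a list of (m_i, h_i) pairs."""
    return sum(
        norm_cdf(het_scale(sigma_a2, h) * (xt_index + m)) for m, h in units
    ) / len(units)

def het_ape(beta_j, xt_index, units, sigma_a2):
    """APE of a continuous x_tj: beta_j times the average of s_i * phi(s_i * index_i)."""
    total = 0.0
    for m, h in units:
        s = het_scale(sigma_a2, h)
        total += s * norm_pdf(s * (xt_index + m))
    return beta_j * total / len(units)

# Hypothetical values: m_i = psi + xbar_i*xi, h_i = xbar_i*lambda.
units = [(-0.3, 0.2), (0.1, -0.1), (0.6, 0.4)]
print(het_ape(0.5, 0.25, units, sigma_a2=0.8))
```

Setting every `h` to zero gives a common scale factor $(1 + \sigma_a^2)^{-1/2}$ and returns the homoskedastic APE.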
6. CRE Tobit Model
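∙ Unobserved effects Tobit model for a corner at zero is
$$y_{it} = \max(0, \mathbf{x}_{it}\boldsymbol{\beta} + c_i + u_{it}), \quad D(u_{it} \mid \mathbf{x}_{it}, c_i) = \text{Normal}(0, \sigma_u^2)$$
∙ Strict exogeneity conditional on $c_i$:
$$D(u_{it} \mid \mathbf{x}_i, c_i) = D(u_{it} \mid \mathbf{x}_{it}, c_i)$$
∙ Conditional independence: The $\{u_{it} : t = 1, \dots, T\}$ are independent.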
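∙ Model for $D(c_i \mid \mathbf{x}_i)$:
$$c_i = \psi + \bar{\mathbf{x}}_i\boldsymbol{\xi} + a_i, \quad a_i \mid \mathbf{x}_i \sim \text{Normal}(0, \sigma_a^2).$$
∙ Joint MLE (conditional on $\mathbf{x}_i$) is relatively straightforward. It is based on the joint distribution $D(y_{i1}, \dots, y_{iT} \mid \mathbf{x}_i)$.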
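∙ In Stata, called xttobit with the re option:
xttobit y x1 x2 ... xK x1bar ... xKbar, ll(0) re
∙ As in the probit case, we can estimate $\mu_c$ and $\sigma_c^2$:
$$\hat{\mu}_c = \hat{\psi} + \bar{\mathbf{x}}\hat{\boldsymbol{\xi}}$$
$$\hat{\sigma}_c^2 = \hat{\boldsymbol{\xi}}' \left[ N^{-1} \sum_{i=1}^{N} (\bar{\mathbf{x}}_i - \bar{\mathbf{x}})'(\bar{\mathbf{x}}_i - \bar{\mathbf{x}}) \right] \hat{\boldsymbol{\xi}} + \hat{\sigma}_a^2$$
∙ Same estimate of heterogeneity distribution works, too.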
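∙ We can evaluate the partial effects of the Tobit function, $m(\mathbf{x}_t\boldsymbol{\beta} + c, \sigma_u^2)$, at different values of $c$, including $\hat{\mu}_c$ and $\hat{\mu}_c \pm k\hat{\sigma}_c$.
∙ Take derivatives or changes with respect to $\mathbf{x}_t$. For a continuous variable,
$$\beta_j \Phi[(\mathbf{x}_t\boldsymbol{\beta} + c)/\sigma_u]$$
∙ APEs can be estimated from the mean function for the Tobit:
$$\widehat{ASF}(\mathbf{x}_t) = N^{-1} \sum_{i=1}^{N} m(\mathbf{x}_t\hat{\boldsymbol{\beta}} + \hat{\psi} + \bar{\mathbf{x}}_i\hat{\boldsymbol{\xi}},\ \hat{\sigma}_a^2 + \hat{\sigma}_u^2)$$
where $m(z, \sigma^2)$ is the mean function for a Tobit.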
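∙ Take derivatives and differences with respect to elements of $\mathbf{x}_t$. Panel bootstrap for inference.
∙ For a continuous $x_{tj}$,
$$\widehat{APE}_j(\mathbf{x}_t) = \hat{\beta}_j N^{-1} \sum_{i=1}^{N} \Phi[(\mathbf{x}_t\hat{\boldsymbol{\beta}} + \hat{\psi} + \bar{\mathbf{x}}_i\hat{\boldsymbol{\xi}})/(\hat{\sigma}_a^2 + \hat{\sigma}_u^2)^{1/2}]$$
∙ To estimate the APEs it suffices to estimate the variance of the composite error, $\sigma_v^2 = \sigma_a^2 + \sigma_u^2$.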
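The Tobit mean function and the APE formula above can be sketched in Python as follows. The expression used for $m(z, \sigma^2)$ is the standard mean of $\max(0, z + e)$ with $e \sim \text{Normal}(0, \sigma^2)$; the parameter values and the per-unit scalars $\hat{\psi} + \bar{\mathbf{x}}_i\hat{\boldsymbol{\xi}}$ are hypothetical.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tobit_mean(z, sigma):
    """E[max(0, z + e)] for e ~ Normal(0, sigma^2)."""
    return norm_cdf(z / sigma) * z + sigma * norm_pdf(z / sigma)

def tobit_asf(xt_index, mundlak_terms, sigma_v):
    """Average of the Tobit mean function across units, using the composite sd sigma_v."""
    return sum(tobit_mean(xt_index + m, sigma_v) for m in mundlak_terms) / len(mundlak_terms)

def tobit_ape(beta_j, xt_index, mundlak_terms, sigma_v):
    """APE of a continuous x_tj: beta_j times the average of Phi(index_i / sigma_v)."""
    return beta_j * sum(
        norm_cdf((xt_index + m) / sigma_v) for m in mundlak_terms
    ) / len(mundlak_terms)

# Hypothetical estimates: xt_index = x_t * beta, mundlak_terms = psi + xbar_i * xi.
mundlak_terms = [-0.5, 0.2, 1.1]
sigma_v = math.sqrt(0.6 + 1.4)   # sqrt(sigma_a^2 + sigma_u^2)
print(tobit_asf(0.3, mundlak_terms, sigma_v))
print(tobit_ape(0.8, 0.3, mundlak_terms, sigma_v))
```

Only the composite standard deviation enters both functions, in line with the point that $\sigma_v^2$ is all that is needed for the APEs.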
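∙ If we drop the conditional independence assumption and allow any serial dependence in $u_{it}$, then we only have the marginal distributions
$$D(y_{it} \mid \mathbf{x}_i) = D(y_{it} \mid \mathbf{x}_{it}, \bar{\mathbf{x}}_i) = \text{Tobit}(\mathbf{x}_{it}\boldsymbol{\beta} + \psi + \bar{\mathbf{x}}_i\boldsymbol{\xi},\ \sigma_v^2)$$
∙ So, we can apply pooled Tobit, ignoring the serial correlation, to estimate $\boldsymbol{\beta}$, $\psi$, $\boldsymbol{\xi}$, and $\sigma_v^2$.
∙ We use the previous formula for the APEs. We cannot estimate PEAs because $E(c_i)$ is not identified; neither is … nor $\sigma_u^2$.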
. use psid80_92
. des hours nwifeinc exper ch0_2 ch3_5 ch6_17
              storage   display    value
variable name   type    format     label      variable label
------------------------------------------------------------------
hours           int     %9.0g                 annual work hours
nwifeinc        float   %9.0g                 (faminc - wife’s labor income)/1000
exper           float   %9.0g                 years of workforce experience
ch0_2           byte    %9.0g                 number of children in FU, 0-2
ch3_5           byte    %9.0g                 number of children in FU, 3-5
ch6_17          byte    %9.0g                 number of children in FU, 6-17
. sum hours nwifeinc exper ch0_2 ch3_5 ch6_17
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------