Limited Dependent Variable Models: Tobit
CDS M Phil Econometrics
Vijayamohanan Pillai N
3-Mar-14
Introduction
Limited Dependent Variable Models: Truncation and Censoring
Maddala, G. 1983. Limited Dependent and Qualitative Variables in Econometrics. Cambridge University Press.
Truncation
A truncated distribution is the part of an untruncated distribution that is above or below some specified value.
If a continuous random variable x has pdf f(x) and a is a constant, then the density of the truncated RV is
$$f(x \mid x > a) = \frac{f(x)}{\operatorname{Prob}(x > a)}$$
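The truncated density above can be checked numerically. The following is a minimal sketch for the standard normal case, using only the Python standard library; the function names are illustrative and not part of the original slides.

```python
import math

def normal_pdf(x):
    """Standard normal density f(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def truncated_pdf(x, a):
    """Density of a standard normal truncated from below at a: f(x) / Prob(x > a)."""
    if x <= a:
        return 0.0  # no density at or below the truncation point
    return normal_pdf(x) / (1.0 - normal_cdf(a))

# Truncated density evaluated at x = 1 for the three truncation points on the slide.
densities = {a: truncated_pdf(1.0, a) for a in (-0.5, 0.0, 0.5)}
```

Dividing by Prob(x > a) rescales the remaining part of the density so that it again integrates to one; the higher the truncation point, the larger the rescaling.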
[Figure: Truncated standard normal distribution for a = –0.5, 0, and 0.5]
Truncation
Truncation occurs when some observations on both the dependent variable and the regressors are lost.
For example, income may be the dependent variable and only low-income people are included in the sample.
In effect, truncation occurs when the sample data are drawn from a subset of a larger population.
Censoring
Censoring occurs when the value of an observation is only partially known.
One of the earliest attempts to analyse a statistical problem involving censored data was Daniel Bernoulli's 1766 analysis of smallpox morbidity and mortality data to demonstrate the efficacy of inoculation.
Censored Regression Model
Censoring occurs when data on the dependent variable are lost (or limited) but not data on the regressors.
When the dependent variable is censored, values in a certain range are all transformed to (or reported as) a single value.
Censored Regression Model
For example, people of all income levels may be included in the sample, but for some reason the income of high-income people may be top-coded as, say, Rs100,000.
A defect in the sample
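The difference between the two sample defects can be made concrete with a small sketch. The incomes and education values below are hypothetical, invented purely for illustration; only the Rs100,000 top-code comes from the example above.

```python
CAP = 100_000  # top-coding threshold in rupees, as in the example above

# Hypothetical people: one regressor (years of education) and income in rupees.
people = [
    {"education": 8, "income": 30_000},
    {"education": 10, "income": 60_000},
    {"education": 12, "income": 95_000},
    {"education": 16, "income": 150_000},
    {"education": 18, "income": 400_000},
]

# Censoring: everyone stays in the sample, but incomes above the cap are reported as the cap.
censored = [
    {"education": p["education"], "income": min(p["income"], CAP)} for p in people
]

# Truncation: people above the cap are dropped entirely, regressors included.
truncated = [p for p in people if p["income"] <= CAP]
```

In the censored sample the regressors of the two high-income people are still available; in the truncated sample those two observations are missing altogether.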
An Example
A labor supply model estimates the relationship between hours worked by employees and characteristics of employees such as age, education and family status.
For people who are unemployed, it is not possible to observe the number of hours they would have worked had they had employment.
Still we know age, education and family status for those observations.
Another Example
Suppose we are interested in finding out the amount of money a household (HH) spends on a house in relation to socio-economic variables.
Many HHs may not have purchased a house: expenditure is zero for them.
Another Example
• Suppose we are interested in studying how much an individual desired to give to charity.
• For many people the amount we observe is zero, i.e. they give nothing to charity.
• For others, we observe the actual amount they contributed.
Censored & Truncated Regression Model
A censored regression model applies where only the value of the dependent variable (hours of work, for example) is unknown, while the values of the independent variables (age, education, family status) are still available.
Tobit Model
Censored regression or truncated regression?
The original Tobit model was suggested by James Tobin (1918–2002).
Some examples in the empirical literature
Analyze a dependent variable that is zero for a significant fraction of the observations.
Tobit Model
• The structural equation in the Tobit model is:
$$y_i^* = x_i \beta + u_i$$
• where $u_i \sim N(0, \sigma^2)$.
• $y^*$ is a latent variable that is observed for values greater than $\tau$ and censored otherwise.
Tobit Model
• The observed y is defined by the following measurement equation:
$$y_i = \begin{cases} y_i^* & \text{if } y_i^* > \tau \\ \tau_y & \text{if } y_i^* \le \tau \end{cases}$$
• In the typical Tobit model, we assume that $\tau = 0$, i.e. the data are censored at 0.
• Thus, we have
$$y_i = \begin{cases} y_i^* & \text{if } y_i^* > 0 \\ 0 & \text{if } y_i^* \le 0 \end{cases}$$
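The structural and measurement equations can be read as a data-generating process. The sketch below assumes a single regressor with no intercept and takes the reported censoring value to equal τ; both are simplifying assumptions for illustration, and the parameter values are hypothetical.

```python
import random

def observe(y_star, tau=0.0):
    """Measurement equation: report y* when it exceeds tau, otherwise report tau."""
    return y_star if y_star > tau else tau

def simulate_tobit(xs, beta, sigma, rng):
    """Draw y* = x * beta + u with u ~ N(0, sigma^2) and return the observed y."""
    return [observe(x * beta + rng.gauss(0.0, sigma)) for x in xs]

rng = random.Random(0)  # fixed seed, for reproducibility only
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = simulate_tobit(xs, beta=1.5, sigma=1.0, rng=rng)  # hypothetical beta and sigma
```

Every latent value at or below zero is recorded as exactly zero, which is what produces the pile-up of zeros typical of Tobit data.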
Tobit Model
This model contains a Probit model for $y_i$ being zero or positive and a standard Regression model for the positive values of $y_i$.
$$y_i^* = x_i \beta + u_i, \qquad u_i \sim N(0, \sigma^2)$$
$$y_i = \begin{cases} y_i^* & \text{if } y_i^* > 0 \\ 0 & \text{if } y_i^* \le 0 \end{cases}$$
Tobit Model
The Probit model may, for example, describe the influence of explanatory variables on the decision whether or not to donate to charity, while the Regression model measures the effect of the explanatory variables on the size of the amount for donating individuals.
Tobit Model
• Why Use the Tobit Model?
• Why not just use the observations for which y > 0 and estimate the model using OLS?
• The answer:
• if you do, your parameter estimates will be biased and inconsistent.
• The degree of bias will also increase as the number of observations that take on the value of zero increases.
Why Use the Tobit Model?
$$y_i^* = x_i \beta + u_i, \qquad u_i \sim N(0, \sigma^2)$$
$$y_i = \begin{cases} y_i^* & \text{if } y_i^* > 0 \\ 0 & \text{if } y_i^* \le 0 \end{cases}$$
Neglecting the truncation can lead to biased estimates of α and β:
$$E[y_i \mid y_i > 0, x_i] = x_i \beta + \sigma \frac{\phi(x_i \beta / \sigma)}{\Phi(x_i \beta / \sigma)}$$
Why Use the Tobit Model?
$$E[y_i \mid y_i > 0, x_i] = x_i \beta + \sigma \frac{\phi(x_i \beta / \sigma)}{\Phi(x_i \beta / \sigma)}$$
The last term on the RHS [σλ(α)] is the inverse Mills ratio / hazard function for the std N distribution, scaled by σ:
E[y | truncation] = µ + σλ(α)
φ is the standard normal pdf and Φ the standard normal cdf; Φ(x_iβ/σ) = p(y_i > 0).
Inverse Mills Ratio
Named after John P. Mills.
The ratio of the probability density function to the cumulative distribution function of a distribution.
If x is a random variable distributed normally with mean µ and variance σ², then
$$E[x \mid x > \alpha] = \mu + \sigma \frac{\phi\left(\frac{\alpha - \mu}{\sigma}\right)}{1 - \Phi\left(\frac{\alpha - \mu}{\sigma}\right)} = \mu + \sigma \frac{\phi(z)}{\Phi(z)}, \qquad z = \frac{\mu - \alpha}{\sigma}$$
where α is a constant, φ denotes the standard normal pdf, and Φ is the standard normal cdf.
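As a rough numerical illustration of the truncated-mean formula, the sketch below computes µ + σ φ(z)/Φ(z) with z = (µ − α)/σ, using only the Python standard library. The function names are illustrative choices, not notation from the slides.

```python
import math

def phi(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal cdf, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def inverse_mills(z):
    """Inverse Mills ratio phi(z) / Phi(z)."""
    return phi(z) / Phi(z)

def truncated_mean(mu, sigma, alpha):
    """E[x | x > alpha] for x ~ N(mu, sigma^2)."""
    z = (mu - alpha) / sigma
    return mu + sigma * inverse_mills(z)

# Mean of a standard normal truncated from below at zero.
half_normal_mean = truncated_mean(0.0, 1.0, 0.0)
```

Because the inverse Mills ratio is always positive, the mean of the part of the distribution above α always exceeds µ; in the Tobit setting this is the extra term that OLS on the positive observations ignores.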
Why Use the Tobit Model?
• Consider, for example, the amount a person gives to charity.
• Suppose the true relationship between the amount a person wants to give to charity and that person's income is as shown on the next slide.
Why Use the Tobit Model?
The lower-income people would actually like to give negative amounts (i.e. get money back!).
The red line indicates the true regression line for the relationship between income and donations.
Why Use the Tobit Model?
In reality, we do not observe individuals making negative contributions.
What we observe is that they give nothing.
The observed data look like this:
Why Use the Tobit Model?
If we simply estimated the model by OLS, the parameter estimates would be biased downwards.
OLS tends to underestimate the magnitude of the slope.
[Figure: true relationship and OLS regression line]
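The downward bias can be illustrated with a small simulation. This is only a sketch under assumed values: the intercept, slope, error variance, income range and sample size below are hypothetical, chosen so that a sizeable share of desired donations is negative.

```python
import random

def ols_slope(xs, ys):
    """Slope of a simple OLS regression of ys on xs (with intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(0)                # fixed seed, for reproducibility only
ALPHA, BETA, SIGMA = -4.0, 1.0, 1.0   # hypothetical "true" parameters

income = [rng.uniform(0.0, 10.0) for _ in range(5000)]
desired = [ALPHA + BETA * x + rng.gauss(0.0, SIGMA) for x in income]  # latent y*
observed = [max(d, 0.0) for d in desired]                             # censored at zero

slope_latent = ols_slope(income, desired)     # infeasible benchmark using y*
slope_censored = ols_slope(income, observed)  # OLS on all observations, zeros included

# OLS using only the observations with positive donations.
positive = [(x, y) for x, y in zip(income, observed) if y > 0.0]
slope_positive = ols_slope([x for x, _ in positive], [y for _, y in positive])
```

In this setup both feasible OLS slopes fall below the slope obtained from the latent variable, whether the zeros are kept or dropped.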
Interpreting the estimated Tobit coefficients is a bit more complex than interpreting estimated coefficients from the OLS model.
In particular, the estimated coefficients represent the marginal effect of x on the latent variable y*, not on the observed variable y.
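For comparison, the marginal effect on the observed variable can be sketched as follows. Under the Tobit assumptions with censoring at zero, the expected observed value is E[y | x] = Φ(xβ/σ)·xβ + σφ(xβ/σ), and its derivative with respect to x is βΦ(xβ/σ). The code assumes a single regressor without intercept, and the function names are illustrative.

```python
import math

def phi(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal cdf, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_observed(x, beta, sigma):
    """E[y | x] for the observed y, which is censored at zero."""
    z = x * beta / sigma
    return Phi(z) * x * beta + sigma * phi(z)

def marginal_effect_observed(x, beta, sigma):
    """Derivative of E[y | x] with respect to x: beta scaled by Prob(y > 0)."""
    return beta * Phi(x * beta / sigma)

# Effect on observed y at x = 0.5 for hypothetical beta = 2 and sigma = 1.
effect = marginal_effect_observed(0.5, beta=2.0, sigma=1.0)
```

Since Φ lies between 0 and 1, the effect on the observed y is smaller in magnitude than β and approaches β only where censoring is unlikely.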
Estimation
Method of maximum likelihood
Olsen's (1978) reparameterization simplifies ML estimation.
James Heckman has proposed a simple alternative to the ML method:
J. J. Heckman, "Sample Selection Bias as a Specification Error," Econometrica, vol. 47, 1979, pp. 153–161.
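To make the maximum likelihood idea concrete, here is a minimal sketch of the Tobit log-likelihood for censoring at zero: uncensored observations contribute the normal density of y − xβ, and censored observations contribute Prob(y* ≤ 0) = Φ(−xβ/σ). It assumes one regressor without intercept, holds σ at its true value, and uses a crude grid search in place of a proper optimizer; none of these choices comes from the slides, and Olsen's reparameterization is not implemented here.

```python
import math
import random

def normal_cdf(z):
    """Standard normal cdf, via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def tobit_loglik(beta, sigma, xs, ys):
    """Tobit log-likelihood with censoring at zero (single regressor, no intercept)."""
    total = 0.0
    for x, y in zip(xs, ys):
        index = x * beta
        if y > 0.0:
            # Uncensored: log of the N(index, sigma^2) density evaluated at y.
            resid = (y - index) / sigma
            total += -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * resid * resid
        else:
            # Censored: log Prob(y* <= 0) = log Phi(-index / sigma).
            total += math.log(normal_cdf(-index / sigma))
    return total

rng = random.Random(0)             # fixed seed, for reproducibility only
TRUE_BETA, TRUE_SIGMA = 1.0, 1.0   # hypothetical parameters
xs = [rng.uniform(-2.0, 2.0) for _ in range(2000)]
ys = [max(x * TRUE_BETA + rng.gauss(0.0, TRUE_SIGMA), 0.0) for x in xs]

# Crude grid search over beta, with sigma held at its true value for brevity.
grid = [0.5 + 0.05 * k for k in range(21)]
beta_hat = max(grid, key=lambda b: tobit_loglik(b, TRUE_SIGMA, xs, ys))
```

In practice β and σ would be estimated jointly with a numerical optimizer; the grid search only shows that the likelihood peaks near the value used to generate the data.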
Heckman Alternative
Consists of a two-step estimating procedure:
Step 1: estimate the probability of, say, a consumer owning a house, on the basis of the probit model.
Heckman Alternative
Step 2: estimate the model by adding to it the inverse Mills ratio (IMR), or hazard rate, that is derived from the probit estimate.
$$y_i^* = x_i \beta + u_i$$
$$y_i = \begin{cases} y_i^* & \text{if } y_i^* > 0 \\ 0 & \text{if } y_i^* \le 0 \end{cases}$$
$$E[y_i \mid y_i > 0, x_i] = x_i \beta + \sigma \underbrace{\frac{\phi(x_i \beta / \sigma)}{\Phi(x_i \beta / \sigma)}}_{\text{IMR}}$$
Heckman Alternative
The Heckman procedure yields consistent estimates of the parameters, but they are not as efficient as the ML estimates.
An Example of Tobit model:
[Figure: scatter plot of Extramarital (vertical axis, 0 to 15) against Education (horizontal axis, 10 to 20)]
451 observations lie along the horizontal axis ⇒ a censored sample ⇒ a tobit model may be appropriate.