CDS M Phil Econometrics
Vijayamohanan Pillai N
3 March 2014

Dummy Variable Models

Dummy X-variables
Dummy Y-variables

Dummy X-variables

Dummy variable: a variable assuming the values 0 and 1 to indicate some attribute.
Used to classify data into mutually exclusive categories.
Also called: indicator variable, binary variable, dichotomous variable, categorical variable, qualitative variable.

$Y_i = \alpha + \beta D_i + u_i$
$Y_i$ = wage rate of an agricultural labourer
$D_i$ = 1 if male worker; 0 otherwise.

Mean wage of a male agricultural worker?
$E(Y_i \mid D_i = 1) = \alpha + \beta$
Mean wage of a female agricultural worker?
$E(Y_i \mid D_i = 0) = \alpha$
$H_0$: no sex discrimination $\Rightarrow$ $H_0$: $\beta = 0$.
Analysis of Variance (ANOVA) model: a regression on the dummy alone; testing $\beta = 0$ is a mean difference test.
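A minimal sketch of the ANOVA interpretation, assuming simulated wages (the sample size, wage levels and noise are made up for illustration and are not from the lecture). It regresses the wage on a constant and the male dummy and compares the estimates with the two group means.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: D = 1 for male workers, 0 otherwise; wages are simulated.
n = 200
D = rng.integers(0, 2, size=n)
wage = 100.0 + 25.0 * D + rng.normal(0.0, 10.0, size=n)

# OLS of wage on a constant and the dummy: Y_i = alpha + beta * D_i + u_i
X = np.column_stack([np.ones(n), D])
alpha_hat, beta_hat = np.linalg.lstsq(X, wage, rcond=None)[0]

# Group means computed directly from the data
mean_female = wage[D == 0].mean()
mean_male = wage[D == 1].mean()

print(f"alpha_hat            = {alpha_hat:.3f}, female mean = {mean_female:.3f}")
print(f"alpha_hat + beta_hat = {alpha_hat + beta_hat:.3f}, male mean   = {mean_male:.3f}")
```

The intercept estimate equals the mean wage of the $D_i = 0$ group and the slope estimate equals the difference between the two group means, which is why a test of $\beta = 0$ is a mean difference test.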
In economic applications, control for other socio-economic factors: caste, nature of work, experience, ...
With both quantitative and qualitative variables: Analysis of Covariance (ANCOVA) model.
$Y_i = \alpha_0 + \alpha_1 D_i + \beta X_i + u_i$
$D_i$ = 1 if male worker; = 0 otherwise.

Mean wage of a female agricultural worker?
$E(Y_i \mid D_i = 0, X_i) = \alpha_0 + \beta X_i$

Mean wage of a male agricultural worker?
$E(Y_i \mid D_i = 1, X_i) = (\alpha_0 + \alpha_1) + \beta X_i$

Interaction term
[Figure: agricultural wage rate plotted against $X$. Labels on the plot: $\alpha_0$, $\alpha_1$, $(\alpha_0 + \alpha_1)$, $\beta_1$, $\beta_1 + \beta_2$.]

Dummy Y-variables

Discrete Choice Models
There are many situations in which the dependent variable is not a continuous variable but a discrete, or qualitative, one.
In general, two types:
1. dependent variables which take one of two values (binary/dichotomous choice), and
2. dependent variables which can take more than two values but are finite (polychotomous; multiple choice).
[Diagram: Binary Choice Model. Individual i either chooses A or does not choose A. Examples shown: "to be" or "not to be"; travel by car or not.]
[Diagram: Multinomial Choice Model. Individual i chooses among J alternatives: 1 walk, 2 cycle, 3 car, ..., bus, train.]
Examples
• Labour force participation:
- occupational choice (multiple choice)
- employed or unemployed (binary choice)
- to be employed full-time, part-time or ...

- to vote or not to vote (binary choice)
- to vote Congress, BJP, Communists, or ...
We focus on single-equation binary outcomes: $y_i \in \{0, 1\}$.
A fundamental difference between a quantitative response model and a qualitative response model: the latter is a probability model.

Linear Probability Model

In general:
Prob(event $j$ occurs) $= P(Y = j) = f(\text{relevant variables}; \text{parameters}) = f(x_i, \beta)$,
where $x_i = [x_{i1}, \ldots, x_{ik}]$ are the variables and $\beta$ is a vector of parameters.

Given $y_i \in \{0, 1\}$, $y_i$ follows a Bernoulli probability distribution:
$P(y_i = 1 \mid x_i) = P_i = f(x_i, \beta)$;
$P(y_i = 0 \mid x_i) = 1 - P_i = 1 - f(x_i, \beta)$.
How do we specify $f(x_i, \beta)$?
• Linear Probability Model
An obvious choice is the familiar least squares procedure:
$f(x_i, \beta) = x_i \beta \;\Rightarrow\; y_i = x_i \beta + u_i$
This leads to the linear probability model (LPM).
Since $y_i$ takes only the values 0 and 1,
$E(y_i \mid x_i) = 0 \cdot P(y_i = 0 \mid x_i) + 1 \cdot P(y_i = 1 \mid x_i)$
Assuming $E(u) = 0$, it follows that
$E(y_i \mid x_i) = P(y_i = 1 \mid x_i) = x_i \beta$
Conditional expectation = conditional probability.
The regression equation describes the probability that $y_i = 1$ given information on $x_i$.
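As a rough illustration of the LPM, the sketch below fits ordinary least squares to a simulated 0/1 outcome. The data-generating probabilities ($0.2 + 0.1x$), the sample size and the variable names are assumptions chosen for the example, not part of the lecture.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data (illustrative only): true P(y = 1 | x) = 0.2 + 0.1 * x
n = 5000
x = rng.uniform(0.0, 5.0, size=n)
p_true = 0.2 + 0.1 * x                      # stays inside [0.2, 0.7]
y = (rng.uniform(size=n) < p_true).astype(float)

# LPM: regress the 0/1 outcome on a constant and x by least squares
X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]

# Fitted values are read as estimated probabilities P(y = 1 | x)
p_hat = X @ b
print("estimated intercept and slope:", np.round(b, 3))
print("fitted probability at x = 2.5:", round(float(b[0] + b[1] * 2.5), 3))
```

Because the true probability is linear in $x$ here and stays inside $[0, 1]$, the least squares coefficients land close to 0.2 and 0.1; nothing in the procedure itself enforces that range.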
The case of a single explanatory variable:
$p_i = P(y_i = 1 \mid X_i) = \beta_1 + \beta_2 X_i$
The probability of the event occurring, $p$, is assumed to be a linear function of the variable $X$.
[Figure: straight line plotted against $X$ with intercept $\beta_1$ and the value $\beta_1 + \beta_2 X_k$ marked at a point $X_k$; vertical axis labelled $y$, $P$, with the levels 0 and 1 marked.]
Now an example ...
[Descriptive statistics table; one row recovered, column headers lost: "group 601 1 5 3.932 1.103 -0.836 0.100 -0.204 0.199"]
Interpretation?

LPM – Ray Fair Model
Dependent variable: Extramarital affairs

                   Unstandardized          Standardized
                   B         Std. Error    Beta          t         Sig.
(Constant)          0.736    0.152                        4.859    0.000
Sex                 0.045    0.040          0.052         1.129    0.259
Age                -0.007    0.003         -0.159        -2.463    0.014
Married years       0.016    0.005          0.206         2.911    0.004
Have children       0.054    0.047          0.057         1.168    0.243
Religiosity        -0.054    0.015         -0.145        -3.608    0.000
Education           0.003    0.009          0.017         0.360    0.719
Occupation          0.006    0.012          0.025         0.499    0.618
Marriage rating    -0.087    0.016         -0.223        -5.472    0.000
Interpretation ?

LPM – Ray Fair Model: Predicted Probabilities

Sex  Age  Years married  Children  Religious  Education  Occupation  Marriage rating  Pred. prob.
1    57   15             1         1          20         7           1                 0.614
0    57   15             1         1          20         7           1                 0.569
1    57   15             1         1          20         7           5                 0.265
0    57   15             1         1          20         7           5                 0.219
1    27   4              0         5          9          1           5                -0.027
0    27   4              0         5          9          1           5                -0.072

Negative probability!
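The predicted probabilities are just the fitted linear combination of the regressors. The sketch below recomputes two of them from the unstandardized coefficients of the LPM table; because those coefficients are rounded to three decimals, the results are expected to differ slightly from the values on the slide, which presumably come from unrounded estimates. The function name `lpm_predict` is a hypothetical helper introduced here.

```python
import numpy as np

# Unstandardized coefficients (B) from the LPM table, rounded to 3 decimals.
# Order: constant, sex, age, years married, children, religiosity,
#        education, occupation, marriage rating.
coef = np.array([0.736, 0.045, -0.007, 0.016, 0.054, -0.054, 0.003, 0.006, -0.087])

def lpm_predict(sex, age, years_married, children, religiosity,
                education, occupation, rating):
    """Fitted LPM value x'b for one profile (hypothetical helper)."""
    x = np.array([1.0, sex, age, years_married, children, religiosity,
                  education, occupation, rating])
    return float(x @ coef)

p_a = lpm_predict(1, 57, 15, 1, 1, 20, 7, 1)   # first row of the table
p_b = lpm_predict(1, 27, 4, 0, 5, 9, 1, 5)     # a row with a negative prediction

print(f"profile A: {p_a:.3f}")
print(f"profile B: {p_b:.3f}")
```

With the rounded coefficients the second profile still yields a negative fitted value, so the sign of the problem does not depend on the rounding.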

The Linear Probability Model: some serious shortcomings

(i) The distribution of the disturbance is non-normal.
As $y$ can take only one of two values, the error term also has a discrete (non-normal) distribution.
With $u_i = y_i - x_i \beta$, the probability distribution of $u$ is:
$y_i = 1 \Rightarrow u_i = 1 - x_i \beta$, with probability $P_i$
$y_i = 0 \Rightarrow u_i = -x_i \beta$, with probability $(1 - P_i)$
In effect, $u$ follows a two-point distribution (a shifted Bernoulli).
[Figures for the LPM – Ray Fair Model: Normal P-P Plot of Regression Standardized Residuals; Histogram.]
(ii) The error term is heteroskedastic:
$\mathrm{Var}(u_i) = (x_i \beta)(1 - x_i \beta)$,
which clearly varies with the value of $x_i$.
These problems are not insurmountable:
• The problem of non-normality can be circumvented provided we have a large sample size (invoke the central limit theorem).
• The problem of heteroskedasticity can be handled by using White's heteroskedasticity-robust standard errors.
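A minimal sketch of White's correction for an LPM, assuming simulated data (the true probabilities $0.05 + 0.18x$ and the sample size are made up). It computes the conventional OLS standard errors and the basic White estimator, often labelled HC0, $(X'X)^{-1} X' \mathrm{diag}(e_i^2) X (X'X)^{-1}$, directly with NumPy.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated LPM data (illustrative): Var(u | x) = p(1 - p) changes with x
n = 2000
x = rng.uniform(0.0, 5.0, size=n)
p = 0.05 + 0.18 * x                          # stays inside [0.05, 0.95]
y = (rng.uniform(size=n) < p).astype(float)

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                        # OLS coefficients
e = y - X @ b                                # residuals

# Conventional standard errors (assume a constant error variance)
sigma2 = e @ e / (n - X.shape[1])
se_ols = np.sqrt(np.diag(sigma2 * XtX_inv))

# White (HC0) standard errors: sandwich with squared residuals in the middle
meat = X.T @ (X * (e ** 2)[:, None])
V_white = XtX_inv @ meat @ XtX_inv
se_white = np.sqrt(np.diag(V_white))

print("coefficients :", np.round(b, 4))
print("OLS s.e.     :", np.round(se_ols, 4))
print("White s.e.   :", np.round(se_white, 4))
```

The coefficient estimates are unchanged; only the standard errors, and hence the t-ratios, are adjusted for the non-constant variance.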
(iii) The main problem is the non-fulfilment of $0 \le E(Y_i) \le 1$.
There is no guarantee that the predicted values of $Y$ will all lie between 0 and 1.

Negative probability!

Sex  Age  Years married  Children  Religious  Education  Occupation  Marriage rating  Pred. prob.
1    27   4              0         5          9          1           5                -0.027
0    27   4              0         5          9          1           5                -0.072

What we require therefore is a way of constraining the LPM so that the predicted probabilities do lie in the [0, 1] range.
In general we use alternative estimation models to do this.

The Solution
The usual way of avoiding this problem is to hypothesize that the probability is a sigmoid (S-shaped) function of $Z$, $F(Z)$, where $Z$ is a function of the explanatory variables.
Several mathematical functions are sigmoid in character.
[Figure: sigmoid curve; vertical axis from 0.00 to 1.00, horizontal axis from -8 to 6.]

Alternatives to the Linear Probability Model

Underlying probability distributions for binary choice:
- Normal: PROBIT, natural for behavior
- Logistic: LOGIT, allows "thicker tails"
- Gompertz: asymmetric, underlies the basic logit model for multiple choice

The Logit Model
Several mathematical functions are sigmoid in character. One is the logistic function:
$F(Z) = \dfrac{1}{1 + e^{-Z}} = \dfrac{e^{Z}}{1 + e^{Z}}$, where $Z = x_i \beta$.
[Figure: $F(Z)$ plotted against $Z$; vertical axis from 0.00 to 1.00, horizontal axis from -8 to 6.]
With $p = F(Z) = 1/(1 + e^{-Z})$:
As $Z \to \infty$, $e^{-Z} \to 0$ and $p \to 1$ (but cannot exceed 1).
As $Z \to -\infty$, $e^{-Z} \to \infty$ and $p \to 0$ (but cannot be below 0).
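The bounds on the logistic function can be checked numerically. The sketch below evaluates $F(Z)$ on the grid used for the horizontal axis of the figure, and confirms that the two algebraic forms agree; the function name `logistic_cdf` is chosen here for illustration.

```python
import numpy as np

def logistic_cdf(z):
    """F(Z) = 1 / (1 + exp(-Z))."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

z = np.arange(-8, 7, 2)                  # -8, -6, ..., 6
p = logistic_cdf(z)
alt = np.exp(z) / (1.0 + np.exp(z))      # equivalent form e^Z / (1 + e^Z)

for zi, pi in zip(z, p):
    print(f"Z = {zi:3d}   F(Z) = {pi:.4f}")
```

The values rise monotonically, pass through 0.5 at $Z = 0$, and approach 0 and 1 in the tails without reaching them.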

[Figure: Normal distribution vs. Logistic distribution.]

Logistic distribution
The logistic distribution has density function
$f(z) = \dfrac{(1/b)\, e^{-(z-a)/b}}{\left(1 + e^{-(z-a)/b}\right)^{2}}$
where
$a$ is the mean of the distribution,
$b$ is the scale parameter,
$e$ is the base of the natural logarithm, Euler's $e$ (2.71...).
[Figure: logistic densities. Here $a = 0$; $b = 1$, 2, and 3.]
With $a = 0$ and $b = 1$, the logistic distribution has density function
$f(z) = \dfrac{e^{-z}}{(1 + e^{-z})^{2}}$, $-\infty < z < \infty$.
Integrating the pdf gives the distribution function:
$F(z) = \dfrac{1}{1 + e^{-z}}$, $-\infty < z < \infty$.
[Figure: $F(Z)$ plotted against $Z$. Here $a = 0$; $b = 1$, 2, and 3.]
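The link between the density and the distribution function can be verified numerically: the derivative of $F(z)$ should reproduce $f(z)$. The sketch below uses a central finite difference; the step size `h` and the grid are arbitrary choices made for the illustration.

```python
import numpy as np

def logistic_pdf(z):
    """f(z) = exp(-z) / (1 + exp(-z))^2, the standard logistic density (a = 0, b = 1)."""
    e = np.exp(-z)
    return e / (1.0 + e) ** 2

def logistic_cdf(z):
    """F(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-8.0, 6.0, 141)
h = 1e-5                                                    # finite-difference step
dF = (logistic_cdf(z + h) - logistic_cdf(z - h)) / (2 * h)  # numerical F'(z)

max_gap = float(np.max(np.abs(dF - logistic_pdf(z))))
print("largest |F'(z) - f(z)| on the grid:", max_gap)
```

The same check also shows the convenient identity $f(z) = F(z)\,[1 - F(z)]$, so the density peaks at $z = 0$ with value 0.25.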
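
The Logit Model: Odds Ratio
$P_i = E(y_i \mid X_i) = P(y_i = 1 \mid X_i) = F(Z) = \dfrac{1}{1 + e^{-Z}} = \dfrac{e^{Z}}{1 + e^{Z}}$, where $Z = x_i \beta$;
$1 - P_i = \dfrac{1}{1 + e^{Z}}$;
$\dfrac{P_i}{1 - P_i} = \dfrac{1 + e^{Z}}{1 + e^{-Z}} = e^{Z}$
This is the odds ratio (the odds in favour of $y_i = 1$).
[Figure: $F(Z)$ plotted against $Z$; vertical axis from 0.00 to 1.00, horizontal axis from -8 to 6.]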
Taking the log of the odds ratio,
$\ln\left(\dfrac{P_i}{1 - P_i}\right) = Z = x_i \beta$

Logit – Ray Fair Model

                   B         S.E.     Wald      Sig.     Exp(B)
Sex                 0.280    0.239     1.374    0.241    1.324
Age                -0.044    0.018     5.881    0.015    0.957
Married years       0.095    0.032     8.655    0.003    1.099
Have children       0.398    0.292     1.861    0.173    1.488
Religiosity        -0.325    0.090    13.089    0.000    0.723
Education           0.021    0.051     0.174    0.677    1.021
Occupation          0.031    0.072     0.186    0.667    1.031
Marriage rating    -0.468    0.091    26.555    0.000    0.626
Constant            1.377    0.888     2.407    0.121    3.964

Only 4 variables are significantly different from zero at $\alpha = 0.05$: Age, Married years, Religiosity and Marriage rating.
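To see how the logit keeps predictions inside the unit interval, the sketch below applies the logistic function to $Z = x_i\beta$ built from the rounded B column of the logit table, for the profile (sex 1, age 27, 4 years married, no children, religiosity 5, education 9, occupation 1, marriage rating 5) that produced a negative LPM prediction. The helper name `logit_predict` is hypothetical, and the rounded coefficients mean the result is only approximate.

```python
import numpy as np

# B column of the logit table (rounded to 3 decimals).
# Order: constant, sex, age, years married, children, religiosity,
#        education, occupation, marriage rating.
coef = np.array([1.377, 0.280, -0.044, 0.095, 0.398, -0.325, 0.021, 0.031, -0.468])

def logit_predict(profile):
    """Return (Z, P) with Z = x'b and P = 1 / (1 + exp(-Z)) (hypothetical helper)."""
    x = np.concatenate(([1.0], np.asarray(profile, dtype=float)))
    z = float(x @ coef)
    return z, 1.0 / (1.0 + np.exp(-z))

z_val, p_val = logit_predict([1, 27, 4, 0, 5, 9, 1, 5])
print(f"Z = {z_val:.3f}, predicted probability = {p_val:.4f}")
```

The index $Z$ is negative for this profile, but the implied probability is a small positive number rather than a negative one.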
If B > 0, OR > 1; if B < 0, OR < 1; if B = 0, the odds are unchanged.
$\dfrac{P_i}{1 - P_i} = e^{Z}$
Exp($B_i$) = odds ratio = factor by which the odds change when the $i$th independent variable increases by one unit.
e.g., when the number of years married increases by 1 unit, the odds of an affair are multiplied by 1.099, i.e. they increase by 9.9% (the log of the odds increases by 0.095), ceteris paribus.
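The Exp(B) column can be reproduced from the B column. The sketch below exponentiates the rounded coefficients of the logit table and converts each odds ratio into a percentage change in the odds; small differences from the reported Exp(B) values are expected because B is rounded to three decimals.

```python
import numpy as np

names = ["Sex", "Age", "Married years", "Have children",
         "Religiosity", "Education", "Occupation", "Marriage rating"]
# B and Exp(B) as reported in the logit table
B = np.array([0.280, -0.044, 0.095, 0.398, -0.325, 0.021, 0.031, -0.468])
exp_b_reported = np.array([1.324, 0.957, 1.099, 1.488, 0.723, 1.021, 1.031, 0.626])

odds_ratio = np.exp(B)                    # factor by which the odds change
pct_change = 100.0 * (odds_ratio - 1.0)   # percentage change in the odds

for name, b, o, pc in zip(names, B, odds_ratio, pct_change):
    print(f"{name:16s} B = {b:7.3f}   exp(B) = {o:.3f}   odds change = {pc:6.1f}%")
```

A negative B gives an odds ratio below 1, so the odds fall; for instance a one-point rise in the marriage rating is associated with a fall in the odds of roughly 37%, other things equal.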