
    The econometrics of inequality and poverty

    Lecture 6: Modeling the income distribution

    Michel Lubrano

    January 2010

Contents

1 Type of survey samples
  1.1 Random samples
  1.2 Using weights
  1.3 Stratified samples
  1.4 Two stage sampling
  1.5 Grouped data
  1.6 IID samples

2 Natural estimators and resampling methods
  2.1 The use of order statistics
  2.2 Jackknife and bootstrapping

3 Non parametric estimation of densities
  3.1 Histograms
  3.2 Kernel estimation

4 Sampling properties
  4.1 Assumptions and notations
  4.2 Bias and variance
  4.3 Approximations of the bias and the variance
  4.4 Determining the ideal window and kernel

5 Choice of the window
  5.1 Subjective choices
  5.2 Reference to a known distribution
  5.3 Cross validation on the likelihood
  5.4 Least squares cross validation
  5.5 Density estimation with weighted samples
  5.6 Using R

6 General estimation methods
  6.1 Inference for grouped data
  6.2 Inference for Pareto IID samples
  6.3 Maximum likelihood
  6.4 Graphical and regression methods
  6.5 Using R for Pareto fit
  6.6 Bayesian inference
  6.7 Inference for the Lognormal process
  6.8 Using R to compare Pareto and Lognormal

7 Using mixtures for IID samples
  7.1 Informal introduction
  7.2 Mixture of distributions
  7.3 Estimation procedures
  7.4 Difficulties of estimation
  7.5 Estimating mixture in R

    1 Type of survey samples

    Please read the first chapter of Deaton (1997).

The data we are interested in are survey data on households. Many types of information can be collected from households, such as unemployment, wages, education, or health status. Here we are mainly concerned with income and sometimes consumption. We have a finite population of size N and want to draw a sample of smaller size n from that population. How can we proceed? The design of a survey has to follow precise rules. We want to get information on a population, and it is too costly to interview the entire population every year. A census occurs at most every five years and gives information on the whole population. Its coverage is usually not complete: homeless people, armed forces, and so on are missed.

    1.1 Random samples

A survey has to be framed, which means that we have to know the size and composition of the true population. A census is useful to frame a survey; other administrative data can be used too. The census, for instance, provides a list of households to sample. Then we have to decide on the size n of the survey. The sample is then drawn at random. The sample mean is

$$\bar x = \frac{1}{n} \sum_{i=1}^{n} x_i$$


where $\bar x_s$ is the estimated mean for each stratum. In each stratum we can of course have a particular weighting scheme superimposed on the stratification. Stratification often improves the representativeness of the sample by reducing sampling error. It can produce a weighted mean that has less variability than the arithmetic mean of a simple random sample of the population. In fact

$$\mathrm{Var}(\bar x) = \sum_{s=1}^{S} \left(\frac{N_s}{N}\right)^2 \mathrm{Var}(\bar x_s)$$

because the strata are independent. It can be shown that this variance is lower than the variance of

$$\bar x_{srs} = \sum_{s=1}^{S} \frac{n_s}{n}\, \bar x_s.$$
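These two formulas are straightforward to code; a minimal R sketch, where the strata sizes and summaries are hypothetical values rather than data from a real survey:

# Stratified mean and its variance from strata summaries
Ns <- c(500, 300, 200)        # hypothetical population size of each stratum
xbar.s <- c(10.2, 15.1, 30.4) # estimated mean in each stratum
v.s <- c(0.04, 0.09, 0.25)    # Var(xbar_s) in each stratum
w <- Ns / sum(Ns)
xbar <- sum(w * xbar.s)       # stratified mean
v.xbar <- sum(w^2 * v.s)      # Var(xbar), the strata being independent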

1.4 Two stage sampling

Within each stratum, most household surveys collect their data in two stages: first sampling clusters, then selecting households within each cluster. This has the advantage of reducing the cost of interviews, which are done in the same cluster or village, and of easing re-interviews. But observations within a cluster are correlated, so the information collected might be less precise. The mean is computed in the same way using weights, but its variance might be greater than under one stage sampling.

1.5 Grouped data

Survey data report private information on households. These data are politically sensitive depending on their content. For instance, there are debates in France about the use of racial information to study discrimination. In Belgium, it is forbidden to ask questions about the language used at home (French or Flemish). So for a long time these data were simply not available. Researchers had access to data that were so aggregated that they were presented in groups. The treatment of these grouped data required special tools and estimation techniques. For instance, Singh and Maddala or McDonald use grouped data for US income. We reproduce these data in Table 1. The columns with dates contain percentages summing to 100%. The first column gives the upper endpoint of each income class, presumably in thousands of dollars per year per household. This leads to a histogram, as we shall explain below.

1.6 IID samples

When the statistician is lucky, the data come from an IID sample, like macroeconomic data. These samples are much easier to analyse: a mean is computed in the usual way, and parametric densities can be estimated by maximum likelihood.


Table 1: US data on income

Endpoints   1970   1975   1980
 2.5         6.6    3.5    2.1
 5.0        12.5    8.5    4.1
 7.5        15.2   10.6    6.2
10.0        16.6   10.6    6.5
12.5        15.8   11.4    7.3
15.0        11.0   10.9    6.9
20.0        13.1   18.8   14.0
25.0         4.6   11.6   13.7
35.0         3.0    9.5   19.8
50.0         1.1    3.2   12.8
             0.5    1.4    6.7

2 Natural estimators and resampling methods

- Order statistics
- Estimation of F(x)
- Estimation of L(j/n)
- Estimation of poverty deficit curves: to be completed for stochastic dominance
- Bootstrapping

2.1 The use of order statistics

The first estimation techniques presented here are relatively simple. They rely on the fact that the observations can be ordered, so that we can compute what are called the order statistics. Suppose then that the observations of X are sorted in increasing order and denote this ranking

$$x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)}. \qquad (1)$$

x_{(1)} is the smallest observation and x_{(n)} the largest. In this framework it becomes easy to estimate a distribution and its quantiles in a natural way. Indeed, a distribution is defined as F(x) = Prob(X < x). It can be approximated by

$$\mathrm{Prob}(X \leq x_{(i)}) \simeq i/n \qquad (2)$$

when enough observations are available. The first decile of this distribution corresponds to the value x_{0.10} such that Prob(X ≤ x_{0.10}) = 0.10. It then suffices to find the observation whose rank i corresponds roughly to i/n = 0.10 in the ordered sequence of X. More generally, if Q(p) denotes the quantile of order p, it is estimated as

$$\hat Q(p) = x_{(s)}, \qquad s - 1 < np \leq s. \qquad (3)$$
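A minimal R sketch of these natural estimators, assuming a numeric vector x holds the sample:

# Empirical distribution and quantiles from order statistics
x <- sort(x)                  # x_(1) <= ... <= x_(n)
n <- length(x)
Fhat <- (1:n) / n             # Prob(X <= x_(i)) ~ i/n, eq. (2)
Qhat <- function(p) x[ceiling(n * p)]   # eq. (3): rank s with s - 1 < np <= s
Qhat(0.10)                    # first decile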


Quantile estimation makes it possible, for instance, to compute a measure of dispersion such as the interdecile ratio (x_{0.90} − x_{0.10})/x_{0.50}. From these same order statistics, one can define an estimator of the generalised Lorenz curve by plotting

$$\hat L(p = i/n) = \frac{1}{n} \sum_{j=1}^{i} x_{(j)}. \qquad (4)$$

Here we have used the partial sums of the order statistics. The Lorenz curve itself is obtained by normalising by the mean. Finally, the Gini coefficient can be estimated by means of a single sum, rather than the double sum of the original definition in terms of expected absolute difference divided by twice the mean:

$$\hat I_G = \frac{2}{n(n-1)\bar x} \sum_{i=1}^{n} i\, x_{(i)} - \frac{n+1}{n-1}. \qquad (5)$$

The same type of computation applies to the Sen-Shorrocks-Thon poverty index:

$$\hat I_{SST} = \frac{1}{n^2} \sum_{i=1}^{q} (2n - 2i + 1)\, \frac{z - y_{(i)}}{z},$$

where q corresponds to the rank of the poverty line z in the distribution of X.
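These single-sum formulas translate directly into R; a minimal sketch:

# Gini from a single sum over order statistics, eq. (5)
gini.os <- function(x) {
  x <- sort(x); n <- length(x)
  2 * sum((1:n) * x) / (n * (n - 1) * mean(x)) - (n + 1) / (n - 1)
}
# Sen-Shorrocks-Thon index for a poverty line z
sst <- function(y, z) {
  y <- sort(y); n <- length(y); q <- sum(y < z)
  sum((2 * n - 2 * (1:q) + 1) * (z - y[1:q]) / z) / n^2
}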

2.2 Jackknife and bootstrapping

We thus have simple estimators, but we do not always know how to compute their standard deviations. It was rather easy to compute the variance of the mean, but the variance of the mode, for instance, is much more difficult to establish, especially when the sampling design is complex. The bootstrap and, to a lesser extent, the jackknife are two methods for assessing sampling variability.

There are two sources of randomness:

1. We have samples from a finite population. We must know the sample design, which can be quite complicated, in order to appreciate this source of randomness. This is not always easy; for instance, N might not be known precisely.

2. There are errors of observation, or the observed variable is simply random by nature, as it results from decision making under uncertainty.

Two types of methods were designed in the literature:

1. the jackknife,

2. the bootstrap.

These are resampling techniques, and R packages exist for bootstrapping. The bootstrap resamples n observations with replacement from the original sample. The jackknife provides n samples of size n − 1


by eliminating one observation at a time from the original sample. With each technique, the statistic whose variance we want is evaluated on each bootstrap or jackknife sample.

The bootstrap is available in R through the boot package. We must first load the library, then define a function with two arguments: the first represents the original data, the second the resampling indices (or weights) generated by the package. Here is an example with the Gini coefficient; the original listing is truncated, so this is a minimal reconstruction:

library(boot)
library(ineq)
# statistic: the original data and the resampling indices generated by boot
gini.stat <- function(x, i) Gini(x[i])
r <- boot(y79, gini.stat, R = 1000)
plot(density(r$t), main = "Distribution of the Gini")


[Figure 1: Bootstrapping the Gini. Density of the bootstrap replications r$t, concentrated between 0.250 and 0.265]

3 Non parametric estimation of densities

3.1 Histograms

hist(y79, breaks = 50)

where y79 is the FES data for 1979. The corresponding graph is given in Figure 2.

3.2 Kernel estimation

The histogram has the bad property of being a step function: it is discontinuous and not differentiable. We would like a smooth representation, and we feel that this is possible when we have a full sample rather than grouped data. Rosenblatt (1956) had the idea of replacing the indicator function by a kernel K which, like the indicator function, integrates to one. We thus have the new estimator

$$\hat f(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h} K\!\left(\frac{x - x_i}{h}\right). \qquad (7)$$

Some properties of the kernel can be deduced from the properties of the indicator function and of the histogram:

- $\int K(\psi)\, d\psi = 1$;
- h → 0 when n → ∞;
- K(ψ) → 0 as ψ → ±∞;
- a common choice for K is the standardised normal density, in which case K(ψ) ≈ 0 for |ψ| > 3;
- the value chosen for h is crucial, as it defines the neighbourhood |x − x_i|/h ≤ 3.

It is very important to understand the role played by h in determining the shape of the estimated density. We have simulated 500 observations drawn from a mixture of normals N(μ_i, 1) with μ_1 = 1, μ_2 = 5 and p = 0.25:

$$f(x) = 0.25\, f_N(x|1, 1) + (1 - 0.25)\, f_N(x|5, 1).$$

We then estimated the density of these observations using a normal kernel and three values of h. For the while, we accept the fact that the optimal value of h is given by

$$h = c\, n^{-1/5}.$$

We have selected three values for c in the following graphs. The bimodality is clearly visible in the central case; it disappears in the first figure, and sampling noise appears in the third.
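This experiment is easy to reproduce; a minimal sketch:

# Simulate the normal mixture and compare three bandwidths
set.seed(1)
n <- 500
z <- runif(n) < 0.25                  # component indicator, p = 0.25
x <- ifelse(z, rnorm(n, 1, 1), rnorm(n, 5, 1))
h0 <- n^(-1/5)                        # h = c * n^(-1/5)
for (c in c(10, 1, 0.1))
  plot(density(x, bw = c * h0), main = paste("c =", c))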

[Figure 2: Histogram with 50 cells of FES 1979. Frequencies of y79 over the range 0 to 400]


4 Sampling properties

We have reviewed many factors that influence the final result of density estimation. The two basic ingredients were the choice of the kernel and the choice of the smoothing window. How exactly can we measure their influence on the precision of the final result? We want to measure the final difference between the estimator and the true density. A natural measure of the gap between an estimator and the true value is the mean squared error

$$\mathrm{MSE}_x(\hat\theta) = \mathrm{E}[\hat\theta - \theta]^2 \qquad (8)$$

which decomposes easily into

$$\mathrm{MSE}_x(\hat\theta) = \mathrm{Bias}[\hat\theta]^2 + \mathrm{Var}[\hat\theta]. \qquad (9)$$

But we want to estimate a density, not a point estimator. We therefore need a global measure that takes all x into account. Integrating over x yields the MISE, or mean integrated squared error

$$\mathrm{MISE}(\hat f) = \mathrm{E} \int [\hat f(x) - f(x)]^2\, dx. \qquad (10)$$

This corresponds to a notion of risk. If one simply wants to minimise the loss, it suffices to consider

$$\mathrm{ISE}(\hat f) = \int [\hat f(x) - f(x)]^2\, dx. \qquad (11)$$

The MISE is the most widely used criterion, but it is difficult to compute. One often settles for approximations, which can be found by noting that the MISE decomposes into

$$\mathrm{MISE}(\hat f) = \int [\mathrm{E}(\hat f(x)) - f(x)]^2\, dx + \int \mathrm{Var}[\hat f(x)]\, dx. \qquad (12)$$

It then suffices to find approximations for the bias and the variance and to plug them into this formula.

4.1 Assumptions and notations

We have already made some assumptions concerning the kernel and the window size. We recall them and introduce some useful notation:

- $\int K(t)\, dt = 1$;
- $\int K^2(t)\, dt = c_K < \infty$;
- $\int t^2 K(t)\, dt = \sigma_2$.

The quantity σ_2 will play an important role in the expression of the results. Finally, concerning the window:

- h → 0 when n → ∞;
- nh → ∞ when n → ∞.

The window must tend to zero as the sample size grows, but not too fast.


4.2 Bias and variance

The bias and the variance of the estimator can be computed as expectations with respect to the true but unknown distribution f(.). Starting from the formula of the simple kernel estimator, we have

$$\mathrm{E}(\hat f(x)) = \int \frac{1}{h} K\!\left(\frac{x - y}{h}\right) f(y)\, dy \qquad (13)$$

which will serve to compute the bias, and

$$n\,\mathrm{Var}(\hat f(x)) = \int \frac{1}{h^2} K\!\left(\frac{x - y}{h}\right)^2 f(y)\, dy - \left[\int \frac{1}{h} K\!\left(\frac{x - y}{h}\right) f(y)\, dy\right]^2. \qquad (14)$$

4.3 Approximations of the bias and the variance

The exact formulas for the bias and the variance involve integrals and are not directly applicable, except in very particular cases of little practical interest. Approximations have therefore been sought by means of a first order Taylor expansion.

Let us start with the change of variable y = x − ht, with Jacobian h. Making this change of variable in the expression of the bias, we get

$$\mathrm{bias} = \int K(t)\,[f(x - ht) - f(x)]\, dt. \qquad (15)$$

Expanding f(x − ht) around h = 0,

$$f(x - ht) = f(x) - ht f'(x) + \tfrac{1}{2} h^2 t^2 f''(x) + \ldots \qquad (16)$$

Using the fact that the kernel has zero mean and variance σ_2,

$$\mathrm{bias} \simeq \tfrac{1}{2}\, h^2 f''(x)\, \sigma_2 + \ldots \qquad (17)$$

Similar computations for the variance show that

$$\mathrm{Var}(\hat f(x)) \simeq \frac{1}{nh}\, f(x)\, c_K \qquad (18)$$

assuming n large and h small. The approximation of the MISE is therefore

$$\mathrm{AMISE} \simeq \frac{1}{4} h^4 \sigma_2^2 \int f''(x)^2\, dx + \frac{1}{nh}\, c_K. \qquad (19)$$

The bias depends only on the window size and not on the sample size. The variance, on the other hand, depends on the sample size. Moreover, the bias is reduced by reducing h,


but reducing h increases the variance. The choice of h thus involves a trade-off between systematic errors and random errors. This is what the bandwidth graphs illustrate. If we want to minimise the MISE (here the AMISE), we note that the first term is of order h⁴ while the second is of order 1/(nh). Bias and variance are of the same order when h⁴ ≈ 1/(nh), that is h⁵ ≈ 1/n, giving

$$h \propto n^{-1/5}. \qquad (20)$$

We shall meet this rate throughout nonparametric inference.

4.4 Determining the ideal window and kernel

We differentiate the approximate MISE with respect to h and find the optimal h by setting this derivative to zero. The result is

$$h_{opt} = \sigma_2^{-2/5}\, c_K^{1/5} \left\{\int f''(x)^2\, dx\right\}^{-1/5} n^{-1/5}
         = \left(\frac{c_K}{n\, \sigma_2^2 \int f''(x)^2\, dx}\right)^{1/5}. \qquad (21)$$

The ideal window depends on many things:

- It tends to zero at a very slow rate.
- It depends on the fluctuations of f. If f fluctuates a lot, a small h is needed. Some methods calibrate h against a known density such as the normal (Silverman's rule).
- Finally, h depends on the kernel. The kernel can always be normalised so that σ_2 = 1, in which case it enters only through c_K. Silverman's rule again exploits this fact.

Plugging the optimal h into the expression of the MISE leads to

$$\mathrm{MISE} \simeq \frac{5}{4}\, \sigma_2^{2/5}\, c_K^{4/5} \left(\int f''(x)^2\, dx\right)^{1/5} n^{-4/5}. \qquad (22)$$

The ideal kernel is the one that minimises the MISE for given f. To find it we must in fact minimise c_K subject to the constraint that the kernel is a density, i.e. integrates to one, and is normed, i.e. σ_2 = 1. One can show that the minimiser is the Epanechnikov kernel, which has a very simple expression:

$$K(t) = \begin{cases} \dfrac{3}{4\sqrt{5}}\left(1 - \dfrac{t^2}{5}\right) & \text{if } |t| \leq \sqrt{5} \\ 0 & \text{otherwise.} \end{cases} \qquad (23)$$


This expression can be further simplified by a change of variable (rescaling t by √5), but then the kernel, while still integrating to 1, no longer has variance equal to 1.

The efficiency of other kernels relative to the Epanechnikov kernel can be computed by defining the ratio

$$\mathrm{eff}(K) = \frac{\left(\int t^2 K_e(t)\, dt\right)^{1/2} \int K_e(t)^2\, dt}{\left(\int t^2 K(t)\, dt\right)^{1/2} \int K(t)^2\, dt}. \qquad (24)$$

Using the properties of the Epanechnikov kernel, this becomes

$$\mathrm{eff}(K) = \frac{3/(5\sqrt{5})}{\left(\int t^2 K(t)\, dt\right)^{1/2} \int K(t)^2\, dt}. \qquad (25)$$

It is then interesting to compute the efficiency of the usual kernels:

Kernel         K(t)                                   Efficiency
Epanechnikov   (3/(4√5))(1 − t²/5) for |t| ≤ √5       1
Biweight       (15/16)(1 − t²)² for |t| ≤ 1           0.99
Gaussian       (1/√(2π)) exp(−t²/2)                   0.95
Rectangular    1/2 for |t| < 1                        0.93

Even for what seems the worst of these choices, the rectangular kernel which leads to the histogram, the efficiency is very close to 1. We shall therefore not spend too much time choosing an efficient kernel; only other considerations come into play. The Epanechnikov kernel has the drawback of being differentiable only at the first order, whereas the biweight is twice differentiable and the Gaussian kernel is infinitely differentiable. Some kernels have a finite support, others an infinite one, which makes a difference in terms of numerical efficiency: with the Gaussian kernel, for instance, one can spend much time computing many values that receive a negligible weight.

5 Choice of the window

The choice of the window is decisive for the appearance of the result, and it is guided by the purpose pursued. If the aim is to present the content of the data, a subjective choice will often do. If the aim is to present conclusions, a little undersmoothing is useful: the reader can smooth by eye, but cannot reconstruct details that have been erased by too large an h. When a large number of results must be presented, an automatic method is very useful. Likewise, if results are to be compared, a standardised method for choosing h is preferable. Note that automatic methods cannot be called objective, as they all rest on particular assumptions.

5.1 Subjective choices

One considers several graphs and chooses by eye the value of h that gives the most aesthetically pleasing result. See the previous figures.

5.2 Reference to a known distribution

We have seen that the optimal h is given by

$$h_{opt} = \sigma_2^{-2/5}\, c_K^{1/5} \left(\int f''(x)^2\, dx\right)^{-1/5} n^{-1/5}. \qquad (26)$$

Some elements of this expression, such as n and K, are known. But f is of course unknown, since it is what we are trying to estimate, and we must compute ∫f''(x)²dx. If we assume that the true distribution f is normal with zero mean and variance σ², then

$$\int f''_{N(0,\sigma^2)}(x)^2\, dx = \sigma^{-5}\, \frac{3}{8\sqrt{\pi}} \approx 0.212\, \sigma^{-5}. \qquad (27)$$

If we now choose a normal kernel, we can verify that σ_2 = 1 and c_K = 0.5/√π. Collecting all the pieces, the optimal h in this case is given by

$$h \simeq 1.06\, \hat\sigma\, n^{-1/5}. \qquad (28)$$

It suffices to estimate the sample variance consistently and we have an optimal h. This is what is usually called Silverman's rule.

This works very well as long as we are close to the normal case, but much less well as soon as we move away from it. In particular, if the true distribution f is a mixture, Silverman's formula tends to oversmooth as soon as the modes of the mixture move apart. Various studies have shown that this rule also oversmooths in case of asymmetry of f, but not in the case of kurtosis; in particular, if f is a Student, the rule works well.

To improve Silverman's rule, one looks for a better evaluation of the dispersion. If R is the interquartile range, the rule becomes

$$h = 0.79\, R\, n^{-1/5}. \qquad (29)$$

But this rule worsens the oversmoothing in case of bimodality. Finally one may choose

$$h = 0.9\, A\, n^{-1/5} \qquad (30)$$

where A is the minimum of σ̂ and R/1.34. The graphs presented above were computed with

$$h = c\, A\, n^{-1/5} \qquad (31)$$

with c taking the values 10, 1 and 0.1.
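Rule (30) is one line of R; this is, up to details, what the default bandwidth selector bw.nrd0() implements:

# Silverman's rule of thumb, eq. (30)
h.silverman <- function(x) {
  A <- min(sd(x), IQR(x) / 1.34)
  0.9 * A * length(x)^(-1/5)
}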

5.3 Cross validation on the likelihood

We now pursue the likelihood idea and apply it to the choice of h. If the likelihood function is given by Σ log f(x_i), a pseudo likelihood function is

$$\log L = \sum_i \log \hat f(x_i, h). \qquad (32)$$

The problem is that the optimum of this function is reached for h = 0. We therefore apply the cross-validation principle and evaluate not $\hat f(x_i, h)$ but $\hat f_{-i}(x_i, h)$,

$$\hat f_{-i}(x_i, h) = \frac{1}{h(n-1)} \sum_{j \neq i} K\!\left(\frac{x_j - x_i}{h}\right), \qquad (33)$$

which consists in leaving out one observation at a time. This is a general principle of the nonparametric approach that will be used again later.

This likelihood method amounts to choosing the h that minimises the Kullback-Leibler distance between f and $\hat f$,

$$\int f(x) \log \frac{f(x)}{\hat f(x)}\, dx. \qquad (34)$$

But the h obtained is severely affected by the tail behaviour of f, so this criterion is not widely used. It is nevertheless useful to know, since it introduced a principle that will prove very useful later for nonparametric regression.

5.4 Least squares cross validation

This time we optimise a criterion more elaborate than the simple pseudo likelihood above. Consider the integrated squared error

$$\mathrm{ISE}(h) = \int (\hat f(x, h) - f(x))^2\, dx. \qquad (35)$$

Expanding the square, we see that this expression can be simplified, because one of the terms does not depend on h:

$$\mathrm{ISE}(h) \propto \int \hat f(x, h)^2\, dx - 2 \int \hat f(x, h)\, f(x)\, dx. \qquad (36)$$


We therefore estimate this quantity from the sample and then find the value of h that minimises it.

The cross-validation method takes its name from the particular way of estimating $\hat f(x)$ by dropping one observation. Define the estimator

$$\hat f_{-i}(x, h) = \frac{1}{h(n-1)} \sum_{j \neq i} K\!\left(\frac{x - x_j}{h}\right). \qquad (37)$$

The notation −i means that observation i is left out when estimating f(x_i). From there, we note that $\int \hat f(x, h) f(x)\, dx$ is the expectation of $\hat f(x, h)$. An unbiased estimator of this expectation is given by the empirical mean of the $\hat f_{-i}(x, h)$:

$$\mathrm{E}\!\left(\int \hat f(x, h)\, f(x)\, dx\right) \simeq \frac{1}{n} \sum_{i=1}^{n} \hat f_{-i}(x_i). \qquad (38)$$

One should show why this leave-one-out estimator is chosen and why the whole sample is not used. It remains to compute the first term of the ISE by means of

$$\int \hat f^2\, dx = \frac{1}{n^2 h^2} \sum_i \sum_j \int_x K\!\left(\frac{x_i - x}{h}\right) K\!\left(\frac{x_j - x}{h}\right) dx. \qquad (39)$$

The solution is given by

$$\int \hat f^2\, dx = \frac{1}{n^2 h} \sum_i \sum_j K^{*}\!\left(\frac{x_i - x_j}{h}\right) \qquad (40)$$

where K* = K ∗ K is the convolution of the kernel with itself. If the kernel is a normal N(0,1), then K* is the N(0,2) density.

It will be immediately understood that the method is computationally heavy. For each value of h, ISE(h) must be evaluated, which involves a double sum; moreover the function can have several local minima. Pagan and Ullah mention the binning technique used in XploRe to reduce computing time. Note also the kinship with the previous method, although the sample functions used here are more complex.
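For a Gaussian kernel, where K* is the N(0,2) density, the criterion can be coded directly; a minimal sketch (in practice R's built-in bw.ucv() does this job):

# Least squares cross-validation score, eqs. (36), (38), (40)
lscv <- function(h, x) {
  n <- length(x)
  d <- outer(x, x, "-") / h                      # all pairwise (x_i - x_j)/h
  t1 <- sum(dnorm(d, sd = sqrt(2))) / (n^2 * h)  # estimate of int f-hat^2, eq. (40)
  t2 <- (sum(dnorm(d)) - n * dnorm(0)) / (n * (n - 1) * h)  # leave-one-out term, eq. (38)
  t1 - 2 * t2
}
# h.opt <- optimize(lscv, interval = c(0.05, 5), x = y79)$minimum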

5.5 Density estimation with weighted samples

When there are weights w_i, we must first normalise them so that they sum to unity. The usual formula is then simply modified into

$$\hat f(x) = \frac{1}{h} \sum_{i=1}^{n} w_i\, K\!\left(\frac{x - x_i}{h}\right).$$

Equal weights w_i = 1/n give back the unweighted estimator.
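A minimal sketch of this weighted estimator with a Gaussian kernel (note that R's density() also accepts a weights argument, as listed in Table 3 below):

# Weighted kernel density on a grid, weights summing to one
kde.w <- function(x, w, h, grid = seq(min(x), max(x), length.out = 200)) {
  sapply(grid, function(g) sum(w * dnorm((g - x) / h)) / h)
}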

    17

  • 8/3/2019 The econometrics of inequality and poverty

    18/31

5.6 Using R

The standard stats package includes a routine for estimating densities. The density object is created by simply calling density(x), where x represents the data set, assuming the data are presented in a column. By default a Gaussian kernel is used, together with the classical Silverman rule for the bandwidth. Many other options are available and can be found in the help; we list the main ones in Table 3.

Table 3: Options for density estimation

Bandwidth          Kernel                     Weights
bw = bw.nrd0(x)    kernel = "gaussian"        weights = rep(1/nx, nx)
bw = bw.ucv(x)     kernel = "epanechnikov"
bw = bw.SJ(x)      kernel = "triangular"

To obtain a graph, it suffices to pass the output object of density to the routine plot, for instance plot(density(x)). If we want to change the default method for determining the bandwidth, using for instance the cross validation method, we can use

plot(density(y79, bw = bw.ucv(y79)))

6 General estimation methods

Inference, including Bayesian inference, for the Pareto and other simple densities; R and inference for the Singh-Maddala (SM) distribution using the FES data.

6.1 Inference for grouped data

The likelihood function for this type of data takes a multinomial form,

$$L(\theta) = N! \prod_{i=1}^{g} \frac{P_i(\theta)^{n_i}}{n_i!}$$

where P_i(θ) is the probability of being in the i-th of the g groups of the population,

$$P_i(\theta) = \int_{I_i} f(y; \theta)\, dy,$$

and the n_i/N are the observed frequencies.

Fitting various functions by this method gives bell-shaped densities, whereas the parameter values found by Thurow (1970) implied U or L shapes. The Gini coefficients computed this way correspond to those estimated directly by the US Bureau of the Census.


To compare densities with one another, McDonald uses the value of the likelihood function, sums of squared residuals

$$SSR = \sum_i (n_i - N p_i(\theta))^2,$$

or χ² statistics

$$\chi^2 = \sum_i (n_i - N p_i(\theta))^2 / (N p_i(\theta)).$$

A likelihood ratio test makes it possible to test parametric reductions and to compare certain distributions with one another. For income data the GB2 dominates, but the SM comes just after, ahead of the GG for example. The lognormal is bad.

McDonald and Ransom (1979) give some additional details on estimation with grouped data.

The first method consists in maximising the multinomial likelihood function; this is the so-called scoring method.

The minimum χ² method consists in minimising the χ² distance

$$n \sum_i \frac{(n_i/n - p_i)^2}{p_i}.$$

This distance is distributed as a χ² with g − k − 1 degrees of freedom, which makes it possible to test the adequacy of the function to the data.

The least squares method consists in minimising

$$\sum_i (n_i - n p_i)^2.$$

This last method gives results that generally differ from the first two and are rather bad.

The method used by Singh and Maddala (1976) is a little different. Estimation is not based on the probability of an interval, which amounts to minimising the difference between a histogram and its parametric approximation, but on the difference between the natural estimator of the empirical distribution and the analytical expression of this distribution. For the SM we have

$$F(x) = 1 - \frac{1}{(1 + a_1 x^{a_2})^{a_3}}$$

and estimation consists in minimising

$$\sum_i \left[\log(1 - \hat F(x_i)) + a_3 \log(1 + a_1 x_i^{a_2})\right]^2.$$

Several things can be noted about this method:

- It minimises a least squares norm and not a χ² one, so unweighted least squares are a first possible source of errors.
- At the infinite endpoint, log(1 − F̂) cannot be computed, whereas this posed no problem in the density-based case. The same problem arises for truncations such as incomes below or above a given level.
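As an illustration of the minimum χ² method, here is a minimal sketch fitting a lognormal to the 1970 column of Table 1; the choice of distribution and the starting values are ours, for illustration only:

# Minimum chi-square fit of a lognormal to grouped data
ends <- c(2.5, 5, 7.5, 10, 12.5, 15, 20, 25, 35, 50, Inf)
freq <- c(6.6, 12.5, 15.2, 16.6, 15.8, 11.0, 13.1, 4.6, 3.0, 1.1, 0.5) / 100
chi2 <- function(theta) {
  P <- diff(c(0, plnorm(ends, theta[1], theta[2])))  # class probabilities P_i
  sum((freq - P)^2 / P)                              # chi-square distance (up to n)
}
fit <- optim(c(log(10), 0.8), chi2)                  # starting values are guesses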


6.2 Inference for Pareto IID samples

Inference is quite easy for the usual Pareto I model; it is detailed for instance in Arnold (2008). Suppose we have an IID sample of X drawn from a Pareto I. Once we have obtained the estimates of x_m and of α, it is easy to produce an estimate of any needed transformation of these parameters, such as the Gini coefficient, and to find its standard deviation using the delta method (which is not very precise). With Bayesian inference, such estimates are easier to obtain.

6.3 Maximum likelihood

The likelihood function is

$$L(x; x_m, \alpha) = \alpha^n x_m^{n\alpha} \left(\prod x_i\right)^{-(\alpha+1)} 1\!\mathrm{I}(x_{(1)} \geq x_m).$$

It is easy to see that we have two sufficient statistics which immediately give the MLE:

$$\hat x_m = x_{(1)}, \qquad \hat\alpha = \left[\frac{1}{n} \sum \log(x_i / x_{(1)})\right]^{-1}.$$

As underlined by Arnold (2008), these estimators are positively biased, as

$$\mathrm{E}(\hat x_m) = x_m (1 - 1/(n\alpha))^{-1}, \qquad \mathrm{Var}(\hat x_m) = x_m^2\, n\alpha\, [(n\alpha - 1)^2 (n\alpha - 2)]^{-1},$$

$$\mathrm{E}(\hat\alpha) = \alpha\, n/(n-2), \qquad \mathrm{Var}(\hat\alpha) = \alpha^2 n^2\, [(n-2)^2 (n-3)]^{-1}.$$

Knowing the bias, it is easy to propose unbiased estimators by simply correcting the initial maximum likelihood estimators; do it as an exercise. Once we know the estimates of x_m and of α, it is easy to produce an estimate of any needed transformation of these parameters, such as the Gini coefficient, and to find its standard deviation using the delta method (which is not very precise).
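These closed forms are one-liners in R; a minimal sketch, including the bias correction left as an exercise:

# Pareto I maximum likelihood with a simple bias correction
pareto.ml <- function(x) {
  n <- length(x)
  xm <- min(x)                        # xm-hat = x_(1)
  a <- 1 / mean(log(x / xm))          # alpha-hat
  list(xm = xm, alpha = a,
       alpha.unb = a * (n - 2) / n)   # corrects E(alpha-hat) = n alpha / (n - 2)
}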

6.4 Graphical and regression methods

This is in fact the original method. We know how to estimate the empirical distribution function in a simple way, and we know that

$$1 - F(x_i) = (x_i / x_m)^{-\alpha}.$$

Taking logs on each side leads to the regression

$$\log(1 - \hat F(x_i)) = \mathrm{cste} - \alpha \log(x_i) + \epsilon_i.$$

If we do not get a straight line when plotting the two logs, this is evidence that the sample does not come from a Pareto distribution. We can also estimate α in a similar way using the empirical Lorenz curve. These estimators are consistent.


6.5 Using R for Pareto fit

# Compute and plot log(1 - F) against log(y) for the FES data
library(ineq)
data1 = read.table("fes79.csv", header = F, sep = ";")
data2 = read.table("fes88.csv", header = F, sep = ";")
data3 = read.table("fes92.csv", header = F, sep = ";")
data4 = read.table("fes96.csv", header = F, sep = ";")
# deflate each year to 1979 prices
y79 = sort(data1[,1]) / 223.5 * 223.5
y88 = sort(data2[,1]) / 421.7 * 223.5
y92 = sort(data3[,1]) / 546.4 * 223.5
y96 = sort(data4[,1]) / 602.4 * 223.5
# the original listing is truncated here; a minimal completion of the
# plot described in the comment above:
n = length(y79)
plot(log(y79), log(1 - (1:n) / (n + 1)), xlab = "log(y)", ylab = "log(1 - F)")


[Figure 3: Pareto tail for the income distribution. log(1 − F) plotted against log(y)]


6.6 Bayesian inference

When x_m is also an unknown parameter, inference becomes more delicate and a Gibbs sampler is needed. When the sample is given and observed, it is natural to assign to x_m the minimum value of the sample, either the observed value or a value determined on a priori grounds. We have the same problem with the Weibull, where X is supposed to be positive, so there is a minimum value taken equal to zero when the general form of the density includes a location parameter.

Consider a gamma prior for α, and write the likelihood function after a transformation of the variable: taking logs turns the data density into a translated exponential density. The Pareto distribution is related to the exponential distribution as follows. Suppose X is Pareto-distributed with minimum x_m and index α, and let

$$Y = \log \frac{X}{x_m}.$$

Then Y is exponentially distributed with intensity α, or equivalently with expected value 1/α:

$$\Pr(Y > y) = e^{-\alpha y},$$

so that the cumulative distribution function is 1 − e^{−αy} and the pdf is

$$f(y; \alpha) = \begin{cases} \alpha e^{-\alpha y}, & y \geq 0, \\ 0, & y < 0. \end{cases}$$

The likelihood function for α, given an independent and identically distributed sample y = (y_1, ..., y_n) drawn from the variable, is

$$L(\alpha; y) = \prod_{i=1}^{n} \alpha \exp(-\alpha y_i) = \alpha^n \exp\!\left(-\alpha \sum_{i=1}^{n} y_i\right) = \alpha^n \exp(-\alpha n \bar y),$$

where

$$\bar y = \frac{1}{n} \sum_{i=1}^{n} y_i$$

is the sample mean of y. The conjugate prior for the exponential distribution is the gamma distribution (of which the exponential distribution is a special case). The following parameterisation of the gamma pdf is useful:

$$\mathrm{Gamma}(\alpha; \nu, s) = \frac{s^\nu}{\Gamma(\nu)}\, \alpha^{\nu - 1} \exp(-\alpha s).$$

The posterior distribution p can then be expressed in terms of the likelihood function defined above and a gamma prior:

$$p(\alpha|y) \propto L(\alpha; y) \times \mathrm{Gamma}(\alpha; \nu, s)
            = \alpha^n \exp(-\alpha n \bar y)\, \frac{s^\nu}{\Gamma(\nu)}\, \alpha^{\nu - 1} \exp(-\alpha s)
            \propto \alpha^{(\nu + n) - 1} \exp(-\alpha (s + n \bar y)).$$


The posterior density p has now been specified up to a missing normalising constant. Since it has the form of a gamma pdf, this constant can easily be filled in, and one obtains

$$p(\alpha|y) = \mathrm{Gamma}(\alpha;\, \nu + n,\, s + n \bar y).$$

Here the parameter ν can be interpreted as the number of prior observations and s as the sum of the prior observations. Once the posterior is obtained, we can generate random numbers from it in order to find the distribution of the Gini coefficient, for instance, or of any other transformation of α.
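A minimal sketch of this last step, assuming x holds a Pareto sample and using hypothetical prior values ν = s = 1 (the Gini coefficient of a Pareto I with index α is 1/(2α − 1)):

# Posterior of alpha and implied distribution of the Gini coefficient
nu <- 1; s <- 1                      # hypothetical prior values
y <- log(x / min(x))                 # y_i = log(x_i / x_m) with xm-hat = x_(1)
alpha.post <- rgamma(10000, shape = nu + length(y), rate = s + sum(y))
gini.post <- 1 / (2 * alpha.post - 1)
hist(gini.post, breaks = 50)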

6.7 Inference for the Lognormal process

The probability density function of a log-normal distribution is

$$f_X(x; \mu, \sigma) = \frac{1}{x \sigma \sqrt{2\pi}} \exp\!\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right), \qquad x > 0,$$

where μ and σ are the mean and standard deviation of the variable's natural logarithm; in particular μ = E(log(x)). The likelihood function is rather simple to write once we note that this pdf is just the normal pdf times the Jacobian of the transformation, which is 1/x. We have

$$f_L(x; \mu, \sigma) = \prod_{i=1}^{n} \frac{1}{x_i}\, f_N(\ln x_i; \mu, \sigma)$$

where f_L denotes the probability density function of the log-normal distribution and f_N that of the normal distribution. Therefore, using the same indices to denote log-likelihoods, we can write the log-likelihood function in the following way:

$$\ell_L(\mu, \sigma | x_1, \ldots, x_n) = -\sum_k \ln x_k + \ell_N(\mu, \sigma | \ln x_1, \ldots, \ln x_n) = \text{constant} + \ell_N(\mu, \sigma | \ln x_1, \ldots, \ln x_n).$$

Since the first term is constant with regard to μ and σ, both log-likelihood functions, ℓ_L and ℓ_N, reach their maximum at the same μ and σ. Hence, using the formulas for the normal distribution maximum likelihood estimators and the equality above, we deduce that for the log-normal distribution

$$\hat\mu = \frac{\sum_k \ln x_k}{n}, \qquad \hat\sigma^2 = \frac{\sum_k (\ln x_k - \hat\mu)^2}{n}.$$

This means that in a lognormal sample, the two parameters can be estimated by the sample mean and variance of the logs.


6.8 Using R to compare Pareto and Lognormal

Once the ineq package is loaded and the observations are read in, the following code compares the non-parametric Lorenz curve with the Lorenz curves corresponding to a Pareto density and to a lognormal density (the definition of the grid p is added here to make the listing self-contained):

p = seq(0, 1, 0.01)
plot(Lc(y79))
lines(p, Lc.pareto(p, parameter = 2), col = "red")
text(0.9, 0.6, "Pareto 2.0")
lines(p, Lc.lognorm(p, parameter = 0.45), col = "blue")
text(0.45, 0.4, "Lognormal 0.45")

[Figure 4: Lorenz curves for the Pareto (2.0) and the Lognormal (0.45) against the empirical Lorenz curve]


7 Using mixtures for IID samples

7.1 Informal introduction

Let us go back to the FES data sets. Which kind of density can we fit to these data? We have illustrated several stylised facts:

- the Pareto does not fit the data, as shown by the Lorenz curve;
- the lognormal seems to fit the data better, as shown again by the Lorenz curve;
- the high incomes, greater than exp(4.5) = 90.02, seem to behave like a Pareto.

If we estimate the Pareto regression on the whole sample, the results seem to be good, when in fact they are not, as shown by the graphs:

            Estimate   Std.Err   t-value     Pr(>|t|)
Intercept     7.5148    0.0502   149.5581    0
X            -1.9736    0.0116  -170.3988    0

When the regression is run on the truncated sample, the results are far better:

Residual Standard Error = 0.1453
R-Square = 0.9786
F-statistic (df = 1, 2190) = 99911.3
p-value = 0

            Estimate   Std.Err   t-value     Pr(>|t|)
Intercept    18.6256    0.0622   299.6337    0
X            -4.0838    0.0129  -316.0875    0

But we need to confirm these results by a plot of the Lorenz curve:

z = exp(4.5)
plot(Lc(y79[y79 > z]))
lines(p, Lc.pareto(p, parameter = 4), col = "red")
lines(p, Lc.lognorm(p, parameter = 0.25), col = "blue")


[Figure 5: Lorenz fit for high incomes]

We see on this plot that the red Lorenz curve, corresponding to a Pareto with α = 4.0, fits slightly better than the lognormal with σ = 0.25. So we cannot use a single distribution to model these data. This is confirmed by a non-parametric estimate of the density. We estimate the data density using a kernel and plot it together with a lognormal fit obtained with

ly79 = log(y79)
plot(density(y79))
xx = seq(0, 350, 1)
lines(xx, dlnorm(xx, meanlog = mean(ly79), sdlog = sd(ly79)), col = "red")

We see clearly that even if the overall fit could pass for nice, the two modes are smoothed into something which is not even in between them, while the right tail seems to be fitted quite well.

    7.2 Mixture of distributions

    When a single density is not enough to represent correctly the distribution of a sample, a simple

    explanation is that the observed sample is heterogenous and this result from the mixing of dif-

    ferent populations, each being represented by a particular density indexed by a given parameter.


[Figure 6: Non parametric estimate of the density for FES79 (N = 6230, bandwidth = 5.982)]

The trouble is that we do not know, first, how many different sub-populations there are and, second, what their proportions are. This lack of knowledge makes the problem difficult. As a simplification, suppose we have only two subpopulations, each described by a density indexed by θ_i, in unknown proportion p. The density of one observation is

$$f(x|\theta) = p\, f_N(x|\mu_1, \sigma_1^2) + (1 - p)\, f_N(x|\mu_2, \sigma_2^2)$$

if we suppose, again as a simplification, that the two members of the mixture are normal densities. If we knew the sample separation, i.e. which observation belongs to group 1 or 2, the inference problem would be very simple. But of course the allocation of the observations is unknown.

7.3 Estimation procedures

It is convenient to introduce a new random variable Z, associated with each observation x_i, that says whether x_i belongs to the first component of the mixture (z_i = 1) or to the second (z_i = 2). Suppose that we know the n values of z. We can
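One standard way of exploiting this latent allocation is the EM algorithm, which alternates between reconstructing z and updating the parameters. A minimal sketch for two normal components, with hypothetical starting values:

# EM iterations for a two-component normal mixture
em.mix <- function(x, p = 0.5, mu = c(50, 100), sd = c(10, 40), iter = 200) {
  for (k in 1:iter) {
    # E step: posterior probability that each x_i comes from component 1
    d1 <- p * dnorm(x, mu[1], sd[1])
    d2 <- (1 - p) * dnorm(x, mu[2], sd[2])
    z1 <- d1 / (d1 + d2)
    # M step: weighted updates of proportion, means and standard deviations
    p <- mean(z1)
    mu <- c(weighted.mean(x, z1), weighted.mean(x, 1 - z1))
    sd <- c(sqrt(sum(z1 * (x - mu[1])^2) / sum(z1)),
            sqrt(sum((1 - z1) * (x - mu[2])^2) / sum(1 - z1)))
  }
  list(p = p, mu = mu, sd = sd)
}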


7.5 Estimating mixture in R

The complexity of the estimation procedures is reflected in the procedures proposed in R. In order to simplify the problem, the program starts by considering a histogram, which means grouped data. So we first have to select the number of cells in the histogram. Then we have to give starting values for the parameters, and first of all the number of components. While it is quite safe to estimate a two-component mixture, many references in the empirical literature indicate that trying to fit more than two components is rarely successful. Usually equal weights are given as starting values for the p_i. A visual inspection of the histogram gives clues about plausible values for the means. The prior variance is small when the prior mean corresponds to a sharp part of the histogram, and much larger for the prior mean corresponding to the tail.

library(mixdist)
# the original listing is truncated after "FES.mix"; a plausible
# reconstruction following the text and Table 4 (arguments and
# starting values below are our guesses):
FES.grp = mixgroup(y79, breaks = 50)
FES.par = mixparam(mu = c(45, 90), sigma = c(7, 40), pi = c(0.5, 0.5))
FES.mix = mix(FES.grp, FES.par, dist = "lnorm")
plot(FES.mix)


Table 4: Parameter estimates for a two-component mixture

member       p         mu      sigma
1          0.1369    45.42     6.764
2          0.8631    89.14    40.811

[Figure 7: Mixture of two lognormal densities. Probability density of X over the range 0 to 400]

References

ARNOLD, B. C. (2008): "Pareto and Generalized Pareto Distributions," in Modeling Income Distributions and Lorenz Curves, ed. by D. Chotikapanich, vol. 5 of Economic Studies in Equality, Social Exclusion and Well-Being, chap. 7, pp. 119-145. Springer, New York.

DEATON, A. (1997): The Analysis of Household Surveys. The Johns Hopkins University Press, Baltimore and London.