Journal of Economic Inequality 1: 129–146, 2003. © 2003 Kluwer Academic Publishers. Printed in the Netherlands.

A family of correlation coefficients based on the extended Gini index

E. SCHECHTMAN 1,* and S. YITZHAKI 2

1 Department of Industrial Engineering and Management, Ben Gurion University of the Negev, Beer Sheva, Israel, E-mail: [email protected]
2 Department of Economics, The Hebrew University of Jerusalem, Jerusalem, Israel, E-mail: [email protected]

Abstract. The extended Gini is a family of measures of variability which is mainly used in the areas of finance and income distribution. Each index in the family is defined by specifying one parameter, which reflects the social evaluation of the marginal utility of income. The higher the parameter, the more weight is attached to the lower portion of the cumulative distribution, reflecting higher concern for poverty. In this paper we list and investigate the properties of the equivalents of the correlation coefficient that are associated with the extended Gini family. In addition, we show that the extended Gini of a linear combination of random variables can be decomposed in a way which is equivalent to the decomposition of the variance, with additional terms that reflect additional properties of the random variables. The implication of these properties is that any decomposition that is performed with the coefficient of variation can be replicated by an infinite number of indices that are based on the extended Gini coefficient.

Key words: decomposition, extended Gini, Gini correlation.

1. Introduction

The extended Gini is a family of measures of variability which is mainly used in the areas of finance and income distribution. (See the surveys by Lien and Tse [19] on hedging theory and of Wodon and Yitzhaki [27] on its application to the area of income distribution. See also Gregory-Allen and Shalit [13] and Shalit and Yitzhaki [25] for its relevance to stock market analysis.) One member of this family is Gini's mean difference (hereafter GMD), hence its name. Each index in the family is defined by specifying one parameter. The higher the parameter, the more weight is attached to the lower portion of the cumulative distribution. Recent developments of the use of this family include the development of regression coefficients based on the family for the multiple regression case (see [23]) and numerous other papers in the surveys mentioned above for the simple regression. Those regression coefficients call for the use of the equivalent of the correlation coefficient.

* Author for correspondence.


The aim of this paper is to list and investigate the properties of the equivalents of the correlation coefficient that are associated with the extended Gini family. In particular, we show that those correlation coefficients can be used to decompose the extended Gini of the sum of random variables in a way that resembles the decomposition of the variance of the sum of random variables. If the distributions of the random variables are “well behaved”, the decomposition is identical to the decomposition of the variance. If, on the other hand, some of the distributions are not “well behaved”, an additional term is added to the decomposition, a term that enables us to quantify which individual random variable is not “behaving well”, and its degree of “improper behavior”. The meaning of the term “well behaved” will be clarified below. The implication of having this family of correlations is that any model that is based on the decomposition of the variance (e.g., OLS regression, mean-variance in portfolio theory) can be replicated, with some modifications, by an infinite number of models.

The extended Gini inequality index is intended to respond to an observation made by Atkinson [1], who showed that the intersection of two Lorenz curves is a necessary and sufficient condition for the existence of two alternative social welfare functions that rank inequality in opposite ways. One should therefore first state one's assumptions concerning the social evaluation of the marginal utility of income, and only then construct the inequality index. It is argued that the implication of Atkinson's observation should affect all empirical analyses that are concerned with inequality. The estimation of Engel curves, for example, for the evaluation of the impact of a tax reform on inequality, should also take into account the fact that different social welfare functions may yield different qualitative results. The extended Gini correlation coefficient, together with the formula of the decomposition of the Gini of a linear combination of random variables, enables one to first select the social welfare function, and only then to conduct the empirical analysis subject to the chosen welfare function. Only in the case where changing the welfare function does not affect the conclusion may one safely ignore the need to select a social welfare function.

The structure of the paper is as follows: in the second section of the paper, the extended Gini and the family of correlations are presented, in three different ways. Section 3 is devoted to the properties of the family of Gini correlations. In Section 4, the decomposition of the extended Gini is proposed. Section 5 provides the link with inequality measurement, while Section 6 concludes the paper.

2. The extended Gini and the family of Gini correlations

Let X and Y be two random variables with continuous distribution functions $F_X$ and $G_Y$, respectively, and a joint distribution function H(X, Y). The family of correlations is based on the extended Gini (EG) measure of variation and co-variation, and on one parameter, ν.


There are three alternative definitions of the EG that are convenient to use in different contexts. In the continuous case, all definitions are identical. However, they may differ in the discontinuous case. To avoid the adjustments needed for the discontinuous case, it is assumed that all variables are continuous. The first definition is based on the Lorenz curve [15,32]:

$$EG(X, \nu) = \nu(\nu - 1)\int_0^1 (1 - p)^{\nu-2}\,(\mu p - A(p))\,dp, \tag{1}$$

where ν > 0 is a parameter chosen by the investigator, µ is the mean of X and A(p) is the absolute Lorenz curve. The definition of the absolute Lorenz curve is:

$$A(p) = \int_{-\infty}^{x(p)} t\,dF_X(t), \tag{2}$$

where $p = \int_{-\infty}^{x(p)} dF_X(t)$. As can be seen from Equation (2), the absolute Lorenz curve is the Lorenz curve multiplied by the expected value of the distribution. The second definition is:

$$EG(X, \nu) = -\nu\,\mathrm{cov}\big(X, (1 - F_X(X))^{\nu-1}\big). \tag{3}$$

(See, for example, [27].) The third, which holds only when ν is an integer, is

$$EG(X, \nu) = \mu - E(\min(X_1, \ldots, X_\nu)) = \int \big((1 - F(x)) - [1 - F(x)]^{\nu}\big)\,dx. \tag{4}$$

(See, for example, Kleiber and Kotz [16].) The parameter ν, which is determined by the investigator, has the following properties: If ν → 1 then the variability index represents the attitude of someone who does not care about variability, i.e., the index tends to zero independently of the variability of the distribution. At the other extreme, ν → ∞ represents variability as viewed by a max-min investigator (i.e., someone who cares only about the lowest portion of the distribution). The case where ν = 2 represents the Gini mean difference, which is a symmetric index with respect to the cumulative distribution (i.e., F(X) and 1 − F(X) are assigned the same weight). One of the properties of the EG family is that the members are always non-negative, and for ν > 1, a mean-preserving spread will always increase their value. It is worth noting that due to the asymmetric nature of the indices, EG(X, ν) ≠ EG(−X, ν). This property will play an important role in the properties of the EG correlation coefficients. The two EG are equal for ν = 2; that is:

$$EG(X, 2) = EG(-X, 2) = GMD(X) = GMD(-X).$$
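The covariance form (3) lends itself to a direct sample computation. The following is a minimal sketch, not the authors' estimator: it assumes the empirical distribution function F(x_i) = rank_i / n, a sample without ties, and population-style covariances (division by n). The function name `extended_gini` and the four-point sample are hypothetical choices made for illustration.

```python
def extended_gini(x, nu):
    """Sample analogue of EG(X, nu) = -nu * cov(X, (1 - F(X))**(nu - 1)).

    Assumption: F is the empirical CDF, rank / n, and the sample has no ties.
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    w = [0.0] * n
    for rank, i in enumerate(order, start=1):
        w[i] = (1.0 - rank / n) ** (nu - 1)
    mean_x = sum(x) / n
    mean_w = sum(w) / n
    cov = sum((xi - mean_x) * (wi - mean_w) for xi, wi in zip(x, w)) / n
    return -nu * cov


# Hypothetical right-skewed sample and its mirror image.
sample = [1.0, 2.0, 3.0, 10.0]
negated = [-v for v in sample]

eg2_pair = (extended_gini(sample, 2), extended_gini(negated, 2))
eg3_pair = (extended_gini(sample, 3), extended_gini(negated, 3))
```

On this sample the two values in `eg2_pair` coincide, in line with EG(X, 2) = EG(−X, 2), while the two values in `eg3_pair` differ, which illustrates the asymmetry EG(X, ν) ≠ EG(−X, ν) for ν ≠ 2.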

Equation (3) is the most convenient one for defining the equivalent of the covariance and the equivalent of the correlation. The equivalent of the covariance in the extended Gini framework, the co-Gini between X and Y, for a given ν, is defined by:

$$ECG(X, Y, \nu) = -\nu\,\mathrm{cov}\big(X, [1 - G_Y(Y)]^{\nu-1}\big).$$

The family of correlations ζ(X, Y, ν) is defined as:

$$\zeta(X, Y, \nu) = \frac{-\nu\,\mathrm{cov}\big(X, (1 - G_Y(Y))^{\nu-1}\big)}{-\nu\,\mathrm{cov}\big(X, (1 - F_X(X))^{\nu-1}\big)}, \tag{5}$$

and similarly,

$$\zeta(Y, X, \nu) = \frac{-\nu\,\mathrm{cov}\big(Y, (1 - F_X(X))^{\nu-1}\big)}{-\nu\,\mathrm{cov}\big(Y, (1 - G_Y(Y))^{\nu-1}\big)}.$$

That is, similar to the Gini correlation [21,22], each member of the EG family has two correlation coefficients associated with it. Note that the measure of correlation is not symmetric in X and Y. The choice between ζ(X, Y, ν) and ζ(Y, X, ν) depends on which variable is ranked and which is given in its variate values.

A unified way to express correlation coefficients is based on the difference H(X, Y) − F(X)G(Y). Using this difference, one can write Pearson's ρ, Spearman's s or the Gini correlation γ as follows:

$$\rho(X, Y) = \frac{\iint \big(H(x, y) - F(x)G(y)\big)\,dx\,dy}{\sigma_X \sigma_Y},$$

$$s(X, Y) = 12 \iint \big(H(x, y) - F(x)G(y)\big)\,dF(x)\,dG(y)$$

and

$$\gamma(X, Y) = \zeta(X, Y, 2) = \frac{\iint \big(H(x, y) - F(x)G(y)\big)\,dx\,dG(y)}{\mathrm{cov}(X, F(X))}.$$

See [14, 22 and 24] for details. This presentation hints that the properties of the Gini correlation are a mixture of the properties of the Pearson and Spearman correlations.
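To make the asymmetry of definition (5) concrete, the sketch below computes sample analogues of ζ(X, Y, ν) and ζ(Y, X, ν). It is an illustration under stated assumptions rather than an estimator taken from the paper: the distribution functions are replaced by empirical ranks divided by n, ties are assumed away, and the names `eg_correlation`, `_weights` and `_cov` as well as the data are hypothetical.

```python
def _weights(values, nu):
    """(1 - F(v))**(nu - 1), with F the empirical CDF rank / n (no ties assumed)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    w = [0.0] * n
    for rank, i in enumerate(order, start=1):
        w[i] = (1.0 - rank / n) ** (nu - 1)
    return w


def _cov(a, b):
    """Population-style covariance (division by n)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / n


def eg_correlation(x, y, nu):
    """Sample analogue of zeta(X, Y, nu): X enters by value, Y only by rank.

    The factor -nu cancels between numerator and denominator of (5).
    """
    return _cov(x, _weights(y, nu)) / _cov(x, _weights(x, nu))


# Hypothetical paired observations.
x = [1.0, 2.0, 3.0, 10.0]
y = [5.0, 1.0, 7.0, 8.0]

zeta_xy = eg_correlation(x, y, 2)  # X by value, Y by rank
zeta_yx = eg_correlation(y, x, 2)  # Y by value, X by rank
zeta_monotone = eg_correlation(x, [v ** 3 for v in x], 3)
```

Because Y enters only through its ranks, replacing Y by any increasing function of X reproduces the denominator exactly, so `zeta_monotone` equals 1; `zeta_xy` and `zeta_yx` generally differ.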

An alternative definition will turn out to be useful too. Define g(y) = E(X | Y = y) as the conditional expectation. Similar to (2), define the concentration curve of X as:

$$C_{X,Y}(p) = \int_{-\infty}^{y(p)} g(y)\,f_Y(y)\,dy, \tag{6}$$

where $p = \int_{-\infty}^{y(p)} f_Y(y)\,dy$. Then

$$ECG(X, Y, \nu) = \nu(\nu - 1)\int_0^1 (1 - p)^{\nu-2}\,\big(\mu p - C_{X,Y}(p)\big)\,dp. \tag{7}$$


CLAIM 1. The family of extended Gini correlation coefficients can be expressed as follows:

$$\zeta(X, Y, \nu) = \frac{(\nu - 1)\iint \big(H(x, y) - F(x)G(y)\big)\,(1 - G(y))^{\nu-2}\,dG(y)\,dx}{\mathrm{cov}\big(X, -[1 - F(X)]^{\nu-1}\big)}. \tag{8}$$

The proof of the claim is given in the Appendix. The special case, with ν = 2, was suggested by Schechtman and Yitzhaki [21] and its properties were studied there, as well as in Schechtman and Yitzhaki [22]. Some of the properties apply to the general case, while others fail to hold. In what follows we shall study the properties of the family of correlation coefficients. Proofs which are similar to the special case will not be repeated.

3. The properties of the family of extended Gini correlations

The main properties of ζ(X, Y, ν) are:

1. Let F and G be cumulative distribution functions of X and Y, respectively. Then, for every joint distribution function H(X, Y) and for every ν, ζ(X, Y, ν) ≤ 1 for all (X, Y).
2. If Y is a monotone increasing function of X, then ζ(X, Y, ν) = 1 for all ν.
3. If X and Y are independent, then ζ(X, Y, ν) = ζ(Y, X, ν) = 0 for all ν.
4. ζ(X, Y, ν) is invariant under all strictly monotonic increasing transformations of Y.
5. Let (X, Y) have a bivariate Normal distribution with correlation coefficient ρ; then ζ(X, Y, ν) = ζ(Y, X, ν) = ρ for all values of ν.
6. Invariance under exchangeability. Let (X, Y) be exchangeable up to a linear transformation (see Note 1). Then, ζ(X, Y, ν) = ζ(Y, X, ν) for all ν.

The proofs of properties 2, 3, 4 and 5 are similar to the ones for the special case ν = 2 and can be found in Schechtman and Yitzhaki [21]. The proofs of properties 1 and 6 are given in the Appendix.

An alternative sufficient condition for the equality of the Gini correlation coefficients, ζ(X, Y, ν) = ζ(Y, X, ν), for every ν, is that $C_{X,Y}(p) = C_{Y,X}(p)$ for all p, where $C_{X,Y}(p)$ is the concentration curve as defined in (6). Using Equation (7), the proof is immediate. We define distributions with $C_{X,Y}(p) = C_{Y,X}(p)$ for all p as “well behaved” distributions.

It is interesting to note that for the special case ν = 2, if Y is a monotone decreasing function of X, then ζ(X, Y, ν) = −1 (the lower bound for the special case), but this does not hold for the general case, as the following example shows:

Let Y = −X. For this case, ζ(X, Y, ν) = −1 implies that

$$\mathrm{cov}\big(X, -[F(X)]^{\nu-1}\big) = -\mathrm{cov}\big(X, -[1 - F(X)]^{\nu-1}\big).$$


For ν = 2, the condition holds. For ν = 3, the condition translates into whether or not

$$\mathrm{cov}(X, F(X)) = \mathrm{cov}(X, F^2(X)),$$

which is generally not true. For example, choose $F(x) = x^2$, for 0 ≤ x ≤ 1. Then, cov(X, F(X)) = 1/15, but $\mathrm{cov}(X, F^2(X)) = 4/63$. The lower bound for the general case is discussed later.
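The two covariances in this example can be checked with exact rational arithmetic. The sketch below assumes only what the example states, namely F(x) = x² on [0, 1], so that X has density 2x and E[X^k] = 2/(k + 2); the helper names `moment` and `cov_x_power` are hypothetical. The final value of ζ(X, −X, 3) is not quoted in the text; it is derived here from definition (5) under the same assumption.

```python
from fractions import Fraction


def moment(k):
    """E[X**k] for density 2x on [0, 1]: the integral of 2 * x**(k + 1) is 2 / (k + 2)."""
    return Fraction(2, k + 2)


def cov_x_power(m):
    """cov(X, X**m) = E[X**(m + 1)] - E[X] * E[X**m]."""
    return moment(m + 1) - moment(1) * moment(m)


cov_x_F = cov_x_power(2)   # F(X) = X**2
cov_x_F2 = cov_x_power(4)  # F(X)**2 = X**4

# With Y = -X, 1 - G(Y) = F(X), so zeta(X, -X, 3) = cov(X, F^2) / cov(X, (1 - F)^2),
# and (1 - F)^2 = 1 - 2F + F^2 gives the denominator below.
cov_x_one_minus_F_sq = -2 * cov_x_F + cov_x_F2
zeta_x_minus_x_3 = cov_x_F2 / cov_x_one_minus_F_sq
```

The computed covariances are 1/15 and 4/63, as stated, and under these assumptions ζ(X, −X, 3) works out to −10/11 rather than −1.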

The family of measures ζ(X, Y, ν) differs from the classical correlation ρ in three major properties:

(a) ζ(X, Y, ν) = 1 whenever Y is an increasing function of X, not necessarily linear. (This property holds for the Spearman coefficient as well.)

(b) Let F and G be cumulative distribution functions of X and Y, respectively. Then, there exists a joint distribution function H(X, Y) such that for every ν, ζ(X, Y, ν) = 1. The fact that the upper bound of 1 can always be achieved is helpful as a benchmark. See the discussion on the proper bounds of Pearson, Spearman and Gini correlation coefficients in, for example, Schechtman and Yitzhaki [22].

(c) The lower bound of ζ(X, Y, ν) is given by

$$\zeta(X, Y, \nu) \ge \frac{\int F(x)\,\big(F^{\nu-1}(x) - 1\big)\,dx}{\int (1 - F(x))\,\big(1 - (1 - F(x))^{\nu-1}\big)\,dx}, \tag{9}$$

and is obtained when Y = −X. Two special cases are worth mentioning: the case where X comes from a symmetric distribution, and the case with ν = 2. In these two cases, the lower bound is −1, as for the classical correlation coefficient. However, in general the lower bound depends on the cumulative distribution: the more concave it is, the lower it can get. The proofs of properties (b) and (c) are given in the Appendix. An intuitive explanation of (c) will be given following the decomposition of the EG of a sum of random variables (end of Section 4).

The lower bound can be expressed, for the case where ν is an integer, as

$$\frac{-EG(-X, \nu)}{EG(X, \nu)} = \frac{\mu - E\max(X_1, X_2, \ldots, X_\nu)}{\mu - E\min(X_1, X_2, \ldots, X_\nu)}.$$

We illustrate this lower bound for X exponentially distributed, with the scale parameter equal to unity. Kleiber and Kotz [16] show that EG(X, ν) = 1 − 1/ν. It can be shown that

$$E\max(X_1, X_2, \ldots, X_\nu) = \sum_{k=1}^{\nu} (-1)^{k+1}\binom{\nu}{k}\frac{1}{k}.$$

Therefore, the lower bound for ν = 3 is (1 − 11/6)/(1 − 1/3) = −5/4. The lower bound for ν = 4 is −13/9, smaller than for ν = 3 since the cumulative distribution is more concave.
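The arithmetic of the exponential example can be reproduced exactly. The sketch below relies on the quantities given above for the unit exponential, that is, µ = 1, EG(X, ν) = 1 − 1/ν and the alternating-sum formula for the expected maximum; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def expected_max_exponential(nu):
    """E max(X_1, ..., X_nu) for i.i.d. unit exponentials, via the alternating sum."""
    return sum(Fraction((-1) ** (k + 1) * comb(nu, k), k) for k in range(1, nu + 1))


def lower_bound_exponential(nu):
    """(mu - E max) / (mu - E min), with mu = 1 and mu - E min = EG(X, nu) = 1 - 1/nu."""
    mu = Fraction(1)
    eg = 1 - Fraction(1, nu)
    return (mu - expected_max_exponential(nu)) / eg


bounds = {nu: lower_bound_exponential(nu) for nu in (2, 3, 4)}
```

The dictionary `bounds` holds −1 for ν = 2, −5/4 for ν = 3 and −13/9 for ν = 4, matching the special case ν = 2 and the two values quoted in the text.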

4. The decomposition of the extended Gini

Let $(Y_1, Y_2)$ be drawn from a bivariate distribution. In what follows we will show that if $Y_0$ is a linear combination of $Y_1$ and $Y_2$, then the extended Gini coefficient of $Y_0$ can be decomposed in a way which is similar to the decomposition of the variance, plus an additional term which reflects the asymmetry of the correlation coefficient. Note the change in notation. We use $(Y_1, Y_2)$ rather than (X, Y) since the decomposition can easily be extended so that $Y_0$ is a linear combination of $Y_1, Y_2, \ldots, Y_k$.

CLAIM 2. Let $(Y_1, Y_2)$ be drawn from a bivariate distribution. Let $Y_0 = \alpha Y_1 + \beta Y_2$, where α and β are given constants. Then,

(a)

$$G_0^2 - [\alpha D_{10} G_1 + \beta D_{20} G_2]\,G_0 = \alpha^2 G_1^2 + \beta^2 G_2^2 + \alpha\beta\,G_1 G_2\,\big(\zeta(Y_1, Y_2, \nu) + \zeta(Y_2, Y_1, \nu)\big), \tag{10}$$

where $G_i = EG(Y_i, \nu)$, i = 0, 1, 2, is the extended Gini coefficient (see (3)),

$$\zeta(Y_i, Y_j, \nu) = \frac{\mathrm{cov}\big(Y_i, [1 - F_j(Y_j)]^{\nu-1}\big)}{\mathrm{cov}\big(Y_i, [1 - F_i(Y_i)]^{\nu-1}\big)},$$

for i, j = 0, 1, 2, is the extended Gini correlation, and $D_{ij} = \zeta(Y_i, Y_j, \nu) - \zeta(Y_j, Y_i, \nu)$, for i, j = 0, 1, 2, is the difference between the Gini correlations.

(b) Provided that $D_{ij} = 0$ for i, j = 0, 1, 2, the following decomposition holds:

$$G_0^2 = \alpha^2 G_1^2 + \beta^2 G_2^2 + 2\alpha\beta\,G_1 G_2\,\zeta, \tag{11}$$

where $\zeta = \zeta(Y_1, Y_2, \nu) = \zeta(Y_2, Y_1, \nu)$ is the extended Gini correlation between $Y_1$ and $Y_2$, and between $Y_2$ and $Y_1$.

The structure of Equation (11) is identical to the decomposition of the variance, with $G_i^2$ substituting for the variance and ζ substituting for Pearson's correlation. Note that by a proper choice of α and β, Equations (10) and (11) can be applied both to absolute measures like the GMD and to relative measures such as the Gini coefficient. The proof of Claim 2 is given in the Appendix.
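Equation (10) can be checked numerically on a sample. The sketch below is illustrative only and rests on assumptions that are not in the paper: the distribution functions are replaced by empirical ranks divided by n, there are no ties, and covariances are divided by n. Under these choices the sample quantities satisfy (10) up to floating-point error, because the identity follows from the linearity of the covariance in its first argument. All function names and data are hypothetical.

```python
def weights(values, nu):
    """(1 - F(v))**(nu - 1), with F the empirical CDF rank / n (no ties assumed)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    w = [0.0] * n
    for rank, i in enumerate(order, start=1):
        w[i] = (1.0 - rank / n) ** (nu - 1)
    return w


def cov(a, b):
    """Population-style covariance (division by n)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / n


def eg(y, nu):
    """Sample extended Gini, Equation (3)."""
    return -nu * cov(y, weights(y, nu))


def zeta(a, b, nu):
    """Sample extended Gini correlation zeta(a, b, nu)."""
    return cov(a, weights(b, nu)) / cov(a, weights(a, nu))


def decomposition_sides(y1, y2, alpha, beta, nu):
    """Return the left-hand and right-hand sides of Equation (10)."""
    y0 = [alpha * a + beta * b for a, b in zip(y1, y2)]
    g0, g1, g2 = eg(y0, nu), eg(y1, nu), eg(y2, nu)
    d10 = zeta(y1, y0, nu) - zeta(y0, y1, nu)
    d20 = zeta(y2, y0, nu) - zeta(y0, y2, nu)
    lhs = g0 ** 2 - (alpha * d10 * g1 + beta * d20 * g2) * g0
    rhs = (alpha ** 2 * g1 ** 2 + beta ** 2 * g2 ** 2
           + alpha * beta * g1 * g2 * (zeta(y1, y2, nu) + zeta(y2, y1, nu)))
    return lhs, rhs


# Hypothetical data; any sample without ties in y1, y2 and y0 would do.
y1 = [1.0, 2.0, 3.0, 10.0, 4.5]
y2 = [5.0, 1.0, 7.0, 8.0, 0.5]
lhs, rhs = decomposition_sides(y1, y2, 0.3, 0.7, 3)
```

The two sides agree to rounding error for any α, β and ν, including negative weights, since (10) uses $\alpha G_1$ and $\beta G_2$ directly rather than the extended Gini of the rescaled components.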

Clearly, property (b) of the claim is a special case of property (a). However, because of its similarity to the variance decomposition, the practical importance of case (b) is much greater than that of the general case, since it implies that ANY variance-based model can be replicated, using the Gini as a substitute for the variance as a measure of dispersion.

It is worthwhile to mention that (11) is easier to work with than (10). The question is, however, how restrictive the assumption $D_{ij} = 0$ really is. Schechtman and Yitzhaki [21] showed that a sufficient condition for $D_{ij} = 0$ is that the variables are exchangeable up to a linear transformation.

However, this is only one possible sufficient condition for $D_{ij} = 0$. An alternative sufficient condition can be formed in terms of concentration curves. As can be seen from simple modifications of (7), if $C_{i,j}(p)/C_{i,i}(p) = C_{j,i}(p)/C_{j,j}(p)$ for all p then $D_{ij} = 0$ for all ν. Clearly, this sufficient condition does not exhaust all possibilities and we leave it to further research to state whether one can find the necessary and sufficient conditions for $D_{ij} = 0$. Note also that under bivariate normality $D_{ij} = 0$.

While Equation (11) enables one to imitate variance-based models, Equation (10) opens new possibilities that, as far as we know, do not exist yet. It is worth stressing that each violation of the condition $D_{ij} = 0$ is reflected in a specific term in the decomposition Equation (10). Therefore, one can identify the random variables whose distributions are not “well behaved” and attach to the violation a quantitative value (see [33]).

We conclude this section by using the decomposition to give an intuitive explanation for why the lower bound of the EG correlation may differ from −1. Let us start by decomposing the identity Y = X + (−X) = 0. Clearly, EG(Y, ν) = 0 by construction. Also $D_{X,Y} = D_{-X,Y} = 0$. Hence:

$$0 = EG^2(X, \nu) + EG^2(-X, \nu) + EG(X, \nu)\,EG(-X, \nu)\,\zeta(X, -X, \nu) + EG(X, \nu)\,EG(-X, \nu)\,\zeta(-X, X, \nu).$$

Should the EG of X and −X be equal, the lower bound of the correlation coefficient would be −1 since the lower bound of this correlation is obtained between X and −X. To see this, note that if EG(−X, ν) = EG(X, ν), the decomposition would become:

$$0 = EG^2(X, \nu)\,\big[2 + \zeta(X, -X, \nu) + \zeta(-X, X, \nu)\big],$$

so that the sum of the correlation coefficients is equal to −2, and if they are equal, then the lower bound is −1. However, since the EG is asymmetric with respect to the cumulative distribution, EG(−X, ν) may not be equal to EG(X, ν) and therefore, it is impossible to have both a lower bound of −1 and a decomposable index.

5. Justification and possible applications

In his seminal paper [1], Atkinson proved that if the expected values of two income distributions are equal, the intersection of the two Lorenz curves is a necessary and sufficient condition for the existence of two legitimate social welfare functions that rank the inequality in the distributions in opposite ways. Shorrocks [26] has generalized this statement by showing that the intersection of two Absolute Lorenz curves is a necessary and sufficient condition for the existence of two social welfare functions that rank inequality in opposite ways.

The implication of Atkinson's observation is that in the general case, when one does not know whether the Lorenz curves intersect or not, one has to state his social preferences first, and then, relying on those preferences, conduct inequality measurement.

As far as we can see, the implications of the above observations go far beyond inequality measurement and should affect the way in which empirical economic analysis in the area of welfare economics (and risk) is performed. To be concrete, let us demonstrate this point by an example. Assume that the government is interested in redesigning the indirect tax system in order to improve its poverty alleviation policy. An investigator is asked which policy is more effective: to subsidize bread or butter. Assuming that all other components of the problem are identical (e.g., marginal excess burdens are identical), the investigator may face the situation that if she assumes one social welfare function then she will find out that subsidizing bread is more effective than subsidizing butter, but if an alternative social welfare function is assumed, the conclusion may be reversed. It is worth stressing that our argument holds only for an Engel curve of a non-specified curvature because if the Engel curves are linear, the choice of the social welfare function ceases to affect the conclusion.

To take into account Atkinson's observation, we should define our social preferences first, and only then conduct our empirical analysis, subject to our social views.

The correlation coefficients investigated in this paper are basic building blocks in constructing such a method of empirical research. To see this, note that the family of extended Gini inequality measures enables the investigator to state his social preferences first, and then to measure inequality, taking into account his social preferences. By choosing the parameter ν, the investigator expresses his social attitude. It is shown in [31] that

$$\mu_X - EG(X, \nu) \ge \mu_Y - EG(Y, \nu) \tag{12}$$

for all ν > 1 is a necessary condition for E(U(X)) ≥ E(U(Y)) for all social welfare functions with U′ > 0, U′′ < 0. Moreover, condition (12) is also a special case of Yaari's [29] social welfare function. Therefore, by choosing ν, the investigator defines the social evaluation of the marginal utility of income, which determines his social preferences.

The extended Gini correlation enables the investigator to decompose the extended Gini in a way which is similar to the decomposition of the variance, and therefore, opens the way for imitating most of the statistical analyses performed with the decomposition of the variance, subject to the definition of social preferences. However, there is a long way to go until we reach that stage, because the statistical estimation procedure and hypotheses testing should be developed first. It is therefore important to show the kind of problems that can be handled by the extended Gini correlation.

Numerous papers use the extended Gini income elasticity (GIE) to show the impact of a small increase (decrease) in a tax on a commodity X on the extended Gini of inequality in real incomes, Y [27]. Using the expressions of the extended Gini covariance and correlation, it is easy to show that the GIE of commodity X can be expressed in a way that is identical to the presentation of an income elasticity in an Ordinary Least Squares regression. Formally:

$$GIE(\nu) = b_{XY}(\nu)\,\frac{\mu_Y}{\mu_X} = \zeta(X, Y, \nu)\,\frac{\gamma(X, \nu)}{\gamma(Y, \nu)}, \tag{13}$$

where GIE(ν) is the extended Gini income elasticity,

$$b_{XY}(\nu) = \frac{\mathrm{cov}\big(X, [1 - G(Y)]^{\nu-1}\big)}{\mathrm{cov}\big(Y, [1 - G(Y)]^{\nu-1}\big)}$$

is the extended Gini regression coefficient of X on Y [23] (i.e., an estimate of the slope of the Engel curve of commodity X with respect to income Y, with G denoting the distribution function of Y), so that the middle term in (13) is the marginal propensity to spend on X, divided by the average propensity, while $\gamma(X, \nu) = EG(X, \nu)/\mu_X$ is the extended Gini inequality index. Equation (13) enables us to determine whether the given value of the GIE is due to the correlation of the commodity with income or its inequality among the population. It also shows that the GIE of the extended Gini obeys all the algebra of elasticities.

Equations (10) and (11) also open the way to directions of research that are currently restricted to the use of variance and coefficient of variation. We illustrate the point by two examples. In both cases the following linear relationship is assumed:

$$y_0 = a_1 y_1 + \cdots + a_n y_n, \tag{14}$$

where $a_i$ (i = 1, . . . , n) are given constants. The two examples are based on the decomposition of the inequality of $y_0$, which may be shown to be linked to the inequalities in the $y_i$'s and the correlations among them. An example of using such a relationship is given in Wodon and Yitzhaki [28], who observed that yearly incomes are the sums of monthly incomes, so that one can connect the inequality of yearly incomes to the inequality in monthly incomes, and the correlations among them, to evaluate the impact of the length of the accounting period on inequality. According to the theory of permanent income, and assuming that the discount rates are given, one is led to the same kind of relationship between consumption and periodical incomes. Hence, one can investigate the relationship between consumption inequality and income inequality with a decomposition of the extended Gini inequality index. Papers linking consumption inequality and income inequality are, among others, [3,10,4].

The other line of research that can be investigated by a relationship of the type of (14) is the effect of spouse and other income components on household income. See for example the analysis of Cancian and Reed [5] and the references there to decompositions that are based on the coefficient of variation.

Finally, one may wonder which value of ν should be selected. To answer this question we have to return to Atkinson's findings. First, one should view the use of different ν's as a sensitivity analysis to see whether social preferences affect the conclusions. In the example of an indirect tax reform, if all Engel curves involved are linear, so that all extended Gini correlation coefficients are equal (except for sampling variability), and hence all extended Gini regression coefficients are equal, we may reach conclusions that will be appropriate for a wide range of social welfare functions. This means that we may analyze the data, ignoring the social preferences towards inequality. If, on the other hand, the use of different extended Gini's leads to opposite conclusions, we have to admit that without additional assumptions on the social welfare function, there is no way of giving appropriate advice.

In the latter case, one may want to estimate the ν that reflects the social attitude. Provided that we have a set of decisions made by the society and one is ready to assume that all marginal excess burdens are equal (or can be estimated), then one may use methods along the lines presented in Mera [20] to estimate the ν that best reflects the social attitude. This problem is identical to that of estimating the parameter ε in Atkinson's index [1] that best reflects the social attitude. However, as far as we know, no one has tried to estimate ε in Atkinson's index yet, so we can't say that the way to estimate ν in a reliable way is clear. Further data and additional research are needed to answer this question.

6. Conclusion

This paper investigates and develops the properties of the extended Gini correlation coefficient. This family enables one to stress differently various sections of the distributions of the X's so that social preferences can govern the analysis. The extended Gini, and the equivalents of the covariance and correlations, offer a method that on one hand enables one to decompose the EG of a sum of random variables in a way which imitates the decomposition of the variance, while on the other hand also enables one to discover and quantify the impact on the decomposition of a deviation from “well behaved” distributions. Some properties have been derived that had not been stressed before, but it will take time to grasp all the implications of such a method. It can be used whenever one is not comfortable with a variance-based analysis of the data, or whenever economic considerations lead one to stress specific portions of the distributions of the independent variables, as is common in Finance or Income Distribution Analysis. In some sense the method provides an alternative treatment of extreme irrelevant observations. Instead of dropping them from the sample, the method enables the investigator to reduce the weight attached to those observations in a systematic way. The drawback of the new method is that it is more complicated than the decomposition of the variance because it abandons the symmetry imposed by Pearson's correlation coefficient on the random variables.

Acknowledgement

This research was partially supported by a grant from GIF. We would like to thank Professor Samuel Kotz and Professor Gideon Schechtman for helpful comments and discussions, and two anonymous referees and the editors for helping us to improve the quality of the paper.


Note

1 X and Y are exchangeable up to a linear transformation if there exist a and b (a > 0) such that (X, Y) and (aY + b, X) have the same distribution. Intuitively, this means that there exists a linear transformation that will make the shapes of the marginal distributions identical.

Appendix

PROOF OF CLAIM 1 OF SECTION 2

For simplicity, we prove the claim for X, Y ≥ 0. All the integrals are from 0 to ∞ unless stated otherwise. We shall use Kruskal’s method [17] to show that

$$
-\operatorname{cov}\bigl(X, (1 - G(Y))^{\nu-1}\bigr) = (\nu - 1) \iint [H(u, y) - F(u)G(y)]\,(1 - G(y))^{\nu-2}\, du\, dG(y).
$$

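Let $(X_1, Y_1)$ and $(X_2, Y_2)$ be i.i.d. As shown in [17],

$$
2\operatorname{cov}(X, Y) = E[(X_1 - X_2)(Y_1 - Y_2)] = E \iint [I(u, X_1) - I(u, X_2)]\,[I(t, Y_1) - I(t, Y_2)]\, du\, dt,
$$

where

$$
I(u, X) = \begin{cases} 1, & \text{if } u \le X, \\ 0, & \text{otherwise.} \end{cases}
$$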

There are two types of components in the integral:

(a) $I(u, X_1)I(t, Y_1)$, where $X_1$ and $Y_1$ are dependent, and
(b) $I(u, X_1)I(t, Y_2)$, where $X_1$ and $Y_2$ are independent.

Replacing Y by $(1 - G(Y))^{\nu-1}$ we get:

For (a)

$$
\begin{aligned}
E[I(u, X_1)\,I(t, (1 - G(Y_1))^{\nu-1})]
&= P[u \le X_1,\ t \le (1 - G(Y_1))^{\nu-1}] \\
&= P(u \le X_1,\ G(Y_1) \le 1 - t^{1/(\nu-1)}) = P(u \le X_1,\ Y_1 \le G^{-1}(1 - t^{1/(\nu-1)})) \\
&= P(Y_1 \le G^{-1}(1 - t^{1/(\nu-1)})) - P(Y_1 \le G^{-1}(1 - t^{1/(\nu-1)}),\ X_1 \le u) \\
&= 1 - t^{1/(\nu-1)} - H(u, G^{-1}(1 - t^{1/(\nu-1)})).
\end{aligned}
$$

For (b)

$$
\begin{aligned}
E[I(u, X_1)\,I(t, (1 - G(Y_2))^{\nu-1})]
&= P(u \le X_1)\,P(t \le (1 - G(Y_2))^{\nu-1}) \\
&= P(u \le X_1)\,P(G(Y_2) \le 1 - t^{1/(\nu-1)}) = (1 - F(u))(1 - t^{1/(\nu-1)}) \\
&= 1 - F(u) - t^{1/(\nu-1)} + F(u)\,t^{1/(\nu-1)}.
\end{aligned}
$$


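Combining the pieces and substituting into the integrals, we get:

$$
\begin{aligned}
2\operatorname{cov}(X, (1 - G(Y))^{\nu-1})
&= 2\int_0^\infty\!\!\int_0^1 \Bigl\{[1 - t^{1/(\nu-1)} - H(u, G^{-1}(1 - t^{1/(\nu-1)}))] - [1 - F(u) - t^{1/(\nu-1)} + F(u)\,t^{1/(\nu-1)}]\Bigr\}\, du\, dt \\
&= 2\int_0^\infty\!\!\int_0^1 [F(u) - H(u, G^{-1}(1 - t^{1/(\nu-1)})) - F(u)\,t^{1/(\nu-1)}]\, du\, dt.
\end{aligned}
$$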

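Substituting $t^{1/(\nu-1)} = 1 - G(y)$ and $dt = -(\nu - 1)(1 - G(y))^{\nu-2}\, dG(y)$, we get

$$
\begin{aligned}
&-2\int_0^\infty\!\!\int_\infty^0 [F(u) - H(u, y) - F(u)(1 - G(y))]\,(1 - G(y))^{\nu-2}(\nu - 1)\, du\, dG(y) \\
&\quad = -2(\nu - 1)\int_0^\infty\!\!\int_0^\infty (H(u, y) - F(u)G(y))\,(1 - G(y))^{\nu-2}\, du\, dG(y)
\end{aligned}
$$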

and thus,

$$
-\operatorname{cov}\bigl(X, (1 - G(Y))^{\nu-1}\bigr) = (\nu - 1) \iint [H(u, y) - F(u)G(y)]\,(1 - G(y))^{\nu-2}\, du\, dG(y).
$$
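The proof above starts from Kruskal’s identity 2cov(X, Y) = E[(X1 − X2)(Y1 − Y2)]. As a rough illustration only, the following Python sketch checks the sample analogue of that identity, where the expectation is replaced by an average over all ordered pairs of observations and the covariance uses the divisor n. The simulated data, the seed, and the function names are assumptions made for the example and do not come from the paper.

```python
import random


def covariance(xs, ys):
    """Covariance with divisor n (population form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def mean_pairwise_product(xs, ys):
    """Average of (x_i - x_j)(y_i - y_j) over all ordered pairs (i, j)."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += (xs[i] - xs[j]) * (ys[i] - ys[j])
    return total / (n * n)


# Hypothetical data: an exponential X and a noisy copy Y.
rng = random.Random(0)
xs = [rng.expovariate(1.0) for _ in range(200)]
ys = [x + rng.gauss(0.0, 1.0) for x in xs]

lhs = 2.0 * covariance(xs, ys)
rhs = mean_pairwise_product(xs, ys)
print(lhs, rhs)  # the two values should agree up to rounding error
```

With the divisor n on both sides the agreement is algebraic, so it does not depend on the particular sample drawn.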

PROOFS OF PROPERTIES 1 AND 6 OF SECTION 3

Proof of property 1. The proof is based on two claims:

C1. Given the marginal distribution functions of X and Y, and assuming that the densities exist and are positive everywhere, cov(X, Y) is maximal when Y is an increasing function of X, and minimal when Y is a decreasing function of X.

C2. Y is an increasing function of X if and only if $G_Y(Y)$ is an increasing function of X, and then

$$
E[(X - E(X))(G_Y(Y) - 0.5)] = E[(X - E(X))(F_X(X) - 0.5)].
$$

We need to show that

$$
\operatorname{cov}(X, -[1 - G_Y(Y)]^{\nu-1}) \le \operatorname{cov}(X, -[1 - F_X(X)]^{\nu-1}).
$$

Note that $G_Y(Y)$ is U(0, 1), so $U = 1 - G_Y(Y)$ is also U(0, 1), and

$$
E(U^{\nu-1}) = \int_0^1 u^{\nu-1}\, du = 1/\nu.
$$


Therefore,

$$
E[(1 - F_X(X))^{\nu-1}] = E[(1 - G_Y(Y))^{\nu-1}] = 1/\nu.
$$

Note that X and $-[1 - F_X(X)]^{\nu-1}$ are non-decreasing functions of X. By claim C1, $\operatorname{cov}(X, -(1 - G_Y(Y))^{\nu-1})$ achieves its maximal value when $-(1 - G_Y(Y))^{\nu-1}$ is an increasing function of X. Now, $-(1 - G_Y(Y))^{\nu-1}$ is non-decreasing in X if and only if $G_Y(Y)$ is a non-decreasing function of X, which implies $F_X(X) = G_Y(Y)$, and that means that the maximum is achieved at $\operatorname{cov}(X, -[1 - F_X(X)]^{\nu-1})$, which completes the proof.
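The bound in property 1 has a direct sample counterpart. The sketch below is illustrative only: it replaces each distribution function by rank/n (an assumption of this example, which also presumes no ties) and computes the ratio cov(x, −(1 − G(y))^(ν−1)) / cov(x, −(1 − F(x))^(ν−1)). Because both weight vectors then contain the same values in different orders, the rearrangement inequality keeps the ratio at or below 1. The function names and simulated data are hypothetical.

```python
import math
import random


def covariance(xs, ys):
    """Covariance with divisor n."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def empirical_cdf(values):
    """Empirical CDF at each observation, taken as rank/n (assumes no ties)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    cdf = [0.0] * n
    for rank, index in enumerate(order, start=1):
        cdf[index] = rank / n
    return cdf


def eg_correlation(xs, ys, nu):
    """Sample ratio cov(x, -(1-G(y))^(nu-1)) / cov(x, -(1-F(x))^(nu-1))."""
    weights_y = [-(1.0 - p) ** (nu - 1.0) for p in empirical_cdf(ys)]
    weights_x = [-(1.0 - p) ** (nu - 1.0) for p in empirical_cdf(xs)]
    return covariance(xs, weights_y) / covariance(xs, weights_x)


# Hypothetical data: y is either a noisy copy or an increasing function of x.
rng = random.Random(1)
xs = [rng.gauss(0.0, 1.0) for _ in range(500)]
noisy = [x + rng.gauss(0.0, 1.0) for x in xs]
monotone = [math.exp(x) for x in xs]

for nu in (1.5, 2.0, 4.0):
    print(nu, eg_correlation(xs, noisy, nu), eg_correlation(xs, monotone, nu))
```

When y is an increasing function of x the two rank vectors coincide, so the ratio equals 1, which mirrors the case in which the maximum is attained in the proof.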

Proof of property 6. Under exchangeability, H(X, Y) = H(Y, X). The proof is similar to the proof in Schechtman and Yitzhaki [21] for the GMD, except that every F should be replaced by $-(1 - F)^{\nu-1}$ and E(F) = 0.5 should be replaced by $E(-(1 - F)^{\nu-1}) = -1/\nu$.

PROOFS OF PROPERTIES (b) AND (c) OF SECTION 3

Proof of (b). Fréchet [11] has shown that there exist bivariate distributions $H_0(x, y)$ and $H_1(x, y)$ with marginals (F, G) such that for any bivariate distribution H(x, y) with the same marginals,

$$
H_0(x, y) \le H(x, y) \le H_1(x, y), \tag{A1}
$$

where $H_0(x, y) = \max\{F(x) + G(y) - 1, 0\}$ and $H_1(x, y) = \min\{F(x), G(y)\}$ are the Fréchet minimal and Fréchet maximal distributions, respectively [7]. Using Fréchet’s results, we obtain the upper bound as follows (all the integrals are from −∞ to ∞ unless stated otherwise):

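$$
\begin{aligned}
-\operatorname{cov}(X, (1 - G(Y))^{\nu-1})
&= (\nu - 1)\iint (H(x, y) - F(x)G(y))\,(1 - G(y))^{\nu-2}\, dG(y)\, dx \\
&= (\nu - 1)\iint H(x, y)\,(1 - G(y))^{\nu-2}\, dG(y)\, dx - (\nu - 1)\iint F(x)G(y)\,(1 - G(y))^{\nu-2}\, dG(y)\, dx.
\end{aligned}
$$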

Using the Fréchet maximal distribution $H_1(x, y) = \min\{F(x), G(y)\}$, and using integration by parts, we get that the first integral on the right-hand side can be bounded as follows:

$$
\begin{aligned}
\iint H(x, y)\,(1 - G(y))^{\nu-2}\, dG(y)\, dx
&\le \iint \min(F(x), G(y))\,(1 - G(y))^{\nu-2}\, dx\, dG(y) \\
&= \int\!\!\int_{-\infty}^{G^{-1}(F(x))} G(y)\,(1 - G(y))^{\nu-2}\, dx\, dG(y) + \int\!\!\int_{G^{-1}(F(x))}^{\infty} F(x)\,(1 - G(y))^{\nu-2}\, dx\, dG(y) \\
&= \frac{1}{\nu - 1}\int F(x)\,(1 - F(x))^{\nu-1}\, dx + \frac{1}{\nu - 1}\int \Bigl[-G(y)\,(1 - G(y))^{\nu-1}\Bigr]_{-\infty}^{G^{-1}(F(x))} dx \\
&\quad + \frac{1}{\nu - 1}\int\!\!\int_{-\infty}^{G^{-1}(F(x))} (1 - G(y))^{\nu-1}\, dG(y)\, dx \\
&= \frac{1}{\nu - 1}\int\!\!\int_{-\infty}^{G^{-1}(F(x))} (1 - G(y))^{\nu-1}\, dG(y)\, dx \\
&= \frac{1}{\nu(\nu - 1)}\int \bigl(1 - (1 - F(x))^{\nu}\bigr)\, dx.
\end{aligned}
\tag{A2}
$$

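The second integral, again by integration by parts, can be expressed as:

$$
\begin{aligned}
\iint F(x)G(y)\,(1 - G(y))^{\nu-2}\, dG(y)\, dx
&= \int F(x)\left\{\int \frac{1}{\nu - 1}\,(1 - G(y))^{\nu-1}\, dG(y)\right\} dx \\
&= \frac{1}{\nu(\nu - 1)}\int F(x)\, dx.
\end{aligned}
\tag{A3}
$$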

Combining the two parts and multiplying by (ν − 1), we get:

$$
-\operatorname{cov}(X, (1 - G(Y))^{\nu-1}) \le \frac{1}{\nu}\int \bigl\{(1 - F(x)) - (1 - F(x))^{\nu}\bigr\}\, dx. \tag{A4}
$$

Similar arguments show that

$$
-\operatorname{cov}(X, (1 - F(X))^{\nu-1}) = \frac{1}{\nu}\int \bigl\{(1 - F(x)) - (1 - F(x))^{\nu}\bigr\}\, dx, \tag{A5}
$$

since in this case, H(x, y) = min(F(x), G(y)). Therefore, by choosing H(x, y) to be the Fréchet maximal distribution, the upper bound of 1 is obtained.
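Equation (A5) can be checked numerically for a specific distribution. The sketch below assumes, purely for illustration, that X is unit exponential, so that 1 − F(x) = exp(−x) for x ≥ 0 and the integrand vanishes for x < 0. Both sides are computed with a simple midpoint rule truncated at a finite upper limit; the helper names, the truncation point, and the step count are choices of this example rather than anything from the paper.

```python
import math


def midpoint_integral(func, lower, upper, steps=100_000):
    """Midpoint-rule approximation of the integral of func over [lower, upper]."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (k + 0.5) * width) for k in range(steps))


def survival(x):
    """1 - F(x) for the unit exponential distribution (x >= 0)."""
    return math.exp(-x)


def a5_left(nu, upper=60.0):
    """-cov(X, (1 - F(X))^(nu-1)), expectations taken against the density exp(-x)."""
    density = survival  # for the unit exponential the density equals 1 - F
    e_x = midpoint_integral(lambda x: x * density(x), 0.0, upper)
    e_w = midpoint_integral(lambda x: survival(x) ** (nu - 1.0) * density(x), 0.0, upper)
    e_xw = midpoint_integral(lambda x: x * survival(x) ** (nu - 1.0) * density(x), 0.0, upper)
    return -(e_xw - e_x * e_w)


def a5_right(nu, upper=60.0):
    """(1/nu) * integral of (1 - F(x)) - (1 - F(x))^nu."""
    return midpoint_integral(lambda x: survival(x) - survival(x) ** nu, 0.0, upper) / nu


for nu in (2.0, 3.0):
    print(nu, a5_left(nu), a5_right(nu))
```

For this particular distribution both sides reduce to (ν − 1)/ν² by direct integration, which gives an independent value to compare the numerical output against.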

Proof of (c). Using Fréchet’s [11] results, and integrating by parts, we get

$$
\begin{aligned}
\iint H(x, y)\,(1 - G(y))^{\nu-2}\, dG(y)\, dx
&\ge \iint \max(F(x) + G(y) - 1, 0)\,(1 - G(y))^{\nu-2}\, dG(y)\, dx \\
&= \int\!\!\int_{G^{-1}(1 - F(x))}^{\infty} F(x)\,(1 - G(y))^{\nu-2}\, dG(y)\, dx + \int\!\!\int_{G^{-1}(1 - F(x))}^{\infty} G(y)\,(1 - G(y))^{\nu-2}\, dG(y)\, dx \\
&\quad - \int\!\!\int_{G^{-1}(1 - F(x))}^{\infty} (1 - G(y))^{\nu-2}\, dG(y)\, dx \\
&= \frac{\int F(x)\,F^{\nu-1}(x)\, dx}{\nu - 1} + \frac{\int (1 - F(x))\,F^{\nu-1}(x)\, dx}{\nu - 1} + \frac{\int \bigl[-(1 - G(y))^{\nu}\bigr]_{G^{-1}(1 - F(x))}^{\infty}\, dx}{\nu(\nu - 1)} - \frac{\int F^{\nu-1}(x)\, dx}{\nu - 1} \\
&= \frac{\int F^{\nu}(x)\, dx}{\nu(\nu - 1)}.
\end{aligned}
$$

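Combining the pieces together, we get

$$
\begin{aligned}
-\operatorname{cov}(X, (1 - G(Y))^{\nu-1})
&= (\nu - 1)\iint (H(x, y) - F(x)G(y))\,(1 - G(y))^{\nu-2}\, dG(y)\, dx \\
&\ge \frac{\int F^{\nu}(x)\, dx}{\nu} - \frac{\int F(x)\, dx}{\nu} \\
&= \frac{1}{\nu}\int \bigl(F^{\nu}(x) - F(x)\bigr)\, dx.
\end{aligned}
\tag{A6}
$$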

Using (A5) and (A6), we then get

$$
\zeta(X, Y, \nu) \ge \frac{\int F(x)\,\bigl(F^{\nu-1}(x) - 1\bigr)\, dx}{\int (1 - F(x))\,\bigl(1 - (1 - F(x))^{\nu-1}\bigr)\, dx}.
$$
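To see how the lower bound depends on the marginal distribution of X, the ratio above can be evaluated numerically. The sketch below is an illustration under stated assumptions: it uses a plain midpoint rule, a unit exponential distribution truncated at a large upper limit, and a uniform distribution on [0, 1]. The helper names and these two test distributions are choices of the example and are not taken from the paper.

```python
import math


def midpoint_integral(func, lower, upper, steps=100_000):
    """Midpoint-rule approximation of the integral of func over [lower, upper]."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (k + 0.5) * width) for k in range(steps))


def lower_bound(cdf, nu, lower, upper):
    """Ratio of the two integrals in the bound, for a given CDF of X."""
    numerator = midpoint_integral(
        lambda x: cdf(x) * (cdf(x) ** (nu - 1.0) - 1.0), lower, upper)
    denominator = midpoint_integral(
        lambda x: (1.0 - cdf(x)) * (1.0 - (1.0 - cdf(x)) ** (nu - 1.0)), lower, upper)
    return numerator / denominator


def exponential_cdf(x):
    """CDF of the unit exponential distribution (x >= 0)."""
    return 1.0 - math.exp(-x)


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1]."""
    return min(max(x, 0.0), 1.0)


for nu in (2.0, 3.0):
    print(nu,
          lower_bound(exponential_cdf, nu, 0.0, 60.0),
          lower_bound(uniform_cdf, nu, 0.0, 1.0))
```

For the symmetric uniform case the ratio is −1, while for the unit exponential with ν = 3 direct integration gives −5/4, so the bound can fall below −1 when the distribution of X is skewed.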

PROOF OF CLAIM 2 OF SECTION 4

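Proof of (a). Using the properties of the covariance we can write:

$$
\begin{aligned}
G_0 &= -\nu\operatorname{cov}(\alpha Y_1 + \beta Y_2, [1 - F_0(Y)]^{\nu-1}) \\
&= -\nu\bigl(\alpha\operatorname{cov}(Y_1, [1 - F_0(Y)]^{\nu-1}) + \beta\operatorname{cov}(Y_2, [1 - F_0(Y)]^{\nu-1})\bigr) \\
&= \alpha\,\zeta(Y_1, Y_0, \nu)\,G_1 + \beta\,\zeta(Y_2, Y_0, \nu)\,G_2.
\end{aligned}
$$

Define now the identity $\zeta(Y_i, Y_0, \nu) = D_{i0} + \zeta(Y_0, Y_i, \nu)$ for i = 1, 2, where $D_{i0}$ is the difference between the two Gini correlations. Using the identity, we get:

$$
G_0 = \alpha\bigl(\zeta(Y_0, Y_1, \nu) + D_{10}\bigr)G_1 + \beta\bigl(\zeta(Y_0, Y_2, \nu) + D_{20}\bigr)G_2. \tag{A7}
$$

Rearranging the terms, we see that

$$
G_0 - \alpha D_{10} G_1 - \beta D_{20} G_2 = \alpha\,\zeta(Y_0, Y_1, \nu)\,G_1 + \beta\,\zeta(Y_0, Y_2, \nu)\,G_2. \tag{A8}
$$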


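Using the properties of the covariance, we now get that

$$
\begin{aligned}
\zeta(Y_0, Y_1, \nu)
&= \frac{\operatorname{cov}(Y_0, [1 - F_1(Y)]^{\nu-1})}{\operatorname{cov}(Y_0, [1 - F_0(Y)]^{\nu-1})} \\
&= \frac{\alpha\operatorname{cov}(Y_1, [1 - F_1(Y)]^{\nu-1}) + \beta\operatorname{cov}(Y_2, [1 - F_1(Y)]^{\nu-1})}{\operatorname{cov}(Y_0, [1 - F_0(Y)]^{\nu-1})} \\
&= \frac{\alpha G_1 + \beta G_2\,\zeta(Y_2, Y_1, \nu)}{G_0}.
\end{aligned}
$$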

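Writing $\zeta(Y_0, Y_2, \nu)$ in a similar manner, and applying it to (A8), we get:

$$
\begin{aligned}
G_0^2 - [\alpha D_{10} G_1 + \beta D_{20} G_2]\,G_0
&= \alpha G_1\bigl(\alpha G_1 + \beta G_2\,\zeta(Y_2, Y_1, \nu)\bigr) + \beta G_2\bigl(\alpha\,\zeta(Y_1, Y_2, \nu)\,G_1 + \beta G_2\bigr) \\
&= \alpha^2 G_1^2 + \beta^2 G_2^2 + \alpha\beta G_1 G_2\bigl(\zeta(Y_1, Y_2, \nu) + \zeta(Y_2, Y_1, \nu)\bigr).
\end{aligned}
\tag{A9}
$$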

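Note that one can substitute $D_{i0}$ by the difference in correlations and get:

$$
\begin{aligned}
&G_0^2 - \bigl[\alpha\bigl(\zeta(Y_1, Y_0, \nu) - \zeta(Y_0, Y_1, \nu)\bigr)G_1 + \beta\bigl(\zeta(Y_2, Y_0, \nu) - \zeta(Y_0, Y_2, \nu)\bigr)G_2\bigr]\,G_0 \\
&\quad = \alpha^2 G_1^2 + \beta^2 G_2^2 + \alpha\beta G_1 G_2\bigl(\zeta(Y_1, Y_2, \nu) + \zeta(Y_2, Y_1, \nu)\bigr).
\end{aligned}
$$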

Proof of (b). Assuming equality of the two Gini correlation coefficients between $Y_0$ and $Y_1$ sets $D_{10} = 0$. A similar assumption with respect to $Y_0$ and $Y_2$ sets $D_{20} = 0$. The assumption of $\zeta = \zeta(Y_1, Y_2, \nu) = \zeta(Y_2, Y_1, \nu)$ completes the proof of (b).
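Because the derivation of the decomposition in part (a) uses only the linearity of the covariance, its sample analogue holds exactly (up to rounding) when every distribution function is replaced by rank/n. The following sketch illustrates this under assumptions made for the example: simulated data without ties, rank/n as the empirical distribution function, and hypothetical function names. It computes both sides of the last identity in the proof of (a), with the differences D10 and D20 written out as differences of correlations.

```python
import random


def covariance(xs, ys):
    """Covariance with divisor n."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def rank_weights(values, nu):
    """(1 - F(y))^(nu-1) per observation, with F taken as rank/n (assumes no ties)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    weights = [0.0] * n
    for rank, index in enumerate(order, start=1):
        weights[index] = (1.0 - rank / n) ** (nu - 1.0)
    return weights


def extended_gini(values, nu):
    """Sample version of G = -nu * cov(Y, [1 - F(Y)]^(nu-1))."""
    return -nu * covariance(values, rank_weights(values, nu))


def eg_correlation(a, b, nu):
    """Sample version of zeta(a, b, nu): weights from b over weights from a."""
    return covariance(a, rank_weights(b, nu)) / covariance(a, rank_weights(a, nu))


def decomposition_sides(y1, y2, alpha, beta, nu):
    """Return (left, right) sides of the decomposition of G0 squared."""
    y0 = [alpha * a + beta * b for a, b in zip(y1, y2)]
    g0, g1, g2 = (extended_gini(v, nu) for v in (y0, y1, y2))
    d10 = eg_correlation(y1, y0, nu) - eg_correlation(y0, y1, nu)
    d20 = eg_correlation(y2, y0, nu) - eg_correlation(y0, y2, nu)
    left = g0 ** 2 - (alpha * d10 * g1 + beta * d20 * g2) * g0
    right = (alpha ** 2 * g1 ** 2 + beta ** 2 * g2 ** 2
             + alpha * beta * g1 * g2
             * (eg_correlation(y1, y2, nu) + eg_correlation(y2, y1, nu)))
    return left, right


# Hypothetical data: a skewed Y1 and a correlated Y2.
rng = random.Random(2)
y1 = [rng.expovariate(1.0) for _ in range(400)]
y2 = [0.5 * a + rng.gauss(0.0, 1.0) for a in y1]

left, right = decomposition_sides(y1, y2, alpha=0.6, beta=0.4, nu=3.0)
print(left, right)  # the two sides should agree up to rounding error
```

The terms involving D10 and D20 are generally nonzero in such a sample, which is the asymmetry that part (b) removes by assumption.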

References

1. Atkinson, A.B.: On the measurement of inequality, J. Econom. Theory 2 (1970), 244–263.
2. Blitz, R.C. and Brittain, J.A.: An extension of the Lorenz diagram to the correlation of two variables, Metron XXIII(1–4) (1964), 137–143.
3. Blundell, R. and Preston, I.: Consumption inequality and income uncertainty, Quart. J. Econom. 113 (1998), 603–640.
4. Burkhauser, R. and Poupore, J.: A cross national comparison of permanent inequality in the United States and Germany, Rev. Econom. Statist. 79 (1997), 10–17.
5. Cancian, M. and Reed, D.: Assessing the effects of wives’ earnings on family inequality, Rev. Econom. Statist. 80(1) (February 1998).
6. Chakravarty, S.R.: Ethical Social Index Numbers, Springer-Verlag, New York, 1990.
7. De Veaux, R.: Technical Report No. 5, SIAM Institute for Mathematics and Society, Stanford University, 1976.
8. Donaldson, D. and Weymark, J.A.: A single-parameter generalization of the Gini-indices of inequality, J. Econom. Theory 22 (1980), 67–86.
9. Donaldson, D. and Weymark, J.A.: Ethically flexible Gini indices for income distributions in the continuum, J. Econom. Theory 29 (1983), 353–358.
10. Fisher, J.D. and Johnson, D.S.: Consumption mobility in the United States: Evidence from two panel data sets, Bureau of Labor Statistics, October 2002, mimeo.
11. Fréchet, M.: Sur les tableaux de correlation dont les marges sont données, Annals Universite de Lyon, Sect. A. Ser. 3 14 (1951), 53–77.
12. Giorgi, G.M.: Bibliographic portrait of the Gini concentration ratio, Metron XLVIII(1–4) (1990), 183–221.
13. Gregory-Allen, R. and Shalit, H.: The estimation of systematic risk under differentiated risk aversion: A mean-extended Gini approach, Rev. Quantitative Finance and Accounting 12(2) (March 1999), 135–157.
14. Hoeffding, W.: A nonparametric test for independence, Ann. Math. Statist. 19 (1948), 546–557.
15. Kakwani, N.C.: On a class of poverty measures, Econometrica 48 (1980).
16. Kleiber, C. and Kotz, S.: A characterization of income distributions in terms of generalized Gini coefficients, Social Choice and Welfare 19 (2002), 789–794.
17. Kruskal, W.H.: Ordinal measures of association, J. Amer. Statist. Assoc. 53 (1958), 814–861.
18. Lerman, R.I.: How do income sources affect income inequality, In: J. Silber (ed.), Handbook on Income Inequality Measurement, Kluwer Academic Publishers, Dordrecht, 1999, pp. 341–357.
19. Lien, D. and Tse, Y.K.: Some recent developments in futures hedging, Working paper, University of Texas at San Antonio, 2001.
20. Mera, K.: Experimental determination of relative marginal utilities, Quart. J. Econom. 83(3) (August 1969), 464–477.
21. Schechtman, E. and Yitzhaki, S.: A measure of association based on Gini’s mean difference, Comm. Statist. Theory Methods 16(1) (1987), 207–231.
22. Schechtman, E. and Yitzhaki, S.: On the proper bounds of the Gini correlation, Econom. Lett. 63(2) (1999), 133–138.
23. Schechtman, E. and Yitzhaki, S.: Asymmetric Gini regressions, estimation and testing, Mimeo, Hebrew University of Jerusalem, Dept. of Economics, 2001.
24. Schweizer, B. and Wolff, E.F.: On nonparametric measures of dependence for random variables, Ann. Statist. 9 (1981), 879–885.
25. Shalit, H. and Yitzhaki, S.: Estimating beta, Rev. Quantitative Finance and Accounting 18(2) (2002), 95–118.
26. Shorrocks, A.F.: Ranking income distributions, Economica 50 (1983), 3–17.
27. Wodon, Q. and Yitzhaki, S.: Inequality and social welfare, In: J. Klugman (ed.), PRSP Sourcebook, The World Bank, Washington DC, 2002.
28. Wodon, Q. and Yitzhaki, S.: Inequality and the accounting period, Mimeo, The World Bank, March 2002.
29. Yaari, M.E.: The dual theory of choice under risk, Econometrica 55 (1987), 95–115.
30. Yaari, M.E.: A controversial proposal concerning inequality measurement, J. Econom. Theory 44(2) (1988), 381–397.
31. Yitzhaki, S.: Stochastic dominance, mean variance, and the Gini’s mean difference, American Economic Review 72(1) (March 1982), 178–185.
32. Yitzhaki, S.: On an extension of the Gini inequality index, Internat. Econom. Rev. 24(3) (1983), 617–628.
33. Yitzhaki, S. and Wodon, Q.: Mobility, inequality, and horizontal equity, Mimeo, The World Bank, 2000.