Copula-Based Nonparametric Tests for Positive Quadrant Dependence Allowing for Arbitrary Marginal Distributions

James Philip Martin
National University of Singapore
October 2021

Abstract

Positive quadrant dependence (PQD) is a common relationship between economic variables. Existing tests of PQD require all the marginal distributions to be continuously (or discretely) distributed. This is often very restrictive in practice because many economic relationships involve both continuous and discrete variables. In this paper, we extend copula-based tests for PQD based on the multilinear empirical copula to a general setting that allows for arbitrary marginal distributions. We provide conditions for validity and consistency of a Kolmogorov-Smirnov (KS) type test and a Cramér-von Mises (CvM) type test with critical values determined by a multiplier bootstrap. In an empirical application, we use our tests to investigate the dependence between intergenerational wages.

Keywords: Positive Quadrant Dependence; Multilinear Empirical Copula Process; Bootstrap
JEL Codes: C12; C35.
Genest et al. (2017) give many more examples. Next, we need to make some new assumptions in order to define the limiting distribution.
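For a distribution $H$ with marginal distributions $F_1 = F_X$ and $F_2 = F_Y$, we define the following for every subset $S$ of $\{1, 2\}$ and all $u_1, u_2 \in [0,1]$:
$$\lambda_{F,S}(u_1, u_2) = \prod_{j \in S} \lambda_{F_j}(u_j) \prod_{j \in S^c} \{1 - \lambda_{F_j}(u_j)\}.$$
We finally define $u_{F,S}$ as the vector whose $j$th component is
$$(u_{F,S})_j = \begin{cases} F_j\{F_j^{-1}(u_j)\} & \text{if } j \in S, \\ F_j\{F_j^{-1}(u_j)-\} & \text{if } j \notin S. \end{cases}$$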
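Definition 2.3. The bilinear extension functional $\mathcal{F} : \ell^\infty(\operatorname{range} F) \to \ell^\infty([0,1]^2)$ is defined for all $g \in \ell^\infty(\operatorname{range} F)$ and $u \in [0,1]^2$ by
$$\mathcal{F}(g)(u) = \sum_{S \subseteq \{1,2\}} \lambda_{F,S}(u)\, g(u_{F,S}).$$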
This allows us to state the convergence result from Genest et al. (2017), which is used to prove the consistency of our test. For a set $K \subset [0,1]^2$, let $C(K)$ denote the set of continuous, bounded functions from $K$ to $\mathbb{R}$, and let $\rightsquigarrow$ denote weak convergence in the Hoffmann-Jørgensen sense (see van der Vaart and Wellner (1996) for details). Weak convergence in $C(O)$ means weak convergence in $C(K)$ for every compact subset $K$ of $O$, with $\ell^\infty(O)$ defined analogously. The following lemma is proved as Theorem 1 in Genest et al. (2017). Lemma 2.3 is then a simple consequence of this convergence and is analogous to the continuous result of Scaillet (2005).
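Lemma 2.2. Under Assumption 2.1 we have that
$$\sqrt{n}\,(C_n - C) \rightsquigarrow \mathbb{C}^C,$$
where this convergence takes place in $C(O)$. Here $\mathbb{B}^C = \mathcal{F}(\mathbb{B}^{C}_{R})$, and $\mathbb{B}^{C}_{R}$ is a Brownian bridge with covariance kernel, for $u, u' \in O$,
$$\operatorname{Cov}\{\mathbb{B}^{C}_{R}(u), \mathbb{B}^{C}_{R}(u')\} = C(u \wedge u') - C(u)\,C(u'),$$
where $u \wedge u'$ is the componentwise minimum of $u$ and $u'$. Then, for $u \in O$, $\mathbb{C}^C$ can be written as
$$\mathbb{C}^C(u) = \mathbb{B}^C(u) - \sum_{q=1}^{2} \partial_q C(u)\, \mathbb{B}^C(u^{(q)}),$$
where $u^{(q)}$ is a vector of ones, except for the $q$th component, which is $u_q$.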
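In this paper we base our statistics on functionals of the process $D_n(u,v) = uv - C_n(u,v)$. If we define $D(u,v) = uv - C(u,v)$, then Lemma 2.3 is a simple consequence of Lemma 2.2: $\sqrt{n}(D_n - D) = -\sqrt{n}(C_n - C)$, and since the limit $\mathbb{C}^C$ is a centered Gaussian process, $-\mathbb{C}^C$ has the same distribution as $\mathbb{C}^C$.
Lemma 2.3. Under Assumption 2.1 we have that
$$\sqrt{n}\,(D_n - D) \rightsquigarrow \mathbb{C}^C,$$
where this convergence takes place in $C(O)$ and $\mathbb{C}^C$ is the same as defined in Lemma 2.2.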
3 Proposed Test Statistics
3.1 Test Statistics
Given Lemma 2.1, we can write the null hypothesis (H0) and the alternative (H1) as
follows:
H0 : C (u, v) ≥ uv for all u, v ∈ [0, 1]
H1 : C (u, v) < uv for some u, v ∈ [0, 1].
We base the statistics on measures of how far the multilinear empirical copula $C_n$ falls below the independence copula $uv$, that is, on the difference $uv - C_n(u,v)$. We expect this difference to be nonpositive under the null. However, unlike the continuous case, where the process converges everywhere, we need to make assumptions on where the process $\mathbb{C}^C_n = \sqrt{n}(C_n - C)$ converges.
Assumption 3.1. (i) There exists an open set O with Lebesgue measure 1 such that
for all (u, v) ∈ O, ∂iC (u, v) exists and is continuous for i = 1, 2. (ii) There exists an
open set O that is dense in [0, 1]2 such that for all (u, v) ∈ O, ∂iC (u, v) exists and is
continuous for i = 1, 2.
Assumption 3.1(i) is the assumption that was given in Genest et al. (2017) to show the validity of CvM-type statistics. Assumption 3.1(ii), however, is a slightly weaker condition than Assumption 3.1(i). It is introduced in this paper, and under it the KS-type statistic is shown to be valid.
The first test statistic we propose is an extension of the KS-type statistic introduced in Scaillet (2005). Specifically, it is based on the supremum norm,
$$K_n = \sqrt{n} \sup_{(u,v) \in [0,1]^2} \{uv - C_n(u,v)\}. \qquad (2)$$
We propose a second statistic, $I_n$, that generalizes a CvM-type test statistic from the continuous case, where it was found to have greater power than the KS-type statistic in the simulations of Gijbels et al. (2010):
$$I_n = n \int_{[0,1]^2} \max\{uv - C_n(u,v), 0\}^2 \, du\, dv.$$
Both tests work by finding a critical value $c^*$ such that we reject the null if $T_n > c^*$, for $T_n = I_n$ or $T_n = K_n$.
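As a rough illustration, the two statistics can be approximated on a finite grid. The sketch below is not the author's implementation: the helper names (`uniform_cdf_weights`, `pqd_statistics`), the grid size `k`, and the replacement of the supremum and the integral by a maximum and an average over grid points are all assumptions made for illustration. It also assumes that $C_n$ is the checkerboard copula that spreads the mass of each observation uniformly over the rectangle $[F_n(X_i-), F_n(X_i)] \times [G_n(Y_i-), G_n(Y_i)]$.

```python
import numpy as np

def uniform_cdf_weights(x, grid):
    """(n, k) matrix whose (i, j) entry is the CDF at grid[j] of a uniform
    law on [F_n(x_i-), F_n(x_i)], where F_n is the empirical CDF of x."""
    x = np.asarray(x, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = len(x)
    lo = (x[:, None] > x[None, :]).sum(axis=1) / n    # F_n(x_i-)
    hi = (x[:, None] >= x[None, :]).sum(axis=1) / n   # F_n(x_i)
    return np.clip((grid[None, :] - lo[:, None]) / (hi - lo)[:, None], 0.0, 1.0)

def pqd_statistics(x, y, k=20):
    """Grid approximations of K_n and I_n (hypothetical helper)."""
    n = len(x)
    grid = np.arange(1, k + 1) / k                    # u_1, ..., u_k with u_k = 1
    # Multilinear (checkerboard) empirical copula evaluated on the grid.
    Cn = uniform_cdf_weights(x, grid).T @ uniform_cdf_weights(y, grid) / n
    Dn = np.outer(grid, grid) - Cn                    # uv - C_n(u, v)
    Kn = np.sqrt(n) * Dn.max()                        # KS-type statistic
    In = n * np.mean(np.maximum(Dn, 0.0) ** 2)        # CvM-type statistic
    return Kn, In

x = np.arange(100.0)
Kn_neg, In_neg = pqd_statistics(x, -x)   # perfectly negatively dependent data
Kn_pos, In_pos = pqd_statistics(x, x)    # perfectly positively dependent data
```

For perfectly negatively dependent data both statistics are large, while for perfectly positively dependent data the difference $uv - C_n(u,v)$ is never positive on the grid, so both statistics are zero.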
Proving the validity of our tests differs in some respects from the continuous marginal setting. Our proposed test statistics are computed over the entire set $[0,1]^2$, but, unlike in the continuous case, the multilinear empirical copula process does not converge everywhere in the unit square. Hence, in order to obtain consistent tests, we define what it means for a functional to be approximable on a set. We let $g|_A$ denote the restriction of an arbitrary function $g : [0,1]^2 \to \mathbb{R}$ to a set $A \subset [0,1]^2$. We define a slightly modified version of the notion of approximability used in Genest et al. (2017). We call it D-approximability to emphasise that the domain is important, because the functional $K_n$ is C-approximable but not $\ell^\infty$-approximable (see footnote 4).
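Definition 3.1. We say that a functional $\Gamma : \mathcal{D} \to \mathbb{R}$ is D-approximable on an open set $A \subset [0,1]^2$ if the following two conditions hold: (i) there exists a functional $\Gamma_A : \mathcal{D}(A) \to \mathbb{R}$ such that for all $g \in \mathcal{D}$, $\Gamma(g) = \Gamma_A(g|_A)$; (ii) for all $M, \delta \in (0,\infty)$, there exists a compact set $K \subset A$ and a continuous functional $\Gamma_K : \mathcal{D}(K) \to \mathbb{R}$ such that
$$\sup_{g \in \mathcal{D}(A),\ \|g\| \le M} |\Gamma_A(g) - \Gamma_K(g|_K)| < \delta.$$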
In effect, for a functional to be D-approximable, it must not be asymptotically affected by the points where the process does not converge. This allows us to calculate the test statistics over all of $[0,1]^2$ without knowing the set of convergence. To show that our tests are consistent, we make use of the following result.
Lemma 3.1. (i) Under Assumption 3.1(ii), the functional $K_n$ is C-approximable. (ii) Under Assumption 3.1(i), $I_n$ is $\ell^\infty$-approximable.
The CvM-type test statistics were shown to be consistent under Assumption 3.1(i) in Genest et al. (2017), and the proof of Lemma 3.1(i) is given in the Appendix. We then state a key result similar to Theorem 3 of Genest et al. (2017), but we state it for both $\ell^\infty$- and C-approximable functionals. As $C_n \in C(O)$, this change makes no difference in practice, because our test statistic will be continuous. Lemma 3.2 justifies the use of our test statistics over all of $[0,1]^2$.

4 This is due to the fact that for a dense subset $D$ of a space $A$ we have $\sup_{x \in D} f(x) = \sup_{x \in A} f(x)$ if $f$ is continuous, but not necessarily if $f$ is only in $\ell^\infty$. In contrast, there are examples, constructed from a Cantor-type set of positive measure, of a dense subset $D$ of a set $A$ and a continuous function $f$ such that $\int_D f(x)\,dx \neq \int_A f(x)\,dx$. Therefore we cannot use $I_n$ under the weaker assumption.
Lemma 3.2. If $C$ satisfies Assumption 3.1 on some open set $O$ and the functional $\Gamma$ is approximable on $O$, then, as $n \to \infty$,
$$\Gamma(\mathbb{C}^C_n) \rightsquigarrow \Gamma_O(\mathbb{C}^C),$$
with the convergence taking place in $C(O)$.
This allows us to derive the limiting distribution of our test statistics in Theorem 3.1.
Theorem 3.1. Under Assumption 3.1, and additionally Assumption 3.2(i) for $I_n$ or Assumption 3.2(ii) for $K_n$, we have the following under the null:
$$I_n \le \int_{[0,1]} \int_{[0,1]} \max\{\sqrt{n}\,(D_n(u,v) - D(u,v)), 0\}^2 \, du\, dv \rightsquigarrow \int_O \max\{\mathbb{C}^C(u,v), 0\}^2 \, du\, dv$$
and
$$K_n \le \sup_{u,v \in [0,1]} \sqrt{n}\,(D_n(u,v) - D(u,v)) \rightsquigarrow \sup_{(u,v) \in O} \mathbb{C}^C(u,v),$$
where $\mathbb{C}^C$ is defined as in Lemma 2.2, with the convergence taking place in $C(O)$. Under the alternative, $I_n$ and $K_n$ diverge to infinity in probability.
3.2 Critical Values
Because the limiting distribution of the multilinear empirical copula process is complex, an important question is how to find the critical values for the test statistics. Bootstrap procedures are discussed in Efron (1982). In our setting, we use the multiplier bootstrap for this process from Genest et al. (2017) to calculate the critical values of our test statistics. This bootstrap method was used to test independence in Genest et al. (2019). In the special case of testing for independence, however, there is no need to estimate the derivatives. The procedure for the multiplier bootstrap is outlined as follows.
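Procedure 3.1. Multiplier Bootstrap Algorithm
1. Discretize the interval [0,1] into k points $\mathbf{u} = \{u_i\}_{i=1}^{k}$ with $u_k = 1$.
2. Independently of the data, generate random variables $\varepsilon_{1b}, \ldots, \varepsilon_{nb}$ with mean 0, variance 1 and fourth moment at most 3. Then calculate $\bar{\varepsilon}_b = n^{-1} \sum_{\ell=1}^{n} \varepsilon_{\ell b}$.
3. For all $u_i, u_j \in \mathbf{u}$, calculate
$$\mathbb{B}^{C,b}_n(u_i, u_j) = \frac{1}{\sqrt{n}} \sum_{\ell=1}^{n} (\varepsilon_{\ell b} - \bar{\varepsilon}_b)\, V(X_\ell, u_i)\, V(Y_\ell, u_j),$$
and then calculate
$$\mathbb{C}^{C,b}_n(u_i, u_j) = \mathbb{B}^{C,b}_n(u_i, u_j) - \mathbb{B}^{C,b}_n(u_i, u_k)\, \partial^{d}_{1,n} C(u_i, u_j) - \mathbb{B}^{C,b}_n(u_k, u_j)\, \partial^{d}_{2,n} C(u_i, u_j).$$
4. Calculate and store
$$K^{M}_{n,b} = \max_{u_i, u_j \in \mathbf{u}} \mathbb{C}^{C,b}_n(u_i, u_j), \qquad I^{M}_{n,b} = \frac{1}{k^2} \sum_{i=1}^{k} \sum_{j=1}^{k} \max\{\mathbb{C}^{C,b}_n(u_i, u_j), 0\}^2.$$
5. Repeat Steps 1 through 4 a large number B of times.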
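The steps of Procedure 3.1 can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the author's code: the function names, the defaults `k=20` and `B=200`, and the choice of standard normal multipliers (mean 0, variance 1, fourth moment 3) are illustrative. The form of $V$ is also an assumption, since it is not defined in this part of the text; here $V(X_\ell, u)$ is taken to be the CDF at $u$ of a uniform law on $[F_n(X_\ell-), F_n(X_\ell)]$. The derivatives are estimated by central differences of $C_n$ with $h = 1/\sqrt{n}$, clipped to the unit interval at the boundary, and the function also returns the share of bootstrap draws that exceed the observed statistic.

```python
import numpy as np

def V(x, u):
    """(n, k) matrix: CDF at u[j] of a uniform law on [F_n(x_i-), F_n(x_i)].
    This form of V is an assumption made for the sketch."""
    x, u = np.asarray(x, dtype=float), np.asarray(u, dtype=float)
    n = len(x)
    lo = (x[:, None] > x[None, :]).sum(axis=1) / n
    hi = (x[:, None] >= x[None, :]).sum(axis=1) / n
    return np.clip((u[None, :] - lo[:, None]) / (hi - lo)[:, None], 0.0, 1.0)

def multiplier_bootstrap(x, y, k=20, B=200, seed=0):
    rng = np.random.default_rng(seed)
    n = len(x)
    grid = np.arange(1, k + 1) / k                 # Step 1: u_1, ..., u_k = 1
    Vx, Vy = V(x, grid), V(y, grid)
    Dn = np.outer(grid, grid) - Vx.T @ Vy / n      # uv - C_n(u, v) on the grid
    Kn = np.sqrt(n) * Dn.max()
    In = n * np.mean(np.maximum(Dn, 0.0) ** 2)

    # Central finite differences of C_n, clipped to [0, 1] at the boundary.
    h = 1.0 / np.sqrt(n)
    up, dn = np.clip(grid + h, 0.0, 1.0), np.clip(grid - h, 0.0, 1.0)
    d1 = (V(x, up) - V(x, dn)).T @ Vy / n / (up - dn)[:, None]
    d2 = Vx.T @ (V(y, up) - V(y, dn)) / n / (up - dn)[None, :]

    K_boot, I_boot = np.empty(B), np.empty(B)
    for b in range(B):
        eps = rng.standard_normal(n)               # Step 2: multipliers
        w = (eps - eps.mean()) / np.sqrt(n)
        Bn = (Vx * w[:, None]).T @ Vy              # Step 3: B_n^{C,b}(u_i, u_j)
        # Last column is B(u_i, u_k), last row is B(u_k, u_j), with u_k = 1.
        Cb = Bn - Bn[:, [-1]] * d1 - Bn[[-1], :] * d2
        K_boot[b] = Cb.max()                       # Step 4
        I_boot[b] = np.mean(np.maximum(Cb, 0.0) ** 2)
    return {"Kn": Kn, "In": In,
            "p_K": np.mean(K_boot > Kn), "p_I": np.mean(I_boot > In)}

x = np.arange(100.0)
res_neg = multiplier_bootstrap(x, -x)   # strong negative dependence
res_pos = multiplier_bootstrap(x, x)    # strong positive dependence
```

With perfectly negatively dependent data the bootstrap shares are close to zero, and with perfectly positively dependent data the share for the KS-type statistic is close to one.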
We then compute the p-value for $T = I_n, K_n$ by
$$p^{T}_{M} = B^{-1} \sum_{b=1}^{B} \mathbf{1}(T^{M}_{n,b} > T_n).$$
In order to use this methodology we need to make the following assumption.
Assumption 3.2. (i) The sequence $\|\mathbb{B}^{C,b}_n\|$ is tight. (ii) For every compact set $K \subset O$, the derivative estimators $\partial_{i,n} C(u,v)$ must satisfy
$$\|\partial_{i,n} C - \partial_i C\|_K \xrightarrow{p} 0.$$
The first part of the assumption states that the bootstrap process is tight over the whole unit square. For the multilinear empirical process this is known to be true under Assumption 2.1. Although we expect it to be true for the bootstrap process as well, this has not yet been proved. Under Assumption 3.2(i) we give a simple extension of Theorem 3 in Genest et al. (2017) that will be used to prove the consistency of our test.
Lemma 3.4. Under Assumption 3.2, if $C$ satisfies Assumption 3.1 on some open set $O$ and the functional $\Gamma$ is D-approximable on $O$, then, as $n \to \infty$,
$$\Gamma(\mathbb{C}^{C,b}_n) \rightsquigarrow \Gamma_O(\mathbb{C}^C).$$
Next, we state Theorem 3.2, which proves the validity of our tests.
Theorem 3.2. Let $\alpha \in (0, 1/2)$. Suppose that Assumption 3.2 holds, together with Assumption 3.1(ii) for $K_n$ or Assumption 3.1(i) for $I_n$. Using the rule that we reject $H_0$ if the bootstrap p-value $p^{T}_{M} < \alpha$ for $T = I_n$ or $K_n$, we have the following:
$$\lim_{n\to\infty} P(\text{reject } H_0) \le \alpha \quad \text{if } H_0 \text{ is true},$$
$$\lim_{n\to\infty} P(\text{reject } H_0) = 1 \quad \text{if } H_0 \text{ is false}.$$
4 Simulation Study
[Table 4.1. Columns: n; test statistic $K_n$ under marginal distributions P1, P20, P1-G, P20-G, G; test statistic $I_n$ under marginal distributions P1, P20, P1-G, P20-G, G.]
Table 4.1: (Size of PQD test) The table reports the size of the PQD test using the KS-type and CvM-type statistics when the critical values are obtained from the multiplier bootstrap. The data were generated under the null from the independence copula with varying marginals. The nominal level is fixed at α = 0.05.
In this section, we display the results of a simulation study using the two test statistics. First, we generate a sample $\{x_i\}_{i=1}^{n}$ of size $n \in \{50, 100, 200\}$. To generate this sample, we fix the copula to be the Gaussian copula with ρ ∈ {0, −0.21, −0.31, −0.41}. These are the parameter values used in the simulation study of the PQD test in Scaillet (2005). The case with ρ = 0 corresponds to the independence case (i.e., under the null), whereas those with a negative value of ρ are under the alternative. Note, however, that as the marginals are not continuous, ρ no longer maps to Kendall's τ as it does in the continuous case. We use five different combinations of marginals to test a wide range of marginal structures:
1. F and G are both Poisson with mean 1. This is a situation with many ties in the data.
2. F and G are both Poisson with mean 20.
3. F is Poisson with mean 1 and G is normal. This is the situation where one marginal is continuous and the other is discrete.
4. F is Poisson with mean 20 and G is normal.
5. F and G are both Gaussian with mean 0 and variance 1. This shows how the test performs in the continuous setting.
We denote these marginal combinations by P1, P20, P1-G, P20-G, and G, respectively. After generating this sample, we calculate the test statistics $I_n$ and $K_n$. Then we bootstrap the test statistics using the multiplier bootstrap (see footnote 5), as introduced for the discrete setting in Genest et al. (2017), and the classical bootstrap with replacement. To calculate the critical values, we use B = 1000 bootstrap replications. Finally, we record the result of the test and then repeat this process 1000 times.
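The data-generating step of this design can be sketched as follows. The helper names (`poisson_quantile`, `normal_cdf`, `simulate_sample`), the `DESIGNS` table and the truncation point `kmax` are assumptions made for illustration rather than the author's code. The idea is to draw a bivariate normal pair with correlation ρ, map it to Gaussian-copula uniforms with the normal CDF, and then apply the quantile function of each marginal; for a standard normal marginal the last two steps cancel, so the normal draw is used directly.

```python
import math
import numpy as np

def poisson_quantile(p, mean, kmax=200):
    """Smallest k with P(Poisson(mean) <= k) >= p, via a truncated CDF table."""
    k = np.arange(kmax + 1)
    log_pmf = -mean + k * math.log(mean) - np.array([math.lgamma(j + 1.0) for j in k])
    cdf = np.cumsum(np.exp(log_pmf))
    return np.searchsorted(cdf, p, side="left").astype(float)

def normal_cdf(z):
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))

# Poisson mean for each marginal; None stands for a standard normal marginal.
DESIGNS = {"P1": (1.0, 1.0), "P20": (20.0, 20.0),
           "P1-G": (1.0, None), "P20-G": (20.0, None), "G": (None, None)}

def simulate_sample(n, rho, design, seed=None):
    rng = np.random.default_rng(seed)
    z1 = rng.standard_normal(n)
    z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.standard_normal(n)

    def to_marginal(z, mean):
        # Normal marginal: keep z. Poisson marginal: quantile of the copula uniform.
        return z if mean is None else poisson_quantile(normal_cdf(z), mean)

    mean_x, mean_y = DESIGNS[design]
    return to_marginal(z1, mean_x), to_marginal(z2, mean_y)

x, y = simulate_sample(200, -0.41, "P1-G", seed=1)
```

In this example `x` is a Poisson(1) count with many ties and `y` is continuous, which corresponds to the P1-G design.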
5 We apply the simple derivative estimator
$$\partial^{d}_{i,n} C(u,v) = \frac{C_n(u + \mathbf{1}(i=1)h,\ v + \mathbf{1}(i=2)h) - C_n(u - \mathbf{1}(i=1)h,\ v - \mathbf{1}(i=2)h)}{2h}$$
with $h = 1/\sqrt{n}$, as suggested in Genest et al. (2017).
In Table 4.1, we display the simulated size of the tests with data generated from the independence copula. The first five columns present the results of the KS-type test statistic. These show that, for all sample sizes and marginal combinations, the multiplier bootstrap performs well in maintaining the nominal size, although the results imply that the KS-type test is slightly conservative. The final five columns show the size of the CvM-type test statistic. For all the sample sizes, this test is also slightly conservative. Although there is a range of different marginal structures that affect the limits, as in Genest et al. (2019), we find that the size of the test is not very strongly influenced by the type of marginal distributions examined.
Finally, we examine the power of the test under different marginals in Table 4.2. This table displays the rejection rates in our study, where we vary both the marginal distributions and the parameter of the Gaussian copula, with ρ ∈ {−0.21, −0.31, −0.41}. In line with the results for size, we find that for $I_n$ the power is higher at every sample size and ρ. The results indicate that the CvM-type test statistic is more powerful. To make sure these results are not sensitive to the use of the normal copula, we repeated the simulations using the Frank copula; the results can be found in Appendix B. The Frank copula is a popular copula that exhibits negative quadrant dependence for some parameter values.
Table 4.2: (Power of PQD test) The table reports the power of the positive quadrant dependence test using the KS- and CvM-type test statistics when the critical values are obtained from the multiplier bootstrap. The data were generated from a Gaussian copula with parameters ρ ∈ {−0.21, −0.31, −0.41}. The nominal level is fixed at α = 0.05.
5 Application
5.1 Intergenerational mobility
This section illustrates the application of our tests to an example from the intergenerational mobility literature. This topic is vital to understanding the current allocation of wealth. In recent work, Chetty et al. (2014) found variation in the coefficient of the rank-rank regression of father and son incomes across different states in the United States. One issue discussed there is that there were many ties and zeros in the data. The methodology introduced here is suitable in this setting, as it is robust to the presence of ties.
We use the data from the Panel Study of Income Dynamics (PSID) studied in Minicozzi (2003), who shows the importance of taking censoring into account.
[Figure 1: panel (a), transformed data; panel (b), raw data (X, Y)]
Figure 1: The left panel is a scatter plot of the integrally transformed father and son incomes, with ties broken at random. The right panel shows a scatter plot of the raw data.
We focus on two variables: the wage of the son at age 28 (X) and the predicted wage of the father (Y). The methods in previous studies of this topic by Lee et al. (2009), Delgado and Escanciano (2012), and Seo (2018) required continuous marginal assumptions. However, there is a large amount of discreteness in the data. The total size of the data set is n = 616, but there are only 488 unique entries for the son's income and 477 unique values for the father's predicted income. It may at first sight seem obvious that we would expect to see positive dependence. However, PQD is a global condition: if there were upward mobility in the lower tail of the distribution, for example due to government intervention, we would not expect to see PQD.
To begin our analysis, we look at Figure 1. The right panel shows the bivariate plot of the data (X, Y), while the left panel shows a random sample from the multilinear copula (see footnote 6). As can be seen in the figure, the raw data are very skewed by the outliers. The raw data appear to show a positive relationship, but this is not clear. The left panel shows the positive dependence more clearly, and it also indicates a large amount of tail dependence, with groupings in the upper-right and lower-left corners. We calculate the test statistics (0, −0.0218), with a corresponding p-value of 1. We therefore do not reject the null hypothesis, and the data are strongly consistent with PQD. This suggests that, when modeling this relationship, one should not rule out a copula that exhibits PQD.
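A transformed sample of the kind plotted in the left panel of Figure 1 can be produced with a randomized probability integral transform. The sketch below shows one common way to do this and is an assumption about the construction rather than the author's code: each observation is mapped to a uniform draw on $[F_n(X_i-), F_n(X_i)]$, so tied values are spread at random over the jump of the empirical distribution function. The function name `jittered_pit` and the wage values are made up for illustration and are not PSID data.

```python
import numpy as np

def jittered_pit(x, rng):
    """Map each x_i to a uniform draw on [F_n(x_i-), F_n(x_i)], which breaks
    ties at random (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    lo = (x[:, None] > x[None, :]).sum(axis=1) / n    # F_n(x_i-)
    hi = (x[:, None] >= x[None, :]).sum(axis=1) / n   # F_n(x_i)
    return lo + rng.uniform(size=n) * (hi - lo)

rng = np.random.default_rng(0)
wages = np.array([0.0, 0.0, 12.5, 12.5, 12.5, 30.0, 41.0, 41.0])  # made-up values with ties
u = jittered_pit(wages, rng)
```

Applying the same transform separately to X and Y and plotting the resulting pairs gives a scatter plot on the unit square in which ties no longer pile up on grid lines.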
6 Conclusion
In this paper, we took steps toward developing a nonparametric copula-based test without the stringent continuous marginal assumptions. We have extended the tests for PQD from Scaillet (2005) and Gijbels et al. (2010). Moreover, the results discussed in this paper suggest that KS-type statistics can be used in future tests in which the multilinear empirical copula is applied.
There are currently no results on the convergence of the multilinear copula process under different mixing conditions. When such results become available, it will be easy to extend this test for PQD without the continuous assumption to the time-series case. Here we have focused on PQD, which is a two-dimensional concept. However, there is an extension of the multilinear empirical copula to d dimensions (see Genest et al. (2017)). This would allow us to test for some higher-dimensional objects, as discussed at the end of Scaillet (2005).
6 The transformed sample in panel (a) of Figure 1 was created by integrally transforming the data and breaking the ties at random.
References
Bartolucci, F., Forcina, A., & Dardanoni, V. (2001). Positive quadrant dependence and marginal modeling in two-way tables with ordered margins. Journal of the American Statistical Association, 96(456), 1497–1505.
Bastani, S., & Selin, H. (2014). Bunching and non-bunching at kink points of the
Swedish tax schedule. Journal of Public Economics, 109, 36–49.
Caperaa, P., & Genest, C. (1993). Spearman's ρ is larger than Kendall's τ for positively dependent random variables. Journal of Nonparametric Statistics, 2(2), 183–194.
Chetty, R., Hendren, N., Kline, P., & Saez, E. (2014). Where is the land of
opportunity? The geography of intergenerational mobility in the United States. The
Quarterly Journal of Economics, 129(4), 1553–1623.
Deheuvels, P. (1979). La fonction de dependance empirique et ses proprietes: Un test
non parametrique d’independance. Academie Royale de Belgique, Bulletin de la Classe
des Sciences (5), 65, 274–292.
Delgado, M. A., & Escanciano, J. C. (2012). Distribution-free tests of stochastic
monotonicity. Journal of Econometrics, 170(1), 68–75.
Denuit, M., Dhaene, J., & Ribas, C. (2004). Does positive dependence between
individual risks increase stop-loss premiums?. Insurance: Mathematics and Economics,
28(3), 305–308.
Denuit, M., & Scaillet, O. (2004). Nonparametric tests for positive quadrant dependence. Journal of Financial Econometrics, 2(3), 422–450.
Devereux, M. P., Liu, L., & Loretz, S. (2014). The elasticity of corporate taxable
income: New evidence from UK tax records. American Economic Journal: Economic
Policy, 6(2), 19–53.
Efron, B. (1982). The jackknife, the bootstrap and other resampling plans. Society for
Industrial and Applied Mathematics.
Faugeras, O. P. (2017). Inference for copula modeling of discrete data: A cautionary
tale and some facts. Dependence Modeling, 5(1), 121–132.
Fermanian, J.-D., Radulovic, D., & Wegkamp, M. (2004). Weak convergence of
Table 4.2: (Power of PQD test) The table reports the power of the positive quadrant dependence test using the KS- and CvM-type test statistics when the critical values are obtained from the multiplier bootstrap. The data were generated from a Frank copula. The nominal level is fixed at α = 0.05.
C Possible issues when testing for symmetry
It may seem logical that this methodology, using the checkerboard copula, can be easily extended to test symmetry. This would be done, for instance, by plugging the empirical checkerboard copula into one of the test statistics proposed by Genest et al. (2012),
$$T_n = \sup_{u,v \in [0,1]} \sqrt{n}\, |C_n(u,v) - C_n(v,u)|.$$
A difficulty appears when we wish to prove that this test is consistent, that is, when we wish to show that under the alternative the test statistic diverges to infinity. Normally we would take a point $(u^*, v^*)$ such that $C(u^*, v^*) \neq C(v^*, u^*)$. The issue arises if none of these points is in $O$. In this paper we use continuity to show that if we are under the alternative there must be at least one point in $O$ at which the alternative condition holds. However, in a possible test of symmetry, the fact that $C(u^*, v^*) \neq C(v^*, u^*)$ at one point does not imply (using continuity) that there must be a point such that $C(a^*, b^*) \neq C(b^*, a^*)$ and $(b^*, a^*) \in O$. So although the test statistic may be bounded below by $\sqrt{n}\,|C_n(u^*, v^*) - C_n(v^*, u^*)|$, it is not clear whether this diverges to infinity, as $(u^*, v^*)$ may not be in $O$. We are not arguing that it is impossible to construct a test for symmetry using this method, just that there are some complications that need to be considered when compared with