Nonparametric Bayesian Estimation of a Bivariate Copula Using the Jeffreys Prior

arXiv:0908.2372v3 [stat.ME] 3 Jul 2012

Bernoulli 18(2), 2012, 496–519. DOI: 10.3150/10-BEJ345

Bayesian estimation of a bivariate copula using the Jeffreys prior

SIMON GUILLOTTE1 and FRANCOIS PERRON2

1Department of Mathematics and Statistics, University of Prince Edward Island, 550 University Ave., Charlottetown, Prince Edward Island, Canada C1A 4P3. E-mail: [email protected]

2Département de Mathématiques et Statistique de l’Université de Montréal, Pavillon André-Aisenstadt, 2920 Chemin de la Tour, Montréal, Québec, Canada H3C 3J7. E-mail: [email protected]

A bivariate distribution with continuous margins can be uniquely decomposed via a copula and its marginal distributions. We consider the problem of estimating the copula function and adopt a Bayesian approach. On the space of copula functions, we construct a finite-dimensional approximation subspace that is parametrized by a doubly stochastic matrix. A major problem here is the selection of a prior distribution on the space of doubly stochastic matrices, also known as the Birkhoff polytope. The main contributions of this paper are the derivation of a simple formula for the Jeffreys prior and showing that it is proper. It is known in the literature that for a complex problem like the one treated here, the above results are difficult to obtain. The Bayes estimator resulting from the Jeffreys prior is then evaluated numerically via Markov chain Monte Carlo methodology. A rather extensive simulation experiment is carried out. In many cases, the results favour the Bayes estimator over frequentist estimators such as the standard kernel estimator and Deheuvels’ estimator in terms of mean integrated squared error.

Keywords: Birkhoff polytope; copula; doubly stochastic matrices; finite mixtures; Jeffreys prior; Markov chain Monte Carlo; Metropolis-within-Gibbs sampling; nonparametric; objective Bayes

1. Introduction

Copulas have received considerable attention recently because of their increasing use in multiple fields such as environmental studies, genetics and data networks. They are also currently very popular in quantitative finance and insurance; see Genest et al. [10]. Since it is precisely the copula that describes the dependence structure among various random quantities, estimating a copula is part of many techniques employed in these fields. For instance, in risk measurement, the value at risk (VaR) is computed by simulating asset log returns from a fitted joint distribution, for which the dependence structure between the assets is modelled by a copula. Further financial examples in which copulas are estimated are provided in Embrechts et al. [7] and the books written by Cherubini et al. [4], McNeil et al. [19] and Trivedi and Zimmer [29]. In this paper, we provide new generic methodology for estimating copulas within a Bayesian framework.

This is an electronic reprint of the original article published by the ISI/BS in Bernoulli, 2012, Vol. 18, No. 2, 496–519. This reprint differs from the original in pagination and typographic detail. 1350-7265 © 2012 ISI/BS

Let us first recall that a bivariate copula C is a cumulative distribution function on S = [0,1] × [0,1] with uniform margins. In this paper, we denote the space of all copulas by $\mathcal{C}$. Every $C \in \mathcal{C}$ is Lipschitz continuous, with a common Lipschitz constant equal to one:

|C(u1, v1)−C(u2, v2)| ≤ |u1 − u2|+ |v1 − v2| for all (ui, vi) ∈ S, i ∈ {1,2}. (1)

The space $\mathcal{C}$ is bounded above and below by the so-called Fréchet–Hoeffding copulas, that is, for every $C \in \mathcal{C}$,

max(0, u+ v − 1)≤C(u, v)≤min(u, v) for all (u, v) ∈ S.

Sklar’s theorem states that a bivariate cumulative distribution function F is completely characterized by its marginal cumulative distribution functions $F_X$, $F_Y$ and its copula C. More precisely, we have the representation

$$F(x, y) = C(F_X(x), F_Y(y)) \quad \text{for all } (x, y) \in \mathbb{R}^2, \tag{2}$$

where C is well defined on $\mathrm{Ran}(F_X) \times \mathrm{Ran}(F_Y)$; see Nelsen [22]. In particular, the copula is unique if $F_X$ and $F_Y$ are continuous and, in this case, we have the following expression for the copula:

$$C(u, v) = F(F_X^{-1}(u), F_Y^{-1}(v)) \quad \text{for all } (u, v) \in S. \tag{3}$$

Let $\{(x_i, y_i),\ i = 1, \dots, n\}$ be a sample where every $(x_i, y_i)$ is a realization of the random couple $(X_i, Y_i)$, $i = 1, \dots, n$, with joint cumulative distribution function F and continuous marginal cumulative distribution functions $F_X$ and $F_Y$. We consider the problem of estimating the unknown copula C of F by a copula $\hat{C}$, where $\hat{C}$ depends on the sample. In this problem, the individual marginal distributions are treated as nuisance parameters. The literature presents three generic approaches for estimating C, namely the fully parametric, the semi-parametric and the nonparametric approaches. Below, we briefly describe each approach and focus on two nonparametric estimators, since we will subsequently compare our estimator with these.

The fully parametric approach. In this framework, parametric models are assumed for both the marginal distribution functions $F_X$ and $F_Y$ and for the copula C. See Joe [15], Cherubini et al. [4], Joe [16] and, in a Bayesian setup, Silva and Lopes [27].

The semi-parametric approach. Here, a parametric model is assumed only for the copula function C, not for the margins. In this setup, Genest et al. [11] have proposed the use of rescaled empirical distribution functions as the estimates $\hat{F}_X$ and $\hat{F}_Y$, together with a pseudo-likelihood estimator for C. The authors show that the resulting estimator is consistent and asymptotically normal. In Kim et al. [17], comparisons are made between the fully parametric approach and the semi-parametric approach proposed by Genest et al. [11].


More recently, in a Bayesian setup, Hoff [14] proposes a general estimation procedure, via a likelihood based on ranks, that does not depend on any parameters describing the marginal distributions. The latter methodology can accommodate both continuous and discrete data.

The nonparametric approach. This approach exploits equation (3). Here, we describe Deheuvels’ estimator and the kernel estimator. Let $\hat{F}$ be the empirical cumulative distribution function, and let $\hat{F}_X^{-1}$ and $\hat{F}_Y^{-1}$ be the generalized inverses of its marginal cumulative distribution functions. Any copula $C \in \mathcal{C}$ is said to satisfy the Deheuvels constraint associated with $\hat{F}$ provided that for all $i, j = 1, \dots, n$,

$$C(i/n, j/n) = \hat{F}(\hat{F}_X^{-1}(i/n), \hat{F}_Y^{-1}(j/n)) = \frac{1}{n} \sum_{k=1}^{n} \mathbf{1}(\operatorname{rank}(x_k) \le i,\ \operatorname{rank}(y_k) \le j) \quad \text{(Deheuvels’ constraint)}.$$

In Deheuvels [5], the asymptotic behaviour of the class of copulas $\mathcal{C}_{\hat{F}} \subset \mathcal{C}$ satisfying the Deheuvels constraint associated with $\hat{F}$ is described. Note that, in the literature, the so-called empirical copula $\hat{C}_{\mathrm{emp}}(u, v) = \hat{F}(\hat{F}_X^{-1}(u), \hat{F}_Y^{-1}(v))$, for all $(u, v) \in S$, is a function that satisfies the Deheuvels constraint and is often used as an estimator for C even though it is not a genuine copula. In Lemma 3, we propose an estimator $\hat{C}_{\mathrm{DEH}}$ that satisfies Deheuvels’ constraint and which, unlike $\hat{C}_{\mathrm{emp}}$, is itself a copula, so that $\hat{C}_{\mathrm{DEH}} \in \mathcal{C}_{\hat{F}}$. This estimator, which we call Deheuvels’ estimator henceforth, is based on ranks. One nice property of rank-based estimators is their invariance under strictly increasing transformations of the margins. Therefore, if ϕ and ψ are two strictly increasing functions, then the Deheuvels estimator based on the original sample and the one based on the sample $\{(\varphi(x_i), \psi(y_i)):\ i = 1, \dots, n\}$ are identical. This is a desirable property for a copula estimator since it is inherent to copulas themselves.

Moreover, if $\hat{F}$ is a smooth kernel estimator of F (with $\hat{F}_X$ and $\hat{F}_Y$ continuous, say), then

$$\hat{C}(u, v) = \hat{F}(\hat{F}_X^{-1}(u), \hat{F}_Y^{-1}(v)) \quad \text{for all } (u, v) \in S, \tag{4}$$

is called a kernel estimator for C, and we have $\hat{C} \in \mathcal{C}$. Asymptotic properties of such estimators are discussed in Fermanian and Scaillet [8], and the reader is referred to Charpentier et al. [3] for a recent review. In particular, the so-called Gaussian kernel estimator is given by (4) using

$$\hat{F}(x, y) = \frac{1}{n} \sum_{i=1}^{n} \Phi\left(\frac{x - x_i}{h}\right) \Phi\left(\frac{y - y_i}{h}\right) \quad \text{for all } x, y,$$

where Φ denotes the standard univariate Gaussian cumulative distribution function and h > 0 is the value of the bandwidth.

Both of the nonparametric estimators discussed above have good asymptotic properties. On the other hand, they may not be optimal for small sample sizes, and we think practitioners should be aware of this. We illustrate some of these situations by a simulation study in Section 5.

Our aim is to develop a Bayesian alternative for the estimation of C that circumvents this problem. Following Genest et al. [11], when the marginal distributions are unknown, we use rescaled empirical distribution functions as their estimates. In view of this, our methodology can be called empirical Bayes. When the marginal distributions are known, the sample $\{(x_i, y_i),\ i = 1, \dots, n\}$ is replaced by $\{(F_X(x_i), F_Y(y_i)),\ i = 1, \dots, n\}$, each coordinate of which is a sample from the uniform distribution U(0,1). In this case, our procedure is purely Bayesian. In both cases, our estimator has the property of being invariant under monotone transformations of the margins, just like Deheuvels’ estimator.

Our model is obtained as follows. First, in Section 2 we construct an approximation subspace $\mathcal{C}^* \subset \mathcal{C}$. This is achieved by considering the sup-norm $\|\cdot\|_\infty$ and setting a precision ε > 0 so that for every copula $C \in \mathcal{C}$ there exists a copula $C^* \in \mathcal{C}^*$ such that $\|C^* - C\|_\infty \le \varepsilon$. Moreover, $\mathcal{C}^*$ is finite dimensional; it is parametrized by a doubly stochastic matrix P. Then, our estimator $\hat{C}$ is obtained by concentrating a prior on $\mathcal{C}^*$ and computing the posterior mean, that is, the Bayes estimator under squared error loss.

Two problems now arise: the first is the selection of a prior on $\mathcal{C}^*$, and the second concerns the numerical evaluation of the Bayes estimator. These are the topics of Sections 3 and 4, respectively. While the problem of evaluating the Bayes estimator is solved using a Metropolis-within-Gibbs algorithm, the choice of the prior distribution is a much more delicate problem. A copula from our model can be written as a finite mixture of cumulative distribution functions. The mixing weights form a matrix W that is proportional to a doubly stochastic matrix. Therefore, specifying a prior on $\mathcal{C}^*$ boils down to specifying a prior for the mixing weights. We assume that we do not have any information that we could use for the construction of a subjective prior. It is not our intention to obtain a Bayes estimator better than some other given estimator. For these reasons we shall rely on an objective prior, and a natural candidate is the Jeffreys prior.

The main contributions of our paper are the derivation of a simple expression for the Jeffreys prior and showing that it is proper. The fact that these results are generally difficult to obtain for finite mixture problems has been raised before in the literature; see Titterington et al. [28] and Bernardo and Girón [2]. Moreover, here we face the additional difficulty that the mixing weights are further constrained, since their sum is fixed along the rows and the columns of W. To the best of our knowledge, nothing has yet been published on this problem. In Section 5, we report the results of an extensive simulation study in which we compare our estimator with Deheuvels’ estimator and the Gaussian kernel estimator. Finally, a discussion is provided in Section 6 to conclude the paper.
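The Gaussian kernel estimator described in the Introduction can be sketched in a few lines. The following is only an illustration under stated assumptions: the function names are hypothetical, a separate bandwidth is allowed for each margin (`hx`, `hy`), and the marginal quantiles are obtained by plain bisection rather than by any method prescribed in the paper.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function Phi."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def kernel_margin_cdf(t, data, h):
    """Smoothed marginal cdf: the average of Phi((t - x_i) / h)."""
    return sum(norm_cdf((t - xi) / h) for xi in data) / len(data)

def kernel_margin_quantile(p, data, h, tol=1e-10):
    """Invert the smoothed marginal cdf by bisection (it is continuous and increasing)."""
    lo, hi = min(data) - 10.0 * h, max(data) + 10.0 * h
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kernel_margin_cdf(mid, data, h) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def kernel_copula(u, v, xs, ys, hx, hy):
    """Evaluate F_hat at the smoothed marginal quantiles of u and v, for 0 < u, v < 1."""
    x = kernel_margin_quantile(u, xs, hx)
    y = kernel_margin_quantile(v, ys, hy)
    return sum(norm_cdf((x - xi) / hx) * norm_cdf((y - yi) / hy)
               for xi, yi in zip(xs, ys)) / len(xs)
```

Because the result is a joint distribution function evaluated at its own marginal quantiles, it respects the Fréchet–Hoeffding bounds up to the bisection tolerance.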

2. The model for the copula function

For every m > 1, we construct a finite-dimensional approximation subspace $\mathcal{C}_m \subset \mathcal{C}$. The construction of $\mathcal{C}_m$ uses a basis that forms a partition of unity. A partition of unity is a set of nonnegative functions $g = \{g_i\}_{i=1}^{m}$ such that $m g_i$ is a probability density function on [0,1] for all $i = 1, \dots, m$, and

$$\sum_{i=1}^{m} g_i(u) = 1 \quad \text{for all } u \in [0,1].$$

Particular examples are given by indicator functions

$$g_1 = \mathbf{1}_{[0,\,1/m]}, \qquad g_i = \mathbf{1}_{((i-1)/m,\; i/m]}, \quad i = 2, \dots, m, \tag{5}$$

and Bernstein polynomials

$$g_i = B^{m-1}_{i-1}, \quad i = 1, \dots, m, \tag{6}$$

where

$$B^m_i(u) = \binom{m}{i} u^i (1 - u)^{m-i} \quad \text{for all } u \in [0,1].$$

See Li et al. [18] for more examples of partitions of unity. In the following, let $G = (G_1, \dots, G_m)^\top$, where $G_i(u) = \int_0^u g_i(t)\,dt$ for all $u \in [0,1]$, $i = 1, \dots, m$, and let

$$C^*_P(u, v) = m\, G(u)^\top P\, G(v) \quad \text{for all } (u, v) \in S, \tag{7}$$

where P is an m × m doubly stochastic matrix. The following lemma is straightforward to prove.

Lemma 1. For every doubly stochastic matrix P, $C^*_P$ is an absolutely continuous copula.

For a fixed partition of unity, we now define the approximation space as

$$\mathcal{C}_m = \{C^*_P :\ P \text{ is an } m \times m \text{ doubly stochastic matrix}\}.$$
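To make definition (7) concrete, here is a minimal sketch that evaluates $C^*_P$ for the indicator basis (5). The function names are hypothetical; the only fact used is that, for this basis, $G_i(u)$ is $u - (i-1)/m$ clipped to the interval $[0, 1/m]$.

```python
def basis_integrals(u, m):
    """G_i(u), i = 1..m, for the indicator basis: u - (i-1)/m clipped to [0, 1/m]."""
    return [min(max(u - i / m, 0.0), 1.0 / m) for i in range(m)]

def copula_from_matrix(u, v, P):
    """Evaluate C*_P(u, v) = m * G(u)^T P G(v) for a doubly stochastic matrix P."""
    m = len(P)
    Gu, Gv = basis_integrals(u, m), basis_integrals(v, m)
    return m * sum(Gu[i] * P[i][j] * Gv[j] for i in range(m) for j in range(m))
```

With P doubly stochastic, the margins come out uniform, for instance `copula_from_matrix(u, 1.0, P)` returns u up to rounding, which is the property behind Lemma 1.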

The approximation order of $\mathcal{C}_m$ is now discussed. It depends on the choice of the basis G. Let $\mathcal{G}_m = \{(i/m, j/m):\ i, j = 1, \dots, m\}$ be a uniformly spaced grid on the unit square S. For a given copula C, let $R_C = (C(i/m, j/m))_{i,j=1}^{m}$ be the restriction of C to $\mathcal{G}_m$. Let

$$D = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ -1 & 1 & 0 & \cdots & 0 \\ 0 & -1 & 1 & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & -1 & 1 \end{pmatrix}.$$

Then $P_C = m D R_C D^\top$ is a doubly stochastic matrix. Upper bounds for $\|C^*_{P_C} - C\|_\infty$ are given in the following lemma.


Lemma 2. Let C be a copula and let $C^* = C^*_{P_C} \in \mathcal{C}_m$, where $C^*_{P_C}$ is obtained by (7).

(a) For a model using the indicator functions basis (5), we have $R_{C^*} = R_C$ and $\|C^* - C\|_\infty \le 2/m$.

(b) For a model using the Bernstein basis (6), we have $\|C^* - C\|_\infty \le 1/\sqrt{m}$.

Proof. (a) A direct evaluation shows that $R_{C^*} = R_C$. From the Lipschitz condition (1), if two copulas $C_1$ and $C_2$ satisfy the constraint $R_{C_1} = R_{C_2}$, then $\|C_1 - C_2\|_\infty \le 2/m$.

(b) First, it is well known that $m G^\top D = (B^m_1, \dots, B^m_m)$. For any $(u, v) \in S$, consider two independent random variables X and Y, where X follows a Binomial(m, u) distribution and Y follows a Binomial(m, v) distribution. We have

$$C^*(u, v) = E_{u,v}[C(X/m, Y/m)].$$

Therefore,

$$\begin{aligned}
\sup_{(u,v) \in S} |C^*(u, v) - C(u, v)| &= \sup_{(u,v) \in S} |E_{u,v}[C(X/m, Y/m) - C(u, v)]| \\
&\le \sup_{(u,v) \in S} E_{u,v}[|C(X/m, Y/m) - C(u, v)|] \\
&\le \sup_{(u,v) \in S} E_{u,v}[|X/m - u| + |Y/m - v|] \\
&= (2/m) \sup_{u \in [0,1]} E_u[|X - mu|].
\end{aligned}$$

In Lemma 6 of the Appendix, we give the exact value of $\sup_{u \in [0,1]} E_u[|X - mu|]$. However, a simple expression for an upper bound is given by Hölder’s inequality:

$$\sup_{u \in [0,1]} 2 E_u[|X - mu|]/m \le \sup_{u \in [0,1]} 2\sqrt{\mathrm{Var}_u[X]}/m = 1/\sqrt{m}. \qquad \square$$
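Part (a) of Lemma 2 can be checked numerically. The sketch below builds $P_C = m D R_C D^\top$ entry by entry (each entry is m times the C-mass of one grid cell) and measures the sup-norm error on a finite grid. The Clayton copula with θ = 2 is only an illustrative choice of C, and all function names are hypothetical.

```python
def clayton(u, v, theta=2.0):
    """Clayton copula with theta > 0, used here only as a test case."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def matrix_from_copula(C, m):
    """P_C = m * D R_C D^T: second-order differences of C over the grid cells."""
    R = [[C(i / m, j / m) for j in range(m + 1)] for i in range(m + 1)]
    return [[m * (R[i][j] - R[i - 1][j] - R[i][j - 1] + R[i - 1][j - 1])
             for j in range(1, m + 1)] for i in range(1, m + 1)]

def copula_from_matrix(u, v, P):
    """C*_P(u, v) = m * G(u)^T P G(v) for the indicator basis (5)."""
    m = len(P)
    Gu = [min(max(u - i / m, 0.0), 1.0 / m) for i in range(m)]
    Gv = [min(max(v - j / m, 0.0), 1.0 / m) for j in range(m)]
    return m * sum(Gu[i] * P[i][j] * Gv[j] for i in range(m) for j in range(m))

def sup_error(C, m, grid=60):
    """Largest |C*_{P_C} - C| over a (grid + 1) x (grid + 1) lattice of the unit square."""
    P = matrix_from_copula(C, m)
    return max(abs(copula_from_matrix(a / grid, b / grid, P) - C(a / grid, b / grid))
               for a in range(grid + 1) for b in range(grid + 1))
```

For any copula, `sup_error(C, m)` should not exceed 2/m; the lattice maximum is of course only a lower bound on the true supremum.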

Bernstein copulas have appeared in the past literature and their properties have been extensively studied in Sancetta and Satchell [24] and Sancetta and Satchell [25]. However, in view of Lemma 2 and of the simplicity of indicator functions, we subsequently use the indicator functions basis given in (5) for G in our model $C^*_P$ given by equation (7), and $\mathcal{C}_m$ is the family of copulas generated by this model. Since $P_{C^*_P} = P$ for any doubly stochastic matrix P, our model is rich in the sense that we have $\{P_C :\ C \in \mathcal{C}\} = \{P_{C^*} :\ C^* \in \mathcal{C}_m\}$, which is the set of doubly stochastic matrices.

Notice that, from a data reduction perspective, if $\{(u_k, v_k),\ k = 1, \dots, n\}$ is a sample from our model $C^*_P$, and if $g = (g_1, \dots, g_m)^\top$ represents the indicator functions in (5), then $\{g(u_k) g(v_k)^\top,\ k = 1, \dots, n\}$ is a sample from the multinomial$(1, m^{-1}P)$ distribution. As in a multinomial experiment with probabilities given by $m^{-1}P$, the vector $(n_{ij})$ of cell count statistics $n_{ij} = \sum_{k=1}^{n} g_i(u_k) g_j(v_k)$, $i, j = 1, \dots, m$, follows a multinomial$(n, m^{-1}P)$ distribution and is sufficient for P.
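The data reduction just described amounts to binning the sample and evaluating a multinomial-type likelihood. A minimal sketch, with hypothetical function names, follows; it uses the fact that under the indicator basis the copula density equals $m\,p_{ij}$ on cell (i, j), so the log-likelihood of P is $\sum_{i,j} n_{ij} \log(m\,p_{ij})$.

```python
import math

def cell_index(u, m):
    """0-based index of the cell ((i)/m, (i+1)/m] containing u; u = 0 goes to the first cell."""
    return min(max(math.ceil(u * m) - 1, 0), m - 1)

def cell_counts(us, vs, m):
    """Sufficient statistic (n_ij): number of observations falling in each grid cell."""
    counts = [[0] * m for _ in range(m)]
    for u, v in zip(us, vs):
        counts[cell_index(u, m)][cell_index(v, m)] += 1
    return counts

def log_likelihood(P, counts):
    """Sum of n_ij * log(m * p_ij); minus infinity if an occupied cell has p_ij <= 0."""
    m = len(P)
    total = 0.0
    for i in range(m):
        for j in range(m):
            if counts[i][j] > 0:
                if P[i][j] <= 0.0:
                    return -math.inf
                total += counts[i][j] * math.log(m * P[i][j])
    return total
```

The matrix with all entries 1/m corresponds to the independence copula, whose density is identically one, so its log-likelihood is zero for any data set.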


The following lemma is used to define what we call Deheuvels’ estimator. The estimator corresponds to the so-called bilinear extension of the empirical copula and has been considered by Deheuvels [6], Nelsen [22], Lemma 2.3.5, Genest and Nešlehová [12] and Nešlehová [23], Section 5.

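Lemma 3. Let $\{(x_i, y_i):\ i = 1, \dots, n\}$ be a sample, and let $R = (r_{ij})$ be the n × n matrix given by

$$r_{ij} = \frac{1}{n} \sum_{k=1}^{n} \mathbf{1}(\operatorname{rank}(x_k) \le i,\ \operatorname{rank}(y_k) \le j) \quad \text{for } i, j = 1, \dots, n.$$

If we use the indicator basis (5) with m = n for G, then the copula

$$\hat{C}_{\mathrm{DEH}}(u, v) = n^2\, G(u)^\top D R D^\top G(v) \quad \text{(Deheuvels’ estimator)}$$

satisfies Deheuvels’ constraint.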
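A sketch of Deheuvels’ estimator from Lemma 3 follows. It assumes no ties in the sample, in which case $n D R D^\top$ is simply the permutation matrix with a one at position (rank($x_k$), rank($y_k$)) for each k; the function names are hypothetical.

```python
def ranks(values):
    """Ranks 1..n of the observations (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0] * len(values)
    for position, k in enumerate(order, start=1):
        result[k] = position
    return result

def deheuvels_matrix(xs, ys):
    """n * D R D^T: a permutation matrix with a 1 at (rank(x_k), rank(y_k))."""
    n = len(xs)
    rx, ry = ranks(xs), ranks(ys)
    P = [[0.0] * n for _ in range(n)]
    for k in range(n):
        P[rx[k] - 1][ry[k] - 1] = 1.0
    return P

def deheuvels_copula(u, v, xs, ys):
    """Bilinear extension of the empirical copula: n * G(u)^T P G(v) with m = n."""
    n = len(xs)
    P = deheuvels_matrix(xs, ys)
    Gu = [min(max(u - i / n, 0.0), 1.0 / n) for i in range(n)]
    Gv = [min(max(v - j / n, 0.0), 1.0 / n) for j in range(n)]
    return n * sum(Gu[i] * P[i][j] * Gv[j] for i in range(n) for j in range(n))
```

Since only ranks enter the computation, applying strictly increasing transformations to `xs` or `ys` leaves the estimate unchanged.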

3. The prior distribution

The choice of a prior concentrated on the approximation space is delicate. The prior distribution is specified on $\mathcal{B}$, the set of doubly stochastic matrices of order m, m > 1. Here, we adopt an objective point of view and derive the Jeffreys prior. We also discuss two representations of doubly stochastic matrices that can be useful for the specification of other prior distributions on $\mathcal{B}$.

The set $\mathcal{B}$ is a convex polytope of dimension $(m-1)^2$. It is known in the literature as the Birkhoff polytope and has been the object of much research. For instance, computing the exact value of its volume is an outstanding problem in mathematics; it is known only for m ≤ 10 (see Beck and Pixton [1]).

The Fisher information matrix is obtained as follows. For m > 1, let $P \in \mathcal{B}$, and let $\mathcal{W} = (1/m)\mathcal{B}$. The copula (7) is a mixture of $m^2$ bivariate distribution functions

$$C^*_P(u, v) = m\, G(u)^\top P\, G(v) = H(u)^\top W H(v) = \sum_{i=1}^{m} \sum_{j=1}^{m} w_{ij} H_i(u) H_j(v),$$

where $W = (1/m)P \in \mathcal{W}$, and $H_i(u) = \int_0^u h_i(t)\,dt$ for all $u \in [0,1]$, with $h_i(\cdot) = m g_i(\cdot)$, $i = 1, \dots, m$. The probability density function $c^*_P$ of the copula is thus

$$\begin{aligned}
c^*_P(u, v) &= \sum_{i=1}^{m} \sum_{j=1}^{m} w_{ij} h_i(u) h_j(v) \\
&= 1 + \sum_{i=1}^{m} \sum_{j=1}^{m} (w_{ij} - 1/m^2)\, h_i(u) h_j(v) \\
&= 1 + \sum_{i=1}^{m-1} \sum_{j=1}^{m-1} (w_{ij} - 1/m^2)(h_i(u) - h_m(u))(h_j(v) - h_m(v)).
\end{aligned}$$

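The last equality expresses the fact that there are $(m-1)^2$ free parameters in the model. Recall that we are considering the indicator functions basis (5) in our model. It follows that for all $i_1, j_1, i_2, j_2 = 1, \dots, m-1$,

$$E\left[-\frac{\partial^2 \log c^*_P(u, v)}{\partial w_{i_1 j_1}\, \partial w_{i_2 j_2}}\right] = \int_0^1 \int_0^1 \frac{(h_{i_1}(u) h_{i_2}(u) + h_m^2(u))(h_{j_1}(v) h_{j_2}(v) + h_m^2(v))}{c^*_P(u, v)}\, du\, dv$$

$$= \begin{cases}
1/w_{i_1 j_1} + 1/w_{i_1 m} + 1/w_{m j_1} + 1/w_{mm}, & \text{if } i_1 = i_2,\ j_1 = j_2, \\
1/w_{i_1 m} + 1/w_{mm}, & \text{if } i_1 = i_2,\ j_1 \ne j_2, \\
1/w_{m j_1} + 1/w_{mm}, & \text{if } i_1 \ne i_2,\ j_1 = j_2, \\
1/w_{mm}, & \text{if } i_1 \ne i_2,\ j_1 \ne j_2.
\end{cases}$$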

Although the information matrix is of order $(m-1)^2 \times (m-1)^2$, the following result shows how to reduce the computation of its determinant to that of a matrix of order $(m-1) \times (m-1)$. The reduction provided by (8) is valuable when running an MCMC algorithm, which computes the determinant at every iteration. Most importantly, this expression enables us to derive the main result of this paper, that is, Theorem 1. The proofs of these two results are quite technical, so we have put them in the Appendix.

Lemma 4. The determinant I(W) of the Fisher information matrix for $W = (w_{ij})_{i,j=1,\dots,m} \in \mathcal{W}$ is given by

$$I(W) = \frac{\det((1/m) I - m V^\top V)}{m^m \det(D_0) \det(D_1)}, \tag{8}$$

where

$$V = (w_{ij})_{i=1,\dots,m;\ j=1,\dots,m-1},$$

$$D_0 = \mathrm{diag}(w_{11}, \dots, w_{1(m-1)}, \dots, w_{(m-1)1}, \dots, w_{(m-1)(m-1)})$$

and

$$D_1 = \mathrm{diag}(w_{mm}, w_{1m}, \dots, w_{(m-1)m}, w_{m1}, \dots, w_{m(m-1)}).$$

Theorem 1. The Jeffreys prior $\pi \propto I^{1/2}$ is proper.
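The reduction in Lemma 4 can be checked numerically against the full information matrix built from the four-case formula above. The sketch below is an illustration only: the helper names are hypothetical, the determinant routine is a plain Gaussian elimination, and the product det($D_0$) det($D_1$) is computed as the product of all $m^2$ entries of W, which is what the two diagonal matrices jointly contain.

```python
def det(A):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in A]
    n, d = len(A), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if A[p][c] == 0.0:
            return 0.0
        if p != c:
            A[c], A[p] = A[p], A[c]
            d = -d
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
    return d

def fisher_matrix(W):
    """Full (m-1)^2 x (m-1)^2 information matrix from the four-case formula."""
    m = len(W)
    last = m - 1
    idx = [(i, j) for i in range(m - 1) for j in range(m - 1)]
    F = []
    for (i1, j1) in idx:
        row = []
        for (i2, j2) in idx:
            val = 1.0 / W[last][last]
            if i1 == i2:
                val += 1.0 / W[i1][last]
            if j1 == j2:
                val += 1.0 / W[last][j1]
            if i1 == i2 and j1 == j2:
                val += 1.0 / W[i1][j1]
            row.append(val)
        F.append(row)
    return F

def fisher_det_reduced(W):
    """Right-hand side of (8): only an (m-1) x (m-1) determinant is required."""
    m = len(W)
    M = [[(1.0 / m if a == b else 0.0) - m * sum(W[i][a] * W[i][b] for i in range(m))
          for b in range(m - 1)] for a in range(m - 1)]
    denom = float(m) ** m
    for i in range(m):
        for j in range(m):
            denom *= W[i][j]
    return det(M) / denom
```

An unnormalized Jeffreys density at W is then `fisher_det_reduced(W) ** 0.5`, at the cost of one small determinant per evaluation.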

Now, in order to specify different priors, we can consider the two following representations.

The Hilbert space representation. Let $\mathcal{B}_0 = \{P - (1/m)\mathbf{1}\mathbf{1}^\top :\ P \in \mathcal{B}\}$ and $\mathcal{V} = \mathrm{Span}(\mathcal{B}_0)$. Consider the Frobenius inner product $\langle V_1, V_2 \rangle = \mathrm{tr}(V_1 V_2^\top)$ on $\mathcal{V}$. Thus, $\mathcal{V}$ is an $(m-1)^2$-dimensional Hilbert space and an orthonormal basis is given by $\{v_i v_j^\top\}_{i,j=1,\dots,m-1}$, with

$$v_i = \frac{1}{\sqrt{i(i+1)}}\,(\underbrace{1, \dots, 1}_{i}, -i, 0, \dots, 0)^\top, \quad i = 1, \dots, m-1.$$

For every $P \in \mathcal{B}$, there exists a unique $(m-1) \times (m-1)$ matrix A such that

$$P = m^{-1} \mathbf{1}\mathbf{1}^\top + G A G^\top, \tag{9}$$

where G is the $m \times (m-1)$ matrix given by $G = (v_1, v_2, \dots, v_{m-1})$. In this representation $A = G^\top P G$. Therefore, if we let $\mathcal{B}' = G^\top \mathcal{B} G$, then we have a bijection between $\mathcal{B}$ and $\mathcal{B}'$. The set $\mathcal{B}'$ is a bounded convex subset of $\mathbb{R}^{(m-1)^2}$ with positive Lebesgue measure. From this, priors on $\mathcal{B}$ can be induced by priors on $\mathcal{B}'$, and later on, we shall refer to the uniform prior on the polytope $\mathcal{B}$ as the uniform distribution on $\mathcal{B}'$. The above representation is also particularly useful to construct a Gibbs sampler for distributions on the polytope.

The Birkhoff–von Neumann representation. Another decomposition is obtained by making use of the Birkhoff–von Neumann theorem. Doubly stochastic matrices can be decomposed via convex combinations of permutation matrices. In fact, $\mathcal{B}$ is the convex hull of the permutation matrices and these are precisely the extreme points (or vertices) of $\mathcal{B}$. Furthermore, every m × m doubly stochastic matrix P is a convex combination of, at most, $k = (m-1)^2 + 1$ permutation matrices; see Mirsky [21]. In other words, if $\{\sigma_i\}_{i=1}^{m!}$ is the set of permutation matrices and if $P \in \mathcal{B}$, then there exist $1 \le i_1 < \cdots < i_k \le m!$ such that $P = \sum_{j=1}^{k} \lambda_{i_j} \sigma_{i_j}$, for some weight vector $(\lambda_{i_1}, \dots, \lambda_{i_k})$ lying in the $(k-1)$-dimensional simplex $\Lambda_k = \{(\lambda_1, \dots, \lambda_k):\ 0 \le \lambda_j \text{ for all } j \text{ and } \sum_{j=1}^{k} \lambda_j = 1\}$. A prior distribution over the polytope can be selected using a discrete distribution over the set $\{1 \le i_1 < \cdots < i_k \le m!\}$ and a continuous distribution over the simplex $\Lambda_k$, such as a Dirichlet distribution. See Melilli and Petris [20] for work in this direction.
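Both representations are easy to experiment with. The sketch below draws a doubly stochastic matrix as a random convex combination of permutation matrices and then maps it to its coordinates $A = G^\top P G$ and back through (9). The sampling scheme is an assumption made for illustration (permutations drawn uniformly with replacement, flat Dirichlet weights obtained from normalized exponential variates) and is not a prior proposed in the paper; all function names are hypothetical.

```python
import math
import random

def basis_vectors(m):
    """The vectors v_1, ..., v_{m-1}: orthonormal and orthogonal to the all-ones vector."""
    vectors = []
    for i in range(1, m):
        scale = 1.0 / math.sqrt(i * (i + 1))
        vectors.append([scale] * i + [-i * scale] + [0.0] * (m - i - 1))
    return vectors

def random_doubly_stochastic(m, k, rng):
    """Convex combination of k random permutation matrices (Birkhoff-von Neumann)."""
    weights = [rng.gammavariate(1.0, 1.0) for _ in range(k)]  # normalized: Dirichlet(1, ..., 1)
    total = sum(weights)
    P = [[0.0] * m for _ in range(m)]
    for w in weights:
        perm = list(range(m))
        rng.shuffle(perm)
        for i in range(m):
            P[i][perm[i]] += w / total
    return P

def to_coordinates(P, V):
    """A = G^T P G, where the columns of G are the vectors in V."""
    m = len(P)
    return [[sum(V[k][p] * P[p][q] * V[l][q] for p in range(m) for q in range(m))
             for l in range(m - 1)] for k in range(m - 1)]

def from_coordinates(A, V):
    """P = (1/m) 1 1^T + G A G^T, as in representation (9)."""
    m = len(V) + 1
    return [[1.0 / m + sum(A[k][l] * V[k][p] * V[l][q]
                           for k in range(m - 1) for l in range(m - 1))
             for q in range(m)] for p in range(m)]
```

For example, `from_coordinates(to_coordinates(P, V), V)` recovers P up to rounding whenever P has unit row and column sums.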

4. The MCMC algorithm

Let $\{(x_i, y_i),\ i = 1, \dots, n\}$ be a sample, where each $(x_i, y_i)$ is a realization of the random couple $(X_i, Y_i)$, $i = 1, \dots, n$, with dependence structure given by a copula C, and with continuous marginal distributions $F_X$ and $F_Y$. If the marginal distributions are known, then the transformed observations $\tilde{x}_i = F_X(x_i)$ and $\tilde{y}_i = F_Y(y_i)$, $i = 1, \dots, n$, are both samples from a uniform distribution on (0,1). If the marginal distributions are unknown, then we follow Genest et al. [11] and consider the pseudo-observations $\tilde{x}_i = (n/(n+1))\hat{F}_X(x_i)$ and $\tilde{y}_i = (n/(n+1))\hat{F}_Y(y_i)$, $i = 1, \dots, n$, where $\hat{F}_X$ and $\hat{F}_Y$ are the empirical distributions. The algorithm below describes the transition kernel for the Markov chain used to numerically evaluate the Bayesian estimator $\hat{C}$ associated with the Jeffreys prior π. This type of algorithm is called Metropolis-within-Gibbs; see Gamerman and Lopes [9]. An individual estimate is approximated by the sample mean of the chain.


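Let T ≥ 1 be the length of the chain and, at each iteration t, 1 ≤ t ≤ T, let $P_t$ be the current doubly stochastic matrix. From representation (9) in the previous section, with $A = (a_{kl})_{k,l=1,\dots,m-1}$,

$$P_t - (1/m)\mathbf{1}\mathbf{1}^\top = \sum_{k=1}^{m-1} \sum_{l=1}^{m-1} a_{kl}\, v_k v_l^\top.$$

Repeat for $i, j = 1, \dots, m-1$:

1. Select direction $v_i v_j^\top$ and compute the interval $I_{ij} \subset \mathbb{R}$ as follows:

1.1 For every $p, q = 1, \dots, m$, find the largest interval $I^{(p,q)}_{ij}$ such that

$$\varepsilon_{ij}\, v_i^{(p)} v_j^{(q)} \ge -1/m - \sum_{k=1}^{m-1} \sum_{l=1}^{m-1} a_{kl}\, v_k^{(p)} v_l^{(q)} \quad \text{for all } \varepsilon_{ij} \in I^{(p,q)}_{ij}.$$

1.2 Take $I_{ij} = \bigcap_{p,q} I^{(p,q)}_{ij}$.

2. Draw $\varepsilon_{ij}$ from the uniform distribution on $I_{ij}$, and set $a'_{ij} = a_{ij} + \varepsilon_{ij}$ and $a'_{kl} = a_{kl}$ for every $(k, l) \ne (i, j)$. The proposed doubly stochastic matrix is given by

$$P_t^{\mathrm{prop}} = (1/m)\mathbf{1}\mathbf{1}^\top + \sum_{k=1}^{m-1} \sum_{l=1}^{m-1} a'_{kl}\, v_k v_l^\top.$$

3. Accept $P_{t+1} = P_t^{\mathrm{prop}}$ with probability

$$\alpha(P_t, P_t^{\mathrm{prop}}) = \min\left\{1, \frac{\pi(P_t^{\mathrm{prop}})\, L(P_t^{\mathrm{prop}} \mid x, y)}{\pi(P_t)\, L(P_t \mid x, y)}\right\}, \tag{10}$$

and otherwise keep the current matrix, where $L(\cdot \mid x, y)$ is the likelihood derived from expression (7).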
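One sweep of the algorithm above can be sketched as follows. This is a simplified illustration, not the authors’ implementation: the target is passed in as a log-density `log_target` (log prior plus log likelihood), the example likelihood is the cell-count form valid for the indicator basis, and all names are hypothetical. The uniform proposal on the feasible interval is symmetric, because moving along a direction shifts that interval without changing its length, so the acceptance ratio reduces to the ratio of target densities as in (10).

```python
import math
import random

def direction_vectors(m):
    """The vectors v_1, ..., v_{m-1} of representation (9)."""
    vectors = []
    for i in range(1, m):
        scale = 1.0 / math.sqrt(i * (i + 1))
        vectors.append([scale] * i + [-i * scale] + [0.0] * (m - i - 1))
    return vectors

def feasible_interval(P, vi, vj):
    """Step 1: largest [lo, hi] such that P + eps * vi vj^T stays entrywise nonnegative."""
    lo, hi = -math.inf, math.inf
    m = len(P)
    for p in range(m):
        for q in range(m):
            c = vi[p] * vj[q]
            if c > 0.0:
                lo = max(lo, -P[p][q] / c)
            elif c < 0.0:
                hi = min(hi, -P[p][q] / c)
    return min(lo, 0.0), max(hi, 0.0)  # eps = 0 is always feasible; guards against rounding

def log_likelihood(P, counts):
    """Cell-count log-likelihood: sum of n_ij * log(m * p_ij)."""
    m = len(P)
    total = 0.0
    for i in range(m):
        for j in range(m):
            if counts[i][j] > 0:
                if P[i][j] <= 0.0:
                    return -math.inf
                total += counts[i][j] * math.log(m * P[i][j])
    return total

def sweep(P, log_target, rng):
    """Steps 1 to 3 for every direction v_i v_j^T, i, j = 1, ..., m-1."""
    m = len(P)
    vectors = direction_vectors(m)
    current = log_target(P)
    for vi in vectors:
        for vj in vectors:
            lo, hi = feasible_interval(P, vi, vj)
            eps = rng.uniform(lo, hi)
            proposal = [[P[p][q] + eps * vi[p] * vj[q] for q in range(m)] for p in range(m)]
            candidate = log_target(proposal)
            # Metropolis acceptance on the log scale; 1 - random() lies in (0, 1].
            if math.log(1.0 - rng.random()) < candidate - current:
                P, current = proposal, candidate
    return P
```

With `log_target = lambda Q: log_likelihood(Q, counts)` the chain targets the posterior under the uniform prior; to target the Jeffreys posterior one would add half the logarithm of I(W) from (8), evaluated at W = P/m, to the log target.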

Note that the above algorithm could also be used with any prior specified via the Hilbert space representation described in the previous section, including the uniform prior on the polytope $\mathcal{B}$. In particular, it could be adapted to draw random doubly stochastic matrices according to such priors by replacing the acceptance probability (10) with

$$\alpha(P_t, P_t^{\mathrm{prop}}) = \min\left\{1, \frac{\pi(P_t^{\mathrm{prop}})}{\pi(P_t)}\right\}.$$

In order to further describe the Jeffreys prior, we use the algorithm to approximate the probability of the largest ball contained in $\mathcal{B}$ with respect to the Euclidean distance on $\mathcal{B}$. This distance may be computed using the Frobenius inner product described in the previous section. The largest ball has radius 1/(m − 1), where m > 1 is the size of the doubly stochastic matrix. Although this probability can be obtained exactly for the uniform distribution, we nevertheless approximate it using our algorithm, thereby providing some validation of the MCMC algorithm. Figure 1 shows the results we get for m = 4.

Figure 1. Convergence of 1000 parallel MCMC runs for the probability of the largest ball contained in the polytope $\mathcal{B}$ with m = 4. The shaded region represents the range of the entire set of approximations at each iteration. Panel (a) shows the convergence for the probability in the case of the uniform distribution; the flat line corresponds to the true probability p ≈ 0.0027. Panel (b) shows the same for the Jeffreys prior.

Notice that this probability is much smaller for the Jeffreys prior, because it distributes more mass towards the extremities of the polytope than the uniform prior does. This may also be observed by plotting the density estimates of the radius of the doubly stochastic matrix, that is, the Euclidean distance of the doubly stochastic matrix from the center of the polytope $\mathcal{B}$. These are shown in Figure 2.

5. Simulation experiments

The goal of the experiment is to study the performance of our estimator on artificial data sets generated from various bivariate distributions. We provide evidence that the estimators derived from our model give good results in general and, most importantly, that the Jeffreys prior is a reasonable choice.

Six parametric families of copulas are considered:

1. Clayton family: $C_\theta(u, v) = \{\max(0, u^{-\theta} + v^{-\theta} - 1)\}^{-1/\theta}$, $\theta \ge -1$, $\theta \ne 0$.

2. Gumbel family: $C_\theta(u, v) = \exp[-\{(-\log u)^\theta + (-\log v)^\theta\}^{1/\theta}]$, $\theta \ge 1$.

3. Frank family: $C_\theta(u, v) = -\frac{1}{\theta} \log\left\{1 + \frac{(e^{-\theta u} - 1)(e^{-\theta v} - 1)}{e^{-\theta} - 1}\right\}$, $\theta \ne 0$.

4. Gaussian family: $C_\theta(u, v) = \Phi_\theta(\Phi^{-1}(u), \Phi^{-1}(v))$, $|\theta| \le 1$, where $\Phi_\theta$ is the standard bivariate Gaussian cumulative distribution function with correlation coefficient θ, and $\Phi^{-1}$ is the inverse of the univariate standard normal cumulative distribution function.

5. Gaussian cross family: $C^\times_\theta(u, v) = \frac{1}{2}(C_\theta(u, v) - C_\theta(u, 1 - v) + u)$, $|\theta| \le 1$, where $C_\theta$ belongs to the Gaussian family.

6. Gaussian diamond family:

$$C^\diamond_\theta(u, v) = \begin{cases}
C^\times_\theta(u + 1/2, v) - C^\times_\theta(1/2, v), & \text{if } u \le 1/2, \\
C^\times_\theta(u - 1/2, v) + v - C^\times_\theta(1/2, v), & \text{if } u > 1/2,
\end{cases}$$

where $|\theta| \le 1$ and $C^\times_\theta$ belongs to the Gaussian cross family.

Figure 2. Plots of samples and density estimates of the radius (Euclidean distance of the doubly stochastic matrix from the center of the polytope $\mathcal{B}$), on the interval $[0, q_{95}]$, where $q_{95}$ is the 95th percentile of its distribution. Panels (a) and (b) are results when sampling from the uniform prior and panels (c) and (d) are those of the Jeffreys prior. Here m = 4.


Figure 3. Densities of (a) the Gaussian cross copula and (b) the Gaussian diamond copula with θ = 0.5.

For the Clayton, Frank and Gaussian families, values of the parameter away from 0 indicate departure from independence, while a parameter away from 1 indicates departure from independence for the Gumbel family. These four families are among the popular ones in the literature, but the last two families are not, and so we now describe them in more detail. The Gaussian cross family is obtained as follows. Let (U, V) be a random vector with uniform margins and with the Gaussian copula $C_\theta$ as its joint distribution. Let W be an independent uniformly distributed random variable, and consider the random vector

$$(U^\times_\theta, V^\times_\theta) = (U, V)\,\mathbf{1}(W \le 1/2) + (U, 1 - V)\,\mathbf{1}(W > 1/2).$$

The distribution of $(U^\times_\theta, V^\times_\theta)$ is given by the Gaussian cross copula. Here, the superscript × is to highlight the “cross-like” dependence structure; see Figure 3(a) for a plot of its density when θ = 0.5. The Gaussian diamond family corresponds to the distributions of the random vectors

$$(U^\diamond_\theta, V^\diamond_\theta) = (U^\times_\theta + 1/2 \ (\mathrm{mod}\ 1),\ V^\times_\theta) \quad \text{for each } |\theta| \le 1.$$
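The two constructions above translate directly into samplers. The following minimal sketch assumes the usual way of drawing from a Gaussian copula (correlated standard normals pushed through Φ); the function names are hypothetical.

```python
import math
import random

def norm_cdf(z):
    """Standard normal cumulative distribution function Phi."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def gaussian_copula_draw(theta, rng):
    """One draw (U, V) from the Gaussian copula with correlation theta."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = theta * z1 + math.sqrt(1.0 - theta * theta) * rng.gauss(0.0, 1.0)
    return norm_cdf(z1), norm_cdf(z2)

def gaussian_cross_draw(theta, rng):
    """With probability 1/2 keep (U, V), otherwise reflect to (U, 1 - V)."""
    u, v = gaussian_copula_draw(theta, rng)
    if rng.random() > 0.5:
        v = 1.0 - v
    return u, v

def gaussian_diamond_draw(theta, rng):
    """Shift the first coordinate of a cross draw by 1/2 modulo 1."""
    u, v = gaussian_cross_draw(theta, rng)
    return (u + 0.5) % 1.0, v
```

Both transformations preserve the uniform margins, which is why the resulting distributions are again copulas.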

See Figure 3(b) for an illustration of the density of the Gaussian diamond copula when θ = 0.5.

An extensive simulation experiment is carried out in two parts. In the first part, we consider the case of known marginal distributions and use bivariate data sampled from the copula families above. For each family, we consider 11 models corresponding to equally spaced parameter values in some interval. For the first four families, the interval is determined so that the Kendall’s τ values associated with the particular models range between 0 and 2/3; see the simulation in Silva and Lopes [27]. Kendall’s τ associated with a copula C is the dependence measure defined by

$$\tau = 4 \int_0^1 \int_0^1 C(u, v)\, dC(u, v) - 1.$$
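For several families the integral above has a closed form that can be inverted to find the parameter matching a target τ. The sketch below uses the standard relations τ = θ/(θ + 2) for the Clayton family, τ = 1 − 1/θ for the Gumbel family and τ = (2/π) arcsin(θ) for the Gaussian family; the Frank family is omitted because its relation involves the Debye function. The function names and the grid helper are hypothetical.

```python
import math

def clayton_theta(tau):
    """Invert tau = theta / (theta + 2)."""
    return 2.0 * tau / (1.0 - tau)

def gumbel_theta(tau):
    """Invert tau = 1 - 1 / theta."""
    return 1.0 / (1.0 - tau)

def gaussian_theta(tau):
    """Invert tau = (2 / pi) * arcsin(theta)."""
    return math.sin(math.pi * tau / 2.0)

def parameter_grid(theta_of_tau, tau_max=2.0 / 3.0, size=11):
    """Equally spaced parameter values between theta(tau = 0) and theta(tau = tau_max)."""
    lo, hi = theta_of_tau(0.0), theta_of_tau(tau_max)
    return [lo + (hi - lo) * k / (size - 1) for k in range(size)]
```

For the Clayton family the left end point θ = 0 is excluded from the parametrization and should be read as the independence limit.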

The values of Kendall’s τ for the first four families are respectively given by τ = θ/(θ+ 2),τ = 1 − 1/θ, τ = 1 − (4/θ)[1 − D1(θ)], where D1 is the Debye function and τ =(2/π) arcsin(θ). For families 5 and 6, we consider the models corresponding to 11 valuesof θ ranging between 0 and 1. In the second part of the experiment, we simulate an un-known margins situation. We focus on families 4, 5 and 6 and consider 11 equally spacedvalues of θ ranging between 0 and 1 for the copula models. Here, a Student t with sevendegrees of freedom and a chi-square with four degrees of freedom are considered as thefirst and second margins, respectively.In the experiment, 1000 samples of both sizes n= 30 and n= 100 are generated from

each model. For every data set, the copula function is estimated using five estimators.The first two are the Bayes estimators associated to the Jeffreys and the uniform priors,respectively. For the uniform prior, we mean the uniform distribution on B′ defined inSection 3, we use the bijection B =m−1

11⊤+GB′G⊤ given by expression (9). The third

estimator is the maximum likelihood estimator (MLE) from our model C∗P, where P

maximizes the likelihood derived from expression (7). This estimator is evaluated nu-merically. For the above three estimators, we take m = 6 as the order of the doublystochastic matrix in our model. Finally, we consider the two frequentist estimators, thatis, Deheuvels’ estimator given in Theorem 3 and the Gaussian kernel estimator describedin the Introduction. Values of the bandwidth for the latter estimator are based on thecommonly used rule of thumb: h= sin

−1/5, where si, i= 1,2, is the sample standard de-viation of the ith margin; see Fermanian and Scaillet [8] and Sheather [26]. Figures 4, 5and 6 report the values of the mean integrated squared errors,

MISE(C) = E

[∫ 1

0

∫ 1

0

(C(u, v)−C(u, v))2dudv

],

for the five estimators as a function of the parameter θ.As the results indicate, the Bayesian approach outperforms Deheuvels’ estimator and

the kernel estimator near independence for the Clayton, Gumbel, Frank and Gaussianfamilies. Unfortunately, this is not necessarily the case when the value of the parameterincreases, that is, when the true copula approaches the Frechet–Hoeffding upper bound,also called the comonotone copula, corresponding to (almost sure) perfect positive lineardependence. For families 5 and 6, the Bayes estimators outperform both frequentist esti-mators when the sample size is small (n= 30). One remarkable feature that appears whencomparing the results obtained in the known margins case with the results obtained inthe unknown margins case is the decrease in performance of the kernel estimator. Recallthat the latter estimator is the only one for which the invariance property mentioned inthe Introduction does not hold. The other estimators seem to behave similarly when com-paring their resulting MISE in the known margins case with their MISE in the unknownmargins case. Notice the resemblance in shape of the MISE for Deheuvels’ estimator andthe kernel estimator in the unknown margins cases. Finally, the performance of the MLE


(a) Family 1 (b) Family 2

(c) Family 3 (d) Family 4

(e) Family 5 (f) Family 6

Figure 4. Plots of MISE against θ in the known margins case. The MISE is approximatedusing 1000 samples of size n= 30. Thick solid line is the MISE of the Bayes estimator using theJeffreys prior, dashed line is that of the Bayes estimator using the uniform prior, dashed–dottedline is the MISE of the MLE, while dotted and thin solid line is the MISE of Deheuvels’ andthe Gaussian kernel estimators, respectively.


(a) Family 1 (b) Family 2

(c) Family 3 (d) Family 4

(e) Family 5 (f) Family 6

Figure 5. Plots of MISE against θ in the known margins case. The MISE is approximatedusing 1000 samples of size n= 100. Thick solid line is the MISE of the Bayes estimator using theJeffreys prior, dashed line is that of the Bayes estimator using the uniform prior, dashed–dottedline is the MISE of the MLE, while dotted and thin solid line is the MISE of Deheuvels’ andthe Gaussian kernel estimator, respectively.


Figure 6. Plots of MISE against $\theta$ in the unknown margins case; panels (a), (c) and (e) show Models 4, 5 and 6 for $n = 30$, and panels (b), (d) and (f) show the same models for $n = 100$. The MISE is approximated using 1000 samples each of sizes $n = 30$ and $n = 100$. The thick solid line is the MISE of the Bayes estimator using the Jeffreys prior, the dashed line is that of the Bayes estimator using the uniform prior, the dashed–dotted line is the MISE of the MLE, and the dotted and thin solid lines are the MISE of Deheuvels' and the Gaussian kernel estimators, respectively.


is worth mentioning, since in many cases it has the smallest MISE, especially for large values of $\theta$. This is because the MLE easily reaches the boundary of the parameter space, while the Bayes estimator always stays away from the boundary with the types of priors that we have selected. However, if such an extreme case arises in a real-life problem, the practitioner probably has some insight into the phenomenon beforehand, and may choose to work with a more appropriate (subjective) prior.
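The MISE values discussed in this section are Monte Carlo approximations. The following is a minimal sketch of that kind of computation, not the code used for the experiments. It assumes that the loss is the integrated squared error of the copula function over the unit square, takes the independence copula as the truth, and uses the rank-based empirical copula as the estimator. The function names, the midpoint grid and the reduced number of replications are all illustrative choices.

```python
import random

def ranks(xs):
    """Ranks 1..n of a sample (no ties assumed)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        out[i] = rank
    return out

def empirical_copula(r, s, n):
    """Return C_n(u, v) = (1/n) * #{i : R_i/n <= u, S_i/n <= v}."""
    def c_n(u, v):
        return sum(1 for ri, si in zip(r, s) if ri / n <= u and si / n <= v) / n
    return c_n

def mise_independence(n, reps=200, grid=10, seed=0):
    """Monte Carlo MISE of the empirical copula when the truth is C(u, v) = u * v."""
    rng = random.Random(seed)
    pts = [(k + 0.5) / grid for k in range(grid)]  # midpoint rule on [0, 1]
    total = 0.0
    for _ in range(reps):
        x = [rng.random() for _ in range(n)]
        y = [rng.random() for _ in range(n)]  # drawn independently of x
        c_n = empirical_copula(ranks(x), ranks(y), n)
        # Integrated squared error, approximated on a grid x grid lattice.
        total += sum((c_n(u, v) - u * v) ** 2 for u in pts for v in pts) / grid ** 2
    return total / reps

mise_30 = mise_independence(30)
mise_100 = mise_independence(100)
print(mise_30, mise_100)
```

Replacing the simulated pairs by draws from another copula, and `empirical_copula` by another estimator, gives the corresponding point on a MISE curve.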

6. Discussion

Two points need to be further discussed. First, our methodology is purely Bayesian only when the marginal distributions are known. When these are unknown, our methodology is empirical Bayes. In fact, in this case we propose a two-step procedure by first estimating the margins via the empirical marginal distributions and then plugging them in as the true distributions thereafter. We have chosen to do this because it is common practice to do so (see Genest et al. [11]), it is simple to implement, it is robust against outliers and our estimator is consequently invariant under increasing transformations of the margins. One way to propose a purely Bayesian estimator by using our model for the copula is to use finite mixtures for the margins. This way, if the densities used in the latter mixtures have disjoint supports, then the Jeffreys prior for the mixing weights has a simple form and is proper; see Bernardo and Girón [2]. Now by selecting independent Jeffreys priors for the margins and for the copula, the resulting prior is proper as well.

Finally, our models given by the approximation spaces $\mathcal{C}_m$, $m > 1$, are called sieves by some authors; see Grenander [13]. In the present paper, we have chosen to work with a fixed sieve, so this makes our model finite dimensional. In this case, the methodology falls in the semi-parametric approach described in the Introduction. Here, the rather subjective choice of the sieve to work with can be viewed as a weakness of the proposed methodology. On the other hand, by using the entire set of sieves, we can construct a nonparametric model for the copula that can, in some sense, respect the infinite-dimensional nature of the copula functions. In fact, if we take $\mathcal{C} = \bigcup_{m>1} \mathcal{C}_m$, then $\mathcal{C}$ is dense in the space of copulas. Our Bayesian methodology can be easily adapted here. This can be achieved by selecting an infinite support prior for the model index $m$ and using our methodology inside each model. The Bayesian estimator becomes an infinite mixture of the estimators proposed in this paper (one for each model $m$), where the mixing weights are given by the posterior probabilities of the models.
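The first step of the two-step procedure, and the invariance it brings, can be illustrated with a short sketch. It assumes that "estimating the margins via the empirical marginal distributions" amounts to replacing each observation by its rank divided by $n$; the helper name `pseudo_observations` and the simulated data are hypothetical.

```python
import math
import random

def pseudo_observations(xs):
    """Empirical-CDF transform: map each value to rank / n (no ties assumed)."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / n
    return u

rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(10)]
y = [xi + rng.gauss(0.0, 0.5) for xi in x]

u, v = pseudo_observations(x), pseudo_observations(y)

# Strictly increasing transformations of each margin leave the ranks unchanged,
# so any estimator computed from (u, v) is unchanged as well.
u_t = pseudo_observations([math.exp(xi) for xi in x])
v_t = pseudo_observations([yi ** 3 for yi in y])

print(u == u_t and v == v_t)
```

Any copula estimator that depends on the data only through `(u, v)` inherits this invariance.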

Appendix

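Proof of Lemma 4. Here we show how to compute
$$I(W) = \det\left(E\left[-\frac{\partial^2 \log c^*_P(u,v)}{\partial W^2}\right]\right)$$
efficiently. First, notice that $A = E[-\partial^2 \log c^*_P(u,v)/\partial W^2]$ can be written as $A = D_0^{-1} + C D_1^{-1} C^\top$, where
$$D_0 = \operatorname{diag}(w_{11},\ldots,w_{1(m-1)},\ldots,w_{(m-1)1},\ldots,w_{(m-1)(m-1)}),$$
$$D_1 = \operatorname{diag}(w_{mm},w_{1m},\ldots,w_{(m-1)m},w_{m1},\ldots,w_{m(m-1)})$$
and
$$C_{(m-1)^2\times(2m-1)} =
\begin{pmatrix}
1_{m-1} & 1_{m-1} & 0\,1_{m-1} & \cdots & 0\,1_{m-1} & I_{m-1} \\
1_{m-1} & 0\,1_{m-1} & 1_{m-1} & \cdots & 0\,1_{m-1} & I_{m-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
1_{m-1} & 0\,1_{m-1} & 0\,1_{m-1} & \cdots & 1_{m-1} & I_{m-1}
\end{pmatrix}.$$
Thus, $\det A = \det(D_1 + C^\top D_0 C)/(\det D_0 \det D_1)$. If we let $B = (w_{ij})_{i,j=1,\ldots,m-1}$, then since $\sum_{i=1}^m w_{ij} = 1/m$ for all $j = 1,\ldots,m$, and $\sum_{j=1}^m w_{ij} = 1/m$ for all $i = 1,\ldots,m$,
$$D_1 + C^\top D_0 C =
\begin{pmatrix}
1 - 2(1/m - w_{mm}) & 1^\top B^\top & 1^\top B \\
B1 & (1/m)I & B \\
B^\top 1 & B^\top & (1/m)I
\end{pmatrix}.$$
By elementary row and column operations, we get
$$\det(D_1 + C^\top D_0 C) = \det
\begin{pmatrix}
1 & (1/m)1^\top & (1/m)1^\top \\
(1/m)1 & (1/m)I & B \\
(1/m)1 & B^\top & (1/m)I
\end{pmatrix},$$
so that
$$\begin{aligned}
\det(D_1 + C^\top D_0 C)
&= \det\left(\begin{pmatrix} (1/m)I & B \\ B^\top & (1/m)I \end{pmatrix} - (1/m^2)\,1_{2(m-1)}1_{2(m-1)}^\top\right) \\
&= \det\begin{pmatrix} (1/m)(I - (1/m)11^\top) & B - (1/m^2)11^\top \\ B^\top - (1/m^2)11^\top & (1/m)(I - (1/m)11^\top) \end{pmatrix} \\
&= \det\bigl((1/m)(I - (1/m)11^\top)\bigr)\,\det\bigl([(1/m)(I - (1/m)11^\top)] \\
&\qquad - [B^\top - (1/m^2)11^\top]\,[(1/m)(I - (1/m)11^\top)]^{-1}\,[B - (1/m^2)11^\top]\bigr).
\end{aligned}$$
Finally, $\det((1/m)(I - (1/m)11^\top)) = (1/m)^m$ and $[(1/m)(I - (1/m)11^\top)]^{-1} = m(I + 11^\top)$, thus
$$\det(D_1 + C^\top D_0 C) = (1/m)^m \det((1/m)I - mV^\top V),$$
where $V = (w_{ij})_{i=1,\ldots,m;\,j=1,\ldots,m-1}$. □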
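The identity obtained in this proof can be checked in exact rational arithmetic. The sketch below is not part of the original proof and rests on several illustrative choices: the helper names, the way a positive matrix with all row and column sums equal to $1/m$ is built (a weighted average of permutation matrices), and the entrywise expansion of $A = D_0^{-1} + C D_1^{-1} C^\top$ as $A_{(ij),(kl)} = \delta_{ik}\delta_{jl}/w_{ij} + 1/w_{mm} + \delta_{ik}/w_{im} + \delta_{jl}/w_{mj}$.

```python
from fractions import Fraction
from itertools import permutations

def det(mat):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [row[:] for row in mat]
    n, sign, result = len(a), 1, Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            sign = -sign
        result *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return sign * result

def example_w(m):
    """A positive m x m matrix whose row and column sums all equal 1/m."""
    perms = list(permutations(range(m)))
    weights = [Fraction(k + 1) for k in range(len(perms))]
    total = sum(weights)
    w = [[Fraction(0)] * m for _ in range(m)]
    for wt, p in zip(weights, perms):
        for i, j in enumerate(p):
            w[i][j] += wt / (total * m)
    return w

def information_matrix(w):
    """A = D0^{-1} + C D1^{-1} C^T, written out entry by entry."""
    m = len(w)
    idx = [(i, j) for i in range(m - 1) for j in range(m - 1)]
    a = []
    for (i, j) in idx:
        row = []
        for (k, q) in idx:
            val = 1 / w[m - 1][m - 1]
            if i == k:
                val += 1 / w[i][m - 1]
            if j == q:
                val += 1 / w[m - 1][j]
            if (i, j) == (k, q):
                val += 1 / w[i][j]
            row.append(val)
        a.append(row)
    return a

def lemma4_formula(w):
    """(1/m)^m * det((1/m) I - m V^T V) / prod_{i,j} w_ij."""
    m = len(w)
    g = [[(Fraction(1, m) if j == q else 0)
          - m * sum(w[i][j] * w[i][q] for i in range(m))
          for q in range(m - 1)] for j in range(m - 1)]
    prod = Fraction(1)
    for row in w:
        for x in row:
            prod *= x
    return Fraction(1, m) ** m * det(g) / prod

w = example_w(3)
lhs = det(information_matrix(w))
rhs = lemma4_formula(w)
print(lhs == rhs)
```

The left-hand side requires the determinant of an $(m-1)^2 \times (m-1)^2$ matrix, the right-hand side only that of an $(m-1) \times (m-1)$ matrix, which is the computational gain the lemma provides.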

Proof of Theorem 1. We prove that the Jeffreys prior is proper. Consider the following partition of $V$, $V = (V_1 V_2 \cdots V_{m-1})$, where each $V_j$ is a vector, $j = 1,\ldots,m-1$. The matrix $(1/m)I - mV^\top V$ is symmetric and positive semi-definite, so that by Hadamard's inequality, we have
$$\begin{aligned}
\det((1/m)I - mV^\top V) &\le \prod_{j=1}^{m-1} (1/m - m\|V_j\|^2) \\
&= \prod_{j=1}^{m-1}\Bigl(2m \sum_{1\le i<k\le m} w_{ij}w_{kj}\Bigr) \\
&= (2m)^{m-1} \sum_{1\le i_{m-1}<k_{m-1}\le m} \cdots \sum_{1\le i_1<k_1\le m}\ \prod_{j=1}^{m-1} w_{i_j j}\, w_{k_j j}.
\end{aligned}$$

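For any $W \in \mathcal{W}$, we have
$$\begin{aligned}
\sqrt{I(W)} &= \sqrt{\det((1/m)I - mV^\top V)} \Bigm/ \sqrt{m^m \prod_{i,j=1}^m w_{ij}} \\
&\le \sqrt{2^{m-1}/m}\ \Bigl\{\sum_{1\le i_{m-1}<k_{m-1}\le m} \cdots \sum_{1\le i_1<k_1\le m}\ \prod_{j=1}^{m-1} w_{i_j j}\, w_{k_j j}\Bigr\}^{1/2} \Bigm/ \sqrt{\prod_{i,j=1}^m w_{ij}} \\
&\le \sqrt{2^{m-1}/m}\ \sum_{1\le i_{m-1}<k_{m-1}\le m} \cdots \sum_{1\le i_1<k_1\le m}\ \Bigl\{\prod_{j=1}^{m-1} w_{i_j j}\, w_{k_j j}\Bigr\}^{1/2} \Bigm/ \sqrt{\prod_{i,j=1}^m w_{ij}} \\
&= \sqrt{2^{m-1}/m}\ \sum_{\alpha\in\mathcal{A}}\ \prod_{i,j=1}^m w_{ij}^{\alpha_{ij}-1},
\end{aligned}$$
where $\mathcal{A} = \{\alpha_{ij} \in \{1/2, 1\}\colon i,j = 1,\ldots,m,\ \alpha_{+j} = m/2 + 1,\ j = 1,\ldots,m-1 \text{ and } \alpha_{+m} = m/2\}$, with $\alpha_{+j} = \sum_{i=1}^m \alpha_{ij}$, for all $j = 1,\ldots,m$. We need to show that the integral of $\prod_{i,j=1}^m w_{ij}^{\alpha_{ij}-1}$ is finite for all $\alpha \in \mathcal{A}$. The integration is made with respect to $w_{ij}$, $i \vee j < m$, the free variables. For any permutation matrices $P_1$ and $P_2$, the transformation $W \mapsto P_1 W P_2$ is a one-to-one transformation from $\mathcal{W}$ onto $\mathcal{W}$, and the Jacobian, in absolute value, is equal to one. Therefore, it is sufficient to verify that the integral of $\prod_{i,j=1}^m w_{ij}^{\alpha_{ij}-1}$ is finite for all $\alpha \in \mathcal{A}_0$, where $\mathcal{A}_0 = \{\alpha \in \mathcal{A}\colon \alpha_{m-1,m} = \alpha_{mm} = 1\}$. The idea is to decompose the multiple integral into $m-2$ iterated integrals over the sections given by
$$\mathcal{W}_k = \{w_{ij} \ge 0\colon i\wedge j = k,\ i\vee j \le m,\ w_{k+} = w_{+k} = 1/m\},\qquad k = 1,\ldots,m-2,$$
and
$$\mathcal{W}_{m-1} = \{w_{ij} \ge 0\colon i,j = m-1,m,\ w_{m-1,+} = w_{+,m-1} = w_{m+} = w_{+m} = 1/m\}.$$
Here, the set $\mathcal{W}_1$ is fixed, the sets $\mathcal{W}_k$ are parameterized by $\{w_{ij} \ge 0\colon i\wedge j < k,\ i\vee j = k\}$, $k = 2,\ldots,m-2$, and $\mathcal{W}_{m-1}$ is parameterized by $\{w_{ij} \ge 0\colon i\wedge j < m-1,\ i\vee j = m-1,m\}$. By Fubini's theorem, for any non-negative function $f$, we can write
$$\int_{\mathcal{W}} f(W) \prod_{i,j=1}^{m-1} dw_{ij} = \int_{\mathcal{W}_1}\Bigl\{\cdots\int_{\mathcal{W}_{m-1}}\bigl\{f(W)\,dw_{m-1,m-1}\bigr\}\cdots\Bigr\} \prod_{i\wedge j=1,\ i\vee j<m} dw_{ij}.$$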


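The next step consists in finding finite functions $c_k$, $k = 1,\ldots,m-1$, on $\mathcal{A}_0$, such that
$$\int_{\mathcal{W}_k}\ \prod_{i\wedge j=k,\ i\vee j\le m} w_{ij}^{\alpha_{ij}-1} \prod_{i\wedge j=k,\ i\vee j<m} dw_{ij} \le c_k(\alpha),$$
for all $\alpha \in \mathcal{A}_0$, uniformly on $\{w_{ij} \ge 0\colon i\wedge j < k,\ i\vee j = k\}$, for $k = 1,\ldots,m-2$, and uniformly on $\{w_{ij} \ge 0\colon i\wedge j < m-1,\ i\vee j = m-1,m\}$, for $k = m-1$. This will give us that
$$\int_{\mathcal{W}}\ \prod_{i,j\le m} w_{ij}^{\alpha_{ij}-1} \prod_{i,j<m} dw_{ij} \le \prod_{k=1}^{m-1} c_k(\alpha),$$
for all $\alpha \in \mathcal{A}_0$.

Let $a = 0 \vee \{\sum_{\ell<m-1}(w_{\ell m} - w_{m-1,\ell})\} \vee \{\sum_{\ell<m-1}(w_{m\ell} - w_{\ell,m-1})\}$ and $b = 1/m - \{(\sum_{\ell<m-1} w_{\ell,m-1}) \vee (\sum_{\ell<m-1} w_{m-1,\ell})\}$. If $a > b$, the set $\mathcal{W}_{m-1}$ is empty. Suppose that $\mathcal{W}_{m-1}$ is not empty and $\alpha \in \mathcal{A}_0$. Let $b_0 = 1/m - \sum_{\ell<m-1} w_{\ell,m-1}$. We have
$$\begin{aligned}
\int_{\mathcal{W}_{m-1}}\ \prod_{i,j=m-1,m} w_{ij}^{\alpha_{ij}-1}\, dw_{m-1,m-1}
&= \int_a^b u^{\alpha_{m-1,m-1}-1}(b_0 - u)^{\alpha_{m,m-1}-1}\, du \\
&\le \int_0^{b_0} u^{\alpha_{m-1,m-1}-1}(b_0 - u)^{\alpha_{m,m-1}-1}\, du \\
&= b_0^{\alpha_{m-1,m-1}+\alpha_{m,m-1}-1}\, B(\alpha_{m-1,m-1}, \alpha_{m,m-1}) \\
&\le B(\alpha_{m-1,m-1}, \alpha_{m,m-1}) \\
&= c_{m-1}(\alpha).
\end{aligned}$$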

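For $k = 1,\ldots,m-2$ and $\alpha \in \mathcal{A}_0$ we can take
$$c_k(\alpha) = \Bigl(B\Bigl(\alpha_{kk}, \sum_{i=k+1}^m \alpha_{ik}\Bigr) + B\Bigl(\alpha_{kk}, \sum_{j=k+1}^m \alpha_{kj}\Bigr)\Bigr)\, \frac{\prod_{i>k}\Gamma(\alpha_{ik})\ \prod_{j>k}\Gamma(\alpha_{kj})}{\Gamma\bigl(\sum_{i>k}\alpha_{ik}\bigr)\,\Gamma\bigl(\sum_{j>k}\alpha_{kj}\bigr)}.$$
The justification is given by Lemma 5. □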
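As a sanity check that is not part of the original proof, the smallest case $m = 2$ can be worked out by hand from the formula of Lemma 4. Here there is a single free variable $w = w_{11}$, with $w_{22} = w$ and $w_{12} = w_{21} = 1/2 - w$, for $0 < w < 1/2$, and $V = (w, 1/2 - w)^\top$:

```latex
\begin{align*}
\det\bigl(\tfrac{1}{2}I - 2V^\top V\bigr)
  &= \tfrac{1}{2} - 2\bigl\{w^2 + (\tfrac{1}{2} - w)^2\bigr\}
   = 4w(\tfrac{1}{2} - w), \\
I(W)
  &= \frac{(1/2)^2 \cdot 4w(\tfrac{1}{2} - w)}{w^2(\tfrac{1}{2} - w)^2}
   = \frac{1}{w(\tfrac{1}{2} - w)}, \\
\int_0^{1/2} \sqrt{I(W)}\,dw
  &= \int_0^{1/2} w^{-1/2}(\tfrac{1}{2} - w)^{-1/2}\,dw
   = B(\tfrac{1}{2}, \tfrac{1}{2}) = \pi .
\end{align*}
```

In this case the Jeffreys prior is a Beta(1/2, 1/2) density rescaled to $(0, 1/2)$, the integral is finite as the theorem asserts, and the Hadamard bound used above holds with equality because the matrix is $1 \times 1$.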

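Lemma 5. If $0 < a, b \le 1$, $m \ge 3$, $\alpha > 0$,
$$\beta_j > 0,\ j = 1,\ldots,m-1,\quad\text{with } \beta = \sum_{j=1}^{m-1} \beta_j \ge 1,$$
$$\gamma_i > 0,\ i = 1,\ldots,m-1,\quad\text{with } \gamma = \sum_{i=1}^{m-1} \gamma_i \ge 1$$
and
$$\mathcal{C} = \Bigl\{w_{ij} \ge 0\colon i\wedge j = 1,\ i\vee j \le m,\ \sum_{j=1}^m w_{1j} = a,\ \sum_{i=1}^m w_{i1} = b\Bigr\},$$
then
$$\int_{\mathcal{C}} w_{11}^{\alpha-1} \prod_{j=2}^m w_{1j}^{\beta_{j-1}-1} \prod_{i=2}^m w_{i1}^{\gamma_{i-1}-1}\, dw_{11} \prod_{j=2}^{m-1} dw_{1j} \prod_{i=2}^{m-1} dw_{i1}
\le (B(\alpha,\beta) + B(\alpha,\gamma))\, \frac{\prod_{j=1}^{m-1}\Gamma(\beta_j)\ \prod_{i=1}^{m-1}\Gamma(\gamma_i)}{\Gamma(\beta)\,\Gamma(\gamma)}.$$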

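Proof. Let
$$K(a,b,\alpha,\beta,\gamma) = \int_0^{a\wedge b} w^{\alpha-1}(a-w)^{\beta-1}(b-w)^{\gamma-1}\, dw.$$
If $a < b$, then
$$\begin{aligned}
K(a,b,\alpha,\beta,\gamma) &= \int_0^a w^{\alpha-1}(a-w)^{\beta-1}(b-w)^{\gamma-1}\, dw \\
&\le b^{\gamma-1} \int_0^a w^{\alpha-1}(a-w)^{\beta-1}\, dw \\
&= a^{\alpha+\beta-1}\, b^{\gamma-1}\, B(\alpha,\beta) \le B(\alpha,\beta).
\end{aligned}$$
In the same way, if $b < a$, then $K(a,b,\alpha,\beta,\gamma) \le B(\alpha,\gamma)$, so that
$$K(a,b,\alpha,\beta,\gamma) \le B(\alpha,\beta) + B(\alpha,\gamma). \tag{11}$$
Now, let $W_{11}$ be a random variable on $(0, a\wedge b)$ with density
$$\frac{1}{K(a,b,\alpha,\beta,\gamma)}\, w_{11}^{\alpha-1}(a-w_{11})^{\beta-1}(b-w_{11})^{\gamma-1},$$
let $(U_{12},\ldots,U_{1m})$ be a random vector distributed according to a Dirichlet$(\beta_1,\ldots,\beta_{m-1})$, let $(U_{21},\ldots,U_{m1})$ be distributed according to a Dirichlet$(\gamma_1,\ldots,\gamma_{m-1})$ and further assume independence between $W_{11}$, $(U_{12},\ldots,U_{1m})$ and $(U_{21},\ldots,U_{m1})$. Let $W_{1j} = (a - W_{11})U_{1j}$, $j = 2,\ldots,m$, and $W_{i1} = (b - W_{11})U_{i1}$, $i = 2,\ldots,m$. From this construction, given $W_{11} = w_{11}$, we have that $(W_{12},\ldots,W_{1m})$ and $(W_{21},\ldots,W_{m1})$ are conditionally independent with conditional densities given, respectively, by
$$\frac{1}{(a-w_{11})^{\beta-1}}\, \frac{\Gamma(\beta)}{\prod_{i=1}^{m-1}\Gamma(\beta_i)}\, w_{12}^{\beta_1-1} \cdots w_{1m}^{\beta_{m-1}-1},$$
with $w_{1j} \ge 0$, $j = 2,\ldots,m$, $\sum_{2\le j\le m} w_{1j} = a - w_{11}$, and
$$\frac{1}{(b-w_{11})^{\gamma-1}}\, \frac{\Gamma(\gamma)}{\prod_{i=1}^{m-1}\Gamma(\gamma_i)}\, w_{21}^{\gamma_1-1} \cdots w_{m1}^{\gamma_{m-1}-1},$$
with $w_{i1} \ge 0$, $i = 2,\ldots,m$, $\sum_{2\le i\le m} w_{i1} = b - w_{11}$. This construction, together with inequality (11), implies the result, namely
$$\int_{\mathcal{C}} w_{11}^{\alpha-1} \prod_{j=2}^m w_{1j}^{\beta_{j-1}-1} \prod_{i=2}^m w_{i1}^{\gamma_{i-1}-1}\, dw_{11} \prod_{j=2}^{m-1} dw_{1j} \prod_{i=2}^{m-1} dw_{i1}
\le (B(\alpha,\beta) + B(\alpha,\gamma))\, \frac{\prod_{j=1}^{m-1}\Gamma(\beta_j)\ \prod_{i=1}^{m-1}\Gamma(\gamma_i)}{\Gamma(\beta)\,\Gamma(\gamma)}.$$
□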

Lemma 6. Consider $X$, a Binomial$(n, p)$ random variable. We have
$$\sup_{0\le p\le 1} E_p[|X - np|] =
\begin{cases}
1/B(1/2, (n+1)/2), & \text{if $n$ is odd},\\[4pt]
(1 - (n+1)^{-2})^{n/2}\,(1 + (n+1)^{-1})/B(1/2, n/2), & \text{if $n$ is even}.
\end{cases}$$

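Proof. Let $\mu_n(p) = E_p[|X - np|]$. We have
$$\mu_n(p) = \frac{2\,n!}{(\lfloor np\rfloor)!\,(n-1-\lfloor np\rfloor)!}\, p^{\lfloor np\rfloor+1}(1-p)^{n-\lfloor np\rfloor}\qquad\text{for all } n \ge 1,\ p \in [0,1],$$
where $\lfloor x\rfloor = \max\{n\colon n \le x,\ n \text{ is an integer}\}$ for all $x$. Therefore,
$$\sup_{0\le p\le 1} \mu_n(p) = \max_{0\le k\le n-1}\ \sup_{\{p\colon \lfloor np\rfloor = k\}} \mu_n(p) = \max_{0\le k\le n-1} \mu_n\Bigl(\frac{k+1}{n+1}\Bigr),$$
and, in particular, $\sup_{0\le p\le 1} \mu_1(p) = \mu_1(1/2) = 1/2 = 1/B(1/2, 1)$. Now assume that $n > 1$. Let
$$\nu_n(k) = \mu_n\Bigl(\frac{k+2}{n+1}\Bigr) \Bigm/ \mu_n\Bigl(\frac{k+1}{n+1}\Bigr),$$
for $k = 0,\ldots,n-2$. We have
$$\nu_n(k) = \frac{(1 + (k+1)^{-1})^{k+2}}{(1 + (n-k-1)^{-1})^{n-k}}\quad\text{and}\quad \nu_n(k) = \frac{1}{\nu_n(n-2-k)}\qquad\text{for } k = 0,\ldots,n-2.$$
However,
$$\frac{d}{dt}\log\Bigl(1 + \frac{1}{t}\Bigr)^{t+1} = \log\Bigl(1 + \frac{1}{t}\Bigr) - \frac{1}{t} < 0\qquad\text{for all } t > 1.$$
This implies that $\nu_n$ decreases on $\{0,\ldots,n-2\}$. Therefore,
$$\mu_n\Bigl(\frac{1}{n+1}\Bigr) < \cdots < \mu_n\Bigl(\frac{(n+1)/2}{n+1}\Bigr) > \cdots > \mu_n\Bigl(\frac{n}{n+1}\Bigr)\qquad\text{if $n$ is odd},$$
$$\mu_n\Bigl(\frac{1}{n+1}\Bigr) < \cdots < \mu_n\Bigl(\frac{n/2}{n+1}\Bigr) = \mu_n\Bigl(\frac{n/2+1}{n+1}\Bigr) > \cdots > \mu_n\Bigl(\frac{n}{n+1}\Bigr)\qquad\text{if $n$ is even}.$$
The final expression is obtained using the following identity:
$$n! = 2^n\, \Gamma\Bigl(\frac{n}{2} + 1\Bigr)\, \Gamma\Bigl(\frac{n+1}{2}\Bigr) \Bigm/ \Gamma\Bigl(\frac{1}{2}\Bigr)\qquad\text{for all } n \ge 0.$$
□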
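The two ingredients of this proof that are easiest to check numerically are the closed form for $\mu_n(p)$ and the location of its maximum. The sketch below is illustrative only, with hypothetical function names. It compares the closed form with a direct summation over the binomial probabilities, compares the maximum over the candidate points $p = (k+1)/(n+1)$ with a brute-force grid search, and prints the value $1/B(1/2, (n+1)/2)$ for odd $n$.

```python
import math

def mad_exact(n, p):
    """E|X - n p| for X ~ Binomial(n, p), by direct summation over the pmf."""
    return sum(math.comb(n, x) * p ** x * (1 - p) ** (n - x) * abs(x - n * p)
               for x in range(n + 1))

def mad_closed(n, p):
    """Closed form for mu_n(p) used in the proof (evaluated here for 0 <= p < 1)."""
    k = math.floor(n * p)
    coef = 2 * math.factorial(n) / (math.factorial(k) * math.factorial(n - 1 - k))
    return coef * p ** (k + 1) * (1 - p) ** (n - k)

def beta(a, b):
    """Beta function B(a, b) via the gamma function."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def sup_mad(n):
    """Maximum of mu_n over the candidate points p = (k + 1) / (n + 1)."""
    return max(mad_closed(n, (k + 1) / (n + 1)) for k in range(n))

for n in range(1, 10):
    grid_max = max(mad_exact(n, j / 2000) for j in range(2001))
    line = f"n={n}: candidates {sup_mad(n):.6f}, grid {grid_max:.6f}"
    if n % 2 == 1:
        line += f", 1/B(1/2,(n+1)/2) {1 / beta(0.5, (n + 1) / 2):.6f}"
    print(line)
```

The grid search can only approach the supremum from below, so its value should sit just under the maximum over the candidate points.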

Acknowledgements

We are particularly grateful to the referee and to the associate editor for their comments and suggestions that have led to a much improved version of this paper. We wish to thank Daniel Stubbs and the staff at Réseau Québécois de Calcul Haute Performance (RQCHP) for their valuable help with high performance computing. This research was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).

References

[1] Beck, M. and Pixton, D. (2003). The Ehrhart polynomial of the Birkhoff polytope. Discrete Comput. Geom. 30 623–637. MR2013976

[2] Bernardo, J.M. and Girón, F.J. (1988). A Bayesian analysis of simple mixture problems. In Bayesian Statistics 3 (Valencia, 1987). Oxford Sci. Publ. 67–78. New York: Oxford Univ. Press. MR1008044

[3] Charpentier, A., Fermanian, J.-D. and Scaillet, O. (2007). The estimation of copulas: Theory and practice. In Copulas: From Theory to Application in Finance (J. Rank, ed.) 35–60. London: Risk Publications.

[4] Cherubini, U., Luciano, E. and Vecchiato, W. (2004). Copula Methods in Finance. Chichester: Wiley. MR2250804

[5] Deheuvels, P. (1979). La fonction de dépendance empirique et ses propriétés. Un test non paramétrique d'indépendance. Acad. Roy. Belg. Bull. Class. Sci. (5) 65 274–292. MR0573609

[6] Deheuvels, P. (1980). Nonparametric test of independence. In Nonparametric Asymptotic Statistics (Proc. Conf., Rouen, 1979) (French). Lecture Notes in Math. 821 95–107. Berlin: Springer. MR0604022

[7] Embrechts, P., Lindskog, F. and McNeil, A. (2003). Modelling dependence with copulas and applications to risk management. In Handbook of Heavy Tailed Distributions in Finance (S. Rachev, ed.) 329–384. Amsterdam: Elsevier/North Holland.

[8] Fermanian, J.-D. and Scaillet, O. (2003). Nonparametric estimation of copulas for time series. J. Risk 95 25–54.

[9] Gamerman, D. and Lopes, H.F. (2006). Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference, 2nd ed. Boca Raton, FL: Chapman & Hall/CRC. MR2260716

[10] Genest, C., Gendron, M. and Bourdeau-Brien, M. (2009). The advent of copulas in finance. Eur. J. Finance 15 609–618.

[11] Genest, C., Ghoudi, K. and Rivest, L.P. (1995). A semiparametric estimation procedure of dependence parameters in multivariate families of distributions. Biometrika 82 543–552. MR1366280

[12] Genest, C. and Nešlehová, J. (2007). A primer on copulas for count data. Astin Bull. 37 475–515. MR2422797

[13] Grenander, U. (1981). Abstract Inference. New York: Wiley. MR0599175

[14] Hoff, P.D. (2007). Extending the rank likelihood for semiparametric copula estimation. Ann. Appl. Statist. 1 265–283. MR2393851

[15] Joe, H. (1997). Multivariate Models and Dependence Concepts. Monographs on Statistics and Applied Probability 73. London: Chapman & Hall. MR1462613

[16] Joe, H. (2005). Asymptotic efficiency of the two-stage estimation method for copula-based models. J. Multivariate Anal. 94 401–419. MR2167922

[17] Kim, G., Silvapulle, M.J. and Silvapulle, P. (2007). Comparison of semiparametric and parametric methods for estimating copulas. Comput. Statist. Data Anal. 51 2836–2850. MR2345609

[18] Li, X., Mikusiński, P. and Taylor, M.D. (1998). Strong approximation of copulas. J. Math. Anal. Appl. 225 608–623. MR1644300

[19] McNeil, A.J., Frey, R. and Embrechts, P. (2005). Quantitative Risk Management: Concepts, Techniques and Tools. Princeton, NJ: Princeton Univ. Press. MR2175089

[20] Melilli, E. and Petris, G. (1995). Bayesian inference for contingency tables with given marginals. Statist. Methods Appl. 4 215–233.

[21] Mirsky, L. (1963). Results and problems in the theory of doubly-stochastic matrices. Probab. Theory Related Fields 1 319–334. MR0153038

[22] Nelsen, R.B. (2006). An Introduction to Copulas, 2nd ed. New York: Springer. MR2197664

[23] Nešlehová, J. (2007). On rank correlation measures for non-continuous random variables. J. Multivariate Anal. 98 544–567. MR2293014

[24] Sancetta, A. and Satchell, S. (2001). Bernstein approximation to copula function and portfolio optimization. DAE Working paper, Univ. Cambridge.

[25] Sancetta, A. and Satchell, S. (2004). The Bernstein copula and its applications to modeling and approximations of multivariate distributions. Econometric Theory 20 535–562. MR2061727

[26] Sheather, S.J. (2004). Density estimation. Statist. Sci. 19 588–597. MR2185580

[27] Silva, R.d.S. and Lopes, H.F. (2008). Copula, marginal distributions and model selection: A Bayesian note. Stat. Comput. 18 313–320. MR2413387

[28] Titterington, D.M., Smith, A.F.M. and Makov, U.E. (1985). Statistical Analysis of Finite Mixture Distributions. Chichester: Wiley. MR0838090

[29] Trivedi, P. and Zimmer, D. (2007). Copula Modeling: An Introduction for Practitioners. Hanover, MS: Now Publishers.

Received January 2010 and revised August 2010














