Multivariate Quantiles and Ranks using Optimal Transportation Bodhisattva Sen 1 Department of Statistics Columbia University, New York Department of Statistics George Mason University Joint work with Promit Ghosal (Columbia University) 05 April, 2019 1 Supported by NSF grants DMS-1712822 and AST-1614743
Multivariate Quantiles and Ranks using Optimal Transportation
Bodhisattva Sen¹
Department of Statistics, Columbia University, New York
Department of Statistics, George Mason University
Joint work with Promit Ghosal (Columbia University)
05 April, 2019
¹ Supported by NSF grants DMS-1712822 and AST-1614743
How to define ranks and quantiles in R^d, d > 1?
Ranks and quantiles when d = 1
X is a random variable with c.d.f. F
Rank: The rank of x ∈ R is F(x)
Property: If F is continuous, F(X) ∼ Unif([0, 1])
Quantile: The quantile function is F^{-1}
Property: If F is continuous, F^{-1}(U) ∼ F where U ∼ Unif([0, 1])
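As a quick numerical check of the two properties above, the following sketch uses the standard exponential distribution, for which F(x) = 1 − e^{−x} and F^{-1}(u) = −log(1 − u) are available in closed form. The choice of distribution, the sample sizes, and the function names `rank` and `quantile` are illustrative assumptions, not part of the talk.

```python
import math
import random


def rank(x):
    """Rank map in d = 1: the c.d.f. F of the Exp(1) distribution."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


def quantile(u):
    """Quantile map in d = 1: F^{-1} for Exp(1), defined for 0 <= u < 1."""
    return -math.log(1.0 - u)


rng = random.Random(0)  # fixed seed so the run is reproducible

# Property 1: F(X) ~ Unif([0, 1]) when X ~ F.
xs = [rng.expovariate(1.0) for _ in range(20000)]
ranks = [rank(x) for x in xs]
mean_rank = sum(ranks) / len(ranks)      # close to 1/2 for a uniform sample

# Property 2: F^{-1}(U) ~ F when U ~ Unif([0, 1]).
us = [rng.random() for _ in range(20000)]
draws = [quantile(u) for u in us]
mean_draw = sum(draws) / len(draws)      # close to E[X] = 1 for Exp(1)
```

Comparing only the means is a coarse check; a histogram of `ranks` against the uniform density would show the same thing more fully.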
Defining quantiles, ranks, depth, etc. is difficult when d > 1
Lack of a natural ordering in R^d, when d > 1
Many notions of multivariate quantiles/ranks have been suggested:
Puri and Sen (1971), Chaudhuri and Sengupta (1993), Mottonen and Oja (1995), Chaudhuri (1996), Liu and Singh (1993), Serfling (2010) ...
Spatial median and geometric quantile
Spatial median: M := arg min_{m ∈ R^d} E‖X − m‖
Quantile when d = 1: For u ∈ (0, 1),
F^{-1}(u) = arg min_{x ∈ R} E[|X − x| − (2u − 1)x]
Geometric quantile [Chaudhuri (1996)]: For ‖u‖ < 1, let
Q(u) := arg min_{x ∈ R^d} E[‖X − x‖ − ⟨u, x⟩]
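To make the spatial median concrete, here is a minimal sketch that computes its sample version, M = Q(0), for a finite set of points. It uses Weiszfeld's fixed-point iteration, which is a standard algorithm swapped in for illustration; the talk does not say how the median is computed. The function name, iteration count, and example points are assumptions.

```python
import math


def spatial_median(points, iters=500, eps=1e-12):
    """Approximate arg min_m sum_i ||x_i - m|| by Weiszfeld's iteration."""
    d = len(points[0])
    # Start from the coordinate-wise mean.
    m = [sum(p[k] for p in points) / len(points) for k in range(d)]
    for _ in range(iters):
        num = [0.0] * d
        den = 0.0
        for p in points:
            dist = math.sqrt(sum((p[k] - m[k]) ** 2 for k in range(d)))
            if dist < eps:
                continue  # skip a data point that coincides with the iterate
            w = 1.0 / dist
            den += w
            for k in range(d):
                num[k] += w * p[k]
        if den == 0.0:
            break
        m = [v / den for v in num]  # weighted average with weights 1/distance
    return m


# Four corners of the unit square plus one far-away outlier.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (10.0, 10.0)]
median = spatial_median(points)
mean = [sum(p[k] for p in points) / len(points) for k in range(2)]
```

In this example the mean is pulled to (2.4, 2.4) by the outlier, while the spatial median stays inside the unit square.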
Outline
1 Introduction to Optimal Transportation
  Monge's Problem
  Kantorovich Relaxation: Primal Problem
  A Geometric Approach
2 Quantile and Rank Functions in R^d (d ≥ 1)
3 Some Applications in Statistics
  Two-sample Goodness-of-fit Testing
  Independence Testing
Gaspard Monge (1781): What is the cheapest way to transport a pile of sand to cover a sinkhole?
[Figure: the Monge problem, moving a pile of sand to cover a sinkhole; from slides by Blanchet (Columbia U. and Stanford U.)]
Goal: inf_{T : T(X) ∼ ν} E_µ[c(X, T(X))]
µ (on X) and ν (on Y) probability measures, ∫_X dµ(x) = ∫_Y dν(y) = 1
c(x, y) ≥ 0: cost of transporting x to y (e.g., c(x, y) = ‖x − y‖^p)
T transports µ to ν, i.e., T(X) ∼ ν where X ∼ µ, or,
ν(B) = µ(T^{-1}(B)) = ∫_{T^{-1}(B)} dµ, B ⊂ Y
One-dimensional optimal transport
Suppose X, Y ⊂ R; µ, ν abs. cont.; F_µ and F_ν c.d.f.'s
Goals: (i) Transport µ to ν; i.e., find T s.t. if X ∼ µ then T(X) ∼ ν
(ii) T minimizes the cost E_µ[(X − T(X))^2]; assume c(x, y) = (x − y)^2
Figure 3: Two densities p and q and the optimal transport map that morphs p into q.
where p ≥ 1. When p = 1 this is also called the Earth Mover distance. The minimizer J* (which does exist) is called the optimal transport plan or the optimal coupling. In case there is an optimal transport map T then J is a singular measure with all its mass on the set {(x, T(x))}.
It can be shown that
W_p^p(P, Q) = sup_{ψ,φ} ∫ ψ(y) dQ(y) − ∫ φ(x) dP(x)
where ψ(y) − φ(x) ≤ ‖x − y‖^p. This is called the dual formulation. In the special case where p = 1 we have the very simple representation
W_1(P, Q) = sup { ∫ f(x) dP(x) − ∫ f(x) dQ(x) : f ∈ F }
where F denotes all maps from R^d to R such that |f(y) − f(x)| ≤ ‖x − y‖ for all x, y.
When d = 1, the distance has a closed form:
W_p(P, Q) = ( ∫_0^1 |F^{-1}(z) − G^{-1}(z)|^p dz )^{1/p}
The minimizing T must satisfy (Why? Otherwise swapping the images of x_0 and x_1 would lower the cost)
(x_0 − T(x_0))^2 + (x_1 − T(x_1))^2 ≤ (x_0 − T(x_1))^2 + (x_1 − T(x_0))^2
Expanding the squares, this is equivalent to (x_1 − x_0)(T(x_1) − T(x_0)) ≥ 0
This means that if x_1 > x_0 then T(x_1) ≥ T(x_0)
So T must be a monotone nondecreasing function
Therefore, choose T(·) so that (recall: ν(B) = ∫_{T^{-1}(B)} dµ)
∫_{−∞}^{x} dµ(t) = ∫_{−∞}^{T(x)} dν(y) ⇒ F_µ(x) = F_ν(T(x))
Thus, T = F_ν^{-1} ∘ F_µ (and this map T is unique)
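The monotonicity argument has a simple finite-sample counterpart: with two samples of equal size, pairing the i-th smallest x with the i-th smallest y is the empirical version of F_ν^{-1} ∘ F_µ. The sketch below checks, by brute force over all pairings of six points, that this sorted pairing attains the minimal squared-distance cost. The sample sizes, distributions, and function names are illustrative assumptions.

```python
import itertools
import random


def monotone_matching(xs, ys):
    """Pair the i-th smallest x with the i-th smallest y."""
    return list(zip(sorted(xs), sorted(ys)))


def cost(pairs):
    """Average squared displacement of a pairing."""
    return sum((x - y) ** 2 for x, y in pairs) / len(pairs)


def brute_force_cost(xs, ys):
    """Minimal cost over all n! pairings (only feasible for small n)."""
    return min(cost(list(zip(xs, perm))) for perm in itertools.permutations(ys))


rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(6)]
ys = [rng.gauss(3.0, 2.0) for _ in range(6)]

pairs = monotone_matching(xs, ys)
sorted_cost = cost(pairs)
best_cost = brute_force_cost(xs, ys)   # agrees with sorted_cost up to rounding
```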
Optimal transportation when d = 1
X, Y ⊂ R; µ, ν abs. cont.; F_µ and F_ν c.d.f.'s
Goals: (i) Transport µ to ν; i.e., find T s.t. if X ∼ µ then T(X) ∼ ν
(ii) T minimizes the cost E_µ[(X − T(X))^2]
Solution: T = F_ν^{-1} ∘ F_µ (and this map T is unique)
Ranks and Quantiles when d = 1
When µ = Unif([0, 1]), T = F_ν^{-1} transports µ to ν: the quantile map
When ν = Unif([0, 1]), T = F_µ: the rank map
Thus, when d = 1, the rank and quantile maps are solutions to the optimal transport problem
How to do this in higher dimensions, e.g., when X = Y = R^d, d > 1?
Monge's problem: Given probability measures µ and ν solve:
inf_{T : T(X) ∼ ν} E_µ[c(X, T(X))] = inf_{T : T#µ = ν} ∫_X c(x, T(x)) dµ(x)    (1)
where T(X) ∼ ν iff µ(T^{-1}(B)) = ν(B), for all Borel sets B
Drawbacks
The above optimization problem is highly non-linear and can be ill-posed
No admissible T may exist; e.g., if µ is a Dirac delta and ν is not
Moreover, the infimum in (1) may not be attained, i.e., a limit of transport maps {T_i}_{i≥1} may fail to be a transport map
Solution need not be unique (book shifting example)
Not much progress was made for about 160 yrs!
Kantorovich Relaxation: Primal Problem
Monge's problem (M): inf_{T : T#µ = ν} ∫_X c(x, T(x)) dµ(x)
Let Π(µ, ν) be the class of joint distributions of (X, Y) ∼ π s.t.
π_X = marginal of X = µ, π_Y = marginal of Y = ν
Kantorovich relaxation (K): Solve
min_{π ∈ Π(µ,ν)} E_π[c(X, Y)] = min_{π ∈ Π(µ,ν)} ∫_{X×Y} c(x, y) dπ(x, y)
Always has a solution for c(·, ·) ≥ 0 lower semicontinuous (l.s.c.)²
Linear program (infinite dimensional)
Result: If µ is abs. cont. then (K) = (M) and π_opt = (id, T_opt)#µ
Computation: Kantorovich dual problem
² A function φ : R^d → R is l.s.c. at x_0 iff lim inf_{x → x_0} φ(x) ≥ φ(x_0)
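The relaxation can be seen at work on the Dirac example: with µ = δ_0 and ν = ½δ_{−1} + ½δ_{1} no transport map exists, but a transport plan may split the mass at 0. The sketch below builds a plan for discrete measures on the real line by filling mass in sorted order (the monotone, or quantile, coupling), which is known to solve (K) in d = 1 for c(x, y) = (x − y)^2. The function names and the dictionary representation of π are assumptions made for illustration.

```python
def monotone_plan(xs, a, ys, b, tol=1e-12):
    """Monotone coupling of sum_i a[i] delta_{xs[i]} and sum_j b[j] delta_{ys[j]}.

    xs and ys must be sorted increasingly; a and b are weights summing to 1.
    Returns a dict {(x, y): mass} representing the plan pi.
    """
    a, b = list(a), list(b)
    plan = {}
    i = j = 0
    while i < len(xs) and j < len(ys):
        mass = min(a[i], b[j])            # move as much mass as both atoms allow
        if mass > tol:
            key = (xs[i], ys[j])
            plan[key] = plan.get(key, 0.0) + mass
        a[i] -= mass
        b[j] -= mass
        advance_i = a[i] <= tol           # source atom exhausted
        advance_j = b[j] <= tol           # target atom filled
        if advance_i:
            i += 1
        if advance_j:
            j += 1
    return plan


def marginals(plan):
    """First and second marginals of a plan, as dicts atom -> mass."""
    mu, nu = {}, {}
    for (x, y), mass in plan.items():
        mu[x] = mu.get(x, 0.0) + mass
        nu[y] = nu.get(y, 0.0) + mass
    return mu, nu


def transport_cost(plan):
    """Integral of (x - y)^2 with respect to the plan."""
    return sum(mass * (x - y) ** 2 for (x, y), mass in plan.items())


# mu = delta_0, nu = (1/2) delta_{-1} + (1/2) delta_{1}: the plan splits the atom at 0.
plan = monotone_plan([0.0], [1.0], [-1.0, 1.0], [0.5, 0.5])
```

Here the plan sends half of the mass at 0 to −1 and half to 1, which no map T can do.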
A Geometric Approach to Optimal Transportation
µ, ν: two probability measures on R^d; c(u, x) = ‖u − x‖^2
Monge's problem³ (M): inf_{T : T#µ = ν} ∫ ‖u − T(u)‖^2 dµ(u)
T#µ is the push forward of µ by T, i.e., T#µ(B) = µ(T^{-1}(B)), ∀B
Kantorovich relaxation (K): min_{π ∈ Π(µ,ν)} ∫ ‖u − x‖^2 dπ(u, x)
Compared to the above notions, this approach has the following advantages:
It relies on appealing geometric ideas
It does not require any moment conditions
When d = 1, the optimal transport map is T = F_ν^{-1} ∘ F_µ irrespective of moment assumptions
³ Monge's problem is not meaningful unless µ and ν have finite second moments
U ∼ µ: abs. cont. distribution with support S ⊂ R^d
Example: µ = Unif([0, 1]^d) or Unif(B_d(0, 1))
X ∼ ν; ν is a given probability measure in R^d
Goal: Find the “optimal” transportation map T s.t. T#µ = ν
Theorem [Knott and Smith, Brenier, McCann ...]
There is a µ-a.e. unique measurable mapping Q : S → R^d, transporting µ to ν (i.e., Q#µ = ν or Q(U) ∼ ν), of the form
Q(u) = ∇ϕ(u), for µ-a.e. u,
where ϕ : R^d → R ∪ {+∞} is a convex function (cf. when d = 1).
If, in addition, µ, ν have finite second moments, then
(i) Q(·) is the µ-a.e. unique optimal transport map (sol. of (M)), i.e.,
inf_{T : T#µ = ν} ∫ ‖u − T(u)‖^2 dµ(u) = ∫ ‖u − Q(u)‖^2 dµ(u);
(ii) the µ-a.e. unique optimal transference plan (sol. of (K)) is π = (id, Q)#µ
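A gradient of a convex function is a monotone map: ⟨Q(u) − Q(v), u − v⟩ ≥ 0. The sketch below illustrates this on a fully discrete analogue of the theorem (both measures are uniform on six points in R^2, which is an assumption made so that the optimum can be found by brute force; the theorem itself takes µ absolutely continuous). It finds the assignment minimizing the total squared distance and checks the pairwise monotonicity of the resulting map. All names are illustrative.

```python
import itertools
import random


def sq_dist(u, x):
    return (u[0] - x[0]) ** 2 + (u[1] - x[1]) ** 2


def optimal_assignment(us, xs):
    """Permutation p minimizing sum_k ||u_k - x_{p[k]}||^2, by exhaustive search."""
    n = len(us)
    return min(
        itertools.permutations(range(n)),
        key=lambda p: sum(sq_dist(us[k], xs[p[k]]) for k in range(n)),
    )


def is_pairwise_monotone(us, images, tol=1e-12):
    """Check <u_i - u_j, Q(u_i) - Q(u_j)> >= 0 for all pairs."""
    for i in range(len(us)):
        for j in range(i + 1, len(us)):
            du = (us[i][0] - us[j][0], us[i][1] - us[j][1])
            dq = (images[i][0] - images[j][0], images[i][1] - images[j][1])
            if du[0] * dq[0] + du[1] * dq[1] < -tol:
                return False
    return True


rng = random.Random(0)
us = [(rng.random(), rng.random()) for _ in range(6)]          # reference points
xs = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(6)]  # data points

sigma = optimal_assignment(us, xs)
images = [xs[sigma[k]] for k in range(6)]   # discrete analogue of Q(u_k)
monotone = is_pairwise_monotone(us, images)
```

Monotonicity follows because swapping the images of any two reference points cannot lower the cost of an optimal assignment, the same argument used for d = 1.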
Quantile map when d ≥ 1
µ has an abs. cont. distribution with support S ⊂ R^d
ν a given probability measure in R^d (need not be abs. cont.)
Quantile map
The quantile map of ν (w.r.t. µ) is the µ-a.e. unique mapᵃ
Q ≡ ∇ϕ : S → R^d
where ∇ϕ pushes µ to ν (i.e., ∇ϕ#µ = ν) and ϕ : R^d → R ∪ {+∞} is a convex function.
Restatement: The quantile map Q of ν (w.r.t. µ) is the µ-a.e. unique map that is the gradient of a convex function and pushes µ to ν.
ᵃ Note that Q is uniquely defined µ-a.e.; w.l.o.g., let ϕ(u) = +∞ for u ∈ S^c
In the statistics literature this study was initiated by Chernozhukov et al. (2017, AoS); Hallin (2018, AoS, in revision)
Sample quantiles when d = 1
µ = Uniform([0, 1])
ν ≡ ν_n = (1/n) Σ_{i=1}^n δ_{X_i} is the empirical distribution of {X_i}_{i=1}^n ⊂ R
Let X_(1) < ... < X_(n) be the order statistics
[Figure: Quantile function; the sample quantile function Q plotted against u ∈ [0, 1]]
Then the sample quantile function Q_n (Q_n#µ = ν_n) reduces to
Q_n(u) = X_(i), if u ∈ ((i − 1)/n, i/n), i = 1, ..., n
At i/n, i = 1, ..., n − 1, we are free to define
Q_n(i/n) ∈ [X_(i), X_(i+1)]
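The step-function form of Q_n translates directly into code. The sketch below is one possible implementation; the function name and the choice Q_n(i/n) = X_(i) at the break points are assumptions (the slide allows any value in [X_(i), X_(i+1)] there). It also checks on a fine grid that Q_n sends mass 1/n of µ = Uniform([0, 1]) to each data point.

```python
import math


def sample_quantile(xs, u):
    """Q_n(u) = X_(i) for u in ((i-1)/n, i/n), with 0 < u < 1."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    order = sorted(xs)                   # order statistics X_(1) <= ... <= X_(n)
    i = math.ceil(len(order) * u)        # index of the interval containing u
    return order[i - 1]


data = [0.3, -1.2, 0.5, -0.4]

# Push-forward check: evaluate Q_n at 1000 equally spaced midpoints of [0, 1]
# and count how often each data point is hit.
counts = {}
for k in range(1000):
    q = sample_quantile(data, (k + 0.5) / 1000)
    counts[q] = counts.get(q, 0) + 1
```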
Sample quantiles in R^d, d ≥ 1
µ abs. cont. with support S ⊂ R^d; e.g., µ = Uniform([0, 1]^d)
ν ≡ ν_n = (1/n) Σ_{i=1}^n δ_{X_i} is the empirical distribution of {X_i}_{i=1}^n ⊂ R^d
Q_n is the transport (Monge) map s.t. Q_n#µ = (1/n) Σ_{i=1}^n δ_{X_i} and minimizes (in this case (K) = (M))
∫_S ‖u − T(u)‖^2 dµ(u) = Σ_{i=1}^n ∫_{{u ∈ S : T(u) = X_i}} ‖u − X_i‖^2 dµ(u)
Computation in the semi-discrete case
Obtain a convex subdivision of S: a “partition” S = ∪_{i=1}^n Q_n^{-1}(X_i)
Top-dimensional cells: convex polyhedral sets in the subdivision of S with non-empty interior
Question: How to compute Q_n? (Figures and plots?)
[Figure: four panels, each plotted on [0, 1] × [0, 1].] The data sets are drawn from the following distributions (clockwise, top to bottom): (i) X ∼ N_2((0, 0), I_2); (ii) X ∼ N_2((0, 0), Σ) where Σ_{1,1} = Σ_{2,2} = 1 and Σ_{1,2} = Σ_{2,1} = 0.99; (iii) two spiral structures with Gaussian perturbations (with small variance); and (iv) a mixture of four different distributions.
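One way to approach the question of computing Q_n, sketched here under stated assumptions rather than as the method used in the talk: write ϕ(u) = max_i {⟨u, X_i⟩ + h_i}, so that ∇ϕ(u) = X_i on the cell where index i attains the maximum, and choose the heights h so that every cell has µ-mass 1/n. The heights minimize the convex function ∫ ϕ dµ − (1/n) Σ_i h_i, whose partial derivative in h_i is µ(cell i) − 1/n. The sketch runs plain gradient descent on h, with µ = Uniform([0, 1]^2) replaced by 1000 random points. The data points, step size, and iteration count are illustrative.

```python
import random


def cell_index(u, points, h):
    """Index i maximizing <u, X_i> + h_i; u lies in the i-th cell."""
    best, best_val = 0, float("-inf")
    for i, x in enumerate(points):
        val = u[0] * x[0] + u[1] * x[1] + h[i]
        if val > best_val:
            best, best_val = i, val
    return best


def cell_masses(us, points, h):
    """Fraction of the reference sample falling in each cell."""
    counts = [0] * len(points)
    for u in us:
        counts[cell_index(u, points, h)] += 1
    return [c / len(us) for c in counts]


def fit_heights(us, points, step=0.3, iters=200):
    """Gradient descent on the heights: h_i <- h_i - step * (mass_i - 1/n)."""
    n = len(points)
    h = [0.0] * n
    for _ in range(iters):
        masses = cell_masses(us, points, h)
        h = [hi - step * (mi - 1.0 / n) for hi, mi in zip(h, masses)]
    return h


rng = random.Random(0)
us = [(rng.random(), rng.random()) for _ in range(1000)]   # stand-in for mu
points = [(0.0, 0.1), (2.0, 0.5), (0.4, 2.1), (2.6, 2.3)]  # data X_1, ..., X_4

h = fit_heights(us, points)
masses = cell_masses(us, points, h)       # each entry should be near 1/4


def sample_quantile_map(u):
    """Approximate Q_n(u): the data point whose cell contains u."""
    return points[cell_index(u, points, h)]
```

With all heights equal to zero, the whole square falls into the cell of (2.6, 2.3); the descent lowers that height and raises the others until the four cells have nearly equal mass. Exact methods based on power diagrams exist, and this sketch does not attempt them.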
Rank map
µ has an abs. cont. distribution with support S ⊂ R^d
ν a given probability measure in R^d (need not be abs. cont.)
Quantile map
The quantile map of ν (w.r.t. µ): µ-a.e. unique map Q ≡ ∇ϕ : S → R^d
where ∇ϕ#µ = ν and ϕ : R^d → R ∪ {+∞} is a convexᵃ function
ᵃ By convention ϕ(u) = +∞ for u ∉ S
Rank map
The rank map of ν (w.r.t. µ) is defined by
R ≡ ∇ϕ*
where ∇ϕ#µ = ν, and ϕ* : R^d → R is the (convex) Legendre-Fenchel dual of ϕ:
ϕ*(x) := sup_{u ∈ R^d} {⟨x, u⟩ − ϕ(u)} = sup_{u ∈ S} {⟨x, u⟩ − ϕ(u)}
Note that ϕ* is a finite convex function on R^d, so the rank map R(·) = ∇ϕ*(·) is defined a.e. on R^d
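A one-dimensional worked case may help. Take µ = Unif([0, 1]) and ν = Exp(1), so that Q(u) = −log(1 − u) and ϕ(u) = (1 − u) log(1 − u) + u on S = [0, 1] (this example, the grid size, and the function names are assumptions, not from the talk). The maximizer of ⟨x, u⟩ − ϕ(u) over u ∈ S is the gradient of ϕ* at x, and the sketch below confirms numerically that it matches the c.d.f. 1 − e^{−x}, i.e. the usual rank.

```python
import math


def phi(u):
    """Convex potential on S = [0, 1] with derivative Q(u) = -log(1 - u)."""
    if u >= 1.0:
        return 1.0  # limit of (1 - u) * log(1 - u) + u as u -> 1
    return (1.0 - u) * math.log(1.0 - u) + u


GRID = [k / 2000 for k in range(2001)]   # discretization of S = [0, 1]
PHI = [phi(u) for u in GRID]


def rank(x):
    """Grid maximizer of x * u - phi(u); approximates the gradient of phi* at x."""
    best_k = max(range(len(GRID)), key=lambda k: x * GRID[k] - PHI[k])
    return GRID[best_k]


def conjugate(x):
    """Grid approximation of phi*(x) = sup_{u in S} {x * u - phi(u)}."""
    return max(x * u - p for u, p in zip(GRID, PHI))


# Compare the numerical rank with the Exp(1) c.d.f. at a few points.
checks = [(x, rank(x), 1.0 - math.exp(-x)) for x in (0.2, 1.0, 3.0)]
```

For x ≤ 0 the maximizer is u = 0, matching F(x) = 0 to the left of the support.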
When is R = Q^{-1}?
X ∼ ν: supported on a convex set X ⊂ R^d with Lebesgue density⁴
U ∼ µ: supported on a convex compact set S ⊂ R^d with bounded density
Result [Ghosal and S. (2018+)]
Let Q = ∇ϕ be the quantile function of ν (w.r.t. µ), where ϕ : R^d → R ∪ {+∞} is convex. Then:
(i) The inverse function of Q exists, and has the form
Q^{-1} = ∇ϕ* =: R,
where ϕ* is the Legendre-Fenchel dual of ϕ.
(ii) Q is a homeomorphism from Int(S) to Int(X) (cf. when d = 1ᵃ)
(iii) R = Q^{-1} is the ν-a.e. unique map that pushes ν to µ (i.e., R#ν = µ) and is the gradient of a convex function.
ᵃ the c.d.f. F is continuous and strictly increasing
⁴ with mild boundedness (from below and above) assumptions on the density on X
Properties of the rank/quantile maps
Characterizes the distribution
The quantile and rank functions characterize the associated distribution
Equivariance under orthogonal transformations
Suppose Y = AX, where A is a d × d matrix
A is an orthogonal matrix, i.e., A A^T = A^T A = I_d
µ: spherically symmetric distribution (e.g., µ = Uniform(B_d(0, 1)))
Then, Q_Y(u) = A Q_X(A^T u) for µ-a.e. u
R_Y(y) = A R_X(A^T y), for a.e. y ∈ R^d
Quantile/rank maps are equivariant under orthogonal transformations
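The equivariance identity can be checked on a small discrete stand-in. The sketch below replaces µ by the uniform distribution on six equally spaced points of the unit circle, which is invariant under rotation by 60 degrees (an assumption made so the optimal map can be found by brute force; it is not spherically symmetric in the full sense). With A that rotation, A^T u_k = u_{k−1}, so Q_Y(u_k) = A Q_X(u_{k−1}). All names are illustrative.

```python
import itertools
import math
import random

N = 6
ANGLE = 2.0 * math.pi / N


def rotate(p, angle):
    """Apply the rotation matrix A (counter-clockwise by `angle`) to p."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])


def optimal_assignment(us, xs):
    """perm[k] = index of the data point assigned to u_k (exhaustive search)."""
    n = len(us)
    return min(
        itertools.permutations(range(n)),
        key=lambda p: sum(
            (us[k][0] - xs[p[k]][0]) ** 2 + (us[k][1] - xs[p[k]][1]) ** 2
            for k in range(n)
        ),
    )


# Reference points u_0, ..., u_5 on the unit circle; rotating u_k gives u_{k+1}.
us = [(math.cos(k * ANGLE), math.sin(k * ANGLE)) for k in range(N)]

rng = random.Random(0)
xs = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(N)]
ys = [rotate(x, ANGLE) for x in xs]          # Y_i = A X_i

sigma = optimal_assignment(us, xs)           # Q_X(u_k) = X_{sigma[k]}
tau = optimal_assignment(us, ys)             # Q_Y(u_k) = Y_{tau[k]}

# Equivariance predicts Q_Y(u_k) = A Q_X(u_{k-1}), i.e. tau[k] = sigma[k-1].
equivariant = all(tau[k] == sigma[(k - 1) % N] for k in range(N))
```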
Under mutual independence
X = (X_1, X_2, ..., X_k) ∼ ν where k ≥ 2;
X_i ∼ ν_i, for i = 1, ..., k, are r.v. in R^{d_i} (here d_1 + ... + d_k = d)
µ = Uniform([0, 1]^d)
Let Q and Q_i be the quantile maps of X and X_i, for i = 1, ..., k, respectively (w.r.t. µ and µ_i = Uniform([0, 1]^{d_i}))
Let R and R_i, for i = 1, ..., k, be the corresponding rank maps
Compare: R(X_i) ∼ µ = Uniform(S), where R is the pop. rank map
Glivenko-Cantelli type result [Ghosal & S. (2018+)]
Let X_1, X_2, ... ∈ R^d be i.i.d. ν where ν is abs. cont. with support X
Take µ = Uniform(B_d(0, 1))
Let Q and R be the quantile and rank maps of ν (w.r.t. µ)
Suppose Q = ∇ϕ where ∇ϕ : Int(S) → Int(X) is a homeomorphism
Let Q_n and R_n be any sample quantile and rank functions
Let K_1 ⊂ Int(S) be a compact set. Then, we have
sup_{u ∈ K_1} ‖Q_n(u) − Q(u)‖ → 0 a.s. and sup_{x ∈ R^d} ‖R_n(x) − R(x)‖ → 0 a.s.
Generalizes the G-C result in Chernozhukov et al. (2017, AoS) which:
(i) assumed that ν is compactly supported;
(ii) showed uniform convergence of R_n only on compacts inside Int(X);
(iii) showed in probability convergence.
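For intuition, the classical d = 1 case can be simulated directly. Taking µ = Unif([0, 1]) as the reference (an assumption; the result above uses the uniform distribution on the unit ball) the rank map is the c.d.f. F and the sample rank map is the empirical c.d.f. F_n. The sketch below computes sup_x |F_n(x) − F(x)| for uniform data at two sample sizes; the names and sizes are illustrative.

```python
import random


def ecdf_sup_distance(sample):
    """sup_x |F_n(x) - F(x)| for F the Unif([0, 1]) c.d.f., F(x) = x.

    The supremum is attained just before or at a jump of F_n, so it is
    enough to compare at the order statistics.
    """
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


rng = random.Random(0)
distances = {
    n: ecdf_sup_distance([rng.random() for _ in range(n)])
    for n in (100, 10000)
}
```

The distance is typically of order 1/√n, so it shrinks by roughly a factor of ten between the two sample sizes.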
Two-sample Testing
Suppose that X_1, ..., X_m are i.i.d. ν_X (abs. cont.) on R^d
Suppose that Y_1, ..., Y_n are i.i.d. ν_Y (abs. cont.) on R^d
µ: Distribution on S ⊂ R^d (e.g., µ = Unif([0, 1]^d))
Goal: Test H_0 : ν_X = ν_Y versus H_1 : ν_X ≠ ν_Y
Quantile maps
Q̂_X and Q̂_Y are the sample quantile maps for the X_i's and Y_j's
Population quantile maps: Q_X and Q_Y
Recall: Q_X#µ = ν_X and Q_Y#µ = ν_Y
Goal: Test H_0 : ν_X = ν_Y versus H_1 : ν_X ≠ ν_Y
Q̂_X and Q̂_Y are the sample quantile maps for the X_i's and Y_j's
Joint rank map: R̂_{m,n} is the rank map (properly defined) of the combined sample X_1, ..., X_m, Y_1, ..., Y_n
Test statistic:
T_{m,n} := ∫_S ‖R̂_{m,n}(Q̂_X(u)) − R̂_{m,n}(Q̂_Y(u))‖^2 dµ(u)
Motivation: One-sample Cramér-von Mises statistic
∫ {F_n(x) − F(x)}^2 dF(x) = ∫_0^1 {F_n(F^{-1}(u)) − u}^2 du
Test statistic: T_{m,n} = ∫_S ‖R̂_{m,n}(Q̂_X(u)) − R̂_{m,n}(Q̂_Y(u))‖^2 dµ(u)
When d = 1, T_{m,n} is distribution-free!
Question: Is T_{m,n} (asymptotically) distribution-free when d > 1?
Critical value: Can always be computed by a permutation test
Theorem [Ghosal and S. (2018+)]
Suppose that m/(m + n) → θ ∈ (0, 1) as m, n → ∞. Under H_0 : ν_X = ν_Y,
T_{m,n} → 0 in probability, as m, n → ∞.
Further, for ν_X ≠ ν_Y (and mild regularity conditions on ν_X and ν_Y),
T_{m,n} → c > 0 in probability, as m, n → ∞.
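The statistic and its permutation calibration are easy to write down when d = 1 and µ = Unif([0, 1]), where the sample quantile maps are order-statistic step functions and the joint rank map is the empirical c.d.f. of the pooled sample. The sketch below is an illustration under those assumptions; the midpoint grid used for the integral, the number of permutations, the simulated data, and all function names are choices made here, not taken from the talk.

```python
import bisect
import math
import random


def sample_quantile(sorted_xs, u):
    """Sample quantile map in d = 1: the ceil(n u)-th order statistic."""
    n = len(sorted_xs)
    i = min(n, max(1, math.ceil(n * u)))
    return sorted_xs[i - 1]


def pooled_rank(sorted_pool, x):
    """Joint rank map in d = 1: empirical c.d.f. of the pooled sample."""
    return bisect.bisect_right(sorted_pool, x) / len(sorted_pool)


def t_stat(xs, ys, grid=200):
    """Midpoint-rule approximation of T_{m,n} over u in (0, 1)."""
    sx, sy, pool = sorted(xs), sorted(ys), sorted(xs + ys)
    total = 0.0
    for k in range(grid):
        u = (k + 0.5) / grid
        diff = pooled_rank(pool, sample_quantile(sx, u)) - pooled_rank(
            pool, sample_quantile(sy, u)
        )
        total += diff * diff
    return total / grid


def permutation_pvalue(xs, ys, n_perm=200, seed=0):
    """Share of label permutations whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = t_stat(xs, ys)
    pool = xs + ys                      # new list; shuffled in place below
    m = len(xs)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pool)
        if t_stat(pool[:m], pool[m:]) >= observed:
            hits += 1
    return (1 + hits) / (1 + n_perm)


rng = random.Random(1)
xs = [rng.gauss(0.0, 1.0) for _ in range(100)]
ys_same = [rng.gauss(0.0, 1.0) for _ in range(100)]    # H_0 holds
ys_shift = [rng.gauss(2.0, 1.0) for _ in range(100)]   # H_0 fails

t_same, t_shift = t_stat(xs, ys_same), t_stat(xs, ys_shift)
p_shift = permutation_pvalue(xs, ys_shift)
```

Under the shift the X sample occupies roughly the lower half of the pooled ranks and the Y sample the upper half, so the integrand stays well away from zero and the permutation p-value is small.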
Independence Testing
(X_1, Y_1), ..., (X_n, Y_n) are i.i.d. ν (abs. cont.) on R^{d_X} × R^{d_Y}; d_X + d_Y = d