Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi (MIT) Mohammad Mahdian (Google) Vahab S. Mirrokni (Google)
Aug 21, 2020

Core-Set Definition

• Setup

– Set of $n$ points $P$ in $d$-dimensional space

– Optimize a function $f$

• $c$-Core-set: Small subset of points $S \subset P$ which suffices to $c$-approximate the optimal solution

• Maximization: $f_{\mathrm{opt}}(P)/c \le f_{\mathrm{opt}}(S) \le f_{\mathrm{opt}}(P)$

• Example

– Optimization function: distance between the two farthest points

– 1-Core-set: points on the convex hull
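The convex-hull example can be checked on a small planar point set. The following is a minimal sketch, not part of the slides: the helper names `convex_hull` and `diameter` and the sample points are illustrative assumptions, and the hull is computed with Andrew's monotone chain.

```python
from itertools import combinations
from math import dist

def convex_hull(points):
    """Hull vertices of 2-D points (Andrew's monotone chain, collinear points dropped)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b makes a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def diameter(points):
    """Distance between the two farthest points (the function f being optimized)."""
    return max(dist(p, q) for p, q in combinations(points, 2))

# Hypothetical sample data: four rectangle corners plus three interior points.
P = [(0, 0), (4, 0), (4, 3), (0, 3), (1, 1), (2, 2), (3, 1)]
S = convex_hull(P)  # candidate 1-core-set
```

Here `S` keeps only the four corners, yet `diameter(S)` equals `diameter(P)`, which is what a 1-core-set for this function requires.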

Composable Core-sets

• Setup

– $P_1, P_2, \ldots, P_m$ are sets of points in $d$-dimensional space

– Optimize a function $f$ over their union $P$

• $c$-Composable Core-sets: Subsets of points $S_1 \subset P_1, S_2 \subset P_2, \ldots, S_m \subset P_m$ such that the solution on the union of the core-sets approximates the solution on the union of the point sets

• Maximization: $\frac{1}{c} f_{\mathrm{opt}}(P_1 \cup \cdots \cup P_m) \le f_{\mathrm{opt}}(S_1 \cup \cdots \cup S_m) \le f_{\mathrm{opt}}(P_1 \cup \cdots \cup P_m)$

• Example: two farthest points

Applications – Streaming Computation

• Streaming Computation:

– Processing a sequence of $n$ data elements "on the fly"

– Limited storage

• $c$-Composable Core-set of size $k$

– Chunks of size $\sqrt{nk}$, thus number of chunks $= \sqrt{n/k}$

– Core-set for each chunk

– Total space: $k\sqrt{n/k} + \sqrt{nk} = O(\sqrt{nk})$

– Approximation factor: $c$
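The space bound can be sanity-checked numerically. This is a rough sketch under the assumption that the stream is cut into chunks of about $\sqrt{nk}$ elements and one size-$k$ core-set is kept per chunk; the function name `streaming_budget` and the sample values of `n` and `k` are illustrative only.

```python
import math

def streaming_budget(n: int, k: int):
    """Chunk size, number of chunks, and peak storage for the chunked scheme."""
    chunk_size = math.isqrt(n * k)          # about sqrt(n*k) elements buffered at once
    num_chunks = math.ceil(n / chunk_size)  # about sqrt(n/k) chunks
    stored_coresets = num_chunks * k        # one size-k core-set kept per chunk
    total = stored_coresets + chunk_size    # stored core-sets plus the chunk being read
    return chunk_size, num_chunks, total

# Hypothetical sizes: a million stream elements, core-sets of 100 points.
chunk_size, num_chunks, total = streaming_budget(n=1_000_000, k=100)
```

With these values the chunk holds 10,000 elements, there are 100 chunks, and the peak storage is 20,000 points, that is $2\sqrt{nk}$, instead of the full million.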

Applications – Distributed Systems

• Streaming Computation

• Distributed System:

– Each machine holds a block of data.

– A composable core-set is computed and sent to the server.

• Map-Reduce Model:

– One round of Map-Reduce

– $\sqrt{n/k}$ mappers, each getting $\sqrt{nk}$ points

– Each mapper computes a composable core-set of size $k$

– The core-sets are passed to a single reducer

Applications – Similarity Search

• Streaming Computation

• Distributed System

• Similarity Search: small output size

• Good to have a result from each cluster: relevant and diverse

• Diverse Near Neighbor Problem [Abbar, Amer-Yahia, Indyk, Mahabadi, WWW’13] [Abbar, Amer-Yahia, Indyk, Mahabadi, Varadarajan, SoCG’13]

– Uses Locality Sensitive Hashing (LSH) and composable core-set techniques.

Diversity Maximization Problem

• A set of $n$ points $P$ in a metric space $(\Delta, \mathrm{dist})$

• Optimization Problem:

– Find a subset of $k$ points $S$ which maximizes diversity

• Diversity:

– Minimum pairwise distance (Remote Edge)

– Sum of pairwise distances (Remote Clique)

• Long list of variants [Chandra and Halldorsson ‘01]

(Figure: example with n = 6 points and k = 4.)

Diversity Functions

| Diversity function over a set $S$ of $k$ points | Description |
| --- | --- |
| Remote-edge | Minimum pairwise distance: $\min_{p,q \in S} \mathrm{dist}(p,q)$ |
| Remote-clique | Sum of pairwise distances: $\sum_{p,q \in S} \mathrm{dist}(p,q)$ |
| Remote-tree | Weight of the minimum spanning tree (MST) of the set $S$ |
| Remote-cycle | Weight of the minimum traveling salesman tour (TSP) of the set $S$ |
| Remote-star | Weight of the minimum star: $\min_{p \in S} \sum_{q \in S} \mathrm{dist}(p,q)$ |
| Remote-pseudoforest | Sum of the distance of each point to its nearest neighbor: $\sum_{p \in S} \min_{q \in S} \mathrm{dist}(p,q)$ |
| Remote-matching | Weight of the minimum perfect matching of the set $S$ |
| Max-coverage | How well the points cover each coordinate: $\sum_{i=1}^{d} \max_{p \in S} p_i$ |
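Several of these measures are short enough to write down directly. The sketch below is illustrative rather than taken from the slides: the function names are hypothetical, points are plain coordinate tuples under the Euclidean metric, and it assumes that pairwise measures range over distinct unordered pairs (each pair counted once) and that a point's nearest neighbor excludes the point itself.

```python
from itertools import combinations
from math import dist

def remote_edge(S):
    """Minimum pairwise distance over distinct pairs."""
    return min(dist(p, q) for p, q in combinations(S, 2))

def remote_clique(S):
    """Sum of pairwise distances, each unordered pair counted once (assumption)."""
    return sum(dist(p, q) for p, q in combinations(S, 2))

def remote_star(S):
    """Weight of the cheapest star: best center p, summing distances to all others."""
    return min(sum(dist(p, q) for q in S) for p in S)

def remote_pseudoforest(S):
    """Sum over points of the distance to the nearest other point."""
    return sum(
        min(dist(p, q) for j, q in enumerate(S) if j != i)
        for i, p in enumerate(S)
    )

def max_coverage(S):
    """Sum over coordinates of the largest value among the chosen points."""
    return sum(max(column) for column in zip(*S))

# Hypothetical sample: a 3-4-5 right triangle.
S = [(0, 0), (3, 0), (0, 4)]
```

On this triangle the side lengths are 3, 4 and 5, so the measures can be verified by hand.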

Our Results

| Diversity function | Description | Offline approx. factor | Composable core-set approx. factor [our results] |
| --- | --- | --- | --- |
| Remote-edge | Minimum pairwise distance | $O(1)$ [Tamir 91] [White 91] [Ravi et al 94] | $O(1)$ |
| Remote-clique | Sum of pairwise distances | $O(1)$ [Hassin et al 97] | $O(1)$ |
| Remote-tree | Weight of MST | $O(1)$ [Halldorsson et al 99] | $O(1)$ |
| Remote-cycle | Weight of minimum TSP | $O(1)$ [Halldorsson et al 99] | $O(1)$ |
| Remote-star | Weight of minimum star | $O(1)$ [Chandra & Halldorsson 01] | $O(1)$ |
| Remote-pseudoforest | Sum of the distance of each point to its nearest neighbor | $O(\log n)$ [Chandra & Halldorsson 01] | $O(\log k)$ |
| Remote-matching | Weight of minimum perfect matching | $O(\log n)$ [Chandra & Halldorsson 01] | $O(\log k)$ |
| Max-coverage | How well the points cover each coordinate: $\sum_{i=1}^{d} \max_{p \in S} p_i$ | $O(1)$ [Feige 98] | No composable core-set of poly size in $k$ with approx. factor $\sqrt{k}/\log k$ |

Review of Offline Algorithms

• We have a set of $n$ points $P$

• Goal: find a subset $S$ of size $k$ which maximizes the diversity

The Greedy Algorithm

• Used for minimum pairwise distance

• Greedy Algorithm [Ravi, Rosenkrantz, Tayi] [Gonzalez]

– Choose an arbitrary point

– Repeat $k-1$ times

  • Add the point whose minimum distance to the currently chosen points is maximized

• Remote-edge: computes a 2-approximate set
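The greedy steps above can be sketched as follows. This is an illustrative implementation, not the authors' code: the name `greedy_diverse_subset` is hypothetical, the "arbitrary" starting point is simply the first input point, and distances are Euclidean.

```python
from math import dist

def greedy_diverse_subset(points, k):
    """Farthest-point greedy: returns k points (all of them if fewer than k exist)."""
    points = list(points)
    if k <= 0 or not points:
        return []
    chosen = [points[0]]  # arbitrary start: the first point (assumption)
    # min_dist[i] = distance from points[i] to its closest chosen point
    min_dist = [dist(p, chosen[0]) for p in points]
    while len(chosen) < min(k, len(points)):
        # Pick the point that is farthest from everything chosen so far.
        i = max(range(len(points)), key=min_dist.__getitem__)
        chosen.append(points[i])
        min_dist = [min(d, dist(p, points[i])) for d, p in zip(min_dist, points)]
    return chosen

# Hypothetical sample data.
P = [(0, 0), (1, 0), (10, 0), (10, 1), (5, 8), (0, 1)]
S = greedy_diverse_subset(P, 3)
```

Maintaining `min_dist` incrementally keeps each round linear in the number of points, so the whole run takes about $nk$ distance evaluations.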

Local Search Algorithm

• Used for sum of pairwise distances

• Algorithm [Abbassi, Mirrokni, Thakur]

– Initialize $S$ with an arbitrary set of $k$ points which contains the two farthest points

– While there exists a swap that improves diversity by a factor of $1 + \epsilon/n$

  » Perform the swap

• For Remote-Clique

– Number of rounds: $\log_{1+\epsilon/n} n^2 = O\left(\frac{n}{\epsilon} \log n\right)$

– Approximation factor is constant.
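A compact version of this swap procedure might look like the sketch below. It is illustrative only: the names `clique_value` and `local_search`, the default `eps`, and the sample points are assumptions, the code requires $k \ge 2$, and it recomputes the objective from scratch for every candidate swap, which is simple but slow.

```python
from itertools import combinations
from math import dist

def clique_value(S):
    """Sum of pairwise distances, each unordered pair counted once."""
    return sum(dist(p, q) for p, q in combinations(S, 2))

def local_search(points, k, eps=0.1):
    """Swap-based local search for remote-clique; assumes k >= 2."""
    points = list(dict.fromkeys(points))  # drop duplicate points, keep order
    n = len(points)
    if n <= k:
        return points
    # Start from the two farthest points, then pad arbitrarily up to k points.
    a, b = max(combinations(points, 2), key=lambda pq: dist(*pq))
    S = [a, b] + [p for p in points if p not in (a, b)][: k - 2]
    threshold = 1 + eps / n
    improved = True
    while improved:
        improved = False
        current = clique_value(S)
        for i in range(len(S)):
            for p in points:
                if p in S:
                    continue
                candidate = S[:i] + [p] + S[i + 1:]
                # Accept only swaps that gain at least a (1 + eps/n) factor.
                if clique_value(candidate) > threshold * current:
                    S, improved = candidate, True
                    break
            if improved:
                break
    return S

# Hypothetical sample: square corners plus two points near the origin.
P = [(0, 0), (1, 0), (0, 1), (10, 0), (0, 10), (10, 10)]
S = local_search(P, 3)
```

On this input the search ends with three corners of the square, which is also the best achievable triple here.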

Composable Core-sets

• The Greedy Algorithm computes a 3-composable core-set for minimum pairwise distance.

• The Local Search Algorithm computes a constant-factor composable core-set for sum of pairwise distances.
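The first claim can be exercised on a small random instance. The sketch below is an illustration under stated assumptions, not the authors' experiment: the points are split into `m` parts, the farthest-point greedy is run on each part to obtain a size-`k` core-set, and brute force compares the best remote-edge value available in the union of core-sets with the best value in the full set. All names, the seed, and the sizes are hypothetical.

```python
from itertools import combinations
from math import dist
import random

def remote_edge(S):
    """Minimum pairwise distance of a set of points."""
    return min(dist(p, q) for p, q in combinations(S, 2))

def greedy(points, k):
    """Farthest-point greedy, used here as the per-part core-set routine."""
    points = list(points)
    chosen = [points[0]]
    while len(chosen) < min(k, len(points)):
        chosen.append(max(points, key=lambda p: min(dist(p, c) for c in chosen)))
    return chosen

def best_remote_edge(points, k):
    """Exact optimum by brute force; only feasible for tiny inputs."""
    return max(remote_edge(S) for S in combinations(points, k))

random.seed(0)
k, m = 3, 4
P = [(random.random(), random.random()) for _ in range(24)]
parts = [P[i::m] for i in range(m)]                    # m disjoint blocks of data
core = [p for part in parts for p in greedy(part, k)]  # union of the core-sets
opt_full = best_remote_edge(P, k)
opt_core = best_remote_edge(core, k)
```

Since the core-sets are subsets of `P`, `opt_core` can never exceed `opt_full`; the 3-composable guarantee says it is also at least `opt_full / 3`, while only `m * k` of the original points are kept.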

Proof Idea

Let $P_1, \cdots, P_m$ be the sets of points, $P = \bigcup P_i$

Let $S_1, \cdots, S_m$ be their core-sets, $S = \bigcup S_i$

Goal: $\mathrm{div}_k(S) \ge \mathrm{div}_k(P)/c$

Let $OPT = \{o_1, \cdots, o_k\}$ be the optimal solution

Goal: $\mathrm{div}_k(S) \ge \mathrm{div}(OPT)/c$

Let $r$ be the maximum diversity of the core-sets, $r = \max_i \mathrm{div}(S_i)$. Note: $\mathrm{div}_k(S) \ge r$

Case 1: one of the $S_i$ has diversity as good as the optimum: $r \ge O(\mathrm{div}(OPT))$

Case 2: $r \le O(\mathrm{div}(OPT))$

• Find a one-to-one mapping $\mu$ from $OPT = \{o_1, \cdots, o_k\}$ to $S = S_1 \cup \cdots \cup S_m$ s.t. $\mathrm{dist}(o_i, \mu(o_i)) \le O(r)$

• Replacing $o_i$ with $\mu(o_i)$ still gives large diversity

• $\mathrm{div}(\{\mu(o_i)\})$ is approximately as good as $\mathrm{div}(\{o_i\})$

• The actual mapping $\mu$ depends on the specific diversity measure we are considering.
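For the minimum pairwise distance measure, the replacement step in Case 2 follows from the triangle inequality. The derivation below is a sketch, assuming a hypothetical constant $a$ with $\mathrm{dist}(o_i, \mu(o_i)) \le a\,r$ for every $i$; the slides do not name this constant.

```latex
\begin{align*}
\mathrm{dist}(\mu(o_i), \mu(o_j))
  &\ge \mathrm{dist}(o_i, o_j) - \mathrm{dist}(o_i, \mu(o_i)) - \mathrm{dist}(o_j, \mu(o_j)) \\
  &\ge \mathrm{div}(OPT) - 2ar .
\end{align*}
```

So whenever $r$ is a small enough fraction of $\mathrm{div}(OPT)$, the mapped points $\mu(o_1), \ldots, \mu(o_k)$ keep a constant fraction of the optimal minimum distance.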

Maximum k-Coverage

• A set of $n$ points $P$ in $d$-dimensional space

• Each dimension corresponds to a feature.

• Goal: choose a set of $k$ points $S$ in $P$ which maximizes the total coverage:

– $\mathrm{cov}(S) = \sum_{i=1}^{d} \max_{s \in S} s_i$

• Special case, Hamming space:

– A collection of $n$ sets $P$

– Over the universe $U = \{1, \ldots, d\}$

– Goal: choose $k$ sets $S = \{S_1, \ldots, S_k\}$ in $P$ whose union is maximized.

• Theorem: for any $\alpha < \frac{\sqrt{k}}{\log k}$ and any constant $\beta > 1$, there is no $\alpha$-composable core-set of size $k^{\beta}$
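The link between the coordinate-wise coverage and the set version can be seen by encoding each set as a 0/1 vector. A minimal sketch follows; the helper names `cov` and `indicator` and the example sets are hypothetical.

```python
def cov(S):
    """Total coverage: sum over coordinates of the largest value among chosen points."""
    return sum(max(column) for column in zip(*S))

def indicator(subset, d):
    """0/1 vector of length d for a subset of the universe {1, ..., d}."""
    return tuple(1 if i in subset else 0 for i in range(1, d + 1))

# Hypothetical instance over the universe {1, ..., 6}.
d = 6
sets = [{1, 2, 3}, {3, 4}, {5}]
S = [indicator(s, d) for s in sets]
```

A coordinate contributes 1 exactly when some chosen set contains that element, so `cov(S)` equals the size of the union, here 5.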

Proof Idea

Build a set of instances $P_1, \cdots, P_{O(k)}$; let $U = \{1, \cdots, O(k^4)\}$

• Let $V_i$ be a subset of size $k$ of $U$

• $P_i$ is a collection of subsets of size $\sqrt{k}$ from $V_i$

• $P_i$ has cardinality $\binom{k}{\sqrt{k}}$

We show there exist $V_1, \cdots, V_{O(k)}$ such that

– $V_i \setminus V_1$ has size $\sqrt{k}$

– $V_i \setminus V_1$ and $V_j \setminus V_1$ are disjoint for $i \ne j$

• Using $k$ sets, everything in $\bigcup V_i$ can be covered, that is $O(k^{3/2})$ elements.

• Using core-sets, only $|V_1| + k \log k = O(k \log k)$ elements can be covered.

Conclusion

• Applications of composable core-sets

• We showed construction of composable core-sets for a wide range of diversity measures

• We showed non-existence of core-sets of polynomial size in $k$ for maximum coverage

• Open Problems

– Are there any other applications of composable core-sets?

– Is there a general characterization of measures for which composable core-sets exist?

– Better approximation factors?

Thank You!

Questions?