Approximations and Streaming Algorithms for Geometric Problems

Piotr Indyk, MIT

Feb 25, 2016

Page 1

Approximations and Streaming Algorithms for Geometric Problems

Piotr Indyk, MIT

Page 2

Computational Model
• Single* pass over the data: e_1, e_2, …, e_n
• Bounded storage
• Fast processing time per element

*For the purpose of this talk

Page 3

Streaming Data Types
• Vector problems:
  • Stream defines an array of numbers
  • Maintain stats of the array, e.g., median
• Metric problems
  • Clustering
• Graph problems, Text problems
• Geometric Problems [this talk]

Page 4

Geometric Data Stream Algorithms as Data Structures
• Data structures that support:
  • Insert(p) to P
  • Possibly: Delete(p) from P
  • Compute(P)
• Use space that is sub-linear in |P|

Page 5

Insertions-only

Page 6

Clustering in Geometric Spaces
• Problems:
  • k-center [Charikar-Chekuri-Feder-Motwani’97]
  • k-median [Guha-Mishra-Motwani-O’Callaghan’00, Meyerson’01, Charikar-O’Callaghan-Panigrahy’03]
• Bounds:
  • poly(k, log n) space
  • O(1)-approximation

Page 7

k-median/k-center

• k is given
• Goal: choose k medians/centers to minimize:
  • k-median: the sum of the distances
  • k-center: the max distance
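The two objectives above can be written down directly. The sketch below is only an illustration, assuming Euclidean distance in the plane and that each point is charged to its nearest center; the function names are hypothetical and not from the talk.

```python
import math


def nearest_distance(p, centers):
    """Euclidean distance from point p to its nearest center."""
    return min(math.dist(p, c) for c in centers)


def k_median_cost(points, centers):
    """k-median objective: sum of distances to the nearest center."""
    return sum(nearest_distance(p, centers) for p in points)


def k_center_cost(points, centers):
    """k-center objective: largest distance to the nearest center."""
    return max(nearest_distance(p, centers) for p in points)


points = [(0, 0), (0, 2), (10, 0), (10, 4)]
centers = [(0, 1), (10, 2)]
print(k_median_cost(points, centers))  # 1 + 1 + 2 + 2 = 6.0
print(k_center_cost(points, centers))  # 2.0
```

Both functions read all of P, so they are offline reference definitions, not streaming algorithms.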

Page 8

Geometric Space
• Bounds:
  • poly(k, log n) space
  • (1+ε)-approximation
• Problems:
  • Diameter, Minimum Enclosing Ball [Agarwal-Har-Peled’01, Feigenbaum-Kannan-Zhang’02, Cormode-Muthukrishnan’02, Hershberger-Suri’04]
  • k-center [Agarwal-Har-Peled’01, Agarwal-Har-Peled-Varadarajan’04]
  • k-median [Har-Peled-Mazumdar’04]
  • Range searching via ε-approximations: [Suri-Toth-Zhou’04], [Bagchi-Chaudhary-Eppstein-Goodrich’04]

Page 9

Dominant Approach: Merge and Reduce
• Main ideas:
  • Design an (off-line) algorithm that converts the input into a “sketch”:
    • Small size
    • Sufficient to solve the problem
    • A sketch of sketches is a sketch
  • Partition the input in a tree-like fashion
  • Simulate tree computation in small space
• Technique can be traced back to ancient times, i.e., the ’80s [Munro-Paterson’80]

Page 10

Tree Computation

[Figure: tree computation over the input points p1, p2, …, p16]

Page 11

Analysis
• Space: (sketch size) * log n
• Time: sketch computation time
• Question: Where do sketches come from?
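One common way to simulate the tree computation in small space is to keep at most one sketch per tree level and merge equal-level sketches like carries in a binary counter, which gives the (sketch size) * log n space bound. The code below is a minimal sketch of that idea, not the talk's implementation; the class name, the `block_size` parameter, and the toy `extent_sketch` reducer (min and max of 1D values) are assumptions chosen for illustration.

```python
class MergeReduce:
    """Keeps at most one sketch per tree level, like digits of a binary counter."""

    def __init__(self, reduce_fn, block_size):
        self.reduce_fn = reduce_fn    # maps a list of items to a smaller list (the sketch)
        self.block_size = block_size  # number of raw items per leaf
        self.buffer = []              # raw items of the current, unfinished leaf
        self.levels = []              # levels[i] is a sketch of 2^i leaves, or None

    def insert(self, item):
        self.buffer.append(item)
        if len(self.buffer) == self.block_size:
            self._carry(self.reduce_fn(self.buffer))
            self.buffer = []

    def _carry(self, sketch):
        # Merge with existing sketches of the same level, moving one level up each time.
        level = 0
        while level < len(self.levels) and self.levels[level] is not None:
            sketch = self.reduce_fn(self.levels[level] + sketch)
            self.levels[level] = None
            level += 1
        if level == len(self.levels):
            self.levels.append(None)
        self.levels[level] = sketch

    def query(self):
        # A sketch of sketches is a sketch: reduce everything currently stored.
        parts = list(self.buffer)
        for sketch in self.levels:
            if sketch is not None:
                parts += sketch
        return self.reduce_fn(parts) if parts else []


def extent_sketch(values):
    """Toy sketch for the 1D diameter: only the min and the max matter."""
    return [min(values), max(values)]


mr = MergeReduce(extent_sketch, block_size=4)
for v in [5, 3, 9, 1, 7, 2, 8, 6, 4, 10, 0, 11, 12]:
    mr.insert(v)
lo, hi = mr.query()
print(hi - lo)  # 12
```

The toy reducer is exact; with an approximate reducer the error compounds along the tree depth, which is why the sketch has to be composable.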

Page 12

Idea 1: solution = sketch
• Consider k-median [GMMO’00]: an approximate k-median of approximate weighted k-medians is an approximate k-median
• Result:
  • Constant depth tree
  • Space: k n^ε, ε > 0
  • O(1)-approximation
  • Works for any metric space

[Figure: example with k=3]

Page 13

Use the solution, ctd.
• ε-approximations: find a subset S ⊆ P such that for any rectangle/halfspace/etc. R,
    |R ∩ S|/|S| = |R ∩ P|/|P| ± ε
• [Matousek]: an approximation of a union of approximations is an approximation
• [BCEG’04]: convert it into a streaming algorithm, applications
  • 1/ε^2 space
• [STZ’04]: better/optimal bounds for rectangles and halfspaces

Page 14

Idea 2: Core-Sets [AHP’01]
• Assume we want to minimize C_P(o)
• S ⊆ P is an ε-core-set for P if, for any o and any set T:
    C_{P ∪ T}(o) = (1 ± ε) C_{S ∪ T}(o)
• Note: this must hold for all o, not just the optimal one

[Figure: point set P and a candidate solution o]

Page 15

Example: Core-set for MEB
• Compute extremal points:
  • Choose “densely” spaced directions v_1, …, v_k
  • I.e., for any u there is v_i such that u·v_i ≥ ||u||_2 / (1+ε)
  • For each direction maintain the extremal point
• k = O(1/ε)^{(d-1)/2} suffice
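The extremal-point construction above is easy to maintain under insertions. The following is a rough sketch restricted to the plane (d = 2), assuming k evenly spaced unit directions; the class name is hypothetical, and the diameter is used as the example query because it can be read off the stored points directly.

```python
import math


class DirectionalExtents:
    """Keeps, for each of k directions in the plane, the point furthest along it."""

    def __init__(self, k):
        # Assumption: evenly spaced unit directions on the circle.
        self.directions = [
            (math.cos(2 * math.pi * j / k), math.sin(2 * math.pi * j / k))
            for j in range(k)
        ]
        self.extremal = [None] * k

    def insert(self, p):
        for j, (vx, vy) in enumerate(self.directions):
            best = self.extremal[j]
            if best is None or p[0] * vx + p[1] * vy > best[0] * vx + best[1] * vy:
                self.extremal[j] = p

    def coreset(self):
        """The stored extremal points (at most k of them)."""
        return {q for q in self.extremal if q is not None}

    def diameter_estimate(self):
        """Largest pairwise distance among the stored points."""
        pts = list(self.coreset())
        return max((math.dist(a, b) for a in pts for b in pts), default=0.0)


ext = DirectionalExtents(k=8)
for p in [(0, 0), (1, 1), (10, 0), (5, 5), (-10, 0), (0, 3)]:
    ext.insert(p)
print(len(ext.coreset()), ext.diameter_estimate())
```

Space is k points regardless of the stream length; the estimate never exceeds the true diameter because every stored point belongs to P.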

Page 16

Stream Algorithms via Core-sets
• Diameter/MEB/width: O(1/ε)^{(d-1)/2} log n space [AHP’01]
• k-center: O(k/ε^d) log n [HP’01]
• k-median:
  • O(k/ε^d) log n [HPM’04]
  • O(k^2/ε^d) [HPK’05]
  • O(k^2 d log^6 n/ε) [Chen’05]
  • O(d^3/ε^7), k=1 [Indyk’05]
• Faster algorithms and other results

Page 17

Limitations
• Small core-sets might not exist
• Do not support deletions

Page 18

Insertions and Deletions

Page 19

Insertions and Deletions
• Technique:
  • Reduction of geometric problems to vector problems
  • Use of randomized linear embeddings
• Problems:
  • Maintaining histograms of the data
  • Classic geometric problems (matching, MST, clustering, etc.)

Page 20

Streaming Algorithms for Vector Problems
• Norm estimation:
  • Stream elements: (i, b), i = 1…m
  • Interpretation: x_i = x_i + b
  • Want to maintain ||x||_p
• Why? Examples:
  • ||x||_p^p = Σ_i |x_i|^p = #non-zero coordinates in x, as p → 0
  • …
• How?
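The p → 0 example can be checked numerically. This small illustration (function name hypothetical) sums |x_i|^p over the non-zero coordinates and shows the value approaching the number of non-zero entries as p shrinks.

```python
def p_norm_to_the_p(x, p):
    """Sum of |x_i|^p over the non-zero coordinates (0^p is taken as 0)."""
    return sum(abs(v) ** p for v in x if v != 0)


x = [3, 0, -2, 0, 5]  # three non-zero coordinates
for p in (2, 1, 0.5, 0.01):
    print(p, p_norm_to_the_p(x, p))
```

For p = 2 the sum is 9 + 4 + 25 = 38; for small p each non-zero term tends to 1, so the sum tends to 3.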

Page 21

Dimensionality reduction
• x is an m-dimensional vector
• A is a “random” k × m matrix, k “small”
• Store Ax
• Recover (1±ε)||x||_2 from Ax (with prob. 1-1/N)

• [Alon-Matias-Szegedy’96]
  • Estimator: median[ (A_1 x)^2 + … + (A_c x)^2, (A_{c+1} x)^2 + … + (A_{2c} x)^2, … ]^{1/2}, c = 1/ε^2, k = c log N
  • A: constructed from 4-wise independent random variables
• [Johnson-Lindenstrauss’85]
  • Estimator: ||Ax||_2
  • A: each entry independently drawn from, e.g., a Gaussian distribution; constructed using Nisan’s PRG [Indyk’00]
• [Indyk’00]
  • Estimator: median[ |A_1 x|, …, |A_k x| ]
  • A: as above
  • Works for ||x||_p, any p ∈ (0,2] (using p-stable distributions)

Page 22

What it means
• To know ||x||_2, it suffices to know Ax
• Can maintain Ax when the coordinates are incremented: A(x + b e_i) = Ax + b A e_i

[Figure: matrix A times vector x gives the short vector Ax]
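The linearity identity A(x + b e_i) = Ax + b A e_i is what makes both increments and decrements cheap. The sketch below illustrates it with a median-of-group-averages estimator in the style of the [Alon-Matias-Szegedy’96] bullet. It is a simplification under stated assumptions: the ±1 entries are fully independent and stored explicitly (a space-efficient version would derive them from a short seed, e.g. 4-wise independent variables), each group is averaged rather than summed because A is left unnormalized, and the class and parameter names are hypothetical.

```python
import random
import statistics


class L2Sketch:
    """Maintains Ax for a random sign matrix A under updates x_i += b."""

    def __init__(self, m, groups, group_size, seed=0):
        rng = random.Random(seed)
        self.groups = groups
        self.group_size = group_size
        k = groups * group_size
        # Assumption: fully independent +/-1 entries, stored explicitly for clarity.
        self.A = [[rng.choice((-1, 1)) for _ in range(m)] for _ in range(k)]
        self.Ax = [0] * k

    def update(self, i, b):
        # A(x + b e_i) = Ax + b A e_i: add b times column i of A.
        for j, row in enumerate(self.A):
            self.Ax[j] += b * row[i]

    def estimate(self):
        # Median over groups of the average squared entry, then a square root.
        c = self.group_size
        means = [
            sum(v * v for v in self.Ax[g * c:(g + 1) * c]) / c
            for g in range(self.groups)
        ]
        return statistics.median(means) ** 0.5


sk = L2Sketch(m=100, groups=5, group_size=20)
sk.update(3, 4)    # x_3 += 4
sk.update(7, 2)    # x_7 += 2
sk.update(7, -2)   # x_7 -= 2, so x = 4 e_3 again
print(sk.estimate())  # 4.0, exact here because x has a single non-zero coordinate
```

For general x the estimate is only approximate, with accuracy governed by the group size and failure probability by the number of groups.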

Page 23

Applications of Vector Approach
• Histograms/wavelet approximation
• Classic geometric problems (matching, MST, clustering, etc.)

Page 24

Histograms
• View x as a function x: [1…n] → [1…M]
• Approximate it using a piecewise constant function h, with B pieces (buckets)
• Problem can be formulated in 2D as well (buckets become rectangular tiles)
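To make the 1D objective concrete, here is an offline reference computation of the best B-bucket piecewise constant fit using a standard dynamic program over prefix sums. It reads the whole vector and takes O(B n^2) time, so it is not the streaming algorithm from the talk; it only defines the optimum h_OPT that streaming results are compared against. The function name is hypothetical, B ≤ len(x) is assumed, and the returned error is the squared L2 error.

```python
def best_histogram(x, B):
    """Return (squared L2 error, bucket ranges) of the best B-piece constant fit."""
    n = len(x)
    prefix = [0.0] * (n + 1)
    prefix_sq = [0.0] * (n + 1)
    for i, v in enumerate(x):
        prefix[i + 1] = prefix[i] + v
        prefix_sq[i + 1] = prefix_sq[i] + v * v

    def sse(a, b):
        """Squared error of approximating x[a:b] by its mean."""
        s = prefix[b] - prefix[a]
        return (prefix_sq[b] - prefix_sq[a]) - s * s / (b - a)

    inf = float("inf")
    # cost[j][i]: best error for x[:i] using exactly j buckets
    cost = [[inf] * (n + 1) for _ in range(B + 1)]
    cut = [[0] * (n + 1) for _ in range(B + 1)]
    cost[0][0] = 0.0
    for j in range(1, B + 1):
        for i in range(j, n + 1):
            for a in range(j - 1, i):
                cand = cost[j - 1][a] + sse(a, i)
                if cand < cost[j][i]:
                    cost[j][i], cut[j][i] = cand, a

    # Recover the bucket ranges by walking the cuts backwards.
    bounds, i = [], n
    for j in range(B, 0, -1):
        a = cut[j][i]
        bounds.append((a, i))
        i = a
    return cost[B][n], bounds[::-1]


err, buckets = best_histogram([1, 1, 1, 5, 5, 9, 9, 9], B=3)
print(err, buckets)  # 0.0 [(0, 3), (3, 5), (5, 8)]
```

Each bucket is a half-open index range [a, b), and the value of h on a bucket is the mean of x over that range.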

Page 25

Results: 1D
• [Gilbert-Guha-Indyk-Kotidis-Muthukrishnan-Strauss’02]:
  • Maintains h with B pieces such that ||x-h||_2 ≤ (1+ε)||x-h_OPT||_2
  • Under increments/decrements of x
  • Space: poly(B, 1/ε, log n)
  • Time: poly(B, 1/ε, log n)

Page 26

Results: 2D
• [Thaper-Guha-Indyk-Koudas’02]:
  • Maintains h with B log(nM) tiles such that ||x-h||_2 ≤ (1+ε)||x-h_OPT||_2
  • Under increments/decrements of x
  • Space/Update time: poly(B, 1/ε, log n)
  • Histogram reconstruction time: poly(B, 1/ε, n)
• [Muthukrishnan-Strauss’03]:
  • Maintains h with 4B tiles
  • Time: poly(B, 1/ε, log(nM))

Page 27

Minimum Weight Bi-chromatic Matching

• Estimate the cost of MWBM

Page 28

Minimum Weight Matching

• Estimate the cost of MWM

Page 29

Minimum Spanning Tree

• Estimate the cost of MST

Page 30

Facility Location

• Goal: choose a set F of facilities to minimize the sum of the distances to the nearest facility plus the number of facilities times f
• Again, report the cost
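The facility location objective stated above can be evaluated for a given F as follows. This is an offline illustration only, assuming Euclidean distance in the plane; the function name is hypothetical.

```python
import math


def facility_location_cost(points, facilities, f):
    """Sum of distances to the nearest facility, plus f per open facility."""
    connection = sum(min(math.dist(p, q) for q in facilities) for p in points)
    return connection + f * len(facilities)


points = [(0, 0), (0, 2), (10, 0), (10, 4)]
print(facility_location_cost(points, [(0, 1)], f=3))           # one facility
print(facility_location_cost(points, [(0, 1), (10, 2)], f=3))  # 6 + 2*3 = 12.0
```

Opening the second facility pays an extra f but removes the two long connections, which is the trade-off the objective captures.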

Page 31

Approach [Indyk’04]
• Assume P ⊆ {1…Δ}^2
• Reduce to vector problems:
  • Impose square grids G_0 … G_k, with side lengths 2^0, 2^1, …, 2^k, shifted at random.
  • For each square cell c in G_i, let n_P(c) be the number of points from P in c.
  • The algorithms will maintain certain statistics over n_P(.), which will allow them to approximately solve the problems

[Figure: grid cells labeled with their point counts n_P(c)]
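The reduction to vector problems can be seen by writing the grid counts as counters that are incremented on insertion and decremented on deletion. The sketch below keeps the counts n_P(c) exactly, which uses linear space; the point of the approach is that only certain statistics of these count vectors need to be maintained, in small space. The class name is hypothetical, points are assumed to have non-negative integer coordinates, and a single random shift shared by all levels is an assumption made so that the grids nest.

```python
import random
from collections import Counter


class ShiftedGrids:
    """Cell counts n_P(c) for grids G_0..G_k with side lengths 2^0..2^k."""

    def __init__(self, k, seed=0):
        rng = random.Random(seed)
        self.k = k
        # Assumption: one random shift shared by all levels, so the grids nest.
        self.shift = (rng.randrange(2 ** k), rng.randrange(2 ** k))
        self.counts = [Counter() for _ in range(k + 1)]

    def cell(self, p, i):
        """Index of the cell of G_i containing point p."""
        return ((p[0] + self.shift[0]) >> i, (p[1] + self.shift[1]) >> i)

    def update(self, p, delta):
        # An insertion or deletion changes one coordinate of each count vector.
        for i in range(self.k + 1):
            c = self.cell(p, i)
            self.counts[i][c] += delta
            if self.counts[i][c] == 0:
                del self.counts[i][c]

    def insert(self, p):
        self.update(p, +1)

    def delete(self, p):
        self.update(p, -1)


grids = ShiftedGrids(k=4)
for p in [(1, 1), (2, 1), (9, 7)]:
    grids.insert(p)
grids.delete((2, 1))
print([sum(level.values()) for level in grids.counts])  # [2, 2, 2, 2, 2]
```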

Page 32

Estimators
• MST: Σ_i 2^i Σ_{c ∈ G_i} [n_P(c) > 0]
• MWM: Σ_i 2^i Σ_{c ∈ G_i} [n_P(c) is odd]
• MWBM: Σ_i 2^i Σ_{c ∈ G_i} |n_G(c) - n_B(c)|
• Fac. Loc.: Σ_i 2^i Σ_{c ∈ G_i} min[n_P(c), T_i]
• K-median: Σ_i 2^i Σ_{c ∈ G_i - B(Q, 2^i)} n_P(c) (given medians Q)

• Maintain #non-zero entries in n_P [FM’85]
• Maintain L1 difference [I’00]
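The first three estimators above can be evaluated directly from exact cell counts. The sketch below does so offline, as an illustration of the formulas only: a streaming version would replace the exact counters with small-space sketches for the number of non-zero entries and for the L1 difference. Function names and the fixed `shift` argument are assumptions, and points are assumed to have non-negative integer coordinates.

```python
from collections import Counter


def cell_counts(points, i, shift=(0, 0)):
    """n_P(c) for the grid of side 2^i, as a Counter keyed by cell index."""
    return Counter(((x + shift[0]) >> i, (y + shift[1]) >> i) for x, y in points)


def mst_estimator(points, k, shift=(0, 0)):
    """Sum over levels of 2^i times the number of non-empty cells."""
    return sum(
        2 ** i * sum(1 for n in cell_counts(points, i, shift).values() if n > 0)
        for i in range(k + 1)
    )


def mwm_estimator(points, k, shift=(0, 0)):
    """Sum over levels of 2^i times the number of cells with an odd count."""
    return sum(
        2 ** i * sum(1 for n in cell_counts(points, i, shift).values() if n % 2 == 1)
        for i in range(k + 1)
    )


def mwbm_estimator(green, blue, k, shift=(0, 0)):
    """Sum over levels of 2^i times the L1 difference of the two count vectors."""
    total = 0
    for i in range(k + 1):
        g, b = cell_counts(green, i, shift), cell_counts(blue, i, shift)
        total += 2 ** i * sum(abs(g[c] - b[c]) for c in set(g) | set(b))
    return total


pts = [(0, 0), (1, 0), (4, 4), (5, 5)]
print(mst_estimator(pts, k=3))   # 4*1 + 2*2 + 2*4 + 1*8 = 24
print(mwm_estimator(pts, k=3))   # only level 0 has odd cells: 4
print(mwbm_estimator([(0, 0), (4, 4)], [(1, 0), (5, 5)], k=3))  # 4
```

The values are meaningful only up to the approximation factors of the method, not as exact costs.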

Page 33

Results

Problem      Approx.
MST          log Δ → 1+ε [Frahling-Indyk-Sohler’05]
MWM          log Δ
MWBM         log Δ
Fac. Loc.    log^2 Δ
K-median     XYZ → 1+ε [Frahling-Sohler’05]

Space: (log Δ + log n + K)^{O(1)}

[…, Charikar’02, …]

Page 34

XYZ

Space: (K + log Δ + log n)^{O(1)}

Computation Time               Approximation
O(k) poly(log n + 1/ε)         1+ε
2 poly(log n + log Δ + k)      O(1)
poly(log n + log Δ + k)        [1+ε, log n log Δ]

Page 35

Probabilistic embeddings into HST’s

[Figure: grid cells with point counts and the corresponding tree T]

Known [Bartal’96, Charikar-Chekuri-Goel-Guha-Plotkin’98]:
• ||p-q|| ≤ D_tree(p,q)
• E[D_tree(p,q)] ≤ ||p-q|| * O(log Δ)

Page 36

MST
• E[Cost(MST in T)] ≤ O(log Δ) Cost(MST)
• Cost(MST in T) ≈ Cost(T)
• How to compute Cost(T)?
  • Sum over all levels i, of the #nodes at i, times 2^i
  • Node c exists iff n_i(c) > 0

[Figure: grid cells labeled with their point counts]

Page 37

Matching
• Algorithm:
  • Match what you can at the current level
  • Odd leftovers wait for the next level
  • Repeat
• Optimal on the HST
• Cost = Σ_i 2^i Σ_{c ∈ G_i} [n_P(c) is odd]

[Figure: grid cells labeled with 0/1 values]
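The level-by-level matching procedure above can be sketched as follows. This is an offline illustration on nested grids with a fixed shift, assuming non-negative integer coordinates; the function name is hypothetical. Each odd leftover in a cell of side 2^i is charged 2^i, so the returned cost equals the sum Σ_i 2^i Σ_{c ∈ G_i} [n_P(c) is odd] from the slide.

```python
def hst_matching(points, k, shift=(0, 0)):
    """Greedy bottom-up matching on nested grids; returns (pairs, cost)."""
    unmatched = list(points)
    pairs, cost = [], 0
    for i in range(k + 1):
        # Group the still-unmatched points by their cell in G_i.
        cells = {}
        for p in unmatched:
            c = ((p[0] + shift[0]) >> i, (p[1] + shift[1]) >> i)
            cells.setdefault(c, []).append(p)
        unmatched = []
        for members in cells.values():
            # Match what you can inside the cell.
            while len(members) >= 2:
                pairs.append((members.pop(), members.pop()))
            # An odd leftover waits for the next level and is charged 2^i.
            if members:
                unmatched.append(members[0])
                cost += 2 ** i
    return pairs, cost


pairs, cost = hst_matching([(0, 0), (1, 0), (4, 4), (5, 5)], k=3)
print(pairs, cost)
```

On this input every point is alone at level 0 (cost 4 in total) and all points are paired at level 1, so the cost is 4. If the number of points is odd, one point stays unmatched and is charged at every level.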

Page 38

Conclusions
• Algorithms for geometric data streams
  • Insertions-only: merge and reduce, coresets
  • Insertions and deletions: randomized linear embeddings

Page 39

Open Problems
• Matchings, Facility Location, etc.:
  • Replace log Δ by O(1) or even 1+ε
  • Possible for:
    • MST [Frahling-Indyk-Sohler’05]
    • k-median [Frahling-Sohler’05]
  • Related to computing bi-chromatic matching [Agarwal-Varadarajan’04]
• Min-sum clustering?

Page 40

Open Problems
• High dimensions:
  • Diameter:
    • 2^{1/2}-approx, O(d^2 n^{1/2}) space, follows from [Goel-Indyk-Varadarajan’01]
    • c-approx, O(d n^{1/(c^2-1)}) [Indyk’03]
    • Conjecture: 2^{1/2}-approx, O(d polylog n) space
  • Min-width cylinder: 18-approx, O(d) space [Chan’04]

Page 41

Open Problems
• Range queries:
  • General lower bounds? (Not just for ε-approximations)
    • Ω(1/ε^2)-bit bound for general queries follows from the LB for dot product [Indyk-Woodruff’03] and is tight (for randomized algorithms)
  • What about, e.g., half-space queries? O(1/ε^{4/3}) is known [STZ’04]
  • Other problems [STZ’04]