Navigating Nets: Simple algorithms for proximity search Robert Krauthgamer (IBM Almaden) Joint work with James R. Lee (UC Berkeley)

Dec 18, 2015

A classical problem

Fix a metric space (X,d):

X = set of points.

d = distance function over X.

Near-neighbor search (NNS) [Minsky-Papert]:

1. Preprocess a given n-point subset S ⊆ X.

2. Given a query point q ∈ X, quickly compute the closest point to q among S.
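As a baseline for the problem stated above, a linear scan answers a query with n calls to the distance oracle and no preprocessing. The sketch below is illustrative only; the names `nearest_neighbor`, `S` and `d` are assumptions and do not come from the talk.

```python
def nearest_neighbor(q, S, d):
    """Return (closest point of S to q, its distance) by scanning all of S.

    `d` is a distance oracle d(x, y); nothing else is assumed about the metric.
    """
    best, best_dist = None, float("inf")
    for s in S:
        dist = d(q, s)          # one oracle call per data point
        if dist < best_dist:
            best, best_dist = s, dist
    return best, best_dist
```

Every structure discussed later aims to beat this linear number of oracle calls per query.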

Variations on NNS

(1+ε)-approximate nearest neighbor search: Find a ∈ S such that d(q,a) ≤ (1+ε) · d(q,S).

Dynamic case: Allow updates to S (insertions and deletions).

Distributed case: No central index (e.g., nodes in a network). Other cost measures (e.g., communication, stretch, load).

General metrics

Only oracle access to distance function d(·,·).

Models a complicated metric or on-demand measurement.

No “hashing of coordinates” or tuning for a specific metric.

Goal: efficient query (sublinear or polylog time).

Impossible, even if the data set S is a path metric:

[Figure: a path metric; the surviving labels are 1, 2, n and n-1, n, n.]

What about approximate NNS?

Approximate NNS

Hard even for (near) uniform metrics: d(x,y) = 1 for all distinct x,y ∈ S.

[Figure: a uniform metric, with every distance labeled 1.]

But many data sets lack large uniform subsets.

Can we quantify this?

Abstract dimension

The doubling constant λ_X of a metric (X,d) is the minimum λ such that every ball can be covered by λ balls of half the radius.

The metric is doubling if λ_X = O(1).

The (abstract) dimension is dim_A(X) = log_2 λ_X.

Immediate properties:

dim_A(R^d, ||·||_2) = O(d).

dim_A(X') ≤ dim_A(X) for all X' ⊆ X.

dim_A(X) ≤ log |X|. (Equality for a uniform metric.)
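For a small finite metric, the definition above can be probed numerically. The sketch below is an assumption-laden illustration, not something from the talk: it uses closed balls, draws covering centers only from the ball being covered, and covers greedily, so the value it returns is an upper bound on the doubling constant rather than the exact minimum. The name `greedy_doubling_bound` is hypothetical.

```python
def greedy_doubling_bound(points, d):
    """Upper bound on the doubling constant of a finite metric (points, d).

    For every center x and every radius r that occurs as a pairwise distance,
    the closed ball B(x, r) is covered greedily by balls of radius r/2.
    Greedy covers need not be minimum covers, hence only an upper bound.
    """
    radii = sorted({d(x, y) for x in points for y in points if d(x, y) > 0})
    worst = 1
    for x in points:
        for r in radii:
            uncovered = [y for y in points if d(x, y) <= r]
            centers = 0
            while uncovered:
                c = uncovered[0]                      # new half-radius ball
                centers += 1
                uncovered = [y for y in uncovered if d(c, y) > r / 2]
            worst = max(worst, centers)
    return worst
```

On a uniform metric with 8 points the function returns 8, in line with the remark that dim_A(X) = log |X| for uniform metrics.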

Illustration

Grid with missing piece

Low-dimensional manifold (bounded curvature)

Union of curves in Euclidean space

Embedding doubling metrics

Theorem [Assouad, 1983] [Gupta, K., Lee, 2003]: Fix 0 < ε < 1, and let (X,d) be a doubling metric. Then (X,d^ε) can be embedded with O(1) distortion into ℓ_2^{O(1)}.

Not true for ε = 1 [Semmes, 1996].

Motivation: Embed S and then apply Euclidean NNS.

Our results

Simple data structure for maintaining S:

(1+ε)-NNS query time: (1/ε)^{O(dim(S))} · log Δ (for ε < ½), where Δ = d_max/d_min is the normalized diameter of S (typically Δ = n^{O(1)}).

Space: n · 2^{O(dim(S))}.

Dynamic maintenance of S:

Insertion / deletion time: 2^{O(dim(S))} · log Δ · loglog Δ.

Additional properties:

Best possible dependency on dim(S) (in a certain model).

Oblivious to dim(S) and robust against “bad localities”.

Matches/improves known (more specialized) results.

Nets

Definition: An r-net of X is a subset Y with

1. d(y1,y2) ≥ r for all distinct y1,y2 ∈ Y.

2. d(x,Y) < r for all x ∈ X \ Y.

(I.e., a maximal r-separated subset.)

Note: Compare vs. ε-net.

Running example – a path metric:

[Figure: the path metric with an 8-net, a 4-net and a 16-net marked.]
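Since an r-net is just a maximal r-separated subset, one pass over the points is enough to build one. The following is a minimal sketch under that reading of the definition; the function name `greedy_r_net` and the choice of scan order are assumptions, and different orders give different (equally valid) nets.

```python
def greedy_r_net(points, d, r):
    """Return a maximal r-separated subset of `points`.

    A point is kept only if it is at distance >= r from every point kept so
    far (separation). Every rejected point therefore has a kept point at
    distance < r (covering), so the result is an r-net.
    """
    net = []
    for x in points:
        if all(d(x, y) >= r for y in net):
            net.append(x)
    return net
```

On the path 0, 1, ..., 15 with d(a,b) = |a-b| and this scan order, r = 4 gives [0, 4, 8, 12], r = 8 gives [0, 8], and r = 16 gives [0].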

More nets

Definition: An r-net of X is a subset Y with

1. d(y1,y2) ≥ r for all distinct y1,y2 ∈ Y.

2. d(x,Y) < r for all x ∈ X \ Y.

(I.e., a maximal r-separated subset.)

Note: Compare vs. ε-net.

[Figure: illustration of the nets Y_r; the remaining labels did not survive extraction.]

The data structure

For every r = 2^i, let Y_r be an r-net of S.

Only O(log Δ) values of r are non-trivial.

[Figure: the running example with a 16-net, an 8-net and a 4-net.]

For every y ∈ Y_r maintain a navigation list

L_{y,r} = {z ∈ Y_{r/2} : d(y,z) ≤ 2r}
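The structure described above can be sketched for a static point set as follows. This is an illustration under stated assumptions, not the authors' implementation: levels are keyed by the exponent i (so r = 2^i), the nets are built top-down so that each finer net contains the coarser one (the slides only require each Y_r to be an r-net of S), and the names `extend_to_net` and `build_navigating_nets` are hypothetical.

```python
def extend_to_net(seed, points, d, r):
    """Greedily extend an r-separated `seed` to a maximal r-separated subset."""
    net = list(seed)
    for x in points:
        if all(d(x, y) >= r for y in net):
            net.append(x)
    return net


def build_navigating_nets(points, d, i_top, i_bottom):
    """Return (nets, lists) for the scales r = 2^i, i_bottom <= i <= i_top.

    nets[i]       is an r-net of `points` with r = 2^i.
    lists[(y, i)] is the navigation list {z in nets[i-1] : d(y, z) <= 2r}.
    """
    nets, coarser = {}, []
    for i in range(i_top, i_bottom - 1, -1):
        # A 2r-separated set is also r-separated, so it is a valid seed.
        nets[i] = extend_to_net(coarser, points, d, 2.0 ** i)
        coarser = nets[i]
    lists = {}
    for i in range(i_top, i_bottom, -1):
        r = 2.0 ** i
        for y in nets[i]:
            lists[(y, i)] = [z for z in nets[i - 1] if d(y, z) <= 2 * r]
    return nets, lists
```

On the path 0, 1, ..., 15 with i_top = 4 and i_bottom = 0, the 4-net is {0, 4, 8, 12} and the navigation list of the point 0 at scale r = 4 is {0, 2, 4, 6, 8}.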

More on the data structure

[Figure: the nets Y_r and Y_{r/2}, with a distance labeled 3r.]

For every r = 2^i, let Y_r be an r-net of S.

Only O(log Δ) values of r are non-trivial.

For every y ∈ Y_r maintain a navigation list

L_{y,r} = {z ∈ Y_{r/2} : d(y,z) ≤ 2r}

Space requirement

Lemma: |L_{y,r}| ≤ 2^{O(dim(S))} for all y ∈ Y_r, r ≥ 0.

Proof:

L_{y,r} is contained in a ball of radius 2r.

This ball can be covered by λ_S^3 balls of radius r/4.

Every point in L_{y,r} ⊆ Y_{r/2} must be covered by a distinct ball.

Hence, |L_{y,r}| ≤ λ_S^3 = 2^{3 dim(S)}.

Corollary: Total space is 2^{O(dim(S))} · n · log Δ.

We actually improve it to 2^{O(dim(S))} · n.

Back to running example

[Figure: the running example with a 16-net, an 8-net and a 4-net.]

Navigating nets

Let q denote the query point.

Initially z_16 = only point in Y_16.

Find z_8 = closest Y_8 point to q.

Find z_4 = closest Y_4 point to q, etc.

[Figure: the running example with the query point marked at each step.]

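How to find z_{r/2}?

Assume each z_r ∈ Y_r is the closest point to a (instead of to q), where a ∈ S.

Then d(z_r, z_{r/2}) ≤ r + r/2 = 3r/2.

And z_{r/2} must be in z_r's list L_{y,r} (with y = z_r).

[Figure: the points q, z_r, a and z_{r/2}, with distance labels ≤ r, ≤ r/2 and ≤ r/4.]

If instead z_r is the closest Y_r point to q, it suffices that d(q,a) ≤ r/4.

And then z_r's list L_{y,r} contains z_{r/2}: d(q,z_r) ≤ r/4 + r = 5r/4 and d(q,z_{r/2}) ≤ r/4 + r/2 = 3r/4, so d(z_r,z_{r/2}) ≤ 2r.

Note: d(q,z_r) ≤ 3r/2.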

Stopping point

If we find a point z_r with d(q,z_r) ≤ 3r/2,

but not a point z_{r/2} with d(q,z_{r/2}) ≤ 3r/4,

we know that d(q,S) > r/4,

yielding 6-NNS with query time 2^{O(dim(S))} · log Δ.

This can be extended to (1+ε)-NNS.

Similar principles yield insertions and deletions.
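The descent and the stopping rule can be put together in a short sketch. It is a hedged illustration rather than the authors' code: the nets are built greedily and nested (an assumption), levels are keyed by the exponent i with r = 2^i, the top net is assumed to be a single point within 3r/2 of the query, and the names `build` and `query` are hypothetical. When the descent stops at scale r, the point returned is within 3r/2 of q while d(q,S) > r/4, which is where the factor (3r/2)/(r/4) = 6 comes from.

```python
def extend_to_net(seed, points, d, r):
    """Greedily extend an r-separated `seed` to a maximal r-separated subset."""
    net = list(seed)
    for x in points:
        if all(d(x, y) >= r for y in net):
            net.append(x)
    return net


def build(points, d, i_top, i_bottom):
    """Nested nets for r = 2^i and lists L_{y,r} = {z in Y_{r/2} : d(y,z) <= 2r}."""
    nets, prev = {}, []
    for i in range(i_top, i_bottom - 1, -1):
        prev = nets[i] = extend_to_net(prev, points, d, 2.0 ** i)
    lists = {
        (y, i): [z for z in nets[i - 1] if d(y, z) <= 2 * 2.0 ** i]
        for i in range(i_top, i_bottom, -1)
        for y in nets[i]
    }
    return nets, lists


def query(q, nets, lists, d, i_top, i_bottom):
    """Descend one scale at a time; stop when no list point is within 3r/4 of q."""
    z = nets[i_top][0]                      # assumes the top net is one point
    for i in range(i_top, i_bottom, -1):
        r = 2.0 ** i
        nxt = min(lists[(z, i)], key=lambda c: d(q, c))   # closest list point
        if d(q, nxt) > 3 * r / 4:
            # Stopping rule: report the closer of the two candidates seen.
            return min((z, nxt), key=lambda c: d(q, c))
        z = nxt
    return z
```

On the path 0, 1, ..., 15 with scales 16 down to 1, the query q = 5.4 descends through 8, 4, 6 and ends at 5, its true nearest neighbor. Each step scans a single navigation list, which is what gives the 2^{O(dim(S))} cost per scale.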

Near-optimality

The basic idea:

Consider a uniform metric on λ points.

Let the query point be at distance 1 from all of them, except for one point whose distance is 1-ε.

Finding this point requires (in an oracle model) computing all distances to q.

Can happen at every distance scale r.

We get a lower bound of 2^{Ω(dim(S))} log Δ.

Related work – general metrics

Let K_X be the smallest K such that

|B(x,r)| ≤ K · |B(x,r/2)| for all x ∈ X, r ≥ 0.

Define the KR-dimension as dim_KR(X) = log_2 K_X.
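For a finite point set the constant K_X can be computed directly from the definition. The sketch below assumes closed balls and checks only radii that occur as pairwise distances, which suffices because |B(x,r)| changes only at those values while |B(x,r/2)| never decreases in between. The name `growth_constant` is hypothetical.

```python
def growth_constant(points, d):
    """Smallest K with |B(x, r)| <= K * |B(x, r/2)| for all centers x, radii r.

    Balls are closed and counted inside `points`; |B(x, r/2)| >= 1 because
    the center itself always belongs to the ball.
    """
    radii = sorted({d(x, y) for x in points for y in points if d(x, y) > 0})
    K = 1.0
    for x in points:
        for r in radii:
            big = sum(1 for y in points if d(x, y) <= r)
            small = sum(1 for y in points if d(x, y) <= r / 2)
            K = max(K, big / small)
    return K
```

Taking log_2 of the returned value gives the KR-dimension of the point set under these assumptions.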

Randomized exact NNS [Karger-Ruhl’02, Hildrum et al.’04]:

Space: n · 2^{O(dim(S))} · log Δ.

Query time: 2^{O(dim(S))} · log Δ.

If dim_KR(S) = O(1) the log Δ term is actually O(log n).

Our results extend to this setting:

1. KR-metrics are doubling: dim(X) ≤ 4 dim_KR(X).

2. Our algorithms actually give exact NNS.

Assumptions on query distribution [Clarkson’99].

Related work – Euclidean metrics

Exact NNS for R^d:

O(d^5 log n) query time and O(n^{d+δ}) space. [Meiser’93]

ε-NNS for R^d:

O((d/ε)^d log n) query time and O(dn) space by quad-tree like decompositions [AMNSW’94].

Our algorithm achieves similar bounds.

O(d polylog(dn)) query time and (dn)^{O(1)} space is useful for higher dimensions [IM’98, KOR’98].

Concluding remarks

Our approach: A “decision tree” that is not really a tree (saves space).

In progress:

A different (static) scheme where log Δ is replaced by log n.

Bounds on the help of “ambient” space points.

Our data structure yields a spanner of the metric.

Immediate: O(1) stretch with average degree 2^{dim(S)}.

More work: O(1) stretch with maximum degree 2^{dim(S)}.

[Guibas,’04] applied the nets data structure for moving points in the plane.
