Searching in High-Dimensional Spaces: Index Structures for Improving the Performance of Multimedia Databases. Christian Böhm, Stefan Berchtold, Daniel A. Keim. ACM Computing Surveys, 2001

Jan 10, 2016


Ilana

Page 1: Searching in High-Dimensional Spaces

Searching in High-Dimensional Spaces: Index Structures for Improving the Performance of Multimedia Databases

Christian Böhm, Stefan Berchtold, Daniel A. Keim
ACM Computing Surveys, 2001

Page 2: Introduction

• Multimedia databases have become increasingly important in many application areas
  • Content-based retrieval of similar objects: similarity search
• Feature transformation
  • Multimedia object → high-dimensional point (feature vector)
• Search for points in the feature space that are close to a given query point

Traditional databases: point, range, and partial-match queries
Multimedia databases: similarity search

Page 3: Similarity Queries

Basic idea of feature-based similarity search

Figure: complex data objects → feature transformation → high-dim. feature vectors → insert into a high-dim. index, which answers an ε-search or NN-search. Panels: range query, nearest-neighbor query.
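For reference, both query types can be answered without any index by scanning every feature vector. The Python sketch below assumes Euclidean distance; the function names `range_query` and `knn_query` are illustrative, not taken from the survey. A high-dimensional index aims to return the same answers while reading fewer data pages.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]

def range_query(data: List[Vector], q: Vector, eps: float) -> List[int]:
    # ε-search: indices of all feature vectors within distance eps of q
    return [i for i, v in enumerate(data) if math.dist(v, q) <= eps]

def knn_query(data: List[Vector], q: Vector, k: int) -> List[Tuple[float, int]]:
    # k-NN search: (distance, index) of the k vectors closest to q, nearest first
    return heapq.nsmallest(k, ((math.dist(v, q), i) for i, v in enumerate(data)))

if __name__ == "__main__":
    features = [(0.1, 0.2), (0.4, 0.4), (0.9, 0.8), (0.35, 0.5)]
    query = (0.4, 0.45)
    print(range_query(features, query, 0.1))
    print(knn_query(features, query, 2))
```

Here the ε-range query with ε = 0.1 returns the second and fourth vectors, which are also the two nearest neighbors of the query point.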

Page 4: Effects in High-Dimensional Space: Curse of Dimensionality

• Can you imagine a 5- or 10-dimensional space?
• “Every d-dimensional sphere touching (or intersecting) the (d-1)-dimensional boundaries of the data space contains c” (c: the center point of the data space)
• What happens if d = 16?
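The question can be checked numerically. This sketch assumes the data space is the unit hypercube [0, 1]^d with center c = (0.5, …, 0.5), and uses a hypothetical sphere centered at (0.3, …, 0.3) with radius 0.7. That sphere touches or intersects all 2d boundary hyperplanes in every dimension, but its distance to c is 0.2·√d, so for d = 16 the distance is 0.8 > 0.7 and c lies outside the sphere.

```python
import math

EPS = 1e-12  # tolerance for floating-point comparisons

def touches_all_boundaries(center, radius):
    # Data space assumed to be [0, 1]^d; its (d-1)-dimensional boundaries are
    # the hyperplanes x_i = 0 and x_i = 1. The sphere touches or intersects a
    # hyperplane if the center's distance to it is at most the radius.
    return all(x <= radius + EPS and 1.0 - x <= radius + EPS for x in center)

def contains_space_center(center, radius):
    # Assumption: c = (0.5, ..., 0.5), the center of the data space
    dist = math.sqrt(sum((x - 0.5) ** 2 for x in center))
    return dist <= radius + EPS

if __name__ == "__main__":
    r = 0.7  # hypothetical radius
    for d in (2, 3, 12, 13, 16):
        p = [0.3] * d  # hypothetical sphere center
        print(d, touches_all_boundaries(p, r), contains_space_center(p, r))
```

For this particular sphere the statement holds up to d = 12 and fails from d = 13 on.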

Page 5: Effects in High-Dimensional Space: Issues

• Exponential growth of volume
• Space partitioning
  • The majority of the data pages are located at the surface of the data space rather than in the interior
  • Coarse partitioning

Figure: a hypercube containing 25% of the data space volume has edge length 0.5 in 2D but $\sqrt[16]{0.25} \approx 0.917$ in 16D.
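Both effects can be quantified with two short formulas, assuming a uniform unit data space: a hypercube holding a fraction V of the volume has edge length V^(1/d), and the share of the volume within distance m of the boundary is 1 - (1 - 2m)^d. The margin m = 0.1 used below is an arbitrary illustrative value, and the function names are hypothetical.

```python
def edge_length(volume_fraction: float, d: int) -> float:
    # Edge length a of a hypercube with volume a**d == volume_fraction
    return volume_fraction ** (1.0 / d)

def near_surface_fraction(margin: float, d: int) -> float:
    # Share of [0, 1]^d lying within `margin` of at least one boundary
    # hyperplane; the remaining interior core has edge 1 - 2 * margin.
    return 1.0 - (1.0 - 2.0 * margin) ** d

if __name__ == "__main__":
    for d in (2, 16):
        print(d, round(edge_length(0.25, d), 3),
              round(near_surface_fraction(0.1, d), 3))
```

For V = 0.25 the edge length grows from 0.5 in 2D to about 0.917 in 16D, and with m = 0.1 the near-surface share grows from 36% to about 97%.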

Page 6: Common Principles: Structure & Regions

• Hierarchical clustering
  • Spatially adjacent vectors are likely to reside in the same node

Page 7: Basic Algorithms

• Index construction
  • Insert, delete, and update
• Query processing
  • Exact match query
  • Range query
  • Nearest-neighbor query
  • Ranking query (generalized k-nearest-neighbor query)
  • Reverse nearest-neighbor query
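The following sketch illustrates hierarchical page regions and a range query over them. It is a minimal, hypothetical structure (the `Node` class and helper names are assumptions, not an API from the survey): each node stores a minimum bounding rectangle, and a branch is pruned whenever the distance from the query point to its rectangle exceeds ε. An exact match query is the special case ε = 0.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, ...]

@dataclass
class Node:
    lo: Point  # lower corner of the page region (MBR)
    hi: Point  # upper corner of the page region (MBR)
    children: List["Node"] = field(default_factory=list)  # directory node
    points: List[Point] = field(default_factory=list)     # data node

def make_leaf(points: List[Point]) -> Node:
    # The MBR of a data page is the per-dimension min/max of its points
    lo = tuple(min(c) for c in zip(*points))
    hi = tuple(max(c) for c in zip(*points))
    return Node(lo, hi, points=points)

def make_dir(children: List[Node]) -> Node:
    # A directory region encloses the regions of all its children
    lo = tuple(min(c) for c in zip(*(n.lo for n in children)))
    hi = tuple(max(c) for c in zip(*(n.hi for n in children)))
    return Node(lo, hi, children=children)

def mindist(q: Point, lo: Point, hi: Point) -> float:
    # Distance from q to the nearest point of the rectangle [lo, hi]
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, lo, hi)))

def range_query(node: Node, q: Point, eps: float) -> List[Point]:
    if mindist(q, node.lo, node.hi) > eps:
        return []  # region lies entirely outside the query sphere: prune
    if node.children:
        return [p for c in node.children for p in range_query(c, q, eps)]
    return [p for p in node.points if math.dist(p, q) <= eps]

if __name__ == "__main__":
    root = make_dir([make_leaf([(0.1, 0.1), (0.2, 0.3)]),
                     make_leaf([(0.8, 0.9), (0.7, 0.6)])])
    print(range_query(root, (0.15, 0.2), 0.15))
```

In this example the second data page is never searched, because its rectangle lies farther than ε from the query point.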

Page 8: Nearest-Neighbor Query

• There is no fixed criterion, known a priori, for excluding branches of the indexing structure
• The criterion is the nearest-neighbor distance, but it is not known until the algorithm has terminated
  • Pessimistic estimation: the closest point among all points visited so far (closest point candidate)
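A depth-first variant of this idea can be sketched as follows, assuming nodes with bounding rectangles as in an R-tree. The current closest point candidate serves as the pessimistic estimate: any subtree whose minimum distance to the query point is not smaller than the candidate's distance is skipped. The `Node` class and function names are hypothetical, and visiting children in MINDIST order is one possible ordering, not necessarily the one used in the survey.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, ...]
Candidate = Optional[Tuple[float, Point]]  # (distance, point) or None

@dataclass
class Node:
    lo: Point
    hi: Point
    children: List["Node"] = field(default_factory=list)
    points: List[Point] = field(default_factory=list)

def mindist(q: Point, lo: Point, hi: Point) -> float:
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, lo, hi)))

def nn_depth_first(node: Node, q: Point, best: Candidate = None) -> Candidate:
    # `best` is the closest point candidate; its distance is the pessimistic
    # estimate of the nearest-neighbor distance used for pruning.
    if best is not None and mindist(q, node.lo, node.hi) >= best[0]:
        return best  # no point in this branch can beat the candidate
    if node.children:
        # Hypothetical ordering choice: descend into the closest region first
        for child in sorted(node.children, key=lambda c: mindist(q, c.lo, c.hi)):
            best = nn_depth_first(child, q, best)
        return best
    for p in node.points:
        d = math.dist(p, q)
        if best is None or d < best[0]:
            best = (d, p)
    return best
```

Because the candidate only improves as the search proceeds, later subtrees are pruned more aggressively than earlier ones.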

Page 9: Nearest-Neighbor Query: RKV Algorithm

• MINDIST: the actual distance between the query point and a page region (a lower bound on the distance to any point in it)
• MINMAXDIST: an upper-bound estimate of the nearest-neighbor distance
• 'Depth-first' and 'branch and bound' traversal

Figure: MINDIST and MINMAXDIST between a query point and page regions.
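The two metrics can be written down for a rectangle with lower corner s and upper corner t. This sketch follows the definitions commonly given for the RKV algorithm (Roussopoulos, Kelley, Vincent): MINDIST is a lower bound on the distance to any point in the rectangle, and for a minimum bounding rectangle, whose every face touches at least one data point, MINMAXDIST is an upper bound on the distance to the nearest point inside it. Function names are illustrative.

```python
import math
from typing import Sequence

def mindist(p: Sequence[float], s: Sequence[float], t: Sequence[float]) -> float:
    # Distance from p to the rectangle [s, t] (0 if p lies inside): a lower
    # bound on the distance from p to any point stored in that region.
    return math.sqrt(sum(max(si - pi, 0.0, pi - ti) ** 2
                         for pi, si, ti in zip(p, s, t)))

def minmaxdist(p: Sequence[float], s: Sequence[float], t: Sequence[float]) -> float:
    # Squared distance to the farther boundary in each dimension
    far = [(pi - (si if pi >= (si + ti) / 2 else ti)) ** 2
           for pi, si, ti in zip(p, s, t)]
    total_far = sum(far)
    best = math.inf
    for k in range(len(p)):
        # Nearer boundary in dimension k: an MBR face that must hold a point
        near_k = s[k] if p[k] <= (s[k] + t[k]) / 2 else t[k]
        # Farthest point of that face: near in dimension k, far elsewhere
        best = min(best, total_far - far[k] + (p[k] - near_k) ** 2)
    return math.sqrt(best)

if __name__ == "__main__":
    print(mindist((0, 0), (1, 1), (2, 3)), minmaxdist((0, 0), (1, 1), (2, 3)))
```

For p = (0, 0) and the rectangle [1, 2] × [1, 3], MINDIST is √2 and MINMAXDIST is √5: the bottom face y = 1 must contain a data point, and no point of that face is farther than (2, 1).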

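Page 10: Nearest-Neighbor Query: HS Algorithm

• Access all pages of the index in the order of increasing distance to the query point
• Active page list (APL)

Figure: contents of the APL, ordered by distance to the query point, as the search proceeds:
  1. p3, p1, p2
  2. p31, p1, p33, p2, p32 (after expanding p3)
  3. p1, p311, p312, p33, p2, p32 (after expanding p31)
  4. p11, p311, p312, p33, p2, p32, p13, p12 (after expanding p1)
  5. p311, p312, p33, p111, p2, p112, p32, p13, p12 (after expanding p11)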
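The HS traversal can be sketched with a priority queue playing the role of the APL. This is a minimal version under assumptions: Euclidean distance, rectangular page regions, and a hypothetical `Node` class whose `name` field exists only to record the visiting order. The search stops as soon as the closest remaining page is no closer than the best point found.

```python
import heapq
import itertools
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, ...]

@dataclass
class Node:
    name: str  # hypothetical label, used only for tracing
    lo: Point
    hi: Point
    children: List["Node"] = field(default_factory=list)
    points: List[Point] = field(default_factory=list)

def mindist(q: Point, lo: Point, hi: Point) -> float:
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, lo, hi)))

def nn_hs(root: Node, q: Point, trace: Optional[List[str]] = None):
    tie = itertools.count()  # tie-breaker so heapq never compares Node objects
    apl = [(mindist(q, root.lo, root.hi), next(tie), root)]  # active page list
    best: Optional[Tuple[float, Point]] = None
    while apl:
        dist, _, node = heapq.heappop(apl)  # page closest to q
        if best is not None and dist >= best[0]:
            break  # every remaining page is at least this far away
        if trace is not None:
            trace.append(node.name)
        for child in node.children:
            heapq.heappush(apl, (mindist(q, child.lo, child.hi), next(tie), child))
        for p in node.points:
            d = math.dist(p, q)
            if best is None or d < best[0]:
                best = (d, p)
    return best
```

Unlike the depth-first traversal, this order never loads a page whose distance exceeds the final nearest-neighbor distance.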

Page 11: Nearest-Neighbor Query: Comparison

• RKV: pr1 → pr12 → pr11 → …
• HS: pr1 → pr2 → pr21

Page 12: Index Structures

• Minimum bounding rectangles: R-tree family, X-tree
• Bounding spheres: SS-tree, TV-tree
• Combined regions: SR-tree
• Others: space-filling curves, Pyramid-tree

Page 13: R-, R*-, and R+-Tree: Overlap Problem

• An overlap-free split requires a dimension in which the projections of the page regions have no overlap at some point
  • Such a point becomes less likely to exist as the dimension of the data space increases
• R+-tree: an overlap-free variant of the R-tree using a forced-split strategy
  • High dimensionality leads to many forced-split operations
  • Storage utilization < 50%

Figure: edge length $a = \sqrt[d]{A}$ of a page region with volume $A$ (the figure also refers to the effective page capacity $C_{\mathrm{eff}}$); for $A = 1/2$: $\sqrt{1/2} \approx 0.7071$ ($d = 2$) and $\sqrt[4]{1/2} \approx 0.8409$ ($d = 4$).
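The condition for an overlap-free split can be tested directly on two page regions. This sketch (hypothetical helper names) returns the dimensions in which the projections of two rectangles are disjoint, and computes the edge length a = A^(1/d) of a hypercube region of volume A. For A = 1/2 this gives about 0.7071 in 2D and 0.8409 in 4D, so in higher dimensions each such region spans most of every axis and a separating dimension becomes unlikely.

```python
from typing import List, Sequence

def separating_dimensions(lo1: Sequence[float], hi1: Sequence[float],
                          lo2: Sequence[float], hi2: Sequence[float]) -> List[int]:
    # Dimensions whose projections of the two page regions are disjoint (or
    # merely touch); a split plane orthogonal to such an axis separates them.
    return [i for i in range(len(lo1)) if hi1[i] <= lo2[i] or hi2[i] <= lo1[i]]

def edge_length(volume: float, d: int) -> float:
    # Edge a of a hypercube page region with volume A: a = A ** (1 / d)
    return volume ** (1.0 / d)

if __name__ == "__main__":
    print(separating_dimensions((0, 0), (0.4, 1), (0.5, 0), (1, 1)))
    print(separating_dimensions((0, 0), (0.6, 0.6), (0.4, 0.4), (1, 1)))
    for d in (2, 4, 16):
        print(d, round(edge_length(0.5, d), 4))
```

An empty result means no axis-parallel split can separate the two regions without overlap.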

Page 14: X-Tree

• Extension of the R*-tree
• Designed for the management of high-dimensional objects
• Overlap-free split (split history)
• Supernodes (unbalanced split tree)
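The split policy can be sketched as a decision function. This is a loose, hypothetical reading of the X-tree split described by its authors (Berchtold, Keim, Kriegel): try the R*-tree (topological) split first; if it produces too much overlap, try the overlap-free split derived from the split history; if that would leave a node underfilled, extend the node to a supernode instead. The overlap measure (intersection volume over union volume), the threshold 0.2, and the candidate format are assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple

# A candidate group: (lower corner, upper corner, number of entries)
Group = Tuple[Sequence[float], Sequence[float], int]

def volume(lo: Sequence[float], hi: Sequence[float]) -> float:
    v = 1.0
    for l, h in zip(lo, hi):
        v *= max(h - l, 0.0)  # an empty extent gives zero volume
    return v

def overlap_ratio(a: Group, b: Group) -> float:
    # Assumed measure: intersection volume divided by union volume
    inter_lo = [max(x, y) for x, y in zip(a[0], b[0])]
    inter_hi = [min(x, y) for x, y in zip(a[1], b[1])]
    inter = volume(inter_lo, inter_hi)
    union = volume(a[0], a[1]) + volume(b[0], b[1]) - inter
    return inter / union if union > 0 else 0.0

def decide_split(topological: Tuple[Group, Group],
                 overlap_free: Optional[Tuple[Group, Group]],
                 min_fanout: int, max_overlap: float = 0.2) -> str:
    # max_overlap = 0.2 is a hypothetical threshold
    if overlap_ratio(*topological) <= max_overlap:
        return "topological split"
    if overlap_free is not None and min(g[2] for g in overlap_free) >= min_fanout:
        return "overlap-free split"
    return "supernode"  # keep all entries in one enlarged node
```

The supernode branch trades a larger node for the guarantee that no heavily overlapping directory entries are created.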

Page 15: kd-Tree

• Advantage
  • Guaranteed absence of overlap
• Disadvantages
  • Complete partitioning: page regions are generally larger than necessary, which yields a higher access probability
  • Unbalanced
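Complete partitioning can be made visible by comparing each leaf's cell with the bounding box of the points it actually holds. The sketch below builds a static kd-tree by median splits along cycling axes (a common construction, assumed here; names are hypothetical). The cells cover the whole data space without overlap, but their volume can far exceed that of the data they contain. Note that this static median build is balanced; the imbalance listed above arises with dynamic insertions.

```python
from typing import List, Tuple

Point = Tuple[float, ...]
Box = Tuple[Tuple[float, ...], Tuple[float, ...]]  # (lower corner, upper corner)

def build_kd_leaves(points: List[Point], cell: Box, depth: int = 0,
                    leaf_size: int = 2) -> List[Tuple[Box, List[Point]]]:
    # Median split along cycling axes (leaf_size must be at least 1). Each
    # split cuts the whole cell, so the leaf cells partition the data space.
    if len(points) <= leaf_size:
        return [(cell, points)]
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    split = pts[mid][axis]
    lo, hi = cell
    left = (lo, hi[:axis] + (split,) + hi[axis + 1:])
    right = (lo[:axis] + (split,) + lo[axis + 1:], hi)
    return (build_kd_leaves(pts[:mid], left, depth + 1, leaf_size)
            + build_kd_leaves(pts[mid:], right, depth + 1, leaf_size))

def volume(box: Box) -> float:
    v = 1.0
    for l, h in zip(*box):
        v *= h - l
    return v

def mbr(points: List[Point]) -> Box:
    # Minimum bounding rectangle of the points actually stored in a leaf
    return tuple(map(min, zip(*points))), tuple(map(max, zip(*points)))

if __name__ == "__main__":
    data = [(0.1, 0.1), (0.15, 0.2), (0.8, 0.85), (0.9, 0.9)]
    for cell, pts in build_kd_leaves(data, ((0.0, 0.0), (1.0, 1.0))):
        print(cell, round(volume(cell), 3), round(volume(mbr(pts)), 3))
```

With these four points the two leaf cells have volumes 0.8 and 0.2, while the points inside each occupy a box of volume 0.005.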

Page 16: kd-Tree

• kd-B-tree
  • Balanced kd-tree
  • Forced split
• hB-tree
  • Splits a node based on multiple attributes
  • Forced split is avoided
• LSDh-tree
  • Coded region description
  • Reduces space requirement

Page 17: SS-Tree

• Spheres as page regions
• Split
  • The split axis is the dimension with the highest variance
• Not amenable to an easy overlap-free split
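The SS-tree's page regions and split-axis rule can be sketched as follows. Each region is a sphere around the centroid of its points, with the distance to the farthest point as radius, and the split axis is the coordinate with the highest variance. Splitting at the median along that axis is a simplification assumed here; the original SS-tree may choose the split position by a different criterion. Function names are illustrative.

```python
import math
from statistics import pvariance
from typing import List, Tuple

Point = Tuple[float, ...]

def bounding_sphere(points: List[Point]) -> Tuple[Point, float]:
    # Page region: centroid plus the distance to the farthest point
    d = len(points[0])
    centroid = tuple(sum(p[i] for p in points) / len(points) for i in range(d))
    radius = max(math.dist(centroid, p) for p in points)
    return centroid, radius

def split_axis(points: List[Point]) -> int:
    # Dimension in which the coordinates have the highest variance
    d = len(points[0])
    return max(range(d), key=lambda i: pvariance([p[i] for p in points]))

def split(points: List[Point]) -> Tuple[List[Point], List[Point]]:
    axis = split_axis(points)
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2  # assumption: median split along the chosen axis
    return pts[:mid], pts[mid:]
```

Because the two resulting spheres are computed independently from their points, they may still intersect, which is the overlap issue noted above.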

Page 18: Space-Filling Curves

• Range and nearest-neighbor queries are based on distance calculations to page regions

Figure: query point q and a curve interval I that is split recursively:
  • I: lb = 47 = 101111, ub = 60 = 111100; longest common prefix p = 1; split point s = <p100…000> = 110000 = 48; I is split into I1 = [47, 47] and I2 = [48, 60]
  • I2: lb = 48 = 110000, ub = 60 = 111100; longest common prefix p = 11; split point s = <p100…000> = 111000 = 56; I2 is split into I21 = [48, 55] and I22 = [56, 60]
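The interval split on this slide can be reproduced in a few lines, assuming curve values are non-negative integers: s is the longest common prefix of lb and ub, followed by a 1 bit and zeros, and the interval is cut into [lb, s - 1] and [s, ub]. Function names are hypothetical. With a bit-interleaving (Z-order) curve, each interval of the form <p 0…0> to <p 1…1> corresponds to one cell of the recursive space partition, which is one way to recover a page region from a curve interval.

```python
def split_point(lb: int, ub: int) -> int:
    # s = <p 1 0...0>: the longest common prefix p of lb and ub, then a 1 bit,
    # then zeros. lb has a 0 and ub has a 1 at the first differing bit.
    if lb >= ub:
        raise ValueError("interval must contain at least two curve values")
    k = (lb ^ ub).bit_length()  # number of bits after the common prefix
    prefix = ub >> k            # the common prefix p
    return (prefix << k) | (1 << (k - 1))

def split_interval(lb: int, ub: int):
    s = split_point(lb, ub)
    return (lb, s - 1), (s, ub)

if __name__ == "__main__":
    print(split_interval(47, 60))  # ((47, 47), (48, 60)): I -> I1, I2
    print(split_interval(48, 60))  # ((48, 55), (56, 60)): I2 -> I21, I22
```

The common prefix is obtained with one XOR and a bit-length count, so no explicit string comparison of the binary representations is needed.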

Page 19: Pyramid-Tree

• Divides the data space such that the resulting partitions are shaped like peels of an onion
• Pyramid mapping
• Optimized for range queries on high-dimensional data
• Not affected by the curse of dimensionality
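The pyramid mapping can be sketched as follows, based on our reading of the Pyramid-Technique (Berchtold, Böhm, Kriegel): the unit data space is divided into 2d pyramids whose apex is the center point, a point is assigned to the pyramid of the dimension where it deviates most from 0.5, and its one-dimensional key is the pyramid number plus its height |0.5 - v_jmax|. Points with equal height lie on the surface of a hypercube around the center, which gives the onion-peel shape. The keys would then be stored in a B+-tree; the function name here is illustrative.

```python
from typing import Sequence

def pyramid_value(v: Sequence[float]) -> float:
    # Map a point of [0, 1]^d to a one-dimensional key (pyramid value)
    d = len(v)
    # Dimension in which the point deviates most from the center 0.5
    j_max = max(range(d), key=lambda j: abs(0.5 - v[j]))
    # Pyramids 0..d-1 lie on the low side, d..2d-1 on the high side
    i = j_max if v[j_max] < 0.5 else j_max + d
    # Height: distance from the apex (the center) along that dimension
    h = abs(0.5 - v[j_max])
    return i + h

if __name__ == "__main__":
    print(pyramid_value((0.1, 0.6)))  # pyramid 0, height 0.4
    print(pyramid_value((0.5, 0.9)))  # pyramid 3, height 0.4
```

Since heights lie in [0, 0.5], the integer part of a key identifies the pyramid and keys of different pyramids never collide.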

Page 20: Summary & Comparison

Page 21: Summary & Comparison

Page 22: Conclusions

• Effects occurring when indexing high-dimensional spaces
• Principal ideas of the index structures that have been proposed to overcome these problems
• Research on high-dimensional indexing has a major impact on many practical applications and commercial multimedia database systems

Future Research Issues
• Real cases (data that are neither uniform nor independent)
• Partitioning strategies that perform well in high dimensions
• Approximate processing of NN queries