Improving the Performance of M-tree Family by Nearest-Neighbor Graphs Tomáš Skopal, David Hoksza Charles University in Prague Department of Software Engineering.

Improving the Performance of M-tree Family by Nearest-Neighbor Graphs

Tomáš Skopal, David Hoksza

Charles University in Prague, Department of Software Engineering

Czech Republic

ADBIS 2007 2

Presentation Outline

Metric Access Methods (MAMs)

M-tree, PM-tree

Query processing and Filtering

Nearest-neighbor graphs → M*-tree, PM*-tree

  filtering

  pivot selection strategies

Experiments

Metric Access Methods

Indexing methods designed for searching metric datasets

Similarities among objects are modeled by a distance function which fulfills metric properties

MAMs focus on minimizing the number of distance computations: distances are stored in the index and used at query time to filter out non-relevant objects

Methods: GNAT, (m)vp-tree, D-index, (L)AESA, …, M-tree, PM-tree
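
For reference, the metric properties mentioned above are the standard ones. The block below states them for a distance function δ (the variable names are chosen for this note, not taken from the slides). The last line is the lower bound that follows from the triangle inequality; it is what allows a stored distance to some pivot object P to rule out an object O for a range query (Q, r) without computing δ(Q, O).

```latex
\begin{align*}
\delta(x, y) &\ge 0                              && \text{(non-negativity)} \\
\delta(x, y) &= 0 \iff x = y                     && \text{(identity)} \\
\delta(x, y) &= \delta(y, x)                     && \text{(symmetry)} \\
\delta(x, z) &\le \delta(x, y) + \delta(y, z)    && \text{(triangle inequality)} \\[4pt]
|\delta(Q, P) - \delta(P, O)| &\le \delta(Q, O)  && \text{(lower bound on } \delta(Q, O)\text{)} \\
|\delta(Q, P) - \delta(P, O)| > r &\implies \delta(Q, O) > r && \text{(} O \text{ can be filtered out)}
\end{align*}
```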

M-tree (Metric tree)

dynamic, hierarchical index structure

data space divided into ball-shaped data regions (hyper-spheres)

  root node represents a data region covering all data

  child nodes represent regions covering parts of the space, …

built in a bottom-up way, like the B-tree

  when a node is full, a new node is created and the objects are separated

data regions form a balanced hierarchical structure

  inner nodes → routing entries

  leaf nodes → ground items

$$\mathrm{rout}_l(O_i) = [\,O_i,\ r_{O_i},\ \delta(O_i, \mathrm{par}(O_i)),\ \mathrm{ptr}(T(O_i))\,]$$

$$\mathrm{grnd}(O_i) = [\,O_i,\ \delta(O_i, \mathrm{par}(O_i))\,]$$
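
The two entry types can be pictured as plain records. The following is a minimal sketch in Python, not the authors' implementation; the class and field names (`GroundEntry`, `RoutingEntry`, `Node`, `dist_to_parent`, and so on) are hypothetical and only mirror the components of the two entries above.

```python
from dataclasses import dataclass, field
from typing import Any, List, Union


@dataclass
class GroundEntry:
    """Leaf entry: the data object and its stored distance to the parent object."""
    obj: Any
    dist_to_parent: float


@dataclass
class RoutingEntry:
    """Inner entry: centre object, covering radius, distance to parent, child pointer."""
    obj: Any
    radius: float
    dist_to_parent: float
    child: "Node"


@dataclass
class Node:
    """A tree node holding either ground entries (leaf) or routing entries (inner)."""
    is_leaf: bool
    entries: List[Union[GroundEntry, RoutingEntry]] = field(default_factory=list)
```

Both stored distances are computed once, at insertion time, which is what makes them free to use during querying.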

Query Processing + Filtering

range and k-nearest-neighbor (kNN) queries

  traversal starts from the root node

  in the case of kNN, the query radius is decreased dynamically

basic filtering → filter out nodes whose parent data region does not intersect the query region

parent filtering → uses the precomputed distance of an object to its parent and the distance of the parent to the query
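
The two filters can be sketched for a range query over one leaf as follows. This is an illustrative sketch, not code from the presentation: the function names and the `(object, distance_to_parent)` entry layout are assumptions, and `r_obj` is the covering radius of a routing entry (0 for a ground entry).

```python
def parent_filter(d_q_parent, d_obj_parent, r_query, r_obj=0.0):
    """True if the entry can be skipped WITHOUT computing d(Q, O).

    Uses only the stored d(O, parent) and the already known d(Q, parent).
    """
    return abs(d_q_parent - d_obj_parent) > r_query + r_obj


def basic_filter(d_q_obj, r_query, r_obj=0.0):
    """True if the entry's region does not intersect the query ball.

    Needs the computed distance d(Q, O).
    """
    return d_q_obj > r_query + r_obj


def range_search_leaf(query, r_query, d_q_parent, entries, dist):
    """Scan one leaf; `entries` is a list of (obj, d_obj_parent) pairs.

    Returns the matching objects and the number of distances computed.
    """
    hits, computed = [], 0
    for obj, d_obj_parent in entries:
        if parent_filter(d_q_parent, d_obj_parent, r_query):
            continue  # pruned for free
        computed += 1
        if not basic_filter(dist(query, obj), r_query):
            hits.append(obj)
    return hits, computed
```

Parent filtering is tried first because it costs no distance computation; basic filtering is the fallback once the distance has been paid for.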

PM-tree (Pivoting Metric tree)

PM-tree = M-tree enhanced by p static global pivots

each hyper-sphere region is enhanced by p hyper-ring regions: rings which restrict its volume

the i-th ring is defined by the nearest and furthest objects in the node with respect to the i-th pivot

query region overlaps a node region only if it overlaps the hyper-sphere and all hyper-rings → more effective basic filtering

[Figure: the same query Q against an M-tree region and a PM-tree region; in the PM-tree case Q does not overlap the 2nd ring]
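
The overlap condition for a PM-tree region can be sketched as a conjunction of one sphere test and p ring tests. The function and parameter names below are hypothetical; each ring is assumed to be stored as a `(min, max)` pair of distances to the corresponding pivot.

```python
def overlaps_pm_region(d_q_center, r_region, r_query, q_pivot_dists, rings):
    """True if the query ball may intersect the PM-tree region.

    d_q_center    -- distance from Q to the region's centre object
    r_region      -- covering radius of the hyper-sphere
    q_pivot_dists -- distances from Q to the p global pivots
    rings         -- list of (ring_min, ring_max), one per pivot
    """
    # Hyper-sphere test (the plain M-tree condition).
    if d_q_center > r_region + r_query:
        return False
    # Hyper-ring tests: the interval [d - r, d + r] must meet every ring.
    for d_qp, (ring_min, ring_max) in zip(q_pivot_dists, rings):
        if d_qp - r_query > ring_max or d_qp + r_query < ring_min:
            return False
    return True
```

With `rings=[(2, 5), (6, 8)]`, `q_pivot_dists=[4, 3]` and `r_query=1`, the sphere and the first ring are overlapped but the second ring is not, so the region is pruned even though an M-tree would have had to visit it.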

Pivot space

global pivots map regions/data into a pivot space of dimensionality p (i-th coordinate → distance to the i-th pivot)

distances of a data region to the p pivots produce a p-dimensional minimum bounding rectangle

the overlap with rings can be understood in this sense as L∞ filtering (a region is filtered out if its L∞ distance to Q is greater than the query radius)
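
Seen in pivot space, the ring test becomes a point-to-rectangle distance. A small sketch under the assumption that the query is represented by its vector of pivot distances and the region by a list of `(low, high)` intervals; the function names are illustrative only.

```python
def linf_distance_to_mbr(q_pivot_dists, mbr):
    """L-infinity distance from the query point (in pivot space) to the rectangle.

    Per coordinate, the gap is 0 inside [low, high] and the overshoot outside it;
    the L-infinity distance is the largest such gap.
    """
    return max(
        max(low - q, 0.0, q - high)
        for q, (low, high) in zip(q_pivot_dists, mbr)
    )


def is_filtered(q_pivot_dists, mbr, r_query):
    """A region is filtered out when the rectangle lies farther than the radius."""
    return linf_distance_to_mbr(q_pivot_dists, mbr) > r_query
```

For a query at pivot distances `[4, 3]` and the rectangle `[(2, 5), (6, 8)]`, the gap is 0 in the first coordinate and 3 in the second, so the region is filtered for any query radius below 3.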

M*-tree, PM*-tree

M*-tree = M-tree + nearest-neighbor (NN) graphs

  an NN-graph is present in every node

  each object knows its NN (within its node)

example →

PM*-tree = PM-tree + nearest-neighbor (NN) graphs

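$$\mathrm{rout}_l(O_i) = [\,O_i,\ r_{O_i},\ \delta(O_i, \mathrm{par}(O_i)),\ NN(O_i),\ \delta(O_i, NN(O_i)),\ \mathrm{ptr}(T(O_i))\,]$$

$$\mathrm{grnd}(O_i) = [\,O_i,\ \delta(O_i, \mathrm{par}(O_i)),\ NN(O_i),\ \delta(O_i, NN(O_i))\,]$$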

O6 = NN(O4)
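
A per-node NN-graph could be built as sketched below. This is a brute-force illustration with hypothetical function names, not the construction used by the authors; it only shows what is stored (for each object, its nearest neighbor within the node and the distance to it) and how the reverse nearest neighbors follow from that.

```python
def build_nn_graph(objects, dist):
    """Return a list where item i is (j, d): objects[j] is the NN of objects[i]
    within this node and d is their distance. Ties go to the lowest index."""
    graph = []
    for i, a in enumerate(objects):
        best_j, best_d = None, float("inf")
        for j, b in enumerate(objects):
            if i == j:
                continue
            d = dist(a, b)
            if d < best_d:
                best_j, best_d = j, d
        graph.append((best_j, best_d))
    return graph


def reverse_nn(graph):
    """Invert the graph: rnn[j] lists every i whose nearest neighbor is j."""
    rnn = {j: [] for j in range(len(graph))}
    for i, (j, _) in enumerate(graph):
        if j is not None:
            rnn[j].append(i)
    return rnn
```

Since every object has exactly one NN, each object appears in exactly one rNN list, which is why the rNN sets are disjoint.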

NN-graph Filtering

objects (NN-graph nodes) play the role of mutual local pivots

sacrifice: a local pivot object whose distance to the query is actually computed during query evaluation; it is then used for possible filtering of its reverse nearest neighbors (rNNs)

filtering with NN-graph (one step of node processing)

1. fetch the first record (Si) from the sacrifices queue (SQ)
2. apply parent filtering to Si
3. if Si is not filtered → sacrifice it (compute the Q-Si distance)
4. try to filter out rNNs(Si) (NN-graph filtering)
5. move the non-filtered rNNs(Si) to the beginning of SQ (rNN sets are disjoint → the non-filtered ones become sacrifices)
6. apply basic filtering to Si
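
The six steps can be sketched for a range query over one leaf as follows. This is one reading of the steps, not the authors' code: the function name, the argument layout, and the choice to process a leaf are all assumptions. The NN-graph test used in step 4 is the triangle-inequality bound with the sacrifice as pivot; for routing entries the covering radius would be added to the query radius.

```python
from collections import deque


def process_leaf(query, r_query, d_q_parent, objects, d_to_parent, nn, dist):
    """objects[i] is a data object, d_to_parent[i] its stored parent distance,
    nn[i] = (j, d) its stored nearest neighbor index and distance.

    Returns (hits, number_of_distance_computations).
    """
    rnns = {i: [] for i in range(len(objects))}
    for i, (j, _) in enumerate(nn):
        rnns[j].append(i)

    queue = deque(range(len(objects)))  # sacrifices queue SQ
    done = set()                        # filtered out or already sacrificed
    hits, computed = [], 0
    while queue:
        s = queue.popleft()                              # step 1
        if s in done:
            continue
        done.add(s)
        if abs(d_q_parent - d_to_parent[s]) > r_query:   # step 2
            continue
        d_qs = dist(query, objects[s])                   # step 3: sacrifice
        computed += 1
        survivors = []
        for o in rnns[s]:                                # step 4
            if o in done:
                continue
            if abs(d_qs - nn[o][1]) > r_query:
                done.add(o)                              # filtered via NN-graph
            else:
                survivors.append(o)
        queue.extendleft(reversed(survivors))            # step 5
        if d_qs <= r_query:                              # step 6
            hits.append(objects[s])
    return hits, computed
```

An object moved to the front in step 5 also stays at its old position in the queue; the `done` set makes the second encounter a no-op.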

Sacrifice selection

selection of sacrifices is important

  a good pivot filters many objects out

  a poor pivot filters out good potential pivots (future sacrifices)

Heuristics

M*-tree

  hMaxRNNCount: first in SQ is the object with the highest number of rNNs

  hMinRNNDistance: first in SQ is the object nearest to its NN or rNN

  hMinToParentDistance: first in SQ is the object closest to the parent object

PM*-tree

  hMinLmaxDistance: first in SQ is the object with the minimum L∞ distance

  hMaxLmaxDistance: first in SQ is the object with the maximum L∞ distance
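
The three M*-tree heuristics can be read as sort keys for the initial order of SQ. The sketch below is an interpretation, not the authors' code; in particular, hMinRNNDistance is taken to be the stored distance to the object's own NN, on the reasoning that an rNN lies in the same node and so can never be closer than the NN. The PM*-tree heuristics are left out because the slide does not say which L∞ distance is meant.

```python
def order_sacrifice_queue(nn, d_to_parent, heuristic):
    """Return object indices in the order they should enter SQ.

    nn[i] = (j, d): index of the NN of object i and the distance to it.
    d_to_parent[i]: stored distance of object i to the parent object.
    """
    n = len(nn)
    rnn_count = [0] * n
    for j, _ in nn:
        rnn_count[j] += 1

    keys = {
        "hMaxRNNCount": lambda i: -rnn_count[i],          # most rNNs first
        "hMinRNNDistance": lambda i: nn[i][1],            # assumed reading, see above
        "hMinToParentDistance": lambda i: d_to_parent[i], # closest to parent first
    }
    # sorted() is stable, so ties keep their original node order.
    return sorted(range(n), key=keys[heuristic])
```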

Experimental Results

Corel dataset

  65,615 feature vectors of images

  L1 distance function

  8 dimensions

Polygons dataset

  synthetic

  1,000,000 randomly generated 2D polygons (5-10 vertices)

  Hausdorff set distance function

GenBank dataset

  250,000 strings of proteins (of lengths 50-100)

  edit distance function

Testing of computation costs (number of distance computations)
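
Computation cost in this sense can be measured by wrapping the distance function in a counter. The following is a generic sketch of such a measurement, with hypothetical names; it is not the authors' test harness. The L1 distance is included only as an example metric.

```python
class CountingDistance:
    """Wraps a distance function and counts how many times it is evaluated."""

    def __init__(self, dist):
        self.dist = dist
        self.calls = 0

    def __call__(self, a, b):
        self.calls += 1
        return self.dist(a, b)

    def reset(self):
        self.calls = 0


def l1(a, b):
    """L1 (Manhattan) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))
```

Passing `CountingDistance(l1)` to an index instead of `l1` and reading `calls` after each query gives the number of distance computations per query, independent of hardware.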

Experiments – Corel Dataset

Experiments – Polygons Dataset

Experiments – GenBank Dataset

Conclusion

We have proposed

  enhancing the nodes of M-tree-like structures by nearest-neighbor graphs

  a filtering technique based on NN-graphs → NN-graph filtering

We have implemented

  M*-tree (enhancement of M-tree by NN-graphs)

  PM*-tree (enhancement of PM-tree by NN-graphs)

Experimental results

  we have shown up to 45% speed-up
