Improving the Performance of M-tree Family by Nearest-Neighbor Graphs
Tomáš Skopal, David Hoksza
Charles University in Prague
Department of Software Engineering
Czech Republic
ADBIS 2007 2
Presentation Outline
Metric Access Methods (MAMs)
M-tree, PM-tree
Query processing and Filtering
Nearest-neighbor graphs → M*-tree, PM*-tree: filtering, pivot selection strategies
Experiments
Metric Access Methods
Indexing methods designed for searching metric datasets
Similarities among objects are modeled by a distance function which fulfills metric properties
MAMs focus on minimizing the number of distance computations by storing distances in the index, which allows non-relevant objects to be filtered out when querying
Methods: GNAT, (m)vp-tree, D-index, (L)AESA, …, M-tree, PM-tree
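The trade of stored distances for saved distance computations can be sketched with a single pivot. This is a minimal illustration and not any of the methods listed above: the function name `range_query_with_pivot`, the toy 2D data and the single-pivot layout are assumptions made for the example.

```python
def l1(a, b):
    """L1 (Manhattan) distance, a metric on real vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def range_query_with_pivot(data, pivot_dists, pivot, query, radius, dist):
    """Range query that uses precomputed object-to-pivot distances.

    Returns (hits, number of distance computations actually performed).
    """
    d_qp = dist(query, pivot)          # one computation for the pivot itself
    computations = 1
    hits = []
    for obj, d_op in zip(data, pivot_dists):
        # Triangle inequality: d(q, o) >= |d(q, p) - d(o, p)|.
        # If even this lower bound exceeds the radius, skip the object.
        if abs(d_qp - d_op) > radius:
            continue
        computations += 1
        if dist(query, obj) <= radius:
            hits.append(obj)
    return hits, computations

data = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (9.0, 9.0)]
pivot = (0.0, 0.0)
pivot_dists = [l1(o, pivot) for o in data]   # stored in the "index"
hits, cost = range_query_with_pivot(data, pivot_dists, pivot, (1.0, 0.0), 1.5, l1)
print(hits, cost)   # [(0.0, 0.0), (1.0, 1.0)] 3
```

Two of the four objects are discarded without computing their distance to the query, so the query costs 3 distance computations instead of 4.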
M-tree (Metric tree)
dynamic, hierarchical index structure
data space divided into ball-shaped data regions (hyper-spheres)
root node represents the data region covering all data
child nodes represent regions covering parts of the space, …
built in a bottom-up way, like a B-tree
when a node is full, a new node is created and the objects are separated
data regions form a balanced hierarchical structure
inner nodes → routing entries
$\mathrm{rout}_l(O_i) = [O_i, r_{O_i}, \delta(O_i, \mathrm{par}(O_i)), \mathrm{ptr}(T(O_i))]$
leaf nodes → ground items
$\mathrm{grnd}(O_i) = [O_i, \delta(O_i, \mathrm{par}(O_i))]$
(δ: distance function)
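The two entry types can be written down as plain records. The sketch below is only one possible in-memory layout; the class and field names (`GroundEntry`, `RoutingEntry`, `Node`, `dist_to_parent`, and so on) are hypothetical and chosen for readability.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Union

@dataclass
class GroundEntry:
    """Leaf entry: the data object and its distance to the parent routing object."""
    obj: Any
    dist_to_parent: float

@dataclass
class RoutingEntry:
    """Inner entry: routing object, covering radius, distance to parent, subtree."""
    obj: Any
    radius: float
    dist_to_parent: Optional[float]   # None in the root, which has no parent
    subtree: "Node"

@dataclass
class Node:
    is_leaf: bool
    entries: List[Union[GroundEntry, RoutingEntry]] = field(default_factory=list)

# A two-level toy tree under the L1 distance: routing object (2, 2) with radius 2
# covers the ground objects (1, 1) and (2, 3).
leaf = Node(is_leaf=True, entries=[GroundEntry((1.0, 1.0), 2.0),
                                   GroundEntry((2.0, 3.0), 1.0)])
root = Node(is_leaf=False, entries=[RoutingEntry((2.0, 2.0), 2.0, None, leaf)])
```

The covering radius must be at least the largest stored distance from a ground object to the routing object, which holds here (2.0).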
Query Processing + Filtering
range and k nearest neighbor (kNN) queries
traversing from the root node
in case of kNN, the query radius decreases dynamically
basic filtering → filter out nodes whose parent data region doesn't intersect the query region
parent filtering → uses the precomputed distance of an object to its parent and the distance of the parent to the query
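Both filters reduce to one comparison each. The sketch below states them for a range query with radius `r_q`; the function names and the 1D toy data are assumptions, and ground entries are treated as regions with radius 0.

```python
def parent_filter(d_q_parent, d_o_parent, r_q, r_o=0.0):
    """True if the entry can be discarded WITHOUT computing d(Q, O).

    Uses the triangle-inequality lower bound |d(Q, P) - d(O, P)| <= d(Q, O),
    where d(O, P) is stored in the entry and d(Q, P) is already known.
    """
    return abs(d_q_parent - d_o_parent) > r_q + r_o

def basic_filter(d_q_o, r_q, r_o=0.0):
    """True if the region (O, r_o) does not intersect the query ball (Q, r_q)."""
    return d_q_o > r_q + r_o

# 1D example with d(a, b) = |a - b|: parent P = 10, query Q = 2, radius 1.
d_q_parent = abs(2.0 - 10.0)                       # 8.0, known from the level above
print(parent_filter(d_q_parent, 0.5, 1.0))         # True: O = 9.5 is skipped for free
print(parent_filter(d_q_parent, 7.0, 1.0))         # False: O = 3 must be examined
print(basic_filter(abs(2.0 - 3.0), 1.0))           # False: O = 3 lies in the query ball
```

For a kNN query the same tests apply, with `r_q` replaced by the current (shrinking) distance to the k-th candidate.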
PM-tree (Pivoting Metric tree)
PM-tree = M-tree enhanced by p static global pivots
each hyper-sphere region is further enhanced by p hyper-ring regions: rings which restrict its volume
the i-th ring is defined by the nearest and the furthest objects in the node with respect to the i-th pivot
query region overlaps a node region only if it overlaps the hyper-sphere and all hyper-rings → more effective basic filtering
[Figure: M-tree region vs. PM-tree region for the same query; in the PM-tree case Q doesn't overlap the 2nd ring]
Pivot space
global pivots map regions/data into a pivot space of dimensionality p (i-th coordinate → distance to the i-th pivot)
the distances of a data region to the p pivots produce a p-dimensional minimum bounding rectangle
the overlap with rings can be understood in this sense as L∞ filtering (a region is filtered out if its L∞ distance to Q is greater than the query radius)
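The ring test in pivot space can be sketched as follows. The representation is an assumption: `rings[t] = (hr_min, hr_max)` holds the nearest and furthest distances of the node's objects to pivot t, and `q_image[t]` is the distance from Q to pivot t. A region is discarded when the L∞ distance from the query image to the rectangle exceeds the query radius, which is the same as Q missing at least one ring.

```python
def linf_distance_to_mbr(q_image, rings):
    """L-infinity distance from the query's pivot-space image to the rectangle.

    Per coordinate, the distance to the interval [lo, hi] is 0 inside it,
    lo - q below it and q - hi above it; L-infinity takes the maximum.
    """
    return max(max(lo - q, q - hi, 0.0) for q, (lo, hi) in zip(q_image, rings))

def rings_filter(q_image, rings, r_q):
    """True if the query ball misses at least one hyper-ring of the region."""
    return linf_distance_to_mbr(q_image, rings) > r_q

rings = [(2.0, 4.0), (6.0, 7.0)]    # two pivots, two rings
q_image = (3.0, 3.5)                # distances from Q to pivot 1 and pivot 2

print(linf_distance_to_mbr(q_image, rings))   # 2.5 (caused by the 2nd ring)
print(rings_filter(q_image, rings, 1.0))      # True: region is filtered out
print(rings_filter(q_image, rings, 3.0))      # False: a larger query reaches the ring
```

With radius 1.0 the query interval on the second pivot is [2.5, 4.5], which does not reach the ring [6, 7], so the region is discarded without computing the distance to its routing object.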
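M*-tree, PM*-tree
M*-tree = M-tree + nearest-neighbor (NN) graphs
an NN-graph is present in every node
each object knows its NN (within its node)
example: O6 = NN(O4)
$\mathrm{rout}_l(O_i) = [O_i, r_{O_i}, \delta(O_i, \mathrm{par}(O_i)), \mathrm{NN}(O_i), \delta(O_i, \mathrm{NN}(O_i)), \mathrm{ptr}(T(O_i))]$
$\mathrm{grnd}(O_i) = [O_i, \delta(O_i, \mathrm{par}(O_i)), \mathrm{NN}(O_i), \delta(O_i, \mathrm{NN}(O_i))]$
(δ: distance function)
PM*-tree = PM-tree + nearest-neighbor (NN) graphs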
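The per-node NN-graph amounts to two extra values per entry: a reference to the nearest neighbour inside the same node and the distance to it. A brute-force construction can be sketched as below; the function names are hypothetical, the node is assumed to hold at least two objects, and ties are broken by the lower index.

```python
def build_nn_graph(objects, dist):
    """For each object, return (index of its NN within the node, distance to it)."""
    graph = []
    for i, o in enumerate(objects):
        # Compare against every other object of the same node only.
        d, j = min((dist(o, p), j) for j, p in enumerate(objects) if j != i)
        graph.append((j, d))
    return graph

def reverse_nns(graph):
    """Invert the graph: rnn[s] lists the objects whose NN is s."""
    rnn = {i: [] for i in range(len(graph))}
    for i, (nn_index, _) in enumerate(graph):
        rnn[nn_index].append(i)
    return rnn

node = [0.0, 1.0, 1.5, 5.0]                       # 1D objects, d(a, b) = |a - b|
graph = build_nn_graph(node, lambda a, b: abs(a - b))
print(graph)                # [(1, 1.0), (2, 0.5), (1, 0.5), (2, 3.5)]
print(reverse_nns(graph))   # {0: [], 1: [0, 2], 2: [1, 3], 3: []}
```

Every object has exactly one NN, so the reverse-NN sets of different objects never share a member.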
NN-graph Filtering
objects (NN-graph nodes) play the role of mutual local pivots
sacrifice = local pivot: an object whose distance to the query is actually computed during query evaluation, then used for possible filtering of its reverse nearest neighbours (rNNs)
filtering with NN-graph (one step of node processing)
1. fetch the first record (Si) from the sacrifices queue (SQ)
2. apply parent filtering to Si
3. if Si is not filtered → sacrifice (compute the Q-Si distance)
4. try to filter out rNNs(Si) (NN-graph filtering)
5. move non-filtered rNNs(Si) to the beginning of SQ (rNNs sets are disjoint → non-filtered become sacrifices)
6. apply basic filtering to Si
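The six steps can be sketched for a leaf node and a range query. Two assumptions are made here: the NN-graph test is taken to be the triangle-inequality bound with the sacrifice as pivot (an rNN O of S is dropped when |d(Q,S) - d(O,S)| > r_q), and the initial order of SQ is simply the storage order. The names `process_leaf`, `nn_graph` and the toy data are hypothetical.

```python
from collections import deque

def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def nn_graph(objs, dist):
    """(index of NN within the node, distance to it) for every object."""
    out = []
    for i, o in enumerate(objs):
        d, j = min((dist(o, p), j) for j, p in enumerate(objs) if j != i)
        out.append((j, d))
    return out

def process_leaf(objs, nn, d_parent, d_q_parent, query, r_q, dist, order):
    """Return (indices of objects inside the query ball, distance computations)."""
    rnn = {i: [] for i in range(len(objs))}
    for i, (j, _) in enumerate(nn):
        rnn[j].append(i)
    sq, done, hits, cost = deque(order), set(), [], 0
    while sq:
        s = sq.popleft()                              # 1. fetch first record
        if s in done:                                 # already processed or filtered
            continue
        done.add(s)
        if abs(d_q_parent - d_parent[s]) > r_q:       # 2. parent filtering
            continue
        d_qs = dist(query, objs[s])                   # 3. sacrifice
        cost += 1
        promoted = []
        for o in rnn[s]:                              # 4. NN-graph filtering
            if o in done:
                continue
            if abs(d_qs - nn[o][1]) > r_q:
                done.add(o)                           # filtered for free
            else:
                promoted.append(o)
        sq.extendleft(reversed(promoted))             # 5. survivors go first
        if d_qs <= r_q:                               # 6. basic filtering
            hits.append(s)
    return hits, cost

parent, query, r_q = (0.0, 0.0), (3.0, 0.0), 1.0
objs = [(3.0, 0.5), (0.0, 3.0), (0.0, 3.5), (-3.0, 0.0), (2.5, 0.0)]
nn = nn_graph(objs, l1)
d_parent = [l1(o, parent) for o in objs]
hits, cost = process_leaf(objs, nn, d_parent, l1(query, parent), query, r_q,
                          l1, range(len(objs)))
print(hits, cost)   # [0, 4] 3
```

In this example parent filtering discards nothing, because all objects lie at a similar distance from the parent, so a plain M-tree leaf scan would need 5 distance computations; the NN-graph test removes two objects and the scan needs 3.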
Sacrifice selection
selection of sacrifices is important
a good pivot filters many objects out
a poor pivot filters out good possible pivot(s) (future sacrifices)
Heuristics
M*-tree
hMaxRNNCount: first in SQ is the object with the highest number of rNNs
hMinRNNDistance: first in SQ is the object nearest to its NN or rNN
hMinToParentDistance: first in SQ is the object closest to the parent object
PM*-tree
hMinLmaxDistance: first in SQ is the object with minimum L∞ distance
hMaxLmaxDistance: first in SQ is the object with maximum L∞ distance
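Each M*-tree heuristic can be read as a sort key over the entries of a node. The sketch below is one interpretation of the three one-line descriptions, not the authors' implementation: `nn[i]` is assumed to hold (index of the NN of object i, distance to it), `d_parent[i]` the stored distance to the parent, and ties keep storage order.

```python
def order_h_max_rnn_count(nn):
    """hMaxRNNCount: objects with more reverse nearest neighbours come first."""
    counts = [0] * len(nn)
    for nn_index, _ in nn:
        counts[nn_index] += 1
    return sorted(range(len(nn)), key=lambda i: -counts[i])

def order_h_min_rnn_distance(nn):
    """hMinRNNDistance: objects closest to their NN or to one of their rNNs first."""
    best = [d for _, d in nn]                 # distance to own NN
    for nn_index, d in nn:
        best[nn_index] = min(best[nn_index], d)   # distance to an rNN
    return sorted(range(len(nn)), key=lambda i: best[i])

def order_h_min_to_parent_distance(d_parent):
    """hMinToParentDistance: objects closest to the parent object first."""
    return sorted(range(len(d_parent)), key=lambda i: d_parent[i])

nn = [(4, 1.0), (2, 0.5), (1, 0.5), (4, 5.5), (0, 1.0)]
d_parent = [3.5, 3.0, 3.5, 3.0, 2.5]

print(order_h_max_rnn_count(nn))                 # [4, 0, 1, 2, 3]
print(order_h_min_rnn_distance(nn))              # [1, 2, 0, 4, 3]
print(order_h_min_to_parent_distance(d_parent))  # [4, 1, 3, 0, 2]
```

The two PM*-tree heuristics would follow the same pattern, sorting by an L∞ distance in ascending or descending order.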
Experimental Results
Corel dataset: 65,615 feature vectors of images, 8 dimensions, L1 distance function
Polygons dataset (synthetic): 1,000,000 randomly generated 2D polygons (5-10 vertices), Hausdorff set distance function
GenBank dataset: 250,000 strings of proteins (of lengths 50-100), edit distance function
Testing of computation costs (number of distance computations)
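The cost measure can be reproduced by wrapping a distance function in a counter. The sketch below shows plain textbook versions of the three distance functions together with such a wrapper; treating a polygon as the set of its vertices with Euclidean point distance is an assumption, and `CountingDistance` is a hypothetical helper.

```python
import math

def l1(a, b):
    """L1 distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def hausdorff(a, b):
    """Hausdorff distance between two point sets (Euclidean point distance)."""
    a_to_b = max(min(math.dist(p, q) for q in b) for p in a)
    b_to_a = max(min(math.dist(p, q) for p in a) for q in b)
    return max(a_to_b, b_to_a)

def edit_distance(s, t):
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (cs != ct)))  # substitution or match
        prev = cur
    return prev[-1]

class CountingDistance:
    """Wraps a distance function and counts how often it is evaluated."""
    def __init__(self, fn):
        self.fn = fn
        self.calls = 0
    def __call__(self, a, b):
        self.calls += 1
        return self.fn(a, b)

d = CountingDistance(edit_distance)
print(d("kitten", "sitting"), d("ACDE", "ACE"), d.calls)              # 3 1 2
print(l1((0.0, 0.0), (1.0, 2.0)))                                     # 3.0
print(hausdorff([(0, 0), (1, 0)], [(0, 0), (1, 0), (1, 3)]))          # 3.0
```

Passing the wrapped function to an index and reading `calls` after a query gives the number of distance computations for that query.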
Experiments – Corel Dataset
Experiments – Polygons Dataset
Experiments – GenBank Dataset
Conclusion
We have proposed
enhancing the nodes of M-tree-like structures by nearest-neighbor graphs
a filtering technique based on NN-graphs → NN-graph filtering
We have implemented
M*-tree (enhancement of M-tree by NN-graphs)
PM*-tree (enhancement of PM-tree by NN-graphs)
Experimental results
we have shown up to 45% speed-up