An Intelligent & Incremental Approach to kNN using R-trees
DJ Oneil & Esten Rye (G01)
Dec 21, 2015
Motivation
kNN is popular (GIS, AI, Pattern Recognition, Clustering, Outlier Detection)
kNN is a hard problem
R-tree is the industry standard (Oracle, Microsoft SQL Server, DB2, and MySQL)
Problems with higher dimensional spaces
GIS
Related Work
Voronoi Diagram
Incremental approach (find k+1 using k)
High dimensions (X-tree)
New data structures (k-d tree, P-range tree, X-tree, SS-tree, …)
Problem Definition
Given: Spatial database with n objects and query point, q.
Find: The k ≤ n ranked nearest neighbors.
Objective: Use object classifications; incremental.
Constraints: Spatial objects are stored in an R-Tree
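For reference, the problem statement above can be written as a brute-force baseline that ignores the R-tree entirely and scans all n objects. This is only a sketch to pin down what "k ranked nearest neighbors" means; the function name `knn_brute_force` and the use of 2-D tuples as objects are assumptions for illustration, not part of the proposed approach.

```python
import heapq
import math


def knn_brute_force(points, q, k):
    """Return the k points nearest to q, ranked by Euclidean distance."""
    if k > len(points):
        raise ValueError("k must not exceed n")
    # nsmallest returns the results already sorted by the key (nearest first).
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, q))


points = [(0, 0), (3, 4), (1, 1), (10, 10)]
nearest = knn_brute_force(points, (0, 0), 2)
```

Every query here touches all n objects, which is the cost the classifier and incremental-region ideas aim to avoid.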
Key Ideas
Allow users to define domain-specific classifiers to decrease search space
Use informed, incrementally increasing query region to decrease search space
Don’t worry about finding exactly k nearest neighbors.
Approach
Object Classification
Distance Classification
Incrementally increasing concentric circle query regions
Object Classification
Domain specific classifiers.
Only search MBBs that contain classifications
Adds classification dimensions.
Example: Zoning Classifier {“Residential”, “Industrial”, “Commercial”}
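One way to picture classification-based pruning is a tree whose nodes record which classifications occur beneath them, so a search can skip any MBB that cannot contain the wanted class. The sketch below is an assumption about how this might look: the `Node` class, its `labels` field, and `search_by_label` are hypothetical names, and a real R-tree would also handle insertion and node splitting.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    mbb: tuple                                     # (xmin, ymin, xmax, ymax)
    labels: set                                    # classifications found beneath this node
    children: list = field(default_factory=list)
    objects: list = field(default_factory=list)    # (point, label) pairs at leaves


def search_by_label(node, wanted):
    """Collect points with the wanted label, skipping MBBs that lack it."""
    if wanted not in node.labels:
        return []                                  # prune the whole subtree
    hits = [pt for pt, label in node.objects if label == wanted]
    for child in node.children:
        hits.extend(search_by_label(child, wanted))
    return hits


leaf_a = Node((0, 0, 2, 2), {"Residential"},
              objects=[((1, 1), "Residential")])
leaf_b = Node((3, 3, 6, 6), {"Commercial", "Industrial"},
              objects=[((4, 4), "Commercial"), ((5, 5), "Industrial")])
root = Node((0, 0, 6, 6), leaf_a.labels | leaf_b.labels,
            children=[leaf_a, leaf_b])

commercial = search_by_label(root, "Commercial")
```

In this example `leaf_a` is never descended into for a "Commercial" query, because its label set rules it out.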
Distance Classification
Maps Euclidean distance/increment generator to region
Default function
Separate R-tree
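A possible reading of the distance classifier is a lookup from the query point's region to a generator of search radii, with a default used when no region matches. The sketch below follows that reading and is an assumption throughout: the `REGIONS` list stands in for the separate R-tree, and the names `default_increments`, `doubling_increments`, and `increments_for` are hypothetical.

```python
import itertools


def default_increments(step=1.0):
    """Assumed default function: radii grow linearly (step, 2*step, ...)."""
    return (step * i for i in itertools.count(1))


def doubling_increments(start=0.5):
    """Alternative generator: radii double each round."""
    r = start
    while True:
        yield r
        r *= 2


# Hypothetical regions, each an (xmin, ymin, xmax, ymax) box paired with a
# generator factory. A plain list stands in for the separate R-tree.
REGIONS = [
    ((0, 0, 10, 10), lambda: default_increments(0.25)),   # dense area: small steps
    ((10, 0, 100, 100), doubling_increments),             # sparse area: grow fast
]


def increments_for(q):
    """Return the radius generator for the region containing q."""
    x, y = q
    for (xmin, ymin, xmax, ymax), make in REGIONS:
        if xmin <= x <= xmax and ymin <= y <= ymax:
            return make()
    return default_increments()


dense = list(itertools.islice(increments_for((5, 5)), 3))
sparse = list(itertools.islice(increments_for((50, 50)), 3))
fallback = list(itertools.islice(increments_for((-1, -1)), 3))
```

Keeping the generators separate from the data tree means the step sizes can be tuned per region without touching the stored objects.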
Concentric Circles
Decrease candidate regions
Only consider MBBs that are completely contained in query region
Ignore previously searched MBBs
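The three rules above can be sketched as a loop over growing radii. This is a simplified illustration under stated assumptions: leaf MBBs are held in a flat list rather than an R-tree, the names `mbb_inside_circle` and `incremental_knn` are hypothetical, and the search stops once at least k candidates are found, so it may return more than k.

```python
import math


def mbb_inside_circle(mbb, center, radius):
    """True if the whole box lies within the circle (farthest corner test)."""
    xmin, ymin, xmax, ymax = mbb
    cx, cy = center
    dx = max(abs(xmin - cx), abs(xmax - cx))
    dy = max(abs(ymin - cy), abs(ymax - cy))
    return math.hypot(dx, dy) <= radius


def incremental_knn(leaves, q, k, radii):
    """Grow the query circle until at least k candidates have been collected."""
    visited = set()          # indices of MBBs already searched
    found = []
    for r in radii:
        for i, (mbb, pts) in enumerate(leaves):
            if i in visited or not mbb_inside_circle(mbb, q, r):
                continue     # skip searched MBBs and MBBs not fully contained
            visited.add(i)
            found.extend(pts)
        if len(found) >= k or len(visited) == len(leaves):
            break
    return sorted(found, key=lambda p: math.dist(p, q))


leaves = [
    ((0, 0, 1, 1), [(0.5, 0.5), (1, 1)]),
    ((2, 2, 3, 3), [(2, 2)]),
    ((8, 8, 9, 9), [(8.5, 8.5)]),
]
result = incremental_knn(leaves, (0, 0), 3, [1, 2, 4, 8, 16])
```

Here the first box is picked up at radius 2 and the second at radius 8; the third box is never examined because k candidates are already in hand.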
Validation
Find nearest gas stations (Zoning example)
1.7% total searchable area of Minneapolis
Complexity: p classifiers with q classifications
Computational: O(p · log_α(q)) · O(log_α(n)) ≈ O(log_α(n))
Spatial: (p · q · s + t)(n + α · log_α(n) + α)
Conclusion
Expand R-trees for kNN
User-defined, domain specific classifiers to decrease search space
User defined incremental distance function
Increasing Euclidean distance, Concentric Circles
Future Work
Extend distance classifier to include many classifiers
Non-Euclidean distance (e.g. speed limit)
Combine distance classification tree with data tree
Experiment
Plan for incrementally upgrading existing R-tree implementations
Determine threshold for number of classifiers and classifications