Nearest Neighbor and Locality-Sensitive Hashing
Yaniv Masler
IDC - 16.03.08
Tell me who your neighbors are, and I'll know who you are
Lecture Outline
• Variants of NN
• Motivation
• Algorithms:
  – Linear scan
  – Quad-trees
  – kd-trees
  – Locality-Sensitive Hashing
  – R-tree (and its variants)
  – VA-file
• Examples:
  – Colorization by Example
  – Medical Pattern Recognition
  – Handwritten digit recognition
Nearest Neighbor Search
• Given: a set P of n points in R^d
• Goal: a data structure, which given a query point q, finds the nearest neighbor p of q in P
[Figure: a query point q and its nearest neighbor p]
Algorithms for Nearest Neighbor Search / Piotr Indyk
Nearest Neighbor Search
Problem: what's the nearest restaurant to my hotel?
Near neighbor (range search)
Or: find all restaurants up to 400m from my hotel
Problem: find one/all points in P within distance r from q
Approximate Near neighbor
Or: find a restaurant that is near my hotel
Problem: find one/all points p' in P, whose distance to q is at most (1+ε) times the distance from q to its nearest neighbor
K-Nearest-Neighbor
Or: find the 4 closest restaurants to my hotel
Problem: find the K points nearest q
Spatial join
Or: find pairs of hotels and shopping malls which are at most 100m apart
Problem: given two sets P,Q, find all pairs p in P, q in Q, such that p is within distance r from q
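As a small illustration (not from the original slides), a naive Python sketch of this spatial join: compare every cross pair and keep those within distance r. It is quadratic; the index structures discussed later make this practical on large sets.

```python
# Naive spatial join: all pairs (p, q), p in P, q in Q, with dist(p, q) <= r.
import math

def spatial_join(P, Q, r):
    return [(p, q) for p in P for q in Q if math.dist(p, q) <= r]

# Hypothetical example: hotels and malls at most 100m apart (coordinates in meters).
hotels = [(0, 0), (500, 120)]
malls = [(60, 80), (900, 900)]
print(spatial_join(hotels, malls, 100))  # [((0, 0), (60, 80))]
```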
Nearest Neighbour Rule
Non-parametric pattern classification.
Consider a two-class problem where each sample consists of two measurements (x, y).
k = 1: for a given query point q, assign the class of the nearest neighbour.
k = 3: compute the k nearest neighbours and assign the class by majority vote.
http://www.robots.ox.ac.uk/~dclaus/cameraloc/samples/nearestneighbour.ppt
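As a concrete illustration (mine, not the deck's), a minimal Python sketch of the k-NN rule just described; k=1 reduces to the nearest-neighbour rule.

```python
# k-NN classification rule: majority vote among the k closest labeled samples.
import math
from collections import Counter

def knn_classify(samples, labels, q, k=3):
    """samples: list of (x, y) points; labels: their classes; q: query point."""
    by_dist = sorted(range(len(samples)), key=lambda i: math.dist(samples[i], q))
    votes = Counter(labels[i] for i in by_dist[:k])
    return votes.most_common(1)[0][0]
```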
Motivation
The nearest neighbor search problem arises in numerous fields of application, including:
• Pattern recognition
• Statistical classification
• Computer vision
• Databases
• Coding theory
• Data compression
• Internet marketing
• DNA sequencing
• Spell checking
• Plagiarism detection
• Copyright violation detection
• and many more
Algorithms
• Main memory (Computational Geometry):
  – linear scan
  – tree-based: quadtree, kd-tree
  – hashing-based: Locality-Sensitive Hashing
• Secondary storage (Databases):
  – R-tree (and numerous variants)
  – Vector Approximation File (VA-file)
Linear scan (Naïve approach)
• The simplest solution to the NNS problem
• Compute the distance from the query point to every other point in the database, keeping track of the "best so far".
• This algorithm works for small databases but quickly becomes intractable as either the size or the dimensionality of the problem becomes large.
• Running time is O(Nd).
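As a direct illustration (not from the slides), the scan in Python:

```python
# Linear scan: distance from q to every point, keeping the best so far. O(Nd).
import math

def linear_scan_nn(points, q):
    best, best_d = None, float('inf')
    for p in points:
        d = math.dist(p, q)  # O(d) per point
        if d < best_d:
            best, best_d = p, d
    return best, best_d
```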
Quad-tree
Split the space into 2^d equal subsquares
Repeat until done:
• only one pixel left
• only one point left
• only a few points left
A simple data structure
Range search
• Near neighbor (range search):
  – put the root on the stack
  – repeat:
    • pop the next node T from the stack
    • for each child C of T:
      – if C is a leaf, examine point(s) in C
      – if C intersects with the ball of radius r around q, add C to the stack
Algorithms for Nearest Neighbor Search / Piotr Indyk
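A hedged Python sketch of the pseudocode above, on a point quad-tree. The node layout (square bounds, leaf capacity) is an assumption of mine, and the sketch assumes distinct points.

```python
# Stack-based range search on a point quad-tree.
import math

class QuadNode:
    """A square [x, x+size) x [y, y+size); leaves hold up to cap points."""
    def __init__(self, x, y, size, cap=4):
        self.x, self.y, self.size, self.cap = x, y, size, cap
        self.points, self.children = [], []

    def insert(self, p):
        if self.children:
            self._child_for(p).insert(p)
        elif len(self.points) < self.cap:
            self.points.append(p)
        else:  # split into 4 equal subsquares and push the points down
            h = self.size / 2
            self.children = [QuadNode(self.x + dx * h, self.y + dy * h, h, self.cap)
                             for dx in (0, 1) for dy in (0, 1)]
            for q in self.points + [p]:
                self._child_for(q).insert(q)
            self.points = []

    def _child_for(self, p):
        h = self.size / 2
        return self.children[2 * (p[0] >= self.x + h) + (p[1] >= self.y + h)]

    def intersects_ball(self, q, r):
        nx = min(max(q[0], self.x), self.x + self.size)  # closest point of the
        ny = min(max(q[1], self.y), self.y + self.size)  # square to q
        return math.dist(q, (nx, ny)) <= r

def range_search(root, q, r):
    found, stack = [], [root]      # put the root on the stack
    while stack:
        node = stack.pop()         # pop the next node
        if node.children:          # keep children whose square meets the ball
            stack += [c for c in node.children if c.intersects_ball(q, r)]
        else:                      # leaf: examine its point(s)
            found += [p for p in node.points if math.dist(p, q) <= r]
    return found
```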
Quad-tree - structure
[Figure: the root splits the plane at (X1, Y1) into four children: (P<X1, P<Y1), (P≥X1, P<Y1), (P<X1, P≥Y1), (P≥X1, P≥Y1)]
http://www.wisdom.weizmann.ac.il/~mica/CVspring06/presentations/Dan_Tomer.ppt
Quad-tree - Query
[Figure: a query descends to the child quadrant whose predicate (e.g. P<X1, P≥Y1) matches the query point]
http://www.wisdom.weizmann.ac.il/~mica/CVspring06/presentations/Dan_Tomer.ppt
Quad-tree
• Simple data structure
• What's the downside?
Quad-tree – Pitfall 1
[Figure: an uneven point distribution forces many levels of subdivision before the points are separated]
http://www.wisdom.weizmann.ac.il/~mica/CVspring06/presentations/Dan_Tomer.ppt
Quad-tree – Pitfall 2
[Figure]
Running time: O(2^d)
Space and time exponential in the dimension
http://www.wisdom.weizmann.ac.il/~mica/CVspring06/presentations/Dan_Tomer.ppt
Kd-trees [Bentley’75]
• Main ideas:
  – only one-dimensional splits
  – instead of splitting in the middle, choose the split "carefully" (many variations)
  – near(est) neighbor queries: as for quad-trees
Algorithms for Nearest Neighbor Search / Piotr Indyk
Kd-Trees Construction
[Figure: points 1-11 in the plane, recursively partitioned by splitting lines l1-l10; the corresponding kd-tree has the lines as internal nodes and the points as leaves]
http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/approx_nn_theory/Nearest_Neighbor_Theory.ppt
Kd-Trees Query
[Figure: the same subdivision with a query point q; the search descends to q's cell and then backtracks into nearby cells]
http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/approx_nn_theory/Nearest_Neighbor_Theory.ppt
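A compact Python sketch in the spirit of these figures: one-dimensional splits cycling through the coordinates, a median split (one of the "careful" choices), and the standard query that prunes a subtree when the current best ball cannot cross its splitting line. The details are my assumptions, not Bentley's exact formulation.

```python
# kd-tree construction (median splits) and nearest-neighbour query.
import math

def build_kdtree(points, depth=0):
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {'point': points[mid], 'axis': axis,
            'left': build_kdtree(points[:mid], depth + 1),
            'right': build_kdtree(points[mid + 1:], depth + 1)}

def kdtree_nn(node, q, best=None):
    if node is None:
        return best
    if best is None or math.dist(q, node['point']) < math.dist(q, best):
        best = node['point']
    diff = q[node['axis']] - node['point'][node['axis']]
    near, far = (node['left'], node['right']) if diff < 0 else \
                (node['right'], node['left'])
    best = kdtree_nn(near, q, best)      # search the side containing q first
    if abs(diff) < math.dist(q, best):   # ball crosses the splitting line?
        best = kdtree_nn(far, q, best)   # then the far side may hold a better point
    return best
```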
Kd-trees
• Advantages:
  – no (or fewer) empty spaces
  – only linear space
• Exponential query time is still possible
  – however, if we don't do something really stupid, query time is at most O(dn)
  – this is still quite bad, though, when the dimension is around 20-30
Approximate nearest neighbor
• Can be done using kd-trees, by interrupting the search earlier [Arya et al.'94]
• Basically, after each search step, check if you are close enough; if so, stop.
• Not good for exact queries.
• What about a different approach:
  – can we adapt hashing to nearest neighbor search?
Locality-Sensitive Hashing [Indyk-Motwani'98]
Key Idea
• Preprocessing:
  – hash the data points using several LSH functions, so that the probability of collision is higher for closer objects
• Querying:
  – hash the query point and retrieve the elements in the buckets containing it
Locality-Sensitive Hashing
• Hash functions are locality-sensitive if, for a random hash function h and any pair of points p, q, we have:
  – Pr[h(p)=h(q)] is "high" if p is "close" to q
  – Pr[h(p)=h(q)] is "low" if p is "far" from q
Algorithms for Nearest Neighbor Search / Piotr Indyk
Do such functions exist?
• Consider the hypercube, i.e., points from {0,1}^d
• Hamming distance D(p,q) = the number of positions on which p and q differ
• Define hash function h by choosing a set S of k random coordinates, and setting h(p) = projection of p on S
[Photo: Richard Hamming]
Algorithms for Nearest Neighbor Search / Piotr Indyk
Example
Hash function h(): take d=12, p=010111001011, k=3, S={2,5,10}
h(p) = projection of p on the coordinates in S = 110
Store p into the matching bucket (one of 2^k buckets)
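The example as Python code (coordinates 1-indexed to match the slide):

```python
# Project a binary string p onto the coordinate set S (1-indexed).
def h(p, S):
    return ''.join(p[i - 1] for i in S)

p = '010111001011'          # d = 12
print(h(p, S=[2, 5, 10]))   # '110' -> store p in bucket 110 of the 2**3 buckets
```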
h's are locality-sensitive
• Pr[h(p)=h(q)] = (1 - D(p,q)/d)^k
• We can vary the probability by changing k
[Plots: Pr[h(p)=h(q)] vs. distance for k=1 and k=2; larger k makes the curve drop more steeply]
Algorithms for Nearest Neighbor Search / Piotr Indyk
How can we use LSH?
• Choose several hash functions h1, ..., hl
• Initialize a hash array for each hi
• Store each point p in the bucket hi(p) of the i-th hash array, i = 1, ..., l
• In order to answer query q:
  – for each i = 1, ..., l, retrieve the points in bucket hi(q)
  – return the closest point found
Algorithms for Nearest Neighbor Search / Piotr Indyk
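A sketch of the whole scheme for Hamming space (my illustration; the coordinate sets here are 0-indexed, and the parameters are just tuning knobs):

```python
# Multi-table LSH index over binary strings of length d.
import random
from collections import defaultdict

def hamming(p, q):
    return sum(a != b for a, b in zip(p, q))

class HammingLSH:
    def __init__(self, d, k, l):
        self.S = [random.sample(range(d), k) for _ in range(l)]  # l coordinate sets
        self.tables = [defaultdict(list) for _ in range(l)]      # l hash arrays

    def _h(self, p, i):
        return ''.join(p[j] for j in self.S[i])

    def insert(self, p):
        for i, table in enumerate(self.tables):
            table[self._h(p, i)].append(p)

    def query(self, q):
        # Probe the l buckets of q and return the closest candidate found.
        candidates = {p for i, t in enumerate(self.tables)
                      for p in t.get(self._h(q, i), [])}
        return min(candidates, key=lambda p: hamming(p, q)) if candidates else None
```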
LSH - Algorithm
[Figure: each point pi of the data set P is stored in tables T1, T2, ..., TL, at buckets h1(pi), h2(pi), ..., hL(pi)]
http://www.cmpe.boun.edu.tr/courses/cmpe521/fall2002/Similarty_Search_in_High_Dimensions_via_Hashing.ppt
What does this algorithm do?
• By proper choice of the parameters k and l, we can make, for any p, the probability that hi(p)=hi(q) for some i look like this:
[Plot: collision probability vs. distance, an S-shaped curve]
• Can control:
  – the position of the slope
  – how steep it is
Algorithms for Nearest Neighbor Search / Piotr Indyk
The LSH algorithm
• Therefore, we can solve (approximately) the near neighbor problem with a given parameter r
• Worst-case analysis guarantees O(d·n^(1/(1+ε))) query time
• Practical evaluation indicates much better behavior [GIM'99, HGI'00, Buh'00, BT'00]
• Drawbacks:
  – works best for Hamming distance (although it can be generalized to Euclidean space)
  – requires the radius r to be fixed in advance
Algorithms for Nearest Neighbor Search / Piotr Indyk
Secondary storage
• As mentioned in the Motivation slide, NN has many applications.
• Some of them store large datasets that need secondary storage.
Secondary storage
• Grouping the data is crucial
• Different approach required:
  – in main memory, any reduction in the number of inspected points was good
  – on disk, this is not the case!
Disk-based algorithms
• R-tree [Guttman'84]
  – departing point for many variations
  – over 600 citations! (according to CiteSeer)
  – "optimistic" approach: try to answer queries in logarithmic time
• Vector Approximation File [WSB'98]
  – "pessimistic" approach: if we need to scan the whole data set, we had better do it fast
• LSH works also on disk
Algorithms for Nearest Neighbor Search / Piotr Indyk
R-tree
• "Bottom-up" approach (the kd-tree was "top-down"):
  – start with a set of points/rectangles
  – partition the set into groups of small cardinality
  – for each group, find the minimum rectangle containing the objects from this group
  – repeat
Algorithms for Nearest Neighbor Search / Piotr Indyk
R-tree
[Figure]
R-tree
• Advantages:
  – supports near(est) neighbor search (similar to before)
  – works for points and rectangles
  – avoids empty spaces
  – many variants: X-tree, SS-tree, SR-tree, etc.
  – works well for low dimensions
• Not so great for high dimensions
Algorithms for Nearest Neighbor Search / Piotr Indyk
VA-file [Weber, Schek, Blott’98]
• Approach:
  – in high-dimensional spaces, all tree-based indexing structures examine a large fraction of the leaves
  – if we need to visit so many nodes anyway, it is better to scan the whole data set and avoid performing seeks altogether
  – 1 seek costs about as much as transferring a few hundred KB
Algorithms for Nearest Neighbor Search / Piotr Indyk
VA-file
• Natural question: how to speed up the linear scan?
• Answer: use approximation
  – use only i bits per dimension (and speed up the scan by a factor of 32/i)
  – identify all points which could be returned as an answer
  – verify those points using the original data set
Algorithms for Nearest Neighbor Search / Piotr Indyk
VA-file
• Tile the d-dimensional data space uniformly into 2^b rectangular cells
• b bits for each approximation
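A rough numpy sketch of the filter-and-refine idea (my simplification: uniform per-dimension quantization and a per-cell lower bound, not the paper's exact encoding):

```python
# VA-file style scan: quantize, filter on cheap lower bounds, verify candidates.
import numpy as np

def build_va(data, bits=4):
    """Quantize each dimension of an n x d array into 2**bits cells."""
    lo, hi = data.min(axis=0), data.max(axis=0)
    width = (hi - lo) / 2 ** bits + 1e-12              # cell side per dimension
    approx = np.minimum((data - lo) / width, 2 ** bits - 1).astype(np.uint8)
    return approx, lo, width

def va_range_query(data, approx, lo, width, q, r):
    # Filter: lower-bound the distance from q to any point inside each cell.
    cell_lo = lo + approx * width
    nearest = np.clip(q, cell_lo, cell_lo + width)     # closest point of each cell
    candidates = np.where(np.linalg.norm(nearest - q, axis=1) <= r)[0]
    # Refine: verify the candidates against the full-precision vectors.
    exact = np.linalg.norm(data[candidates] - q, axis=1)
    return candidates[exact <= r]
```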
Where's Waldo?
R. Irony, D. Cohen-Or, D. Lischinski
Colorization by example
Motivation
Colorization is the process of adding color to monochrome images and video.
Colorization typically involves segmentation plus tracking regions across frames; neither can be done reliably, so user intervention is required, which is expensive and time consuming.
Colorization by example: no need for accurate segmentation or region tracking.
The method
• Colorize a grayscale image based on a user-provided reference.
Reference Image
Naïve Method
Transferring Color to Greyscale Images [Welsh, Ashikhmin, Mueller 2002]
• Find a good match between a pixel and its neighborhood in a grayscale image and in a reference image.
By Example Method
Overview
1. training
2. classification
3. color transfer
Training stage
Input:
1. The luminance channel of the reference image
2. The accompanying partial segmentation
Construct a low dimensional feature space in which it is easy to discriminate between pixels belonging to differently labeled regions, based on a small (grayscale) neighborhood around each pixel.
Training stage
Create feature space (get DCT coefficients)
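A hedged sketch of this step: per-pixel features from the 2-D DCT of a small grayscale neighborhood. The window size, the number of kept coefficients, and the border handling are my assumptions, not the paper's exact parameters.

```python
# Low-frequency DCT coefficients of a pixel's neighborhood as its feature vector.
import numpy as np
from scipy.fft import dctn

def dct_features(image, y, x, win=7, n_coeffs=16):
    """image: 2-D float array, assumed already padded so the window fits."""
    half = win // 2
    patch = image[y - half:y + half + 1, x - half:x + half + 1]
    coeffs = dctn(patch, norm='ortho')   # 2-D DCT of the neighborhood
    return coeffs.flatten()[:n_coeffs]   # keep a few leading (low-frequency) terms
```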
Classification stage
For each grayscale image pixel, determine which region should be used as a color reference for this pixel.
One way: the K-Nearest-Neighbor rule.
Better way: KNN in a discriminating subspace.
KNN in discriminating subspace
Originally, the sample point has a majority of magenta neighbors
KNN in discriminating subspace
Rotate the axes in the direction of the intra-difference vector
KNN in discriminating subspace
Project the points onto the axis of the inter-difference vector
The nearest neighbors are now cyan
KNN Differences
Simple KNN
Discriminating KNN
Matching Classification
Use a median filter to create a cleaner classification
Color transfer
Using the YUV color space
Final Results
Medical Pattern Recognition
• Pattern recognition is the application of (statistical) techniques with the objective of classifying a set of objects into a number of distinct classes. Pattern recognition is applied in virtually all branches of science. Medical examples follow.
• Pattern recognition methods exploit the similarities of objects belonging to the same class and the dissimilarities of objects belonging to different classes.

Field         Objects            Objective
Cytology      Cells              Detection of carcinomas
Genetics      Chromosomes        Karyotyping
Cardiology    ECGs               Detection of coronary diseases
Neurology     EEGs               Detection of neurological conditions
Pharmacology  Drugs              Monitoring of medication
Diagnostics   Disease patterns   Computer-assisted decisions
HINF 2502 (Clinical Processes and Decision Making)© Hadi Kharrazi, Dalhousie University
Syntactic Pattern Recognition
• In syntactic or linguistic pattern recognition, objects are described as a set of primitives. A primitive is an elementary component of an object. The object is then recognized by the sequence in which the primitives appear in the object description.
• A simple example of a set of primitives is the Morse alphabet. The objects are the individual characters and the spaces between words. A grammar describes the sequence in which these primitives constitute the various characters.
• A medical example of syntactic pattern recognition is karyotyping, where similar chromosomes are grouped. In this case the set of primitives describing a contour may be the following: {convexity (a), straight part (b), deep concavity (c), shallow concavity (d)}
Pattern Recognition (cont.)
HINF 2502 (Clinical Processes and Decision Making)© Hadi Kharrazi, Dalhousie University
Syntactic Pattern Recognition
• A medical example of syntactic pattern recognition is karyotyping, where similar chromosomes are grouped.
• A karyotype is the characteristic chromosome complement of a eukaryote species (Wikipedia).
Pattern Recognition (cont.)
Syntactic description of a submedian and a median chromosome in terms of primitives.
HINF 2502 (Clinical Processes and Decision Making)© Hadi Kharrazi, Dalhousie University
Statistical Pattern Recognition
• In statistical pattern recognition objects are described by numerical features. These methods are categorized into supervised and unsupervised techniques.
• In supervised techniques the number of distinct classes is known and a set of example objects is available. These objects are labeled with their class membership. The problem is to assign a new unclassified object to one of the classes.
• In unsupervised techniques (such as clustering) a collection of observations is given and the problem is to establish whether these observations naturally divide into two or more different classes.
Pattern Recognition (cont.)
HINF 2502 (Clinical Processes and Decision Making)© Hadi Kharrazi, Dalhousie University
Supervised Pattern Recognition
• In supervised pattern recognition, class recognition is based on the differences of the statistical distributions of the features between the various classes. The development of supervised classification rules normally proceeds in two steps:
• Learning phase: In this step the classification rule is designed on the basis of class properties as derived from a collection of class-labeled objects called the design (training) set.
• Validation phase: In this step another collection of class labeled objects called test set will be tested by the results from the learning phase. Thus, the proportion of correct classifications obtained by the rule can be calculated.
Pattern Recognition (cont.)
HINF 2502 (Clinical Processes and Decision Making)© Hadi Kharrazi, Dalhousie University
• 1-Nearest-Neighbor Rule: In the simplest form, to classify an unknown object the nearest object from the learning set is identified. The unknown object is then assigned to the class to which its nearest neighbor belongs.
• q-Nearest-Neighbor Rule: Rather than deciding on class membership on the basis of a single nearest neighbor, a quorum of q nearest neighbors is inspected. The class membership of the unknown object is then established on the basis of the majority of the class memberships of these q nearest neighbors.
• The problem with NN rules is that they are justifiable only with large learning sets, and this increases the computational time.
Pattern Recognition (cont.)
HINF 2502 (Clinical Processes and Decision Making)© Hadi Kharrazi, Dalhousie University
Pattern Recognition (cont.)
Illustration of nearest-neighbor classification. The learning set consists of objects belonging to three different classes: class 1 (blue), class 2 (red) and class 3 (black). Using one neighbor only, the 1-NN rule assigns the unknown object (yellow) to class 1. The 5-NN rule assigns the object to class 3, whereas the (5,4)-NN rule leaves the object unassigned.
HINF 2502 (Clinical Processes and Decision Making)© Hadi Kharrazi, Dalhousie University
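A small Python sketch of the (q, l)-NN variant from the caption (e.g. the (5,4)-NN rule): assign a class only if it collects at least l of the q votes, otherwise reject. The helper names are mine.

```python
# (q, l)-NN rule with a reject option.
import math
from collections import Counter

def qnn_classify(samples, labels, x, q=5, l=4):
    order = sorted(range(len(samples)), key=lambda i: math.dist(samples[i], x))
    cls, votes = Counter(labels[i] for i in order[:q]).most_common(1)[0]
    return cls if votes >= l else None   # None = left unassigned
```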
Back To Computer Science
How do we use Nearest Neighbor for OCR?
Venkat Raghavan N. S., Saneej B. C., and Karteek Popuri
Department of Chemical and Materials Engineering, University of Alberta, Canada.
Classification techniques for Hand-Written Digit Recognition
Sample Data
• Normalize each sample character to a 16×16 grayscale image.
• We now have 256 pixels we can use as the character's feature vector: Xi = [xi1, xi2, ..., xi256]
• Collect many samples
  – dataset size: n × 256
[Figure: a 16×16 grayscale digit image; each pixel value xij ∈ [0,1]]
Let's reduce dimensions
How?
PCA - Principal Component Analysis
(we skipped this lecture)
Principal Components Analysis
The Basic Principle
PCA transforms a set of correlated variables into a smaller set of uncorrelated variables called principal components.
The ObjectiveDiscovering the “true dimension” of the data.
It may be that p-dimensional data can be represented in q < p dimensions without losing much information.
Examples can be found at: http://www.cs.mcgill.ca/~sqrt/dimr/dimreduction.html
Dimension reduction - PCA
• PCA is done on the mean-centered images
• The larger an eigenvalue, the more important the corresponding eigen-digit
• Based on the eigenvalues, the first 64 PCs were found to be significant
• Any image is now represented by its PCs: Y = [y1, y2, ..., y64]
• Each sample now has 64 variables
  – dataset size: n × 64
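A minimal sketch of this reduction via the SVD of the mean-centered data matrix (my implementation choice, not the authors' code):

```python
# Project n x 256 digit images onto their top principal components.
import numpy as np

def pca_reduce(X, n_components=64):
    """X: n x 256 matrix of flattened 16x16 digit images."""
    mean = X.mean(axis=0)                        # the "average digit"
    Xc = X - mean
    # Rows of Vt are the eigen-digits, ordered by decreasing eigenvalue.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    Y = Xc @ components.T                        # n x 64 reduced representation
    return Y, components, mean
```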
[Figures: the AVERAGE DIGIT (mean 16×16 image) and the leading EIGEN DIGITS]
Interpreting the PCs as Image Features
• Basically, the eigenvectors are a rotation of the original axes to more meaningful directions.
• The PCs are the projections of the data onto each of these new axes.
• This is similar to what we did in 'Colorization by Example'.
Nearest Neighbour Classifier
• No assumption about the distribution of the data
• Euclidean distance is used to find the nearest neighbour
[Figure: the test point is assigned to Class 2]
Find the nearest neighbour in the training set to the test image and assign its label to the test image.
http://www.ualberta.ca/~slshah/files/Handwritten%20Digit%20Recognition.ppt
K-Nearest Neighbour Classifier (KNN)
• Compute the k nearest neighbours and assign the class by majority vote.
[Figure: k = 3; the test point is assigned to Class 1 (Class 1: 2 votes, Class 2: 1 vote)]
http://www.ualberta.ca/~slshah/files/Handwritten%20Digit%20Recognition.ppt
1-NN Classification Results:

No. of PCs   256    150    64
AER %        7.09   7.01   6.45

Using 64 PCs gives better results.
Using higher k's does not show improvement in recognition rate.
http://www.ualberta.ca/~slshah/files/Handwritten%20Digit%20Recognition.ppt
Misclassification in NN:

Recognised as:    0     1     2     3     4     5     6     7     8     9
Actual 0:      1376     0     4     2     0     5    12     2     0     0
Actual 1:         0  1113     1     0     1     0     2     0     2     0
Actual 2:        22     9   728    17     4     4     6    16    18     2
Actual 3:         4     0     4   690     2    26     0     4     6     3
Actual 4:         3    15     9     0   687     0     7     2     4    32
Actual 5:         9     3    12    37     5   517    32     0    23     9
Actual 6:        10     3     5     0     3     2   714     0     3     2
Actual 7:         0     6     1     0    19     0     0   657     1    20
Actual 8:         8    11     1    26     7     7     8     5   547    13
Actual 9:         6     1     2     0    23     0     0    32     0   664

Euclidean distances between transformed images of the same class can be very high
http://www.ualberta.ca/~slshah/files/Handwritten%20Digit%20Recognition.ppt
Issues in NN:
Expensive: to determine the nearest neighbour of a test image, we must compute the distance to all N training examples
Storage Requirements: Must store all training data
http://www.ualberta.ca/~slshah/files/Handwritten%20Digit%20Recognition.ppt