Fast Similarity Search in Image Databases CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington
Dec 13, 2015


Fast Similarity Search in Image Databases

CSE 6367 – Computer Vision
Vassilis Athitsos
University of Texas at Arlington

A Database of Hand Images

• 4,128 images are generated for each hand shape.
• Total: 107,328 images.

Efficiency of the Chamfer Distance

• Computing chamfer distances is slow.
  – For images with d edge pixels, O(d log d) time.
  – Comparing the input to the entire database takes over 4 minutes.
• Must measure 107,328 distances.

[Figure: an input image and a model image]

The Nearest Neighbor Problem

• Goal:
  – find the k nearest neighbors of query q.
• Brute force time is linear in:
  – n (size of database).
  – time it takes to measure a single distance.

[Figure: a query object and the database]
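The brute-force search described above can be sketched in a few lines of Python. This is only an illustration, not code from the course: the name `brute_force_knn` and the toy data are assumptions, and absolute difference between numbers stands in for an expensive measure such as the chamfer distance.

```python
import heapq


def brute_force_knn(query, database, distance, k=1):
    """Return the k database objects closest to `query` under `distance`.

    Makes exactly len(database) calls to `distance`, so the running time
    grows linearly with the database size and with the cost of one call.
    """
    scored = ((distance(query, x), i) for i, x in enumerate(database))
    nearest = heapq.nsmallest(k, scored)
    return [database[i] for _, i in nearest]


# Toy usage: numbers stand in for images (hypothetical data).
toy_db = [10.0, 3.0, 7.5, 1.0, 8.0]
result = brute_force_knn(7.0, toy_db, lambda a, b: abs(a - b), k=2)
print(result)
```

Every query pays for one distance computation per database object, which is the cost the rest of the lecture tries to avoid.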

Examples of Expensive Measures

DNA and protein sequences: Smith-Waterman.

Dynamic gestures and time series: Dynamic Time Warping.

Edge images: Chamfer distance, shape context distance.

These measures are non-Euclidean, sometimes non-metric.

Embeddings

[Figure: an embedding F maps the database objects x1, x2, x3, …, xn and the query q to vectors in R^d]

• Measure distances between vectors (typically much faster).
• Caveat: the embedding must preserve similarity structure.

Reference Object Embeddings

• r: reference object.
• Embedding: F(x) = D(x, r).
• D: distance measure in X.
• F(r) = D(r, r) = 0.
• If a and b are similar, their distances to r are also similar (usually).

[Figure: F maps the original space X, containing r, a, and b, to the real line]
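A reference-object embedding is small enough to write out directly. The sketch below assumes points in the plane with Euclidean distance as a stand-in for the space X and the measure D; the helper names are hypothetical. It also shows why the slide says "usually": a and b are close and receive close values, but c is far from a and still lands on the same point of the real line.

```python
import math


def euclidean(p, q):
    """Stand-in for the distance measure D of the original space X."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def make_reference_embedding(r, D):
    """Return the 1D embedding F(x) = D(x, r) for reference object r."""
    def F(x):
        return D(x, r)
    return F


r = (0.0, 0.0)
F = make_reference_embedding(r, euclidean)
a, b, c = (3.0, 4.0), (3.0, 4.1), (-3.0, -4.0)

# F(r) is 0; a and b are similar and get similar values;
# c is far from a but has the same distance to r.
values = {"r": F(r), "a": F(a), "b": F(b), "c": F(c)}
print(values)
```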

F(x) = D(x, Lincoln)

F(Sacramento)....= 1543
F(Las Vegas).....= 1232
F(Oklahoma City).= 437
F(Washington DC).= 1207
F(Jacksonville)..= 1344

F(x) = (D(x, LA), D(x, Lincoln), D(x, Orlando))

F(Sacramento)....= ( 386, 1543, 2920)
F(Las Vegas).....= ( 262, 1232, 2405)
F(Oklahoma City).= (1345,  437, 1291)
F(Washington DC).= (2657, 1207,  853)
F(Jacksonville)..= (2422, 1344,  141)
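The city numbers can be used as a worked example. The sketch below copies the embedded coordinates from the slide and compares cities with the L1 distance (one of the options the lecture mentions later); the function names are assumptions. With Lincoln as the only reference object, Las Vegas looks closest to Washington DC, because both are about the same distance from Lincoln. With all three reference objects, its nearest neighbor becomes Sacramento.

```python
# Embedded coordinates from the slide: (D(x, LA), D(x, Lincoln), D(x, Orlando)).
EMBEDDED = {
    "Sacramento": (386, 1543, 2920),
    "Las Vegas": (262, 1232, 2405),
    "Oklahoma City": (1345, 437, 1291),
    "Washington DC": (2657, 1207, 853),
    "Jacksonville": (2422, 1344, 141),
}


def l1(u, v, dims=(0, 1, 2)):
    """L1 distance restricted to the chosen coordinates."""
    return sum(abs(u[i] - v[i]) for i in dims)


def nearest_city(city, dims=(0, 1, 2)):
    """Nearest other city in the embedded space."""
    others = (c for c in EMBEDDED if c != city)
    return min(others, key=lambda c: l1(EMBEDDED[city], EMBEDDED[c], dims))


three_refs = nearest_city("Las Vegas")           # LA, Lincoln, Orlando
one_ref = nearest_city("Las Vegas", dims=(1,))   # Lincoln only
print(three_refs, "|", one_ref)
```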

Embedding Hand Images

F(x) = (C(x, R1), C(x, R2), C(x, R3))

• x: hand image.
• C: chamfer distance.

[Figure: image x and reference images R1, R2, R3]

Basic Questions

F(x) = (C(x, R1), C(x, R2), C(x, R3))

• x: hand image.
• C: chamfer distance.

• How many prototypes?
• Which prototypes?
• What distance should we use to compare vectors?

[Figure: image x and reference images R1, R2, R3]

Some Easy Answers

F(x) = (C(x, R1), C(x, R2), C(x, R3))

• x: hand image.
• C: chamfer distance.

• How many prototypes? Pick the number manually.
• Which prototypes? Randomly chosen.
• What distance should we use to compare vectors? L1, or Euclidean.

[Figure: image x and reference images R1, R2, R3]

Filter-and-refine Retrieval

• Embedding step:
  – Compute the distances from the query q to the reference objects; these form F(q).
• Filter step:
  – Find the top p matches of F(q) in vector space.
• Refine step:
  – Measure the exact distance from q to the top p matches.
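The three steps can be sketched as follows. This is a toy version under stated assumptions: points in the plane with Euclidean distance stand in for images and the chamfer distance, vectors are compared with L1, the database vectors are computed offline, and all names are hypothetical.

```python
import heapq
import math


def euclidean(p, q):
    """Stand-in for the expensive exact distance."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def embed(x, references, D):
    """F(x): distances from x to each reference object."""
    return [D(x, r) for r in references]


def l1(u, v):
    return sum(abs(ui - vi) for ui, vi in zip(u, v))


def filter_and_refine(q, database, db_vectors, references, D, p, k=1):
    # Embedding step: len(references) exact distance computations.
    fq = embed(q, references, D)
    # Filter step: cheap vector comparisons against the whole database.
    candidates = heapq.nsmallest(
        p, range(len(database)), key=lambda i: l1(fq, db_vectors[i]))
    # Refine step: p exact distance computations.
    best = heapq.nsmallest(k, candidates, key=lambda i: D(q, database[i]))
    return [database[i] for i in best]


database = [(0, 0), (1, 1), (5, 5), (6, 5), (9, 0)]
references = [(0, 0), (9, 0)]
db_vectors = [embed(x, references, euclidean) for x in database]  # offline
result = filter_and_refine((5.2, 4.9), database, db_vectors,
                           references, euclidean, p=2)
print(result)
```

Only the embedding and refine steps call the exact distance, so the number of expensive computations per query is the number of reference objects plus p rather than the database size.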

Evaluating Embedding Quality

• Embedding step:
  – Compute the distances from the query q to the reference objects; these form F(q).
• Filter step:
  – Find the top p matches of F(q) in vector space.
• Refine step:
  – Measure the exact distance from q to the top p matches.

• How often do we find the true nearest neighbor?
• How many exact distance computations do we need?
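Both questions can be measured on a set of test queries. The sketch below is a toy setup, not the evaluation code behind the reported results: it assumes that the cost per query is the number of reference objects plus p, and it uses plane points with Euclidean distance in place of hand images and the chamfer distance. All names are hypothetical.

```python
import math


def euclidean(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def l1(u, v):
    return sum(abs(ui - vi) for ui, vi in zip(u, v))


def evaluate(queries, database, references, D, p):
    """Return (accuracy, exact distances per query) for filter-and-refine.

    Accuracy is the fraction of queries whose true nearest neighbor
    survives the filter step, i.e. is among the top p vector matches.
    """
    db_vectors = [[D(x, r) for r in references] for x in database]
    hits = 0
    for q in queries:
        true_nn = min(range(len(database)), key=lambda i: D(q, database[i]))
        fq = [D(q, r) for r in references]
        ranked = sorted(range(len(database)),
                        key=lambda i: l1(fq, db_vectors[i]))
        if true_nn in ranked[:p]:
            hits += 1
    # Assumed cost model: embedding step plus refine step.
    return hits / len(queries), len(references) + p


database = [(0, 0), (1, 1), (5, 5), (6, 5), (9, 0)]
references = [(0, 0), (9, 0)]
queries = [(5.2, 4.9), (0.5, 0.2)]
accuracy, cost = evaluate(queries, database, references, euclidean, p=1)
print(accuracy, cost)
```

Raising p can only keep or raise accuracy, at the price of more exact distances, which is the trade-off shown in the results tables.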

Results: Chamfer Distance on Hand Images

• Database: 107,328 images.
• Brute force retrieval time: 260 seconds.

[Figure: a query image and its nearest neighbor in the database]

Results: Chamfer Distance on Hand Images

• Query set: 710 real images of hands.
• Database: 80,640 synthetic images of hands.

                  Brute Force   Embeddings   Embeddings
Accuracy          100%          95%          100%
# of distances    80640         1866         24650
Sec. per query    112           2.6          34
Speed-up factor   1             43           3.27

Ideal Embedding Behavior

• Notation: NN(q) is the nearest neighbor of q.
• For any q: if a = NN(q), we want F(a) = NN(F(q)).

[Figure: F maps q and a from the original space X to R^d]

A Quantitative Measure

• If b is not the nearest neighbor of q, F(q) should be closer to F(NN(q)) than to F(b).
• For how many triples (q, NN(q), b) does F fail?
• In the pictured example, F fails on five triples.

[Figure: F maps q, a, and b from the original space X to R^d]
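Counting failed triples is straightforward once F and a vector distance are fixed. The sketch below uses a 1D reference-object embedding on toy plane points; treating ties as failures is an assumption of this sketch, and all names are hypothetical.

```python
import math


def euclidean(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def count_failed_triples(queries, database, D, F, vec_dist):
    """Count triples (q, NN(q), b) on which the embedding F fails.

    F fails when F(q) is not strictly closer to F(NN(q)) than to F(b).
    """
    failures = total = 0
    for q in queries:
        nn = min(range(len(database)), key=lambda i: D(q, database[i]))
        to_nn = vec_dist(F(q), F(database[nn]))
        for i, b in enumerate(database):
            if i == nn:
                continue
            total += 1
            if to_nn >= vec_dist(F(q), F(b)):  # ties count as failures
                failures += 1
    return failures, total


# 1D embedding with the origin as reference object.
F = lambda x: euclidean(x, (0.0, 0.0))
vec_dist = lambda u, v: abs(u - v)

database = [(3.0, 4.0), (-3.0, -3.6), (10.0, 0.0)]
failures, total = count_failed_triples(
    [(3.0, 3.5)], database, euclidean, F, vec_dist)
print(failures, "of", total)
```

Here the point (-3.0, -3.6) is far from the query but almost as far from the origin as the query is, so the 1D embedding ranks it ahead of the true nearest neighbor.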

Embeddings Seen As Classifiers

• Classification task: is q closer to a or to b?
• Any embedding F defines a classifier F’(q, a, b).
• F’ checks if F(q) is closer to F(a) or to F(b).

[Figure: objects q, a, and b]

Classifier Definition

• Classification task: is q closer to a or to b?
• Given embedding F: X → R^d: F’(q, a, b) = ||F(q) – F(b)|| – ||F(q) – F(a)||.
• F’(q, a, b) > 0 means “q is closer to a.”
• F’(q, a, b) < 0 means “q is closer to b.”
• Goal: build an F such that F’ has low error rate on triples of type (q, NN(q), b).
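The classifier F’ can be tried on the city example from earlier in the lecture. The sketch below reuses three rows of that table and uses the L1 norm for ||·||, which is an assumption; the helper names are hypothetical. With all three reference objects, F’ correctly says Las Vegas is closer to Sacramento than to Jacksonville. With Lincoln alone, the 1D classifier gets this triple wrong.

```python
# Embedded coordinates from the city slide: (to LA, to Lincoln, to Orlando).
EMBEDDED = {
    "Sacramento": (386, 1543, 2920),
    "Las Vegas": (262, 1232, 2405),
    "Jacksonville": (2422, 1344, 141),
}


def l1(u, v):
    return sum(abs(ui - vi) for ui, vi in zip(u, v))


def triple_classifier(F, vec_dist):
    """Build F'(q, a, b) = ||F(q) - F(b)|| - ||F(q) - F(a)|| from F."""
    def F_prime(q, a, b):
        return vec_dist(F(q), F(b)) - vec_dist(F(q), F(a))
    return F_prime


F3 = lambda city: EMBEDDED[city]           # three reference objects
F1 = lambda city: (EMBEDDED[city][1],)     # Lincoln only (1D embedding)

score_3d = triple_classifier(F3, l1)("Las Vegas", "Sacramento", "Jacksonville")
score_1d = triple_classifier(F1, l1)("Las Vegas", "Sacramento", "Jacksonville")
print(score_3d, score_1d)  # positive: "closer to a"; negative: "closer to b"
```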

1D Embeddings as Weak Classifiers

• 1D embeddings define weak classifiers.
  – Better than a random classifier (50% error rate).
• We can define lots of different classifiers.
  – Every object in the database can be a reference object.
• Question: how do we combine many such classifiers into a single strong classifier?
• Answer: use AdaBoost.
  – AdaBoost is a machine learning method designed for exactly this problem.

Using AdaBoost

[Figure: 1D embeddings F1, F2, …, Fn map the original space X to the real line]

• Output: H = w1F’1 + w2F’2 + … + wdF’d.
• AdaBoost chooses 1D embeddings and weighs them.
• Goal: achieve low classification error.
• AdaBoost trains on triples chosen from the database.
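To make the training loop concrete, here is a simplified sketch. It is not the procedure used in the lecture: it runs plain discrete AdaBoost, in which each weak classifier outputs only the sign of F’_r for a reference object r, whereas the output H above weights the real-valued F’_i. It also precomputes all pairwise distances, which is feasible only at toy scale. The function name, the tie handling, and the random plane points are assumptions.

```python
import math
import random


def train_boosted_embedding(objects, D, triples, rounds):
    """Pick reference objects and weights with discrete AdaBoost.

    triples: list of index triples (q, a, b) into `objects`.
    Returns a list of (reference index, weight) pairs.
    """
    n = len(objects)
    dist = [[D(x, y) for y in objects] for x in objects]  # toy scale only
    labels = [1 if dist[q][a] < dist[q][b] else -1 for q, a, b in triples]

    def weak(r, q, a, b):
        # Sign of F'_r(q, a, b) for the 1D embedding F_r(x) = D(x, r).
        v = abs(dist[q][r] - dist[b][r]) - abs(dist[q][r] - dist[a][r])
        return 1 if v > 0 else -1

    outputs = [[weak(r, q, a, b) for q, a, b in triples] for r in range(n)]
    weights = [1.0 / len(triples)] * len(triples)
    chosen = []
    for _ in range(rounds):
        # Weighted error of every candidate reference object.
        errs = [sum(w for w, h, y in zip(weights, outputs[r], labels) if h != y)
                for r in range(n)]
        r = min(range(n), key=errs.__getitem__)
        err = min(max(errs[r], 1e-10), 1 - 1e-10)
        if err >= 0.5:  # no candidate beats a random classifier
            break
        alpha = 0.5 * math.log((1 - err) / err)
        chosen.append((r, alpha))
        # Increase the weight of misclassified triples, then renormalize.
        weights = [w * math.exp(-alpha * y * h)
                   for w, h, y in zip(weights, outputs[r], labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return chosen


# Toy usage with random plane points (hypothetical data).
rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(30)]
plane_dist = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
toy_triples = [tuple(rng.sample(range(30), 3)) for _ in range(200)]
model = train_boosted_embedding(points, plane_dist, toy_triples, rounds=5)
print(model)
```

The same reference object may be chosen in more than one round; a fuller implementation would merge such repeats into a single coordinate.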

From Classifier to Embedding

• AdaBoost output: H = w1F’1 + w2F’2 + … + wdF’d.
• What embedding should we use? What distance measure should we use?
• BoostMap embedding: F(x) = (F1(x), …, Fd(x)).
• Distance measure: $D((u_1, \dots, u_d), (v_1, \dots, v_d)) = \sum_{i=1}^{d} w_i |u_i - v_i|$.
• Claim: Let q be closer to a than to b. H misclassifies triple (q, a, b) if and only if, under distance measure D, F maps q closer to b than to a.

Proof

$$
\begin{aligned}
H(q, a, b) &= \sum_{i=1}^{d} w_i F'_i(q, a, b) \\
&= \sum_{i=1}^{d} w_i \left( |F_i(q) - F_i(b)| - |F_i(q) - F_i(a)| \right) \\
&= \sum_{i=1}^{d} \left( w_i |F_i(q) - F_i(b)| - w_i |F_i(q) - F_i(a)| \right) \\
&= D(F(q), F(b)) - D(F(q), F(a)) = F'(q, a, b)
\end{aligned}
$$
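The identity can be checked numerically. The sketch below uses made-up integer weights and embedded vectors (all values are hypothetical) and confirms that the weighted sum of 1D classifier outputs equals the difference of the two weighted L1 distances.

```python
def weighted_l1(u, v, w):
    """D(u, v) = sum_i w_i * |u_i - v_i|."""
    return sum(wi * abs(ui - vi) for wi, ui, vi in zip(w, u, v))


def H(Fq, Fa, Fb, w):
    """H(q, a, b) = sum_i w_i * F'_i(q, a, b), from embedded vectors."""
    return sum(wi * (abs(qi - bi) - abs(qi - ai))
               for wi, qi, ai, bi in zip(w, Fq, Fa, Fb))


# Hypothetical weights and embedded vectors F(q), F(a), F(b).
w = (2, 1, 3)
Fq, Fa, Fb = (1, 5, 2), (2, 4, 2), (6, 5, 0)

lhs = H(Fq, Fa, Fb, w)
rhs = weighted_l1(Fq, Fb, w) - weighted_l1(Fq, Fa, w)
print(lhs, rhs)
```

Both sides are positive here, so H and the embedding under D agree that q is closer to a.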

Significance of Proof

• AdaBoost optimizes a direct measure of embedding quality.
• We have converted a database indexing problem into a machine learning problem.

Results: Chamfer Distance on Hand Images

• Database: 80,640 images.
• Brute force retrieval time: 112 seconds.

[Figure: a query image and its nearest neighbor in the database]

Results: Chamfer Distance on Hand Images

• Query set: 710 real images of hands.
• Database: 80,640 synthetic images of hands.

                  Brute Force   Random Reference Objects   BoostMap
Accuracy          100%          95%                        95%
# of distances    80640         1866                       450
Sec. per query    112           2.6                        0.63
Speed-up factor   1             43                         179

Results: Chamfer Distance on Hand Images

• Query set: 710 real images of hands.
• Database: 80,640 synthetic images of hands.

                  Brute Force   Random Reference Objects   BoostMap
Accuracy          100%          100%                       100%
# of distances    80640         24950                      5995
Sec. per query    112           34                         13.5
Speed-up factor   1             3.23                       8.3