Metric Inverted - An efficient inverted indexing method for metric spaces Benjamin Sznajder Jonathan Mamou Yosi Mass Michal Shmueli-Scheuer IBM Research - Haifa Presented by: Shai Erera

Dec 18, 2015

Transcript
Page 1

Metric Inverted - An efficient inverted indexing method for metric spaces

Benjamin Sznajder, Jonathan Mamou, Yosi Mass, Michal Shmueli-Scheuer

IBM Research - Haifa

Presented by: Shai Erera

Page 2

Outline

Motivation
Problem Definition
Metric Inverted Index
Retrieval
Experiments
Conclusions

Page 4

Motivation

Web 2.0 enables mass multimedia production

Still, search is limited to manually added metadata

State-of-the-art solutions for CBIR (Content-Based Image Retrieval) do not scale
– Query cost grows linearly with the collection size because of the large number of distance computations

Can we use text IR methods to scale up CBIR?

Page 6

Problem definition

Low-level image features can be generalized to metric spaces

Metric space: an ordered pair (S, d), where S is a domain and d is a distance function d: S × S → R such that
– d satisfies non-negativity, reflexivity, symmetry and the triangle inequality

The best-k results for a query in a metric space are the k objects with the smallest distance to the query
– Distances are converted to scores in [0,1] (small distance means high score)
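
A minimal sketch of the two ideas on this slide, assuming Euclidean distance as the metric and a linear distance-to-score mapping. Neither choice is stated in the slides, and the names `euclidean`, `distance_to_score`, `best_k` and the `max_dist` bound are illustrative only.

```python
import math

def euclidean(a, b):
    """Euclidean distance: non-negative, symmetric, obeys the triangle inequality."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_to_score(dist, max_dist):
    """ASSUMPTION: linear mapping, distance 0 -> score 1, distance >= max_dist -> score 0."""
    if max_dist <= 0:
        raise ValueError("max_dist must be positive")
    return 1.0 - min(dist, max_dist) / max_dist

def best_k(query, objects, k, dist=euclidean):
    """Exhaustive best-k: the k objects with the smallest distance to the query."""
    return sorted(objects, key=lambda obj: dist(query, obj))[:k]

points = [(3.0, 4.0), (1.0, 0.0), (0.0, 2.0)]
nearest = best_k((0.0, 0.0), points, k=2)
```

Any other monotonically decreasing mapping into [0,1] would preserve the same ranking within a single metric space.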

Page 7

Problem definition

Top-K Problem:
– Assume m metric spaces, a query Q, an aggregate function f and score functions sd_1(), …, sd_m()
– Retrieve the k objects D with the highest f(sd_1(Q,D), sd_2(Q,D), …, sd_m(Q,D))

[Figure: query point q with k=5]
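
As a reference point, the Top-K problem can be solved exhaustively by scoring every object. The sketch below assumes each object is a dict from feature name to value, that the per-feature score functions are supplied by the caller, and that f is a plain average. The slides fix none of these, and all names here are hypothetical.

```python
import heapq

def aggregate_score(query, obj, score_fns, f):
    """Compute f(sd_1(Q,D), ..., sd_m(Q,D)) for one object."""
    per_feature = [score_fns[name](query[name], obj[name]) for name in score_fns]
    return f(per_feature)

def top_k_exhaustive(query, collection, score_fns, f, k):
    """Score every object in the collection and keep the k best (linear scan)."""
    scored = ((aggregate_score(query, obj, score_fns, f), obj_id)
              for obj_id, obj in collection.items())
    return heapq.nlargest(k, scored)

# Hypothetical one-dimensional features whose scores already lie in [0, 1].
def closeness(a, b):
    return 1.0 - min(abs(a - b), 1.0)

def mean(scores):
    return sum(scores) / len(scores)

score_fns = {"F1": closeness, "F2": closeness}
collection = {
    "d1": {"F1": 0.0, "F2": 0.0},
    "d2": {"F1": 0.5, "F2": 0.5},
    "d3": {"F1": 1.0, "F2": 1.0},
}
query = {"F1": 0.0, "F2": 0.0}
result = top_k_exhaustive(query, collection, score_fns, mean, k=2)
```

This scan performs m score computations per object, so its cost grows linearly with the collection size.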

Page 9

Metric Inverted Index

Assume a collection of objects, each having m features
– Object D = {F1:v1, F2:v2, …, Fm:vm}
– m metric spaces

Indexing steps
– Lexicon creation (select candidates)
– Invert objects (canonization to lexicon terms)

Page 10

Metric inverted indexing – Lexicon creation

The number of different features is too large

Need to select candidates
– Naïve solution: a lexicon of fixed size l. Randomly select l/m documents and extract their features; these l features form our lexicon
– Improvement: replace the random choice by clustering (K-Means etc.)

Keep the lexicon in an M-Tree structure
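
The naïve lexicon creation can be sketched as follows, assuming every object carries the same m features and that the collection is a dict of object id to feature dict. The lexicon is kept in plain lists here rather than in an M-Tree, and the clustering variant is not shown. `build_lexicon` and its parameters are illustrative names.

```python
import random

def build_lexicon(collection, lexicon_size, seed=0):
    """Naive lexicon: sample l/m documents and keep their feature values as terms.

    collection: dict of object id -> {feature name: feature value}
    Returns a dict of feature name -> list of lexicon terms for that feature.
    """
    feature_names = sorted(next(iter(collection.values())))
    m = len(feature_names)
    n_docs = min(lexicon_size // m, len(collection))
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    sampled_ids = rng.sample(sorted(collection), n_docs)
    lexicon = {name: [] for name in feature_names}
    for obj_id in sampled_ids:
        for name in feature_names:
            lexicon[name].append(collection[obj_id][name])
    return lexicon

# Hypothetical collection: 10 objects, m = 2 features each.
collection = {f"d{i}": {"F1": (float(i),), "F2": (float(i), 1.0)} for i in range(10)}
lexicon = build_lexicon(collection, lexicon_size=6)
```

With l = 6 and m = 2, three documents are sampled and each contributes one term per feature.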

Page 11

Metric inverted indexing – invert objects

Given object D = {F1:v1, F2:v2, …, Fm:vm}

Canonization – map features (Fi:vi) to lexicon entries
– For each feature select the n nearest lexicon terms
– D’ = {F1:v11, F1:v12, …, F1:v1n,
        F2:v21, F2:v22, …, F2:v2n,
        …,
        Fm:vm1, Fm:vm2, …, Fm:vmn}

Index D’ in the relevant posting-lists
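
A sketch of canonization and inversion under stated assumptions: feature values are numeric tuples compared with Euclidean distance, the lexicon is a dict of feature name to list of terms, and a linear scan over the lexicon stands in for the M-Tree nearest-neighbour lookup. The names `canonize` and `build_index` are hypothetical.

```python
from collections import defaultdict
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def canonize(obj, lexicon, n, dist=euclidean):
    """Map each feature value to the indices of its n nearest lexicon terms."""
    terms = []
    for name, value in obj.items():
        ranked = sorted(range(len(lexicon[name])),
                        key=lambda i: dist(value, lexicon[name][i]))
        terms.extend((name, i) for i in ranked[:n])
    return terms

def build_index(collection, lexicon, n):
    """Posting lists: (feature name, lexicon term index) -> set of object ids."""
    postings = defaultdict(set)
    for obj_id, obj in collection.items():
        for term in canonize(obj, lexicon, n):
            postings[term].add(obj_id)
    return postings

lexicon = {"F1": [(0.0,), (1.0,), (5.0,)]}
collection = {"d1": {"F1": (0.2,)}, "d2": {"F1": (4.0,)}}
index = build_index(collection, lexicon, n=2)
```

Each object ends up in m*n posting lists, one per selected lexicon term.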

Page 13

Retrieval stage – term selection

Given Q = {F1:qv1, F2:qv2, …, Fm:qvm}

Canonization
– For each feature select the n nearest lexicon terms
– Q’ = {F1:qv11, F1:qv12, …, F1:qv1n,
        F2:qv21, F2:qv22, …, F2:qv2n,
        …,
        Fm:qvm1, Fm:qvm2, …, Fm:qvmn}

Page 14

Retrieval stage – Boolean Filtering

These m*n posting-lists will be queried via a Boolean Query

Two possible modes:
– Strict-query-mode:
– Fuzzy-query-mode:
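
The definitions of the two modes did not survive in this transcript. The sketch below is only one plausible reading, and it is an assumption rather than the authors' definition: strict mode requires a match on at least one query term for every feature, while fuzzy mode accepts a match on any query term. `postings` and `query_terms` are hypothetical names for a (feature, term) -> object ids layout.

```python
def boolean_filter(postings, query_terms, mode="fuzzy"):
    """Return candidate object ids for the canonized query terms.

    postings: dict of (feature name, term id) -> set of object ids
    query_terms: list of (feature name, term id) pairs from the canonized query
    """
    per_feature = {}
    for feature, term_id in query_terms:
        matches = postings.get((feature, term_id), set())
        per_feature.setdefault(feature, set()).update(matches)
    if not per_feature:
        return set()
    groups = list(per_feature.values())
    if mode == "strict":
        # ASSUMPTION: AND across features of (OR within a feature).
        return set.intersection(*groups)
    if mode == "fuzzy":
        # ASSUMPTION: OR over all m*n posting lists.
        return set.union(*groups)
    raise ValueError(f"unknown mode: {mode}")

postings = {
    ("F1", 0): {"d1", "d2"},
    ("F1", 1): {"d3"},
    ("F2", 0): {"d2"},
    ("F2", 1): {"d4"},
}
query_terms = [("F1", 0), ("F1", 1), ("F2", 0), ("F2", 1)]
strict = boolean_filter(postings, query_terms, mode="strict")
fuzzy = boolean_filter(postings, query_terms, mode="fuzzy")
```

Under this reading the strict candidate set is always a subset of the fuzzy one, so strict mode trades recall for fewer objects to score.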

Page 15

Retrieval stage – Scoring

Documents retrieved by the Boolean Query are fully scored

Return the best k objects with the highest aggregate score f(sd_1(Q,D),sd_2(Q,D),… ,sd_m(Q,D))
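
A sketch of this final step, assuming the Boolean filter has already produced a set of candidate ids and that `aggregate` is a hypothetical callable returning f(sd_1(Q,D), …, sd_m(Q,D)) for one object. Only the candidates are fully scored, not the whole collection.

```python
import heapq

def score_candidates(query, candidates, collection, aggregate, k):
    """Fully score only the filtered candidates and return the k best as (score, id)."""
    scored = ((aggregate(query, collection[obj_id]), obj_id) for obj_id in candidates)
    return heapq.nlargest(k, scored)

# Hypothetical aggregate over a single numeric feature "F1".
def aggregate(query, obj):
    return 1.0 - min(abs(query["F1"] - obj["F1"]), 1.0)

collection = {
    "d1": {"F1": 0.0},
    "d2": {"F1": 0.25},
    "d3": {"F1": 1.0},
    "d4": {"F1": 0.0},
}
candidates = {"d2", "d3"}  # d1 and d4 did not pass the filter, so they are never scored
top = score_candidates({"F1": 0.0}, candidates, collection, aggregate, k=1)
```

Objects dropped by the filter can never be returned, which is why the result approximates the exact top-k rather than reproducing it.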

Page 17

Experiments

Focus on:
– Efficiency
– Effectiveness

Collection of 160,000 images from Flickr

3 features are extracted from each image
– EdgeHistogram, ScalableColor and ColorLayout

180 queries
– Fuzzy-Query-Mode
– Sampled from the collection of images

Compared against the M-Tree data structure

Page 18

Experiments – Measures Used

Effectiveness: MAP (Mean Average Precision) is a natural candidate measure
– Problem: in image retrieval, no document is irrelevant
– Solution: we define as relevant the k highest-scored documents in the collection (according to the M-Tree computation)
– MAP@K: MAP computed on relevant and retrieved lists of size k
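
One way to compute MAP@K as described, assuming the exact top-k list serves as the relevant set and that average precision is normalized by the number of relevant objects. The slides do not spell out the normalization, so that part is an assumption, and the function names are illustrative.

```python
def average_precision_at_k(retrieved, relevant, k):
    """AP@k of a ranked `retrieved` list against a `relevant` list of exact results."""
    relevant_set = set(relevant[:k])
    if not relevant_set:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, obj_id in enumerate(retrieved[:k], start=1):
        if obj_id in relevant_set:
            hits += 1
            precision_sum += hits / rank
    # ASSUMPTION: normalize by the number of relevant objects (k here).
    return precision_sum / len(relevant_set)

def map_at_k(runs, k):
    """Mean of AP@k over (retrieved, relevant) pairs, one pair per query."""
    return sum(average_precision_at_k(r, rel, k) for r, rel in runs) / len(runs)

exact = ["d1", "d2", "d3"]    # e.g. the exact top-3 from the M-Tree
approx = ["d1", "d3", "d9"]   # the approximate top-3 being evaluated
score = map_at_k([(approx, exact), (exact, exact)], k=3)
```

A retrieved list identical to the exact list scores 1.0, so MAP@K measures how closely the approximate ranking matches the exact one.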

Page 19

Experiments – Measures Used contd.

Efficiency: we count the number of distance computations per query
– A computation unit (cu) is one distance computation call between two feature values
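
Computation units can be counted by wrapping the distance function, as in this sketch. `CountingDistance` is a hypothetical helper, and Euclidean distance is assumed only for the example.

```python
import math

class CountingDistance:
    """Wrap a distance function and count how many times it is called."""

    def __init__(self, dist):
        self.dist = dist
        self.calls = 0

    def __call__(self, a, b):
        self.calls += 1
        return self.dist(a, b)

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

counted = CountingDistance(euclidean)
query = (0.0, 0.0)
values = [(1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
# One computation unit is spent per compared feature value.
nearest = min(values, key=lambda v: counted(query, v))
```

Resetting `counted.calls` to zero before each query gives a per-query count that is independent of hardware and implementation speed.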

Page 20

Effectiveness

[Figure: MAP vs. number of nearest terms, lexicon size = 12000. x-axis: # nearest terms (5 to 30); y-axis: MAP (0.3 to 1); series: K=10, K=20, K=30]

Page 21

Effectiveness

[Figure: MAP vs. lexicon size, number of nearest terms = 30. x-axis: lexicon size (3000, 12000, 24000, 48000); y-axis: MAP (0.8 to 1); series: K=10, K=20, K=30]

Page 22

Effectiveness vs. Efficiency

[Figure: MAP vs. number of comparisons, number of nearest terms = 30. x-axis: # comparisons (0 to 200000); y-axis: MAP (0.86 to 1); series: lexicon = 3000, 12000, 24000, 48000]

Page 23

M-Tree vs. Metric Inverted

[Figure: number of comparisons vs. top-k, number of nearest terms = 30. x-axis: top-k (5, 10, 20, 30, 40); y-axis: # comparisons (0 to 500000); series: M-Tree, MII with lexicon = 3000, 12000, 24000, 48000]

Page 25

Conclusions

We reduce the gap between Text IR and Multimedia Retrieval

Our method achieves a very good approximation (MAP = 98%)

Our method drastically improves efficiency (90%) over state-of-the-art methods