Transcript
Page 1: 20140327 - Hashing Object Embedding


Hashing: Object Embedding

Reporter: Xu Jiaming (Ph.D. Student) Date: 2014.03.27

Computational-Brain Research Center

Institute of Automation, Chinese Academy of Sciences

Report

Page 2: 20140327 - Hashing Object Embedding


First, What is Embedding?

[Source]: https://en.wikipedia.org/wiki/Embedding

When some object X is said to be embedded in another object Y, the embedding is given by some injective and structure-preserving map f : X → Y. The precise meaning of "structure-preserving" depends on the kind of mathematical structure of which X and Y are instances.

Structure-Preserving in IR:

f : X → Y,   Sim(X_1, X_2) ≈ Sim(Y_1, Y_2)
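A minimal sketch of this similarity-preserving property, assuming a random linear map as the embedding f (the 2000-dimensional source echoes the later "Overview of Hashing" slide; the 256-dimensional target and the tolerance are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embedding f: R^2000 -> R^256 given by a random linear map.
# Johnson-Lindenstrauss-style arguments say such a map approximately
# preserves pairwise cosine similarity.
d, k = 2000, 256
f = rng.standard_normal((d, k)) / np.sqrt(k)

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

x1, x2 = rng.standard_normal(d), rng.standard_normal(d)
y1, y2 = x1 @ f, x2 @ f

# Sim(X1, X2) and Sim(Y1, Y2) should be close, not identical.
print(abs(cos(x1, x2) - cos(y1, y2)) < 0.2)
```

The map is injective with probability one but only approximately structure-preserving; hashing accepts exactly this kind of controlled distortion.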

Page 3: 20140327 - Hashing Object Embedding


Then, What is Hash?

[Source]: https://en.wikipedia.org/wiki/Hash_table

The hash function will assign each key to a unique bucket, but this situation is rarely achievable in practice (usually some keys will hash to the same bucket). Instead, most hash table designs assume that hash collisions—different keys that are assigned by the hash function to the same bucket—will occur and must be accommodated in some way.
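The collision handling described above can be sketched with separate chaining, one of several accommodation strategies (the class below is a toy illustration, not a production hash table):

```python
# A minimal chained hash table: collisions (different keys assigned to
# the same bucket) are accommodated by storing (key, value) pairs
# in a per-bucket list.
class ChainedHashTable:
    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        b = self._bucket(key)
        for i, (k, _) in enumerate(b):
            if k == key:
                b[i] = (key, value)   # key already present: update it
                return
        b.append((key, value))        # collision: chain in the same bucket

    def get(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)

t = ChainedHashTable(n_buckets=2)     # tiny table to force collisions
for i in range(6):
    t.put(f"key{i}", i)
print(t.get("key3"))  # -> 3
```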

Page 4: 20140327 - Hashing Object Embedding


Combine the Two Properties

[1998, Piotr Indyk, cited: 1847]

Locality Sensitive Hashing

if D(p, q) ≤ r,        then Pr[h(p) = h(q)] ≥ p_1;
if D(p, q) > (1 + ε)r, then Pr[h(p) = h(q)] ≤ p_2   (with p_1 > p_2).
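A quick empirical check of this property, using random-hyperplane hashing for cosine similarity (a standard LSH family for angular distance; the dimensions and hash counts here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_hashes = 50, 2000

# Random-hyperplane LSH: h(x) = sign(w . x) for a random direction w,
# so Pr[h(p) = h(q)] = 1 - angle(p, q) / pi.
W = rng.standard_normal((n_hashes, d))

def collision_rate(p, q):
    return float(np.mean(np.sign(W @ p) == np.sign(W @ q)))

p = rng.standard_normal(d)
near = p + 0.1 * rng.standard_normal(d)   # small perturbation of p
far = rng.standard_normal(d)              # unrelated point

# Nearby points collide much more often than distant ones.
print(collision_rate(p, near) > collision_rate(p, far))
```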

Page 5: 20140327 - Hashing Object Embedding


Overview of Hashing

Real World → Binary Space: a binary reduction maps, e.g., 2000 real values to a 32-bit code.

Page 6: 20140327 - Hashing Object Embedding


Facing Big Data

Approximation

Page 7: 20140327 - Hashing Object Embedding


Learning to Hash

Description      Methods
Data-Oblivious   LSH, Kernel-LSH, SimHash, …
Data-Aware       LSI, RBM, SpH, STH, …

Page 8: 20140327 - Hashing Object Embedding


Data-Oblivious: SimHash [WWW.2007]

Step 1: Compute TF-IDF — extract the observed features of the text and weight them: W1, W2, …, Wn.

Step 2: Hash Function — hash each feature to a b-bit binary code, e.g. W1 → 100110, W2 → 110000, …, Wn → 001001.

Step 3: Signature — in each bit position write +Wi where the feature's bit is 1 and −Wi where it is 0, e.g. W1, −W1, −W1, W1, W1, −W1 for W1 = 100110.

Step 4: Sum — add the signed weights column-wise: 13, 108, −22, −5, −32, 55.

Step 5: Generate Fingerprint — keep only the sign of each sum (1 if positive, 0 otherwise): 1, 1, 0, 0, 0, 1.
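The five steps above can be sketched as follows (raw term frequency stands in for TF-IDF, and MD5 supplies the per-term hash; both are simplifications of the WWW 2007 scheme):

```python
from collections import Counter
import hashlib

def simhash(tokens, bits=64):
    """SimHash sketch following the five steps above."""
    weights = Counter(tokens)                        # Step 1 (TF only)
    sums = [0.0] * bits
    for term, w in weights.items():
        # Step 2: hash each feature to a bits-long binary code
        h = int.from_bytes(hashlib.md5(term.encode()).digest()[:bits // 8],
                           "big")
        for i in range(bits):                        # Steps 3-4: signed
            sums[i] += w if (h >> i) & 1 else -w     # signature, summed
    # Step 5: fingerprint = sign pattern of the column sums
    return sum(1 << i for i, s in enumerate(sums) if s > 0)

a = simhash("the quick brown fox jumps over the lazy dog".split())
b = simhash("the quick brown fox jumped over the lazy dog".split())
c = simhash("completely unrelated text about hashing papers".split())

hd = lambda x, y: bin(x ^ y).count("1")              # Hamming distance
print(hd(a, b) < hd(a, c))  # similar texts -> closer fingerprints
```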

Page 9: 20140327 - Hashing Object Embedding


Data-Aware: Spectral Hashing [NIPS.2008]

min:  Σ_ij S_ij ‖y_i − y_j‖²
s.t.  y_i ∈ {−1, 1}^k,  Σ_i y_i = 0,  (1/n) Σ_i y_i y_iᵀ = I

In matrix form, relaxed as a Laplacian Eigenmap problem:

min:  trace(Yᵀ(D − W)Y)
s.t.  Y(i, j) ∈ {−1, 1},  Yᵀ1 = 0,  YᵀY = I

with a linear embedding XW = Y.
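A sketch of the relaxed problem on a small Gaussian-affinity graph: drop the {−1, 1} constraint, solve the Laplacian eigenproblem, and binarize by sign. This is a common heuristic reading of the formulation, omitting the NIPS 2008 paper's out-of-sample machinery; the data and affinity bandwidth are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)

X = rng.standard_normal((40, 5))                      # 40 points in R^5
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise sq. dists
W = np.exp(-D2 / D2.mean())                           # affinity (S above)
np.fill_diagonal(W, 0.0)
D = np.diag(W.sum(axis=1))
L = D - W                                             # graph Laplacian

# Relaxed solution: eigenvectors of L with the smallest eigenvalues.
evals, evecs = np.linalg.eigh(L)
k = 3
Y = evecs[:, 1:k + 1]          # skip the trivial constant eigenvector
codes = (Y > 0).astype(int)    # k-bit codes via sign thresholding
print(codes.shape)             # -> (40, 3)
```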

Page 10: 20140327 - Hashing Object Embedding


Some Questions?

1. Can we obtain hashing codes by binarizing the real-valued low-dimensional vectors such as LSI?

2. Can we get hashing codes by Deep Learning approaches such as RBM, or AutoEncoder?

Page 11: 20140327 - Hashing Object Embedding


Some Questions?

1. Can we obtain hashing codes by binarizing the real-valued low-dimensional vectors such as LSI?

Of Course ! [R. Salakhutdinov, G. Hinton. Semantic Hashing, SIGIR2007]

2. Can we get hashing codes by Deep Learning approaches such as RBM, or AutoEncoder?

No Problem ! [R. Salakhutdinov, G. Hinton. Semantic Hashing, SIGIR2007]

Page 12: 20140327 - Hashing Object Embedding


In 2013, What Did They Think About?

Total: 30

Page 13: 20140327 - Hashing Object Embedding


1/9 - ICML2013:

Title: Learning Hash Functions Using Column Generation. Authors: Xi Li, Guosheng Lin, Chunhua Shen, Anton van den Hengel, Anthony Dick. Organization: The University of Adelaide (Australia). Based On: NIPS2005: Distance Metric Learning for Large Margin Nearest Neighbor Classification.

Motivation: In content based image retrieval, to collect feedback, users may be required to report whether image x looks more similar to x+ than it is to a third image x−. This task is typically much easier than to label each individual image.

min_{w, ξ}:  ‖w‖_1 + C Σ_i ξ_i
s.t.  w ≥ 0,  ξ ≥ 0;
      d_H(x_i, x_i^-) − d_H(x_i, x_i^+) ≥ 1 − ξ_i,  ∀i
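A toy check of the large-margin triplet constraint above, assuming a weighted Hamming distance with hand-picked nonnegative weights and 4-bit codes (all values here are made up for illustration):

```python
import numpy as np

w = np.array([0.5, 1.0, 0.25, 1.0])          # nonnegative bit weights
x, xp, xn = (np.array([1, -1, 1, 1]),
             np.array([1, -1, -1, 1]),       # near neighbor x+
             np.array([-1, 1, -1, 1]))       # far point x-

def d_H(a, b, w):
    return float(w @ (a != b))               # weighted Hamming distance

# The constraint asks x- to be at least a unit margin farther than x+;
# the slack xi_i is the hinge violation of that margin.
margin = d_H(x, xn, w) - d_H(x, xp, w)
xi = max(0.0, 1.0 - margin)
print(round(margin, 2), xi)  # -> 1.5 0.0
```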

Page 14: 20140327 - Hashing Object Embedding


2/9 - ICML2013:

Title: Predictable Dual-View Hashing. Authors: Mohammad Rastegari, Jonghyun Choi, Shobeir Fakhraei, Hal Daume III, Larry S. Davis. Organization: The University of Maryland (USA).

Motivation: It is often the case that information about data are available from two or more views, e.g., images and their textual descriptions. It is highly desirable to embed information from both domains in the binary codes, to increase search and retrieval capabilities.

min_{Y, W_T, W_V}:  ‖W_Tᵀ X_T − Y‖² + ‖Y Yᵀ − I‖² + ‖W_Vᵀ X_V − Y‖² + ‖Y Yᵀ − I‖²
s.t.  Y = sgn(W_Tᵀ X_T),  Y = sgn(W_Vᵀ X_V)

Page 15: 20140327 - Hashing Object Embedding


3/9 - SIGIR2013:

Title: Semantic Hashing Using Tags and Topic Modeling. Authors: Qifan Wang, Dan Zhang, Luo Si. Organization: Purdue University (USA).

Motivation: Two major issues are not addressed in the existing hashing methods: (1) Tag information is not fully utilized in previous methods. Most existing methods only deal with the contents of documents without utilizing the information contained in tags; (2) Document similarity in the original keyword feature space is used as guidance for generating hashing codes in previous methods, which may not fully reflect the semantic relationship.

min_{Y, U}:  ‖T − U Y‖_F² + C‖U‖² + γ‖Y − θ‖²
s.t.  Y ∈ {−1, 1}^{k×n},  Y 1 = 0

Page 16: 20140327 - Hashing Object Embedding


[Figure: our experiments on 20Newsgroups]

Page 17: 20140327 - Hashing Object Embedding


4/9 - IJCAI2013:

Title: A Unified Approximate Nearest Neighbor Search Scheme by Combining Data Structure and Hashing. Authors: Debing Zhang, Genmao Yang, Yao Hu, Zhongming Jin, Deng Cai, Xiaofei He. Organization: Zhejiang University (China).

Motivation: Traditionally, to solve the problem of nearest neighbor search, researchers have mainly focused on building effective data structures such as hierarchical k-means trees, or on using hashing methods to accelerate the query process. In this paper, they propose a novel unified approximate nearest neighbor search scheme that combines the advantages of both an effective data structure and the fast Hamming distance computation of hashing methods.

Page 18: 20140327 - Hashing Object Embedding


5/9 - CVPR2013:

Title: K-means Hashing: an Affinity-Preserving Quantization Method for Learning Binary Compact Codes. Authors: Kaiming He, Fang Wen, Jian Sun. Organization: Microsoft Research Asia (China).

Motivation: Both Hamming-based methods and lookup-based methods are of growing interest recently, and each category has its benefits depending on the scenario. Lookup-based methods have been shown to be more accurate than some Hamming-based methods at the same code length. However, lookup-based distance computation is slower than Hamming distance computation. Hamming-based methods also have the advantage that the distance computation is problem-independent.

E_aff = Σ_{i=0}^{k−1} Σ_{j=0}^{k−1} w_ij ( d(c_i, c_j) − d_h(i, j) )²
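A toy evaluation of E_aff, assuming four 2-D codebook centers and uniform weights w_ij (the paper learns the centers and scales the Hamming distance; here both are fixed by hand for illustration):

```python
import numpy as np

k_bits = 2
k = 2 ** k_bits
# Hand-picked codebook: cell i gets the binary index i (00, 01, 10, 11).
centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

def d_h(i, j):                       # Hamming distance between indices
    return bin(i ^ j).count("1")

w = np.ones((k, k))                  # uniform affinity weights w_ij

# E_aff penalizes mismatch between codebook distances d(c_i, c_j)
# and the Hamming distances of the corresponding binary indices.
E_aff = sum(w[i, j] * (np.linalg.norm(centers[i] - centers[j]) - d_h(i, j)) ** 2
            for i in range(k) for j in range(k))
print(round(float(E_aff), 3))  # -> 1.373
```

Only the diagonal pairs of the square (distance √2, Hamming distance 2) contribute, which is exactly the affinity error K-means Hashing minimizes when it moves the centers.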

Page 19: 20140327 - Hashing Object Embedding


6/9 - ICCV2013:

Title: Complementary Projection Hashing. Authors: Zhongming Jin1, Yao Hu1, Yue Lin1, Debing Zhang1, Shiding Lin2, Deng Cai1, Xuelong Li3

Organization: 1. Zhejiang University, 2. Baidu Inc., 3. Chinese Academy of Sciences, Xi’an (China)

Motivation: 1. (a) Hyperplane a crosses a sparse region, so neighbors are quantized into the same bucket; (b) hyperplane b crosses a dense region, so neighbors are quantized into different buckets. Apparently, hyperplane a is more suitable as a hashing function. 2. (a)(b) Both hyperplane a and hyperplane b can evenly separate the data. (c) However, putting them together does not produce a good two-bit hash function. (d) A better example of a two-bit hash function.

Page 20: 20140327 - Hashing Object Embedding


7/9 - CVPR2013:

Title: Hash Bit Selection: a Unified Solution for Selection Problems in Hashing. Authors: Xianglong Liu1, Junfeng He2,3, Bo Lang1, Shih-Fu Chang2. Organization: 1. Beihang University (China), 2. Columbia University (US), 3. Facebook (US).

Motivation: Recent years have witnessed the active development of hashing techniques for nearest neighbor search over big datasets. However, to apply hashing techniques successfully, several important issues remain open in selecting features, hashing algorithms, parameter settings, kernels, etc.

Page 21: 20140327 - Hashing Object Embedding


8/9 - ICCV2013:

Title: A General Two-Step Approach to Learning-Based Hashing. Authors: Guosheng Lin, Chunhua Shen, David Suter, Anton van den Hengel. Organization: University of Adelaide (Australia). Based On: SIGIR2010: Self-Taught Hashing for Fast Similarity Search.

Motivation: Most existing approaches to hashing apply a single form of hash function, and an optimization process that is typically deeply coupled to this specific form. This tight coupling restricts the method's flexibility to respond to the data, and can result in complex optimization problems that are difficult to solve. Their framework decomposes the hashing learning problem into two steps: hash bit learning, and hash function learning based on the learned bits.

Page 22: 20140327 - Hashing Object Embedding


9/9 - IJCAI2013:

Title: Smart Hashing Update for Fast Response. Authors: Qiang Yang, Long-Kai Huang, Wei-Shi Zheng, Yingbiao Ling. Organization: Sun Yat-sen University (China). Based On: DMKD2012: Active Hashing and Its Application to Image and Text Retrieval.

Motivation: Although most existing hashing-based methods have been proven to obtain high accuracy, they are regarded as passive hashing and assume that the labeled points are provided in advance. In this paper, they consider updating a hashing model upon gradually increased labeled data in a fast response to users, called smart hashing update (SHU). 1. Consistency-based Selection;

2. Similarity-based Selection. [CVPR.2012]

Diff(k, j) = min{ num(k, j, −1), num(k, j, 1) }

min_{H_l ∈ {−1,1}^{l×r}}:  Q = ‖ H_l H_lᵀ − r S ‖_F²

min_{k ∈ {1, 2, …, r}}:  R_k = ‖ r S − H_{r−1}^k (H_{r−1}^k)ᵀ ‖_F²

Page 23: 20140327 - Hashing Object Embedding


Reporter: Xu Jiaming (Ph.D. Student) Date: 2014.03.27

Computational-Brain Research Center

Institute of Automation, Chinese Academy of Sciences