
Graph Regularised Hashing

Sean Moran and Victor Lavrenko

Institute of Language, Cognition and Computation, School of Informatics

University of Edinburgh

ECIR’15 Vienna, March 2015

Graph Regularised Hashing (GRH)

Overview

GRH

Evaluation

Conclusion


Locality Sensitive Hashing

[Diagram sequence: each DATABASE item is passed through a hash function H to produce a binary code (e.g. 110101, 010111, 010101, 111101, ...) that indexes it into a bucket of the HASH TABLE. A QUERY is hashed with the same function H; similarity is computed only against the items that fall in the colliding bucket, and those items are returned as the NEAREST NEIGHBOURS.]

Applications: Content Based IR, Location Recognition, Near duplicate detection
(Images: Imense Ltd; Doersch et al.; Xu et al.)
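The bucketed lookup above can be sketched with random-hyperplane LSH. The hash function, toy 2-D feature vectors and bucket layout below are illustrative assumptions, not the implementation used in the talk:

```python
# Sketch of random-hyperplane LSH: bit k is the sign of the dot product
# with a random hyperplane. Data and dimensions are made-up toys.
import random

random.seed(0)

def make_hash(dim, n_bits):
    """Sample n_bits random hyperplanes; each contributes one bit."""
    return [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

def hash_code(planes, x):
    """Binary code for x: one bit per hyperplane."""
    return tuple(1 if sum(w * xi for w, xi in zip(h, x)) >= 0 else 0
                 for h in planes)

planes = make_hash(dim=2, n_bits=6)

# Index the database: items with the same code share a hash-table bucket.
database = {"a": (0.9, 1.1), "b": (1.0, 0.9), "c": (-1.0, -1.2)}
table = {}
for name, x in database.items():
    table.setdefault(hash_code(planes, x), []).append(name)

# Query: hash it, then compute similarity only within the colliding bucket.
query = (0.95, 1.0)
candidates = table.get(hash_code(planes, query), [])
```

Nearby points (small angle between vectors) are likely to share a code, so the bucket acts as a cheap candidate filter before exact similarity is computed.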

Previous work

- Data-independent: Locality Sensitive Hashing (LSH) [Indyk '98]

- Data-dependent (unsupervised): Anchor Graph Hashing (AGH) [Liu et al. '11], Spectral Hashing (SH) [Weiss '08]

- Data-dependent (supervised): Self Taught Hashing (STH) [Zhang '10], Supervised Hashing with Kernels (KSH) [Liu et al. '12], ITQ + CCA [Gong and Lazebnik '11], Binary Reconstructive Embedding (BRE) [Kulis and Darrell '09]

Previous work

Method    Data-Dependent  Supervised  Scalable  Effectiveness
LSH       -               -           X         Low
SH        X               -           -         Low
STH       X               X           -         Medium
BRE       X               X           -         Medium
ITQ+CCA   X               X           -         Medium
KSH       X               X           -         High
GRH       X               X           X         High


Graph Regularised Hashing (GRH)

- Two step iterative hashing model:

  - Step A: Graph Regularisation

    L_m ← sgn(α S D⁻¹ L_{m−1} + (1−α) L_0)

  - Step B: Data-Space Partitioning

    for k = 1…K:  min ||h_k||² + C Σ_{i=1…N} ξ_{ik}
    s.t. L_{ik}(h_kᵀ x_i + b_k) ≥ 1 − ξ_{ik}  for i = 1…N

- Repeat for a set number of iterations (M)

Graph Regularised Hashing (GRH)

- Step A: Graph Regularisation [Diaz '07][1]

  L_m ← sgn(α S D⁻¹ L_{m−1} + (1−α) L_0)

  - S: affinity (adjacency) matrix
  - D: diagonal degree matrix
  - L: binary bits at the specified iteration
  - α: interpolation parameter (0 ≤ α ≤ 1)

[1] Diaz, F.: Regularizing query-based retrieval scores. In: Information Retrieval (2007)


Graph Regularised Hashing (GRH)

[Diagram: three nodes a, b, c; a and b are neighbours, as are b and c. Initial codes: a = (−1 −1 −1), b = (−1 1 1), c = (1 1 1).]

S     a    b    c
a     1    1    0
b     1    1    1
c     0    1    1

D⁻¹   a    b    c
a     0.5  0    0
b     0    0.33 0
c     0    0    0.5

L_0   b1   b2   b3
a     −1   −1   −1
b     −1   1    1
c     1    1    1

Graph Regularised Hashing (GRH)

[Diagram: the same three nodes, still carrying codes a = (−1 −1 −1), b = (−1 1 1), c = (1 1 1).]

L_1 = sgn ( [ −1      0     0
              −0.33   0.33  0.33
               0      1     1    ] )

Graph Regularised Hashing (GRH)

[Diagram: after regularisation, a = (−1 1 1), b = (−1 1 1), c = (1 1 1).]

L_1   b1   b2   b3
a     −1   1    1
b     −1   1    1
c     1    1    1
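The worked example can be checked numerically. A minimal plain-Python sketch, assuming α = 1 and sgn(0) = +1, and normalising each node's row by its own degree (which reproduces the numbers on this slide):

```python
# Step A of GRH on the three-node example: each node's bits are replaced by
# the degree-normalised average of its neighbours' bits, then re-binarised.
# alpha = 1 and sgn(0) = +1 are assumed here to match the slide's numbers.
S = [[1, 1, 0],      # affinity matrix: a-b and b-c are neighbours (self-loops kept)
     [1, 1, 1],
     [0, 1, 1]]
L0 = [[-1, -1, -1],  # initial 3-bit codes for nodes a, b, c
      [-1,  1,  1],
      [ 1,  1,  1]]

deg = [sum(row) for row in S]   # degrees: 2, 3, 2

def regularise(S, L, alpha, L0):
    n, k = len(L), len(L[0])
    out = []
    for i in range(n):
        row = []
        for b in range(k):
            # degree-normalised average of neighbours' bit b
            avg = sum(S[i][j] * L[j][b] for j in range(n)) / deg[i]
            val = alpha * avg + (1 - alpha) * L0[i][b]
            row.append(1 if val >= 0 else -1)   # sgn with sgn(0) = +1
        out.append(row)
    return out

L1 = regularise(S, L0, alpha=1.0, L0=L0)
# L1 == [[-1, 1, 1], [-1, 1, 1], [1, 1, 1]]: node a's second and third bits
# flip to agree with its neighbour b, exactly as in the slide.
```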

Graph Regularised Hashing (GRH)

- Step B: Data-Space Partitioning

  for k = 1…K:  min ||h_k||² + C Σ_{i=1…N} ξ_{ik}
  s.t. L_{ik}(h_kᵀ x_i + b_k) ≥ 1 − ξ_{ik}  for i = 1…N

  - h_k: hyperplane k
  - b_k: bias of hyperplane k
  - x_i: data-point i
  - L_{ik}: bit k of data-point i
  - ξ_{ik}: slack variable for data-point i, bit k
  - K: # bits
  - N: # data-points

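Step B amounts to training one small soft-margin SVM per bit, with the regularised bits as labels. The subgradient solver, toy points and hyper-parameters below are illustrative stand-ins for a proper SVM package such as LIBLINEAR:

```python
# Step B of GRH, sketched: fit one linear hyperplane h_k per bit by
# minimising 0.5*||h||^2 + C * sum_i hinge(1 - y_i (h.x_i + b)) with
# subgradient descent. Toy data and learning rate are made up.

def fit_bit(X, bits, C=1.0, lr=0.01, epochs=200):
    """Return (h, b) encouraging bits[i] * (h . x_i + b) >= 1."""
    d = len(X[0])
    h, b = [0.0] * d, 0.0
    for _ in range(epochs):
        for x, y in zip(X, bits):      # y is the target bit, +1 or -1
            margin = y * (sum(w * xi for w, xi in zip(h, x)) + b)
            for j in range(d):
                # shrinkage from ||h||^2, plus hinge push when margin < 1
                g = h[j] - (C * y * x[j] if margin < 1 else 0.0)
                h[j] -= lr * g
            if margin < 1:
                b += lr * C * y        # bias is not regularised
    return h, b

def predict_bit(h, b, x):
    return 1 if sum(w * xi for w, xi in zip(h, x)) + b >= 0 else -1

# Toy data: two well-separated clusters; one bit column from Step A.
X = [(1.0, 1.0), (1.2, 0.8), (-1.0, -1.0), (-0.9, -1.2)]
L_bit1 = [1, 1, -1, -1]
h1, b1 = fit_bit(X, L_bit1)
codes = [predict_bit(h1, b1, x) for x in X]   # should reproduce L_bit1
```

Out-of-sample points are then hashed by evaluating the K learned hyperplanes, which is what makes the scheme usable for unseen queries.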

Graph Regularised Hashing (GRH)

[Figure sequence: eight data-points a–h in the data space, each labelled with a 3-bit code. Graph regularisation flips bits so that neighbouring points agree: one point has its first bit flipped, another its second bit flipped. Data-space partitioning then fits a hyperplane per bit: h1 (h1 · x − b1 = 0) splits the space into a negative (−1) and a positive (+1) half-space for the first bit, and h2 (h2 · x − b2 = 0) does the same for the second bit.]

Evaluation


Datasets/Features

- Standard evaluation datasets [Liu et al. '12], [Gong and Lazebnik '11]:

  - CIFAR-10: 60K images, GIST descriptors, 10 classes [1]
  - MNIST: 70K images, grayscale pixels, 10 classes [2]
  - NUSWIDE: 270K images, GIST descriptors, 21 classes [3]

- True NNs: images that share at least one class in common [Liu et al. '12]

[1] http://www.cs.toronto.edu/~kriz/cifar.html
[2] http://yann.lecun.com/exdb/mnist/
[3] http://lms.comp.nus.edu.sg/research/NUS-WIDE.htm

Evaluation Metrics

- Hamming ranking evaluation paradigm [Liu et al. '12], [Gong and Lazebnik '11]

- Standard evaluation metrics [Liu et al. '12], [Gong and Lazebnik '11]:

  - Mean average precision (mAP)
  - Precision at Hamming radius 2 (P@R2)
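The Hamming ranking paradigm can be sketched as follows; the codes, labels and relevant set below are made-up toys rather than CIFAR-10 data:

```python
# Hamming-ranking evaluation, sketched: rank database items by Hamming
# distance to the query's code, and score precision against ground-truth
# neighbours (items sharing a class label with the query).

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def precision_at_radius(query_code, db_codes, relevant, radius=2):
    """Precision over database items within the given Hamming radius."""
    hits = [i for i, c in enumerate(db_codes)
            if hamming(query_code, c) <= radius]
    if not hits:
        return 0.0
    return sum(i in relevant for i in hits) / len(hits)

query = (1, 0, 1, 1, 0, 1)
db = [(1, 0, 1, 1, 0, 1),   # distance 0, relevant
      (1, 0, 1, 0, 0, 1),   # distance 1, relevant
      (0, 1, 0, 0, 1, 0),   # distance 6, not relevant
      (1, 1, 0, 1, 0, 1)]   # distance 2, not relevant
relevant = {0, 1}           # true NNs: share a class with the query

ranked = sorted(range(len(db)), key=lambda i: hamming(query, db[i]))
p_at_r2 = precision_at_radius(query, db, relevant)   # 2 of 3 hits -> 2/3
```

mAP is computed from the full ranking in the same spirit, averaging precision at each rank where a relevant item appears.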

GRH vs Literature (CIFAR-10 @ 32 bits)

[Bar chart: mAP (0.10–0.35) for LSH, BRE, STH, KSH, GRH (Linear) and GRH (RBF). Annotation: GRH's straightforward objective outperforms more complex objectives.]

GRH vs Literature (CIFAR-10)

[Line chart: mAP (0.10–0.40) against # bits (16–64) for LSH, BRE, KSH and GRH. Annotations: "Small amount of supervision required"; "+25-30%".]

GRH vs. Initialisation Strategy (CIFAR-10 @ 32 bits)

[Bar chart: mAP (0.00–0.30) for GRH (Linear) and GRH (RBF), each initialised with LSH or ITQ+CCA. Annotation: eigendecomposition is not necessary, saving O(d³).]

GRH vs # Supervisory Data-Points (CIFAR-10)

[Bar chart: mAP (0.00–0.40) for Linear and RBF GRH with T=1K and T=2K supervisory data-points. Annotation: "Small amount of supervision required".]

GRH Timing (CIFAR-10 @ 32 bits)

Timings (s)
Method    Train    Test     Total
GRH       42.68    0.613    43.29
KSH [1]   81.17    0.103    82.27
BRE [2]   231.1    0.370    231.4

[1] Liu, W.: Supervised Hashing with Kernels. In: CVPR (2012)
[2] Kulis, B.: Binary Reconstructive Embedding. In: NIPS (2009)

Conclusion


Conclusions and Future Work

- Supervised hashing model that is both accurate and easily scalable

- Take-home messages:

  - Regularising bits over a graph is effective (and efficient) for hashcode learning
  - An intermediate eigendecomposition step is not necessary
  - Hyperplanes (linear hypersurfaces) can achieve very good retrieval accuracy

- Future work: extend to the cross-modal hashing scenario (e.g. Image ↔ Text, English ↔ Spanish)

Thank you for your attention

Sean Moran
sean.moran@ed.ac.uk

Code and datasets available at:
www.seanjmoran.com
