Convolutional Patch Representations for Image Retrieval: An Unsupervised Approach

29th Mar 2016

Original slides by Eva Mohedano, Insight Centre for Data Analytics (Dublin City University)

Mattis Paulin, Julien Mairal, Matthijs Douze, Zaid Harchaoui, Florent Perronnin, Cordelia Schmid

Overview

Published at ICCV 2015 (a.k.a. Local Convolutional Features with Unsupervised Training for Image Retrieval)

• Deep convolutional architecture to produce patch-level descriptors

• Unsupervised framework

• Comparison in patch and retrieval datasets

• “RomePatches” dataset

Related Work

• Shallow patch descriptors

• Deep learning for image retrieval

• Deep patch descriptors

Related Work

• Shallow patch descriptors

SIFT – Scale-Invariant Feature Transform

- stereo matching

- retrieval

- classification

SURF, BRIEF, LIOP, (…)

Hand crafted → Relatively small number of parameters.

Note: A patch is an image region extracted from an image.

Related Work

• Deep learning for image retrieval

A CNN trained on a sufficiently large labeled dataset (ImageNet) produces intermediate-layer activations that can be used as image descriptors.

These descriptors work for a wide variety of tasks, including image retrieval.


source image: http://pubs.sciepub.com/ajme/2/7/9/



Fully connected layers → Global Image Descriptors

● Compact representation

● Lack of geometric invariance

Below state of the art in image retrieval

Compute at different scales (Babenko, Razavian)



Convolutional layers

Related Work

• Deep patch descriptors

Three different kinds of supervision:

1. Category labels of ImageNet. [Long et al, 2014]

2. Surrogate patch labels: Each class is a given patch under different transformations [Fischer et al, 2014]

3. Matching/non-matching pairs. [Simo-Serra et al, 2015]

These works focus on patch-level metrics, not image retrieval.

All of these approaches require some kind of supervision.
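The surrogate-label idea (item 2 above) can be illustrated with a minimal sketch. It assumes a deliberately simplified set of transformations (circular shifts, horizontal flips, contrast scaling) rather than the exact ones used by Fischer et al.; the function names `random_transform` and `build_surrogate_dataset` are hypothetical.

```python
import numpy as np

def random_transform(patch, rng, max_shift=3):
    """Return a randomly shifted, flipped, and contrast-scaled copy of a 2-D patch."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    out = np.roll(patch, shift=(int(dy), int(dx)), axis=(0, 1))  # circular shift
    if rng.random() < 0.5:
        out = out[:, ::-1]                                       # horizontal flip
    gain = rng.uniform(0.8, 1.2)                                 # contrast change
    return np.clip(out * gain, 0.0, 1.0)

def build_surrogate_dataset(seed_patches, n_views, rng):
    """Each seed patch defines one class; its transformed copies share that label."""
    images, labels = [], []
    for label, patch in enumerate(seed_patches):
        for _ in range(n_views):
            images.append(random_transform(patch, rng))
            labels.append(label)
    return np.stack(images), np.array(labels)

rng = np.random.default_rng(0)
seed_patches = rng.random((5, 32, 32))   # stand-in for patches cropped from real images
images, labels = build_surrogate_dataset(seed_patches, n_views=10, rng=rng)
```

A classifier trained on `(images, labels)` is pushed to be invariant to the chosen transformations, which is why the labels count as (weak) supervision even though no human annotation is involved.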

Image Retrieval Pipeline

• Interest point detection

Hessian-Affine detector.

Rotation invariance.

• Interest point description

Feature representation in a Euclidean space

• Patch Matching

VLAD encoding.

Power normalization with exponent 0.5 + L2-norm.
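The encoding step can be sketched as follows: a minimal NumPy version of VLAD with power normalization (exponent 0.5) followed by L2 normalization. The codebook here is random for illustration; in practice it would come from k-means on training descriptors. The function name `vlad_encode` and all sizes are assumptions.

```python
import numpy as np

def vlad_encode(descriptors, centroids, alpha=0.5):
    """Encode local descriptors (n, d) against a codebook (k, d) into a k*d vector."""
    # Assign each descriptor to its nearest centroid (squared Euclidean distance).
    d2 = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    assignment = d2.argmin(axis=1)

    # Sum the residuals (descriptor minus centroid) for each centroid.
    k, d = centroids.shape
    vlad = np.zeros((k, d))
    for j in range(k):
        members = descriptors[assignment == j]
        if len(members):
            vlad[j] = (members - centroids[j]).sum(axis=0)
    vlad = vlad.ravel()

    # Signed power normalization, then L2 normalization.
    vlad = np.sign(vlad) * np.abs(vlad) ** alpha
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad

rng = np.random.default_rng(0)
centroids = rng.normal(size=(4, 8))    # hypothetical codebook (normally from k-means)
patches_a = rng.normal(size=(50, 8))   # patch descriptors of image A
patches_b = rng.normal(size=(60, 8))   # patch descriptors of image B
va = vlad_encode(patches_a, centroids)
vb = vlad_encode(patches_b, centroids)
similarity = float(va @ vb)            # dot product of unit vectors
```

Because both vectors have unit L2 norm, images can be ranked for retrieval by this dot product alone.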

Convolutional Descriptors

Patch size = 51x51 – Optimal for SIFT on the Oxford dataset.

CNNs are extended to retrieval by:

• Encoding local descriptors with a model trained on an unrelated classification task

• Devising a surrogate classification problem that is as related as possible to image retrieval

• Using unsupervised learning: Convolutional Kernel Network

http://arxiv.org/abs/1406.3332

Convolutional Descriptors

• Using unsupervised learning: Convolutional Kernel Network

Feature representation based on a kernel (feature) map; data independent


Projection into a Hilbert space

An explicit kernel map can be computed to approximate it for computational efficiency.

- Sub-sample of patches

- Stochastic Gradient Optimization
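To make the "explicit kernel map" idea concrete, here is a minimal 1-D sketch. It relies on the identity that a Gaussian kernel can be written as an integral over anchor points w of exp(-(x-w)²/σ²)·exp(-(x'-w)²/σ²), and replaces that integral with a finite sum. The important assumption: the anchor points and weights below come from a fixed uniform grid, whereas a CKN learns them by stochastic gradient optimization on a sub-sample of patches. All names are hypothetical.

```python
import numpy as np

sigma = 1.0

def gaussian_kernel(x, y, sigma):
    """Exact Gaussian kernel exp(-(x - y)^2 / (2 sigma^2)) for scalars."""
    return np.exp(-(x - y) ** 2 / (2.0 * sigma ** 2))

# Anchor points w_l on a uniform grid (assumption: a CKN would learn these).
w = np.linspace(-8.0, 8.0, 200)
dw = w[1] - w[0]
# Quadrature weight shared by all anchors (assumption: a CKN would learn eta_l).
eta = dw * np.sqrt(2.0 / (np.pi * sigma ** 2))

def feature_map(x, w, eta, sigma):
    """Explicit finite-dimensional map: psi_l(x) = sqrt(eta) * exp(-(x - w_l)^2 / sigma^2)."""
    return np.sqrt(eta) * np.exp(-(x - w) ** 2 / sigma ** 2)

x, y = 0.3, -0.9
exact = gaussian_kernel(x, y, sigma)
# The kernel value is approximated by a plain dot product of explicit features.
approx = float(feature_map(x, w, eta, sigma) @ feature_map(y, w, eta, sigma))
```

Once inputs are mapped explicitly, comparing two of them costs one dot product instead of a kernel evaluation, which is the computational benefit the slide refers to. Learning the anchors from data lets far fewer of them suffice than a dense grid would need in high dimensions.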


4 possible inputs

From left to right: CKN-raw, CKN-mean subs, CKN-white (mean subs + PCA-whitening), CKN-grad (fully invariant to color)

Only CKN-raw, CKN-white and CKN-grad are evaluated.
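The mean-subtraction and PCA-whitening inputs can be sketched in NumPy as below. This is an illustrative sketch under stated assumptions: the mean is subtracted per patch, patches are flattened to vectors, and a small `eps` regularizes near-zero eigenvalues. The function names and patch sizes are hypothetical, not taken from the authors' code.

```python
import numpy as np

def mean_subtract(patches):
    """Subtract each patch's own mean intensity. patches: (n, d), flattened."""
    return patches - patches.mean(axis=1, keepdims=True)

def fit_pca_whitening(patches, eps=1e-5):
    """Estimate the dataset mean and a whitening matrix from (n, d) patches."""
    mu = patches.mean(axis=0)
    cov = np.cov(patches - mu, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # symmetric eigendecomposition
    W = eigvecs / np.sqrt(eigvals + eps)         # scale each eigenvector column
    return mu, W

def whiten(patches, mu, W):
    """Project onto the principal axes and rescale to (near) unit variance."""
    return (patches - mu) @ W

rng = np.random.default_rng(0)
raw = rng.random((2000, 16))                     # stand-in for flattened 4x4 sub-patches
centered = mean_subtract(raw)                    # "mean subs" style input
mu, W = fit_pca_whitening(centered)
white = whiten(centered, mu, W)                  # "white" style input
cov_white = np.cov(white, rowvar=False)
```

After per-patch mean subtraction the constant direction carries no variance, so one whitened component stays near zero while the remaining ones have variance close to one.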

Experiments

Datasets:

1. RomePatches-Image

2. Oxford

3. UKbench and Holidays

CKN trained on 1M sub-patches for 300K iterations, with a mini-batch size of 1000.

http://lear.inrialpes.fr/people/paulin/projects/RomePatches/

Experiments

Conclusions

• CKNs offer similar and sometimes better performance than CNNs in the context of patch description.

• Good patch retrieval translates into good image retrieval.

• CKNs are orders of magnitude faster to train than CNNs (10 min vs 2-3 days on a modern GPU)

• Fully unsupervised – no labels.

Resources

RomePatches + code (although the code is not accessible!)

Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks

- Code with augmentations in matlab

- Code for training models.

- Models already trained :-)

Triplet’s net + Code !!

- Greyscale local patches of 32x32. Tested on matching datasets

http://lear.inrialpes.fr/people/paulin/projects/RomePatches/

http://lmb.informatik.uni-freiburg.de/Publications/2015/DFB15/

https://github.com/vbalnt/pnnet
