
SIFT featuresfilestore.nationalarchives.gov.uk/resources/temp/...Mar 22, 2018  · Trialed at 4 HEIs 120k artworks in VADS.ac.uk + ... lassic omputer Vision Matches visual features

Jul 25, 2020


IMLG – 22 March 2018 John Collomosse 2

Visual Search?

Visual search: Querying visual repositories using visual (pictorial) queries.

80% of the Internet forecast to be visual data by end 2018 [Cisco, NF 2016] (on track for 84%)

iTrace – Visual Plagiarism Detection

A visual ‘Turnitin’

Trialled at 4 HEIs (higher education institutions)

120k artworks in VADS.ac.uk + uploads


“Classic” Computer Vision

Matches visual features (“key points”) between images

SIFT features

(circa 2004)
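To make key-point matching concrete, here is a minimal sketch of the matching step only. It assumes the SIFT descriptors (128-D vectors) have already been extracted, for example with a library such as OpenCV, and pairs each query descriptor with its nearest database descriptor, keeping the pair only if it passes Lowe's ratio test. The 0.8 threshold, the helper name `match_keypoints` and the synthetic data are illustrative assumptions, not details from the talk.

```python
import numpy as np


def match_keypoints(query_desc, db_desc, ratio=0.8):
    """Nearest-neighbour descriptor matching with Lowe's ratio test.

    query_desc: (Q, D) array; db_desc: (N, D) array with N >= 2.
    Returns a list of (query_index, db_index) pairs.
    """
    # Pairwise Euclidean distances, shape (Q, N).
    dists = np.linalg.norm(query_desc[:, None, :] - db_desc[None, :, :], axis=2)
    matches = []
    for qi, row in enumerate(dists):
        # The two closest database descriptors for this query descriptor.
        best, second = np.argsort(row)[:2]
        # Accept only if the best match is clearly better than the runner-up.
        if row[best] < ratio * row[second]:
            matches.append((qi, int(best)))
    return matches


rng = np.random.default_rng(0)
db = rng.random((50, 128))  # stand-in for database SIFT descriptors
# Three noisy copies of database descriptors 3, 17 and 42, plus one unrelated descriptor.
query = np.vstack([db[[3, 17, 42]] + 0.01 * rng.standard_normal((3, 128)),
                   rng.random((1, 128))])
print(match_keypoints(query, db))
```

The unrelated descriptor is roughly equidistant from everything, so it fails the ratio test and produces no match; this is why the test is used rather than accepting every nearest neighbour.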


“Modern” Computer Vision

Use deep neural networks trained on example data to extract digital signatures from whole images

Convolutional Neural Network (CNN)
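Once a network maps every image to a fixed-length signature, search reduces to ranking the database by similarity in that vector space. The sketch below assumes the signatures are already computed (random vectors stand in for real CNN outputs) and ranks by cosine similarity; the helper name `rank_by_cosine` and the 512-D size are assumptions for illustration.

```python
import numpy as np


def rank_by_cosine(query_vec, db_vecs, top_k=5):
    """Indices of the top_k database signatures most similar to the query."""
    # L2-normalise so that a dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    db = db_vecs / np.linalg.norm(db_vecs, axis=1, keepdims=True)
    scores = db @ q
    # Highest similarity first.
    return np.argsort(-scores)[:top_k]


rng = np.random.default_rng(1)
signatures = rng.standard_normal((1000, 512))  # stand-ins for CNN signatures
query = signatures[123] + 0.1 * rng.standard_normal(512)
print(rank_by_cosine(query, signatures, top_k=3))
```

At the scale of millions of images an exact linear scan like this is usually replaced by an approximate nearest-neighbour index, but the ranking criterion is the same.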


Deep Learning Revolution

Pre-2012 state of the art… Q: What is this? (Which of 1000 objects is this?) A: Mug (~9% accuracy)

Deep Learning (~2016) state of the art… Q: How many leftover donuts are there? A: Three (~70% accuracy)


Deep Learning Revolution

Challenges for CNNs circa 2012:

- Data hungry. CNNs require a lot of training data.

- Processing power. CNNs require a lot of CPU time to train, so only simple CNNs were trained.

- Niche.

Then…

- ImageNet arrived (16m images, 1000 classes) [Deng et al. 2009]

- GPUs. General purpose GPU processing / CUDA. The algorithms for training a CNN are highly parallelisable.

- NIPS/ECCV 2012. Double-digit % gain on ImageNet accuracy announced using CNNs.

The vision community took notice!


Sketch based Visual Search


Sketch based Retrieval of…

1. Several Million (10^7) Colour Images

2. Images using Deeply Learned Descriptors

3. Sketching with Style: Search with Aesthetic Constraints


Why Sketch?

“Most of the next generation will probably never use Desktop products. People don’t understand how profound a shift this is. The reality is for these hundreds of millions of users, mobile will be their entire gateway to services.” – Wired, 2017

• Touch screen (gesture) is the primary interface on mobile (replacing text/keyboard)
• New discovery tools needed to release value in visual content
• Sketch is an intuitive modality for describing desired visual attributes


But the problem…

“People don’t (want to invest time to) draw well!”

Sketching is visual communication

[1] Hu and Collomosse, “Performance Evaluation of Gradient Field HOG”, Computer Vision and Image Understanding (CVIU), 2013.

Excerpt of Flickr15k [1]

Humans communicate efficiently, using vocabulary & context

Sketch for retrieval is a casual throw-away act (for a machine).

(… and some users are bad at sketching)


Demo


Android demo app available for phones/tablets at: https://play.google.com/store/apps/details?id=com.collomosse.sketcher


Diversion into Text Search


A common measure of text document similarity involves building a frequency histogram of the words in the document: a “bag of words”

[Figure: example “bag of words” histogram for a document, with bins for words such as “Martians”, “the”, “molten”, “Life”, “Light”, “and”, “by”]

This descriptor encodes the distribution of words in the document, so it is a function of the document’s content.

Careful choice of the words (bins) is key! Location of words doesn’t matter!
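A bag of words is just a word-count histogram over a fixed vocabulary. The minimal sketch below uses a small hypothetical vocabulary (loosely echoing the example words on the slide) and shows that two documents with the same words in a different order produce identical histograms, i.e. word location is discarded.

```python
import re
from collections import Counter


def bag_of_words(text, vocabulary):
    """Histogram of word counts over a fixed vocabulary (the 'bins')."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    # Words outside the vocabulary are simply ignored.
    return [counts[w] for w in vocabulary]


vocab = ["martians", "light", "life", "molten"]  # hypothetical vocabulary
doc_a = "Martians saw the light. Life on the molten planet; light everywhere."
doc_b = "Light everywhere, life on the molten planet. Martians saw the light."
print(bag_of_words(doc_a, vocab))  # [1, 2, 1, 1]
print(bag_of_words(doc_b, vocab))  # [1, 2, 1, 1]
```

Choosing the vocabulary is the design decision here: common words such as “the” are excluded because they appear in every document and carry little discriminative information.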


Bag of Visual Words for Photo Retrieval


Q. Why does BoVW find a swan?

A. The frequency / distribution of local texture patches cut from the query (SIFT) matches those cut from swan images in the database.

[Figure: a query photo and swan images retrieved from the database]
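The same idea carries over to images by treating local patch descriptors as “words”. A minimal bag-of-visual-words sketch, assuming descriptors are already extracted (random vectors stand in for SIFT): cluster training descriptors into a codebook with plain k-means, then describe each image by the normalised histogram of its descriptors’ nearest codebook entries. The codebook size k=8 and the helper names are illustrative; practical vocabularies are far larger.

```python
import numpy as np


def kmeans(data, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, D) codebook of 'visual words'."""
    rng = np.random.default_rng(seed)
    centres = data[rng.choice(len(data), size=k, replace=False)]  # a copy
    for _ in range(iters):
        # Assign every descriptor to its nearest centre.
        labels = np.argmin(np.linalg.norm(data[:, None] - centres[None], axis=2), axis=1)
        # Move each centre to the mean of its members (unchanged if it has none).
        for j in range(k):
            members = data[labels == j]
            if len(members):
                centres[j] = members.mean(axis=0)
    return centres


def bovw_histogram(descriptors, codebook):
    """Quantise descriptors to their nearest visual word and count them."""
    words = np.argmin(np.linalg.norm(descriptors[:, None] - codebook[None], axis=2), axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()  # normalise so images with more patches stay comparable


rng = np.random.default_rng(0)
training = rng.random((500, 16))  # stand-in for local patch descriptors
codebook = kmeans(training, k=8)
img_a = training[:60]
img_b = training[:60][::-1]       # same patches in a different order
print(np.allclose(bovw_histogram(img_a, codebook), bovw_histogram(img_b, codebook)))  # True
```

As with text, the histogram ignores where each patch came from, which suits textured photos but, as the next slide argues, loses the stroke structure that defines a sketch.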


Bag of Visual Words for Sketch Retrieval


Q. Why do we see a swan?

A. The spatial relationships of strokes (edges) determine the object’s structure, from which we infer presence of a swan.


Synthesising “Texture”


[Figure: panels “Photos” and “Sketches”]


Feature Extraction (sketch & photo)

Photographs are passed through an edge detection filter

Multi-scale patches cut at every ‘edge’ or ‘sketch’ pixel (Gradient information / HOG)

[Figure: retrieval pipeline. Database images → edge detection → feature extraction → feature encoding → index file; query sketch → feature extraction → feature encoding → matching against the index file → results]
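The per-pixel feature extraction step can be sketched as follows, assuming a greyscale photo or rasterised sketch stored as a 2-D NumPy array. Edge pixels are approximated here by a gradient-magnitude threshold instead of a proper edge detector, and each descriptor concatenates magnitude-weighted orientation histograms of square patches of several sizes around the pixel. The threshold, patch sizes and bin count are illustrative assumptions, and this is a simplified HOG-style descriptor, not the GF-HOG descriptor evaluated in [1].

```python
import numpy as np


def edge_patch_descriptors(img, scales=(5, 9), n_bins=8, thresh=0.1):
    """Multi-scale orientation histograms cut around every 'edge' pixel."""
    img = img.astype(float)
    gy, gx = np.gradient(img)                 # finite-difference gradients
    mag = np.hypot(gx, gy)
    ori = np.mod(np.arctan2(gy, gx), np.pi)   # unsigned orientation in [0, pi)
    bins = np.minimum((ori / np.pi * n_bins).astype(int), n_bins - 1)
    r_max = max(scales) // 2
    descriptors = []
    ys, xs = np.nonzero(mag > thresh * mag.max())
    for y, x in zip(ys, xs):
        # Skip pixels whose largest patch would fall off the image.
        if not (r_max <= y < img.shape[0] - r_max and r_max <= x < img.shape[1] - r_max):
            continue
        parts = []
        for s in scales:
            r = s // 2
            b = bins[y - r:y + r + 1, x - r:x + r + 1].ravel()
            m = mag[y - r:y + r + 1, x - r:x + r + 1].ravel()
            # Magnitude-weighted orientation histogram for this patch size.
            h = np.bincount(b, weights=m, minlength=n_bins)
            parts.append(h / (np.linalg.norm(h) + 1e-8))
        descriptors.append(np.concatenate(parts))
    return np.array(descriptors).reshape(-1, n_bins * len(scales))


img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0                         # a bright square on a dark background
d = edge_patch_descriptors(img)
print(d.shape)                                # one 16-D descriptor per edge pixel
```

The resulting descriptors would then be encoded (for example with the bag-of-visual-words histogram) before being written to the index.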


Results: ImageNet Dataset

16m image dataset; ~2s / query; 43Gb features

Platform: AMD 2.6GHz, single core benchmark
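As a back-of-envelope check on these figures, the snippet below assumes that “43Gb” means 43 gigabytes (43 × 10^9 bytes), that “16m” means 16 × 10^6 images, and that each query touches every image once on the single core. These readings are assumptions; the slide does not state units precisely or say how the index is searched.

```python
# Back-of-envelope numbers from the slide; the unit interpretations are assumptions.
index_bytes = 43e9        # "43Gb features", read as 43 gigabytes
n_images = 16e6           # "16m image dataset"
seconds_per_query = 2.0   # "~2s / query"

bytes_per_image = index_bytes / n_images
images_per_second = n_images / seconds_per_query
ns_per_image = 1e9 / images_per_second

print(f"{bytes_per_image:.1f} bytes per image")      # 2687.5 bytes per image
print(f"{images_per_second:.1e} images per second")  # 8.0e+06 images per second
print(f"{ns_per_image:.0f} ns per image")            # 125 ns per image
```

Under these assumptions each image costs roughly 2.7 kB of index and about 125 ns of single-core time per query.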


Sketch based Visual Search


Sketch based Retrieval of…

1. Several Million (10^7) Colour Images

2. Images using Deeply Learned Descriptors

3. Sketching with Style: Search with Aesthetic Constraints


Problems with an Edgemap Approach

Sketches are not edgemaps (Distortion, Level of Abstraction, etc.)


[Figure: example sketches labelled “House” and “Crocodile”]

TU-Berlin dataset, Eitz et al. 2012


Cross-domain metric learning for SBIR (sketch-based image retrieval)

Can we learn a low-dimensional metric embedding of edge and sketch space?


[Figure: sketch and image samples in the embedding, before learning and after learning]


Sketch matching with CNN


[Figure: triplet network with three CNN branches of identical layout for the anchor (a), positive (p) and negative (n) inputs; layer labels show feature maps of 64×71×71, 128×31×31 and 256×15×15 (three layers), 5×5 and 3×3 kernels, and fully connected layers labelled 512, 512 and 100]

Triplet loss in triplet network:

$L(a,p,n) = \frac{1}{2}\max\left(0,\; m + \lVert a-p \rVert_2^2 - \lVert a-n \rVert_2^2\right)$

m: margin
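The triplet loss translates directly into code. This is a minimal NumPy sketch assuming each row of `a`, `p` and `n` is one anchor, positive and negative embedding; averaging over the batch is an added assumption (the slide gives the per-triplet form), and the margin value is arbitrary.

```python
import numpy as np


def triplet_loss(a, p, n, margin=1.0):
    """L(a,p,n) = 1/2 * max(0, m + ||a-p||^2 - ||a-n||^2), averaged over the batch."""
    d_pos = np.sum((a - p) ** 2, axis=-1)  # squared anchor-positive distance
    d_neg = np.sum((a - n) ** 2, axis=-1)  # squared anchor-negative distance
    return np.mean(0.5 * np.maximum(0.0, margin + d_pos - d_neg))


a = np.array([[0.0, 0.0]])
p = np.array([[1.0, 0.0]])
n_close = np.array([[0.0, 1.0]])  # as far from a as p is: loss = 0.5 * m
n_far = np.array([[3.0, 0.0]])    # well beyond the margin: loss = 0
print(triplet_loss(a, p, n_close), triplet_loss(a, p, n_far))  # 0.5 0.0
```

The hinge means triplets that already satisfy the margin contribute nothing, so training effort concentrates on negatives that are still too close to the anchor.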


What happens during training?

Learning joint embedding of edge and sketch space


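[Figure: anchor (a), positive (p) and negative (n) samples in the joint embedding, “Before training” and after training, with margin m marked]

Triplet loss

$L(a,p,n) = \frac{1}{2}\max\left(0,\; m + \lVert a-p \rVert_2^2 - \lVert a-n \rVert_2^2\right)$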
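To see what training does geometrically, the toy sketch below skips the network and applies gradient descent directly to the three embedding vectors. This is a simplification: a real triplet network updates layer weights, not the embeddings themselves. While the hinge is active, the gradients of the triplet loss are ∂L/∂a = n − p, ∂L/∂p = p − a and ∂L/∂n = a − n, so each step pulls the positive towards the anchor and pushes the negative away until the squared-distance gap exceeds the margin. The learning rate and starting points are arbitrary choices.

```python
import numpy as np


def triplet_step(a, p, n, margin=1.0, lr=0.1):
    """One gradient-descent step on L = 1/2 max(0, m + ||a-p||^2 - ||a-n||^2).

    Returns the updated (a, p, n) and the loss before the step.
    """
    gap = margin + np.sum((a - p) ** 2) - np.sum((a - n) ** 2)
    if gap <= 0:                 # hinge inactive: triplet already satisfied
        return a, p, n, 0.0
    # Gradients of the active branch, all evaluated at the current point.
    ga, gp, gn = n - p, p - a, a - n
    return a - lr * ga, p - lr * gp, n - lr * gn, 0.5 * gap


a, p, n = np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.5, 0.5])
for step in range(100):
    a, p, n, loss = triplet_step(a, p, n)
    if loss == 0.0:
        print(f"margin satisfied after {step} steps")
        break
```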


Datasets


Training:
• Sketch: TU-Berlin, 20k sketches @ 250 classes.
• Image: Internet photo acquisition.

Test:
• Flickr15k: 15k photos + 330 sketches @ 33 classes.

Class diversity between training and test datasets (TU-Berlin → Flickr15k):

• “bridge” → “Tower bridge”, “Sydney bridge”, “Oxford bridge”
• “duck”, “swan” → “duck-swan”
• “sun”, “moon” → “moon”, “sunrise-sunset”
• Human-skeleton, nose, mermaid, angel


Training Methodology

Data Augmentation and Triplet Formation


• Images:
  • 25k photos: 100 photos/class.
  • Edge extraction: gPb [Arbelaez, 2011].
  • Mean subtraction, random crop/rotation/scaling/flip.

• Sketches:
  • 20k sketches: 20 for training, 60 for validation per class.
  • Skeletonisation.
  • Mean subtraction, random crop/rotation/scaling/flip.
  • Random stroke removal.

• Triplet formation:
  • Random selection of positive/negative samples.

• Training:
  • 10k epochs.

[Figure: augmentation examples: crop, rotation, scaling, flip, stroke removal]
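Two of the sketch augmentations and the random triplet formation can be sketched in plain Python. This assumes each sketch is stored as a list of strokes, each stroke a list of (x, y) points in the unit square, and that a triplet is a sketch anchor with a positive photo from the same class and a negative photo from another class, consistent with the cross-domain embedding above. The representation, the 20% drop probability and the helper names are hypothetical; mean subtraction, crop, rotation and scaling are not shown.

```python
import random


def remove_random_strokes(strokes, drop_prob=0.2, rng=random):
    """Randomly drop strokes, always keeping at least one."""
    kept = [s for s in strokes if rng.random() >= drop_prob]
    return kept if kept else [rng.choice(strokes)]


def flip_horizontal(strokes):
    """Mirror every point about x = 0.5 (coordinates assumed in [0, 1])."""
    return [[(1.0 - x, y) for x, y in stroke] for stroke in strokes]


def sample_triplet(sketches_by_class, photos_by_class, rng=random):
    """Anchor sketch, positive photo of the same class, negative photo of another class."""
    classes = list(sketches_by_class)
    pos_class = rng.choice(classes)
    neg_class = rng.choice([c for c in classes if c != pos_class])
    anchor = rng.choice(sketches_by_class[pos_class])
    positive = rng.choice(photos_by_class[pos_class])
    negative = rng.choice(photos_by_class[neg_class])
    return anchor, positive, negative


rng = random.Random(0)
sketch = [[(0.1, 0.1), (0.9, 0.1)], [(0.5, 0.2), (0.5, 0.9)], [(0.2, 0.5), (0.8, 0.5)]]
augmented = flip_horizontal(remove_random_strokes(sketch, rng=rng))
sketches = {"duck": ["duck_sketch_0"], "swan": ["swan_sketch_0"]}
photos = {"duck": ["duck_photo_0"], "swan": ["swan_photo_0"]}
print(sample_triplet(sketches, photos, rng=rng))
```

Random stroke removal mimics users who leave out detail, which is one way to make the learned embedding tolerant of sparse or incomplete queries.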


Representative Queries/Results


Sketch based Visual Search


Sketch based Retrieval of…

1. Several Million (10^7) Colour Images

2. Images using Deeply Learned Descriptors

3. Sketching with Style: Search with Aesthetic Constraints


Sketching with Style


1. User’s intermediate work in Photoshop (here, a graphite sketch)

2. Behance visually searched for inspiration in a specified style (here, watercolor)

Results from searching 66.8m Behance images


Video Demo


Learning the Style Embedding

GoogleNet (Inception v3) with 128-D Bottleneck

Triplet design, fully siamese

Training Set (110k Behance)


Visualizing the Style Embedding (Behance 1m test set)

t-SNE perplexity 20


Putting it all together

Two-Stream Network Architecture:

1. A Structure Network – that learns an embedding to visually match structure irrespective of style

2. A Style Network – that learns an embedding to visually match aesthetics irrespective of content (structure)

[Figure: two-stream architecture. Sketches and images pass through a structure network producing a 256-D structure embedding, and a style network produces a 128-D style embedding; both embeddings feed the search index]
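One plausible way to query such an index is to rank images by a weighted combination of structure and style distances. The sketch assumes every indexed image already has a 256-D structure embedding and a 128-D style embedding (random vectors stand in for network outputs) and introduces a hypothetical weight `w_style`; the talk does not say how the two streams are fused at query time, so this rule is illustrative only.

```python
import numpy as np


def two_stream_search(q_struct, q_style, idx_struct, idx_style, w_style=0.5, top_k=5):
    """Rank index entries by (1 - w) * structure distance + w * style distance."""
    d_struct = np.linalg.norm(idx_struct - q_struct, axis=1)
    d_style = np.linalg.norm(idx_style - q_style, axis=1)
    # Normalise each distance by its median so neither stream dominates by scale alone.
    d_struct /= np.median(d_struct) + 1e-12
    d_style /= np.median(d_style) + 1e-12
    score = (1.0 - w_style) * d_struct + w_style * d_style
    return np.argsort(score)[:top_k]


rng = np.random.default_rng(2)
idx_struct = rng.standard_normal((2000, 256))  # structure embeddings of indexed images
idx_style = rng.standard_normal((2000, 128))   # style embeddings of indexed images
# Query structure close to image 7 (the sketch); query style close to image 99 (the exemplar).
q_struct = idx_struct[7] + 0.05 * rng.standard_normal(256)
q_style = idx_style[99] + 0.05 * rng.standard_normal(128)
print(two_stream_search(q_struct, q_style, idx_struct, idx_style, w_style=0.0, top_k=1))  # [7]
print(two_stream_search(q_struct, q_style, idx_struct, idx_style, w_style=1.0, top_k=1))  # [99]
```

Intermediate values of `w_style` trade structural fidelity against aesthetic match, which is one way to expose the “how to prioritise modalities” question raised in the closing thoughts.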


Evaluating Sketch+Style Retrieval (top-1 result)


Fine Grain: Style Analogies

Vector math in 128-D style space

[Figure: style analogy results for (watercolor + graphite) and (watercolor – graphite)]


[Figure: style analogy results for comic and (comic – pen+ink)]


[Figure: style analogy results for vectorart and (– vectorart)]
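Style analogies of this kind can be reproduced with vector arithmetic followed by a nearest-neighbour lookup by cosine similarity. The sketch assumes a 128-D style embedding per image and represents a named style such as “watercolor” by a hypothetical direction vector (in practice one might use, for example, the mean embedding of images tagged with that style); the synthetic index and helper names are assumptions, not details from the talk.

```python
import numpy as np


def unit(v):
    return v / np.linalg.norm(v)


def style_analogy(query_style, add, subtract, index_styles, top_k=3):
    """Nearest neighbours of (query + sum(add) - sum(subtract)) by cosine similarity."""
    zero = np.zeros_like(query_style)
    target = unit(query_style + sum(add, zero) - sum(subtract, zero))
    idx = index_styles / np.linalg.norm(index_styles, axis=1, keepdims=True)
    return np.argsort(-(idx @ target))[:top_k]


rng = np.random.default_rng(3)
graphite = unit(rng.standard_normal(128))    # hypothetical style directions
watercolor = unit(rng.standard_normal(128))
index = 0.02 * rng.standard_normal((200, 128))  # 200 'images' with weak style signal
index[10] += graphite                        # a graphite image
index[20] += watercolor                      # a watercolor image
index[30] += graphite + watercolor           # a mixed-style image
print(style_analogy(index[10], [watercolor], [], index, top_k=1))  # [30]
print(style_analogy(index[30], [], [graphite], index, top_k=1))    # [20]
```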


Closing Thoughts

Scalability
- Scaling under sketch ambiguity is the challenge (not compute)
- Need to integrate modalities beyond shape. Which? How to fuse?
- How to determine user intent in prioritizing modalities?

Composition Breakdown

- All SBIR approaches assume a single dominant object, but real data isn’t like that


Deep Learning
- Deep learning outperforms classic approaches at perceptual tasks like search
- But deep networks must be trained with a lot of representative data (+annotation)
- Sketch data in particular is sparse (10^2 categories, 10^4 instances)
