Zero-Shot Learning by Convex Combination of Semantic Embeddings Mohammad Norouzi, Tomas Mikolov, Samy Bengio, Yoram Singer, Jonathon Shlens, Andrea Frome, Greg S. Corrado, Jeffrey Dean
Jan 14, 2016

Daniela Holt
Page 1: Zero-Shot Learning by Convex Combination of Semantic Embeddings Mohammad Norouzi, Tomas Mikolov, Samy Bengio, Yoram Singer, Jonathon Shlens, Andrea Frome,

Zero-Shot Learning by Convex Combination of Semantic Embeddings

Mohammad Norouzi, Tomas Mikolov, Samy Bengio, Yoram Singer, Jonathon Shlens, Andrea Frome, Greg S. Corrado, Jeffrey Dean

Page 2:
Page 3:

Image Annotation

Lion Tiger

Apple

Orange

(100,000+ classes)

Page 4:

Labeled Datasets

Lion Tiger

Page 5:

Training

10000

Tiger

Apple

Lion

Orange

Bear

Page 6:

Training

00010

Tiger

Apple

Lion

Orange

Bear

Page 7:

Testing

Lion? Tiger?

Generalize to Unseen Images

Page 8:

How to Generalize to Unseen Labels

Training:

► Lion

► Apple

► Orange

► Tiger

► Bear

Test:

► Wolf

► Cougar

► Grapefruit

“Side information”

“Representation of new classes”

Page 9:

One-Shot Learning

Wolf Cougar

[Erik G. Learned-Miller et al. Learning from one example through shared densities on transforms, 2000]

[Li Fei-Fei et al. One-shot learning of object categories, 06]

Page 10:

Zero-Shot Learning by Supervised Attributes

•Mammal, quadruped, white / gray / black, big pointed ears, not spotted, …

•Mammal, quadruped, brown / gray, wild, long tail, not spotted, …

Wolf

Cougar

[Ali Farhadi et al. Describing objects by their attributes, 09]

[Christoph Lampert et al. Learning to detect unseen object classes by between-class attribute transfer, 09]

Page 11:

Zero-Shot Learning by Unsupervised Embeddings

•Dog – Bear

•Cat – Tiger – Lion

•Orange – Lemon

Use embedding of labels in a vector space and a notion of semantic similarity in that space

Wolf

Cougar

Grapefruit

[Richard Socher et al. Zero-shot learning through cross-modal transfer, 13]

[Andrea Frome, Greg S. Corrado, Jon Shlens et al. DeViSE: A Deep Visual-Semantic Embedding Model, 13]

Page 12:

Semantic Embedding of Labels

apple

orange

tiger

lion

bear

Page 13:

Semantic Embedding of Labels

cougar

grapefruit

wolf

tiger

apple

orange

lion

bear

Page 14:

Semantic Embedding of Images

𝑓 (𝑥)

lion lion

Page 15:

Semantic Embedding of Images

tiger

𝑓 (𝑥)

tiger

Page 16:

Semantic Embedding of Images

cougar

► How to define the semantic embedding of labels

► How to project images into that space

► We use kNN search for label retrieval

𝑓 (𝑥)
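
The kNN label retrieval step mentioned on this slide can be sketched as follows. This is a minimal illustration, assuming cosine similarity as the similarity measure (the slide does not name one); the function names and the 2-D toy embeddings are invented for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def knn_labels(query, label_embeddings, k=1):
    """Return the k labels whose embeddings are most similar to the query."""
    ranked = sorted(
        label_embeddings,
        key=lambda name: cosine(query, label_embeddings[name]),
        reverse=True,
    )
    return ranked[:k]

# Toy 2-D embeddings, invented for illustration only.
toy_embeddings = {
    "lion": [0.9, 0.1],
    "tiger": [0.8, 0.2],
    "cougar": [0.85, 0.15],
    "grapefruit": [0.1, 0.9],
}

# A query vector standing in for the image embedding f(x).
nearest = knn_labels([0.86, 0.14], toy_embeddings, k=2)
print(nearest)
```

Because retrieval only needs the label embeddings, unseen labels such as "cougar" can be returned as long as they have a vector in the same space.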

Page 17:

Unsupervised Label Embedding

Words with similar contexts get similar vectors

We apply Skip-gram to Wikipedia articles [Tomas Mikolov et al. Efficient estimation of word representations in vector space, 13] (word2vec)
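
To make "similar context" concrete, here is a minimal sketch of how Skip-gram training pairs are formed: each word is paired with the words inside a small window around it. The window size and the helper name `skipgram_pairs` are assumptions for illustration; the actual word2vec training procedure (negative sampling, subsampling, and so on) is not shown.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token and its window neighbours."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

# Toy sentence, invented for illustration only.
example_pairs = skipgram_pairs(["the", "lion", "hunts"], window=1)
print(example_pairs)
```

Words that keep appearing with the same context words end up with similar vectors once a model is trained to predict the context from the center word.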

Page 18:

Unsupervised Label Embedding

Page 19:

ConSE: Convex Combination of Semantic Embeddings

Tiger

Apple

Lion

Orange

Bear

Page 20:

ConSE: Convex Combination of Semantic Embeddings

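$f(x) = \sum_i p(y_i \mid x)\, s(y_i)$

kNN search with query $f(x)$ to find relevant labels

$f(x) = p(\text{lion} \mid x)\, s(\text{lion}) + p(\text{tiger} \mid x)\, s(\text{tiger}) + p(\text{bear} \mid x)\, s(\text{bear}) + \dots$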
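
The combination on this slide, the classifier probabilities $p(y_i \mid x)$ weighting the label embeddings $s(y_i)$, can be sketched in a few lines. The function name `conse_embed` and the toy probabilities and 2-D embeddings are assumptions made for the example.

```python
def conse_embed(probs, label_embeddings):
    """Compute f(x) = sum_i p(y_i | x) * s(y_i) over the training labels."""
    dim = len(next(iter(label_embeddings.values())))
    f = [0.0] * dim
    for label, p in probs.items():
        for d, value in enumerate(label_embeddings[label]):
            f[d] += p * value
    return f

# Toy label embeddings s(y) and classifier outputs p(y | x), invented for illustration.
toy_s = {"lion": [1.0, 0.0], "tiger": [0.0, 1.0], "bear": [1.0, 1.0]}
toy_p = {"lion": 0.5, "tiger": 0.25, "bear": 0.25}

f_x = conse_embed(toy_p, toy_s)
print(f_x)
```

Since the probabilities are non-negative and sum to one, the result is a convex combination of the training label embeddings.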

Page 21:

ConSE: Convex Combination of Semantic Embeddings

$f(x) = \sum_i p(y_i \mid x)\, s(y_i)$

Page 22:

ConSE(T): Convex Combination of Semantic Embeddings

A parameter $T$ to only select the top $T$ predictions: $i \in \mathrm{top}T$ when $p(y_i \mid x)$ is among the top $T$ probabilities

When , then

Page 23:

ConSE(T): Convex Combination of Semantic Embeddings

► No extra training is needed

► For training images, as long as $p(y \mid x) \approx 1$, then $f(x) \approx s(y)$

► The output of $f(x)$ is likely to stay on the manifold of labels when the top predictions are close in the semantic space (similar to LLE)
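
A minimal sketch of the top-$T$ variant follows. It keeps only the $T$ most probable training labels and, as an assumption not shown on the slides, renormalizes their probabilities so the weights still sum to one. The name `conse_top_t` and the toy numbers are invented for the example.

```python
def conse_top_t(probs, label_embeddings, t):
    """Convex combination of the embeddings of the t most probable labels."""
    top = sorted(probs, key=probs.get, reverse=True)[:t]
    # Assumption: renormalize so the kept weights sum to one.
    z = sum(probs[label] for label in top)
    dim = len(label_embeddings[top[0]])
    f = [0.0] * dim
    for label in top:
        weight = probs[label] / z
        for d, value in enumerate(label_embeddings[label]):
            f[d] += weight * value
    return f

# Toy classifier outputs and label embeddings, invented for illustration.
toy_probs = {"lion": 0.5, "tiger": 0.25, "bear": 0.125, "apple": 0.125}
toy_embs = {
    "lion": [1.0, 0.0],
    "tiger": [0.0, 1.0],
    "bear": [1.0, 1.0],
    "apple": [-1.0, -1.0],
}

f_top1 = conse_top_t(toy_probs, toy_embs, t=1)
f_top2 = conse_top_t(toy_probs, toy_embs, t=2)
print(f_top1, f_top2)
```

No parameters are fitted here: the only inputs are the outputs of an already trained classifier and the fixed label embeddings, which is why no extra training is needed.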

Page 24:

Alternative Embedding Models (1)

► Train a regression model on the training set

[Richard Socher et al. Zero-shot learning through cross-modal transfer, 13]

Page 25:

Alternative Embedding Models (2)

► Train a ranking model on the training set so that $f(x)$ is closer to $s(y)$, the embedding of the correct label, than to $s(y')$ for any other label $y'$

Triplet ranking hinge loss

[Andrea Frome, Greg S. Corrado, Jon Shlens et al. DeViSE: A Deep Visual-Semantic Embedding Model, 13]

Our main baseline
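
A triplet ranking hinge loss of the kind named on this slide can be sketched as below. The dot-product similarity and the margin of 0.1 are assumptions chosen for illustration, as are the function names and toy vectors; this is not presented as the exact DeViSE training code.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def hinge_rank_loss(f_x, true_label, label_embeddings, margin=0.1):
    """Sum over wrong labels of max(0, margin - f.s(true) + f.s(wrong))."""
    true_score = dot(f_x, label_embeddings[true_label])
    loss = 0.0
    for label, emb in label_embeddings.items():
        if label != true_label:
            loss += max(0.0, margin - true_score + dot(f_x, emb))
    return loss

# Toy label embeddings, invented for illustration only.
toy_labels = {"lion": [1.0, 0.0], "tiger": [0.95, 0.0], "apple": [0.0, 1.0]}

# "tiger" scores within the margin of "lion", so it contributes to the loss.
loss_close = hinge_rank_loss([1.0, 0.0], "lion", toy_labels)
# With "apple" as the true label, every wrong label is far away: zero loss.
loss_far = hinge_rank_loss([0.0, 1.0], "apple", toy_labels)
print(loss_close, loss_far)
```

The loss is zero only when the correct label outscores every other label by at least the margin, which is what pushes $f(x)$ toward the correct label embedding during training.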

Page 26:

Experiments on ImageNet

21,841 labels in total

1000 labels for training

20,842 labels for zero-shot classification

Page 27:

Experiments on ImageNet

manta ray

stingray

wing chair

swivel chair

Page 28:

[Tomas Mikolov et al. Efficient estimation of word representations in vector space, 13]

D space

Page 29:

softmax layer

image

label

ConvNet

[Alex Krizhevsky et al. ImageNet Classification with Deep Convolutional Neural Networks, 12]

Page 30:

softmax layer

image

label

ConvNet

DeViSE [NIPS 2013]

transformation

𝑓 (𝑥) ranking obj.

𝑠(𝑦 )

Page 31:

ConSE(T)

image

ConvNet

$\sum_{i \in \mathrm{top}T} p(y_i \mid x)\, s(y_i)$

𝑓 (𝑥)

softmax layer

Page 32:

ConSE(T)

image

ConvNet

$\sum_{i \in \mathrm{top}T} p(y_i \mid x)\, s(y_i)$

𝑓 (𝑥)

softmax layer

image

ConvNet

DeViSE

transformation

𝑓 (𝑥)

Page 33:

Experiments on ImageNet

Training Set: ImageNet 1K

1000

Page 34:

Experiments on ImageNet

1000

1549

2-hops: 1 or 2 tree hops away from training labels
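
The hop-based test sets can be pictured as a breadth-first search over the label hierarchy, starting from the training labels. The sketch below assumes hops are counted along undirected tree edges; the toy hierarchy and the name `labels_within_hops` are invented for illustration and are not the ImageNet hierarchy.

```python
from collections import deque

def labels_within_hops(edges, training_labels, max_hops):
    """Return non-training labels at most max_hops tree edges from a training label."""
    adjacency = {}
    for parent, child in edges:
        adjacency.setdefault(parent, set()).add(child)
        adjacency.setdefault(child, set()).add(parent)
    # Multi-source BFS: every training label starts at distance 0.
    distance = {label: 0 for label in training_labels}
    queue = deque(training_labels)
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return {label for label, d in distance.items() if d > 0}

# Toy hierarchy as (parent, child) edges, invented for illustration.
toy_edges = [
    ("carnivore", "feline"),
    ("carnivore", "canine"),
    ("feline", "lion"),
    ("feline", "tiger"),
    ("feline", "cougar"),
    ("canine", "wolf"),
]

two_hop = labels_within_hops(toy_edges, ["lion", "tiger"], max_hops=2)
three_hop = labels_within_hops(toy_edges, ["lion", "tiger"], max_hops=3)
print(sorted(two_hop), sorted(three_hop))
```

Raising the hop limit enlarges the candidate set with labels that are semantically further from anything seen in training, which is consistent with the harder 3-hop and full settings on the following slides.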

Page 35:

Experiments on ImageNet

1000

7860

3-hops

Page 36:

Experiments on ImageNet

1000

20,842

all

Page 37:

Experiments on ImageNet

Training

2-hops

3-hops

Page 38:

Training labels candidate labels

Flat hit@k (%)

Data Set   Model         #Candidate Labels   k=1    k=2    k=5    k=10   k=20
2-hop      DeViSE        1,549               6.0    10.0   18.1   26.4   36.4
2-hop      ConSE(1)      1,549               9.3    14.4   23.7   30.8   38.7
2-hop      ConSE(10)     1,549               9.4    15.1   24.7   32.7   41.8
2-hop      ConSE(1000)   1,549               9.2    14.8   24.1   32.1   41.1
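
Flat hit@k is the percentage of test images whose true label appears among the top k predicted labels. A minimal sketch of the metric, with invented predictions and the hypothetical helper name `flat_hit_at_k`:

```python
def flat_hit_at_k(ranked_predictions, true_labels, k):
    """Percentage of test images whose true label is in the top-k predictions."""
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return 100.0 * hits / len(true_labels)

# Toy ranked predictions for four test images, invented for illustration.
toy_predictions = [
    ["wolf", "cougar"],
    ["cougar", "wolf"],
    ["grapefruit", "wolf"],
    ["wolf", "grapefruit"],
]
toy_truth = ["wolf", "wolf", "cougar", "grapefruit"]

hit1 = flat_hit_at_k(toy_predictions, toy_truth, k=1)
hit2 = flat_hit_at_k(toy_predictions, toy_truth, k=2)
print(hit1, hit2)
```

The metric can only grow with k, which matches the left-to-right increase in every row of the table.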

Page 40:

Flat hit@k (%)

Data Set   Model         #Candidate Labels   k=1    k=2    k=5    k=10   k=20
2-hop      DeViSE        1,549               6.0    10.0   18.1   26.4   36.4
2-hop      ConSE(1)      1,549               9.3    14.4   23.7   30.8   38.7
2-hop      ConSE(10)     1,549               9.4    15.1   24.7   32.7   41.8
2-hop      ConSE(1000)   1,549               9.2    14.8   24.1   32.1   41.1

Training labels candidate labels

ConSE(10) > ConSE(1) by 1% – 8%

Page 41:

Flat hit@k (%)

Data Set   Model         #Candidate Labels   k=1    k=2    k=5    k=10   k=20
2-hop      DeViSE        1,549               6.0    10.0   18.1   26.4   36.4
2-hop      ConSE(1)      1,549               9.3    14.4   23.7   30.8   38.7
2-hop      ConSE(10)     1,549               9.4    15.1   24.7   32.7   41.8
2-hop      ConSE(1000)   1,549               9.2    14.8   24.1   32.1   41.1

Training labels candidate labels

ConSE(10) > ConSE(1000) by 2%

Page 42:

Flat hit@k (%)

Data Set   Model         #Candidate Labels   k=1    k=2    k=5    k=10   k=20
2-hop      DeViSE        1,549               6.0    10.0   18.1   26.4   36.4
2-hop      ConSE(1)      1,549               9.3    14.4   23.7   30.8   38.7
2-hop      ConSE(10)     1,549               9.4    15.1   24.7   32.7   41.8
2-hop      ConSE(1000)   1,549               9.2    14.8   24.1   32.1   41.1

Training labels candidate labels

ConSE(10) > DeViSE by 57% – 15%
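
The percentages quoted on these comparison slides appear to be relative gains in flat hit@k, running from k=1 to k=20. A quick check under that reading, using the 2-hop DeViSE and ConSE(10) rows of the table (the helper name `relative_gain` is hypothetical):

```python
def relative_gain(new, old):
    """Relative improvement of new over old, in percent."""
    return 100.0 * (new - old) / old

# Flat hit@k (%) on the 2-hop set for k = 1, 2, 5, 10, 20, taken from the table.
devise = [6.0, 10.0, 18.1, 26.4, 36.4]
conse10 = [9.4, 15.1, 24.7, 32.7, 41.8]

gains = [round(relative_gain(n, o)) for n, o in zip(conse10, devise)]
print(gains)
```

The first and last entries reproduce the 57% and 15% quoted on the slide; the relative gap narrows as k grows.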

Page 43:

Flat hit@k (%)

Data Set   Model         #Candidate Labels   k=1    k=2    k=5    k=10   k=20
2-hop      DeViSE        1,549               6.0    10.0   18.1   26.4   36.4
2-hop      ConSE(1)      1,549               9.3    14.4   23.7   30.8   38.7
2-hop      ConSE(10)     1,549               9.4    15.1   24.7   32.7   41.8
2-hop      ConSE(1000)   1,549               9.2    14.8   24.1   32.1   41.1
3-hop      DeViSE        7,860               1.7    2.9    5.3    8.2    12.5
3-hop      ConSE(1)      7,860               2.6    4.2    7.3    10.8   14.8
3-hop      ConSE(10)     7,860               2.7    4.4    7.8    11.5   16.1
3-hop      ConSE(1000)   7,860               2.6    4.3    7.6    11.3   15.7

Training labels candidate labels

Page 44:

Flat hit@k (%)

Data Set       Model         #Candidate Labels   k=1    k=2    k=5    k=10   k=20
2-hop          DeViSE        1,549               6.0    10.0   18.1   26.4   36.4
2-hop          ConSE(1)      1,549               9.3    14.4   23.7   30.8   38.7
2-hop          ConSE(10)     1,549               9.4    15.1   24.7   32.7   41.8
2-hop          ConSE(1000)   1,549               9.2    14.8   24.1   32.1   41.1
3-hop          DeViSE        7,860               1.7    2.9    5.3    8.2    12.5
3-hop          ConSE(1)      7,860               2.6    4.2    7.3    10.8   14.8
3-hop          ConSE(10)     7,860               2.7    4.4    7.8    11.5   16.1
3-hop          ConSE(1000)   7,860               2.6    4.3    7.6    11.3   15.7
ImageNet 21K   DeViSE        20,842              0.8    1.4    2.5    3.9    6.0
ImageNet 21K   ConSE(1)      20,842              1.3    2.1    3.6    5.4    7.6
ImageNet 21K   ConSE(10)     20,842              1.4    2.2    3.9    5.8    8.3
ImageNet 21K   ConSE(1000)   20,842              1.3    2.1    3.8    5.6    8.1

Training labels candidate labels

Page 45:

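Flat hit@k (%)

Data Set     Model         #Candidate Labels   k=1    k=2    k=5    k=10   k=20
2-hop        DeViSE        1,549               6.0    10.0   18.1   26.4   36.4
2-hop        ConSE(1)      1,549               9.3    14.4   23.7   30.8   38.7
2-hop        ConSE(10)     1,549               9.4    15.1   24.7   32.7   41.8
2-hop        ConSE(1000)   1,549               9.2    14.8   24.1   32.1   41.1

Training labels candidate labels

Data Set     Model         #Candidate Labels   k=1    k=2    k=5    k=10   k=20
2-hop(+1K)   DeViSE        1,549+1000          0.8    2.7    7.9    14.2   22.7
2-hop(+1K)   ConSE(1)      1,549+1000          0.2    7.1    17.2   24.0   31.8
2-hop(+1K)   ConSE(10)     1,549+1000          0.3    6.2    17.0   24.9   33.5
2-hop(+1K)   ConSE(1000)   1,549+1000          0.3    6.2    16.7   24.5   32.9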

Page 46:

Test Image: (Steller sea lion)

ConvNet: sea lion; carpenter’s plane; cowboy boot; loggerhead; goose

DeViSE: elephant; turtle; turtleneck; flip-flop; cart, handcart

ConSE(10): California sea lion; Steller sea lion; Australian sea lion; South American sea lion; eared seal

Page 47:

Test Image: (Steller sea lion)

ConvNet: sea lion; carpenter’s plane; cowboy boot; loggerhead; goose

DeViSE: elephant; turtle; turtleneck; flip-flop; cart, handcart

ConSE(10): California sea lion; Steller sea lion; Australian sea lion; South American sea lion; eared seal

Test Image: (golden hamst.)

ConvNet: hamster; broccoli; Pomeranian; Capuchin; weasel

DeViSE: golden hamster; rhesus monkey; pipe; shaker; American mink

ConSE(10): golden hamster; rodent, gnawer; Eurasian hamster; rhesus monkey; rabbit, coney, cony

Page 48:

Test Image: (dress, frock)

ConvNet: wig; fur coat; Saluki; Afghan hound; stole

DeViSE: water spaniel; tea gown; bridal gown; spaniel; tights, leotards

ConSE(10): business suit; dress, frock; hairpiece, false hair; swimsuit; kit, outfit

Test Image: (flightless bird)

ConvNet: Ostrich; black stork; vulture; crane; peacock

DeViSE: heron; owl, bird of night; hawk; raptor; finch

ConSE(10): flightless bird, ratite; Peafowl; common spoonbill; New World vulture; Greek partridge

Page 49:

Test Image: (farm machine)

ConvNet: thresher; tractor; harvester; half-track; snowplow

DeViSE: truck; skidder; tank car; automatic rifle; house trailer

ConSE(10): flatcar, flatbed, flat; truck, motortruck; tracked vehicle; bulldozer, dozer; wheeled vehicle

Test Image: (Lama pacos)

ConvNet: Tibetan mastiff; titi monkey; Koala; llama; chow-chow

DeViSE: kernel; littoral zone; carillon; Cabernet Sauvignon; poodle dog

ConSE(10): domestic dog; domestic cat; schnauzer; Belgian sheepdog; domestic llama

Page 50:

Why ConSE > DeViSE in zero-shot learning?

► ConSE < DeViSE on the training set

► In DeViSE the function is unrestricted, so it can project images out of the manifold of labels

► In ConSE the projection is much more restricted. We have to stay on the manifold, so the projection function is much more regularized

Page 51:

Conclusion

ConSE: A deterministic way to embed images in a semantic embedding space using probabilistic predictions of a classifier

Experiments suggest that this model performs very well for zero-shot learning compared to regression-based algorithms

Page 52:

Thank you!

Liger?