6.819/6.869 Advances in Computer Vision Image by kirkh.deviantart.com Aditya Khosla

6.819/6.869 Advances in Computer Vision6.869.csail.mit.edu/fa15/lecture/Khosla_11_12_2015.pdf · 6.819/6.869 Advances in Computer Vision Image by kirkh.deviantart.com Aditya Khosla

Sep 13, 2019

Page 1: 6.819/6.869 Advances in Computer Vision6.869.csail.mit.edu/fa15/lecture/Khosla_11_12_2015.pdf · 6.819/6.869 Advances in Computer Vision Image by kirkh.deviantart.com Aditya Khosla

6.819/6.869 Advances in Computer Vision

Image by kirkh.deviantart.com

Aditya Khosla

Page 2:

Today’s class

• Part 1: From state-of-the-art to state-of-the-artest
  • Fine-tuning
  • Data augmentation

• Part 2: Applications
  • Detection, segmentation, …

• Part 3: Learning sequences
  • RNNs/LSTMs

Page 3:

Object recognition

[Chart: results by ImageNet Challenge year. Labeled values: 15%, 11%, 6.8%, 5%. Year label: 2015. Labeled networks: AlexNet, GoogLeNet, VGGNet.]

Page 4:
Page 5:

GoogLeNet

Credit: Szegedy et al

Page 6:

GoogLeNet vs AlexNet

GoogLeNet

AlexNet

Credit: Szegedy et al

Page 7:
Page 8:

GoogLeNet

• Power and Memory use considerations are important for practical use.

• Image data is mostly sparse and clustered.

• Hebbian Principle:

“Neurons that fire together, wire together”

Page 9:

In images, correlations tend to be local

Credit: Szegedy et al

Page 10:

Cover very local clusters by 1x1 convolutions

1x1

number of filters

Credit: Szegedy et al

Page 11:

Less spread out correlations

1x1

number of filters

Credit: Szegedy et al

Page 12:

Cover more spread out clusters by 3x3 convolutions

1x1

3x3

number of filters

Credit: Szegedy et al

Page 13:

Cover more spread out clusters by 5x5 convolutions

1x1

number of filters

3x3

Credit: Szegedy et al

Page 14:

Cover more spread out clusters by 5x5 convolutions

1x1

3x3

5x5

number of filters

Credit: Szegedy et al

Page 15:

A heterogeneous set of convolutions

1x1

number of filters

3x3

5x5

Credit: Szegedy et al

Page 16:

Schematic view (naive version)

1x1

number of filters

3x3

5x5

1x1 convolutions

3x3 convolutions

5x5 convolutions

Filter concatenation

Previous layer

Credit: Szegedy et al

Page 17:

1x1 convolutions

3x3 convolutions

5x5 convolutions

Filter concatenation

Previous layer

Naive idea

Credit: Szegedy et al

Page 18:

1x1 convolutions

3x3 convolutions

5x5 convolutions

Filter concatenation

Previous layer

Naive idea (does not work!)

3x3 max pooling

Credit: Szegedy et al

Page 19:

1x1 convolutions

3x3 convolutions

5x5 convolutions

Filter concatenation

Previous layer

Inception module

3x3 max pooling

1x1 convolutions

1x1 convolutions

1x1 convolutions

Credit: Szegedy et al

Page 20:

1x1 convolutions

3x3 convolutions

5x5 convolutions

Filter concatenation

Previous layer

Inception module

3x3 max pooling

1x1 convolutions

1x1 convolutions

1x1 convolutions

Dimensionality reduction!

Credit: Szegedy et al
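
To make the “Dimensionality reduction!” point concrete, here is a rough weight count for one module with and without the extra 1x1 layers. This is only a sketch: the channel sizes are an assumption (they follow the first Inception module in the published GoogLeNet table, with 192 input channels, and do not appear on these slides), the helper names are hypothetical, and biases are ignored.

```python
def conv_weights(in_ch, out_ch, k):
    """Number of weights in a k x k convolution (biases ignored)."""
    return in_ch * out_ch * k * k


def naive_module(in_ch, n1, n3, n5):
    """1x1, 3x3, 5x5 convolutions and a 3x3 max pool, all fed directly by the previous layer."""
    weights = (conv_weights(in_ch, n1, 1)
               + conv_weights(in_ch, n3, 3)
               + conv_weights(in_ch, n5, 5))
    # Pooling has no weights but passes every input channel to the concatenation.
    out_ch = n1 + n3 + n5 + in_ch
    return weights, out_ch


def inception_module(in_ch, n1, r3, n3, r5, n5, pool_proj):
    """Same branches, with 1x1 reductions before 3x3 / 5x5 and a 1x1 projection after pooling."""
    weights = (conv_weights(in_ch, n1, 1)
               + conv_weights(in_ch, r3, 1) + conv_weights(r3, n3, 3)
               + conv_weights(in_ch, r5, 1) + conv_weights(r5, n5, 5)
               + conv_weights(in_ch, pool_proj, 1))
    out_ch = n1 + n3 + n5 + pool_proj
    return weights, out_ch


# Assumed sizes: 192 input channels; 64 (1x1), 96 -> 128 (3x3), 16 -> 32 (5x5), 32 (pool projection).
naive = naive_module(192, 64, 128, 32)
reduced = inception_module(192, 64, 96, 128, 16, 32, 32)
print(naive)    # (387072, 416)
print(reduced)  # (163328, 256)
```

Under these assumed sizes the reduced module needs fewer than half the weights, and its output width stays at 256 channels instead of growing with every stacked module, which is one way to see why the naive version becomes too expensive.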

Page 21:

Inception

9 Inception modules

Legend: Convolution, Pooling, Softmax, Other

Credit: Szegedy et al

Page 22:

Filter counts labeled along the network: 256, 480, 480, 512, 512, 512, 832, 832, 1024

Inception

Width of inception modules ranges from 256 filters (in early modules) to 1024 in top inception modules.

Can remove fully connected layers on top completely

Number of parameters is reduced to 5 million

Computational cost is increased by less than 2x compared to AlexNet (<1.5 billion operations per evaluation).

Credit: Szegedy et al

Page 23:

Credit: Fei-Fei

Page 24:

Credit: Fei-Fei

Page 25:

Credit: Fei-Fei

Page 26:

Credit: Fei-Fei

Page 27:

Credit: Fei-Fei

Page 28:

Credit: Fei-Fei

Page 29:

VGGNet

Credit: Fei-Fei

Page 30:

Data augmentation

Page 31:

Data augmentation

• For both training and testing

Page 32:

Classification results on ImageNet 2012

Number of Models | Number of Crops    | Computational Cost | Top-5 Error | Compared to Base
1                | 1 (center crop)    | 1x                 | 10.07%      | -
1                | 10*                | 10x                | 9.15%       | -0.92%
1                | 144 (Our approach) | 144x               | 7.89%       | -2.18%
7                | 1 (center crop)    | 7x                 | 8.09%       | -1.98%
7                | 10*                | 70x                | 7.62%       | -2.45%
7                | 144 (Our approach) | 1008x              | 6.67%       | -3.41%

*Cropping by [Krizhevsky et al 2014]
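
The crop counts and cost multipliers in the table can be reproduced by enumerating the test-time views. The sketch below assumes the recipe described in the GoogLeNet paper (4 scales, 3 squares per scale, 6 views per square, each mirrored) and the usual corners-plus-center scheme for the 10-crop baseline; neither recipe is spelled out on the slide, and all names are hypothetical.

```python
from itertools import product

SCALES = (256, 288, 320, 352)            # assumed shorter-side sizes after resizing
SQUARES = ("left", "center", "right")    # top / center / bottom for portrait images
VIEWS = ("top-left", "top-right", "bottom-left", "bottom-right",
         "center", "full-square-resized")
MIRROR = (False, True)


def crops_144():
    """Every (scale, square, view, mirrored) combination."""
    return list(product(SCALES, SQUARES, VIEWS, MIRROR))


def crops_10():
    """Four corners plus center, each with its mirror image."""
    return list(product(VIEWS[:5], MIRROR))


def relative_cost(num_models, num_crops):
    """Forward passes per image, relative to one model on one center crop."""
    return num_models * num_crops


print(len(crops_144()), len(crops_10()))                # 144 10
print(relative_cost(7, 10), relative_cost(7, 144))      # 70 1008
```

At test time the class probabilities from all views (and all models in the ensemble) would typically be averaged before taking the top predictions.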

Page 33:

Fine-tuning

input output

Objects

Page 34:

Fine-tuning

input output

Objects

Page 35:

Fine-tuning

input output

Scenes

backpropagation
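
A toy sketch of the fine-tuning idea on these slides: keep the layer learned on the first task (objects), attach a freshly initialised output layer for the new task (scenes), and continue backpropagation through everything, with a smaller step size for the pretrained layer. The tiny network, the data, and the learning rates below are all hypothetical stand-ins, not the network used in the lecture.

```python
import math


def forward(x, W1, w2, b):
    """Hidden features h = tanh(W1 x), then a linear output z = w2 . h + b."""
    h = [math.tanh(sum(a * v for a, v in zip(row, x))) for row in W1]
    z = sum(w * hj for w, hj in zip(w2, h)) + b
    return h, z


def mean_loss(X, y, W1, w2, b):
    """Mean logistic loss for labels in {-1, +1}."""
    return sum(math.log1p(math.exp(-t * forward(x, W1, w2, b)[1]))
               for x, t in zip(X, y)) / len(X)


def fine_tune(X, y, W1, w2, b, lr_head=0.5, lr_base=0.05, steps=100):
    """Full-batch gradient descent on the new head and, more gently, on the pretrained layer."""
    W1 = [row[:] for row in W1]   # copy so the pretrained weights are left untouched
    w2 = w2[:]
    n = len(X)
    for _ in range(steps):
        gW1 = [[0.0] * len(W1[0]) for _ in W1]
        gw2 = [0.0] * len(w2)
        gb = 0.0
        for x, t in zip(X, y):
            h, z = forward(x, W1, w2, b)
            dz = -t / (1.0 + math.exp(t * z)) / n      # d(mean loss) / dz
            gb += dz
            for j, hj in enumerate(h):
                gw2[j] += dz * hj
                for k, xk in enumerate(x):
                    # Backpropagate through the head and the tanh into the base layer.
                    gW1[j][k] += dz * w2[j] * (1.0 - hj * hj) * xk
        b -= lr_head * gb
        w2 = [w - lr_head * g for w, g in zip(w2, gw2)]
        W1 = [[w - lr_base * g for w, g in zip(row, grow)]
              for row, grow in zip(W1, gW1)]
    return W1, w2, b


# Hypothetical "pretrained" layer and a small new-task dataset.
pretrained_W1 = [[1.0, 0.0], [0.0, 1.0]]
X = [[1.0, 0.2], [0.8, -0.3], [-1.0, 0.1], [-0.7, -0.2]]
y = [1, 1, -1, -1]

new_head, new_bias = [0.0, 0.0], 0.0          # fresh output layer for the new task
initial = mean_loss(X, y, pretrained_W1, new_head, new_bias)
W1_ft, w2_ft, b_ft = fine_tune(X, y, pretrained_W1, new_head, new_bias)
final = mean_loss(X, y, W1_ft, w2_ft, b_ft)
print(initial, final)
```

Setting `lr_base=0.0` in this sketch freezes the pretrained layer and trains only the new output layer, which is the cheaper variant of the same idea.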

Page 36:

Training linear SVMs on an off-the-shelf CNN representation significantly outperforms a majority of the baselines.

Results of MIT 67 Scene Classification

Visual Classification

Page 37:

Credit: Fei-Fei

Page 38:

R-CNN

Credit: Fei-Fei

Page 39:

R-CNN

Credit: Fei-Fei

Page 40:

Credit: Fei-Fei

Page 41:

R-CNN Results

Credit: Fei-Fei

Page 42:

Pixels in, pixels out

monocular depth estimation (Liu et al. 2015)

boundary prediction (Xie & Tu 2015)

semantic segmentation

Credit: Long et al

Page 43:

“tabby cat”

1000-dim vector

< 1 millisecond

ConvNets perform Classification

end-to-end learning

Credit: Long et al

Page 44:

< 1/5 second

end-to-end learning

???

Credit: Long et al

Page 45:

“tabby cat”

A Classification Network

Credit: Long et al

Page 46:

Becoming fully convolutional

Credit: Long et al

Page 47:

Becoming fully convolutional

Credit: Long et al
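
One way to read “becoming fully convolutional”: a fully connected unit is a dot product over a fixed-size input, so sliding the same weights over a larger input turns a single score into a map of scores. The following is a 1-D sketch with made-up weights (the networks in Long et al. are 2-D, and the function names here are hypothetical).

```python
def dense(window, weights, bias):
    """A fully connected unit: one dot product over a fixed-size input."""
    return sum(w * x for w, x in zip(weights, window)) + bias


def as_convolution(signal, weights, bias):
    """Apply the same unit to every window of a longer input (a 'valid' convolution)."""
    k = len(weights)
    return [dense(signal[i:i + k], weights, bias)
            for i in range(len(signal) - k + 1)]


weights, bias = [1.0, -2.0, 0.5], 0.25
fixed_input = [2.0, 1.0, 4.0]               # the size the classifier was built for
longer_input = [2.0, 1.0, 4.0, 0.0, 3.0]    # a larger input

score = dense(fixed_input, weights, bias)
score_map = as_convolution(longer_input, weights, bias)
print(score)       # 2.25
print(score_map)   # [2.25, -6.75, 5.75]
```

On the original input size the convolutional form returns exactly the classifier's single score; on a larger input it returns one score per position, which is the coarse output map that later gets upsampled.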

Page 48:

Upsampling output

Credit: Long et al

Page 49:

conv, pool, nonlinearity

upsampling

pixelwise output + loss

End-to-end, Pixels-to-pixels network

Credit: Long et al

Page 50:

Deconvolutional network

Credit: Noh et al

Page 51:

Upsampling

Credit: Noh et al

Page 52:

Deconvolution

Credit: Noh et al

Page 53:

Unpooling and Deconv Effects

Credit: Noh et al
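
A minimal 1-D sketch of unpooling, assuming the switch-based scheme in which max pooling records where each maximum came from and unpooling writes the value back to that position, leaving zeros elsewhere. The activations and function names are hypothetical; in the deconvolutional network a learned deconvolution would then densify this sparse map (not shown).

```python
def max_pool_with_switches(values, size):
    """Non-overlapping max pooling that also records the index of each maximum."""
    pooled, switches = [], []
    for start in range(0, len(values), size):
        window = values[start:start + size]
        offset = max(range(len(window)), key=window.__getitem__)
        pooled.append(window[offset])
        switches.append(start + offset)
    return pooled, switches


def unpool(pooled, switches, length):
    """Place each pooled value back at its recorded position; everything else is zero."""
    out = [0.0] * length
    for value, index in zip(pooled, switches):
        out[index] = value
    return out


activations = [1.0, 3.0, 2.0, 5.0, 4.0, 0.0]
pooled, switches = max_pool_with_switches(activations, 2)
restored = unpool(pooled, switches, len(activations))
print(pooled)     # [3.0, 5.0, 4.0]
print(switches)   # [1, 3, 4]
print(restored)   # [0.0, 3.0, 0.0, 5.0, 4.0, 0.0]
```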

Page 54:

Results - segmentation

Credit: Noh et al

Page 55:

Predicting Human Visual Memory

Memorability = The likelihood of remembering a particular image.

Page 56:

Welcome to the

Visual Memory Game

A stream of images will be presented on the screen for 1 second each.

Your task:

Clap anytime you see an image you saw before in this experiment.

Page 57:

Ready?

(Seriously, get ready to clap. The images go by fast…)

Page 58:
Page 59:
Page 60:
Page 61:
Page 62:
Page 63:
Page 64:
Page 65:

<clap!>

Page 66:
Page 67:
Page 68:
Page 69:
Page 70:
Page 71:
Page 72:
Page 73:
Page 74:
Page 75:
Page 76:
Page 77:

<clap!>

Page 78:

Measuring Memorability

Memorability = Probability of correctly detecting a repeat after a single view of an image in a long sequence.

Understanding Image Memorability, Khosla et al, ICCV 2015

Memorability is an intrinsic property of an image!

Page 79:

LaMem

• Focused

• Enclosed Setting

• Dynamics

• Unusual

memorability

• No single focus

• Distant view

• Static

• Common

Page 80:

Training MemNet

input output

LaMem

backpropagation

Page 81:

MemNet Performance

[Bar chart: rank correlation (axis from 0.4 to 0.7) for HOG2x2, MemNet - small, HybridNet, MemNet, and Human]

Human rank correlation: 0.68

Prediction rank correlation: 0.64!
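
Rank correlation compares how two sets of scores order the same images rather than the scores themselves. The sketch below computes Spearman's rank correlation, which is assumed here to be the measure behind these numbers; the five score pairs are invented for illustration and are not data from the lecture.

```python
def ranks(values):
    """Ranks 1..n in ascending order, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical memorability scores for five images.
measured = [0.9, 0.4, 0.7, 0.2, 0.6]
predicted = [0.8, 0.5, 0.6, 0.3, 0.7]
print(round(spearman(measured, predicted), 3))   # 0.9
```

A value of 1 means the two orderings agree exactly, so a prediction at 0.64 against a human consistency of 0.68 means the model orders images almost as consistently as one group of people orders them relative to another.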

Page 82:

strong positive

strong negative

Visualizing Neurons

http://memorability.csail.mit.edu

Page 83:

‘selfie selection’

Predicting popularity

Page 84:

Popularity dataset

views

Dataset: 2.3 million Flickr images

Page 85:

Predicting popularity

What makes an image popular? Khosla et al, WWW 2014

Input Image → Image Features (HOG, GIST, SIFT) → Support Vector Regression → Image popularity (2.73)

Page 86:

Predicting popularity

[Bar chart: rank correlation (axis from 0 to 0.4) by feature type: Gist, Texture, Color, BoW, Gradient, Deep Learning, Combined]

What makes an image popular? Khosla et al, WWW 2014

Page 87:

What makes an image popular?

[Chart axes: colors, color importance]

What makes an image popular? Khosla et al, WWW 2014

Page 88:

What makes an image popular?

Input Image → Object likelihood → Support Vector Regression → Image popularity (1.9)

Page 89:

Medium positive impact

What makes an image popular?

giant panda ladybug basketball

plow cheetah llama

Page 90:

Strong positive impact

What makes an image popular?

brassiere revolver miniskirt

maillot bikini cup

Page 91:

Negative impact

What makes an image popular?

spatula plunger laptop

Page 92:

http://popularity.csail.mit.edu

Page 93:

http://popularity.csail.mit.edu

Page 94:

Predicting the future

Goal: predict Lucas-Kanade optical flow given just one image!

Credit: Walker et al

Page 95:

Predicting the future

Credit: Walker et al

Page 96:

Learning sequences

Page 97:

Sequences are everywhere…

Credit: Alex Graves, Kevin Gimpel, Dhruv Batra

Page 98:

Even where you might not expect a sequence…

Credit: Dhruv Batra, Vinyals et al.

Page 99:

How do we model sequences?

• It’s a spectrum…

Input: No sequence
Output: No sequence
Example: “standard” classification / regression problems

Input: No sequence
Output: Sequence
Example: Im2Caption

Input: Sequence
Output: No sequence
Example: sentence classification, multiple-choice question answering

Input: Sequence
Output: Sequence
Example: machine translation, video captioning, open-ended question answering, video question answering

Credit: Dhruv Batra, Andrej Karpathy

Page 100:

Recurrent Neural Networks (RNNs)

Credit: Christopher Olah

Page 101:

Recurrent Neural Networks (RNNs)

Credit: Christopher Olah

Page 102:

Recurrent Neural Networks (RNNs)

Credit: Christopher Olah

Page 103:

Long-term dependencies: hard to model!

Credit: Christopher Olah
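
One common explanation for why long-term dependencies are hard is that the gradient flowing back through many recurrent steps shrinks multiplicatively. The sketch below uses a scalar RNN, h_t = tanh(w * h_{t-1} + u * x_t), with made-up weights and inputs, and tracks how strongly the final state depends on the initial one. The function name and all numbers are hypothetical.

```python
import math


def rnn_gradient_through_time(w, u, inputs, h0=0.0):
    """Run a scalar tanh RNN and return (final state, d h_T / d h_0)."""
    h, grad = h0, 1.0
    for x in inputs:
        h = math.tanh(w * h + u * x)
        grad *= w * (1.0 - h * h)   # chain rule through one time step
    return h, grad


_, short_grad = rnn_gradient_through_time(0.5, 1.0, [0.5] * 5)
_, long_grad = rnn_gradient_through_time(0.5, 1.0, [0.5] * 50)
print(short_grad, long_grad)
```

With a recurrent weight of 0.5, each step multiplies the gradient by at most 0.5, so after 50 steps the first input has almost no influence on the training signal. This is the problem the LSTM's cell state is designed to ease.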

Page 104:

From plain RNNs to LSTMs

(LSTM: Long Short Term Memory Networks)

Credit: Christopher Olah

Page 105:

From plain RNNs to LSTMs

(LSTM: Long Short Term Memory Networks)

Credit: Christopher Olah

Page 106:

LSTMs Intuition: Memory

• Cell State / Memory

Credit: Christopher Olah

Page 107:

LSTMs Intuition: Forget Gate

• Should we continue to remember this “bit” of information or not?

Credit: Christopher Olah

Page 108:

LSTMs Intuition: Input Gate

• Should we update this “bit” of information or not?

• If so, with what?

Credit: Christopher Olah

Page 109:

LSTMs Intuition: Memory Update

• Forget that + memorize this

Credit: Christopher Olah

Page 110:

LSTMs Intuition: Output Gate

• Should we output this “bit” of information to “deeper” layers?

Credit: Christopher Olah

Page 111:

LSTM: A pretty sophisticated cell

Credit: Christopher Olah
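
The gate intuitions from the preceding slides can be put together in a few lines. This is a sketch of one step of a scalar LSTM cell in the standard formulation (forget gate, input gate, candidate value, memory update, output gate). The parameter layout and the two example settings are hypothetical and chosen only to make the gates' behaviour visible.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM. params maps a gate name to (w_x, w_h, bias)."""
    def pre(name):
        w_x, w_h, bias = params[name]
        return w_x * x + w_h * h_prev + bias

    f = sigmoid(pre("forget"))        # keep remembering this bit?
    i = sigmoid(pre("input"))         # update this bit?
    g = math.tanh(pre("candidate"))   # if so, with what?
    o = sigmoid(pre("output"))        # expose this bit to the next layer?
    c = f * c_prev + i * g            # forget that + memorize this
    h = o * math.tanh(c)
    return h, c


# Gates driven only by their biases so the effect is easy to read off.
remember = {"forget": (0.0, 0.0, 10.0), "input": (0.0, 0.0, -10.0),
            "candidate": (1.0, 0.0, 0.0), "output": (0.0, 0.0, 10.0)}
overwrite = {"forget": (0.0, 0.0, -10.0), "input": (0.0, 0.0, 10.0),
             "candidate": (1.0, 0.0, 0.0), "output": (0.0, 0.0, 10.0)}

_, kept = lstm_step(x=2.0, h_prev=0.0, c_prev=0.8, params=remember)
_, replaced = lstm_step(x=2.0, h_prev=0.0, c_prev=0.8, params=overwrite)
print(kept, replaced)
```

With the forget gate open and the input gate closed, the cell state stays very close to its previous value of 0.8; with the gates reversed, it is replaced by the new candidate value tanh(2.0).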

Page 112:

Generating image captions

Credit: Vinyals et al

Page 113:

Generating image captions

Credit: Vinyals et al

Page 114:

Credit: Vinyals et al