Transcript
Page 1: Fcv learn fergus

The Role of Learning in Vision

3.30pm: Rob Fergus
3.40pm: Andrew Ng
3.50pm: Kai Yu
4.00pm: Yann LeCun
4.10pm: Alan Yuille
4.20pm: Deva Ramanan
4.30pm: Erik Learned-Miller
4.40pm: Erik Sudderth
4.50pm: Spotlights - Qiang Ji, M-H Yang
4.55pm: Discussion
5.30pm: End

Feature / Deep Learning

Compositional Models

Learning Representations

Overview

Low-level Representations

Learning on the fly

Page 2: Fcv learn fergus

An Overview of Hierarchical Feature Learning and Relations to Other Models

Rob Fergus

Dept. of Computer Science, Courant Institute,

New York University

Page 3: Fcv learn fergus

Motivation

• Multitude of hand-designed features currently in use
  – SIFT, HOG, LBP, MSER, Color-SIFT, …

• Could the features themselves be learned instead?

• Also, these descriptors only capture low-level edge gradients

Felzenszwalb, Girshick, McAllester and Ramanan [PAMI 2007]

Yan & Huang (winner of the PASCAL 2010 classification competition)

Page 4: Fcv learn fergus

Beyond Edges?

• Mid-level cues: "Tokens" from Vision by D. Marr
  – Continuation, Parallelism, Junctions, Corners

• High-level object parts

• Difficult to hand-engineer; what about learning them?

Page 5: Fcv learn fergus

Deep / Feature Learning Goal

• Build hierarchy of feature extractors (≥ 1 layers)
  – All the way from pixels to classifier
  – Homogeneous structure per layer
  – Unsupervised training

Image/Video Pixels → Layer 1 → Layer 2 → Layer 3 → Simple Classifier

• Numerous approaches:
  – Restricted Boltzmann Machines (Hinton, Ng, Bengio, …)
  – Sparse coding (Yu, Fergus, LeCun)
  – Auto-encoders (LeCun, Bengio)
  – ICA variants (Ng, Cottrell)
  – & many more…
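As a concrete illustration of the unsupervised, layer-wise idea, below is a minimal NumPy sketch of one approach from the list above: a single-layer auto-encoder with tied weights. The data, sizes, learning rate, and plain gradient-descent loop are illustrative placeholders, not details from any of the cited works.

```python
# Minimal single-layer auto-encoder sketch (one of the unsupervised
# approaches listed above). Sizes and learning rate are illustrative.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 64))          # 1000 "patches" of 8x8 pixels
W = 0.01 * rng.standard_normal((64, 32))     # encoder weights (64 -> 32 codes)
b, c = np.zeros(32), np.zeros(64)            # encoder / decoder biases
lr = 0.01

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

for epoch in range(50):
    H = sigmoid(X @ W + b)                   # codes (hidden features)
    R = H @ W.T + c                          # reconstruction (tied weights)
    err = R - X                              # reconstruction error
    # Gradients of the squared reconstruction error w.r.t. the parameters
    dH = (err @ W) * H * (1 - H)
    dW = X.T @ dH + err.T @ H                # tied-weight gradient (both paths)
    W -= lr * dW / len(X)
    b -= lr * dH.mean(axis=0)
    c -= lr * err.mean(axis=0)

features = sigmoid(X @ W + b)                # learned layer-1 features
```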

Page 6: Fcv learn fergus

Single Layer Architecture

Input (Image Pixels / Features) → Filter → Normalize → Pool → Output (Features / Classifier)

Details in the boxes matter (especially in a hierarchy)

Links to neuroscience
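A rough sketch of this single-layer pipeline in NumPy follows; a filter bank, rectification non-linearity, divisive normalization, and max pooling are one plausible instantiation of the boxes above, with all sizes chosen arbitrarily.

```python
# Sketch of the single-layer pipeline above: filter -> non-linearity ->
# normalize -> pool, in plain NumPy. Filter bank and sizes are illustrative.
import numpy as np

def single_layer(image, filters, pool=2):
    """image: (H, W) array; filters: (K, fh, fw) bank of learned/hand-set filters."""
    K, fh, fw = filters.shape
    H, W = image.shape
    oh, ow = H - fh + 1, W - fw + 1
    # Filter: valid cross-correlation with each filter in the bank
    maps = np.empty((K, oh, ow))
    for k in range(K):
        for i in range(oh):
            for j in range(ow):
                maps[k, i, j] = np.sum(image[i:i+fh, j:j+fw] * filters[k])
    maps = np.maximum(maps, 0)                       # non-linearity (rectification)
    # Normalize: divisive normalization across the feature dimension
    maps /= np.sqrt(np.sum(maps**2, axis=0, keepdims=True)) + 1e-6
    # Pool: spatial max pooling over non-overlapping pool x pool cells
    ph, pw = oh // pool, ow // pool
    pooled = maps[:, :ph*pool, :pw*pool].reshape(K, ph, pool, pw, pool).max(axis=(2, 4))
    return pooled                                    # (K, ph, pw) feature maps

rng = np.random.default_rng(0)
out = single_layer(rng.standard_normal((32, 32)), rng.standard_normal((8, 5, 5)))
```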

Page 7: Fcv learn fergus

Example Feature Learning Architectures

Pixels / Features
→ Filter with Dictionary (patch / tiled / convolutional), + Non-linearity
→ Spatial / Feature Pooling (Sum or Max)
→ Normalization between feature responses: Local Contrast Normalization (Subtractive / Divisive), (Group) Sparsity, Max / Softmax
→ Features
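As an example of one normalization choice from the diagram above, here is a sketch of local contrast normalization (subtractive, then divisive) over a local window; the window size and padding mode are illustrative.

```python
# Sketch of local contrast normalization (subtractive then divisive), one of
# the normalization choices in the diagram above. The window size is illustrative.
import numpy as np

def local_contrast_normalize(fmap, size=9, eps=1e-6):
    """fmap: (H, W) feature map. Normalizes each pixel by the local mean and std."""
    pad = size // 2
    padded = np.pad(fmap, pad, mode='reflect')
    H, W = fmap.shape
    out = np.empty_like(fmap, dtype=float)
    for i in range(H):
        for j in range(W):
            window = padded[i:i+size, j:j+size]
            centered = fmap[i, j] - window.mean()        # subtractive normalization
            out[i, j] = centered / (window.std() + eps)  # divisive normalization
    return out

rng = np.random.default_rng(0)
normalized = local_contrast_normalize(rng.standard_normal((16, 16)))
```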

Page 8: Fcv learn fergus

SIFT Descriptor

Image Pixels → Apply Gabor filters → Spatial pool (Sum) → Normalize to unit length → Feature Vector
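A simplified sketch of this pipeline: oriented-gradient responses stand in for the Gabor filtering stage, responses are sum-pooled over a grid of cells, and the concatenated histogram is normalized to unit length. The 4x4-cell, 8-orientation layout mirrors the usual SIFT arrangement, but the details are simplified.

```python
# Sketch of the SIFT-style pipeline above: oriented-gradient responses,
# spatial sum-pooling over a grid of cells, then normalization to unit length.
import numpy as np

def sift_like_descriptor(patch, cells=4, bins=8):
    """patch: (H, W) grayscale patch; returns a cells*cells*bins descriptor."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ori = np.mod(np.arctan2(gy, gx), 2 * np.pi)          # orientation in [0, 2pi)
    bin_idx = np.minimum((ori / (2 * np.pi) * bins).astype(int), bins - 1)
    H, W = patch.shape
    ch, cw = H // cells, W // cells
    desc = np.zeros((cells, cells, bins))
    for i in range(cells):                               # spatial pooling (sum) per cell
        for j in range(cells):
            m = mag[i*ch:(i+1)*ch, j*cw:(j+1)*cw]
            b = bin_idx[i*ch:(i+1)*ch, j*cw:(j+1)*cw]
            for k in range(bins):
                desc[i, j, k] = m[b == k].sum()
    desc = desc.ravel()
    return desc / (np.linalg.norm(desc) + 1e-6)          # normalize to unit length

rng = np.random.default_rng(0)
vec = sift_like_descriptor(rng.standard_normal((16, 16)))   # 128-D feature vector
```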

Page 9: Fcv learn fergus

Spatial Pyramid Matching
Lazebnik, Schmid, Ponce [CVPR 2006]

SIFT Features → Filter with Visual Words → Multi-scale spatial pool (Sum) → Max → Classifier
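A sketch of the pooling side of this pipeline: descriptors are hard-assigned to their nearest visual word, histograms are sum-pooled over grids at several scales, and the results are concatenated for the classifier. The codebook and grid levels here are random/illustrative, not those of the cited paper.

```python
# Sketch of spatial-pyramid pooling: nearest-visual-word assignment, then
# sum-pooled histograms over grids at several scales, concatenated.
import numpy as np

def spatial_pyramid(descriptors, positions, codebook, levels=(1, 2, 4)):
    """descriptors: (N, D); positions: (N, 2) in [0, 1); codebook: (K, D)."""
    # "Filter with visual words": hard-assign each descriptor to a codeword
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)
    K = len(codebook)
    pyramid = []
    for g in levels:                                   # multi-scale spatial pooling
        cell = np.minimum((positions * g).astype(int), g - 1)   # (N, 2) cell indices
        for cy in range(g):
            for cx in range(g):
                mask = (cell[:, 0] == cy) & (cell[:, 1] == cx)
                hist = np.bincount(words[mask], minlength=K).astype(float)
                pyramid.append(hist)                   # sum-pooled word counts
    return np.concatenate(pyramid)                     # feed this to the classifier

rng = np.random.default_rng(0)
feat = spatial_pyramid(rng.standard_normal((500, 128)),
                       rng.random((500, 2)),
                       rng.standard_normal((20, 128)))
```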

Page 10: Fcv learn fergus

Role of Normalization

• Lots of different mechanisms (max, sparsity, LCN etc.)

• All induce local competition between features to explain the input
  – "Explaining away"
  – Just like top-down models
  – But a more local mechanism

Example: Convolutional Sparse Coding

[Figure: filters, convolution with feature maps, an |.|1 sparsity penalty on each map]

Zeiler et al. [CVPR'10 / ICCV'11], Kavukcuoglu et al. [NIPS'10], Yang et al. [CVPR'10]
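To make the |.|1 terms concrete, here is a sketch of the sparse-coding inference step using ISTA-style soft thresholding on a patch dictionary; the convolutional variants above replace the matrix products with convolutions. Dictionary size, lambda, and iteration count are illustrative.

```python
# Sketch of sparse-coding inference via ISTA: gradient step on the
# reconstruction term, then soft thresholding for the |.|1 penalty.
import numpy as np

def sparse_code(x, D, lam=0.1, n_iter=100):
    """x: (d,) signal/patch; D: (d, k) dictionary with unit-norm columns."""
    L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the smooth term
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - x)             # gradient of 0.5*||x - D z||^2
        z = z - grad / L
        z = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft threshold (L1)
    return z                                 # sparse feature responses

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 128))
D /= np.linalg.norm(D, axis=0)               # unit-norm dictionary atoms
z = sparse_code(rng.standard_normal(64), D)  # most entries end up exactly zero
```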

Page 11: Fcv learn fergus

Role of Pooling

• Spatial pooling
  – Invariance to small transformations (see the sketch after this list)

Chen, Zhu, Lin, Yuille, Zhang [NIPS 2007]

• Pooling across feature groups
  – Gives AND/OR type behavior
  – Compositional models of Zhu, Yuille
  – Larger receptive fields

Zeiler, Taylor, Fergus [ICCV 2011]

• Pooling with latent variables (& springs)
  – Pictorial structures models

Felzenszwalb, Girshick, McAllester, Ramanan [PAMI 2009]
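As a toy illustration of the invariance argument for spatial pooling, the sketch below max-pools a feature map and a slightly shifted copy of it and compares the results; the sizes and the one-pixel shift are arbitrary.

```python
# Sketch of why spatial max pooling buys invariance to small shifts: a
# feature map and a one-pixel-shifted copy pool to nearly the same values.
import numpy as np

def max_pool(fmap, size=4):
    H, W = fmap.shape
    ph, pw = H // size, W // size
    return fmap[:ph*size, :pw*size].reshape(ph, size, pw, size).max(axis=(1, 3))

rng = np.random.default_rng(0)
fmap = rng.random((16, 16))
shifted = np.roll(fmap, 1, axis=1)           # small translation of the responses
print(np.abs(max_pool(fmap) - max_pool(shifted)).mean())   # typically a small difference
```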

Page 12: Fcv learn fergus
Page 13: Fcv learn fergus

HOG Pyramid → Apply object/part filters → Pool part responses (latent variables & springs) → Non-max Suppression (Spatial) → Score

Object Detection with Discriminatively Trained Part-Based Models
Felzenszwalb, Girshick, McAllester, Ramanan [PAMI 2009]

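A brute-force sketch of this style of scoring: a root response plus, for each part, the best part response minus a quadratic spring (deformation) cost, evaluated over all placements. The random response maps, anchors, and deformation weights are stand-ins for actual HOG filter responses and learned parameters; the real models use distance transforms rather than this exhaustive search.

```python
# Sketch of part-based scoring: root score plus, per part, the best
# part-filter response minus a quadratic spring cost over all placements.
import numpy as np

def dpm_score(root_resp, part_resps, anchors, defs):
    """root_resp: (H, W); part_resps: list of (H, W) part responses;
    anchors: ideal part offsets; defs: quadratic deformation weights."""
    H, W = root_resp.shape
    score = root_resp.copy()
    ys, xs = np.mgrid[0:H, 0:W]                        # candidate part placements
    for resp, (ay, ax), (dy, dx) in zip(part_resps, anchors, defs):
        part_score = np.full((H, W), -np.inf)
        for y in range(H):
            for x in range(W):
                # "Pool" over part placements: best response minus spring cost
                cost = dy * (ys - y - ay) ** 2 + dx * (xs - x - ax) ** 2
                part_score[y, x] = (resp - cost).max()
        score += part_score
    return score                                       # score map over root placements

rng = np.random.default_rng(0)
smap = dpm_score(rng.standard_normal((10, 10)),
                 [rng.standard_normal((10, 10)) for _ in range(2)],
                 anchors=[(1, 1), (-1, 2)], defs=[(0.1, 0.1), (0.1, 0.1)])
```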