Top Banner
Lecture 16 - Silvio Savarese 4-Mar-15 • In-class presentations next week! Monday March 9 th ; 11am-12:15pm (in class) Tuesday March 10 th ; 1:30pm-3:30pm (Bishop Auditorium) Wednesday March 11 th ; 11am-12:15pm (in class) • 2 ½ minutes for each presentation + Q&A • It’s a team presentation • In-class or Piazza questions count toward attendance evaluation • Best presentation award • See Piazza and website for more information Announcements
97

Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Jan 26, 2019

Download

Documents

dinhkhue
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Lecture 16 -Silvio Savarese 4-Mar-15

• In-class presentations next week!• Monday March 9th ; 11am-12:15pm (in class)• Tuesday March 10th ; 1:30pm-3:30pm (Bishop Auditorium)• Wednesday March 11th; 11am-12:15pm (in class)

• 2 ½ minutes for each presentation + Q&A• It’s a team presentation• In-class or Piazza questions count toward attendance evaluation• Best presentation award • See Piazza and website for more information

Announcements

Page 2: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Lecture 16 -Silvio Savarese 4-Mar-15

• Thanks for the online evaluation

• Your feedback is extremely important! (see lecture notes )

Announcements

Page 3: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Lecture 16 -Silvio Savarese 4-Mar-15

• Shameless advertisement

• The course surveys recent developments in computer vision, graphics and image processing for mobile applications

• Three problems sets (each related to a toy mobile application).

• Course project: Extend one of the toy applications

• Latest Nvidia Shield Tablets will be assigned to each student

• No midterm; no final

New course:231M: Mobile Computer Vision

Page 4: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Lecture 16 -Silvio Savarese 4-Mar-15

• Datasets in computer vision• 3D scene understanding

Lecture 16Closure

Page 5: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Caltech 101

• Pictures of objects belonging to 101 categories.

• About 40 to 800 images per category. Most categories have about 50 images.

• The size of each image is roughly 300 x 200 pixels.

Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. L. Fei-Fei, R.

Fergus, and P. Perona. CVPR 2004, Workshop on Generative-Model Based Vision. 2004

Page 6: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Caltech 101 images

Page 7: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

• Smallest category size is 31 images:

• Too easy?

– left-right aligned

– Rotation artifacts

– Saturated performance

Caltech-101: Drawbacks

N train 30

Page 8: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Results up to 2007 -- recent methods obtain almost 100%

Caltech-101: Evaluation

Page 9: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Caltech-256

• Smallest category size now 80 images

• About 30K images

• Harder

– Not left-right aligned

– No artifacts

– More categories

• New and larger clutter category

Griffin, Gregory and Holub, Alex and Perona, Pietro (2007) Caltech-256 Object Category Dataset. California

Page 10: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Caltech 256 images

ba

se

ba

ll-ba

tbasketb

all-

hoop

dog

ka

ya

ctr

affic

lig

ht

Page 11: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

The PASCAL Visual Object Classes

(VOC) Dataset and Challenge

(2005-2012)Mark Everingham

Luc Van Gool

Chris Williams

John Winn

Andrew Zisserman

Page 12: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Dataset Content

• 20 classes: aeroplane, bicycle, boat, bottle, bus, car, cat,

chair, cow, dining table, dog, horse, motorbike, person,

potted plant, sheep, train, TV

• Real images downloaded from flickr, not filtered for “quality”

• Complex scenes, scale, pose, lighting, occlusion, ...

Page 13: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Annotations

• Complete annotation of all objects

• Annotated in one session with written guidelines

Truncated

Object extends

beyond BB

Occluded

Object is significantly

occluded within BB

Pose

Facing left

Difficult

Not scored in

evaluation

Page 14: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Examples

Aeroplane

Bus

Bicycle Bird Boat Bottle

Car Cat Chair Cow

Page 15: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

History

• Challenge: annotation of test set is withheld until after

challenge

Images Objects Classes Notes

2005 2,232 2,871 4 Collection of existing

and some new data.

2006 5,304 9,507 10 Completely new dataset

from flickr (+MSRC)

2007 9,963 24,640 20 Increased classes to 20.

Introduced tasters.

2008

2012

8,776 20,739 20 Added “occlusion” flag.”

Added segmentation

masks11,530 27,450 20

Page 16: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

PASCAL 3D+

• 12 rigid categories from PASCAL VOC are annotated with 3D pose and aligned with 3D cad models (now almost 100 categories!)

• Integrated with images from other repositories• New benchmark for continuous 3D pose estimation and shape recovery

of object categories

Xiang, Mottaghi, Savarese WACV 14

Page 17: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

PASCAL 3D+

Page 18: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Other recent datasets

ESP[Ahn et al, 2006]

LabelMe[ Russell et al, 2005]

TinyImageTorralba et al. 2007

Lotus Hill[ Yao et al, 2007]

MSRC[Shotton et al. 2006]

Page 19: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

• ~20K categories; • 14 million images; • ~700im/categ;

• free to public at www.image-net.org

21

Largest dataset for object categories up to date

J. Deng, H. Su. K. Li , L. Fei-Fei ,

Page 20: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

is a knowledge ontology

• Taxonomy

• Partonomy

• The “social network” of visual concepts

– Hidden knowledge and structure among visual concepts

– Prior knowledge

– Context

Page 21: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

OpenSurfacesSean Bell, Paul Upchurch, Noah Snavely, Kavita Bala

Cornell University

+80K annotated images of materials

Page 22: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

COCO dataset

• Microsoft COCO is a new image recognition,

segmentation, and captioning dataset.

• Features:– Object segmentation

– Recognition in Context

– Multiple objects per image

– More than 300,000 images

– More than 2 Million instances

– 80 object categories

– 5 captions per image

Tsung-Yi Lin Cornell Tech

Michael Maire TTI Chicago

Serge Belongie Cornell Tech

Lubomir Bourdev Facebook AI

Ross Girshick Microsoft Research

James Hays Brown University

Pietro Perona Caltech

Deva Ramanan UC Irvine

Larry Zitnick Microsoft Research

Piotr Dollár Facebook AI

Page 23: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

More Datasets….

UIUC Cars (2004)S. Agarwal, A. Awan, D. Roth

3D Textures (2005)S. Lazebnik, C. Schmid, J. Ponce

CuRRET Textures (1999)K. Dana B. Van Ginneken S. Nayar J. Koenderink

CAVIAR Tracking (2005)R. Fisher, J. Santos-Victor J. Crowley

FERET Faces (1998)P. Phillips, H. Wechsler, J. Huang, P. Raus

CMU/VASC Faces (1998)H. Rowley, S. Baluja, T. Kanade

MNIST digits (1998-10)Y LeCun & C. Cortes

KTH human action (2004)I. Leptev & B. Caputo

Sign Language (2008)P. Buehler, M. Everingham, A. Zisserman

Segmentation (2001)D. Martin, C. Fowlkes, D. Tal, J. Malik.

Middlebury Stereo (2002)D. Scharstein R. Szeliski

COIL Objects (1996)S. Nene, S. Nayar, H. Murase

Page 24: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

CMU/MIT frontal faces vasc.ri.cmu.edu/idb/html/face/frontal_images

cbcl.mit.edu/software-datasets/FaceData2.html

Patches Frontal faces

Graz-02 Database www.emt.tugraz.at/~pinz/data/GRAZ_02/ Segmentation masks Bikes, cars, people

UIUC Image Database l2r.cs.uiuc.edu/~cogcomp/Data/Car/ Bounding boxes Cars

TU Darmstadt Database www.vision.ethz.ch/leibe/data/ Segmentation masks Motorbikes, cars, cows

LabelMe dataset people.csail.mit.edu/brussell/research/LabelMe/intro.html Polygonal boundary >500 Categories

Caltech 101 www.vision.caltech.edu/Image_Datasets/Caltech101/Caltech101.html Segmentation masks 101 categories

Caltech 256

COIL-100

http://www.vision.caltech.edu/Image_Datasets/Caltech256/

www1.cs.columbia.edu/CAVE/research/softlib/coil-100.html

Bounding Box

Patches

256 Categories

100 instances

NORB www.cs.nyu.edu/~ylclab/data/norb-v1.0/ Bounding box 50 toys

Databases for object localization

Databases for object recognition

On-line annotation toolsESP game www.espgame.org Global image descriptions Web images

LabelMe people.csail.mit.edu/brussell/research/LabelMe/intro.html Polygonal boundary High resolution images

The next tables summarize some of the available datasets for training and testing

object detection and recognition algorithms. These lists are far from exhaustive.

Links to datasets

CollectionsPASCAL http://www.pascal-network.org/challenges/VOC/ Segmentation, boxes various

Page 25: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Lecture 16 -Silvio Savarese 4-Mar-15

• Datasets in computer vision• 3D scene understanding

Lecture 16Closure

Page 26: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

29

What does it mean to understand a scene?

Page 27: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

car

car

Car

Street

Building

image labels

Image-to-labels paradigm

Page 28: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Is computer vision solved??

Oh my gosh!

Future generation of computer vision students

Page 29: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

32

Chocolate bar

Tree

Road

Sky

Page 30: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

33

Sky

Chocolate bar

Tree

Road

Page 31: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,
Page 32: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

35

Car-right

Car-right

Car-left

Car- 3/4 right

toy car

Building facade

Road

Car-3/4 right

Page 33: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Road

36

Building facade

Page 34: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

37

Page 35: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

38

Page 36: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

39

Page 37: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Modeling this interplay is criticalfor 3D perception!

40

Objects are constrained by the 3D space

The 3D space is shaped by its objects

Page 38: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Biederman, Mezzanotte and Rabinowitz, 1982

Humans perceive the world in 3D

41

Page 39: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Visual processing in the brain

V1

where pathway(dorsal stream)

what pathway(ventral stream)

42

Page 40: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

V1

where pathway(dorsal stream)

what pathway(ventral stream)

Pre-frontal cortex

43

Visual processing in the brain

Page 41: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Current state of computer vision

3D Reconstruction• 3D shape recovery • 3D scene reconstruction • Camera localization• Pose estimation

2D Recognition• Object detection• Texture classification• Target tracking• Activity recognition

44

Page 42: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

3D Reconstruction• 3D shape recovery • 3D scene reconstruction • Camera localization• Pose estimation

Current state of computer vision

Levoy et al., 00Hartley & Zisserman, 00Dellaert et al., 00Rusinkiewic et al., 02Nistér, 04Brown & Lowe, 04Schindler et al, 04Lourakis & Argyros, 04Colombo et al. 05

Golparvar-Fard, et al. JAEI 10Pandey et al. IFAC , 2010Pandey et al. ICRA 2011

Microsoft’s PhotoSynthSnavely et al., 06-08Schindler et al., 08Agarwal et al., 09Frahm et al., 10

Lucas & Kanade, 81Chen & Medioni, 92Debevec et al., 96Levoy & Hanrahan, 96Fitzgibbon & Zisserman, 98Triggs et al., 99Pollefeys et al., 99Kutulakos & Seitz, 99

45

Savarese et al. IJCV 05Savarese et al. IJCV 06

Snavely

et al., 06

-08

Page 43: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

3D Reconstruction• 3D shape recovery • 3D scene reconstruction • Camera localization• Pose estimation

Current state of computer vision

Levoy et al., 00Hartley & Zisserman, 00Dellaert et al., 00Rusinkiewic et al., 02Nistér, 04Brown & Lowe, 04Schindler et al, 04Lourakis & Argyros, 04Colombo et al. 05

Golparvar-Fard, et al. JAEI 10Pandey et al. IFAC , 2010Pandey et al. ICRA 2011

Microsoft’s PhotoSynthSnavely et al., 06-08Schindler et al., 08Agarwal et al., 09Frahm et al., 10

Lucas & Kanade, 81Chen & Medioni, 92Debevec et al., 96Levoy & Hanrahan, 96Fitzgibbon & Zisserman, 98Triggs et al., 99Pollefeys et al., 99Kutulakos & Seitz, 99

Savarese et al. IJCV 05Savarese et al. IJCV 06

Snavely

et al., 06

-08

Page 44: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

2D Recognition• Object detection• Texture classification• Target tracking• Activity recognition

Current state of computer vision

Turk & Pentland, 91Poggio et al., 93Belhumeur et al., 97LeCun et al. 98Amit and Geman, 99Shi & Malik, 00Viola & Jones, 00Felzenszwalb & Huttenlocher 00Belongie & Malik, 02Ullman et al. 02

Argawal & Roth, 02Ramanan & Forsyth, 03Weber et al., 00Vidal-Naquet & Ullman 02Fergus et al., 03Torralba et al., 03Vogel & Schiele, 03Barnard et al., 03Fei-Fei et al., 04Kumar & Hebert ‘04

He et al. 06Gould et al. 08Maire et al. 08Felzenszwalb et al., 08Kohli et al. 09L.-J. Li et al. 09Ladicky et al. 10,11Gonfaus et al. 10Farhadi et al., 09 Lampert et al., 09 47

Page 45: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

2D Recognition• Object detection• Texture classification• Target tracking• Activity recognition

Current state of computer vision

car person car

Car Car

Street

Person

Bike

Building Tree

Turk & Pentland, 91Poggio et al., 93Belhumeur et al., 97LeCun et al. 98Amit and Geman, 99Shi & Malik, 00Viola & Jones, 00Felzenszwalb & Huttenlocher 00Belongie & Malik, 02Ullman et al. 02

Argawal & Roth, 02Ramanan & Forsyth, 03Weber et al., 00Vidal-Naquet & Ullman 02Fergus et al., 03Torralba et al., 03Vogel & Schiele, 03Barnard et al., 03Fei-Fei et al., 04Kumar & Hebert ‘04

He et al. 06Gould et al. 08Maire et al. 08Felzenszwalb et al., 08Kohli et al. 09L.-J. Li et al. 09Ladicky et al. 10,11Gonfaus et al. 10Farhadi et al., 09 Lampert et al., 09

Page 46: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

3D reconstruction 2D recognition

49

Perceiving the World in 3D

Current state of computer vision

• Modeling objects and their 3D properties• Modeling interaction among objects and space• Modeling relationships of object/space across views

Page 47: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Outline

50

• Modeling objects and their 3D properties

• Modeling interaction among objects and space

• Modeling relationships of objects across views

Page 48: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Detecting objects and estimating their 3D properties

5151

Page 49: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

3D

ob

ject

dat

aset

[Sa

vare

se &

Fei

-Fei

07

]

CAR a=330 e=15 d=7 CAR a=150 e=15 d=7

MOUSE a=300 e=45 d=23 SHOE a=240 e=45 d=11

Results

52

Page 50: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Imag

eNet

dat

aset

[D

eng

et a

l. 2

01

0]

BED a=30 e=15 d=2.5

CHAIR a=0 e=30 d=7

SOFA a=345 e=15 d=3.5a=60 e-30 d=2.5

TABLE a=60 e=15 d=2

Results

53

Page 51: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

SHOE Pose: back-left-side

ResultsExamples of failure (wrong category)

54

This can’t be a shoe!

Page 52: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Outline

55

• Modeling objects and their 3D properties

• Modeling interaction among objects and space

• Modeling relationships of objects across views

Page 53: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

56

Scene understanding is an interplay between objects and space

Page 54: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

57

3D space is shaped by its objects

Page 55: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

58

Objects are placed into 3D space

Page 56: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

A first attempt….

Interactions object-ground Hoiem et al, 2006-2008

Page 57: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Lab

elm

ed

atas

et [

Ru

ssel

l et

al.,

08

]

A first attempt….

60

Bao et al. CVPR 2010; BMVC 2010; CIVC 2011; IJCV 2012

Page 58: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Lab

elm

ed

atas

et [

Ru

ssel

l et

al.,

08

]

A first attempt….

61

Bao, et al. CVPR 2010; BMVC 2010;CIVC 2011 (editor choice)IJCV 2012

Page 59: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Generalization #1

Interactions between:- Objects-space

Oliva & Torralba, 2007Rabinovich et al, 2007Li & Fei-Fei, 2007Vogel & Schiele, 2007

Hoiem et al, 2006Herdau et al., 2009Gupta et al, 2010Fouhey et al, 2012

Desai et al, 2009Sadeghi & Farhardi, 2011Li et al, 2012- Object-object

Choi, et al., CVPR 13

Page 60: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

3D Geometric Phrases

63

A 3DGP encodes geometric and semantic relationships between groups of objects and space elements which frequently co-occur in spatially consistent configurations.

Page 61: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

64

Training Dataset 3DGPs

• W/o annotations• Compact

Using Max-Margin learning

w/ novel Latent Completion algorithm

3D Geometric PhrasesChoi, Chao, Pantofaru, Savarese, CVPR 13

• View-invariant

Page 62: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Sofa, Coffee Table, Chair, Bed, Dining Table, Side Table

Estimated Layout 3D Geometric Phrases

Results

66

Ind

oo

r sc

ene

dat

aset

[C

ho

i et

al.,

12

]

Page 63: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

67

Sofa, Coffee Table, Chair, Bed, Dining Table, Side Table

Estimated Layout 3D Geometric Phrases

ResultsIn

do

or

scen

e d

atas

et [

Ch

oi e

t al

., 1

2]

Page 64: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Results: Object Detection

68

56.9

65.2

20

30

40

50

60

70

80

90

100

Sofa Table Chair Bed D. Table S. Table Overall

Average Precision %

Felzenszwalb et al. 3DGP

+10.5%

+15.7%

Ind

oo

r sc

ene

dat

aset

[C

ho

i et

al.,

12

]

Page 65: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Outline

69

• Modeling objects and their 3D properties

• Modeling interaction among objects and space

• Modeling relationships of objects across views

Page 66: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

• Interaction between object-space• Interaction among objects• Transfer semantics across views

Modeling relationships of objects across views

Page 67: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

71

• Interaction between object-space• Interaction among objects• Transfer semantics across views

Modeling relationships of objects across views

Page 68: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

•Measurements I• Points (x,y,scale)

• Objects (x,y, scale, pose)

• Regions (x,y, pose)

•Model Parameters:• Q = 3D points• O = 3D objects• B = 3D regions

• C = cam. prm. K, R, T

72

Semantic structure from motion

Page 69: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

•Measurements I• Points (x,y,scale)

• Objects (x,y, scale, pose)

• Regions (x,y, pose)

•Model Parameters:• Q = 3D points• O = 3D objects• B = 3D regions

• C = cam. prm. K, R, T

73

Semantic structure from motion

Page 70: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

•Measurements I• Points (x,y,scale)

• Objects (x,y, scale, pose)

• Regions (x,y, pose)

•Model Parameters:• Q = 3D points• O = 3D objects• B = 3D regions

• C = cam. prm. K, R, T

YCO

YCB

YCQ

Fact

or

grap

h

74

Semantic structure from motion

Page 71: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

•Measurements I• Points (x,y,scale)

• Objects (x,y, scale, pose)

• Regions (x,y, pose)

•Model Parameters:• Q = 3D points• O = 3D objects• B = 3D regions

• C = cam. prm. K, R, T

SSFM: point-level compatibility

YCQ

75

Page 72: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

• Tomasi & Kanade ‘92• Triggs et al ’99• Soatto & Perona 99• Hartley & Zisserman 00• Dellaert et al. 00

Point re-projection error

•Measurements I• Points (x,y,scale)

• Objects (x,y, scale, pose)

• Regions (x,y, pose)

•Model Parameters:• Q = 3D points• O = 3D objects• B = 3D regions

• C = cam. prm. K, R, T

SSFM: point-level compatibility

projection

observation

• Pollefeys & V. Gool 02• Nister 04• Brown & Lowe 07• Snavely et al. 0876

Page 73: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

•Measurements I• Points (x,y,scale)

• Objects (x,y, scale, pose)

• Regions (x,y, pose)

•Model Parameters:• Q = 3D points• O = 3D objects• B = 3D regions

• C = cam. prm. K, R, T

SSFM: Object-level compatibility

YCO

77

YCB

YCQ

Page 74: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

•Measurements I• Points (x,y,scale)

• Objects (x,y, scale, pose)

• Regions (x,y, pose)

•Model Parameters:• Q = 3D points• O = 3D objects• B = 3D regions

• C = cam. prm. K, R, T

SSFM: Object-level compatibility

YCO

Object “re-projection” error

78

Page 75: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Camera 1 Camera 2

• Agreement with measurements is computed using position, pose and scale

SSFM: Object-level compatibility

79

Page 76: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Camera 1 Camera 2

• Agreement with measurements is computed using position, pose and scale

SSFM: Object-level compatibility

80

Page 77: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

SSFM with interactions

•Measurements I• Points (x,y,scale)

• Objects (x,y, scale, pose)

• Regions (x,y, pose)

•Model Parameters:• Q = 3D points• O = 3D objects• B = 3D regions

• C = cam. prm. K, R, T

YOB

YQB

YQO

YCO

YCB

YCQ

Bao, Bagra, Chao, Savarese CVPR 2012

• Interactions of points, regions and objects across views• Interactions among object-regions-points

Page 78: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

SSFM with interactions

•Measurements I• Points (x,y,scale)

• Objects (x,y, scale, pose)

• Regions (x,y, pose)

•Model Parameters:• Q = 3D points• O = 3D objects• B = 3D regions

• C = cam. prm. K, R, T

Object-Region Interactions:

Page 79: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

SSFM with interactions

•Measurements I• Points (x,y,scale)

• Objects (x,y, scale, pose)

• Regions (x,y, pose)

•Model Parameters:• Q = 3D points• O = 3D objects• B = 3D regions

• C = cam. prm. K, R, T

Object-Region Interactions:

Page 80: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

SSFM with interactions

•Measurements I• Points (x,y,scale)

• Objects (x,y, scale, pose)

• Regions (x,y, pose)

•Model Parameters:• Q = 3D points• O = 3D objects• B = 3D regions

• C = cam. prm. K, R, T

Object-point Interactions:

xx

Page 81: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

SSFM with interactions

•Measurements I• Points (x,y,scale)

• Objects (x,y, scale, pose)

• Regions (x,y, pose)

•Model Parameters:• Q = 3D points• O = 3D objects• B = 3D regions

• C = cam. prm. K, R, T

Object-point Interactions:

xx

Page 82: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Solving the SSFM problem

• Modified Reversible Jump Markov Chain Monte Carlo (RJ-MCMC) sampling algorithm

• Initialization of the cameras, objects, and points are critical for the sampling

• Initialize configuration of cameras using:• SFM• consistency of object/region properties across views

[Dellaert et al., 2000]

86

Page 83: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

87

ResultsFO

RD

CA

MP

US

dat

aset

[Pa

nd

ey e

t al

., 0

9]

Input images

• Wide baseline• Background clutter• Limited visibility• Un-calibrated cameras

Page 84: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Results

…Input images

88

Car Person Tree Sky

Street Building Else

Page 85: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Results

…Input images

89

Car Person Tree Sky

Street Building Else

Page 86: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

90

Results

Input images

90 Monitor Bottle Mug

Wall Desk Else

Fro

m t

he

off

ice

dat

aset

[B

ao e

t al

., 1

1]

Page 87: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Results

91

Monitor Bottle Mug

Wall Desk Else

Input images

Fro

m t

he

off

ice

dat

aset

[B

ao e

t al

., 1

1]

Page 88: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Results

DPM [1] SSFM 2 viewsno int.

SSFM2 views

SSFM 4 views

54.5% 61.3% 62.8% 66.5%

FORD CAMPUS dataset [Pandey et al., 09]Office dataset [Bao et al., 11]

Average precision in detecting objects in the 2D image

Average precision in localizing objects in the 3D space

Hoiemet al. 2011

SSFMno int.

SSFM

FORD CAMPUS 21.4% 32.7% 43.1%

OFFICE 15.5% 20.2% 21.6%

92[1] Felzenszwalb et al. 2008

Page 89: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Results

FORD CAMPUS dataset [Pandey et al., 09]Office dataset [Bao et al., 11]Street dataset [Bao et al., 11]

Camera translation error

SFMSnavelyet al., 08

SSFMno int.

SSFM

FORD CAMPUS 26.5 19.9 12.1

OFFICE 8.5 4.7 4.2

STREET 27.1 17.6 11.4

Camera rotation error

SFMSnavelyet al., 08

SSFMno int.

SSFM

<1 <1 <1

9.6 4.2 3.5

21.1 3.1 3.0

93

Page 90: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Wide-baseline feature correspondence

xxx

x

Page 91: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

SSFM

Camera Pose Estimation v.s. Base Line Width

FORD dataset

Camera baseline [m]

(eT)

Err

or

(Deg

ree

)

SSFM +

SSFM

Bundler [1]

SSFM Source code available!Please visit: http://www.eecs.umich.edu/vision/research.html

Page 92: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Applications

96

Robotic navigation

Mobile vision

Construction monitoring

Safe driving

Page 93: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Visual intelligence and large scale information management

Golparvar-Fard, Pena-Mora, Savarese , 2008-2012

Page 94: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

12/02/2006; 1:13:00 PM (As-built)

Page 95: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

99

Behind ScheduleOn ScheduleAhead of Schedule

12/02/2006; 1:13:00 PM (As-planned)

12/02/2006; 1:13:00 PM (As-planned)12/02/2006; 1:13:00 PM (As-built)

Page 96: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

100

Behind ScheduleOn ScheduleAhead of Schedule

12/02/2006; 1:13:00 PM (As-planned)

12/02/2006; 1:13:00 PM (As-planned)12/02/2006; 1:13:00 PM (As-built)

Page 97: Announcements - Stanford Universitycvgl.stanford.edu/teaching/cs231a_winter1415/lecture/lecture16... · Vidal-Naquet & Ullman 02 Fergus et al., 03 Torralba et al., 03 Vogel & Schiele,

Hope you enjoyed this course

Good luck on your presentations

on next week!