Lecture 16 - Silvio Savarese - 4-Mar-15

Announcements
• In-class presentations next week!
• Monday March 9th, 11am-12:15pm (in class)
• Tuesday March 10th, 1:30pm-3:30pm (Bishop Auditorium)
• Wednesday March 11th, 11am-12:15pm (in class)
• 2 ½ minutes for each presentation + Q&A
• It's a team presentation
• In-class or Piazza questions count toward attendance evaluation
• Best presentation award
• See Piazza and website for more information
Announcements
• Thanks for the online evaluation
• Your feedback is extremely important! (see lecture notes)
New course: 231M: Mobile Computer Vision
• Shameless advertisement
• The course surveys recent developments in computer vision, graphics, and image processing for mobile applications
• Three problem sets (each related to a toy mobile application)
• Course project: extend one of the toy applications
• Latest Nvidia Shield tablets will be assigned to each student
• No midterm; no final
Lecture 16: Closure
• Datasets in computer vision
• 3D scene understanding
Caltech 101
• Pictures of objects belonging to 101 categories.
• About 40 to 800 images per category. Most categories have about 50 images.
• The size of each image is roughly 300 x 200 pixels.
L. Fei-Fei, R. Fergus, and P. Perona. Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. CVPR 2004, Workshop on Generative-Model Based Vision.
Turk & Pentland '91; Poggio et al. '93; Belhumeur et al. '97; LeCun et al. '98; Amit & Geman '99; Shi & Malik '00; Viola & Jones '00; Felzenszwalb & Huttenlocher '00; Belongie & Malik '02; Ullman et al. '02; Agarwal & Roth '02; Ramanan & Forsyth '03; Weber et al. '00; Vidal-Naquet & Ullman '02; Fergus et al. '03; Torralba et al. '03; Vogel & Schiele '03; Barnard et al. '03; Fei-Fei et al. '04; Kumar & Hebert '04; He et al. '06; Gould et al. '08; Maire et al. '08; Felzenszwalb et al. '08; Kohli et al. '09; L.-J. Li et al. '09; Ladicky et al. '10, '11; Gonfaus et al. '10; Farhadi et al. '09; Lampert et al. '09
3D reconstruction / 2D recognition

Perceiving the World in 3D
Current state of computer vision
Outline
• Modeling objects and their 3D properties
• Modeling interaction among objects and space
• Modeling relationships of object/space across views
Detecting objects and estimating their 3D properties
3D object dataset [Savarese & Fei-Fei 07]
Results
CAR a=330 e=15 d=7; CAR a=150 e=15 d=7
MOUSE a=300 e=45 d=23; SHOE a=240 e=45 d=11
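In these result labels, a, e, and d presumably denote viewpoint azimuth, elevation (both in degrees), and distance, the pose parameterization used for this dataset; that reading is an assumption here, as are the function name and the z-up axis convention. Under it, a label maps to a camera position on the viewing sphere:

```python
import math

def viewpoint_to_camera_position(a_deg, e_deg, d):
    """Map an (azimuth, elevation, distance) viewpoint label to a 3D
    camera position on a viewing sphere around the object (z-up assumed)."""
    a, e = math.radians(a_deg), math.radians(e_deg)
    return (d * math.cos(e) * math.cos(a),
            d * math.cos(e) * math.sin(a),
            d * math.sin(e))
```

For example, under this reading CAR a=330 e=15 d=7 places the camera 7 units from the object, 15 degrees above the ground plane, at azimuth 330 degrees.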
ImageNet dataset [Deng et al. 2010]
Results
BED a=30 e=15 d=2.5; CHAIR a=0 e=30 d=7
SOFA a=345 e=15 d=3.5 and a=60 e=30 d=2.5
TABLE a=60 e=15 d=2
SHOE Pose: back-left-side
Results: examples of failure (wrong category)
This can’t be a shoe!
Outline
• Modeling objects and their 3D properties
• Modeling interaction among objects and space
• Modeling relationships of objects across views
Scene understanding is an interplay between objects and space.
3D space is shaped by its objects.
Objects are placed into 3D space.
A first attempt…
Object-ground interactions [Hoiem et al., 2006-2008]
Labelme dataset [Russell et al., 08]
A first attempt…
Bao et al. CVPR 2010; BMVC 2010; CIVC 2011; IJCV 2012
Hoiem et al. 2006; Hedau et al. 2009; Gupta et al. 2010; Fouhey et al. 2012
Object-object: Desai et al. 2009; Sadeghi & Farhadi 2011; Li et al. 2012
Choi et al., CVPR 13
3D Geometric Phrases [Choi, Chao, Pantofaru, Savarese, CVPR 13]
A 3DGP encodes geometric and semantic relationships between groups of objects and space elements which frequently co-occur in spatially consistent configurations.
3DGPs are learned from the training dataset:
• Without annotations
• Compact
• View-invariant
Learning uses max-margin training with a novel latent completion algorithm.
Object classes: Sofa, Coffee Table, Chair, Bed, Dining Table, Side Table
Output: estimated layout and 3D geometric phrases
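The core idea of scoring a group of detections by both detector confidence and spatial consistency can be sketched as follows. This is a toy stand-in, not the paper's model: the quadratic pairwise penalty and the 1-unit reference offset are illustrative placeholders for the learned 3DGP parameters.

```python
def score_3dgp(detections, pairwise_weight=1.0):
    """Toy score for a candidate geometric phrase: sum of detector
    confidences plus a pairwise term rewarding spatially consistent
    3D placements (penalizing deviation from a reference offset)."""
    unary = sum(d["score"] for d in detections)
    pairwise = 0.0
    for i in range(len(detections)):
        for j in range(i + 1, len(detections)):
            dx = detections[i]["pos"][0] - detections[j]["pos"][0]
            dz = detections[i]["pos"][2] - detections[j]["pos"][2]
            dist = (dx * dx + dz * dz) ** 0.5
            # reward pairs near an illustrative "typical" offset of 1 unit
            pairwise -= (dist - 1.0) ** 2
    return unary + pairwise_weight * pairwise
```

With equal detector scores, a sofa/coffee-table pair at the typical offset outranks the same pair placed implausibly far apart.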
Results: indoor scene dataset [Choi et al., 12]
Results: Object Detection
Bar chart (indoor scene dataset [Choi et al., 12]): average precision (%) for Sofa, Table, Chair, Bed, D. Table, S. Table, and Overall, comparing Felzenszwalb et al. with the 3DGP model. Overall AP improves from 56.9% to 65.2%, with gains of +10.5% and +15.7% highlighted on individual classes.
Outline
• Modeling objects and their 3D properties
• Modeling interaction among objects and space
• Modeling relationships of objects across views

Modeling relationships of objects across views
• Interaction between object and space
• Interaction among objects
• Transfer semantics across views
Semantic structure from motion
• Measurements I: points (x, y, scale); objects (x, y, scale, pose); regions (x, y, pose)
• Model parameters: Q = 3D points; O = 3D objects; B = 3D regions; C = camera parameters K, R, T
Semantic structure from motion: factor graph linking C to Q, O, and B via compatibility terms YCQ, YCO, YCB
SSFM: point-level compatibility (YCQ)
Point re-projection error
• Tomasi & Kanade '92; Triggs et al. '99; Soatto & Perona '99; Hartley & Zisserman '00; Dellaert et al. '00
SSFM: point-level compatibility: projection vs. observation
• Pollefeys & Van Gool '02; Nister '04; Brown & Lowe '07; Snavely et al. '08
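The point re-projection error is the standard bundle-adjustment residual: project the 3D point through the camera and compare with the observed keypoint. A minimal NumPy sketch (pinhole model, no lens distortion):

```python
import numpy as np

def reprojection_error(K, R, t, X, x_obs):
    """Point-level compatibility: project 3D point X through camera
    (K, R, t) and return the pixel distance to observation x_obs."""
    p = K @ (R @ X + t)        # homogeneous image coordinates
    x_proj = p[:2] / p[2]      # perspective divide
    return float(np.linalg.norm(x_proj - x_obs))
```

Summing this residual over all points and cameras (and minimizing over Q and C) recovers classical structure from motion; SSFM adds the object and region terms alongside it.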
SSFM: object-level compatibility (YCO)
SSFM: object-level compatibility: object "re-projection" error
Camera 1 / Camera 2: agreement with measurements is computed using position, pose, and scale.
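The object "re-projection" idea can be sketched in the same spirit as the point residual: project the hypothesized 3D object and compare its image position and predicted image size against the detection. This is illustrative only: the focal length f, the simple size/depth relation, and the unweighted sum are assumptions, and the pose term is omitted for brevity.

```python
import numpy as np

def object_reprojection_error(K, R, t, obj_center, obj_scale,
                              det_center, det_scale, f=500.0):
    """Object-level compatibility sketch: a 3D object at obj_center with
    physical size obj_scale should project near the 2D detection center,
    with image size roughly f * obj_scale / depth."""
    p = K @ (R @ obj_center + t)
    depth = p[2]
    proj = p[:2] / depth
    pos_err = float(np.linalg.norm(proj - det_center))
    scale_err = abs(f * obj_scale / depth - det_scale)
    return pos_err + scale_err
```

Because the residual couples the object's 3D position and size with the camera, a detection in one view constrains both the object and the cameras that see it.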
SSFM with interactions [Bao, Bagra, Chao, Savarese, CVPR 2012]
The factor graph now also includes terms YQO, YQB, YOB in addition to YCQ, YCO, YCB.
• Interactions of points, regions, and objects across views
• Interactions among objects, regions, and points

Object-region interactions
Object-point interactions
Solving the SSFM problem
• Modified Reversible Jump Markov Chain Monte Carlo (RJ-MCMC) sampling algorithm
• Initialization of the cameras, objects, and points is critical for the sampling
• The camera configuration is initialized using:
  - SfM
  - consistency of object/region properties across views
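RJ-MCMC adds trans-dimensional "jump" moves (e.g., adding or removing object hypotheses) on top of an ordinary sampling core. That fixed-dimension core, plain Metropolis-Hastings over an energy, can be sketched as follows; the function names and the temperature parameter are illustrative, and the jump moves are omitted:

```python
import math
import random

def mcmc_minimize(energy, init, propose, n_iters=5000, temp=1.0, seed=0):
    """Metropolis-Hastings sketch: accept downhill moves always,
    uphill moves with Boltzmann probability; track the best sample."""
    rng = random.Random(seed)
    state, e = init, energy(init)
    best, best_e = state, e
    for _ in range(n_iters):
        cand = propose(state, rng)
        e_cand = energy(cand)
        if e_cand <= e or rng.random() < math.exp((e - e_cand) / temp):
            state, e = cand, e_cand
            if e < best_e:
                best, best_e = state, e
    return best, best_e
```

In SSFM the state would bundle cameras, points, objects, and regions, and the energy would sum the point-, object-, and region-level compatibility terms; here a 1D toy energy shows the mechanics.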