Stanford CS223B Computer Vision, Winter 2006 Lecture 8 Structure From Motion Professor Sebastian Thrun CAs: Dan Maynes-Aminzade, Mitul Saha, Greg Corrado Slides by: Gary Bradski, Intel Research and Stanford SAIL
Dec 20, 2015
Stanford CS223B Computer Vision, Winter 2006
Lecture 8 Structure From Motion
Professor Sebastian Thrun
CAs: Dan Maynes-Aminzade, Mitul Saha, Greg Corrado
Slides by: Gary Bradski, Intel Research and Stanford SAIL
Sebastian Thrun Stanford University CS223B Computer Vision
Structure From Motion
[Figure: a camera observing scene features]
Recover: structure (feature locations), motion (camera extrinsics)
Structure From Motion (1)
[Tomasi & Kanade 92]
Structure From Motion (2)
[Tomasi & Kanade 92]
Structure From Motion (3)
[Tomasi & Kanade 92]
Structure From Motion (4a): Images
Marc Pollefeys
Structure From Motion (4b)
Marc Pollefeys
Structure From Motion
Problem 1:
– Given n points $p_{ij} = (x_{ij}, y_{ij})$ in m images
– Reconstruct structure: 3-D locations $P_j = (x_j, y_j, z_j)$
– Reconstruct camera positions (extrinsics) $M_i = (A_i, b_i)$
Problem 2:
– Establish correspondence: $c(p_{ij})$
SFM: General Formulation
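For camera i and feature j, with rotation angles $\psi_i, \varphi_i, \theta_i$ and translation $b_i$:

$$
\begin{pmatrix} p_{ij,x} \\ p_{ij,y} \end{pmatrix}
= f \,
\frac{
\begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\psi_i & -\sin\psi_i \end{pmatrix}
R_y(\varphi_i)\, R_z(\theta_i)
\begin{pmatrix} P_{j,x} \\ P_{j,y} \\ P_{j,z} \end{pmatrix}
+ \begin{pmatrix} b_{i,x} \\ b_{i,y} \end{pmatrix}
}{
\begin{pmatrix} 0 & \sin\psi_i & \cos\psi_i \end{pmatrix}
R_y(\varphi_i)\, R_z(\theta_i)
\begin{pmatrix} P_{j,x} \\ P_{j,y} \\ P_{j,z} \end{pmatrix}
+ b_{i,z}
}
$$

$$
R_y(\varphi_i) =
\begin{pmatrix} \cos\varphi_i & 0 & \sin\varphi_i \\ 0 & 1 & 0 \\ -\sin\varphi_i & 0 & \cos\varphi_i \end{pmatrix},
\qquad
R_z(\theta_i) =
\begin{pmatrix} \cos\theta_i & -\sin\theta_i & 0 \\ \sin\theta_i & \cos\theta_i & 0 \\ 0 & 0 & 1 \end{pmatrix}
$$

[Figure: pinhole geometry with optical center O and focal length f; a point at lateral offset X and depth Z is imaged at x = fX/Z]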
SFM: Bundle Adjustment
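$$
\min \sum_{i,j}
\left\|
\begin{pmatrix} p_{ij,x} \\ p_{ij,y} \end{pmatrix}
- f \,
\frac{
\begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\psi_i & -\sin\psi_i \end{pmatrix}
R_y(\varphi_i)\, R_z(\theta_i)
\begin{pmatrix} P_{j,x} \\ P_{j,y} \\ P_{j,z} \end{pmatrix}
+ \begin{pmatrix} b_{i,x} \\ b_{i,y} \end{pmatrix}
}{
\begin{pmatrix} 0 & \sin\psi_i & \cos\psi_i \end{pmatrix}
R_y(\varphi_i)\, R_z(\theta_i)
\begin{pmatrix} P_{j,x} \\ P_{j,y} \\ P_{j,z} \end{pmatrix}
+ b_{i,z}
}
\right\|^2
$$

where $R_y(\varphi_i)$ and $R_z(\theta_i)$ are the rotations of camera i about the y- and z-axes, and the minimum is taken over all camera parameters $(\psi_i, \varphi_i, \theta_i, b_i)$ and all points $P_j$.

[Figure: pinhole geometry with optical center O and focal length f; a point at lateral offset X and depth Z is imaged at x = fX/Z]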
Bundle Adjustment
SFM = nonlinear least squares problem
Minimize through
– Gradient Descent
– Conjugate Gradient
– Gauss-Newton
– Levenberg-Marquardt (!)
Prone to local minima
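The objective that these optimizers minimize can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the course's implementation: the helper names (`rot_x`, `project`, `reprojection_error`), the rotation order $R_x R_y R_z$, and the synthetic scene are assumptions made for the example. It evaluates the sum of squared reprojection residuals; any of the optimizers listed above would then be run on this function.

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def project(angles, b, P, f=1.0):
    """Pinhole projection of points P (n x 3) into one camera."""
    psi, phi, theta = angles
    R = rot_x(psi) @ rot_y(phi) @ rot_z(theta)   # assumed rotation order
    cam = P @ R.T + b                            # points in camera coordinates
    return f * cam[:, :2] / cam[:, 2:3]          # divide x, y by depth

def reprojection_error(angles_all, b_all, P, observed, f=1.0):
    """Sum over cameras i and points j of the squared residual."""
    total = 0.0
    for angles, b, p_obs in zip(angles_all, b_all, observed):
        residual = p_obs - project(angles, b, P, f)
        total += np.sum(residual ** 2)
    return total

# Hypothetical synthetic scene: 8 points, 2 cameras about 5 units away.
rng = np.random.default_rng(0)
P_true = rng.uniform(-1, 1, size=(8, 3))
angles_true = [(0.0, 0.0, 0.0), (0.1, -0.2, 0.05)]
b_true = [np.array([0.0, 0.0, 5.0]), np.array([0.3, -0.1, 5.0])]
observed = [project(a, b, P_true) for a, b in zip(angles_true, b_true)]

err_at_truth = reprojection_error(angles_true, b_true, P_true, observed)
err_perturbed = reprojection_error(angles_true, b_true, P_true + 0.01, observed)
```

The error vanishes at the true parameters and grows when the structure is perturbed; because the objective is non-convex in the angles and points, a local optimizer started far from the truth may stop at a different minimum.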
Count # Constraints vs #Unknowns
m camera poses
n points
2mn point constraints
6m + 3n unknowns
Suggests: need 2mn ≥ 6m + 3n
But: Can we really recover all parameters???
How Many Parameters Can’t We Recover?
We can recover all but…
0, 3, 6, 7, 8, 10, 12, n, m, nm
Place Your Bet!
Count # Constraints vs #Unknowns
m camera poses
n points
2mn point constraints
6m + 3n unknowns
Suggests: need 2mn ≥ 6m + 3n
But: Can we really recover all parameters???
– Can’t recover origin, orientation (6 params)
– Can’t recover scale (1 param)
Thus, we need 2mn ≥ 6m + 3n - 7
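The counting argument is easy to check numerically. The sketch below uses hypothetical helper names (`enough_constraints`, `min_points`) and simply searches for the smallest n that satisfies the inequality for a given number of cameras m. It is a necessary condition only and says nothing about degenerate configurations.

```python
def enough_constraints(m, n):
    """True if 2mn point constraints cover the 6m + 3n - 7 recoverable unknowns."""
    return 2 * m * n >= 6 * m + 3 * n - 7

def min_points(m, n_max=1000):
    """Smallest n (up to n_max) satisfying the counting bound for m cameras."""
    for n in range(1, n_max + 1):
        if enough_constraints(m, n):
            return n
    return None

points_for_two_views = min_points(2)    # 4n >= 3n + 5
points_for_three_views = min_points(3)  # 6n >= 3n + 11
```

With two views the bound gives n ≥ 5, and with three views n ≥ 4.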
Are we done?
No, bundle adjustment has many local minima.
The “Trick Of The Day”
– Replace Perspective by Orthographic Geometry
– Replace Euclidean Geometry by Affine Geometry
– Solve SFM linearly (“closed” form, globally optimal)
– Post-Process to make solution Euclidean
– Post-Process to make solution perspective
By Tomasi and Kanade, 1992
Orthographic Camera Model
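Limit of Pinhole Model:

Extrinsic Parameters (the matrix $(a_{kl})$ is the Rotation):

$$
\begin{pmatrix} p_x \\ p_y \\ p_z \end{pmatrix}
=
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
\begin{pmatrix} P_x \\ P_y \\ P_z \end{pmatrix}
+
\begin{pmatrix} b_x \\ b_y \\ b_z \end{pmatrix}
$$

Orthographic Projection:

$$
\begin{pmatrix} p_x \\ p_y \end{pmatrix}
=
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix}
\begin{pmatrix} P_x \\ P_y \\ P_z \end{pmatrix}
+
\begin{pmatrix} b_x \\ b_y \end{pmatrix},
\qquad\text{i.e.}\qquad p = A P + b
$$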
Orthographic Projection
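Limit of Pinhole Model:

$$
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
\text{ is a rotation}
$$

Orthographic Projection:

$$
\begin{pmatrix} p_x \\ p_y \end{pmatrix}
=
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix}
\begin{pmatrix} P_x \\ P_y \\ P_z \end{pmatrix}
+
\begin{pmatrix} b_x \\ b_y \end{pmatrix},
\qquad\text{i.e.}\qquad p = A P + b
$$

With rows $a_1 = (a_{11}, a_{12}, a_{13})$ and $a_2 = (a_{21}, a_{22}, a_{23})$ of $A$:

$$
\|a_1\|^2 = 1, \qquad \|a_2\|^2 = 1, \qquad a_1 \cdot a_2 = 0
$$

For camera i and feature j:

$$
p_{ij} = A_i P_j + b_i
$$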
The Orthographic SFM Problem
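Recover $\{A_i, b_i\}$ and $\{P_j\}$ from

$$
p_{ij} = A_i P_j + b_i \qquad (\text{camera } i,\ \text{feature } j)
$$

subject to, for the rows $a_1, a_2$ of each $A_i$,

$$
\|a_1\|^2 = 1, \qquad \|a_2\|^2 = 1, \qquad a_1 \cdot a_2 = 0
$$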
The Affine SFM Problem
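Recover $\{A_i, b_i\}$ and $\{P_j\}$ from

$$
p_{ij} = A_i P_j + b_i \qquad (\text{camera } i,\ \text{feature } j)
$$

The orthographic problem is subject to, for the rows $a_1, a_2$ of each $A_i$,

$$
\|a_1\|^2 = 1, \qquad \|a_2\|^2 = 1, \qquad a_1 \cdot a_2 = 0
$$

For the affine problem: drop the constraints.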
Count # Constraints vs #Unknowns
m camera poses
n points
2mn point constraints
8m + 3n unknowns
Suggests: need 2mn ≥ 8m + 3n
But: Can we really recover all parameters???

$$
p_{ij} = A_i P_j + b_i \qquad (\text{camera } i,\ \text{feature } j)
$$
How Many Parameters Can’t We Recover?
We can recover all but…
0, 3, 6, 7, 8, 10, 12, n, m, nm
Place Your Bet!
The Answer is (at least): 12
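For any non-singular matrix $C$ and any vector $d$, the solution $p_{ij} = A_i P_j + b_i$ can be rewritten as $p_{ij} = A'_i P'_j + b'_i$ with

$$
A'_i = A_i C, \qquad
P'_j = C^{-1} P_j - C^{-1} d, \qquad
b'_i = A_i d + b_i
$$

The 3×3 matrix $C$ and the 3-vector $d$ together have 9 + 3 = 12 free parameters.

Proof:

$$
\begin{aligned}
A'_i P'_j + b'_i
&= A_i C \,(C^{-1} P_j - C^{-1} d) + (A_i d + b_i) \\
&= A_i P_j - A_i d + A_i d + b_i \\
&= A_i P_j + b_i = p_{ij}
\end{aligned}
$$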
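The ambiguity can also be checked numerically. The following NumPy sketch draws random affine cameras and points (all sizes and names here are illustrative assumptions), applies a transform of the form $A'_i = A_i C$, $P'_j = C^{-1}(P_j - d)$, $b'_i = A_i d + b_i$, and confirms that every image point is unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 6                                   # hypothetical problem size
A = rng.normal(size=(m, 2, 3))                # affine cameras A_i (2 x 3 each)
b = rng.normal(size=(m, 2))                   # offsets b_i
P = rng.normal(size=(n, 3))                   # points P_j

def project_all(A, b, P):
    """p[i, j] = A[i] @ P[j] + b[i], returned with shape (m, n, 2)."""
    return np.einsum('ikl,jl->ijk', A, P) + b[:, None, :]

C = np.eye(3) + 0.1 * rng.normal(size=(3, 3))  # assumed to be non-singular
d = rng.normal(size=3)
C_inv = np.linalg.inv(C)

A_new = A @ C                      # A'_i = A_i C
P_new = (P - d) @ C_inv.T          # P'_j = C^{-1} P_j - C^{-1} d
b_new = b + A @ d                  # b'_i = A_i d + b_i

same_images = np.allclose(project_all(A, b, P),
                          project_all(A_new, b_new, P_new))
```

Since both parameter sets explain the measurements equally well, no amount of image data can tell them apart.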
Points for Solving Affine SFM Problem
m camera poses
n points
Need to have: 2mn ≥ 8m + 3n - 12
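As a quick worked instance of this bound (the values m = 2 and m = 3 are chosen only for illustration):

$$
m = 2:\quad 4n \ge 16 + 3n - 12 \;\Longrightarrow\; n \ge 4
$$

$$
m = 3:\quad 6n \ge 24 + 3n - 12 \;\Longrightarrow\; n \ge 4
$$

The count is a necessary condition only; degenerate configurations (for example, coplanar points) can still leave the problem underdetermined.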
Affine SFM
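$$
p_{ij} = A_i P_j + b_i
$$

Fix coordinate system by making $p_0$ = origin (so the offsets $b_i$ vanish):

$$
p_{ij} = A_i P_j
$$

m cameras:

$$
q_j = A P_j, \qquad
q_j = \begin{pmatrix} p_{1j} \\ \vdots \\ p_{mj} \end{pmatrix}, \qquad
A = \begin{pmatrix} A_1 \\ \vdots \\ A_m \end{pmatrix}
$$

n points:

$$
Q = A D, \qquad
Q = \begin{pmatrix} p_{11} & \cdots & p_{1n} \\ \vdots & & \vdots \\ p_{m1} & \cdots & p_{mn} \end{pmatrix}, \qquad
D = \begin{pmatrix} P_1 & \cdots & P_n \end{pmatrix}
$$

Rank Theorem: Q has rank 3

Proof: $A$ has size $2m \times 3$ and $D$ has size $3 \times n$, so their product $Q$ has rank at most 3.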
The Rank Theorem
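$$
\underbrace{
\begin{pmatrix}
p_{11,x} & \cdots & p_{1n,x} \\
p_{11,y} & \cdots & p_{1n,y} \\
\vdots & & \vdots \\
p_{m1,x} & \cdots & p_{mn,x} \\
p_{m1,y} & \cdots & p_{mn,y}
\end{pmatrix}
}_{n \text{ elements per row},\ 2m \text{ elements per column}}
\quad \text{has rank } 3
$$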
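A small NumPy experiment illustrates the theorem. This is a sketch under stated assumptions: the cameras and points are random, and the origin is fixed by subtracting each row's mean (the centroid of the tracked points), which is one common way to remove the offsets $b_i$ and is not necessarily the choice of $p_0$ intended on the slide.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 5, 12                              # hypothetical numbers of cameras and points
A = rng.normal(size=(2 * m, 3))           # stacked affine cameras, 2m x 3
D = rng.normal(size=(3, n))               # structure matrix, 3 x n
b = rng.normal(size=(2 * m, 1))           # stacked offsets b_i

Q_raw = A @ D + b                         # measurements including offsets
Q = Q_raw - Q_raw.mean(axis=1, keepdims=True)   # fix the origin at the centroid

rank_raw = np.linalg.matrix_rank(Q_raw)
rank_centered = np.linalg.matrix_rank(Q)
singular_values = np.linalg.svd(Q, compute_uv=False)
```

After centering, only three singular values are significantly non-zero, regardless of how many cameras and points are used. With real, noisy tracks the remaining singular values are small rather than exactly zero.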
Singular Value Decomposition
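$$
\begin{pmatrix}
p_{11,x} & \cdots & p_{1n,x} \\
p_{11,y} & \cdots & p_{1n,y} \\
\vdots & & \vdots \\
p_{m1,x} & \cdots & p_{mn,x} \\
p_{m1,y} & \cdots & p_{mn,y}
\end{pmatrix}
= U \, W \, V^T,
\qquad
U:\ 2m \times 3, \quad W:\ 3 \times 3, \quad V^T:\ 3 \times n
$$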
Affine Solution to Orthographic SFM
$W V^T$ = affine structure
$U$ = affine camera positions
Gives also the optimal affine reconstruction under noise
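The factorization step can be sketched with NumPy's SVD. The function name `affine_factorization` and the synthetic data are assumptions for illustration. The sketch keeps the three largest singular values, which gives the best rank-3 approximation of the measurement matrix in the least-squares (Frobenius) sense.

```python
import numpy as np

def affine_factorization(Q):
    """Rank-3 factorization Q ~ A_hat @ D_hat from a truncated SVD."""
    U, w, Vt = np.linalg.svd(Q, full_matrices=False)
    A_hat = U[:, :3]                      # affine cameras, 2m x 3
    D_hat = np.diag(w[:3]) @ Vt[:3]       # affine structure, 3 x n
    return A_hat, D_hat

# Hypothetical synthetic measurements with a little noise.
rng = np.random.default_rng(3)
m, n = 4, 10
A_true = rng.normal(size=(2 * m, 3))
D_true = rng.normal(size=(3, n))
Q_clean = A_true @ D_true
Q_noisy = Q_clean + 1e-3 * rng.normal(size=Q_clean.shape)

A_hat, D_hat = affine_factorization(Q_noisy)
residual = np.linalg.norm(Q_noisy - A_hat @ D_hat)
noise_norm = np.linalg.norm(Q_noisy - Q_clean)
```

The recovered `A_hat` and `D_hat` generally differ from the true cameras and structure by an unknown invertible 3×3 matrix, which is the affine ambiguity discussed earlier.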
Back To Orthographic Projection
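Constraints, for the rows $a_1, a_2$ of each $A'_i$:

$$
\|a_1\|^2 = 1, \qquad \|a_2\|^2 = 1, \qquad a_1 \cdot a_2 = 0
$$

with

$$
p_{ij} = A'_i P'_j + b'_i, \qquad
A'_i = A_i C, \qquad
P'_j = C^{-1} P_j - C^{-1} d, \qquad
b'_i = A_i d + b_i
$$

where $C$ is a non-singular matrix and $d$ a vector.

Find C and d for which the constraints are met
Search in 12-dim space (instead of 8m + 3n - 12)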
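One way to find $C$ without a general search is sketched below. Note that this swaps in a different technique from the 12-dimensional search on the slide: the constraints are linear in the symmetric matrix $L = C C^T$, so $L$ can be estimated by linear least squares and $C$ obtained from a Cholesky factor. The vector $d$ does not enter the constraints. The function names and the synthetic test data are assumptions for illustration, and the Cholesky step requires the estimated $L$ to be positive definite, which can fail with noisy data.

```python
import numpy as np

def sym_row(u, v):
    """Coefficients of u^T L v in the 6 unique entries of a symmetric 3x3 L."""
    return np.array([u[0] * v[0],
                     u[0] * v[1] + u[1] * v[0],
                     u[0] * v[2] + u[2] * v[0],
                     u[1] * v[1],
                     u[1] * v[2] + u[2] * v[1],
                     u[2] * v[2]])

def metric_upgrade(A_hat):
    """Find C such that each 2x3 block of A_hat @ C has orthonormal rows."""
    rows, rhs = [], []
    for i in range(0, A_hat.shape[0], 2):
        a1, a2 = A_hat[i], A_hat[i + 1]
        rows += [sym_row(a1, a1), sym_row(a2, a2), sym_row(a1, a2)]
        rhs += [1.0, 1.0, 0.0]
    l, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    L = np.array([[l[0], l[1], l[2]],
                  [l[1], l[3], l[4]],
                  [l[2], l[4], l[5]]])
    return np.linalg.cholesky(L)          # C with C @ C.T == L

# Hypothetical test: orthographic cameras distorted by an unknown affine map.
rng = np.random.default_rng(4)
m = 4
blocks_true = [np.linalg.qr(rng.normal(size=(3, 3)))[0][:2] for _ in range(m)]
A_true = np.vstack(blocks_true)                       # orthonormal row pairs
C0 = np.eye(3) + 0.2 * rng.normal(size=(3, 3))        # assumed non-singular
A_hat = A_true @ np.linalg.inv(C0)                    # affine (distorted) cameras

C = metric_upgrade(A_hat)
blocks = (A_hat @ C).reshape(m, 2, 3)
gram = blocks @ blocks.transpose(0, 2, 1)             # should be 2x2 identities
max_dev = np.abs(gram - np.eye(2)).max()
```

The recovered $C$ is determined only up to a rotation, since any rotation applied after $C$ leaves $C C^T$ unchanged; this matches the fact that the global orientation cannot be recovered.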
Back To Projective Geometry
Orthographic (in the limit)
Projective
Back To Projective Geometry
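Optimize

$$
\min \sum_{i,j}
\left\|
\begin{pmatrix} p_{ij,x} \\ p_{ij,y} \end{pmatrix}
- f \,
\frac{
\begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\psi_i & -\sin\psi_i \end{pmatrix}
R_y(\varphi_i)\, R_z(\theta_i)
\begin{pmatrix} P_{j,x} \\ P_{j,y} \\ P_{j,z} \end{pmatrix}
+ \begin{pmatrix} b_{i,x} \\ b_{i,y} \end{pmatrix}
}{
\begin{pmatrix} 0 & \sin\psi_i & \cos\psi_i \end{pmatrix}
R_y(\varphi_i)\, R_z(\theta_i)
\begin{pmatrix} P_{j,x} \\ P_{j,y} \\ P_{j,z} \end{pmatrix}
+ b_{i,z}
}
\right\|^2
$$

where $R_y(\varphi_i)$ and $R_z(\theta_i)$ are the rotations of camera i about the y- and z-axes.

Using orthographic solution as starting point

[Figure: pinhole geometry with optical center O and focal length f; a point at lateral offset X and depth Z is imaged at x = fX/Z]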
The “Trick Of The Day”
– Replace Perspective by Orthographic Geometry
– Replace Euclidean Geometry by Affine Geometry
– Solve SFM linearly (“closed” form, globally optimal)
– Post-Process to make solution Euclidean
– Post-Process to make solution perspective
By Tomasi and Kanade, 1992
Structure From Motion
Problem 1:
– Given n points $p_{ij} = (x_{ij}, y_{ij})$ in m images
– Reconstruct structure: 3-D locations $P_j = (x_j, y_j, z_j)$
– Reconstruct camera positions (extrinsics) $M_i = (A_i, b_i)$
Problem 2:
– Establish correspondence: $c(p_{ij})$
The Correspondence Problem
[Figure: View 1, View 2, View 3]
Correspondence: Solution 1
Track features (e.g., optical flow)
…but fails when images taken from widely different poses
Correspondence: Solution 2
– Start with random solution A, b, P
– Compute soft correspondence: p(c|A,b,P)
– Plug soft correspondence into SFM
– Reiterate
See Dellaert/Seitz/Thorpe/Thrun, Machine Learning Journal, 2003
Correspondence: Alternative Approach
RANSAC [Fischler/Bolles]
= Random sample consensus
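The sample-and-consensus idea is easiest to see on a toy problem. The sketch below fits a line to 2-D points with gross outliers; line fitting stands in for the correspondence problem purely as an illustration, and the function name, tolerance, iteration count, and data are all assumptions. Each iteration draws a minimal sample (two points), hypothesizes a model, and counts how many points agree with it; the model with the largest consensus set wins.

```python
import random

def ransac_line(points, n_iters=200, tol=0.05, seed=0):
    """Fit y = a*x + c by RANSAC; returns (a, c, inlier_indices)."""
    rng = random.Random(seed)
    best = (None, None, [])
    for _ in range(n_iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)    # minimal sample
        if x1 == x2:
            continue                                   # skip vertical pairs
        a = (y2 - y1) / (x2 - x1)
        c = y1 - a * x1
        inliers = [k for k, (x, y) in enumerate(points)
                   if abs(y - (a * x + c)) <= tol]     # consensus set
        if len(inliers) > len(best[2]):
            best = (a, c, inliers)
    return best

# Hypothetical data: 20 points on y = 2x + 1 plus 6 gross outliers.
inlier_pts = [(k / 10.0, 2.0 * (k / 10.0) + 1.0) for k in range(20)]
outlier_pts = [(0.25, 6.0), (0.75, -2.0), (1.05, 9.0),
               (1.35, -4.0), (1.65, 0.5), (0.55, 7.5)]
points = inlier_pts + outlier_pts

a, c, inliers = ransac_line(points)
```

In an SFM setting the "model" would instead be a geometric relation between views, and the consensus set would be the feature matches consistent with it.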
Summary SFM
Problem
– Determine feature locations (= structure)
– Determine camera extrinsics (= motion)
Two Principal Solutions
– Bundle adjustment (nonlinear least squares, local minima)
– SVD (through orthographic approximation, affine geometry)
Correspondence
– (RANSAC)
– Expectation Maximization