3D Photography: Structure from Motion
Schedule (tentative)
Feb 17 Introduction
Feb 24 Geometry, Camera Model, Calibration
Mar 3 Features, Tracking / Matching
Mar 10 Project Proposals by Students
Mar 17 Structure from Motion (SfM) + 1 paper
Mar 24 Dense Correspondence (stereo / optical flow) + 1 paper
Mar 31 Bundle Adjustment & SLAM + 1 paper
Apr 7 Multi-View Stereo & Volumetric Modeling + 1 paper
Apr 14 Project Updates
Apr 21 Easter
Apr 28 3D Modeling with Depth Sensors + 1 paper
May 5 3D Scene Understanding
May 12 4D Video & Dynamic Scenes + 1 paper
May 19 Guest lecture: KinectFusion by Shahram Izadi
May 26 Final Demos
Structure from Motion
• Two view reconstruction
• Epipolar geometry computation
• Triangulation
• Adding more views
• Pose estimation
Two-view geometry
Three questions:
(i) Correspondence geometry: Given an image point x in the first image, how does this constrain the position of the corresponding point x' in the second image?
(ii) Camera geometry (motion): Given a set of corresponding image points {xi ↔ x'i}, i=1,…,n, what are the cameras P and P' for the two views?
(iii) Scene geometry (structure): Given corresponding image points xi ↔ x'i and cameras P, P', what is the position of their pre-image X in space?
The epipolar geometry
• C, C', x, x' and X are coplanar
• What if only C, C', x are known?
• All points on π project on l and l'
• Family of planes π and lines l and l': all lines l intersect in e, all lines l' intersect in e'
• epipoles e, e' = intersection of baseline with image plane = projection of projection center in other image = vanishing point of camera motion direction
• an epipolar plane = plane containing baseline (1-D family)
• an epipolar line = intersection of epipolar plane with image (always come in corresponding pairs)
Example: converging cameras
Example: motion parallel with image plane
(simple for stereo → rectification)
The fundamental matrix F
algebraic representation of epipolar geometry
$$x \mapsto l'$$
we will see that this mapping is a (singular) correlation (i.e. a projective mapping from points to lines) represented by the fundamental matrix F
The fundamental matrix F
geometric derivation
$$x' = H_\pi x$$
$$l' = e' \times x' = [e']_\times H_\pi x = F x, \qquad F = [e']_\times H_\pi$$
mapping from 2-D to 1-D family (rank 2)
The fundamental matrix F
algebraic derivation
$$X(\lambda) = P^+ x + \lambda C \qquad (P P^+ = I)$$
$$l' = P'C \times P'P^+ x$$
$$F = [e']_\times P' P^+ \qquad (e' = P'C)$$
(note: doesn't work for C = C' ⇒ F = 0)
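The algebraic derivation can be checked numerically. The sketch below (Python with NumPy, our choice since the slides contain no code) builds F = [e']× P'P⁺ from two hypothetical cameras and verifies x'^T F x ≈ 0 on projected points. The helper names `skew` and `fundamental_from_cameras`, the intrinsics `K` and the camera motion are illustrative assumptions.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]_x, so that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def fundamental_from_cameras(P1, P2):
    """F = [e']_x P2 P1^+ with C the centre of P1 and e' = P2 C."""
    _, _, Vt = np.linalg.svd(P1)
    C = Vt[-1]                      # right null vector: P1 @ C = 0
    e2 = P2 @ C                     # epipole e' in the second image
    return skew(e2) @ P2 @ np.linalg.pinv(P1)   # pinv gives P1 @ P1^+ = I

# Hypothetical setup: shared intrinsics, second camera rotated about y and translated.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
th = 0.1
R = np.array([[np.cos(th), 0.0, np.sin(th)], [0.0, 1.0, 0.0], [-np.sin(th), 0.0, np.cos(th)]])
t = np.array([[1.0], [0.0], [0.1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])

F = fundamental_from_cameras(P1, P2)
F /= np.linalg.norm(F)              # F is only defined up to scale

rng = np.random.default_rng(0)
X = np.vstack([rng.uniform(-1, 1, (2, 10)), rng.uniform(4, 8, (1, 10)), np.ones((1, 10))])
x1 = P1 @ X
x1 /= x1[2]
x2 = P2 @ X
x2 /= x2[2]
residuals = np.sum(x2 * (F @ x1), axis=0)   # x'^T F x for every point
print(np.abs(residuals).max())
```

The maximum residual should be at rounding level, and the smallest singular value of F should be essentially zero, reflecting rank 2.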
The fundamental matrix F
correspondence condition
The fundamental matrix satisfies the condition that for any pair of corresponding points x ↔ x' in the two images
$$x'^T F x = 0 \qquad (x'^T l' = 0)$$
The fundamental matrix F - recap
F is the unique (up to scale) 3x3 rank 2 matrix that satisfies x'^T F x = 0 for all x ↔ x'
(i) Transpose: if F is the fundamental matrix for (P, P'), then F^T is the fundamental matrix for (P', P)
(ii) Epipolar lines: l' = Fx and l = F^T x'
(iii) Epipoles: e' lies on all epipolar lines l', thus e'^T F x = 0 ∀x ⇒ e'^T F = 0; similarly F e = 0
(iv) F has 7 d.o.f., i.e. 3x3 − 1 (homogeneous) − 1 (rank 2)
(v) F is a correlation, a projective mapping from a point x to a line l' = Fx (not a proper correlation, i.e. not invertible)
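Properties (ii) and (iii) translate directly into a computation: the epipoles are the right and left null vectors of F, obtained from its SVD. A minimal NumPy sketch, assuming finite epipoles (nonzero last coordinate); `epipoles` is a hypothetical helper and the test F is built as [e']× M for an arbitrary invertible M.

```python
import numpy as np

def epipoles(F):
    """Epipoles as null vectors of F: F e = 0 and e'^T F = 0.
    Scaled so the last coordinate is 1 (assumes the epipoles are finite)."""
    U, _, Vt = np.linalg.svd(F)
    e = Vt[-1]            # right null vector: epipole in image 1
    e_prime = U[:, -1]    # left null vector: epipole in image 2
    return e / e[2], e_prime / e_prime[2]

# Hypothetical rank-2 F = [e']_x M, so e'^T F = 0 holds by construction.
e2_true = np.array([100.0, 50.0, 1.0])
M = np.array([[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]])
S = np.array([[0.0, -e2_true[2], e2_true[1]],
              [e2_true[2], 0.0, -e2_true[0]],
              [-e2_true[1], e2_true[0], 0.0]])
F = S @ M

e, e_prime = epipoles(F)
l_prime = F @ np.array([10.0, 20.0, 1.0])   # epipolar line l' = F x
print(e_prime, l_prime @ e_prime)           # e' lies on every epipolar line l'
```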
Computation of F
• Linear (8-point)
• Minimal (7-point)
• Calibrated (5-point) (Essential matrix)
• Practical two-view geometry computation
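Epipolar geometry: basic equation
$$x'^T F x = 0$$
separate known from unknown
$$x'x f_{11} + x'y f_{12} + x' f_{13} + y'x f_{21} + y'y f_{22} + y' f_{23} + x f_{31} + y f_{32} + f_{33} = 0$$
$$[x'x,\; x'y,\; x',\; y'x,\; y'y,\; y',\; x,\; y,\; 1]\;[f_{11}, f_{12}, f_{13}, f_{21}, f_{22}, f_{23}, f_{31}, f_{32}, f_{33}]^T = 0$$
(data) (unknowns), linear in f:
$$A f = 0$$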
$$A f = \begin{bmatrix} x'_1x_1 & x'_1y_1 & x'_1 & y'_1x_1 & y'_1y_1 & y'_1 & x_1 & y_1 & 1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ x'_nx_n & x'_ny_n & x'_n & y'_nx_n & y'_ny_n & y'_n & x_n & y_n & 1 \end{bmatrix} \begin{bmatrix} f_{11} \\ f_{12} \\ \vdots \\ f_{33} \end{bmatrix} = 0$$
the NOT normalized 8-point algorithm
Typical magnitudes of the columns of A: ~10000 ~10000 ~100 ~10000 ~10000 ~100 ~100 ~100 1
! Orders of magnitude difference between columns of the data matrix
→ least-squares yields poor results
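A minimal NumPy sketch of the linear (not normalized) estimate: one row of A per correspondence, and f taken as the right singular vector of the smallest singular value (the unit-norm least-squares solution of Af = 0). It also prints the column magnitudes that cause the conditioning problem. The helper names and the random pixel coordinates in a 700x500 image are hypothetical; no rank-2 step is applied here.

```python
import numpy as np

def design_matrix(x1, x2):
    """One row [x'x, x'y, x', y'x, y'y, y', x, y, 1] per correspondence; x1, x2 are (N,2)."""
    x, y = x1[:, 0], x1[:, 1]
    xp, yp = x2[:, 0], x2[:, 1]
    return np.column_stack([xp * x, xp * y, xp, yp * x, yp * y, yp, x, y, np.ones_like(x)])

def linear_F(x1, x2):
    """Unit-norm least-squares solution of A f = 0 (needs N >= 8)."""
    A = design_matrix(x1, x2)
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 3)     # f = (f11, f12, ..., f33), row-major

# Hypothetical raw pixel coordinates in a 700x500 image.
rng = np.random.default_rng(1)
x1 = rng.uniform([0, 0], [700, 500], size=(8, 2))
x2 = x1 + rng.uniform(-20, 20, size=(8, 2))
A = design_matrix(x1, x2)
F = linear_F(x1, x2)
print(np.abs(A).mean(axis=0).round())   # roughly 1e5 for products, 1e2 for coordinates, 1 last
```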
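the normalized 8-point algorithm
Transform image to ~[-1,1]x[-1,1]; for a 700x500 image: (0,0) ↦ (-1,-1), (700,0) ↦ (1,-1), (0,500) ↦ (-1,1), (700,500) ↦ (1,1), image centre ↦ (0,0)
$$T = \begin{bmatrix} 2/700 & 0 & -1 \\ 0 & 2/500 & -1 \\ 0 & 0 & 1 \end{bmatrix}$$
Estimate F̂ from the normalized points Tx ↔ T'x', then undo the normalization: F = T'^T F̂ T
normalized least squares yields good results (Hartley, PAMI'97)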
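A sketch of the normalization step using the image-size transform from the 700x500 example, plus the denormalization F = T'^T F̂ T, which holds because (T'x')^T F̂ (Tx) = x'^T (T'^T F̂ T) x. Function names are hypothetical. Note that Hartley's paper instead translates the point centroid to the origin and scales the mean distance to √2; both aim at columns of comparable magnitude.

```python
import numpy as np

def image_normalization(width, height):
    """Affine T mapping [0,width]x[0,height] onto [-1,1]x[-1,1]."""
    return np.array([[2.0 / width, 0.0, -1.0],
                     [0.0, 2.0 / height, -1.0],
                     [0.0, 0.0, 1.0]])

def denormalize_F(F_hat, T1, T2):
    """If x2n^T F_hat x1n = 0 with xin = Ti xi, then F = T2^T F_hat T1 works on pixel coordinates."""
    return T2.T @ F_hat @ T1

T = image_normalization(700, 500)
# Columns: the four corners and the centre (350, 250), in homogeneous coordinates.
corners = np.array([[0, 700, 0, 700, 350],
                    [0, 0, 500, 500, 250],
                    [1, 1, 1, 1, 1]], dtype=float)
print((T @ corners)[:2].T)   # corners map to (+-1, +-1), the centre to (0, 0)
```

In a full pipeline, F̂ would be estimated linearly from the normalized points, made rank 2, and then passed to `denormalize_F`.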
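the singularity constraint
$$e'^T F = 0, \quad F e = 0, \quad \det F = 0, \quad \operatorname{rank} F = 2$$
SVD from linearly computed F matrix (rank 3):
$$F = U \begin{bmatrix} \sigma_1 & & \\ & \sigma_2 & \\ & & \sigma_3 \end{bmatrix} V^T = U_1 \sigma_1 V_1^T + U_2 \sigma_2 V_2^T + U_3 \sigma_3 V_3^T$$
Compute closest rank-2 approximation $\min_{F'} \|F - F'\|_F$:
$$F' = U \begin{bmatrix} \sigma_1 & & \\ & \sigma_2 & \\ & & 0 \end{bmatrix} V^T = U_1 \sigma_1 V_1^T + U_2 \sigma_2 V_2^T$$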
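The rank-2 projection is a few lines of NumPy: zero the smallest singular value and recompose. The Frobenius distance to the original matrix is then exactly σ3. A random matrix stands in for a linearly estimated F; `enforce_rank2` is a hypothetical helper name.

```python
import numpy as np

def enforce_rank2(F):
    """Closest rank-2 matrix in Frobenius norm: zero the smallest singular value."""
    U, s, Vt = np.linalg.svd(F)
    s[2] = 0.0
    return U @ np.diag(s) @ Vt

rng = np.random.default_rng(2)
F = rng.normal(size=(3, 3))         # stand-in for a linearly estimated (rank 3) F
F2 = enforce_rank2(F)
sigma = np.linalg.svd(F, compute_uv=False)
print(np.linalg.det(F2), np.linalg.norm(F - F2), sigma[2])
```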
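the minimum case – 7 point correspondences
$$\begin{bmatrix} x'_1x_1 & x'_1y_1 & x'_1 & y'_1x_1 & y'_1y_1 & y'_1 & x_1 & y_1 & 1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ x'_7x_7 & x'_7y_7 & x'_7 & y'_7x_7 & y'_7y_7 & y'_7 & x_7 & y_7 & 1 \end{bmatrix} f = 0$$
$$A = U_{7\times7}\,\operatorname{diag}(\sigma_1,\dots,\sigma_7,0,0)\,V_{9\times9}^T \;\Rightarrow\; A\,[V_8\; V_9]_{9\times2} = 0$$
(e.g. $V^T V_8 = [0\,0\,0\,0\,0\,0\,0\,1\,0]^T$)
$$x_i'^T (F_1 + \lambda F_2)\, x_i = 0, \quad \forall i = 1 \dots 7$$
one parameter family of solutions (F1, F2: V8, V9 reshaped to 3x3)
but F1 + λF2 is not automatically rank 2
(figure: the line through F1 and F2 meets the rank-2 matrices, where σ3 = 0, at the solutions F7pts)
$$\det(F_1 + \lambda F_2) = a_3\lambda^3 + a_2\lambda^2 + a_1\lambda + a_0 = 0$$
(cubic equation: obtain 1 or 3 real solutions)
$$\det(F_1 + \lambda F_2) = \det F_2 \,\det(F_2^{-1} F_1 + \lambda I) = 0$$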
the minimum case – impose rank 2
Compute possible λ from the eigenvalues μ of $F_2^{-1} F_1$: λ = −μ (only real solutions are potential solutions)
(using $\det(AB) = \det(A)\det(B)$)
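Both ways of solving the 7-point cubic can be sketched in NumPy: recover the cubic det(F1 + λF2) exactly by evaluating the determinant at four values of λ and fitting a degree-3 polynomial, or use the eigenvalue route λ = −μ with μ an eigenvalue of F2⁻¹F1 (this needs F2 invertible). Random matrices stand in for V8 and V9 reshaped to 3x3; the function names are hypothetical.

```python
import numpy as np

def seven_point_lambdas(F1, F2):
    """Real roots of the cubic det(F1 + lam F2) = 0, found by fitting it through 4 samples."""
    lam = np.array([-1.0, 0.0, 1.0, 2.0])
    d = [np.linalg.det(F1 + l * F2) for l in lam]
    roots = np.roots(np.polyfit(lam, d, 3))     # 1 or 3 real roots
    return np.sort(roots[np.abs(roots.imag) < 1e-9].real)

def seven_point_lambdas_eig(F1, F2):
    """Same roots: det(F1 + lam F2) = det(F2) det(F2^-1 F1 + lam I), so lam = -mu."""
    mu = np.linalg.eigvals(np.linalg.solve(F2, F1))   # eigenvalues of F2^-1 F1
    return np.sort(-mu[np.abs(mu.imag) < 1e-9].real)

rng = np.random.default_rng(3)
F1, F2 = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))   # stand-ins for V8, V9
candidates = [F1 + l * F2 for l in seven_point_lambdas(F1, F2)]
print(len(candidates), [np.linalg.det(F) for F in candidates])
```

Each candidate is then scored against all matches (for example inside RANSAC) to pick the correct one.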
Calibrated case: 5-point relative motion (Nister, CVPR03)
• Linear equations for 5 points
• Linear solution space
• Non-linear constraints
$$\begin{bmatrix} x'_1x_1 & x'_1y_1 & x'_1 & y'_1x_1 & y'_1y_1 & y'_1 & x_1 & y_1 & 1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ x'_5x_5 & x'_5y_5 & x'_5 & y'_5x_5 & y'_5y_5 & y'_5 & x_5 & y_5 & 1 \end{bmatrix} [E_{11}, E_{12}, E_{13}, E_{21}, E_{22}, E_{23}, E_{31}, E_{32}, E_{33}]^T = 0$$
(assumes normalized coordinates)
$$E = xX + yY + zZ + wW$$
(X, Y, Z, W span the 4-D null space; scale does not matter, choose w = 1)
$$\det E = 0, \qquad 2EE^TE - \operatorname{tr}(EE^T)\,E = 0$$
(10 cubic polynomials in x, y, z)
• Perform Gauss-Jordan elimination on polynomials
(figure: elimination matrix, where [n] represents a polynomial of degree n in z)
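The non-linear constraints used by the 5-point method can be checked on a matrix of the form E = [t]× R: det E = 0 and the trace constraint 2EE^TE − tr(EE^T)E = 0 (1 + 9 cubic equations), equivalent to E having two equal singular values and one zero. This NumPy sketch only verifies the constraints; it does not implement Nistér's elimination. The rotation, translation and helper names are illustrative.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]_x."""
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])

def essential_residuals(E):
    """det(E) and the 3x3 trace constraint 2 E E^T E - tr(E E^T) E; both vanish for a valid E."""
    EEt = E @ E.T
    return np.linalg.det(E), 2.0 * EEt @ E - np.trace(EEt) * E

th = 0.3
R = np.array([[np.cos(th), -np.sin(th), 0.0], [np.sin(th), np.cos(th), 0.0], [0.0, 0.0, 1.0]])
t = np.array([1.0, 0.2, -0.3])
E = skew(t) @ R                        # a valid essential matrix
det_E, trace_E = essential_residuals(E)
print(det_E, np.abs(trace_E).max())    # both zero up to rounding
print(np.linalg.svd(E, compute_uv=False))   # two equal singular values and one zero
```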
Automatic computation of F (RANSAC)
Step 1. Extract features
Step 2. Compute a set of potential matches
Step 3. do {
Step 3.1 select minimal sample (i.e. 7 or 5 matches) (generate hypothesis)
Step 3.2 compute solution(s) for F (generate hypothesis)
Step 3.3 determine inliers (verify hypothesis)
} until Γ(#inliers, #samples) ≥ 95%
$$\Gamma = 1 - \left(1 - \left(\frac{\#inliers}{\#matches}\right)^{7}\right)^{\#samples}$$
#inliers (fraction of matches): 90% 80% 70% 60% 50%
#samples (needed for Γ ≥ 95%): 5 13 35 106 382
Step 4. Compute F based on all inliers
Step 5. Look for additional matches
Step 6. Refine F based on all correct matches
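The stopping criterion Γ can be inverted to give the number of samples directly: the smallest N with 1 − (1 − w^s)^N ≥ 0.95, where w is the inlier ratio and s the sample size. A minimal Python sketch (the function name is hypothetical) that reproduces the table for s = 7:

```python
import math

def ransac_samples(inlier_ratio, sample_size=7, confidence=0.95):
    """Smallest N with 1 - (1 - w^s)^N >= confidence, i.e. Gamma >= confidence."""
    p_good = inlier_ratio ** sample_size     # probability that one sample is all inliers
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_good))

print([ransac_samples(w) for w in (0.9, 0.8, 0.7, 0.6, 0.5)])   # → [5, 13, 35, 106, 382]
```

With `sample_size=5` (calibrated case) the counts drop, which is one practical advantage of smaller minimal solvers.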
Finding more matches
restrict search range to neighborhood of epipolar line (e.g. ±1.5 pixels)
relax disparity restriction (along epipolar line)
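Restricting the search to a band around the epipolar line only needs the point-to-line distance |x'^T F x| / sqrt(l'_1² + l'_2²) with l' = Fx. A NumPy sketch with hypothetical helper names; the example F = [(1,0,0)]× corresponds to pure horizontal motion (rectified pair), so epipolar lines are image rows.

```python
import numpy as np

def epipolar_distance(F, x1, x2):
    """Distance (pixels) of each x2 to its epipolar line l' = F x1; x1, x2 are (N,2)."""
    x1h = np.column_stack([x1, np.ones(len(x1))])
    x2h = np.column_stack([x2, np.ones(len(x2))])
    lines = x1h @ F.T                              # row i is l'_i = F x1_i
    num = np.abs(np.sum(lines * x2h, axis=1))
    return num / np.hypot(lines[:, 0], lines[:, 1])

def within_band(F, x1, x2, tol=1.5):
    """Keep candidate matches within +-tol pixels of the epipolar line."""
    return epipolar_distance(F, x1, x2) <= tol

F = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]])   # rectified: y' = y
x1 = np.array([[100.0, 50.0], [100.0, 50.0]])
x2 = np.array([[130.0, 51.0], [130.0, 55.0]])
print(epipolar_distance(F, x1, x2), within_band(F, x1, x2))   # distances 1 and 5 pixels
```

The horizontal offset (30 pixels here) is not penalized at all, matching the "relax disparity restriction" point.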
Initial structure and motion
Epipolar geometry ↔ Projective calibration
$$m_2^T F m_1 = 0$$
$$P_1 = [\,I \mid 0\,], \qquad P_2 = [\,[e]_\times F + e\,a^T \mid e\,]$$
compatible with F (e: epipole in the second image, e^T F = 0; a: arbitrary 3-vector)
Yields correct projective camera setup (Faugeras'92, Hartley'92)
Obtain structure through triangulation
Use reprojection error for minimization
Avoid measurements in projective space
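A NumPy sketch of the canonical projective camera pair, written in the slightly more general form P2 = [[e]×F + e a^T | σe] (σ = 1 and any a are admissible choices). It works because [σe]×([e]×F + e a^T) = σ(e e^T − |e|² I)F = −σ|e|² F when e^T F = 0, so the pair reproduces F up to scale. The helper name `cameras_from_F` and the random test data are hypothetical.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]_x."""
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])

def cameras_from_F(F, a=None, sigma=1.0):
    """P1 = [I | 0], P2 = [[e]_x F + e a^T | sigma e] with e^T F = 0 (epipole in image 2)."""
    a = np.zeros(3) if a is None else a
    U, _, _ = np.linalg.svd(F)
    e = U[:, -1]                                  # left null vector of F
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = np.hstack([skew(e) @ F + np.outer(e, a), sigma * e[:, None]])
    return P1, P2

rng = np.random.default_rng(4)
F = skew(np.array([0.3, -0.2, 1.0])) @ rng.normal(size=(3, 3))   # hypothetical rank-2 F
P1, P2 = cameras_from_F(F, a=np.array([0.1, 0.2, 0.3]), sigma=2.0)
X = np.vstack([rng.normal(size=(3, 5)), np.ones((1, 5))])        # arbitrary projective points
m1, m2 = P1 @ X, P2 @ X
print(np.abs(np.sum(m2 * (F @ m1), axis=0)).max())               # m2^T F m1, zero up to rounding
```

The resulting reconstruction is only defined up to a projective transformation, which is why the slide recommends measuring errors in the images rather than in projective space.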
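Initial structure and motion (calibrated case)
Essential matrix: $E = [t]_\times R$, with $x'^T E x = 0$ for normalized image coordinates
Essential matrix decomposition: recover R and t from $E = U \operatorname{diag}(1,1,0) V^T$
use $R = UWV^T$ or $R = UW^TV^T$, with $W = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$
use $t = u_3$ or $t = -u_3$ ($u_3$: last column of U)
ambiguity: four solutions, only one places the points in front of both cameras
P1 = [I | 0]
P2 = [R | t]
(e.g. see Hartley and Zisserman, Sec. 8.6)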
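The four-fold decomposition can be sketched in NumPy as follows. U and V are sign-corrected to proper rotations (this changes E at most by sign, which is irrelevant up to scale), and |t| = 1 because the baseline length is not recoverable. Selecting the one candidate with points in front of both cameras is left out; names and test values are illustrative.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]_x."""
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])

def decompose_essential(E):
    """Four (R, t) candidates with [t]_x R = +-E (up to scale), |t| = 1."""
    U, _, Vt = np.linalg.svd(E)
    if np.linalg.det(U) < 0:        # make U and V proper rotations
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    t = U[:, 2]
    R1, R2 = U @ W @ Vt, U @ W.T @ Vt
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]

th = 0.2
R_true = np.array([[np.cos(th), 0.0, np.sin(th)], [0.0, 1.0, 0.0], [-np.sin(th), 0.0, np.cos(th)]])
t_true = np.array([1.0, 0.0, 0.2]) / np.linalg.norm([1.0, 0.0, 0.2])
E = skew(t_true) @ R_true
candidates = decompose_essential(E)
print(len(candidates))   # 4 candidates; cheirality picks one
```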
Triangulation
(figure: ray L1 from C1 through x1 and ray L2 from C2 through x2 meet in X)
- calibration
- correspondences
• Backprojection
• Triangulation: iterative least-squares
• Maximum Likelihood Triangulation (geometric error)
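For comparison with the geometric-error methods, here is a minimal NumPy sketch of linear (DLT-style) triangulation, which minimizes an algebraic error, together with the reprojection error that maximum likelihood triangulation minimizes. The camera pair and the perturbation are hypothetical test values.

```python
import numpy as np

def triangulate_linear(P1, P2, x1, x2):
    """Linear triangulation: each view gives x*P[2] - P[0] = 0 and y*P[2] - P[1] = 0.
    Minimizes an algebraic error, not the reprojection (geometric) error."""
    A = np.vstack([x1[0] * P1[2] - P1[0],
                   x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0],
                   x2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X / X[3]                  # assumes X is not at infinity

def reprojection_error(P, X, x):
    """Image distance between the projection of X and the measured point x."""
    p = P @ X
    return np.linalg.norm(p[:2] / p[2] - x)

# Hypothetical calibrated pair: second camera shifted one unit along x.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.2, -0.1, 5.0, 1.0])
x1 = (P1 @ X_true)[:2] / (P1 @ X_true)[2]
x2 = (P2 @ X_true)[:2] / (P2 @ X_true)[2]

X = triangulate_linear(P1, P2, x1, x2)                       # exact data recovers X_true
X_noisy = triangulate_linear(P1, P2, x1 + [0.5, -0.3], x2)   # perturbed measurement
print(reprojection_error(P1, X_noisy, x1), reprojection_error(P2, X_noisy, x2))
```

With noisy data the linear estimate is a reasonable starting point for iterative or optimal (geometric-error) refinement.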
Optimal 3D point in epipolar plane
• Given an epipolar plane, find best 3D point for (m1, m2)
(figure: m1 and m2 with their epipolar lines l1 and l2, and the closest points m1´ and m2´ on these lines)
Select closest points (m1´, m2´) on epipolar lines
Obtain 3D point through exact triangulation
Guarantees minimal reprojection error (given this epipolar plane)
Non-iterative optimal solution (Hartley and Sturm, CVIU'97)
• Reconstruct matches in projective frame by minimizing the reprojection error (3 DOF):
$$D(m_1, P_1 M)^2 + D(m_2, P_2 M)^2$$
• Non-iterative method
Determine the epipolar plane for reconstruction (1 DOF, polynomial of degree 6):
$$D(m_1, l_1(\alpha))^2 + D(m_2, l_2(\alpha))^2$$
Reconstruct optimal point from selected epipolar plane
Note: only works for two views
Sequential Structure and Motion Computation
• Initialize Motion (P1, P2 compatible with F or E)
• Initialize Structure (minimize reprojection error)
Sequential structure and motion recovery
• Initialize structure and motion from two views
• For each additional view
  • Determine pose
  • Refine and extend structure
Determine pose towards existing structure
• Determine correspondences robustly by jointly estimating matches and epipolar geometry
• Compute P_{i+1} using robust approach (6-point RANSAC)
• Extend and refine reconstruction
$$x_i = P_i\, X(x_1, \dots, x_{i-1})$$
(figure: 2D-2D matches m_i ↔ m_{i+1} with the previous view give 2D-3D correspondences m_{i+1} ↔ M for the new view)
Compute P with 6-point RANSAC
• Generate hypothesis using 6 points (two equations per point; similar DLT algorithm as seen in 2nd lecture for homographies)
• Planar scenes are degenerate!
Three points perspective pose – p3p (calibrated case) (Haralick et al., IJCV94)
All techniques yield a 4th order polynomial
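The hypothesis step can be sketched as a DLT in NumPy: each 2D-3D correspondence x ↔ X gives two independent rows from x × (PX) = 0, and P (3x4, 11 d.o.f.) is the SVD null vector of the stacked system, so 6 points suffice. The ground-truth camera and points are hypothetical; if all X were coplanar the null space would grow, which is the degeneracy noted above.

```python
import numpy as np

def dlt_pose(X, x):
    """Camera P (3x4, up to scale) from N >= 6 correspondences X (N,4 homogeneous) <-> x (N,2)."""
    rows = []
    for Xi, (u, v) in zip(X, x):
        rows.append(np.concatenate([np.zeros(4), -Xi, v * Xi]))   # v*p3 - p2 = 0
        rows.append(np.concatenate([Xi, np.zeros(4), -u * Xi]))   # p1 - u*p3 = 0
    _, _, Vt = np.linalg.svd(np.array(rows))
    return Vt[-1].reshape(3, 4)

# Hypothetical ground truth: K [R | t] and 10 non-coplanar points.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
th = 0.2
R = np.array([[np.cos(th), 0.0, np.sin(th)], [0.0, 1.0, 0.0], [-np.sin(th), 0.0, np.cos(th)]])
P_true = K @ np.hstack([R, np.array([[0.5], [-0.2], [1.0]])])
rng = np.random.default_rng(5)
X = np.hstack([rng.uniform(-1, 1, (10, 2)), rng.uniform(4, 8, (10, 1)), np.ones((10, 1))])
p = X @ P_true.T
x = p[:, :2] / p[:, 2:]

P = dlt_pose(X, x)
P_n, P_t = P / np.linalg.norm(P), P_true / np.linalg.norm(P_true)
if np.sum(P_n * P_t) < 0:     # the null vector's sign is arbitrary
    P_n = -P_n
print(np.abs(P_n - P_t).max())
```

In a RANSAC loop this would be run on random 6-point samples, with inliers counted by reprojection error.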
Sequential Structure and Motion Computation
• Initialize Motion (P1, P2 compatible with F or E)
• Initialize Structure (minimize reprojection error)
• Extend motion (compute pose through matches seen in 2 or more previous views)
• Extend structure (initialize new structure, refine existing structure)
Changchang's SfM code
For the iconic graph:
• uses 5-point + RANSAC for 2-view initialization
• uses 3-point + RANSAC for adding views
• performs bundle adjustment
For additional images:
• use 3-point + RANSAC pose estimation
http://ccwu.me/vsfm/
Rome on a cloudless day (Frahm et al. ECCV 2010)
GIST & clustering (1h35)
SIFT & Geometric verification (11h36)
SfM & Bundle (8h35)
Dense Reconstruction (1h58)
Some numbers
• 1 PC
• 2.88M images
• 100k clusters
• 22k SfM with 307k images
• 63k 3D models
• Largest model 5700 images
• Total time 23h53
Hierarchical structure and motion recovery
• Compute 2-view
• Compute 3-view
• Stitch 3-view reconstructions
• Merge and refine reconstruction
(figure: F → T → H → P, M)
Stitching 3-view reconstructions
Different possibilities:
1. Align (P2, P3) with (P'1, P'2):
$$\arg\min_H \; d_A(P_2, P'_1 H^{-1}) + d_A(P_3, P'_2 H^{-1})$$
2. Align X, X' (and C, C'):
$$\arg\min_H \sum_j d_A(X_j, H X'_j)$$
3. Minimize reproj. error:
$$\arg\min_H \sum_j d(P' H^{-1} X_j, x'_j) + \sum_j d(P H X'_j, x_j)$$
4. MLE (merge):
$$\arg\min_{P,X} \sum_j d(x_j, P X_j)$$
Next week: Dense Correspondences