3D Photography: Structure from Motion - CVG @ ETHZ · 2014-03-17

Post on 11-Jul-2020


Schedule (tentative)

Feb 17 Introduction

Feb 24 Geometry, Camera Model, Calibration

Mar 3 Features, Tracking / Matching

Mar 10 Project Proposals by Students

Mar 17 Structure from Motion (SfM) + 1 paper

Mar 24 Dense Correspondence (stereo / optical flow) + 1 paper

Mar 31 Bundle Adjustment & SLAM + 1 paper

Apr 7 Multi-View Stereo & Volumetric Modeling + 1 paper

Apr 14 Project Updates

Apr 21 Easter

Apr 28 3D Modeling with Depth Sensors + 1 paper

May 5 3D Scene Understanding

May 12 4D Video & Dynamic Scenes + 1 paper

May 19 Guest lecture: KinectFusion by Shahram Izadi

May 26 Final Demos

Structure from Motion

• Two-view reconstruction

• Epipolar geometry computation

• Triangulation

• Adding more views

• Pose estimation

Two-view geometry

Three questions:

(i) Correspondence geometry: Given an image point x in the first image, how does this constrain the position of the corresponding point x' in the second image?

(ii) Camera geometry (motion): Given a set of corresponding image points {xi ↔ x'i}, i = 1, …, n, what are the cameras P and P' for the two views?

(iii) Scene geometry (structure): Given corresponding image points xi ↔ x'i and cameras P, P', what is the position of their pre-image X in space?

The epipolar geometry

C, C', x, x' and X are coplanar.

What if only C, C', x are known?

All points on π project on l and l'.

Family of planes π and lines l and l'; the lines intersect in e and e'.

epipoles e, e' = intersection of the baseline with the image plane = projection of the projection center in the other image = vanishing point of the camera motion direction

an epipolar plane = a plane containing the baseline (1-D family)

an epipolar line = intersection of an epipolar plane with the image (always come in corresponding pairs)

Example: converging cameras

Example: motion parallel with image plane

(simple for stereo → rectification)

The fundamental matrix F

algebraic representation of epipolar geometry

$$x \mapsto l'$$

we will see that the mapping is a (singular) correlation (i.e. a projective mapping from points to lines) represented by the fundamental matrix F

The fundamental matrix F

geometric derivation

$$x' = H_\pi x$$

$$l' = e' \times x' = [e']_\times H_\pi x = F x$$

mapping from 2-D to 1-D family (rank 2)

The fundamental matrix F

algebraic derivation

$$X(\lambda) = P^+ x + \lambda C \qquad (P P^+ = I)$$

$$l' = P'C \times P'P^+ x$$

$$F = [e']_\times P' P^+$$

(note: doesn't work for C = C' ⇒ F = 0)
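The algebraic derivation can be checked numerically. The following is a minimal pure-Python sketch of $F = [e']_\times P' P^+$. It assumes P has full row rank, so that $P^+ = P^T (P P^T)^{-1}$. The camera center is taken as the null vector of $P = [M \mid p_4]$, i.e. $C = (-M^{-1} p_4, 1)$. All helper names (`skew`, `camera_center`, `fundamental_from_cameras`, ...) and the example cameras are illustrative, not taken from any library.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def inv3(M):
    """Inverse of a 3x3 matrix via its adjugate (assumes det(M) != 0)."""
    (a, b, c), (d, e, f), (g, h, i) = M
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[v / det for v in row] for row in adj]


def skew(v):
    """[v]_x, so that skew(v) applied to w equals the cross product v x w."""
    return [[0.0, -v[2], v[1]],
            [v[2], 0.0, -v[0]],
            [-v[1], v[0], 0.0]]


def camera_center(P):
    """Null vector of P = [M | p4]: C = (-M^-1 p4, 1), as a 4x1 column."""
    M = [row[:3] for row in P]
    p4 = [[row[3]] for row in P]
    c = matmul(inv3(M), p4)
    return [[-c[0][0]], [-c[1][0]], [-c[2][0]], [1.0]]


def pseudo_inverse(P):
    """P^+ = P^T (P P^T)^-1 for a 3x4 camera of full row rank, so P P^+ = I."""
    Pt = transpose(P)
    return matmul(Pt, inv3(matmul(P, Pt)))


def fundamental_from_cameras(P, P_prime):
    """F = [e']_x P' P^+, with e' = P' C the image of the first camera center."""
    e_prime = [v[0] for v in matmul(P_prime, camera_center(P))]
    return matmul(skew(e_prime), matmul(P_prime, pseudo_inverse(P)))


# Example: P = [I | 0] and P' = [R | t] with R a rotation about the y-axis
P = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
P_prime = [[0.8, 0.0, 0.6, -1.0], [0.0, 1.0, 0.0, 0.2], [-0.6, 0.0, 0.8, 0.1]]
F = fundamental_from_cameras(P, P_prime)

X = [[0.3], [-0.2], [4.0], [1.0]]           # homogeneous 3D point
x, x_prime = matmul(P, X), matmul(P_prime, X)
residual = matmul(transpose(x_prime), matmul(F, x))[0][0]
print(abs(residual) < 1e-9)                 # x'^T F x vanishes up to rounding
```

For P = [I | 0] this reduces to $F = [t]_\times R$, because $P^+ = [I \; 0]^T$ and $e' = t$.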

The fundamental matrix F

correspondence condition

The fundamental matrix satisfies the condition that for any pair of corresponding points x ↔ x' in the two images

$$x'^T F x = 0 \qquad (x'^T l' = 0)$$

The fundamental matrix F - recap

F is the unique 3×3 rank-2 matrix that satisfies $x'^T F x = 0$ for all x ↔ x'

(i) Transpose: if F is the fundamental matrix for (P, P'), then $F^T$ is the fundamental matrix for (P', P)

(ii) Epipolar lines: $l' = Fx$ and $l = F^T x'$

(iii) Epipoles: e' lies on all epipolar lines l', thus $e'^T F x = 0 \;\forall x \Rightarrow e'^T F = 0$; similarly $F e = 0$

(iv) F has 7 d.o.f., i.e. 3×3 − 1 (homogeneous) − 1 (rank 2)

(v) F is a correlation, a projective mapping from a point x to a line $l' = Fx$ (not a proper correlation, i.e. not invertible)
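Properties (ii) and (iii) suggest a simple numerical sketch. Because F has rank 2, the epipole e with $Fe = 0$ is orthogonal to every row of F, so it can be taken as the cross product of two independent rows. Likewise, e' comes from two columns. The sketch below picks the largest pairwise cross product for robustness. It assumes an exactly rank-2 F, whereas a noisy estimate would call for an SVD instead. The function names are hypothetical.

```python
import itertools
import math


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def orthogonal_to_all(vectors):
    """For three 3-vectors spanning a plane (rank 2), return a normal to that plane.

    Uses the largest pairwise cross product to avoid a nearly parallel pair."""
    return max((cross(a, b) for a, b in itertools.combinations(vectors, 2)),
               key=lambda v: math.sqrt(sum(c * c for c in v)))


def epipoles(F):
    """(e, e') with F e = 0 and e'^T F = 0, assuming F has rank exactly 2."""
    rows = [list(r) for r in F]
    cols = [list(c) for c in zip(*F)]
    return orthogonal_to_all(rows), orthogonal_to_all(cols)


def epipolar_line(F, x):
    """l' = F x, the line in image 2 that must contain the match of x."""
    return [sum(F[i][k] * x[k] for k in range(3)) for i in range(3)]


# Example: P = [I | 0], P' = [I | t] with t = (1, 2, 3) gives F = [t]_x
F = [[0.0, -3.0, 2.0],
     [3.0, 0.0, -1.0],
     [-2.0, 1.0, 0.0]]
e, e_prime = epipoles(F)
print(e, e_prime)                      # both proportional to t

x = [0.5, -0.25, 1.0]                  # image of X = (1, -0.5, 2) in view 1
x_prime = [2.0, 1.5, 5.0]              # image of X in view 2: X + t
l_prime = epipolar_line(F, x)
print(sum(a * b for a, b in zip(x_prime, l_prime)))   # x'^T l' = 0
```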

Computation of F

• Linear (8-point)

• Minimal (7-point)

• Calibrated (5-point) (Essential matrix)

• Practical two-view geometry computation

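Epipolar geometry: basic equation

$$x'^T F x = 0$$

separate known from unknown

$$x'x\,f_{11} + x'y\,f_{12} + x'\,f_{13} + y'x\,f_{21} + y'y\,f_{22} + y'\,f_{23} + x\,f_{31} + y\,f_{32} + f_{33} = 0$$

$$\underbrace{[x'x,\; x'y,\; x',\; y'x,\; y'y,\; y',\; x,\; y,\; 1]}_{\text{data}}\;\underbrace{[f_{11}, f_{12}, f_{13}, f_{21}, f_{22}, f_{23}, f_{31}, f_{32}, f_{33}]^T}_{\text{unknowns}} = 0 \quad \text{(linear)}$$

$$\begin{bmatrix} x'_1x_1 & x'_1y_1 & x'_1 & y'_1x_1 & y'_1y_1 & y'_1 & x_1 & y_1 & 1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ x'_nx_n & x'_ny_n & x'_n & y'_nx_n & y'_ny_n & y'_n & x_n & y_n & 1 \end{bmatrix} f = 0$$

$$A f = 0$$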

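the NOT normalized 8-point algorithm

$$\begin{bmatrix} x'_1x_1 & x'_1y_1 & x'_1 & y'_1x_1 & y'_1y_1 & y'_1 & x_1 & y_1 & 1 \\ x'_2x_2 & x'_2y_2 & x'_2 & y'_2x_2 & y'_2y_2 & y'_2 & x_2 & y_2 & 1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ x'_nx_n & x'_ny_n & x'_n & y'_nx_n & y'_ny_n & y'_n & x_n & y_n & 1 \end{bmatrix} \begin{bmatrix} f_{11} \\ f_{12} \\ f_{13} \\ f_{21} \\ f_{22} \\ f_{23} \\ f_{31} \\ f_{32} \\ f_{33} \end{bmatrix} = 0$$

Column magnitudes for pixel coordinates (~100): the product columns x'x, x'y, y'x, y'y are ~10000, the single-coordinate columns x', y', x, y are ~100, and the last column is 1.

! Orders of magnitude difference between columns of the data matrix → least-squares yields poor results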
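The scale problem is easy to see by building one row of A. Each correspondence contributes the row [x'x, x'y, x', y'x, y'y, y', x, y, 1], in the same column order as f above. The sketch below uses made-up pixel coordinates in a 700×500 image. It is only an illustration of the row construction, not a full 8-point solver, since solving for the null vector of A needs an SVD that is not shown here.

```python
def epipolar_row(x, y, xp, yp):
    """Row of A for the match (x, y) <-> (x', y'); column order matches f = (f11, f12, ..., f33)."""
    return [xp * x, xp * y, xp, yp * x, yp * y, yp, x, y, 1.0]


def row_times_f(row, F):
    """row . f with f = F stacked row by row; this equals x'^T F x."""
    f = [F[i][j] for i in range(3) for j in range(3)]
    return sum(r * v for r, v in zip(row, f))


# Hypothetical match in pixel coordinates (image of a few hundred pixels)
row = epipolar_row(250.0, 310.0, 420.0, 180.0)
magnitudes = [abs(v) for v in row]
print(row)
print(max(magnitudes) / min(magnitudes))   # 130200.0: x'y versus the constant 1
```

A spread of five orders of magnitude between columns is what makes the unnormalized least-squares solution poorly conditioned.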

the normalized 8-point algorithm

Transform image to ~[-1,1]×[-1,1]: (0,0) ↦ (−1,−1), (700,0) ↦ (1,−1), (0,500) ↦ (−1,1), (700,500) ↦ (1,1)

$$\begin{bmatrix} \frac{2}{700} & 0 & -1 \\ 0 & \frac{2}{500} & -1 \\ 0 & 0 & 1 \end{bmatrix}$$

normalized least squares yields good results (Hartley, PAMI'97)
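Here is a minimal sketch of the normalization step, assuming the image-bounds scaling shown above for a w×h image. Hartley's paper instead translates points to their centroid and scales them to an average distance of √2, but this sketch follows the slide. After estimating $\hat F$ from normalized points $\hat x = T x$ and $\hat x' = T' x'$, the matrix for the original pixels is $F = T'^T \hat F T$, because $\hat x'^T \hat F \hat x = x'^T T'^T \hat F T x$. The function names are hypothetical.

```python
def image_normalization(width, height):
    """T mapping pixel coordinates in [0, width] x [0, height] onto [-1, 1] x [-1, 1]."""
    return [[2.0 / width, 0.0, -1.0],
            [0.0, 2.0 / height, -1.0],
            [0.0, 0.0, 1.0]]


def transform(M, v):
    """M v for a 3x3 matrix M and a homogeneous 3-vector v."""
    return [sum(M[i][k] * v[k] for k in range(3)) for i in range(3)]


def matmul3(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def denormalize(F_hat, T, T_prime):
    """F = T'^T F_hat T, where F_hat was estimated from x_hat = T x, x_hat' = T' x'."""
    T_prime_t = [list(r) for r in zip(*T_prime)]
    return matmul3(T_prime_t, matmul3(F_hat, T))


T = image_normalization(700, 500)
for corner in ([0.0, 0.0, 1.0], [700.0, 0.0, 1.0], [0.0, 500.0, 1.0], [700.0, 500.0, 1.0]):
    print(corner[:2], "->", transform(T, corner)[:2])
```

In a full pipeline, the linear solve and the rank-2 step would both run on normalized coordinates, and `denormalize` would be applied last.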

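the singularity constraint

$$e'^T F = 0 \qquad F e = 0 \qquad \det F = 0 \qquad \operatorname{rank} F = 2$$

SVD from linearly computed F matrix (rank 3)

$$F = U \begin{bmatrix} \sigma_1 & & \\ & \sigma_2 & \\ & & \sigma_3 \end{bmatrix} V^T = U_1\sigma_1 V_1^T + U_2\sigma_2 V_2^T + U_3\sigma_3 V_3^T$$

Compute closest rank-2 approximation $\min \lVert F - F' \rVert_F$

$$F' = U \begin{bmatrix} \sigma_1 & & \\ & \sigma_2 & \\ & & 0 \end{bmatrix} V^T = U_1\sigma_1 V_1^T + U_2\sigma_2 V_2^T$$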

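the minimum case – 7 point correspondences

$$\begin{bmatrix} x'_1x_1 & x'_1y_1 & x'_1 & y'_1x_1 & y'_1y_1 & y'_1 & x_1 & y_1 & 1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ x'_7x_7 & x'_7y_7 & x'_7 & y'_7x_7 & y'_7y_7 & y'_7 & x_7 & y_7 & 1 \end{bmatrix} f = 0$$

$$A = U_{7\times7}\,\operatorname{diag}(\sigma_1,\dots,\sigma_7,0,0)\,V_{9\times9}^T \;\Rightarrow\; A\,[V_8\;V_9] = 0_{7\times2}$$

(e.g. $V^T V_8 = [0\;0\;0\;0\;0\;0\;0\;1\;0]^T$)

$$x_i'^T (F_1 + \lambda F_2)\, x_i = 0, \quad \forall i = 1 \dots 7$$

(F1, F2: V8 and V9 rearranged as 3×3 matrices)

one-parameter family of solutions, but F1 + λF2 is not automatically rank 2

(figure: F1, F2, F7pts and σ3)

$$\det(F_1 + \lambda F_2) = a_3\lambda^3 + a_2\lambda^2 + a_1\lambda + a_0 = 0$$

(cubic equation: obtain 1 or 3 real solutions)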

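the minimum case – impose rank 2

$$\det(F_1 + \lambda F_2) = \det F_2 \,\det(F_2^{-1}F_1 + \lambda I) = 0$$

using $\det(AB) = \det(A)\det(B)$ (assumes F2 is invertible)

Compute possible λ as eigenvalues of $-F_2^{-1}F_1$, i.e. the negated eigenvalues of $F_2^{-1}F_1$ (only real solutions are potential solutions)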
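The rank-2 step of the 7-point method can also be done without eigenvalues. This sketch swaps in an equivalent route. It assumes F1 and F2 are already available, recovers the cubic's coefficients by evaluating $\det(F_1 + \lambda F_2)$ at λ = 0, ±1 and 2 (four samples fix a cubic), and then solves the cubic in closed form (Cardano, or the trigonometric form for three real roots). Each real root gives a candidate F = F1 + λF2. The absolute tolerance `eps` and all function names are assumptions for illustration.

```python
import math


def det3(M):
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def cubic_coefficients(F1, F2):
    """(a3, a2, a1, a0) with det(F1 + lam F2) = a3 lam^3 + a2 lam^2 + a1 lam + a0."""
    def p(lam):
        return det3([[F1[r][c] + lam * F2[r][c] for c in range(3)] for r in range(3)])
    p0, p1, pm1, p2 = p(0.0), p(1.0), p(-1.0), p(2.0)
    a0 = p0
    a2 = (p1 + pm1) / 2.0 - a0
    s = (p1 - pm1) / 2.0                     # a3 + a1
    a3 = (p2 - a0 - 4.0 * a2 - 2.0 * s) / 6.0
    return a3, a2, s - a3, a0


def cbrt(v):
    """Real cube root, also for negative arguments."""
    return math.copysign(abs(v) ** (1.0 / 3.0), v)


def real_cubic_roots(a3, a2, a1, a0, eps=1e-12):
    """Real roots of a3 x^3 + a2 x^2 + a1 x + a0 (a3 != 0) via the depressed cubic t^3 + p t + q."""
    b, c, d = a2 / a3, a1 / a3, a0 / a3
    p = c - b * b / 3.0
    q = 2.0 * b ** 3 / 27.0 - b * c / 3.0 + d
    shift = -b / 3.0                         # x = t + shift
    disc = (q / 2.0) ** 2 + (p / 3.0) ** 3
    if disc > eps:                           # one real root (Cardano)
        sq = math.sqrt(disc)
        return [cbrt(-q / 2.0 + sq) + cbrt(-q / 2.0 - sq) + shift]
    if abs(p) < eps:                         # triple root
        return [shift]
    if disc > -eps:                          # one simple and one double root
        return [3.0 * q / p + shift, -1.5 * q / p + shift]
    r = 2.0 * math.sqrt(-p / 3.0)            # three distinct real roots
    phi = math.acos(max(-1.0, min(1.0, 3.0 * q / (p * r))))
    return [r * math.cos(phi / 3.0 - 2.0 * math.pi * k / 3.0) + shift for k in range(3)]


# Toy example: det(diag(1, 2, 3) + lam I) = (1 + lam)(2 + lam)(3 + lam)
F1 = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
F2 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
coeffs = cubic_coefficients(F1, F2)          # (1, 6, 11, 6)
print(sorted(real_cubic_roots(*coeffs)))     # approximately [-3, -2, -1]
```

Consistent with the slide, the roots here are the negated eigenvalues of $F_2^{-1}F_1$ = diag(1, 2, 3).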

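Calibrated case: 5-point relative motion (Nister, CVPR03)

• Linear equations for 5 points

$$\begin{bmatrix} x'_1x_1 & x'_1y_1 & x'_1 & y'_1x_1 & y'_1y_1 & y'_1 & x_1 & y_1 & 1 \\ x'_2x_2 & x'_2y_2 & x'_2 & y'_2x_2 & y'_2y_2 & y'_2 & x_2 & y_2 & 1 \\ x'_3x_3 & x'_3y_3 & x'_3 & y'_3x_3 & y'_3y_3 & y'_3 & x_3 & y_3 & 1 \\ x'_4x_4 & x'_4y_4 & x'_4 & y'_4x_4 & y'_4y_4 & y'_4 & x_4 & y_4 & 1 \\ x'_5x_5 & x'_5y_5 & x'_5 & y'_5x_5 & y'_5y_5 & y'_5 & x_5 & y_5 & 1 \end{bmatrix} \begin{bmatrix} E_{11} \\ E_{12} \\ E_{13} \\ E_{21} \\ E_{22} \\ E_{23} \\ E_{31} \\ E_{32} \\ E_{33} \end{bmatrix} = 0$$

(assumes normalized coordinates)

• Linear solution space

$$E = xX + yY + zZ + wW$$

(X, Y, Z, W: basis of the null space of the 5×9 matrix); scale does not matter, choose w = 1

• Non-linear constraints

$$\det E = 0, \qquad 2EE^TE - \operatorname{tr}(EE^T)\,E = 0$$

→ 10 cubic polynomials in x, y, z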

Calibrated case: 5-point relative motion

• Perform Gauss-Jordan elimination on polynomials

[n] represents polynomial of degree n in z

(Nister, CVPR03)
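The non-linear constraints can be checked on a synthetic essential matrix. The sketch below builds $E = [t]_\times R$ from a made-up rotation and translation, using the convention P1 = [I | 0], P2 = [R | t]. It then verifies $x_2^T E x_1 = 0$ on normalized coordinates and evaluates det E and $2EE^TE - \operatorname{tr}(EE^T)E$, which together give the 1 + 9 = 10 cubic constraints. It does not implement Nistér's Gauss-Jordan elimination. All names are illustrative.

```python
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(r) for r in zip(*A)]


def skew(v):
    return [[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]]


def det3(M):
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def essential_constraints(E):
    """Return (det E, max |2 E E^T E - tr(E E^T) E|); both vanish for a valid E."""
    EEt = matmul(E, transpose(E))
    tr = sum(EEt[i][i] for i in range(3))
    lhs = matmul(EEt, E)
    residual = max(abs(2.0 * lhs[i][j] - tr * E[i][j]) for i in range(3) for j in range(3))
    return det3(E), residual


# Hypothetical relative motion: rotation of 0.3 rad about z, translation t
c, s = math.cos(0.3), math.sin(0.3)
R = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
t = [0.5, -0.1, 0.2]
E = matmul(skew(t), R)                    # E = [t]_x R for P1 = [I|0], P2 = [R|t]

X = [0.4, 0.3, 5.0]                       # a 3D point in the first camera frame
x1 = [X[0] / X[2], X[1] / X[2], 1.0]      # normalized image coordinates, view 1
Y = [sum(R[i][k] * X[k] for k in range(3)) + t[i] for i in range(3)]
x2 = [Y[0] / Y[2], Y[1] / Y[2], 1.0]      # normalized image coordinates, view 2
epipolar = sum(x2[i] * E[i][j] * x1[j] for i in range(3) for j in range(3))
print(abs(epipolar) < 1e-12, essential_constraints(E))
```

A generic rank-3 matrix fails both checks. This is why the 5-point solver must impose them on the linear family E = xX + yY + zZ + W.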

Automatic computation of F (RANSAC)

Step 1. Extract features

Step 2. Compute a set of potential matches

Step 3. do {

Step 3.1 select minimal sample (i.e. 7 or 5 matches) (generate hypothesis)

Step 3.2 compute solution(s) for F

Step 3.3 determine inliers (verify hypothesis)

} while Γ(#inliers, #samples) < 95%

$$\Gamma = 1 - \left(1 - \left(\frac{\#\text{inliers}}{\#\text{matches}}\right)^7\right)^{\#\text{samples}}$$

#inliers/#matches   90%   80%   70%   60%   50%
#samples (7-point)    5    13    35   106   382

Step 4. Compute F based on all inliers

Step 5. Look for additional matches

Step 6. Refine F based on all correct matches
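The sample counts in the table follow from Γ. Solving $\Gamma \ge 0.95$ for #samples gives $N = \lceil \log(1 - 0.95) / \log(1 - w^7) \rceil$, where w is the inlier ratio. The small sketch below reproduces the table for 7-point samples. For 5-point sampling, `sample_size=5` would be used instead. Function names are hypothetical.

```python
import math


def confidence(inlier_ratio, samples, sample_size=7):
    """Gamma: probability that at least one of `samples` minimal samples is outlier-free."""
    return 1.0 - (1.0 - inlier_ratio ** sample_size) ** samples


def required_samples(inlier_ratio, sample_size=7, target=0.95):
    """Smallest number of samples with confidence(...) >= target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - inlier_ratio ** sample_size))


for w in (0.9, 0.8, 0.7, 0.6, 0.5):
    print(f"{w:.0%}", required_samples(w))   # 5, 13, 35, 106, 382
```

In practice, the inlier ratio is unknown up front, so implementations typically update w from the best hypothesis so far and stop once Γ reaches the target.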

Finding more matches

restrict search range to neighborhood of epipolar line (e.g. ±1.5 pixels)

relax disparity restriction (along epipolar line)
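The band around the epipolar line can be expressed as a point-to-line distance. For $l' = F x = (a, b, c)$, the distance of x' from l' is $|x'^T l'| / \sqrt{a^2 + b^2}$, assuming homogeneous pixel coordinates with w = 1. The sketch keeps candidates within ±1.5 pixels and places no limit on the position along the line. The rectified-stereo F in the example (horizontal epipolar lines, y' = y) is an illustrative assumption, as are the function names.

```python
import math


def epipolar_distance(F, x, xp):
    """Distance in pixels of x' from the epipolar line l' = F x (homogeneous, w = 1)."""
    l = [sum(F[i][k] * x[k] for k in range(3)) for i in range(3)]
    return abs(sum(xp[i] * l[i] for i in range(3))) / math.hypot(l[0], l[1])


def within_band(F, x, candidates, band=1.5):
    """Keep candidate matches at most `band` pixels from l'; disparity along l' is unrestricted."""
    return [c for c in candidates if epipolar_distance(F, x, c) <= band]


# Rectified pair: F = [e1]_x with e1 = (1, 0, 0), so l' is the row y' = y
F = [[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
x = [100.0, 200.0, 1.0]
candidates = [[150.0, 201.0, 1.0], [140.0, 203.0, 1.0], [90.0, 199.5, 1.0]]
print(within_band(F, x, candidates))   # keeps the first and third candidate
```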


Initial structure and motion

$$P_1 = [I \mid 0], \qquad P_2 = [\,[e]_\times F + e\,a^T \mid e\,]$$

(e: epipole in the second image, $e^T F = 0$; a: arbitrary 3-vector)

Epipolar geometry ↔ Projective calibration

$$m_2^T F m_1 = 0$$

compatible with F

Yields correct projective camera setup (Faugeras'92, Hartley'92)

Obtain structure through triangulation

Use reprojection error for minimization

Avoid measurements in projective space
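Compatibility with F can be checked directly. For any a and any point X, the images $m_1 = P_1 X$ and $m_2 = P_2 X$ satisfy $m_2^T F m_1 = 0$. This holds because $F^T[e]_\times^T F$ is skew-symmetric and $e^T F = 0$. The sketch below constructs the canonical pair from a synthetic $F = [t]_\times R$ (whose left null vector is t, so e = t). Its default a = 0 is an assumption, since any a gives a valid member of the projective family. All names are illustrative.

```python
def skew(v):
    return [[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]]


def matmul3(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def canonical_cameras(F, e, a=(0.0, 0.0, 0.0)):
    """P1 = [I | 0], P2 = [[e]_x F + e a^T | e], where e^T F = 0 (epipole in image 2)."""
    M = matmul3(skew(e), F)
    P1 = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
    P2 = [[M[i][j] + e[i] * a[j] for j in range(3)] + [e[i]] for i in range(3)]
    return P1, P2


def project(P, X):
    """Homogeneous image point P X for a 3x4 camera and a homogeneous 4-vector X."""
    return [sum(P[i][k] * X[k] for k in range(4)) for i in range(3)]


# Example F = [t]_x R: rank 2 with t^T F = 0, so e = t
R = [[0.8, 0.0, 0.6], [0.0, 1.0, 0.0], [-0.6, 0.0, 0.8]]
t = [-1.0, 0.2, 0.1]
F = matmul3(skew(t), R)
P1, P2 = canonical_cameras(F, t, a=(0.1, -0.2, 0.3))

X = [0.5, 1.5, -2.0, 0.7]                  # any point of projective 3-space
m1, m2 = project(P1, X), project(P2, X)
print(sum(m2[i] * F[i][j] * m1[j] for i in range(3) for j in range(3)))   # ~0
```

The structure obtained from these cameras is only defined up to a projective transformation, which is why the slide recommends measuring errors in the images rather than in projective space.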

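Initial structure and motion (calibrated case)

Essential Matrix: $E = [t]_\times R$, with $x_2^T E x_1 = 0$ for normalized image coordinates

Essential Matrix decomposition

Recover R and t from E: with $E = U \operatorname{diag}(1,1,0) V^T$ and $W = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$,

use $R = UWV^T$ or $R = UW^TV^T$

use $t = u_3$ or $t = -u_3$ ($u_3$: last column of U); ambiguity: four possible (R, t) combinations

$$P_1 = [I \mid 0], \qquad P_2 = [R \mid t]$$

(e.g. see Hartley and Zisserman, Sec. 8.6)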


Triangulation

(figure: ray L1 from C1 through x1 and ray L2 from C2 through x2 meet at X)

Triangulation

- calibration

- correspondences

Triangulation

• Backprojection

• Triangulation

Iterative least-squares

• Maximum Likelihood Triangulation (geometric error)
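For comparison with the listed methods, here is the simplest linear variant. It is neither the iterative nor the maximum-likelihood method. Each view contributes $x\,(p_3 \cdot X) - p_1 \cdot X = 0$ and $y\,(p_3 \cdot X) - p_2 \cdot X = 0$. Fixing X = (X, Y, Z, 1), the stacked 2n×3 system is solved in least squares via its normal equations. It minimizes an algebraic rather than a geometric error and cannot represent points at infinity. The cameras and point are made up, and the function names are hypothetical.

```python
def solve3(A, b):
    """Solve the 3x3 system A x = b by Gaussian elimination with partial pivoting."""
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            factor = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= factor * M[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))) / M[r][r]
    return x


def triangulate_linear(cameras, points):
    """Least-squares (X, Y, Z) from two or more views.

    cameras: 3x4 matrices as lists of rows; points: matching (x, y) image coordinates.
    Each view gives x (p3 . X) - p1 . X = 0 and y (p3 . X) - p2 . X = 0 with X = (X, Y, Z, 1).
    """
    rows, rhs = [], []
    for P, (x, y) in zip(cameras, points):
        for coord, p in ((x, P[0]), (y, P[1])):
            eq = [coord * P[2][k] - p[k] for k in range(4)]
            rows.append(eq[:3])          # coefficients of X, Y, Z
            rhs.append(-eq[3])           # constant term moved to the right-hand side
    # Normal equations (A^T A) X = A^T b of the stacked 2n x 3 system
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Atb = [sum(r[i] * v for r, v in zip(rows, rhs)) for i in range(3)]
    return solve3(AtA, Atb)


def project(P, X):
    """Inhomogeneous image coordinates of the homogeneous 3D point X."""
    h = [sum(P[i][k] * X[k] for k in range(4)) for i in range(3)]
    return (h[0] / h[2], h[1] / h[2])


P1 = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
P2 = [[0.8, 0.0, 0.6, -1.0], [0.0, 1.0, 0.0, 0.2], [-0.6, 0.0, 0.8, 0.1]]
X_true = [0.3, -0.2, 4.0, 1.0]
X_est = triangulate_linear([P1, P2], [project(P1, X_true), project(P2, X_true)])
print(X_est)   # close to [0.3, -0.2, 4.0] for noise-free input
```

With noisy measurements, the result is a reasonable starting point that the iterative or maximum-likelihood methods can refine.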


Optimal 3D point in epipolar plane

• Given an epipolar plane, find best 3D point for (m1, m2)

(figure: measured points m1, m2, epipolar lines l1, l2 and closest points m1', m2')

Select closest points (m1', m2') on epipolar lines

Obtain 3D point through exact triangulation

Guarantees minimal reprojection error (given this epipolar plane)
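Given the pair of epipolar lines, the step "select closest points" is an orthogonal projection onto each line. This sketch assumes lines are homogeneous 3-vectors (a, b, c) in pixel coordinates. It also returns the cost $D(m_1, l_1)^2 + D(m_2, l_2)^2$ that this choice of epipolar plane attains, which is the quantity the non-iterative method minimizes over the plane. The exact triangulation of (m1', m2') is not shown. The function names are hypothetical.

```python
def closest_point_on_line(l, m):
    """Orthogonal projection of the image point m = (u, v) onto the line l = (a, b, c),
    i.e. the point m' on a u + b v + c = 0 nearest to m."""
    a, b, c = l
    u, v = m
    s = (a * u + b * v + c) / (a * a + b * b)   # algebraic distance over |n|^2
    return (u - s * a, v - s * b)


def squared_distance_to_line(l, m):
    a, b, c = l
    return (a * m[0] + b * m[1] + c) ** 2 / (a * a + b * b)


def cost_for_epipolar_plane(l1, l2, m1, m2):
    """D(m1, l1)^2 + D(m2, l2)^2: reprojection error reached by (m1', m2')."""
    return squared_distance_to_line(l1, m1) + squared_distance_to_line(l2, m2)


# Example: epipolar line y = x, i.e. l = (1, -1, 0), and measured point m = (2, 0)
print(closest_point_on_line((1.0, -1.0, 0.0), (2.0, 0.0)))   # (1.0, 1.0)
```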

Non-iterative optimal solution (Hartley and Sturm, CVIU'97)

• Reconstruct matches in projective frame by minimizing the reprojection error (3 DOF):

$$D(m_1, P_1M)^2 + D(m_2, P_2M)^2$$

• Non-iterative method

Determine the epipolar plane for reconstruction (1 DOF):

$$D(m_1, l_1(\alpha))^2 + D(m_2, l_2(\alpha))^2$$

(minimum found among the roots of a polynomial of degree 6 in α)

Reconstruct optimal point from selected epipolar plane

Note: only works for two views

Sequential Structure and Motion Computation

Initialize Motion (P1, P2 compatible with F or E)

Initialize Structure (minimize reprojection error)

Sequential structure and motion recovery

• Initialize structure and motion from two views

• For each additional view

• Determine pose

• Refine and extend structure

• Determine correspondences robustly by jointly estimating matches and epipolar geometry

Compute Pi+1 using robust approach (6-point RANSAC)

Extend and refine reconstruction

$$x_i = P_i\, X(x_1, \dots, x_{i-1})$$

(figure: new view; 2D-2D matches m_i ↔ m_{i+1}, 2D-3D matches to the existing points M)

Determine pose towards existing structure

Compute P with 6-point RANSAC

• Generate hypothesis using 6 points (two equations per point; DLT algorithm similar to the one seen in the 2nd lecture for homographies)

• Planar scenes are degenerate!

Three-point perspective pose – P3P (calibrated case) (Haralick et al., IJCV94)

All techniques yield a 4th order polynomial
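The "two equations per point" of the 6-point DLT can be written out explicitly. With p stacking the rows p1, p2, p3 of P (12 unknowns, 11 d.o.f.), a correspondence X ↔ (u, v) gives $-p_2 \cdot X + v\,p_3 \cdot X = 0$ and $p_1 \cdot X - u\,p_3 \cdot X = 0$. The sketch only builds the 2n×12 matrix and checks that a known camera lies in its null space. Extracting the null vector (normally via SVD) and the RANSAC loop are not shown. The camera, the points and the function names are illustrative.

```python
def pose_dlt_rows(X, x):
    """Two rows of the DLT system A p = 0 for one correspondence.

    X: homogeneous 3D point (4 values); x = (u, v): its image.
    p stacks the rows of P: (p1, p2, p3), 12 unknowns.
    """
    u, v = x
    zero = [0.0] * 4
    return [zero + [-c for c in X] + [v * c for c in X],   # -p2.X + v p3.X = 0
            list(X) + zero + [-u * c for c in X]]          #  p1.X - u p3.X = 0


def build_dlt_matrix(points3d, points2d):
    """Stack two rows per correspondence; P has 11 d.o.f., so at least 6 points are needed."""
    if len(points3d) < 6 or len(points3d) != len(points2d):
        raise ValueError("need at least 6 matching 2D-3D correspondences")
    A = []
    for X, x in zip(points3d, points2d):
        A.extend(pose_dlt_rows(X, x))
    return A


# Check with a known camera: the true P (stacked row-wise) lies in the null space of A
P = [[0.8, 0.0, 0.6, -1.0], [0.0, 1.0, 0.0, 0.2], [-0.6, 0.0, 0.8, 0.1]]
points3d = [[0.0, 0.0, 4.0, 1.0], [1.0, 0.0, 5.0, 1.0], [0.0, 1.0, 6.0, 1.0],
            [1.0, 1.0, 4.5, 1.0], [-1.0, 0.5, 5.5, 1.0], [0.5, -1.0, 7.0, 1.0]]
points2d = []
for X in points3d:
    h = [sum(P[i][k] * X[k] for k in range(4)) for i in range(3)]
    points2d.append((h[0] / h[2], h[1] / h[2]))
A = build_dlt_matrix(points3d, points2d)
p = [v for row in P for v in row]
print(max(abs(sum(a * b for a, b in zip(row, p))) for row in A))   # ~0
```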

Sequential Structure and Motion Computation

Initialize Motion (P1, P2 compatible with F or E)

Initialize Structure (minimize reprojection error)

Extend motion (compute pose through matches seen in 2 or more previous views)

Extend structure (initialize new structure, refine existing structure)

Changchang's SfM code

For the iconic graph:

• uses 5-point + RANSAC for 2-view initialization

• uses 3-point + RANSAC for adding views

• performs bundle adjustment

For additional images:

• use 3-point + RANSAC pose estimation

http://ccwu.me/vsfm/

Rome on a cloudless day (Frahm et al. ECCV 2010)

GIST & clustering (1h35)

SIFT & Geometric verification (11h36)

SfM & Bundle (8h35)

Dense Reconstruction (1h58)

Some numbers

• 1 PC

• 2.88M images

• 100k clusters

• 22k SfM with 307k images

• 63k 3D models

• Largest model 5700 images

• Total time 23h53

Hierarchical structure and motion recovery

• Compute 2-view

• Compute 3-view

• Stitch 3-view reconstructions

• Merge and refine reconstruction

(figure labels: F, T, H, P, M)

Stitching 3-view reconstructions

Different possibilities

1. Align (P2, P3) with (P'1, P'2): $\arg\min_H \; d_A(P_2, P'_1H^{-1}) + d_A(P_3, P'_2H^{-1})$

2. Align X, X' (and C, C'): $\arg\min_H \sum_j d_A(X_j, HX'_j)$

3. Minimize reproj. error: $\arg\min_H \sum_j d(x_j, PH^{-1}X'_j) + d(x'_j, P'HX_j)$

4. MLE (merge): $\arg\min_{P,X} \sum_j d(x_j, PX_j)$

Next week: Dense Correspondences
