Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Computer Vision – Lecture 17 Structure-from-Motion 21.01.2009 Bastian Leibe RWTH Aachen http://www.umic.rwth-aachen.de/multimedia [email protected] Many slides adapted from Svetlana Lazebnik, Martial Hebert, Steve Seitz

Computer Vision – Lecture 17

Feb 11, 2016


Transcript
Page 1: Computer Vision – Lecture 17

Computer Vision – Lecture 17
Structure-from-Motion

21.01.2009

Bastian Leibe
RWTH Aachen
http://www.umic.rwth-aachen.de/multimedia
[email protected]

Many slides adapted from Svetlana Lazebnik, Martial Hebert, Steve Seitz

Page 2: Computer Vision – Lecture 17

Course Outline
• Image Processing Basics
• Segmentation & Grouping
• Object Recognition
• Local Features & Matching
• Object Categorization
• 3D Reconstruction
    - Epipolar Geometry and Stereo Basics
    - Camera calibration & Uncalibrated Reconstruction
    - Structure-from-Motion
• Motion and Tracking

Page 3: Computer Vision – Lecture 17

Recap: A General Point
• Equations of the form

$$A\mathbf{x} = \mathbf{0}$$

• How do we solve them? (always!) Apply SVD:

$$A = U D V^T = U \begin{bmatrix} d_{11} & & \\ & \ddots & \\ & & d_{NN} \end{bmatrix} \begin{bmatrix} v_{11} & \cdots & v_{1N} \\ \vdots & \ddots & \vdots \\ v_{N1} & \cdots & v_{NN} \end{bmatrix}^T$$

    - The diagonal of D holds the singular values; the columns of V are the singular vectors.
    - Singular values of A = square roots of the eigenvalues of A^T A.
    - The solution of Ax = 0 is the nullspace vector of A. It corresponds to the singular vector (column of V) that belongs to the smallest singular value of A.
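The recipe above can be sketched in a few lines of NumPy. This is only an illustration: the function name `solve_homogeneous` and the example matrix are assumptions made here, not part of the lecture.

```python
import numpy as np

def solve_homogeneous(A):
    """Return the unit vector x that minimizes ||A x|| (least-squares solution of A x = 0)."""
    # Rows of Vt are the right singular vectors, ordered by decreasing singular value,
    # so the last row belongs to the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1]

# Illustrative rank-2 matrix whose nullspace is spanned by (1, -2, 1)
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])
x = solve_homogeneous(A)
```

The same helper applies unchanged to overdetermined systems with noisy data, where no exact nullspace exists and the result is the least-squares solution under the constraint |x| = 1.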

Page 4: Computer Vision – Lecture 17

Properties of SVD
• Frobenius norm
    - Generalization of the Euclidean norm to matrices:

$$\|A\|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2} = \sqrt{\sum_{i=1}^{\min(m,n)} \sigma_i^2}$$

• Partial reconstruction property of SVD
    - Let σ_i, i = 1,…,N be the singular values of A.
    - Let A_p = U_p D_p V_p^T be the reconstruction of A when we set σ_{p+1},…,σ_N to zero.
    - Then A_p = U_p D_p V_p^T is the best rank-p approximation of A in the sense of the Frobenius norm (i.e. the best least-squares approximation).
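A small numerical sketch of the partial reconstruction property, assuming NumPy; the helper name `best_rank_p` and the test matrix are made up for illustration.

```python
import numpy as np

def best_rank_p(A, p):
    """Rank-p reconstruction A_p = U_p D_p V_p^T, plus all singular values of A."""
    U, d, Vt = np.linalg.svd(A, full_matrices=False)
    A_p = U[:, :p] @ np.diag(d[:p]) @ Vt[:p, :]
    return A_p, d

A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])
A1, d = best_rank_p(A, 1)

# The Frobenius error of the truncation is carried by the discarded singular values.
err = np.linalg.norm(A - A1, "fro")
```

Here `err` matches the root of the sum of the squared discarded singular values, which is the link between the two statements on the slide.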

Page 5: Computer Vision – Lecture 17

Recap: Camera Parameters
• Intrinsic parameters
    - Principal point coordinates
    - Focal length
    - Pixel magnification factors
    - Skew (non-rectangular pixels)
    - Radial distortion

$$K = \begin{bmatrix} m_x & & \\ & m_y & \\ & & 1 \end{bmatrix} \begin{bmatrix} f & s & p_x \\ & f & p_y \\ & & 1 \end{bmatrix} = \begin{bmatrix} \alpha_x & s & x_0 \\ & \alpha_y & y_0 \\ & & 1 \end{bmatrix}$$

• Extrinsic parameters
    - Rotation R
    - Translation t
    (both relative to world coordinate system)
• Camera projection matrix

$$P = K\,[R \mid \mathbf{t}]$$

    - General pinhole camera: 9 DoF
    - CCD camera with square pixels: 10 DoF
    - General camera: 11 DoF

Page 6: Computer Vision – Lecture 17

Recap: Calibrating a Camera
Goal
• Compute intrinsic and extrinsic parameters using observed camera data.

Main idea
• Place “calibration object” with known geometry in the scene
• Get correspondences
• Solve for mapping from scene to image: estimate P = P_int P_ext

[Figure: known 3D points X_i and their image projections x_i under the unknown camera P]

Slide credit: Kristen Grauman

Page 7: Computer Vision – Lecture 17

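Recap: Camera Calibration (DLT Algorithm)

$$\begin{bmatrix} \mathbf{0}^T & \mathbf{X}_1^T & -y_1 \mathbf{X}_1^T \\ \mathbf{X}_1^T & \mathbf{0}^T & -x_1 \mathbf{X}_1^T \\ \vdots & \vdots & \vdots \\ \mathbf{0}^T & \mathbf{X}_n^T & -y_n \mathbf{X}_n^T \\ \mathbf{X}_n^T & \mathbf{0}^T & -x_n \mathbf{X}_n^T \end{bmatrix} \begin{pmatrix} \mathbf{P}_1 \\ \mathbf{P}_2 \\ \mathbf{P}_3 \end{pmatrix} = \mathbf{0}, \qquad A\mathbf{p} = \mathbf{0}$$

(P_1, P_2, P_3 are the rows of P, stacked into the 12-vector p.)

• P has 11 degrees of freedom.
• Two linearly independent equations per independent 2D/3D correspondence.
• Solve with SVD (similar to homography estimation)
    - Solution corresponds to smallest singular vector.
• 5 ½ correspondences needed for a minimal solution.

Slide adapted from Svetlana Lazebnik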
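The DLT system above can be assembled and solved as in the following sketch. It assumes NumPy, noise-free synthetic data, and no coordinate normalization; the names `dlt_camera` and `project` and the camera `P_true` are hypothetical and only serve the illustration.

```python
import numpy as np

def dlt_camera(X, x):
    """Estimate a 3x4 projection matrix (up to scale) from n >= 6 correspondences.

    X: (n, 3) world points, x: (n, 2) image points.
    """
    Xh = np.column_stack([X, np.ones(len(X))])  # homogeneous world points
    zero = np.zeros(4)
    rows = []
    for Xi, (u, v) in zip(Xh, x):
        rows.append(np.concatenate([zero, Xi, -v * Xi]))  # [0^T, X^T, -y X^T]
        rows.append(np.concatenate([Xi, zero, -u * Xi]))  # [X^T, 0^T, -x X^T]
    _, _, Vt = np.linalg.svd(np.array(rows))
    return Vt[-1].reshape(3, 4)  # singular vector of the smallest singular value

def project(P, X):
    """Project (n, 3) world points with the 3x4 camera P to (n, 2) pixel coordinates."""
    xh = np.column_stack([X, np.ones(len(X))]) @ P.T
    return xh[:, :2] / xh[:, 2:3]

# Synthetic check with a made-up camera and random, non-coplanar points
P_true = np.array([[800.0, 0.0, 320.0, 10.0],
                   [0.0, 800.0, 240.0, 20.0],
                   [0.0, 0.0, 1.0, 5.0]])
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(10, 3))
x = project(P_true, X)
P_est = dlt_camera(X, x)
```

`P_est` equals `P_true` only up to an overall scale and sign, so a comparison is best done on the reprojected points rather than on the matrix entries. With real, noisy measurements one would typically normalize the coordinates first, as in the normalized eight-point algorithm.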

Page 8: Computer Vision – Lecture 17

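Recap: Triangulation – Linear Algebraic Approach

$$\lambda_1 \mathbf{x}_1 = P_1 \mathbf{X}, \qquad \lambda_2 \mathbf{x}_2 = P_2 \mathbf{X}$$

$$\mathbf{x}_1 \times P_1 \mathbf{X} = \mathbf{0}, \qquad \mathbf{x}_2 \times P_2 \mathbf{X} = \mathbf{0}$$

$$[\mathbf{x}_{1\times}]\, P_1 \mathbf{X} = \mathbf{0}, \qquad [\mathbf{x}_{2\times}]\, P_2 \mathbf{X} = \mathbf{0}$$

• Two independent equations each in terms of three unknown entries of X.
• Stack equations and solve with SVD.
• This approach nicely generalizes to multiple cameras.

[Figure: camera centers O1 and O2, image points x1 and x2, viewing rays R1 and R2, and the unknown 3D point X]

Slide credit: Svetlana Lazebnik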
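A minimal sketch of the stacked system, assuming NumPy and noise-free image points; `skew`, `triangulate`, and the two example cameras are illustrative names and values, not taken from the slides.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v_x], so that skew(v) @ w equals np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def triangulate(xs, Ps):
    """xs: homogeneous image points (u, v, 1); Ps: matching 3x4 camera matrices."""
    # Each view contributes the two independent rows of [x_x] P.
    A = np.vstack([(skew(x) @ P)[:2] for x, P in zip(xs, Ps)])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X / X[3]  # scale so the homogeneous coordinate is 1

# Two made-up cameras separated by a unit baseline along the x axis
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0, 1.0])
x1 = P1 @ X_true
x1 = x1 / x1[2]
x2 = P2 @ X_true
x2 = x2 / x2[2]
X_est = triangulate([x1, x2], [P1, P2])
```

Because the rows are simply stacked, passing three or more views to `triangulate` needs no code change, which is the generalization the slide mentions.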

Page 9: Computer Vision – Lecture 17

Recap: Epipolar Geometry – Calibrated Case

[Figure: 3D point X projected to x in the first camera and to x’ in the second; the cameras are related by rotation R and translation t]

• First camera matrix: [I | 0], with X = (u, v, w, 1)^T and x = (u, v, w)^T
• Second camera matrix: [R^T | –R^T t]
• Vector x’ in the second coordinate system has coordinates Rx’ in the first one.
• The vectors x, t, and Rx’ are coplanar.

Slide credit: Svetlana Lazebnik

Page 10: Computer Vision – Lecture 17

Recap: Epipolar Geometry – Calibrated Case

$$\mathbf{x} \cdot [\mathbf{t} \times (R\mathbf{x}')] = 0 \quad\Rightarrow\quad \mathbf{x}^T E\, \mathbf{x}' = 0 \quad \text{with} \quad E = [\mathbf{t}_\times] R$$

Essential Matrix (Longuet-Higgins, 1981)

Slide credit: Svetlana Lazebnik

Page 11: Computer Vision – Lecture 17

Recap: Epipolar Geometry – Uncalibrated Case
• The calibration matrices K and K’ of the two cameras are unknown
• We can write the epipolar constraint in terms of unknown normalized coordinates:

$$\hat{\mathbf{x}}^T E\, \hat{\mathbf{x}}' = 0, \qquad \mathbf{x} = K \hat{\mathbf{x}}, \quad \mathbf{x}' = K' \hat{\mathbf{x}}'$$

Slide credit: Svetlana Lazebnik

Page 12: Computer Vision – Lecture 17

Recap: Epipolar Geometry – Uncalibrated Case

$$\hat{\mathbf{x}}^T E\, \hat{\mathbf{x}}' = 0, \qquad \mathbf{x} = K \hat{\mathbf{x}}, \quad \mathbf{x}' = K' \hat{\mathbf{x}}'$$

$$\mathbf{x}^T F\, \mathbf{x}' = 0 \quad \text{with} \quad F = K^{-T} E\, K'^{-1}$$

Fundamental Matrix (Faugeras and Luong, 1992)

Slide credit: Svetlana Lazebnik

Page 13: Computer Vision – Lecture 17

Recap: The Eight-Point Algorithm

x = (u, v, 1)^T, x’ = (u’, v’, 1)^T

Minimize:

$$\sum_{i=1}^{N} (\mathbf{x}_i^T F\, \mathbf{x}_i')^2$$

under the constraint |F|^2 = 1

• Problem: poor numerical conditioning

Slide credit: Svetlana Lazebnik

Page 14: Computer Vision – Lecture 17

Recap: Normalized Eight-Point Algorithm
1. Center the image data at the origin, and scale it so the mean squared distance between the origin and the data points is 2 pixels.
2. Use the eight-point algorithm to compute F from the normalized points.
3. Enforce the rank-2 constraint using SVD: set d_33 to zero and reconstruct F.

$$F = U D V^T = U \begin{bmatrix} d_{11} & & \\ & d_{22} & \\ & & d_{33} \end{bmatrix} \begin{bmatrix} v_{11} & \cdots & v_{13} \\ \vdots & \ddots & \vdots \\ v_{31} & \cdots & v_{33} \end{bmatrix}^T$$

4. Transform the fundamental matrix back to original units: if T and T’ are the normalizing transformations in the two images, then the fundamental matrix in original coordinates is T^T F T’.

[Hartley, 1995]
Slide credit: Svetlana Lazebnik
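The four steps can be sketched as follows, using the convention x^T F x’ = 0 from the recap slides. This is a NumPy illustration under the assumption of noise-free synthetic correspondences; the function names and the test scene (K, R, t) are invented for the example.

```python
import numpy as np

def normalize(pts):
    """Center the points and scale them so the mean squared distance to the origin is 2."""
    c = pts.mean(axis=0)
    s = np.sqrt(2.0 / np.mean(np.sum((pts - c) ** 2, axis=1)))
    T = np.array([[s, 0.0, -s * c[0]],
                  [0.0, s, -s * c[1]],
                  [0.0, 0.0, 1.0]])
    ph = np.column_stack([pts, np.ones(len(pts))])
    return ph @ T.T, T

def eight_point(x, xp):
    """Estimate F with x^T F x' = 0 from (n, 2) arrays x and xp, n >= 8."""
    xn, T = normalize(x)        # step 1
    xpn, Tp = normalize(xp)
    # Step 2: each correspondence gives one row; kron(x, x') matches F flattened row by row.
    A = np.array([np.kron(a, b) for a, b in zip(xn, xpn)])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Step 3: enforce rank 2 by zeroing the smallest singular value.
    U, d, Vt = np.linalg.svd(F)
    d[2] = 0.0
    F = U @ np.diag(d) @ Vt
    # Step 4: undo the normalization.
    return T.T @ F @ Tp

# Synthetic two-view scene (all values are illustrative)
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(20, 3)) + np.array([0.0, 0.0, 5.0])
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t = np.array([1.0, 0.1, 0.2])
x = (K @ X.T).T
x = x[:, :2] / x[:, 2:3]
xp = (K @ (R @ X.T + t[:, None])).T
xp = xp[:, :2] / xp[:, 2:3]
F = eight_point(x, xp)
```

With real matches the input would contain outliers, so this routine would usually sit inside a robust estimator such as RANSAC.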

Page 15: Computer Vision – Lecture 17

Recap: Comparison of Estimation Algorithms

              | 8-point     | Normalized 8-point | Nonlinear least squares
Av. Dist. 1   | 2.33 pixels | 0.92 pixel         | 0.86 pixel
Av. Dist. 2   | 2.18 pixels | 0.85 pixel         | 0.80 pixel

Slide credit: Svetlana Lazebnik

Page 16: Computer Vision – Lecture 17

Recap: Epipolar Transfer
• Assume the epipolar geometry is known
• Given projections of the same point in two images, how can we compute the projection of that point in a third image?

$$\mathbf{l}_{31} = F_{13}^T \mathbf{x}_1, \qquad \mathbf{l}_{32} = F_{23}^T \mathbf{x}_2$$

[Figure: points x1 and x2 in the first two images, and the epipolar lines l31 and l32 in the third image, which meet at x3]

Slide credit: Svetlana Lazebnik

Page 17: Computer Vision – Lecture 17

Recap: Active Stereo with Structured Light
• Optical triangulation
    - Project a single stripe of laser light
    - Scan it across the surface of the object
    - This is a very precise version of structured light scanning

Digital Michelangelo Project
http://graphics.stanford.edu/projects/mich/

Slide credit: Steve Seitz

Page 18: Computer Vision – Lecture 17

Topics of This Lecture
• Structure from Motion (SfM)
    - Motivation
    - Ambiguity
• Affine SfM
    - Affine cameras
    - Affine factorization
    - Euclidean upgrade
    - Dealing with missing data
• Projective SfM
    - Two-camera case
    - Projective factorization
    - Bundle adjustment
    - Practical considerations
• Applications

Page 19: Computer Vision – Lecture 17

Structure from Motion
• Given: m images of n fixed 3D points

    x_ij = P_i X_j ,  i = 1, … , m,  j = 1, … , n

• Problem: estimate m projection matrices P_i and n 3D points X_j from the mn correspondences x_ij

[Figure: a 3D point X_j observed as x_1j, x_2j, x_3j by the cameras P_1, P_2, P_3]

Slide credit: Svetlana Lazebnik

Page 20: Computer Vision – Lecture 17

What Can We Use This For?
• E.g. movie special effects

[Video]
Video credit: Stefan Hafeneger

Page 21: Computer Vision – Lecture 17

Structure from Motion Ambiguity
• If we scale the entire scene by some factor k and, at the same time, scale the camera matrices by the factor of 1/k, the projections of the scene points in the image remain exactly the same:

$$\mathbf{x} = P\mathbf{X} = \left(\tfrac{1}{k} P\right)(k\mathbf{X})$$

It is impossible to recover the absolute scale of the scene!

Slide credit: Svetlana Lazebnik

Page 22: Computer Vision – Lecture 17

Structure from Motion Ambiguity
• If we scale the entire scene by some factor k and, at the same time, scale the camera matrices by the factor of 1/k, the projections of the scene points in the image remain exactly the same.
• More generally: if we transform the scene using a transformation Q and apply the inverse transformation to the camera matrices, then the images do not change:

$$\mathbf{x} = P\mathbf{X} = (P Q^{-1})(Q\mathbf{X})$$

Slide credit: Svetlana Lazebnik
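The ambiguity is easy to confirm numerically. The sketch below assumes NumPy; the camera P, the point X, and the transformation Q are arbitrary made-up values (Q is strictly diagonally dominant, hence invertible, and its last row makes it a genuinely projective transformation).

```python
import numpy as np

P = np.array([[800.0, 0.0, 320.0, 10.0],
              [0.0, 800.0, 240.0, 20.0],
              [0.0, 0.0, 1.0, 5.0]])
X = np.array([0.3, -0.2, 4.0, 1.0])          # homogeneous scene point

Q = np.array([[2.0, 0.2, 0.0, 0.5],
              [0.0, 2.0, 0.3, 0.0],
              [0.0, 0.0, 2.0, -0.4],
              [0.1, 0.0, 0.2, 2.0]])

x_original = P @ X                            # image of the original scene
x_transformed = (P @ np.linalg.inv(Q)) @ (Q @ X)  # transformed camera and scene
```

Both products give the same homogeneous image point, so image measurements alone cannot distinguish the pair (P, X) from (P Q^-1, Q X).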

Page 23: Computer Vision – Lecture 17

Reconstruction Ambiguity: Similarity

$$\mathbf{x} = P\mathbf{X} = (P Q_S^{-1})(Q_S \mathbf{X})$$

Slide credit: Svetlana Lazebnik
Images from Hartley & Zisserman

Page 24: Computer Vision – Lecture 17

Reconstruction Ambiguity: Affine

$$\mathbf{x} = P\mathbf{X} = (P Q_A^{-1})(Q_A \mathbf{X})$$

Slide credit: Svetlana Lazebnik
Images from Hartley & Zisserman

Page 25: Computer Vision – Lecture 17

Reconstruction Ambiguity: Projective

$$\mathbf{x} = P\mathbf{X} = (P Q_P^{-1})(Q_P \mathbf{X})$$

Slide credit: Svetlana Lazebnik
Images from Hartley & Zisserman

Page 26: Computer Vision – Lecture 17

Projective Ambiguity

Slide credit: Svetlana Lazebnik
Images from Hartley & Zisserman

Page 27: Computer Vision – Lecture 17

From Projective to Affine

Slide credit: Svetlana Lazebnik
Images from Hartley & Zisserman

Page 28: Computer Vision – Lecture 17

From Affine to Similarity

Slide credit: Svetlana Lazebnik
Images from Hartley & Zisserman

Page 29: Computer Vision – Lecture 17

Hierarchy of 3D Transformations

• Projective (15 dof): $\begin{bmatrix} A & \mathbf{t} \\ \mathbf{v}^T & v \end{bmatrix}$. Preserves intersection and tangency.
• Affine (12 dof): $\begin{bmatrix} A & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix}$. Preserves parallelism, volume ratios.
• Similarity (7 dof): $\begin{bmatrix} sR & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix}$. Preserves angles, ratios of length.
• Euclidean (6 dof): $\begin{bmatrix} R & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix}$. Preserves angles, lengths.

• With no constraints on the camera calibration matrix or on the scene, we get a projective reconstruction.
• Need additional information to upgrade the reconstruction to affine, similarity, or Euclidean.

Slide credit: Svetlana Lazebnik

Page 30: Computer Vision – Lecture 17

Topics of This Lecture
• Structure from Motion (SfM)
    - Motivation
    - Ambiguity
• Affine SfM
    - Affine cameras
    - Affine factorization
    - Euclidean upgrade
    - Dealing with missing data
• Projective SfM
    - Two-camera case
    - Projective factorization
    - Bundle adjustment
    - Practical considerations
• Applications

Page 31: Computer Vision – Lecture 17

Structure from Motion
• Let’s start with affine cameras (the math is easier)

[Figure: affine camera with its center at infinity]

Slide credit: Svetlana Lazebnik
Images from Hartley & Zisserman

Page 32: Computer Vision – Lecture 17

Orthographic Projection
• Special case of perspective projection
    - Distance from center of projection to image plane is infinite
    - Projection matrix (maps world coordinates to image coordinates):

$$\begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}$$

Slide credit: Steve Seitz

Page 33: Computer Vision – Lecture 17

Affine Cameras

[Figure: orthographic projection compared with parallel projection]

Slide credit: Svetlana Lazebnik

Page 34: Computer Vision – Lecture 17

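Affine Cameras
• A general affine camera combines the effects of an affine transformation of the 3D space, orthographic projection, and an affine transformation of the image:

$$P = [3 \times 3\ \text{affine}] \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} [4 \times 4\ \text{affine}] = \begin{bmatrix} a_{11} & a_{12} & a_{13} & b_1 \\ a_{21} & a_{22} & a_{23} & b_2 \\ 0 & 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} A & \mathbf{b} \\ \mathbf{0} & 1 \end{bmatrix}$$

• Affine projection is a linear mapping + translation in inhomogeneous coordinates:

$$\mathbf{x} = \begin{pmatrix} x \\ y \end{pmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} \begin{pmatrix} X \\ Y \\ Z \end{pmatrix} + \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} = A\mathbf{X} + \mathbf{b}$$

    - b is the projection of the world origin.

[Figure: 3D point X projected to x along the image axes a1 and a2]

Slide credit: Svetlana Lazebnik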

Page 35: Computer Vision – Lecture 17

Affine Structure from Motion
• Given: m images of n fixed 3D points:

    x_ij = A_i X_j + b_i ,  i = 1, … , m,  j = 1, … , n

• Problem: use the mn correspondences x_ij to estimate m projection matrices A_i and translation vectors b_i, and n points X_j
• The reconstruction is defined up to an arbitrary affine transformation Q (12 degrees of freedom):

$$\begin{bmatrix} A & \mathbf{b} \\ \mathbf{0} & 1 \end{bmatrix} \rightarrow \begin{bmatrix} A & \mathbf{b} \\ \mathbf{0} & 1 \end{bmatrix} Q^{-1}, \qquad \begin{pmatrix} \mathbf{X} \\ 1 \end{pmatrix} \rightarrow Q \begin{pmatrix} \mathbf{X} \\ 1 \end{pmatrix}$$

• We have 2mn knowns and 8m + 3n unknowns (minus 12 dof for affine ambiguity). Thus, we must have 2mn >= 8m + 3n – 12. For two views, we need four point correspondences.

Slide credit: Svetlana Lazebnik
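As a quick check of the two-view claim, substituting m = 2 into the counting inequality gives:

$$2 \cdot 2 \cdot n \;\ge\; 8 \cdot 2 + 3n - 12 \quad\Longleftrightarrow\quad 4n \;\ge\; 3n + 4 \quad\Longleftrightarrow\quad n \;\ge\; 4$$

so four point correspondences are the minimum for two affine views.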

Page 36: Computer Vision – Lecture 17

Affine Structure from Motion
• Centering: subtract the centroid of the image points

$$\hat{\mathbf{x}}_{ij} = \mathbf{x}_{ij} - \frac{1}{n} \sum_{k=1}^{n} \mathbf{x}_{ik} = A_i \mathbf{X}_j + \mathbf{b}_i - \frac{1}{n} \sum_{k=1}^{n} (A_i \mathbf{X}_k + \mathbf{b}_i) = A_i \left( \mathbf{X}_j - \frac{1}{n} \sum_{k=1}^{n} \mathbf{X}_k \right) = A_i \hat{\mathbf{X}}_j$$

• For simplicity, assume that the origin of the world coordinate system is at the centroid of the 3D points.
• After centering, each normalized point x̂_ij is related to the 3D point X_j by

$$\hat{\mathbf{x}}_{ij} = A_i \mathbf{X}_j$$

Slide credit: Svetlana Lazebnik

Page 37: Computer Vision – Lecture 17

Affine Structure from Motion
• Let’s create a 2m × n data (measurement) matrix:

$$D = \begin{bmatrix} \hat{\mathbf{x}}_{11} & \hat{\mathbf{x}}_{12} & \cdots & \hat{\mathbf{x}}_{1n} \\ \hat{\mathbf{x}}_{21} & \hat{\mathbf{x}}_{22} & \cdots & \hat{\mathbf{x}}_{2n} \\ \vdots & & \ddots & \vdots \\ \hat{\mathbf{x}}_{m1} & \hat{\mathbf{x}}_{m2} & \cdots & \hat{\mathbf{x}}_{mn} \end{bmatrix}$$

    - Rows: cameras (2m)
    - Columns: points (n)

C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method. IJCV, 9(2):137-154, November 1992.

Slide credit: Svetlana Lazebnik

Page 38: Computer Vision – Lecture 17

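Affine Structure from Motion
• Let’s create a 2m × n data (measurement) matrix:

$$D = \begin{bmatrix} \hat{\mathbf{x}}_{11} & \hat{\mathbf{x}}_{12} & \cdots & \hat{\mathbf{x}}_{1n} \\ \hat{\mathbf{x}}_{21} & \hat{\mathbf{x}}_{22} & \cdots & \hat{\mathbf{x}}_{2n} \\ \vdots & & \ddots & \vdots \\ \hat{\mathbf{x}}_{m1} & \hat{\mathbf{x}}_{m2} & \cdots & \hat{\mathbf{x}}_{mn} \end{bmatrix} = \begin{bmatrix} A_1 \\ A_2 \\ \vdots \\ A_m \end{bmatrix} \begin{bmatrix} \mathbf{X}_1 & \mathbf{X}_2 & \cdots & \mathbf{X}_n \end{bmatrix}$$

    - First factor: cameras (2m × 3)
    - Second factor: points (3 × n)

• The measurement matrix D = MS must have rank 3!

C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method. IJCV, 9(2):137-154, November 1992.

Slide credit: Svetlana Lazebnik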

Page 39: Computer Vision – Lecture 17

Factorizing the Measurement Matrix

Slide credit: Martial Hebert

Page 40: Computer Vision – Lecture 17

Factorizing the Measurement Matrix
• Singular value decomposition of D:

Slide credit: Martial Hebert

Page 41: Computer Vision – Lecture 17

Factorizing the Measurement Matrix
• Singular value decomposition of D:

Slide credit: Martial Hebert

Page 42: Computer Vision – Lecture 17

Factorizing the Measurement Matrix
• Obtaining a factorization from SVD:

Slide credit: Martial Hebert

Page 43: Computer Vision – Lecture 17

Factorizing the Measurement Matrix
• Obtaining a factorization from SVD:

This decomposition minimizes |D - MS|^2

Slide credit: Martial Hebert

Page 44: Computer Vision – Lecture 17

Affine Ambiguity

• The decomposition is not unique. We get the same D by using any invertible 3×3 matrix C and applying the transformations M → MC, S → C^-1 S.
• That is because we have only an affine transformation and we have not enforced any Euclidean constraints (like forcing the image axes to be perpendicular, for example). We need a Euclidean upgrade.

Slide credit: Martial Hebert

Page 45: Computer Vision – Lecture 17

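Estimating the Euclidean Upgrade
• Orthographic assumption: image axes are perpendicular and scale is 1.

    a_1 · a_2 = 0
    |a_1|^2 = |a_2|^2 = 1

[Figure: 3D point X projected to x along the image axes a1 and a2]

• This can be converted into a system of 3m equations:

$$\left. \begin{aligned} \hat{\mathbf{a}}_{i1} \cdot \hat{\mathbf{a}}_{i2} = 0 \;&\Leftrightarrow\; \mathbf{a}_{i1}^T C C^T \mathbf{a}_{i2} = 0 \\ |\hat{\mathbf{a}}_{i1}| = 1 \;&\Leftrightarrow\; \mathbf{a}_{i1}^T C C^T \mathbf{a}_{i1} = 1 \\ |\hat{\mathbf{a}}_{i2}| = 1 \;&\Leftrightarrow\; \mathbf{a}_{i2}^T C C^T \mathbf{a}_{i2} = 1 \end{aligned} \right\} \quad i = 1, \ldots, m$$

Slide adapted from S. Lazebnik, M. Hebert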

Page 46: Computer Vision – Lecture 17

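Estimating the Euclidean Upgrade
• This can be converted into a system of 3m equations:

$$\left. \begin{aligned} \hat{\mathbf{a}}_{i1} \cdot \hat{\mathbf{a}}_{i2} = 0 \;&\Leftrightarrow\; \mathbf{a}_{i1}^T C C^T \mathbf{a}_{i2} = 0 \\ |\hat{\mathbf{a}}_{i1}| = 1 \;&\Leftrightarrow\; \mathbf{a}_{i1}^T C C^T \mathbf{a}_{i1} = 1 \\ |\hat{\mathbf{a}}_{i2}| = 1 \;&\Leftrightarrow\; \mathbf{a}_{i2}^T C C^T \mathbf{a}_{i2} = 1 \end{aligned} \right\} \quad i = 1, \ldots, m$$

• Let

$$A_i = \begin{bmatrix} \mathbf{a}_{i1}^T \\ \mathbf{a}_{i2}^T \end{bmatrix}, \quad i = 1, \ldots, m, \qquad L = C C^T$$

• Then this translates to 3m equations in L:

$$A_i L A_i^T = I, \quad i = 1, \ldots, m$$

    - Solve for L
    - Recover C from L by Cholesky decomposition: L = CC^T
    - Update M and S: M = MC, S = C^-1 S

Slide adapted from S. Lazebnik, M. Hebert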
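The upgrade can be sketched as a linear least-squares problem in the six distinct entries of the symmetric matrix L, followed by a Cholesky factorization. The code assumes NumPy, noise-free data, and at least three views in general position; `sym_row`, `euclidean_upgrade`, and the synthetic cameras are names and values made up for this illustration.

```python
import numpy as np

def sym_row(a, b):
    """Coefficients of a^T L b in the unknowns (l11, l12, l13, l22, l23, l33) of symmetric L."""
    return np.array([a[0] * b[0],
                     a[0] * b[1] + a[1] * b[0],
                     a[0] * b[2] + a[2] * b[0],
                     a[1] * b[1],
                     a[1] * b[2] + a[2] * b[1],
                     a[2] * b[2]])

def euclidean_upgrade(M):
    """M: (2m, 3) affine motion matrix. Returns C such that the rows of M @ C are orthonormal per view."""
    rows, rhs = [], []
    for i in range(M.shape[0] // 2):
        a1, a2 = M[2 * i], M[2 * i + 1]
        rows += [sym_row(a1, a1), sym_row(a2, a2), sym_row(a1, a2)]
        rhs += [1.0, 1.0, 0.0]
    l = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)[0]
    L = np.array([[l[0], l[1], l[2]],
                  [l[1], l[3], l[4]],
                  [l[2], l[4], l[5]]])
    # Cholesky gives lower-triangular C with L = C C^T; it raises LinAlgError
    # if L is not positive definite, which can happen with noisy data.
    return np.linalg.cholesky(L)

# Synthetic test: four orthographic cameras, distorted by a made-up affine matrix C0
rng = np.random.default_rng(0)
M_true = np.vstack([np.linalg.qr(rng.normal(size=(3, 3)))[0][:2] for _ in range(4)])
C0 = np.array([[2.0, 0.3, 0.0],
               [0.0, 1.0, 0.5],
               [0.1, 0.0, 1.5]])
M_affine = M_true @ np.linalg.inv(C0)
C = euclidean_upgrade(M_affine)
M_upgraded = M_affine @ C
```

The recovered C agrees with C0 only up to a rotation, which is the remaining freedom in choosing the world coordinate frame.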

Page 47: Computer Vision – Lecture 17

Algorithm Summary
• Given: m images and n features x_ij
• For each image i, center the feature coordinates.
• Construct a 2m × n measurement matrix D:
    - Column j contains the projection of point j in all views
    - Each row contains one coordinate of the projections of all the n points in one image
• Factorize D:
    - Compute SVD: D = U W V^T
    - Create U_3 by taking the first 3 columns of U
    - Create V_3 by taking the first 3 columns of V
    - Create W_3 by taking the upper left 3 × 3 block of W
• Create the motion and shape matrices:
    - M = U_3 W_3^½ and S = W_3^½ V_3^T (or M = U_3 and S = W_3 V_3^T)
• Eliminate affine ambiguity

Slide credit: Martial Hebert
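The summary up to the affine reconstruction (without the final Euclidean upgrade) can be written as a short NumPy sketch. It assumes every point is visible in every view; the array layout `(m, n, 2)` and the function name `affine_factorization` are choices made for this example.

```python
import numpy as np

def affine_factorization(x):
    """x: (m, n, 2) image points of n features in m views.

    Returns motion M (2m, 3), shape S (3, n), and the centered measurement matrix D (2m, n).
    """
    m, n, _ = x.shape
    xc = x - x.mean(axis=1, keepdims=True)           # center the features in each image
    D = xc.transpose(0, 2, 1).reshape(2 * m, n)      # rows: x- and y-coordinates per image
    U, w, Vt = np.linalg.svd(D, full_matrices=False)
    W3_sqrt = np.diag(np.sqrt(w[:3]))
    M = U[:, :3] @ W3_sqrt                           # M = U_3 W_3^(1/2)
    S = W3_sqrt @ Vt[:3]                             # S = W_3^(1/2) V_3^T
    return M, S, D

# Synthetic data from made-up affine cameras x_ij = A_i X_j + b_i
rng = np.random.default_rng(0)
m, n = 4, 10
A = rng.normal(size=(m, 2, 3))
b = rng.normal(size=(m, 2))
X = rng.normal(size=(n, 3))
x = np.einsum('ikl,jl->ijk', A, X) + b[:, None, :]
M, S, D = affine_factorization(x)
```

With noise-free input D has rank 3 and M S reproduces it exactly; with noisy tracks the product is the best rank-3 approximation in the Frobenius sense.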

Page 48: Computer Vision – Lecture 17

Reconstruction Results

C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method. IJCV, 9(2):137-154, November 1992.

Slide credit: Svetlana Lazebnik
Image source: Tomasi & Kanade

Page 49: Computer Vision – Lecture 17

Dealing with Missing Data
• So far, we have assumed that all points are visible in all views
• In reality, the measurement matrix typically looks something like this:

[Figure: sparsely filled measurement matrix, cameras (rows) against points (columns)]

Slide credit: Svetlana Lazebnik

Page 50: Computer Vision – Lecture 17

Dealing with Missing Data
• Possible solution: decompose matrix into dense sub-blocks, factorize each sub-block, and fuse the results
    - Finding dense maximal sub-blocks of the matrix is NP-complete (equivalent to finding maximal cliques in a graph)
• Incremental bilinear refinement
    (1) Perform factorization on a dense sub-block

F. Rothganger, S. Lazebnik, C. Schmid, and J. Ponce. Segmenting, Modeling, and Matching Video Clips Containing Multiple Moving Objects. PAMI 2007.

Slide credit: Svetlana Lazebnik

Page 51: Computer Vision – Lecture 17

Dealing with Missing Data
• Possible solution: decompose matrix into dense sub-blocks, factorize each sub-block, and fuse the results
    - Finding dense maximal sub-blocks of the matrix is NP-complete (equivalent to finding maximal cliques in a graph)
• Incremental bilinear refinement
    (1) Perform factorization on a dense sub-block
    (2) Solve for a new 3D point visible by at least two known cameras (linear least squares)

F. Rothganger, S. Lazebnik, C. Schmid, and J. Ponce. Segmenting, Modeling, and Matching Video Clips Containing Multiple Moving Objects. PAMI 2007.

Slide credit: Svetlana Lazebnik

Page 52: Computer Vision – Lecture 17

Dealing with Missing Data
• Possible solution: decompose matrix into dense sub-blocks, factorize each sub-block, and fuse the results
    - Finding dense maximal sub-blocks of the matrix is NP-complete (equivalent to finding maximal cliques in a graph)
• Incremental bilinear refinement
    (1) Perform factorization on a dense sub-block
    (2) Solve for a new 3D point visible by at least two known cameras (linear least squares)
    (3) Solve for a new camera that sees at least three known 3D points (linear least squares)

F. Rothganger, S. Lazebnik, C. Schmid, and J. Ponce. Segmenting, Modeling, and Matching Video Clips Containing Multiple Moving Objects. PAMI 2007.

Slide credit: Svetlana Lazebnik

Page 53: Computer Vision – Lecture 17

Comments: Affine SfM
• Affine SfM was historically developed first.
• It is valid under the assumption of affine cameras.
    - Which does not hold for real physical cameras…
    - …but which is still tolerable if the scene points are far away from the camera.
• For good results with real cameras, we typically need projective SfM.
    - Harder problem, more ambiguity
    - Math is a bit more involved…
      (Here, only basic ideas. If you want to implement it, please look at the H&Z book for details.)

Page 54: Computer Vision – Lecture 17

Topics of This Lecture
• Structure from Motion (SfM)
    - Motivation
    - Ambiguity
• Affine SfM
    - Affine cameras
    - Affine factorization
    - Euclidean upgrade
    - Dealing with missing data
• Projective SfM
    - Two-camera case
    - Projective factorization
    - Bundle adjustment
    - Practical considerations
• Applications

Page 55: Computer Vision – Lecture 17

Projective Structure from Motion
• Given: m images of n fixed 3D points

    x_ij = P_i X_j ,  i = 1, … , m,  j = 1, … , n

• Problem: estimate m projection matrices P_i and n 3D points X_j from the mn correspondences x_ij

[Figure: a 3D point X_j observed as x_1j, x_2j, x_3j by the cameras P_1, P_2, P_3]

Slide credit: Svetlana Lazebnik

Page 56: Computer Vision – Lecture 17

Projective Structure from Motion
• Given: m images of n fixed 3D points

    z_ij x_ij = P_i X_j ,  i = 1, … , m,  j = 1, … , n

• Problem: estimate m projection matrices P_i and n 3D points X_j from the mn correspondences x_ij
• With no calibration info, cameras and points can only be recovered up to a 4×4 projective transformation Q:

    X → QX,  P → PQ^-1

• We can solve for structure and motion when 2mn >= 11m + 3n – 15
• For two cameras, at least 7 points are needed.

Slide credit: Svetlana Lazebnik
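The two-camera figure follows directly from the counting inequality with m = 2:

$$2 \cdot 2 \cdot n \;\ge\; 11 \cdot 2 + 3n - 15 \quad\Longleftrightarrow\quad 4n \;\ge\; 3n + 7 \quad\Longleftrightarrow\quad n \;\ge\; 7$$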

Page 57: Computer Vision – Lecture 17

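Projective SfM: Two-Camera Case
• Assume fundamental matrix F between the two views
    - First camera matrix: [I | 0] Q^-1
    - Second camera matrix: [A | b] Q^-1
• Let $\tilde{\mathbf{X}} = Q\mathbf{X}$, then

$$z\,\mathbf{x} = [I \mid \mathbf{0}]\, \tilde{\mathbf{X}}, \qquad z'\mathbf{x}' = [A \mid \mathbf{b}]\, \tilde{\mathbf{X}}$$

• And

$$z'\mathbf{x}' = A\,[I \mid \mathbf{0}]\, \tilde{\mathbf{X}} + \mathbf{b} = z A\mathbf{x} + \mathbf{b}$$

$$z'\mathbf{x}' \times \mathbf{b} = z A\mathbf{x} \times \mathbf{b}$$

$$(z'\mathbf{x}' \times \mathbf{b}) \cdot \mathbf{x}' = (z A\mathbf{x} \times \mathbf{b}) \cdot \mathbf{x}'$$

$$0 = (z A\mathbf{x} \times \mathbf{b}) \cdot \mathbf{x}'$$

• So we have

$$\mathbf{x}'^T [\mathbf{b}_\times] A\, \mathbf{x} = 0, \qquad F = [\mathbf{b}_\times] A$$

    - b: epipole (F^T b = 0), A = –[b×]F

F&P sec. 13.3.1
Slide adapted from Svetlana Lazebnik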
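The last line of the slide translates into a short routine. The sketch assumes NumPy, the convention x’^T F x = 0 used on this slide, and an exactly rank-2 F; `cameras_from_F` and the example matrix `F0` are hypothetical names and values.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v_x], so that skew(v) @ w equals np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def cameras_from_F(F):
    """Return P1 = [I | 0] and P2 = [A | b] with b the epipole (F^T b = 0) and A = -[b_x] F."""
    U, _, _ = np.linalg.svd(F)
    b = U[:, -1]                 # unit left null vector of F
    A = -skew(b) @ F
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = np.hstack([A, b[:, None]])
    return P1, P2

# Made-up rank-2 matrix of the form [t_x] G with an invertible G
G = np.array([[1.0, 0.1, 0.0],
              [0.0, 1.0, 0.2],
              [0.3, 0.0, 1.0]])
F0 = skew(np.array([1.0, 2.0, 3.0])) @ G
P1, P2 = cameras_from_F(F0)
A, b = P2[:, :3], P2[:, 3]
```

Because b has unit length here, [b×] A reproduces F0 itself; for a b of other length the result would differ from F0 by a scale factor, which is irrelevant for a fundamental matrix. The camera pair is one representative of the projective family [I | 0] Q^-1, [A | b] Q^-1.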

Page 58: Computer Vision – Lecture 17

Projective SfM: Two-Camera Case
• This means that if we can compute the fundamental matrix between two cameras, we can directly estimate the two projection matrices from F.
• Once we have the projection matrices, we can compute the 3D position of any point X by triangulation.
• How can we obtain both kinds of information at the same time?

Page 59: Computer Vision – Lecture 17

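Projective Factorization

$$D = \begin{bmatrix} z_{11}\mathbf{x}_{11} & z_{12}\mathbf{x}_{12} & \cdots & z_{1n}\mathbf{x}_{1n} \\ z_{21}\mathbf{x}_{21} & z_{22}\mathbf{x}_{22} & \cdots & z_{2n}\mathbf{x}_{2n} \\ \vdots & & \ddots & \vdots \\ z_{m1}\mathbf{x}_{m1} & z_{m2}\mathbf{x}_{m2} & \cdots & z_{mn}\mathbf{x}_{mn} \end{bmatrix} = \begin{bmatrix} P_1 \\ P_2 \\ \vdots \\ P_m \end{bmatrix} \begin{bmatrix} \mathbf{X}_1 & \mathbf{X}_2 & \cdots & \mathbf{X}_n \end{bmatrix}$$

    - First factor: cameras (3m × 4)
    - Second factor: points (4 × n)
    - D = MS has rank 4

• If we knew the depths z, we could factorize D to estimate M and S.
• If we knew M and S, we could solve for z.
• Solution: iterative approach (alternate between above two steps).

Slide credit: Svetlana Lazebnik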

Page 60: Computer Vision – Lecture 17

Sequential Structure from Motion
• Initialize motion from two images using fundamental matrix
• Initialize structure
• For each additional view:
    - Determine projection matrix of new camera using all the known 3D points that are visible in its image – calibration

[Figure: measurement matrix, cameras (rows) against points (columns)]

Slide credit: Svetlana Lazebnik

Page 61: Computer Vision – Lecture 17

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Com

pute

r Vi

sion

WS

08/0

9Sequential Structure from Motion• Initialize motion from two images

using fundamental matrix• Initialize structure• For each additional view:

Determine projection matrixof new camera using all the known 3D points that are visible in its image – calibration

Refine and extend structure:compute new 3D points, re-optimize existing points that are also seen by this camera – triangulation

61B. Leibe

Cam

eras

Points

Slide credit: Svetlana Lazebnik

Page 62: Computer Vision – Lecture 17

Sequential Structure from Motion
• Initialize motion from two images using fundamental matrix
• Initialize structure
• For each additional view:
    - Determine projection matrix of new camera using all the known 3D points that are visible in its image – calibration
    - Refine and extend structure: compute new 3D points, re-optimize existing points that are also seen by this camera – triangulation
• Refine structure and motion: bundle adjustment

[Figure: measurement matrix, cameras (rows) against points (columns)]

Slide credit: Svetlana Lazebnik

Page 63: Computer Vision – Lecture 17

Bundle Adjustment
• Non-linear method for refining structure and motion
• Minimizing mean-square reprojection error

$$E(P, \mathbf{X}) = \sum_{i=1}^{m} \sum_{j=1}^{n} D(\mathbf{x}_{ij}, P_i \mathbf{X}_j)^2$$

[Figure: 3D point X_j, its measurements x_1j, x_2j, x_3j in cameras P_1, P_2, P_3, and the reprojections P_1 X_j, P_2 X_j, P_3 X_j]

Slide credit: Svetlana Lazebnik
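The cost function itself is simple to evaluate; the hard part of bundle adjustment is minimizing it over all cameras and points. The following sketch only computes E, assuming NumPy, full visibility, and D being the Euclidean image distance; the array layout and the example values are assumptions for illustration.

```python
import numpy as np

def reprojection_error(P, X, x):
    """Sum of squared image distances between measurements and reprojections.

    P: (m, 3, 4) cameras, X: (n, 4) homogeneous points, x: (m, n, 2) measurements.
    """
    proj = np.einsum('ikl,jl->ijk', P, X)        # (m, n, 3) homogeneous projections P_i X_j
    proj = proj[..., :2] / proj[..., 2:3]        # divide by depth to get pixel coordinates
    return np.sum((x - proj) ** 2)

# Two made-up cameras and two points
P = np.stack([np.hstack([np.eye(3), np.zeros((3, 1))]),
              np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])])
X = np.array([[0.0, 0.0, 4.0, 1.0],
              [1.0, 1.0, 5.0, 1.0]])
h = np.einsum('ikl,jl->ijk', P, X)
x_exact = h[..., :2] / h[..., 2:3]               # perfect measurements
x_noisy = x_exact.copy()
x_noisy[0, 0] += np.array([3.0, 4.0])            # shift one measurement by 5 pixels

E_exact = reprojection_error(P, X, x_exact)
E_noisy = reprojection_error(P, X, x_noisy)
```

A full bundle adjuster would hand the residuals `x - proj` to a nonlinear least-squares solver and exploit the sparsity of the problem, since each residual depends on only one camera and one point.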

Page 64: Computer Vision – Lecture 17

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Com

pute

r Vi

sion

WS

08/0

9

Bundle Adjustment• Seeks the Maximum Likelihood (ML) solution

assuming the measurement noise is Gaussian.• It involves adjusting the bundle of rays between

each camera center and the set of 3D points.• Bundle adjustment should generally be used as the

final step of any multi-view reconstruction algorithm. Considerably improves the results. Allows assignment of individual covariances to each

measurement.

• However… It needs a good initialization. It can become an extremely large minimization problem.

• Very efficient algorithms available.64B. Leibe

Page 65: Computer Vision – Lecture 17

Projective Ambiguity
• If we don’t know anything about the camera or the scene, the best we can get with this is a reconstruction up to a projective ambiguity Q.
    - This can already be useful.
    - E.g. we can answer questions like “at what point does a line intersect a plane”?
• If we want to convert this to a “true” reconstruction, we need a Euclidean upgrade.
    - Need to put in additional knowledge about the camera (calibration) or about the scene (e.g. from markers).
    - Several methods available (see F&P Chapter 13.5 or H&Z Chapter 19)

Images from Hartley & Zisserman

Page 66: Computer Vision – Lecture 17

Self-Calibration
• Self-calibration (auto-calibration) is the process of determining intrinsic camera parameters directly from uncalibrated images.
• For example, when the images are acquired by a single moving camera, we can use the constraint that the intrinsic parameter matrix remains fixed for all the images.
    - Compute initial projective reconstruction and find 3D projective transformation matrix Q such that all camera matrices are in the form P_i = K [R_i | t_i].
• Can use constraints on the form of the calibration matrix: square pixels, zero skew, fixed focal length, etc.

Slide credit: Svetlana Lazebnik

Page 67: Computer Vision – Lecture 17


Practical Considerations (1)
1. Role of the baseline
  - Small baseline: large depth error
  - Large baseline: difficult search problem
• Solution
  - Track features between frames until baseline is sufficient.

[Figure: Large Baseline vs. Small Baseline]

Slide adapted from Steve Seitz
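The effect of the baseline on depth accuracy can be quantified under a simplifying assumption that is not stated on the slide: a rectified stereo pair, where depth is Z = f B / d for focal length f (in pixels), baseline B, and disparity d. A first-order error analysis then gives a depth error of roughly Z² / (f B) per pixel of disparity error. The function name and the numbers below are hypothetical.

```python
def depth_error(depth, baseline, focal_px, disparity_error_px=1.0):
    """First-order depth uncertainty for a rectified stereo pair.

    With Z = f * B / d, a disparity error dd gives dZ ~= Z**2 / (f * B) * dd.
    """
    return depth ** 2 / (focal_px * baseline) * disparity_error_px

# Hypothetical numbers: 1000 px focal length, point at 10 m, 1 px matching error.
small_baseline = depth_error(depth=10.0, baseline=0.1, focal_px=1000.0)
large_baseline = depth_error(depth=10.0, baseline=1.0, focal_px=1000.0)
```

With these values the depth uncertainty drops by a factor of ten when the baseline grows from 0.1 m to 1 m, which is the motivation for tracking features until the baseline is sufficient.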

Page 68: Computer Vision – Lecture 17


Practical Considerations (2)
2. There will still be many outliers
  - Incorrect feature matches
  - Moving objects
  - Apply RANSAC to get robust estimates based on the inlier points.
3. Estimation quality depends on the point configuration
  - Points that are close together in the image produce less stable solutions.
  - Subdivide image into a grid and try to extract about the same number of features per grid cell.
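The RANSAC idea from item 2 can be sketched on a deliberately simple stand-in model: a 2D line instead of a fundamental matrix or camera pose. The loop structure is the same (draw a minimal sample, fit, count inliers, keep the best consensus set, refit on the inliers); the model, the threshold, the iteration count, and the data are assumptions chosen for illustration.

```python
import math
import random

def fit_line(p, q):
    """Line through two points as normalized (a, b, c) with a*x + b*y + c = 0."""
    (x1, y1), (x2, y2) = p, q
    a, b, c = y1 - y2, x2 - x1, x1 * y2 - x2 * y1
    norm = math.hypot(a, b)
    return a / norm, b / norm, c / norm

def ransac_line(points, threshold=0.5, iterations=200, seed=0):
    """Return the largest inlier set found for a line model."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        p, q = rng.sample(points, 2)          # minimal sample: two points
        if p == q:
            continue                          # degenerate sample, skip
        a, b, c = fit_line(p, q)
        inliers = [(x, y) for x, y in points
                   if abs(a * x + b * y + c) <= threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers

def least_squares_line(points):
    """Refit y = m*x + k to the inliers (assumes the line is not vertical)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    m = sxy / sxx
    return m, mean_y - m * mean_x

# Hypothetical data: 20 points on y = 2x + 1 plus 5 gross outliers.
inlier_points = [(float(x), 2.0 * x + 1.0) for x in range(20)]
outlier_points = [(2.0, 30.0), (5.0, -20.0), (10.0, 60.0), (15.0, -10.0), (18.0, 80.0)]
data = inlier_points + outlier_points

consensus = ransac_line(data)
slope, intercept = least_squares_line(consensus)
```

In SfM the minimal sample would instead be the smallest set of correspondences that determines the geometric model, and the inlier test would use a geometric error such as the distance to the epipolar line.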

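The grid strategy from item 3 can be sketched as a simple bucketing step applied after feature detection. The function name `bucket_features`, the (x, y, score) tuple format, and the grid parameters are hypothetical; the sketch only assumes that each detected feature comes with a strength score.

```python
def bucket_features(features, width, height, cols, rows, per_cell):
    """Keep at most `per_cell` strongest features in each grid cell.

    features is a list of (x, y, score) tuples with 0 <= x < width, 0 <= y < height.
    """
    cells = {}
    for x, y, score in features:
        col = min(int(x * cols / width), cols - 1)
        row = min(int(y * rows / height), rows - 1)
        cells.setdefault((row, col), []).append((x, y, score))
    kept = []
    for cell_features in cells.values():
        cell_features.sort(key=lambda f: f[2], reverse=True)   # strongest first
        kept.extend(cell_features[:per_cell])
    return kept

# Hypothetical detections: a dense cluster in the top-left cell plus two isolated points.
features = [(10.0 + i, 12.0 + i, 0.5 + 0.01 * i) for i in range(30)]
features += [(500.0, 100.0, 0.2), (300.0, 400.0, 0.1)]

selected = bucket_features(features, width=640, height=480, cols=4, rows=4, per_cell=5)
```

The cluster is cut down to its five strongest features while the weak but well-placed isolated features survive, which spreads the points used for estimation across the image.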

Page 69: Computer Vision – Lecture 17


General Guidelines
• Use calibrated cameras wherever possible.
  - It makes life so much easier, especially for SfM.
• SfM with 2 cameras is far more robust than with a single camera.
  - Triangulate feature points in 3D using stereo.
  - Perform 2D-3D matching to recover the motion.
  - More robust to loss of scale (main problem of 1-camera SfM).
• Any constraint on the setup can be useful.
  - E.g. square pixels, zero skew, fixed focal length in each camera
  - E.g. fixed baseline in stereo SfM setup
  - E.g. constrained camera motion on a ground plane
  - Making best use of those constraints may require adapting the algorithms (some known results are described in H&Z).
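The stereo triangulation step mentioned above can be sketched with the standard linear method: each view contributes two rows to a homogeneous system A X = 0, which is solved with the SVD. The calibration matrix, the baseline, and the test point are hypothetical values; the rig is assumed to be calibrated with a known, fixed baseline.

```python
import numpy as np

def triangulate(P1, x1, P2, x2):
    """Linear triangulation of one point from two views.

    The solution of A X = 0 is the right singular vector of A
    belonging to the smallest singular value.
    """
    A = np.array([x1[0] * P1[2] - P1[0],
                  x1[1] * P1[2] - P1[1],
                  x2[0] * P2[2] - P2[0],
                  x2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    X_h = Vt[-1]
    return X_h[:3] / X_h[3]

def project(P, X):
    """Project 3D point X (shape (3,)) with a 3x4 camera matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Hypothetical calibrated stereo rig: shared K, fixed baseline of 0.3 along x.
K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])
baseline = 0.3
P_left = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P_right = K @ np.hstack([np.eye(3), np.array([[-baseline], [0.0], [0.0]])])

X_true = np.array([0.5, 0.2, 4.0])
X_est = triangulate(P_left, project(P_left, X_true), P_right, project(P_right, X_true))
```

Because the baseline is known in the units of the rig, the triangulated points come out in those units as well, which is what makes the two-camera setup robust to loss of scale.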

Page 70: Computer Vision – Lecture 17


Topics of This Lecture
• Structure from Motion (SfM)
  - Motivation
  - Ambiguity
• Affine SfM
  - Affine cameras
  - Affine factorization
  - Euclidean upgrade
  - Dealing with missing data
• Projective SfM
  - Two-camera case
  - Projective factorization
  - Bundle adjustment
  - Practical considerations
• Applications

Page 71: Computer Vision – Lecture 17


Commercial Software Packages
• boujou (http://www.2d3.com/)
• PFTrack (http://www.thepixelfarm.co.uk/)
• MatchMover (http://www.realviz.com/)
• SynthEyes (http://www.ssontech.com/)
• Icarus (http://aig.cs.man.ac.uk/research/reveal/icarus/)
• Voodoo Camera Tracker (http://www.digilab.uni-hannover.de/)

Page 72: Computer Vision – Lecture 17


boujou demo

(We have a license available, so if you want to try it for interesting projects, contact us.)


Page 73: Computer Vision – Lecture 17


Applications: Matchmoving

• Putting virtual objects into real-world videos

[Video panels: Original sequence, Tracked features, SfM results, Final video]

Videos from Stefan Hafeneger

Page 74: Computer Vision – Lecture 17


Another Example: The Campanile Movie

Video from SIGGRAPH’97 Animation Theatre
http://www.debevec.org/Campanile/#movie

Page 75: Computer Vision – Lecture 17


References and Further Reading
• A (relatively short) treatment of affine and projective SfM and the basic ideas and algorithms can be found in Chapters 12 and 13 of
    D. Forsyth, J. Ponce, Computer Vision – A Modern Approach. Prentice Hall, 2003
• More detailed information (if you really want to implement this) and better explanations can be found in Chapters 10, 18 (factorization) and 19 (self-calibration) of
    R. Hartley, A. Zisserman, Multiple View Geometry in Computer Vision, 2nd Ed., Cambridge Univ. Press, 2004