Stanford CS223B Computer Vision, Winter 2006

Lecture 8 Structure From Motion

Professor Sebastian Thrun
CAs: Dan Maynes-Aminzade, Mitul Saha, Greg Corrado

Slides by: Gary Bradski, Intel Research and Stanford SAIL
Dec 20, 2015

Sebastian Thrun Stanford University CS223B Computer Vision

Structure From Motion

[Figure: a camera observing features]

Recover: structure (feature locations), motion (camera extrinsics)

Structure From Motion (1)

[Tomasi & Kanade 92]

Structure From Motion (2)

[Tomasi & Kanade 92]

Structure From Motion (3)

[Tomasi & Kanade 92]

Structure From Motion (4a): Images

Marc Pollefeys

Structure From Motion (4b)

Marc Pollefeys

Structure From Motion

Problem 1:
– Given n points $p_{ij} = (x_{ij}, y_{ij})$ in m images
– Reconstruct structure: 3-D locations $P_j = (x_j, y_j, z_j)$
– Reconstruct camera positions (extrinsics) $M_i = (A_i, b_i)$

Problem 2:
– Establish correspondence: $c(p_{ij})$

SFM: General Formulation

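Camera i has rotation angles $\alpha_i, \beta_i, \gamma_i$ and position $b_i = (b_{ix}, b_{iy}, b_{iz})$; feature j is at $P_j = (P_{jx}, P_{jy}, P_{jz})$; f is the focal length.

$$R_i = \begin{pmatrix}\cos\alpha_i & -\sin\alpha_i & 0\\ \sin\alpha_i & \cos\alpha_i & 0\\ 0 & 0 & 1\end{pmatrix}\begin{pmatrix}\cos\beta_i & 0 & \sin\beta_i\\ 0 & 1 & 0\\ -\sin\beta_i & 0 & \cos\beta_i\end{pmatrix}\begin{pmatrix}1 & 0 & 0\\ 0 & \cos\gamma_i & -\sin\gamma_i\\ 0 & \sin\gamma_i & \cos\gamma_i\end{pmatrix}$$

$$p_{ij} = \begin{pmatrix}x_{ij}\\ y_{ij}\end{pmatrix} = \frac{f}{\mathbf{r}_{i3}^\top (P_j - b_i)}\begin{pmatrix}\mathbf{r}_{i1}^\top (P_j - b_i)\\ \mathbf{r}_{i2}^\top (P_j - b_i)\end{pmatrix}, \qquad P_j - b_i = \begin{pmatrix}P_{jx} - b_{ix}\\ P_{jy} - b_{iy}\\ P_{jz} - b_{iz}\end{pmatrix}$$

where $\mathbf{r}_{ik}^\top$ is the k-th row of $R_i$.

[Figure: pinhole geometry with center O and focal length f; a point at offset X and depth Z images at x = fX/Z]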
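The general formulation can be sketched as a forward projection in plain Python. This is a minimal sketch under assumed conventions: the sign placement in the three rotation matrices and the order $R_z(\alpha)R_y(\beta)R_x(\gamma)$ are assumptions, and the helper names (`rot_x`, `project`, ...) are illustrative.

```python
import math

# Elementary rotations (sign convention assumed, see lead-in)
def rot_x(g):
    c, s = math.cos(g), math.sin(g)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(b):
    c, s = math.cos(b), math.sin(b)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[r][k] * B[k][col] for k in range(3)) for col in range(3)] for r in range(3)]

def project(alpha, beta, gamma, b, P, f):
    """Perspective image (x_ij, y_ij) of point P_j seen by camera i at position b_i."""
    R = matmul(rot_z(alpha), matmul(rot_y(beta), rot_x(gamma)))
    d = [P[k] - b[k] for k in range(3)]                               # P_j - b_i
    cam = [sum(R[r][k] * d[k] for k in range(3)) for r in range(3)]   # R_i (P_j - b_i)
    return (f * cam[0] / cam[2], f * cam[1] / cam[2])                 # divide by depth, scale by f

print(project(0.0, 0.0, 0.0, (0, 0, 0), (1, 2, 4), 2.0))  # (0.5, 1.0)
```

With zero angles and the camera at the origin this reduces to the pinhole relation x = fX/Z, y = fY/Z.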

SFM: Bundle Adjustment

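$$\min_{\{\alpha_i, \beta_i, \gamma_i, b_i\},\, \{P_j\}} \; \sum_{i,j} \left\| p_{ij} - \frac{f}{\mathbf{r}_{i3}^\top (P_j - b_i)}\begin{pmatrix}\mathbf{r}_{i1}^\top (P_j - b_i)\\ \mathbf{r}_{i2}^\top (P_j - b_i)\end{pmatrix}\right\|^2$$

where $\mathbf{r}_{ik}^\top$ is the k-th row of the camera rotation $R_i$, the product of the rotations by $\alpha_i$, $\beta_i$, $\gamma_i$ about the z, y, and x axes.

[Figure: pinhole camera geometry, x = fX/Z]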
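The objective above is a sum of squared reprojection errors. A minimal pure-Python sketch, assuming (for brevity) that each camera is passed directly as a rotation matrix $R_i$ plus position $b_i$ instead of three angles, and that observations live in a dictionary keyed by (i, j) so unobserved pairs simply contribute nothing; these data layouts are illustrative assumptions.

```python
def project(R, b, P, f):
    """Pinhole image of P for a camera with rotation R (3x3 nested list) at position b."""
    d = [P[k] - b[k] for k in range(3)]
    cam = [sum(R[r][k] * d[k] for k in range(3)) for r in range(3)]
    return f * cam[0] / cam[2], f * cam[1] / cam[2]

def bundle_cost(observations, cameras, points, f):
    """Sum over observed (i, j) of ||p_ij - projection of P_j in camera i||^2.

    observations: dict mapping (i, j) -> (x_ij, y_ij)
    cameras:      list of (R_i, b_i)
    points:       list of P_j
    """
    total = 0.0
    for (i, j), (x, y) in observations.items():
        R, b = cameras[i]
        u, v = project(R, b, points[j], f)
        total += (x - u) ** 2 + (y - v) ** 2
    return total

I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
cams = [(I, (0, 0, 0))]
pts = [(1, 2, 4)]
print(bundle_cost({(0, 0): (0.5, 1.0)}, cams, pts, 2.0))  # 0.0
print(bundle_cost({(0, 0): (0.5, 2.0)}, cams, pts, 2.0))  # 1.0
```

An optimizer would not update the matrices R_i entry by entry; it would parameterize them (for example by the three angles of the formulation) so they remain rotations.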

Bundle Adjustment

SFM = Nonlinear Least Squares problem. Minimize through

– Gradient Descent
– Conjugate Gradient
– Gauss-Newton
– Levenberg-Marquardt (!)

Prone to local minima
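To illustrate the Levenberg-Marquardt idea on something small, here is a minimal NumPy sketch of the generic algorithm applied to a toy curve fit $y = a e^{bt}$, not to bundle adjustment itself. The damping schedule (divide or multiply λ by 10) and the diagonal scaling of the damping term are common textbook choices, assumed here rather than taken from the lecture.

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, x0, lam=1e-3, iters=200, tol=1e-12):
    """Minimize ||residual(x)||^2 with a basic Levenberg-Marquardt loop."""
    x = np.asarray(x0, dtype=float)
    r = residual(x)
    cost = r @ r
    for _ in range(iters):
        J = jacobian(x)
        H = J.T @ J                       # Gauss-Newton approximation of the Hessian
        g = J.T @ r                       # gradient (up to a factor of 2)
        # Damped normal equations: large lam -> short, gradient-descent-like step;
        # small lam -> Gauss-Newton step.
        step = np.linalg.solve(H + lam * np.diag(np.diag(H)), -g)
        r_new = residual(x + step)
        cost_new = r_new @ r_new
        if cost_new < cost:               # accept: trust the quadratic model more
            x, r, cost = x + step, r_new, cost_new
            lam *= 0.1
            if np.linalg.norm(step) < tol:
                break
        else:                             # reject: move toward gradient descent
            lam *= 10.0
    return x

# Toy problem: recover a = 2, b = 0.5 from noiseless samples of y = a * exp(b * t)
t = np.linspace(0.0, 1.0, 10)
y = 2.0 * np.exp(0.5 * t)
res = lambda p: p[0] * np.exp(p[1] * t) - y
jac = lambda p: np.column_stack([np.exp(p[1] * t), p[0] * t * np.exp(p[1] * t)])
print(levenberg_marquardt(res, jac, [1.0, 0.1]))  # approximately [2.0, 0.5]
```

The same loop only finds a local minimum; for bundle adjustment that is why a good starting point matters.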

Count # Constraints vs #Unknowns

– m camera poses, n points
– 2mn point constraints
– 6m + 3n unknowns (6 pose parameters per camera, 3 coordinates per point)

Suggests: need 2mn ≥ 6m + 3n. But: can we really recover all parameters???

How Many Parameters Can’t We Recover?

Options: 0, 3, 6, 7, 8, 10, 12, n, m, nm

Place Your Bet!

We can recover all but…

Count # Constraints vs #Unknowns

– m camera poses, n points
– 2mn point constraints
– 6m + 3n unknowns (6 pose parameters per camera, 3 coordinates per point)

Suggests: need 2mn ≥ 6m + 3n. But: can we really recover all parameters???

– Can’t recover origin and orientation (6 params)
– Can’t recover scale (1 param)

Thus, we need 2mn ≥ 6m + 3n − 7
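The counting argument can be turned into a small calculator. A sketch (the function name and defaults are illustrative): with 6 pose parameters per camera, 3 coordinates per point and 7 unrecoverable parameters, it returns the fewest points n satisfying the inequality for m images, after rearranging it to n(2m − 3) ≥ 6m − 7.

```python
def min_points(m, cam_dof=6, gauge=7):
    """Smallest n with 2*m*n >= cam_dof*m + 3*n - gauge, for m >= 2 images.

    Rearranged: n * (2m - 3) >= cam_dof*m - gauge. This is only the counting
    (necessary) condition, not a guarantee that the problem is well posed.
    """
    if m < 2:
        raise ValueError("need at least two images")
    need = cam_dof * m - gauge
    return max(0, -(-need // (2 * m - 3)))  # integer ceiling division

for m in (2, 3, 10):
    print(m, min_points(m))  # prints: 2 5, then 3 4, then 10 4
```

By this count two images need at least 5 points, and three or more images need at least 4, since (6m − 7)/(2m − 3) = 3 + 2/(2m − 3).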

Are we done?

No, bundle adjustment has many local minima.

The “Trick Of The Day”

– Replace Perspective by Orthographic Geometry
– Replace Euclidean Geometry by Affine Geometry
– Solve SFM linearly (“closed” form, globally optimal)
– Post-Process to make solution Euclidean
– Post-Process to make solution perspective

By Tomasi and Kanade, 1992

Orthographic Camera Model

Limit of Pinhole Model:

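Extrinsic Parameters (rotation $(a_{kl})$, translation b):

$$\begin{pmatrix}p_x\\ p_y\\ p_z\end{pmatrix} = \begin{pmatrix}a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\end{pmatrix}\begin{pmatrix}P_x\\ P_y\\ P_z\end{pmatrix} + \begin{pmatrix}b_x\\ b_y\\ b_z\end{pmatrix}$$

Orthographic Projection (drop the depth row), $p = AP + b$:

$$\begin{pmatrix}p_x\\ p_y\end{pmatrix} = \begin{pmatrix}a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\end{pmatrix}\begin{pmatrix}P_x\\ P_y\\ P_z\end{pmatrix} + \begin{pmatrix}b_x\\ b_y\end{pmatrix}$$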

Orthographic Projection

Limit of Pinhole Model:

Orthographic Projection

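$$\begin{pmatrix}p_x\\ p_y\end{pmatrix} = \begin{pmatrix}a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\end{pmatrix}\begin{pmatrix}P_x\\ P_y\\ P_z\end{pmatrix} + \begin{pmatrix}b_x\\ b_y\end{pmatrix}, \qquad p = AP + b$$

Since the full matrix $\begin{pmatrix}a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\end{pmatrix}$ is a rotation, the rows $a_1 = (a_{11}, a_{12}, a_{13})$ and $a_2 = (a_{21}, a_{22}, a_{23})$ of A satisfy

$$|a_1| = 1, \qquad |a_2| = 1, \qquad a_1 \cdot a_2 = 0$$

For feature j seen by camera i:

$$p_{ij} = A_i P_j + b_i$$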
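A small NumPy sketch of the orthographic camera and its constraints (function names are illustrative): A is taken as the first two rows of a 3×3 orthogonal matrix, and the three constraints are checked together as $AA^\top = I_2$.

```python
import numpy as np

def orthographic_project(A, b, P):
    """p = A P + b for a 2x3 matrix A, a 2-vector b and a 3-vector P."""
    return np.asarray(A, dtype=float) @ np.asarray(P, dtype=float) + np.asarray(b, dtype=float)

def is_orthographic(A, tol=1e-9):
    """|a1| = 1, |a2| = 1 and a1 . a2 = 0, i.e. A A^T = I (2x2)."""
    A = np.asarray(A, dtype=float)
    return np.allclose(A @ A.T, np.eye(2), atol=tol)

# A valid camera: the first two rows of an orthogonal matrix obtained from QR
rng = np.random.default_rng(0)
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))
A = R[:2]
print(is_orthographic(A))        # True
print(is_orthographic(2.0 * A))  # False: a scaled (affine) camera breaks |a1| = 1
print(orthographic_project(np.eye(3)[:2], [1, 2], [3, 4, 5]))  # [4. 6.]
```

Note that the depth coordinate of P has no influence on the image: that is exactly the orthographic limit.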

The Orthographic SFM Problem

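Recover $\{A_i, b_i\}$ and $\{P_j\}$ such that

$$p_{ij} = A_i P_j + b_i \quad \text{for every camera } i \text{ and feature } j$$

subject to (for the rows $a_1, a_2$ of each $A_i$)

$$|a_1| = 1, \qquad |a_2| = 1, \qquad a_1 \cdot a_2 = 0$$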

The Affine SFM Problem

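Recover $\{A_i, b_i\}$ and $\{P_j\}$ such that

$$p_{ij} = A_i P_j + b_i \quad \text{for every camera } i \text{ and feature } j$$

but drop the constraints

$$|a_1| = 1, \qquad |a_2| = 1, \qquad a_1 \cdot a_2 = 0$$

so each $A_i$ may be any 2×3 matrix.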

Count # Constraints vs #Unknowns

$p_{ij} = A_i P_j + b_i$ (feature j, camera i)

– m camera poses, n points
– 2mn point constraints
– 8m + 3n unknowns (6 entries of each $A_i$ plus 2 of each $b_i$, and 3 per point)

Suggests: need 2mn ≥ 8m + 3n. But: can we really recover all parameters???

How Many Parameters Can’t We Recover?

Options: 0, 3, 6, 7, 8, 10, 12, n, m, nm

Place Your Bet!

We can recover all but…

The Answer is (at least): 12

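$$p_{ij} = A_i P_j + b_i \qquad \text{and} \qquad p'_{ij} = A'_i P'_j + b'_i$$

give identical images when

$$P'_j = C^{-1} P_j + C^{-1} d, \qquad A'_i = A_i C, \qquad b'_i = b_i - A_i d$$

for any vector d and any non-singular 3×3 matrix C (3 + 9 = 12 parameters).

Proof:

$$p'_{ij} = (A_i C)(C^{-1} P_j + C^{-1} d) + (b_i - A_i d) = A_i P_j + A_i d + b_i - A_i d = A_i P_j + b_i = p_{ij}$$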
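The 12-parameter ambiguity is easy to confirm numerically. A NumPy sketch with random, purely hypothetical cameras, points, C and d: the transformed quantities reproduce the original images exactly (up to floating-point error).

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 5
A = rng.normal(size=(m, 2, 3))   # affine cameras A_i
b = rng.normal(size=(m, 2))      # translations b_i
P = rng.normal(size=(n, 3))      # points P_j
C = rng.normal(size=(3, 3))      # a random 3x3 matrix is non-singular with probability 1
d = rng.normal(size=3)

def images(A, b, P):
    """p[i, j] = A_i P_j + b_i, shape (m, n, 2)."""
    return np.einsum('ikl,jl->ijk', A, P) + b[:, None, :]

C_inv = np.linalg.inv(C)
A2 = A @ C                       # A'_i = A_i C
P2 = (P + d) @ C_inv.T           # P'_j = C^{-1} P_j + C^{-1} d (row-vector form)
b2 = b - A @ d                   # b'_i = b_i - A_i d
print(np.allclose(images(A, b, P), images(A2, b2, P2)))  # True
```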

Points for Solving Affine SFM Problem

– m camera poses, n points

Need to have: 2mn ≥ 8m + 3n − 12
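Solving the inequality for n makes the requirement concrete. For $m \ge 2$ images, $2m - 3 > 0$, so

$$2mn \ge 8m + 3n - 12 \iff n(2m - 3) \ge 8m - 12 = 4(2m - 3) \iff n \ge 4$$

By this count, four points suffice for any number $m \ge 2$ of images (a necessary count, not a guarantee).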

Affine SFM

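$$p_{ij} = A_i P_j + b_i$$

Fix the coordinate system by making point 0 the origin ($P_0 = 0$). Then $b_i = p_{i0}$, and after subtracting $p_{i0}$ from every measurement in image i:

$$p_{ij} = A_i P_j$$

Stack cameras and measurements:

$$q_j = \begin{pmatrix}p_{1j}\\ \vdots\\ p_{mj}\end{pmatrix}, \qquad A = \begin{pmatrix}A_1\\ \vdots\\ A_m\end{pmatrix}, \qquad D = \begin{pmatrix}P_1 & \cdots & P_n\end{pmatrix}, \qquad Q = \begin{pmatrix}p_{11} & \cdots & p_{1n}\\ \vdots & & \vdots\\ p_{m1} & \cdots & p_{mn}\end{pmatrix}$$

– m cameras: $q_j = A P_j$
– n points: $Q = AD$

Rank Theorem: Q has rank 3.

Proof: A has size 2m×3 and D has size 3×n, so rank(Q) ≤ 3, with equality for non-degenerate cameras and points.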
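A NumPy sketch that builds Q from synthetic affine data and checks the rank theorem numerically. The (m, n, 2) array layout and the function name are assumptions for illustration; rows follow the $x_{1\cdot}, y_{1\cdot}, x_{2\cdot}, y_{2\cdot}, \dots$ order.

```python
import numpy as np

def measurement_matrix(p):
    """Build the 2m x n matrix Q from image points p[i, j] = (x_ij, y_ij).

    Point 0 is taken as the origin: p_i0 is subtracted in every image, which
    removes b_i. Rows are ordered x_1., y_1., x_2., y_2., ...
    """
    p = np.asarray(p, dtype=float)
    m, n, _ = p.shape
    centered = p - p[:, :1, :]                   # p_ij - p_i0
    return centered.transpose(0, 2, 1).reshape(2 * m, n)

# Synthetic affine data: p_ij = A_i P_j + b_i
rng = np.random.default_rng(0)
m, n = 4, 7
A = rng.normal(size=(m, 2, 3))
b = rng.normal(size=(m, 2))
P = rng.normal(size=(n, 3))
p = np.einsum('ikl,jl->ijk', A, P) + b[:, None, :]

Q = measurement_matrix(p)
print(Q.shape, np.linalg.matrix_rank(Q))  # (8, 7) 3
```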

The Rank Theorem

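$$Q = \begin{pmatrix}x_{11} & \cdots & x_{1n}\\ y_{11} & \cdots & y_{1n}\\ \vdots & & \vdots\\ x_{m1} & \cdots & x_{mn}\\ y_{m1} & \cdots & y_{mn}\end{pmatrix} \quad \text{has rank } 3$$

(n elements per row, 2m elements per column)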

Singular Value Decomposition

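$$\begin{pmatrix}x_{11} & \cdots & x_{1n}\\ y_{11} & \cdots & y_{1n}\\ \vdots & & \vdots\\ x_{m1} & \cdots & x_{mn}\\ y_{m1} & \cdots & y_{mn}\end{pmatrix} = U\,W\,V^T$$

with U of size 2m×3, W the 3×3 diagonal matrix of singular values, and $V^T$ of size 3×n.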

Affine Solution to Orthographic SFM

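– $W V^T$: affine structure
– $U$: affine camera positions

This also gives the optimal affine reconstruction under noise: truncating the SVD to the three largest singular values yields the best rank-3 approximation of Q in the least-squares sense.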
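The factorization step can be sketched with NumPy's SVD (the function name is illustrative). It returns U and $WV^T$ truncated to rank 3, as on the slide; the result matches the true cameras and structure only up to the 3×3 ambiguity C discussed earlier.

```python
import numpy as np

def affine_factorization(Q):
    """Rank-3 factorization Q ~ U3 @ S3 (affine cameras U3, affine structure S3 = W3 V3^T)."""
    U, w, Vt = np.linalg.svd(Q, full_matrices=False)
    return U[:, :3], np.diag(w[:3]) @ Vt[:3]

rng = np.random.default_rng(0)
A_true = rng.normal(size=(8, 3))    # 4 stacked 2x3 affine cameras
D_true = rng.normal(size=(3, 10))   # 10 points, already expressed relative to point 0
Q = A_true @ D_true
cams, structure = affine_factorization(Q)
print(np.allclose(cams @ structure, Q))  # True
```

With noisy Q the product `cams @ structure` is no longer equal to Q, but it is the closest rank-3 matrix to it.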

Back To Orthographic Projection

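Constraints (for the rows $a_1, a_2$ of each camera matrix):

$$|a_1| = 1, \qquad |a_2| = 1, \qquad a_1 \cdot a_2 = 0$$

Find C and d for which the constraints are met, where

$$p'_{ij} = A'_i P'_j + b'_i \quad \text{with} \quad P'_j = C^{-1} P_j + C^{-1} d, \qquad A'_i = A_i C, \qquad b'_i = b_i - A_i d$$

with d a vector and C a non-singular matrix. This is a search in a 12-dimensional space (instead of 8m + 3n − 12).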
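The slide leaves open how C is found. One common linear approach (an assumption here, not necessarily the lecture's method): d does not appear in the constraints on $A'_i = A_i C$, so take d = 0 and solve for the symmetric matrix $L = CC^\top$, because each constraint is linear in L ($|a^\top C|^2 = a^\top L a$ and $(a_1^\top C)\cdot(a_2^\top C) = a_1^\top L a_2$). A Cholesky factorization of L then gives C, determined only up to an orthogonal factor (the global orientation, which cannot be recovered anyway). Helper names are hypothetical.

```python
import numpy as np

def _coeffs(u, v):
    """Coefficients of u^T L v in the unknowns (L11, L12, L13, L22, L23, L33) of a symmetric L."""
    return [u[0] * v[0], u[0] * v[1] + u[1] * v[0], u[0] * v[2] + u[2] * v[0],
            u[1] * v[1], u[1] * v[2] + u[2] * v[1], u[2] * v[2]]

def metric_upgrade(A_hat):
    """Find C such that the two rows of each camera in A_hat @ C are orthonormal.

    A_hat: (2m x 3) stacked affine cameras, e.g. U from the SVD.
    """
    rows, rhs = [], []
    for i in range(0, A_hat.shape[0], 2):
        a1, a2 = A_hat[i], A_hat[i + 1]
        rows += [_coeffs(a1, a1), _coeffs(a2, a2), _coeffs(a1, a2)]
        rhs += [1.0, 1.0, 0.0]            # |a1|^2 = 1, |a2|^2 = 1, a1 . a2 = 0
    l, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    L = np.array([[l[0], l[1], l[2]],
                  [l[1], l[3], l[4]],
                  [l[2], l[4], l[5]]])
    return np.linalg.cholesky(L)          # raises LinAlgError if L is not positive definite

# Synthetic check: true orthographic cameras distorted by an unknown affine matrix G
rng = np.random.default_rng(0)
m = 5
A_true = np.vstack([np.linalg.qr(rng.normal(size=(3, 3)))[0][:2] for _ in range(m)])
G = rng.normal(size=(3, 3))
A_hat = A_true @ G
C = metric_upgrade(A_hat)
A_metric = A_hat @ C
print(all(np.allclose(A_metric[2*i:2*i+2] @ A_metric[2*i:2*i+2].T, np.eye(2)) for i in range(m)))  # True
```

With noisy measurements the estimated L can fail to be positive definite, in which case the Cholesky step raises an error rather than returning a wrong C.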

Back To Projective Geometry

Orthographic (in the limit)

Projective

Back To Projective Geometry

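Optimize

$$\min_{\{\alpha_i, \beta_i, \gamma_i, b_i\},\, \{P_j\}} \; \sum_{i,j} \left\| p_{ij} - \frac{f}{\mathbf{r}_{i3}^\top (P_j - b_i)}\begin{pmatrix}\mathbf{r}_{i1}^\top (P_j - b_i)\\ \mathbf{r}_{i2}^\top (P_j - b_i)\end{pmatrix}\right\|^2$$

where $\mathbf{r}_{ik}^\top$ is the k-th row of the camera rotation $R_i$, the product of the rotations by $\alpha_i$, $\beta_i$, $\gamma_i$ about the z, y, and x axes.

[Figure: pinhole camera geometry, x = fX/Z]

Use the orthographic solution as the starting point.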

The “Trick Of The Day”

– Replace Perspective by Orthographic Geometry
– Replace Euclidean Geometry by Affine Geometry
– Solve SFM linearly (“closed” form, globally optimal)
– Post-Process to make solution Euclidean
– Post-Process to make solution perspective

By Tomasi and Kanade, 1992

Structure From Motion

Problem 1:
– Given n points $p_{ij} = (x_{ij}, y_{ij})$ in m images
– Reconstruct structure: 3-D locations $P_j = (x_j, y_j, z_j)$
– Reconstruct camera positions (extrinsics) $M_i = (A_i, b_i)$

Problem 2:
– Establish correspondence: $c(p_{ij})$

The Correspondence Problem

[Figure: View 1, View 2, View 3]

Correspondence: Solution 1

Track features (e.g., optical flow)

…but fails when the images are taken from widely different poses

Correspondence: Solution 2

– Start with a random solution A, b, P
– Compute the soft correspondence: p(c | A, b, P)
– Plug the soft correspondence into SFM
– Reiterate

See Dellaert/Seitz/Thorpe/Thrun, Machine Learning Journal, 2003
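The "compute soft correspondence" step can be sketched as follows. This is a deliberately simplified, hypothetical version: it assumes isotropic Gaussian image noise with standard deviation `sigma`, a uniform prior over candidate matches, and `predicted` positions obtained by projecting the current P with the current A, b. It scores each observed feature independently, so it does not enforce the one-to-one matching that the cited paper deals with.

```python
import math

def soft_correspondence(observed, predicted, sigma=1.0):
    """w[k][j] ~ p(c_k = j | A, b, P): probability that observed feature k is model point j."""
    weights = []
    for (x, y) in observed:
        # Gaussian log-likelihood of each candidate match
        logl = [-((x - u) ** 2 + (y - v) ** 2) / (2 * sigma ** 2) for (u, v) in predicted]
        top = max(logl)                              # shift for numerical stability
        e = [math.exp(l - top) for l in logl]
        total = sum(e)
        weights.append([v / total for v in e])       # normalize over candidates
    return weights

w = soft_correspondence([(0.0, 0.0), (10.0, 0.0)], [(0.0, 0.1), (10.0, 0.2)])
print([round(v, 3) for v in w[0]])  # [1.0, 0.0]
```

In the loop above, these weights are what gets plugged back into SFM before reiterating.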

Example

Results: Cube

Animation

Tomasi’s Benchmark Problem

Reconstruction with EM

3-D Structure

Correspondence: Alternative Approach

RANSAC [Fischler/Bolles]

= RANdom SAmple Consensus
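To make the idea concrete, here is a minimal RANSAC sketch on the simplest possible model, a 2-D line y = ax + c, rather than on SFM correspondences: repeatedly fit the model to a random minimal sample (two points), count the points within a threshold, and keep the hypothesis with the largest consensus set. The iteration count, threshold and seed are arbitrary illustrative values.

```python
import random

def ransac_line(points, iters=200, threshold=0.1, seed=0):
    """Fit y = a*x + c to points containing outliers; returns ((a, c), inliers)."""
    rng = random.Random(seed)
    best, best_inliers = None, []
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)   # minimal sample for a line
        if x1 == x2:
            continue                                  # vertical: not of the form y = a x + c
        a = (y2 - y1) / (x2 - x1)
        c = y1 - a * x1
        inliers = [(x, y) for x, y in points if abs(y - (a * x + c)) < threshold]
        if len(inliers) > len(best_inliers):          # the largest consensus set wins
            best, best_inliers = (a, c), inliers
    return best, best_inliers

data = [(x, 2 * x + 1) for x in range(10)] + [(2, 30), (5, -10), (7, 40)]
model, inliers = ransac_line(data)
print(model, len(inliers))  # (2.0, 1.0) 10
```

In SFM the "model" would instead be a camera geometry estimated from a minimal set of tentative feature matches, and the consensus set would be the matches it explains.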

Summary SFM

Problem
– Determine feature locations (= structure)
– Determine camera extrinsics (= motion)

Two Principal Solutions
– Bundle adjustment (nonlinear least squares, local minima)
– SVD (through orthographic approximation, affine geometry)

Correspondence
– (RANSAC)
– Expectation Maximization