3D Computer Vision and Video Computing 3D Vision Topic 4 of Part II Visual Motion CSc I6716 Fall 2011 Cover Image/video credits: Rick Szeliski, MSR Zhigang Zhu, City College of New York [email protected]

Dec 19, 2015

Page 1: 3D Computer Vision and Video Computing

3D Vision

Topic 4 of Part II: Visual Motion

CSc I6716, Fall 2011

Cover Image/video credits: Rick Szeliski, MSR

Zhigang Zhu, City College of New York [email protected]

Page 2: Outline of Motion

- Problems and Applications
  - The importance of visual motion
  - Problem statement
- The Motion Field of Rigid Motion
  - Basics: notations and equations
  - Three important special cases: translation, rotation, and moving plane
  - Motion parallax
- Optical Flow
  - Optical flow equation and the aperture problem
  - Estimating optical flow
  - 3D motion and structure from optical flow
- Feature-based Approach
  - Two-frame algorithm
  - Multi-frame algorithm
  - Structure from motion: factorization method
- Advanced Topics
  - Spatio-temporal image and epipolar plane image
  - Video mosaicing and panorama generation
  - Motion-based segmentation and layered representation

Page 3: The Importance of Visual Motion

- Structure from Motion
  - Apparent motion is a strong visual cue for 3D reconstruction
    - More than a multi-camera stereo system
- Recognition by motion (only)
  - Biological visual systems use visual motion to infer properties of the 3D world with little a priori knowledge of it
    - Blurred image sequence
- Visual Motion = Video! [Go to CVPR 2004-2010 sites for workshops]
  - Video coding and compression: MPEG 1, 2, 4, 7, …
  - Video mosaicing and layered representation for IBR
  - Surveillance (human tracking and traffic monitoring)
  - HCI using human gesture (video camera)
  - Image-based rendering
  - …

Page 4: Blurred Sequence

An up-sampling from images of resolution 15x20 pixels

From: James W. Davis, MIT Media Lab

Recognition by actions: we can recognize an object from its motion even if we cannot distinguish it in any single image.

Page 5: Problem Statement

- Two Subproblems
  - Correspondence: which elements of a frame correspond to which elements in the next frame?
  - Reconstruction: given a number of correspondences, and possibly knowledge of the camera's intrinsic parameters, how do we recover the 3-D motion and structure of the observed world?
- Main Difference between Motion and Stereo
  - Correspondence: the disparities between consecutive frames are much smaller due to dense temporal sampling
  - Reconstruction: the visual motion could be caused by multiple motions (instead of a single 3D rigid transformation)
- The Third Subproblem, and Fourth…
  - Motion Segmentation: which regions of the image plane correspond to different moving objects?
  - Motion Understanding: lip reading, gesture, expression, event, …

Page 6: Approaches

- Two Subproblems
  - Correspondence:
    - Differential methods -> dense measure (optical flow)
    - Matching methods -> sparse measure
  - Reconstruction: more difficult than stereo since
    - Motion (3D transformation between frames) as well as structure needs to be recovered
    - Small baseline causes large errors
- The Third Subproblem
  - Motion Segmentation: a chicken-and-egg problem
    - Which should be solved first, matching or segmentation?
      - Segmentation for matching elements
      - Matching for segmentation

Page 7: The Motion Field of Rigid Objects

- Motion:
  - 3D motion (R, T):
    - camera motion (static scene)
    - or single object motion
    - Only one rigid, relative motion between the camera and the scene (object)
  - Image motion field:
    - 2D vector field of velocities of the image points induced by the relative motion
- Data: image sequence
  - Many frames
    - captured at time t = 0, 1, 2, …
  - Basics: only consider two consecutive frames
    - We consider a reference frame and its consecutive frame
  - Image motion field
    - can be viewed as the disparity map of the two frames captured at two consecutive camera locations (assuming we have a moving camera)

Page 8: The Motion Field of Rigid Objects

- Notations
  - $P = (X, Y, Z)^T$: 3-D point in the camera reference frame
  - $p = (x, y, f)^T$: the projection of the scene point in the pinhole camera
- Relative motion between P and the camera
  - $T = (T_x, T_y, T_z)^T$: translation component of the motion
  - $\omega = (\omega_x, \omega_y, \omega_z)^T$: the angular velocity
- Note:
  - How to connect this with stereo geometry (with R, T)?
  - Image velocity v = ?

$$p = f\,\frac{P}{Z}, \qquad V = -T - \omega \times P$$

[Figure: pinhole camera with origin O, axes X, Y, Z and focal length f; scene point P with velocity V and its image point p with velocity v]

Page 9: The Motion Field of Rigid Objects

- Notations
  - $P = (X, Y, Z)^T$: 3-D point in the camera reference frame
  - $p = (x, y, f)^T$: the projection of the scene point in the pinhole camera
- Relative motion between P and the camera
  - $T = (T_x, T_y, T_z)^T$: translation component of the motion
  - $\omega = (\omega_x, \omega_y, \omega_z)^T$: the angular velocity
- Note:
  - How to connect this with stereo geometry (with R, T)?

$$p = f\,\frac{P}{Z}, \qquad V = -T - \omega \times P$$

$$P' - P \approx V = -T - \begin{pmatrix} 0 & -\omega_z & \omega_y \\ \omega_z & 0 & -\omega_x \\ -\omega_y & \omega_x & 0 \end{pmatrix} P$$

$$P' \approx \begin{pmatrix} 1 & \omega_z & -\omega_y \\ -\omega_z & 1 & \omega_x \\ \omega_y & -\omega_x & 1 \end{pmatrix} P - T$$

[Equation: the full rotation matrix R written out in sines and cosines of the three rotation angles; for small angles it is approximated by the first-order matrix in the previous equation]

Page 10: Basic Equations of Motion Field

- Notes:
  - Take the time derivative of both sides of the projection equation
  - The motion field is the sum of two components
    - Translational part
    - Rotational part
  - Assume known intrinsic parameters

$$p = f\,\frac{P}{Z}, \qquad V = -T - \omega \times P$$

$$v = f\,\frac{Z\,V - V_z\,P}{Z^2}$$

$$\begin{pmatrix} v_x \\ v_y \end{pmatrix} = \frac{1}{Z}\begin{pmatrix} -f & 0 & x \\ 0 & -f & y \end{pmatrix}\begin{pmatrix} T_x \\ T_y \\ T_z \end{pmatrix} + \frac{1}{f}\begin{pmatrix} xy & -(x^2 + f^2) & fy \\ y^2 + f^2 & -xy & -fx \end{pmatrix}\begin{pmatrix} \omega_x \\ \omega_y \\ \omega_z \end{pmatrix}$$

Translation part (first term): depends on depth Z

Rotation part (second term): no depth information
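
The split into a depth-dependent translational term and a depth-independent rotational term can be checked numerically. The following is a minimal sketch, assuming the convention $V = -T - \omega \times P$ and $p = fP/Z$; the function name `motion_field` and its argument layout are illustrative, not part of the lecture.

```python
def motion_field(x, y, Z, T, omega, f=1.0):
    """Image velocity (vx, vy) at image point (x, y) whose scene point has depth Z.

    Assumes V = -T - omega x P and the projection p = f * P / Z.
    """
    Tx, Ty, Tz = T
    wx, wy, wz = omega
    # Translational part: scaled by 1/Z, so it carries depth information.
    trans_x = (Tz * x - Tx * f) / Z
    trans_y = (Tz * y - Ty * f) / Z
    # Rotational part: does not involve Z at all.
    rot_x = (wx * x * y - wy * (x * x + f * f) + wz * f * y) / f
    rot_y = (wx * (y * y + f * f) - wy * x * y - wz * f * x) / f
    return trans_x + rot_x, trans_y + rot_y
```

Calling it with `T = (0, 0, 0)` at two different depths returns the same vector, which is the "no depth information" property of the rotation part.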

Page 11: Motion Field vs. Disparity

- Correspondence and Point Displacements

| Stereo                | Motion                                                                        |
|-----------------------|-------------------------------------------------------------------------------|
| Disparity             | Motion field                                                                  |
| Displacement (dx, dy) | Differential concept: velocity (vx, vy), i.e. time derivative (dx/dt, dy/dt)  |
| No such constraint    | Consecutive frames must be close to guarantee a good discrete approximation   |

Page 12: Special Case 1: Pure Translation

- Pure Translation ($\omega = 0$)

$$\begin{pmatrix} v_x \\ v_y \end{pmatrix} = \frac{1}{Z}\begin{pmatrix} -f & 0 & x \\ 0 & -f & y \end{pmatrix}\begin{pmatrix} T_x \\ T_y \\ T_z \end{pmatrix}$$

- Radial Motion Field ($T_z \neq 0$)
  - Vanishing point $p_0 = (x_0, y_0)^T$: motion direction
  - FOE (focus of expansion): vectors point away from $p_0$ if $T_z > 0$
  - FOC (focus of contraction): vectors point towards $p_0$ if $T_z < 0$
  - The two signs follow from $v = (T_z/Z)(p - p_0)$ with $Z > 0$, under the convention $V = -T - \omega \times P$
  - Depth estimation: depth is inversely proportional to the magnitude of the motion vector v, and proportional to the distance from p to $p_0$

$$\begin{pmatrix} v_x \\ v_y \end{pmatrix} = \frac{T_z}{Z}\begin{pmatrix} x - x_0 \\ y - y_0 \end{pmatrix}, \qquad \begin{pmatrix} x_0 \\ y_0 \end{pmatrix} = \frac{f}{T_z}\begin{pmatrix} T_x \\ T_y \end{pmatrix}$$

$$Z = \frac{|T_z|\,\sqrt{(x - x_0)^2 + (y - y_0)^2}}{\|v\|}$$

- Parallel Motion Field ($T_z = 0$)
  - Depth estimation: depth is inversely proportional to the magnitude of the motion vector v

$$\begin{pmatrix} v_x \\ v_y \end{pmatrix} = -\frac{f}{Z}\begin{pmatrix} T_x \\ T_y \end{pmatrix}, \qquad Z = \frac{f\,\sqrt{T_x^2 + T_y^2}}{\|v\|}$$
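
As a hedged illustration of the two depth formulas for pure translation, the sketch below recovers Z from one flow vector when T is known. The function name `depth_from_translation` is hypothetical, and T is assumed to be known exactly, which in practice it is only up to scale.

```python
import math

def depth_from_translation(x, y, vx, vy, T, f=1.0):
    """Depth Z at image point (x, y) from its flow (vx, vy) under pure translation T."""
    Tx, Ty, Tz = T
    speed = math.hypot(vx, vy)
    if speed == 0.0:
        raise ValueError("zero flow: depth is undetermined at this point")
    if Tz != 0.0:
        # Radial field: Z = |Tz| * distance(p, p0) / |v|, with p0 = (f/Tz) * (Tx, Ty).
        x0, y0 = f * Tx / Tz, f * Ty / Tz
        return abs(Tz) * math.hypot(x - x0, y - y0) / speed
    # Parallel field: Z = f * sqrt(Tx^2 + Ty^2) / |v|.
    return f * math.hypot(Tx, Ty) / speed
```

For example, with T = (0, 0, 1), f = 1 and a point at (2, 1) with depth 2, the flow is (1, 0.5) and the function returns a depth of 2 up to rounding.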

Page 13: Special Case 2: Pure Rotation

- Pure Rotation ($T = 0$)
  - Does not carry 3D information
- Motion Field (approximation)
  - Small motion
  - A quadratic polynomial in image coordinates $(x, y, f)^T$

$$\begin{pmatrix} v_x \\ v_y \end{pmatrix} = \frac{1}{f}\begin{pmatrix} xy & -(x^2 + f^2) & fy \\ y^2 + f^2 & -xy & -fx \end{pmatrix}\begin{pmatrix} \omega_x \\ \omega_y \\ \omega_z \end{pmatrix}$$

- Image Transformation between two frames (accurate)
  - Motion can be large
  - Homography (3x3 matrix) for all points

$$P' = R\,P, \qquad p = f\,\frac{P}{Z}, \qquad p' = f\,\frac{P'}{Z'} \quad\Rightarrow\quad p' \cong R\,p \ \text{(equal up to the scale factor } Z/Z'\text{)}$$

- Image mosaicing from a rotating camera
  - 360 degree panorama
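
The relation $p' \cong R\,p$ says that a rotating camera maps every image point with the same 3x3 matrix, whatever its depth. A minimal sketch, assuming image points are given as (x, y) with focal length f; the helper names `rotate_image_point` and `rotation_about_y` are illustrative only.

```python
import math

def rotate_image_point(p, R, f=1.0):
    """Map image point p = (x, y) to its new position under a pure rotation P' = R P."""
    x, y = p
    # Apply R to the homogeneous image vector (x, y, f).
    q = [R[i][0] * x + R[i][1] * y + R[i][2] * f for i in range(3)]
    if q[2] <= 0.0:
        raise ValueError("point maps behind the camera")
    # Rescale so that the third coordinate is f again.
    return f * q[0] / q[2], f * q[1] / q[2]

def rotation_about_y(angle):
    """Rotation matrix about the Y axis (one possible sign convention)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
```

Because depth never enters the mapping, images taken by a purely rotating camera can be warped onto a common mosaic without any 3D reconstruction.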

Page 14: Special Case 3: Moving Plane

- Planes are common in the man-made world

$$n^T P = d \quad\Rightarrow\quad \frac{1}{Z} = \frac{n_x x + n_y y + n_z f}{f\,d}$$

- Motion Field (approximation)
  - Given small motion
  - A quadratic polynomial in image coordinates

$$\begin{pmatrix} v_x \\ v_y \end{pmatrix} = \frac{1}{Z}\begin{pmatrix} -f & 0 & x \\ 0 & -f & y \end{pmatrix}\begin{pmatrix} T_x \\ T_y \\ T_z \end{pmatrix} + \frac{1}{f}\begin{pmatrix} xy & -(x^2 + f^2) & fy \\ y^2 + f^2 & -xy & -fx \end{pmatrix}\begin{pmatrix} \omega_x \\ \omega_y \\ \omega_z \end{pmatrix}$$

  - Only has 8 independent parameters (write it out!)
- Image Transformation between two frames (accurate)
  - Any amount of motion (arbitrary)
  - Homography (3x3 matrix) for all points: $p' \cong A\,p$
  - See Topic 5 Camera Models
- Image Mosaicing for a planar scene
  - Aerial image sequence
  - Video of blackboard
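
The plane constraint makes 1/Z linear in the image coordinates, which is why substituting it into the motion field gives a quadratic polynomial in (x, y). A small sketch of that substitution step, assuming the plane is $n^T P = d$ in the camera frame; the function name is hypothetical.

```python
def plane_inverse_depth(x, y, n, d, f=1.0):
    """Inverse depth 1/Z at image point (x, y) for scene points on the plane n . P = d."""
    nx, ny, nz = n
    # From P = (Z / f) * (x, y, f) and n . P = d.
    return (nx * x + ny * y + nz * f) / (f * d)
```

For a fronto-parallel plane, n = (0, 0, 1), the result is the constant 1/d at every pixel.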

Page 15: Special Cases: A Summary

- Pure Translation
  - Vanishing point and FOE (focus of expansion)
  - Only translation contributes to depth estimation
- Pure Rotation
  - Does not carry 3D information
  - Motion field: a quadratic polynomial in image coordinates, or
  - Transform: homography (3x3 matrix R) for all points
  - Image mosaicing from a rotating camera
- Moving Plane
  - Motion field is a quadratic polynomial in image coordinates, or
  - Transform: homography (3x3 matrix A) for all points
  - Image mosaicing for a planar scene

Page 16: Motion Parallax

- [Observation 1] The relative motion field of two instantaneously coincident points
  - Does not depend on the rotational component of motion
  - Points towards (away from) the vanishing point of the translation direction
- [Observation 2] The motion field of two frames after rotation compensation
  - only includes the translation component
  - points towards (away from) the vanishing point $p_0$ (the instantaneous epipole)
  - the length of each motion vector is inversely proportional to the depth, and also proportional to the distance from point p to the vanishing point $p_0$ of the translation direction
  - Question: how to remove rotation?
    - Active vision: rotation known approximately?

Page 17: Motion Parallax

- [Observation 1] The relative motion field of two instantaneously coincident points
  - Does not depend on the rotational component of motion
  - Points towards (away from) the vanishing point of the translation direction (the instantaneous epipole)

[Figure: epipole $(x_0, y_0)$ and three pairs of coincident points with their relative motion vectors]

- At instant t, three pairs of points happen to be coincident
- The difference of the motion vectors of each pair cancels the rotational components
- … and the relative motion field points towards or away from the VP of the translational direction (Fig 8.5 ???)

$$\frac{\Delta v_y}{\Delta v_x} = \frac{y - y_0}{x - x_0}$$

where $\Delta v$ is the difference of the motion vectors of a coincident pair.

Page 18: Motion Parallax

- [Observation 2] The motion field of two frames after rotation compensation
  - only includes the translation component
  - points towards (away from) the vanishing point $p_0$ (the instantaneous epipole)
  - the length of each motion vector is inversely proportional to the depth,
  - and also proportional to the distance from point p to the vanishing point $p_0$ of the translation direction (if $T_z \neq 0$)
- Question: how to remove rotation?
  - Active vision: rotation known approximately?
  - Rotation compensation can be done by image warping after finding three (3) pairs of coincident points

$$\frac{v_y}{v_x} = \frac{y - y_0}{x - x_0}, \qquad \|v\| = \frac{|T_z|}{Z}\,\sqrt{(x - x_0)^2 + (y - y_0)^2}$$

[Figure: FOE $p_0$, an image point p and its motion vector v along the line through $p_0$; axes labelled $T_x$, $T_y$]
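
Since every rotation-compensated flow vector lies on a line through $p_0$, the vanishing point can be estimated as the least-squares intersection of those lines. This is one possible sketch, not necessarily the procedure used in the lecture; `estimate_foe` is a hypothetical name, and the input flow is assumed to be purely translational.

```python
def estimate_foe(points, flows):
    """Least-squares estimate of p0 = (x0, y0) from translational flow vectors.

    Each vector (vx, vy) at (x, y) gives one equation: vy * x0 - vx * y0 = vy * x - vx * y.
    """
    saa = sab = sbb = ra = rb = 0.0
    for (x, y), (vx, vy) in zip(points, flows):
        a, b = vy, -vx
        c = vy * x - vx * y
        # Accumulate the 2x2 normal equations.
        saa += a * a
        sab += a * b
        sbb += b * b
        ra += a * c
        rb += b * c
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("flow vectors are (nearly) parallel: p0 is at infinity or undetermined")
    x0 = (ra * sbb - rb * sab) / det
    y0 = (saa * rb - sab * ra) / det
    return x0, y0
```

The singular case corresponds to the parallel motion field ($T_z = 0$), where the vanishing point lies at infinity.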

Page 19: Summary

- Importance of visual motion (apparent motion)
  - Many applications…
  - Problems:
    - correspondence, reconstruction, segmentation, understanding in x-y-t space
- Image motion field of rigid objects
  - Time derivative of both sides of the projection equation
- Three important special cases
  - Pure translation: FOE
  - Pure rotation: no 3D information, but leads to mosaicing
  - Moving plane: homography with arbitrary motion
- Motion parallax
  - Only depends on the translational component of motion

Page 20: Next lecture

Page 21: Notion of Optical Flow

- The Notion of Optical Flow
  - Brightness constancy equation: under most circumstances, the apparent brightness of moving objects remains constant

$$\frac{dE(x, y, t)}{dt} = 0$$

  - Optical flow equation: relates the apparent motion $(u, v)$ to the spatial and temporal derivatives of the image brightness

$$E_x u + E_y v + E_t = 0$$

- Aperture problem
  - Only the component of the motion field in the direction of the spatial image gradient can be determined
  - The component in the direction perpendicular to the spatial gradient is not constrained by the optical flow equation
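
The aperture problem can be made concrete: from a single pixel's derivatives, the optical flow equation determines only the flow component along the gradient (the normal flow). A minimal sketch, with `normal_flow` as an illustrative name and the derivatives $E_x$, $E_y$, $E_t$ assumed to be already computed.

```python
def normal_flow(Ex, Ey, Et, eps=1e-12):
    """Flow component along the spatial gradient, from Ex*u + Ey*v + Et = 0.

    Returns None where the gradient vanishes and nothing can be measured.
    """
    grad_sq = Ex * Ex + Ey * Ey
    if grad_sq < eps:
        return None
    # The minimum-norm solution of the single linear constraint.
    scale = -Et / grad_sq
    return scale * Ex, scale * Ey
```

If the true flow is (1, 3) and the gradient is (2, 0), then $E_t = -2$ and the function returns (1.0, 0.0): the component perpendicular to the gradient is lost.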

Page 22: Estimating Optical Flow

- Constant Flow Method
  - Assumption: the motion field is well approximated by a constant vector within any small region of the image plane
  - Solution: least squares for two variables (u, v) from NxN equations over an NxN (e.g. 5x5) planar patch
  - Condition: $A^TA$ is NOT singular (it is singular for null or parallel gradients)
- Weighted Least Squares Method
  - Assumption: the motion field is approximated by a constant vector within any small region, and the error made by the approximation increases with the distance from the center where optical flow is to be computed
  - Solution: weighted least squares for two variables (u, v) from NxN equations over an NxN patch
- Affine Flow Method
  - Assumption: the motion field is well approximated by an affine parametric model $u = A\,p + b$, with $u = (u, v)^T$ and $p = (x, y)^T$ (a plane patch with arbitrary orientation)
  - Solution: least squares for 6 variables (A, b) from NxN equations over an NxN planar patch
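
The constant flow and weighted least squares methods share the same 2x2 normal equations and differ only in the per-pixel weights. The sketch below assumes the derivatives $(E_x, E_y, E_t)$ of every patch pixel are already available; the name `constant_flow` and the `weights` parameter are illustrative.

```python
def constant_flow(gradients, weights=None):
    """Least-squares flow (u, v) for one patch from per-pixel (Ex, Ey, Et) triples.

    Minimizes sum_i w_i * (Ex_i*u + Ey_i*v + Et_i)^2; uniform weights if none are given.
    """
    gradients = list(gradients)
    if weights is None:
        weights = [1.0] * len(gradients)
    sxx = sxy = syy = sxt = syt = 0.0
    for (Ex, Ey, Et), w in zip(gradients, weights):
        sxx += w * Ex * Ex
        sxy += w * Ex * Ey
        syy += w * Ey * Ey
        sxt += w * Ex * Et
        syt += w * Ey * Et
    # A^T A = [[sxx, sxy], [sxy, syy]] must be non-singular.
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("A^T A is singular: null or parallel gradients in the patch")
    u = (-sxt * syy + syt * sxy) / det
    v = (-syt * sxx + sxt * sxy) / det
    return u, v
```

A Gaussian centered on the patch is a common choice for the weights, so that pixels far from the center contribute less.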

Page 23: Using Optical Flow

- 3D motion and structure from optical flow (pp. 208-212)
  - Input:
    - Intrinsic camera parameters
    - Dense motion field (optical flow) of a single rigid motion
  - Algorithm
    - (a good compromise between ease of implementation and quality of results)
    - Stage 1: Translation direction
      - Epipole $(x_0, y_0)$ through approximate motion parallax
      - Key: instantaneously coincident image points
      - Approximation: estimating differences for ALMOST coincident image points
    - Stage 2: Rotation flow and depth
      - Knowns: flow vector, and direction of translational component
      - One point, one equation (without depth) in $\omega$
      - Least-squares approximation of the rotational component of flow
      - From motion field to depth
  - Output
    - Direction of translation $(f\,T_x/T_z,\ f\,T_y/T_z,\ f) = (x_0, y_0, f)$
    - Angular velocity $\omega$
    - 3-D coordinates of scene points (up to a common unknown scale)

$$\begin{pmatrix} x_0 \\ y_0 \end{pmatrix} = \frac{f}{T_z}\begin{pmatrix} T_x \\ T_y \end{pmatrix}, \qquad \frac{\Delta v_y}{\Delta v_x} = \frac{y - y_0}{x - x_0}$$

where $\Delta v$ is the flow difference between two almost coincident image points.

Page 24: Some Details

- Step 1. Get $(T_x, T_y, T_z) = s\,(x_0, y_0, f)$
- Step 2. For every point $(x, y, f)$ with known v, get one equation about $\omega$ from the motion equation (by eliminating Z, since it differs from point to point)
- Step 3. Get Z (up to a scale s) given $T/s$ and $\omega$

$$\begin{pmatrix} v_x \\ v_y \end{pmatrix} = \frac{1}{Z}\begin{pmatrix} -f & 0 & x \\ 0 & -f & y \end{pmatrix}\begin{pmatrix} T_x \\ T_y \\ T_z \end{pmatrix} + \frac{1}{f}\begin{pmatrix} xy & -(x^2 + f^2) & fy \\ y^2 + f^2 & -xy & -fx \end{pmatrix}\begin{pmatrix} \omega_x \\ \omega_y \\ \omega_z \end{pmatrix}$$

Translation part (first term): depends on depth Z

Rotation part (second term): no depth information
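
Steps 2 and 3 can be sketched with NumPy as follows. After subtracting the rotational flow, the remainder must be parallel to the known translational direction $f\,(x - x_0, y - y_0)$; setting their 2D cross product to zero removes Z and leaves one linear equation in $\omega$ per point. This is an illustrative reading of the steps, not the lecture's reference implementation; the function name is hypothetical and the epipole from Step 1 is assumed to be given.

```python
import numpy as np

def rotation_and_depth(points, flows, foe, f=1.0):
    """Estimate omega and relative depth Z/s from flow, given the epipole (x0, y0)."""
    pts = np.asarray(points, dtype=float)
    fl = np.asarray(flows, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    vx, vy = fl[:, 0], fl[:, 1]
    x0, y0 = foe
    # Translational direction field for T / s = (x0, y0, f).
    tx, ty = f * (x - x0), f * (y - y0)
    # Rows of the rotational part of the motion field equation.
    a1 = np.stack([x * y, -(x**2 + f**2), f * y], axis=1) / f
    a2 = np.stack([y**2 + f**2, -x * y, -f * x], axis=1) / f
    # Step 2: (v - rot) parallel to t  =>  (ty*a1 - tx*a2) . omega = vx*ty - vy*tx.
    A = ty[:, None] * a1 - tx[:, None] * a2
    b = vx * ty - vy * tx
    omega, *_ = np.linalg.lstsq(A, b, rcond=None)
    # Step 3: the residual translational flow equals (s / Z) * t.
    ux, uy = vx - a1 @ omega, vy - a2 @ omega
    # Division by zero occurs at the epipole itself or where the translational flow vanishes.
    depth_over_s = (tx * tx + ty * ty) / (ux * tx + uy * ty)
    return omega, depth_over_s
```

The returned depths share the unknown scale s (including its sign), which matches the "up to a common unknown scale" output of the algorithm.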

Page 25: Feature-Based Approach

- Two-frame method: feature matching
  - An algorithm based on the constant flow method
    - Features: corner detection by observing the coefficient matrix of the spatial gradient evaluation (2x2 matrix $A^TA$)
    - Iterative approach: estimation, warping, comparison
- Multiple-frame method: feature tracking
  - Kalman filter algorithm
    - Estimating the position and uncertainty of a moving feature in the next frame
    - Two parts: prediction (from previous trajectory) and measurement from feature matching
- Using a sparse motion field
  - 3D motion and structure by feature tracking over frames
  - Factorization method
    - Orthographic projection model
    - Feature tracking over multiple frames
    - SVD
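
The core of the factorization method can be sketched in a few lines: under orthographic projection, the registered measurement matrix of tracked features has rank 3, so a truncated SVD splits it into a motion factor and a shape factor. The sketch below assumes a 2F x N matrix W of image coordinates (F frames, N features) and stops at the affine factorization; the metric upgrade that enforces orthonormal camera axes is omitted, and the function name is illustrative.

```python
import numpy as np

def factorize(W):
    """Rank-3 factorization of a 2F x N measurement matrix into motion M and shape S."""
    W = np.asarray(W, dtype=float)
    # Register: subtract each row's mean, which removes the per-frame translation.
    W_reg = W - W.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(W_reg, full_matrices=False)
    root = np.sqrt(s[:3])
    M = U[:, :3] * root            # 2F x 3 motion (camera axes), up to an affine ambiguity
    S = root[:, None] * Vt[:3]     # 3 x N shape, up to the same ambiguity
    return M, S, s
```

With noise-free orthographic data, the fourth singular value in `s` is zero up to rounding; with real tracks its size indicates how well the rigid orthographic model fits.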

Page 26: Motion-Based Segmentation

- Change Detection
  - Stationary camera(s), multiple moving subjects
  - Background modeling and updating
  - Background subtraction
  - Occlusion handling
- Layered representation (I): rotating camera
  - Rotating camera + independent moving objects
  - Sprite: background mosaicing
  - Synopsis: foreground object sequences
- Layered representation (II): translating (and rotating) camera
  - Arbitrary camera motion
  - Scene segmentation into layers
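
For the stationary-camera case, background modeling, updating and subtraction can be illustrated with a running-average model. This is one simple choice among many and is not claimed to be the method used in the course; the function names, the learning rate `alpha` and the `threshold` value are all assumptions for illustration. Images are plain nested lists of grey values.

```python
def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the background model (running average)."""
    return [
        [(1.0 - alpha) * b + alpha * p for b, p in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]

def detect_changes(background, frame, threshold=25.0):
    """Mark pixels whose brightness differs from the background by more than threshold."""
    return [
        [abs(p - b) > threshold for b, p in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]
```

A small `alpha` makes the model adapt slowly, so briefly visible moving objects stay in the foreground mask instead of being absorbed into the background.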

Page 27: Summary

- After learning motion, you should be able to
  - Explain the fundamental problems of motion analysis
  - Understand the relation between motion and stereo
  - Estimate optical flow from an image sequence
  - Extract and track image features over time
  - Estimate 3D motion and structure from a sparse motion field
  - Extract depth from 3D spatio-temporal (ST) image formation under translational motion
  - Know some important applications of motion, such as change detection, image mosaicing and motion-based segmentation

Page 28: Next

- Reviews, Exam and Projects
  - Exam & Project Presentations
- Homework #4 due on May 03, 2011, before class