Shape and Motion under Varying Illumination: Unifying Structure from Motion, Photometric Stereo, and Multi-view Stereo

Li Zhang*, Brian Curless*, Aaron Hertzmann**, Steven M. Seitz*
*University of Washington, Seattle, WA, USA
**University of Toronto, Toronto, ON, Canada

Introduction

Goal
Dense shape reconstruction of moving objects under varying illumination from a single video.

Example
A hand-held figurine rotating in front of a fixed camera under static lighting.
[Figure: three example frames.]

Standard methods
♦ Structure from Motion: only feature points
♦ Multi-view Stereo: only constant lighting
♦ Photometric Stereo: only static objects

Our solution
Optical flow under varying illumination.

Contributions
Using both spatial and temporal brightness variations:
♦ dense structure from motion
♦ stereo matching under lighting changes
♦ photometric stereo for moving scenes

spatial brightness variation → motion cue → surface position
temporal brightness variation → photometric cue → surface orientation
Both cues together → dense surface reconstruction in both textured and textureless regions

Mathematical Formulation

Assume a Lambertian object moving rigidly in front of an orthographic camera under static distant light.

[Figure: imaging geometry. Labels: light; camera frame 0 and camera frame t; object surface point $s_p$ with normal $n_p$; image positions $x_{0,p}$ and $x_{t,p}$; relative light and camera motion $R_t$, $o_t$.]

Brightness-varying flow:
$$I_t(x_{t,p}) = \gamma_{t,p} \cdot I_0(x_{0,p})$$

Multi-point multi-frame optical flow can be computed by minimizing:
$$\Phi(\{x_{t,p}, \gamma_{t,p}\}) = \sum_{t,p} \big( I_t(x_{t,p}) - \gamma_{t,p} \cdot I_0(x_{0,p}) \big)^2$$

Image formation: let $l_t$ and $b_t$ be the directional and ambient light at frame $t$, and let $n_p$ and $\alpha_p$ be the normal and albedo of the point $p$. Then
$$I_t(x_{t,p}) = \alpha_p \cdot ( l_t^T n_p + b_t )$$

Irradiance ratio:
$$\gamma_{t,p} = \frac{I_t(x_{t,p})}{I_0(x_{0,p})} = \frac{l_t^T n_p + b_t}{l_0^T n_p + b_0}$$
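The image formation model and the irradiance ratio in the Mathematical Formulation can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes the correspondences $x_{t,p}$ are already known, so intensities are evaluated per point rather than sampled from images, and all function names, light values, and normals are hypothetical synthetic choices.

```python
import numpy as np

def lambertian_intensity(albedo, normals, light, ambient):
    """I = albedo * (l^T n + b) per point; normals has shape (P, 3)."""
    return albedo * (normals @ light + ambient)

def irradiance_ratio(normals, light_t, ambient_t, light_0, ambient_0):
    """gamma_{t,p} = (l_t^T n_p + b_t) / (l_0^T n_p + b_0); the albedo cancels."""
    return (normals @ light_t + ambient_t) / (normals @ light_0 + ambient_0)

def flow_residual(I_t, I_0, gamma):
    """Sum of squared brightness-varying flow residuals (the objective Phi)."""
    return float(np.sum((I_t - gamma * I_0) ** 2))

# Synthetic scene (assumption): P unit normals tilted towards +z.
rng = np.random.default_rng(0)
P = 5
normals = rng.normal(size=(P, 3))
normals[:, 2] = np.abs(normals[:, 2]) + 1.0
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
albedo = rng.uniform(0.2, 1.0, size=P)

# Hypothetical directional and ambient light at frame 0 and frame t.
l0, b0 = np.array([0.0, 0.0, 1.0]), 0.1
lt, bt = np.array([0.3, 0.1, 0.9]), 0.2

I0 = lambertian_intensity(albedo, normals, l0, b0)
It = lambertian_intensity(albedo, normals, lt, bt)
gamma = irradiance_ratio(normals, lt, bt, l0, b0)

# With exact correspondences and the exact ratio, the residual vanishes
# up to floating-point error.
residual = flow_residual(It, I0, gamma)
```

Because the albedo cancels in the ratio, `gamma` depends only on the lights and the normals, which is why temporal brightness variation carries information about surface orientation.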
Optical flow is ambiguous within constant brightness regions; however, the temporal brightness variations uniquely determine surface orientation.

Constraints on brightness-varying flow

Our formulation extends [Irani99] and applies to features, edges, and textureless regions.

Multi-point multi-frame brightness-varying optical flow:
$$\min \Phi(X, Y, \Gamma) \quad \text{s.t.} \quad X = R_x S + O_x, \quad Y = R_y S + O_y, \quad \Gamma = LN$$

Rank 3 constraint on $\{x_{t,p}, y_{t,p}\}$ [Tomasi&Kanade90]:
$$X = \begin{bmatrix} x_{1,1} & \cdots & x_{1,P} \\ \vdots & & \vdots \\ x_{T,1} & \cdots & x_{T,P} \end{bmatrix} = \begin{bmatrix} \vdots \\ r_{xt}^T \\ \vdots \end{bmatrix} \begin{bmatrix} \cdots & s_p & \cdots \end{bmatrix} + \begin{bmatrix} o_{x1} & \cdots & o_{x1} \\ \vdots & & \vdots \\ o_{xT} & \cdots & o_{xT} \end{bmatrix} = R_x S + O_x$$
and similarly $Y = R_y S + O_y$.

Rank 4 constraint on $\{\gamma_{t,p}\}$ [Basri&Jacobs01]: $\Gamma = LN$, since $\gamma_{t,p} = (l_t^T n_p + b_t) / \beta_p$, where $\beta_p = l_0^T n_p + b_0$.

Consistency constraint on S and N:
$$\frac{\partial S}{\partial x} \perp N, \qquad \frac{\partial S}{\partial y} \perp N$$

Our formulation also subsumes Structure from Motion, Multi-view Stereo, and Photometric Stereo as special cases:
♦ Structure from Motion: known $X, Y$; unknown $R_x, O_x, R_y, O_y, S$
♦ Multi-view Stereo: known $R_x, O_x, R_y, O_y$ and $\Gamma = 1$; unknown $S$
♦ Photometric Stereo: known $\Gamma$ and constant $X, Y$; unknown $N, L$

Reconstruction algorithm

Initialization: Track features, compute $R_x, O_x, R_y, O_y$, estimate feature normals, and initialize $L$.
Step 1: Fix $R_x, O_x, R_y, O_y$, and $L$, and update $S$ and $N$.
Step 2: Correct $S$ where its uncertainty is large by integrating $N$.
Step 3: Fix $R_x, O_x, R_y, O_y$, and $S$, and update $L$.
The algorithm is implemented in a coarse-to-fine manner, and two iterations are computed at each level of detail.

Results

Reconstruction of the figurine
[Figure: the coarse-to-fine reconstruction, using both motion and photometric cues.]
[Figure: reconstruction using only the motion cue.]
[Figure: a profile view of the reconstruction.]
[Figure: the profile view with the recovered albedo map.]

Reconstruction of a nearly textureless object
[Figure: reconstruction using only motion cues.]
[Figure: reconstruction using both motion and photometric cues.]

Conclusion
♦ The motion cue constrains 3D positions only inaccurately at low-texture pixels.
♦ The photometric cue reveals normals accurately even for moving scenes, despite the noisy motion estimation.
♦ Combining both cues recovers the moving shape densely.
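Rank 4 factorization of $\Gamma$ in matrix form:
$$\Gamma = \begin{bmatrix} \gamma_{1,1} & \cdots & \gamma_{1,P} \\ \vdots & & \vdots \\ \gamma_{T,1} & \cdots & \gamma_{T,P} \end{bmatrix} = \begin{bmatrix} \vdots & \vdots \\ l_t^T & b_t \\ \vdots & \vdots \end{bmatrix} \begin{bmatrix} \cdots & n_p / \beta_p & \cdots \\ \cdots & 1 / \beta_p & \cdots \end{bmatrix} = LN$$
Row $t$ of $L$ is the $1 \times 4$ vector $[\, l_t^T \;\; b_t \,]$, and column $p$ of $N$ is the $4 \times 1$ vector $[\, n_p^T / \beta_p \;\; 1 / \beta_p \,]^T$.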
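The two factorizations can be illustrated on synthetic data. The sketch below is only a sanity check under stated assumptions, not the reconstruction algorithm itself: it builds $\Gamma = LN$ and $X = R_x S + O_x$ from random lights, normals, and points, then verifies the rank bounds numerically. The rows of `R_x` are drawn at random rather than taken from true rotation matrices, and every numeric value is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
T, P = 8, 20  # number of frames (after frame 0) and number of points

# Synthetic unit normals tilted towards +z, and 3D points S (3 x P).
normals = rng.normal(size=(P, 3))
normals[:, 2] = np.abs(normals[:, 2]) + 1.0
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
S = rng.normal(size=(3, P))

# Hypothetical lighting: reference frame 0 plus T perturbed frames.
l0, b0 = np.array([0.0, 0.0, 1.0]), 0.1
lights = l0 + rng.normal(scale=0.2, size=(T, 3))
ambients = rng.uniform(0.05, 0.3, size=T)

# Rank 4 factorization: rows of L are [l_t^T, b_t],
# columns of N are [n_p / beta_p, 1 / beta_p].
beta = normals @ l0 + b0                        # shape (P,)
L = np.hstack([lights, ambients[:, None]])      # shape (T, 4)
N = np.vstack([normals.T, np.ones(P)]) / beta   # shape (4, P)
Gamma = L @ N

# The same ratios computed entry by entry from the irradiance ratio.
Gamma_direct = (lights @ normals.T + ambients[:, None]) / beta
rank_gamma = np.linalg.matrix_rank(Gamma, tol=1e-8)

# Rank 3 factorization: X = R_x S + O_x, where O_x repeats o_xt along row t.
R_x = rng.normal(size=(T, 3))   # assumption: random rows, not true rotations
o_x = rng.normal(size=(T, 1))
X = R_x @ S + o_x               # broadcasting o_x reproduces O_x

# Subtracting each row's mean removes O_x and leaves a matrix of rank at most 3.
X_centered = X - X.mean(axis=1, keepdims=True)
rank_x = np.linalg.matrix_rank(X_centered, tol=1e-8)
```

In practice these bounds are applied in the other direction: measured `X`, `Y`, and `Gamma` are noisy, and the rank constraints restrict the flow and brightness-ratio estimates to low-dimensional subspaces.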