In Defense of 3D-Label Stereo

Carl Olsson, Johannes Ulén, Yuri Boykov
Centre for Mathematical Sciences, Lund University, Sweden
Department of Computer Science, University of Western Ontario, Canada

Overview

It is commonly believed that higher order smoothness should be modeled using higher order interactions. For example, 2nd-order derivatives for deformable (active) contours are represented by triple cliques. Similarly, 2nd-order regularization methods in stereo predominantly use MRF models with scalar (1D) disparity labels and triple clique interactions. In this paper we advocate a largely overlooked alternative approach to stereo where 2nd-order surface smoothness is represented by pairwise interactions with 3D-labels, e.g. tangent planes. This general paradigm has been criticized due to the perceived computational complexity of optimization in a higher-dimensional label space. Contrary to popular belief, we demonstrate that representing 2nd-order surface smoothness with 3D-labels leads to simpler optimization problems with (nearly) submodular pairwise interactions. Our theoretical and experimental results demonstrate advantages over state-of-the-art methods for 2nd-order smoothness stereo.

Code available at http://www.maths.lth.se/~ulen/

Compared to other works

Woodford et al. [1]
• 1D-labels (disparity/depth)
• Triple cliques approximate 2nd-order derivatives
• Reduction to QPBO fusion moves
• Hard QPBO problem

Li and Zucker [2]
• 3D-labels (tangent planes)
• Local precomputed tangents 2nd-order regularization
• Belief propagation
• No solution guarantees

This paper
• 3D-labels (tangent planes)
• Pairwise cliques approximate 2nd-order derivatives
• QPBO fusion moves
• Submodularity properties lead to simpler problems
• Generalization to higher order interactions

Background

We assign each pixel p a tangent plane. From the tangent planes it is straightforward to extract a corresponding disparity or depth estimate.
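As a minimal sketch of how a disparity estimate can be read off from tangent-plane labels, the snippet below evaluates each pixel's plane at the pixel itself. The parametrization (a, b, c) with d(x, y) = a*x + b*y + c and the names `Plane`, `plane_disparity` and `labels_to_disparity` are assumptions made for illustration; the poster does not specify how the planes are stored.

```python
from typing import List, Tuple

# Hypothetical label parametrization (an assumption, not from the poster):
# a plane (a, b, c) encodes the disparity d(x, y) = a*x + b*y + c.
Plane = Tuple[float, float, float]


def plane_disparity(plane: Plane, x: float, y: float) -> float:
    """Evaluate the plane's disparity at image point (x, y)."""
    a, b, c = plane
    return a * x + b * y + c


def labels_to_disparity(labels: List[List[Plane]]) -> List[List[float]]:
    """Disparity map: each pixel's own plane evaluated at that pixel."""
    return [
        [plane_disparity(plane, x, y) for x, plane in enumerate(row)]
        for y, row in enumerate(labels)
    ]


# Tiny 2x2 example: the top row slants in x, the bottom row slants in y.
labels = [
    [(0.5, 0.0, 1.0), (0.5, 0.0, 1.0)],
    [(0.0, 2.0, 0.0), (0.0, 2.0, 0.0)],
]
disp = labels_to_disparity(labels)
print(disp)
```

Storing the full plane rather than only the scalar disparity is what lets a pairwise term compare the plane of one pixel with the disparity of its neighbor.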
The underlying energy function is optimized by performing fusion moves on proposed solutions (proposals).

Definition 1. Let $D(p)$ be the disparity at pixel $p$. Furthermore, let $T_p D : I \to \mathbb{R}$ denote the tangent at the point $p$, seen as a function on the whole image $I$, that is

\[ T_p D(x) = D(p) + \nabla D(p)^T (x - p). \tag{1} \]

We define a regularization between neighboring pixels as

\[ V_{pq} = |T_p D(q) - D(q)|. \tag{2} \]

$V_{pq}$ measures the deviation of the disparity surface at $q$ from the tangent plane at $p$. Using the Taylor expansion

\[ D(q) \approx D(p) + \nabla D(p)^T (q - p) + \tfrac{1}{2} (q - p)^T \nabla^2 D(p) (q - p), \tag{3} \]

where $\nabla^2 D(p)$ is the Hessian at $p$, we see that

\[ V_{pq} \approx \left| \tfrac{1}{2} (q - p)^T \nabla^2 D(p) (q - p) \right|. \tag{4} \]

That is, $V_{pq}$ measures the second derivative at $p$ in the direction $q - p$ of the underlying disparity function.

Fig. 1: Left, rectified cameras: geometric interpretation of the smoothness term for parallel viewing rays. Right, regular cameras: smoothness term when the viewing rays are not parallel.

To make the energy discontinuity preserving we add a threshold $t$ to the interaction,

\[ E_{pq}(D, P) := \min(V_{pq}(D, P), t). \tag{5} \]

Here $V_{pq}(D, P) = |T_p D(q) - P(q)|$ denotes the interaction when $p$ takes its label from $D$ and $q$ takes its label from $P$.

Theoretical results

Proposition 2. If the proposal $P$ is a plane then the fusion with any function $D$ is a submodular move for both $E_{pq}$ and $V_{pq}$.

Proof. Since $P$ is a plane we have

\[ T_p P(q) = P(q) \tag{6} \]

and therefore $V_{pq}(P, P) = 0$. Furthermore,

\begin{align}
V_{pq}(D, D) &= |T_p D(q) - D(q)| \tag{7} \\
&= |T_p D(q) - P(q) + T_p P(q) - D(q)| \tag{8} \\
&\le |T_p D(q) - P(q)| + |T_p P(q) - D(q)| \tag{9} \\
&= V_{pq}(D, P) + V_{pq}(P, D), \tag{10}
\end{align}

which shows that submodularity,

\[ V_{pq}(D, D) + V_{pq}(P, P) \le V_{pq}(P, D) + V_{pq}(D, P), \tag{11} \]

holds. The proof for $E_{pq}$ is given in the paper. □

Proposition 3. If both $D$ and $P$ are convex (or alternatively both concave) between $p$ and $q$ then the interactions $V_{pq}$ and $V_{qp}$ are submodular for the fusion move.
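The inequality in Proposition 2 can be checked numerically on a single pixel pair. The sketch below is only an illustration under assumptions: labels are stored as planes (a, b, c) with d(x, y) = a*x + b*y + c, and the names `evaluate`, `v_pq`, `e_pq` and `is_submodular`, the threshold value and the random test planes are all hypothetical rather than taken from the authors' code. It tests the case where the proposal is one plane shared by p and q, and then shows a proposal that is not a single plane and breaks the inequality.

```python
import random
from typing import Callable, Tuple

# Hypothetical label parametrization (an assumption, not from the poster):
# a plane (a, b, c) encodes the disparity d(x, y) = a*x + b*y + c.
Plane = Tuple[float, float, float]
Pixel = Tuple[int, int]
Term = Callable[[Plane, Plane], float]


def evaluate(plane: Plane, x: Pixel) -> float:
    """Disparity of a plane at image point x."""
    a, b, c = plane
    return a * x[0] + b * x[1] + c


def v_pq(label_p: Plane, label_q: Plane, q: Pixel) -> float:
    """|T_p(q) - D(q)|: plane of p evaluated at q versus the disparity of q."""
    return abs(evaluate(label_p, q) - evaluate(label_q, q))


def e_pq(label_p: Plane, label_q: Plane, q: Pixel, t: float) -> float:
    """Truncated interaction min(V_pq, t)."""
    return min(v_pq(label_p, label_q, q), t)


def is_submodular(term: Term, d_p: Plane, d_q: Plane,
                  p_p: Plane, p_q: Plane, eps: float = 1e-9) -> bool:
    """Check term(D,D) + term(P,P) <= term(D,P) + term(P,D) for one pair."""
    lhs = term(d_p, d_q) + term(p_p, p_q)
    rhs = term(d_p, p_q) + term(p_p, d_q)
    return lhs <= rhs + eps  # eps absorbs floating-point rounding


def random_plane(rng: random.Random) -> Plane:
    return (rng.uniform(-2, 2), rng.uniform(-2, 2), rng.uniform(-2, 2))


rng = random.Random(0)
q = (1, 0)  # right-hand neighbor of p = (0, 0)
t = 1.0     # arbitrary truncation threshold for this demo

# Case 1: the proposal is one plane shared by p and q (Proposition 2).
all_ok = True
for _ in range(1000):
    d_p, d_q, plane = random_plane(rng), random_plane(rng), random_plane(rng)
    all_ok &= is_submodular(lambda lp, lq: v_pq(lp, lq, q),
                            d_p, d_q, plane, plane)
    all_ok &= is_submodular(lambda lp, lq: e_pq(lp, lq, q, t),
                            d_p, d_q, plane, plane)

# Case 2: a proposal that is not a single plane can violate the inequality.
flat0, flat1 = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
violates = not is_submodular(lambda lp, lq: v_pq(lp, lq, q),
                             flat0, flat1, flat1, flat0)

print(all_ok, violates)
```

In the second case the current solution and the proposal are both piecewise constant with a unit jump between p and q, but in opposite order, so mixing them removes the jump and the mixed terms are cheaper than the unmixed ones.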
Generalization to higher dimensional labels

Label                     Pairwise interaction   Unary term                   Submodular proposals
Depth                     1st derivative         Depth                        Constant functions
Tangent plane             2nd derivative         Depth, 1st derivative        Constant 1st derivative
2nd-order approximation   3rd derivative         Depth, 1st, 2nd derivative   Constant 2nd derivative
...                       ...                    ...                          ...

Fig. 2: Characterization of pairwise interactions, unary terms and submodular proposals for different types of labels.

Results

Fig. 3 (panels: Image; Only data term; With regularization): Result using regular cameras, picture of Skansen Lejonet in Gothenburg, Sweden.

Fig. 4 (panels: Image; Only data term; With regularization): Result using regular cameras, picture of Örebro castle, Sweden.

Fig. 5 (panels: (a) Image, (b) Ours, (c) Woodford, (d) Woodford 1op, (e) Ground truth, (f) Ours unlabelled, (g) Woodford unlabelled, (h) Woodford 1op unlabelled): Result using rectified cameras. (b-d) are estimated disparity maps after fusing the 14 SegPln proposals. In (f-h) we present the unlabelled variables summed over all 14 proposals, scaled 0–14. A white pixel means that the fusion left this pixel unlabelled for every single proposal.

               Tsukuba    Venus      Teddy      Cones
Ours           0.065 %    0.0264 %   0.127 %    0.0847 %
Woodford       30.0 %     30.6 %     27.6 %     27.3 %
Woodford 1op   0 %        0 %        0 %        0.0411 %

Fig. 6: Unlabelled variables for the 14 SegPln proposals on Middlebury.

                  Tsukuba   Venus   Teddy   Cones   Average
Ours              21.3      25.5    29.4    36.5    28.2
Woodford          106       139     143     181     142
Woodford / Ours   4.96      5.47    4.87    4.96    5.07

Fig. 7: Running time (s) using the convergence criteria in Woodford [1].

            Tsukuba                Venus                  Teddy                  Cones                  Average
            Non occ  All    Disc   Non occ  All    Disc   Non occ  All    Disc   Non occ  All    Disc
Ours        4.49     5.52   12.3   0.298    0.648  3.99   7.71     11.2   17.8   9.78     15.4   18.3   8.95
Woodford    4.83     5.99   13.9   0.536    0.921  6.39   8.16     11.8   19.3   9.74     15.6   18.4   9.63

Fig. 8: Scores on Middlebury using the same proposals, lower is better. All values are the percentage of pixels whose disparity is off by at least 1 pixel, for each of the three classes.
The classes are non-occluded regions, all pixels, and regions near depth discontinuities.

References

[1] O. Woodford, P. Torr, I. Reid, and A. Fitzgibbon, "Global stereo reconstruction under second order smoothness priors," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009.
[2] G. Li and S. Zucker, "Differential geometric inference in surface stereo," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 1, pp. 72–86, 2010.

IEEE Conference on Computer Vision and Pattern Recognition, Portland, 2013