Computer Vision, October 2002. © 2002 by Davi Geiger
Dec 21, 2015
Binocular Stereo
[Figure: left image and right image of a stereo pair.]
Stereo Correspondence: Ambiguities
Each potential match is represented by a square. The black squares represent the most likely scene to “explain” the images, but other combinations (e.g., the red squares) could have given rise to the same images.
The set of black squares is preferred because its disparity values are similar, the ordering constraint is satisfied, and each point has a unique match. Any other set that could have given rise to the two images would have more widely varying disparity values and would violate either the ordering constraint or the uniqueness constraint. Disparity values are inversely proportional to depth values.
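The inverse relation between disparity and depth can be made concrete for a rectified camera pair. The sketch below assumes the standard pinhole relation Z = f·B/d; the focal length `focal_px`, the baseline `baseline_m`, and the function name are introduced here for illustration and do not appear in the slides.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth Z = f * B / d for a rectified stereo pair (assumed pinhole model)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px


# Doubling the disparity halves the depth (hypothetical camera values).
near = depth_from_disparity(20.0, focal_px=800.0, baseline_m=0.1)
far = depth_from_disparity(10.0, focal_px=800.0, baseline_m=0.1)
```

With these made-up camera values, `far` is twice `near`, which is the inverse proportionality stated above.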
Stereo Correspondence: Matching Space
[Figure: matching space for the surface points A to F as seen along the left and right epipolar lines. Labeled regions: right boundary (no match), left boundary (no match), depth discontinuity, and surface orientation discontinuity.]
Stereo Correspondence: Constraints
Smoothness (similar depth values): In nature most surfaces are smooth compared to their distance to the observer, but depth discontinuities also occur.
Uniqueness: Given a point in the left image, there is only one point in the right image that matches it, i.e., each point has only one disparity value associated with it.
Ordering constraint (monotonicity): Points to the right of ql match points to the right of qr. In the matching space this implies that the matches form a monotonic non-decreasing curve.
[Figure: matching space spanned by the left and right epipolar lines, with diagonal lines of constant disparity w = -2, 0, 2, 4.]
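The uniqueness and ordering constraints can be checked mechanically on a candidate set of matches. The following is a minimal sketch, assuming matches on one epipolar line are given as `(j, t)` index pairs; the function name and data layout are illustrative choices, not part of the slides.

```python
def check_matches(matches):
    """Check a list of (j, t) matches on one epipolar line.

    Returns (unique, ordered):
    unique  - no left index j and no right index t is used twice
    ordered - sorted by j, the matched t values never decrease
    """
    js = [j for j, _ in matches]
    ts = [t for _, t in matches]
    unique = len(set(js)) == len(js) and len(set(ts)) == len(ts)
    by_left = sorted(matches)
    ordered = all(t1 <= t2 for (_, t1), (_, t2) in zip(by_left, by_left[1:]))
    return unique, ordered
```

For example, `[(1, 2), (2, 3), (3, 5)]` satisfies both constraints, while `[(1, 3), (2, 2)]` crosses over and violates the ordering constraint.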
Cooperative Stereo Algorithm: Data
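C0(e,j,t) ∈ [0,1] represents how good a match is between a point (e,j) in the left image and a point (e,t) in the right image (t = j + dj, where dj is the disparity at j). The epipolar lines are indexed by e.
$$C_0(e,j,t) = V\Big(\min\big\{\,|\hat I_L(e,j,0,5) - \hat I_R(e,t,0,5)|,\; |\hat I_L(e,j,\theta,5) - \hat I_R(e,t,\theta,5)|\,\big\}\Big) \quad \text{for } e,j,t = 1,\dots,n$$
$$V(x) = \begin{cases} \exp(-x) & \text{if } x \le 2 \\ 0 & \text{otherwise} \end{cases}$$
[The second orientation argument, written θ here, the operator between the left and right terms, and the sign and comparison in V are not legible in the source; an absolute difference and a decaying exponential are assumed.]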
In order to account for occlusions, we extend the matrix C0(e,j,t) to include elements for j=0 and t=0, representing the total mismatch of a pixel (a half-occlusion); e.g., if C0(e,0,t)=1 then pixel (e,t) is likely to be half occluded.
$$C_0(e,0,t) = 1 - \max_{j \in (t-D,\dots,t+D)} C_0(e,j,t), \qquad C_0(e,j,0) = 1 - \max_{t \in (j-D,\dots,j+D)} C_0(e,j,t)$$
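The half-occlusion elements can be computed directly from a table of match scores. The sketch below handles one epipolar line and assumes the scores are stored in a NumPy array with `c0[j-1, t-1] = C0(e,j,t)`; the array layout, the clipping of the disparity window at the image border, and the function name are assumptions made for illustration.

```python
import numpy as np


def add_occlusion_nodes(c0, D):
    """Extend an (n, n) score table with half-occlusion nodes.

    Row 0 holds C0(e,0,t) = 1 - max_j C0(e,j,t) and column 0 holds
    C0(e,j,0) = 1 - max_t C0(e,j,t), each max taken within D of the diagonal.
    """
    n = c0.shape[0]
    ext = np.zeros((n + 1, n + 1))
    ext[1:, 1:] = c0
    for t in range(1, n + 1):
        lo, hi = max(1, t - D), min(n, t + D)  # window clipped at the border
        ext[0, t] = 1.0 - ext[lo:hi + 1, t].max()
    for j in range(1, n + 1):
        lo, hi = max(1, j - D), min(n, j + D)
        ext[j, 0] = 1.0 - ext[j, lo:hi + 1].max()
    return ext


scores = np.array([[0.9, 0.1, 0.0],
                   [0.2, 0.3, 0.1],
                   [0.0, 0.1, 0.8]])
extended = add_occlusion_nodes(scores, D=1)
```

A pixel with no good candidate inside the disparity window, such as the middle row of `scores`, receives a high occlusion score.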
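Cooperative Stereo: Smoothing and Limit Disparity
The stereovision algorithm produces a series of matrices Cn, which converges to a good solution in many cases. With a feedback weight $0 < \alpha \le 1$, a first update is
$$C_{n+1}(e,j,t) = C_0(e,j,t)\Big[(1-\alpha)\,C_n(e,j,t) + \frac{\alpha}{2}\big(C_n(e,j-1,t-1) + C_n(e,j+1,t+1)\big)\Big] \qquad (1)$$
but such an update excludes the t=0 and j=0 nodes. The positive feedback is given by the two neighbors of node (e,j,t) with matches at the same disparity d = t-j.
[The weight symbol, written α here, and its upper bound are not legible in the source.]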
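[Figure: matching space with left index j and right index t, diagonal lines of constant disparity w = -2, 0, 2, 4, and a band of width D = 4 on either side of w = 0.]
The matrix is updated only within a range of 2D+1 disparity values, i.e., |t - j| ≤ D. The rationale is:
(i) Less computation.
(ii) Larger-disparity matches imply larger errors in 3D estimation.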
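One way to read the smoothing step is as a per-node update that mixes a node's current value with the values of its two same-disparity neighbors, applied only inside the band |t - j| ≤ D. The sketch below works on one epipolar line. The multiplicative use of `c0`, the weight `alpha`, and the treatment of out-of-range neighbors as zero are assumptions made for illustration, not details confirmed by the slides.

```python
import numpy as np


def cooperative_step(c0, cn, D, alpha=0.5):
    """One smoothing update on a single epipolar line (hypothetical sketch).

    c0, cn: (n, n) arrays indexed [j, t]. Nodes with |t - j| > D are left as is.
    """
    n = cn.shape[0]
    new = cn.copy()
    for j in range(n):
        for t in range(max(0, j - D), min(n, j + D + 1)):
            # Neighbors at the same disparity t - j; zero outside the image.
            left = cn[j - 1, t - 1] if j > 0 and t > 0 else 0.0
            right = cn[j + 1, t + 1] if j < n - 1 and t < n - 1 else 0.0
            support = (1 - alpha) * cn[j, t] + 0.5 * alpha * (left + right)
            new[j, t] = c0[j, t] * support
    return new


c0 = np.ones((4, 4))
cn = np.eye(4)
cn[0, 3] = 0.7  # outside the band for D = 1, so it must stay untouched
out = cooperative_step(c0, cn, D=1)
```

Restricting the inner loop to the band is what gives the saving in computation mentioned in (i): the work per line grows with n·(2D+1) instead of n².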
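Cooperative Stereo: Uniqueness
We use the Sinkhorn algorithm to normalize Cn along both j and t and produce a doubly stochastic matrix Cn (each row and each column sums to 1). We index the iterates of Cn by k, starting from $C_n^{k=0}(e,j,t)$, and loop over k (typically 6 times). Each pass applies the two normalizations
$$C_n(e,j,t) \leftarrow \frac{C_n(e,j,t)}{C_n(e,j,0) + \sum_{w=j-t-D}^{j-t+D} C_n(e,j,t+w)} \quad \text{for } t \ne 0 \text{ and } |t-j| \le D$$
$$C_n(e,j,t) \leftarrow \frac{C_n(e,j,t)}{C_n(e,0,t) + \sum_{w=t-j-D}^{t-j+D} C_n(e,j+w,t)} \quad \text{for } j \ne 0 \text{ and } |t-j| \le D$$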
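The core of the Sinkhorn normalization is an alternation between row and column scaling. The sketch below is the textbook version on a plain positive matrix; it leaves out the occlusion nodes (j=0, t=0) and the disparity band used in the slides, and the function name and default iteration count are illustrative.

```python
import numpy as np


def sinkhorn(m, iterations=6):
    """Alternately normalize rows and columns of a positive matrix."""
    m = np.asarray(m, dtype=float).copy()
    for _ in range(iterations):
        m /= m.sum(axis=1, keepdims=True)  # each row sums to 1
        m /= m.sum(axis=0, keepdims=True)  # each column sums to 1
    return m


balanced = sinkhorn([[1, 2], [3, 4]], iterations=50)
```

After the last column step the columns sum to 1 exactly (up to rounding), and the rows approach 1 as the number of passes grows, which is why a small fixed number of passes is usually enough in practice.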
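Cooperative Stereo: Smoothing and Discontinuities
$$C_{n+1}(e,j,t) = C_0(e,j,t)\Big[(1-\alpha)\,C_n(e,j,t) + \frac{\alpha}{2}\big(S_n^{-}(e,j,t) + S_n^{+}(e,j,t)\big)\Big] \qquad (2)$$
where the supports from the left and right neighbors allow for a disparity jump w:
$$S_n^{-}(e,j,t) = \max\Big\{ \max_{w \in (0,\dots,D+t-j)} C_n(e,j-1,t-1-w)\,V_{D,1/4}(|w|),\; \max_{w \in (1,\dots,D+j-t)} C_n(e,j-1-w,t-1)\,V_{D,1/4}(|w|) \Big\}$$
$$S_n^{+}(e,j,t) = \max\Big\{ \max_{w \in (0,\dots,D+j-t)} C_n(e,j+1,t+1+w)\,V_{D,1/4}(|w|),\; \max_{w \in (1,\dots,D+t-j)} C_n(e,j+1+w,t+1)\,V_{D,1/4}(|w|) \Big\}$$
[Figure: left and right epipolar lines with nodes j-1, j, j+1 and t-1, t, t+1, and diagonal lines of constant disparity w = -2, 0, 2, 4.]
Note that each term in (2) has been normalized to 1 so that $0 < \alpha \le 1$, where
$$V_{D,1/4}(x) = \begin{cases} \exp\!\big(-\tfrac{x}{4D}\big) & \text{if } x \le 2D \\ 0 & \text{otherwise} \end{cases}$$
[The weight symbol, written α here, and the signs and comparison operators in (2) and in V are not legible in the source; the forms above are the reading consistent with the ordering constraint and with equation (1). The names S⁻ and S⁺ are introduced here for readability.]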
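The role of the weight V in (2) is to let a neighbor at a different disparity still contribute, with a contribution that shrinks as the disparity gap grows. The sketch below assumes a truncated exponential of the form exp(-mu·x/D) for x ≤ 2D with mu = 1/4; that exact form, the parameter name `mu`, and the dictionary layout of `scores` are assumptions made for illustration.

```python
import math


def v_weight(x, D, mu=0.25):
    """Truncated exponential weight (assumed form): exp(-mu*x/D) if x <= 2D, else 0."""
    return math.exp(-mu * x / D) if x <= 2 * D else 0.0


def best_support(scores, D):
    """Best weighted support from one neighbor.

    scores maps a disparity jump w to the neighbor's score C_n at that jump.
    Returns max over w of scores[w] * V(|w|).
    """
    return max(s * v_weight(abs(w), D) for w, s in scores.items())


# A weak same-disparity neighbor (w=0) versus a strong neighbor two steps away.
support = best_support({0: 0.2, 2: 0.9}, D=4)
```

Here the strong neighbor across a small discontinuity wins over the weak same-disparity neighbor, which is how the update tolerates depth discontinuities instead of smoothing over them.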
Cooperative Stereo: Epipolar Lines
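Equation (3) keeps the terms of (2), i.e., C0(e,j,t), Cn(e,j,t) and one half of the sum of the two supports from the neighbors along the epipolar line, and adds support from the adjacent epipolar lines e-1 and e+1. The added term is one quarter of a sum of four maxima:
$$\frac{1}{4}\Big( \max_{w \in (-D,\dots,D)} C_n(e-1,j,j+w)\,V_{D,1/4}(|t-j-w|) + \max_{w \in (-D,\dots,D)} C_n(e+1,j,j+w)\,V_{D,1/4}(|t-j-w|) + \max_{w \in (-D,\dots,D)} C_n(e-1,t-w,t)\,V_{D,1/4}(|t-j-w|) + \max_{w \in (-D,\dots,D)} C_n(e+1,t-w,t)\,V_{D,1/4}(|t-j-w|) \Big)$$
[The weights that combine this term with the terms of (2), and the assignment of e-1 and e+1 to the individual maxima, are not legible in the source.]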
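Support across epipolar lines can be pictured as follows: a match (j,t) on line e is reinforced when lines e-1 and e+1 contain matches at the same j, or at the same t, with a similar disparity. The sketch below is one reading of that idea. The array layout `cn[e, j, t]`, the averaging over four terms, the zero contribution of missing border lines, and the `weight` callback are all assumptions made for illustration.

```python
import numpy as np


def epipolar_support(cn, e, j, t, D, weight):
    """Average of the best weighted supports from lines e-1 and e+1.

    cn: array indexed [e, j, t]; weight(x): decaying function of a disparity gap x.
    """
    n_lines, n, _ = cn.shape
    terms = []
    for e2 in (e - 1, e + 1):
        if not 0 <= e2 < n_lines:
            terms += [0.0, 0.0]  # no adjacent line at the image border
            continue
        # Same left index j, candidate disparity w on the adjacent line.
        same_j = [cn[e2, j, j + w] * weight(abs(t - j - w))
                  for w in range(-D, D + 1) if 0 <= j + w < n]
        # Same right index t, candidate disparity w on the adjacent line.
        same_t = [cn[e2, t - w, t] * weight(abs(t - j - w))
                  for w in range(-D, D + 1) if 0 <= t - w < n]
        terms += [max(same_j, default=0.0), max(same_t, default=0.0)]
    return sum(terms) / 4.0


cn = np.zeros((3, 4, 4))
cn[0, 1, 2] = 1.0  # line above agrees with the match (j=1, t=2)
cn[2, 1, 2] = 1.0  # line below agrees as well
full = epipolar_support(cn, e=1, j=1, t=2, D=1,
                        weight=lambda x: 1.0 if x == 0 else 0.5)
```

When both adjacent lines carry a confident match at the same disparity, all four terms reach their maximum and the support is 1.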
Cyclopean Coordinate System
Let us assume N >> D, as is typical in stereo images. Then, for efficiency, the simplest representation for C(e,j,t) is C(e,x,w+D), with an increase in resolution (subpixel), where x = (t+j)/2 and w = t-j, with w varying in the range (-D, …, D) and x varying in the range (1, 1.5, …, N-0.5, N), i.e., with subpixel accuracy. This is known as the cyclopean coordinate system. We can recover (j,t) from (x,w) via t = (2x+w)/2 and j = (2x-w)/2.
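The change of coordinates is easy to verify with exact arithmetic. The sketch below uses x = (t+j)/2 and w = t-j, the definitions that are consistent with the inverse formulas t = (2x+w)/2 and j = (2x-w)/2; the function names are illustrative.

```python
from fractions import Fraction


def to_cyclopean(j, t):
    """(j, t) -> (x, w) with x = (t + j) / 2 and w = t - j."""
    return Fraction(t + j, 2), t - j


def from_cyclopean(x, w):
    """(x, w) -> (j, t) with j = (2x - w) / 2 and t = (2x + w) / 2."""
    return (2 * x - w) / 2, (2 * x + w) / 2


x, w = to_cyclopean(3, 5)        # even disparity: x is an integer
half_x, _ = to_cyclopean(3, 4)   # odd disparity: x falls on a half pixel
```

Integer pixel pairs with odd disparity land on half-integer x, which is why x runs over 1, 1.5, 2, and so on.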
[Figure: cyclopean grid between the left and right epipolar lines, with x = (t+j)/2 along the horizontal axis and the disparity w = t-j from -4 to 4 along the vertical axis. The rows t-1, t = 5 and t+1 are marked, and the occluded units lie between x and x+v.]
Hypothesis: match at the blue circle and at the blue “x”, i.e., a horizontal jump of 4 units (v=2.5) along x.
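Cyclopean Cooperative Stereo
The initialization of C0(e,x,w+D) can now use the gradient information at subpixel resolution and be written as

for e = 6, …, N-6; x = 6, …, N-6 (x in steps of 1/2)
    for w = -D, …, D
        t = (2x+w)/2; j = (2x-w)/2
        if (2x+w) is even
            C0(e,x,w+D) = V( min{ |Î_L(e,j,0,5) - Î_R(e,t,0,5)|, |Î_L(e,j,θ,5) - Î_R(e,t,θ,5)| } )
        else ((2x+w) is odd)
            C0(e,x,w+D) = V applied to D Î_L and D Î_R, evaluated at the half-pixel neighbors of j and t with arguments (0, 5)
        end
    end
    C0(e,x,2D+1) = 1 - max over w in (-D, …, D) of C0(e,x,w+D)    (occlusion unit)
end

where C0(e,x,w+D) is of size N x 2N x (2D+1) and typically, since N >> D, this is much smaller than N x N x N.
[The second orientation argument, written θ here, the operator between the left and right terms, and the signs of the half-pixel offsets in the odd branch are not legible in the source.]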
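The saving in storage can be checked with a quick count of entries. The values N = 512 and D = 32 below are hypothetical and chosen only for illustration; the function names are likewise not from the slides.

```python
def cyclopean_entries(N, D):
    """Entries in C0(e, x, w+D): N lines, 2N half-pixel x positions, 2D+1 disparities."""
    return N * (2 * N) * (2 * D + 1)


def full_entries(N):
    """Entries in the dense C0(e, j, t) representation."""
    return N ** 3


N, D = 512, 32
ratio = full_entries(N) / cyclopean_entries(N, D)
```

The ratio simplifies to N / (2(2D+1)), so the cyclopean layout wins by a larger factor as N grows relative to D.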
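Cyclopean Cooperative Stereo (cont.)
We can update Cn as follows:
[Equation, garbled in the source: C_{n+1}(e,x,w+D) is obtained from C0(e,x,w+D), Cn(e,x,w+D), and two kinds of support. Along the epipolar line, each side contributes the maximum over the same-disparity neighbor at the adjacent x and over neighbors reached through a disparity jump v, taken together with the occlusion units Cn(e,·,2D+1) of the skipped positions indexed by r. Across epipolar lines, the support is one half of the sum of the maxima over v in (-D, …, D) of Cn(e-1,x,v+D) V_{D,1/4}(|w-v|) and Cn(e+1,x,v+D) V_{D,1/4}(|w-v|).]
The occlusion units and the normalization become simpler, since each is obtained by focusing on one x coordinate at a time:
$$C_{n+1}(e,x,2D+1) = 1 - \max_{w \in (-D,\dots,D)} C_{n+1}(e,x,w+D)$$
$$C_{n+1}(e,x,w+D) \leftarrow \frac{C_{n+1}(e,x,w+D)}{C_{n+1}(e,x,2D+1) + \sum_{v=-D}^{D} C_{n+1}(e,x,v+D)} \quad \text{for } w = -D,\dots,D,D+1$$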
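Because every x position carries its own set of disparities, the occlusion unit and the normalization reduce to operations on one short vector. The sketch below assumes the scores at one (e, x) are given as a length 2D+1 sequence with values in [0, 1]; the function name and the returned pair are illustrative choices.

```python
import numpy as np


def normalize_column(c):
    """Normalize the scores at one (e, x) together with their occlusion unit.

    c: length 2D+1 scores C_n(e, x, w+D) in [0, 1].
    Returns (normalized scores, normalized occlusion unit); together they sum to 1.
    """
    c = np.asarray(c, dtype=float)
    occ = 1.0 - c.max()      # occlusion unit: high when no disparity fits well
    total = c.sum() + occ    # at least 1 when the scores lie in [0, 1]
    return c / total, occ / total


scores, occlusion = normalize_column([0.2, 0.6, 0.2])
```

No alternation between two directions is needed here, in contrast to the (j,t) formulation, since each x position is normalized on its own.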
[Figure: matching-space grid of integer-pixel units (o), with columns j-1, j, j+1 and rows t-1, t, t+1.]
[Figure: grid of integer-pixel units (o) interleaved with half-pixel units (x), with columns j-1/2, j, j+1/2 and rows t-1, t-1/2, t, t+1/2, t+1. The disparity axis w and the line w=2 are marked.]
[Figure: the same grid of integer-pixel units (o) and half-pixel units (x), with columns j-1/2, j, j+1/2, rows t-1 to t+1 in steps of 1/2, the disparity axis w, the line w=2, and the occluded units marked.]
Hypothesis: match at the orange unit (marked “o”) followed by another match at the other orange unit (marked “x”), i.e., a horizontal jump of 3 units (v=4) along x.
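[Equations, garbled in the source: V_L(e,j,t) and V_R(e,j,t) are defined by applying V to values of Î on adjacent epipolar lines. V_L uses Î_L at (e,j) with scale 7 and Î_L on the neighboring line at j with scale 5; V_R uses Î_R at (e,t) with scale 7 and Î_R on the neighboring line at t with scale 5. The orientation arguments (3/2 and 1/2 of a lost symbol) and the operator combining the two values are not legible.]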