
Signal Processing: Image Communication 11 (1998) 205–230

Disparity field and depth map coding for multiview 3D image generation¹

Dimitrios Tzovaras*, Nikos Grammalidis, Michael G. Strintzis

Electrical and Computer Engineering Department, Information Processing Laboratory, Aristotle University of Thessaloniki, Thessaloniki 54006, Greece

Received 8 August 1996

Abstract

In the present paper techniques are examined for the coding of the depth map and disparity fields for stereo or multiview image communication applications. It is assumed that both the left and right channels of the multiview image sequence are coded using block- or object-based methods. A dynamic programming algorithm is used to estimate a disparity field between each stereo image pair. Depth is then estimated and occlusions are optionally detected, based on the estimated disparity fields. Spatial interpolation techniques are examined based on the disparity/depth information and the detection of occluded regions using either stereoscopic or trinocular camera configurations. It is seen that the presence of a third camera at the transmitter site improves the estimation of disparities, the detection of occlusions and the accuracy of the resulting spatial interpolation at the receiver. Various disparity field and depth map coding techniques are then proposed and evaluated, with emphasis given to the quality of the resulting intermediate images at the receiver site. Block-based and wireframe modeling techniques are examined for the coding of isolated depth or disparity map information. Further, 2D and 3D motion compensation techniques are evaluated for the coding of sequences of depth or disparity maps. The motion fields needed may be available as a byproduct of block-based or object-based coding of the intensity images. Experimental results are given for the evaluation of the performance of the proposed coding and spatial interpolation methods. © 1998 Elsevier Science B.V.

Keywords: Disparity/depth estimation; Multiview 3D image generation; Spatial interpolation; Disparity field and depth map coding

1. Introduction

Depth understanding is an important element of enhanced perception and telepresence in image communication [2, 15]. Stereo vision [4, 23] provides a direct way of inferring the depth information by using two images (stereo pair) destined for the left and right eye, respectively. More than two images are used in multiview

* Corresponding author. E-mail: [email protected]; tel.: (+30-31) 996-359; fax: (+30-31) 996-398.
¹ This work was supported by the European Commission ACTS projects AC092 (PANORAMA) and AC057 (VIDAS).

0923-5965/98/$19.00 © 1998 Elsevier Science B.V. All rights reserved. PII S0923-5965(97)00029-5


Fig. 1. The multiview image sequence coder, using spatial interpolation at the receiver on the basis of intensity images.

systems, where a number of views of a scene from different viewpoints are provided. In such systems, the observer is able to watch the scene from various optical angles.

In other applications, the generation of intermediate images is needed even with simple monoscopic displays

at the receiver. For example, simulated eye-contact is known to enhance the 'telepresence' which is desirable in advanced videoconferencing schemes [17].

For the communication of the intermediate images, the options include:

(a) Spatial interpolation at the transmitter: The position of the observer is communicated to the coder, which responds by transmitting the appropriate view. This implies that a single spatially interpolated view is broadcast to all spectators. Furthermore, a delay intervenes between the change of the observer position and the change of view. This delay may be nonnegligible and have an unpleasant effect.

(b) Spatial interpolation at the receiver on the basis of intensity images alone: In this mode, it is required that the receiver be a powerful workstation with the ability to perform both disparity estimation and spatial interpolation from two or more intensity images (Fig. 1).

(c) Spatial interpolation at the receiver on the basis of intensity images and transmitted disparity or depth maps: In this mode, the spatial interpolation is easily effected, hence a simpler receiver may be used. Offsetting this is the cost of the disparity or depth map transmission, which may be considerable (Fig. 2). This mode is also advantageous when the transmitter has available cameras providing more views than are available at the receiver. Then the quality of the computed disparity or depth fields may be assessed before the transmission, and occlusions may be identified, thus enabling the transmission of only reliable depth or disparity information as well as occlusion maps. As a result, the quality of the intermediate images produced at the receiver site is improved, as verified by visual inspection, as well as in terms of PSNR (when the latter is measurable, as when a true intermediate image is available).

The purpose of the present paper is the investigation of the requirements and potential advantages of a system implementing the third of the above options. This implementation consists of the following successive stages:
(a) Disparity estimation. A depth map which gives the distance of each imaged 3D point from each camera center may be subsequently produced.
(b) Coding for the transmission of the sequences of either the disparity fields or depth maps.
(c) Spatial interpolation generating all intermediate views by using the transmitted pair of intensity images and the available disparity field or depth map.
Each one of the above is a separate task, which in theory may be carried out independently of the others. However, in practice the outcome of each stage will obviously affect the next.

Techniques for disparity estimation were recently examined in [3, 10, 22, 25, 30], while depth may be evaluated either from the estimated disparity fields or by using object motion analysis in monoscopic images [32]. Disparity estimation techniques based on segmentation and 3-D modeling of objects from multiview image sequences were proposed in [1, 19]. The disparity estimation technique adopted in this paper is based on


Fig. 2. The multiview image sequence coder, using spatial interpolation at the receiver on the basis of intensity images and depth maps.

a dynamic programming algorithm similar to that proposed in [9]. A hierarchical version of this algorithm is employed so as to speed up its execution. If the transmission of depth maps is preferred, the depth map is evaluated directly from the disparity field.

The dense disparity/depth information required for spatial interpolation, which is performed at the decoder site, must then be coded for transmission. For the coding of isolated (I-frame) depth maps, JPEG coding or a wireframe modeling technique may be used. Isolated disparity fields likewise may be coded using JPEG or subband techniques. If motion compensation is used for the coding of intensity images, the motion fields may also be used for the interframe compression of depth maps or disparity fields. More specifically, in several stereo and multiview image communication systems proposed in the literature, MPEG or S-MPEG (proposed as an MPEG standard for the coding of stereoscopic image sequences [33, 34]) encoders use full or partial transmission of sparse block-based 2D motion fields [21, 33]. Interframe depth map or disparity field encoding may then be based on the motion information available in the MPEG or S-MPEG bitstream. Likewise, in region-based or object-based stereo [6, 14, 28, 29, 35] and multiview encoders/decoders, the available 3D object-based motion information may also be used for the efficient coding of depth maps or disparity fields.

Techniques for spatial interpolation have recently attracted considerable attention [7, 8, 18] for use in emerging applications such as virtual reality and enhanced-telepresence videophone and video-conference systems [16, 24]. Techniques for the synthesis of intermediate views based on both motion and disparity analysis were examined in [7, 8].

In this paper, various spatial interpolation techniques are developed based on the availability of the left–right (LR) and/or the right–left (RL) disparity field or the depth map corresponding to one of the stereo images. Improved results are obtained if occlusion-inconsistency maps corresponding to the left and right views are also available and are incorporated in the spatial interpolation methodology. Further, improved versions of the disparity estimation and occlusion detection algorithms are considered for combination with the proposed spatial interpolation techniques, when a central camera image is also available to the transmitter.

The paper is organised as follows. In Section 2 the camera geometry of the multiview system is described. Section 3 presents the hierarchical dynamic programming technique used for disparity/depth estimation. In Section 4 techniques are examined for spatial interpolation depending on the availability of the depth map or one or two disparity fields. An occlusion-inconsistency detection algorithm is described in Section 5, while in Section 6 improved disparity estimation and occlusion detection techniques are examined in the special case of a trinocular camera system. Sections 7–9 describe techniques for efficient intra- and inter-frame disparity/depth coding. Experimental results are given in Section 10 in order to compare the performance of the described spatial interpolation techniques when using coded depth or disparity fields.


2. Geometric description

It will be assumed that the pick-up equipment consists of cameras with converging optical axes and that all geometrical parameters (baseline distance $B$, focal length $f$ and convergence angle $\theta$) are known, or estimated using a camera calibration technique [26]. The camera configuration is shown in Fig. 3. At the receiver, the intermediate view will be generated on the basis of two transmitted intensity images and one transmitted depth map or disparity field.

The relation between the coordinates $(x_l, y_l, z_l)$ and $(x_r, y_r, z_r)$ of a 3D point, as seen from the coordinate systems attached to the left and the right intensity images available at the receiver, is

$$
\begin{bmatrix} x_l \\ y_l \\ z_l \end{bmatrix}
=
\begin{bmatrix} \cos\theta & 0 & -\sin\theta \\ 0 & 1 & 0 \\ \sin\theta & 0 & \cos\theta \end{bmatrix}
\begin{bmatrix} x_r \\ y_r \\ z_r \end{bmatrix}
+
\begin{bmatrix} B\cos\tfrac{\theta}{2} \\ 0 \\ B\sin\tfrac{\theta}{2} \end{bmatrix}. \qquad (1)
$$

By assuming a perspective projection, the same 3D point projects into the left and right channel image planes at points $(X_l, Y_l, f)$ and $(X_r, Y_r, f)$, respectively, where

$$
\begin{bmatrix} X_l \\ Y_l \end{bmatrix} = \frac{f}{z_l}\begin{bmatrix} x_l \\ y_l \end{bmatrix}
\quad\text{and}\quad
\begin{bmatrix} X_r \\ Y_r \end{bmatrix} = \frac{f}{z_r}\begin{bmatrix} x_r \\ y_r \end{bmatrix}. \qquad (2)
$$

By combining Eqs. (1) and (2) we obtain a relation between the projections of the 3D point in the two image planes if the depth $z_r$ of the point is known:

$$
X_l = f\,\frac{\cos\theta\, X_r - f\sin\theta + B\cos(\theta/2)(f/z_r)}{\sin\theta\, X_r + f\cos\theta + B\sin(\theta/2)(f/z_r)},
\qquad
Y_l = \frac{f\,Y_r}{\sin\theta\, X_r + f\cos\theta + B\sin(\theta/2)(f/z_r)}. \qquad (3)
$$

In order to construct an intermediate view by a virtual camera located at a baseline distance $B'$ from the left camera with convergence angle $\theta'$ (see Fig. 3), the following relation is used for the evaluation of the

Fig. 3. Convergent camera geometry. Left, right and intermediate views.


intermediate $(X_i, Y_i)$ pixel position:

$$
X_i = f\,\frac{\cos\theta'\, X_r - f\sin\theta' + B'\cos(\theta'/2)(f/z_r)}{\sin\theta'\, X_r + f\cos\theta' + B'\sin(\theta'/2)(f/z_r)},
\qquad
Y_i = \frac{f\,Y_r}{\sin\theta'\, X_r + f\cos\theta' + B'\sin(\theta'/2)(f/z_r)}. \qquad (4)
$$
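As a concrete illustration, the mapping of Eq. (4) can be sketched as follows (a minimal Python sketch; the function name is ours):

```python
import math

def intermediate_position(Xr, Yr, zr, f, B_prime, theta_prime):
    # Eq. (4): map a right-channel pixel (Xr, Yr) with known depth zr to the
    # virtual camera at baseline B_prime and convergence angle theta_prime
    # (the focal length f is shared by all cameras).
    c, s = math.cos(theta_prime), math.sin(theta_prime)
    denom = s * Xr + f * c + B_prime * math.sin(theta_prime / 2) * (f / zr)
    Xi = f * (c * Xr - f * s + B_prime * math.cos(theta_prime / 2) * (f / zr)) / denom
    Yi = f * Yr / denom
    return Xi, Yi
```

For $B' = 0$ and $\theta' = 0$ the virtual camera coincides with the right camera and the mapping reduces to the identity, which is a convenient sanity check.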

In the following we will denote by $d^{UV}(X, Y)$, for $U, V \in \{R, L, I\}$, the disparity vector at the position $(X, Y)$ in camera $U$ that points to camera $V$. Following this notation, the RL disparity $d^{RL} = (d^{RL}_x, d^{RL}_y)$ corresponding to a specific point $(X_r, Y_r)$ in the right channel image is then given by
$$
d^{RL}_x(X_r, Y_r) \triangleq X_l - X_r = f\,\frac{\cos\theta\, X_r - f\sin\theta + B\cos(\theta/2)[f/z_r]}{\sin\theta\, X_r + f\cos\theta + B\sin(\theta/2)[f/z_r]} - X_r, \qquad (5)
$$
$$
d^{RL}_y(X_r, Y_r) \triangleq Y_l - Y_r = \frac{f\,Y_r}{\sin\theta\, X_r + f\cos\theta + B\sin(\theta/2)[f/z_r]} - Y_r. \qquad (6)
$$

The intermediate–left (IL) disparity $d^{IL} = (d^{IL}_x, d^{IL}_y)$ between the intermediate and the left channel image is given by
$$
d^{IL}_x \triangleq X_l - X_i, \qquad d^{IL}_y \triangleq Y_l - Y_i. \qquad (7)
$$

Similarly, the corresponding LR disparity $d^{LR} = (d^{LR}_x, d^{LR}_y)$ corresponding to the point $(X_l, Y_l)$ in the left channel image is
$$
d^{LR}_x(X_l, Y_l) \triangleq X_r - X_l = f\,\frac{\cos\theta\, X_l + f\sin\theta - B\cos(\theta/2)[f/z_l]}{-\sin\theta\, X_l + f\cos\theta + B\sin(\theta/2)[f/z_l]} - X_l, \qquad (8)
$$
$$
d^{LR}_y(X_l, Y_l) \triangleq Y_r - Y_l = \frac{f\,Y_l}{-\sin\theta\, X_l + f\cos\theta + B\sin(\theta/2)[f/z_l]} - Y_l. \qquad (9)
$$

In this case, the intermediate pixel position $(X_i, Y_i)$ is estimated based on depth $z_l$:
$$
X_i = f\,\frac{\cos\theta'\, X_l + f\sin\theta' - B'\cos(\theta'/2)[f/z_l]}{-\sin\theta'\, X_l + f\cos\theta' + B'\sin(\theta'/2)[f/z_l]},
\qquad
Y_i = \frac{f\,Y_l}{-\sin\theta'\, X_l + f\cos\theta' + B'\sin(\theta'/2)[f/z_l]}. \qquad (10)
$$

Also, the intermediate–right (IR) disparity $d^{IR} = (d^{IR}_x, d^{IR}_y)$ between the intermediate and the right channel image is given by
$$
d^{IR}_x \triangleq X_r - X_i, \qquad d^{IR}_y \triangleq Y_r - Y_i. \qquad (11)
$$

3. Disparity and depth estimation using hierarchical dynamic programming

A dynamic programming algorithm, minimising a combined cost function for two corresponding lines of the stereoscopic image pair, was used for the estimation of each of the RL and LR disparity fields. The basic algorithm adapts the results of [9, 20] using blocks rather than pixels. Furthermore, a novel hierarchical version of this algorithm was implemented so as to speed up its execution. The cost function takes into consideration the displaced frame difference (DFD) as well as the smoothness of the resulting vector field in the following way.

Due to the epipolar line constraint [2], the search area for each pixel $(k, i)$ of the right channel image is the interval $S_{ki} = \{(l, j) : k - d_x \le l \le k + d_x,\; i - d_y \le j \le i + d_y\}$ in the left channel image, where $d = (d_x, d_y)$ is the maximum allowed disparity and $d_y \ll d_x$. If $(l, j) \in S_{ki}$ is a matching pixel in the left channel image to


pixel $(k, i)$ and $V_{ki}$ is the vector that corresponds to this matching, the following cost function is minimised with respect to $V_{ki}$ for each line $k$ of the right channel image:
$$
C_k = \sum_{i \in \text{line } k}\; \sum_{(l,j) \in S_{ki}} \bigl( \mathrm{DFD}(V_{ki}) + \lambda\, \mathrm{SMF}(V_{ki}) \bigr). \qquad (12)
$$

The first term in (12) contains the absolute difference of two corresponding image intensity blocks, centered at the working pixels $(k, i)$ and $(l, j)$ in the right and left channel images, respectively:
$$
\mathrm{DFD}(V_{ki}) = \sum_{(x,y) \in W} \| I_r(k + x, i + y) - I_l(l + x, j + y) \|, \qquad (13)
$$
where $W$ is a rectangular window.

The second term is the smoothing function,
$$
\mathrm{SMF}(V_{ki}) = \sum_{n=1}^{N} \| V_{ki} - V_n \|\, R(V_n), \qquad (14)
$$
where $V_n$, $n = 1, \ldots, N$, are neighbouring vectors to $V_{ki}$. Multiplication with the reliability function $R(V_n)$ relaxes the smoothing weight, keeping only the first term active in regions where the matching reliability is high. More specifically,
$$
R(V_n) = \begin{cases} 0 & \text{if the disparity vector is reliable,} \\ 1 & \text{if the disparity vector is not reliable.} \end{cases}
$$
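A direct (non-hierarchical) evaluation of the cost of Eqs. (12)–(14) for one candidate vector can be sketched as follows; the weight `lam` stands for the regularisation factor of Eq. (12), and the block size and weight value are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def dfd(I_r, I_l, k, i, l, j, half=2):
    # Displaced frame difference, Eq. (13): sum of absolute intensity
    # differences over a (2*half+1)^2 window W centred at (k, i) / (l, j).
    wr = I_r[k - half:k + half + 1, i - half:i + half + 1].astype(float)
    wl = I_l[l - half:l + half + 1, j - half:j + half + 1].astype(float)
    return np.abs(wr - wl).sum()

def smf(v, neighbours, reliable):
    # Smoothing function, Eq. (14): distances to neighbouring vectors,
    # switched off (R = 0) where the neighbouring vector is reliable.
    return sum(np.hypot(v[0] - vn[0], v[1] - vn[1]) * (0 if r else 1)
               for vn, r in zip(neighbours, reliable))

def cost(I_r, I_l, k, i, v, neighbours, reliable, lam=0.5):
    # Combined matching cost of Eq. (12) for candidate vector v = (dx, dy).
    l, j = k + v[0], i + v[1]
    return dfd(I_r, I_l, k, i, l, j) + lam * smf(v, neighbours, reliable)
```

The dynamic programming algorithm then searches, per scan line, for the vector sequence minimising the accumulated cost.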

The disparity vector is considered reliable whenever it corresponds to a pixel on an edge or in a highly textured area. For the detection of edges and textured areas a variant of the technique in [12] was used, based on the observation that highly textured areas exhibit high local intensity variance in all directions, while on edges the intensity variance is higher across the direction of the edge. The dynamic programming algorithm searches for the optimum path minimising the cost function (12).

A hierarchical version of this approach was utilized in order to speed up the estimation process and to produce a smooth disparity field without discontinuities. In this version, the dynamic programming algorithm is applied at the coarse resolution level and an initial estimate for the disparity vectors is produced. The disparity information is then propagated to the next resolution level, where it is corrected so that the cost function is further minimised. This process is iterated until full resolution is achieved.

The depth $z_r$, giving the distance of the imaged point from the optical center of the right camera, may be estimated from disparity values using Eqs. (5), (6) and a least-squares technique as follows:

$$
z_r = (A^{\mathrm T} A)^{-1} A^{\mathrm T} B = \frac{a[0]\, b[0] + a[1]\, b[1]}{a[0]^2 + a[1]^2}, \qquad (15)
$$

where

$$
A = \begin{bmatrix} a[0] \\ a[1] \end{bmatrix}
= \begin{bmatrix} (X_r + d_x)(X_r \sin\theta + f\cos\theta) - f(X_r \cos\theta - f\sin\theta) \\ (Y_r + d_y)(X_r \sin\theta + f\cos\theta) - f\,Y_r \end{bmatrix}, \qquad (16)
$$

and

$$
B = \begin{bmatrix} b[0] \\ b[1] \end{bmatrix}
= \begin{bmatrix} Bf\bigl(f\cos(\theta/2) - (X_r + d_x)\sin(\theta/2)\bigr) \\ -Bf\,(Y_r + d_y)\sin(\theta/2) \end{bmatrix}, \qquad (17)
$$

where $B$, $f$, $\theta$ are the parameters defined in Section 2.
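The least-squares depth recovery of Eqs. (15)–(17) translates almost literally into code; the following sketch (function name ours) returns $z_r$ for a single disparity vector $(d_x, d_y)$ at right-image pixel $(X_r, Y_r)$:

```python
import math

def depth_from_disparity(Xr, Yr, dx, dy, B, f, theta):
    # a[0], a[1]: Eq. (16); b[0], b[1]: Eq. (17).
    c, s = math.cos(theta), math.sin(theta)
    a0 = (Xr + dx) * (Xr * s + f * c) - f * (Xr * c - f * s)
    a1 = (Yr + dy) * (Xr * s + f * c) - f * Yr
    b0 = B * f * (f * math.cos(theta / 2) - (Xr + dx) * math.sin(theta / 2))
    b1 = -B * f * (Yr + dy) * math.sin(theta / 2)
    # Eq. (15): scalar least-squares solution z_r = (A^T B) / (A^T A).
    return (a0 * b0 + a1 * b1) / (a0 ** 2 + a1 ** 2)
```

As a sanity check, for parallel axes ($\theta = 0$) and a purely horizontal disparity this reduces to the classical relation $z_r = Bf/d_x$.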


4. Generation of intermediate views

Intermediate views are generated on the basis of dense depth or disparity maps, such as those produced using the dynamic programming disparity estimation method described in Section 3. Depending on the application, the depth map or one or both of the forward (RL) and reverse (LR) disparity fields may be transmitted.

As will be seen, in order to generate intermediate images based on the transmitted depth maps, the camera parameters must be known at the decoder. If a parameter is dynamically modified, e.g. the focal length during a zoom process, these changes must also be transmitted. However, when one or two disparity fields are transmitted, the pick-up parameters are not required in the interpolation procedure. In fact, the only parameters needed in this case for the production of the intermediate images are the baseline ratios, which may be determined at the decoder based on the specifications of the display system.

4.1. Depth map transmission

It is assumed that both left and right channel images and the depth $z_r$ with respect to the optical center of the right channel image are available. Since in general the depth $z_i$ of the pixel $(X_i, Y_i)$ of the intermediate image is unknown to the receiver, it is not possible to use the equivalents of (3) to estimate the pixel in the right view corresponding to $(X_i, Y_i)$. Proceeding in the converse way, for each pixel $(X_r, Y_r)$ in the right view the corresponding position in the intermediate view $(X_i, Y_i)$ is found using

$$
(X_i, Y_i) = (r(X_i), r(Y_i)), \qquad (18)
$$

where $X_i$, $Y_i$ are found using Eq. (4) and $r(c)$ rounds $c$ to the nearest integer. Then, the luminance corresponding to this position is estimated from either one or both of the left and right channel images. The latter approach assigns to the pixel in the intermediate image a weighted average of the luminances of the corresponding pixels in the left and right images, i.e.,

$$
I_i(X_i, Y_i) = (1 - k)\, I_l(X_{l_{RL}}, Y_{l_{RL}}) + k\, I_r(X_r, Y_r), \qquad (19)
$$

where

$$
(X_{l_{RL}}, Y_{l_{RL}}) = (r(X_l), r(Y_l)), \qquad (20)
$$

and $(X_l, Y_l)$ are found from (3). In all the above equations $I_l$, $I_i$, $I_r$ denote the luminances in the left, intermediate and right channel image, respectively. If no occlusion information is available, the parameter $k$ may be simply chosen as the baseline ratio $B'/B$.

The intermediate view thus produced contains 'holes', i.e. areas not covered by the pixels generated by (18). These may be covered by use of interpolation. Since a luminance-based interpolation would result in undesirable blurring of the resulting image, the approach used in this paper is based on disparity-based interpolation. Specifically, IL disparities corresponding to 'non-hole' positions in the intermediate view are initially calculated using Eqs. (4) and (7). Then, IL disparity vectors corresponding to 'hole' positions are calculated using linear interpolation between the disparity vectors at the two nearest pixels in the same scanline – to the left and to the right of the pixel under consideration – that have already been assigned a disparity value. The dense disparity field thus produced is used to estimate the intensities at the 'hole' positions $(X_i, Y_i)$ of the intermediate image using

$$
I_i(X_i, Y_i) = I_l\bigl(r(X_i + d^{IL}_x),\; r(Y_i + d^{IL}_y)\bigr). \qquad (21)
$$
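The scan-line hole filling described above can be sketched as follows, with NaN marking 'hole' positions in one component of the IL disparity field (a simplified sketch; holes outside the assigned range simply take the nearest assigned value):

```python
import numpy as np

def fill_holes_scanline(disp):
    # Linear interpolation of hole (NaN) disparities between the nearest
    # already-assigned values to the left and right on the same scan line.
    disp = disp.copy()
    for row in disp:
        assigned = np.flatnonzero(~np.isnan(row))
        if assigned.size == 0:
            continue  # no disparity on this line at all
        holes = np.flatnonzero(np.isnan(row))
        row[holes] = np.interp(holes, assigned, row[assigned])
    return disp
```

The filled field is then used in Eq. (21) to fetch the missing intensities from the left channel image.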


4.2. Transmission of a single disparity field

If instead of the depth map one disparity field (e.g. the right–left (RL) field) is available, for each pixel $(X_r, Y_r)$ in the right view the corresponding position in the intermediate view $(X_i, Y_i)$ is found using

$$
(X_i, Y_i) = \bigl(r(X_r + k\, d^{RL}_x),\; r(Y_r + k\, d^{RL}_y)\bigr), \qquad (22)
$$

where $d^{RL}_x$ and $d^{RL}_y$ are the components of the RL disparity vector. Then, the luminance corresponding to this position is estimated from (19) as above with

$$
(X_{l_{RL}}, Y_{l_{RL}}) = (X_r + d^{RL}_x,\; Y_r + d^{RL}_y). \qquad (23)
$$

Again, the intermediate view thus produced contains 'holes', i.e. areas not covered by the pixels generated by (22). These may again be covered by use of interpolation. Specifically, IL disparity vectors corresponding to 'non-hole' positions in the intermediate view are initially calculated using

$$
d^{IL}_x(X_i, Y_i) = k\, d^{RL}_x(X_r, Y_r), \qquad d^{IL}_y(X_i, Y_i) = k\, d^{RL}_y(X_r, Y_r), \qquad (24)
$$

where again $k = B'/B$. Then, disparity vectors corresponding to 'hole' positions are calculated using linear interpolation as was previously described. The disparity field thus produced is used to estimate the intensities at the 'hole' positions of the intermediate image using (21).

4.3. Transmission of two disparity fields

Improved performance of the spatial interpolation technique, both in terms of visual inspection as well as in terms of PSNR, when measurable, may be achieved if an additional 'reverse' disparity vector field, i.e. the LR disparity field, is available. In this case, two sets of luminance estimates for the pixels in the intermediate image can be formed by using both disparity fields. The first estimate is formed using (21) and (19) as above, and the second estimate is formed using

$$
I_i(X'_i, Y'_i) = (1 - k)\, I_l(X_l, Y_l) + k\, I_r(X_{r_{LR}}, Y_{r_{LR}}), \qquad (25)
$$

where

$$
(X_{r_{LR}}, Y_{r_{LR}}) = (X_l + d^{LR}_x,\; Y_l + d^{LR}_y), \qquad (26)
$$

for each pixel $(X_l, Y_l)$ in the left channel image, where $d^{LR}_x$, $d^{LR}_y$ are the components of the LR disparity vector. Thus, for each pixel $(K_i, L_i)$ in the intermediate image four separate cases must be examined:

1. Only an estimate of the luminance of the pixel $(K_i, L_i)$ using the RL disparity field exists. In other words, there exists $(X_i, Y_i)$ so that $(K_i, L_i) = (X_i, Y_i)$, while there does not exist $(X'_i, Y'_i)$ so that $(K_i, L_i) = (X'_i, Y'_i)$. Then $I_i(K_i, L_i) = I_i(X_i, Y_i)$.

2. Only an estimate of the luminance of the pixel $(K_i, L_i)$ using the LR disparity field exists. In other words, there exists $(X'_i, Y'_i)$ so that $(K_i, L_i) = (X'_i, Y'_i)$, while there does not exist $(X_i, Y_i)$ so that $(K_i, L_i) = (X_i, Y_i)$. Then $I_i(K_i, L_i) = I_i(X'_i, Y'_i)$.

3. Pixel $(K_i, L_i)$ is characterised as a 'hole', i.e. neither the LR nor the RL disparity field produces an estimate for its luminance. An interpolation procedure similar to the one described above is used to estimate disparity vectors at the 'holes' in the IL and IR disparity fields. Then, the average of the estimated luminance values using the IL and the IR disparity fields is assigned to the pixel $(K_i, L_i)$ in the intermediate image:

$$
I_i(K_i, L_i) = \tfrac{1}{2}\bigl[ I_r\bigl(r(K_i + d^{IR}_x),\; r(L_i + d^{IR}_y)\bigr) + I_l\bigl(r(K_i + d^{IL}_x),\; r(L_i + d^{IL}_y)\bigr) \bigr]. \qquad (27)
$$

4. Both LR and RL disparity fields produce estimates for the luminance of the pixel $(K_i, L_i)$. In this case the pixel $(K_i, L_i)$ is assigned the value that is closer in the mean square sense to the mean of the luminances of the pixels in the intermediate image (that have already been assigned a value) in a neighbourhood $N$ around $(K_i, L_i)$. In the present paper $N$ was chosen to be a window of size $7 \times 7$.
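The four cases can be combined in a single pass over the intermediate image; the sketch below is our own illustration (array layout and NaN hole-marking are assumed conventions), taking the two warped estimates and the hole-filled fallback values of case 3 as already computed:

```python
import numpy as np

def merge_estimates(est_rl, est_lr, hole_fill, window=7):
    # est_rl, est_lr: intermediate-view luminance estimates produced by the
    # RL and LR disparity fields, with NaN where a field gave no estimate.
    # hole_fill: fallback values for pixels reached by neither field (case 3).
    out = np.where(np.isnan(est_rl), est_lr, est_rl)   # cases 1 and 2
    both = ~np.isnan(est_rl) & ~np.isnan(est_lr)
    hole = np.isnan(est_rl) & np.isnan(est_lr)
    out[hole] = hole_fill[hole]                        # case 3
    # Case 4: keep whichever estimate is closer to the local mean of the
    # already assigned neighbourhood (a 7x7 window in the paper).
    h = window // 2
    H, W = out.shape
    for y, x in zip(*np.nonzero(both)):
        nb = out[max(0, y - h):y + h + 1, max(0, x - h):x + h + 1]
        m = np.nanmean(nb)
        out[y, x] = (est_rl[y, x]
                     if abs(est_rl[y, x] - m) <= abs(est_lr[y, x] - m)
                     else est_lr[y, x])
    return out
```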

5. Occlusion-inconsistency checking

The detection of occluded points, or points where the RL and LR fields are inconsistent in the left or the right channel image, provides useful information that may be exploited to improve the quality of the intermediate images.

In [31], occlusions from a stereo pair were detected by forming the set $S_l = \{ y \in I_l : y = x + d^{RL}(x),\; x \in I_r \}$, where $d^{RL}(x)$ is the RL disparity vector, which contains the set of points in the left frame $I_l$ which are at the same time fully visible in the right view. The remaining points in the left channel image, lying in

$$
O_l = I_l \sim S_l, \qquad (28)
$$

are occluded in the right view.

A generalisation of the above method was used, detecting points in both the left and the right view on which the RL and LR disparity fields are inconsistent. These include, but are not limited to, occluded points. Iterative application of this method leads to the identification of inconsistent RL and LR vectors, which are then excluded from the intermediate view generation procedure described by Eqs. (19), (25).

More specifically, relative to the RL field found, a set $S_r$ of points on $I_r$ with consistent LR disparities and a set $O_r$ where the RL, LR fields are inconsistent are found using the LR disparity vector field $d^{LR}(x')$:

$$
S_r = \{ y \in I_r : y = x' + d^{LR}(x'),\; x' \in I_l \}, \qquad (29)
$$
$$
O_r = I_r \sim S_r. \qquad (30)
$$

For the implementation of the inconsistency detection algorithm, initially both $O_r$ and $O_l$ are initialised as empty sets. Then the following steps are iterated for a specific number of iterations, or until the sequence of occlusion areas $O_r$ and $O_l$ converges:
i. the areas $O_l$ in the left view are determined using Eq. (28);
ii. the LR disparity vectors corresponding to points in $O_l$ are discarded, i.e. the set $S_r$ is replaced by the following set: $S'_r = \{ y \in I_r : y = x' + d^{LR}(x'),\; x' \in I_l \sim O_l \}$;
iii. Eq. (30) is used to determine the area $O_r$ in the right view;
iv. the RL disparity vectors corresponding to points in $O_r$ are discarded, i.e. the set $S_l$ is replaced by the following set: $S'_l = \{ y \in I_l : y = x + d^{RL}(x),\; x \in I_r \sim O_r \}$.
Experimental results show a clear improvement in the accuracy of the resulting occlusion-inconsistency detection algorithm compared to the approach in [31].
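For a single scan line with purely horizontal disparities, the iteration of steps i–iv can be sketched as follows (a 1-D simplification of the method; the 2-D case iterates over full vector fields):

```python
import numpy as np

def inconsistency_maps(d_rl, d_lr, n_iter=3):
    # d_rl[x] maps right-view pixel x to x + d_rl[x] in the left view;
    # d_lr maps conversely.  Returns boolean masks O_l, O_r of points
    # where the two fields are inconsistent (occlusions included).
    W = d_rl.size
    O_l = np.zeros(W, bool)
    O_r = np.zeros(W, bool)
    for _ in range(n_iter):
        # Steps i, iv: S_l = left pixels reached from consistent right pixels.
        S_l = np.zeros(W, bool)
        for x in np.nonzero(~O_r)[0]:
            y = x + d_rl[x]
            if 0 <= y < W:
                S_l[y] = True
        O_l = ~S_l                          # Eq. (28)
        # Steps ii, iii: S_r from left pixels not already in O_l.
        S_r = np.zeros(W, bool)
        for x in np.nonzero(~O_l)[0]:
            y = x + d_lr[x]
            if 0 <= y < W:
                S_r[y] = True
        O_r = ~S_r                          # Eq. (30)
    return O_l, O_r
```

With perfectly consistent fields both masks stay empty; points near the image border that no vector reaches are flagged, illustrating how the masks also capture ordinary occlusions.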

6. Improved disparity estimation and occlusion detection when one or more intermediate images are available to the encoder

In the case where three or more cameras are present at the transmitter site, the extra views may be used for evaluation of the disparity fields found and for occlusion-inconsistency detection. As a result, both disparity estimation and spatial interpolation results are significantly improved. In this case, a modification of the hierarchical dynamic programming scheme described in Section 3 was used.

Specifically, the cost function described by Eq. (12) was modified by introducing an additional term:

$$
C_k = \sum_{i \in \text{line } k}\; \sum_{(l,j) \in S_{ki}} \bigl( \mathrm{DFD}(V_{ki}) + \lambda\, \mathrm{SMF}(V_{ki}) + \mu\, \mathrm{DFD}_{\mathrm{int}}(V_{ki}) \bigr). \qquad (31)
$$


The extra term $\mathrm{DFD}_{\mathrm{int}}$ contains the absolute differences between the corresponding image intensity blocks centered at the working pixel in the right channel image and the corresponding pixel in the intermediate image. In other words,

$$
\mathrm{DFD}_{\mathrm{int}}(V_{ki}) = \sum_{(x,y) \in W} \left\| I_r(k + x,\; i + y) - I_i\!\left( r\!\left( \frac{l + k}{2} + x \right),\; r\!\left( \frac{j + i}{2} + y \right) \right) \right\|. \qquad (32)
$$

Following the disparity estimation, occlusions in the right channel image were determined as follows: a position $(k, i)$ in the right channel image, corresponding to the pixel location $(l, j)$ in the left channel image, was deemed to be occluded if

$$
\mathrm{DFD}_{\mathrm{int}}(V_{ki}) \gg \mathrm{DFD}(V_{ki}). \qquad (33)
$$

After determining the two occlusion maps, the spatial interpolation procedure described in Section 4 may be used.

The above algorithm may likewise be used for the estimation of the reverse (LR) disparity field, corresponding to disparity vectors from the left to the right channel image, and also for the determination of occlusions in the left channel image.

7. Coding of isolated (I-frame) depth/disparity maps

Two methods were evaluated for the coding of isolated depth maps. In the first, depth maps are quantised and coded as if they were intensity images (I-frames) using the JPEG coding technique.

In the second method, a wireframe model is adapted to the depth map information. Initially the wireframe model is fit in each $B \times B$ block of the object area. The block size is successively reduced, as shown in Fig. 4, so as to obtain an adequate approximation of the depth information in this area. In other words, a block is split into 4 blocks if

$$
E = \sum_{(X,Y) \in \text{block}} \bigl( z(X, Y) - \hat{z}(X, Y) \bigr)^2 \qquad (34)
$$

exceeds a threshold. In the above, z(X; Y ) is the original depth value corresponding to pixel (X; Y ), and z(X; Y )is the approximated depth value produced by the wireframe model.The wireframe model is composed of triangular patches which are characterised by the coordinates of their

respective vertices [5, 13]. Given the (X, Y) coordinates of the vertices of a patch, along with the depth z(X, Y) for each vertex, we can write the equation of the plane containing this patch. Let P_1 = (X_1, Y_1, z_1(X_1, Y_1)),

Fig. 4. Wireframe models adapted to each block. (a) Original wireframe model. (b) Wireframe adapted to a block that is split.


P_2 = (X_2, Y_2, z_2(X_2, Y_2)) and P_3 = (X_3, Y_3, z_3(X_3, Y_3)) denote the vertices of a patch, and let P = (X, Y, z(X, Y)) be any point on this patch. Then

\overrightarrow{PP_1} \cdot \big( \overrightarrow{P_1P_2} \times \overrightarrow{P_1P_3} \big) = 0  (35)

gives the equation of the plane containing P_1, P_2 and P_3. This equation can be expressed in the form

z(X, Y) = pX + qY + c,  (36)

where

p = -\frac{(Y_2 - Y_1)\big(z_3(X_3,Y_3) - z_1(X_1,Y_1)\big) - \big(z_2(X_2,Y_2) - z_1(X_1,Y_1)\big)(Y_3 - Y_1)}{(X_2 - X_1)(Y_3 - Y_1) - (Y_2 - Y_1)(X_3 - X_1)},  (37)

q = -\frac{(X_3 - X_1)\big(z_2(X_2,Y_2) - z_1(X_1,Y_1)\big) - (X_2 - X_1)\big(z_3(X_3,Y_3) - z_1(X_1,Y_1)\big)}{(X_2 - X_1)(Y_3 - Y_1) - (Y_2 - Y_1)(X_3 - X_1)},  (38)

c = z_1(X_1,Y_1) - pX_1 - qY_1.  (39)
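The plane parameters (37)–(39) can be computed directly from the three vertices; a small sketch (the function name is hypothetical):

```python
def plane_params(p1, p2, p3):
    """Eqs. (37)-(39): parameters (p, q, c) of the plane z = p*X + q*Y + c
    through the patch vertices p1, p2, p3, each given as (X, Y, z)."""
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = p1, p2, p3
    den = (x2 - x1) * (y3 - y1) - (y2 - y1) * (x3 - x1)  # Eq. (37)/(38) denominator
    p = -((y2 - y1) * (z3 - z1) - (z2 - z1) * (y3 - y1)) / den
    q = -((x3 - x1) * (z2 - z1) - (x2 - x1) * (z3 - z1)) / den
    c = z1 - p * x1 - q * y1
    return p, q, c
```

Evaluating pX + qY + c at any (X, Y) inside the patch then recovers the depth from the transmitted node depths alone.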

Using (36), the depth z of any point on a patch can be expressed in terms of the parameters p and q and the X and Y coordinates of that point. Hence, full depth information will be available if only the depths of the nodes of the wireframe are transmitted.

For the coding of disparity fields, each component of the vector field is quantised into a number of levels and coded using DPCM followed by entropy coding. Alternatively, each component may be coded according to the JPEG standard. Wavelet/subband techniques [11, 27] for the coding of disparity field information were examined as a third alternative. More specifically, the filters described in [11] were used to obtain a hierarchical decomposition of each disparity field into three levels.
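A minimal sketch of the first option, quantisation followed by scanline DPCM (the quantiser range, the left-neighbour predictor and the entropy-based rate estimate are assumptions, standing in for the actual entropy coder):

```python
import numpy as np
from collections import Counter
from math import log2

def dpcm_code(component, n_levels=64):
    """Quantise one disparity component to n_levels, DPCM-predict each pixel
    from its left neighbour, and estimate the entropy-coder rate (bits/pixel)
    from the residual histogram."""
    lo, hi = component.min(), component.max()
    q = np.round((component - lo) / max(hi - lo, 1e-9) * (n_levels - 1)).astype(int)
    resid = q.copy()
    resid[:, 1:] = q[:, 1:] - q[:, :-1]  # first column is sent as-is
    counts = Counter(resid.ravel().tolist())
    n = resid.size
    bits_per_pixel = -sum(c / n * log2(c / n) for c in counts.values())
    return resid, bits_per_pixel
```

The decoder recovers the quantised field by a cumulative sum along each scanline; smooth disparity fields yield narrowly distributed residuals and hence low rates.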

8. Depth/disparity interframe coding using 2D motion compensation

Interframe coding techniques may be applied for the coding of depth or disparity field sequences. A temporal DPCM technique may be applied directly to remove temporal correlation between consecutive depth maps. However, this is not efficient when the sequence contains strong motion.

Alternatively, a 2D motion compensation procedure may be used for the coding of sequences of depth maps or disparity fields. If the 2D motion field is already available to the decoder, as a byproduct of intensity coding using, e.g., MPEG, an MPEG-like scheme for the coding of depth maps or disparity values may be employed, with the initial depth or disparity map transmitted losslessly or by use of the techniques in Section 7, and the subsequent depth or disparity maps obtained using block-based 2D motion compensation and error transmission.

More specifically, if m^{(r)} = [m_x^{(r)}, m_y^{(r)}]^T denotes the 2D motion vector for the block containing (X_r, Y_r), estimated from the right channel, the motion compensated depth value is provided by the relation

\hat{z}(X_r, Y_r, t) = z(X_r + m_x^{(r)}, Y_r + m_y^{(r)}, t-1),  (40)

or likewise in the case of RL disparity compensation,

\hat{d}_{RL}(X_r, Y_r, t) = d_{RL}(X_r + m_x^{(r)}, Y_r + m_y^{(r)}, t-1),  (41)

and similarly in the case of LR disparity compensation,

\hat{d}_{LR}(X_l, Y_l, t) = d_{LR}(X_l + m_x^{(l)}, Y_l + m_y^{(l)}, t-1),  (42)
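The block-based prediction (40) can be sketched as follows (the block size, the motion-field representation as a dictionary of per-block integer displacements, and the border clamping are assumptions of this sketch):

```python
import numpy as np

def predict_depth_2dmc(prev_depth, motion, block=16):
    """Eq. (40) as a sketch: predict the current depth map by shifting each
    block of the previous map by the block's 2D motion vector (du, dv)."""
    h, w = prev_depth.shape
    pred = np.empty_like(prev_depth)
    for bu in range(0, h, block):
        for bv in range(0, w, block):
            du, dv = motion.get((bu // block, bv // block), (0, 0))
            for u in range(bu, min(bu + block, h)):
                for v in range(bv, min(bv + block, w)):
                    su = min(max(u + du, 0), h - 1)  # clamp at frame borders
                    sv = min(max(v + dv, 0), w - 1)
                    pred[u, v] = prev_depth[su, sv]
    return pred
```

The residual between the true depth map at time t and this prediction is what is subsequently DCT and Huffman coded.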


where m^{(l)} = [m_x^{(l)}, m_y^{(l)}]^T is the 2D motion vector for the block containing (X_l, Y_l), estimated from the left channel. The error depth maps z(X, Y, t) − \hat{z}(X, Y, t), or the error disparity fields, are also coded using DCT and Huffman coding and transmitted. A significant advantage of this method is that it uses only the already available motion vectors; thus, extra bit-rate is spent only to code the prediction errors.

Alternatively, the series of depth maps or disparity values may be directly coded using an MPEG encoder and transmitted. This necessitates the use of additional coder equipment and additional bit-rate for the full transmission of the depth or disparity map displacement field.

9. Interframe depth/disparity coding using 3D motion compensation

If an object-based coder such as that described in [28] is used for the coding of the left channel sequence, the transmitted object outlines and 3D motion parameters may be used for the coding of depth or disparity maps as well as intensity images. Specifically, if (x(t), y(t), z(t)) are the coordinates of a point at time instant t, the coordinates (x(t−1), y(t−1), z(t−1)) of this point at the time instant t−1 are given by

\begin{bmatrix} x(t-1) \\ y(t-1) \\ z(t-1) \end{bmatrix}
=
\begin{bmatrix} 1 & -w_z & w_y \\ w_z & 1 & -w_x \\ -w_y & w_x & 1 \end{bmatrix}
\begin{bmatrix} x(t) \\ y(t) \\ z(t) \end{bmatrix}
+
\begin{bmatrix} t_x \\ t_y \\ t_z \end{bmatrix},  (43)

where three translational (t_x, t_y, t_z) and three rotational (w_x, w_y, w_z) parameters describe the motion of the underlying object.

The depth/disparity information corresponding to the first frame of the image sequence is assumed transmitted as described in Section 7. A prediction of the current depth map is then formed using the previous-in-time depth map along with the 3D motion parameters. Since the rotation matrix is orthogonal, (43) may be equivalently rewritten as

\begin{bmatrix} x(t) \\ y(t) \\ z(t) \end{bmatrix}
=
\begin{bmatrix} 1 & w_z & -w_y \\ -w_z & 1 & w_x \\ w_y & -w_x & 1 \end{bmatrix}
\begin{bmatrix} x(t-1) - t_x \\ y(t-1) - t_y \\ z(t-1) - t_z \end{bmatrix}.  (44)

In this way an MPEG-like sequence is formulated, with full depth or disparity map transmission at the beginning of each group of 7 frames. For the remaining frames (P-frames in MPEG terminology) the depth map z(t) is evaluated from (44) using the previously estimated depths z′ = z(X(t−1), Y(t−1), t−1) at points (X′, Y′) = (X(t−1), Y(t−1)) and the 3D motion parameters for each object in the scene, as follows:

z(t) = \frac{z'}{f} \left( w_y \left( X' - \frac{f t_x}{z'} \right) - w_x \left( Y' - \frac{f t_y}{z'} \right) + f - \frac{f t_z}{z'} \right),  (45)

where the coordinates (X(t), Y(t)) of a point at time instant t are given by

X(t) = \frac{z'}{z(t)} \left( X' - \frac{f t_x}{z'} + w_z \left( Y' - \frac{f t_y}{z'} \right) - w_y \left( f - \frac{f t_z}{z'} \right) \right)  (46)

and

Y(t) = \frac{z'}{z(t)} \left( -w_z \left( X' - \frac{f t_x}{z'} \right) + Y' - \frac{f t_y}{z'} + w_x \left( f - \frac{f t_z}{z'} \right) \right).  (47)

Due to the floating point format of the 3D motion parameters, the mapping from time instant t−1 to time instant t may point to positions outside the sampling grid of the frame. Therefore, an interpolation procedure must be used to assign depth values at integer pixel locations. A procedure similar to the one described in


Section 4 was used, based on linear interpolation of the depth values from the nearest pixels in the same scanline that have already been assigned a depth value.

In the case of disparity field coding, Eqs. (45)–(47) are used to generate an estimate of the disparity field at time instant t by applying (5) and (6) for the case of RL disparity, or (8) and (9) for the case of LR disparity. The error maps are coded and transmitted using DCT and Huffman coding techniques.

10. Experimental results

The performance of the proposed coding methods was investigated in application to the compression of the interlaced stereoscopic sequences ‘Anne’ and ‘Tunnel’ (MPEG-4 test sequence) and the trinocular image sequences ‘Robert’ and ‘Claude’, all of size 720×576. 2 The original first frame left and right channel images of ‘Anne’, ‘Claude’, ‘Robert’ and ‘Tunnel’ are shown in Figs. 5(a, b), 9(a, b), 13(a, b) and 13(c, d), respectively.

Fig. 5. (a) Original left channel image ‘Anne’. (b) Original right channel image ‘Anne’. (c) X-component of the LR disparity field for frame 1 of ‘Anne’. (d) X-component of the RL disparity field for frame 1 of ‘Anne’.

2 These sequences were prepared by the Heinrich Hertz Institute (HHI) (‘Anne’, ‘Robert’), the Centre Commun d’Études de Télédiffusion et Télécommunications (CCETT) (‘Tunnel’) and THOMPSON BROADCASTING SYSTEMS (‘Claude’) for use in the DISTIMA RACE project.


Fig. 6. Lossless coding of depth/disparity fields. (a) Interpolated image using the RL disparity field. (b) Interpolated image using the LR disparity field. (c) Interpolated image using both the RL and LR disparity fields. (d) Interpolated image using both RL and LR disparity fields as well as the two occlusion-inconsistency maps.

The camera baseline in the set-up for shooting the examined sequences was 50 cm and the convergence distance 1.2 m. These parameters were chosen experimentally by the manufacturers, so as to provide the observer with a good 3D feeling of the examined scene. The results, however, are sequences with very large disparities (up to ±100 pixels in the foreground and ±30 pixels in the background). Accurate and consistent estimation of the LR and RL disparity fields is very difficult for such sequences. This difficulty is accentuated by the homogeneity of the background and uneven lighting, conditions which often characterise realistic videoconference image sequences such as the ‘Anne’ and ‘Robert’ sequences. In such sequences the disparity fields corresponding to the background are highly inconsistent.

Disparity was estimated using the hierarchical dynamic programming (HDP) approach described in Section 3. The x-component of the LR and RL disparity fields obtained by performing the HDP algorithm with 2 levels of hierarchy and default search area are shown in Figs. 5(c) and 5(d), respectively, for ‘Anne’, in Figs. 9(c) and (d) for ‘Claude’, and in Figs. 15(a) and 15(b), respectively, for ‘Tunnel’. Note that the disparity map shown is biased by 128 to include both positive and negative values of disparity. The use of the d_y component of the disparity field improves only modestly the quality of the reconstructed intermediate


Table 1
Disparity field coding using various coding methods. Average bit-rate needed for the transmission of the x- and y-components of the disparity fields. The average PSNR of the reconstruction quality of the x- and y-components of the disparity fields was 41 and 43 dB, respectively. The average figures for the RL and LR disparity fields of the sequence ‘Anne’, with dimensions 720×576 at 25 frames/s, were used as a typical example

Compression method   Bit-rate d_x        Bit-rate d_y        Total bit-rate
                     (Mbps)    (bpp)     (Mbps)    (bpp)     (Mbps)    (bpp)
Intraframe DPCM       8.489    0.818      2.275    0.219     10.764    1.038
Motion-JPEG           1.971    0.190      0.371    0.035      2.342    0.225
Motion-Subband        1.855    0.179      0.855    0.082      2.710    0.261
2D MC                 1.140    0.110      0.260    0.025      1.400    0.135
3D MC                 1.053    0.101      0.215    0.020      1.268    0.122

Fig. 7. Interpolated images generated using lossy RL+LR disparity field coding and occlusion-inconsistency maps. (a) Interpolated image using JPEG disparity field coding. (b) Interpolated image using wavelet/subband coded disparity fields. (c) Interpolated image using 2D motion compensated coded disparity fields. (d) Interpolated image using 3D motion compensated coded disparity fields.


Table 2
Depth map coding using various coding methods. Average bit-rate versus average SNR of the reconstruction quality of the depth map

Compression method   Bit-rate (Mbps)   SNR (dB)
Motion-JPEG          1.86              33.1
Wireframe            1.92              33.3
2D MC                1.18              33.2
3D MC                1.06              33.1

Fig. 8. (a) Reconstructed frame 2 using JPEG-coded depth map. (b) Reconstructed frame 2 using wireframe depth modeling. (c) Reconstructed frame 2 using 2D motion compensation of the frame 1 depth map. (d) Reconstructed frame 2 using 3D motion compensation of the frame 1 depth map.


Table 3
Comparison between depth map and disparity field coding techniques. The minimum bit-rate necessary for (subjectively) acceptable intermediate image quality for the ‘Anne’ image sequence is shown. The bit-rate needed for the coding of the left channel of intensity images is also shown for comparison

Information coded                                          Bit-rate (Mbps)
X-component of the disparity field (2D MC)                 1.14
X-component of the disparity field (3D MC)                 1.05
Depth map (2D MC)                                          1.18
Depth map (3D MC)                                          1.067
Intensity image (MPEG)                                     3.50
Both X and Y components of the disparity field (2D MC)     1.400
Both X and Y components of the disparity field (3D MC)     1.268

Fig. 9. (a) Original left channel image ‘Claude’. (b) Original right channel image ‘Claude’. (c) X-component of the LR disparity field corresponding to the left channel image ‘Claude’. (d) X-component of the RL disparity field corresponding to the right channel image ‘Claude’.


Fig. 10. Lossless coding of disparity fields. (a) Interpolated image ‘Claude’ using the RL disparity field. (b) Interpolated image ‘Claude’ using the LR disparity field. (c) Interpolated image ‘Claude’ using both the RL and LR disparity fields. (d) Interpolated image ‘Claude’ using both RL and LR disparity fields as well as the two occlusion-inconsistency maps.

image. Experimental results have shown that the use of a search area of ±3 pixels for the y-component of disparity for the ‘Claude’ image improved the quality of the intermediate image by only about 0.5 dB compared to the result obtained using only the x-component of the disparity field.

The occlusion maps corresponding to the left and the right channel images of ‘Tunnel’ after 10 iterations of the algorithm described in Section 5 are shown in Fig. 15(c, d). The corresponding occlusion-inconsistency maps for the right and left channel images of ‘Claude’, computed after 10 iterations of the same algorithm, are shown in Fig. 11(a, b), respectively. In Fig. 11(c, d) the corresponding maps obtained using only one iteration, as suggested in [31], are depicted. As seen, the use of more iterations increases the number of inconsistencies in the RL and LR disparity fields that are detected and eliminated. This is the case both for sequences with moderate inconsistency between the RL and LR fields, such as ‘Claude’, as well as for sequences where this inconsistency is high, such as ‘Anne’.


Fig. 11. (a) Left channel image occlusion map generated after 10 iterations of the occlusion detection algorithm. Black pixels indicate inconsistent RL and LR disparity fields. (b) Right channel image occlusion map generated after 10 iterations of the occlusion detection algorithm. (c) Left channel image occlusion map generated after one iteration of the occlusion detection algorithm. (d) Right channel image occlusion map generated after one iteration of the occlusion detection algorithm.

The intermediate images of ‘Anne’, ‘Claude’ and ‘Tunnel’, using losslessly coded disparity information, are shown in Figs. 6, 10 and 16, respectively. Each of Figs. 6, 10 and 16 shows the intermediate image generated using the RL disparity field, the LR disparity field, both RL and LR fields, and both RL and LR fields together with the inconsistency maps, respectively. As seen, especially with the ‘Anne’ sequence, the use of both LR and RL fields and the identification of occlusions and inconsistencies leads to a significant improvement of the resulting intermediate view and provides a robust algorithm for intermediate image generation.

Lossless and lossy techniques were then compared for the coding of the disparity field. Table 1 shows the bit-rate needed (in Mbps or bits/pixel, respectively) along with the reconstruction quality (in PSNR terms) for the compression of the RL and LR disparity fields. Motion-JPEG and motion-subband coding techniques (without interframe prediction) are seen to require large bit-rates for disparity field coding. The techniques using 2D and 3D motion compensation of the disparity fields, followed by prediction error transmission, appear suitable. The 2D motion fields used for the RL and LR disparity field coding were obtained respectively from


Fig. 12. (a) X-component of the LR disparity field for ‘Claude’ estimated using the algorithm described in Section 6. (b) X-component of the RL disparity field for ‘Claude’ estimated using the algorithm described in Section 6. (c) Left channel image occlusion map corresponding to the disparity fields estimated using the intermediate image at the encoder. (d) Right channel image occlusion map corresponding to the disparity fields estimated using the intermediate image at the encoder.

the MPEG coders 3 of the right and left intensity images, while the 3D motion fields were obtained from the object-based coder described in [29]. The intermediate images generated using JPEG, subband, 2D motion compensation or 3D motion compensation coding of the RL disparity field corresponding to the second frame of ‘Anne’ are shown in Fig. 7(a–d), respectively.

Techniques are also compared for the coding of the depth map information corresponding to the right channel image of the ‘Anne’ sequence. Depth maps are first quantised into 256 levels. Table 2 shows the bit-rate required for the coding of the quantised depth maps versus the SNR of the reconstruction. The effects of depth compression methods on the quality of the generated intermediate view image are seen in Fig. 8. The bit-rates necessary for the transmission of depth and disparity maps resulting in the same quality of the interpolated images are shown in Table 3. As expected, the interframe techniques based on motion compensation produce

3 An implementation of the MPEG-1 standard, available from ftp://havefun.stanford.edu/pub/mpeg, was used.


Fig. 13. (a) Original left channel image ‘Robert’. (b) Original right channel image ‘Robert’. (c) Original left channel image ‘Tunnel’. (d) Original right channel image ‘Tunnel’.

lower bit-rates. However, a drawback of these methods is that visual artifacts may occur in the generated intermediate image, especially in areas where motion estimation fails.

The original first frame left and right channel images of ‘Claude’ are shown in Fig. 9. Results were also obtained using the existing intermediate image for the trinocular image sequence ‘Claude’. In this case, the existing intermediate image (Fig. 10) is used in the disparity estimation algorithm as described in Section 6. The occlusion-inconsistency maps for the right and left channel images are shown in Fig. 11. The computed left and right disparity fields and the corresponding occlusion maps for ‘Claude’ are shown in Fig. 12(a–d). The coefficients α and β in (31) were chosen equal to 0.5 and 1.5, respectively.

Finally, the intermediate view generation algorithm was applied to the trinocular image sequence ‘Robert’ (Fig. 13). The disparity field was first estimated without the use of the existing intermediate view. Fig. 14(a) presents the actual central image of ‘Robert’, while Fig. 14(b) shows the intermediate image generated using both the obtained RL and LR disparity fields along with the occlusion-inconsistency information. Fig. 14(c, d) shows the generated intermediate images of ‘Robert’ using the disparity field estimated by the algorithm presented in Section 6, using only the RL field or both RL and LR fields and the occlusion maps, respectively. It is seen that the use of a third intermediate camera improves significantly the quality of the resulting intermediate image at the receiver. Similar results for ‘Tunnel’ are shown in Figs. 15 and 16.


Fig. 14. (a) Actual central view of image ‘Robert’ (after luminance equalisation). (b) Reconstruction of the intermediate image using the left and right views plus the RL and LR disparity fields and the occlusion-inconsistency maps. These disparity fields were obtained without the help of the intermediate view available at the transmitter. (c) Reconstruction of the intermediate view for ‘Robert’ using the RL field estimated with the help of the middle view available to the encoder. (d) Reconstruction of the intermediate view for ‘Robert’ using both RL and LR fields estimated with the help of the middle view available to the encoder, and the occlusion-inconsistency map information.

The quality of the intermediate images generated using both the LR and RL disparity fields along with the occlusion maps, for the trinocular images ‘Claude’ and ‘Robert’, may also be evaluated objectively. The average PSNR of the reconstructed intermediate image was 27.34 dB for ‘Robert’ and 27.60 dB for ‘Claude’. As discussed earlier, the noisy disparity fields in the background contribute significantly to the overall errors in the reconstructed image. If computed only for the foreground of ‘Robert’ and ‘Claude’, the corresponding average PSNR values increase to 29.32 and 29.65 dB, respectively.


Fig. 15. (a) LR disparity field corresponding to the left channel image ‘Tunnel’. (b) RL disparity field corresponding to the right channel image ‘Tunnel’. (c) Occlusion map corresponding to the left channel of ‘Tunnel’ after 10 iterations of the occlusion-inconsistency detection algorithm. (d) Occlusion map corresponding to the right channel of ‘Tunnel’ after 10 iterations of the occlusion-inconsistency detection algorithm.

11. Conclusions

In this paper, techniques were examined for the coding of the depth map and disparity fields for multiview image communication applications. Intraframe and interframe techniques were evaluated and compared on the basis of the intermediate images generated using the reconstructed depth/disparity information. Various techniques were also examined for the generation of intermediate views on the basis of either stereoscopic or trinocular camera configurations. The performance of the spatial interpolation techniques used is seen to depend on the performance of the methods used for disparity/depth map coding. Experimental results were given for the evaluation of the performance of the proposed coding methods.

In summary, the experimental results show that

– The transmission of both the x- and y-components of the disparity vectors results in intermediate image quality modestly higher than that obtained via the transmission of the x-components alone. However, this enhanced quality is achieved at the cost of significant bit-rate increases.


Fig. 16. (a) Interpolated image ‘Tunnel’ using the RL disparity field. (b) Interpolated image ‘Tunnel’ using the LR disparity field. (c) Interpolated image ‘Tunnel’ using both the RL and LR disparity fields. (d) Interpolated image ‘Tunnel’ using both RL and LR disparity fields as well as the two occlusion-inconsistency maps.

– No significant advantage in intermediate image quality, achieved with the same bit-rate, attaches to either depth map or disparity field (x-component only) transmission. However, the mechanism for the construction of intermediate views is considerably simpler with the use of disparity transmission, which for this reason should generally be preferable. In many important application areas, though, such as medical image transmission, the use of depth map transmission could be indicated, because in such areas depth information is by itself useful for purposes other than the generation of intermediate views.

– For either depth map or disparity field interframe coding, 2D or 3D motion compensation using the motion fields obtained in intensity image interframe coding gives acceptable results. These are considerably better than those obtained either without interframe prediction or with non-compensated prediction methods.

– The estimation of both LR and RL disparities and the detection of occlusions improve very significantly the quality of the intermediate view synthesised by the receiver.

– The use of a third intermediate camera at the transmitter site improves significantly the estimation of the disparity fields, the detection of occlusions and the accuracy of the resulting spatial interpolation at the receiver.


References

[1] H. Aydinoglou, F. Kossentini, M.H. Hayes III, A new framework for multiview image coding, in: Proc. IEEE Internat. Conf. on Acoustics, Speech and Signal Processing, Detroit, USA, 1995, pp. 2173–2176.
[2] S. Barnard, W. Thompson, Disparity analysis of images, IEEE Trans. Pattern Anal. Machine Intell. 2 (1980) 333–340.
[3] R.M. Bolle, B.C. Vemuri, Geometric modeling and computer vision, IEEE Trans. Pattern Anal. Machine Intell. 13 (1991) 1–13.
[4] K.L. Boyer, A.C. Kak, Structural stereopsis for 3-D vision, IEEE Trans. Pattern Anal. Machine Intell. 10 (1988) 144–166.
[5] G. Bozdagi, A.M. Tekalp, L. Onural, 3-D motion estimation and wireframe adaptation including photometric effects for model-based coding of facial image sequences, IEEE Trans. Circuits and Systems for Video Technology, June 1994, pp. 246–256.
[6] B. Choquet, J.L. Dugelay, D. Pele, A coding scheme for stereoscopic television sequences based on motion estimation-compensation using a 3D approach, IPA Conf. (IEE 95), submitted for publication.
[7] B. Chupeau, A multiscale approach to the joint computation of motion and disparity: Application to the synthesis of intermediate views, in: Proc. 4th European Workshop on 3DTV, Rome, Italy, October 1993.
[8] B. Chupeau, Synthesis of intermediate pictures for autostereoscopic multiview displays, in: Internat. Workshop on HDTV’94, Torino, Italy, October 1994.
[9] I.J. Cox, S. Hingorani, B.M. Maggs, S.B. Rao, Stereo without regularization, Tech. Rep., NEC Research Institute, Princeton, USA, October 1992.
[10] V.R. Dhond, J.K. Aggarwal, Structure from stereo – A review, IEEE Trans. Systems Man Cybernet. 19 (1989) 1489–1510.
[11] S.N. Efstratiadis, D. Tzovaras, M.G. Strintzis, Hierarchical partition priority wavelet image compression, IEEE Trans. Image Processing 5 (1 July 1996).
[12] W.L.O. Egger, M. Kunt, High compression image coding using an adaptive morphological subband decomposition, Proc. IEEE 83 (1995) 272–287.
[13] J.D. Foley, A.V. Dam, S.K. Feiner, J.F. Hughes, Computer Graphics: Principles and Practice, 2nd edition, Addison-Wesley, Reading, MA, 1990.
[14] R.E.H. Franich, R.L. Lakendijk, J. Biemond, Object-based stereoscopic coding: Vector field estimation and object segmentation, in: EUSIPCO ’94, Edinburgh, September 1994.
[15] B.K.P. Horn, B. Shunck, Robot Vision, MIT Press, Cambridge, MA, 1986.
[16] J. Liu, R. Skerjanc, Construction of intermediate pictures for a multiview 3D system, in: SPIE/IS&T Conf. on Electronic Imaging, Science and Technology, San Jose, California, February 1992.
[17] H. Li, A. Lundmark, R. Forchheimer, Image sequence coding at very low bitrates – A review, IEEE Trans. Image Processing 3 (1995) 589–609.
[18] D. Martinez, J. Lim, Spatial interpolation of interlaced television pictures, in: Proc. IEEE Internat. Conf. on Acoustics, Speech and Signal Processing, New York, NY, April 1988, pp. 1886–1889.
[19] T. Naemura, M. Kaneko, H. Harashima, 3-D object based coding of multiview images, in: Proc. Internat. Picture Coding Symp. (PCS), Melbourne, Australia, March 1996, pp. 459–464.
[20] S. Panis, M. Ziegler, Object based coding using motion and stereo information, in: Proc. Picture Coding Symp. (PCS’94), Sacramento, California, September 1994, pp. 308–312.
[21] V.E. Seferidis, Stereo image coding using generalized block matching and quad-tree structured spatial decomposition, IEEE Trans. Image Processing (1996).
[22] V.E. Seferidis, D.V. Papadimitriou, Improved disparity estimation in stereoscopic television, Electron. Lett. 29 (1 April 1993).
[23] R.Y.C. Shah, R.B. Mahani, A new technique to extract range information from stereo images, IEEE Trans. Pattern Anal. Machine Intell. 11 (1989) 768–773.
[24] R. Skerjanc, J. Liu, A three camera approach for calculating disparity and synthesizing intermediate pictures, Signal Processing: Image Communication 4 (1991) 55–64.
[25] A. Tamtaoui, C. Labit, Constrained disparity and motion estimators for 3DTV image sequence coding, Signal Processing: Image Communication 4 (1991) 45–54.
[26] R. Tsai, An efficient and accurate camera calibration technique for 3D machine vision, in: Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Miami Beach, FL, 1986, pp. 364–374.
[27] D. Tzovaras, S.N. Efstratiadis, M.G. Strintzis, Wavelet image compression using IIR minimum variance filters, partition priority, and multiple distribution entropy coding, in: Proc. SPIE Conf. on Visual Communications and Image Processing, Chicago, September 1994, pp. 489–500.
[28] D. Tzovaras, N. Grammalidis, M.G. Strintzis, Object-based coding of stereo image sequences using joint 3-D motion/disparity compensation, IEEE Trans. Circuits and Systems for Video Technology 7 (2) (1997) 312–328.
[29] D. Tzovaras, N. Grammalidis, M.G. Strintzis, 3-D motion/disparity segmentation for object-based image sequence coding, Optical Engineering, Special Issue on Visual Communications and Image Processing 35 (1996) 137–145.
[30] D. Tzovaras, M.G. Strintzis, H. Sahinoglou, Evaluation of multiresolution block matching techniques for motion and disparity estimation, Signal Processing: Image Communication 6 (1994) 59–67.
[31] J. Weng, N. Ahuja, T.S. Huang, Matching two perspective views, IEEE Trans. Pattern Anal. Machine Intell. 14 (1992) 806–825.
[32] J. Weng, T. Huang, N. Ahuja, Motion and structure from two perspective views: Algorithms, error analysis and error estimation, IEEE Trans. Pattern Anal. Machine Intell. 11 (1989) 451–476.
[33] M. Ziegler, Digital stereoscopic imaging and application: A way towards new dimensions. The RACE II project DISTIMA, in: IEE Colloq. on Stereoscopic Television, London, 1992.
[34] M. Ziegler et al., Digital stereoscopic television – State of the art of the European project DISTIMA, in: Proc. 4th European Workshop on 3DTV, Rome, 1993.
[35] M. Ziegler, S. Panis, An object-based stereoscopic coder, in: Proc. Internat. Workshop on Stereoscopic and 3D Imaging, Santorini, Greece, September 1995, pp. 40–45.