
Contents lists available at ScienceDirect

Signal Processing: Image Communication

Signal Processing: Image Communication 28 (2013) 1143–1155

journal homepage: www.elsevier.com/locate/image

Full-reference quality assessment of stereopairs accounting for rivalry

Ming-Jun Chen a,*, Che-Chun Su a, Do-Kyoung Kwon b, Lawrence K. Cormack c, Alan C. Bovik a

a LIVE Laboratory, Department of Electrical and Computer Engineering, University of Texas, Austin, TX 78712, USA
b Systems and Applications R&D Center, Texas Instruments, Dallas, TX 75243, USA
c The Center for Perceptual Systems and the Department of Psychology, University of Texas at Austin, Austin, TX 78712, USA

Article info

Article history:
Received 1 November 2012
Received in revised form 31 March 2013
Accepted 31 May 2013
Available online 21 June 2013

Keywords:
Binocular rivalry
Image quality
Stereoscopic quality assessment
Stereo algorithm

0923-5965/$ - see front matter © 2013 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.image.2013.05.006

* Corresponding author. Tel.: +1 512 471 2887.
E-mail addresses: [email protected] (M.-J. Chen), [email protected] (C.-C. Su), [email protected] (D.-K. Kwon), [email protected] (L.K. Cormack), [email protected] (A.C. Bovik).

Abstract

We develop a framework for assessing the quality of stereoscopic images that have been afflicted by possibly asymmetric distortions. An intermediate image is generated which, when viewed stereoscopically, is designed to have a perceived quality close to that of the cyclopean image. We hypothesize that performing stereoscopic QA on the intermediate image yields higher correlations with human subjective judgments. The experimental results confirm the hypothesis and show that the proposed framework significantly outperforms conventional 2D QA metrics when predicting the quality of stereoscopically viewed images that may have been asymmetrically distorted.

© 2013 Elsevier B.V. All rights reserved.

1. Introduction

Stereoscopic vision was first systematically studied by Wheatstone [1] in the early 1800s and the production of 3D films can be dated back to 1903 [2]. Since then, numerous 3D films have been produced, culminating in the breakout success of Avatar in 2009, which went on to become the highest-grossing film of all time. The success of Avatar has since greatly inspired further efforts in 3D film production and improved technologies and methods for 3D content capture and display.

Indeed, the wave of 3D has not been limited to the theater screen. As of 2011, mobile phones supporting 3D capture and viewing are available, the number of released 3D films has tripled compared to the number in 2008 [2], and broadcast 3D content over the Internet is becoming common [3]. With the release of 3D phones and 3D broadcast services, it is reasonable to believe that the amount of 3D content delivered by wireless and wireline will follow the trend of consumer video and increase exponentially over the next few years. Given an increasingly clogged communication infrastructure, being able to monitor and maintain the integrity of 3D video streams is of high interest. However, our understanding of the perceptual factors that determine the quality of stereoscopically viewed 3D videos remains limited.

The perceptual quality of a stereoscopically viewed 3D image, which can be very different from the perceived quality of each 2D image in the stereopair, is the topic of this study. The additional dimension of depth, along with unwanted side effects induced by geometry or poor stereography that lead to visual discomfort or fatigue, can affect the experience of viewing a stereoscopic image in both positive and negative ways. Consequently, a variety of factors need to be considered when creating 3D content in order to deliver a pleasant stereoscopic 3D viewing experience [4].


A considerable amount of research has been conducted on the complex relationship between visual comfort and stereography. Such factors as accommodation-convergence conflicts, the distribution of disparities, binocular mismatches, depth inconsistencies, perceptual and cognitive inconsistencies, and the quality of the images may all affect the degree of visual comfort experienced when viewing stereoscopic content. Lambooij et al. [5] and Tam et al. [6] provide comprehensive reviews of these topics, and some general rules, such as the zone of comfortable viewing [7,8], have been proposed to predict potential visual discomfort and to guide the production of satisfactory stereo content.

Regarding the visual quality of stereoscopic content, while a number of studies on the perception of distorted stereoscopic content have been conducted and 3D QA models proposed, methods for predicting 3D image quality remain limited in capability, and indeed none has been shown to outperform 2D QA models when predicting the quality of stereoscopically viewed 3D images. For example, human studies [9–11] on distorted stereoscopic content have shown that the perceptual quality of stereo content cannot be expressed as simply the average quality of the left and right views.

Research on 3D QA models can be divided into two classes based on whether computed disparity information is considered. The first group directly applies 2D QA models to the 3D QA problem. The methods in [12,13] do not use disparity information; they apply 2D QA algorithms to the left and right views independently, then combine (by various means) the two scores into a predicted 3D quality score. The models in this class are based on the hypothesis that the quality of a binocularly viewed image may be deduced from the quality of the 2D images without accessing disparity or the third dimension. However, other studies provide evidence that the quality of stereoscopically viewed images is generally different from a simple combination of the qualities of the 2D viewed images. For example, Meegan et al. [10] claimed that the binocular sense of the quality of asymmetric MPEG-2 distorted stereo images is approximately the average of the quality of the two views, but that the perception of asymmetric blur-distorted stereo images is dominated by the higher quality view.

The second class of models takes depth information into account, typically by applying 2D quality assessment (QA) algorithms on both stereo images and also on the estimated disparity map [14–20]. A 3D quality score is then generated using a combination of the various predicted 2D scores. The hypothesis underlying these QA models is that 3D viewing quality is correlated with depth quality. In this direction, Seuntiens [21] coined the term viewing experience to describe the overall sensation of viewing stereoscopic content. This author opines that the quality of a stereoscopic viewing experience is chiefly determined by three factors: image quality, depth quality, and naturalness. However, it is difficult to assess the quality of perceived depth or disparity, since ground truth disparity or depth is generally not available. Such models can only assess the depth quality using estimated disparity maps (computed from a pristine stereopair and/or from a distorted stereopair). Hence 3D QA performance may be substantially affected by the accuracy of the disparity estimation algorithm that is used. Moreover, benchmark tests on stereo algorithms [22] utilize high-quality stereo images, and the performance of stereo algorithms on distorted stereo images is rarely considered.

3D QA models that utilize models of binocular perception are also available. Bensalma et al. [23] proposed a 3D QA algorithm that measures the difference of binocular energy between the reference and tested stereopairs, and thus considers the potential influence of binocularity on perceived 3D quality. However, in their experiment, they only compared the performance of their model with the Peak Signal-to-Noise Ratio (PSNR) and the Structural SIMilarity (SSIM) index, which perform significantly worse than high performance 2D FR QA models such as the Multi-Scale Structural SIMilarity (MS-SSIM) index [24] and visual information fidelity (VIF) [25]. Thus, it is still questionable whether their model can outperform high performance 2D FR QA models in predicting the quality of stereoscopic 3D images. In addition, they only provided performance numbers for JPEG-distorted stereo images. Wang et al. [26] proposed a 3D QA model that is based on the suppression theory of binocular fusion. The basic tenet of the suppression theory is that the binocular visual signal is, in fact, a spatial patchwork of monocular inputs. In other words, in any given spatial region, vision is dominated by either one eye or the other. Within this framework, it is conceivable that spatial detail could be carried by one eye's signal, with the other eye contributing only what is necessary for disparity computations. Unfortunately, as summarized in Howard and Rogers [27], a vast body of the literature has rendered the suppression theory untenable. Thus, any QA metric based on the suppression theory cannot be based on what the human visual system actually does. Ryu et al. [28] also proposed a 3D FR QA model that utilized research on binocular perception. In their model, the 3D quality score is a weighted summation of the quality scores from the left and right views.

We take steps towards ameliorating some of these shortfalls by introducing a 3D QA framework that is based on biologically plausible visual processing. Our proposed model is motivated by the results of studies on masking and facilitation effects experienced when viewing stereoscopic images. In particular, we model the influence of binocular rivalry between the left and right views. Evidence from a series of directed human studies is shown to support the ideas embodied in this new 3D QA framework. A practical 3D QA algorithm is derived and shown to perform well on a large 3D image quality database containing both symmetrically and asymmetrically distorted stereopairs equipped with associated ground truth disparity maps.

The remainder of this paper is organized as follows. Section 2 describes related work on relevant aspects of 3D perception which helps to motivate the models that are used. The overall 3D QA framework is described in Section 3, including the derivation of a practical 3D QA algorithm from the models. Section 4 describes the experiments conducted on the 3D QA database and analyzes the model performance. Finally, Section 5 concludes the paper with a discussion of ideas for future work.


2. Distortion perception on stereo images

In order to explain the ideas underlying the 3D QA modeling framework, we first review some relevant findings on 3D distortion perception (Section 2.1). We then focus on the highly relevant phenomenon of binocular rivalry (Section 2.2) and how it affects the quality of distorted stereo images.

Fig. 2. Illustration of binocular rivalry: two different stimuli are presented to the left eye (an arrow) and the right eye (a star). The blue line indicates that the stimulus is perceived by a human observer inside that time interval. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

2.1. Masking/facilitation of distorted stereo images

Meegan et al. [10] found that the task of conducting subjective 3D QA on MPEG compressed video content can be reduced to conducting 2D spatial QA, but that on blur-distorted content conducting 2D QA is insufficient. Seuntiens et al. [9] later found similar behavior on JPEG compression distorted images, reinforcing the idea that the impact of 2D distortion on 3D content is highly distortion-dependent.

Of course, the interaction of distortion with content can greatly influence distortion visibility and annoyance level, e.g., the well-known masking effect [30]. However, there exists only a vanishingly small literature on depth/disparity masking, and no prior work on the effect of depth/disparity content on distortion visibility. Thus, to gain insight into possible depth masking or binocular effects of stereo 3D distortions, a study was conducted wherein human subjects were asked to identify local distortions embedded in stereopairs [29]. Fig. 1 illustrates a locally distorted stimulus in a 2D image. By varying the position of the local distortion in both views, symmetric and asymmetric distorted stereopairs were created. The study design provided both subjective quality ratings and data directly linked to subject performance (success rate and response times when asked to find distortions). Four diverse distortion types were studied: white noise, blur, JPEG compression, and JP2K compression distortions.

Fig. 1. Image with local white noise distortion. The boundary was blended using a Gaussian blending window. When the image was presented, the subject was requested to point out the distortion by clicking the mouse cursor on the distortion.

Two main observations of use arose from this study. First, we found that the perceived quality of a stereoscopic image cannot be accurately characterized by the average qualities of the left and right images for blur, JPEG, and JP2K distorted stereo images, but that the average may be a good quality predictor for white noise distorted stereo images. Our findings on JPEG distorted images were different from those reported in prior work [9], but are statistically significant. Second, we did not observe any depth masking effect for stereoscopic images, i.e., masking of distortion by depth activity. However, for stereoscopic images distorted by blur, JPEG, or JP2K distortions, we found the surprising opposite: distortions co-located with high depth variations are more easily found by the human subjects, i.e., there exists a facilitation effect.

Given that the quality of a stereoscopically viewed image generally cannot be accurately predicted by the average qualities of the two stereo images for all distortion types, and that it instead depends on the type of distortion, it is sensible to incorporate this observation into a 3D stereoscopic QA model.

2.2. Binocular rivalry

Binocular rivalry is a perceptual effect that occurs when the two eyes view mismatched images at the same retinal location(s). Here, mismatch means that the stimuli received by the two eyes are sufficiently different from each other to cause match failures or to otherwise affect stereoperception. Failures of binocular matching trigger binocular rivalry, which is experienced in various ways, i.e., a sense of failed fusion or a bi-stable alternation between the left and right eye images. Fig. 2 shows an example of binocular rivalry when mismatched stimuli are present. In Fig. 2, in the interval (t0, t1), the observer saw the stimulus from the left eye (the arrow). Then, the stimulus from the right eye (the star) dominated until time t2, after which the observer again saw an arrow. This fluctuation continues while an observer is experiencing binocular rivalry. The fluctuation period may vary from a fraction of a second to several seconds, and it may depend on the color, shape, and texture of the stimuli. Binocular suppression [31] is a special case of binocular rivalry. When binocular suppression1 is experienced, no rivalrous fluctuations occur between the two images when viewing the mismatched stereo stimulus. Instead, only one of the images is seen while the other is hidden from conscious awareness. Fig. 3 shows an example of binocular suppression.

1 Binocular suppression is not the same as the suppression theory of binocular fusion used in Wang's work [26].

Fig. 3. Illustration of binocular suppression: two different stimuli are presented to the left eye (an arrow) and the right eye (a star). An observer only sees the arrow when s/he experiences binocular suppression.

Fig. 4. The proposed framework for 3D quality assessment.

Numerous studies have been conducted towards understanding binocular rivalry/suppression. Currently, three different models are prevalent: early suppression, late (high-level) suppression, and a hybrid model including both early and late processes. The early suppression model [32–34,31,35] suggests that binocular rivalry is the result of competition between the eyes. This model views rivalry as an early visual process involving reciprocal inhibition between the monocular channels.

Research that supports a high-level suppression model [36,37], on the other hand, argues that there is very little correlation between neural activity and perceptual alternations in area V1 of visual cortex. Moreover, there are some early cortical neurons whose activity is anti-correlated with binocular perception; this means that these neurons fire more when acting on suppressed stimuli. Hence, it has been claimed that the rivalry model should be high-level. For example, Alais and Blake [38] showed that grouping information may contribute to binocular rivalry. Finally, since both early and late models are supported by some evidence, more recent research [39] suggests that a hybrid model may be the best explanation. However, these ideas have not previously been applied towards understanding how binocular rivalry might be related to distortion type. Another important finding is that binocular rivalry is a nearly independent local process. A series of papers [40,41,38] discuss whether the binocular rivalry zones function independently, and their findings indicate that binocular rivalry is composed of local processes.

The discussions in Section 2 provide basic concepts that are used in the 3D QA framework that is introduced in the next section.

3. A framework for quality assessment of distorted stereo images

The logical goal of a 3D stereoscopic QA model is to estimate the quality of the true cyclopean image formed within an observer's mind when a stereo image pair is stereoscopically presented. Of course, simulating the true cyclopean image [42] associated with a given stereopair is a daunting task, since it would require accounting for the display geometry, the presumed fixation, vergence, and accommodation. This task is herculean, and is compounded by the fact that it is still unclear how a true cyclopean image is formed! Towards a limited approximation of this goal, however, we seek to synthesize an internal image having a quality level that is close to the quality of the true cyclopean image. By way of notation, henceforth we use the term "cyclopean" image (in quotes) to represent the synthesized image and cyclopean image to mean the one formed in the observer's mind. By performing 3D quality assessment on the synthesized "cyclopean" image we hope to produce accurate estimates of the 3D quality perceived on the true cyclopean image.

The concept underlying the model framework is shown in Fig. 4. Given a stereo image pair, an estimated disparity map is generated by a stereo algorithm, while Gabor filter responses are generated on the stereo images using a bandpass filter bank. A "cyclopean" image is synthesized from the stereo image pair, the estimated disparity map, and the Gabor filter responses. One "cyclopean" image is created from the reference stereopair and another is calculated from the test stereopair. Finally, full reference 2D QA models are applied to the two "cyclopean" images to predict 3D quality scores.
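The pipeline just described can be sketched as follows. This is our illustrative Python sketch, not code from the paper: the callables `estimate_disparity`, `gabor_energy`, and `qa_2d` are placeholders for any stereo algorithm (Section 3.1), local Gabor energy map (Section 3.2), and full-reference 2D QA model, and their names are our own.

```python
import numpy as np

def predict_3d_quality(ref_pair, test_pair, estimate_disparity,
                       gabor_energy, qa_2d):
    """Sketch of the Fig. 4 pipeline: synthesize a "cyclopean" image
    from each stereopair, then compare the two with a 2D FR QA model."""
    def cyclopean(left, right):
        d = estimate_disparity(left, right)        # per-pixel horizontal disparity
        rows = np.arange(left.shape[0])[:, None]
        cols = np.arange(left.shape[1])[None, :]
        xd = np.clip(cols + d, 0, left.shape[1] - 1).astype(int)
        right_c = right[rows, xd]                  # disparity-compensated right view
        ge_l = gabor_energy(left)                  # local stimulus strength, left view
        ge_r = gabor_energy(right)[rows, xd]       # ... right view, at matched locations
        w_l = ge_l / (ge_l + ge_r + 1e-12)         # normalized weights (cf. Eqs. (4)-(5))
        return w_l * left + (1.0 - w_l) * right_c  # linear combination (cf. Eq. (3))

    return qa_2d(cyclopean(*ref_pair), cyclopean(*test_pair))
```

Any high-performance 2D FR model (e.g. MS-SSIM or VIF, as discussed above) can be plugged in as `qa_2d`.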

3.1. Disparity estimation

Research on stereo algorithm (disparity estimation) design has been a topic of intense inquiry for decades. However, there is no consensus on the type of stereo matching algorithm that should be used in 3D QA, other than that it be of low complexity. Further, there is scarce literature on the performance of stereo algorithms operating under different distortion regimens. Therefore, we deploy a variety of efficient stereo depth-finding algorithms, differing considerably in their operational constants, within the framework described above to assess perceived 3D quality.

In order to gain insights into the influence of stereo algorithms on the performance of 3D QA models, three stereo algorithms were selected based on their complexity and performance. In general, better stereo algorithms (based on results on the Middlebury database [22]) have higher computational complexity, and we balanced this tradeoff in the choice of stereo matching models. The first algorithm has the lowest complexity. It uses a very simple sum-of-absolute-differences (SAD) luminance matching functional without a smoothness constraint. The disparity value of a pixel in a stereopair is uniquely computed by minimizing the SAD between this pixel and its horizontally shifted pixels in the other view, with ties broken by selecting the lower disparity solution. The second algorithm [43] has the highest complexity among the three models. This segmentation-based stereo algorithm delivers highly competitive results on the Middlebury database [22]. The third is an SSIM-based stereo algorithm that uses SSIM scores to choose the best matches. The disparity map of a stereopair is generated by maximizing the SSIM scores between the stereopair along the horizontal direction, again resolving ties by a minimum disparity criterion.
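As a concrete illustration of the first (lowest-complexity) matcher, here is a minimal SAD block-matching sketch. The window size, search range, and the sign convention (left pixel x matches right pixel x − d) are our assumptions, not values taken from the paper.

```python
import numpy as np

def sad_disparity(left, right, max_disp=16, patch=5):
    """Brute-force SAD matcher without a smoothness constraint. Scanning
    d in increasing order with a strict '<' comparison breaks ties toward
    the lower disparity, matching the tie rule described above."""
    h, w = left.shape
    r = patch // 2
    pl = np.pad(left.astype(float), r, mode='edge')
    pr = np.pad(right.astype(float), r, mode='edge')
    disp = np.zeros((h, w), dtype=int)
    for y in range(h):
        for x in range(w):
            win_l = pl[y:y + patch, x:x + patch]      # window centered on (y, x)
            best_cost, best_d = np.inf, 0
            for d in range(min(max_disp, x) + 1):     # keep x - d inside the image
                win_r = pr[y:y + patch, x - d:x - d + patch]
                cost = np.abs(win_l - win_r).sum()    # sum of absolute differences
                if cost < best_cost:                  # strict '<': lower d wins ties
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp
```

The SSIM-based variant replaces the SAD cost with a (maximized) SSIM score over the same horizontal search.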

3.2. Gabor filter bank

As discussed earlier, when the two images of a stereopair present different degrees or characteristics of distortion, the subjective quality of the stereoscopically viewed 3D image generally cannot be predicted from the average quality of the two individual images. Binocular rivalry is a reasonable explanation for this observation. Levelt [34] conducted a series of experiments that clearly demonstrated that binocular rivalry/suppression was strongly governed by low-level sensory factors. He used the term stimulus strength, and noted that stimuli that were higher in contrast, or had more contours, tend to dominate the rivalry. Inspired by this result, we use the energy of Gabor filter bank responses on the left and right images to model stimulus strength and to simulate rivalrous selection of "cyclopean" image quality.

The Gabor filter bank extracts features from the luminance and chrominance channels. These filters closely model frequency-orientation decompositions in primary visual cortex and capture energy in a highly localized manner in both space and frequency [44]. A complex 2D Gabor filter is defined as

G(x, y; σ_x, σ_y, ζ_x, ζ_y, θ) = (1/(2π σ_x σ_y)) e^{−(1/2)[(R_1/σ_x)² + (R_2/σ_y)²]} e^{i(x ζ_x + y ζ_y)}   (1)

where R_1 = x cos θ + y sin θ and R_2 = −x sin θ + y cos θ; σ_x and σ_y are the standard deviations of an elliptical Gaussian envelope along the x and y axes, ζ_x and ζ_y are spatial frequencies, and θ orients the filter. The design of the Gabor filter bank was based on the work conducted by Su et al. [45]. The local energy is estimated by summing Gabor filter magnitude responses over four orientations (horizontal, both diagonals, and vertical (90°)) at a spatial frequency of 3.67 cycles/degree, under the viewing model described in Section 4.1.3.
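Eq. (1) and the orientation-summed energy can be transcribed directly into code. This is our sketch: the kernel size and the FFT-based 'same'-size convolution are our implementation choices, coordinates are in pixels, and the conversion from cycles/degree to radians/pixel is left to the viewing geometry of Section 4.1.3.

```python
import numpy as np

def gabor_kernel(size, sigma_x, sigma_y, zeta_x, zeta_y, theta):
    """Complex 2D Gabor filter of Eq. (1): an oriented elliptical
    Gaussian envelope times a complex sinusoid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    r1 = x * np.cos(theta) + y * np.sin(theta)
    r2 = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-0.5 * ((r1 / sigma_x) ** 2 + (r2 / sigma_y) ** 2))
    carrier = np.exp(1j * (x * zeta_x + y * zeta_y))
    return envelope * carrier / (2.0 * np.pi * sigma_x * sigma_y)

def gabor_energy(image, kernels):
    """Local stimulus strength: sum of filter magnitude responses over
    an orientation bank, via zero-padded FFT convolution cropped to
    'same' size."""
    H, W = image.shape
    total = np.zeros((H, W))
    for k in kernels:
        kh, kw = k.shape
        sh = (H + kh - 1, W + kw - 1)
        full = np.fft.ifft2(np.fft.fft2(image, sh) * np.fft.fft2(k, sh))
        total += np.abs(full[kh // 2:kh // 2 + H, kw // 2:kw // 2 + W])
    return total
```

A bank for the four orientations used above is obtained by rotating (ζ_x, ζ_y) together with θ so that the carrier frequency stays aligned with the filter orientation.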

Regarding the choice of the spatial center frequency, Tyler [46] pointed out that the depth signal in human vision is carried within a much smaller bandwidth than is the luminance channel. In addition, Schor et al. [47] found that the stereoacuity of human vision normally falls off quickly when seeing stimuli dominated by spatial frequencies lower than 2.4 cycles/degree. Based on their findings, using filters having spatial center frequencies in the range from 2.4 to 4 cycles/degree should produce responses to which a human observer would be most sensitive.

3.3. Cyclopean image

A linear model was proposed by Levelt [34] to explain the experience of binocular rivalry in the perceived cyclopean image when a stereo stimulus is presented. The model he proposed is

w_l E_l + w_r E_r = C   (2)

where E_l and E_r are the stimuli to the left and the right eye respectively, w_l and w_r are weighting coefficients for the left and right eye that are used to describe the process of binocular rivalry, where w_l + w_r = 1, and C is the cyclopean image.

Given that a foveally presented monocular stimulus generally does not disappear spontaneously, he hypothesized that the duration of a period of dominance of an eye does not depend on the strength of the stimulus presented to that eye, but rather on the stimulus strength presented to the other eye. Therefore, he concluded that the experience of binocular rivalry is not correlated with the absolute stimulus strength of each view, but is instead related to the relative stimulus strengths of the two views. He also proposed a model whereby the weighting coefficients are positively correlated with the stimulus strengths, which we embody in a biologically plausible model whereby the local energies of the responses of a bank of Gabor filters are used to weight the left and right image stimuli. Since binocular rivalry is a local multiscale phenomenon (as discussed in Section 2.2), broadening Levelt's model in this manner is a natural way to simulate a synthesized cyclopean image. In our model, as in Levelt's, the stereo views used to synthesize the cyclopean view are disparity-compensated; the "cyclopean" image is then mapped onto the coordinate system of the left view image. Thus the localized linear model that we use to synthesize a "cyclopean" image is

CI(x, y) = W_L(x, y) · I_L(x, y) + W_R(x + d, y) · I_R(x + d, y)   (3)

where CI is the simulated "cyclopean" image, I_L and I_R are the left and right images respectively, and d is a disparity index that corresponds pixels from I_L to those in I_R. The weights W_L and W_R are computed from the normalized Gabor filter magnitude responses

W_L(x, y) = GE_L(x, y) / [GE_L(x, y) + GE_R(x + d, y)]   (4)

W_R(x + d, y) = GE_R(x + d, y) / [GE_L(x, y) + GE_R(x + d, y)]   (5)

where GE_L and GE_R are the summations of the convolution responses of the left and right images to filters of the form (1). Because of the normalization in (4) and (5), increased Gabor energy of either (the left or right) stimulus suppresses the contribution of the other view when there is binocular rivalry. Finally, the task of 3D QA is performed by applying


Fig. 5. Top: stereo images with local white noise distortion. Bottom: cyclopean image synthesized by the proposed framework.


a full reference 2D QA algorithm on the reference "cyclopean" image and on the test "cyclopean" image.
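The suppression behavior of the normalized weights in Eqs. (3)-(5) can be checked at a single location. The toy computation below is our illustration (scalar intensities and energies stand in for the per-pixel maps):

```python
def cyclopean_pixel(i_l, i_r, ge_l, ge_r):
    """Eqs. (3)-(5) at one location: i_r is the disparity-compensated
    right-view intensity I_R(x+d, y); ge_l and ge_r are the local Gabor
    energies GE_L and GE_R at corresponding locations."""
    w_l = ge_l / (ge_l + ge_r)    # Eq. (4)
    w_r = ge_r / (ge_l + ge_r)    # Eq. (5); note w_l + w_r == 1
    return w_l * i_l + w_r * i_r  # Eq. (3)

# Equal stimulus strengths reduce to a simple average of the two views;
# a high-energy (e.g. noisy) view suppresses the other view's contribution.
balanced = cyclopean_pixel(0.2, 0.8, ge_l=1.0, ge_r=1.0)  # -> midpoint 0.5
skewed = cyclopean_pixel(0.2, 0.8, ge_l=1.0, ge_r=9.0)    # pulled toward i_r
```

This is exactly the mechanism visible in Fig. 5, where a high-energy white noise patch dominates the synthesized "cyclopean" image.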

Fig. 5 shows an example of a synthesized "cyclopean" image. The stereopair in Fig. 5 is locally distorted by white noise patches at different locations in each view. Since the white noise distortion produces an elevated stimulus strength, the synthesized cyclopean image is dominated by the white noise distortion, which approximates the experience when stereoscopically viewing the stereopair.

4. Experiment and discussion

A human study was conducted to construct a subjective data set to be used in assessing algorithms of this type. This section describes the human study and the experiments performed using it.

4.1. Stereoscopic image quality dataset

A stereoscopic image quality dataset annotated with associated subjective quality ratings was constructed using the outcomes of a human study. The details of the dataset and human study are described in the following.

4.1.1. Source images
The stereo images used for the study were captured by

members of the LIVE lab. They captured co-registered stereo images and range data with a high-performance range scanner (RIEGL VZ-400 [48]) with a Nikon D700 digital camera mounted on top. The stereo image pairs were shot with a 65 mm camera base distance. Off-line correction was later applied to deal with translations occurring during capture. The size of the images is 640×360 pixels. The eight pristine images are shown in Fig. 6, while Fig. 7 shows the ground truth depth map of one of them. The eight pairs of stereo images used in this study were taken on the campus of The University of Texas at Austin and a nearby park. The ground truth depth map of each stereopair was transformed to a ground truth disparity map based on the capture setup described above.
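The depth-to-disparity transformation can be sketched with the standard parallel-camera pinhole relation d = f·B/Z. The focal length value below is a placeholder for illustration, not the study's calibration value.

```python
import numpy as np

def depth_to_disparity(Z, baseline_mm=65.0, focal_px=1400.0):
    """Convert a range map Z (mm) to horizontal disparity (pixels) via the
    parallel-camera pinhole relation d = f * B / Z.  `focal_px` is an
    illustrative placeholder; the real value comes from camera calibration."""
    Z = np.asarray(Z, dtype=float)
    return focal_px * baseline_mm / np.maximum(Z, 1e-6)  # guard Z = 0
```

Closer scene points thus receive larger disparities, and disparity falls off inversely with range.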

4.1.2. Participants
Six females and twenty-seven males participated in the experiment, all aged between 22 and 42 years. A Randot stereo test was used to pre-screen participants for normal stereo vision. Each subject reported normal or corrected-to-normal vision, and no acuity or color test was deemed necessary.

4.1.3. Display setting
The study was conducted using a Panasonic 58 in. 3D TV with active shutter glasses. The viewing distance was set at 116 in., which is four times the screen height.

4.1.4. Stimuli
Both symmetric and asymmetric distortions were generated. The simulated distortions include compression using the JPEG and JP2K compression standards, additive white Gaussian noise, Gaussian blur and a fast-fading model based on the Rayleigh fading channel. The degradation of the stimuli was varied by control parameters within pre-defined ranges; the control parameters are reported in Table 1. The ranges of the control parameters were decided beforehand to ensure that the distortions varied from almost invisible to severe, with good overall perceptual separation. For each distortion type, every reference stereopair was distorted to create three symmetric distorted stereopairs and six asymmetric distorted stereopairs. Thus, a total of 360 distorted stereopairs was created.
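For the white-noise and blur cases, stimulus generation along the lines of Table 1 can be sketched as below (JPEG, JP2K, and FF require the corresponding codecs and channel simulators and are omitted). The code is illustrative, not the exact generation pipeline used in the study.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def distort_wn(img, var):
    """Additive white Gaussian noise; Table 1 varies the noise variance
    (here for images scaled to [0, 1])."""
    noisy = img + np.random.normal(0.0, np.sqrt(var), img.shape)
    return np.clip(noisy, 0.0, 1.0)

def distort_blur(img, var):
    """Gaussian blur; Table 1 controls the variance of the Gaussian."""
    return gaussian_filter(img, sigma=np.sqrt(var))

def make_pair(left, right, distort, level_l, level_r):
    """Symmetric (level_l == level_r) or asymmetric distorted stereopair."""
    return distort(left, level_l), distort(right, level_r)
```

An asymmetric stimulus is simply a pair with unequal control-parameter levels for the two views.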


Fig. 6. The eight reference images (only left views are shown) used in the experiment.

Fig. 7. A stereo image (free-fuse the left and right images) and the ground truth disparity map.



Table 1
Range of parameter values for distortion simulation.

Distortion   Control parameter               Range
WN           Variance of Gaussian            [0.001, 0.5]
Blur         Variance of Gaussian            [0.5, 30]
JP2K         Bit-rate                        [0.04, 0.5]
JPEG         Quality parameter               [8, 50]
FF           Channel signal-to-noise ratio   [15, 30]

Fig. 8. Illustration using a 2D image; the study was conducted with 3D stimuli. The subject was asked to rate the overall 3D viewing experience when a 3D stimulus was shown.

Table 2
SROCC scores obtained by averaging left and right QA scores (center column) and using the 3D "cyclopean" model (right column).

Algorithm   2D baseline   "Cyclopean" model
PSNR        0.672         0.762
SSIM        0.796         0.856
MS-SSIM     0.780         0.901
VIF         0.822         0.864

Table 3
LCC scores obtained by averaging left and right QA scores (center column) and using the 3D "cyclopean" model (right column).

Algorithm   2D baseline   "Cyclopean" model
PSNR        0.687         0.783
SSIM        0.804         0.867
MS-SSIM     0.784         0.908
VIF         0.844         0.872


4.1.5. Procedure
We followed the recommendation for a single stimulus continuous quality scale (SSCQS) [49] to collect the 3D subjective image quality of each distorted stereoscopic image. The instruction given to each participant was: give an overall rating based on your viewing experience when viewing the stereoscopic stimuli. The ratings were obtained on a continuous scale labeled by equally spaced adjective terms: bad, poor, fair, good, and excellent, i.e. a Likert scale. The graphical user interface (GUI) used is shown in Fig. 8. The experiment was divided into two sessions, each held to less than 30 min to minimize subject fatigue. A training session using six stimuli was conducted before the beginning of each study to verify that the participants were comfortable with the 3D display and to help familiarize them with the user interface used in the task. The training content was different from the images in the study and was impaired using the same distortions. Questions about the experiment were answered during the training session, and a short post-interview was conducted to determine whether the participant experienced visual discomfort during the experiment. Only two participants reported any visual discomfort.

Table 4
RMSE values obtained by averaging left and right QA scores (center column) and using the 3D "cyclopean" model (right column).

Algorithm   2D baseline   "Cyclopean" model
PSNR        17.67         15.09
SSIM        14.43         12.11
MS-SSIM     15.09         10.20
VIF         13.03         11.89

4.1.6. Subjective quality scores
Difference opinion scores (DOS) were obtained by subtracting the rating that each subject gave a reference stimulus from the rating that the subject gave the corresponding distorted test stimulus. The remaining subjective scores were then normalized to Z-scores, and then averaged across subjects to produce difference mean opinion scores (DMOS).
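Assuming per-subject Z-score normalization (the common practice in such studies, which we take as given here), the DOS-to-DMOS computation can be sketched as:

```python
import numpy as np

def compute_dmos(ratings, ref_ratings):
    """ratings: (S subjects, N distorted stimuli) raw ratings;
    ref_ratings: (S, N) each subject's rating of the matching reference.
    Returns one DMOS value per distorted stimulus."""
    # Difference opinion scores: test rating minus reference rating.
    dos = ratings - ref_ratings
    # Z-score each subject's scores (assumed per-subject normalization),
    # then average across subjects.
    z = (dos - dos.mean(axis=1, keepdims=True)) / dos.std(axis=1, keepdims=True)
    return z.mean(axis=0)
```

Subject rejection (if any) would be applied before the averaging step.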

4.2. Performance against subjective quality ratings

We studied four widely used full-reference 2D QA metrics (PSNR, SSIM [50], VIF [51], and MS-SSIM [24]) as candidate 2D QA methods to be used within the 3D QA framework. This is the final stage of predicting the quality of the cyclopean image. We used Spearman's rank ordered correlation coefficient (SROCC), the linear (Pearson's) correlation coefficient (LCC) and the root-mean-squared error (RMSE) to measure the performance of the 3D QA models thus devised. LCC and RMSE were computed after logistic regression through a non-linearity which is described in [25]. Higher SROCC and LCC values indicate good correlation (monotonicity and accuracy) with human quality judgments, while lower values of RMSE indicate better performance.
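This evaluation protocol can be sketched as below. Following [25], LCC and RMSE are computed after fitting a monotonic logistic mapping; a four-parameter logistic is used here for illustration (the exact non-linearity is the one described in [25]).

```python
import numpy as np
from scipy.stats import spearmanr, pearsonr
from scipy.optimize import curve_fit

def logistic4(x, b1, b2, b3, b4):
    # Monotonic 4-parameter logistic mapping objective scores to DMOS.
    return b2 + (b1 - b2) / (1.0 + np.exp(-(x - b3) / np.abs(b4)))

def evaluate(obj, dmos):
    """Return (SROCC, LCC, RMSE) of objective scores against DMOS."""
    srocc = spearmanr(obj, dmos).correlation
    p0 = [dmos.max(), dmos.min(), np.mean(obj), np.std(obj) + 1e-6]
    params, _ = curve_fit(logistic4, obj, dmos, p0=p0, maxfev=20000)
    fitted = logistic4(obj, *params)
    lcc = pearsonr(fitted, dmos)[0]
    rmse = np.sqrt(np.mean((fitted - dmos) ** 2))
    return srocc, lcc, rmse
```

SROCC is invariant to the mapping; only LCC and RMSE depend on the fitted non-linearity.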

4.2.1. Performance using ground truth disparity map
We begin the performance analysis by using ground truth depth, which minimizes the effects of flaws in the stereo matching algorithms. The performance numbers are shown in Tables 2–4. Also included are the performance numbers arrived at using the same 2D FR QA algorithms, simply applied to the left and right views with the QA scores averaged. The "cyclopean" QA algorithm does significantly better than the 2D baseline QA algorithms on the mixed data set containing both symmetric and asymmetric distorted data.



It is clear from Tables 2 to 4 that MS-SSIM delivers the best performance among the four 2D QA algorithms when embedded in the "cyclopean" model. Fig. 9 breaks down the performance of the "cyclopean" model using MS-SSIM. Clearly, the QA performance is improved on blur and JP2K, as might be expected, since strong binocular rivalry exists in asymmetric blur and JP2K distorted stereo images. The improvement in QA performance for FF distorted images is also significant for similar reasons. For stereo images distorted by white noise, there is no significant difference between the performance of averaged 2D QA and the "cyclopean" model, since binocular rivalry does not occur in white noise distorted stereo images [29]. For JPEG compression distorted stereo images, the performance numbers of the averaged 2D QA and the "cyclopean" model are very close. These results strongly suggest that

binocular rivalry is an important ingredient in subjective stereoscopic QA, and that our "cyclopean" framework successfully captures and utilizes binocular rivalry to predict subjective 3D quality. Fig. 10 plots the predicted quality scores using MS-SSIM (after logistic regression) versus DMOS. Predicted scores from the proposed "cyclopean" framework are shown on the top left, while the bottom-left plot shows the scores from the 2D baseline. Clearly, the predicted scores attained using the "cyclopean" framework are better than the scores predicted by the 2D baseline. Moreover, the prediction errors, measured by root mean square error, of the "cyclopean" framework (RMSE = 10.169) are lower than those of the 2D baseline (RMSE = 14.749).

Fig. 9. SROCC values using MS-SSIM, broken down by distortion type.

Fig. 10. Plot of predicted objective scores versus DMOS and prediction errors. Top left: predictions of the MS-SSIM "cyclopean" framework. Top right: prediction errors of the MS-SSIM "cyclopean" framework. Bottom left: predictions of the MS-SSIM 2D baseline. Bottom right: prediction errors of the MS-SSIM 2D baseline.

To obtain deeper insights into how the performance of the "cyclopean" 3D QA model is improved by accounting for



binocular rivalry, its performance on the separate symmetric and asymmetric distorted stereopairs is reported in Tables 5–7. The performance numbers in Tables 5–7 indicate that the "cyclopean" model did not boost performance on symmetric distorted stereoscopic images. However, performance was greatly enhanced on the asymmetric distorted stereopairs. Furthermore, Tables 5–7 indicate that the task of predicting the quality of asymmetric distorted stereopairs is more difficult than that of predicting the quality of symmetric distorted data.

4.2.2. Influence of stereo matching algorithms
The preceding discussion of the stereoscopic "cyclopean" QA model assumed that highly accurate ground truth depth values are available. Next, we study stereoscopic QA performance when estimated depth is used, as computed by stereo algorithms.

Currently, stereo matching algorithms are generally tested on undistorted stereo images and compared using a simple measure (bad-pixel rate) [22]. The bad-pixel rate

Table 5
SROCC scores relative to human subjective scores, obtained using averaged left-right QA scores (2D baseline) and the "cyclopean" model on symmetric and asymmetric distorted stereopairs.

            Symmetric                          Asymmetric
Algorithm   2D baseline   "Cyclopean" model   2D baseline   "Cyclopean" model
PSNR        0.781         0.819               0.596         0.698
SSIM        0.826         0.850               0.742         0.827
MS-SSIM     0.912         0.929               0.687         0.854
VIF         0.916         0.902               0.737         0.804

Table 6
LCC scores relative to human subjective scores obtained using averaged left-right QA scores (2D baseline) and the "cyclopean" model on symmetric and asymmetric distorted stereopairs.

            Symmetric                          Asymmetric
Algorithm   2D baseline   "Cyclopean" model   2D baseline   "Cyclopean" model
PSNR        0.791         0.825               0.625         0.737
SSIM        0.845         0.882               0.767         0.850
MS-SSIM     0.924         0.937               0.709         0.879
VIF         0.924         0.906               0.772         0.822

Table 7
Fitting errors measured by RMSE obtained using averaged left-right QA scores (2D baseline) and the "cyclopean" model on symmetric and asymmetric distorted stereopairs.

            Symmetric                          Asymmetric
Algorithm   2D baseline   "Cyclopean" model   2D baseline   "Cyclopean" model
PSNR        16.42         15.15               16.83         14.58
SSIM        14.35         12.65               13.85         11.37
MS-SSIM     10.23         9.37                15.20         10.29
VIF         10.23         11.35               13.69         12.27

(BR) is defined as

BR = (1/N) Σ_(x,y) [ |d_C(x, y) − d_T(x, y)| > δ_d ]    (6)

where δ_d is a disparity error tolerance, d_C is the computed disparity map, and d_T is the ground truth disparity map. We use δ_d = 1, as suggested by the authors of [22].
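Equation (6) simply counts the fraction of pixels whose disparity error exceeds the tolerance; a direct sketch:

```python
import numpy as np

def bad_pixel_rate(d_computed, d_truth, delta_d=1.0):
    """Fraction of pixels whose disparity error exceeds delta_d (Eq. (6)).
    delta_d = 1 follows the suggestion of [22]."""
    err = np.abs(d_computed - d_truth)
    return np.mean(err > delta_d)
```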

However, we believe that such metrics provide little or no information regarding perceived 3D image quality. Indeed, no studies have been conducted to determine the degree to which the quality of an estimated disparity map is correlated with subjective judgments of depth. It is likewise unclear whether distortions of stereopairs affect perceived depth quality [11,21].

The bad-pixel rates of the three selected stereo algorithms against ground truth are reported in Table 8. Clearly, all perform equally poorly when applied to distorted images. This lack of robustness is not unexpected, owing to the ill-posedness of the stereo problem, and since none of these (or any other) stereo algorithms has been designed to excel in the presence of distortions. For example, as shown in Fig. 11, white noise confuses the SSIM-based stereo matching algorithm, yet a human observer easily fuses the stereopair. In addition, the ground truth maps that we used were obtained using a high-resolution laser range scanner. The ground truth maps have relatively fine disparity resolution over both smooth and depth-textured regions.

Next, we discuss the influence of poor disparity estimation performance on 3D stereo QA. The performance of the "cyclopean" model using ground truth disparity, estimated disparity, and no disparity information is reported in Table 9. Table 9 shows that there is no significant difference in the performance attained using the ground truth and estimated disparities, although the performance of the very simple SAD-based stereo algorithm is slightly lower than that of the other two stereo algorithms. All three significantly outperform the no-disparity case, indicating that estimated disparities provide useful information when predicting the quality of the stereo 3D images in the database. These results suggest that we should not use bad-pixel rate to evaluate stereo algorithms in the context of 3D image quality assessment algorithm design. Note that the depth signal in human vision occupies a much narrower bandwidth than the luminance spatial channel [52–54], suggesting that a low-resolution disparity map may be adequate for the task of 3D quality assessment.

4.2.3. Comparison with existing 3D QA models
Gorley and Holliman [13] proposed a PSNR-based 3D stereo QA model that does not include depth. Benoit et al. [15] proposed an SSIM-based stereo QA model operating on both stereopairs and disparity maps. You et al. [19] applied

Table 8
Mean bad-pixel rate on 360 distorted stereopairs, with standard deviation in parentheses, for three stereo algorithms.

                 SAD            SSIM            Klaus
Bad-pixel rate   79.8% (9.24)   79.52% (10.7)   78.04% (11.83)


Fig. 11. Depth estimation using the SSIM-based stereo algorithm on noise-distorted stereopairs. Free-fuse the noisy stereo image to see a 3D image.

Table 9
SROCC, LCC, and RMSE relative to human subjective scores attained by the "cyclopean" model using disparity maps computed by different stereo algorithms.

Stereo algorithm       SROCC   LCC     RMSE
Ground truth           0.901   0.907   10.20
SAD                    0.876   0.885   11.29
SSIM                   0.893   0.901   10.58
Klaus                  0.890   0.896   10.80
No depth information   0.817   0.824   13.73

Table 10
SROCC, LCC, and RMSE relative to human subjective scores attained by several 3D QA models using the SSIM-based stereo algorithm.

Algorithm            SROCC   LCC     RMSE
Proposed (MS-SSIM)   0.893   0.901   10.58
Baseline (MS-SSIM)   0.780   0.784   15.09
Benoit [15]          0.728   0.745   16.20
You [19]             0.784   0.797   14.66
Hewage [17]          0.496   0.550   20.29
Gorley [13]          0.158   0.511   20.88


a variety of 2D QA models to stereopairs and disparity maps and tried a number of ways to combine the scores from the stereopairs and disparity maps into overall predicted quality scores. Their best result is also SSIM-based. Hewage and Martini [17] proposed a PSNR-based reduced-reference stereo quality model utilizing disparity. In our simulations, since some of these algorithms require estimated disparity maps from both reference and test stereopairs, we used the SSIM-based stereo algorithm to create the disparity maps.

Table 10 shows the performance of these 3D QA algorithms as compared with the "cyclopean" model. The "cyclopean" model using MS-SSIM delivers the highest performance, followed by the model proposed by You et al., which yields no significant difference relative to the performance of left-right averaged 2D QA using MS-SSIM. The performances of the other three algorithms are lower than this 2D baseline. This is another powerful demonstration of the importance of accounting for binocular rivalry when conducting stereoscopic QA.

4.2.4. Testing with other 3D image datasets
To the best of our knowledge, the number of publicly available stereo 3D image quality datasets is very limited [15,18,55]. Among these three datasets, only the MICT stereo image database [18] includes asymmetrically distorted stereo images. However, the MICT database is comprised only of distorted stereo images, and ground truth disparity maps are not available. The MICT stereo image database has 480 JPEG distorted stereo images and 10 pristine stereo images. The distorted stereo images include both asymmetrically and symmetrically JPEG distorted stereo images. However, a double stimulus impairment scale (DSIS) protocol and a discrete scale were used in the subjective study. Subjects were asked to assess the annoyance they experienced when viewing each distorted stereo image pair against the simultaneously displayed reference image by choosing a rating among the following five options: 5 = imperceptible, 4 = perceptible but not annoying, 3 = slightly annoying, 2 = annoying and 1 = very annoying. The display used was a very small (10 in.) auto-stereoscopic display, and the viewing distance was not provided.

The performance numbers (SROCC, LCC, and MSE) of the 2D and 3D QA models on the MICT database are shown in Table 11. From Table 11, it is clear that the 2D FR model MS-SSIM delivers the best performance among all the compared QA models on the MICT dataset. Neither our cyclopean model nor the other 3D QA models outperform 2D


Table 11
Performance numbers tested against the MICT database. Italicized algorithms are 2D IQA algorithms; the others are 3D IQA algorithms.

Algorithm           SROCC   LCC      MSE
2D PSNR             0.586   0.554    0.971
2D SSIM             0.846   0.862    0.591
2D MS-SSIM          0.935   0.935    0.415
Benoit              0.902   0.910    0.483
You                 0.857   0.864    0.586
Gorley              0.065   −0.022   1.166
Hewage              0.625   0.623    0.912
Cyclopean MS-SSIM   0.862   0.864    0.587


MS-SSIM on the MICT dataset. The discrepancy between the experimental results on our dataset and the results on the MICT dataset may be caused by the poor performance of the disparity estimation algorithm, or by the design or environment of the human study. However, further clarification is not feasible unless ground truth disparity maps for the MICT dataset become available.

5. Conclusion and future work

We presented a new framework for conducting automatic objective 3D QA that delivers highly competitive performance, with a clear advantage when left-right distortion asymmetries are present. The design of the framework is motivated by studies on the perception of distorted stereoscopic images, and by recent theories of binocular rivalry. The "cyclopean" 3D QA model that we derived was tested on the LIVE Asymmetric 3D Image Quality Database, and found to significantly outperform conventional 2D QA models and well-known 3D QA models. The impact of the stereo algorithm used to conduct 3D QA was also discussed. We found that a low-complexity SSIM-based stereo algorithm performs quite well for estimating disparity in the "cyclopean" algorithm, in the sense that a high level of 3D QA performance is maintained.

An important contribution of this work is the demonstration that accounting for binocular rivalry can greatly improve the performance of 3D QA models. Indeed, most of the advantage conveyed by the "cyclopean" model was observed on asymmetric distorted stereopairs. The framework can, therefore, ostensibly be used to evaluate the quality of stereo content that has been compressed using a mixed-resolution coding technique [56,57]. Compressed stereo content that is transmitted over the wireless Internet may be subjected to other asymmetric distortions as well.

To further advance the performance of current 3D QA models, we think that the effects of depth masking and depth quality need to be further studied and addressed. Regarding depth masking, our prior work [29] revealed no depth masking effect when viewing distorted stereopairs. However, we do not regard the results of our prior study as universal, and there remain other distortions to be studied. Furthermore, while we did not find depth masking of distortions, we did find evidence of facilitation, which may prove relevant to 3D QA.

Regarding the role of computed disparity, prior models utilizing disparity maps derived from reference and test stereopairs have generally failed to deliver better QA performance than 2D QA models applied to the individual stereo images. Of course, the disparity cue is not the only one used by the human visual system to perceive depth. For example, monocular cues such as occlusion, relative size, texture gradient, perspective distortion, lighting, shading, and motion parallax [58] all affect the perception of depth. It is not yet clear how the brain integrates all of these cues to produce an overall sensation of depth [59].

The influence of distortions on perceived depth quality also remains an open question. While Seuntiens et al. [9] claimed that JPEG encoding has no effect on perceived depth, other recent research suggests that perceived depth quality is affected by both blur and white noise distortion, although the influence of distortion on perceived depth is less than its influence on perceived image quality [60]. Another recent study showed that, when viewing stereoscopic videos compressed by an H.264/AVC encoder using a range of QP values, perceived depth quality remained constant for some subjects, but varied with perceived image quality for others [61]. Subject agreement on perceived depth quality was much lower than on perceived image quality. Clearly, more research is merited on how perceived depth quality is affected by different distortion types, and on what kinds of depth cues are most strongly correlated with the reduced quality of depth perception when viewing distorted stereopairs.

References

[1] C. Wheatstone, Contributions to the physiology of vision. Part the first. On some remarkable, and hitherto unobserved phenomena of binocular vision, Philosophical Transactions of the Royal Society of London 128 (1838) 371–394.
[2] List of 3D Movies. URL ⟨http://en.wikipedia.org/wiki/List_of_3-D_films⟩.
[3] ESPN, ESPN 3D Broadcasting Schedule. URL ⟨http://espn.go.com/3d/schedule.html⟩.
[4] S.J. Daly, R.T. Held, D.M. Hoffman, Perceptual issues in stereoscopic signal processing, IEEE Transactions on Broadcasting 57 (2) (2011) 347–361.
[5] M.T.M. Lambooij, W.A. Ijsselsteijn, I. Heynderickx, Visual discomfort in stereoscopic displays: a review, Proceedings of SPIE 6490 (2007) 17.
[6] W.J. Tam, F. Speranza, S. Yano, K. Shimono, H. Ono, Stereoscopic 3D-TV: visual comfort, IEEE Transactions on Broadcasting 57 (2) (2011) 335–346.
[7] N.A. Valius, Stereoscopy, Focal Press, New York, London, 1966.
[8] T. Shibata, J. Kim, D.M. Hoffman, M.S. Banks, The zone of comfort: predicting visual discomfort with stereo displays, Journal of Vision 11 (8) (2011) 1–29.
[9] P. Seuntiens, L. Meesters, W. Ijsselsteijn, Perceived quality of compressed stereoscopic images: effects of symmetric and asymmetric JPEG coding and camera separation, ACM Transactions on Applied Perception 3 (2) (2006) 95–109.
[10] D.V. Meegan, L.B. Stelmach, W.J. Tam, Unequal weighting of monocular inputs in binocular combination: implications for the compression of stereoscopic imagery, Journal of Experimental Psychology: Applied 7 (2001) 143–153.
[11] W.J. Tam, L.B. Stelmach, P.J. Corriveau, Psychovisual aspects of viewing stereoscopic video sequences, Proceedings of SPIE 3295 (1998) 226–235.
[12] S.L.P. Yasakethu, C.T.E.R. Hewage, W.A.C. Fernando, A.M. Kondoz, Quality analysis for 3D video using 2D video quality models, IEEE Transactions on Consumer Electronics 54 (4) (2008) 1969–1976.
[13] P. Gorley, N. Holliman, Stereoscopic image quality metrics and compression, Proceedings of SPIE 6803 (2008) 05.
[14] Z. Zhu, Y. Wang, Perceptual distortion metric for stereo video quality evaluation, WSEAS Transactions on Signal Processing 5 (7) (2009) 241–250.
[15] A. Benoit, P. Le Callet, P. Campisi, R. Cousseau, Quality assessment of stereoscopic images, EURASIP Journal on Image and Video Processing 2008 (2009) 1–13.
[16] Y. Jiachen, H. Chunping, Z. Yuan, Z. Zhuoyun, G. Jichang, Objective quality assessment method of stereo images, in: 3DTV Conference: The True Vision—Capture, Transmission and Display of 3D Video, 2009, pp. 1–4.
[17] C. Hewage, M. Martini, Reduced-reference quality metric for 3D depth map transmission, in: 3DTV Conference: The True Vision—Capture, Transmission and Display of 3D Video, 2010, pp. 1–4. http://dx.doi.org/10.1109/3DTV.2010.5506205.
[18] R. Akhter, J. Baltes, Z.M. Parvez Sazzad, Y. Horita, No-reference stereoscopic image quality assessment, Proceedings of SPIE 7524, Stereoscopic Displays and Applications XXI (2010) 75240T. http://dx.doi.org/10.1117/12.838775.
[19] J. You, L. Xing, A. Perkis, X. Wang, Perceptual Quality Assessment for Stereoscopic Images Based on 2D Image Quality Metrics and Disparity Analysis, 2010.
[20] A. Maalouf, M.-C. Larabi, CYCLOP: a stereo color image quality assessment metric, in: IEEE International Conference on Acoustics, Speech and Signal Processing, 2011, pp. 1161–1164.
[21] P. Seuntiens, Visual Experience of 3D TV, 2006.
[22] D. Scharstein, R. Szeliski, A taxonomy and evaluation of dense two-frame stereo correspondence algorithms, International Journal of Computer Vision 47 (2002) 7–42.
[23] R. Bensalma, M.-C. Larabi, A perceptual metric for stereoscopic image quality assessment based on the binocular energy, Multidimensional Systems and Signal Processing (2012) 1–36.
[24] Z. Wang, E.P. Simoncelli, A.C. Bovik, Multiscale structural similarity for image quality assessment, in: Asilomar Conference on Signals, Systems and Computers, vol. 2, 2003, pp. 1398–1402.
[25] H. Sheikh, M. Sabir, A. Bovik, A statistical evaluation of recent full reference image quality assessment algorithms, IEEE Transactions on Image Processing 15 (11) (2006) 3440–3451. http://dx.doi.org/10.1109/TIP.2006.881959.
[26] X. Wang, S. Kwong, Y. Zhang, Considering binocular spatial sensitivity in stereoscopic image quality assessment, in: Visual Communications and Image Processing (VCIP), 2011, pp. 1–4. http://dx.doi.org/10.1109/VCIP.2011.6116015.
[27] I.P. Howard, B.J. Rogers, Binocular Vision and Stereopsis, Oxford University Press, 1995.
[28] S. Ryu, D.-H. Kim, K. Sohn, Stereoscopic image quality metric based on binocular perception model, in: IEEE International Conference on Image Processing.
[29] M.-J. Chen, A.C. Bovik, L.K. Cormack, Study on distortion conspicuity in stereoscopically viewed 3D images, in: IEEE 10th IVMSP Workshop, 2011, pp. 24–29.
[30] A.B. Watson, J.A. Solomon, Model of visual contrast gain control and pattern masking, Journal of the Optical Society of America A 14 (9) (1997) 2379–2391. http://dx.doi.org/10.1364/JOSAA.14.002379.
[31] R. Blake, D.H. Westendorf, R. Overton, What is suppressed during binocular rivalry? Perception 9 (2) (1980) 223–231.
[32] I.T. Kaplan, W. Metlay, Light intensity and binocular rivalry, Journal of Experimental Psychology 67 (1) (1964) 22–26.
[33] P. Whittle, Binocular rivalry and the contrast at contours, Quarterly Journal of Experimental Psychology 17 (3) (1965) 217–226.
[34] W.J.M. Levelt, On Binocular Rivalry, Mouton, The Hague, Paris, 1968.
[35] M. Fahle, Binocular rivalry: suppression depends on orientation and spatial frequency, Vision Research 22 (7) (1982) 787–800.
[36] N.K. Logothetis, J.D. Schall, Neuronal correlates of subjective visual perception, Science (New York, NY) 245 (4919) (1989) 761–763.
[37] D.A. Leopold, N.K. Logothetis, Activity changes in early visual cortex reflect monkeys' percepts during binocular rivalry, Nature 379 (6565) (1996) 549–553.
[38] D. Alais, R. Blake, Grouping visual features during binocular rivalry, Vision Research 39 (26) (1999) 4341–4353.
[39] R. Blake, N.K. Logothetis, Visual competition, Nature Reviews Neuroscience 3 (1) (2002) 13–21.
[40] D.J. Field, A. Hayes, R.F. Hess, Contour integration by the human visual system: evidence for a local "association field", Vision Research 33 (2) (1993) 173–193.
[41] M.K. Kapadia, M. Ito, C.D. Gilbert, G. Westheimer, Improvement in visual sensitivity by changes in local context: parallel studies in human observers and in V1 of alert monkeys, Neuron 15 (4) (1995) 843–856.
[42] B. Julesz, Foundations of Cyclopean Perception, University of Chicago Press, 1971.
[43] A. Klaus, M. Sormann, K. Karner, Segment-based stereo matching using belief propagation and a self-adapting dissimilarity measure, in: Proceedings of the International Conference on Pattern Recognition, vol. 3, 2006, pp. 15–18.
[44] D.J. Field, Relations between the statistics of natural images and the response properties of cortical cells, Journal of the Optical Society of America A 4 (12) (1987) 2379–2394.
[45] C.-C. Su, A.C. Bovik, L.K. Cormack, Natural scene statistics of color and range, in: IEEE International Conference on Image Processing.
[46] C.W. Tyler, Stereoscopic depth movement: two eyes less sensitive than one, Science (New York, NY) 174 (4012) (1971) 958–961.
[47] C. Schor, I. Wood, J. Ogawa, Binocular sensory fusion is limited by spatial resolution, Vision Research 24 (7) (1984) 661–665.
[48] RIEGL, RIEGL VZ-400 3D Terrestrial Laser Scanner. URL ⟨http://rieglusa.com/products/terrestrial/vz-400/index.shtml⟩.
[49] ITU-R Assembly, International Telecommunication Union, Methodology for the Subjective Assessment of the Quality of Television Pictures, International Telecommunication Union, Geneva, Switzerland, 2003.
[50] Z. Wang, A.C. Bovik, H.R. Sheikh, E.P. Simoncelli, Image quality assessment: from error visibility to structural similarity, IEEE Transactions on Image Processing 13 (4) (2004) 600–612.
[51] H.R. Sheikh, A.C. Bovik, G. de Veciana, An information fidelity criterion for image quality assessment using natural scene statistics, IEEE Transactions on Image Processing 14 (12) (2005) 2117–2128.
[52] F.W. Campbell, D.G. Green, Optical and retinal factors affecting visual resolution, Journal of Physiology 181 (1965) 576–593.
[53] B.N. Vlaskamp, G. Yoon, M.S. Banks, Neural and optical constraints on stereoacuity, in: Perception 37 ECVP Abstract Supplement, 2008, p. 2.
[54] F. Allenmark, J. Read, Spatial stereoresolution, Journal of Vision 9 (8) (2009) 262.
[55] A.K. Moorthy, C. Su, A. Mittal, A.C. Bovik, Subjective evaluation of stereoscopic visual quality, Signal Processing: Image Communication, Special Issue on Biologically Inspired Approaches for Visual Information Processing and Analysis, to be published.
[56] M.G. Perkins, Data compression of stereopairs, IEEE Transactions on Communications 40 (4) (1992) 684–696.
[57] A. Vetro, A.M. Tourapis, K. Muller, C. Tao, 3D-TV content storage and transmission, IEEE Transactions on Broadcasting 57 (2) (2011) 384–394.
[58] R. Sekuler, R. Blake, Perception, A.A. Knopf, New York, 1985.
[59] J. Burge, M.A. Peterson, S.E. Palmer, Ordinal configural cues combine with metric disparity in depth perception, Journal of Vision 5 (6) (2005). http://dx.doi.org/10.1167/5.6.5.
[60] M. Lambooij, W. Ijsselsteijn, D.G. Bouwhuis, I. Heynderickx, Evaluation of stereoscopic images: beyond 2D quality, IEEE Transactions on Broadcasting 57 (2) (2011) 432–444.
[61] M.-J. Chen, D.-K. Kwon, A.C. Bovik, Study of subject agreement on stereoscopic video quality, in: Proceedings of the IEEE Southwest Symposium on Image Analysis and Interpretation.