On the Impact of the Error Measure Selection in Evaluating Disparity Maps
Ivan Cabezas, Victor Padilla, Maria Trujillo and Margaret Florian
[email protected]
June 27th 2012, World Automation Congress, ISIAC, Puerto Vallarta, Mexico
Slide 2
Multimedia and Vision Laboratory
MMV is a research group of the Universidad del Valle in Cali, Colombia
On the Impact of the Error Measure Selection in Evaluating Disparity Maps, WAC – ISIAC, 2012, Puerto Vallarta, Mexico
Multimedia and Vision Laboratory Research: http://mmv-lab.univalle.edu.co
[Figure: a camera system projects the 3D world onto 2D images (the optics problem); recovering the 3D world from 2D images is the inverse problem]
Slide 3
Content
Stereo Vision
Application Domains
The Impact of Inaccurate Disparity Estimation
Quantitative Evaluation
Commonly Used Evaluation Measures
Error Measure Function
Error Measures Purpose and Meaning
Research Problem
Comparative Performance Scenario
Middlebury's Evaluation Model
A* Evaluation Model
Research Questions
Algorithm to Measure the Consistency
Consistency According to Evaluation Models
Conclusions
Slide 4
Stereo Vision
The stereo vision problem is to recover the 3D structure of a scene
[Figure: pipeline: left and right stereo images → correspondence algorithm → disparity map → reconstruction algorithm → 3D model]
Yang Q. et al., Stereo Matching with Colour-Weighted Correlation, Hierarchical Belief Propagation, and Occlusion Handling, IEEE PAMI 2009
Scharstein D. and Szeliski R., High-accuracy Stereo Depth Maps using Structured Light, CVPR 2003
[Figure: a disparity map assigns each point p1 … pn of the left image point set P_L a disparity value d: P_L → {0, 1, …, d_max}; stereo geometry with camera centres Cl and Cr, image planes πl and πr, corresponding points pl and pr, baseline B, focal length f and depth Z]
Slide 5
Application Domains
3D recovering has multiple application domains
Whitehorn M., Vincent T., Debrunner C., and Steele J., Stereo Vision in LHD Automation, IEEE Trans. on Industry Applications, 2003
Van der Mark W. and Gavrila D., Real-Time Dense Stereo for Intelligent Vehicles, IEEE Trans. on Intelligent Transportation Systems, 2006
Point Grey Research Inc., www.ptgrey.com
Slide 6
The Impact of Inaccurate Disparity Estimation
Disparity is the distance between corresponding points
Trucco, E. and Verri A., Introductory Techniques for 3D Computer Vision, Prentice Hall 1998
Accurate Disparity Estimation vs. Inaccurate Disparity Estimation

[Figure: two stereo geometry diagrams with camera centres Cl and Cr, image planes πl and πr, baseline B and focal length f; an inaccurate match pr' reconstructs a point P' at an erroneous depth Z' instead of the true point P at depth Z]
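Since Z = B·f/d, a fixed disparity error causes a depth error that grows with distance. A small illustration (baseline and focal-length values are made up):

```python
def depth(d, B=0.1, f=500.0):
    """Depth from disparity, Z = B * f / d (illustrative B and f)."""
    return B * f / d

# The same 1 px disparity error at a near point (d = 20) and a far point (d = 2)
near_error = depth(19.0) - depth(20.0)  # ~0.13 m
far_error = depth(1.0) - depth(2.0)     # 25.0 m
```

A one-pixel mismatch barely moves a nearby point but displaces a distant one by tens of metres, which is why inaccurate disparity estimation matters most for far scene structure.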
Slide 7
Quantitative Evaluation
Szeliski, R., Prediction Error as a Quality Metric for Motion and Stereo, ICCV 2000
Kostliva, J., Cech, J., and Sara, R., Feasibility Boundary in Dense and Semi-Dense Stereo Matching, CVPR 2007
Tombari, F., Mattoccia, S., and Di Stefano, L., Stereo for Robots: Quantitative Evaluation of Efficient and Low-memory Dense Stereo Algorithms, ICARCV 2010
Cabezas, I. and Trujillo, M., A Non-Linear Quantitative Evaluation Approach for Disparity Estimation, VISAPP 2011
Cabezas, I., Trujillo, M., and Florian, M., An Evaluation Methodology for Stereo Correspondence Algorithms, VISAPP 2012
The use of an evaluation methodology makes it possible to:
Assess specific components and procedures
Tune algorithms' parameters
Measure the progress in the field
Slide 8
Commonly Used Evaluation Measures
There are different evaluation measures
Sigma Z Error, SZE
Cabezas, I., Padilla, V., and Trujillo M., A Measure for Accuracy Disparity Maps Evaluation, CIARP 2011
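The measures named on this and the following slides can be sketched as below. The MAE, MSE, MRE and BMP formulations are the standard ones; the SZE form (accumulated depth error via Z = B·f/d) is an assumption loosely based on the cited CIARP 2011 paper, and delta is the bad-pixel threshold, typically 1 px:

```python
import numpy as np

def error_measures(d_est, d_gt, delta=1.0, B=0.1, f=500.0):
    """Disparity-map error measures (sketch).
    d_est, d_gt: arrays of estimated / ground-truth disparities (> 0)."""
    diff = np.abs(d_est - d_gt)
    mae = diff.mean()                      # Mean Absolute Error
    mse = (diff ** 2).mean()               # Mean Squared Error
    mre = (diff / d_gt).mean()             # Mean Relative Error
    bmp = 100.0 * (diff > delta).mean()    # Bad Matched Pixels (%)
    sze = np.abs(B * f / d_est - B * f / d_gt).sum()  # Sigma Z Error (assumed form)
    return mae, mse, mre, bmp, sze
```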
Slide 9
Error Measure Function
Yang Q. et al., Stereo Matching with Colour-Weighted Correlation, Hierarchical Belief Propagation, and Occlusion Handling, IEEE PAMI 2009
Scharstein D. and Szeliski R., High-accuracy Stereo Depth Maps using Structured Light, CVPR 2003
[Figure: error criteria as masks over a test-bed image: non-occluded pixels (nonocc), all pixels (all) and pixels near depth discontinuities (disc), applied to estimated and ground-truth disparity maps]

Measure   nonocc   all      disc
MAE       0.41     1.48     0.70
MSE       1.48     33.97    4.25
MRE       0.01     0.03     0.02
BMP       2.90     8.78     7.79
SZE       71.39    341.55   37.86
Slide 10
Error Measures Purpose and Meaning
In practice, different error measures are used for the same purpose: finding a distance between estimated and ground-truth disparity data
However, they have different meanings, as well as different properties
Slide 11
Research Problem
The use of different error measures may produce contradictory error scores
Scharstein, D. and Szeliski, R., A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms, IJCV 2002
Scharstein, D. and Szeliski, R., http://vision.middlebury.edu/stereo/eval/, 2012
[Figure: disparity maps computed by the RDP and ADCensus algorithms on the Teddy and Cones stereo pairs]
Slide 12
Comparative Performance Scenario
Four stereo image pairs: Tsukuba, Venus, Teddy, Cones
Three error criteria: nonocc, all, disc
112 Stereo Correspondence Algorithms
Two evaluation models: Middlebury and A*
k: a threshold for determining the top-performing algorithms in the Middlebury evaluation model
Slide 13
Middlebury's Evaluation Model
Compute Error Measures
Algorithm        nonocc  all    disc
ObjectStereo     2.20    6.99   6.36
GC+SegmBorder    4.99    5.78   8.66
PUTv3            2.40    9.11   6.56
PatchMatch       2.47    7.80   7.11
ImproveSubPix    2.96    8.22   8.55

Apply Evaluation Model

Algorithm        nonocc    all       disc
ObjectStereo     2.20 (1)  6.99 (2)  6.36 (1)
GC+SegmBorder    4.99 (5)  5.78 (1)  8.66 (5)
PUTv3            2.40 (2)  9.11 (5)  6.56 (2)
PatchMatch       2.47 (3)  7.80 (3)  7.11 (3)
ImproveSubPix    2.96 (4)  8.22 (4)  8.55 (4)

Algorithm        Average Rank  Final Rank
ObjectStereo     1.33          1
PatchMatch       3.00          2
PUTv3            3.33          3
GC+SegmBorder    3.66          4
ImproveSubPix    4.00          5
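The ranking scheme (rank each algorithm per error criterion, then average the ranks) can be sketched as follows; tie handling and the full set of image pairs used by the real table are ignored here:

```python
def middlebury_rank(scores):
    """Order algorithms by average per-criterion rank (lower error is better).
    scores: {algorithm: (nonocc, all, disc)} error percentages."""
    algos = list(scores)
    n_criteria = len(next(iter(scores.values())))
    avg_rank = {a: 0.0 for a in algos}
    for c in range(n_criteria):
        # Rank algorithms on criterion c, 1 = lowest error
        ordered = sorted(algos, key=lambda a: scores[a][c])
        for rank, a in enumerate(ordered, start=1):
            avg_rank[a] += rank / n_criteria
    return sorted(algos, key=lambda a: avg_rank[a])

scores = {
    "ObjectStereo":  (2.20, 6.99, 6.36),
    "GC+SegmBorder": (4.99, 5.78, 8.66),
    "PUTv3":         (2.40, 9.11, 6.56),
    "PatchMatch":    (2.47, 7.80, 7.11),
    "ImproveSubPix": (2.96, 8.22, 8.55),
}
print(middlebury_rank(scores)[0])  # ObjectStereo
```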
Scharstein, D. and Szeliski, R., http://vision.middlebury.edu/stereo/eval/, 2012
Slide 14
A* Evaluation Model
The A* evaluation model performs a partitioning of the stereo algorithms under evaluation, based on the Pareto Dominance relation
Compute Error Measures
Algorithm        nonocc  all    disc
ObjectStereo     2.20    6.99   6.36
GC+SegmBorder    4.99    5.78   8.66
PUTv3            2.40    9.11   6.56
PatchMatch       2.47    7.80   7.11
ImproveSubPix    2.96    8.22   8.55

Apply Evaluation Model

Algorithm        nonocc  all    disc   Set
ObjectStereo     2.20    6.99   6.36   A*
GC+SegmBorder    4.99    5.78   8.66   A*
PUTv3            2.40    9.11   6.56   A'
PatchMatch       2.47    7.80   7.11   A'
ImproveSubPix    2.96    8.22   8.55   A'

[Figure: the non-dominated set A* = {ObjectStereo, GC+SegmBorder} and the dominated set A' = {PUTv3, PatchMatch, ImproveSubPix}]
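A minimal sketch of the Pareto-dominance partition (an algorithm is dominated if another is no worse on every criterion and strictly better on at least one):

```python
def dominates(a, b):
    """True if score vector a Pareto-dominates b (lower error is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def a_star_partition(scores):
    """Split algorithms into the non-dominated set A* and the dominated set A'."""
    a_star, a_prime = [], []
    for name, vec in scores.items():
        if any(dominates(other, vec)
               for o_name, other in scores.items() if o_name != name):
            a_prime.append(name)
        else:
            a_star.append(name)
    return a_star, a_prime

scores = {
    "ObjectStereo":  (2.20, 6.99, 6.36),
    "GC+SegmBorder": (4.99, 5.78, 8.66),
    "PUTv3":         (2.40, 9.11, 6.56),
    "PatchMatch":    (2.47, 7.80, 7.11),
    "ImproveSubPix": (2.96, 8.22, 8.55),
}
a_star, a_prime = a_star_partition(scores)
print(a_star)  # ['ObjectStereo', 'GC+SegmBorder']
```

Note that, unlike the Middlebury ranking, this model does not impose a total order: all members of A* are mutually incomparable top performers.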
Slide 15
Research Questions
What is the impact of using one error measure instead of another? Different evaluation results are obtained using different error measures
Scharstein, D. and Szeliski, R., A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms, IJCV 2002
Scharstein, D. and Szeliski, R., http://vision.middlebury.edu/stereo/eval/, 2012
[Figure: evaluation results under the Middlebury model and under the A* model for different error measures]
Slide 16
Research Questions (ii)

How should an error measure be chosen? A characterisation of error measures may serve as selection criteria. An error measure should be:

Automatic: it is computed without human intervention
Reliable: it operates deterministically, without being influenced by external factors
Meaningful: it is intended for a particular purpose, has a concise interpretation and does not lead to ambiguous results
Unbiased: it is capable of accomplishing the measurements for which it was conceived, and its use allows impartial comparisons
Consistent: the scores it produces should be compatible with the scores produced by another error measure with the same purpose
Slide 17
Algorithm to Measure the Consistency
Consistency is measured by determining the percentage of agreement in the obtained results when the error measure is varied
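A sketch of this idea: run the evaluation once per error measure and count how often each pair of measures agrees on an algorithm's outcome. The pairwise-agreement formulation below is an assumption; the paper's exact algorithm may differ:

```python
from itertools import combinations

def consistency(outcomes):
    """outcomes: {measure: {algorithm: label}}, where label is the
    evaluation result under that error measure (e.g. 'A*' or "A'").
    Returns {(m1, m2): percentage of algorithms with the same label}."""
    result = {}
    for m1, m2 in combinations(outcomes, 2):
        algos = outcomes[m1].keys()
        agree = sum(outcomes[m1][a] == outcomes[m2][a] for a in algos)
        result[(m1, m2)] = 100.0 * agree / len(algos)
    return result

# Toy example with hypothetical algorithm names
outcomes = {
    "MRE": {"alg1": "A*", "alg2": "A'", "alg3": "A'"},
    "SZE": {"alg1": "A*", "alg2": "A'", "alg3": "A*"},
}
pct = consistency(outcomes)[("MRE", "SZE")]  # agreement on 2 of 3 algorithms
```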
Slide 18
Consistency According to Evaluation Models
The MRE, followed by the MSE, showed the highest percentages of consistency under the Middlebury model
The SZE, followed by the MRE, showed the highest percentages of consistency under the A* model
[Figure: consistency percentages of the error measures under the Middlebury model and under the A* model]
Slide 19
Conclusions
Using the Middlebury evaluation model, the MRE and the MSE showed high consistency
Using the A* evaluation model, the SZE and the MRE showed high consistency
The BMP showed low consistency in both evaluation models
A characterisation of error measures was presented in order to support the selection of an error measure
It includes the following attributes: automatic, reliable, meaningful, unbiased, and consistent
Experimental evaluation was focused on measuring consistency
The selection of an error measure is not a trivial issue, since it impacts the results obtained during a disparity map evaluation process