Stereo Vision for Obstacle Detection: a Graph-Based Approach

MIVIA: Laboratorio di Macchine Intelligenti per il riconoscimento di Video, Immagini e Audio (Laboratory of Intelligent Machines for Video, Image and Audio Recognition)

P. Foggia, Jean-Michel Jolion, A. Limongiello, M. Vento

6th IAPR – TC-15 Workshop on Graph-based Representations in Pattern Recognition (GbR 2007)

Alicante, June 11-13, 2007

DIIIE - University of Salerno

INSA - Lyon

Alessandro Limongiello – “Stereo Vision for Obstacle Detection: a Graph-Based Approach” – GbR 07 – Alicante June 11-13, 2007

Obstacle Detection

Automatic analysis of a dynamic video stream acquired from a mobile platform, for obstacle detection in an unstructured environment

Why it is a hard task:
• mechanical vibrations of the cameras
• changing light
• no information about the environment
• no information about the obstacles
• low execution time


Obstacle Detection


Obstacle detection is possible after a good representation of the environment

A good representation must be related to our goal

Obstacle Detection

To have a 3D representation of the scene, we consider the Stereo Vision paradigm [3, 4]

We can obtain information on the depth of the pixels starting from two different views of the same scene


Outline

Related works:
• A comparison
• Open problems

Our approach:
• The rationale
• The algorithm

Results

Conclusions



Traditionally, researchers have applied 3D reconstruction approaches in the Autonomous Navigation framework, based on a punctual (point-wise) matching between the image pair

For obstacle detection, a good reconstruction of the surfaces is not very important; what matters is to identify adequately the space occupied by each object

Related Works

A good taxonomy has been given by Scharstein and Szeliski (IJCV 2002) [3] and by Cha Zhang (2002) [4]

Selected algorithms:
• Sum of Squared Differences (SSD)
• Dynamic Programming (DP)
• Graph Cut (GC)
• Dense Features (DF)


Feature-based approaches provide a correspondence between feature points (such as corners, edges, etc.)
• Recently they have been disregarded because they produce a sparse depth map, defined only at the feature points, which is of limited use in real applications
• They are normally fast and sufficiently stable in real contexts

Area-based techniques provide a correspondence for each point of the image pair, so they produce a dense depth map, which is not strictly necessary in AMR
• They are generally time consuming
• They assume strong geometrical constraints (i.e. horizontal epipolar line)

Related Works



Local approaches compute the solution for a pixel without considering the solution on the rest of the image
• They are faster than the Global approaches
• They have problems with repetitive and uniform patterns
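The behaviour of a local method can be illustrated with a minimal SSD sketch. It assumes one grayscale scanline per image, stored as a Python list of equal length; the function name `ssd_disparity` and its parameters are hypothetical and are not taken from the cited works.

```python
def ssd_disparity(left_row, right_row, x, half_window, max_disparity):
    """Return (d, cost): the shift d in [0, max_disparity] that minimizes the
    sum of squared differences between the window centred at x in left_row
    and the window centred at x - d in right_row."""
    best_d, best_cost = 0, float("inf")
    for d in range(max_disparity + 1):
        # Skip shifts whose window would fall outside the scanline.
        if x - d - half_window < 0 or x + half_window >= len(left_row):
            continue
        cost = sum(
            (left_row[x + k] - right_row[x - d + k]) ** 2
            for k in range(-half_window, half_window + 1)
        )
        if cost < best_cost:  # ties keep the smallest shift
            best_d, best_cost = d, cost
    return best_d, best_cost
```

On a textured scanline the minimum is unique. On a uniform scanline every shift has cost 0, so the returned value is arbitrary, which is the weakness noted above.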


Related Works

Global approaches compute the solution for the whole image, trying to optimize a global solution
• They are slower than the Local approaches (not real time)
• They are more robust with repetitive and uniform patterns and with small local perturbations


Approach   Repetitive/Uniform   Constraints on the input   Low Execution Time   Local perturbation
SSD        No                   Yes                        Yes                  No
DP         middle               Yes                        middle               middle
GC         Yes                  Yes                        No                   Yes
DF         Yes                  Yes                        middle               Yes


Related Works: a comparison


Punctual matching between the image pair is unsuitable in some realistic settings (texture-less regions)

The motion of the robot produces mechanical vibrations of the cameras, with a consequent loss of the epipolar line constraint


Related Works: open problems


The left and right images may have different acquisition conditions (lighting, focus, digitization noise, etc.)

Punctual matching is typically time consuming: matching is performed for each pixel of the image, and good results require computationally expensive optimization functions


Related Works: open problems


Our approach: The Rationale


In some cases we cannot have enough information to find the correspondence by looking at just a single pixel.

For example, pixels inside homogeneous areas, or pixels suffering from perspective or photometric distortions, digitization errors, or vibration of the cameras.

Our idea is to treat the stereo matching problem as a matching between homologous regions (instead of pixels)


Our approach: The Rationale


We start from the projection of a region (and not of a point) on the stereo pair

We determine a single disparity value for the whole region, so we define an approximation of the disparity property: the horizontal displacement between the regions

[Figure: a scene region R projected onto the left and right images Il and Ir as regions Rl and Rr; labels Cl, Cr and f]


Our approach: The Algorithm


The algorithm is based on a graph representation of the stereo pair and on a stereo registration of regions using graph matching


Our approach: The Algorithm

The left and right images are segmented, and each area identifies a node of a graph

A bipartite graph matching between the two graphs is computed in order to match each area of the left image with only one area of the right image

This process yields a list of reliably matched areas and a list of so-called don’t care areas.


[Figure: flowchart of the algorithm. Blocks: Left Image and Right Image, Segmentation, Graph Representation, Recursive Weighted Bipartite Graph Matching (producing matched areas and don’t care areas), Disparity Computation. Outputs: disparity map, performance map]

The outputs of the algorithm are the disparity map and the performance map




Segmentation

The segmentation process is simple and very fast: we are not interested in a fine segmentation (multi-threshold segmentation)

We have similar segments in the left and right images because:
• the stereo images represent two different viewpoints of the same scene
• we apply an adaptive quantization to each image according to its lighting conditions

The segmentation process does not influence the rest of the algorithm, because a recursive definition of the matching and a performance function guarantee a recovery from some segmentation problems
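A minimal sketch of a multi-threshold quantization that adapts to each image. The slides do not say how the thresholds are chosen; placing them at equally spaced quantiles of the image's own gray values is an assumption of this example, and the names `quantile_thresholds` and `quantize` are hypothetical.

```python
def quantile_thresholds(pixels, levels):
    """Return levels - 1 thresholds at equally spaced quantiles of the values."""
    ordered = sorted(pixels)
    return [ordered[(len(ordered) * k) // levels] for k in range(1, levels)]


def quantize(image, levels=4):
    """Map each gray value of a 2D list image to a class index in [0, levels)."""
    flat = [v for row in image for v in row]
    thresholds = quantile_thresholds(flat, levels)
    # The class of a pixel is the number of thresholds it reaches.
    return [[sum(v >= t for t in thresholds) for v in row] for row in image]
```

Because the thresholds follow the distribution of each image, a uniform brightness offset between the left and right images leaves the class map unchanged in this sketch.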




Graph Representation

Each 4-connected area of the segmented image is a node of an attributed graph, with attributes:
• colMean: the RGB mean value of the blob
• size: the number of pixels in the connected area
• coord: the coordinates of the box containing the blob
• blobMask: a binary mask for the pixels belonging to the blob

Let \(G_L = \{N_0^L, \dots, N_n^L\}\) and \(G_R = \{N_0^R, \dots, N_m^R\}\) be the two graphs representing the left and right image respectively

[Figure: the two node sets, N0L … NnL and N0R, N1R … NmR]
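The node attributes can be sketched as follows. This is an illustration, not the authors' code: it assumes the segmented image is a 2D list of class indices and the colour image a 2D list of (r, g, b) tuples, it stores blobMask as a set of pixel coordinates instead of a binary image, and the function name `build_nodes` is hypothetical.

```python
from collections import deque


def build_nodes(labels, rgb):
    """Return one attribute dict per 4-connected area of equal label."""
    height, width = len(labels), len(labels[0])
    seen = [[False] * width for _ in range(height)]
    nodes = []
    for y0 in range(height):
        for x0 in range(width):
            if seen[y0][x0]:
                continue
            # Breadth-first flood fill over the 4 neighbours.
            seen[y0][x0] = True
            queue, pixels = deque([(y0, x0)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx]
                            and labels[ny][nx] == labels[y0][x0]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            nodes.append({
                "colMean": tuple(
                    sum(rgb[y][x][c] for y, x in pixels) / len(pixels)
                    for c in range(3)),
                "size": len(pixels),
                # Bounding box as (top, left, bottom, right), inclusive.
                "coord": (min(ys), min(xs), max(ys), max(xs)),
                "blobMask": set(pixels),
            })
    return nodes
```

Calling `build_nodes` once per image gives the two node lists that play the role of the left and right graphs.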




Recursive Weighted Bipartite Graph Matching

Each edge \((N_i^L, N_j^R)\) of the complete bipartite graph has a cost, depending on color, dimension and position:

\[ Cost = \frac{colCost + dimCost + posCost}{3} \]

\[ colCost = \frac{\sum_{i \in \{m\_r,\, m\_g,\, m\_b\}} \left| colMean_i^L - colMean_i^R \right|}{3 \cdot 256} \]

\[ dimCost = \frac{\sum_{i \in \{bottom,\, right\},\; j \in \{top,\, left\}} \left| (i^L - j^L) - (i^R - j^R) \right|}{width + height} \]

\[ posCost = \frac{\sum_{i \in \{bottom,\, right,\, top,\, left\}} \left| i^L - i^R \right|}{2 \cdot (width + height)} \]

• The lower the cost, the more suitable the edge
• If the cost of an edge is higher than a threshold, the edge is considered unprofitable and is removed from the graph
• The matching with the lowest cost among the ones with maximal cardinality is selected as the best solution
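A small sketch of the edge cost and of the selection rule, under stated assumptions: nodes are dicts with "colMean" and "coord" = (top, left, bottom, right); width and height are taken to be the image size; the dimension cost pairs bottom with top and right with left. The slides do not name the matching algorithm, so `best_matching` uses an exhaustive search as a stand-in that is only practical for a handful of nodes. Both function names are hypothetical.

```python
from functools import lru_cache


def edge_cost(left, right, width, height):
    """Average of the colour, dimension and position costs of two nodes."""
    col = sum(abs(a - b)
              for a, b in zip(left["colMean"], right["colMean"])) / (3 * 256)
    lt, ll, lb, lr = left["coord"]
    rt, rl, rb, rr = right["coord"]
    # Difference of box heights plus difference of box widths.
    dim = (abs((lb - lt) - (rb - rt))
           + abs((lr - ll) - (rr - rl))) / (width + height)
    pos = (abs(lt - rt) + abs(ll - rl)
           + abs(lb - rb) + abs(lr - rr)) / (2 * (width + height))
    return (col + dim + pos) / 3


def best_matching(costs, threshold):
    """Return (pairs, total_cost) for the cheapest maximal-cardinality matching.

    costs[i][j] is the cost of edge (left i, right j); edges whose cost
    exceeds threshold are treated as removed from the graph.
    """
    n_left, n_right = len(costs), len(costs[0])

    @lru_cache(maxsize=None)
    def solve(i, used):
        # Best (cardinality, -cost, pairs) for left nodes i.., where `used`
        # is a bitmask of the right nodes that are already taken.
        if i == n_left:
            return (0, 0.0, ())
        best = solve(i + 1, used)  # option: leave left node i unmatched
        for j in range(n_right):
            if used & (1 << j) or costs[i][j] > threshold:
                continue
            card, neg, pairs = solve(i + 1, used | (1 << j))
            cand = (card + 1, neg - costs[i][j], ((i, j),) + pairs)
            if cand[:2] > best[:2]:  # more pairs first, then lower cost
                best = cand
        return best

    _, neg, pairs = solve(0, 0)
    return list(pairs), -neg
```

Left nodes that end up without a pair here would correspond to the don’t care areas. For realistic graph sizes, a polynomial assignment solver would replace the exhaustive search.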




Recursive Weighted Bipartite Graph Matching

The matching is generally time-consuming

For this reason the search area (that is, the subset of candidate pairs of nodes) is bounded by the epipolar and disparity bands

These constraints come from stereo vision geometry, but in our case they are generalized:
• The epipolar band generalizes the epipolar line: it is the maximum vertical displacement of two corresponding nodes
• The disparity band is the maximum horizontal displacement of two corresponding nodes
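The two bands can be read as a simple filter on candidate pairs. In this sketch the displacement is measured between bounding-box centres, which is an assumption (the slides do not say which reference point is used); boxes are (top, left, bottom, right), and the function names are hypothetical.

```python
def in_search_area(left_coord, right_coord, epipolar_band, disparity_band):
    """Check whether two bounding boxes may belong to corresponding nodes."""
    lt, ll, lb, lr = left_coord
    rt, rl, rb, rr = right_coord
    vertical = abs((lt + lb) / 2 - (rt + rb) / 2)
    horizontal = abs((ll + lr) / 2 - (rl + rr) / 2)
    return vertical <= epipolar_band and horizontal <= disparity_band


def candidate_pairs(left_nodes, right_nodes, epipolar_band, disparity_band):
    """List the (i, j) index pairs that satisfy both band constraints."""
    return [
        (i, j)
        for i, left in enumerate(left_nodes)
        for j, right in enumerate(right_nodes)
        if in_search_area(left["coord"], right["coord"],
                          epipolar_band, disparity_band)
    ]
```

Only the surviving pairs need a cost, so the bipartite graph handed to the matching step is much sparser than the complete one.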




[Figure: the search area, bounded by the epipolar band and the disparity band (max disparity), with two nodes of the right image that do not belong to the search area]




Recursive Weighted Bipartite Graph Matching

The graph matching process yields a list of reliably matched areas and a list of so-called don’t care areas

The matched areas are used for the disparity computation

The list of don’t care areas is processed in order to group adjacent blobs in the left and right images, which reduces the split and merge artifacts of the segmentation process; a new matching of the resulting nodes is then computed

The recursive definition of this phase ensures that the don’t care areas are reduced in a few steps; sometimes this process is not needed because the don’t care areas are very small
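A sketch of the grouping step for one image. The slides do not state the grouping criterion; here every pair of 4-adjacent don’t care blobs is merged, which is an assumption. Nodes are dicts with "colMean", "size", "coord" = (top, left, bottom, right) and "blobMask" as a set of (y, x) pixels; all function names are hypothetical.

```python
def are_adjacent(mask_a, mask_b):
    """True if some pixel of mask_a has a 4-neighbour in mask_b."""
    return any(
        (y + dy, x + dx) in mask_b
        for y, x in mask_a
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1))
    )


def merge_nodes(a, b):
    """Fuse two nodes into one, recomputing every attribute."""
    size = a["size"] + b["size"]
    return {
        # The size-weighted mean equals the mean over all merged pixels.
        "colMean": tuple(
            (a["size"] * ca + b["size"] * cb) / size
            for ca, cb in zip(a["colMean"], b["colMean"])),
        "size": size,
        "coord": (min(a["coord"][0], b["coord"][0]),
                  min(a["coord"][1], b["coord"][1]),
                  max(a["coord"][2], b["coord"][2]),
                  max(a["coord"][3], b["coord"][3])),
        "blobMask": a["blobMask"] | b["blobMask"],
    }


def group_adjacent(nodes):
    """Merge adjacent nodes repeatedly until no two remaining nodes touch."""
    nodes = list(nodes)
    merged = True
    while merged:
        merged = False
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                if are_adjacent(nodes[i]["blobMask"], nodes[j]["blobMask"]):
                    nodes[i] = merge_nodes(nodes[i], nodes.pop(j))
                    merged = True
                    break
            if merged:
                break
    return nodes
```

The grouped nodes of the left and right images would then go through the matching again, and the cycle would repeat while don’t care areas remain.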




Disparity Computation

The disparity is computed by superimposing the corresponding nodes until the maximum covering occurs

The horizontal displacement corresponding to the best fit of the matched nodes is the disparity value for the node in the reference image
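The superimposition can be sketched as a one-dimensional search over horizontal shifts. Assumptions of this example: masks are sets of (y, x) pixels, the left image is the reference, and a matched blob appears shifted towards the left in the right image; `region_disparity` is a hypothetical name.

```python
def region_disparity(left_mask, right_mask, max_disparity):
    """Return (disparity, overlap) for the shift with maximum covering.

    The left blob is shifted left by d = 0..max_disparity pixels and the
    number of pixels it shares with the right blob is counted.
    """
    best_d, best_overlap = 0, -1
    for d in range(max_disparity + 1):
        shifted = {(y, x - d) for y, x in left_mask}
        overlap = len(shifted & right_mask)
        if overlap > best_overlap:  # ties keep the smallest shift
            best_d, best_overlap = d, overlap
    return best_d, best_overlap
```

The returned shift would be assigned to every pixel of the left node, giving one disparity value per region.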




Output

Two post-filters have been applied:
• Hole closing using the mean of the contour
• Enlargement of the contour for each node

[Figure: disparity map and graphic performance map]


Our approach: The Results


Advantages:
• Texture-less regions: no problems in matching uniform regions, thanks to the region-based approach
• Vibration of cameras: more robustness, because we compute the disparity by matching regions and use a generalized epipolar line constraint
• Low execution time: there is no matching for each pixel, and a search area is defined for the graph matching
• Left/right lack of homogeneity: the cost function in the WBGM is sufficiently independent of local and global perturbations between the two images

[Figure: disparity maps obtained with graph matching, OUR (1.7 sec) and OUR (1.1 sec), and with punctual matching, DP (2 sec) and SSD (<1 sec)]



We report some results obtained on a realistic video acquired from our mobile platform (100 frames), with camera vibration, light changes and uniform obstacles

The comparison is made between our method and two of the most widely used methods in the literature: SSD [Kanade et al., PAMI, 99] and SSD Multiscale [Konolige et al., 2005]




The performance indices are:

• recall = RI / RG, where RI is the subset of regions correctly detected as obstacles and RG is the set of obstacle regions in the Ground Truth
• precision = RI / RD, where RD is the set of detected obstacle regions
• Relative Distance Error = | detected distance – real distance | / real distance

[Figure: the regions RG and RD and their common part RI]

\[ Z = k_{px/m} \cdot \frac{baseline \cdot focal\ length}{disparity} \]
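The indices above can be computed as in the following sketch. It assumes that regions are compared as Python sets (of pixel coordinates or of region identifiers) and that the conversion factor is expressed in pixels per metre; the function and parameter names are hypothetical.

```python
def recall_precision(detected, ground_truth):
    """Return (recall, precision) for detected and true obstacle sets."""
    correct = detected & ground_truth          # RI
    recall = len(correct) / len(ground_truth)  # RI / RG
    precision = len(correct) / len(detected)   # RI / RD
    return recall, precision


def depth_from_disparity(disparity_px, baseline_m, focal_length_m, px_per_m):
    """Distance Z in metres from a disparity expressed in pixels."""
    return px_per_m * baseline_m * focal_length_m / disparity_px


def relative_distance_error(detected_distance, real_distance):
    """Absolute distance error divided by the real distance."""
    return abs(detected_distance - real_distance) / real_distance
```

Empty sets and zero disparity are not handled here; a real evaluation would need to guard against both.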




algorithm        recall     precision
our              0.886      0.439
SSD              0.208929   0.477767
SSD Multiscale   0.469375   0.348906

algorithm        Relative Distance Error
our              0.046
SSD              0.191169769
SSD Multiscale   0.180161282


[Figure: output images for OUR APPROACH, SSD and SSD MULTISCALE]


Conclusions

We have presented a stereo matching algorithm that provides a fast and robust detection of object positions, instead of a detailed but slow reconstruction of the 3D scene

The algorithm has been experimentally validated, showing encouraging performance when compared with the most commonly used matching algorithms, especially on real-world images

Future work will test our method in outdoor environments and develop temporal coherence of the solution across the video sequence



References

[1] M. Bertozzi, A. Broggi, A. Fascioli: “Vision-based intelligent vehicles: State of the art and perspectives”. Robotics and Autonomous Systems, Vol. 32, pp. 1-16, October 1, 1999.

[2] G. N. DeSouza, A. C. Kak: “Vision for Mobile Robot Navigation: A Survey”. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24-2, February 2002.

[3] D. Scharstein, R. Szeliski: “A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms”. International Journal of Computer Vision, Vol. 47-1, pp. 7-42, May 2002.

[4] C. Zhang: “A Survey on Stereo Vision for Mobile Robots”. Dept. of Electrical and Computer Engineering, Carnegie Mellon University. 5000 Forbes Avenue, Pittsburgh, PA, 15213, USA. 2002.

[5] O. Faugeras, B. Hotz, H. Mathieu, T. Viéville, Z. Zhang, P. Fua, E. Théron, L. Moll, et al.: “Real Time Correlation-Based Stereo: Algorithm, Implementations and Applications”. INRIA Technical Report 2013, 1993.


[6] S. Denasi, C. Lanzone, P. Martinese, G. Pettiti, G. Quaglia, L. Viglione, Real-time system for road following and obstacle detection, in: Proceedings of the SPIE on Machine Vision Applications, Architectures, and Systems Integration III, October 1994, pp. 70–79.

[7] M. Lützeler, E.D. Dickmanns, Road recognition with MarVEye, in: Proceedings of the IEEE Intelligent Vehicles Symposium ’98, Stuttgart, Germany, October 1998, pp. 341–346.

[8] H. C. Longuet-Higgins, “A computer algorithm for reconstructing a scene from two projections”, Nature, vol. 293, pp. 133-135, 1981.

[9] M. E. Spetsakis, J. Aloimonos, “Structure from motion using line correspondences”, International Journal Computer Vision, vol. 4, pp.171-183, 1990.

[10] R.Y. Tsai, T.S. Huang, “Uniqueness and estimation of three dimensional motion parameters of rigid objects with curved surfaces”, IEEE Transactions PAMI, vol. 6, pp. 13-27, 1984.


[11] G. Halevy, D. Weinshall: “Motion of disturbances: Detection and tracking of multi-body nonrigid motion”. Machine Vision and Applications, Vol. 11-3, pp. 122–137, 1999.

[12] B.K.P. Horn: “Robot Vision”. MIT Press, Cambridge, Massachusetts, 1986.

[13] T. Kanade, and M. Okutomi: “A stereo matching algorithm with an adaptive window: theory and experiment”. IEEE Transaction on Pattern Analysis and Machine Intelligence, Vol. 16, pp. 920-932. 1999.

[14] Agrawal, M and Konolige, K and Iocchi, L. Real-time detection of independent motion using stereo, in Proceedings IEEE workshop on visual motion, 2005.

[15] D. Marr, T. Poggio: “A computational theory of human stereo vision”. Proc. R. Soc., Vol. 204-B, pp. 301-328, 1979.

[16] http://mars.sgi.com/default1.html

[17] R. Nevatia, K. Babu: “Linear feature extraction and detection”. Computer Graphics Image Processing, Vol. 13, pp. 257-269, 1980.
