Computer Vision Methods for Improved Mobile Robot State Estimation in Challenging Terrains

Annalisa Milella

Institute of Intelligent Systems for Automation (ISSIA), Italian National Research Council (CNR), 70126 Bari, Italy Email: [email protected]

Giulio Reina

Department of Innovation Engineering, University of Lecce, 73100 Lecce, Italy Email: [email protected]

Roland Siegwart

Autonomous Systems Lab (ASL), Swiss Federal Institute of Technology Zurich (ETHZ), Zurich, Switzerland Email: [email protected]

Abstract— External perception based on vision plays a critical role in developing improved and robust localization algorithms, as well as in gaining important information about the vehicle and the terrain it is traversing. This paper presents two novel methods for rough-terrain mobile robots, using visual input. The first method consists of a stereovision algorithm for real-time 6DoF ego-motion estimation. It integrates image intensity information and 3D stereo data in the well-known Iterative Closest Point (ICP) scheme. Neither a priori knowledge of the motion nor input from other sensors is required; the only assumption is that the scene always contains visually distinctive features which can be tracked over subsequent stereo pairs. This generates what is usually referred to as visual odometry. The second method aims at estimating the wheel sinkage of a mobile robot on sandy soil, based on an edge detection strategy. A semi-empirical model of wheel sinkage is also presented, referring to the classical terramechanics theory. Experimental results obtained with an all-terrain mobile robot and with a wheel sinkage test bed are presented to validate our approach. It is shown that the proposed techniques can be integrated into control and planning algorithms to improve the performance of ground vehicles operating in uncharted environments.

Index Terms—rough-terrain mobile robots, computer vision, vehicle localization, wheel sinkage estimation.

I. INTRODUCTION

Future cross-country mobile robots will have to explore larger and larger areas, performing difficult tasks, while preserving, at the same time, their safety. This will primarily require advanced sensing and perception capabilities. Vision is our most powerful sense through which we can get knowledge of the environment and interact intelligently with our surroundings. Similarly, mobile robots can take advantage of visual capabilities. Video sensors supply contact-free, precise measurements

and are flexible devices that can be easily integrated into multi-sensor robotic platforms. Hence, they represent a potential answer to the need for new and improved perception capabilities for autonomous vehicles [1].

One of the main applications of vision in mobile robotics is localization, i.e. the vehicle’s capability to estimate its pose in the environment. Accurate localization is especially challenging for mobile robots operating in rough terrain situations. Conventional dead reckoning techniques are not well suited to rough terrain, since wheel slipping, sinkage, and sensor drift may cause localization errors that accumulate without bound while the vehicle travels [2], [3], [4], [5]. Conversely, since video sensors are exteroceptive devices, i.e. they acquire information from the robot’s environment, vision-based motion estimates are independent of the knowledge of terrain properties and wheel-terrain interaction. Indeed, like dead reckoning, vision could lead to accumulation of errors; however, it has been proved that, compared to dead reckoning, it allows more accurate results and can be considered as a promising solution to the problem of robust robot positioning in high-slip environments [6], [7], [8]. As a result, in the last few years, several localization systems, usually referred to as visual odometry [9], have been developed that rely on feature tracking algorithms for vehicle motion estimation.

Nevertheless, in rough terrain situations, methods to sense the dynamic ill effects occurring at the wheel-terrain interface are highly desirable, since these effects compromise the vehicle's traction performance and lead to danger of entrapment with consequent mission failure [10]. A key variable in estimating vehicle-terrain interaction is wheel sinkage [11]. The knowledge of the amount of sinkage a wheel is experiencing would allow a better understanding of the effective rolling radius and a more accurate position estimate. Sinkage measurements are also valuable for terrain identification according to classical terramechanics theory [12].

Based on "Stereo-Based Ego-Motion Estimation Using Pixel Tracking and Iterative Closest Point," by A. Milella and R. Siegwart, which appeared in the Proceedings of the Fourth IEEE International Conference on Computer Vision Systems 2006, NY, USA, January 2006. © 2006 IEEE.

In this paper, two novel vision-based methods for rough terrain mobile robots are developed: a localization


algorithm and a method for estimating wheel sinkage. The visual localization algorithm integrates image

intensity information and 3D stereo data using Iterative Closest Point (ICP). The main application of ICP, as originally introduced by Besl and McKay [13], is the registration of digitized data from a rigid object with its idealized geometric model. Here, the potential of ICP for vehicle motion estimation is investigated, using stereovision. In developing the algorithm, two basic problems of ICP were addressed: its failure when dealing with large displacements and its inability to segment input data [13]. Typical solutions rely on odometric information to predict the displacement between consecutive frames and provide an initial motion estimate before ICP registration [14]. Our method, instead, overcomes both problems using the information from a single stereo device, without previous knowledge of the motion. The only assumption is that the scene always contains visually distinctive features, which can be tracked over subsequent images.

Experimental results obtained with an all-terrain rover, the Shrimp mobile robot [15], equipped with a stereo head, are presented. Tests were performed in a laboratory environment, proving the effectiveness of the proposed method in different contexts.

This paper also presents an innovative algorithm for visual estimation of wheel sinkage for a mobile robot driving across soft soil. We call it the Visual Sinkage Estimation (VSE) module. A semi-empirical model of wheel sinkage is also introduced serving as an analytical means of comparison. The VSE assumes the presence of a camera mounted on the vehicle body, with a field of view containing the wheel–terrain interface. A pattern of equally spaced concentric black circumferences on a white background is attached to the wheel in order to determine the contact angle with the terrain using edge detection.

Experiments performed with a single-wheel test bed are reported that prove the VSE algorithm to be effective under different operating conditions, including non-flat terrain and lighting variations.

Related Work

Visual odometry is an emerging and promising

solution to the problem of mobile robot localization. The key idea of visual odometry is that of estimating the motion of the robot by visually tracking landmarks, suitably selected in the environment, using an on-board camera [9], [16].

In recent years, a number of visual odometry algorithms have been proposed that use either single cameras [6], [16], [17], [18] or stereovision [7], [16], [19], [20]. They mainly differ in the feature tracking method adopted and in the transformation applied for estimating the camera motion.

For instance, in [6], the visual module uses a variation of Benedetti and Perona's algorithm [22] for feature detection, and correlation for feature tracking. Robustness is obtained by integrating visual data and Inertial Measurement Unit (IMU) data through a Kalman filter. In [7], odometry provides an estimate of the approximate

robot motion that allows selecting a search area for improved feature tracking using a maximum-likelihood formulation for motion computation. In [16], robust visual motion estimation is achieved using preemptive RANSAC [21], followed by iterative refinement.

In this paper, we propose a visual odometry algorithm for real-time 6DoF ego-motion estimation, which integrates image intensity information and 3D stereo data in the well-known Iterative Closest Point (ICP) framework.

ICP is suited for aligning point clouds where the correspondences are not known, and consists of a two-step kernel: the first step searches for corresponding points between the two point clouds based on the nearest neighbors concept; the second step determines the transformation that minimizes the distance between the nearest neighbors. The process is iterated until a convergence criterion is satisfied.
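For concreteness, the two-step kernel can be sketched as follows. This is an illustrative Python/NumPy snippet, not the implementation used in this work; it pairs a nearest-neighbor search with an SVD-based least-squares rigid fit and iterates until the mean residual stops changing.

```python
import numpy as np
from scipy.spatial import cKDTree

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst (SVD method)."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:           # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = c_dst - R @ c_src
    return R, t

def icp(source, target, max_iter=50, tol=1e-3):
    """Basic two-step ICP: (1) nearest-neighbour matching, (2) rigid alignment."""
    tree = cKDTree(target)
    src = source.copy()
    R_tot, t_tot = np.eye(3), np.zeros(3)
    prev_err = np.inf
    for _ in range(max_iter):
        dist, idx = tree.query(src)                       # step 1: correspondences
        R, t = best_rigid_transform(src, target[idx])     # step 2: alignment
        src = src @ R.T + t
        R_tot, t_tot = R @ R_tot, R @ t_tot + t
        err = dist.mean()
        if abs(prev_err - err) / max(prev_err, 1e-12) < tol:   # convergence criterion
            break
        prev_err = err
    return R_tot, t_tot
```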

ICP has been extensively studied in the literature, and many variants have been proposed to improve both accuracy and computational time [23], [24], [25]. Several applications have been developed that use ICP for surface registration and mapping. Most of them employ laser scanner data. However, relatively little work has been published in the domain of ICP-based visual odometry [26], [27]. Approaches using stereo vision and ICP registration can be found in [28], [29] for Simultaneous Localization and Modeling (SLAM), and in [30] for the reconstruction of 3D partial surface models. In this paper, an approach similar to [28] is adopted, using correlation for initial matching and approximate motion estimation, followed by ICP for motion estimate refinement. However, our work is different in that it deals with the visual odometry issue.

The original contribution of the proposed method mainly relies on an efficient combination of various image processing and 3D registration techniques that allows robust outlier rejection in both stereo matching and feature tracking phases. Therefore, accurate motion estimates can be achieved using a few interesting points and preserving real-time constraints.

For mobile robots driving across soft soil, such as sand, loose dirt, or snow, it is critical that the dynamic ill effects occurring at the wheel-terrain interface be taken into account. One of the most prevalent of these effects is wheel sinkage [11]. Iagnemma et al. [31] described an online visual sinkage estimation algorithm that relies on the analysis of grayscale intensity along the wheel rim. Assuming that the wheel has a different color than the terrain, the location of the terrain interface is computed as the point of maximum change in intensity. This method is relatively simple and computationally efficient, but it is very sensitive to lighting variations and shadows. Moreover, it is based on the assumption that the wheel has a different gray level than the terrain, which implies previous knowledge of the soil appearance characteristics. Conversely, our method does not require any a priori information about the environment, while preserving computational efficiency.

This paper is structured as follows. Section II presents the visual odometry algorithm. Section III introduces the


VSE module and the wheel sinkage model. In Section IV, detailed experimental results and discussions are provided for both methods. Section V concludes the paper.

II. VISUAL ODOMETRY USING ITERATIVE CLOSEST POINT

An algorithm for real-time 6DoF ego-motion estimation is presented, which enables a mobile robot to self-localize using only the data acquired by a stereo head mounted on-board. The method employs image intensity information for feature tracking and initial motion estimation, and Iterative Closest Point (ICP) [13], [25] for motion estimate refinement. In algorithm development, two basic problems of standard ICP were taken into account: the susceptibility to gross statistical outliers, and the failure when dealing with large displacements. As an extension of these issues, another drawback of ICP was addressed, i.e. its inability to perform the segmentation of input data points: if data points from two shapes are intermixed and matched against the individual shapes, registration fails [13]. These limitations are intrinsic to the basic ICP concept and become particularly restrictive for robot self-localization and navigation purposes since, while the sensor moves, different parts of the scene become occluded and, conversely, new objects may appear. Therefore, vast regions may be present in only one of two consecutive point clouds, and, if an outlier region is too close to a valid region, there is no possibility for ICP to perform a correct matching process [26].

The method presented in this work involves three main phases: feature detection, feature tracking, and motion estimation. Figure 1 shows the steps of the algorithm as a flow chart. Each step is detailed in the remainder of this section. Results obtained for a test case are also shown as an example of the proposed approach.

Figure 1. Block diagram of the visual odometry algorithm, using two consecutive image pairs acquired at times t1 and t2.

A. Feature detection

The algorithm starts by acquiring a stereo pair and generating a dense disparity map to obtain 3D points. The SRI Stereo Engine algorithm is employed [32]. It consists of an area correlation-based matching process, followed by a post-filtering operation that uses a combination of a confidence filter and a left/right check to reject areas with insufficient texture, where bad matches are very likely to appear. The Shi-Tomasi feature detector [33] is then applied to the left image of the stereo frame to select interesting points. Only the points associated with a high-confidence 3D point are retained for further processing. Two point clouds are thus available for each stereo pair: the pixel point cloud and its associated 3D point cloud.
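As an illustration of this step, the sketch below is hypothetical code (assuming OpenCV and a dense disparity map produced by an external matching stage standing in for the SRI Stereo Engine): it extracts Shi-Tomasi corners from the left image and keeps only those with valid stereo data, yielding the pixel point cloud and the associated 3D point cloud.

```python
import cv2
import numpy as np

def detect_features(left_gray, disparity, Q, max_corners=200):
    """Shi-Tomasi corners on the left image, kept only where stereo data is valid.

    `disparity` is a float32 disparity map (invalid pixels <= 0) and `Q` the 4x4
    reprojection matrix from stereo calibration; both are assumed to come from an
    external dense-matching stage (validity here stands in for the confidence filter)."""
    corners = cv2.goodFeaturesToTrack(left_gray, maxCorners=max_corners,
                                      qualityLevel=0.01, minDistance=10)
    if corners is None:
        return np.empty((0, 2)), np.empty((0, 3))
    corners = corners.reshape(-1, 2)

    points_3d = cv2.reprojectImageTo3D(disparity, Q)   # per-pixel 3D coordinates

    pixels, cloud = [], []
    for u, v in corners.astype(int):
        if disparity[v, u] > 0:                        # discard invalid/low-confidence stereo
            pixels.append((u, v))
            cloud.append(points_3d[v, u])
    return np.array(pixels), np.array(cloud)           # pixel cloud and associated 3D cloud
```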

B. Feature tracking

The tracking of visual landmarks between consecutive

frames is performed using an algorithm based on Normalized Cross-Correlation (NCC). NCC allows determining the degree of similarity between two image portions f and w of dimension L × K by means of the coefficient C defined as

$$ C = \frac{\sum_{x=0}^{L-1}\sum_{y=0}^{K-1}\left(w(x,y)-\bar{w}\right)\left(f(x,y)-\bar{f}\right)}{\left[\sum_{x=0}^{L-1}\sum_{y=0}^{K-1}\left(w(x,y)-\bar{w}\right)^{2}\right]^{1/2}\left[\sum_{x=0}^{L-1}\sum_{y=0}^{K-1}\left(f(x,y)-\bar{f}\right)^{2}\right]^{1/2}} \qquad (1) $$

where (x, y) are the coordinates of an image point, f(x, y) and w(x, y) are the intensity values of f and w at (x, y), and $\bar{f}$ and $\bar{w}$ are the average intensities of f and w. C ranges between 0 and 1; the greater the value of C, the greater the similarity between f and w [34].
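Equation (1) translates directly into code. The following NumPy sketch (illustrative only, not the authors' code) evaluates the coefficient C for two equally sized patches f and w.

```python
import numpy as np

def ncc(f, w):
    """Normalized cross-correlation coefficient C of Eq. (1) for two L x K patches."""
    f = f.astype(np.float64) - f.mean()
    w = w.astype(np.float64) - w.mean()
    denom = np.sqrt((w ** 2).sum()) * np.sqrt((f ** 2).sum())
    return float((w * f).sum() / denom) if denom > 0 else 0.0
```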

Based on this criterion, corresponding points are established according to the following procedure.

Let us denote with {L1} and {L2} the visual landmarks


detected in two subsequent images I_{l,1} and I_{l,2} acquired by the left camera of the stereo device at times t1 and t2, respectively. Each point in {L1} is paired with the point in {L2} that generates the maximum value of the coefficient C in a 5 × 5 pixel window centered at the point. To speed up and improve the searching process, only features within a certain pixel distance from each other are matched. A minimum value for the cross-correlation coefficient is also established. False matches are then rejected using two strategies: mutual consistency check and robust statistics. The former consists in applying the cross-correlation-based pairing from both {L1} to {L2} and {L2} to {L1}. Only pairs that mutually have each other as preferred mate are accepted as valid matches [16] and are stored together with their correlation value. A final selection is accomplished based on the median [28] and the standard deviation from the median of the computed correlation coefficients. Pairs whose correlation deviates from the median by more than two times the standard deviation from the median are rejected.
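The pairing and rejection procedure can be sketched as follows (illustrative; the helper patch(), which extracts the 5 × 5 window around a feature, is assumed, and ncc() is the function from the previous sketch).

```python
import numpy as np

def match_features(feats1, feats2, img1, img2, patch, max_dist=100, min_corr=0.85):
    """Pair features by NCC, then prune with mutual consistency and a
    median / deviation-from-median test, as described in the text."""

    def best_match(p, img_p, cands, img_c):
        scores = [ncc(patch(img_p, p), patch(img_c, q))
                  if np.linalg.norm(p - q) < max_dist else -1.0 for q in cands]
        j = int(np.argmax(scores))
        return j, scores[j]

    pairs = []
    for i, p in enumerate(feats1):
        j, c = best_match(p, img1, feats2, img2)
        if c < min_corr:
            continue
        i_back, _ = best_match(feats2[j], img2, feats1, img1)
        if i_back == i:                               # mutual consistency check
            pairs.append((i, j, c))
    if not pairs:
        return []

    corr = np.array([c for _, _, c in pairs])
    med = np.median(corr)
    dev = np.sqrt(np.mean((corr - med) ** 2))         # standard deviation from the median
    keep = np.abs(corr - med) <= 2.0 * dev            # reject pairs deviating > 2 sigma
    return [pairs[k][:2] for k in range(len(pairs)) if keep[k]]
```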

Figure 2. Left images (a) before and (b) after rotation, with selected features superimposed.

This process brings two principal advantages: first of all, features which do not belong to both frames are discarded, i.e. the segmentation of input data is performed; furthermore, a set of corresponding 3D points is selected which can be used for the successive motion estimation stage, providing initial alignment [28].

C. Motion estimation

The problem of estimating the motion that the camera

has undergone between two consecutive stereo acquisitions can be expressed as finding the 3D rotation matrix R and the translation vector t that minimize the mean-squares objective function

$$ F(R,t) = \frac{1}{N}\sum_{i=1}^{N}\left\| R\,p_{1,i} + t - p_{2,i} \right\|^{2} \qquad (2) $$

where p_{1,i} and p_{2,i} denote two corresponding 3D points and N is the number of pairs.

Motion estimation is first performed using the point pairs established through cross-correlation, as described in the previous section. Then, ICP registration is applied to refine the motion estimate. The rejection scheme proposed by Zhang [25] is employed, which adaptively sets the maximum admissible distance between corresponding points using the statistics of the distances (i.e. mean value and standard deviation). The least-squares rotation and translation are computed using the dual number quaternion method [35]. The process stops when the change in the motion estimate between two successive iterations is less than 1%.
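The refinement stage can be sketched as follows. This illustrative snippet reuses the SVD-based rigid fit from the earlier ICP sketch in place of the dual number quaternion solution of [35]; it gates correspondences with a Zhang-style adaptive distance threshold and stops when the mean residual changes by less than 1%.

```python
import numpy as np

def adaptive_gate(distances, k=1.0):
    """Zhang-style adaptive threshold: keep pairs whose distance is below
    mean + k * std of the current residual distances."""
    mu, sigma = distances.mean(), distances.std()
    return distances < (mu + k * sigma)

def refine_motion(p1, p2, max_iter=30, rel_tol=0.01):
    """Refine (R, t) on pre-matched 3D pairs p1 -> p2, rejecting outliers adaptively.
    best_rigid_transform() is the SVD fit defined in the ICP sketch above."""
    R, t = np.eye(3), np.zeros(3)
    prev = None
    for _ in range(max_iter):
        residuals = np.linalg.norm(p1 @ R.T + t - p2, axis=1)
        keep = adaptive_gate(residuals)
        R, t = best_rigid_transform(p1[keep], p2[keep])
        step = np.linalg.norm(p1 @ R.T + t - p2, axis=1).mean()
        if prev is not None and abs(prev - step) / max(prev, 1e-12) < rel_tol:
            break                                  # change below 1%: stop iterating
        prev = step
    return R, t
```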

A sample case

Here, results obtained for a test case are reported as an

example. In this experiment, the algorithm was applied to 320 × 240 px stereo images, after the camera rotated 10° around the pan axis (x). Figure 2(a) and 2(b) show the left frames of the two successive stereo pairs with the selected visual landmarks superimposed. Each feature has an associated 3D point.

Once the features in two consecutive stereo frames

have been selected, the problem of finding corresponding points has to be solved. This is done using both pixel intensity and 3D stereo information. In Figure 3, the left image before rotation is shown along with the correspondences determined using intensity information only. Specifically, Figure 3(a) displays the correspondences after normalized cross-correlation-based pairing. Features at a maximum distance of 100 pixels were matched and a correlation threshold of 0.85 was fixed. Figure 3(b) reports the result of the rejection process based on mutual consistency check and robust statistics, showing a reduction of false matches of about 30%. False matches (about 20% of total matches) are still present; that indicates the necessity of a refinement process. Nevertheless, correspondences established based on pixel intensity information can be employed to obtain an approximate motion estimate.

Stereo data are then used, applying ICP. Final pairs are plotted in the image plane in Figure 4(a). After seven iterations (see Figure 4(b)-(c)), the absolute position error remains stable at 0.78 cm along the pan axis (x), 2.8 cm along the tilt axis (y), and 0.68 cm along the swing axis (z), while the absolute errors in rotation are 1.10°, 0.39°, and 0.04° for the pan, tilt and swing angles, respectively.

In Figure 5(a) and 5(b), final selected 3D pairs are displayed, before and after registration. Figure 6 reports, instead, the results obtained by applying ICP directly to the 3D point clouds, without previous processing. Evidently, no good motion estimate would be achieved.


Figure 3. Point pairs (a) after normalized cross-correlation-based tracking, and (b) after false-match rejection using mutual consistency check and robust statistics.

Figure 4. Result of Iterative Closest Point (ICP): (a) final correspondences plotted in the image plane; (b)-(c) absolute position and orientation errors during the iterations.

Figure 5. Final pairs in 3D space (a) before and (b) after registration, using correlation and ICP. At the end, the red square points overlap the corresponding black round points.


Figure 6. Final point pairs estimated by applying ICP directly to the 3D points, (a) re-projected onto the image plane and (b) in 3D space. In (b), black arrows indicate the positions of the red square points after ICP registration. Evidently, no good motion estimate would be achieved.

III. SINKAGE ESTIMATION

In this section, we present a theoretical analysis of wheel sinkage on soft terrain and we explain our approach for visual sinkage estimation.

A. Theoretical Analysis

A driven rigid wheel rolling on sandy soil (see Figure 7) undergoes a certain amount of sinkage z depending on the vertical load W acting on the wheel and the wheel slip i, defined as

$$ i = 1 - \frac{V}{\omega R} \qquad (3) $$

with

V – linear speed of the wheel;
ω – angular rate of the wheel;
R – radius of the wheel.

A semi-empirical model of sinkage z was proposed by Bekker [12], according to

$$ z = z_s + z_j \qquad (4) $$

where z_s is the value of sinkage due to static load only and z_j is the counterpart due to slip. z_s can be estimated as

$$ z_s = \left( \frac{\sigma}{\frac{k_c}{b} + k_\varphi} \right)^{\frac{1}{n}} \qquad (5) $$

where

k_c – cohesive modulus of terrain deformation;
k_φ – frictional modulus of terrain deformation;
n – exponent of terrain deformation;
σ – normal stress at the wheel-terrain interface;
b – wheel width.

Whereas, z_j can be evaluated as

$$ z_j = j\,\frac{\sigma - p'_{crit}}{\tau_{max}}, \qquad \sigma > p'_{crit} \qquad (6) $$

being τ_max the maximum shear stress that a given terrain can bear according to the Coulomb-Mohr soil failure criterion

$$ \tau_{max} = c + \sigma_{max}\tan\varphi \qquad (7) $$

with

c – cohesion of the soil;
φ – internal friction angle of the soil;
σ_max – the maximum normal stress at the wheel-terrain interface;

and p'_crit being the "Terzaghi bearing capacity" [36] given by

$$ p'_{crit} = c\,N_c + \gamma\,(z_s + z_j)\,N_q + 0.5\,\gamma\,b\,N_\gamma \qquad (8) $$

where N_c, N_q, and N_γ are constants, and γ is the density of the soil. The parameter j refers to the shear displacement, which is related to the wheel slippage i and the angle θ by

$$ j(\theta) = R\left[(\theta_1 - \theta) - (1 - i)(\sin\theta_1 - \sin\theta)\right] \qquad (9) $$

being θ_1 the so-called wheel entry angle or contact angle (see Figure 7).

Figure 7. Wheel-soil interaction model (adapted from [12]).


TABLE I. SAND PARAMETERS USED FOR SIMULATIONS

Parameter            Value
φ [deg]              30
c [kPa]              1.0
k_c [kN/m^(n+1)]     0.1
k_φ [kN/m^(n+2)]     55
b [m]                0.09
R [m]                0.08
γ [kg/m^3]           1633
N_q                  0.48
N_c                  0
N_γ                  10
n                    1

Figure 9. Rigid wheel sinking into deformable terrain.

The accuracy of the sinkage model depends on the accuracy of many empirically found constants. Based on traditional soil parameters for sand [12] and tuning the remaining constants using experimental data (see Table I), we created Figure 8, which shows the relationship between wheel slip and total sinkage for three different values of vertical load.

Figure 8. Total sinkage as a function of wheel slip for various ground pressures.
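For reference, the relations above can be coded directly. The sketch below is illustrative only: it implements the slip of Eq. (3), the static sinkage of Eq. (5), and the shear displacement of Eq. (9) with the sand parameters of Table I; the normal stress sigma is left as an input to be supplied by the user.

```python
import numpy as np

# Sand parameters from Table I, converted to SI units
PHI   = np.radians(30.0)   # internal friction angle [rad]
C     = 1.0e3              # cohesion [Pa]
K_C   = 0.1e3              # cohesive modulus [N/m^(n+1)]
K_PHI = 55.0e3             # frictional modulus [N/m^(n+2)]
B     = 0.09               # wheel width [m]
R     = 0.08               # wheel radius [m]
N_EXP = 1.0                # sinkage exponent n

def slip(v, omega):
    """Wheel slip i = 1 - V / (omega * R), Eq. (3)."""
    return 1.0 - v / (omega * R)

def static_sinkage(sigma):
    """Static sinkage z_s of Eq. (5) for a normal stress sigma [Pa]."""
    return (sigma / (K_C / B + K_PHI)) ** (1.0 / N_EXP)

def shear_displacement(theta, theta1, i):
    """Shear displacement j(theta) of Eq. (9) for entry angle theta1 and slip i."""
    return R * ((theta1 - theta) - (1.0 - i) * (np.sin(theta1) - np.sin(theta)))
```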

B. Visual Sinkage Estimation

In order to estimate wheel sinkage, we developed the VSE module using a camera attached to the vehicle body. We assume that the location of the wheel relative to the camera is known and fixed during the vehicle travel.

Sinkage z can be evaluated by estimating the contact angle θ1 between wheel and terrain (see Figure 9) using the geometrical relationship

$$ z = R\,(1 - \cos\theta_1) \qquad (10) $$

The VSE algorithm requires a pattern of equally spaced 1-mm thick concentric black circumferences on a white background attached to the wheel in order to determine θ1 using an edge detection-based strategy. This

approach allows algorithmic simplicity and computational efficiency, providing fast, real-time measurements.

In practice, the VSE operates by identifying the wheel radial lines where the number of detected edges is less than that expected when the wheel rolls without sinkage. Those lines can be associated with the part of the wheel obscured by terrain and thus with sinkage.

The VSE consists of the following steps: Region of Interest (ROI) identification, pixel intensity computation, and contact angle estimation. Each step is discussed in detail in the remainder of this section.

ROI Identification – In order to estimate the contact angle θ1, the annular region along the wheel rim including the circumference pattern is the only image area that needs to be examined. Thus, ROI identification is first of all performed, reducing the computational time and improving accuracy. Given the position of the wheel center relative to the camera and the geometry of the wheel, the ROI can be detected using simple geometric projections.

Pixel Intensity Computation – A pixel intensity analysis is performed along radial lines spanning across the selected ROI with an angular resolution of 1°. A typical intensity plot along a radial line is reported in Figure 10 for a test on sand. The VSE differentiates between a so-called “wheel region” where the wheel is not obscured by terrain, and a “soil region” (“sand region” in Figure 10) where the soil is covering the wheel. The wheel region is characterized by high intensity variations that can be classified as “edges”, while the soil region shows an almost uniform intensity value.

Edges are detected based on three factors [37]:

- contrast: the difference between the average intensity value of the pixels before the edge and the average intensity value of the pixels after the edge;
- steepness: the number of pixels that constitute the edge;
- filter width: the number of pixels used for estimating the average intensity values.


These factors were determined by analyzing a typical line intensity profile. An adaptive threshold for selecting the appropriate edge intensity contrast along each radial line of inspection was experimentally determined as

$$ C = \frac{L_{Max} - L_{Min}}{2} \qquad (11) $$

where L_Max and L_Min are the maximum and the minimum intensities measured along a given line. Filtering is applied to reduce noise and small-scale changes in intensity due to reflections, pebbles, etc.
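A possible realization of the per-line analysis is sketched below (illustrative; the contrast/steepness/filter-width factors are interpreted loosely): the profile is low-pass filtered, the adaptive contrast threshold of Eq. (11) is computed from its extremes, and intensity transitions exceeding the threshold are counted as edges.

```python
import numpy as np

def count_edges(profile, filter_width=5, min_steepness=2):
    """Count edges along a 1-D radial intensity profile using the adaptive
    contrast threshold C = (Lmax - Lmin) / 2 of Eq. (11)."""
    p = np.convolve(profile.astype(float),
                    np.ones(filter_width) / filter_width, mode="same")  # noise filtering
    c_thresh = (p.max() - p.min()) / 2.0
    edges, last = 0, -min_steepness
    for k in range(filter_width, len(p) - filter_width):
        before = p[k - filter_width:k].mean()      # average intensity before the candidate edge
        after  = p[k:k + filter_width].mean()      # average intensity after the candidate edge
        if abs(after - before) >= c_thresh and k - last >= min_steepness:
            edges += 1
            last = k
    return edges
```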

Contact Angle Estimation – The contact angle θ1 is computed as the wheel angle where the transition between the wheel region and the soil region occurs. Pixel information is converted into metric information, using the camera parameters previously obtained by calibration.
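Putting the steps together, the following hypothetical sketch estimates the contact angle and the sinkage of Eq. (10); sample_radial_profile() is an assumed helper that returns the pixel intensities along a radial inspection line of the ROI, and count_edges() is the function sketched after Eq. (11).

```python
import numpy as np

def estimate_sinkage(image, roi_angles, expected_edges, wheel_radius):
    """Estimate contact angle theta1 and sinkage z = R * (1 - cos(theta1)), Eq. (10).

    `roi_angles` are the inspection angles (radians, measured from the bottom of
    the wheel); `sample_radial_profile` is a hypothetical helper, not part of the paper."""
    theta1 = 0.0
    for theta in np.sort(roi_angles):
        profile = sample_radial_profile(image, theta)
        if count_edges(profile) < expected_edges:   # pattern hidden: soil region
            theta1 = theta                          # last angle still buried by terrain
        else:
            break                                   # wheel region reached
    return theta1, wheel_radius * (1.0 - np.cos(theta1))
```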

Figure 11. The Shrimp mobile robot equipped with a Videre Design stereo head (adapted from [15]).

IV. EXPERIMENTAL RESULTS

A comprehensive set of experiments was performed to validate the proposed methods. The effectiveness of the visual odometry algorithm was tested using an all-terrain mobile robot. A wheel sinkage test bed was employed, instead, to experimentally examine the VSE module and validate the sinkage model.

A. Visual Odometry using Iterative Closest Point

The visual odometry algorithm was validated using the

Shrimp robot, equipped with a Videre Design stereo head as shown in Figure 11.

The Shrimp is an off-road rover characterized by a passive non-hyperstatic structure, which makes it able to adapt to a large range of obstacles. It has six motorized wheels and is composed of four main parts: the body, the articulated front fork, and the two side bogies. More details can be found in [15].

Several experiments were performed, in order to test the effectiveness of the method for different motion conditions and environments. Here, results of three different tests are presented. In the first test, the robot was remotely controlled on flat carpet in a typical office-like environment (Figure 12(a)). The other two tests were performed on a rocky surface (Figure 12(b)).

In all the experiments, the robot was driven at 6 cm/s. The 3D information is referred to a reference frame attached to the chassis of the robot, as shown in Figure 11. The algorithms were implemented in C++.

Note that both Figure 12(a) and 12(b) show, as white squares, the points that the algorithm uses as initial tracking features.

Experiments on a flat surface

The ability of the system to reach a target position was evaluated, guiding the robot through an L-shaped path of about 1780 (x) × 2200 (y) mm to a predefined location. Five runs were executed. At each run, the i-th position errors (e^i_px, e^i_py) for i = 1, 2, …, n (n = 5) were computed as

$$ e^{i}_{px} = p_{Tx} - p^{i}_{ex}, \qquad e^{i}_{py} = p_{Ty} - p^{i}_{ey} \qquad (12) $$

where [p_Tx, p_Ty] denotes the position of the target and [p^i_ex, p^i_ey] is the estimated final position of the robot at the i-th run.

Figure 10. Sample diagram of pixel intensity along a radial line.

Figure 12. (a) Indoor test environment and (b) rocky soil, with selected features superimposed.


The mean errors and standard deviations were 3.4 ± 3.8 cm along x and 5.3 ± 4.0 cm along y. An average absolute position error (E_px, E_py) along the x and y directions was defined as

$$ E_{px} = \frac{1}{n}\sum_{i=1}^{n}\left|e^{i}_{px}\right|, \qquad E_{py} = \frac{1}{n}\sum_{i=1}^{n}\left|e^{i}_{py}\right| \qquad (13) $$

Lastly, the average position error was computed as

$$ E_{p} = \sqrt{E_{px}^{2} + E_{py}^{2}} \qquad (14) $$
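The error metrics of Eqs. (12)-(14) amount to a few lines of code; the sketch below is illustrative.

```python
import numpy as np

def position_errors(target, estimates):
    """Eqs. (12)-(14): per-run errors, average absolute errors and combined error.
    `target` is (p_Tx, p_Ty); `estimates` is an n x 2 array of final estimated positions."""
    e = np.asarray(target) - np.asarray(estimates)      # (e_px^i, e_py^i), Eq. (12)
    E_px, E_py = np.abs(e).mean(axis=0)                 # Eq. (13)
    E_p = np.hypot(E_px, E_py)                          # Eq. (14)
    return e, E_px, E_py, E_p
```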

Figure 13(a) and 13(b) show, respectively, the estimated path and the variation of the yaw angle during one typical run. The position errors computed for the five tests are reported in Figure 13(c), corresponding to an average position error E_p of 6.3 cm.

Experiments on a simulated rocky soil

Two different tests were performed with the robot moving on a rocky surface. In the first test, after a forward displacement, the robot was guided to climb up a ramp of about 12° of inclination to reach a target position located approximately at a distance of 2200 mm along y and at a height of 200 mm above the initial position of the robot. The test was repeated five times. Figure 14(a) and 14(b) show, respectively, the trajectory in the (y-z) plane and the pitch angle variation during one run. Position errors were estimated according to (12), (13), and (14) referred to the (y-z) plane, and are shown in Figure 14(c). The mean errors and standard deviations were 5.3 ± 4.2 cm along y and 1.2 ± 1.0 cm along z. The computed average position error E_p was 5.5 cm.

In the second test, the robot was guided to overcome two consecutive steps of 50 mm and 100 mm, at first moving forward for about 1100 mm and then backward to the start position. In this test, the variations of all six degrees of freedom of the vehicle can be clearly observed, as shown in Figure 15(a) and 15(b), representing respectively the estimated 3D positions and the Euler angles during one test. Ten runs were executed. In each run, the robot started at a marked location and was driven back to the same location.

The discrepancy between the actual robot position and the estimated position is the so-called Return Position Error (RPE) [4]. The absolute RPEs for each run are reported in Figure 15(c). The following mean absolute RPEs and corresponding standard deviations were computed: 2.6 ± 3.3 cm (x), 3.1 ± 2.8 cm (y), 6.2 ± 3.8 cm (z). Taking into account all the error components, i.e. the errors along x, y, and z, a total average position error E_p of 7.4 cm was obtained.

B. Sinkage Estimation

Validation of the VSE module

The performance of the VSE module was evaluated using the test bed shown in Figure 16. It consists of a driven 16 cm-diameter wheel mounted on an undriven vertical axis. A low-cost wireless single-channel analog camera is attached to the wheel with a field of view containing the wheel-terrain interface.

Figure 13. Tests in an indoor environment. L-shaped path: (a) robot trajectory; (b) yaw angle variation; (c) position errors.


The actual sinkage of the wheel can be estimated from a potentiometer mounted on the vertical axis of the system.

Tests were performed under different operating conditions including non-flat terrains, variable lighting conditions, and terrain with and without rocks.

Representative results are shown in Figure 17(a) for a set of sample images with different sinkage levels. The error was always less than 13%. No misidentifications due to reflections or shadowing occurred in any of the experiments.

These tests prove that the VSE is able to provide real-time estimation of wheel sinkage with minimum computational requirements and a sampling rate of 5 Hz.

Figure 14. Tests on a rocky soil. Ramp-like path: (a) displacement in the y-z plane; (b) pitch angle variation; (c) position errors.

Figure 15. Tests on a rocky soil. Double-step trajectory: (a) robot positions; (b) Euler angles; (c) position errors.


The algorithm also proved to be very robust against variations in lighting conditions. Figure 17(b) shows that the VSE continues to work accurately even for a lighting reduction of as much as 90% of the optimal value (L = 0.9).

Experimental validation of the sinkage model

The results obtained from the VSE module were compared with the sinkage model presented in Section III.A for a typical run on soft sand under uniform lighting and with the wheel starting from a standing condition. In

this experiment, the wheel was subjected to a ground pressure of p = 13.8 kPa and a slip of i = 0.4. The results are shown in Figure 18. The gray solid line is the sinkage as derived by the VSE module; the black line is the same signal smoothed with a Kalman filter to compensate for measurement uncertainties. The gray dotted line shows the sinkage value predicted by the model for the same values of p and i. The discrepancy between the experimentally determined sinkage at steady state and the calculated sinkage is less than 5%, showing the effectiveness of the model in capturing the sinkage phenomenon.
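The smoothing mentioned above can be reproduced, for instance, with a scalar Kalman filter under a random-walk model of the sinkage; the snippet below is an illustrative sketch, and the noise variances q and r are tuning parameters, not values from this work.

```python
import numpy as np

def kalman_smooth(measurements, q=1e-4, r=1e-2):
    """Scalar Kalman filter (random-walk state model) for smoothing the raw
    VSE sinkage signal; q = process noise variance, r = measurement noise variance."""
    x, p = measurements[0], 1.0
    out = []
    for z in measurements:
        p = p + q                       # predict: state assumed constant plus noise
        k = p / (p + r)                 # Kalman gain
        x = x + k * (z - x)             # update with the new sinkage measurement
        p = (1.0 - k) * p
        out.append(x)
    return np.array(out)
```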

V. CONCLUSIONS

In this paper, two novel vision-based methods for rough-terrain mobile robots were described. First, a stereovision algorithm for real-time 6DoF ego-motion estimation was presented. It integrates image intensity and 3D stereo information in the well-known Iterative Closest Point (ICP) framework, overcoming two basic problems of standard ICP, i.e. its failure in the presence of large displacements and its inability to segment input data. Experimental tests with an all-terrain rover were presented that showed this algorithm to be effective for vehicle self-localization in unstructured environments.

Subsequently, an innovative method for wheel sinkage estimation was proposed and experimentally tested on a single-wheel test bed, proving to be computationally efficient, relatively accurate (maximum errors below 13%), and very robust to disturbances and variations in lighting conditions. The visual sinkage estimation algorithm also made it possible to experimentally validate a semi-empirical model proposed to predict the behavior of the wheel on sandy terrains, which showed good agreement with the experiments.

The methods described in this paper can be used to gain important information about the vehicle state and its interaction with the soil, improving localization accuracy and traction control of rough-terrain autonomous vehicles.

Figure 16. The test bed for wheel sinkage estimation.

Figure 17. (a) Visual estimation of wheel sinkage; (b) influence of lighting variations on the VSE module.

Figure 18. Comparison between the measured and calculated total sinkage.


REFERENCES

[1] E. Tunstel and A. Howard, “Sensing and perception challenges of planetary surface robotics,” Proceedings of IEEE Sensors, Orlando, Florida, USA, 2002.

[2] J. Borenstein, B. Everett, and L. Feng, Navigating Mobile Robots: Systems and Techniques, A. K. Peters, Ltd., Wellesley, MA, ISBN 1-56881-058-X, 1996.

[3] P. Lamon and R. Siegwart, “3D-Odometry for rough terrain – Towards Real 3D navigation,” Proceedings of the International Conference on Robotics and Automation, ICRA'03, Taipei, Taiwan, 2003.

[4] L. Ojeda, G. Reina, and J. Borenstein, “Experimental results from FLEXNAV: an expert rule-based dead-reckoning system for Mars rovers,” IEEE Aerospace Conference, Big Sky, MT, USA, 2004.

[5] G. Reina, “Rough Terrain Mobile Robot Localization and Traversability with applications to Planetary Explorations,” PhD Thesis, Politecnico of Bari, Italy, 2004.

[6] S.I. Roumeliotis, A.E. Johnson, and J.F. Montgomery, “Augmenting inertial navigation with image-based motion estimation,” Proceedings of the 2002 IEEE International Conference on Robotics & Automation, Washington, 2002, pp. 4326-4333.

[7] C. Olson, L. Matthies, M. Schoppers, and M. Maimone, “Robust stereo ego-motion for long distance navigation,” in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2000, pp. 453-458.

[8] D. Helmick, S.I. Roumeliotis, Y. Cheng, D. Clouse, M. Bajracharya, and L. Matthies, “Slip Compensation for a Mars Rover,” in Proc. 2005 IEEE International Conference on Intelligent Robots and Systems, Edmonton, Canada, Aug. 2-6, 2005, pp. 1419-1426.

[9] L.H. Matthies, Dynamic Stereo Vision, PhD thesis, Carnegie Mellon University, 1989.

[10] T.L. Huntsberger, H. Aghazarian, Y. Cheng, E.T. Baumgartner, E. Tunstel, C. Leger, A. Trebi-Ollennu, and P.S. Schenker, “Rover autonomy for long range navigation and science data acquisition on planetary surfaces,” Proceedings of International Conference on Robotics and Automation, Washington, DC, 2002.

[11] G. Reina, L. Ojeda, A. Milella, and J. Borenstein, “Wheel slippage and sinkage detection for planetary rovers”, IEEE/ASME Transactions on Mechatronics, vol. 11, no. 2, April 2006, pp. 185-195.

[12] M.G. Bekker, Off-Road Locomotion, The University of Michigan Press, Ann Arbor, MI, 1960.

[13] P.J. Besl and N.D. McKay, “A method for registration of 3-D shapes,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 2, February 1992, pp. 239-256.

[14] H. Surmann, A. Nüchter, and J. Hertzberg, “An Autonomous mobile robot with a 3D laser range finder for 3d exploration and digitalization of indoor environments,” Journal Robotics and Autonomous Systems, vol. 45, 2003, pp. 181-198.

[15] R. Siegwart, P. Lamon, T. Estier, M. Lauria, and R. Piguet, “Innovative Design for Wheeled Locomotion in Rough Terrain,” Journal of Robotics and Autonomous Systems, Elsevier, vol. 40/2-3, pp. 151-162, 2002.

[16] D. Nistér, O. Naroditsky, and J. Bergen, “Visual Odometry”, Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, 2004, pp. 652-659.

[17] A.J. Davison, “Real-time Simultaneous Localization and Mapping with a single camera,” IEEE Int. Conf. on Computer Vision, Nice, 2003, pp. 1403-1410.

[18] P.I. Corke, D. Strelow, and S. Singh, “Omnidirectional visual odometry for a planetary rover,” Proceedings of IROS 2004, Japan, 2004.

[19] M. Dunbabin, K. Usher, and P. Corke, “Visual motion estimation for an autonomous underwater reef monitoring robot,” Field and Service Robotics Conference (FSR 2005), Port Douglas, Qld., 2005, pp. 57-68.

[20] A. Mallet, S. Lacroix, and L. Gallo, “Position estimation in outdoor environments using pixel tracking and stereovision,” IEEE Int. Conf. on Robotics and Automation, San Francisco, CA, USA, 2000, pp. 3519-3524.

[21] D. Nistér, “Preemptive RANSAC for live structure and motion estimation”, IEEE International Conference on Computer Vision, Nice, 2003, pp. 199-206.

[22] A. Benedetti and P. Perona, “Real-time 2-D feature detection on a reconfigurable computer,” Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Santa Barbara, CA, USA, 1998.

[23] J. Diebel, K. Reuterswärd, S. Thrun, J. Davis, and R. Gupta, “Simultaneous Localization and Mapping with active stereo vision,” IEEE/RSJ Conf. on Intelligent Robots and Systems (IROS), Japan, 2004.

[24] S. Rusinkiewicz and M. Levoy, “Efficient variants of the ICP algorithm,” Proceedings of IEEE 3DIM, Canada, 2001, pp. 145-152.

[25] Z. Zhang, "Iterative Point Matching for registration of free-form curves," INRIA Rapport de Recherche No. 1658, Programme 4: Robotique, Image et Vision, 1992.

[26] I.A.D. Nesnas, M. Bajaracharya, R. Madison, E. Bandari, C. Kunz, M. Deans, and M. Bualat, “Visual target tracking for rover-based planetary exploration,” Proceedings of the 2004 IEEE Aerospace Conference, Big Sky, Montana, 2004.

[27] A. Milella, Vision-Based Methods for Autonomous Mobile Robots, PhD Thesis, Politecnico of Bari, Italy, 2006.

[28] M.A. Garcia and A. Solanas, “3D simultaneous localization and modeling from stereo vision,” Proceedings of the 2004 IEEE International Conference on Robotics & Automation, New Orleans, LA, 2004, pp. 847-853.

[29] J.M. Sáez and F. Escolano, “A global 3D map-building approach using stereo vision,” Proceedings of the 2004 IEEE International Conference on Robotics and Automation, 2004, pp. 1197-1202.

[30] S. Weik, “Registration of 3-D partial surface models using luminance and depth information,” Proceedings of the First International Conference on Recent Advances in 3-D Digital Imaging and Modeling, 1997.

[31] K. Iagnemma, C. Brooks, and S. Dubowsky, “Visual, tactile, and vibration-based terrain analysis for planetary rovers,” in Proc. of the IEEE Aerospace Conf., Big Sky, MT, USA, 2004.

[32] K. Konolige, “Small Vision Systems: hardware and implementation,” 8th International Symposium on Robotics Research, Japan, 1997.

[33] J. Shi and C. Tomasi, “Good Features to Track,” IEEE Conference of Computer Vision and Pattern Recognition, CA, 1994, pp. 593-600.

[34] R. Gonzalez and R. Woods, Digital Image Processing, 2nd ed., Prentice Hall.


[35] M.W. Walker, L. Shao, and R.A. Volz, “Estimating 3-D location parameters using dual number quaternions,” CVGIP: Image Understanding, 54, 1991, pp. 358-367.

[36] K. Terzaghi, “Theoretical Soil Mechanics”, Wiley, New York, NY, 1942.

[37] National Instruments, IMAQ Vision Concepts Manual. [Online]. Available: http://www.ni.com/

Biography

Annalisa Milella received the Laurea (summa cum laude) and the Research Doctorate degrees from the Politecnico of Bari, Bari, Italy, in 2002 and 2006, respectively, both in mechanical engineering. In 2005, she was a visiting scholar at the EPFL Autonomous Systems Laboratory. Her research interests include autonomous vehicles and computer vision systems. Currently, she is with the Institute of Intelligent Systems for Automation (ISSIA), Italian National Research Council (CNR) of Bari, Italy.

Giulio Reina received the Laurea degree and the Research Doctorate degree from the Politecnico of Bari, Italy, in 2000 and 2004, respectively, both in Mechanical Engineering. From 2002 to 2003, he worked at the University of Michigan Mobile Robotics Laboratory as a Visiting Scholar. Currently, he is an Assistant Professor in Applied Mechanics with the Department of Innovation Engineering of the University of Lecce, Lecce, Italy. His research interests include ground autonomous vehicles, mobility and localization on rough terrain, and agricultural robotics.

Roland Siegwart has been a full professor for autonomous systems at ETH Zurich since July 2006. He holds a Diploma in Mechanical Engineering (1983) and a PhD in Mechatronics (1989) from ETH Zurich. In 1989/90 he spent one year as a postdoctoral fellow at Stanford University. After that he worked part time as R&D director at MECOS Traxler AG and as lecturer and deputy head at the Institute of Robotics, ETH Zürich. In 1996 he was appointed as associate and later full professor for autonomous microsystems and robots at the Ecole Polytechnique Fédérale de Lausanne (EPFL). In 2005 he held a visiting position at NASA Ames and Stanford University. He served as Vice President for Technical Activities (2004/05) and is currently Distinguished Lecturer (2006/07) of the IEEE Robotics and Automation Society. His research interests are in the design and navigation of autonomous robots operating in complex and highly dynamic environments.
