Autonomous Segmentation of Near-Symmetric Objects through Vision and Robotic Nudging

8/3/2019 Autonomous Segmentation of Near-Symmetric Objects through Vision and Robotic Nudging

http://slidepdf.com/reader/full/autonomous-segmentation-of-near-symmetric-objects-through-vision-and-robotic 1/6

Autonomous Segmentation of Near-Symmetric Objects through Vision

and Robotic Nudging

Wai Ho Li and Lindsay Kleeman

Abstract— This paper details a robust and accurate segmen-tation method for near-symmetric objects placed on a tableof known geometry. Here we define visual segmentation asthe problem of isolating all potions of an image that belongsto a physically coherent object. The term Near-Symmetricis used as our method can segment objects with some non-symmetric parts, such as a coffee mug and its handle. Usingbilateral symmetry this problem is solved autonomously androbustly through the aid of physical action provided by arobot manipulator. Our proposed approach does not requireprior models of target objects and assumes no previouslycollected background statistics. Instead, our approach relieson a precise robotic nudge to generate the necessary objectmotion to perform segmentation. Experiments performed on tenobjects show that our model-free approach can autonomouslyand accurately segment a variety of objects. These experimentsalso indicate that our segmentation approach is not adverselyaffected when operating in cluttered scenes and can segmentmulti-coloured and transparent objects in a robust manner.

I. INTRODUCTION

A. Motivation

Object segmentation is an important sensory process for

robots using vision. It allows a robot to build accurate

internal models of its surroundings by isolating regions of

images that correspond to objects in the real world. Multi-

scale computer vision object recognition methods, such asSIFT [1] and Haar boosted cascades [2] can imbue a robot

with the ability to robustly detect and classify modeled

objects. However, training such schemes to recognize objects

require many hand-labeled and well segmented images of

positive and negative examples. Precious human resources

are required to obtain this kind of training data. For very large

object sets, the amount of time and effort required can be

prohibitive. The autonomous process described in this paper

attempts to address this problem by obtaining accurate object

segmentations robustly without the need for human aid or

intervention.

Another motivating factor is to provide a segmentation

process that is highly autonomous. By limiting target objectsto those with bilateral symmetry, a model-free approach

can be applied, which allows us to abandon the a priori

assumptions and offline training demanded by other seg-

mentation approaches. For example, our method can operate

on transparent objects as we do not assume any temporal

constancy or colour uniformity in an object’s appearance.

Wai Ho Li and Lindsay Kleeman are with the Department of Electrical and Computer Systems Engineering, Monash University,Clayton Campus, Melbourne, Australia [email protected],[email protected]

This work is intended for use in domestic robotics ap-

plications as there are many objects with symmetry in

most households. However, the sensing parts of the process,

namely locating points of interest using symmetry triangu-

lation and object segmentation by folded frame difference,

are applicable to other robotic tasks. The overall aim is to

provide robots with general methods of dealing with common

household objects such as cups, bottles and cans, without the

burden of mandatory offline training for every new object.

As our approach assumes nothing about the appearance of

the robot manipulator, the actuation of target objects can be

provided by any manipulator capable of performing a roboticnudge as described in Section III, including a human hand.

B. Contributions

Segmentation using robotic action has been explored in

the past, most recently by Fitzpatrick et al [3], [4]. Their

approach uses a poking action, which sweeps the end effector

across the workspace. The presence of an object is detected

when visual motion increases due to contact with the moving

effector. Their segmentation method use frames just before

and after this point of contact. No planning is performed

prior to robotic action. Assuming the target object is not

deformed by the poking action, objects of any shape can be

segmented.The main contributions of our work are as follows. Firstly,

by limiting our scope to near-symmetric objects, locations of

interest are found prior to the application of robotic action.

This is achieved by clustering the intersections between

stereo triangulated symmetry axes and a table plane. By

avoiding dense stereo approaches, we can also localized

transparent objects with bilateral symmetry. Details of our

stereo triangulation approach, including a comparison of

results against dense stereo, can be found in [5].

Limited by the use of elastic acutators in their manipulator,

the approach of Fitzpatrick et al uses applies an imprecise

poking action to objects. In contrast, our method uses a

short, accurate robotic nudge, applied only to locations of interest. In experiments, we show that our method does not

tip over tall objects such as empty bottles and does not

damage fragile objects such as ceramic mugs. This level

of gentleness in object manipulation is not demonstrated in

the work of Fitzpatrick et al. While neither method address

the problem of end effector obstacle avoidance, the small

workspace footprint of the robotic nudge should make path

planning easier.

Finally, while appearing similar at a glance, our approach

to visual segmentation is very different to that of Fitz-



patrick et al. Their approach uses video frames during robotic

action, around the time of contact between the end effector

and object. Due to their motion-based initiation, bad frame

timing with respect to the time of contact can produce

poor segmentations. This is highlighted in Figure 11 of

[4], which shows that their end effector can be included in

the segmentation results. This problem never occurs in our

approach as we use video frames that are temporally further

apart, captured before and after robot action. Also, near-

empty segmentations can be returned by their approach. Our

approach will only perform segmentation if object motion is

detected during the nudge and the subsequent stereo tracking

remains convergent. The satisfaction of these conditions

prevents poor segmentations due to insufficient or unexpected

object motion.

C. System Overview

Fig. 1. Robot System Components

The components of our robot system are shown in Fig-

ure 1. The stereo cameras consists of two Videre Design 1394

CMOS cameras verged together at around 15 degrees from

parallel. These cameras capture 640x480 images at 25Hz

during nearly all parts of the segmentation process, except

for high resolution 1280x960 snapshots of the scene taken

before and after the robotic nudge. The PUMA 260 robot arm

has six degrees of freedom. The calibration grid is used to

perform camera-arm calibration and to estimate the geometryof the table plane. Details of both are described in Section II-

A.

Our autonomous segmentation process is summarized in

Figure 2. The robot begins by surveying the scene for

interesting locations to explore. The details of this process

are described in Section II. Once an interesting location

has been found, the robot manipulator nudges the target

location. If motion is detected during the nudge, stereo

tracking is initiated to keep track of the moving object.

Section III describes the robotic nudge and stereo tracking. If

Fig. 2. Autonomous Segmentation Flowchart

tracking converges, the object is segmented using the method

described in Section IV.

Bilateral symmetry is used as the primary visual feature

throughout all stages of the process. Our Fast Bilateral

Symmetry Detection [6] scheme, herein referred to simply

as symmetry detection, is used to locate lines of symmetry

within input images. The noise robustness of our detection

method is crucial when performing segmentation in visually

cluttered environments.

II. FINDING INTERESTING LOCATIONS

A. System Calibration

This section details the methods used to calibrate our

robotic system. Firstly, the stereo cameras are calibrated

using the MATLAB camera calibration toolbox [7]. The

intrinsic parameters of each camera in the stereo pair are

obtained individually. This is followed by a calibration to

obtain the extrinsics of the stereo system. After this, the

camera system can be used to triangulate locations in 3D

space. The geometry of the table is found by fitting a plane

to the checkerboard corners, the locations of which are found

using stereo triangulation.Prior to calibration, a grid of points is drawn on the table

using the robot manipulator with a special pen attachment.

Using this grid, the corners of the calibration is placed at a

known location in the manipulator’s coordinate frame. The

same corners are triangulated using the stereo cameras to find

their coordinates relative to the camera frame of reference.

The arm-camera calibration is performed by solving the

Absolute Orientation problem, finding the transformation to

map the corner points from one frame of reference to another.

We use the PCA-based solution proposed by [8].



B. Clustering Symmetry Intersects

Symmetry lines are detected in the left and right video

frames to provide data to a clustering algorithm. All pos-

sible pairings of symmetry lines between the left and right

images are triangulated to form 3D axes of symmetry using

the method described in our previous paper [5]. In our

experiments, three symmetry lines are detected for each

image, resulting in a maximum of nine triangulated axesof symmetry. Symmetry axes that lie outside the robot

manipulator’s workspace are left out of the clustering data.

Axes that are more than 10 degrees from being perpendicular

to the table plane are also rejected.

The intersections between valid symmetry axes and the

table plane are collected over 25 pairs of video frames and

recorded as 2D locations on the table plane. This collection

of locations are grouped into clusters using a modified QT

algorithm [9]. The QT clustering algorithm does not require

any prior knowledge of the number of actual clusters. This is

important as we are not making any assumptions concerning

the number of objects on the table. The QT algorithm also

provides a way to limit the diameter of clusters, reducingthe likelihood of clusters that include symmetry lines from

multiple objects. The orignal QT algorithm was modified

with the addition of a cluster quality threshold. The quality

threshold is used to ignore clusters formed by symmetry

axes that occur in less than half of all collected frames. The

geometric centroids of the clusters provide the robot with a

list of interesting locations to explore. A nudge is performed

on the valid location closest to the camera. A location is

deemed invalid if the robot gripper will collide with other

locations of interest during a nudge.

III. OBJECT MANIPULATION: THE ROBOTIC

NUDGE A. Motion Control

Fig. 3. Side view of Robotic Nudge

The motion of the robot gripper during a nudge is shown in

Figures 3 and 4. The L-shaped protrusion is made of sponge

to provide damping during contact, which is especially

important when nudging brittle objects such as ceramic cups.

The L-shaped sponge also allows the application of pushing

force at a height which is very close to the table plane. By

Fig. 4. Top-Down view of Robotic Nudge

applying force to the bottom of objects, nudged objects are

less likely to tip over. An example nudge captured by the

right camera is shown in Figure 5.

Fig. 5. Consecutive video frames from Right camera during a nudge. Theframes are taken from the P 1-P 2-P 1 portion of the nudge motion

The nudge begins by lowering the gripper from P 0 to P 1.

The height of the gripper at location P 0 is well above the

height of tallest expected object. Dmax is set to ensure thatthe L-shaped sponge will not hit the largest expected object

during its descent. After arriving at P 1, the gripper travels

towards P 2. Dmin is selected such that the gripper will make

contact with the smallest expected object before arriving at

P 2. The nudge motion ensures that the gripper never visually

crosses the object’s symmetry line when viewed from the

right camera. The gripper then retreats back through P 1 to

P 0. In early tests, the gripper was moved directly from P 2

back to P 0. This knocked over tapered objects such as the

blue cup in Figure 8 due to friction between the soft sponge

and the object’s outer surface.

In the overhead view of Figure 4, the nudge vector is

perpendicular to the line formed between the focal pointof the right camera and the target object’s symmetry line,

assuming one is present at the location to explore. This

choice of motion will nudge the object horizontally across the

camera’s image. This reduces the scale change of the target

object and also lowers the probability of glancing contact,

improving the quality of segmentation. After a location of

interest has been found, P 0, P 1 and P 2 are determined

based on the camera’s location. Using inverse kinematics,

linearly-interpolated encoder values are generated at run time

to move the gripper smoothly between these three points.



B. Obtaining Visual Feedback by Stereo Tracking

When the gripper begins its descent at P 0, the right

camera image is monitored for motion. Motion detection

is performed at a coarse resolution using 8x8 pixel cells.

Cells with two times the motion of the global average are

labeled as moving. This block motion algorithm is the same

as the one used in our symmetry tracking paper [10]. To

prevent ego motion of the robot manipulator from beinginterpreted as object motion, the object’s symmetry line is

used as a visual barrier. As the robot gripper never crosses

the symmetry line, motion detection is only performed on

green region in Figure 3.

Once motion has been detected, the robot begins stereo

tracking on the target object’s symmetry line. A Kalman

filter is used to track the polar parameters of the target

symmetry line. The tracking system is identical to the one

described in our previous work on real time monocular

symmetry tracking [10]. The monocular tracker is replicated

twice to perform stereo tracking. Visual segmentation will

only take place if tracking converges to a symmetry axes

roughly perpendicular to the table plane. This prevents poorsegmentation caused by insufficient object motion.

Videos of the robotic nudge and stereo tracking can be

downloaded from:

www.ecse.monash.edu.au/centres/irrc/li_iro08.php

IV. OBJECT SEGMENTATION

A. Object Segmentation by Folded Frame Difference

(a) Before Nudge (b) After Nudge (c) Frame Difference

(d) Folded Difference (e) Symmetry Filled (f) Segmentation Result

Fig. 6. Segmentation by Folded Frame Difference. Note that the FoldedDifference and Symmetry Filled images are rotated such that the object’ssymmetry line is vertical

Segmentation is performed using the object motion gen-

erated by the robotic nudge. Figure 6 illustrates the major

steps of segmentation. Figure 6(a) and Figure 6(b) are

images taken by the right camera before and after the nudge.

The absolute frame difference between the before and after

images is shown in Figure 6(c). The green lines are the

object’s symmetry lines before and after the nudge, found

using our symmetry detector. Note that thresholding the raw

frame difference will produce a mask that includes many

background pixels. The mask will also have a large gap

at the center of low-texture objects, such as the clear cup

in the example. Using the object’s symmetry lines, we can

overcome these problems.

Figure 6(d) shows the folded frame difference of the

object. This image is produced by removing the frame dif-

ference pixels between the two symmetry lines. This process

folds the frame difference image together as if it is printed on

a piece of paper, pressing the creases at the symmetry lines

together. Changes in the orientation of the object’s symmetry

lines before and after the nudge are removed prior to folding.

This folding process removes the excess area of the motion

mask autonomously and reduces the size of the motion gap

at the center of the moved object’s frame difference.

After folding, a small gap still remains in the frame

difference. This can be seen in Figure 6(d) as a dark vertical

section inside the cup-like shape. To remedy this, we again

exploit object symmetry to our advantage. Recall that thefolding step merges the symmetry lines of the object in the

before and after frames. Using this newly merged symmetry

line as a mirror, we search for motion on either side of it.

A pixel is considered moving if its frame difference value

is above a threshold. The folded difference image is rotated

so that the merged symmetry line is vertical. The widest

pair of moving pixels bisected by the object’s symmetry line

are recorded for each row of the image. This produces a

symmetric contour of the object. By filling the interior of this

contour, we produce the image in Figure 6(e). Note that this

filling approach retains the non-symmetric parts of objects.

The final segmentation result in Figure 6(f) is obtained by

thresholding the symmetry filled difference image.

V. SEGMENTATION EXPERIMENT R ESULTS

Segmentation experiments were carried out on ten ob-

jects of different size, shape, texture and colour. Trans-

parent, multi-coloured and partially symmetric objects are

also included. Objects are set against different backgrounds,

ranging from plain to cluttered. All segmentation results are

obtained autonomously by our robot without any human aid.

Objects in our scenes casts many shadows due to four bright

fluorescent ceiling light sources illuminating the table. For

safety reasons, a flashing warning beacon is active during

robot motion, periodically casting red light on the table when

the robot manipulator is powered.Due to space constraints, some segmentation results have

been left out. They can be found at:

www.ecse.monash.edu.au/centres/irrc/li_iro08.php

A. Cups without Handles

The white cup in Figure 7 poses a challenge to our

segmentation process not because of its imperfect symmetry,

but because of its shape. Due to its narrow stem-like bottom

half, the nudge produces very small shifts in the object’s



Fig. 7. Partially Symmetric White Cup

Fig. 8. Blue Cup

location, creating a narrow and weak contour of pixels in

the frame difference. As seen from the resulting segmen-

tation, our algorithm is able to handle this kind of object.

Figure 8 shows detection results for a symmetric cup against

background clutter. Lastly, Figure 9 illustrates the robustness

and accuracy of our segmentation process. The robot was

able to autonomously obtain a very clean segmentation of atransparent cup against background clutter.

B. Mugs with Handles

The mugs in Figures 10 and 11 tests the robustness of

our segmentation approach for objects with non-symmetric

parts. The handles of both mugs are successfully included

in the segmentation results. The multi-coloured mug in

Figure 11 was chosen for additional reasons. Firstly, it is a

multi-coloured object with intensities similar to background

Fig. 9. Transparent Cup in Clutter

Fig. 10. White Mug

Fig. 11. Multi-coloured Mug

shadows. This is the reason why the segmentation results is

quite noisy around the bottom of the mug. Secondly, it is a

brittle object made of ceramic. Its successful manipulation

provides evidence of the gentle nature of our robotic nudge.

C. Drink Bottles

Fig. 12. Small Water-Filled Bottle

The water-filled bottle in Figure 12 is used to test the

strength and accuracy of the robotic nudge. Due to its

small size and weight, the nudge must be accurate and

firm to produce enough object motion for segmentation. Thesegmentation result shows that the nudge can actuate small

and dense objects.

The remaining test objects are empty plastic drink bottles.

They are lightweight and have high centers of gravity, very

easy to tip over. During the nudge, their symmetry lines

tend to wobble, which provides noisy measurements to the

symmetry trackers. As such, these objects test the robustness

of stereo tracking and the robotic nudge. Figure 13 shows

a successful segmentation of a textured bottle against a

plain background. Figure 14 is a similar experiment repeated



Fig. 13. Textured Bottle

Fig. 14. Textured Bottle in Clutter

against background clutter. Finally, Figure 15 contains two

segmentation results for a transparent bottle. Note the accu-

rate segmentation obtained for the transparent bottle, which

produces a very weak motion signature when nudged.

VI. CONCLUSION

Our segmentation approach performs robustly and accu-

rately on near-symmetric objects in cluttered environments.By using the robotic nudge, the entire segmentation process

is carried out autonomously. Multi-coloured and transpar-

ent objects, as well as objects with non-symmetric parts,

are handled in a robust manner. We have shown that our

approach can segment objects of varying visual appearance

autonomously, shifting the burden of training data collection

from the user to the robot.

End effector obstacle avoidance and path planning, espe-

cially in situations where non-symmetric objects are present

in the nudge path, are left to future work. As our symmetry

detection method uses edge pixels as input, our segmentation

approach is visually orthogonal to those that use pixel

information, such as colour and image gradient. In situationswhere the target object is non-symmetric, approaches relying

on other features can be applied synergetically.

Our objection to stereo optical flow and graph cuts is their

reliance on object surface information, which is completely

unreliable for transparent and reflective objects. However,

if the opaqueness of an object has been confirmed, these

approaches can be used with our robotic nudge. As the

geometry of our table plane is known, a stereo approach to

segmentation can further improve segmentation by removing

the object shadow which is present in some of the results.

Fig. 15. Transparent Bottle

VII. ACKNOWLEDGMENTS

Thanks go to Steve Armstrong for his help with repairing

the PUMA 260 manipulator and the anonymous reviewers

for their insightful comments.

REFERENCES

[1] D. G. Lowe, “Distinctive image features from scale-invariant key-points,” IJCV , vol. 60, no. 2, pp. 91–110, November 2004.

[2] P. Viola and M. J. Jones, “Rapid object detection using a boostedcascade of simple features,” in IEEE CVPR, 2001.

[3] P. Fitzpatrick, “First contact: an active vision approach to segmenta-

tion,” in Proceedings of Intelligent Robots and Systems, 2003. (IROS2003), vol. 3. IEEE, October 2003, pp. 2161–2166.

[4] P. Fitzpatrick and G. Metta, “Grounding vision through experimentalmanipulation,” in Philosophical Transactions of the Royal Society:

Mathematical, Physical, and Engineering Sciences, 2003, pp. 2165–2185.

[5] W. H. Li and L. Kleeman, “Fast stereo triangulation using symmetry,”in Australasian Conference on Robotics and Automation, 2006.

[6] W. H. Li, A. M. Zhang, and L. Kleeman, “Fast global reflectionalsymmetry detection for robotic grasping and visual tracking,” in

Australasian Conference on Robotics and Automation, 2005.[7] J.-Y. Bouguet, “Camera calibration toolbox for matlab,” Online, July

2006, http://www.vision.caltech.edu/bouguetj/calib doc/.[8] T. S. H. K. S. Arun and S. D. Blostein, “Least-squares fitting of two

3-d point sets,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 9, pp. 698–700, 1987.

[9] S. K. Laurie J. Heyer and S. Yooseph, “Exploring expression data:

Identification and analysis of coexpressed genes,” Genome Research,vol. 9, pp. 1106–1115, 1999.

[10] W. H. Li and L. Kleeman, “Real time object tracking using reflectionalsymmetry and motion,” in IEEE/RSJ Conference on Intelligent Robotsand Systems, 2006, pp. 2798–2803.

Autonomous Segmentation of Near-Symmetric Objects through Vision and Robotic Nudging

Documents