Segmentation and Modelling of Visually Symmetric Objects by Robot Actions
Wai Ho Li and Lindsay Kleeman
Intelligent Robotics Research Centre
Department of Electrical and Computer Systems Engineering
Monash University, Clayton, Victoria 3800, Australia
{Wai.Ho.Li, Lindsay.Kleeman}@eng.monash.edu.au
Abstract—Robots usually carry out object segmentation and modelling passively. Sensors such as cameras are actuated by a robot without disturbing objects in the scene. In this paper, we present an intelligent robotic system that physically moves objects in an active manner to perform segmentation and modelling using vision. By visually detecting bilateral symmetry, our robot is able to segment and model objects through controlled physical interactions. Extensive experiments show that our robot is able to accurately segment new objects autonomously. We also show that our robot is able to leverage segmentation results to autonomously learn visual models of new objects by physically grasping and rotating them. Object recognition experiments confirm that the robot-learned models allow robust recognition. Videos of robotic experiments are available from Multimedia Extensions 1, 2 and 3.

Index Terms—fast symmetry, real time, computer vision, autonomous, segmentation, robotics, object recognition, SIFT, interactive learning, object manipulation, grasping
I. INTRODUCTION
The ability to perform object segmentation and modelling used to be the exclusive domain of higher primates. With passing time, computer vision research has produced ever improving systems that can segment and model objects. Modern techniques such as Interactive Graph Cuts [Boykov and Jolly, 2001] and Geodesic Active Contours [Markus et al., 2008] can produce accurate segmentations given some human guidance. Similarly, visual features such as SIFT [Lowe, 2004], Gabor Filter banks [Mutch and Lowe, 2006] and Haar wavelets [Viola and Jones, 2001] enable reliable object detection and recognition, especially when combined with machine learning methods such as Boosting using AdaBoost [Freund and Schapire, 1997]. However, these computer vision techniques rely heavily on a priori knowledge of objects and their surroundings, such as initial guesses of foreground-background pixels, which is difficult to obtain autonomously in real world situations.
This paper presents a robotic system that applies physical actions to segment and model new objects using vision. The system is composed of a robot arm that moves objects within its workspace inside the field of view of a stereo camera pair. The arm-camera geometry is configured to mimic a humanoid platform operating on objects supported by a flat table. A photo of our robotic system is shown in Figure 1. The checkerboard pattern is used to perform a once-off arm-camera calibration prior to robotic experiments.
Fig. 1. Robot System Components
Physical actions can reduce the need for prior knowledge by providing foreground-background segmentation. However, a robot will require significant training and background information to perform object manipulations autonomously. By limiting our scope to objects that exhibit bilateral symmetry in a perpendicular manner to a known plane, such as cups and bottles resting on a table, we propose a partial but robust solution to this problem. Given that many objects in domestic and office environments exhibit sufficient bilateral symmetry for our autonomous system, our symmetry-based approach can be employed in a wide variety of situations. Experiments show that our robot is able to autonomously segment and model new symmetric objects through the use of controlled physical actions. Object recognition experiments confirm that the robot-collected models allow robust recognition of learned objects.
A. Object Segmentation
We define object segmentation as the task of finding all pixels in an image that belong to an object in the physical world. An object is defined as something that can be manipulated by our robot, such as a cup or bottle. Whereas image segmentation methods generally rely on consistency in adjacent pixels [Pal and Pal, 1993], [Skarbek and Koschan, 1994],
motion mask. Both problems can be seen in Figure 7(c). We can overcome these problems by using the object's symmetry to our advantage.
Fig. 7. Segmentation by Compressed Frame Difference: (a) Before Nudge; (b) After Nudge; (c) Frame Difference; (d) Compressed Difference; (e) Symmetry Filled; (f) Segmentation Result. The Compressed Difference and Symmetry Filled images are rotated so that the object's symmetry line is vertical.
The compressed frame difference is shown in Figure 7(d). This image is generated by removing the pixels between the symmetry lines in the frame difference image, compressing the two symmetry lines into one. This process also removes changes in the object's orientation caused by the robotic nudge. Notice that the compressed frame difference no longer includes many background pixels. The motion gap present in the raw frame difference image is also smaller in the compressed frame difference.
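To make the compression step concrete, the following Python/NumPy fragment sketches the idea under our own assumptions (the frame difference has already been rotated so that both symmetry lines are vertical image columns with known indices); it is an illustration only, not the authors' C/C++ implementation.

import numpy as np

def compress_frame_difference(diff, x_before, x_after):
    # diff: frame-difference image, rotated so the pre- and post-nudge
    # symmetry lines are the vertical columns x_before and x_after.
    x_lo, x_hi = sorted((int(x_before), int(x_after)))
    # Remove every column strictly between the two symmetry lines so that
    # they collapse onto a single merged symmetry line at column x_lo.
    compressed = np.concatenate((diff[:, :x_lo + 1], diff[:, x_hi + 1:]), axis=1)
    return compressed, x_lo  # compressed image and merged symmetry line column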
A small motion gap may remain in the compressed frame difference. This can be seen in Figure 7(d) as a dark V-shape bisected by the symmetry line. To remedy this, we again exploit object symmetry to our advantage. The result in Figure 7(e) is obtained by following the symmetry filling process illustrated in Figure 8.
Fig. 8. Symmetry filling process used to generate the result in Figure 7(e)
Recall that the compression step merges the symmetry lines of the object in the before and after frames. Using this newly merged symmetry line as a mirror, we search for motion on either side of it. A pixel is considered moving if its frame difference value is above a threshold. These pixels are coloured gray in Figure 8. The filling process marks all pixels from the symmetry line to the outermost symmetric pixel pair as moving. This allows the process to fill motion gaps in the interior of an object while retaining asymmetric parts of a symmetric object such as the handle of a mug. The object segmentation result in Figure 7(f) is obtained by using the symmetry filled image as a mask.
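As an illustration of this filling rule, here is a minimal Python sketch under the same assumptions as the previous fragment (vertical merged symmetry line at a known column, motion threshold chosen by the caller); again this is our own illustrative code rather than the authors' implementation.

import numpy as np

def symmetry_fill(compressed, sym_col, motion_threshold):
    # compressed: compressed frame-difference image (symmetry line vertical).
    # sym_col: column index of the merged symmetry line.
    h, w = compressed.shape
    moving = compressed > motion_threshold        # thresholded motion pixels
    filled = moving.copy()
    for y in range(h):
        # Largest offset at which a symmetric pair of moving pixels exists.
        max_offset = 0
        for dx in range(1, min(sym_col, w - 1 - sym_col) + 1):
            if moving[y, sym_col - dx] and moving[y, sym_col + dx]:
                max_offset = dx
        # Mark everything from the symmetry line out to the outermost symmetric
        # pixel pair as moving, filling interior motion gaps while keeping
        # asymmetric moving parts (e.g. a mug handle) outside this span.
        filled[y, sym_col - max_offset: sym_col + max_offset + 1] = True
    return filled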
VI. PICKING UP AND ROTATING NUDGED OBJECTS
The object modelling process begins after the robotic nudge. The robot uses the object segmentation results from both cameras to estimate the height of the object. The top of the nudged object in the image is determined by following the object's symmetry line upwards. The top of the object is where its symmetry line intersects with the object-background boundary of its segmentation. Figure 9 visualizes an object's symmetry line and the top of the object as detected by our robotic system.
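A minimal sketch of this search, under our own simplified representation (a boolean segmentation mask and a per-row column position for the detected symmetry line): it scans from the top image row downwards to the first object pixel lying on the line, which is the same intersection point as walking the symmetry line upwards to the object-background boundary.

def find_object_top(segmentation, symmetry_columns):
    # segmentation: 2D boolean mask, True for object pixels.
    # symmetry_columns[y]: column of the symmetry line at image row y.
    rows, cols = segmentation.shape
    for y in range(rows):                      # row 0 is the top of the image
        x = int(symmetry_columns[y])
        if 0 <= x < cols and segmentation[y, x]:
            return (x, y)                      # symmetry line meets the object boundary
    return None                                # no object pixel found on the line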
Fig. 9. The top of a nudged object's symmetry line as detected by the robot: (a) left camera image; (b) right camera image.
Figure 10 illustrates how an object's height is estimated using its symmetry axis. The symmetry axis is produced by the same stereo triangulation process employed in Section III. The blue line joins the camera's focal point and the top of the object as detected in the camera view. The estimated height is marked as a black dot. Note that the estimated height has a systematic bias that makes it greater than the actual height of the physical object. Height estimates from the left and right camera views are cross-checked for consistency before attempting to grasp the object.
Fig. 10. Object height estimation using symmetry axis showing systematic bias in height estimate.
In Figure 10, r represents the object radius and d is the systematic bias of the estimated height. In cases where the object deviates from a surface of revolution, r represents the horizontal distance between the object's symmetry axis and the point on the top of the object that is furthest from the camera. The angle between the camera's viewing direction and the table plane is labelled as θ. Using similar triangles, the height error d is described by the following equation. Note that the equation assumes an object with a convex hull that has a flat upper surface and ignores the effects of an object appearing off centre in the camera image.

d = r tan θ    (1)
For our experimental rig, which simulates the arm-camera geometry of a humanoid platform, θ is roughly 30 degrees. As we are only interested in robot-graspable objects, we assume radii ranging from 30mm to 90mm. This produces a d error value between 18mm and 54mm. To compensate for this error, the gripper is vertically offset downwards by 36mm during object grasping. As the vertical tolerance of the robot's two-fingered end effector is well over ±18mm, object grasping is reliable as demonstrated by the experiments detailed in Sections VIII-B and IX.
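As a quick check of these numbers, the short fragment below evaluates Equation (1) for the quoted geometry; the computed biases fall close to the 18mm to 54mm range stated above (the small discrepancies are consistent with rounding in the text).

import math

theta = math.radians(30.0)                 # camera-table viewing angle
for r_mm in (30.0, 60.0, 90.0):            # graspable object radii
    d_mm = r_mm * math.tan(theta)          # Equation (1): d = r tan(theta)
    print(f"r = {r_mm:4.0f} mm  ->  height bias d = {d_mm:4.1f} mm")
# A fixed 36 mm downward gripper offset sits near the middle of this range,
# so the residual error stays within the +/-18 mm gripper tolerance.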
A. Training Image Collection
After estimating the nudged object's height, grasping is performed by lowering the opened gripper vertically along the object's symmetry axis. When the gripper arrives at the top of the object, offset downwards by the height triangulation error d, a power grasp is performed by closing the gripper. The object is raised until most of the gripper is outside the field of view of the stereo cameras. This helps prevent the inclusion of end effector features in the object's model.

Training images are collected by rotating the grasped object about a vertical axis. Right camera images are taken at 30-degree intervals over 360 degrees to produce 12 training images per object. The 30-degree angle increment is chosen according to the ±15 degrees viewpoint tolerance reported for SIFT descriptors [Lowe, 2004]. The first two images of a training set collected by the robot are shown in Figure 11. Each training image is 640 × 480 pixels in size.
Fig. 11. Two of twelve images in the green bottle training set. The right image was captured after the robot had rotated the grasped object by 30 degrees.
VII. OFFLINE OBJECT MODELLING USING SIFT
The scale invariant feature transform (SIFT) [Lowe, 2004] is a multi-scale feature detection method that extracts unique descriptors from affine regions in an image. It is attractive for robotic applications because SIFT descriptors are robust against translation, rotation, illumination changes and small changes in viewing angle.
A. SIFT Detection
Recall that the robot rotates a grasped object to collect 12 training images at 30-degree increments. After object manipulation, SIFT detection is performed on each image in a training set using David Lowe's binary implementation. Our own C/C++ code is used to match and visualize descriptors. The locations of SIFT descriptors detected in a training image are shown as blue dots in Figure 12(a). Note the dense coverage of descriptors over the grasped object.
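For readers who wish to reproduce this step, an OpenCV-based equivalent of the detection and matching stages is sketched below. The authors use Lowe's binary plus their own C/C++ matching code, which is not shown in the paper; cv2.SIFT_create requires OpenCV 4.4 or later, and the 0.8 ratio is our assumption following Lowe's nearest-neighbour ratio test.

import cv2

def detect_sift(image_path):
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    return sift.detectAndCompute(img, None)    # (keypoints, descriptors)

def match_sift(desc_model, desc_query, ratio=0.8):
    # Lowe-style nearest-neighbour ratio test on L2 descriptor distance.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    pairs = matcher.knnMatch(desc_query, desc_model, k=2)
    return [m for m, n in pairs if m.distance < ratio * n.distance]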
B. Pruning Background Descriptors
Figure 12(a) highlights the need to prune non-object descriptors before building object models. The inclusion of non-object descriptors may lead to false positives in future object recognition attempts. This problem will be especially prominent when the robot is operating on objects set against similar backgrounds.

An automatic pruning method is used to remove non-object descriptors as well as repetitive object descriptors. The pruned result is shown in Figure 12(b). Notice that the majority of background descriptors, including the descriptor extracted from the object's shadow, have been successfully removed. Experiments suggest that the remaining non-object descriptors have negligible effect on object recognition performance.
Fig. 12. Pruning background SIFT descriptors: (a) all detected descriptors; (b) background descriptors pruned.
Pruning is performed as follows. Firstly, a loose bounding box is placed around the grasped object to remove background descriptors. The bounding box is large enough to accommodate the object tilt and displacement that occurs during
Fig. 24. Reconfigured robotic system. Note the different relative locations of the robot arm and cameras when compared against the old system in Figure 1
• Investigation of object recognition failure modes
Videos of the robot in action for the experiments above are available from Extension 3. The first 4 sets of experiments as listed above are presented in chronological order within the video. Note that some object tracking videos have been slowed down from 25FPS to 10FPS for ease of viewing. As object recognition experiments are performed using passive vision, no videos are provided for them.
A. Symmetric objects with asymmetric parts
The robotic system was asked to learn a set of bottles using a nudge then grasp approach in Section VIII-B. To see whether the grasping approach generalized to symmetric objects with asymmetric parts, a white mug with a handle was used to test the system. The robot was successful in nudging and subsequently grasping the white mug. Figure 25 shows the segmentation returned by the robot.
Fig. 25. Segmentation from autonomous nudge and grasp of white mug with handle: (a) right camera image; (b) segmentation results. Non-object pixels are coloured green in the segmentation result.
B. Background edge pixel noise
Our robot’s reliance on bilateral symmetry is also its
Achilles heel, as multiple stages of visual processing make
use of our fast symmetry detector. The experiments presented
here attempt to disrupt the symmetry detection results by
introducing noisy edge pixels using a textured table cloth and
a newspaper. Recall from Figure 3 that our robotic system is
also designed to err on the side of caution and abort learning
attempts if anything goes wrong. As such, the experiments
also implicitly test the robustness of the system’s design.
We begin by saturating the camera image with edges using a highly textured table cloth as seen in Figure 26. Notice the large number of non-object edge pixels, which drowns out the object's symmetric edges. This results in no interesting locations being found by the robot as the triangulated symmetry axes do not intersect the table in a perpendicular manner. The robot correctly chose not to attempt a nudge. It may be possible to find the object by raising the number of symmetry lines detected. However, as all possible pairings of symmetry lines from the left and right cameras must be triangulated to find interesting locations, we chose to limit the number of symmetry lines detected in each camera image to three to avoid a combinatorial explosion in computational complexity.
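The cost argument is easy to see in code: every left/right pairing is a triangulation candidate, so the work grows with the product of the per-camera line counts. The fragment below is a small illustration of this, not the authors' triangulation code.

from itertools import product

def candidate_axis_pairings(left_lines, right_lines):
    # With at most 3 lines per camera there are at most 9 pairings to
    # triangulate; allowing 10 lines per camera would already give 100.
    return list(product(left_lines, right_lines))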
Fig. 26. Fast symmetry detection failure due to large quantities of non-object edge pixels: (a) right camera image; (b) fast symmetry results. The top three symmetry lines are shown as green lines with edge pixels overlaid in magenta.
Next, the number of noisy edge pixels is reduced by turning the table cloth over. The symmetry detection results are shown in Figure 27.
Fig. 27. Successful fast symmetry detection and segmentation via robotic nudge despite the presence of background edge pixel noise: (a) fast symmetry results; (b) segmentation result.
Note that the robot is able to detect the object's symmetry line. This resulted in successful stereo triangulation followed by successful nudge and grasp actions. The same experiment was also successful on the white mug from Figure 25.
Finally, in order to have finer control over the location of background edge pixels, a folded newspaper was used as a noise source. By moving the location of the newspaper, we were able to produce an experiment where the robot was able to nudge the object but correctly aborts the learning attempt before segmentation due to failed fast symmetry tracking. Symmetry detection results before the nudge are shown in Figure 28 below. Note that the position of the newspaper had to be manually fine tuned via guess-and-check in order to generate this failure mode. In the vast majority of cases, the newspaper had no effect on the system. A video of the robot successfully performing the nudge and grasp actions autonomously is also provided as reference.
Fig. 28. Successful fast symmetry detection before the robotic nudge: (a) left camera; (b) right camera. Note that symmetry tracking fails during the robotic nudge despite successful object triangulation. The top three fast symmetry lines are shown in green. Edge pixels are shown in red over the grayscale input image.
C. Partial occlusion of target object
These experiments focused on the effects of partial occlusion on symmetry tracking, the success of which is a prerequisite to proceed to segmentation and subsequent object learning steps. Four experiments were performed using the same white mug and pink cup from previous tests for the sake of consistency. An asymmetric object was used to provide the occlusion as it is invisible to our symmetry-based vision system. A symmetric occluding object will be nudged by the robot first as the system always attempts to actuate the object nearest the camera. The pre-nudge and post-nudge right camera images for all four experiments are shown in Figure 29.
By fine tuning the location of the occluding object in experiment 2, we achieved a failure mode where the target was detected but the robotic nudge increased the level of occlusion too much, causing symmetry tracking to diverge. All other experiments produced segmentations via the robotic nudge and the robot was able to grasp the objects autonomously.
Segmentation results are shown in Figure 30. As expected, the occlusions introduced several artefacts in the segmentation results. However, despite the degradation in segmentation quality, the robot was able to autonomously grasp the object in occlusion experiments 1, 3 and 4 in Figure 29.
In occlusion experiment 1, there are two artefacts present in the object segmentation. Firstly, a collision between the L-shaped foam nudger and the mug's handle during the gripper's descent when nudging the object caused a large rotation in the object pose. This resulted in the O-shaped segmentation artefact on the right of the mug. Note that symmetry tracking
Fig. 30. Segmentation results for occlusion experiments 1, 3 and 4 from Figure 29: (a) Occlusion 1; (b) Occlusion 3; (c) Occlusion 4. Background pixels are coloured green. Note that occlusion experiment 2 did not produce a segmentation as tracking failed during the robotic nudge.
converged despite the unintended collision. In addition, scene illumination changes caused by reflections and shadows from the robot arm also resulted in parts of the occluding object being included in the segmentation results. In experiments 3 and 4, the segmentation results also included parts of the occluding object and background due to lighting changes. However, these artefacts did not affect the subsequent grasping step.

Increasing the amount of occlusion before the nudge results in no object being detected and no robotic nudge. Overall, we found the fast symmetry detector to be robust to partial occlusions, especially when the occluding object is shorter in height than the tracking target. The robot was also able to abort learning attempts prematurely if symmetry tracking fails. This means that the robot's object knowledge, in the form of segmentations and SIFT features, will not be corrupted by failed symmetry tracking in occlusion experiment 2.
D. Object collisions during nudge
In the experiments presented previously in Section VIII-A, the robotic nudge was successful in segmenting the test objects. However, what happens when something goes wrong during the nudge? Here we present three experiments where the robotic nudge causes various kinds of unexpected events. In the first experiment, a nudged cup collides with a tennis ball which rolls for a short period of time after the nudge. In the second experiment, the cup collides with another cup. In the third experiment, an upside-down bottle is tipped over by the nudge. The robot-eye-view of each experiment before and after the nudge is presented in Figure 31.
Fig. 31. Collision experiments designed to cause unexpected events during the robotic nudge (from right). The before and after nudge images (right camera) are shown in the top and bottom rows respectively. Columns: Collision 1 (Success), Collision 2 (Success), Collision 3 (Tracking fails).
The segmentation results for experiments 1 and 2 are shown in Figure 32. Note that as expected, the movement of the object being hit by the pink cup resulted in segmentation artefacts. However, as the height of the object is determined along its symmetry line, autonomous grasping was not adversely affected. Tracking fails to converge for experiment 3 so the robot correctly aborts the learning attempt before the segmentation step.
Fig. 29. Partial occlusion experiments. The right camera images before and after the robotic nudge are shown in the top and bottom row respectively. Each of the four experiments is given its own column. Note that the robot was successful at performing the entire autonomous learning process, from nudge to grasp, apart from experiment 2. In experiment 2, symmetry tracking failed during the robot nudge, thereby successfully aborting the learning attempt before segmentation is performed.
Fig. 32. Segmentation results from collision experiments 1 and 2: (a) Collision 1; (b) Collision 2. Note that despite the noisy segmentation results, autonomous grasping was performed successfully following the nudge.
E. Object recognition failure modes
In Section VIII-B, the robot autonomously learned SIFT models for each of the seven bottles in Figure 23 in order to build an object database. Here, we investigate the failure modes of SIFT recognition using the same object recognition database. Note that the recognition results presented here are obtained using passive vision without any robotic action.
Firstly, to see if the new lighting conditions affected recognition on the learned objects, we retested the recognition system on several objects in Figure 23. The robot was successful in recognizing the learned objects as expected given SIFT's inherent robustness to illumination changes.
Secondly, we showed images of unmodelled objects to the recognition system to see if any false positives would be returned. The system was tested against the unmodelled objects in Figure 33. Object recognition did not return any false positive object matches.
Thirdly, we attempted to cause a false positive by presenting a new brown bottle that is nearly identical to the one already modelled. Both bottles can be seen in Figure 34. Notice the similarity in features as the new bottle is actually the same
Fig. 33. Previously unmodelled objects used to test recognition system. No false positives were returned by our system.
drink with updated branding. This scenario is one that can be
encountered by a robot operating in domestic environments.
Fig. 34. New brown bottle (left) versus brown bottle already modelled in the robot's object recognition database (right).
The new imposter brown bottle is able to cause false positives when placed at certain orientations, especially when the Chinese text and the English lettering are visible. Surprisingly, the number of SIFT matches is significantly smaller with the new bottle as can be seen in Figure 35. This suggests that a higher threshold on the minimum required number of SIFT matches can reject this false positive but will raise the risk of missed recognition for objects with few distinctive SIFT features.
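A match-count decision rule of the kind discussed here can be sketched as follows, reusing the match_sift helper sketched in Section VII; the min_matches value is hypothetical, as the paper does not state the threshold it uses.

def recognise(desc_query, model_database, min_matches=20):
    # model_database: mapping from object name to its stored SIFT descriptors.
    best_name, best_count = None, 0
    for name, desc_model in model_database.items():
        count = len(match_sift(desc_model, desc_query))   # ratio-test matches
        if count > best_count:
            best_name, best_count = name, count
    # Raising min_matches rejects look-alike imposters but risks missing
    # objects that carry few distinctive SIFT features.
    return best_name if best_count >= min_matches else None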
Fig. 35. Example of false positive caused by the new imposter brown bottle. Note the reduced number of SIFT matches as compared to the old bottle already modelled in the recognition database.
F. Discussion of limitations
The experiments in this section highlight the strengths and weaknesses of our system. As the experiments were conducted in a different laboratory, they suggest that the proposed design is robust to illumination changes as well as changes in arm-camera geometry and camera-table viewpoint. Table II lists the chance of different failure modes as experienced during the robotic experiments. A horizontal dash indicates that tracking is not performed as no symmetric objects are detected. Note that the object recognition experimental results are not included in the table as they do not make use of the whole system.
TABLE II
CHANCE OF SYSTEM FAILURE ACCORDING TO EXPERIMENTAL RESULTS

Experiment                      | No object detected | Tracking diverges
Asymmetric parts                | Rare               | Rare
High background texture         | Common             | -
Some background texture         | Rare               | Rare
Textured background distractor  | Rare               | Sometimes
Minor occlusion                 | Rare               | Rare
Major occlusion pre-nudge       | Common             | -
Major occlusion post-nudge      | Rare               | Common
Collision                       | Rare               | Rare
Object tipping over             | Rare               | Common
Here we define failure mode as the manner in which our robotic system aborts object segmentation and modelling, which does not imply complete failure of the system. As can be seen in Table II, experiments revealed that our system has two main modes of failure in the learning process described in Figure 3. Firstly, object detection can fail due to overwhelming background edge noise or the lack of object edge pixels caused by occlusion. This results in the system stopping the object learning process before any robotic action. Secondly, given that an object is detected by the robot, fast symmetry tracking can diverge during the robotic nudge. Tracking failure can be caused by occlusion of the target object, the nudged object being tipped over or the presence of background symmetry lines along the moving object's trajectory. Again, the robot will err on the side of caution by stopping the learning process and abandoning the motion segmentation attempt. Overall, our action-based learning approach appears to be robust to unexpected events.
The first two collision experiments did not interrupt the learning process but introduced artefacts in the segmentation results. These artefacts did not affect subsequent grasping but one can imagine scenarios with complicated object clutter that may result in a failed grasp or further collisions between the gripper and non-target objects. Unintentionally, the robot gripper also collided with the white mug in the occlusion
REFERENCES

[Boykov and Jolly, 2001] Boykov, Y. Y. and Jolly, M.-P. (2001). Interactive graph cuts for optimal boundary & region segmentation of objects in n-d images. In International Conference on Computer Vision (ICCV), volume 1, pages 105–112, Vancouver, Canada.
[Chen and Chen, 2004] Chen, J. and Chen, C. (2004). Object recognition based on image sequences by using inter-feature-line consistencies. Pattern Recognition, 37:1913–1923.
[Christensen, 2008] Christensen, H. I. (2008). Robotics as an enabler for aging in place. In Robot Services in Aging Society IROS 2008 Workshop, Nice, France.
[Elgammal et al., 2000] Elgammal, A., Harwood, D., and Davis, L. (2000). Non-parametric model for background subtraction. In European Conference on Computer Vision, Dublin, Ireland.
[Fei-Fei et al., 2006] Fei-Fei, L., Fergus, R., and Perona, P. (2006). One-shot learning of object categories. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(4):594–611.
[Fitzpatrick, 2003] Fitzpatrick, P. (2003). First contact: an active vision approach to segmentation. In Proceedings of Intelligent Robots and Systems (IROS), volume 3, pages 2161–2166, Las Vegas, Nevada. IEEE.
[Fitzpatrick and Metta, 2003] Fitzpatrick, P. and Metta, G. (2003). Grounding vision through experimental manipulation. In Philosophical Transactions of the Royal Society: Mathematical, Physical, and Engineering Sciences, pages 2165–2185.
[Freund and Schapire, 1997] Freund, Y. and Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139.
[Heyer et al., 1999] Heyer, L. J., Kruglyak, S., and Yooseph, S. (1999). Exploring expression data: Identification and analysis of coexpressed genes. Genome Research, 9:1106–1115.
[Kenney et al., 2009] Kenney, J., Buckley, T., and Brock, O. (2009). Interactive segmentation for manipulation in unstructured environments. In Proceedings of the IEEE International Conference on Robotics and Automation, Kobe, Japan.
[Kim et al., 2006] Kim, H., Murphy-Chutorian, E., and Triesch, J. (2006). Semi-autonomous learning of objects. In Conference on Computer Vision and Pattern Recognition Workshop, 2006. CVPRW '06., pages 145–145.
[Li and Kleeman, 2006a] Li, W. H. and Kleeman, L. (2006a). Fast stereo triangulation using symmetry. In Australasian Conference on Robotics and Automation, Auckland, New Zealand. Online. URL: http://www.araa.asn.au/acra/acra2006/.
[Li and Kleeman, 2006b] Li, W. H. and Kleeman, L. (2006b). Real time object tracking using reflectional symmetry and motion. In IEEE/RSJ Conference on Intelligent Robots and Systems, pages 2798–2803, Beijing, China.
[Li and Kleeman, 2008] Li, W. H. and Kleeman, L. (2008). Autonomous segmentation of near-symmetric objects through vision and robotic nudging. In International Conference on Intelligent Robots and Systems, pages 3604–3609, Nice, France.
[Li and Kleeman, 2009] Li, W. H. and Kleeman, L. (2009). Interactive learning of visually symmetric objects. In International Conference on Intelligent Robots and Systems, St Louis, Missouri, USA.
[Li et al., 2008] Li, W. H., Zhang, A. M., and Kleeman, L. (2008). Bilateral symmetry detection for real-time robotics applications. International Journal of Robotics Research, 27(7):785–814.
[Lowe, 2004] Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110.
[Markus et al., 2008] Markus, U., Thomas, P., Werner, T., Cremers, D., and Horst, B. (2008). TVSeg - interactive total variation based image segmentation. In British Machine Vision Conference (BMVC), Leeds.
[Moreels and Perona, 2005] Moreels, P. and Perona, P. (2005). Evaluation of features detectors and descriptors based on 3d objects. In ICCV '05: Proceedings of the Tenth IEEE International Conference on Computer Vision.
[Mutch and Lowe, 2006] Mutch, J. and Lowe, D. G. (2006). Multiclass object recognition with sparse, localized features. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 1, pages 11–18. IEEE.
[Pal and Pal, 1993] Pal, N. R. and Pal, S. K. (1993). A review on image segmentation techniques. Pattern Recognition, 26(9):1277–1294.
[Ray et al., 2008] Ray, C., Mondada, F., and Siegwart, R. (2008). What do people expect from robots? In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 3816–3821, Nice, France.
[Skarbek and Koschan, 1994] Skarbek, W. and Koschan, A. (1994). Colour image segmentation: a survey. Technical report, Institute for Technical Informatics, Technical University of Berlin.
[Taylor and Kleeman, 2002] Taylor, G. and Kleeman, L. (2002). Grasping unknown objects with a humanoid robot. In Proceedings of Australasian Conference on Robotics and Automation, Auckland.
[Tsikos and Bajcsy, 1988] Tsikos, C. J. and Bajcsy, R. K. (1988). Segmentation via manipulation. Technical Report MS-CIS-88-42, Department of Computer & Information Science, University of Pennsylvania.
[Ude et al., 2008] Ude, A., Omrcen, D., and Cheng, G. (2008). Making object learning and recognition an active process. International Journal of Humanoid Robotics, 5:267–286. Special Issue: Towards Cognitive Humanoid Robots.
[Viola and Jones, 2001] Viola, P. and Jones, M. J. (2001). Rapid object detection using a boosted cascade of simple features. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Kauai Marriott, Hawaii, USA.
[Ylä-Jääski and Ade, 1996] Ylä-Jääski, A. and Ade, F. (1996). Grouping symmetrical structures for object segmentation and description. Computer Vision and Image Understanding, 63(3):399–417.