Master thesis Master in Automatic Control and Robotics
Object recognition and grasping using
bimanual robot
Report
Author: Aleix Ripoll Ruiz
Director: Jan Rosell
Call: October 2016
Escola Tècnica Superior d’Enginyeria Industrial de Barcelona
Abstract
This document presents an algorithm to recognise and locate known bulky objects in a
workspace and to compute grasps using two anthropomorphic hands. Once valid grasp points
have been found and verified to be kinematically reachable by the manipulator, motion planning
and collision checking are performed.
First, this document presents the general architecture of the system and the
hardware devices and software implemented. Then, a detailed description of the object
recognition, grasp and motion planning, and manager modules is provided. Robot
Operating System (ROS) is the framework chosen to handle the communication
between nodes with appropriate messages, to manage the data from the sensors and to
execute robot tasks.
Next, attention is focused on the evaluation of the object recognition and
grasp system. From this evaluation, the most important conclusions are:
- The camera calibration plays an important role in achieving good performance and reliability of the system.
- This project proposes an approach that uses multiple camera views to construct a final point-cloud scene and obtain a complete 3D object.
- This project presents a satisfactory approach to filtering the desired object by its colours instead of by its shape.
- The more complete the point-cloud is, the smaller the error of the object pose.
- A system which grasps a bulky object with six points, three for each hand, instead of four as other works recommend, is a more robust system.
- The bimanual manipulator has 22 DoF per arm; therefore, the inverse kinematics problem is a laborious process and converging to a feasible solution is costly.
- A ROS service is less robust to service-provider changes or server failures than a ROS action, which provides feedback on the task progress and allows cancellation at any time.
Glossary
AAS Allegro Application Studio
API Application Programming Interface
AR Augmented Reality
BSD Berkeley Software Distribution
C-Space Configuration Space
DLS Damped Least Squares
DoF Degrees of Freedom
EST Expansive Space Trees
FK Forward Kinematics
FoV Field of View
FPFH Fast Point Feature Histogram
GA2H Grasping with two Allegro Hands library
GIKL General Inverse Kinematics Library
GUI Graphical User Interface
HLS Hue Lightness Saturation
ICP Iterative Closest Point
IK Inverse Kinematics
IOC Institut d’Organització i Control de sistemes industrials
KDL Orocos Kinematics and Dynamics Library
KPIECE Kinematic Planning by Interior-Exterior Cell Exploration
MIT Massachusetts Institute of Technology
MPL2 Mozilla Public License 2.0
NAN Not A Number
OMPL Open Motion Planning Library
PCL Point Cloud Library
PFH Point Feature Histogram
PR Personal Robots
PRM Probabilistic Roadmap Method
ROS Robot Operating System
RRT Rapidly-exploring Random Tree
SAC-IA SAmple Consensus Initial Alignment
SBL Single-query Bi-directional Lazy collision checking planner
SPFH Simplified Point Feature Histogram
STAIR STanford AI Robot
SVD Singular Value Decomposition
ToF Time-of-Flight
UPC Universitat Politècnica de Catalunya
UR Universal Robot
URDF Unified Robot Description Format
XML eXtensible Markup Language
Contents
1. INTRODUCTION ___________________________________________ 1
1.1. Objectives and scope ..................................................................................... 2
1.2. Project activities .............................................................................................. 3
2. GENERAL ARCHITECTURE OF THE SYSTEM __________________ 5
2.1. Hardware architecture .................................................................................... 8
2.1.1. Camera sensors ................................................................................................ 8
2.1.2. Robotic Hand .................................................................................................. 11
2.1.3. Bimanual Manipulator ..................................................................................... 12
2.2. Software architecture.................................................................................... 14
2.2.1. Robot Operating System (ROS) ...................................................................... 14
2.2.2. ROS packages: openni_launch and iai_kinect2 .............................................. 16
2.2.3. C++ libraries .................................................................................................... 17
2.2.4. The Kautham Project ...................................................................................... 18
3. OBJECT RECOGNITION MODULE ___________________________ 20
3.1. Data representation ...................................................................................... 20
3.1.1. Point-clouds .................................................................................................... 20
3.1.2. Data acquisition............................................................................................... 21
3.2. Camera Calibration ...................................................................................... 23
3.3. Object recognition node................................................................................ 29
3.3.1. Point-cloud processing .................................................................................... 29
3.3.2. ROS service: objects_recognition ................................................................... 39
4. GRASP AND MOTION PLANNING MODULE ___________________ 41
4.1. Object grasp node ........................................................................................ 41
4.1.1. Approach overview ......................................................................................... 41
4.2. Robot inverse kinematics node .................................................................... 43
4.2.1. Robot model simulation ................................................................................... 44
4.2.2. General Inverse Kinematics Library (GIKL) ..................................................... 45
4.3. Motion planning node ................................................................................... 47
4.3.1. Introduction to motion planning ....................................................................... 47
4.3.2. Basic features of the Kautham Project ............................................................ 49
5. MANAGER AND EXECUTION MODULE ______________________ 51
5.1. Input data files .............................................................................................. 51
5.2. Manager node .............................................................................................. 53
6. RESULTS _______________________________________________ 55
6.1. Evaluation of the Object Recognition Module .............................................. 55
6.1.1. Camera calibration .......................................................................................... 55
6.1.2. Multiple Kinects evaluation .............................................................................. 58
6.1.3. Object pose estimation .................................................................................... 59
6.1.4. Object recognition ........................................................................................... 66
6.2. Evaluation of the Grasp and Motion Planning Module ................................. 74
7. PROJECT BUDGET _______________________________________ 77
8. ENVIRONMENTAL IMPACT ________________________________ 78
CONCLUSIONS AND FURTHER WORK __________________________ 79
ACKNOWLEDGEMENTS _______________________________________ 81
BIBLIOGRAPHY ______________________________________________ 82
1. Introduction
During the last decades, the number of applications of bimanual human-like robots has
increased steadily, from tasks related to picking and placing items on an industrial production
line to an emerging class of personal robots that assist elderly or disabled people by
cleaning, delivering items or preparing meals at home. In this scope, many of these
applications require an efficient system for sensing the environment of the robot and an
effective bimanual robot with anthropomorphic hands for executing tasks such as
manipulating two objects simultaneously. The module responsible for sensing
the surroundings of the bimanual robot could provide the ability to recognise and locate
3D objects, avoid collisions with external obstacles or localise the robot while it is
mapping its environment.
This project focuses on determining whether a specific object is in the workspace and,
if so, grasping it with a bimanual robot. Human beings tend to rely on visual data to react
to the world around them. Robots can carry out many tasks with sensors such as RFID
sensors, RGB-D cameras, ranging sensors, tactile sensors, accelerometers, gyroscopes,
etc. However, the performance of these sensors is far from that of biological human
senses, and high performance comes at a high cost. Fortunately, the robotics
community has seen massive progress in low-cost hardware such as depth cameras and
anthropomorphic robotic hands.
An object recognition system finds and locates objects in the real world from the data
provided by the robot sensors, knowing a priori the objects’ models. But this task is
surprisingly difficult. This project discusses the stages followed in the object recognition
process on 3D scenes captured with RGB-D cameras. These types of sensors either illuminate
the scene with a light pattern and observe its deformation when it hits the surfaces
of the scene, or emit an infrared pulse and estimate the time it takes to travel to the scene
and bounce back to the depth camera. The first technology is used by the Microsoft Kinect
360 [1] and the second one by the Microsoft Kinect One [2].
Bimanual grasping systems allow robots to manipulate certain objects and perform
complex tasks that are not feasible with single end-effectors. In this project the system
architecture to plan motions towards grasping configurations and to perform robust
grasping operations with a system composed of two Allegro Hands [3] attached to UR5
robots [4] is presented.
The general architecture of the system, the main hardware devices and the software tools
used in this project are presented in chapter 2. Chapter 3 presents the theory about
point-cloud processing, the calibration procedure and the ROS service implemented.
Chapter 4 presents how the object grasp planning, the robot inverse kinematics and the
motion planning have been integrated, as well as the main theoretical background.
Chapter 5 presents the manager module. Chapter 6 shows the results for each module,
chapters 7 and 8 present the project budget and the environmental impact, and the final
chapter concludes and presents the future work.
1.1. Objectives and scope
In order to clarify what is expected during the development of this project, it is important
to establish the objectives and the limits of the project.
The aim of the project is to develop a system able to automatically locate and grasp a
specific object using a bimanual robot. To accomplish this, an open C++ software
application working on the Robot Operating System (ROS) [5] is implemented. The main
objectives are:
a) Development of a method to recognise and locate the object based on point-
cloud models using RGB-D cameras.
b) Development of the communications with a grasp and motion planning module
(already developed at the IOC) to determine the points to grasp the object
taking into account the hand tips and to find an admissible collision-free path
that connects the initial configuration of the arms to the target one.
c) Implementation of an execution module responsible for managing the
communication between the other modules.
d) Validation of the system by performing some experiments.
The project will use open software tools and hardware devices available at the IOC’s
laboratory:
- RGB-D cameras: Microsoft Kinect 360 and Kinect One.
- Bimanual manipulator: two Allegro Hands attached to two UR5 robots.
- Open software tools: The Kautham Project, General Inverse Kinematics Library
(GIKL) and Grasping with two Allegro Hands library (GA2H).
1.2. Project activities
This section presents the tasks developed in this project in order to meet the objectives
initially proposed. Below, all tasks are listed and a diagram illustrates all the activities
without a timeline representation.
The project consists of four stages in order to meet the initial objectives. Stage I consists
in dealing with all the tasks prior to the programming in C++. These include defining
the project activities, searching for similar works (state of the art), and studying theory about
point-cloud processing, motion planning and robot inverse kinematics. Stage II consists of
programming the different modules responsible for object recognition, object grasping, path
planning and robot inverse kinematics. Each of these modules is tested in Stage III in
order to study its performance and reliability. Finally, Stage IV
summarises the results and conclusions in a document. During all stages, there is a
parallel Continuous stage which contains the tasks related to project management.
For each stage, a division into tasks and subtasks has been made.
o Stage I: Previous study
o Task 1: IOC needs
T1.1: Define objectives.
T1.2: Define project activities.
o Task 2: Documentation
T2.1: State of the art.
T2.2: C++ programming language.
o Stage II: Programming
o Task 3: Create database of point-cloud models of common objects
T3.1: Database generation.
o Task 4: System architecture
T4.1: Install required packages and devices drivers.
T4.2: Definition of ROS nodes.
T4.3: Build CMake file.
T4.4: Classes and methods definition.
T4.5: Definition of types of communication required.
o Task 5: Object recognition module:
T5.1: Camera calibration system.
T5.2: Point-cloud processing system.
T5.3: ROS service: object_recognition.
T5.4: Build CMake file.
o Task 6: Grasping and motion planning module:
T6.1: Integration of GA2H.
T6.2: Integration of Kautham application.
T6.3: Build CMake file.
o Task 7: Inverse kinematic node:
T7.1: Integration of GIKL.
T7.2: Build CMake file.
o Task 8: Manager module:
T8.1: Implementation of the manager and execution node.
T8.2: Build CMake file.
o Stage III: Experimental
o Task 9: System robustness: find unexpected issues and fix as many as
possible.
T9.1: Run the execution node to find new unexpected issues.
o Task 10: System performance: find poorly performing code and improve it
as much as possible.
T10.1: Run the execution node to find performance bottlenecks.
o Stage IV: Report
o Task 11: Results and Conclusions
o Task 12: Final review and creation of final report
o Continuous stage:
o Task R: Project reviews and weekly meetings with IOC supervisors.
The following diagram does not represent a real timeline, as some tasks could finish
earlier or later than shown.
Figure 1.1. Tasks project diagram. It is not scaled in time.
2. General architecture of the system
This chapter provides useful information about the general structure of the system, the
hardware devices and the software implemented. According to the Engineering Systems
Division at Massachusetts Institute of Technology (MIT) [6], the architecture of any
complex system is “an abstract description of the entities involved of a system and how
they are related”. To construct the architecture of a system “involves determining what the
system is supposed to do and how specifically it will do it”. Moreover, the basic process of
creating the system architecture is by decomposition in which a top level concept of the
system’s functions is broken down into subsystems capable of performing subfunctions.
Knowing these definitions, the architectures of the system and the sub-systems can be
modelled.
The functional architecture [6] is designed to accomplish the system’s objectives
presented in section 1.1:
Figure 2.1. System architecture.
Figure 2.2 shows the general architecture of the Object Recognition and Grasping System
and the connectivity between its modules. The first module consists of the object
recognition process, which identifies and recognises the desired objects to be manipulated
later. This subsystem estimates the pose of each recognised object in the world
coordinate system in order to grasp and manipulate them with the bimanual robot (see
chapter 3).
The second module is responsible for determining the strategy to pick the object
with the robot’s end effectors and place it at the final desired pose with an optimal
motion plan (see chapter 4).
Finally, the execution module carries out the process to manage the information given by
the other two modules (camera sensor, planner and robot information) and the input user
tasks (see chapter 5).
Figure 2.2. Block diagram of the Object Recognition and Grasping System.
The architecture decomposition of each module is presented below, distinguishing
between hardware and software as done in [6]. Figure 2.3 shows the
architecture of the object recognition module, Figure 2.4 shows the architecture of the
object grasp and motion planning module and Figure 2.5 presents the manager and
execution module.
The difference between the physical (hardware) and non-physical (software) entities is
clear and evident in most engineering projects like the present one. Two schemes can be
constructed, one with the relations between the hardware and another for the software.
Figure 2.3. Architecture of object recognition module.
Figure 2.4. Architecture of the object and motion planning module.
Figure 2.5. Architecture of manager and execution module.
2.1. Hardware architecture
The hardware architecture refers to the identification and description of the system’s
physical components and their interaction and compatibility into the system architecture.
Figure 2.6 describes the hardware used in this project: camera sensors, bimanual robot,
two robotic hands and a computer that manages the whole system.
Figure 2.6. Main hardware components used in this project.
2.1.1. Camera sensors
Kinect 360 and Kinect One have been used in this project. Both are sensors for
videogames developed by Microsoft [2]. These cameras are widely used by the robotics
community in the field of computer vision due to their low cost. The main differences between
these two cameras are the technique used to estimate the distances of object surfaces, the
colour and depth camera resolutions, the depth range and the field of view (FoV):
- Technology: Kinect 360 uses structured light and Kinect One uses the time-of-flight (ToF) technique.
- Colour camera resolution: Kinect 360 has 640x480 and Kinect One has 1920x1080.
- Depth camera resolution: Kinect 360 has 320x240 and Kinect One has 512x424.
- Depth range: Kinect 360 covers 0.4m to 4m and Kinect One 0.5m to 4.5m.
- Field of view: Kinect 360 has 43° vertical and 57° horizontal and Kinect One has 60° vertical and 70° horizontal.
The technical specifications of these two cameras are presented in Table 2.1 and Table
2.2:
Table 2.1. Kinect 360 technical specifications [1].
Kinect 360 Specifications
Technology: structured light (see section 3.1.2)
Colour camera: 1280 x 1024 @ 12fps, 640 x 480 @ 30fps
Pixel size: 3.6 μm
Spatial resolution: 3mm @ 2m distance
Depth camera: 320 x 240 @ 30fps
Depth resolution: 1cm @ 2m distance
Depth range: 0.4m to 4m
Field of view (FoV): 43° vertical by 57° horizontal
Tilt motor: vertical tilt range ±27°
Frame rate (depth and colour stream): up to 30fps
USB standard: 2.0
Audio format: four-microphone array with signal processing including acoustic echo cancellation and noise suppression; 16-kHz, 24-bit mono pulse code modulation (PCM)
Accelerometer: 2G/4G/8G accelerometer configured for the 2G range, with a 1° accuracy upper limit
Price: 270€
Figure 2.7 shows the main components of the Kinect 360:
Figure 2.7. Kinect 360 components [1].
Table 2.2. Kinect One technical specifications [2].
Kinect One Specifications
Technology: time-of-flight (see section 3.1.2)
Colour camera: 1920 x 1080 @ 30fps
Depth camera: 512 x 424 @ 30fps
Depth range: 0.5m to 4.5m
Field of view (FoV): 60° vertical by 70° horizontal
Tilt motor: no
Frame rate (depth and colour stream): up to 30fps
USB standard: 3.0
Audio format: four microphones to capture sound, record audio, and find the location of the sound source and the direction of the audio wave
Dimensions: 24.9cm x 6.6cm x 6.7cm; the Kinect cable is 2.9m long
Weight: 1.4kg
Accelerometer: no
Price: 450€
Figure 2.8 shows the main components of the Kinect One:
Figure 2.8. Kinect One components [2].
2.1.2. Robotic Hand
Two Allegro Hands have been used in this project. This hand is developed by SimLab and
comes with the Allegro Application Studio (AAS) [3]. The Allegro Hand is a low-cost and
adaptive robotic hand for a variety of commercial robots. It is composed of four fingers and
sixteen independently controlled joints. For this reason, it is a good platform for grasp and
manipulation applications, capable of holding a variety of object geometries. The
technical specifications are presented in Table 2.3:
Table 2.3. Allegro hand technical specifications [3].
Allegro Hand Specifications
Weight: finger 0.17kg, thumb 0.19kg, total 1.08kg
Design: anthropomorphic
DoF: 16 (4 DoF x 4 fingers)
Payload: 5kg
Actuation: DC motor, gear ratio 1:369, max. torque 0.70Nm
Joint resolution: 0.002° (measured with potentiometer)
Communication: CAN (NI, Softing, Kvaser or ESD CAN), frequency 333Hz
Power: 7.4VDC (7.0V – 8.1V), 5A minimum
System requirements: CPU Intel Core 2 Duo or higher, RAM at least 2GB, HDD at least 2GB, graphics OpenGL 3.0 H/W acceleration enabled with at least 64MB of video RAM, OS MS Windows and Linux (ROS)
Price: 13400€
Figure 2.9 shows the main components of the Allegro hand:
Figure 2.9. Allegro hand dimensions in mm and its main components [3].
2.1.3. Bimanual Manipulator
The bimanual robot is composed of two UR5 robots fixed to a T-shaped structure built at
the IOC [7]. Figure 2.10 shows the system with a table to manipulate objects and two
Allegro Hands (16 DoF each) coupled to the UR5 arms (6 DoF each). The whole system
tries to emulate a human-like morphology.
Figure 2.10. Bimanual manipulator structure. Image from [8].
The UR5 robot is a collaborative arm with six joints developed by Universal Robots [4].
The UR5 is designed to mimic the range of motion of a human arm with its flexibility. This robot
is widely used in many applications such as industrial assembly lines, food handling,
painting, pick and place, etc. The technical specifications are presented in Table 2.4:
Table 2.4. UR5 technical specifications [4].
UR5 Specifications
Weight: 18.4kg
Workspace: working radius up to 0.85m
DoF: 6 revolute joints
Joint ranges: ±360°
Speed: joints 180°/s, tool 1m/s
Payload: 5kg
Actuation: max. torque 2.3Nm
Repeatability: ±0.1mm
Communication: TCP/IP 100Mbit (IEEE 802.3u, 100BASE-TX), Ethernet socket & Modbus TCP
Power supply: 100-240VAC, 50-60Hz
Power consumption: 200W
Price: 17900€
The UR5 workspace is a sphere of 0.85m radius excluding a central cylindrical volume of
0.117m radius. Figure 2.11 shows the UR5 manipulator and its workspace:
Figure 2.11. Workspace of the UR5 manipulator [4].
2.2. Software architecture
The software architecture refers to the identification and description of the system’s non-
physical components and their interaction and compatibility within the system architecture.
Computer software interprets the information provided by the physical hardware from
which the system is built. Software can be divided into application software and
system software. The former uses the computer system to perform a specific
functionality for the benefit of the user (a typical computer program). The latter
includes operating systems, which manage resources and provide common services for
other software running on top of them, and device drivers, which control the devices
connected to the computer. Figure 2.12 describes the system software and the
application software used in this project:
Figure 2.12. Main software components used in this project.
The object recognition and grasping system is programmed in C++ and uses
the components described in Figure 2.12. Additional information about each one is presented
in the following subsections.
2.2.1. Robot Operating System (ROS)
The Robot Operating System (ROS) [5] is a framework for building robot software. It started
at Stanford University in the mid-2000s with the Stanford AI Robot (STAIR) and the
Personal Robots (PR) programs. It is an open-source collection of
tools, libraries and conventions that simplify the work of creating complex and robust robot
applications for a wide variety of robotic platforms.
The ROS community allows exchanging robot software and knowledge through ROS
distributions, which make it easier to install a collection of software, repositories, where
different institutions can develop and release their own robot software, and the ROS wiki,
where anyone can contribute with his/her own documentation and tutorials [9].
The ROS software is structured into packages, each of which contains some
combination of code, data and documentation. Each package directory has to include a
CMakeLists.txt and a package.xml file that describe the contents of the package and how
catkin should interact with it. catkin is the ROS build system that generates executable
programs, libraries and interfaces [10].
The ROS system allows different nodes [11] to communicate with each other, exchanging
information and data. However, the whole system needs a running ROS Master [12] in
order to notify nodes of the existence of other nodes so they can start communicating with
each other. The ROS Master enables individual ROS nodes to locate one another in the
system and it tracks publishers and subscribers to topics and services.
The communication between nodes can be done with client/server or
publisher/subscriber methodologies. ROS topics [13] implement a publish/subscribe
communication mechanism, whereas ROS services [14] and actions use a client/server
communication method. Figure 2.13 shows a representation of these three
communication mechanisms between nodes:
Figure 2.13. ROS communication mechanisms.
One of the most common ways to exchange data in a distributed system is the
publish/subscribe communication mechanism implemented by topics. Before node (A)
starts to transmit data over a topic, it must first advertise the topic name and the type of
message that is going to be sent. Node (B) and node (C) have to subscribe to this topic by
making a request to the ROS Master. Then, both nodes will receive messages from this topic.
A topic is a one-way communication and it is useful when there might be multiple nodes
listening [15].
A service is a synchronous two-way communication that allows one node to call a
function that executes in another node. The server node specifies a function and
advertises the service. By making a request, the client node accesses this service
and then awaits a response from the server node [15].
An action is an asynchronous two-way communication between nodes. It is similar to the
request and response of a service; for actions they are called a goal and a result, respectively.
Moreover, the server node can provide feedback with updates on the progress, and the client
node can cancel the previous request at any time [15].
2.2.2. ROS packages: openni_launch and iai_kinect2
OpenNI SDK, libfreenect and libfreenect2 are available libraries for accessing the
Microsoft Kinect USB cameras [16], [17], [18], [19]. They identify a Kinect through its serial
number, a parameter necessary for distinguishing between multiple Kinects and running
them at the same time.
A purpose of this project is to use ROS to manage data between different modules that
process different types of information. There are ROS packages available that interpret
the information provided by the camera drivers and convert it into useful ROS data types.
The following ROS packages have been used in order to use the Kinect 360 and the Kinect
One in the ROS system:
1. Packages needed for launching Kinect 360 with libfreenect and OpenNI
SDK:
a. openni_camera: publishes raw depth, RGB and IR image streams from
the Microsoft Kinect 360, PrimeSense PSDK, ASUS Xtion Pro or Pro Live
cameras using the OpenNI SDK driver [19], [20].
b. openni_launch: launches RGB-D processing through rgbd_launch with
the OpenNI driver [21].
c. rgbd_launch: opens an OpenNI SDK device and converts raw
depth/RGB/IR streams to depth images, disparity images and registered
point-clouds. This file is launched internally from openni_launch [22].
2. Package needed for launching Kinect One with libfreenect2 and OpenNI
SDK:
a. iai_kinect2: a collection of tools and libraries for ROS to interface
with the Kinect One device using libfreenect2. It contains a calibration tool
for calibrating the IR sensor with respect to the RGB sensor and the depth
measurements. It converts raw depth/RGB/IR streams to depth images
and registered point-clouds [23].
2.2.3. C++ libraries
This section presents different high-level C++ libraries providing useful utilities. Eigen is
used for basic matrix and vector operations such as computing a translation and rotation
of a reference frame [24]. Basic colour image processing algorithms are handled
using the OpenCV library [25]. The Point Cloud Library (PCL) offers the most important
point-cloud processing algorithms needed for the object recognition application [26]. The
Grasping with two Allegro Hands library (GA2H) and the General Inverse Kinematics Library
(GIKL) are two libraries developed at the Institute of Industrial and Control Engineering (IOC).
GA2H provides the grasp points on the 3D object and GIKL computes the inverse kinematics
of the bimanual manipulator to reach the grasp points. A short description of each library
follows (a small Eigen usage sketch is given after the list):
1. Eigen: is a C++ library for linear algebra, matrix and vector operations and
geometrical transformations. Eigen is an open source library licensed under
Mozilla Public License 2.0 (MPL2) [24].
2. OpenCV: is an open source library licensed under the Berkeley Software
Distribution (BSD) license that provides real-time computer vision applications.
OpenCV was developed to provide a common infrastructure for computer
vision applications and to accelerate the use of machine perception in
commercial products. OpenCV’s application areas include: facial detection and
recognition, human gesture recognition, object segmentation and recognition,
motion tracking, producing 3D point-clouds from stereo cameras, stitching images
together to produce a high-resolution image, etc. [25].
3. Point Cloud Library: is an open source library for 2D and 3D image and
point-cloud processing under the Berkeley Software Distribution (BSD) license
and thus free for commercial and research use. The PCL library has numerous
algorithms for point-cloud processing such as filtering outliers from noisy data,
stitching 3D point-clouds together, feature extraction to recognise objects based
on their geometric appearance, surface reconstruction, registration, segmenting
relevant parts of a scene, etc. [26], [27].
4. Grasping with two Allegro Hands: is a C++ library developed at the Institute of
Industrial and Control Engineering (IOC). The library computes the grasp points of a 3D
object in order to manipulate it with two Allegro Hands mounted on a bimanual
manipulator (see section 2.1.3). The algorithm also provides the inverse
kinematics and the motion planning solutions to grasp the 3D object and
manipulate it [8].
5. General Inverse Kinematics Library: is a C++ library developed at the Institute of
Industrial and Control Engineering (IOC). The library deals with the main
kinematic problems of any tree-like robot structure. It provides simple
enhancements to the Orocos Kinematics and Dynamics Library (KDL),
implements analytical Jacobian solutions for specific manipulators and offers
alternatives for the numerical inverse kinematics solver, such as the Damped Least
Squares method, not included in KDL. Additional information can be found in
Annex A.
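As a small illustration of the Eigen operations mentioned above (translation and rotation of a reference frame), the following sketch builds a homogeneous transformation and applies it to a point. The numerical values are arbitrary examples, not taken from the project:

    #include <cmath>
    #include <iostream>
    #include <Eigen/Geometry>

    int main()
    {
      // Homogeneous transformation that rotates points 90 degrees about z and then translates them
      Eigen::Isometry3d T = Eigen::Isometry3d::Identity();
      T.translate(Eigen::Vector3d(0.5, 0.0, 0.2));
      T.rotate(Eigen::AngleAxisd(M_PI / 2.0, Eigen::Vector3d::UnitZ()));

      Eigen::Vector3d p(1.0, 0.0, 0.0);              // point expressed in the original frame
      std::cout << (T * p).transpose() << std::endl; // same point expressed in the new frame
      return 0;
    }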
2.2.4. The Kautham Project
The Kautham Project is a software tool developed at the Institute of Industrial and Control
Engineering (IOC) for teaching and research in robot motion planning [28]. It is developed
in C++ and uses the Open Motion Planning Library (OMPL) for the large number of planners
it provides [29]. OMPL is an open-source library released under the terms of the Berkeley
Software Distribution (BSD) license. The library offers a large number of sampling-based
motion planning algorithms such as the Probabilistic Roadmap Method (PRM),
Rapidly-exploring Random Trees (RRT), Expansive Space Trees (EST), etc.
OMPL does not contain any method for collision detection or any visualization tool [30], so
Kautham uses Qt for the user interface, the Coin3D library for graphic rendering, the
PQP library for collision checking and the ODE library for the dynamic simulation of
rigid bodies. The XML format is used for the input problem formulation. Robots and 3D
obstacles are defined as kinematic chains from a Denavit-Hartenberg file or a Unified
Robot Description Format (URDF) file. The following figure shows the Kautham Project
software application:
Figure 2.14. Kautham graphical user interface.
Different basic planners can be easily used and parameterized, allowing users to learn
each of them in depth. This software application can simulate the same problem with several
planners and different strategies by changing their parameters, such as the maximum number
of sampling points, the maximum number of nearest neighbours, the maximum distance to
connect two nodes without collision, the sampler method, etc. [28]. An example of the
configuration space of a problem solved using a PRM planner is shown in Figure 2.15:
Figure 2.15. Visualization of the configuration space of a problem solved with a PRM.
The user can fix different solver limitations such as the number of runs to simulate each
problem, the maximum memory used or the maximum time to find a solution. Kautham creates
a benchmarking output file which contains the simulation results [28]. An example of the
benchmarking output is provided, comparing the execution times in seconds of the PRM, EST
and Single-query Bi-directional Lazy collision checking (SBL) planners:
Figure 2.16. Example of boxplot in the benchmarking output.
3. Object Recognition Module
The object recognition module identifies specific 3D object models within a captured 3D
scene with colour information. Moreover, the object’s pose with respect to the world
coordinate system is provided. Figure 3.1 shows the block diagram of the module
indicating the inputs and outputs:
Figure 3.1. Object Recognition Module block.
The point-cloud data definition, the data acquisition, the procedure for calibrating RGB-D
cameras and the point-cloud processing pipeline are explained in the following subsections.
At the end, this section presents how a ROS node is constructed to advertise a ROS service
with the described functionality, which performs object recognition on a 3D colour scene
whenever a client requests it.
3.1. Data representation
An image taken with a common optical camera represents a projection of a partial 3D
world scene onto a 2D plane. The 2D digital image consists of a numerical matrix whose
values correspond to light intensity. The matrix elements are called pixels. Each
pixel can have one or more channels: one channel defines a grayscale image and
three channels a colour image. When the captured image adds, for each pixel, the
information of distances in Cartesian space, the data represents a 3D world scene.
3.1.1. Point-clouds
A point-cloud is a data structure used to represent a collection of multi-dimensional points
and is commonly used to represent three-dimensional data in some coordinate system.
These points are usually defined by X, Y, Z geometric coordinates and are often intended
to represent the external surface of an object. Moreover, a point-cloud can also carry the
corresponding colour information, so that each point includes the light intensity values
of red, green and blue and the transparency. The point-cloud is then said to be 4D when
colour information is present.
A point-cloud can be generated artificially from a computer program, for instance creating
a point-cloud sphere algebraically, or it can be acquired with hardware sensors such as
stereo cameras, 3D scanners or time-of-flight cameras (see 3.1.2). Figure 3.2 shows a
point-cloud example. The coordinate system of the point-cloud is set with respect to a
fixed coordinate system. If the point-cloud represents a 3D world scene, then the fixed
coordinate system is usually the sensor origin.
Figure 3.2. Point-cloud example with different objects on the table. Image from [26].
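As a minimal sketch of this data structure (assuming the PCL headers are available), a coloured point-cloud can be created and filled point by point as follows; the values are illustrative:

    #include <pcl/point_cloud.h>
    #include <pcl/point_types.h>

    int main()
    {
      // A point-cloud whose points carry XYZ coordinates plus RGB colour
      pcl::PointCloud<pcl::PointXYZRGB> cloud;

      pcl::PointXYZRGB p;
      p.x = 0.1f; p.y = 0.0f; p.z = 0.5f;   // coordinates in the sensor frame (metres)
      p.r = 255;  p.g = 0;    p.b = 0;      // colour information
      cloud.push_back(p);                   // push_back keeps the cloud unorganized (height = 1)

      return 0;
    }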
3.1.2. Data acquisition
There are several ways of estimating the distance of an arbitrary object in the 3D scene
and converting it into a point-cloud. This section presents some hardware sensors, such as
stereo cameras, 3D scanners or time-of-flight cameras, and how they can construct a
point-cloud with colour information captured from a real scene. The type of sensor chosen
depends on the situation and the application to develop: sometimes a point-cloud with high
resolution is required, sometimes a faster sensor with a high frame rate, or a point-cloud
without noise.
An RGB-D camera sensor combines colour and depth information. This allows
capturing the texture of an object in the scene from the colour image and its geometry from
the depth sensor. This camera sensor contains:
- An optical sensor for acquiring the colour information.
- A depth sensor, composed of an infrared emitter and an infrared sensor, which
estimates the distances of the objects in the scene.
Some types of RGB-D cameras are:
1. Stereo cameras: imitate human binocular vision by using two or more
lenses with a separate image sensor, which gives the ability to capture 3D
images [31].
2. Time-of-flight (ToF): measures the time that a ray or pulse of light takes to
travel the distance between the camera and the object. The entire scene is
captured with a single pulse of light, as opposed to point-by-point with a laser
beam. A ToF camera has an active light source that is used to illuminate the
scene. The camera registers the light reflected from the scene and the depth
information is calculated [32].
3. Structured light: is a technique to measure 3D surfaces by projecting
patterns of infrared light. The pattern is seen distorted when looked at from a
perspective different from the projector’s. By analysing this distortion, information
about the depth can be retrieved and the surface reconstructed. The main
advantage is that the projected features are easily distinguished by the camera
[33].
How the point-cloud is constructed from the optical camera and the depth sensor is not
trivial. Note that a camera calibration is required in order to obtain the correspondences
between the pixels of the colour and depth sensors, because both sensors are not in
the same position inside the camera: a translation and rotation matrix is needed to relate
them. Section 3.2 shows the procedure for calibrating the RGB-D camera, and an
example of an uncalibrated and a calibrated point-cloud is shown in Figure 3.5.
Kinect 360 and Kinect One have been used in this project and their main camera
specifications are presented in section 2.1.1. The following figures show the
images taken with the Kinect 360 and the point-cloud generated.
Figure 3.3. Example of images taken with Kinect 360.
Figure 3.4. Coloured point-cloud generated with Kinect 360.
3.2. Camera Calibration
Object recognition, navigation systems or 3D scene reconstruction are examples of
robotics applications that use the camera calibration parameters. The process of camera
calibration gives us a mathematical model of the camera’s geometry, a distortion model of
the lens and the pose of the camera with respect to a reference coordinate frame. Camera
calibration can be divided into two main stages:
- Calibration of the intrinsic parameters: lens distortion, focal length, optical
centre and, for RGB-D cameras, the colour and depth image offsets.
- Calibration of the extrinsic parameters: the position and orientation of the camera in a
reference coordinate frame.
The intrinsic model transforms a 3D point in the real world scene into a corresponding 2D
point in the camera image plane. For RGB-D cameras, the intrinsic calibration solves the
colour and depth offsets, because the optical and IR sensors are not in the same position
inside the camera. Since the intrinsic calibration usually needs to be done only once per
camera, it makes sense to save the estimated model after a successful calibration into a
file such as XML or YAML format [34], [35]. Later, simply loading these files into the program
allows skipping the calibration stage. Reliable calibration methods already exist and
are widely used [23], [36].
The default intrinsic parameters of the Kinect 360 provided by the ROS openni_camera
package [37] are quite accurate, so an intrinsic calibration procedure has not been applied.
However, the Kinect One has been calibrated with the method provided by the ROS iai_kinect2
package [23], which gives the camera intrinsic matrix, the distortion coefficients of the lens,
the rotation matrix and the projection matrix for the RGB and IR cameras and saves the
information into a YAML file. Moreover, another YAML file is generated containing the
rotation and translation that relate the RGB and IR cameras.
Figure 3.5 shows how the colour is now correctly applied on the depth data.
This is clearly evident around edges like the edge of the Stäubli robot or the rugby ball.
Figure 3.5. Uncalibrated and calibrated point-cloud.
The iai_kinect2 calibration uses a chessboard pattern, a symmetric or an asymmetric circle
grid to calibrate the intrinsic parameters. Typical calibration steps for the optical and IR depth
sensors are:
1. Start at a short distance where the calibration pattern covers most of the
image.
2. Tilt the calibration pattern vertically and horizontally.
3. Move the pattern further away.
4. Move the pattern with different orientations around most of the image.
Figure 3.6. Calibration steps with chessboard pattern.
Figure 3.7 shows the pattern used to calibrate optical and IR of Kinect One:
Figure 3.7. Chess5x7x0.03 pattern detected for optical and IR sensors.
The extrinsic calibration determines the transformation of the camera coordinate system
to world coordinate system. This transformation can be decomposed into a rotation and a
translation. Homogeneous transformations based on 4x4 real matrices have been used
most often within the robotics and computer vision community. The homogeneous matrix
is described below in a block-partitioned form [38]:
$${}^{B}T_{A} = \begin{bmatrix} {}^{B}R_{A} & {}^{B}\mathbf{p}_{A} \\ \mathbf{0}_{1\times3} & 1 \end{bmatrix} \qquad \text{(Eq. 3.1)}$$
where ${}^{B}R_{A}$ is the relative orientation and ${}^{B}\mathbf{p}_{A}$ is the relative position of frame A relative to
frame B.
The pattern transformation matrix ${}^{W}T_{P}$ (pose of the pattern in the world frame) is known,
and the current pose of the camera with respect to the pattern coordinate system, ${}^{P}T_{C}$,
is also provided. Then, the pose of the camera coordinate system with respect to the world
coordinate system is determined by multiplying the homogeneous transformation matrices:
$${}^{W}T_{C} = {}^{W}T_{P}\,{}^{P}T_{C} \qquad \text{(Eq. 3.2)}$$
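A minimal sketch of Eq. 3.2 using Eigen (the function and variable names are illustrative; the calibration node may organise this differently):

    #include <Eigen/Geometry>

    // Pose of the camera in the world frame obtained by chaining two known transforms:
    // the pattern pose in the world frame (measured beforehand) and the camera pose
    // with respect to the pattern (obtained from the calibration, e.g. via solvePnP).
    Eigen::Isometry3d cameraInWorld(const Eigen::Isometry3d& patternInWorld,
                                    const Eigen::Isometry3d& cameraInPattern)
    {
      return patternInWorld * cameraInPattern;   // Eq. 3.2
    }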
Figure 3.8 shows how multiple cameras can be calibrated with the same pattern
coordinate system.
Figure 3.8. Extrinsic calibration of two cameras with the same pattern.
Figure 3.9 presents how a single camera can be calibrated with multiple patterns. This
process is more robust for determining the pose of a camera with respect to the world
coordinate system, because it combines the information of multiple patterns.
Figure 3.9. Extrinsic calibration of a single camera with multiple patterns.
Note that symmetric markers, such as the chessboard used for the intrinsic calibration
(Figure 3.7), do not provide a correct orientation of the camera with respect to the
world coordinate system: the algorithm is not able to distinguish whether the pattern is
oriented up or down. This issue is solved by using asymmetric patterns. Reliable
algorithms already exist and are widely used:
1. The asymmetrical circle pattern of the OpenCV library [39].
2. ARToolKit: calculates the real camera position and orientation relative to
physical markers in real time [40].
3. The ROS package ar_pose: an Augmented Reality marker pose estimation
package that uses ARToolKit. It provides a transform between the camera and a single
AR marker or an array of transforms for multiple AR markers [41].
4. AR Track Alvar: newer and more advanced than ARToolKit, which has
been the core of other ROS AR marker packages [42].
AR markers are printouts that look much like QR codes. They have a special structure that
makes it easy for common object recognition algorithms to find them inside an image
or camera stream. First, they have a clear border that detaches the marker from the
outside world. Second, inside this border there is a unique pattern that is easily
recognisable by image recognition algorithms as it provides clear edges. Third,
knowing the size of the marker helps to transform the 2D image coordinates into 3D world
coordinates. However, all types of patterns have some restrictions, such as the marker
size and the angle at which the camera looks at the marker.
Figure 3.10. Three examples of AR tags [42].
OpenCV algorithms have been implemented for their simplicity, instead of external ROS
packages, which would generate additional ROS package dependencies. Therefore, an
asymmetrical circle grid is used (see Figure 3.13) and AR markers will be integrated in further work.
The method developed to calibrate the extrinsic parameters of a camera is the same
regardless of the camera used. The camera driver, which decodes the sensor information,
has to publish on a ROS message topic in order to provide the colour image data to the
extrinsic camera calibration algorithm. Figure 3.11 shows the inputs and outputs of the
extrinsic calibration function. The required inputs are: the colour image topic, the intrinsic
camera parameters and the position and orientation of all the asymmetric circle grid patterns
in the scene with respect to the world coordinate system. The output is the position and
orientation of the camera in the world coordinate system.
Figure 3.11. Extrinsic camera calibration block.
A diagram of the code structure is shown in Figure 3.12. Two OpenCV functions have
been used for the generation of the extrinsic camera matrix: findCirclesGrid and
solvePnP. The first one attempts to determine whether the input image contains a grid of
circles; it locates the centres of the circles and returns a non-zero value if all of the
centres have been found [43]. The second one estimates the pose of the pattern with
respect to the camera, given a set of object points, their corresponding image
projections, the camera matrix and the distortion coefficients [43].
Then, Object3DChessboard determines an array of object points in the pattern
coordinate system. These object points are correlated with the centres of the circles
recognised in the input image. Finally, the pose of the camera in the world coordinate
system is determined by a simple transformation: knowing the position and orientation of
the detected pattern and the average translation (tvec) and average rotation (rvec) of the
camera with respect to the detected pattern, the extrinsic camera matrix is set:
- rvec: the output rotation vector that brings points from the model coordinate
system to the camera coordinate system. This vector is based on the
Rodrigues’ rotation formula, an efficient algorithm for rotating a vector
in space given an axis and an angle of rotation [43].
- tvec: the output translation vector.
Figure 3.12. Block diagram of camera set_pose function.
Figure 3.13 shows a detected asymmetric circle grid 4x11 located in front of the camera.
Figure 3.13. Detected an asymmetric circles grid 4x11 used for extrinsic camera calibration.
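A simplified sketch of these two OpenCV calls is given below; the real set_pose function additionally averages over several detections and composes the result with the pattern-to-world transform (Eq. 3.2), and the pattern spacing used here is an illustrative value:

    #include <vector>
    #include <opencv2/core.hpp>
    #include <opencv2/calib3d.hpp>

    // Returns true if the asymmetric circle grid was found and the pose estimated.
    // rvec/tvec map points from the pattern coordinate system into the camera frame.
    bool estimatePatternPose(const cv::Mat& image, const cv::Mat& cameraMatrix,
                             const cv::Mat& distCoeffs, cv::Mat& rvec, cv::Mat& tvec)
    {
      const cv::Size patternSize(4, 11);           // 4x11 asymmetric circle grid
      std::vector<cv::Point2f> centers;
      if (!cv::findCirclesGrid(image, patternSize, centers,
                               cv::CALIB_CB_ASYMMETRIC_GRID))
        return false;

      // 3D positions of the circle centres in the pattern coordinate system
      std::vector<cv::Point3f> objectPoints;
      const float d = 0.02f;                       // illustrative circle spacing [m]
      for (int i = 0; i < patternSize.height; ++i)
        for (int j = 0; j < patternSize.width; ++j)
          objectPoints.push_back(cv::Point3f((2 * j + i % 2) * d, i * d, 0.0f));

      // Pose estimation from the 3D-2D correspondences (rotation as a Rodrigues vector)
      return cv::solvePnP(objectPoints, centers, cameraMatrix, distCoeffs, rvec, tvec);
    }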
3.3. Object recognition node
The most important concepts presented in this section are the point-cloud processing
algorithms implemented, such as filtering, segmentation, keypoint detection and feature
extraction, and how the object recognition node advertises a ROS service in order to
provide object recognition whenever a client requests it.
3.3.1. Point-cloud processing
A general processing pipeline is presented in Figure 3.14. The main stages are:
- Point-cloud filtering, to remove noise which can cause poor system performance; reducing the point-cloud density also decreases the computational cost of the following stages.
- Segmentation, to distinguish which points of the captured point-cloud scene belong to the 3D object to recognise.
- Descriptor extraction, to capture the most relevant geometric information of the object before the comparison with an object model.
- Matching and pose estimation, which computes the position and orientation of the matched object.
Figure 3.14. Point-cloud processing pipeline.
The RGB-D cameras generate point-clouds with varying point densities. Additionally,
measurement errors cannot be avoided, which complicates the estimation of point-cloud
parameters such as surface normals. These irregularities lead to erroneous
values and make the object recognition process fail. However, these issues can be
mitigated by applying some algorithms.
The point-cloud filtering process consists in removing the points of no interest, which do
not provide useful information: points at infinity caused by unexpected ray
reflections in the scene, unpredictable sensor noise, points with a “not a number” (NaN)
value, massive amounts of redundant data, occlusions, etc. The PCL functions used to
tackle these issues are listed below (a condensed code sketch of the whole filtering chain
is given after the list):
1. PassThrough filter [44]: performs simple filtering along a specified dimension,
cutting off values that are either inside or outside a given user range. This
allows setting a workspace where the objects to recognise will be placed.
Figure 3.15 shows how this function works: one image
contains the original point-cloud and the other one only the points inside the
workspace fixed by the user.
Figure 3.15. Example of pass through filtering.
2. removeNaNFromPointCloud [45]: removes all points of the input cloud which
contain a NaN value. Most PCL functions used in the following stages cannot
handle NaN values; consequently, they cannot process the point-cloud if NaN
values are not removed first.
3. VoxelGrid filter [46]: this PCL function is used to downsample the point-cloud
in order to reduce the number of points. The method consists in using a 3D voxel
grid, like a 3D box, and approximating all the points inside each voxel with their centroid.
This approach is slower than approximating them with the centre of the voxel, but it
represents the object surface more accurately. Figure 3.16 shows how this
function works: one image contains the original point-cloud and the other
one the downsampled cloud.
Figure 3.16. Downsampling a point-cloud using a voxel grids.
Two additional functions have been developed:
4. Colour filtering: removes all points which are not close to the colour
threshold fixed by the user. The function follows the HLS model
representation, which rearranges the geometry of RGB in order to be more
intuitive than the Cartesian cube representation. HLS is defined as a double cone or
a cylinder and stands for hue, lightness and saturation. The two vertices correspond
to the white and black colours, the angle defines the hue value, the distance to
the vertical axis defines the saturation value and the distance along the white-black
axis corresponds to the lightness.
Figure 3.17. HLS model representation.
In order to compare the colours of the point-cloud with the colour threshold fixed by
the user, it is convenient to obtain the pure hue values of the point-cloud. Note that
points close to white and black values are always filtered. A radius threshold is
applied in order to decide which values are close to the colour reference and
which ones are close to the white and black colours (see Figure 3.18). Therefore,
a point of the point-cloud is set to its maximum saturation and half lightness if it falls
inside the blue region.
Figure 3.18. HLS model: three regions delimited. Points in region blue are considered as colour.
Figure 3.19 shows the original and the final point-cloud after applying colour filtering:
Figure 3.19. An example of maximum colour saturation and a half of lightness.
5. Cluster filtering or radius outlier filtering: discards clusters with a low density
of points. The user specifies the number of neighbours which every point must have
within a specified radius to remain in the point-cloud. For example, if two
neighbours and a radius are specified, the blue points will be removed from the
point-cloud (see Figure 3.20).
Figure 3.20. An example of point cluster filtering.
Figure 3.21 shows the clusters with a low density of points selected to be removed
from the original point-cloud.
Figure 3.21. Cluster filtering process.
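The sketch announced above condenses the filtering chain into a single function using the PCL classes just described; the thresholds and leaf size are illustrative placeholders, and the custom colour filter is omitted:

    #include <vector>
    #include <pcl/point_cloud.h>
    #include <pcl/point_types.h>
    #include <pcl/filters/filter.h>                // removeNaNFromPointCloud
    #include <pcl/filters/passthrough.h>
    #include <pcl/filters/voxel_grid.h>
    #include <pcl/filters/radius_outlier_removal.h>

    typedef pcl::PointCloud<pcl::PointXYZRGB> Cloud;

    void filterScene(const Cloud::Ptr& input, const Cloud::Ptr& output)
    {
      // 1. Keep only the points inside the user-defined workspace along z
      pcl::PassThrough<pcl::PointXYZRGB> pass;
      pass.setInputCloud(input);
      pass.setFilterFieldName("z");
      pass.setFilterLimits(0.4, 1.5);              // illustrative limits [m]
      pass.filter(*output);

      // 2. Remove points with NaN coordinates
      std::vector<int> indices;
      pcl::removeNaNFromPointCloud(*output, *output, indices);

      // 3. Downsample with a voxel grid to reduce the computational cost
      pcl::VoxelGrid<pcl::PointXYZRGB> voxel;
      voxel.setInputCloud(output);
      voxel.setLeafSize(0.005f, 0.005f, 0.005f);   // illustrative leaf size [m]
      voxel.filter(*output);

      // 4. Discard isolated points (clusters with a low density of points)
      pcl::RadiusOutlierRemoval<pcl::PointXYZRGB> outlier;
      outlier.setInputCloud(output);
      outlier.setRadiusSearch(0.01);               // illustrative radius [m]
      outlier.setMinNeighborsInRadius(5);
      outlier.filter(*output);
    }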
Segmentation methods divide large amounts of data into smaller clusters in order to
group elements with the same properties and decrease the computational cost of
processing the data in the next stages. The segmentation method implemented is based
on a colour-based region growing algorithm. The region growing approach consists in
connecting two points within a specified distance; if a point with the same properties is not
close to another, the algorithm splits the remaining point-cloud into separate clusters.
Colour-based region growing uses colour instead of point normals: clusters are made by
merging points close in colour, and then two neighbouring clusters with a small
difference between their average colours are merged together. During this step every single
cluster is checked for the number of points it contains; if this number is less than the
specified threshold, the current cluster is merged with the closest neighbouring cluster.
Figure 3.22 shows how this RegionGrowingRGB function [47] splits the whole
scene into different clusters based on colour:
Figure 3.22. Segmented scene with colour-based region growing.
Figure 3.23 and Figure 3.24 present how the object recognition module is capable of
detecting single or multiple objects. This is possible with the colour-based region growing
segmentation, where each detected cluster could be an independent object (see Figure
3.23) or several clusters could form a single object with different colours (see Figure 3.24):
Figure 3.23. Segmented scene with multiple objects.
Figure 3.24. Segmented scene with single object with different colours.
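A minimal sketch of the colour-based region growing segmentation used here (the thresholds are illustrative placeholders, not the tuned values of the project):

    #include <vector>
    #include <pcl/point_types.h>
    #include <pcl/search/kdtree.h>
    #include <pcl/segmentation/region_growing_rgb.h>

    std::vector<pcl::PointIndices>
    segmentByColour(const pcl::PointCloud<pcl::PointXYZRGB>::Ptr& cloud)
    {
      pcl::search::Search<pcl::PointXYZRGB>::Ptr tree(
          new pcl::search::KdTree<pcl::PointXYZRGB>);

      pcl::RegionGrowingRGB<pcl::PointXYZRGB> reg;
      reg.setInputCloud(cloud);
      reg.setSearchMethod(tree);
      reg.setDistanceThreshold(0.01f);     // maximum distance to consider two points neighbours
      reg.setPointColorThreshold(6.0f);    // colour difference to add a point to a cluster
      reg.setRegionColorThreshold(5.0f);   // colour difference to merge two neighbouring clusters
      reg.setMinClusterSize(200);          // smaller clusters are merged with their neighbours

      std::vector<pcl::PointIndices> clusters;
      reg.extract(clusters);
      return clusters;
    }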
At this point, the points of the selected clusters are taken as keypoints for the following
stages. The keypoints contain the important points that can describe the entire cloud. For this
reason, the input cluster can be downsampled using the VoxelGrid filter [46] with a specific
leaf size, and all sub-sampled points become the keypoints. Then, the normals of these points
are used to build the shape histogram used in the descriptors. The normal carries information
about the curvature of the surface and can be computed with the NormalEstimation
function [48].
Object recognition compares different sets of points or templates in order to find
correspondences between models. The information that uniquely characterizes a point is
known as a descriptor or point-feature representation [49]. Descriptors represent the
information of an image or point-cloud and they are a fundamental part of an object
recognition system. A descriptor is built with the information of a point’s neighbourhood and its
normals. If the point-cloud is not dense, the descriptors will lack information about
the curvature of the object [50]. The PCL function used to extract the descriptors is:
1. FPFHEstimation [51]: class to compute Fast Point Feature Histogram (FPFH)
[52] descriptors from the input point-cloud and its surface normals.
The FPFH descriptor is a local descriptor that improves the performance of the Point
Feature Histogram descriptor (PFH) by only considering the direct connections
between keypoints and their neighbours. FPFH removes the additional links between
neighbours.
A PFH model is based on the relationships between the points in the k-
neighbourhood and their estimated surface normals. Figure 3.25 presents a
diagram of the PFH computation:
Figure 3.25. The influence region diagram for a PFH.
Point Pq, marked in red and placed in the middle of a circle (a sphere in 3D) with
radius r, is fully interconnected in a mesh with all its k neighbours (points with
distances smaller than the specified radius). Then, the final PFH descriptor is
computed from the relationships between all pairs of points in the neighbourhood,
which gives a computational complexity of O(n·k²), where n is the number of keypoints
and k the number of neighbours for each point [52].
Besides, the FPFH algorithm reduces the computational complexity of PFH to O(n·k).
This method computes the descriptor of a point Pq considering only the pairs between
itself and its neighbours (red lines illustrated in Figure 3.26), using a Simplified Point
Feature Histogram (SPFH) [52].
An additional step is performed to compensate for the loss of information coming from
the connections between neighbours. FPFH adds to the point Pq the SPFH of its
neighbours, weighted according to their distance. This procedure incorporates
additional information about the connections between the neighbours and the point
Pq up to a distance of 2r [52].
\[
\mathrm{FPFH}(P_q) \;=\; \mathrm{SPFH}(P_q) \;+\; \frac{1}{k}\sum_{i=1}^{k}\frac{1}{\omega_i}\,\mathrm{SPFH}(P_i) \qquad \text{(Eq. 3.3)}
\]
where ω_i represents the distance between the point Pq and a neighbouring point Pi.
The extra FPFH connections are shown with black lines in Figure 3.26. As the
diagram shows, some of the value pairs will be counted twice (marked with
thicker lines in the figure).
Figure 3.26. The influence region diagram for a FPFH.
To sum up, FPFH has been implemented in this project because it reduces the overall
complexity of PFH, making it possible to use it in real-time applications. FPFH includes
additional point pairs outside the sphere of radius r (at most 2r away), but it does not fully
interconnect all the neighbours of point Pq, as can be seen in Figure 3.26. Some value
pairs are missing, which might lead to a loss of information about the geometry around the
point. However, the re-weighting scheme recaptures some of the neighbouring point
values using the SPFH, as presented in the equation above.
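A hedged sketch of the descriptor computation with the FPFHEstimation class [51] follows; the only constraint worth noting is that the FPFH search radius has to be larger than the radius used to estimate the normals, and the value shown is an assumption:

#include <pcl/point_types.h>
#include <pcl/point_cloud.h>
#include <pcl/features/fpfh.h>
#include <pcl/search/kdtree.h>

// Sketch of FPFH descriptor computation from the keypoints and their normals.
pcl::PointCloud<pcl::FPFHSignature33>::Ptr
computeFPFH(const pcl::PointCloud<pcl::PointXYZRGB>::Ptr& keypoints,
            const pcl::PointCloud<pcl::Normal>::Ptr& normals)
{
  pcl::FPFHEstimation<pcl::PointXYZRGB, pcl::Normal, pcl::FPFHSignature33> fpfh;
  pcl::search::KdTree<pcl::PointXYZRGB>::Ptr tree(new pcl::search::KdTree<pcl::PointXYZRGB>);
  fpfh.setInputCloud(keypoints);
  fpfh.setInputNormals(normals);
  fpfh.setSearchMethod(tree);
  fpfh.setRadiusSearch(0.05);   // must be larger than the normal estimation radius (assumed value)

  pcl::PointCloud<pcl::FPFHSignature33>::Ptr descriptors(new pcl::PointCloud<pcl::FPFHSignature33>);
  fpfh.compute(*descriptors);
  return descriptors;
}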
The final stage presented in Figure 3.1 consists of an alignment process called template
registration or matching. By registering a template to a new point-cloud, the position and
orientation of the object can be determined. Moreover, a fitness score of the alignment is
provided. Two PCL functions have been implemented to deal with the registration stage:
1. SampleConsensusInitialAlignment [53]: this method takes an object
template as input and aligns it to the target cloud. The fitness score gives an idea
of how successful the alignment was, and the transformation matrix tells us the
position and orientation of the object aligned to the target cloud. This matrix can
be described in a block-partitioned form as shown in Eq. 3.1.
The Sample Consensus Initial Alignment procedure (SAC-IA) [52] uses point-cloud
intrinsic, rotation-invariant features. However, its computational complexity is high
because the matching step requires analyzing all possible correspondence pairs,
searching for similar histograms in both point-clouds.
2. IterativeClosestPoint (ICP) [54]: minimises the difference between two point-clouds
iteratively [55]; the transformation is estimated based on the Singular Value
Decomposition (SVD). ICP requires an accurate initialization and mostly overlapping
scans. This function has three termination criteria: a maximum number of iterations,
a minimum difference between the previous and the current estimated transformation,
and a minimum Euclidean squared error.
The fitness score is updated after the ICP alignment. A high value indicates either that the
template and the object in the scene do not correspond to the same object, or that the
alignment requires more iterations to reach the best match between both point-clouds. A
lower value means a better alignment. A transformation matrix describing how the template
points should be rotated and translated in order to best align with the points in the target
cloud is also provided. Note that the alignment score says a lot about the final result: if this
parameter has a low value, the 3D object is recognised in the entire scene and its position
and orientation with respect to the world coordinate system are fixed.
A general registration pipeline used in this project is presented in Figure 3.27:
Figure 3.27. Block diagram of the registration procedure.
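The registration pipeline of Figure 3.27 can be sketched as a coarse SAC-IA alignment refined by ICP, as shown below; all the numeric parameters are assumptions and the exact configuration used in the project may differ:

#include <Eigen/Core>
#include <pcl/point_types.h>
#include <pcl/point_cloud.h>
#include <pcl/registration/ia_ransac.h>
#include <pcl/registration/icp.h>

// Sketch of the registration stage: SAC-IA provides the initial alignment of the
// template to the scene and ICP refines it (illustrative parameters).
Eigen::Matrix4f registerTemplate(const pcl::PointCloud<pcl::PointXYZRGB>::Ptr& templateCloud,
                                 const pcl::PointCloud<pcl::FPFHSignature33>::Ptr& templateFeatures,
                                 const pcl::PointCloud<pcl::PointXYZRGB>::Ptr& scene,
                                 const pcl::PointCloud<pcl::FPFHSignature33>::Ptr& sceneFeatures,
                                 double& fitnessScore)
{
  // Coarse alignment with SAC-IA using the FPFH descriptors.
  pcl::SampleConsensusInitialAlignment<pcl::PointXYZRGB, pcl::PointXYZRGB,
                                       pcl::FPFHSignature33> sacia;
  sacia.setInputSource(templateCloud);
  sacia.setSourceFeatures(templateFeatures);
  sacia.setInputTarget(scene);
  sacia.setTargetFeatures(sceneFeatures);
  sacia.setMaximumIterations(500);             // assumed
  sacia.setMinSampleDistance(0.01f);           // assumed
  sacia.setMaxCorrespondenceDistance(0.05);    // assumed

  pcl::PointCloud<pcl::PointXYZRGB> coarse;
  sacia.align(coarse);

  // Fine alignment with ICP, starting from the SAC-IA estimate.
  pcl::IterativeClosestPoint<pcl::PointXYZRGB, pcl::PointXYZRGB> icp;
  icp.setInputSource(templateCloud);
  icp.setInputTarget(scene);
  icp.setMaximumIterations(100);               // termination criterion 1 (assumed)
  icp.setTransformationEpsilon(1e-8);          // termination criterion 2 (assumed)
  icp.setEuclideanFitnessEpsilon(1e-6);        // termination criterion 3 (assumed)

  pcl::PointCloud<pcl::PointXYZRGB> refined;
  icp.align(refined, sacia.getFinalTransformation());

  fitnessScore = icp.getFitnessScore();        // a lower value means a better alignment
  return icp.getFinalTransformation();
}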
3.3.2. ROS service: objects_recognition
A ROS service is a communication mechanism to pass data between nodes [14]. A service
client sends a request message to a service server and keeps waiting for a response.
This is a limitation of ROS services, because the client can wait forever if the service is not
available for some reason, such as the server having died unexpectedly or the service
name being misspelled in the client call. A service allows one ROS node [11] to call a
function that executes in another node. Each service has a unique name and a type that
determines the structure of the request and response messages in the ROS network. A
client can make a persistent connection to a service, which enables higher performance but
less robustness to service provider changes. Service calls are well suited to operations that
only need to be done occasionally and that take a bounded amount of time to complete.
More information about ROS services can be found in the API documentation [14].
Service communication has the following characteristics:
Figure 3.28. ROS service scheme.
A single node initiates the request and only one node can receive it and send back a
response. Service files are just lists of message types, which can be built using existing
ROS messages [56] such as std_msgs, geometry_msgs, nav_msgs or sensor_msgs, or
can be defined by the user. A ROS message is a simple data structure used in the
communication between ROS nodes. Messages can contain standard primitive types
such as integer, floating point, boolean, etc. [56].
The object recognition node advertises a service named objects_recognition associated
with the objectsRecognition.srv service type (see Figure 3.30). When the service node
receives a client request, the server executes the function srvLocateObjects. This function
runs the point-cloud processing algorithm explained in section 3.3.1. When the algorithm
ends, the server responds to the client with the list of recognised objects and their poses
with respect to the world coordinate system. Section 6.1 presents some tests and the
quantitative results of this module. The scheme of the object_recognition service can be
seen in Figure 3.29.
Figure 3.29. Scheme of ROS service: object_recognition.
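A hedged sketch of the server side of this node is given below; only the service name, the service type and the callback name are taken from the text, while the package name and the request/response fields are assumptions:

#include <ros/ros.h>
// The generated service header is assumed to live in this package; the actual
// package name and the request/response fields may differ in the project.
#include <objects_recognition/objectsRecognition.h>

// Callback executed when a client requests the objects_recognition service.
bool srvLocateObjects(objects_recognition::objectsRecognition::Request&  req,
                      objects_recognition::objectsRecognition::Response& res)
{
  // Run the point-cloud processing pipeline of section 3.3.1 and fill the
  // response with the recognised objects and their poses w.r.t. the world frame.
  return true;
}

int main(int argc, char** argv)
{
  ros::init(argc, argv, "object_recognition_node");
  ros::NodeHandle nh;
  ros::ServiceServer server = nh.advertiseService("objects_recognition", srvLocateObjects);
  ros::spin();
  return 0;
}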
Finally, the files created to make the ROS service communication possible are presented
in Figure 3.30. A single service file (.srv) and three message files (.msg) are created to
build the ROS service communication. A service definition cannot embed another service,
but messages can contain other message types. In the service definition, the request and
the response are separated by three dashes (---).
Figure 3.30. Service and messages types available in this project.
4. Grasp and Motion Planning Module
General robotic applications need the ability to interact with and manipulate objects in the
physical world. Humans see specific objects and know immediately how they would grab
them to pick them up. Robotic grasp detection lags far behind human performance. This
module firstly focuses on, the problem of finding a good grasp given a 3D object model
and its pose and, secondly, on finding an optimal trajectory in a free-collision space
between the initial configuration of the bimanual manipulator to the target object. This
module computes precision grasps for bulky objects using two anthropomorphic hands.
Figure 4.1 shows the block diagram of the module indicating the inputs and outputs:
Figure 4.1. Object Grasp and Motion Planning Module block.
4.1. Object grasp node
This section presents the node responsible for computing the grasps for bulky objects
using two anthropomorphic hands. This type of robotic hand has been developed as a
versatile end-effector able to perform complex grasp and manipulation tasks. There are
several approaches to find good grasp points [57], [58], [59], but this project is focused on
the approach developed at the IOC [8]. Figure 4.2 shows the object grasping points block
with the inputs and the output elements:
Figure 4.2. Object Grasping points block.
4.1.1. Approach overview
Most projects define a grasp as an end-effector configuration which achieves partial or
complete form closure of a given object. This is a challenging problem because it depends
on the pose and configuration of the robotic gripper as well as on the shape and physical
properties of the object to be grasped, so it requires a search over a large number of
possible gripper configurations.
The work in [8] deals with the problem of how to grasp bulky objects that cannot be
grasped with only one hand, using two anthropomorphic hands simultaneously. The
software library provided considers the constraints of the hands and arms in the search of
the contact points on the surface of the object. The approach relies on a full 3D object
model to find an appropriate grasp.
The method consists of searching for three reachable contact points for each hand in an
iterative way. Using frictional contact points, two fingers and the thumb of each hand
are sufficient to obtain a robust grasp. The approach has the following steps:
1. The surface of the object model is divided into two sets of slices, one for each hand.
2. A set of triplets composed of three points with their normal directions is computed in each slice.
3. Each triplet of the first set is tested together with each triplet of the second set in order to
find a couple of triplets that satisfies the force-closure condition.
4. The grasp quality is evaluated as the largest perturbation that the grasp can resist
in any direction.
In the end, the algorithm returns the solution grasp (see Figure 4.3) if a collision-free path
for the whole system is found. The method tries another couple of triplets in case the
inverse kinematics or the collision check does not return a solution. If, after exploring
the object through all the triplets and its three axes, no grasp points have been found, the
algorithm returns an error [8]. To illustrate the proposed approach, Figure 4.3 shows a
rugby ball divided into blue and orange slices and the resulting contact points:
Figure 4.3. Rugby ball divided in slices based on its inertial axis and the resulting contact points.
4.2. Robot inverse kinematics node
Kinematics is a branch of physics that studies the motion of systems without considering
the forces, masses or moments that cause the motion. Thus, robot kinematics refers to
the analytical study of the motion of a robot manipulator, analysing the relationship
between the dimensions and connectivity of the kinematic chains and the pose (position
and orientation) of each link of the manipulator [38].
Therefore, it is crucial to define suitable kinematic models of a robot mechanism in order
to analyze the behaviour of manipulators. A manipulator is characterized by an arm that
provides mobility, a wrist that confers dexterity and an end-effector that performs the
robot's task. The mechanical structure of a robot manipulator consists of a sequence of
rigid bodies, called links, interconnected by means of joints which provide pure rotation or
translation between two consecutive links. So, a kinematic model is built as a hierarchical
structure with parent-child relationships. This means that, if a joint is rotated around an
arbitrary axis, all its children will also rotate around the same axis, because they inherit all
of their parent's transformations [38].
In fact, robot kinematics describes the analytical relationship between the joint
positions and the pose of the end-effector. Hence, the formulation of the kinematic
relationships allows the study of the forward kinematic problem and the inverse kinematic
problem. Forward kinematics uses the kinematic equations of a manipulator to compute
the pose of the end-effector from specified values of the joint parameters. In contrast,
for serial manipulators inverse kinematics is a much more difficult problem, because it
involves the transformation of cartesian coordinates, i.e. positions and orientations of the
end-effector, into joint coordinates of the robot manipulator. In fact, the presence of
revolute joints makes the problem more difficult to solve because of the nonlinear
equations involved [38]. The following figure shows the relationship between forward and
inverse kinematics:
Figure 4.4. The schematic representation of forward and inverse kinematics.
In addition, there are two main techniques used to solve an inverse kinematic problem:
the analytical one and the numerical one [60]. With the first method, the joint variables
are solved analytically according to a given end-effector pose, while with the second one
the joint variables are obtained by means of an iterative technique.
There are several methods for solving inverse kinematic problems numerically. These
include cyclic coordinate descent methods, pseudoinverse methods, Jacobian transpose
methods, the Levenberg-Marquardt damped least squares method, quasi-Newton and
conjugate gradient methods, and neural network and artificial intelligence methods [61].
The GIKL provides three Jacobian IK solvers: the Transpose method, the Pseudoinverse
method and the Damped Least Squares method.
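For reference, the update rules of these three Jacobian-based solvers can be summarised as follows, where J is the Jacobian of the kinematic chain, e is the pose error of the end-effector, α is a step size, λ is the damping factor and the pseudoinverse expression assumes that J has full row rank:

\[
\Delta q_{\mathrm{transpose}} = \alpha\,J^{T}e, \qquad
\Delta q_{\mathrm{pinv}} = J^{+}e = J^{T}\left(JJ^{T}\right)^{-1}e, \qquad
\Delta q_{\mathrm{DLS}} = J^{T}\left(JJ^{T}+\lambda^{2}I\right)^{-1}e
\]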
Figure 4.5 shows the block responsible for calculating the joint positions from the robot
model topology and the object grasping points computed in the previous stage (see
section 4.1):
Figure 4.5. Robot inverse kinematics block.
In annex A.1, additional information about robot kinematics theory and numerical
inverse kinematics solvers is provided.
4.2.1. Robot model simulation
Creating a complete virtual model of a robot by simulating components and control
programs can significantly impact the general efficiency of a project. Some benefits of
simulations could be:
Reducing costs involved in robot manufacturing.
Simulating various alternatives without involving physical costs.
Robot or components can be tested before implementation.
Demonstration of a system to determine whether it is viable or not.
Compatibility with a wide range of programming languages.
The Unified Robot Description Format (URDF) is an eXtensible Markup Language (XML)
[34] format for representing a robot model. Its main limitation is that only tree structures
can be represented, which excludes all parallel robots. Also, the specification assumes
that the robot consists of rigid links connected by joints, so flexible elements are not
supported. The XML specifications for robot models are:
Sensor: describes a sensor such as a camera, ray sensor, IR, etc.
Link: describes the kinematic and dynamic properties of a link.
Transmission: describes the mechanical coupling between link actuators and
joints.
Joint: describes the kinematic and dynamic properties of a joint.
Gazebo: describes simulation properties such as damping, friction, etc.
Model_state: describes the state of a model at a certain time.
Model: describes the kinematic and dynamic properties of a robot structure
[62].
The XML specifications used in this project (Link, Joint and Model) consist of:
Kinematic and dynamic description of the robot.
Visual representation of the robot.
Collision model of the robot.
The description of a robot model involves a set of link elements, and a set of joint
elements connecting the links together. The link element describes a rigid body with
inertia, collision box and visual features [63]. The joint element describes the kinematics
and dynamics of the joint and also specifies the joint limits [64]. Figure 4.6 shows the main
attributes of a link and a joint element:
Figure 4.6. Link and Joint representations inside URDF.
4.2.2. General Inverse Kinematics Library (GIKL)
Nowadays, the number of robot manipulator applications is increasing and the morphology
of robots changes according to their functionality. Computing a closed-form solution for the
inverse kinematics of a general manipulator can be a complex problem, and such a solution
only exists in some cases.
However, an inverse kinematics library which uses different numerical methods to obtain a
suitable configuration for any robot pose is a good alternative, mainly because it
generalizes to all kinds of n-DoF, tree-like structures and mobile manipulators, and because
it avoids having to compute the analytical formulation for each robot.
There are some robot kinematics libraries available that satisfy our initial needs, are very
robust and have quite good performance. Therefore, in this project it has been decided
to choose one of these libraries and expand its functionality while improving its
performance. This is the case of the Orocos Kinematics and Dynamics Library, commonly
known as KDL, which computes the robot inverse kinematics with the Jacobian
pseudoinverse but has some drawbacks [65]:
1. Difficult to converge in the presence of joint limits.
2. Only one Jacobian pseudoinverse method implemented.
3. Difficulties to converge with tree manipulators.
4. No action taken when local minima appear during the iteration process.
5. No function to load a robot from a file.
Issues 3-5 above can be mitigated by simple implementation enhancements to the KDL
library, such as using a URDF file to load any open manipulator, implementing the
analytical Jacobian for tree manipulators and taking action when local minima appear.
Issues 1 and 2 require the consideration of alternative IK algorithms such as the Jacobian
Transpose and Damped Least Squares methods. All these issues are intended to be
addressed in the General Inverse Kinematics Library (GIKL).
GIKL is a C++ library developed at the IOC that deals with the main kinematic problems of
any tree-like robot structure. The purpose of using this library in this project is to find the
final joint configuration of the robot (see Figure 2.10) that reaches the grasping points of
the object with the two robotic hands, so an inverse kinematic problem has to be solved.
Figure 4.7 (a particularization of Figure 4.5) shows the input parameters required to use
the inverse_kinematic function and the output variables:
Figure 4.7. GIKL inverse kinematics block diagram.
First, the path to the URDF file has to be provided to the IK solver. The solver internally
constructs the kinematic robot structure according to the information in this file. The
solver also identifies all the robot's end-effectors. The manipulator presented in Figure
2.10 has eight end-effectors, which correspond to the fingers of the two Allegro Hands.
Second, the desired positions and orientations of the robot's end-effectors have to be
provided. The numerical IK solver type, the initial joint configuration of the robot, the
maximum number of iterations allowed and the maximum errors in position and orientation
allowed have to be set too.
Finally, the inverse_kinematic function computes the final joint configuration of the
robot, placing each end-effector at the target points of the object with the desired
orientation. The solver tries to find the solution before reaching the maximum number of
iterations and it returns the final error in position and orientation. In annex A.2, additional
information about the GIKL structure and its functionality is provided.
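The GIKL interface itself is not reproduced here, but the damped least squares iteration on which its DLS solver is based can be illustrated with the following self-contained sketch for a two-link planar arm; the link lengths, damping factor and target are arbitrary illustrative values:

#include <Eigen/Dense>
#include <cmath>
#include <iostream>

// Damped least squares IK iteration for a 2-link planar arm.
// This is only an illustration of the numerical scheme, not the GIKL implementation.
int main()
{
  const double l1 = 0.4, l2 = 0.3;         // link lengths in metres (assumed)
  const double lambda = 0.1;               // damping factor (assumed)
  Eigen::Vector2d q(0.3, 0.3);             // initial joint configuration
  const Eigen::Vector2d target(0.5, 0.2);  // desired end-effector position

  for (int it = 0; it < 200; ++it)
  {
    // Forward kinematics of the planar arm.
    Eigen::Vector2d p(l1 * std::cos(q(0)) + l2 * std::cos(q(0) + q(1)),
                      l1 * std::sin(q(0)) + l2 * std::sin(q(0) + q(1)));
    Eigen::Vector2d e = target - p;
    if (e.norm() < 1e-6) break;            // position error small enough

    // Analytical Jacobian of the planar arm.
    Eigen::Matrix2d J;
    J << -l1 * std::sin(q(0)) - l2 * std::sin(q(0) + q(1)), -l2 * std::sin(q(0) + q(1)),
          l1 * std::cos(q(0)) + l2 * std::cos(q(0) + q(1)),  l2 * std::cos(q(0) + q(1));

    // DLS update: dq = J^T (J J^T + lambda^2 I)^-1 e
    Eigen::Matrix2d JJt = J * J.transpose() + lambda * lambda * Eigen::Matrix2d::Identity();
    q += J.transpose() * JJt.ldlt().solve(e);
  }

  std::cout << "Final joint configuration: " << q.transpose() << std::endl;
  return 0;
}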
The complete evaluation of the GIKL library is presented in Annex A.3. The most
important results are summarised in Table 4.1:
Table 4.1. Performance of the GIKL.

Chain/Tree                     | DoF | pinv solve rate | pinv avg time | transpose solve rate | transpose avg time | DLS solve rate | DLS avg time
UR5                            | 5   | 87.50%          | 4.22 ms       | 49.00%               | 183.04 ms          | 88.90%         | 7.79 ms
kukaLWR                        | 7   | 100.00%         | 2.35 ms       | 63.30%               | 158.63 ms          | 100.00%        | 4.88 ms
Bimanual manipulator           | 44  | -               | -             | -                    | -                  | 96.55%         | 1555 ms
4.3. Motion planning node
Robot motion planning is responsible for finding an optimal plan from an initial joint
configuration to a target joint configuration. In addition, the algorithm searches for a
collision-free path in order to avoid collisions of the manipulator with its environment until
the manipulator reaches the target object, where contact is accepted. Figure 4.8 shows the
block diagram of the module indicating the inputs and outputs:
Figure 4.8. Robot Motion planning block.
4.3.1. Introduction to motion planning
The motion planning problem tries to produce a continuous motion that connects a start
configuration (qini) and a goal configuration (qgoal) while avoiding collisions with known
obstacles. There are many robot applications which require the motion planning
formulation, such as mobile robot navigation, robot autonomy optimization, object
grasping, etc. A robot motion planning problem can be stated as follows:
Fix a start pose of the robot.
Set a desired goal pose of the robot.
Describe the geometric structure of the robot.
Provide the geometric description of the world.
The algorithm tries to find a path that moves the robot gradually from start to goal in the
configuration space. The configuration space (C-Space) is the set of all possible robot
configurations relative to a fixed coordinate system. Usually a configuration is expressed
as a vector of positions and orientations. In addition, the free space (Cfree) and the obstacle
region (Cobs) are defined inside the C-Space. The free space contains all the robot
configurations that avoid collision with obstacles.
Sampling-based algorithms explore the free C-Space by sampling it with enough
points to allow a solution to be found. Some of these algorithms are:
The Probabilistic Roadmap Method (PRM) [66].
The Rapidly-exploring Random Trees (RRT) [67].
RRT Connect (bidirectional version of RRT) [68].
Expansive Space Trees (EST) [69].
Single-query Bi-directional planner with Lazy collision checking (SBL) [70].
Kinodynamic motion Planning by Interior Exterior Cell Exploration (KPIECE)
[71].
The Rapidly-exploring Random Tree (RRT) is a single-query planner which attempts to
solve a query as fast as possible and does not focus on exploring the entire map
as PRM does. RRT uses a random sequence to construct a single tree rooted at qini that
grows toward the qgoal configuration [67]. The RRT algorithm consists of:
1. First, qrand is selected randomly from a uniform distribution in C-Space.
2. Then, the algorithm adds a new configuration (qnew) to the tree depending on
the step size and on qnear, the nearest node of the tree. Configuration qnew is obtained
by moving qnear one step size toward qrand.
Figure 4.9 shows the procedure of RRT to add a new configuration based on qrand and
step size from qnear:
Figure 4.9. The basic RRT construction algorithm.
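As a hedged illustration of the two construction steps just listed, a minimal RRT extension loop in a two-dimensional configuration space can be sketched as follows; the sampler, step size and collision check are placeholders, far simpler than the OMPL planners used through The Kautham Project:

#include <cmath>
#include <cstdlib>
#include <vector>

// Minimal RRT construction sketch in a 2D configuration space (illustrative only).
struct Config { double x, y; };

double distance(const Config& a, const Config& b) { return std::hypot(a.x - b.x, a.y - b.y); }

bool collisionFree(const Config&) { return true; }   // placeholder collision check

std::vector<Config> buildRRT(const Config& qInit, const Config& qGoal,
                             double stepSize, int maxIterations)
{
  std::vector<Config> tree{qInit};
  for (int i = 0; i < maxIterations; ++i)
  {
    // 1. Sample q_rand from a uniform distribution in C-Space ([0,1]x[0,1] here).
    Config qRand{std::rand() / double(RAND_MAX), std::rand() / double(RAND_MAX)};

    // 2. Find q_near, the tree node closest to q_rand.
    std::size_t nearest = 0;
    for (std::size_t j = 1; j < tree.size(); ++j)
      if (distance(tree[j], qRand) < distance(tree[nearest], qRand)) nearest = j;

    // 3. Move from q_near towards q_rand by one step size to obtain q_new.
    const Config qNear = tree[nearest];
    double d = distance(qNear, qRand);
    if (d < 1e-9) continue;
    Config qNew{qNear.x + stepSize * (qRand.x - qNear.x) / d,
                qNear.y + stepSize * (qRand.y - qNear.y) / d};

    if (!collisionFree(qNew)) continue;
    tree.push_back(qNew);
    if (distance(qNew, qGoal) < stepSize) break;      // close enough to the goal
  }
  return tree;
}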
It is important to take into account the tradeoff between the exploration of Cfree and the
number of samples added to the tree, especially for high-dimensional problems. If the step
size is small, then the exploration steps are short and the nodes of the tree are close
together. As the number of nodes becomes large, memory consumption increases and
finding the nearest neighbour becomes expensive, which reduces the performance of the
planner [72]. Figure 4.10 shows an example of a solved RRT problem:
Figure 4.10. Final RRT construction.
Finally, the implementation of a bidirectional version of RRT such as RRTconnect,
which grows two trees rooted at qini and qgoal toward each other until they merge into
one, improves the efficiency of the motion planning system [68].
4.3.2. Basic features of the Kautham Project
The Kautham Project is developed in the C++ programming language and uses the
OMPL library because of the large number of planners that it provides. Different basic
planners can be easily used and parameterized, allowing users to learn different planning
algorithms in depth. This application software can simulate the same problem with several
planners and different strategies, changing parameters such as the maximum number of
sampling points, the maximum number of nearest neighbours, the maximum distance to
connect two nodes without collision, the sampler method, etc. Moreover, the application
can visualize the solution path of the formulated problem and animate it [28].
The main information required to define a problem is:
The path to a robot and the obstacles description file: bimanual manipulator
with Allegro Hands (URDF) and the object.
The initial and the final configurations of the robot(s): the initial and final
joint positions of the bimanual manipulator provided by the inverse kinematic
block (section 4.2).
The configuration of the obstacle(s): the object's position and orientation
with respect to the world coordinate system, provided by the object recognition
module (section 3).
Other parameters of the planner.
The Kautham Project has been encapsulated as a ROS service. Therefore, the Object
Recognition and Grasping system can initiate a ROS service communication mechanism
to pass data between the manager_node and the kautham_node. Each service has a
unique name and a type that determines the structure of the request and response
messages in the ROS network. Some of these services defined by the kautham_node
server are:
kautham_node/SetQuery
kautham_node/CheckCollision
kautham_node/OpenProblem
kautham_node/CloseProblem
kautham_node/SetInit
kautham_node/SetGoal
kautham_node/SetRobotsConfig
kautham_node/SetObstaclesConfig
kautham_node/SetRobControls
kautham_node/SetPlanner
kautham_node/SetPlannerParameter
kautham_node/Solve
kautham_node/AddObstacle
kautham_node/Connect
etc.
A service example of the kautham_node is presented in Figure 4.11. In case the system
needs to set the initial and goal queries, the manager_node requests them from the server
node and waits for a response. The initial and goal queries are determined by the initial and
final joint positions of the bimanual manipulator provided by the inverse kinematic block.
Figure 4.11. ROS service scheme of the kautham_node/SetQuery and its service file.
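A hedged sketch of how the manager could call this service from C++ is shown below; the package name of the generated header and the request field names are assumptions based on Figure 4.11, not the actual kautham_node definition:

#include <ros/ros.h>
#include <vector>
// The service header and its request fields are assumed for illustration.
#include <kautham/SetQuery.h>

// Ask the kautham_node server to set the initial and goal queries of the
// motion planning problem from two joint configurations.
bool setQuery(ros::NodeHandle& nh,
              const std::vector<double>& qInit, const std::vector<double>& qGoal)
{
  ros::ServiceClient client = nh.serviceClient<kautham::SetQuery>("kautham_node/SetQuery");

  kautham::SetQuery srv;
  srv.request.init = qInit;     // assumed field name
  srv.request.goal = qGoal;     // assumed field name

  if (!client.call(srv))        // the client blocks here until the server answers
  {
    ROS_ERROR("kautham_node/SetQuery call failed");
    return false;
  }
  return true;
}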
5. Manager and Execution Module
The manager and execution module is responsible for the supervision and the control of
the different modules in the system. The manager is designed to accomplish the
requirements of the user by using the functionalities of the object recognition module and
the grasping module. Figure 5.1 shows the block diagram of the module indicating the
inputs and outputs:
Figure 5.1. Manager and Execution Module block.
The following subsections detail the user input file definition, how a ROS client node is
constructed to call a ROS service, the execution pipeline and other features.
5.1. Input data files
The Object Recognition and Grasping System uses simple text files to load the user input
parameters needed to accomplish the tasks.
One text file is used for the camera calibration setup and contains the most important
parameters needed by the OpenCV functions described in section 3.2. This file has to
include the following arguments:
1. PATTERN_TYPE: ASYMMETRIC_CIRCLES_GRID
2. PATTERN_SIZE: number of inner corners per row and column of the pattern.
3. SQUARE_SIZE: real distance, measured in meters, between pattern corners.
4. PATTERN_POSE: the pose of the pattern with respect to the world reference
frame, described by the cartesian position in meters (x, y, z) and the angle-axis
description of the orientation in degrees (ax, ay, az, angle).
A text file example is shown in Figure 5.2:
Figure 5.2. Example of user input file used in the camera calibration stage.
A second file provides important information to the system, such as how many objects have
to be found in the scene, the model to be recognised, the hue colours of the object and
the dimensions of the workspace where the objects are. This file has to include the
following arguments:
1. OBJ_SEARCH: single object or multiple objects to be recognised in the scene.
2. MODEL: the path where the model template is located.
3. COLORS: the hue levels on which the system will focus.
4. WORKSPACE: dimensions in meters of the box where the system will search for
the objects. The box is located at the origin of the world reference frame.
A text file example is shown in Figure 5.3:
Figure 5.3. Example of user input file used in the manager and execution module.
5.2. Manager node
A general execution pipeline is presented in Figure 5.4. A ROS node is built to implement
these execution stages: the object_recognition service is requested to get the position and
orientation of the objects in the scene, GA2H is used to compute the grasping points of
the recognised object, the inverse kinematics of the bimanual manipulator is computed
and, finally, motion planning produces a continuous motion that connects a start
configuration and a goal configuration while avoiding collisions with obstacles.
Figure 5.4. Execution processing pipeline.
First, the manager node reads the user input files described in section 5.1 and sends a
service request to locate the object or objects in the scene using the object_recognition
service (see section 3.3.2). Once the request is sent, the client node waits for a response.
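A hedged sketch of the client side of this request is shown below, consistent with the server sketch of section 3.3.2; the package name and the message fields remain assumptions:

#include <ros/ros.h>
// Package name and request/response fields are assumed (see section 3.3.2).
#include <objects_recognition/objectsRecognition.h>

// Request the object recognition node to locate the objects in the scene.
bool locateObjects(ros::NodeHandle& nh)
{
  ros::ServiceClient client =
      nh.serviceClient<objects_recognition::objectsRecognition>("objects_recognition");

  objects_recognition::objectsRecognition srv;
  // Fill srv.request from the user input files (model path, colours, workspace...).

  if (!client.call(srv))        // blocks until the server responds
  {
    ROS_ERROR("objects_recognition service call failed");
    return false;
  }
  // srv.response now contains the recognised objects and their poses with
  // respect to the world coordinate system.
  return true;
}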
Second, the manager executes the GA2H library to compute the grasping points of the
recognised object. The inputs are the name of the model, the PCD file of the model object
and its position and orientation. The outputs are the target points on the object where the
tips of the two Allegro Hands will be placed (see section 4.1).
Then, the manager node executes the GIKL library to solve the inverse kinematics of the
bimanual manipulator. The manager sends the desired position and orientation of each
fingertip of both Allegro Hands. The library also needs the URDF file of the bimanual
manipulator in order to build the tree-like structure of the manipulator internally. The
inverse kinematic solver computes the joint positions from the initial joint positions, a
method (pseudoinverse, transpose or DLS), the maximum number of iterations and error
allowed, and the vector of the desired end-effector poses. More information is presented
in section 4.2.
Finally, the manager node queries The Kautham Project to solve a basic motion
planning problem: determining a continuous motion that connects a start configuration of
the bimanual manipulator with a goal configuration. The goal configuration of the bimanual
manipulator corresponds to the final joint positions computed in the previous stage.
Furthermore, Kautham determines a path along which the continuous motion of the
manipulator avoids collisions with the known obstacles in the scene. Additional
information about motion planning can be found in section 4.3.
When the solution path is found, the Kautham Project offers the possibility of executing the
plan simultaneously in a virtual simulation and in a real execution [73]. More information
about how to run the Kautham Project is given in Annex C.
6. Results
The performance of the Object Recognition and Grasping System is analyzed separately
for each module. Quantitative and qualitative results of the object recognition module are
presented, and some object grasps are illustrated at the end of this section.
6.1. Evaluation of the Object Recognition Module
6.1.1. Camera calibration
The calibration process of any type of sensor plays an important role in obtaining a good
system performance. All the RGB-D cameras used in this project have been calibrated with
the methodology described in section 3.2. The Object Recognition module processes the
point-cloud data provided by the camera sensors. Therefore, the system has to compute
the intrinsic transformation matrix of each camera in order to align the colour and the depth
images, which are affected by the offset between both sensors. Figure 6.1 shows how the
colour is now correctly applied on the depth data; this is clearly evident around edges such
as the edges of the computers or the rugby ball.
Figure 6.1. Uncalibrated and calibrated point-cloud.
The extrinsic camera calibration defines the transformation from the camera coordinate
system to the world coordinate system. This process requires the definition of an
asymmetric pattern and its transformation to the world coordinate system. Then, the
camera determines its pose with respect to the world by recognising the pattern. Section
3.2 has explained in depth the asymmetric patterns used and the functions implemented
to calibrate the extrinsic parameters of a camera.
In this section, two different configurations are studied in order to evaluate the extrinsic
calibration performance. Note that each camera views a single pattern in both cases.
Running multiple Kinects at the same time causes interference between the depth sensors
(see the results in section 6.1.2). Thus, it is important to minimise the common area
captured by both depth sensors and to place the cameras in such a way that the
interference caused by the projection of infrared light patterns is minimised (see section 3.1.2).
The first configuration consists of object recognition using two cameras capturing both
sides of the object. Thus, the cameras can be placed at 180º from each other and two
patterns allow calibrating both cameras with respect to the world coordinate system, as
Figure 6.2 shows:
Figure 6.2. Object recognition with two cameras and two patterns.
The second configuration consists of object recognition using two cameras which capture
different sides of the object and a common area of it. Both cameras are placed at almost
90º from each other in order to detect the same pattern while minimising the common area
of the object captured.
Figure 6.3. Object recognition with two cameras and a single pattern.
The main problem of using two patterns to find the transformation matrices that link the
cameras to the world coordinate system is the numerical error committed. Thus, when the
two point-clouds are merged, there is an offset in translation and rotation. This can be
minimised by setting both patterns in the world coordinate system with high precision, but
other factors can still add numerical errors to the transformations.
Besides, the second configuration presents an object recognition system with two
cameras and a single pattern. This reduces the numerical error because both cameras
focus on the same pattern and, thus, the same transformation matrix is applied to both
cameras. The results of merging the two point-clouds captured in both configurations can
be seen in Figure 6.4:
Figure 6.4. Results of merging two point-clouds: first configuration (left), second configuration (right)
In addition, the second configuration does not merge both point-clouds perfectly. This
could come from the type of pattern used. As commented in section 3.2, the asymmetrical
circle grid (4x11) is used for the extrinsic camera calibration. This pattern is not square,
and more circles define the z axis than the y axis of the pattern with respect to the world
coordinate system. So, when the two point-clouds are merged, more offset error appears
in the y axis than in the z axis. Moreover, the largest offset between the point-clouds
appears in the x axis, because the distance between the camera frame and the pattern
frame is determined by 2D image processing functions. Thus, these functions estimate the
x distance according to the size of the circles: a small circle size indicates a greater
distance and vice versa.
However, the translation offsets are relatively small when the object recognition is
focused on bulky objects such as a rugby ball, and orientation offsets do not appear
between the point-clouds. The object recognition module is able to determine the pose of
the object in the presence of these translation offsets (see Figure 6.5).
Figure 6.5. Translation offsets between two merged point-clouds using the same pattern calibration.
6.1.2. Multiple Kinects evaluation
This project presents some configurations in which the RGB-D cameras are set at different
positions, and analyses qualitatively which scenario provides the best performance.
However, the simultaneous use of multiple structured-light cameras such as the Kinect 360
has some drawbacks. The Kinect 360 measures 3D surfaces by projecting patterns of
infrared light. The use of multiple Kinects 360 causes overlapping views of the light
patterns. This confuses the depth sensors of both Kinects 360, resulting in holes and noise
in the computed depth images. Figure 6.6 shows a depth map when only a single Kinect
360 is running and when two Kinects 360 are projecting their light patterns at the same
time:
Figure 6.6. Depth image without interferences (left) and with interferences (right).
When the point-cloud is constructed, the interferences also appear on it (see Figure 6.7).
Thus, it is important to minimise the common surface captured by both depth sensors in
order to obtain a good point-cloud quality. The interferences are minimised if the cameras
are located correctly. In addition, the Kinect One is a Time-of-Flight camera, which
measures the time that a pulse of light takes to travel the distance between the camera
and the surface (see section 3.1.2). Although its technology is different from that of the
Kinect 360, the infrared rays projected by the Kinect One also interfere with the Kinect 360
sensors and vice versa.
Figure 6.7. Point-cloud without interferences and with interferences.
Despite the loss of information in the final point-cloud due to the interferences, the system
with multiple Kinects is able to recognise and estimate the pose of the objects, as can be
seen in the results of sections 6.1.3 and 6.1.4.
6.1.3. Object pose estimation
The aim of this section is to present quantitative results for the object recognition and
pose estimation mechanism presented in section 3.3. In order to determine the number of
fixed cameras and their configuration in the workspace, four scenarios have been designed.
The intention is to analyse which Kinect type provides better results and whether the results
improve by using more than one camera. Four different simulations are presented:
a) Simulation using only the Kinect One with an aerial front view.
b) Simulation using only the Kinect 360 with an aerial front view.
c) Simulation using two Kinects 360 with lateral views.
d) Simulation using a Kinect One with an aerial front view and two Kinects 360 with
lateral views.
The pose estimation error and the computation time have been averaged over 100
simulations for each of the scenarios presented. Therefore, four systems are presented
with their accuracy and computation time. Figure 6.8 shows the asymmetrical circle grid
used for the camera calibration, an initial pose of the Rugby ball object and the initial
configuration of the Kinect One and the Kinect 360:
Figure 6.8. Initial configuration of the elements involved in the simulations.
Note that the accuracy of the system is determined by the difference between the
estimated pose provided by the Object Recognition Module and the real pose of the
object. The real pose of the object is set by the user and could contain a small error.
The centre of the Rugby ball is placed at:
For scenarios 1 and 2: cartesian position (0, 0, 0.165) meters and orientation
expressed in angle-axis form as (1, 0, 0) with an angle of π/2 radians.
For scenarios 3 and 4: cartesian position (0, 0, 0.165) meters and orientation
expressed in angle-axis form as (0, 1, 0) with an angle of π/2 radians.
The system is analyzed with an object which is big enough to be grasped with the
bimanual manipulator (see Figure 2.10); it does not make sense to use a small object
which can be manipulated with one hand. The official dimensions of the size 5 rugby ball are:
Figure 6.9. Official dimensions of the Rugby ball size 5 [74].
If the object pose estimation algorithm commits a position error of up to the size of the
rugby ball, the system is considered to have poor performance. Therefore, the Object
Recognition Module is designed to estimate the pose of a bulky object with the minimum
possible error.
Four different scenarios have been simulated:
a) Scenario 1: Kinect One with aerial front view
The Object Recognition Module is tested using the Kinect One with an aerial front view
capturing the scene. A representation of the scenario is sketched in Figure 6.10:
Figure 6.10. Scheme of the scenario 1 with its elements involved in the simulations.
b) Scenario 2: Kinect 360 with aerial front view
The second experiment is exactly the same as the previous one but the Kinect 360 is
used now. A representation of the scenario is sketched in Figure 6.11:
Figure 6.11. Scheme of the scenario 2 with its elements involved in the simulations.
c) Scenario 3: two Kinects 360 with lateral view
The third experiment consists of adding an additional depth camera to increase the
point-cloud density of the scene and provide the processing algorithms with more
information about the scene. A representation of the scenario is sketched in Figure 6.12:
Figure 6.12. Scheme of the scenario 3 with its elements involved in the simulations.
d) Scenario 4: Kinect One and two Kinects 360
The last experiment consists of adding the Kinect One to the previous scenario to
increase the point-cloud density and provide an aerial view of the scene. A representation
of the scenario is sketched in Figure 6.13:
Figure 6.13. Scheme of the scenario 4 with its elements involved in the simulations.
The position error has been calculated as the difference between the estimated and the
real position, expressed in meters. Figure 6.14, Figure 6.15 and Figure 6.16 show the
position errors committed during the simulations of the four different scenarios. The
estimated position errors of all the simulations are represented with plus signs in different
colours and the desired error with a black circle sign. The simulation results for each
scenario are presented in independent figures in Annex B.
Figure 6.14. Plane XY of the estimated position error of the Rugby ball.
Figure 6.15. Plane XZ of the estimated position error of the Rugby ball.
Figure 6.16. Plane YZ of the estimated position error of the Rugby ball.
The orientation error has been calculated as the difference between the estimated and the
real orientation, expressed in angle-axis form. The error committed when estimating the
orientation of the Rugby ball object is very high (see Table 6.1). Some simulations present
errors of around 3 radians, or 171.88 degrees, in some axes. This means that the system
is very poor at estimating the orientation of the objects. However, the results presented in
section 6.1.4 show that the orientation errors decrease for objects with asymmetric
geometry. The Rugby ball can be described as an ellipsoid which has symmetry in the
semi-major and semi-minor axes. This is a problem for fixing the orientation of the object
frame appropriately. In our case, the orientation of the Rugby ball can only be fixed along
the z axis, adjusting it to the longitudinal direction, while the other two axes remain arbitrary.
Finally, Table 6.1 summarises the results of all the scenarios presented. The position and
orientation errors for each scenario have been averaged and a confidence interval is
provided to indicate the region with the highest probability of containing the position and
orientation error of the object. In addition, the density of points for each scenario is
presented, as well as the average computation time, which is directly related to the size
of the point-cloud to be processed.
Table 6.1. Results of the four simulated scenarios.

Scenario                       | Number of points | x error (m)         | y error (m)         | z error (m)          | Angle error (degrees) | Average time (s)
Kinect One                     | 2700             | -0.0448 ± 0.027     | -0.0166 ± 0.031     | 0.0341 ± 0.030       | 55.61 ± 92.84         | 20.30
Kinect 360                     | 2500             | -0.0217 ± 0.029     | -0.0144 ± 0.034     | -7.477·10-4 ± 0.040  | 48.68 ± 100.65        | 17.12
Two Kinects 360                | 4300             | 0.0027 ± 0.027      | 5.5156·10-4 ± 0.022 | -0.0314 ± 0.027      | 49.27 ± 99.08         | 25.36
Kinect One and two Kinects 360 | 5700             | -0.0028 ± 0.026     | 0.0061 ± 0.028      | -0.0287 ± 0.022      | 48.55 ± 99.54         | 33.80
6.1.4. Object recognition
This section presents qualitative results of the Object Recognition Module in order to
study how versatile the designed system is when recognising different objects, with
different positions and orientations of the objects, and testing the system with different
camera configurations. The 3D object models are:
Cylinders with different colours.
Cylinder in T structure.
Rugby ball.
Skate helmet.
Figure 6.17. Object models used for testing Object Recognition Module.
Three different scenarios have been simulated:
a) Simulation using only Kinect One with aerial front view.
b) Simulation using two Kinects 360 with lateral view.
c) Simulation using a Kinect One with aerial front view and two Kinects 360 with
lateral view.
The following three scenarios follow a basic methodology to show the visual results
clearly. First, a colour image of the workspace is provided in order to present the objects
to be recognised and, then, a set of views of the recognised 3D object is given. The
point-cloud visualization is done with the PCLVisualizer [26], in which an empty window
displays a black background and the world coordinate system (x axis in red, y axis in
green and z axis in blue).
a) Scenario 1: Kinect One with aerial front view
The aim of this scenario is to analyze the performance of the Object Recognition Module
when the point-cloud scene is taken with a single Kinect One with an aerial front view. A
green cylinder with a length of 16 cm and a diameter of 6 cm is the first object model used
to carry out the simulations. Three simulations are performed by orienting the longitudinal
direction of the cylinder along the three orthogonal planes located at the origin of the
coordinate system:
Figure 6.18. Experiment 1: three different orientations of the green cylinder model.
With its longitudinal direction perpendicular to the XY plane, the cylinder achieves a
perfect match between the model template and the cylinder in the scene (see Figure
6.19). When the cylinder is perpendicular to the XZ plane, the surface of the cylinder is
tangential to the infrared rays projected by the Kinect, which causes distortion in the
point-cloud (see Figure 6.20). The last figure shows how the match is sometimes only
partial between the model template and the object in the scene, probably due to too few
iterations to align both models. Figure 6.21 shows how the cylinder template needs more
iterations to align itself with the cover of the cylinder.
Figure 6.19. Results of experiment 1: three object views of the cylinder perpendicular to the plane XY.
Figure 6.20. Results of experiment 1: three object views of the cylinder perpendicular to the plane XZ.
Figure 6.21. Results of experiment 1: three object views of the cylinder perpendicular to the plane YZ.
The second experiment consists of testing the system with multiple objects in the
workspace. Cylinder models with different colours (green, pink and blue) are placed as
shown in Figure 6.22:
Figure 6.22. Experiment 2: two different sets of the cylinder models.
Although the Object Recognition Module correctly detects the number of objects in the
scene, the object pose estimation does not achieve high accuracy. As commented
previously, the match between the model template and the object in the scene is
sometimes only partial. This is caused by too few iterations to align both models. Figure
6.23 shows a perfect match for the pink cylinder, but the green cylinder could need more
iterations to align. Figure 6.24 shows that the pink and the green cylinders, oriented
vertically, achieve a better match than the blue cylinder, oriented horizontally. This
behaviour was already detected in the previous experiment (see Figure 6.20).
Finally, the Object Recognition Module provides the object poses with respect to the
world coordinate system. However, when the model template has a symmetric geometry,
such as a sphere, the object frame is set at the centre and the orientation is arbitrary. In
our case, the orientation of the cylinders can only be fixed along the z axis, adjusting it to
the longitudinal direction of the cylinder. At this point, it is interesting to perform some
experiments with asymmetric objects.
Figure 6.23. Results of experiment 2: three object views of two cylinders perpendicular to the plane XY.
Figure 6.24. Results of experiment 2: three object views of three cylinders.
The third experiment consists of recognising a single object with a “T” structure. This
object is composed of two colours and it is placed as shown in Figure 6.25:
Figure 6.25. Experiment 3: two different orientations of the “T” model.
The object model used in this experiment maintains some symmetric geometry in two
transversal planes. However, the “T” structure allows the system to set the orientation
of the object by fixing the z axis to the longitudinal direction of the vertical cylinder (green)
and the y axis to the longitudinal direction of the horizontal cylinder (blue). Then, the x
axis is fixed by definition.
Figure 6.26. Results of experiment 3: three object views of the “T” model.
Figure 6.27. Results of experiment 3: two object views of the “T” model with random orientation.
The previous experiments have been designed to analyze the behaviour of the Object
Recognition Module with objects that can be manipulated with one robotic hand. The fourth
experiment consists of recognising bulky objects to be manipulated with the bimanual
robot. Figure 6.28 shows how a Rugby ball and a Skate helmet are placed in the workspace:
Figure 6.28. Experiment 4: Rugby ball and Skate helmet models.
The results obtained with the Kinect One with an aerial front view indicate that the Object
Recognition Module could not obtain good matches for bulky objects, as shown in Figure
6.29. One factor could be the low density of points describing the scene. Therefore, a
single depth camera capturing all the objects in the scene might not be enough, and
multiple depth cameras would be necessary.
Figure 6.29. Results of experiment 4: Rugby ball and Skate helmet pose estimation.
b) Scenario 2: two Kinect 360 with lateral view
The aim of this scenario is to analyze the performance of the Object Recognition Module
when the point-cloud scene is taken with two Kinects 360. The experiments presented try
to improve the results obtained in the previous simulations with bulky objects by correctly
choosing the locations of the cameras.
The fifth experiment consists of recognising the Rugby ball with the information provided
by two depth cameras. The point-cloud is much denser than in the previous experiments.
The Rugby ball pose is shown in Figure 6.30:
Figure 6.30. Experiment 5: two lateral views of the Rugby ball model.
The final result can be seen in Figure 6.31. The pose estimation is not completely
accurate because the two point-clouds, taken separately by two different Kinects 360, are
not merged well, as can be appreciated in the figure. Section 6.1.1 explained the causes
of these offsets between point-clouds. Moreover, the object is illuminated by the infrared
projectors of both depth cameras at the same time. This causes interference in the depth
sensors, and the visible side of the object is distorted.
Figure 6.31. Results of experiment 5: Rugby ball pose estimation.
The Rugby ball is then oriented differently in order to capture two different sides of the ball
while avoiding interference between the camera sensors (see section 6.1.2):
Figure 6.32. Experiment 6: two lateral views of the Rugby ball model.
The object has been oriented in order to minimise the common surface captured by both
depth sensors and obtain a good point-cloud quality. Although the final point-cloud
obtained with the two Kinects 360 presents translation offsets, the system with multiple
Kinects is able to recognise and estimate the pose of the object.
Figure 6.33. Results of experiment 6: Rugby ball pose estimation.
c) Scenario 3: two Kinect 360 with lateral view and Kinect One
The aim of the last scenario is to analyze the performance of the Object Recognition
Module when the point-cloud scene is taken from three different views, so three
point-clouds are merged into one.
Figure 6.34. Experiment 7: three different views of the Rugby ball model.
Finally, the Rugby ball can be constructed almost completely, missing only the bottom and
the back part. Therefore, a better match between the object template and the object in the
scene has been obtained with respect to the previous experiments with the Rugby ball:
Figure 6.35. Results of experiment 7: Rugby ball pose estimation.
6.2. Evaluation of the Grasp and Motion Planning Module
The aim of this section is to present visual results for the object grasp and motion planning
procedure presented in section 4. The rugby ball has been selected as the object model to
test the system in four different poses:
Table 6.2. Rugby ball pose with respect to the workspace coordinate system.

Test | x    | y    | z     | ax  | ay  | az   | angle
a)   | 0.0m | 0.0m | 0.15m | 0   | 0   | 1    | 0º
b)   | 0.0m | 0.0m | 0.15m | 0   | 0   | 1    | 90º
c)   | 0.1m | 0.1m | 0.15m | 0   | 0   | 1    | 45º
d)   | 0.0m | 0.0m | 0.15m | 0.3 | 0.9 | -0.3 | 70º
The figures below illustrate how the object has been divided into slices to place the left and
right fingers and thumbs, show the resulting contact points and present several pictures of
the trajectory followed by the manipulator to reach the object. Note that these experiments
have been performed in uncluttered environments with the object placed against a uniform
background. Grasping in cluttered environments is a harder problem, both from the robot
perception and from the robot motion planning point of view. In this evaluation, robust
object grasping strategies are needed to overcome the positioning errors previously
analysed (see section 6.1.3).
Figure 6.36. Test a): Rugby ball divided in slices and the resulting contact points.
Figure 6.37. Test a): Manipulator movements following the path planner solution.
Figure 6.38. Test b): Rugby ball divided in slices and the resulting contact points.
Figure 6.39. Test b): Manipulator movements following the path planner solution.
Figure 6.40. Test c): Rugby ball divided in slices and the resulting contact points.
Figure 6.41. Test c): Manipulator movements following the path planner solution.
Figure 6.42. Test d): Rugby ball divided in slices and the resulting contact points.
Figure 6.43. Test d): Manipulator movements following the path planner solution.
7. Project Budget
The project budget is shown below:
2 Kinect 360........................................................................................ 540,00€
Kinect One......................................................................................... 450,00€
2 Allegro Hand.................................................................................... 26800,00€
2 UR5.................................................................................................. 35800,00€
Other material.................................................................................... 500,00€
Objects............................................................................................... 100,00€
Computer............................................................................................. 2500,00€
Subtotal.............................................................................................. 66690,00€
VAT (21%)........................................................................................... 14004,90€
Total.................................................................................................... 80694,90€
This project uses open-source software, so no software licenses have been required. The
salaries of the employees involved are also detailed:
Technical engineers (900h at 14€/h).................................................. 12600,00€
Project Manager (30h at 30€/h).......................................................... 900,00€
8. Environmental impact
This project has presented a system with a positive impact on the social and natural
environment, but a negative one in economic terms.
The economic impact is significantly negative due to the high cost of the hardware
devices used, about 80695€, although the use of open-source software reduces the project budget.
Secondly, the project has a positive social impact. The fact that the system is able
to recognise and grasp bulky objects with two hands allows it to be easily integrated into
scenarios where the robot coexists with people and helps them with specific tasks. In this
sense, the proposed system constitutes the initial core of a more complex one that will be
able to navigate autonomously around a room and execute other tasks. Moreover, the
open-source software allows the robotics community to use it without legal issues.
Finally, this project has a low environmental impact, since the power consumption and the
raw materials used are minimal. Moreover, the hardware devices have a long life cycle and,
once they break or stop working properly, they can be recycled correctly.
Conclusions and further work
In summary, the objectives of this project have been achieved, as shown by the satisfactory
results obtained with the Object Recognition and Grasping System.
A system able to automatically locate and grasp a specific object using a bimanual robot
has been developed. The system includes a perception subsystem that senses the robot
environment with RGB-D cameras and processes the resulting point-cloud. Once the object
has been located, the system is able to deduce grasping points according to the pose of the
object and of the robotic hands. Finally, the manipulator reaches the object from an initial
configuration thanks to an inverse kinematics solver and a path planning strategy that finds
an admissible, collision-free trajectory.
Furthermore, the main conclusions of the evaluation of the object recognition module are:
- The camera calibration plays an important role in obtaining good performance
and reliability of the system.
- Kinect 360 and Kinect One devices are robust to interference when working
alone. When a multiple-camera system is required, the cameras interfere with
each other. However, the results presented in this project show that these
interferences can be tolerated.
- Multiple Kinect One devices cannot be launched on the same computer because
of the large amount of data to process, which saturates the USB bus. Using an
independent computer for each Kinect One increases the complexity of the
system and the project budget.
- The point-cloud processing performance is directly affected by several factors,
such as the point density, the measurement errors, the selected keypoints, the
descriptor used, the matching process between the source and the target point-
clouds, etc.
- This project proposes an approach that uses multiple camera views to construct a
final point-cloud scene in order to obtain a complete object reconstruction. If the
camera calibration is poor, the final point-cloud contains small translation and
rotation offsets between the matched original points, which worsen the
performance of the object recognition.
- The filtering process is important to remove outliers and to downsample the
point-cloud, reducing the number of points and, consequently, the computational
cost of the following stages. By contrast, removing too many points may produce
a lack of information about the scene, and other processing algorithms may not
work properly with low-quality normals or descriptors (see the sketch after this list).
- This project presents a satisfactory approach for filtering the desired object by
its colours instead of by its shape.
- The more complete the point-cloud is, the smaller the error of the estimated
object pose.
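As a reference for the filtering conclusion above, a minimal downsampling sketch with the
PCL VoxelGrid filter [46]; the 5 mm leaf size is an illustrative value, since too large a
leaf removes information needed to compute good normals and descriptors:

#include <pcl/point_types.h>
#include <pcl/point_cloud.h>
#include <pcl/filters/voxel_grid.h>

// Downsample a scene cloud with a voxel grid: every occupied 5 mm voxel is
// replaced by the centroid of the points falling inside it.
pcl::PointCloud<pcl::PointXYZRGB>::Ptr
downsample(const pcl::PointCloud<pcl::PointXYZRGB>::Ptr& cloud)
{
  pcl::PointCloud<pcl::PointXYZRGB>::Ptr filtered(new pcl::PointCloud<pcl::PointXYZRGB>);
  pcl::VoxelGrid<pcl::PointXYZRGB> voxel;
  voxel.setInputCloud(cloud);
  voxel.setLeafSize(0.005f, 0.005f, 0.005f);   // 5 mm leaf size (illustrative)
  voxel.filter(*filtered);
  return filtered;
}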
Secondly, the main conclusions of the evaluation of the whole system are:
- In a complex system, where many elements such as cameras, manipulators and
objects work together, it is important to fix a world coordinate system.
- The strategy to grasp a bulky object with two anthropomorphic hands is very
different from the one used for small objects. The synchronisation of two arms to
grasp the same object is a hard and complex task.
- A system which grasps a bulky object with six points, three for each hand,
instead of four as other projects recommend, is more robust.
- The system has a bimanual manipulator with 22 DoF per arm. Therefore, solving
the inverse kinematics problem is a laborious process that has to converge to a
feasible solution. However, if the manipulator joint limits, a good initial joint
configuration and a fast IK solver such as DLS are correctly set, the IK problem
converges to a solution (a sketch of the DLS step is given after this list).
- ROS services can make a persistent connection between client and server, which
implies low robustness to service provider changes or server failures. Although
the object recognition service did not fail during the simulations, it is
recommended to use ROS actions. Actions provide feedback on the task progress
and allow the client node to cancel the request at any time (a minimal action
client sketch is given after this list).
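For reference, the damped least squares step used by IK solvers such as the one described in
[61] computes the joint increment from the Jacobian J and the pose error e as
dq = J^T (J J^T + lambda^2 I)^-1 e. A minimal sketch with Eigen, where the damping value is
only illustrative:

#include <Eigen/Dense>

// One damped least squares (DLS) step: dq = J^T (J J^T + lambda^2 I)^-1 e.
// J is the (6 x n) task Jacobian, e the 6-dimensional pose error and lambda
// the damping factor that keeps the step bounded near singularities.
Eigen::VectorXd dlsStep(const Eigen::MatrixXd& J,
                        const Eigen::VectorXd& e,
                        double lambda = 0.05)        // illustrative damping value
{
  const Eigen::MatrixXd JJt = J * J.transpose();
  const Eigen::MatrixXd damped =
      JJt + lambda * lambda * Eigen::MatrixXd::Identity(JJt.rows(), JJt.cols());
  return J.transpose() * damped.ldlt().solve(e);     // joint increment dq
}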
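Regarding the last point, a minimal actionlib client sketch; the recognise_object action
name, the object_recognition package and the RecogniseObjectAction message are assumptions
made for the example and are not part of the current implementation, which uses a ROS
service:

#include <ros/ros.h>
#include <actionlib/client/simple_action_client.h>
// Hypothetical action definition generated from a RecogniseObject.action file.
#include <object_recognition/RecogniseObjectAction.h>

int main(int argc, char** argv)
{
  ros::init(argc, argv, "recognition_client");

  actionlib::SimpleActionClient<object_recognition::RecogniseObjectAction>
      client("recognise_object", true);   // true: spin a thread for the client
  client.waitForServer();

  object_recognition::RecogniseObjectGoal goal;
  goal.object_name = "rugby_ball";        // hypothetical goal field
  client.sendGoal(goal);

  // Unlike a service call, the goal can be cancelled while it is being processed.
  if (!client.waitForResult(ros::Duration(30.0)))
    client.cancelGoal();

  return 0;
}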
The system presented tries to minimise the project budget. Although the hardware devices
have a high cost, the project uses devices already available in the IOC's laboratory together
with open-source software. This facilitates the immediate integration of the Object
Recognition and Grasping System into the IOC's laboratory.
The future tasks could be the following:
1. To improve the Object Recognition module:
- Develop a module that allows the Kinect camera to be held by a robot or by an
operator equipped with a magnetic tracker to measure the camera pose,
generating the point-cloud scene by automatically changing the points of view.
- Use AR markers [42] to place fixed cameras with respect to the world
coordinate system.
2. To improve the Grasp and Motion Planning module:
- Use of a fourth finger from each hand and the palms.
- Use tactile feedback sensors on the fingertips.
- Implementation of a motion planner able to find paths with human-like
appearance for bimanual manipulators.
- Use the RRT-Connect planner, which grows two trees toward each other until
they merge into one; this should improve the efficiency of the motion planning
system (see the sketch after this list).
3. To improve the Manager and Execution module:
- Use ROS actions to communicate between nodes.
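Regarding the RRT-Connect planner mentioned above, a minimal OMPL [29] sketch, assuming a
toy bounded configuration space and a placeholder validity checker; in the real system the
state space would correspond to the bimanual configuration space handled by The Kautham
Project [28] and the checker to its collision detector:

#include <iostream>
#include <ompl/base/spaces/RealVectorStateSpace.h>
#include <ompl/geometric/SimpleSetup.h>
#include <ompl/geometric/planners/rrt/RRTConnect.h>

namespace ob = ompl::base;
namespace og = ompl::geometric;

int main()
{
  // Toy 3-DoF configuration space bounded in [-1, 1]^3.
  ob::StateSpacePtr space(new ob::RealVectorStateSpace(3));
  ob::RealVectorBounds bounds(3);
  bounds.setLow(-1.0);
  bounds.setHigh(1.0);
  space->as<ob::RealVectorStateSpace>()->setBounds(bounds);

  og::SimpleSetup ss(space);
  // Placeholder validity checker: accepts every state; a real one would query
  // the collision checker.
  ss.setStateValidityChecker([](const ob::State*) { return true; });

  ob::ScopedState<> start(space), goal(space);
  start[0] = -0.5; start[1] = -0.5; start[2] = 0.0;
  goal[0]  =  0.5; goal[1]  =  0.5; goal[2]  = 0.0;
  ss.setStartAndGoalStates(start, goal);

  // RRT-Connect grows one tree from the start and another from the goal until
  // they can be connected.
  ss.setPlanner(ob::PlannerPtr(new og::RRTConnect(ss.getSpaceInformation())));

  if (ss.solve(5.0))                      // planning time budget in seconds
    ss.getSolutionPath().print(std::cout);
  return 0;
}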
Acknowledgements
I would like to thank my supervisors, especially Jan Rosell, Leopold Palomo and Néstor
García, for their patience and support throughout this project, and the rest of the
workmates at the IOC for accepting me as one of them during my internship.
To Isiah Zaplana, for his classes on numerical inverse kinematics and his motivation to
develop the first core of the General Inverse Kinematics Library.
To Abiud Rojas, for sharing his work on Grasping with two Allegro Hands and for his
patience in helping me to integrate it into this project.
To my family, friends and university colleagues, for their advice and their support to finish
this project.
Bibliography
[1] Microsoft, “Kinect for Windows Sensor Components and Specifications,” [Online].
Available: https://msdn.microsoft.com/en-us/library/jj131033.aspx. [Accessed July
2016].
[2] Microsoft, “Kinect hardware,” [Online]. Available: https://developer.microsoft.com/en-
us/windows/kinect/hardware. [Accessed July 2016].
[3] SimLab, “Allegro Hand Features,” [Online]. Available: http://www.simlab.co.kr/Allegro-
Hand.htm. [Accessed June 2016].
[4] Universal Robots, “Universal Robots: UR5 specifications,” [Online]. Available:
http://www.universal-robots.com/es/productos/robot-ur5/. [Accessed August 2016].
[5] M. Quigley, B. Gerkey, J. Leibs, K. Conley, J. Faust, T. Foote, E. Berger, R. Wheeler
and A. Ng, “ROS: an open-source Robot Operating System,” in International
Conference on Robotics and Automation, ser. Open-Source Software workshop,
2009.
[6] E. Crawley, O. De Weck, S. Eppinger, C. Magee, J. Moses, W. Seering, J. Schindall,
D. Wallace and D. Whitney, “The Influence of Architecture in Engineering Systems,”
Massachusetts Institute of Technology (MIT); pp. 2-15, 2004.
[7] “Institute of Industrial and Control Engineering,” [Online]. Available:
https://www.ioc.upc.edu/the-institute/presentation1?set_language=en. [Accessed June
2016].
[8] A. Rojas-de-Silva and R. Suárez, “Grasping Bulky Objects with Two Anthropomorphic
Hands,” Proc. IEEE/RSJ Int. Conf. Intelligent Robots and Systems; pp. 1-8, 2016.
[9] WillowGarage, “ROS,” [Online]. Available: http://www.ros.org/. [Accessed June 2016].
[10] J. M. O'Kane, A Gentle Introduction to ROS, University of South Carolina:
CreateSpace Independent Publishing Platform; ISBN 978-14-92143-23-9; p. 9, 2014.
[11] ROS, “ROS.org Nodes,” [Online]. Available: http://wiki.ros.org/Nodes. [Accessed June
2016].
[12] ROS, “ROS.org Master,” [Online]. Available: http://wiki.ros.org/Master. [Accessed June
2016].
[13] ROS, “ROS.org Topics,” [Online]. Available: http://wiki.ros.org/Topics. [Accessed June
2016].
[14] ROS, “ROS.org Services,” [Online]. Available: http://wiki.ros.org/Services. [Accessed
June 2016].
[15] M. Quigley, B. Gerkey and W. D. Smart, Programming Robots with ROS: a practical
introduction to the Robot Operating System, O’Reilly Media, Inc.; pp. 59-80, 81-92, 94-
110, 2016.
[16] ROS, “ROS.org libfreenect,” [Online]. Available: http://wiki.ros.org/libfreenect.
[Accessed June 2016].
[17] OpenKinect, “OpenKinect Main Page,” [Online]. Available:
https://openkinect.org/wiki/Main_Page. [Accessed August 2016].
[18] OpenKinect, “GitHub OpenKinect/libfreenect2,” [Online]. Available:
https://github.com/OpenKinect/libfreenect2. [Accessed August 2016].
[19] Structure, “OpenNI 2 SDK Binaries,” [Online]. Available: http://structure.io/openni.
[Accessed July 2016].
[20] ROS, “ROS.org openni_camera,” [Online]. Available:
http://wiki.ros.org/openni_camera. [Accessed June 2016].
[21] ROS, “ROS.org openni_launch,” [Online]. Available: http://wiki.ros.org/openni_launch.
[Accessed June 2016].
[22] ROS, “ROS.org rgbd_launch,” [Online]. Available: http://wiki.ros.org/rgbd_launch.
[Accessed June 2016].
[23] I. A. I. University of Bremen, “Tools for using the Kinect One (Kinect v2) in ROS,”
[Online]. Available: https://github.com/code-iai/iai_kinect2. [Accessed July 2016].
[24] B. Jacob and G. Guennebaud, “Eigen Main Page,” [Online]. Available:
http://eigen.tuxfamily.org/index.php?title=Main_Page. [Accessed August 2016].
[25] WillowGarage, “OpenCV Main Page,” [Online]. Available: http://opencv.org/. [Accessed
June 2016].
[26] WillowGarage, “Point Cloud Library (PCL),” 2010. [Online]. Available:
http://pointclouds.org/. [Accessed June 2016].
[27] R. B. Rusu, “3D is here: Point Cloud Library (PCL),” IEEE International Conference on
Robotics and Automation (ICRA); pp. 1-4, 2011.
[28] J. Rosell, A. Pérez, A. Aliakbar, L. Palomo and N. García, “The Kautham Project: A
teaching and research tool for robot motion planning,” IEEE Int. Conf. On Emerging
Technologies and Factory Automation, ETFA’14; pp. 1-8.
[29] I. A. Șucan, M. Moll and L. E. Kavraki, “The Open Motion Planning Library Main Page,”
[Online]. Available: http://ompl.kavrakilab.org/index.html. [Accessed July 2016].
[30] I. A. Șucan, M. Moll and L. E. Kavraki, “The Open Motion Planning Library,” IEEE
Robotics & Automation Magazine; pp. 1-10, 2012.
[31] X. Armangué, “Modelling Stereoscopic Vision Systems for Robotic Applications,”
Universitat de Girona; pp. 49-70, 2003.
[32] M. Laukkanen, “Performance Evaluation of Time-of-Flight Depth Cameras,” AALTO
university; pp. 3-14, 2015.
[33] J. Posdamer and M. Altschuler, “Surface measurement by space-encoded projected
beam systems,” Computer Graphics and Image Processing, 1982.
[34] “Extensible Markup Language (XML) - Origin and Goals,” [Online]. Available:
https://www.w3.org/TR/REC-xml/#sec-origin-goals. [Accessed June 2016].
[35] C. Evans, O. Ben-Kiki and I. döt Net, “YAML,” 2001. [Online]. Available:
http://www.yaml.org/start.html. [Accessed June 2016].
[36] J. Bowman and P. Mihelich, “ROS camera_calibration,” [Online]. Available:
http://wiki.ros.org/camera_calibration. [Accessed July 2016].
[37] P. Mihelich, “ROS OpenNI launch,” [Online]. Available:
http://wiki.ros.org/openni_launch. [Accessed August 2016].
[38] B. Siciliano, L. Sciavicco, L. Villani and G. Oriolo, Robotics: Modelling, Planning and
Control (Advanced Textbooks in Control and Signal Processing), London: Springer;
pp. 1- 140, 2009.
[39] OpenCV, “OpenCV Tutorials: Camera calibration,” [Online]. Available:
http://docs.opencv.org/2.4/doc/tutorials/calib3d/camera_calibration/camera_calibration.
html. [Accessed August 2016].
[40] H. Kato, “ARToolkit,” Human Interface Technology Laboratory, [Online]. Available:
http://www.hitl.washington.edu/artoolkit/. [Accessed August 2016].
[41] I. Dryanovski and B. Morris, “ROS ar_pose,” [Online]. Available:
http://wiki.ros.org/ar_pose. [Accessed August 2016].
[42] S. Niekum, “ROS AR Track Alvar,” [Online]. Available:
http://wiki.ros.org/ar_track_alvar. [Accessed August 2016].
[43] OpenCV, “OpenCV Tutorials: Camera Calibration and 3D Reconstruction,” [Online].
Available:
http://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstru
ction.html. [Accessed August 2016].
[44] PCL, “PassThrough Class Template,” [Online]. Available:
http://docs.pointclouds.org/1.7.1/classpcl_1_1_pass_through.html. [Accessed August
2016].
[45] PCL, “Module filters,” [Online]. Available:
http://docs.pointclouds.org/trunk/group__filters.html#gac463283a9e9c18a66d3d29b28
a575064. [Accessed August 2016].
[46] PCL, “VoxelGrid Class Template,” [Online]. Available:
http://docs.pointclouds.org/trunk/classpcl_1_1_voxel_grid.html. [Accessed August
2016].
[47] PCL, “RegionGrowingRGB Class Template,” [Online]. Available:
http://docs.pointclouds.org/1.7.0/classpcl_1_1_region_growing_r_g_b.html. [Accessed
August 2016].
[48] PCL, “NormalEstimation Class Template,” [Online]. Available:
http://docs.pointclouds.org/1.7.1/classpcl_1_1_normal_estimation.html. [Accessed
August 2016].
[49] A. Evans Patterson, “Registration and Recognition in 3D,” Publicly Accessible Penn
Dissertations. Paper 1401; University of Pennsylvania; pp. 9-15, 2014.
[50] D. Wegrzyn, “Creating 3D Object Descriptors Using a Genetic Algorithm,”
Universidade da Beira Interior; pp. 15-23, 2013.
[51] PCL, “FPFHEstimation Class Template,” [Online]. Available:
http://docs.pointclouds.org/trunk/classpcl_1_1_f_p_f_h_estimation.html#details.
[Accessed August 2016].
[52] R. Rusu, N. Blodow and M. Beetz, “Fast point feature histograms (FPFH) for 3D
registration,” In Robotics and Automation, 2009. ICRA'09. IEEE International
Conference on (pp. 3212-3217). IEEE., 2009.
[53] PCL, “SampleConsensusInitialAlignment Class Template,” [Online]. Available:
http://docs.pointclouds.org/1.7.1/classpcl_1_1_sample_consensus_initial_alignment.ht
ml. [Accessed August 2016].
[54] PCL, “IterativeClosestPoint Class Template,” [Online]. Available:
http://docs.pointclouds.org/1.7.1/classpcl_1_1_iterative_closest_point.html. [Accessed
August 2016].
[55] P. J. Besl and N. D. McKay, “A Method for Registration of 3D Shapes,” In Robotics-DL
tentative (pp. 586-606). International Society for Optics and Photonics., 1992.
[56] ROS, “ROS.org Messages,” [Online]. Available: http://wiki.ros.org/msg. [Accessed
June 2016].
[57] A. Bicchi and V. Kumar, “Robotic grasping and contact: A review,” IEEE International
Conference on Robotics & Automation (pp. 348-353), 2000.
[58] A. Miller, S. Knoop and H. Christensen, “Automatic grasp planning using shape
primitives,” In Robotics and Automation, 2003. Proceedings. ICRA'03. IEEE
International Conference on (Vol. 2, pp. 1824-1829). IEEE., 2003.
[59] R. Pelossof, A. Miller, P. Allen and T. Jebara, “An SVM learning approach to robotic
grasping,” In Robotics and Automation, 2004. Proceedings. ICRA'04. 2004 IEEE
International Conference on (Vol. 4, pp. 3512-3518). IEEE., 2004.
[60] J. Lenarcic, “An efficient numerical approach for calculating the inverse kinematics for
robot manipulators,” Robotica, vol. 3, no 01, p. 21-26, 1984.
[61] S. R. Buss, “Introduction to Inverse Kinematics with Jacobian Transpose,
Pseudoinverse and Damped Least Squares methods,” IEEE Journal of Robotics and
Automation, 17(1-19), 16., 2004.
[62] ROS.org, “XML specifications,” [Online]. Available: http://wiki.ros.org/urdf/XML.
[Accessed June 2016].
[63] ROS.org, “Link element,” [Online]. Available: http://wiki.ros.org/urdf/XML/link.
[Accessed June 2016].
[64] ROS.org, “Joint element,” [Online]. Available: http://wiki.ros.org/urdf/XML/joint.
[Accessed June 2016].
[65] P. Beeson and B. Ames, “TRAC-IK: An Open-Source Library for Improved Solving of
Generic Inverse Kinematics,” In Humanoid Robots (Humanoids), 2015 IEEE-RAS 15th
International Conference on (pp. 928-935). IEEE., November 2015.
[66] L. Kavraki, P. Svestka, J.-C. Latombe and M. Overmars, “Probabilistic Roadmaps for
Path Planning in High-Dimensional Configuration Spaces,” IEEE transactions on
robotics and automation, vol 12 n4, 1996.
[67] S. LaValle, “Rapidly-exploring random trees: A new tool for path planning,” 1998.
[68] J. Kuffner and S. LaValle, “RRT-Connect: An Efficient Approach to Single-Query Path
Planning,” In Robotics and Automation. Proceedings. ICRA'00. IEEE International
Conference on (Vol. 2, pp. 995-1001). IEEE., 2000.
[69] D. Hsu, J.-C. Latombe and R. Motwani, “Path planning in expansive configuration
spaces,” In Robotics and Automation. Proceedings., 1997 IEEE International
Conference on (Vol. 3, pp. 2719-2726). IEEE., 1997.
[70] G. Sánchez and J.-C. Latombe, “A single-query bi-directional probabilistic roadmap
planner with lazy collision checking.,” In Robotics Research (pp. 403-417). Springer
Berlin Heidelberg., 2003.
[71] I. Sucan and L. Kavraki, “Kinodynamic motion planning by interior-exterior cell
exploration,” In Algorithmic Foundation of Robotics VIII (pp. 449-464). Springer Berlin
Heidelberg., 2009.
[72] S. M. LaValle, “Planning algorithms,” Cambridge university press, 2006.
[73] C. Rodríguez, A. Rojas-de-Silva and R. Suárez, “Dual-Arm framework for cooperative
applications,” Proc. IEEE Int. Conf. Emerging Technologies and Factory Automation,
ETFA, 2016.
[74] Sports Ball Shop, “Rugby ball buying guide,” [Online]. Available:
http://www.sportsballshop.co.uk/acatalog/Rugby-Ball-Buying-Guide.html. [Accessed
July 2016].