
Transformational Planning for Mobile Manipulation based on Action-related Places

Andreas Fedrizzi, Lorenz Mösenlechner, Freek Stulp, Michael Beetz
Intelligent Autonomous Systems, Technische Universität München
{fedrizza|moesenle|stulp|beetz}@cs.tum.edu

Abstract— Opportunities for interleaving or parallelizing actions are abundant in everyday activities. Being able to perceive, predict and exploit such opportunities leads to more efficient and robust behavior. In this paper, we present a mobile manipulation platform that exploits such opportunities to optimize its behavior, e.g. grasping two objects from one location simultaneously, rather than navigating to two different locations. To do so, it uses a general least-commitment representation of place, called ARPLACE, from which manipulation is predicted to be successful. Models for ARPLACEs are learned from experience using Support Vector Machines and Point Distribution Models, and take into account the robot's morphology and skill repertoire. We present a transformational planner that reasons about ARPLACEs, and applies transformation rules to its plans if more robust and efficient behavior is predicted.

I. INTRODUCTION

In everyday activities, opportunities for optimizing the course of action arise constantly, as tasks can often be interleaved or executed in parallel. For instance, when setting the table, plates can be stacked instead of carrying them one at a time, cupboards can be left open during the task, etc. Being able to perceive, predict and exploit such opportunities leads to more efficient and robust behavior.

To enable this approach, the robot must: 1) use least-commitment planning, i.e. not prematurely commit to a specific plan when it is not necessary; 2) have rules for transforming suboptimal plans into more efficient ones. In this paper, we apply plan transformation rules to a mobile manipulation task, in which a robot approaches a table and grasps one or more cups, as depicted in Fig. 1.

Fig. 1. Mobile manipulation task considered in this paper (panels: Initial, Navigate, Grasp, Lift). Example of a successful and a failed attempt. This figure is explained in more detail in Section IV.

To enable least-commitment planning when navigating in order to grasp, we propose a concept of action-related place, denoted ARPLACE, that takes into account the manipulation and navigation skills of a robot, as well as its hardware configuration. The ARPLACE is represented as a probability map that maps positions of the robot and target objects to a probability that the target objects will be successfully grasped from the robot's position. Fig. 2 visualizes ARPLACEs for some given target object positions. The ARPLACE implements a least-commitment realization of positions, meaning that the robot does not commit itself to a specific initial position, but can refine it as the robot learns more about the task context, such as a better estimation of the target object's position, or observed clutter in the environment.

Fig. 2. Left: ARPLACE probability map for grasping a cup with the left gripper. Center: ARPLACE probability map for the right gripper. Right: ARPLACE for grasping both cups with the left and right gripper respectively; it is the product of the other two probability maps. Green areas mark regions where the probability of a successful grasping action is high.

In this paper, we combine the concept of ARPLACE with a transformational planning system and compose ARPLACEs for multiple actions. The transformational planning system specifies its plans in RPL [12]. RPL is a flexible and powerful language for specifying plans — control programs that can not only be executed but also reasoned about. Transformational planning performs substantial changes in robot behavior, e.g. by adding plan steps, removing unnecessary ones, or reordering and recombining existing steps. More specific examples are the stacking of objects to reduce the number of navigation actions, but also the elimination of more critical failures, such as temporarily moving objects out of the way when they block a location the robot needs to navigate to.

We consider a scenario in which the robot has to pick up two cups. By default, this task is solved by navigating to one cup, grasping it, navigating to the second cup, and grasping it. By using the ARPLACE probability map, the transformational planner is able to decide if a position exists where the probability of successfully grasping both cups at once is sufficiently high. If this is the case, the planner applies a rule for transforming the default plan into the more efficient plan of grasping both cups from one position. This reduces the overall plan length (we save one navigation action), and enables faster task execution.

The main contributions of this paper are: 1) proposing methods for learning ARPLACE representations from observed experience, using Support Vector Machines and Point Distribution Models; 2) merging different ARPLACEs for independent tasks; 3) integrating ARPLACEs in a transformational planner.

The rest of this paper is structured as follows. In the next section, we discuss related work. In Section III, we describe the concept of ARPLACE. We then explain how a so-called Generalized Success Model (GSM) is learned, and how the GSM is used to compute an ARPLACE with a Monte Carlo simulation, in Sections IV and V respectively. In Section VI we show how a transformational planning system uses the concept of ARPLACE to optimize plans. Empirical results are presented in Section VII, and we conclude with Section VIII.

II. RELATED WORK

Berenson et al. [3] deal with the problem of finding optimal start and goal configurations for manipulating objects in pick-and-place operations. They explicitly take the placement of the mobile base into account. As they are interested in single optimal start and goal configurations rather than a probabilistic representation, this approach does not enable least-commitment planning.

The Capability Map is another approach to modelling robot configurations that lead to possible grasps [20]. Capability Maps are used to find regions where the dexterity of a manipulator is high. As they focus on the kinematics of a robot, they are not related to a given skill repertoire or environment. Also, they do not take uncertainties in robot or object position into account.

The robot 'Dexter' learns sequences of manipulation skills such as searching for and then grasping an object [7]. Declarative knowledge such as the length of its arm is learned from experience. Learning success models has also been done in the context of robotic soccer, for instance learning the success rate of approaching a ball [18]. Our system extends these approaches by explicitly representing the regions in which successful instances are observed, and computing a GSM for these regions.

Friedman and Weld demonstrated the advantages of least-commitment planning in [5]. They showed that assigning open conditions to abstract actions and later refining this choice to a particular action can lead to exponential savings. The principle of lazy evaluation is applied to motion planning by Bohlin and Kavraki [4]. They are able to significantly reduce the number of collision checks for building a PRM.

Sussman [19] was the first to realize that bugs in plans do not just lead to failure, but are actually an opportunity to construct improved and more robust plans. Although this research was done in the highly abstract symbolic blocks world domain, this idea is still fundamental to transformational planning.

The basis of our transformational planning system is the declarative and expressive plan language RPL, which is described in [2]. The constraints for plan design, especially the specification of declarative goals indicating the purpose of code parts, have been shown in [1]. Besides the modeling of navigation tasks, our system scales with respect to reasoning about perception based on computer vision, the relation between objects and their representation in the robot's belief, as well as reasoning about complex manipulation tasks.

Temporal projection is an integral component of a transformational planning system. McDermott [13] developed a very powerful, totally ordered projection algorithm capable of representing and projecting various kinds of uncertainty, concurrent threads of execution, and exogenous events.

III. CONCEPT OF ARPLACE

We propose ARPLACE as a powerful and flexible representation of the utility of positions in the context of action-related mobile manipulation. The concept of ARPLACE is implemented as a continuous probability map that represents the probability of successfully grasping the target object when standing at a certain initial position. Figure 2 depicts three such maps for grasping various cups on a table. Figure 8 shows how variations in the robot's estimation of the cup position influence ARPLACE¹.

Instead of committing to a specific position in advance, an ARPLACE enables least-commitment planning, as a whole range of positions are predicted to be successful, or at least probable. The robot will start to move to a position that is good enough to execute the subsequent manipulation action, and will refine the goal position while it moves. In the context of grasping a cup from a table, this means that the concept of ARPLACE finds a solution area that is good enough for the robot to start moving. As the robot approaches the table, new sensor data comes in, and the robot's state estimate is updated (i.e. cup position accuracy, information on the clutteredness of regions, etc.). As a consequence, the ARPLACE is updated and becomes more precise. The principle of least commitment is especially powerful in real environments, where the complete information required to compute optimal goal positions is not available. Even if the environment is completely observable, dynamic properties could make an optimal pre-planned position suboptimal or inaccessible.

Additionally, the concept of ARPLACE can easily be transferred to a utility-based representation by creating heuristics that optimize for arbitrary secondary constraints such as power consumption, time, end-effector movement, or torque change. For further information on the optimization of subgoals with respect to secondary criteria, we refer to [18].

Figure 3 depicts a system overview of how ARPLACE is learned and used for transformational planning. Numerals in the figure refer to sections in this paper. First, a Generalized Success Model (GSM) is learned from observed experience. The transformational planner requests ARPLACEs whenever it finds several manipulation actions, in order to find a better location to perform them. These are computed by the PLA4MAN module, which uses the GSM to perform a Monte Carlo simulation. ARPLACEs for individual tasks are merged to compute ARPLACEs for joint tasks.

¹A better impression is given by a video which can be downloaded from http://www9.cs.tum.edu/people/fedrizzi/icar_09

Fig. 3. System overview. (Blocks: IV. Gather Experience → Observed Experience → IV. Learn GSM → Generalized Success Model → V. PLA4MAN with V.A. Merge ARPlaces → ARPlaces → VI. Transf. Planner → Plan; the planner issues requests for ARPLACEs.)

IV. LEARNING A GENERALIZED SUCCESS MODEL

In this section, we describe the implementation of the learning of the GSM, as depicted on the left side in Fig. 3. The rest of this section is structured according to Algorithm 1, which is explained throughout this section.

input  : T           ; (task-relevant parameters (cup positions))
         #episodes   ; (#experiments per parameter setting)
output : gsm         ; (generalized success model)

 2  forall cupxy in T do
 3    experience_set.clear()
 4    for i = 1 : #episodes do
 5      robotxy = random_pos(cupxy)
 6      success? = execute_scenario(robotxy, cupxy)
 7      experience_set.add(〈robotxy, success?〉)
 8    end
 9    boundary = classify(experience_set)   ; (with SVM)
10    boundary_set.add(〈cupxy, boundary〉)
11  end
12  H = align_points(boundary_set)
13  〈H̄, P, B〉 = compute_PDM(H)
14  W = [1 T] / Bᵀ   ; (mapping from task-relevant parameters to B)
15  gsm = 〈H̄, P, W〉

Algorithm 1: Computing a Generalized Success Model.

Line 2-6: Acquiring Training Data. The robot first gathers training data by repeatedly executing a navigate-reach-grasp action sequence (see Fig. 1). To acquire sufficient data in little time, we perform the training experiments in the Gazebo simulator. The robot is modeled accurately, and thus the simulator provides training data that is also valid for the real robot. The action sequence is executed for a variety of task-relevant parameters, i.e. positions of the cup on the table (cupxy), which are stored in the matrix T. The 12 cup positions with which the robot is trained are depicted in Fig. 4. For each cup position, the action sequence depicted in Fig. 1 is executed (Line 6) 350 (#episodes) times. After approximately 350 episodes, gathering more data does not improve the accuracy of the learned model (on a fixed test set with 100 episodes). The initial robot position robotxy for reaching and grasping is randomly sampled (Line 5), and the result success? (whether the robot was able to grasp the cup or not) is stored in a log file (Line 7). Fig. 4 depicts successful and failed grasp positions in green and red respectively.

Line 9: Computing Classification Boundaries. To discern between good and bad places to perform manipulation actions from, the robot needs a compact model of the large amount of data it has acquired in simulation. To do so, we learn a binary classifier for the observed data with Support Vector Machines (SVM), using the implementation by [17]. We used a Gaussian kernel with σ=0.03 and cost parameter C=20.0. Fig. 4 depicts the resulting classification boundaries for different configurations of task-relevant parameters. The models on average misclassify 5% of examples when using a training/test split that contains 66%/33% of the data respectively, and 3% when using the training data for testing.
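As an illustration of this classification step (Line 9 of Algorithm 1), consider the following sketch. The paper uses the SVM implementation of Sonnenburg et al. [17]; scikit-learn is substituted here as a stand-in, with the Gaussian kernel width σ mapped to the RBF parameter γ = 1/(2σ²). Function and variable names are illustrative, not the authors' code.

    import numpy as np
    from sklearn.svm import SVC

    def classify(experience_set):
        """Learn a success/failure boundary for one cup position.

        experience_set: list of ((x, y), success) pairs, as gathered
        in Lines 2-8 of Algorithm 1.
        """
        X = np.array([pos for pos, _ in experience_set])   # robot base positions
        y = np.array([int(s) for _, s in experience_set])  # 1 = grasp succeeded
        sigma = 0.03                                       # kernel width from the paper
        svm = SVC(kernel="rbf", gamma=1.0 / (2.0 * sigma**2), C=20.0)
        svm.fit(X, y)
        # The classification boundary is the zero level set of the decision
        # function; it can be extracted as a contour on a discretized grid.
        return svm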

Fig. 4. Successful grasp positions and their classification boundaries. Every sub-image shows the boundary that corresponds to the cup position that is visualized with the black cup. To save space, the table on which the cup is placed is only shown in the right-most graphs, and not all failed data points are drawn. Data points correspond to the center of the robot base.

Line 12-13: Computing the Point Distribution Model. As input, a PDM requires n points that are distributed over the contour. We distribute 20 points equidistantly over each boundary, and determine the correspondence between points on different boundaries by minimizing the sum of the distances between corresponding points, while maintaining the order of the points on the boundary. The result is depicted in Fig. 5, where only 4 of the 12 classification boundaries are shown for clarity.

Fig. 5. Point-alignment.
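A minimal sketch of the equidistant resampling step, assuming each SVM boundary is available as an ordered polyline (the correspondence optimization between boundaries is omitted):

    import numpy as np

    def resample_equidistant(polyline, n_points=20):
        """Place n_points along the polyline at equal arc-length spacing.

        polyline: (m, 2) array of ordered boundary vertices.
        """
        segments = np.diff(polyline, axis=0)
        arclen = np.concatenate(([0.0], np.cumsum(np.hypot(segments[:, 0],
                                                           segments[:, 1]))))
        targets = np.linspace(0.0, arclen[-1], n_points)
        x = np.interp(targets, arclen, polyline[:, 0])
        y = np.interp(targets, arclen, polyline[:, 1])
        return np.stack([x, y], axis=1)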


Given the aligned points on the boundaries, we compute a PDM. Although PDMs are best known for their use in computer vision, we use the notation of Roduit et al. [15], who focus on robotic applications. First, the 2D boundaries are merged into one 12×40 matrix H, where each row is the concatenation of the x and y coordinates of the 20 points along one classification boundary; each row thus represents one boundary. The next step is to compute P, which is the matrix of eigenvectors of the covariance matrix of H. Given P, we can decompose each boundary hk in the set into the mean boundary and a linear combination of the columns of P as follows: hk = H̄ + P · bk. Here, bk is the so-called deformation mode of the k-th boundary. This is the Point Distribution Model. To get an intuition for what the PDM represents, the first two deformation modes are depicted in Fig. 6(a), where the values of the first and second columns of B are varied between their extreme values.
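In code, the PDM computation (Lines 12-13) reduces to an eigendecomposition of the boundary covariance. A sketch, assuming H holds one 40-dimensional boundary per row; names are illustrative:

    import numpy as np

    def compute_pdm(H, n_modes=2):
        """Compute mean boundary, deformation modes P, and coefficients B.

        H: (12, 40) matrix, one aligned boundary per row.
        """
        H_mean = H.mean(axis=0)                 # mean boundary, shape (40,)
        cov = np.cov(H, rowvar=False)           # 40 x 40 covariance matrix
        eigvals, eigvecs = np.linalg.eigh(cov)  # ascending eigenvalues
        order = np.argsort(eigvals)[::-1]       # sort descending
        eigvals, eigvecs = eigvals[order], eigvecs[:, order]
        energy = eigvals[:n_modes].sum() / eigvals.sum()  # 96% for 2 modes (paper)
        P = eigvecs[:, :n_modes]                # kept deformation modes
        B = (H - H_mean) @ P                    # one row of coefficients per boundary
        return H_mean, P, B, energy             # h_k is approx. H_mean + P @ B[k]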

Fig. 6. A Generalized Success Model based on a Point Distribution Model. (a) First and second deformation mode in B (showing the mean of the PDM, the hulls from which the PDM is computed, and hulls reconstructed with the PDM). (b) Reconstructing the boundaries from Fig. 4.

By inspecting the eigenvalues of the covariance matrix of H, we determined that the first 2 components already contain 96% of the deformation energy. Therefore, we use only the first 2 deformation modes, without losing much accuracy. Fig. 6(b) demonstrates that the original 12 boundaries can be reconstructed well when using combinations of only the first two deformation modes.

The advantage of the PDM is not only that it substantially reduces the high dimensionality of the initial 40D boundaries. It also allows us to interpolate between them in a principled way using only two deformation parameters. The PDM is therefore a compact, general, yet accurate model for the classification boundaries.

Line 14: Relation to task-relevant parameters. The final step of model learning is to relate the specific deformation of each boundary (contained in B) to the values of the task-relevant parameters such as cupxy that are varied during data collection. Since the correlation coefficients between the first and second deformation modes and the task-relevant parameters T are 0.99 and 0.97 respectively, we simply compute the linear relation between them with W = [1 T]/Bᵀ. This approach adheres to the proposed strategy of "learning task-relevant features that map to actions, instead of attempting to reconstruct a detailed model of the world with which to plan actions" [9].

Line 15: Generalized Success Model. The GSM is constructed as a 3-tuple, containing the mean of the classification boundaries H̄, the eigenvectors P, and the mapping from task-relevant parameters to deformation modes W.
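A sketch of Line 14: W is the least-squares solution of [1 T] W ≈ B, so that deformation modes can later be predicted from a sampled cup position (variable names illustrative):

    import numpy as np

    def fit_task_mapping(T, B):
        """T: (12, 2) cup positions; B: (12, n_modes) deformation modes."""
        A = np.hstack([np.ones((T.shape[0], 1)), T])  # prepend bias column: [1 T]
        W, *_ = np.linalg.lstsq(A, B, rcond=None)     # shape (3, n_modes)
        return W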

V. COMPUTING ARPLACES ON-LINE

In this section, we describe how appropriate ARPLACEs for manipulation are determined on-line. We call this module 'places for manipulation' (PLA4MAN). As can be seen in the computational model in Fig. 3, this module takes the GSM and the estimation of the robot and target object position (as probability distributions) as input, and returns an ARPLACE such as depicted in Fig. 2.

input  : gsm = 〈H̄, P, W〉   ; (generalized success model)
         object_position     ; (probability distribution, estimated)
         robot_position      ; (probability distribution, estimated)
output : arplace             ; (probability map)

3  for i = 1 to #samples do
4    ts = sample_from_distribution(object_position)
5    bs = ([1 ts] · W)ᵀ
6    classif_boundary_set.add(H̄ + P · bs)
7  end
8  arplace = sum_{i=1..#samples} grid(boundary_set_i) / #samples
9  arplace = arplace * robot_position   ; (convolution)

Algorithm 2: Computing ARPLACE.

Line 3-6: Monte Carlo simulation. Because of the uncertainty in the estimated cup position, it does not suffice to compute only one classification boundary, given the most probable position of the cup, as the ARPLACE from which to grasp. This might lead to a failure if the cup is not at the position where it is expected. Therefore, we use a Monte Carlo simulation to generate probabilistic advice on where to navigate to grasp the cup.

This is done by taking 100 (#samples) samples from the distribution of the cup position. For each iteration, this yields the task-relevant parameters ts = [xs ys]. The corresponding classification boundary is determined by first computing the appropriate deformation values from the cup position with bs = ([1 ts] · W)ᵀ, and then using these to compute hs = H̄ + P · bs. The boundary hs estimates the area in which the robot should stand to be able to make a successful grasp of the cup at position ts = [xs ys].

The distribution of the cup position is modeled as a Gaussian with mean [x y] and covariance matrix [σ²xx σ²yx; σ²xy σ²yy], which is provided by our vision-based object localization module [10]. In Fig. 7(a), 30 of the 100 sampled boundaries are depicted, for a cup position distribution with x=-0.3, y=0.1, σxx=σyy=0.05, σxy=σyx=0.

Line 8: Summation over classification boundaries. We then generate a discrete grid in which each cell measures 2.5 × 2.5 cm, and compute the number of classification boundaries that classify each cell as a success. Dividing the result by the overall number of boundaries yields the probability that grasping the cup will succeed from this position. The corresponding probability map, which takes the uncertainty of the cup position into account, is depicted in Fig. 7(b).
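Lines 3-8 of Algorithm 2 then amount to the following sketch. Here inside_boundary (a point-in-polygon test over all grid cells) is an assumed helper, and the 40-vector layout [x1..x20, y1..y20] follows the construction of H above; all names are illustrative.

    import numpy as np

    def arplace_map(H_mean, P, W, cup_mean, cup_cov, grid_xy, n_samples=100):
        """Monte Carlo estimate of the success probability per grid cell.

        grid_xy: (n_cells, 2) centers of the 2.5 x 2.5 cm grid cells.
        """
        prob = np.zeros(len(grid_xy))
        for _ in range(n_samples):
            ts = np.random.multivariate_normal(cup_mean, cup_cov)  # sampled cup pose
            bs = np.concatenate(([1.0], ts)) @ W                   # deformation modes
            hs = H_mean + P @ bs                                   # boundary, shape (40,)
            boundary = hs.reshape(2, -1).T                         # 20 (x, y) points
            prob += inside_boundary(grid_xy, boundary)             # 0/1 per cell
        return prob / n_samples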

Fig. 7. Monte Carlo simulation of boundaries to compute ARPLACE. (a) Sampled classification boundaries (hs), showing the mean of the PDM, the hull for x=-0.3, y=0.1, and the sampled cup positions. (b) Discretized relative sum of the boundaries; note the steep decline.

Line 9: Uncertainty in robot position. The Adaptive Monte Carlo Localization from the Player project [6] also returns a covariance matrix for the robot's position. This uncertainty must be taken into account in ARPLACE. For instance, although any position near to the left of the steep decline in Fig. 7(b) is predicted to be successful, it might still fail if the robot is actually further to the right than expected. Therefore, we convolve the ARPLACE as depicted in Fig. 7(b) with a discretized (2.5 × 2.5 cm) probability distribution of the robot's position². Some results of this convolution are depicted in Fig. 2 (2D) and Fig. 8 (3D). Note that this convolution also works for multi-modal distributions as returned by particle filters.
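This step is an ordinary 2-D convolution. A sketch with scipy as a stand-in for the paper's implementation, assuming the robot-pose distribution has been discretized on the same 2.5 cm grid and normalized:

    from scipy.signal import convolve2d

    def blur_with_robot_uncertainty(arplace_grid, robot_pose_kernel):
        """Both arguments are 2-D arrays on the same 2.5 x 2.5 cm grid;
        robot_pose_kernel must sum to 1 (e.g. a discretized Gaussian, or a
        particle-filter histogram, which covers the multi-modal case)."""
        return convolve2d(arplace_grid, robot_pose_kernel, mode="same")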

Fig. 8. These images show how varying certain task-relevant parameters affects the shape of the ARPLACE probability map. The table and the cup are drawn to scale in the xy-plane. The video on http://www9.cs.tum.edu/people/fedrizzi/icar_09 gives an even better impression.

Fig. 8 depicts how the probability map is affected by varying task-relevant parameters. Notice in the first row how it becomes 'more difficult' (less likely to succeed) to grasp the cup as the cup moves away from the table's edge. The probability maps in Fig. 8 represent the robot's concept of ARPLACE, which takes into account the uncertainty in both the pose of the robot and the target object.

²Note that this uncertainty is in the current position of the robot (c), not the position it is navigating to in order to grasp (g). An assumption we make is that as the robot approaches g, the uncertainty in c will become closer and closer to the uncertainty at g, and be equal once g = c.

These distributions are generated from a model that is very much grounded in observed experience, as it is learned from observation. Note that this concept is also specific to the task context and the skills of the robot. Using a different robot or controller would lead to different observations, and hence to a different concept of successful ARPLACEs. It is the autonomous learning of ARPLACE from observed experience that enables us to apply the same algorithm to a wide range of robots and controllers; an advantage over analytical or hand-coded approaches.

A. Merging ARPLACEs

So far, we have considered ARPLACEs for a single object O1. In this case we compute the ARPLACE representation R_O1 of grasping O1 with the right hand and the ARPLACE representation L_O1 of grasping O1 with the left hand. Without further constraints, the robot will use the right arm to grasp O1 if max(R_O1) ≥ max(L_O1), and the left arm otherwise. In the case where two objects have to be moved, there are several possibilities. The robot can grasp each object individually, by moving to O1 and grasping it, then moving to O2 and grasping it. Another possibility is to grasp O1 and O2 from one single position. Our ARPLACE representation handles this generalization easily. Given the ARPLACE representations R_O1, L_O1, R_O2, and L_O2, we can compute the joint ARPLACE probability maps R_O1 L_O2 (the robot grasps O1 with its right arm and O2 with its left arm) and R_O2 L_O1. This is easily done by a piecewise multiplication of probabilities in the ARPLACE maps, as depicted in Fig. 2. In the next section, we describe how the transformational planner uses merged ARPLACEs to determine the best course of action.
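A sketch of the merge and of the arm-assignment choice (cell-wise products of maps defined on a common grid; names illustrative):

    import numpy as np

    def best_joint_arplace(R_O1, L_O1, R_O2, L_O2):
        """Pick the arm assignment whose joint map has the higher peak."""
        joint_12 = R_O1 * L_O2   # right arm -> O1, left arm -> O2
        joint_21 = R_O2 * L_O1   # right arm -> O2, left arm -> O1
        if joint_12.max() >= joint_21.max():
            return joint_12, "right: O1, left: O2"
        return joint_21, "right: O2, left: O1"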

Computing and merging ARPLACEs online, as the robot gathers more information about the state of the world, is a powerful approach. If an RRT [11] planner is used to compute the optimal configuration C1 to grasp O1 and the configuration C2 to grasp O2, it is not straightforward how to merge C1 and C2 to find a configuration C3 from which both objects can be grasped at once. An RRT approach would have to revise large parts of the prior solutions in a replanning step to solve the new task. Moreover, the configuration space of the new task is higher-dimensional, because more joints have to be considered when grasping multiple objects. In our approach, the probability of successful execution can be composed from simpler solutions through a piecewise multiplication of probability maps, which is a computationally cheap operation.

VI. TRANSFORMATIONAL PLANNING WITH ARPLACE

In this paper, we consider the optimization of a pick-up task: grasping one cup with the left gripper and one with the right gripper. When executing these actions, the robot first drives to a location near the first cup, picks it up, and navigates to a different location to pick up the second cup. This approach works very reliably, and is independent of where the cups are located relative to each other. But it does not perform very well. The overall task could be executed faster by using the same location for picking up both cups.


It is difficult to solve this in a control program without sacrificing generality. The reason is that the two pick-up actions are executed sequentially, and in their default implementation they cannot influence each other or would become less general. In contrast, a transformational planner, as described in this section, is able to detect and locally modify the two locations that cause the suboptimal behavior.

A. Plan Design

We define plans as robot control programs that can not only be executed, but also reasoned about. This is important, since it enables a transformational planner to reason about the intention of a specific code part, and therefore to infer if a goal has been achieved or not, and what the reason for a failure was. Standard control programs written in RPL [12] are annotated in order to indicate their purpose and make them transparent to the transformational planner. For example, actions that must be performed at a certain location are executed within the context of an at-location block. The most important RPL instructions for semantic annotation in the context of pick-and-place tasks are achieve, perceive and at-location. In the context of this paper, we will not give a formal definition of the semantics of these instructions, but describe them only informally.

The achieve statement asserts that a successful execution of the statement implies that the logical expression passed as its argument must hold after execution. For example, the statement (achieve (entity-picked-up ?cup)) states that after executing this instruction, the object referenced by the variable ?cup must be in the robot's gripper³.

Before manipulating objects, the robot must find the objects and instantiate them in its belief state. The statement (perceive ?cup) guarantees that after executing it, the object referenced by ?cup has been found, and a reference to its internal representation is returned.

Manipulation implies the execution of actions at specific locations. Therefore, it must be assured that pick-up actions are only executed when the robot is at a specific location. (at-location ?location ...) asserts that the code within its context is either executed at the specified location or fails. Please note that transformations which affect the location where actions are performed directly modify the ?location parameter of such at-location expressions. Therefore, at-location is the most important declarative plan expression for optimizing ARPLACEs.

The declarative expressions explained above form a code tree. Every achieve statement can contain several further achieve, perceive and at-location statements. For instance, the goal (achieve (entity-at-location ?object ?location)) first perceives the object, then picks it up by achieving entity-picked-up, which executes the pick-up action within an at-location block, and puts the object down by achieving entity-put-down, which also contains an at-location block. Transformation rules are implemented to replace sub-trees within the code tree by new code. The plan is executed by interpreting every node in the code tree, and a so-called task tree is generated. The task tree contains information about plan execution that the code tree does not contain, but which is necessary to find behavior flaws.

³Please note the Lisp syntax, where variables are prefixed with a '?', for example ?cup, and predicates and functions are pure symbols.

B. Transformational Planning

A transformational planning system consists of three main components: 1) a projection mechanism for predicting the outcome of a plan, 2) a mechanism for detecting behavior flaws within the predicted plan outcome, and 3) mechanisms to fix the detected flaws by applying transformation rules to the plan code. Planning is performed by repeatedly performing these steps until the resulting plan cannot be optimized any more, or a timeout occurs.

Transformational planning enables the robot to detect and fix behavior flaws, such as 1) collisions, e.g. caused by under-parameterized goal locations; 2) blocked goals, e.g. when a chair is standing at a location the robot wants to navigate to; and 3) flaws affecting performance, as in our example of picking up two cups from two different initial locations.

1) Plan Projection: A central component of a transformational planning system is an accurate prediction mechanism that generates, based on the plan code, a temporally ordered set of events. For projecting plans, we again use the Gazebo simulator. To not only simulate the execution of plans, but also record the events generated by the interaction with the simulated world, we extended Gazebo with plug-ins that signal collisions, perception events, and location changes of the robot and objects. Projection of a plan generates an execution trace that contains the state of the plan, the belief state of the robot, and the state of the simulated world for every point in time.

2) Behavior Flaws and Reasoning about Plan Execution: The second component of a transformational planner is a reasoning engine to find pre-defined flaws in the robot behavior. Behavior flaws include not only critical errors in plan execution, like collisions, but also behavior affecting the performance of the executed plan. The latter is investigated in this paper. Behavior flaws are specified using a Prolog-like reasoning engine implemented in Common Lisp. Expressions are written in Lisp syntax and have the form

(〈predicate〉 [param]∗)

Here, == is unification, and thnot is a weak inversion predicate that holds true when it cannot be proven that the passed term holds true.

The execution trace generated by plan projection is transparently integrated into the reasoning engine, i.e. the execution trace is queried using Prolog predicates. In combination with a set of facts modeling the semantics of declarative expressions such as achieve and at-location, and concepts of the world, for instance that objects are placed on "supporting planes" (table, cupboard, ...), the information recorded in the execution trace is a central component for finding behavior flaws.


Behavior flaws can be separated into several classes, which can be further specialized. Our planner is aware of the two main classes: "detected flaw", which represents errors the robot control program was able to detect, and "undetected flaw", for flaws that can only be inferred from the execution trace. The latter includes flaws such as collisions and misperceived objects, but also performance flaws. Listing 1 shows the specification of the flaw generated by two pick-up actions which are executed at different initial locations. The code will be explained in Section VI-C.

Listing 1. Definition of the performance flaw
 1  (def-behavior-flaw unoptimized-locations
 2    :specializes performance-flaw
 3    :flaw (and
 4            (task-goal ?task-1
 5              (achieve (entity-picked-up ?object-1)))
 6            (task-goal ?task-2
 7              (achieve (entity-picked-up ?object-2)))
 8            (thnot (== ?task-1 ?task-2))
 9            (optimized-action-location
10              ?object-1 ?object-2
11              ?optimized-location)))

3) Plan Transformations and Transformation Rules: After a behavior flaw has been detected, the last step of a planner iteration is the application of a transformation rule to fix the behavior flaw. Transformation rules are applied to parts of the code tree formed by the plan code, and cause substantial changes in its structure and the corresponding robot behavior. A transformation rule consists of three parts: the input schema is matched against the plan part to be transformed, the transformation part performs transformations on the matched parts, and the output plan describes how the new code of the respective plan part has to be reassembled.


C. Optimization using ARPLACE

Besides the integration of ARPLACE into the robot control program, it is also integrated into the reasoning engine of our transformational planner. Using two locations for grasping is considered a performance flaw if one would suffice. Informally, we investigate the execution trace for the occurrence of two different pick-up actions, where one is executed at location L1, and the other is executed at location L2. Then we request a location L3 to perform both actions, and the corresponding probability. If the probability of success is sufficiently high, we apply a plan transformation and replace locations L1 and L2 by location L3.

More specifically, Listing 1 shows the definition of the behavior flaw. The flaw is in the class of performance flaws, i.e. it specializes the flaw performance-flaw (line 2). In lines 4 to 8, two different pick-up tasks are matched, and the corresponding variables are bound. For that, it uses the predicate task-goal, which asserts that a task successfully achieves the corresponding goal according to the semantics of achieve. Finally, in line 9, the ARPLACE system is queried for a location to grasp both objects, ?object-1 and ?object-2. The predicate only holds true when the probability of the new location is sufficiently high (>0.85), i.e. the flaw is detected only if a better location exists. For more detailed information on transformation rules and their application for plan optimization, please see [14].
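The check behind optimized-action-location reduces to a peak test on the merged probability map. A minimal sketch with hypothetical helper and variable names, using the 0.85 threshold from the text:

    import numpy as np

    SUCCESS_THRESHOLD = 0.85   # scenario-dependent (see the note below)

    def optimized_action_location(joint_map, grid_xy):
        """Return the grid cell L3 for grasping both objects, or None.

        joint_map: flat array of success probabilities, one per grid cell;
        grid_xy:   (n_cells, 2) corresponding cell centers.
        """
        best = int(np.argmax(joint_map))
        if joint_map[best] > SUCCESS_THRESHOLD:
            return grid_xy[best]   # replaces locations L1 and L2 in the plan
        return None                # flaw not detected; keep the default plan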

Note that "sufficiently high" depends very much on the scenario context. In robotic soccer, for instance, it can be beneficial to choose fast and risky moves, whereas in safe human-robot interaction, certainty of successful execution is more important than mere speed. This paper focuses on principled ways of integrating such thresholds in a transformational planner, and relating them to grounded models of the robot's behavior. What these thresholds should be, and how they are determined, depends on the application domain and the users.

VII. EMPIRICAL EVALUATION OF ARPLACE

The hardware platform we use for our experiments is a B21r mobile robot from Real World Interface. Its wheels allow this round robot to move forward and turn around its center axis. Two 6-DOF lightweight arms from Amtec with slide grippers are mounted on this base, allowing for the manipulation of objects at table height.

For localization and navigation, we use several standard modules from the Player project [6]: Adaptive Monte Carlo Localization, pmap for map building, and the AMCL Wavefront Planner for global path planning. These modules use the Sick LMS400 laser range scanner and the odometry provided by the base. For reaching and grasping, we use a combination of Dynamic Movement Primitives [8] and vector fields. The inverse kinematics computations are performed using the Kinematics and Dynamics Library (KDL) from Orocos [16]. Detection and localization of the objects to be grasped is done using the method described in [10]. For debugging and efficient data collection, we also use the Gazebo simulator [6].

On a day of open house, our B21 mobile manipulation platform continually performed an application scenario, in which it locates, grasps, and lifts a cup from the table and moves it to the kitchen oven. Fig. 9 shows two images taken during the demonstration. The robot performed this scenario 50 times in approximately 6 hours, which has convinced us that the robot hardware and software are robust enough to be deployed amongst the general public.

First, we compared the use of ARPLACE for grasping a single cup (without plan transformations) with another strategy, which we call FIXED. FIXED implements the well-in-reach strategy by always moving to a location that has the same relative offset to the target object. The relative location is chosen to be the offset with the best possible overall performance. The cup is placed in three different locations. In one experimental episode, we first determine the real position of the cup, and sample an observed position given the real position po of the cup and the covariance matrix C(po). Given the estimated cup position, the robot then uses the PLA4MAN or well-in-reach module to compute an ARPLACE, and performs the manipulation action. When the robot is able to perform the manipulation task after moving to the proposed position, we mark the experiment as SUCCESS. Otherwise, we mark the experiment as FAILED.

Fig. 9. A reach-grasp sequence performed at a public demonstration.

Fig. 10. Result of the empirical evaluation: success rate (%) of PLA4MAN vs. FIXED for increasing covariance of the object localization.

Fig. 10 shows the results of the evaluation. Naturally, the performance of both methods decreases as the robot becomes more and more uncertain about the cup pose. One result is that PLA4MAN always performs better than FIXED. We computed the significance of this performance increase with the χ² test; the p-values are depicted in Fig. 10. The only case in which the increase is not significant is when there is no uncertainty, a situation that does not arise on the real robot. Another important result is that when the uncertainty rises, the performance of FIXED suffers more than the performance of PLA4MAN. This can be explained by the fact that PLA4MAN tries to stay away from steep declines when the estimates of the robot become more uncertain.

We then evaluated the merging of ARPLACEs for joint grasping, and the application of transformation rules with our RPL planner. Two cups are placed on the table, and the distance between them is varied between 20 and 60 cm, in increments of 5 cm. Our evaluation shows that grasping two cups from separate positions requires on average 48 seconds, independent of the relative distance of the cups to each other. By applying transformation rules, the default plan is optimized to 32 seconds, which is a significant (t-test: p < 0.001) and substantial performance gain of 50%. Above 45 cm, two cups cannot be grasped from one position, and the plan transformation is not applied.

VIII. CONCLUSION

In this article, we presented a system that enables robots to learn a concept of ARPLACE that is compact, grounded in observed experience, and tailored to the robot's hardware and controller. ARPLACE is modeled as a probability map, which enables the robot to perform least-commitment planning, instead of prematurely committing itself to specific positions that could be suboptimal. We presented a transformational planner that uses ARPLACEs to determine if transformation rules for optimizing plans should be applied. Our empirical evaluation has shown that ARPLACE improves the robustness of grasping and, when combined with the transformational planner, leads to a substantial improvement in plan execution duration.

We are currently extending our approach in several directions. We are applying our approach to more complex scenarios and different domains. For instance, we are learning higher-dimensional ARPLACE concepts. New aspects that we are taking into account are different kinds of objects, which require different kinds of grasps. We are also investigating extensions and other machine learning algorithms that will enable our methods to generalize over larger spaces.

ACKNOWLEDGEMENTS

The research described in this article is funded by the CoTeSys cluster of excellence (Cognition for Technical Systems, http://www.cotesys.org), part of the Excellence Initiative of the DFG.

REFERENCES

[1] M. Beetz and D. McDermott. Declarative goals in reactive plans. In J. Hendler, editor, First International Conference on AI Planning Systems, pages 3–12. Morgan Kaufmann, 1992.

[2] M. Beetz. Structured Reactive Controllers. Journal of Autonomous Agents and Multi-Agent Systems, Special Issue: Best Papers of the International Conf. on Autonomous Agents '99, 4:25–55, 2001.

[3] D. Berenson, H. Choset, and J. Kuffner. An optimization approach to planning for mobile manipulation. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2008.

[4] R. Bohlin and L. Kavraki. Path planning using lazy PRM. In IEEE International Conference on Robotics and Automation, 2000.

[5] M. Friedman and D. S. Weld. Least-commitment action selection. In Proc. 3rd International Conf. on A.I. Planning Systems, 1996.

[6] B. Gerkey, R. Vaughan, and A. Howard. The Player/Stage Project: Tools for multi-robot and distributed sensor systems. In Proc. of the 11th International Conference on Advanced Robotics (ICAR), 2003.

[7] S. Hart, S. Ou, J. Sweeney, and R. Grupen. A framework for learning declarative structure. In RSS-06 Workshop: Manipulation for Human Environments, 2006.

[8] A. J. Ijspeert, J. Nakanishi, and S. Schaal. Movement imitation with nonlinear dynamical systems in humanoid robots. In International Conference on Robotics and Automation (ICRA 2002), 2002.

[9] C. Kemp, A. Edsinger, and E. Torres-Jara. Challenges for robot manipulation in human environments. IEEE Robotics and Automation Magazine, 14(1):20–29, 2007.

[10] U. Klank, M. Zia, and M. Beetz. 3D Model Selection from an Internet Database for Robotic Vision. In International Conference on Robotics and Automation (ICRA), 2009.

[11] S. M. LaValle. Planning Algorithms, chapter 5: Sampling-Based Motion Planning. Cambridge University Press, 2006.

[12] D. McDermott. A Reactive Plan Language. Research Report YALEU/DCS/RR-864, Yale University, 1991.

[13] D. McDermott. An algorithm for probabilistic, totally-ordered temporal projection. In O. Stock, editor, Spatial and Temporal Reasoning. Kluwer Academic Publishers, Dordrecht, 1997.

[14] A. Müller, A. Kirsch, and M. Beetz. Transformational planning for everyday activity. In Proceedings of the 17th International Conference on Automated Planning and Scheduling (ICAPS), 2007.

[15] P. Roduit, A. Martinoli, and J. Jacot. A quantitative method for comparing trajectories of mobile robots using point distribution models. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2441–2448, 2007.

[16] R. Smits, T. De Laet, K. Claes, P. Soetens, J. De Schutter, and H. Bruyninckx. Orocos: A software framework for complex sensor-driven robot tasks. IEEE Robotics and Automation Magazine, 2008.

[17] S. Sonnenburg, G. Rätsch, C. Schäfer, and B. Schölkopf. Large scale multiple kernel learning. Journal of Machine Learning Research, 7:1531–1565, 2006.

[18] F. Stulp and M. Beetz. Refining the execution of abstract actions with learned action models. Journal of Artificial Intelligence Research (JAIR), 32, June 2008.

[19] G. J. Sussman. A computational model of skill acquisition. PhD thesis, Massachusetts Institute of Technology, 1973.

[20] F. Zacharias, C. Borst, and G. Hirzinger. Positioning mobile manipulators to perform constrained linear trajectories. In Proc. of the International Conf. on Intelligent Robots and Systems (IROS), 2008.