Research Article

Robot Obstacle Avoidance Learning Based on Mixture Models
Huiwen Zhang,1,2 Xiaoning Han,1,2 Mingliang Fu,1,2 and Weijia Zhou1
1State Key Laboratory of Robotics, Shenyang Institute of Automation, Shenyang 110016, China
2University of Chinese Academy of Sciences, Beijing, China
Correspondence should be addressed to Huiwen Zhang; zhanghuiwen@sia.cn
Received 27 May 2016; Accepted 21 July 2016
Academic Editor: Keigo Watanabe
Copyright © 2016 Huiwen Zhang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
We briefly survey the existing obstacle avoidance algorithms; then a new obstacle avoidance learning framework based on learning from demonstration (LfD) is proposed. The main idea is to imitate the obstacle avoidance mechanism of human beings, in which humans learn to make a decision based on the sensor information obtained by interacting with the environment. Firstly, we endow robots with obstacle avoidance experience by teaching them to avoid obstacles in different situations. In this process, a lot of data are collected as a training set; then, to encode the training set data, which is equivalent to extracting the constraints of the task, a Gaussian mixture model (GMM) is used. Secondly, a smooth obstacle-free path is generated by Gaussian mixture regression (GMR). Thirdly, a metric of imitation performance is constructed to derive a proper control policy. The proposed framework shows excellent generalization performance, which means that the robots can fulfill the obstacle avoidance task efficiently in a dynamic environment. More importantly, the framework allows learning a wide variety of skills, such as grasping and manipulation, which makes it possible to build a robot with versatile functions. Finally, simulation experiments are conducted on a Turtlebot robot to verify the validity of our algorithms.
1. Introduction
With the development of robot technology, especially the development of computer science and cognitive science, some promising technologies have emerged, such as deep learning and computer vision. These technologies greatly improve the intelligence level of robots and also lay the foundation for modern intelligent robots. Unlike the traditional, mature industrial robots, intelligent robots are expected to coexist with people, as service robots do. To achieve this purpose, robots are required to have strong environmental adaptability and autonomous decision-making ability. Obstacle avoidance and planning are basic skills of robots, and they are also a research focus in the field of robotics. There are already many mature algorithms for obstacle avoidance, but how to develop obstacle avoidance algorithms with strong robustness, especially algorithms that let robots fulfill tasks in unstructured environments, is still worthy of further research.
Obstacle avoidance algorithms can be classified into traditional obstacle avoidance algorithms and intelligent algorithms based on their development history. In [1], the authors describe them as classical approaches and heuristic approaches; in [2], the authors call them classical algorithms and evolutionary algorithms. Here, the traditional approaches cover the same content, while the intelligent algorithms also include the machine learning methods applied to robot obstacle avoidance in recent years, which differs from the terms coined by other authors and covers a wider range. Classical obstacle avoidance algorithms can be categorized into three classes based on their main idea: (i) methods based on geometric models, (ii) methods based on virtual fields, and (iii) methods based on mathematical programming. Details of classic obstacle avoidance algorithms can be found in [3]. Methods based on geometric models can be divided into two steps: first, build the geometric model of the environment, and then choose the right kind of search algorithm. The main idea of virtual field methods is to assume that all robots stay in a virtual field; the change of the virtual field reflects the feasible solution for the path. This concept was first put forward and used in [4, 5]. Mathematical programming methods represent constraints for obstacles as parameterized inequalities in configuration space, so
Hindawi Publishing Corporation, Journal of Robotics, Volume 2016, Article ID 7840580, 14 pages, http://dx.doi.org/10.1155/2016/7840580
a geometric problem is turned into a mathematical one: the best obstacle-free path is a solution to the inequalities, which can be found by optimization methods. All these methods have been used successfully, but the traditional methods are generally only applicable to static environments and easily fall into a local optimal solution. To solve these problems, more and more intelligent approaches are used. By 2003, more than half of the articles used heuristic algorithms to solve the obstacle avoidance problem [1].
The core idea of heuristic algorithms is to simulate intelligent behaviors of animals or people; the algorithms are designed by bionic mechanisms. There are a large number of representative algorithms, such as the Ant Colony Optimization (ACO) algorithm, the Particle Swarm Optimization (PSO) algorithm, and the Genetic Algorithm (GA). In [6], ACO is used as a global path planning method to avoid obtaining a local optimal solution when using the potential field method. In [7], distance and memory features are introduced to speed up the computation of ACO. The emergence of PSO derived from the simulation of birds' migrating behavior; applications of particle swarm optimization to the path planning problem include [8-10]. GA, derived from the theory of evolution and genetics, is a kind of global probabilistic optimization algorithm. In [11], a special encoding technology is used to speed up path generation; the fitness function is designed according to security and distance relative to a target, and then a barrier-free path is solved through a genetic algorithm. In [12], a variable-length chromosome genetic algorithm is designed for path planning. In [13], the authors use an artificial neural network to solve the obstacle avoidance problem of a redundant manipulator. An improved Grossberg neural network method is used in [14] to solve UAV path planning under a "danger mode". Neural networks can also realize obstacle avoidance in conjunction with other methods, such as fuzzy neural network learning [15]. Among all the intelligent algorithms, there is a broad category of approaches based on learning; a representative line of work uses reinforcement learning. Reinforcement learning derives from psychology; its main idea is, through constant interaction with the environment and "trial and error", to maximize the reward defined by a reward function [16]. Based on this concept, a lot of obstacle avoidance algorithms have been proposed [17-22]. It is worth noting that, with the great improvement of hardware and the development of deep learning theory, some methods can directly learn an obstacle avoidance strategy from sensor data, called end-to-end learning [23]. Intelligent algorithms usually can obtain the global optimal solution and have a good ability to adapt to the environment, but these methods often bear high computational complexity and bad real-time performance.
In this paper, we propose an obstacle avoidance method based on teaching and learning. First, learning samples are obtained by showing robots how to avoid obstacles; then a probabilistic mixture model is used to extract and encode the task constraints autonomously; finally, an obstacle-free path is generated by GMR. Our approach differs from previous work in the following: (i) it is more in line with the habits of human cognition, drawing lessons from imitation learning; (ii) the algorithm has good generalization ability and can adapt to dynamic, unstructured environments; besides, the computational complexity is low, and in particular the path generation process can be realized online efficiently; (iii) the learning framework can be used not only to learn obstacle avoidance but also to learn other skills, which is a key point in achieving generalized artificial intelligence (GAI).
The main contributions of the paper are as follows: (i) the first contribution is using LfD for the obstacle avoidance task for the first time, making up for the disadvantages of other methods; (ii) due to the nature of LfD, nontechnical people can program robots to realize the obstacle avoidance function; (iii) obstacle avoidance learning using mixture models is able to adapt to environmental changes and dynamic obstacles while keeping the computational complexity low.
The remainder of the paper is organized as follows. In Section 2, some background knowledge is introduced, including the concepts of LfD and GMM. Section 3 introduces parameter-solving methods for GMM. Section 4 gives a detailed introduction to the obstacle avoidance learning architecture. In Section 5, simulation experiments based on a Turtlebot robot are conducted. In the end, conclusions are drawn and further improvements are introduced.
2. Learning from Demonstration
LfD is a paradigm which allows robots to automatically acquire new skills. Unlike traditional robots, for which, to fulfill a task, a prior task decomposition and the programming of each action need to be done by humans, LfD adopts the view that a robot's skills can be acquired by observing human operation. This approach has the advantage of simple programming; even people who do not understand programming techniques can program robots through teaching. Besides, due to the nature of LfD, robots are better able to adapt to new environments. LfD is also named robot Programming by Demonstration (PbD) (Imitation Learning and Apprenticeship Learning), Learning by Demonstration (LbD), Learning by Showing, Learning by Watching, Learning from Observation, and so forth. It is worth noting that some authors think that imitation learning and LfD have different meanings: in [24], the authors distinguish imitation learning and LfD according to how the learning data is recorded and how the imitator uses the data in the learning process.
2.1. Formulation of the LfD Problem. The solving process for LfD can be divided into two steps: first, acquire training data; second, use the obtained data to derive a control policy. We use $S$ and $A$ to represent the state set and action set, respectively. The probability distribution function $T(s' \mid s, a): S \times A \times S \rightarrow [0, 1]$ represents the transition probability from state $(s, a)$ to state $s'$. An experience dataset $D$ should include this information. Actually, in most situations the true states $S$ are unknown; the observable states are $Z$, and a learning problem can then be defined as deriving a policy $\pi$ mapping from $Z$ to $A$. For an obstacle avoidance task, the problem can
Figure 1: Framework of learning from demonstration (LfD).
be described as finding the more general barrier-free path (policy) from the demonstrated barrier-free paths (sample $D$). The main process of the LfD framework is illustrated in Figure 1.
2.2. Main Research Topics in LfD. Nehaniv and Dautenhahn [25] summarized the main problems in LfD as four issues: (1) What to imitate? (2) How to imitate? (3) When to imitate? (4) Whom to imitate? So far, only the former two problems have been solved. For the problem of robot obstacle avoidance, what to imitate means that the robot needs to extract the right constraints which characterize the task, and the problem of how to imitate refers to generating a trajectory that satisfies the constraints obtained in the previous step.

In order to solve the above problems (mainly the former two), there are three main ideas:
(i) Directly get a mapping function $\pi = f(z)$ from the training set $D = \{(z_i, a_i)\}$, where $z$ stands for the observed state. This idea treats the problem as a supervised learning problem.

(ii) Learn a system model, namely, the return function and the state transition probability distribution, from the training set, and then derive a policy using MDP methods.

(iii) Learn the pre- or postconditions of each action and then obtain the control sequence using programming methods.
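As a minimal illustration of idea (i), the mapping $\pi = f(z)$ can be approximated by a nearest-neighbor lookup over the demonstrated state-action pairs. This is only an illustrative sketch (the 1-NN rule and array layout are our assumptions, not the method adopted later in the paper):

```python
import numpy as np

def fit_nn_policy(Z, A):
    """Idea (i): treat LfD as supervised learning. The learned map
    pi = f(z) is realized here as a 1-nearest-neighbor lookup over
    demonstrated pairs. Z: (n, dz) observed states, A: (n, da) actions."""
    def policy(z):
        # Return the action recorded for the closest demonstrated state.
        i = np.argmin(np.linalg.norm(Z - z, axis=1))
        return A[i]
    return policy
```

A query state near a demonstrated state simply reproduces the teacher's action; smoother regressors (such as the GMM/GMR approach of Section 3) interpolate between demonstrations instead.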
For the obstacle avoidance problem, learning the barrier-free path from teachers in order to produce a generalized path can be seen as a supervised regression problem. GMM is a probabilistic model which is suitable for modeling the natural variations in human and robot motions. Theoretically, GMM can model any probability density function (PDF) by increasing the number of Gaussian components, so GMM has strong encoding capacity. Finally, movement recognition, prediction, and generation can be integrated within the same GMM and GMR encoding strategy. So it is very favorable for us to use GMM to encode our prior demonstration information and to use GMR as the solution to the regression problem mentioned above. Details of GMM theory are described below.
3. Gaussian Mixture Model
For a probability density estimation problem, if in addition to the known observed variables there exist some unobservable latent variables, but we know the distribution of the latent variable and the conditional distribution of the observable variable given the latent variable, then we can use mixture models to estimate the parameters of the distribution, so as to get the approximate probability distribution of the known data. Before introducing GMM, we digress briefly to discuss the Gaussian distribution.
3.1. Marginal and Conditional Distribution of a Multivariate Gaussian Distribution. Assume that we have a random variable

$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \qquad (1)$$

in which $x_1 \in \mathbb{R}^r$, $x_2 \in \mathbb{R}^s$, $x \in \mathbb{R}^{r+s}$, and $x \sim \mathcal{N}(\mu, \Sigma)$, where

$$\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}. \qquad (2)$$

Here $\mu_1 \in \mathbb{R}^r$, $\mu_2 \in \mathbb{R}^s$, $\mu \in \mathbb{R}^{r+s}$, $\Sigma_{11} \in \mathbb{R}^{r \times r}$, and $\Sigma_{12} \in \mathbb{R}^{r \times s}$. Noting that the covariance matrix is symmetric, we have $\Sigma_{12} = \Sigma_{21}^{T}$.

Because the marginal distribution of a Gaussian is also a Gaussian distribution, it is easy to get that $E[x_1] = \mu_1$, $\operatorname{cov}(x_1) = \Sigma_{11}$, $E[x_2] = \mu_2$, and $\operatorname{cov}(x_2) = \Sigma_{22}$, and we have $x_1 \sim \mathcal{N}(\mu_1, \Sigma_{11})$ and $x_2 \sim \mathcal{N}(\mu_2, \Sigma_{22})$. As a step forward, we can prove that $x_1 \mid x_2 \sim \mathcal{N}(\mu_{1|2}, \Sigma_{1|2})$, where

$$\mu_{1|2} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2), \qquad \Sigma_{1|2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}. \qquad (3)$$
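The conditioning relations in (3) translate directly into code. The sketch below, using NumPy, computes $\mu_{1|2}$ and $\Sigma_{1|2}$ for a joint Gaussian; the convention that the conditioned-on block $x_2$ occupies the last dimensions is an assumption of this sketch:

```python
import numpy as np

def gaussian_condition(mu, sigma, x2, r):
    """Condition a joint Gaussian N(mu, sigma) over x = [x1, x2] on x2.

    r is the dimension of x1 (the first block). Implements Eq. (3):
    mu_{1|2} = mu1 + S12 S22^{-1} (x2 - mu2),
    S_{1|2}  = S11 - S12 S22^{-1} S21.
    """
    mu1, mu2 = mu[:r], mu[r:]
    s11, s12 = sigma[:r, :r], sigma[:r, r:]
    s21, s22 = sigma[r:, :r], sigma[r:, r:]
    s22_inv = np.linalg.inv(s22)
    mu_cond = mu1 + s12 @ s22_inv @ (x2 - mu2)
    sigma_cond = s11 - s12 @ s22_inv @ s21
    return mu_cond, sigma_cond
```

When the two blocks are uncorrelated ($\Sigma_{12} = 0$), the conditional reduces to the marginal of $x_1$, which is a quick sanity check on any implementation.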
3.2. Formulation of GMM. Assume that we have a training set $D = \{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$ containing $m$ samples, where every sample $x^{(i)}$ is a vector. We are trying to find a density function that describes the data with the least error, which is equivalent to maximizing the probability of the data. In order to represent this probability, we need to make some assumptions. First, we assume that the latent variable $z^{(i)} \sim \mathrm{Multinomial}(\phi)$, in which $\phi_j \ge 0$ and $\sum_{j=1}^{k} \phi_j = 1$; besides, we suppose $x^{(i)} \mid z^{(i)} = j \sim \mathcal{N}(\mu_j, \Sigma_j)$. So we can model the data in our hands by adjusting the parameters $(\phi, \mu, \Sigma)$. The likelihood function can be written as

$$\ell(\phi, \mu, \Sigma) = \sum_{i=1}^{m} \log p(x^{(i)}; \phi, \mu, \Sigma) = \sum_{i=1}^{m} \log \sum_{z^{(i)}=1}^{k} p(x^{(i)} \mid z^{(i)}; \mu, \Sigma)\, p(z^{(i)}; \phi). \qquad (4)$$

Figure 2: System architecture of obstacle avoidance learning.
Maximizing (4), we can find the right model parameters. To realize this purpose, we would typically take the partial derivative with respect to each parameter and set it to zero; but actually we find that we cannot get a closed-form solution for this problem with that method. The main reason is that $z$ is unknown; if $z$ were known, the model would degrade to a Gaussian discriminant model and, following the previously mentioned steps, a closed-form solution could be obtained. We will introduce a commonly used iterative algorithm to solve the problem in (4).
3.3. EM for GMM. Expectation Maximization (EM) is an iterative algorithm mainly including two steps, called the E-step and the M-step. The main idea is to find a tight lower-bound function of the objective function and then maximize that lower bound, so that the objective function is maximized indirectly. The reason for introducing a lower-bound function is that finding the extreme value of the lower bound is much easier than for the original function. For GMM, the E-step tries to guess the value of $z^{(i)}$, equivalent to finding out which component the current sample belongs to, and then the model parameters are updated based on this conjecture. The algorithm can be summarized as follows.

E-step: for each $i$ and $j$, compute the posterior probability of $z^{(i)}$:

$$w_j^{(i)} = p(z^{(i)} = j \mid x^{(i)}; \phi, \mu, \Sigma). \qquad (5)$$

M-step: update the parameters:

$$\phi_j = \frac{1}{m}\sum_{i=1}^{m} w_j^{(i)}, \qquad \mu_j = \frac{\sum_{i=1}^{m} w_j^{(i)} x^{(i)}}{\sum_{i=1}^{m} w_j^{(i)}}, \qquad (6)$$

$$\Sigma_j = \frac{\sum_{i=1}^{m} w_j^{(i)} \left(x^{(i)} - \mu_j\right)\left(x^{(i)} - \mu_j\right)^{T}}{\sum_{i=1}^{m} w_j^{(i)}}. \qquad (7)$$

The algorithm keeps iterating between (6) and (7) until convergence, which means that the changes of the model parameters are less than a threshold $\epsilon$, or the value of the likelihood function changes only in a very narrow range.
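The E-step and M-step above can be sketched as follows. This is a generic, illustrative EM implementation for a full-covariance GMM; the initialization (means spread along the first coordinate) and the fixed iteration count are simplifying assumptions, not the paper's exact procedure:

```python
import numpy as np

def em_gmm(X, k, n_iter=100, eps=1e-6):
    """Fit a k-component GMM to X (m x d) with EM (Section 3.3)."""
    m, d = X.shape
    phi = np.full(k, 1.0 / k)
    # Simple deterministic init: pick means spread along the first axis.
    order = np.argsort(X[:, 0])
    mu = X[order[np.linspace(0, m - 1, k).astype(int)]].copy()
    sigma = np.array([np.cov(X.T) + eps * np.eye(d) for _ in range(k)])

    for _ in range(n_iter):
        # E-step: responsibilities w[i, j] = p(z_i = j | x_i).
        w = np.empty((m, k))
        for j in range(k):
            diff = X - mu[j]
            inv = np.linalg.inv(sigma[j])
            norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(sigma[j]))
            w[:, j] = phi[j] * norm * np.exp(
                -0.5 * np.einsum("ni,ij,nj->n", diff, inv, diff))
        w /= w.sum(axis=1, keepdims=True)

        # M-step: re-estimate (phi, mu, sigma) from the responsibilities.
        nk = w.sum(axis=0)
        phi = nk / m
        mu = (w.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - mu[j]
            sigma[j] = (w[:, j, None] * diff).T @ diff / nk[j] \
                + eps * np.eye(d)
    return phi, mu, sigma
```

In practice the loop would also monitor the log-likelihood (4) and stop once its change falls below the threshold $\epsilon$ mentioned above, rather than running a fixed number of iterations.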
4. The Architecture of Obstacle Avoidance Learning
Figure 2 shows the architecture of obstacle avoidance learning and the information flow between the different modules. The whole system contains three modules: data acquisition and preprocessing, constraint extraction and trajectory retrieval, and optimal control strategy derivation. Mileage information is collected by built-in sensors such as odometers, and distance information is acquired by a Kinect. Through noise filtering, invalid data reduction, and sequence alignment of the original data, we get the input for the learning machine. Then the input signal is encoded by GMM, and a general
Figure 3: Useful data collected by sensors. The line in blue shows the demonstrated trajectory from start to goal around the obstacle. The fan-shaped sectors in shallow yellow represent the visual field of the Kinect. $x$, $\omega$, $d_{ro}$, and $d_{rg}$ represent the location, orientation, and distance information of the robot, respectively.
trajectory is generated by GMR. The general trajectory satisfies all the constraints of the task. Eventually, a controller is designed to follow the expected input trajectory and output the control commands to the robot. The subsequent sections describe each part in detail.
4.1. Data Acquisition. The experimental data is acquired from a Turtlebot robot. Turtlebot is a widely used research platform which is compatible with ROS, integrating a variety of sensors such as an odometer and extensible vision sensors. We can develop many kinds of applications using Turtlebot, such as mapping and navigation. In this paper, we implement our obstacle avoidance algorithm on the basis of the ROS development kit for Turtlebot. The teaching process can be achieved in three ways: (i) using a remote joystick; (ii) with the aid of the Turtlebot development kit for the navigation algorithms, simulating some trajectories as training samples; and (iii) by means of the following apps implemented on Turtlebot, with which the teacher can guide the robot to avoid obstacles and record sensor data as samples. In this paper, we adopt the second method, which means that we create a simulated environment in software and then make the robot execute multiple obstacle avoidance tasks in that environment using the built-in navigation algorithms. During this process, linear velocity and angular velocity are obtained by built-in sensors like the odometer. Using the Kinect, we can get the distance between the robot and obstacles, as well as the distance from the robot to the target.
We assume that the important sensing information comes from (i) the states of the robot, defined as linear velocity and angular velocity; (ii) the robot's absolute location; (iii) the distance between the robot and obstacles, noting that the Kinect returns many pieces of distance information representing the relative position of the obstacle within a specific range, and we choose the shortest distance as our reference value; and (iv) the distance between the robot and the target, which can be computed indirectly. As illustrated in Figure 3, we use $(x, \omega, d_{ro}, d_{rg})$ to represent the sensor information mentioned above, respectively. Here, the location information and velocity information are redundant. We think it is necessary to take both kinds of information into consideration, for the reason that location information is very important and direct for an obstacle avoidance task. In addition, choosing as many task-related variables as possible is generally very necessary in the initial stage of a task.
In the experiment, demonstration is conducted $n$ times. In each demonstration, $T$ data points are recorded, so the dataset contains $N = n \times T$ points in total. Each sample point is a collection of variables represented as $D = \{x, \omega, d\}$, in which $x$ stands for the location of the robot, $\omega$ stands for the current orientation, and $d$ stands for distance. Each variable contains a one-dimensional temporal value and $(d-1)$-dimensional spatial values; that is, $x = \{x_s, x_t\}$, $\omega = \{\omega_s, \omega_t\}$, $d_{ro} = \{d_{ro,s}, d_{ro,t}\}$, and $d_{rg} = \{d_{rg,s}, d_{rg,t}\}$. Here $x_s \in \mathbb{R}^{N \times 2}$ represents location, $\omega \in \mathbb{R}$ represents orientation, $d_{ro,s} \in \mathbb{R}^{N \times 2}$ represents the distance between the robot and the obstacles, and $d_{rg,s} \in \mathbb{R}^{N \times 2}$ represents the distance between the robot and the goal state. Supposing the coordinates of the goal are $o$, we have

$$d_{rg,s}^{ij} = s^{ij} - o, \qquad (8)$$

where $i$ indicates the trajectory number and $j$ indicates time within a trajectory. We can rewrite the above equation for velocities; since the goal is static ($\dot{o} = 0$),

$$\dot{d}_{rg,s}^{ij} = \dot{s}^{ij} - \dot{o} = \dot{s}^{ij}. \qquad (9)$$
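The goal-distance features of (8) and (9) can be computed directly from the logged positions; a small sketch (the array shapes and time step are illustrative):

```python
import numpy as np

def goal_distance_and_velocity(s, o, dt=1.0):
    """Compute d_rg = s - o for every recorded position, Eq. (8).

    s: (N, 2) robot positions along one demonstration; o: (2,) goal
    coordinates; dt: sampling period. Because the goal is static, the
    finite-difference velocity of d_rg equals that of s, Eq. (9).
    """
    d_rg = s - o                         # Eq. (8)
    d_rg_dot = np.diff(s, axis=0) / dt   # Eq. (9)
    return d_rg, d_rg_dot
```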
4.2. Data Preprocessing. Because of noise and bad demonstration quality, there exist serious mismatches and invalid data in the original data. As shown in Figure 4, there is no change in the beginning and end parts of these curves, so this part of the data is useless; to reduce computation cost, it is a wise choice to abandon it. In addition, each trajectory has a different length; we should resample them to a uniform length so that we can handle them more easily in the subsequent steps. So, before feeding the data to our algorithms, preprocessing steps including noise filtering, useless data reduction, and resampling are conducted. Figure 4 displays the differences between the original data and the processed data.
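A minimal version of this trimming-and-resampling step might look as follows; the motion threshold and target length are illustrative assumptions, not values from the paper:

```python
import numpy as np

def preprocess(traj, motion_eps=1e-3, target_len=200):
    """Trim idle segments and resample a trajectory to a uniform length.

    traj: (T, d) array of sensor readings for one demonstration.
    Samples at the start and end where the signal barely changes (the
    robot standing still) are dropped; every channel is then linearly
    resampled to target_len points so all demonstrations align.
    """
    # Keep the span between the first and last "moving" samples.
    speed = np.linalg.norm(np.diff(traj, axis=0), axis=1)
    moving = np.where(speed > motion_eps)[0]
    traj = traj[moving[0]: moving[-1] + 2]

    # Resample each channel onto a common normalized time base.
    t_old = np.linspace(0.0, 1.0, len(traj))
    t_new = np.linspace(0.0, 1.0, target_len)
    return np.column_stack(
        [np.interp(t_new, t_old, traj[:, j]) for j in range(traj.shape[1])])
```

A smoothing filter (e.g. a moving average) would normally be applied before the trimming step to handle sensor noise; it is omitted here for brevity.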
4.3. Trajectory Encoding and Retrieval. Consider a data point $\xi_t$ in the training set; it is modeled by the mixture

$$p(\xi_t) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(\xi_t; \mu_k, \Sigma_k), \qquad (10)$$

in which $\pi_k$, $\mu_k$, and $\Sigma_k$ are the model parameters, representing the prior probability, mean value, and covariance, respectively.
6 Journal of Robotics
Figure 4: Comparison of original data and processed data. (a) shows the original data; lines in red mark the cutting positions. (b) displays the valid data after processing. Each column plots $x$ (m), $\omega$ (rad/s), $d_{ro}$ (m), and $d_{rg}$ (m) against time $t$ (s).
To solve for these parameters, the EM algorithm introduced in the previous section can be used. When the EM algorithm converges to a given value, we get the right $\pi_k$, $\mu_k$, $\Sigma_k$; that is, we encode the trajectory data successfully.

In order to get a general barrier-free trajectory, we are given an input, which can be a temporal signal or another kind of sensor data, and a barrier-free path is then generated according to the given input. A natural idea is to divide the variable $\xi$ into two parts $(\xi_t^{I}, \xi_t^{O})$, representing input and output, respectively; then the trajectory retrieval problem can be formalized as follows: given $\xi \sim \sum_{k=1}^{K}\pi_k \mathcal{N}(\mu_k, \Sigma_k)$ and $\xi_t^{I}$, solve for the probability density distribution of $\xi_t^{O}$. The solution to this problem is called Gaussian mixture regression. Regarding $\xi_t^{I}$ as query points, $\xi_t^{O}$ can be estimated using regression methods. In this paper, we use superscripts $I$ and $O$ to stand for the input and output parts of each variable. With this notation, a block decomposition of the data point $\xi_t$, the vectors $\mu_k$, and the matrices $\Sigma_k$ can be written as

$$\xi_t = \begin{bmatrix} \xi_t^{I} \\ \xi_t^{O} \end{bmatrix}, \qquad \mu_k = \begin{bmatrix} \mu_k^{I} \\ \mu_k^{O} \end{bmatrix}, \qquad \Sigma_k = \begin{bmatrix} \Sigma_k^{I} & \Sigma_k^{IO} \\ \Sigma_k^{OI} & \Sigma_k^{O} \end{bmatrix}. \qquad (11)$$
Based on (3), for a time step $t$, the mean vector and covariance matrix of the $k$th component are

$$\hat{\mu}_k^{O}(\xi_t^{I}) = \mu_k^{O} + \Sigma_k^{OI}\left(\Sigma_k^{I}\right)^{-1}\left(\xi_t^{I} - \mu_k^{I}\right), \qquad \hat{\Sigma}_k^{O} = \Sigma_k^{O} - \Sigma_k^{OI}\left(\Sigma_k^{I}\right)^{-1}\Sigma_k^{IO}. \qquad (12)$$

Taking all $K$ components into consideration, the distribution of $\xi_t^{O}$ given $\xi_t^{I}$ is

$$P\left(\xi_t^{O} \mid \xi_t^{I}\right) \sim \sum_{k=1}^{K} h_k\left(\xi_t^{I}\right) \mathcal{N}\left(\hat{\mu}_k^{O}(\xi_t^{I}), \hat{\Sigma}_k^{O}\right), \qquad (13)$$
where

$$h_k\left(\xi_t^{I}\right) = \frac{\pi_k\, \mathcal{N}\left(\xi_t^{I} \mid \mu_k^{I}, \Sigma_k^{I}\right)}{\sum_{k'=1}^{K} \pi_{k'}\, \mathcal{N}\left(\xi_t^{I} \mid \mu_{k'}^{I}, \Sigma_{k'}^{I}\right)}. \qquad (14)$$

Equation (13) represents a multimodal distribution, which can be approximated by the following Gaussian distribution:

$$P\left(\xi_t^{O} \mid \xi_t^{I}\right) = \mathcal{N}\left(\hat{\mu}_t^{O}, \hat{\Sigma}_t^{O}\right) \quad \text{with} \quad \hat{\mu}_t^{O} = \sum_{i=1}^{K} h_i\left(\xi_t^{I}\right) \hat{\mu}_i^{O}\left(\xi_t^{I}\right),$$

$$\hat{\Sigma}_t^{O} = \sum_{i=1}^{K} h_i\left(\xi_t^{I}\right)\left(\hat{\Sigma}_i^{O} + \hat{\mu}_i^{O}\left(\xi_t^{I}\right) \hat{\mu}_i^{O}\left(\xi_t^{I}\right)^{T}\right) - \hat{\mu}_t^{O}\left(\hat{\mu}_t^{O}\right)^{T}. \qquad (15)$$
Substituting $\xi_t^{I}$ into the equations described above to obtain $\xi_t^{O}$, a general trajectory is obtained. The trajectory not only avoids obstacles but also can be executed by robots, owing to its good smoothness properties.
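Equations (12)-(15) can be sketched compactly in code. The function below is an illustrative GMR implementation: it conditions a fitted GMM on the chosen input dimensions and moment-matches the resulting mixture to a single Gaussian (index conventions are our assumptions):

```python
import numpy as np

def gmr(priors, mu, sigma, x_in, in_idx, out_idx):
    """Gaussian mixture regression, following Eqs. (12)-(15).

    priors (K,), mu (K, d), sigma (K, d, d): GMM parameters.
    x_in: query over the input dimensions listed in in_idx.
    Returns the single-Gaussian approximation (mu_out, sigma_out)
    of P(xi_out | xi_in).
    """
    K, d_out = len(priors), len(out_idx)
    h = np.empty(K)
    mu_k = np.empty((K, d_out))
    sig_k = np.empty((K, d_out, d_out))
    for k in range(K):
        mi, mo = mu[k][in_idx], mu[k][out_idx]
        sii = sigma[k][np.ix_(in_idx, in_idx)]
        soi = sigma[k][np.ix_(out_idx, in_idx)]
        sio = sigma[k][np.ix_(in_idx, out_idx)]
        soo = sigma[k][np.ix_(out_idx, out_idx)]
        sii_inv = np.linalg.inv(sii)
        diff = x_in - mi
        # Unnormalized responsibility of component k, Eq. (14).
        det = np.linalg.det(2 * np.pi * sii)
        h[k] = priors[k] * np.exp(-0.5 * diff @ sii_inv @ diff) / np.sqrt(det)
        mu_k[k] = mo + soi @ sii_inv @ diff      # Eq. (12), mean
        sig_k[k] = soo - soi @ sii_inv @ sio     # Eq. (12), covariance
    h /= h.sum()

    # Moment-match the conditional mixture to one Gaussian, Eq. (15).
    mu_out = (h[:, None] * mu_k).sum(axis=0)
    sigma_out = sum(
        h[k] * (sig_k[k] + np.outer(mu_k[k], mu_k[k])) for k in range(K)
    ) - np.outer(mu_out, mu_out)
    return mu_out, sigma_out
```

Sweeping `x_in` over the time steps (for time-based GMR) yields the general trajectory and its per-step covariance, which Section 4.4 reuses as a tracking weight.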
4.4. Finding an Optimum Control Policy. Through the previous section we can obtain a general trajectory, but this is a planning problem. How can planning be combined with control? In other words, how do we find a set of control inputs that make the robot follow the general trajectory accurately? For the time-based GMR, $\xi_t^{O}$ is equal to $\xi_s$, where $s$ represents the spatial location information. Using $\hat{\xi}_s$ to represent the desired trajectory and $\xi_s$ to represent the actually reached location, a simple idea for evaluating the tracking performance is to use the weighted Euclidean distance

$$\sum_{i=1}^{D-1} w_i\left(\xi_{s,i} - \hat{\xi}_{s,i}\right)^2 = \left(\xi_s - \hat{\xi}_s\right)^{T} W \left(\xi_s - \hat{\xi}_s\right). \qquad (16)$$
Naturally, the above equation can be applied to multiple variables; for this paper we have four. Letting $\hat{x}$, $\hat{\omega}$, $\hat{d}_{ro}$, and $\hat{d}_{rg}$ represent the desired trajectory, the actual tracking performance $P$ can be expressed as the (negated) sum of the weighted squared errors of the four variables:

$$P = -\left[\left(x_s - \hat{x}_s\right)^{T} W_x \left(x_s - \hat{x}_s\right) + \left(\omega_s - \hat{\omega}_s\right)^{T} W_\omega \left(\omega_s - \hat{\omega}_s\right) + \left(d_{ro,s} - \hat{d}_{ro,s}\right)^{T} W_{ro} \left(d_{ro,s} - \hat{d}_{ro,s}\right) + \left(d_{rg,s} - \hat{d}_{rg,s}\right)^{T} W_{rg} \left(d_{rg,s} - \hat{d}_{rg,s}\right)\right]. \qquad (17)$$
According to the performance indicator $P$, finding a desired control policy is equivalent to maximizing $P$. Since $P$ is a quadratic function, an analytical solution can easily be obtained by a gradient method. But noting that there exist constraints between the variables, as shown in (9), maximizing (17) turns into a constrained optimization problem. Usually this kind of problem can be solved by constructing a Lagrange function, defined as follows:

$$L\left(x_s, \omega_s, d_{ro,s}, d_{rg,s}, \lambda\right) = P + \lambda^{T}\left(x_s - d_{rg,s}\right), \qquad (19)$$

where $\lambda$ is the Lagrange multiplier. Letting $\nabla L = 0$ and solving the partial derivatives with respect to each variable yields the optimal control inputs. The weight matrices in (22) can be computed by taking the inverse of the current covariance matrix, namely,

$$W = \left(\hat{\Sigma}_t^{O}\right)^{-1}. \qquad (23)$$
5. Simulation Experiments
Experiments are implemented using a Turtlebot robot. In addition to the robot's own hardware system, a Kinect is integrated to obtain distance information. The size of the map used in the simulation is 7.5 × 5 m. Two different maps are created to verify the learning effects. In the first map, the obstacle is located in the middle of the map and a small gap lies on the lower side; the robot can reach the target place through the gap. The second map is much like the first one, except that there are two obstacles in it; to reach the destination, the robot must avoid every obstacle in the map. Figure 5 shows the shape of the maps: Figures 5(a) and 5(b) represent the 2D maps, and Figures 5(c) and 5(d) show the 3D simulation environments corresponding to the 2D maps. The color maps in (a) and (b) stand for the cost cloud, which provides a reference for path planning. Warm colors such as yellow and red stand for a lower cost; on the contrary, cool color tones stand for a higher cost. A path planner always tends to choose a trajectory with the least cost. The cost value is calculated by built-in functions in the ROS package. For convenience of description, we use task 1 and task 2 to denote the two different obstacle avoidance tasks in maps (a) and (b). It is worth mentioning that we select relatively simple environments for both tasks for the reason that
Figure 5: Two different maps used in our simulation. (a) and (b) are the actual maps generated by software, annotated with start and goal positions; (c) and (d) are the simulation environments corresponding to the maps above them.
Figure 6: Demonstrated trajectories ($y$ versus $x$, in meters) in two situations. (a) represents the sampled data in task 1; (b) represents the sampled data in task 2.
we put more emphasis on the mechanism behind the obstacle avoidance, which means that we can learn skills in a simple environment and these skills can then be reused in complex situations.

Figure 6 shows the demonstrated trajectories in the two situations. The coordinates of the goal states are set to (6, 3) and (6, 2) for task 1 and task 2, respectively. The start points for task 1 are randomly generated by sampling from uniform distributions between 0.5 and 2.5 for the $X$-coordinate and between 2 and 4 for the $Y$-coordinate. The maximum range is 2 m, which can guarantee a good generalization performance for obstacle avoidance. For task 2, the start points are generated by sampling from uniform distributions from 0.5 to 2 and from 2 to 3.5.
5.1. Experiments for Trajectory Encoding and Retrieval. Figure 7 shows the encoding and retrieval results for task 1. The encoded variables are the location information $x$ and the distance
Figure 7: Trajectory encoding and retrieval using GMM and GMR for task 1, plotting $y$ versus $x$ and $d_{rg,y}$ versus $d_{rg,x}$ (in meters). (a) represents model encoding with GMM; the ellipses in dark blue represent projections of the Gaussian components. (b) represents the general trajectory generated by GMR; shades in red represent the generation region.
information $d_{rg}$, and both variables are 2-dimensional vectors. The number of Gaussian components is set to 6. It is worth noting that the optimal number of components $K$ in a model may not be known beforehand. A common method consists of estimating multiple models with an increasing number of components and selecting an optimum based on some model selection criterion, such as Cross Validation, the Bayesian Information Criterion (BIC), or Minimum Description Length (MDL). In this paper, we decide $K$ by using the Cross Validation criterion. As the figures show, the variance at the starting position is large, corresponding to a big retrieval error; this indicates that there are no strict requirements on the starting position. The variance at the end point is quite small, which indicates that the robot can reach the goal state accurately. Thus, they show that GMM can successfully extract the constraints that characterize the task, and GMR can retrieve a barrier-free trajectory using the information encoded by GMM, as shown in the second column of Figure 7. The figures in the second row show the encoding and retrieval results for the variable $d_{rg}$. As can be seen from the figures, $d_{rg}$ shares similar distribution properties with $x$. This is because the two variables have a specific relationship between each other, so
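For the BIC alternative mentioned above, the criterion itself is simple to compute once a candidate model's log-likelihood is known; one fits models for increasing $K$ and keeps the minimum. The sketch below assumes a full-covariance GMM (the parameter count changes for other covariance structures):

```python
import numpy as np

def gmm_bic(log_likelihood, k, d, m):
    """BIC for a k-component, d-dimensional full-covariance GMM fitted
    to m samples: -2 ln L + p ln(m), where p counts the free
    parameters: (k-1) mixing weights, k*d mean entries, and
    k*d*(d+1)/2 covariance entries."""
    p = (k - 1) + k * d + k * d * (d + 1) // 2
    return -2.0 * log_likelihood + p * np.log(m)
```

Model selection then loops over candidate values of $K$, fits each model (e.g. with the EM routine from Section 3.3), evaluates `gmm_bic`, and keeps the $K$ with the lowest score; the penalty term discourages adding components that only marginally improve the likelihood.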
10 Journal of Robotics
minus1
0
1120596
(rad
s)
10 200
t (s)
minus1
0
1
120596(r
ads
)
10 200
t (s)
10 200
t (s)
0
5
dro
(m)
10 200
t (s)
0
2
4
dro
(m)
10 200
t (s)
minus5
0
5
drg x
(m)
minus5
0
5
drg x
(m)
10 200
t (s)
minus5
0
5
10 200
t (s)10 200
t (s)
minus5
0
5
drg y
(m)
drg y
(m)
Figure 8 Encoding and retrieval for orientation and distance information in task 1 sharing the same meaning with charts in Figure 7
do their mean values and covariances, which indirectly verifies the validity of the method.
Figure 8 shows the encoding results for the orientation and distance information. The figures in the last two rows display the encoding of d_rg in the X and Y directions. As we can see, the coordinate values in the X direction monotonically decrease to 0; this is because the robot keeps moving from left to right in the map. The coordinates in the Y direction keep going up at the beginning and then drop to zero; this is because the robot encounters obstacles at the middle moment, and in order to avoid the obstacles it has to move away from the goal state temporarily. Note that the final distance value is close to (0, 0), which means the robot reaches the target successfully.
Figures 9 and 10 display the encoding and retrieval results for task 2. Compared with task 1, task 2 is more complicated considering the law behind the data; in order to encode these laws, more Gaussian components are required. In our experiments, K is set to 10, chosen using the method described for task 1. As illustrated in these figures, no matter how complex the environment is, the method still achieves a very satisfactory effect.
Figure 9: Trajectory encoding and retrieval using GMM and GMR for the variables x and d in task 2.
5.2. Experiments for Finding the Control Strategy. The experiments in this part are mainly conducted to verify the derived control policy. For a Turtlebot robot, the control inputs are the linear velocity ẋ_s and the angular velocity θ̇_s. Assume that we have a desired trajectory (the trajectory can be given arbitrarily; in our experiments, in order to have a reference, we choose the second trajectory in our samples). Using the theory derived in the previous sections, we can compute ẋ_s and θ̇_s; iterating over every time step, we get a control sequence. Comparing the calculated values with the sensor-recorded values, we can verify the validity of the derived results.
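The step of turning a sampled desired path into per-time-step commands for a differential-drive base can be sketched as follows (a minimal illustration of ours, recovering v and ω by finite differences rather than by the paper's optimization-based policy):

```python
import numpy as np

# Recover forward velocity v and angular velocity w from a sampled 2-D
# path by finite differences (differential-drive kinematics sketch).
def path_to_commands(xy, dt):
    dxy = np.diff(xy, axis=0)
    v = np.linalg.norm(dxy, axis=1) / dt               # forward speed
    heading = np.arctan2(dxy[:, 1], dxy[:, 0])
    dth = np.diff(heading)
    dth = (dth + np.pi) % (2 * np.pi) - np.pi          # wrap to (-pi, pi]
    w = dth / dt                                       # angular rate
    return v, w

t = np.linspace(0, np.pi, 100)
circle = np.column_stack([np.cos(t), np.sin(t)])       # unit half-circle
v, w = path_to_commands(circle, dt=t[1] - t[0])
print(v.mean(), w.mean())                              # both close to 1.0
```

On a unit circle traversed at unit angular speed, both the linear and angular rates recovered this way are close to 1, which is a quick sanity check for such a command generator.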
Figure 12 compares the calculated linear and angular velocities with the desired values. The linear velocity is divided into velocities in the X and Y directions. As the figure in the second row shows, the expected velocity in the Y direction remains close to 0; this is because the control inputs for the Turtlebot are a linear velocity in the forward direction, that is, the X direction, and an angular velocity. The line graph for the velocity in Y reveals a minor fluctuation at the beginning and end stages, while the overall trend is consistent with this fact. The velocity in X follows the expected value perfectly. Considering Figure 12(b), it can be seen that the calculated value remains consistent with the actual one. It is worth
noting that the length of the calculated sequence differs from that of the actual one, since the frequency of the readings from the robot's sensors is lower than the frequency of the commands. In this case, to compare the similarity of the two signals, sequence alignment is an indispensable step. In this paper we use dynamic time warping (DTW) to achieve this purpose. The signals after alignment show a high correlation, which means the calculated values are correct.
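The DTW alignment step can be sketched with the classic dynamic-programming recursion (our own minimal implementation, not the paper's code):

```python
import numpy as np

def dtw_distance(a, b):
    """O(len(a)*len(b)) dynamic time warping distance between 1-D signals."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible warping moves
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# The same underlying signal sampled at two rates (commands vs. sensor
# readings): DTW stays small despite the length mismatch.
cmd = np.sin(np.linspace(0, 2 * np.pi, 200))
sen = np.sin(np.linspace(0, 2 * np.pi, 120))
print(dtw_distance(cmd, sen))
```

Because DTW warps the time axis, two sequences of different lengths that trace the same shape score a near-zero distance, which is exactly the property needed when the command and sensor streams run at different frequencies.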
Figure 11 shows the location information obtained by integrating the velocity signals. The calculated control inputs allow the robot to follow the given trajectory accurately, except for a minor fluctuation at the beginning.
6. Conclusion
In this paper, based on the idea of imitation learning, we propose a novel idea: a robot's obstacle avoidance skill can be learned by showing the robot how to avoid an obstacle. Under this consideration, task constraint extraction is a key issue, that is, how to make the robot understand the task. In order to achieve this purpose, the Gaussian mixture model is used to encode the data, and GMR is used to retrieve satisfactory
Figure 10: Encoding and retrieval of orientation and distance information in task 2.
Figure 11: Tracking performance using the derived policy (expected path versus actual path).
Figure 12: Comparisons between the expected values and the calculated values. (a) Linear velocity. (b) Angular velocity.
trajectories. Eventually, to obtain the control strategy, a multiobjective optimization problem is constructed, and the control strategy is derived by solving this problem. Simulation experiments verified our algorithms.
For future improvement, the work mainly includes the following aspects: (i) in order to avoid obstacles online, an online incremental GMM algorithm should be used; (ii) in this paper a time-based trajectory is generated by GMR, that is, a time signal is used as the input; a more reasonable idea is to use state information as the input, so that the robot can change its decision according to the current environment, for example, using distance information as the input and velocity as the output, so that the control inputs can be generated directly; (iii) reinforcement learning algorithms can be integrated to strengthen the robots' exploration ability so that better generalization performance is achieved.
Competing Interests
The authors declare that the grant, scholarship, and/or funding do not lead to any conflict of interests. Additionally, the authors declare that there is no conflict of interests regarding the publication of this manuscript.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (Grant no. 51505470) and the Dr. Startup Fund in Liaoning Province (20141152).
References
[1] E. Masehian and D. Sedighizadeh, "Classic and heuristic approaches in robot motion planning—a chronological review," World Academy of Science, Engineering and Technology, vol. 23, pp. 101–106, 2007.
[2] P. Raja and S. Pugazhenthi, "Optimal path planning of mobile robots: a review," International Journal of Physical Sciences, vol. 7, no. 9, pp. 1314–1320, 2012.
[3] J. C. Latombe, Robot Motion Planning, Springer, 2012.
[4] J. Borenstein and Y. Koren, "Real-time obstacle avoidance for fast mobile robots," IEEE Transactions on Systems, Man, and Cybernetics, vol. 19, no. 5, pp. 1179–1187, 1989.
[5] O. Khatib, "Real-time obstacle avoidance for manipulators and mobile robots," International Journal of Robotics Research, vol. 5, no. 1, pp. 90–98, 1986.
[6] H. Mei, Y. Tian, and L. Zu, "A hybrid ant colony optimization algorithm for path planning of robot in dynamic environment," International Journal of Information Technology, vol. 12, no. 3, pp. 78–88, 2006.
[7] M. A. P. Garcia, O. Montiel, O. Castillo, R. Sepulveda, and P. Melin, "Path planning for autonomous mobile robot navigation with ant colony optimization and fuzzy cost function evaluation," Applied Soft Computing, vol. 9, no. 3, pp. 1102–1110, 2009.
[8] C. Lopez-Franco, J. Zepeda, N. Arana-Daniel, and L. Lopez-Franco, "Obstacle avoidance using PSO," in Proceedings of the 9th International Conference on Electrical Engineering, Computing Science and Automatic Control (CCE '12), pp. 1–6, Mexico City, Mexico, September 2012.
[9] H.-Q. Min, J.-H. Zhu, and X.-J. Zheng, "Obstacle avoidance with multi-objective optimization by PSO in dynamic environment," in Proceedings of the International Conference on Machine Learning and Cybernetics (ICMLC '05), vol. 5, pp. 2950–2956, IEEE, Guangzhou, China, August 2005.
[10] S. Dutta, "Obstacle avoidance of mobile robot using PSO-based neuro fuzzy technique," International Journal of Computer Science and Engineering, vol. 2, no. 2, pp. 301–304, 2010.
[11] W.-G. Han, S.-M. Baek, and T.-Y. Kuc, "Genetic algorithm based path planning and dynamic obstacle avoidance of mobile robots," in Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, vol. 3, pp. 2747–2751, Orlando, Fla, USA, October 1997.
[12] J. Tu and S. X. Yang, "Genetic algorithm based path planning for a mobile robot," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '03), vol. 1, pp. 1221–1226, September 2003.
[13] Y. Zhang and J. Wang, "Obstacle avoidance for kinematically redundant manipulators using a dual neural network," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 34, no. 1, pp. 752–759, 2004.
[14] X. Wang, V. Yadav, and S. N. Balakrishnan, "Cooperative UAV formation flying with obstacle/collision avoidance," IEEE Transactions on Control Systems Technology, vol. 15, no. 4, pp. 672–679, 2007.
[15] J. Godjevac, "A learning procedure for a fuzzy system: application to obstacle avoidance," in Proceedings of the International Symposium on Fuzzy Logic, pp. 142–148, Zurich, Switzerland, 1995.
[16] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, Boston, Mass, USA, 1998.
[17] J. Michels, A. Saxena, and A. Y. Ng, "High speed obstacle avoidance using monocular vision and reinforcement learning," in Proceedings of the 22nd International Conference on Machine Learning (ICML '05), pp. 593–600, ACM, Bonn, Germany, 2005.
[18] C. Ye, N. H. C. Yung, and D. Wang, "A fuzzy controller with supervised learning assisted reinforcement learning algorithm for obstacle avoidance," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 33, no. 1, pp. 17–27, 2003.
[19] B.-Q. Huang, G.-Y. Cao, and M. Guo, "Reinforcement learning neural network to the problem of autonomous mobile robot obstacle avoidance," in Proceedings of the International Conference on Machine Learning and Cybernetics (ICMLC '05), vol. 1, pp. 85–89, August 2005.
[20] C. F. Touzet, "Neural reinforcement learning for behaviour synthesis," Robotics and Autonomous Systems, vol. 22, no. 3-4, pp. 251–281, 1997.
[21] W. D. Smart and L. P. Kaelbling, "Effective reinforcement learning for mobile robots," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '02), vol. 4, pp. 3404–3410, Washington, DC, USA, May 2002.
[22] K. Macek, I. Petrovic, and N. Peric, "A reinforcement learning approach to obstacle avoidance of mobile robots," in Proceedings of the 7th International Workshop on Advanced Motion Control (AMC '02), pp. 462–466, IEEE, Maribor, Slovenia, July 2002.
[23] U. Muller, J. Ben, E. Cosatto, et al., "Off-road obstacle avoidance through end-to-end learning," in Advances in Neural Information Processing Systems, pp. 739–746, 2005.
[24] B. D. Argall, S. Chernova, M. Veloso, and B. Browning, "A survey of robot learning from demonstration," Robotics and Autonomous Systems, vol. 57, no. 5, pp. 469–483, 2009.
[25] C. L. Nehaniv and K. Dautenhahn, "Like me—measures of correspondence and imitation," Cybernetics & Systems, vol. 32, no. 1-2, pp. 11–51, 2001.
a geometric problem is turned into a mathematical one: the best obstacle-free path is a solution of the inequalities, which can be solved by optimization methods. All these methods have been used successfully, but the traditional methods generally are applicable only to static environments and easily fall into a local optimal solution. To solve these problems, more and more intelligent approaches are used; by 2003, more than half of the articles used heuristic algorithms to solve the obstacle avoidance problem [1].
The core idea of heuristic algorithms is to simulate certain intelligent behaviors of animals or people; the algorithms are designed by a bionic mechanism. There are a large number of representative algorithms, such as the Ant Colony Optimization (ACO) algorithm, the Particle Swarm Optimization (PSO) algorithm, and the Genetic Algorithm (GA). In [6], ACO is used as a global path planning method to avoid obtaining a local optimal solution when using the potential field method. In [7], distance and memory features are introduced to speed up the computation of ACO. The emergence of PSO derived from the simulation of birds' migrating behavior; applications of particle swarm optimization to the path planning problem include [8–10]. GA, derived from the theory of evolution and genetics, is a kind of global probabilistic optimization algorithm. In [11], a special encoding technology is used to speed up path generation; the fitness function is designed according to security and the distance relative to a target, and then a barrier-free path is solved through a genetic algorithm. In [12], a variable-length chromosome genetic algorithm is designed for path planning. In [13], the authors use an artificial neural network to solve the obstacle avoidance problem of a redundant manipulator. An improved Grossberg neural network method is used in [14] to solve UAV path planning under a "danger mode". Neural networks can also realize obstacle avoidance in conjunction with other methods, such as fuzzy neural network learning [15]. Among all the intelligent algorithms, there is a broad category of approaches based on learning; a representative line of work uses reinforcement learning. Reinforcement learning derives from psychology; the main idea is to maximize the reward defined by a reward function through constant interaction with the environment and "trial and error" [16]. Based on this concept, a lot of obstacle avoidance algorithms have been proposed [17–22]. It is worth noting that, with the great improvement of hardware and the development of deep learning theory, some methods can directly learn an obstacle avoidance strategy from sensor data, called end-to-end learning [23]. Intelligent algorithms usually can obtain the global optimal solution and have a good ability to adapt to the environment, but these methods often bear high computational complexity and bad real-time performance.
In this paper, we propose a kind of obstacle avoidance method based on teaching and learning. First, learning samples are obtained by showing robots how to avoid the obstacle; then a probabilistic mixture model is used to extract and encode the task constraints autonomously; finally, an obstacle-free path is generated by GMR. Our approach differs from previous work in the following: (i) it is more in line with the habits of human cognition, drawing lessons from imitation learning; (ii) the algorithm has good generalization ability and can adapt to dynamic, unstructured environments; besides, the computational complexity is low, and in particular the path generation process can be realized online efficiently; (iii) the learning framework not only can be used to learn obstacle avoidance but also can be used for learning other skills, which is a key point in achieving generalized artificial intelligence (GAI).
The main contributions of the paper are as follows: (i) the first contribution is using LfD for the obstacle avoidance task for the first time, making up for the disadvantages of other methods; (ii) due to the nature of LfD, nontechnical people can program the robots to realize the obstacle avoidance function; (iii) obstacle avoidance learning using mixture models is able to adapt to environmental changes and dynamic obstacles and at the same time has low computational complexity.
The remainder of the paper is organized as follows. In Section 2, some background knowledge is introduced, including the concepts of LfD and GMM. Section 3 introduces parameter-solving methods for GMM. Section 4 gives a detailed introduction of the obstacle avoidance learning architecture. In Section 5, simulation experiments based on a Turtlebot robot are conducted. In the end, conclusions are drawn and further improvements are introduced.
2. Learning from Demonstration
LfD is a paradigm which allows robots to automatically acquire new skills. Unlike traditional robots, for which, to fulfill a task, prior task decomposition and programming of each action need to be done by humans, LfD adopts the view that a robot's skills can be acquired by observing human operation. This approach has the advantage of simple programming: even people who do not understand programming techniques can program robots through teaching. Besides, due to the nature of LfD, robots are better able to adapt to new environments. LfD is also named robot Programming by Demonstration (PbD) (Imitation Learning and Apprenticeship Learning), Learning by Demonstration (LbD), Learning by Showing, Learning by Watching, Learning from Observation, and so forth. It is worth noting that some authors think that imitation learning and LfD have different meanings: in [24], the authors distinguish imitation learning and LfD according to how the learning data is recorded and how the imitator uses the data in the learning process.
2.1. Formulation of the LfD Problem. The solving process for LfD can be divided into two steps: first, acquire training data; second, use the obtained data to derive a control policy. We use S and A to represent the state set and the action set, respectively. The probability distribution function T(s' | s, a): S × A × S → [0, 1] represents the transfer probability from the state-action pair (s, a) to the state s'. An experience dataset D should include this information. Actually, in most situations the true states S are unknown and only the states Z are observable; a learning problem can then be defined as deriving a policy π mapping from Z to A. For an obstacle avoidance task, the problem can
Figure 1: Framework of learning from demonstration (LfD).
be described as finding a more general barrier-free path (a policy) from the demonstrated barrier-free paths (the samples D). The main process of the LfD framework is illustrated in Figure 1.
2.2. Main Research Topics in LfD. Nehaniv and Dautenhahn [25] summarized the main problems in LfD as four issues: (1) what to imitate; (2) how to imitate; (3) when to imitate; (4) whom to imitate. So far, only the former two problems have been solved. For the problem of robot obstacle avoidance, "what to imitate" means that the robot needs to extract the right constraints which can characterize the task, and the problem of "how to imitate" refers to generating a trajectory that satisfies the constraints obtained in the previous step.
In order to solve the above problems (mainly the former two), there are three main ideas:

(i) Directly learn a mapping function π = f(z) from the training set D = {(z_i, a_i)}, where z stands for the observed state. This idea treats the problem as a supervised learning problem.

(ii) Learn a system model, namely, the reward function and the state transition probability distribution, from the training set, and then derive a policy using MDP methods.

(iii) Learn the pre- or postconditions of each action, and then obtain a control sequence using planning methods.
For the obstacle avoidance problem, learning the barrier-free path from teachers in order to produce a generalized path can be seen as a supervised regression problem. GMM is a probabilistic model which is suitable for modeling the natural variations in human and robot motions. Theoretically, GMM can model any probability density function (PDF) by increasing the number of Gaussian components, so GMM has strong encoding capacity. Finally, movement recognition, prediction, and generation can be integrated together within the same GMM and GMR encoding strategy. So it is very favorable for us to use GMM to encode our prior demonstration information and to use GMR as the solution to the regression problem mentioned above. Details of GMM theory are described below.
3. Gaussian Mixture Model
For a probability density estimation problem, if in addition to the known observation variables there exist some unobservable latent variables, but we know the distribution of the latent variable and the conditional distribution of the observable variable given the latent variable, then we can use mixture models to estimate the parameters of the distribution, so as to get an approximate probability distribution of the known data. Before introducing GMM, we digress to discuss the Gaussian distribution.
3.1. Marginal and Conditional Distributions of a Multivariate Gaussian Distribution. Assume that we have a random variable

$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \qquad (1)$$

in which x_1 ∈ R^r, x_2 ∈ R^s, x ∈ R^{r+s}, and x ~ N(μ, Σ), where

$$\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}, \qquad (2)$$

and μ_1 ∈ R^r, μ_2 ∈ R^s, μ ∈ R^{r+s}, Σ_11 ∈ R^{r×r}, and Σ_12 ∈ R^{r×s}. Noting that the covariance matrix is symmetric, we have Σ_12 = Σ_21^T.

Because the marginal distribution of a Gaussian is also a Gaussian distribution, it is easy to get E[x_1] = μ_1, cov(x_1) = Σ_11, E[x_2] = μ_2, and cov(x_2) = Σ_22, so x_1 ~ N(μ_1, Σ_11) and x_2 ~ N(μ_2, Σ_22). As a step forward, we can prove that x_1 | x_2 ~ N(μ_{1|2}, Σ_{1|2}), where

$$\mu_{1|2} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}\left(x_2 - \mu_2\right), \qquad \Sigma_{1|2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}. \qquad (3)$$
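The conditioning formulas in (3) can be checked numerically; the sketch below (our own illustration, with an arbitrary covariance matrix) computes the conditional mean and the Schur-complement covariance:

```python
import numpy as np

# Numerically evaluate mu_{1|2} and Sigma_{1|2} from Eq. (3) for a
# randomly generated 3-D Gaussian, with x1 the first r dimensions.
rng = np.random.default_rng(1)
mu = np.array([1.0, -2.0, 0.5])
A = rng.normal(size=(3, 3))
Sigma = A @ A.T + 3 * np.eye(3)          # a valid (positive-definite) covariance

r = 1
S11, S12 = Sigma[:r, :r], Sigma[:r, r:]
S21, S22 = Sigma[r:, :r], Sigma[r:, r:]

x2 = np.array([-1.5, 0.3])               # observed value of x2
mu_cond = mu[:r] + S12 @ np.linalg.solve(S22, x2 - mu[r:])
Sigma_cond = S11 - S12 @ np.linalg.solve(S22, S21)
print(mu_cond, Sigma_cond)
```

Note that the conditional covariance never exceeds the marginal one, since the Schur complement subtracts a positive-semidefinite term from Σ_11; this is exactly why conditioning a GMM on an input narrows its prediction, as used later in the GMR step.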
3.2. Formulation of GMM. Assume that we have a training set D = {x^(1), x^(2), ..., x^(m)} containing m samples, where every sample x^(i) is a vector. We are trying to find a density function that describes the data with the least error, which is equivalent to maximizing the probability of the data. In order to represent this probability, we need to make some assumptions. First, we assume that the latent variable z^(i) ~ Multinomial(φ), in which φ_j ≥ 0 and Σ_{j=1}^{k} φ_j = 1; besides, we suppose x^(i) | z^(i) = j ~ N(μ_j, Σ_j). So we can model the data in our hands by
Figure 2: System architecture of obstacle avoidance learning.
adjusting the parameters (φ, μ, Σ). The likelihood function can be written as

$$\ell(\phi, \mu, \Sigma) = \sum_{i=1}^{m} \log p\left(x^{(i)}; \phi, \mu, \Sigma\right) = \sum_{i=1}^{m} \log \sum_{z^{(i)}=1}^{k} p\left(x^{(i)} \mid z^{(i)}; \mu, \Sigma\right) p\left(z^{(i)}; \phi\right). \qquad (4)$$

Maximizing (4), we can find the right model parameters. To realize this purpose, we would typically take the partial derivative with respect to each of the parameters and set it equal to zero, but actually we find that we cannot get a closed-form solution for this problem in that way. The main reason is that z is unknown; if z were known, the model would degrade to a Gaussian discriminant model, and following the previously mentioned steps a closed-form solution could be obtained. We now introduce a commonly used iterative algorithm to solve the problem in (4).
3.3. EM for GMM. Expectation Maximization (EM) is an iterative algorithm mainly including two steps, called the E-step and the M-step. The main idea is to find a tight lower bound function of the objective function and then maximize that lower bound, so that the objective function can be maximized indirectly. The reason for introducing a lower bound function is that finding the extreme value of the lower bound is much easier compared to the original function. For GMM, the E-step tries to guess the value of z^(i), equivalent to finding out which component the current sample belongs to, and then the model parameters are updated based on our conjecture. The algorithm can be summarized as follows.
E-step: for each i and j, compute the posterior probability of z^(i). M-step: update the model parameters accordingly, iterating between (6) and (7) until convergence, which means that the changes of the model parameters are less than a threshold ε or the value of the likelihood function changes only within a very narrow range.
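A compact sketch of these E- and M-steps for a one-dimensional GMM (our own toy example; the variable names and the synthetic two-cluster data are assumptions, not the paper's implementation):

```python
import numpy as np

# Minimal EM for a 1-D, two-component GMM.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-3, 0.5, 200), rng.normal(2, 1.0, 300)])

K = 2
phi = np.full(K, 1.0 / K)                 # mixing weights
mu = np.array([-1.0, 1.0])                # initial component means
var = np.ones(K)                          # initial component variances

def gauss(x, m, v):
    return np.exp(-(x - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)

for _ in range(100):
    # E-step: posterior responsibility of each component for each sample
    w = phi * gauss(X[:, None], mu, var)
    w /= w.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from the responsibilities
    Nk = w.sum(axis=0)
    phi = Nk / len(X)
    mu = (w * X[:, None]).sum(axis=0) / Nk
    var = (w * (X[:, None] - mu) ** 2).sum(axis=0) / Nk

print(sorted(mu))  # component means close to the true centers -3 and 2
```

Monitoring the change in mu, var, and phi (or in the log-likelihood) between iterations gives exactly the convergence test described above.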
4. The Architecture of Obstacle Avoidance Learning
Figure 2 shows the architecture of obstacle avoidance learning and the information flow between different modules. The whole system contains three modules: data acquisition and preprocessing, constraint extraction and trajectory retrieval, and optimal control strategy derivation. Mileage information is collected by built-in sensors such as odometers, and distance information is acquired by the Kinect. Through noise filtering, invalid data reduction, and sequence alignment of the original data, we get the input for the learning machine. Then the input signal is encoded by GMM, and a general
Figure 3: Useful data collected by sensors. The line in blue shows the demonstrated trajectory. The fan-shaped sectors in shallow yellow represent the visual field of the Kinect. x, ω, d_ro, and d_rg represent the location, orientation, and distance information of the robot, respectively.
trajectory is generated by GMR. The general trajectory satisfies all the constraints of the task. Eventually, a controller is designed to follow the expected input trajectory and output the control commands to the robot. The subsequent sections describe each part in detail.
4.1. Data Acquisition. The experimental data is acquired from a Turtlebot robot. Turtlebot is a widely used research platform which is compatible with ROS, integrating a variety of sensors such as an odometer and extensible vision sensors. Many kinds of applications, such as mapping and navigation, can be developed using Turtlebot. In this paper, we implement our obstacle avoidance algorithm on the basis of the ROS development kit for Turtlebot. The teaching process can be achieved in three ways: (i) using a remote joystick; (ii) with the aid of the navigation algorithms in the Turtlebot development kit, simulating some trajectories as training samples; and (iii) by means of following apps implemented on Turtlebot, with which the teacher can guide the robot to avoid obstacles and record the sensor data as samples. In this paper we adopt the second method, which means that we create a simulated environment in software and then make the robot execute multiple obstacle avoidance tasks in the simulated environment using the built-in navigation algorithms. During this process, linear velocity and angular velocity are obtained by built-in sensors like the odometer. Using the Kinect, we can get the distance between the robot and obstacles, as well as the distance from the robot to the target.
We assume that the important sensing information comes from (i) the states of the robot, defined as linear velocity and angular velocity; (ii) the robot's absolute location; (iii) the distance between the robot and obstacles, noting that the Kinect returns many pieces of distance information representing the relative position of the obstacle within a specific range, and we choose the shortest distance as our reference value; and (iv) the distance between the robot and the target, which can be computed indirectly. As illustrated in Figure 3, we use (x, ω, d_ro, d_rg) to represent the sensor information mentioned above. Here the location information and the velocity information are redundant; we think it is necessary to take both kinds of information into consideration, for the reason that location information is very important and direct for an obstacle avoidance task. In addition, choosing as many task-related variables as possible is generally very necessary in the initial stage of a task.
In the experiment, demonstration is conducted n times. In each demonstration, T data points are recorded, so the dataset contains N = n × T points in total. Each sample point is a collection of variables represented as D = {x, ω, d}, in which x stands for the location of the robot, ω stands for the current orientation, and d stands for distance. Each variable contains a one-dimensional temporal value and (d − 1)-dimensional spatial values; that is, x = {x_s, x_t}, ω = {ω_s, ω_t}, d_ro = {d_ro,s, d_ro,t}, and d_rg = {d_rg,s, d_rg,t}, where x_s ∈ R^{N×2} represents location, ω_s ∈ R^N represents orientation, d_ro,s ∈ R^{N×2} represents the distance between the robot and obstacles, and d_rg,s ∈ R^{N×2} represents the distance between the robot and the goal state. Supposing the coordinates of the goal are o,
where i indicates the trajectory number and j indicates the time index within a trajectory. We can rewrite the above equation for the velocities:

$$\dot{d}_{rg,s} = \dot{s}_{ij} - \dot{s}_{i}. \qquad (9)$$
4.2. Data Preprocessing. Because of noise and bad demonstration quality, there exist serious mismatches and invalid data in the original data. As shown in Figure 4, there is no change in the beginning and end parts of these curves, so this part of the data is useless; to reduce the computation cost, it is wise to discard it. In addition, each trajectory has a different length; we should resample them to a uniform length so that we can handle them more easily in the subsequent steps. So, before inputting the data into our algorithms, preprocessing steps including noise filtering, useless data reduction, and resampling are conducted. Figure 4 displays the differences between the original data and the processed data.
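These preprocessing steps (cutting the motionless head and tail, then resampling to a uniform length) might be sketched as follows; the motion threshold and the target length are our own assumptions, not values from the paper:

```python
import numpy as np

def preprocess(traj, eps=1e-3, target_len=200):
    """Trim motionless head/tail of a (T, dims) trajectory, then resample."""
    traj = np.asarray(traj, dtype=float)
    # A step counts as motion if consecutive samples differ by more than eps
    moving = np.linalg.norm(np.diff(traj, axis=0), axis=1) > eps
    idx = np.nonzero(moving)[0]
    if len(idx):
        traj = traj[idx[0]: idx[-1] + 2]   # keep only the moving segment
    # Resample every dimension to target_len points by linear interpolation
    t_old = np.linspace(0, 1, len(traj))
    t_new = np.linspace(0, 1, target_len)
    return np.column_stack([np.interp(t_new, t_old, traj[:, d])
                            for d in range(traj.shape[1])])

raw = np.vstack([np.zeros((30, 2)),                 # robot at rest
                 np.linspace([0, 0], [3, 1], 140),  # actual motion
                 np.full((40, 2), [3.0, 1.0])])     # at rest again
print(preprocess(raw).shape)  # → (200, 2)
```

Resampling every demonstration to the same length is what later allows the demonstrations to be stacked into one training set for the GMM.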
4.3. Trajectory Encoding and Retrieval. Consider a datapoint ξ in the training set; its probability density is modeled by the mixture

$$p(\xi) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}\left(\xi; \mu_k, \Sigma_k\right), \qquad (10)$$

in which π_k, μ_k, Σ_k are the model parameters representing the prior probability, mean, and covariance, respectively.
Figure 4: Comparison of the original data and the processed data. (a) shows the original data; lines in red indicate the cutting positions. (b) displays the valid data after processing.
To solve for these parameters, the EM algorithm introduced in the previous section can be used. When the EM algorithm converges, we obtain the right π_k, μ_k, Σ_k; that is, we have encoded the trajectory data successfully.

In order to get a general barrier-free trajectory, that is, given an input, which can be a temporal signal or another kind of sensor data, to generate a barrier-free path according to that input, a natural idea is to divide the variable ξ into two parts (ξ_t^I, ξ_t^O), which represent input and output, respectively. The trajectory retrieval problem can then be formalized as follows: given ξ ~ Σ_{k=1}^{K} π_k N(μ_k, Σ_k) and ξ_t^I, solve for the probability density of ξ_t^O. The solution to this problem is called Gaussian mixture regression: regarding ξ_t^I as query points, ξ_t^O can be estimated using regression methods. In this paper we use the superscripts I and O to stand for the input and output parts of each variable. With this notation, a block decomposition of the datapoint ξ_t, the vectors μ_k, and the matrices Σ_k can be written as

$$\xi_t = \begin{bmatrix} \xi_t^{I} \\ \xi_t^{O} \end{bmatrix}, \qquad \mu_k = \begin{bmatrix} \mu_k^{I} \\ \mu_k^{O} \end{bmatrix}, \qquad \Sigma_k = \begin{bmatrix} \Sigma_k^{I} & \Sigma_k^{IO} \\ \Sigma_k^{OI} & \Sigma_k^{O} \end{bmatrix}. \qquad (11)$$
Based on (3), for a time step t, the conditional mean vector and covariance matrix of the k-th component are

$$\hat{\mu}_k^{O}\left(\xi_t^{I}\right) = \mu_k^{O} + \Sigma_k^{OI}\left(\Sigma_k^{I}\right)^{-1}\left(\xi_t^{I} - \mu_k^{I}\right), \qquad \hat{\Sigma}_k^{O} = \Sigma_k^{O} - \Sigma_k^{OI}\left(\Sigma_k^{I}\right)^{-1}\Sigma_k^{IO}. \qquad (12)$$

Taking all K components into consideration, the distribution of ξ_t^O given ξ_t^I is

$$P\left(\xi_t^{O} \mid \xi_t^{I}\right) \sim \sum_{k=1}^{K} h_k\left(\xi_t^{I}\right) \mathcal{N}\left(\hat{\mu}_k^{O}\left(\xi_t^{I}\right), \hat{\Sigma}_k^{O}\right), \qquad (13)$$
where

$$h_k\left(\xi_t^{I}\right) = \frac{\pi_k\, \mathcal{N}\left(\xi_t^{I} \mid \mu_k^{I}, \Sigma_k^{I}\right)}{\sum_{j=1}^{K} \pi_j\, \mathcal{N}\left(\xi_t^{I} \mid \mu_j^{I}, \Sigma_j^{I}\right)}. \qquad (14)$$
Equation (13) represents a multimodal distribution, which can be approximated by the following Gaussian distribution:

$$P\bigl(\xi_t^{\mathcal{O}} \mid \xi_t^{\mathcal{I}}\bigr) = \mathcal{N}\bigl(\hat{\mu}_t^{\mathcal{O}}, \hat{\Sigma}_t^{\mathcal{O}}\bigr), \quad \text{with} \quad \hat{\mu}_t^{\mathcal{O}} = \sum_{k=1}^{K} h_k\bigl(\xi_t^{\mathcal{I}}\bigr)\, \hat{\mu}_k^{\mathcal{O}}\bigl(\xi_t^{\mathcal{I}}\bigr),$$
$$\hat{\Sigma}_t^{\mathcal{O}} = \sum_{k=1}^{K} h_k\bigl(\xi_t^{\mathcal{I}}\bigr) \Bigl(\hat{\Sigma}_k^{\mathcal{O}} + \hat{\mu}_k^{\mathcal{O}}\bigl(\xi_t^{\mathcal{I}}\bigr)\, \hat{\mu}_k^{\mathcal{O}}\bigl(\xi_t^{\mathcal{I}}\bigr)^{T}\Bigr) - \hat{\mu}_t^{\mathcal{O}} \bigl(\hat{\mu}_t^{\mathcal{O}}\bigr)^{T} \tag{15}$$
Substituting $\xi_t^{\mathcal{I}}$ into the equations above yields $\xi_t^{\mathcal{O}}$, and thus a general trajectory is obtained. The trajectory not only avoids obstacles but also can be executed by robots thanks to its good smoothness properties.
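As an illustration, the retrieval equations (12)–(15) can be sketched in a few lines of NumPy. This is a minimal, hypothetical implementation (the function and variable names are ours, not the paper's), assuming a GMM has already been trained:

```python
import numpy as np

def gauss_pdf(x, mu, S):
    """Density of a multivariate Gaussian at x."""
    d = len(mu)
    diff = x - mu
    return np.exp(-0.5 * diff @ np.linalg.inv(S) @ diff) / \
        np.sqrt((2 * np.pi) ** d * np.linalg.det(S))

def gmr(priors, means, covs, x_in, i_idx, o_idx):
    """Gaussian mixture regression: estimate E[xi_O | xi_I = x_in]
    and its covariance from trained GMM parameters, eqs. (12)-(15)."""
    K = len(priors)
    h = np.empty(K)
    mu_o, cov_o = [], []
    for k in range(K):
        mu_i, mu_ok = means[k][i_idx], means[k][o_idx]
        S_ii = covs[k][np.ix_(i_idx, i_idx)]
        S_oi = covs[k][np.ix_(o_idx, i_idx)]
        S_io = covs[k][np.ix_(i_idx, o_idx)]
        S_oo = covs[k][np.ix_(o_idx, o_idx)]
        S_ii_inv = np.linalg.inv(S_ii)
        # eq. (12): per-component conditional mean and covariance
        mu_o.append(mu_ok + S_oi @ S_ii_inv @ (x_in - mu_i))
        cov_o.append(S_oo - S_oi @ S_ii_inv @ S_io)
        # eq. (14): responsibility of component k for the query point
        h[k] = priors[k] * gauss_pdf(x_in, mu_i, S_ii)
    h /= h.sum()
    # eq. (15): moments of the single-Gaussian approximation
    mu = sum(h[k] * mu_o[k] for k in range(K))
    cov = sum(h[k] * (cov_o[k] + np.outer(mu_o[k], mu_o[k]))
              for k in range(K)) - np.outer(mu, mu)
    return mu, cov
```

For a time-based trajectory, the query $\xi_t^{\mathcal{I}}$ is simply the time stamp, and the returned covariance gives the shaded "generation region" plotted in Figure 7.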
4.4. Finding an Optimum Control Policy. Through the previous section we can obtain a general trajectory, but this is only a planning result. How do we combine planning with control? In other words, how do we find a set of control inputs that make the robot follow the general trajectory accurately? For the time-based GMR, $\xi_t^{\mathcal{O}}$ is equal to $\xi_s$, where $s$ denotes spatial location information. Using $\hat{\xi}_s$ to represent the desired trajectory and $\xi_s$ the actually reached location, a simple way to evaluate the tracking performance is the weighted Euclidean distance

$$\sum_{i=1}^{D-1} \omega_i \bigl(\xi_{s,i} - \hat{\xi}_{s,i}\bigr)^2 = \bigl(\xi_s - \hat{\xi}_s\bigr)^{T} W \bigl(\xi_s - \hat{\xi}_s\bigr) \tag{16}$$
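The identity in (16) between the weighted sum of squared errors and the quadratic form with a diagonal weight matrix is easy to check numerically; the sketch below uses made-up numbers purely for illustration:

```python
import numpy as np

# Tracking error of eq. (16): the weighted sum of squared per-dimension
# errors equals the quadratic form with a diagonal weight matrix W.
xi = np.array([1.0, 2.0, 3.0])       # actually reached positions
xi_hat = np.array([1.5, 1.0, 2.5])   # desired trajectory points
w = np.array([2.0, 1.0, 0.5])        # per-dimension weights (illustrative)

sum_form = np.sum(w * (xi - xi_hat) ** 2)
W = np.diag(w)
quad_form = (xi - xi_hat) @ W @ (xi - xi_hat)
```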
Naturally, the above equation can be applied to multiple variables; in this paper we have four. Letting $\hat{x}$, $\hat{\omega}$, $\hat{d}_{ro}$, and $\hat{d}_{rg}$ represent the desired trajectory, the actual tracking performance $P$ can be expressed as a sum of weighted distance terms of the form (16), one per variable.
According to the performance indicator $P$, finding a desired control policy is equivalent to maximizing $P$. Since $P$ is a quadratic function, an analytical solution can easily be obtained by a gradient method. Noting, however, that there exist constraints between the variables, as shown in (9), maximizing (17) turns into a constrained optimization problem. Usually this kind of problem is solved by constructing a Lagrange function, defined as

$$L\bigl(x_s, \omega_s, d_s^{ro}, d_s^{rg}, \lambda\bigr) = P + \lambda^{T} \bigl(x_s - d_s^{rg}\bigr) \tag{19}$$
where $\lambda$ is the Lagrange multiplier. Letting $\nabla L = 0$ and solving the partial derivatives with respect to each variable yields the optimal control inputs.
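Setting $\nabla L = 0$ reduces the constrained quadratic problem to a linear system. The following sketch illustrates the idea on a generic equality-constrained quadratic objective; it is not the paper's exact formulation (17)–(22), whose details are omitted in this transcript:

```python
import numpy as np

def solve_eq_constrained(W, a, A, b):
    """Minimize (x-a)^T W (x-a) subject to A x = b by setting the gradient
    of the Lagrangian L = (x-a)^T W (x-a) + lam^T (A x - b) to zero.
    Stationarity gives 2 W x + A^T lam = 2 W a, stacked with A x = b."""
    n, m = W.shape[0], A.shape[0]
    KKT = np.block([[2 * W, A.T],
                    [A, np.zeros((m, m))]])
    rhs = np.concatenate([2 * W @ a, b])
    sol = np.linalg.solve(KKT, rhs)
    return sol[:n], sol[n:]   # optimal x and multipliers
```

With $W = I$, $a = (1, 1)$, and the constraint $x_1 + x_2 = 0$, the solver returns the projection $(0, 0)$ of $a$ onto the constraint line.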
The weight matrices in (22) can be computed as the inverse of the current covariance matrix, namely

$$W = \bigl(\hat{\Sigma}_t^{\mathcal{O}}\bigr)^{-1} \tag{23}$$
5. Simulation Experiments

Experiments are implemented on a Turtlebot robot. In addition to the robot's own hardware, a Kinect is integrated to obtain distance information. The size of the map used in the simulation is 7.5 × 5 m. Two different maps are created to verify the learning effects. In the first map, the obstacle is located in the middle of the map with a small gap on its lower side, and the robot can reach the target place through the gap. The second map is similar to the first one except that it contains two obstacles; to reach the destination, the robot must avoid every obstacle in the map. Figure 5 shows the shape of the maps: Figures 5(a) and 5(b) show the 2D maps, and Figures 5(c) and 5(d) show the corresponding 3D simulation environments. The color maps in (a) and (b) stand for the cost cloud, which provides a reference for path planning. Warm colors such as yellow and red stand for a lower cost; on the contrary, cool color tones stand for a higher cost. A path planner always tends to choose the trajectory with the least cost. The cost values are calculated by built-in functions in the ROS package. For convenience of description, we use task 1 and task 2 to denote the two obstacle avoidance tasks in maps (a) and (b). It is worth mentioning that we select relatively simple environments for both tasks for the reason that
Figure 5: Two different maps used in our simulation. (a) and (b) are the actual maps generated by software, with start and goal marked; (c) and (d) are the simulation environments corresponding to the maps above them.
Figure 6: Demonstrated trajectories in the two situations. (a) shows the sampled data in task 1; (b) shows the sampled data in task 2.
we put more emphasis on the mechanism behind obstacle avoidance, which means we can learn skills from a simple environment and then reuse these skills in complex situations.

Figure 6 shows the demonstrated trajectories in the two situations. The coordinates of the goal states are set to (6, 3) and (6, 2) for task 1 and task 2, respectively. The start points for task 1 are randomly generated by sampling from uniform distributions between 0.5 and 2.5 for the X-coordinate and between 2 and 4 for the Y-coordinate. The maximum range is 2 m, which can guarantee good generalization performance for obstacle avoidance. For task 2, the start points are generated by sampling from uniform distributions from 0.5 to 2 and from 2 to 3.5.
5.1. Experiments for Trajectory Encoding and Retrieval. Figure 7 shows the encoding and retrieval results for task 1. The encoded variables are the location information $x$ and the distance
Figure 7: Trajectory encoding and retrieval using GMM and GMR for task 1. (a) shows model encoding with GMM; ellipses in dark blue represent projections of the Gaussian components. (b) shows the general trajectory generated by GMR; shaded red regions represent the generation region.
information $d_{rg}$, and both variables are 2-dimensional vectors. The number of Gaussian components is set to 6. It is worth noting that the optimal number of components $K$ in a model may not be known beforehand. A common method consists of estimating multiple models with an increasing number of components and selecting an optimum based on some model selection criterion, such as cross-validation, the Bayesian Information Criterion (BIC), or Minimum Description Length (MDL). In this paper we decide $K$ by using the cross-validation criterion. As the figures show, the variance at the starting position is large, corresponding to a big retrieval error; it indicates that there are no strict requirements on the starting position. The variance at the end point is quite small, which indicates that the robot can reach the goal state accurately. Thus, GMM can successfully extract the useful constraints that characterize the task, and GMR can retrieve a barrier-free trajectory using the information encoded by GMM, as shown in the second column of Figure 7. The figures in the second row show the encoding and retrieval results for the variable $d_{rg}$. As can be seen, $d_{rg}$ shares similar distribution properties with $x$. This is because the two variables have a specific relationship with each other, so
Figure 8: Encoding and retrieval of the orientation $\omega$ and the distances $d_{ro}$ and $d_{rg}$ (in the $X$ and $Y$ directions) in task 1; the charts share the same meaning as those in Figure 7.
do their mean values and covariances, which indirectly verifies the validity of the method.

Figure 8 shows the encoding results for orientation and distance information. The figures in the last two rows display the encoding of $d_{rg}$ in the $X$ and $Y$ directions. As we can see, the coordinate values in the $X$ direction monotonically decrease to 0; this is because the robot keeps moving from left to right in the map. The coordinates in the $Y$ direction keep going up at the beginning and then drop to zero; this is because the robot encounters the obstacle at the middle moment and, in order to avoid it, has to move away from the goal state temporarily. Note that the final distance value is close to (0, 0), which means the robot reaches the target successfully.

Figures 9 and 10 display the encoding and retrieval results for task 2. Compared with task 1, task 2 is more complicated considering the law behind the data, so more Gaussian components are required to encode it. In our experiments, $K$ is set to 10, chosen by the method described for task 1. As illustrated in these figures, no matter how complex the environment is, the method still achieves a very satisfactory effect.
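As a concrete sketch of the model selection procedure described above, the snippet below picks $K$ by BIC using scikit-learn's `GaussianMixture`; this is an assumption on our part (the paper uses its own cross-validation implementation), shown on synthetic data:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Illustrative model selection: fit GMMs with increasing K and keep
# the one with the lowest Bayesian Information Criterion (BIC).
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0.0, 0.3, (200, 2)),
                  rng.normal(5.0, 0.3, (200, 2))])  # two well-separated clusters

bics = {k: GaussianMixture(n_components=k, random_state=0).fit(data).bic(data)
        for k in range(1, 5)}
best_k = min(bics, key=bics.get)
```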
Figure 9: Trajectory encoding and retrieval using GMM and GMR for the variables $x$ and $d_{rg}$ in task 2.
5.2. Experiments for Finding the Control Strategy. The experiments in this part are mainly conducted to verify the derived control policy. For a Turtlebot robot, the control inputs are the linear velocity $\dot{x}_s$ and the angular velocity $\dot{\theta}_s$. Assuming we have a desired trajectory (the trajectory can be given arbitrarily; in our experiments, in order to have a reference, we choose the second trajectory in our samples), using the theory derived in the previous sections we can compute $\dot{x}_s$ and $\dot{\theta}_s$. Iterating at every time step, we get a control sequence. Comparing the calculated values with the sensor-recorded values, we can verify the validity of the derived results.
Figure 12 compares the calculated linear and angular velocities with the desired values. The linear velocity is divided into velocities in the $X$ and $Y$ directions. As the figure in the second row shows, the expected velocity in the $Y$ direction remains close to 0; this is because the control inputs for the Turtlebot are a linear velocity in the forward ($X$) direction and an angular velocity. The line graph for the $Y$ velocity reveals a minor fluctuation at the beginning and end stages, while the overall trend is consistent with this fact. The $X$ velocity follows the expected value closely. Considering Figure 12(b), it can be seen that the calculated value remains consistent with the actual one. It is worth noting that the length of the calculated sequence differs from the actual one, since the frequency of the readings from the robot's sensors is lower than the frequency of the commands. In this case, to compare the similarity of the two signals, sequence alignment is an indispensable step; in this paper we use dynamic time warping (DTW) for this purpose. The signals after alignment show a high correlation, which means the calculated values are correct.
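A minimal version of the DTW alignment used here can be written as a plain dynamic program; this is a generic textbook sketch, not the paper's implementation:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic-time-warping distance between two 1-D signals; useful
    for comparing sequences recorded at different sampling rates."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible warping moves
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Note how a signal with repeated samples (as produced by a slower sensor reading rate) still aligns with its faster counterpart at zero cost, which is exactly why the alignment step is needed before comparing the two velocity sequences.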
Figure 11 shows the location information obtained by integrating the velocity signals. The calculated control inputs allow the robot to follow the given trajectory accurately, except for a minor fluctuation at the beginning.
6. Conclusion

In this paper, based on the idea of imitation learning, we propose a novel approach in which a robot's obstacle avoidance skill is learned by showing the robot how to avoid an obstacle. Under this consideration, task constraint extraction is a key issue, that is, how to make the robot understand the task. To achieve this purpose, the Gaussian mixture model is used to encode the data, and GMR is used to retrieve satisfactory
Figure 10: Encoding and retrieval of the orientation $\omega$ and the distances $d_{ro}$ and $d_{rg}$ (in the $X$ and $Y$ directions) in task 2.
Figure 11: Tracking performance using the derived policy (expected path versus actual path).
Figure 12: Comparisons between the expected values and the calculated values: (a) linear velocity; (b) angular velocity.
trajectories. Eventually, to obtain the control strategy, a multiobjective optimization problem is constructed, and the control strategy is derived by solving this problem. Simulation experiments verified our algorithms.
Future work mainly includes the following aspects: (i) in order to avoid obstacles online, an online incremental GMM algorithm should be used; (ii) in this paper a time-based trajectory is generated by GMR, that is, a time signal is used as input; a more reasonable idea is to use state information as input so that the robot can change its decision according to the current environment, for example, using distance information as input and velocity as output, so that the control inputs can be generated directly; (iii) reinforcement learning algorithms can be integrated to strengthen the robot's exploration ability so that better generalization performance is achieved.
Competing Interests

The authors declare that the grant, scholarship, and/or funding mentioned do not lead to any conflict of interest. Additionally, the authors declare that there is no conflict of interest regarding the publication of this manuscript.
Acknowledgments

This work is supported by the National Natural Science Foundation of China (Grant no. 51505470) and the Dr. Startup Fund of Liaoning Province (20141152).
References

[1] E. Masehian and D. Sedighizadeh, "Classic and heuristic approaches in robot motion planning—a chronological review," World Academy of Science, Engineering and Technology, vol. 23, pp. 101–106, 2007.
[2] P. Raja and S. Pugazhenthi, "Optimal path planning of mobile robots: a review," International Journal of Physical Sciences, vol. 7, no. 9, pp. 1314–1320, 2012.
[3] J. C. Latombe, Robot Motion Planning, Springer, 2012.
[4] J. Borenstein and Y. Koren, "Real-time obstacle avoidance for fast mobile robots," IEEE Transactions on Systems, Man, and Cybernetics, vol. 19, no. 5, pp. 1179–1187, 1989.
[5] O. Khatib, "Real-time obstacle avoidance for manipulators and mobile robots," International Journal of Robotics Research, vol. 5, no. 1, pp. 90–98, 1986.
[6] H. Mei, Y. Tian, and L. Zu, "A hybrid ant colony optimization algorithm for path planning of robot in dynamic environment," International Journal of Information Technology, vol. 12, no. 3, pp. 78–88, 2006.
[7] M. A. P. Garcia, O. Montiel, O. Castillo, R. Sepulveda, and P. Melin, "Path planning for autonomous mobile robot navigation with ant colony optimization and fuzzy cost function evaluation," Applied Soft Computing, vol. 9, no. 3, pp. 1102–1110, 2009.
[8] C. Lopez-Franco, J. Zepeda, N. Arana-Daniel, and L. Lopez-Franco, "Obstacle avoidance using PSO," in Proceedings of the 9th International Conference on Electrical Engineering, Computing Science and Automatic Control (CCE '12), pp. 1–6, Mexico City, Mexico, September 2012.
[9] H.-Q. Min, J.-H. Zhu, and X.-J. Zheng, "Obstacle avoidance with multi-objective optimization by PSO in dynamic environment," in Proceedings of the International Conference on Machine Learning and Cybernetics (ICMLC '05), vol. 5, pp. 2950–2956, IEEE, Guangzhou, China, August 2005.
[10] S. Dutta, "Obstacle avoidance of mobile robot using PSO-based neuro fuzzy technique," International Journal of Computer Science and Engineering, vol. 2, no. 2, pp. 301–304, 2010.
[11] W.-G. Han, S.-M. Baek, and T.-Y. Kuc, "Genetic algorithm based path planning and dynamic obstacle avoidance of mobile robots," in Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, vol. 3, pp. 2747–2751, Orlando, Fla, USA, October 1997.
[12] J. Tu and S. X. Yang, "Genetic algorithm based path planning for a mobile robot," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '03), vol. 1, pp. 1221–1226, September 2003.
[13] Y. Zhang and J. Wang, "Obstacle avoidance for kinematically redundant manipulators using a dual neural network," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 34, no. 1, pp. 752–759, 2004.
[14] X. Wang, V. Yadav, and S. N. Balakrishnan, "Cooperative UAV formation flying with obstacle/collision avoidance," IEEE Transactions on Control Systems Technology, vol. 15, no. 4, pp. 672–679, 2007.
[15] J. Godjevac, "A learning procedure for a fuzzy system: application to obstacle avoidance," in Proceedings of the International Symposium on Fuzzy Logic, pp. 142–148, Zurich, Switzerland, 1995.
[16] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, Boston, Mass, USA, 1998.
[17] J. Michels, A. Saxena, and A. Y. Ng, "High speed obstacle avoidance using monocular vision and reinforcement learning," in Proceedings of the 22nd International Conference on Machine Learning (ICML '05), pp. 593–600, ACM, Bonn, Germany, 2005.
[18] C. Ye, N. H. C. Yung, and D. Wang, "A fuzzy controller with supervised learning assisted reinforcement learning algorithm for obstacle avoidance," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 33, no. 1, pp. 17–27, 2003.
[19] B.-Q. Huang, G.-Y. Cao, and M. Guo, "Reinforcement learning neural network to the problem of autonomous mobile robot obstacle avoidance," in Proceedings of the International Conference on Machine Learning and Cybernetics (ICMLC '05), vol. 1, pp. 85–89, August 2005.
[20] C. F. Touzet, "Neural reinforcement learning for behaviour synthesis," Robotics and Autonomous Systems, vol. 22, no. 3-4, pp. 251–281, 1997.
[21] W. D. Smart and L. P. Kaelbling, "Effective reinforcement learning for mobile robots," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '02), vol. 4, pp. 3404–3410, Washington, DC, USA, May 2002.
[22] K. Macek, I. Petrovic, and N. Peric, "A reinforcement learning approach to obstacle avoidance of mobile robots," in Proceedings of the 7th International Workshop on Advanced Motion Control (AMC '02), pp. 462–466, IEEE, Maribor, Slovenia, July 2002.
[23] U. Muller, J. Ben, E. Cosatto, et al., "Off-road obstacle avoidance through end-to-end learning," in Advances in Neural Information Processing Systems, pp. 739–746, 2005.
[24] B. D. Argall, S. Chernova, M. Veloso, and B. Browning, "A survey of robot learning from demonstration," Robotics and Autonomous Systems, vol. 57, no. 5, pp. 469–483, 2009.
[25] C. L. Nehaniv and K. Dautenhahn, "Like me?—measures of correspondence and imitation," Cybernetics & Systems, vol. 32, no. 1-2, pp. 11–51, 2001.
Figure 1: Framework of learning from demonstration (LfD).
be described as finding a more general barrier-free path (policy) from the demonstrated barrier-free paths (samples $D$). The main process of the LfD framework is illustrated in Figure 1.

2.2. Main Research Topics in LfD. Nehaniv and Dautenhahn [25] summarized the main problems in LfD as four issues: (1) what to imitate; (2) how to imitate; (3) when to imitate; (4) whom to imitate. So far, only the former two problems have been addressed. For the problem of robot obstacle avoidance, "what to imitate" means that the robot needs to extract the right constraints that characterize the task, and "how to imitate" refers to generating a trajectory that satisfies the constraints obtained in the previous step.
In order to solve the above problems (mainly the former two), there are three main ideas:

(i) Directly learn a mapping function $\pi = f(z)$ from the training set $D = \{(z_i, a_i)\}$, where $z$ stands for the observed state. This treats the problem as a supervised learning problem.

(ii) Learn a system model, namely the reward function and the state transition probability distribution, from the training set, and then derive a policy using MDP methods.

(iii) Learn the pre- or postconditions of each action and then obtain a control sequence using planning methods.
For the obstacle avoidance problem, learning barrier-free paths from teachers in order to produce a generalized path can be seen as a supervised regression problem. GMM is a probabilistic model well suited to modeling the natural variations in human and robot motions. Theoretically, GMM can model any probability density function (PDF) by increasing the number of Gaussian components, so GMM has strong encoding capacity. Finally, movement recognition, prediction, and generation can be integrated within the same GMM and GMR encoding strategy. It is therefore very favorable to use GMM to encode our prior demonstration information and GMR as the solution to the regression problem mentioned above. Details of the GMM theory are described below.
3. Gaussian Mixture Model

For a probability density estimation problem, if in addition to the known observed variables there exist some unobservable latent variables, but we know the distribution of the latent variable and the conditional distribution of the observable variable given the latent variable, then we can use mixture models to estimate the parameters of the distribution and thus obtain an approximate probability distribution of the known data. Before introducing GMM, we digress briefly to the Gaussian distribution.
3.1. Marginal and Conditional Distributions of a Multivariate Gaussian Distribution. Assume that we have a random variable

$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \tag{1}$$

in which $x_1 \in \mathbb{R}^r$, $x_2 \in \mathbb{R}^s$, $x \in \mathbb{R}^{r+s}$, and $x \sim \mathcal{N}(\mu, \Sigma)$, where

$$\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \tag{2}$$

and $\mu_1 \in \mathbb{R}^r$, $\mu_2 \in \mathbb{R}^s$, $\mu \in \mathbb{R}^{r+s}$, $\Sigma_{11} \in \mathbb{R}^{r \times r}$, and $\Sigma_{12} \in \mathbb{R}^{r \times s}$. Noting that the covariance matrix is symmetric, we have $\Sigma_{12} = \Sigma_{21}^{T}$.

Because the marginals of a Gaussian are also Gaussian, it is easy to see that $E[x_1] = \mu_1$, $\operatorname{cov}(x_1) = \Sigma_{11}$, $E[x_2] = \mu_2$, and $\operatorname{cov}(x_2) = \Sigma_{22}$, so $x_1 \sim \mathcal{N}(\mu_1, \Sigma_{11})$ and $x_2 \sim \mathcal{N}(\mu_2, \Sigma_{22})$. As a step forward, we can prove that $x_1 \mid x_2 \sim \mathcal{N}(\mu_{1|2}, \Sigma_{1|2})$, where

$$\mu_{1|2} = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (x_2 - \mu_2), \qquad \Sigma_{1|2} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \tag{3}$$
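Equation (3) is the workhorse of the regression step used later in the paper; a two-dimensional numerical sketch (with made-up numbers) reads:

```python
import numpy as np

# Conditional of a joint Gaussian, eq. (3): x1 | x2 ~ N(mu_cond, var_cond).
mu = np.array([0.0, 1.0])            # joint mean [mu_1, mu_2]
Sigma = np.array([[2.0, 0.8],        # joint covariance, Sigma_12 = Sigma_21
                  [0.8, 1.0]])
x2 = 2.0                             # observed value of the second variable

mu_cond = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (x2 - mu[1])
var_cond = Sigma[0, 0] - Sigma[0, 1] / Sigma[1, 1] * Sigma[1, 0]
```

Observing $x_2$ above its mean shifts the conditional mean of $x_1$ upward (the covariance is positive) and always shrinks its variance.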
3.2. Formulation of GMM. Assume that we have a training set $D = \{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$ containing $m$ samples, where every sample $x^{(i)}$ is a vector. We are trying to find the density function that describes the data with the least error, which is equivalent to maximizing the probability of the data. In order to represent this probability, we make some assumptions. First, we assume that the latent variable $z^{(i)} \sim \operatorname{Multinomial}(\phi)$, in which $\phi_j \geq 0$ and $\sum_{j=1}^{k} \phi_j = 1$; besides, we suppose $x^{(i)} \mid z^{(i)} = j \sim \mathcal{N}(\mu_j, \Sigma_j)$. So we can model the data in our hands by
Figure 2: System architecture of obstacle avoidance learning.
adjusting the parameters $(\phi, \mu, \Sigma)$. The likelihood function can be written as

$$\ell(\phi, \mu, \Sigma) = \sum_{i=1}^{m} \log p\bigl(x^{(i)}; \phi, \mu, \Sigma\bigr) = \sum_{i=1}^{m} \log \sum_{z^{(i)}=1}^{k} p\bigl(x^{(i)} \mid z^{(i)}; \mu, \Sigma\bigr)\, p\bigl(z^{(i)}; \phi\bigr) \tag{4}$$
Maximizing (4) gives the right model parameters. To realize this purpose, we would typically take the partial derivative with respect to each parameter and set it to zero; however, no closed-form solution can be obtained this way. The main reason is that $z$ is unknown; if $z$ were known, the model would degrade to a Gaussian discriminant model, and following the previously mentioned steps a closed-form solution could be obtained. We therefore introduce a commonly used iterative algorithm to solve problem (4).
3.3. EM for GMM. Expectation-Maximization (EM) is an iterative algorithm consisting of two steps, called the E-step and the M-step. The main idea is to find a tight lower-bound function of the objective and then maximize that lower bound, so that the objective function is maximized indirectly; the lower bound is introduced because finding its extremum is much easier than for the original function. For GMM, the E-step tries to guess the value of $z^{(i)}$, which is equivalent to finding out which component the current sample belongs to; the model parameters are then updated based on this conjecture. The algorithm can be summarized as follows.

E-step: for each $i$ and $j$, compute the posterior probability of $z^{(i)}$,

$$w_j^{(i)} = p\bigl(z^{(i)} = j \mid x^{(i)}; \phi, \mu, \Sigma\bigr) \tag{5}$$

M-step: update the parameters $\phi_j$, $\mu_j$, $\Sigma_j$ using the posteriors $w_j^{(i)}$, iterating between (6) and (7) until convergence, which means that the change of the model parameters is less than a threshold $\epsilon$ or the value of the likelihood function changes only in a very narrow range.
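The E- and M-steps above can be sketched in NumPy as follows; this is a bare-bones illustration (the quantile-based initialization and the small regularizer are our choices, not the paper's):

```python
import numpy as np

def em_gmm(X, K, iters=50):
    """Bare-bones EM for a GMM.
    E-step: posterior responsibilities w; M-step: update phi, mu, Sigma."""
    n, d = X.shape
    phi = np.full(K, 1.0 / K)
    # spread the initial means across the data range via quantiles
    mu = np.quantile(X, np.linspace(0.05, 0.95, K), axis=0)
    Sigma = np.array([np.eye(d)] * K)
    for _ in range(iters):
        # E-step: responsibility of each component for each sample
        w = np.empty((n, K))
        for j in range(K):
            diff = X - mu[j]
            inv = np.linalg.inv(Sigma[j])
            norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma[j]))
            w[:, j] = phi[j] * np.exp(-0.5 * np.sum(diff @ inv * diff, axis=1)) / norm
        w /= w.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the weighted samples
        Nk = w.sum(axis=0)
        phi = Nk / n
        mu = (w.T @ X) / Nk[:, None]
        for j in range(K):
            diff = X - mu[j]
            Sigma[j] = (w[:, j, None] * diff).T @ diff / Nk[j] + 1e-6 * np.eye(d)
    return phi, mu, Sigma
```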
4. The Architecture of Obstacle Avoidance Learning

Figure 2 shows the architecture of obstacle avoidance learning and the information flow between the different modules. The whole system contains three modules: data acquisition and preprocessing, constraint extraction and trajectory retrieval, and optimal control strategy derivation. Mileage information is collected by built-in sensors such as odometers, and distance information is acquired by the Kinect. Through noise filtering, invalid data reduction, and sequence alignment of the original data, we get the input for the learning machine. The input signal is then encoded by GMM, and a general
Figure 3: Useful data collected by the sensors. The line in blue shows the demonstrated trajectory; the fan-shaped sectors in shallow yellow represent the visual field of the Kinect. $x$, $\omega$, $d_{ro}$, and $d_{rg}$ represent the location, orientation, and distance information of the robot, respectively.
trajectory is generated by GMR. The general trajectory satisfies all the constraints of the task. Eventually, a controller is designed to follow the expected input trajectory and output the control commands to the robot. The subsequent sections describe each part in detail.
4.1. Data Acquisition. The experimental data are acquired from a Turtlebot robot. Turtlebot is a widely used research platform that is compatible with ROS and integrates a variety of sensors, such as an odometer and extensible vision sensors. Many kinds of applications, such as mapping and navigation, can be developed on Turtlebot. In this paper, we implement our obstacle avoidance algorithm on the basis of the ROS development kit for Turtlebot. The teaching process can be achieved in three ways: (i) using a remote joystick; (ii) with the aid of the Turtlebot development kit's navigation algorithms, simulating some trajectories as training samples; and (iii) by means of following apps implemented on the Turtlebot, with which the teacher can guide the robot to avoid obstacles while sensor data are recorded as samples. In this paper we adopt the second method: we create a simulated environment in software and make the robot execute multiple obstacle avoidance tasks in it using the built-in navigation algorithms. During this process, the linear velocity and angular velocity are obtained by built-in sensors such as the odometer. Using the Kinect, we can get the distance between the robot and the obstacles as well as the distance from the robot to the target.
We assume that the important sensing information comes from (i) the states of the robot, defined as linear and angular velocity; (ii) the robot's absolute location; (iii) the distance between the robot and the obstacles, noting that the Kinect returns many pieces of distance information representing the relative position of the obstacle within a specific range, from which we choose the shortest distance as our reference value; and (iv) the distance between the robot and the target, which can be computed indirectly. As illustrated in Figure 3, we use $(x, \omega, d_{ro}, d_{rg})$ to represent the sensor information mentioned above. Here the location information and velocity information are redundant; we think it is necessary to take both kinds of information into consideration because location information is very important and direct for an obstacle avoidance task. In addition, choosing as many task-related variables as possible is generally very necessary in the initial stage of a task.
In the experiment, demonstration is conducted $n$ times, and in each demonstration $T$ data points are recorded, so the dataset contains $N = n \times T$ points in total. Each sample point is a collection of variables represented as $D = \{x, \omega, d\}$, in which $x$ stands for the location of the robot, $\omega$ for the current orientation, and $d$ for distance. Each variable contains a one-dimensional temporal value and $(d-1)$-dimensional spatial values; that is, $x = \{x_s, x_t\}$, $\omega = \{\omega_s, \omega_t\}$, $d_{ro} = \{d_{ro,s}, d_{ro,t}\}$, and $d_{rg} = \{d_{rg,s}, d_{rg,t}\}$, where $x_s \in \mathbb{R}^{N \times 2}$ represents location, $\omega_s$ represents orientation, $d_{ro,s} \in \mathbb{R}^{N \times 2}$ represents the distance between the robot and the obstacles, and $d_{rg,s} \in \mathbb{R}^{N \times 2}$ represents the distance between the robot and the goal state. Supposing the coordinates of the goal are $o$ and writing a spatial sample as $s_{ij}$, where $i$ indicates the trajectory number and $j$ indicates the time index within a trajectory, the distance to the goal can be written as

$$d_{rg,s} = s_{ij} - o \tag{9}$$
4.2. Data Preprocessing. Because of noise and poor demonstration quality, there exist serious mismatches and invalid data in the original data. As shown in Figure 4, there is no change at the beginning and end parts of these curves, so that part of the data is useless; to reduce computation cost, it is wise to discard it. In addition, each trajectory has a different length, so we resample them to a uniform length to make them easier to handle in the subsequent steps. Therefore, before feeding the data to our algorithms, preprocessing steps including noise filtering, useless data reduction, and resampling are conducted. Figure 4 displays the differences between the original data and the processed data.
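The trimming and resampling steps described above can be sketched as follows; the threshold `eps` and the helper name are our illustrative choices, not the paper's:

```python
import numpy as np

def trim_and_resample(sig, target_len, eps=1e-3):
    """Drop the motionless head/tail of a demonstrated 1-D signal,
    then resample it to a common length by linear interpolation."""
    d = np.abs(np.diff(sig))
    active = np.where(d > eps)[0]
    if active.size:
        # keep only the segment where the signal actually changes
        sig = sig[active[0]: active[-1] + 2]
    t_old = np.linspace(0.0, 1.0, len(sig))
    t_new = np.linspace(0.0, 1.0, target_len)
    return np.interp(t_new, t_old, sig)
```

Running every demonstrated trajectory through such a routine yields equal-length, flat-segment-free samples, which is what the GMM encoding step below assumes.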
4.3. Trajectory Encoding and Retrieval. Consider a data point $\xi_t$ in the training set, modeled by a mixture of $K$ Gaussian components,

$$p(\xi_t) = \sum_{k=1}^{K} \pi_k \mathcal{N}(\xi_t; \mu_k, \Sigma_k) \tag{10}$$

in which $\pi_k$, $\mu_k$, $\Sigma_k$ are the model parameters, representing the prior probability, mean, and covariance, respectively.
Figure 4 Comparison of original data and processed data (a) shows the original data Lines in red represent cutting position (b) displaysvalid data after processing
To solve these parameters the EM algorithm introducedin the previous section can be used When EM algorithmconverges to a given value we get the right 120587
119896 120583119896 Σ119896 that
is we encode the trajectory data successfullyIn order to get a general barrier-free trajectory which
is given an input which can be temporal signals or otherkinds of sensor data then a barrier-free path is generatedaccording to the given input a natural idea is to divide thevariable 120585 into two parts (120585I
119905 120585
O119905) which represents input and
output respectively and then trajectory retrieval problemcanbe formulized as follows given 120585 sim sum
119870
119894=1120587119894N(120583119894 Σ119894) and 120585
I119905
solve the density probability distribution for 120585O119905 Solution to
this problem is called Gaussian mixture regression Regard-ing 120585
I119905as query points 120585O
119905can be estimated using regression
methods In this paper we use superscriptsI andO standingfor the input and output part for each variable With thisnotation a block decomposition of the data-point 120585
119905 vectors
120583119896 and matrices Σ
119905can be written as
120585119905= [
120585I119896
120585O119896
]
120583119896= [
120583I119896
120583O119896
]
Σ119896= [
ΣI119896
ΣIO119896
ΣOI119896
ΣO119896
]
(11)
Based on (3), for a time step $t$ the conditional mean vector and covariance matrix of the $k$th component are

$$\hat{\mu}^O_k(\xi^I_t) = \mu^O_k + \Sigma^{OI}_k \big(\Sigma^I_k\big)^{-1} \big(\xi^I_t - \mu^I_k\big),$$
$$\hat{\Sigma}^O_k = \Sigma^O_k - \Sigma^{OI}_k \big(\Sigma^I_k\big)^{-1} \Sigma^{IO}_k \quad (12)$$
Taking all $K$ components into consideration, the distribution of $\xi^O_t$ given $\xi^I_t$ is

$$P\big(\xi^O_t \mid \xi^I_t\big) \sim \sum_{k=1}^{K} h_k(\xi^I_t)\,\mathcal{N}\big(\hat{\mu}^O_k(\xi^I_t),\, \hat{\Sigma}^O_k\big) \quad (13)$$
where

$$h_k(\xi^I_t) = \frac{\pi_k\,\mathcal{N}\big(\xi^I_t \mid \mu^I_k, \Sigma^I_k\big)}{\sum_{j=1}^{K} \pi_j\,\mathcal{N}\big(\xi^I_t \mid \mu^I_j, \Sigma^I_j\big)} \quad (14)$$
Equation (13) represents a multimodal distribution, which can be approximated by the following Gaussian distribution:

$$P\big(\xi^O_t \mid \xi^I_t\big) = \mathcal{N}\big(\hat{\mu}^O_t, \hat{\Sigma}^O_t\big), \quad \text{with}$$
$$\hat{\mu}^O_t = \sum_{k=1}^{K} h_k(\xi^I_t)\,\hat{\mu}^O_k(\xi^I_t),$$
$$\hat{\Sigma}^O_t = \sum_{k=1}^{K} h_k(\xi^I_t)\Big(\hat{\Sigma}^O_k + \hat{\mu}^O_k(\xi^I_t)\,\hat{\mu}^O_k(\xi^I_t)^T\Big) - \hat{\mu}^O_t\big(\hat{\mu}^O_t\big)^T \quad (15)$$
Substituting $\xi^I_t$ into the equations described above yields $\xi^O_t$, and a general trajectory is obtained. The trajectory not only avoids obstacles but can also be executed by the robot because of its good smoothness.
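As a concrete illustration, the retrieval equations (12)-(15) can be sketched in NumPy. This is a minimal sketch, not the authors' implementation; the function names, the index lists `I` and `O`, and the helper `gauss_pdf` are assumptions made for illustration.

```python
import numpy as np

def gauss_pdf(x, mean, cov):
    """Density of N(mean, cov) evaluated at x."""
    x, mean, cov = np.atleast_1d(x), np.atleast_1d(mean), np.atleast_2d(cov)
    d = x - mean
    return np.exp(-0.5 * d @ np.linalg.solve(cov, d)) / np.sqrt(
        (2 * np.pi) ** len(x) * np.linalg.det(cov))

def gmr(priors, means, covs, xi_in, I, O):
    """Gaussian mixture regression, eqs. (12)-(15): given the input part xi_in
    (dimensions I), return mean and covariance of the output part (dimensions O)."""
    K = len(priors)
    xi_in = np.atleast_1d(xi_in)
    # responsibilities h_k(xi_in), eq. (14)
    h = np.array([priors[k] * gauss_pdf(xi_in, means[k][I], covs[k][np.ix_(I, I)])
                  for k in range(K)])
    h /= h.sum()
    mu_k, sig_k = [], []
    for k in range(K):
        # per-component conditional mean and covariance, eq. (12)
        A = covs[k][np.ix_(O, I)] @ np.linalg.inv(covs[k][np.ix_(I, I)])
        mu_k.append(means[k][O] + A @ (xi_in - means[k][I]))
        sig_k.append(covs[k][np.ix_(O, O)] - A @ covs[k][np.ix_(I, O)])
    # moment-matched single Gaussian, eqs. (13) and (15)
    mu = sum(h[k] * mu_k[k] for k in range(K))
    sig = sum(h[k] * (sig_k[k] + np.outer(mu_k[k], mu_k[k])) for k in range(K)) \
        - np.outer(mu, mu)
    return mu, sig
```

For a single Gaussian with unit variances and correlation 0.5, querying the input value 1.0 returns the familiar conditional mean 0.5 and variance 0.75, a quick sanity check of the formulas.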
4.4 Finding an Optimum Control Policy. Through the previous section we can obtain a general trajectory, but this is only a planning problem. How can planning be combined with control? In other words, how do we find a set of control inputs that make the robot follow the general trajectory accurately? For the time-based GMR, $\xi^O_t$ equals $\xi_s$, where $s$ denotes spatial location information. Let $\hat{\xi}_s$ denote the desired trajectory and $\xi_s$ the actually reached location. To evaluate the tracking performance, a simple idea is to use the weighted Euclidean distance

$$\sum_{i=1}^{D-1} w_i\,\big(\xi_{s,i} - \hat{\xi}_{s,i}\big)^2 = \big(\xi_s - \hat{\xi}_s\big)^T W \big(\xi_s - \hat{\xi}_s\big) \quad (16)$$
Naturally, the above equation can be applied to multiple variables; in this paper we have four. Letting $\hat{x}$, $\hat{\omega}$, $\hat{d}_{ro}$, and $\hat{d}_{rg}$ denote the desired trajectories, the actual tracking performance $P$ can be expressed as the negative sum of the four weighted squared tracking errors:

$$P = -\Big[\big(x_s-\hat{x}_s\big)^T W_1 \big(x_s-\hat{x}_s\big) + \big(\omega_s-\hat{\omega}_s\big)^T W_2 \big(\omega_s-\hat{\omega}_s\big) + \big(d_{ro,s}-\hat{d}_{ro,s}\big)^T W_3 \big(d_{ro,s}-\hat{d}_{ro,s}\big) + \big(d_{rg,s}-\hat{d}_{rg,s}\big)^T W_4 \big(d_{rg,s}-\hat{d}_{rg,s}\big)\Big] \quad (17)$$
According to the performance indicator $P$, finding a desired control policy is equivalent to maximizing $P$. Since $P$ is a quadratic function, an analytical solution can easily be obtained by the gradient method. Noting, however, that there exist constraints between the variables, as shown in (9), maximizing (17) turns into a constrained optimization problem. Usually this kind of problem is solved by constructing a Lagrange function, defined as follows:

$$L\big(x_s, \omega_s, d_{ro,s}, d_{rg,s}, \lambda\big) = P + \lambda^T \big(x_s - d_{rg,s}\big) \quad (19)$$

where $\lambda$ is the Lagrange multiplier. Letting $\nabla L = 0$ and solving the partial derivatives with respect to each variable yields the optimal control policy.
The weight matrices in (22) can be computed as the inverse of the current covariance matrix, namely

$$W = \big(\hat{\Sigma}^O_t\big)^{-1} \quad (23)$$
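Together, (16) and (23) define a Mahalanobis-style cost: deviations along directions that the demonstrations constrain tightly (small variance) are penalized more. A minimal sketch with illustrative values, not the paper's code:

```python
import numpy as np

def tracking_error(xi, xi_hat, cov_out):
    """Weighted Euclidean distance of eq. (16) with W = inv(Sigma_t^O), eq. (23)."""
    W = np.linalg.inv(cov_out)   # eq. (23)
    e = xi - xi_hat
    return float(e @ W @ e)      # (xi - xi_hat)^T W (xi - xi_hat)

# the tightly constrained first dimension (variance 0.01) costs 100x more
cov = np.diag([0.01, 1.0])
print(tracking_error(np.array([0.1, 0.0]), np.zeros(2), cov))  # approximately 1.0
print(tracking_error(np.array([0.0, 0.1]), np.zeros(2), cov))  # approximately 0.01
```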
5 Simulation Experiments
Experiments are implemented on a Turtlebot robot. In addition to the robot's own hardware, a Kinect is integrated to obtain distance information. The map used in the simulation measures 7.5 × 5 m. Two different maps are created to verify the learning effects. In the first map, the obstacle is located in the middle of the map with a small gap on the lower side, through which the robot can reach the target place. The second map is similar to the first, except that it contains two obstacles; to reach the destination, the robot must avoid every obstacle in the map. Figure 5 shows the maps: Figures 5(a) and 5(b) show the 2D maps, and Figures 5(c) and 5(d) show the corresponding 3D simulation environments. The color maps in (a) and (b) represent the cost cloud, which provides a reference for path planning. Warm colors such as yellow and red indicate lower cost; conversely, cool colors indicate higher cost. A path planner always tends to choose the trajectory with the least cost. The cost values are calculated by built-in functions in the ROS package. For convenience of description, we use task 1 and task 2 to denote the obstacle avoidance tasks in maps (a) and (b), respectively. It is worth mentioning that we select relatively simple environments for both tasks because
Figure 5: Two different maps used in our simulation. (a) and (b) are the actual maps generated by software; (c) and (d) are the simulation environments corresponding to the maps above them.
Figure 6: Demonstrated trajectories ($y$ versus $x$, in meters) in two situations. (a) represents the sampled data in task 1; (b) represents the sampled data in task 2.
we put more emphasis on the mechanism behind obstacle avoidance; skills learned in a simple environment can be reused in complex situations.
Figure 6 shows the demonstrated trajectories in the two situations. The goal states are set to (6, 3) and (6, 2) for task 1 and task 2, respectively. The start points for task 1 are randomly generated by sampling from uniform distributions between 0.5 and 2.5 for the X-coordinate and between 2 and 4 for the Y-coordinate. The maximum range is 2 m, which guarantees good generalization performance for obstacle avoidance. For task 2, the start points are sampled from uniform distributions from 0.5 to 2 and from 2 to 3.5.
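The start-point generation described above is a direct per-axis uniform draw; the number of demonstrations and the seed below are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)  # illustrative seed

# task 1 starts: X ~ U(0.5, 2.5), Y ~ U(2, 4); task 2 starts: X ~ U(0.5, 2), Y ~ U(2, 3.5)
starts_task1 = rng.uniform([0.5, 2.0], [2.5, 4.0], size=(10, 2))
starts_task2 = rng.uniform([0.5, 2.0], [2.0, 3.5], size=(10, 2))
print(starts_task1.shape)  # (10, 2)
```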
5.1 Experiments for Trajectory Encoding and Retrieval. Figure 7 shows the encoding and retrieval results for task 1. The encoded variables are the location information $x$ and the distance
Figure 7: Trajectory encoding and retrieval using GMM and GMR for task 1. (a) model encoding with GMM; ellipses in dark blue represent projections of the Gaussian components. (b) the general trajectory generated by GMR; shading in red represents the generation region.
information $d_{rg}$; both variables are 2-dimensional vectors. The number of Gaussian components is set to 6. It is worth noting that the optimal number of components $K$ in a model may not be known beforehand. A common method consists of estimating multiple models with an increasing number of components and selecting an optimum based on some model selection criterion, such as cross-validation, the Bayesian Information Criterion (BIC), or Minimum Description Length (MDL). In this paper we choose $K$ using the cross-validation criterion. As the figures show, the variance at the starting position is large, corresponding to a big retrieval error; this indicates that there is no strict requirement on the starting position. The variance at the end point is quite small, which indicates that the robot reaches the goal state accurately. Thus GMM successfully extracts the constraints characterizing the task, and GMR retrieves a barrier-free trajectory from the information encoded by GMM, as shown in the second column of Figure 7. The figures in the second row show the encoding and retrieval results for the variable $d_{rg}$. As can be seen, $d_{rg}$ shares similar distribution properties with $x$, because the two variables have a specific relationship with each other; so
Figure 8: Encoding and retrieval of orientation and distance information in task 1; the charts share the same meaning as those in Figure 7.
do the mean value and covariance, which indirectly verifies the validity of the method.
Figure 8 shows the encoding results for orientation and distance information. The figures in the last two rows display the encoding of $d_{rg}$ in the $X$ and $Y$ directions. As we see, the coordinate values in the $X$ direction decrease monotonically to 0, because the robot keeps moving from left to right in the map, while the coordinates in the $Y$ direction rise at the beginning and then drop to zero. This is because the robot encounters obstacles at the middle moment; in order to avoid them, it must temporarily move away from the goal state. Note that the final distance value is close to (0, 0), which means the robot reaches the target successfully.
Figures 9 and 10 display the encoding and retrieval results for task 2. Compared with task 1, task 2 is more complicated considering the law behind the data, so more Gaussian components are required to encode it. In our experiments, $K$ is set to 10 using the method described for task 1. As illustrated in these figures, no matter how complex the environment is, the method still achieves very satisfactory results.
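The model-selection procedure mentioned above (estimate models with an increasing number of components and score each) can be sketched with scikit-learn. BIC is used here for brevity, whereas the paper selects $K$ by cross-validation; the data below are synthetic.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def select_k(data, k_max=12, seed=0):
    """Fit GMMs with K = 1..k_max and return the K that minimizes BIC."""
    bics = [GaussianMixture(n_components=k, covariance_type='full',
                            random_state=seed).fit(data).bic(data)
            for k in range(1, k_max + 1)]
    return int(np.argmin(bics)) + 1

# two well-separated synthetic clusters: BIC should choose K = 2
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 0.1, (100, 2)), rng.normal(3, 0.1, (100, 2))])
print(select_k(data, k_max=5))  # 2
```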
Figure 9: Trajectory encoding and retrieval using GMM and GMR for the variables $x$ and $d_{rg}$ in task 2.
5.2 Experiments for Finding the Control Strategy. Experiments in this part are mainly conducted to verify the derived control policy. For a Turtlebot robot, the control inputs are the linear velocity $\dot{x}_s$ and the angular velocity $\dot{\theta}_s$. Assuming we have a desired trajectory (the trajectory can be given arbitrarily; in our experiments, to have a reference, we choose the second trajectory in our samples), we can compute $\dot{x}_s$ and $\dot{\theta}_s$ using the theories derived in the previous sections. Iterating at every time step yields a control sequence. Comparing the calculated values with the sensor-recorded values verifies the validity of the derived results.
Figure 12 compares the calculated linear and angular velocities with the desired values. The linear velocity is divided into velocities in the $X$ and $Y$ directions. As the figure in the second row shows, the expected velocity in the $Y$ direction remains close to 0; this is because the control inputs for the Turtlebot are a linear velocity in the forward ($X$) direction and an angular velocity. The line graph for the $Y$ velocity reveals a minor fluctuation at the beginning and end stages, while the overall trend is consistent with this fact. The $X$ velocity follows the expected value perfectly. Considering Figure 12(b), it can be seen that the calculated value remains consistent with the actual one. It is worth noting that the length of the calculated sequence differs from the actual one, since the frequency of the readings from the robot's sensors is lower than the frequency of the commands. In this case, sequence alignment is an indispensable step before comparing the similarity of the two signals. In this paper we use dynamic time warping (DTW) to achieve this purpose. The signals after alignment show a high correlation, which means the calculated values are correct.
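The alignment step can be sketched with the textbook DTW recursion; this is a generic implementation, not the authors' code.

```python
import numpy as np

def dtw(a, b):
    """Dynamic time warping distance between two 1-D sequences of possibly
    different lengths, as used to align command and sensor signals."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # a step may match a pair, or skip a sample in either sequence
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# a repeated sample costs nothing under DTW alignment
print(dtw(np.array([0.0, 1.0, 2.0]), np.array([0.0, 1.0, 1.0, 2.0])))  # 0.0
```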
Figure 11 shows the location information obtained by integrating the velocity signals. The calculated control inputs allow the robot to follow the given trajectory accurately, except for a minor fluctuation at the beginning.
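Recovering the path from the $(v, \omega)$ command sequence, as done for Figure 11, is a unicycle-model integration; the time step and initial pose below are assumptions for illustration.

```python
import numpy as np

def integrate_unicycle(v, w, dt=0.1, pose=(0.0, 0.0, 0.0)):
    """Integrate forward velocity v and angular velocity w (unicycle model)
    to recover the (x, y) path produced by the commands."""
    x, y, th = pose
    xs, ys = [x], [y]
    for vk, wk in zip(v, w):
        x += vk * np.cos(th) * dt
        y += vk * np.sin(th) * dt
        th += wk * dt
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)

# driving straight at 1 m/s for 10 steps of 0.1 s covers 1 m in x
xs, ys = integrate_unicycle([1.0] * 10, [0.0] * 10)
print(round(xs[-1], 3), round(ys[-1], 3))  # 1.0 0.0
```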
6 Conclusion
In this paper, based on the idea of imitation learning, we propose a novel approach in which the robot's obstacle avoidance skill is learned by showing the robot how to avoid an obstacle. Under this consideration, task constraint extraction is a key issue, that is, how to make the robot understand the task. To achieve this purpose, the Gaussian mixture model is used to encode the data, and GMR is used to retrieve satisfactory
Figure 10: Encoding and retrieval of orientation and distance information in task 2.
Figure 11: Tracking performance using the derived policy; the expected path is compared with the actual path.
Figure 12: Comparisons between the expected and calculated velocities. (a) linear velocity ($V_x$ and $V_y$ versus $t$); (b) angular velocity ($\omega$ versus $t$).
trajectories. Eventually, to obtain the control strategy, a multiobjective optimization problem is constructed, and the control strategy is derived by solving it. Simulation experiments verified our algorithms.
Future improvement work mainly includes the following aspects: (i) to avoid obstacles online, an online incremental GMM algorithm should be used; (ii) in this paper a time-based trajectory is generated by GMR, that is, a time signal is used as input; a more reasonable idea is to use state information as input so that the robot can change its decision according to the current environment, for example using distance information as input and velocity as output, so that the control inputs can be generated directly; (iii) reinforcement learning algorithms can be integrated to strengthen the robot's exploration ability, so that better generalization performance is achieved.
Competing Interests
The authors declare that the grant, scholarship, and/or funding mentioned below do not lead to any conflict of interests. Additionally, the authors declare that there is no conflict of interests regarding the publication of this manuscript.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (Grant no. 51505470) and the Dr. Startup Fund in Liaoning Province (20141152).
Figure 2: System architecture of obstacle avoidance learning.
adjusting the parameters $(\phi, \mu, \Sigma)$. The likelihood function can be written as

$$\ell(\phi, \mu, \Sigma) = \sum_{i=1}^{m} \log p\big(x^{(i)}; \phi, \mu, \Sigma\big) = \sum_{i=1}^{m} \log \sum_{z^{(i)}=1}^{k} p\big(x^{(i)} \mid z^{(i)}; \mu, \Sigma\big)\, p\big(z^{(i)}; \phi\big) \quad (4)$$
Maximizing (4) yields the correct model parameters. To realize this, we would typically take the partial derivative with respect to each parameter and set it to zero; in practice, however, no closed-form solution can be obtained for this problem in that way. The main reason is that $z$ is unknown; if $z$ were known, the model would degrade to a Gaussian discriminant model, and following the previously mentioned steps we could obtain a closed-form solution. We now introduce a commonly used iterative algorithm to solve the problem in (4).
3.3 EM for GMM. Expectation Maximization (EM) is an iterative algorithm consisting of two steps, called the E-step and the M-step. The main idea is to find a tight lower-bound function of the objective function and then maximize that lower bound, so that the objective function is maximized indirectly. The reason for introducing a lower-bound function is that finding its extreme value is much easier than for the original function. For a GMM, the E-step guesses the value of $z^{(i)}$, which is equivalent to determining which component the current sample belongs to; the model parameters are then updated based on this conjecture. The algorithm can be summarized as follows.
E-step: for each $i$ and $j$, compute the posterior probability of $z^{(i)}$:

$$w^{(i)}_j = p\big(z^{(i)} = j \mid x^{(i)}; \phi, \mu, \Sigma\big)$$

M-step: update the parameters using these responsibilities:

$$\phi_j = \frac{1}{m}\sum_{i=1}^{m} w^{(i)}_j, \qquad \mu_j = \frac{\sum_{i=1}^{m} w^{(i)}_j\, x^{(i)}}{\sum_{i=1}^{m} w^{(i)}_j}, \qquad \Sigma_j = \frac{\sum_{i=1}^{m} w^{(i)}_j \big(x^{(i)}-\mu_j\big)\big(x^{(i)}-\mu_j\big)^T}{\sum_{i=1}^{m} w^{(i)}_j}$$

The algorithm iterates between the two steps until convergence, which means that the changes of the model parameters are less than a threshold $\epsilon$, or the value of the likelihood function changes within a very narrow range.
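The two steps above can be sketched as a minimal EM loop for a GMM in NumPy; the deterministic initialization and stopping threshold are illustrative choices, not the paper's implementation.

```python
import numpy as np

def em_gmm(X, K, iters=100, tol=1e-6):
    """Minimal EM for a Gaussian mixture.
    E-step: posterior responsibility of each component for each point.
    M-step: re-estimate priors, means, covariances from those responsibilities."""
    N, D = X.shape
    pi = np.full(K, 1.0 / K)
    # deterministic init: spread initial means along the first data dimension
    mu = X[np.argsort(X[:, 0])[np.linspace(0, N - 1, K).astype(int)]].copy()
    cov = np.array([np.cov(X.T) + 1e-6 * np.eye(D) for _ in range(K)])
    prev_ll = -np.inf
    for _ in range(iters):
        # E-step: unnormalized p(z = k | x_i)
        r = np.zeros((N, K))
        for k in range(K):
            d = X - mu[k]
            maha = np.sum(d @ np.linalg.inv(cov[k]) * d, axis=1)
            norm = np.sqrt((2 * np.pi) ** D * np.linalg.det(cov[k]))
            r[:, k] = pi[k] * np.exp(-0.5 * maha) / norm
        ll = np.log(r.sum(axis=1)).sum()
        r /= r.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted parameter updates
        Nk = r.sum(axis=0)
        pi = Nk / N
        mu = (r.T @ X) / Nk[:, None]
        for k in range(K):
            d = X - mu[k]
            cov[k] = (r[:, k, None] * d).T @ d / Nk[k] + 1e-6 * np.eye(D)
        if ll - prev_ll < tol:   # likelihood barely changed: converged
            break
        prev_ll = ll
    return pi, mu, cov
```

On synthetic data drawn from two well-separated clusters, the loop recovers mixing weights near 0.5 and component means near the true cluster centers.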
4 The Architecture of Obstacle Avoidance Learning
Figure 2 shows the architecture of obstacle avoidance learning and the information flow between the different modules. The whole system contains three modules: data acquisition and preprocessing; constraint extraction and trajectory retrieval; and optimal control strategy derivation. Mileage information is collected by built-in sensors such as odometers, and distance information is acquired by the Kinect. After noise filtering, invalid-data reduction, and sequence alignment of the original data, we obtain the input for the learning machine. The input signal is then encoded by GMM, and a general
Figure 3: Useful data collected by sensors. The line in blue shows the demonstrated trajectory; the fan-shaped sectors in light yellow represent the visual field of the Kinect. $x$, $\omega$, $d_{ro}$, and $d_{rg}$ represent the location, orientation, and distance information of the robot, respectively.
trajectory is generated by GMR. The general trajectory satisfies all the constraints of the task. Eventually, a controller is designed to follow the expected input trajectory and output the control commands to the robot. Subsequent sections describe each part in detail.
4.1 Data Acquisition. The experimental data are acquired from a Turtlebot robot. Turtlebot is a widely used research platform compatible with ROS, integrating a variety of sensors such as an odometer and extensible vision sensors. Many kinds of applications, such as mapping and navigation, can be developed on the Turtlebot. In this paper we implement our obstacle avoidance algorithm on the basis of the ROS development kit for the Turtlebot. The teaching process can be achieved in three ways: (i) using a remote joystick; (ii) with the aid of the Turtlebot development kit's navigation algorithms, simulating trajectories as training samples; and (iii) by means of following apps implemented on the Turtlebot, with which the teacher guides the robot to avoid obstacles and records sensor data as samples. In this paper we adopt the second method: we create a simulated environment in software and make the robot execute multiple obstacle avoidance tasks in it using the built-in navigation algorithms. During this process, the linear and angular velocities are obtained from built-in sensors such as the odometer. Using the Kinect, we obtain the distance between the robot and obstacles, as well as the distance from the robot to the target.
We assume that the important sensing information comes from: (i) the states of the robot, defined as linear and angular velocity; (ii) the robot's absolute location; (iii) the distance between the robot and obstacles; noting that the Kinect returns many distance readings representing the relative position of the obstacle within a specific range, we choose the shortest distance as our reference value; and (iv) the distance between the robot and the target, which can be computed indirectly. As illustrated in Figure 3, we use $(x, \omega, d_{ro}, d_{rg})$ to represent the sensor information mentioned above. Here the location and velocity information are redundant; we consider it necessary to take both kinds of information into account, because location information is very important and direct for an obstacle avoidance task. In addition, in the initial stage of a task it is generally necessary to choose as many task-related variables as possible.
In the experiment, demonstration is conducted $n$ times. In each demonstration, $T$ data points are recorded, so the dataset contains $N = n \times T$ points in total. Each sample point is a collection of variables represented as $D = \{x, \omega, d\}$, in which $x$ stands for the location of the robot, $\omega$ for its current orientation, and $d$ for distance. Each variable contains a one-dimensional temporal value and $(d-1)$-dimensional spatial values; that is, $x = \{x_s, x_t\}$, $\omega = \{\omega_s, \omega_t\}$, $d_{ro} = \{d_{ro,s}, d_{ro,t}\}$, and $d_{rg} = \{d_{rg,s}, d_{rg,t}\}$. Here $x_s \in \mathbb{R}^{N\times 2}$ represents location, $\omega_s \in \mathbb{R}^{N}$ represents orientation, $d_{ro,s} \in \mathbb{R}^{N\times 2}$ represents the distance between the robot and obstacles, and $d_{rg,s} \in \mathbb{R}^{N\times 2}$ represents the distance between the robot and the goal state. Supposing the coordinates of the goal are $\hat{o}$, the distance to the goal is

$$d^{\,ij}_{rg,s} = s_{ij} - \hat{o} \quad (8)$$

where $i$ indicates the trajectory number and $j$ indicates the time within a trajectory. We can rewrite the above equation for velocities; since $\hat{o}$ is constant,

$$\dot{d}^{\,ij}_{rg,s} = \dot{s}_{ij} \quad (9)$$
4.2 Data Preprocessing. Because of noise and poor demonstration quality, serious mismatches and invalid data exist in the original data. As shown in Figure 4, there is no change at the beginning and end parts of these curves, so those parts of the data are useless; to reduce computation cost, it is wise to discard them. In addition, each trajectory has a different length, so we resample them to a uniform length to ease handling in subsequent steps. Therefore, before feeding the data to our algorithms, preprocessing steps including noise filtering, useless-data reduction, and resampling are conducted. Figure 4 displays the differences between the original and the processed data.
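The trimming and resampling steps can be sketched as follows; the motion threshold and target length are illustrative choices, not values from the paper.

```python
import numpy as np

def preprocess(traj, eps=1e-3, length=200):
    """Trim the motionless head/tail of a trajectory and resample it
    to a fixed number of points by linear interpolation."""
    traj = np.asarray(traj, dtype=float)
    # keep only the span where consecutive samples actually move
    step = np.linalg.norm(np.diff(traj, axis=0), axis=1)
    moving = np.where(step > eps)[0]
    traj = traj[moving[0]: moving[-1] + 2]
    # resample every dimension onto a common index grid
    t_old = np.linspace(0.0, 1.0, len(traj))
    t_new = np.linspace(0.0, 1.0, length)
    return np.column_stack([np.interp(t_new, t_old, traj[:, d])
                            for d in range(traj.shape[1])])

# a padded trajectory is cut to its moving part and stretched to 200 samples
raw = np.vstack([np.zeros((20, 2)),
                 np.linspace([0, 0], [1, 1], 50),
                 np.ones((30, 2))])
print(preprocess(raw).shape)  # (200, 2)
```

A trajectory padded with motionless samples at both ends is cut back to its moving span and mapped onto a common 200-sample grid, so all demonstrations can be stacked into one training set.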
4.3 Trajectory Encoding and Retrieval. Consider a data point $\xi$ in the training set, modeled by a Gaussian mixture:

$$p(\xi) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}\big(\xi; \mu_k, \Sigma_k\big) \quad (10)$$

in which $\pi_k$, $\mu_k$, $\Sigma_k$ are model parameters representing the prior probability, mean, and covariance, respectively.
in a model may not be known beforehand so far Acommon method consists of estimating multiple modelswith increasing number of components and selecting anoptimum based on some model selection criterion suchas Cross Validation Bayesian Information Criterion (BIC)or Minimum Description Length (MDL) In our paper wedecide 119870 by using Cross Validation criterion As the figuresshow variance for starting position is large corresponding
to a big retrieval error It indicates that there are no strictrequirements for starting position Variance for end point ispretty small which indicates that the robot can get the goalstate accurately Thus they show that GMM can successfullyextract useful constraints for characterization task GMR canretrieve barrier-free trajectory by using information encodedby GMM as shown in the second column of Figure 5 Figuresin the second row show encoding and retrieval results forvariable 119889
rg As can be seen from the figures 119889rg shares
similar distribution properties with 119909This is because the twovariables have a specific relationship between each other so
10 Journal of Robotics
minus1
0
1120596
(rad
s)
10 200
t (s)
minus1
0
1
120596(r
ads
)
10 200
t (s)
10 200
t (s)
0
5
dro
(m)
10 200
t (s)
0
2
4
dro
(m)
10 200
t (s)
minus5
0
5
drg x
(m)
minus5
0
5
drg x
(m)
10 200
t (s)
minus5
0
5
10 200
t (s)10 200
t (s)
minus5
0
5
drg y
(m)
drg y
(m)
Figure 8 Encoding and retrieval for orientation and distance information in task 1 sharing the same meaning with charts in Figure 7
domean value and covariance which verify the validity of themethod indirectly
Figure 8 shows encoding results for orientation and dis-tance information Figures in the last two rows displayencoding for 119889rg in 119883 and 119884 direction As we see coordinatevalues in119883 directionmonotone decrease to 0This is becausethe robot keeps moving from left to right in the map whilecoordinates in 119884 direction keep going up at the beginningand then drop to zero This is because the robot encountersobstacles in the middle moment in order to avoid obstacles
the robot has to go away from the goal state temporarily Payattention to the final distance value which is close to (0 0)which means the robot reaches the target successfully
Figures 9 and 10 display encoding and retrieval resultsfor task 2 Compared with task 1 task 2 is more complicatedconsidering the law behind the data In order to encode theselaws more Gaussian components are required In our exper-iments 119870 is set to 10 by using the method described in task1 As illustrated in these figures no matter how complex theenvironment is this method still has very satisfactory effect
Journal of Robotics 11
GMM GMR
1
2
3
4
y(m
)
3 4 5 62
x (m)
1
2
3
4
y(m
)
3 4 5 62
x (m)
minus2
minus1
0
1
drg y
(m)
10 3 42
drgx (m)
10 3 42
drgx (m)
minus2
minus1
0
1
drg y
(m)
Figure 9 Trajectory encoding and retrieval using GMM and GMR for variables 119909 and 119889 in task 2
52 Experiments for Finding Control Strategy Experiments inthis part are mainly conducted to verify the derived controlpolicy For a Turtlebot robot the control input is linearvelocity
119909
119904and angular velocity
120579
119904 Assuming we have a
desired trajectory (the trajectory can be given arbitrarily inour experiments in order to find a reference we choose thesecond trajectory in our samples) using theories derived inprevious sections we can compute
119909
119904and
120579
119904 Iterating on
every time we can get a control sequence Comparing thecalculated value with the sensor recorded value we can verifythe validity of the derived results
Figure 12 compares the calculated linear velocity andangular velocity with the desired value Linear velocity isdivided into velocity in 119883 direction and 119884 direction Asthe figure in the second row shows expected velocity in 119884
direction remains close to 0This is because the control inputsfor Turtlebot are a linear velocity in the forward directionthat is 119883 direction and an angular velocity The line graphfor velocity 119884 reveals that there is a minor fluctuation at thebeginning and end stage while the overall trend is consistentwith the fact Velocity 119909 follows the expected value perfectlyConsidering Figure 12(b) it can be seen that the calculatedvalue remains consistent with the actual one It is worth
noting that the length of the calculated sequence differs fromthe actual one since the frequency of the readings fromrobotrsquos sensors is lower than the frequency of commandsIn this case to compare the similarity of the two signalssequence alignment is an indispensable step In this paper weuse dynamic time warping (DTW) to achieve this purposeSignals after alignments show a high correlation whichmeans the calculated values are correct
Figure 11 shows the location information obtained byintegrating velocity signals The calculated control inputsallow the robot to follow the given trajectory accuratelyexcept a minor fluctuation at the beginning time
6 Conclusion
In this paper based on the idea of the imitation learning wepropose a novel idea that the robotrsquos obstacle avoidance skillcan be learned by showing the robot how to avoid an obstacleUnder this consideration task constraints extraction is a keyissue that is how to make robots understand the task Inorder to achieve this purpose the Gaussian mixture model isused to encode the data GMR is used to retrieve satisfactory
12 Journal of Robotics
minus2
0
2120596
(rad
s)
10 20 30 400
t (s)
minus2
0
2
120596(r
ads
)
10 20 30 400
t (s)
0
2
4
dro
(m)
10 20 30 400
t (s)10 20 30 400
t (s)
0
2
4
dro
(m)
minus5
0
5
drg x
(m)
10 20 30 400
t (s)
minus5
0
5
drg x
(m)
10 20 30 400
t (s)
minus5
0
5
10 20 30 400
t (s)10 20 30 400
t (s)
minus5
0
5
drg y
(m)
drg y
(m)
Figure 10 Encoding and retrieval for orientation and distance information in task 2
Expected pathActual path
2 3 4 5 6 71x (m)
05
1
15
2
25
3
35
4
y(m
)
Figure 11 Tracking performance using the derived policy
Journal of Robotics 13
Expected valueCalculated value
50 100 150 2000t (s)
50 100 150 2000t (s)
minus202
Vx
(ms
)
minus101
Vy
(ms
)
(a) Linear velocity
Expected valueCalculated value
50 100 150 2000t (s)
minus202
120596(r
ads
)
(b) Angular velocity
Figure 12 Comparisons between the expected angular velocity andcalculated value
trajectories Eventually to obtain the control strategy a mul-tiobjective optimization problem is constructed The controlstrategy is derived by solving this problem Simulation exper-iments verified our algorithms
For the future improvement work mainly includes thefollowing aspects (i) in order to avoid obstacles onlineonline incremental GMM algorithm should be used (ii) inthis paper a time-based trajectory is generated by GMRthat is use of time signal as input A more reasonable ideais using state information as input so that the robot canchange its decision according to the current environmentfor example using distance information as input and velocityas output thus the control inputs can be directly generated(iii) reinforcement learning algorithms can be integrated tostrengthen robotsrsquo exploration ability so that better general-ization performance is achieved
Competing Interests

The authors declare that the grant, scholarship, and/or funding do not lead to any conflict of interests. Additionally, the authors declare that there is no conflict of interests regarding the publication of this manuscript.
Acknowledgments

This work is supported by the National Natural Science Foundation of China (Grant no. 51505470) and the Dr. Startup Fund of Liaoning Province (20141152).
Figure 3: Useful data collected by sensors. The line in blue shows the demonstrated trajectory. The fan-shaped sectors in light yellow represent the visual field of the Kinect. $x$, $\omega$, $d^{ro}$, and $d^{rg}$ represent the location, orientation, and distance information of the robot, respectively.
trajectory is generated by GMR. The general trajectory satisfies all the constraints of the task. Eventually, a controller is designed to follow the expected trajectory and output the control commands to the robot. The subsequent sections describe each part in detail.
4.1. Data Acquisition. The experimental data is acquired from a Turtlebot robot. Turtlebot is a widely used research platform that is compatible with ROS and integrates a variety of sensors, such as an odometer and extensible vision sensors. Many kinds of applications, such as mapping and navigation, can be developed with Turtlebot. In this paper we implement our obstacle avoidance algorithm on the basis of the ROS development kit for Turtlebot. The teaching process can be achieved in three ways: (i) using a remote joystick; (ii) with the aid of the Turtlebot development kit for navigation algorithms, simulating trajectories as training samples; and (iii) by means of apps implemented on Turtlebot, with which the teacher can guide the robot to avoid obstacles and record sensor data as samples. In this paper we adopt the second method: we create a simulated environment in software and then make the robot execute multiple obstacle avoidance tasks in that environment using the built-in navigation algorithms. During this process, linear velocity and angular velocity are obtained from built-in sensors such as the odometer. Using the Kinect, we can get the distance between the robot and obstacles, as well as the distance from the robot to the target.
We assume that the important sensing information comes from (i) the states of the robot, defined as linear velocity and angular velocity; (ii) the robot's absolute location; (iii) the distance between the robot and obstacles (the Kinect returns many pieces of distance information representing the relative position of the obstacle within a specific range, and we choose the shortest distance as our reference value); and (iv) the distance between the robot and the target, which can be computed indirectly. As illustrated in Figure 3, we use $(x, \omega, d^{ro}, d^{rg})$ to represent the sensor information mentioned above. Here the location information and velocity information are redundant. We think it is necessary to take both kinds of information into consideration, because location information is very important and direct for an obstacle avoidance task. In addition, choosing as many task-related variables as possible is generally necessary in the initial stage of a task.
In the experiment, demonstration is conducted $n$ times. In each demonstration, $T$ data points are recorded, so the dataset contains $N = n \times T$ points in total. Each sample point is a collection of variables represented as $D = \{x, \omega, d\}$, in which $x$ stands for the location of the robot, $\omega$ stands for the current orientation, and $d$ stands for distance. Each variable contains a one-dimensional temporal value and $(D-1)$-dimensional spatial values, that is, $x = \{x_s, x_t\}$, $\omega = \{\omega_s, \omega_t\}$, $d^{ro} = \{d^{ro}_s, d^{ro}_t\}$, and $d^{rg} = \{d^{rg}_s, d^{rg}_t\}$, where $x_s \in \mathbb{R}^{N \times 2}$ represents location, $\omega_s \in \mathbb{R}^{N}$ represents orientation, $d^{ro}_s \in \mathbb{R}^{N \times 2}$ represents the distance between the robot and obstacles, and $d^{rg}_s \in \mathbb{R}^{N \times 2}$ represents the distance between the robot and the goal state. Supposing the coordinates of the goal are $o$, the goal distance is

$$d^{rg}_{s,ij} = s_{ij} - o, \qquad (9)$$

where $i$ indicates the trajectory number and $j$ indicates the time in a trajectory. The above equation can be rewritten analogously for the velocities.
4.2. Data Preprocessing. Because of noise and poor demonstration quality, there exist serious mismatches and invalid data in the original data. As shown in Figure 4, there is no change at the beginning and end parts of the curves, so this part of the data is useless; to reduce the computation cost, it is wise to abandon it. In addition, each trajectory has a different length, so we resample them to a uniform length to make them easier to handle in the subsequent steps. Therefore, before feeding the data to our algorithms, preprocessing steps including noise filtering, useless data reduction, and resampling are conducted. Figure 4 displays the differences between the original data and the processed data.
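The trimming and resampling steps described above can be sketched as follows; this is a minimal NumPy sketch, and the motion threshold `eps` and target `length` are assumed parameters, not values from the paper:

```python
import numpy as np

def preprocess(traj, length=200, eps=1e-3):
    """Trim motionless head/tail samples, then resample to a uniform length.

    traj: (T, d) array of recorded samples for one demonstration.
    eps:  assumed threshold below which consecutive samples count as 'no change'.
    """
    step = np.linalg.norm(np.diff(traj, axis=0), axis=1)
    moving = np.where(step > eps)[0]
    if moving.size:                        # keep only the segment with motion
        traj = traj[moving[0]: moving[-1] + 2]
    # Resample each dimension onto a common normalized time axis
    t_old = np.linspace(0.0, 1.0, len(traj))
    t_new = np.linspace(0.0, 1.0, length)
    return np.column_stack([np.interp(t_new, t_old, traj[:, j])
                            for j in range(traj.shape[1])])
```

After this step every demonstration has the same length, which simplifies the batch encoding that follows.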
4.3. Trajectory Encoding and Retrieval. Consider a data-point $\xi_t$ in the training set. Under a Gaussian mixture model with $K$ components, its probability density is

$$p(\xi_t) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\xi_t; \mu_k, \Sigma_k), \qquad (10)$$

in which $\pi_k$, $\mu_k$, and $\Sigma_k$ are the model parameters representing the prior probability, mean, and covariance of component $k$, respectively.
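As a concrete illustration, fitting such a mixture with EM can be sketched in plain NumPy. This is a minimal sketch under assumed settings (full covariances, a small regularization term, optional user-supplied initial means), not the implementation used in the paper:

```python
import numpy as np

def gaussian_pdf(X, mu, cov):
    """Multivariate normal density evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mu
    inv = np.linalg.inv(cov)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * np.sum(diff @ inv * diff, axis=1)) / norm

def fit_gmm(X, K, mu0=None, iters=100, seed=0):
    """Fit a K-component full-covariance GMM to the rows of X with plain EM."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    mu = (np.array(mu0, dtype=float) if mu0 is not None
          else X[rng.choice(N, K, replace=False)].astype(float))
    cov = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(K)])
    pi = np.full(K, 1.0 / K)
    for _ in range(iters):
        # E-step: responsibility h[n, k] of component k for point n
        dens = np.stack([pi[k] * gaussian_pdf(X, mu[k], cov[k])
                         for k in range(K)], axis=1)
        h = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate priors, means, and covariances
        Nk = h.sum(axis=0)
        pi = Nk / N
        mu = (h.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            cov[k] = (h[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)
    return pi, mu, cov
```

In practice a library routine with proper initialization and convergence checks would be preferable; the sketch only makes the E- and M-steps explicit.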
6 Journal of Robotics
Figure 4: Comparison of original data and processed data. (a) shows the original data; lines in red mark the cutting positions. (b) displays the valid data after processing. (The panels plot $x$, $\omega$, $d^{ro}$, and $d^{rg}$ against $t$.)
To solve for these parameters, the EM algorithm introduced in the previous section can be used. When the EM algorithm converges to a given tolerance, we obtain the right $\pi_k$, $\mu_k$, $\Sigma_k$; that is, we have encoded the trajectory data successfully.

In order to get a general barrier-free trajectory, that is, given an input, which can be a temporal signal or another kind of sensor data, to generate a barrier-free path according to that input, a natural idea is to divide the variable $\xi_t$ into two parts $(\xi^{I}_t, \xi^{O}_t)$, representing input and output, respectively. The trajectory retrieval problem can then be formulated as follows: given $\xi \sim \sum_{k=1}^{K} \pi_k \mathcal{N}(\mu_k, \Sigma_k)$ and $\xi^{I}_t$, solve for the probability density of $\xi^{O}_t$. The solution to this problem is called Gaussian mixture regression: regarding $\xi^{I}_t$ as query points, $\xi^{O}_t$ can be estimated using regression methods. In this paper we use the superscripts $I$ and $O$ to stand for the input and output parts of each variable. With this notation, a block decomposition of the data-point $\xi_t$, the vectors $\mu_k$, and the matrices $\Sigma_k$ can be written as

$$\xi_t = \begin{bmatrix} \xi^{I}_t \\ \xi^{O}_t \end{bmatrix}, \quad \mu_k = \begin{bmatrix} \mu^{I}_k \\ \mu^{O}_k \end{bmatrix}, \quad \Sigma_k = \begin{bmatrix} \Sigma^{I}_k & \Sigma^{IO}_k \\ \Sigma^{OI}_k & \Sigma^{O}_k \end{bmatrix}. \qquad (11)$$
Based on (3), for a time step $t$ the conditional mean vector and covariance matrix of the $k$th component are

$$\hat{\xi}^{O}_k(\xi^{I}_t) = \mu^{O}_k + \Sigma^{OI}_k (\Sigma^{I}_k)^{-1} (\xi^{I}_t - \mu^{I}_k),$$
$$\hat{\Sigma}^{O}_k = \Sigma^{O}_k - \Sigma^{OI}_k (\Sigma^{I}_k)^{-1} \Sigma^{IO}_k. \qquad (12)$$

Taking all $K$ components into consideration, the distribution of $\xi^{O}_t$ given $\xi^{I}_t$ is

$$P(\xi^{O}_t \mid \xi^{I}_t) \sim \sum_{k=1}^{K} h_k(\xi^{I}_t) \, \mathcal{N}\big(\hat{\xi}^{O}_k(\xi^{I}_t), \hat{\Sigma}^{O}_k\big), \qquad (13)$$
where

$$h_k(\xi^{I}_t) = \frac{\pi_k \, \mathcal{N}(\xi^{I}_t \mid \mu^{I}_k, \Sigma^{I}_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(\xi^{I}_t \mid \mu^{I}_j, \Sigma^{I}_j)}. \qquad (14)$$
Equation (13) represents a multimodal distribution, which can be approximated by the following Gaussian distribution:

$$P(\xi^{O}_t \mid \xi^{I}_t) = \mathcal{N}(\hat{\xi}^{O}_t, \hat{\Sigma}^{O}_t), \quad \text{with}$$
$$\hat{\xi}^{O}_t = \sum_{i=1}^{K} h_i(\xi^{I}_t) \, \hat{\xi}^{O}_i(\xi^{I}_t),$$
$$\hat{\Sigma}^{O}_t = \sum_{i=1}^{K} h_i(\xi^{I}_t) \Big( \hat{\Sigma}^{O}_i + \hat{\xi}^{O}_i(\xi^{I}_t) \, \hat{\xi}^{O}_i(\xi^{I}_t)^{T} \Big) - \hat{\xi}^{O}_t \, \hat{\xi}^{O\,T}_t. \qquad (15)$$

Substituting $\xi^{I}_t$ into the equations described above to obtain $\xi^{O}_t$, a general trajectory is obtained. The trajectory not only avoids obstacles but also can be executed by robots, owing to its good smoothness.
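Equations (12)-(15) translate almost line by line into code. The following minimal NumPy sketch (function names are ours, not from the paper) conditions a fitted GMM on the input block and returns the moment-matched output mean and covariance:

```python
import numpy as np

def _normal_pdf(x, mu, cov):
    """Density of a multivariate normal at a single point x."""
    d = x.shape[0]
    diff = x - mu
    expo = -0.5 * diff @ np.linalg.inv(cov) @ diff
    return np.exp(expo) / np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))

def gmr(x_in, pi, mu, cov, in_dim):
    """Gaussian mixture regression, eqs (12)-(15).

    pi: (K,) priors; mu: (K, d) means; cov: (K, d, d) covariances;
    in_dim: number of leading dimensions treated as the input block I.
    Returns the conditional mean and covariance of the output block O.
    """
    K, d = mu.shape
    I, O = slice(0, in_dim), slice(in_dim, d)
    # Mixing weights h_k (eq 14)
    h = np.array([pi[k] * _normal_pdf(x_in, mu[k][I], cov[k][I, I])
                  for k in range(K)])
    h /= h.sum()
    means, covs = [], []
    for k in range(K):
        inv_I = np.linalg.inv(cov[k][I, I])
        # Conditional mean and covariance of component k (eq 12)
        means.append(mu[k][O] + cov[k][O, I] @ inv_I @ (x_in - mu[k][I]))
        covs.append(cov[k][O, O] - cov[k][O, I] @ inv_I @ cov[k][I, O])
    # Moment-matched single Gaussian (eq 15)
    mean_out = sum(h[k] * means[k] for k in range(K))
    cov_out = sum(h[k] * (covs[k] + np.outer(means[k], means[k]))
                  for k in range(K))
    cov_out -= np.outer(mean_out, mean_out)
    return mean_out, cov_out
```

Calling `gmr` with the time stamp as input block reproduces the time-based retrieval used in the paper; any other variable split works the same way.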
4.4. Finding an Optimum Control Policy. Through the previous section we can obtain a general trajectory; this is a planning problem. How do we combine planning with control? In other words, how do we find a set of control inputs that makes the robot follow the general trajectory accurately? For the time-based GMR, $\xi^{O}_t$ is equal to $\xi_s$, where $s$ denotes spatial location information. Using $\hat{\xi}_s$ to represent the desired trajectory and $\xi_s$ the actually reached location, a simple way to evaluate the tracking performance is the weighted Euclidean distance

$$\sum_{i=1}^{D-1} \omega_i (\xi_{s,i} - \hat{\xi}_{s,i})^2 = (\xi_s - \hat{\xi}_s)^{T} W (\xi_s - \hat{\xi}_s). \qquad (16)$$

Naturally, the above equation can be applied to multiple variables. In this paper we have four variables; letting $\hat{x}_s$, $\hat{\omega}_s$, $\hat{d}^{ro}_s$, $\hat{d}^{rg}_s$ represent the desired trajectory, the actual tracking performance $P$ can be expressed as

$$P = -\sum_{v \in \{x, \omega, d^{ro}, d^{rg}\}} (v_s - \hat{v}_s)^{T} W_v (v_s - \hat{v}_s). \qquad (17)$$
According to the performance indicator $P$, finding a desired control policy is equivalent to maximizing $P$. Since $P$ is a quadratic function, an analytical solution can easily be obtained by a gradient method. Noting, however, that there exist constraints between the variables, as shown in (9), maximizing (17) turns into a constrained optimization problem. Usually this kind of problem can be solved by constructing a Lagrange function, defined as follows:

$$L(x_s, \omega_s, d^{ro}_s, d^{rg}_s, \lambda) = P + \lambda^{T}(x_s - d^{rg}_s), \qquad (19)$$

where $\lambda$ is the Lagrange multiplier. Letting $\nabla L = 0$ and solving the partial derivatives for each variable yields the control policy.
The weight matrices in (22) can be computed as the inverse of the current covariance matrix, namely,

$$W = (\hat{\Sigma}^{O}_t)^{-1}. \qquad (23)$$
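To illustrate (16) and (23) together, the sketch below (with hypothetical numbers) weights the squared tracking error by the inverse of the output covariance, so directions in which the regression is confident are penalized more:

```python
import numpy as np

def tracking_cost(xi, xi_des, sigma_out):
    """Weighted squared tracking error (eq 16) with W = inv(sigma_out) (eq 23)."""
    W = np.linalg.inv(sigma_out)   # low-variance directions weigh more
    e = xi - xi_des
    return float(e @ W @ e)

# Hypothetical 2-D example: tighter variance in y penalizes y-errors more.
sigma = np.diag([0.5, 0.1])
err_x = tracking_cost(np.array([1.1, 2.0]), np.array([1.0, 2.0]), sigma)
err_y = tracking_cost(np.array([1.0, 2.1]), np.array([1.0, 2.0]), sigma)
```

The same error of 0.1 m thus costs more along the tightly constrained dimension, which is exactly the behavior the covariance-derived weights are meant to produce.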
5. Simulation Experiments
Experiments are implemented using a Turtlebot robot. In addition to the robot's own hardware, a Kinect is integrated to obtain distance information. The size of the map used in the simulation is 7.5 × 5 m. Two different maps are created to verify the learning effects. In the first map, the obstacle is located in the middle of the map and a small gap lies on the lower side, through which the robot can reach the target place. The second map is similar to the first, except that there are two obstacles in it; to reach the destination, the robot must avoid every obstacle in the map. Figure 5 shows the shape of the maps: Figures 5(a) and 5(b) represent the 2D maps, and Figures 5(c) and 5(d) show the corresponding 3D simulation environments. The color maps in (a) and (b) stand for the cost cloud, which provides a reference for path planning. Warm colors, such as yellow and red, stand for a lower cost; on the contrary, cool tones stand for a higher cost. A path planner always tends to choose the trajectory with the least cost. The cost values are calculated by built-in functions in the ROS package. For convenience of description, we use task 1 and task 2 to denote the two obstacle avoidance tasks in maps (a) and (b). It is worth mentioning that we select relatively simple environments for both tasks, for the reason that
Figure 5: Two different maps used in our simulation. (a) and (b) are actual maps generated by software; (c) and (d) are the simulation environments corresponding to the maps above them.
Figure 6: Demonstrated trajectories in the two situations. (a) represents the sampled data in task 1; (b) represents the sampled data in task 2.
we put more emphasis on the mechanism behind the obstacle avoidance; that is, skills can be learned in a simple environment and then reused in complex situations.
Figure 6 shows the demonstrated trajectories in the two situations. The coordinates of the goal states are set to (6, 3) and (6, 2) for task 1 and task 2, respectively. The start points for task 1 are randomly generated by sampling from uniform distributions between 0.5 and 2.5 for the X-coordinate and between 2 and 4 for the Y-coordinate. The maximum range is 2 m, which can guarantee good generalization performance for obstacle avoidance. For task 2, the start points are generated by sampling from uniform distributions from 0.5 to 2 and from 2 to 3.5.
5.1. Experiments for Trajectory Encoding and Retrieval. Figure 7 shows the encoding and retrieval results for task 1. The encoded variables are the location information $x$ and the distance
Figure 7: Trajectory encoding and retrieval using GMM and GMR for task 1. (a) represents model encoding with GMM; the ellipses in dark blue represent projections of the Gaussian components. (b) represents the general trajectory generated by GMR; the shaded region in red represents the generation region.
information $d^{rg}$; both variables are 2-dimensional vectors. The number of Gaussian components is set to 6. It is worth noting that the optimal number of components $K$ in a model may not be known beforehand. A common method consists of estimating multiple models with an increasing number of components and selecting an optimum based on some model selection criterion, such as cross validation, the Bayesian Information Criterion (BIC), or Minimum Description Length (MDL). In this paper we decide $K$ using the cross validation criterion. As the figures show, the variance at the starting position is large, corresponding to a big retrieval error; this indicates that there is no strict requirement on the starting position. The variance at the end point is quite small, which indicates that the robot reaches the goal state accurately. Together, these show that GMM can successfully extract the useful constraints characterizing the task. GMR can then retrieve a barrier-free trajectory using the information encoded by GMM, as shown in the second column of Figure 7. The figures in the second row show the encoding and retrieval results for the variable $d^{rg}$. As can be seen, $d^{rg}$ shares similar distribution properties with $x$. This is because the two variables have a specific relationship with each other, and so
Figure 8: Encoding and retrieval of the orientation and distance information in task 1; the charts share the same meaning as those in Figure 7.
do the mean value and covariance, which indirectly verifies the validity of the method.

Figure 8 shows the encoding results for the orientation and distance information. The figures in the last two rows display the encoding of $d^{rg}$ in the X and Y directions. As can be seen, the coordinate values in the X direction decrease monotonically to 0, because the robot keeps moving from left to right across the map. The coordinates in the Y direction rise at the beginning and then drop to zero, because the robot encounters obstacles midway and, in order to avoid them, has to move away from the goal state temporarily. Note that the final distance value is close to (0, 0), which means the robot reaches the target successfully.
Figures 9 and 10 display the encoding and retrieval results for task 2. Compared with task 1, task 2 is more complicated considering the law behind the data, so more Gaussian components are required to encode it. In our experiments, $K$ is set to 10 using the method described for task 1. As illustrated in these figures, no matter how complex the environment is, the method still achieves a very satisfactory effect.
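Cross validation is what the paper uses to pick $K$; as a sketch of the alternative BIC criterion mentioned above, the score for each candidate $K$ can be computed from the fitted log-likelihood (the log-likelihood values below are hypothetical, for illustration only):

```python
import numpy as np

def gmm_bic(log_likelihood, K, d, N):
    """Bayesian Information Criterion for a K-component, d-dimensional,
    full-covariance GMM fitted on N points (lower is better)."""
    n_params = (K - 1) + K * d + K * d * (d + 1) // 2
    return -2.0 * log_likelihood + n_params * np.log(N)

# Hypothetical comparison: pick the K with the smallest BIC.
scores = {K: gmm_bic(ll, K, d=4, N=2000)
          for K, ll in [(4, -5200.0), (6, -4900.0), (10, -4850.0)]}
best_K = min(scores, key=scores.get)
```

The penalty term grows with $K$, so a larger model must buy a genuinely better likelihood; here the middle-sized model wins, mirroring how 6 components sufficed for task 1 while task 2 justified 10.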
Figure 9: Trajectory encoding and retrieval using GMM and GMR for the variables $x$ and $d^{rg}$ in task 2.
5.2. Experiments for Finding the Control Strategy. The experiments in this part are mainly conducted to verify the derived control policy. For a Turtlebot robot, the control inputs are the linear velocity $\dot{x}_s$ and the angular velocity $\dot{\theta}_s$. Assuming we have a desired trajectory (the trajectory can be given arbitrarily; in our experiments, in order to have a reference, we choose the second trajectory in our samples), using the theory derived in the previous sections we can compute $\dot{x}_s$ and $\dot{\theta}_s$. Iterating at every time step, we get a control sequence. Comparing the calculated values with the sensor-recorded values, we can verify the validity of the derived results.
Figure 12 compares the calculated linear and angular velocities with the desired values. The linear velocity is divided into velocities in the X and Y directions. As the figure in the second row shows, the expected velocity in the Y direction remains close to 0, because the control inputs for the Turtlebot are a linear velocity in the forward (X) direction and an angular velocity. The line graph for the Y velocity reveals a minor fluctuation at the beginning and end stages, while the overall trend is consistent with this fact. The X velocity follows the expected value perfectly. Considering Figure 12(b), it can be seen that the calculated value remains consistent with the actual one. It is worth noting that the length of the calculated sequence differs from that of the actual one, since the frequency of the readings from the robot's sensors is lower than the frequency of the commands. In this case, to compare the similarity of the two signals, sequence alignment is an indispensable step. In this paper we use dynamic time warping (DTW) for this purpose. The signals after alignment show a high correlation, which means the calculated values are correct.
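The DTW alignment step can be sketched with the classic dynamic-programming recursion; this is a generic textbook implementation, not the paper's exact code:

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D signals.

    Each cell D[i, j] holds the cheapest cumulative cost of aligning
    a[:i] with b[:j]; steps may repeat samples in either signal.
    """
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Two samplings of the same velocity profile at different rates:
fast = np.sin(np.linspace(0.0, 2.0 * np.pi, 50))
slow = np.sin(np.linspace(0.0, 2.0 * np.pi, 120))
```

Because DTW may stretch either sequence, signals recorded at different rates (here, sensor readings versus commands) can be compared without prior resampling.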
Figure 11 shows the location information obtained byintegrating velocity signals The calculated control inputsallow the robot to follow the given trajectory accuratelyexcept a minor fluctuation at the beginning time
6 Conclusion
In this paper based on the idea of the imitation learning wepropose a novel idea that the robotrsquos obstacle avoidance skillcan be learned by showing the robot how to avoid an obstacleUnder this consideration task constraints extraction is a keyissue that is how to make robots understand the task Inorder to achieve this purpose the Gaussian mixture model isused to encode the data GMR is used to retrieve satisfactory
12 Journal of Robotics
minus2
0
2120596
(rad
s)
10 20 30 400
t (s)
minus2
0
2
120596(r
ads
)
10 20 30 400
t (s)
0
2
4
dro
(m)
10 20 30 400
t (s)10 20 30 400
t (s)
0
2
4
dro
(m)
minus5
0
5
drg x
(m)
10 20 30 400
t (s)
minus5
0
5
drg x
(m)
10 20 30 400
t (s)
minus5
0
5
10 20 30 400
t (s)10 20 30 400
t (s)
minus5
0
5
drg y
(m)
drg y
(m)
Figure 10 Encoding and retrieval for orientation and distance information in task 2
Expected pathActual path
2 3 4 5 6 71x (m)
05
1
15
2
25
3
35
4
y(m
)
Figure 11 Tracking performance using the derived policy
Journal of Robotics 13
Expected valueCalculated value
50 100 150 2000t (s)
50 100 150 2000t (s)
minus202
Vx
(ms
)
minus101
Vy
(ms
)
(a) Linear velocity
Expected valueCalculated value
50 100 150 2000t (s)
minus202
120596(r
ads
)
(b) Angular velocity
Figure 12 Comparisons between the expected angular velocity andcalculated value
trajectories Eventually to obtain the control strategy a mul-tiobjective optimization problem is constructed The controlstrategy is derived by solving this problem Simulation exper-iments verified our algorithms
For the future improvement work mainly includes thefollowing aspects (i) in order to avoid obstacles onlineonline incremental GMM algorithm should be used (ii) inthis paper a time-based trajectory is generated by GMRthat is use of time signal as input A more reasonable ideais using state information as input so that the robot canchange its decision according to the current environmentfor example using distance information as input and velocityas output thus the control inputs can be directly generated(iii) reinforcement learning algorithms can be integrated tostrengthen robotsrsquo exploration ability so that better general-ization performance is achieved
Competing Interests
The authors declare that the grant scholarship andor fund-ing do not lead to any conflict of interests Additionally theauthors declare that there is no conflict of interests regardingthe publication of this manuscript
Acknowledgments
This work is supported by National Natural Science Founda-tion of China (Grant no 51505470) and the Dr Startup Fundin Liaoning province (20141152)
References
[1] E Masehian and D Sedighizadeh ldquoClassic and heuristicapproaches in robotmotion planningmdasha chronological reviewrdquo
World Academy of Science Engineering and Technology vol 23pp 101ndash106 2007
[2] P Raja and S Pugazhenthi ldquoOptimal path planning of mobilerobots a reviewrdquo International Journal of Physical Sciences vol7 no 9 pp 1314ndash1320 2012
[3] J C Latombe Robot Motion Planning Springer 2012[4] J Borenstein and Y Koren ldquoReal-time obstacle avoidance for
fast mobile robotsrdquo IEEE Transactions on Systems Man andCybernetics vol 19 no 5 pp 1179ndash1187 1989
[5] O Khatib ldquoReal-time obstacle avoidance for manipulators andmobile robotsrdquo International Journal of Robotics Research vol5 no 1 pp 90ndash98 1986
[6] H Mei Y Tian and L Zu ldquoA hybrid ant colony optimizationalgorithm for path planning of robot in dynamic environmentrdquoInternational Journal of Information Technology vol 12 no 3pp 78ndash88 2006
[7] M A P Garcia O Montiel O Castillo R Sepulveda and PMelin ldquoPath planning for autonomous mobile robot navigationwith ant colony optimization and fuzzy cost function evalua-tionrdquo Applied Soft Computing vol 9 no 3 pp 1102ndash1110 2009
[8] C Lopez-Franco J Zepeda N Arana-Daniel and L Lopez-Franco ldquoObstacle avoidance using PSOrdquo in Proceedings of the9th International Conference on Electrical Engineering Comput-ing Science and Automatic Control (CCE rsquo12) pp 1ndash6 MexicoCity Mexico September 2012
[9] H-QMin J-H Zhu andX-J Zheng ldquoObstacle avoidancewithmulti-objective optimization by PSO in dynamic environmentrdquoin Proceedings of the International Conference on MachineLearning and Cybernetics (ICMLC rsquo05) vol 5 pp 2950ndash2956IEEE Guangzhou China August 2005
[10] S Dutta ldquoObstacle avoidance of mobile robot using PSO-basedneuro fuzzy techniquerdquo International Journal of ComputerScience and Engineering vol 2 no 2 pp 301ndash304 2010
[11] W-G Han S-M Baek and T-Y Kuc ldquoGenetic algorithmbased path planning and dynamic obstacle avoidance of mobilerobotsrdquo in Proceedings of the IEEE International Conference onSystems Man and Cybernetics vol 3 pp 2747ndash2751 OrlandoFla USA October 1997
[12] J Tu and S X Yang ldquoGenetic algorithm based path planningfor a mobile robotrdquo in Proceedings of the IEEE InternationalConference on Robotics and Automation (ICRA rsquo03) vol 1 pp1221ndash1226 September 2003
[13] Y Zhang and J Wang ldquoObstacle avoidance for kinematicallyredundant manipulators using a dual neural networkrdquo IEEETransactions on Systems Man and Cybernetics Part B Cyber-netics vol 34 no 1 pp 752ndash759 2004
[14] X Wang V Yadav and S N Balakrishnan ldquoCooperativeUAV formation flying with obstaclecollision avoidancerdquo IEEETransactions on Control Systems Technology vol 15 no 4 pp672ndash679 2007
[15] J Godjevac ldquoA learning procedure for a fuzzy system applica-tion to obstacle avoidancerdquo in Proceedings of the InternationalSymposium on Fuzzy Logic pp 142ndash148 Zurich Switzerland1995
[16] R S Sutton and A G Barto Reinforcement Learning AnIntroduction MIT Press Boston Mass USA 1998
[17] J Michels A Saxena and A Y Ng ldquoHigh speed obstacleavoidance usingmonocular vision and reinforcement learningrdquoin Proceedings of the 22nd International Conference on MachineLearning (ICML rsquo05) pp 593ndash600 ACMBonnGermany 2005
14 Journal of Robotics
[18] C Ye N H C Yung and D Wang ldquoA fuzzy controller withsupervised learning assisted reinforcement learning algorithmfor obstacle avoidancerdquo IEEETransactions on SystemsMan andCybernetics Part B Cybernetics vol 33 no 1 pp 17ndash27 2003
[19] B-Q Huang G-Y Cao and M Guo ldquoReinforcement learningneural network to the problem of autonomous mobile robotobstacle avoidancerdquo in Proceedings of the International Confer-ence on Machine Learning and Cybernetics (ICMLC rsquo05) vol 1pp 85ndash89 August 2005
[20] C F Touzet ldquoNeural reinforcement learning for behavioursynthesisrdquo Robotics and Autonomous Systems vol 22 no 3-4pp 251ndash281 1997
[21] W D Smart and L P Kaelbling ldquoEffective reinforcement learn-ing for mobile robotsrdquo in Proceedings of the IEEE InternationalConference on Robotics and Automation (ICRA rsquo02) vol 4 pp3404ndash3410 Washington DC USA May 2002
[22] K Macek I Petrovic and N Peric ldquoA reinforcement learningapproach to obstacle avoidance ofmobile robotsrdquo inProceedingsof the 7th International Workshop on Advanced Motion Control(AMC rsquo02) pp 462ndash466 IEEE Maribor Slovenia July 2002
[23] U Muller J Ben E Cosatto et al ldquoOff-road obstacle avoidancethrough end-to-end learningrdquo in Advances in Neural Informa-tion Processing Systems pp 739ndash746 2005
[24] B D Argall S Chernova M Veloso and B Browning ldquoAsurvey of robot learning from demonstrationrdquo Robotics andAutonomous Systems vol 57 no 5 pp 469ndash483 2009
[25] C L Nehaniv and K Dautenhahn ldquoLike me-measures ofcorrespondence and imitationrdquo Cybernetics amp Systems vol 32no 1-2 pp 11ndash51 2001
Figure 4: Comparison of original data and processed data. (a) shows the original data; lines in red represent cutting positions. (b) displays the valid data after processing.
To estimate these parameters, the EM algorithm introduced in the previous section can be used. When the EM algorithm converges, we obtain the correct $\pi_k$, $\mu_k$, $\Sigma_k$; that is, the trajectory data are encoded successfully.

We now want a general barrier-free trajectory: given an input, which can be a temporal signal or another kind of sensor data, a barrier-free path should be generated according to that input. A natural idea is to divide the variable $\xi$ into two parts $(\xi^{I}_t, \xi^{O}_t)$, representing the input and output respectively. The trajectory retrieval problem can then be formulated as follows: given $\xi \sim \sum_{k=1}^{K}\pi_k \mathcal{N}(\mu_k, \Sigma_k)$ and $\xi^{I}_t$, solve for the probability density of $\xi^{O}_t$. The solution to this problem is called Gaussian mixture regression (GMR): regarding $\xi^{I}_t$ as query points, $\xi^{O}_t$ can be estimated by regression. In this paper we use the superscripts $I$ and $O$ to denote the input and output parts of each variable. With this notation, a block decomposition of the data point $\xi_t$, the vectors $\mu_k$, and the matrices $\Sigma_k$ can be written as

$$\xi_t = \begin{bmatrix} \xi^{I}_t \\ \xi^{O}_t \end{bmatrix}, \qquad \mu_k = \begin{bmatrix} \mu^{I}_k \\ \mu^{O}_k \end{bmatrix}, \qquad \Sigma_k = \begin{bmatrix} \Sigma^{I}_k & \Sigma^{IO}_k \\ \Sigma^{OI}_k & \Sigma^{O}_k \end{bmatrix}. \quad (11)$$
Based on (3), for a time step $t$ the conditional mean vector and covariance matrix of the $k$th component are

$$\hat{\xi}^{O}_k(\xi^{I}_t) = \mu^{O}_k + \Sigma^{OI}_k \left(\Sigma^{I}_k\right)^{-1} \left(\xi^{I}_t - \mu^{I}_k\right),$$
$$\hat{\Sigma}^{O}_k = \Sigma^{O}_k - \Sigma^{OI}_k \left(\Sigma^{I}_k\right)^{-1} \Sigma^{IO}_k. \quad (12)$$

Taking all $K$ components into consideration, the distribution of $\xi^{O}_t$ given $\xi^{I}_t$ is

$$P\left(\xi^{O}_t \mid \xi^{I}_t\right) \sim \sum_{k=1}^{K} h_k\left(\xi^{I}_t\right)\, \mathcal{N}\left(\hat{\xi}^{O}_k\left(\xi^{I}_t\right), \hat{\Sigma}^{O}_k\right), \quad (13)$$
Journal of Robotics 7
where

$$h_k\left(\xi^{I}_t\right) = \frac{\pi_k \mathcal{N}\left(\xi^{I}_t \mid \mu^{I}_k, \Sigma^{I}_k\right)}{\sum_{j=1}^{K} \pi_j \mathcal{N}\left(\xi^{I}_t \mid \mu^{I}_j, \Sigma^{I}_j\right)}. \quad (14)$$
Equation (13) represents a multimodal distribution, which can be approximated by the following Gaussian distribution:

$$P\left(\xi^{O}_t \mid \xi^{I}_t\right) = \mathcal{N}\left(\hat{\mu}^{O}_t, \hat{\Sigma}^{O}_t\right), \quad \text{with}$$
$$\hat{\mu}^{O}_t = \sum_{i=1}^{K} h_i\left(\xi^{I}_t\right) \hat{\xi}^{O}_i\left(\xi^{I}_t\right),$$
$$\hat{\Sigma}^{O}_t = \sum_{i=1}^{K} h_i\left(\xi^{I}_t\right) \left(\hat{\Sigma}^{O}_i + \hat{\xi}^{O}_i\left(\xi^{I}_t\right) \hat{\xi}^{O}_i\left(\xi^{I}_t\right)^{T}\right) - \hat{\mu}^{O}_t \hat{\mu}^{O\,T}_t. \quad (15)$$
Substituting $\xi^{I}_t$ into the equations described above yields $\xi^{O}_t$, and thus a general trajectory is obtained. The trajectory not only avoids obstacles but can also be executed by the robot thanks to its good smoothness properties.
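The retrieval of equations (12)–(15) can be sketched in a few lines of numpy. This is a minimal sketch under our own naming (the function `gmr` and its arguments are not from the paper), assuming the GMM parameters $(\pi_k, \mu_k, \Sigma_k)$ have already been fitted by EM:

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Density of a multivariate Gaussian at point x."""
    d = len(mean)
    diff = x - mean
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm

def gmr(pi, mu, sigma, xi_in, in_idx, out_idx):
    """Gaussian mixture regression: condition a fitted GMM on xi_in.

    pi      : (K,) mixture weights
    mu      : (K, D) component means
    sigma   : (K, D, D) component covariances
    xi_in   : (d_in,) query point (the input part of xi)
    in_idx  : indices of the input dimensions (superscript I)
    out_idx : indices of the output dimensions (superscript O)
    Returns the conditional mean and covariance of eq. (15).
    """
    K = len(pi)
    # Responsibilities h_k of eq. (14), computed on the input marginal
    w = np.array([pi[k] * gaussian_pdf(xi_in, mu[k][in_idx],
                                       sigma[k][np.ix_(in_idx, in_idx)])
                  for k in range(K)])
    h = w / w.sum()

    mu_out = np.zeros(len(out_idx))
    cov_out = np.zeros((len(out_idx), len(out_idx)))
    for k in range(K):
        s_ii = sigma[k][np.ix_(in_idx, in_idx)]
        s_oi = sigma[k][np.ix_(out_idx, in_idx)]
        s_io = sigma[k][np.ix_(in_idx, out_idx)]
        s_oo = sigma[k][np.ix_(out_idx, out_idx)]
        # Per-component conditional mean and covariance, eq. (12)
        xk = mu[k][out_idx] + s_oi @ np.linalg.solve(s_ii, xi_in - mu[k][in_idx])
        sk = s_oo - s_oi @ np.linalg.solve(s_ii, s_io)
        mu_out += h[k] * xk
        cov_out += h[k] * (sk + np.outer(xk, xk))
    # Moment matching of the mixture into a single Gaussian, eq. (15)
    cov_out -= np.outer(mu_out, mu_out)
    return mu_out, cov_out
```

With time as the input dimension, sweeping the query over the demonstration horizon reproduces a smooth reference path of the kind used in Section 5.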
4.4. Finding an Optimum Control Policy. Through the previous section we can obtain a general trajectory; this is a planning problem. How can planning be combined with control? In other words, how can we find a set of control inputs that makes the robot follow the general trajectory accurately? For the time-based GMR, $\xi^{O}_t$ is equal to $\xi_s$, where $s$ denotes spatial location information. Let $\hat{\xi}_s$ represent the desired trajectory and $\xi_s$ the actually reached location. To evaluate the tracking performance, a simple idea is to use the weighted Euclidean distance

$$\sum_{i=1}^{D-1} \omega_i \left(\xi_{s,i} - \hat{\xi}_{s,i}\right)^2 = \left(\xi_s - \hat{\xi}_s\right)^{T} W \left(\xi_s - \hat{\xi}_s\right). \quad (16)$$
Naturally, the above equation can be applied to multiple variables. In this paper we have four variables; letting $\hat{x}$, $\hat{\omega}$, $\hat{d}_{ro}$, $\hat{d}_{rg}$ represent the desired trajectory, the actual tracking performance $P$ can be expressed as the sum of the corresponding weighted distance terms (17).

According to the performance indicator $P$, finding a desired control policy is equivalent to maximizing $P$. Since $P$ is a quadratic function, an analytical solution can easily be obtained by a gradient method. Noting, however, that there exist constraints between the variables, as shown in (9), maximizing (17) turns into a constrained optimization problem. Such a problem is usually solved by constructing a Lagrange function, defined as follows:

$$L\left(x_s, \omega_s, d_{ro,s}, d_{rg,s}, \lambda\right) = P + \lambda^{T}\left(x_s - d_{rg,s}\right), \quad (19)$$
where $\lambda$ is the Lagrange multiplier. Setting $\nabla L = 0$ and solving the resulting partial-derivative equations yields the optimal control inputs.
The weight matrices in (22) can be computed as the inverse of the current covariance matrix, namely,

$$W = \left(\hat{\Sigma}^{O}_t\right)^{-1}. \quad (23)$$
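The Lagrangian step above amounts to an equality-constrained quadratic program, which can be solved in closed form via its KKT system. The sketch below uses our own notation, not the paper's: $P$ is taken as the negative weighted squared error of (16), and a generic linear constraint $A\xi = b$ stands in for the variable coupling of (9):

```python
import numpy as np

def track_step(xi_hat, W, A, b):
    """Maximize P = -(xi - xi_hat)^T W (xi - xi_hat) subject to A xi = b.

    Setting the gradient of the Lagrangian to zero gives a linear KKT
    system; the sign of the returned multiplier depends on convention,
    but the optimizer xi does not.
    """
    n, m = W.shape[0], A.shape[0]
    # KKT system: [[2W, A^T], [A, 0]] [xi; lam] = [2 W xi_hat; b]
    kkt = np.block([[2 * W, A.T], [A, np.zeros((m, m))]])
    rhs = np.concatenate([2 * W @ xi_hat, b])
    sol = np.linalg.solve(kkt, rhs)
    return sol[:n], sol[n:]  # optimal xi and multiplier
```

For example, with $W = I$, a reference $\hat{\xi} = (1, 2)$, and the constraint $\xi_1 = \xi_2$, the solver returns $\xi = (1.5, 1.5)$, the closest point to the reference that satisfies the constraint.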
5. Simulation Experiments

Experiments are implemented using a Turtlebot robot. In addition to the robot's own hardware, a Kinect is integrated to obtain distance information. The size of the map used in the simulation is 7.5 × 5 m. Two different maps are created to verify the learning effects. In the first map, the obstacle is located in the middle of the map, a small gap lies on the lower side, and the robot can reach the target place through the gap. The second map is similar to the first one, except that there are two obstacles in it; to reach the destination, the robot must avoid every obstacle in the map. Figure 5 shows the shape of the maps: Figures 5(a) and 5(b) represent the 2D maps, and Figures 5(c) and 5(d) show the corresponding 3D simulation environments. The color maps in (a) and (b) stand for the cost cloud, which provides a reference for path planning. Warm colors such as yellow and red stand for a lower cost; on the contrary, cool color tones stand for a higher cost. A path planner always tends to choose the trajectory with the least cost. The cost values are calculated by built-in functions in the ROS package. For convenience of description, we use task 1 and task 2 to denote the two obstacle avoidance tasks in maps (a) and (b). It is worth mentioning that we select relatively simple environments for both tasks, for the reason that
Figure 5: Two different maps used in our simulation. (a) and (b) are actual maps generated by software; (c) and (d) are the simulation environments corresponding to the maps above them.
Figure 6: Demonstrated trajectories in the two situations. (a) represents the sampled data in task 1; (b) represents the sampled data in task 2.
we put more emphasis on the mechanism behind obstacle avoidance: skills can be learned in a simple environment and then reused in complex situations.

Figure 6 shows the demonstrated trajectories in the two situations. The coordinates of the goal states are set to (6, 3) and (6, 2) for task 1 and task 2, respectively. The start points for task 1 are randomly generated by sampling from uniform distributions between 0.5 and 2.5 for the X-coordinate and between 2 and 4 for the Y-coordinate. The maximum range is 2 m, which guarantees good generalization performance for obstacle avoidance. For task 2, the start points are generated by sampling from uniform distributions between 0.5 and 2 and between 2 and 3.5.
5.1. Experiments for Trajectory Encoding and Retrieval. Figure 7 shows the encoding and retrieval results for task 1. The encoded variables are the location information $x$ and the distance
Figure 7: Trajectory encoding and retrieval using GMM and GMR for task 1. (a) represents model encoding with GMM; ellipses in dark blue represent projections of the Gaussian components. (b) represents the general trajectory generated by GMR; shades in red represent the generation region.
information $d_{rg}$, and both variables are 2-dimensional vectors. The number of Gaussian components is set to 6. It is worth noting that the optimal number of components $K$ in a model may not be known beforehand. A common method consists of estimating multiple models with an increasing number of components and selecting the optimum based on some model selection criterion, such as cross-validation, the Bayesian Information Criterion (BIC), or Minimum Description Length (MDL). In this paper we choose $K$ using the cross-validation criterion. As the figures show, the variance at the starting position is large, corresponding to a big retrieval error; this indicates that there are no strict requirements on the starting position. The variance at the end point is quite small, which indicates that the robot can reach the goal state accurately. Thus, GMM successfully extracts the constraints that characterize the task, and GMR can retrieve a barrier-free trajectory using the information encoded by GMM, as shown in the second column of Figure 7. The figures in the second row show the encoding and retrieval results for the variable $d_{rg}$. As can be seen, $d_{rg}$ shares similar distribution properties with $x$; this is because the two variables have a specific relationship with each other, and so
Figure 8: Encoding and retrieval of orientation and distance information in task 1, sharing the same meaning as the charts in Figure 7.
do the mean values and covariances, which indirectly verifies the validity of the method.

Figure 8 shows the encoding results for the orientation and distance information. The figures in the last two rows display the encoding of $d_{rg}$ in the $X$ and $Y$ directions. As we can see, the coordinate values in the $X$ direction monotonically decrease to 0; this is because the robot keeps moving from left to right across the map. The coordinates in the $Y$ direction keep going up at the beginning and then drop to zero; this is because the robot encounters an obstacle at the middle moment, and in order to avoid it the robot has to move away from the goal state temporarily. Note that the final distance value is close to (0, 0), which means the robot reaches the target successfully.

Figures 9 and 10 display the encoding and retrieval results for task 2. Compared with task 1, task 2 is more complicated considering the law behind the data; in order to encode these laws, more Gaussian components are required. In our experiments, $K$ is set to 10 using the method described for task 1. As illustrated in these figures, no matter how complex the environment is, the method still achieves a very satisfactory effect.
Figure 9: Trajectory encoding and retrieval using GMM and GMR for the variables $x$ and $d_{rg}$ in task 2.
5.2. Experiments for Finding the Control Strategy. Experiments in this part are mainly conducted to verify the derived control policy. For a Turtlebot robot, the control inputs are the linear velocity $\dot{x}_s$ and the angular velocity $\dot{\theta}_s$. Assuming we have a desired trajectory (the trajectory can be given arbitrarily; in our experiments, in order to have a reference, we choose the second trajectory in our samples), using the theory derived in the previous sections we can compute $\dot{x}_s$ and $\dot{\theta}_s$. Iterating at every time step, we get a control sequence. Comparing the calculated values with the sensor-recorded values, we can verify the validity of the derived results.
Figure 12 compares the calculated linear velocity and angular velocity with the desired values. The linear velocity is divided into velocities in the $X$ and $Y$ directions. As the figure in the second row shows, the expected velocity in the $Y$ direction remains close to 0; this is because the control inputs for the Turtlebot are a linear velocity in the forward direction, that is, the $X$ direction, and an angular velocity. The line graph for the velocity in $Y$ reveals a minor fluctuation at the beginning and end stages, while the overall trend is consistent with this fact. The velocity in $X$ follows the expected value perfectly. Considering Figure 12(b), it can be seen that the calculated values remain consistent with the actual ones. It is worth noting that the length of the calculated sequence differs from that of the actual one, since the frequency of the readings from the robot's sensors is lower than the frequency of the commands. In this case, to compare the similarity of the two signals, sequence alignment is an indispensable step; in this paper we use dynamic time warping (DTW) for this purpose. The signals after alignment show a high correlation, which means the calculated values are correct.
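The DTW alignment used above can be sketched with the classic dynamic-programming recurrence (a minimal implementation of our own, not the paper's code):

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences.

    D[i, j] holds the cost of the best alignment of a[:i] and b[:j];
    each cell extends the cheapest of the three neighboring alignments.
    """
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # insertion
                                 D[i, j - 1],      # deletion
                                 D[i - 1, j - 1])  # match
    return D[n, m]
```

This is exactly the property needed here: two signals tracing the same profile at different sampling rates, such as `[0, 1, 2]` and `[0, 0, 1, 1, 2, 2]`, align with zero cost.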
Figure 11 shows the location information obtained by integrating the velocity signals. The calculated control inputs allow the robot to follow the given trajectory accurately, except for a minor fluctuation at the beginning.
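The integration step can be sketched as simple unicycle dead reckoning; this is our own minimal model (the actual Turtlebot odometry is more involved):

```python
import math

def integrate_unicycle(v, w, dt, x=0.0, y=0.0, theta=0.0):
    """Integrate forward-velocity and angular-velocity commands (v[k], w[k])
    into a planar path using the unicycle model:
        x' = v cos(theta), y' = v sin(theta), theta' = w
    """
    path = [(x, y)]
    for vk, wk in zip(v, w):
        x += vk * math.cos(theta) * dt
        y += vk * math.sin(theta) * dt
        theta += wk * dt
        path.append((x, y))
    return path
```

For example, driving straight (all angular commands zero) at 0.5 m/s for 10 steps of 0.1 s ends at (0.5, 0).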
6. Conclusion

In this paper, based on the idea of imitation learning, we propose a novel approach in which the robot's obstacle avoidance skill is learned by showing the robot how to avoid an obstacle. Under this consideration, task constraint extraction is a key issue, that is, how to make the robot understand the task. To achieve this, the Gaussian mixture model is used to encode the data, and GMR is used to retrieve satisfactory
Figure 10: Encoding and retrieval of orientation and distance information in task 2.
Figure 11: Tracking performance using the derived policy (expected path versus actual path).
Figure 12: Comparisons between the expected and calculated values. (a) Linear velocity ($V_x$, $V_y$); (b) angular velocity ($\omega$).
trajectories. Eventually, to obtain the control strategy, a multiobjective optimization problem is constructed, and the control strategy is derived by solving it. Simulation experiments verified our algorithms.

Future improvements mainly include the following aspects: (i) in order to avoid obstacles online, an online incremental GMM algorithm should be used; (ii) in this paper, a time-based trajectory is generated by GMR, that is, using the time signal as input; a more reasonable idea is to use state information as input, so that the robot can change its decision according to the current environment, for example, using distance information as input and velocity as output, so that the control inputs are generated directly; (iii) reinforcement learning algorithms can be integrated to strengthen the robot's exploration ability, so that better generalization performance is achieved.
Competing Interests

The authors declare that the grant, scholarship, and/or funding do not lead to any conflict of interests. Additionally, the authors declare that there is no conflict of interests regarding the publication of this manuscript.
Acknowledgments

This work is supported by the National Natural Science Foundation of China (Grant no. 51505470) and the Dr. Startup Fund in Liaoning Province (20141152).
Equation (13) represents a multimodal distributionwhich can be approximated by the following Gaussian dis-tribution
119875 (120585O119905| 120585
I119905) = N (
O119905 Σ
O
119905) with
120583O119905=
119870
sum
119894=1
ℎ119894(120585
I119905) 120583
O119894(120585
I119905)
ΣO
119905=
119870
sum
119894=1
ℎ119894(120585
I119905) (Σ
O
119894+
O119894(120585
I119905)
O119894(120585
I119905)119879
)
minus O119905O119879119905
(15)
Substituting 120585I119905into 120585
O119905using equations described above
a general trajectory is obtained The trajectory not only canavoid obstacles but also can be executed by robots due to itsgood smooth property
44 Finding an Optimum Control Policy Through the previ-ous section we can obtain a general trajectory and this is aplanning problem How to combine planning with controlIn other words how to find a set of control inputs that canmake the robot follow the general trajectory accurately Forthe time-based GMR 120585O
119905is equal to 120585
119904 and 119904 represents
spatial location information Using 119904represents the desired
trajectory and 120585119904represents the actual reached location In
order to evaluate the tracking performance a simple idea isusing the weighted Euclidean distance
119863minus1
sum
119894=1
120596119894(120585119904119894
minus 119904119894)2
= (120585119904minus 119904)119879
119882(120585119904minus 119904) (16)
Naturally the above equation can be applied to mul-tiple variables For this paper we have four variables Let119909
120596
119889ro
119889rg
represent the desired trajectory the actualtracking performance 119875 can be expressed as
According to performance indicator 119875 finding a desiredcontrol policy is equivalent to maximizing 119875 Since 119875 isa quadratic function an analytical solution can be easilyobtained by gradient method But noting that there existconstraints between variables as shown in (9) maximizing(17) turns into a constrained optimization problem Usuallythis kind of problem can be solved by constructing a Lagrangefunction defined as follows
119871 (119909
119904 120596
119904 119889ro
119904 119889rg
119904 120582) = 119875 + 120582
119879(119909
119904minus 119889rg
119904) (19)
where120582 is Lagrangemultiplier Lettingnabla119871 = 0 solving partialderivatives for
Weight matrices in (22) can be computed by solving theinverse matrix of the current covariance matrix namely
119882 = (ΣO119905)minus1
(23)
5 Simulation Experiments
Experiments are implemented by using a Turtlebot robotIn addition to the robotrsquos own hardware system a Kinectis integrated to obtain distance information The size of themap used in the simulation is 75 times 5m Two different mapsare created to verify the learning effects For the first mapthe obstacle locates in the middle of the map and a smallgap lies in the lower side and the robot can reach the targetplace through the gap The second map is much similar tothe first one except that there exist two obstacles in the mapTo get the destination the robot must avoid every obstacle inthe map Figure 5 shows the shape of the map Figures 5(a)and 5(b) represent the 2D map and Figures 5(c) and 5(d)show a 3D simulation environment which corresponds tothe 2D maps The color maps in maps (a) and (b) stand forcost cloud which provides reference for path planning Thewarm color such as yellow and red stands for a lower costTo the contrary the cool color tune stands for a higher cost Apath planer always tends to choose a trajectory with the leastcostThe cost value is calculated by built-in functions in ROSpackage For the convenience of description we use task 1 andtask 2 to represent the two different obstacle avoidance tasksin maps (a) and (b) It is worth mentioning that we select rel-atively simple environment for both tasks for the reason that
8 Journal of Robotics
Figure 5: Two different maps used in our simulation (start at the left, goal at the right). (a) and (b) are the actual maps generated by software; (c) and (d) are the simulation environments corresponding to the maps above them.
Figure 6: Demonstrated trajectories in two situations (axes x (m) versus y (m)). (a) represents sampled data in task 1; (b) represents sampled data in task 2.
we put more emphasis on the mechanism behind the obstacle avoidance, which means skills can be learned in a simple environment and then reused in complex situations.
Figure 6 shows the demonstrated trajectories in the two situations. The coordinates of the goal states are set to (6, 3) and (6, 2) for task 1 and task 2, respectively. The start points for task 1 are randomly generated by sampling from uniform distributions between 0.5 and 2.5 for the X-coordinates and between 2 and 4 for the Y-coordinates. The maximum range is 2 m, which can guarantee good generalization performance for obstacle avoidance. For task 2, the start points are generated by sampling from uniform distributions from 0.5 to 2 and from 2 to 3.5.
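The start-point sampling described above is straightforward to reproduce. A minimal sketch (the ranges are those stated in the text; the sample count of 10 per task is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_starts(n, x_range, y_range):
    """Draw n start points uniformly from the given coordinate ranges."""
    xs = rng.uniform(*x_range, size=n)
    ys = rng.uniform(*y_range, size=n)
    return np.column_stack([xs, ys])  # shape (n, 2): one (x, y) per row

starts_task1 = sample_starts(10, (0.5, 2.5), (2.0, 4.0))  # task 1 ranges
starts_task2 = sample_starts(10, (0.5, 2.0), (2.0, 3.5))  # task 2 ranges
```

Randomizing the start region in this way is what forces the learned model to generalize rather than memorize a single demonstration.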
5.1. Experiments for Trajectory Encoding and Retrieval. Figure 7 shows the encoding and retrieval results for task 1. The encoded variables are the location information x and the distance
Figure 7: Trajectory encoding and retrieval using GMM and GMR for task 1. (a) represents model encoding with GMM; ellipses in dark blue represent projections of the Gaussian components. (b) represents the general trajectory generated by GMR; shaded red regions represent the generation region.
information d^rg; both variables are 2-dimensional vectors. The number of Gaussian components is set to 6. It is worth noting that the optimal number of components K in a model may not be known beforehand. A common method consists of estimating multiple models with an increasing number of components and selecting an optimum based on some model selection criterion, such as cross validation, the Bayesian Information Criterion (BIC), or Minimum Description Length (MDL). In this paper we choose K by cross validation. As the figures show, the variance for the starting position is large, corresponding to a big retrieval error; it indicates that there are no strict requirements on the starting position. The variance for the end point is quite small, which indicates that the robot can reach the goal state accurately. Thus, GMM can successfully extract useful constraints characterizing the task, and GMR can retrieve a barrier-free trajectory using the information encoded by GMM, as shown in the second column of Figure 7. The figures in the second row show encoding and retrieval results for the variable d^rg. As can be seen from the figures, d^rg shares similar distribution properties with x. This is because the two variables have a specific relationship with each other, so
Figure 8: Encoding and retrieval for orientation and distance information in task 1; the charts share the same meaning as those in Figure 7.
do the mean value and covariance, which indirectly verifies the validity of the method.
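The model-selection step described above can be reproduced with standard tools. The sketch below uses scikit-learn and BIC as the criterion (the paper itself selects K by cross validation; BIC is substituted here only because it needs no held-out split), on synthetic stand-in data rather than the paper's demonstrations:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def select_k_by_bic(data, k_range=range(2, 11), seed=0):
    """Fit one GMM per candidate component count and keep the model
    with the lowest Bayesian Information Criterion."""
    best, best_bic = None, np.inf
    for k in k_range:
        gmm = GaussianMixture(n_components=k, covariance_type="full",
                              random_state=seed).fit(data)
        bic = gmm.bic(data)
        if bic < best_bic:
            best, best_bic = gmm, bic
    return best

# stand-in data: noisy points along a curved 2D trajectory
t = np.linspace(0.0, 1.0, 300)
traj = np.column_stack([6.0 * t, np.sin(np.pi * t)])
data = traj + 0.05 * np.random.default_rng(0).normal(size=traj.shape)
model = select_k_by_bic(data)
```

BIC penalizes extra components, so the search naturally balances encoding accuracy against model complexity, which is the same trade-off the cross-validation procedure in the paper resolves.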
Figure 8 shows the encoding results for the orientation and distance information. The figures in the last two rows display the encoding of d^rg in the X and Y directions. As we can see, the coordinate values in the X direction decrease monotonically to 0; this is because the robot keeps moving from left to right in the map. The coordinates in the Y direction rise at the beginning and then drop to zero; this is because the robot encounters an obstacle midway and, in order to avoid it, has to move away from the goal state temporarily. Note that the final distance value is close to (0, 0), which means the robot reaches the target successfully.
Figures 9 and 10 display the encoding and retrieval results for task 2. Compared with task 1, task 2 is more complicated considering the regularities behind the data. To encode these regularities, more Gaussian components are required; in our experiments, K is set to 10 using the method described for task 1. As illustrated in these figures, the method still performs satisfactorily in this more complex environment.
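The retrieval step is compact enough to sketch: GMR conditions each Gaussian component of the joint model on the input dimensions and blends the conditional means by the components' responsibilities. This is a generic implementation of the standard GMR equations under assumed toy parameters, not the authors' code:

```python
import numpy as np

def gmr(priors, means, covs, x_in, in_idx, out_idx):
    """E[output | input = x_in] under a joint GMM over (input, output)."""
    x_in = np.atleast_1d(np.asarray(x_in, dtype=float))
    h = np.zeros(len(priors))
    cond_means = []
    for k, (pi_k, mu, S) in enumerate(zip(priors, means, covs)):
        mu_i, mu_o = mu[in_idx], mu[out_idx]
        S_ii = S[np.ix_(in_idx, in_idx)]
        S_oi = S[np.ix_(out_idx, in_idx)]
        inv = np.linalg.inv(S_ii)
        diff = x_in - mu_i
        # responsibility h_k proportional to pi_k * N(x_in; mu_i, S_ii)
        h[k] = pi_k * np.exp(-0.5 * diff @ inv @ diff) \
               / np.sqrt(np.linalg.det(2 * np.pi * S_ii))
        # conditional mean of component k given the input
        cond_means.append(mu_o + S_oi @ inv @ diff)
    h /= h.sum()
    return sum(hk * m for hk, m in zip(h, cond_means))

# toy joint model over (t, x): one component near t=0, one near t=1
priors = [0.5, 0.5]
means = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]
covs = [0.01 * np.eye(2), 0.01 * np.eye(2)]
x0 = gmr(priors, means, covs, 0.0, [0], [1])  # close to 0.0
x1 = gmr(priors, means, covs, 1.0, [0], [1])  # close to 1.0
```

Sweeping `x_in` over the time axis produces a smooth generalized trajectory, which is exactly how the barrier-free paths in Figures 7 and 9 are retrieved.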
Figure 9: Trajectory encoding and retrieval using GMM and GMR for the variables x and d^rg in task 2.
5.2. Experiments for Finding the Control Strategy. The experiments in this part are mainly conducted to verify the derived control policy. For a Turtlebot robot, the control inputs are the linear velocity ẋ_s and the angular velocity θ̇_s. Assuming we have a desired trajectory (the trajectory can be given arbitrarily; in our experiments, to obtain a reference, we choose the second trajectory in our samples), we can compute ẋ_s and θ̇_s using the theory derived in the previous sections. Iterating at every time step, we obtain a control sequence. Comparing the calculated values with the values recorded by the sensors, we can verify the validity of the derived results.
Figure 12 compares the calculated linear and angular velocities with the desired values. The linear velocity is divided into velocities in the X and Y directions. As the figure in the second row shows, the expected velocity in the Y direction remains close to 0; this is because the control inputs for the Turtlebot are a linear velocity in the forward (X) direction and an angular velocity. The line graph for the Y velocity reveals minor fluctuations at the beginning and end stages, while the overall trend is consistent with this fact. The X velocity follows the expected value closely. Considering Figure 12(b), it can be seen that the calculated value remains consistent with the actual one. It is worth
noting that the length of the calculated sequence differs from the actual one, since the frequency of the readings from the robot's sensors is lower than the frequency of the commands. In this case, sequence alignment is an indispensable step before comparing the similarity of the two signals. In this paper we use dynamic time warping (DTW) for this purpose. The signals after alignment show a high correlation, which means the calculated values are correct.
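The DTW alignment used above can be sketched with the classic dynamic-programming recursion; this is a generic textbook implementation, not the one used in the paper:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic-time-warping distance between two 1-D sequences; it is 0
    when one sequence is a time-stretched copy of the other, which is
    why it suits signals sampled at different rates."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)  # accumulated-cost table
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # best of: insertion, deletion, match
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# a sequence and its 2x time-stretched copy align perfectly
d = dtw_distance([0, 1, 2, 3], [0, 0, 1, 1, 2, 2, 3, 3])  # -> 0.0
```

After warping, correlation between the aligned command and sensor signals becomes meaningful even though the raw sequences have different lengths.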
Figure 11 shows the location information obtained by integrating the velocity signals. The calculated control inputs allow the robot to follow the given trajectory accurately, except for a minor fluctuation at the beginning.
6 Conclusion
In this paper, based on the idea of imitation learning, we propose a novel approach in which the robot's obstacle avoidance skill is learned by showing the robot how to avoid an obstacle. Under this formulation, task constraint extraction is a key issue, that is, how to make robots understand the task. To achieve this, a Gaussian mixture model is used to encode the data, and GMR is used to retrieve satisfactory
Figure 10: Encoding and retrieval for orientation and distance information in task 2.
Figure 11: Tracking performance using the derived policy (expected path versus actual path).
Figure 12: Comparisons between the expected and calculated values: (a) linear velocity; (b) angular velocity.
trajectories. Eventually, to obtain the control strategy, a multiobjective optimization problem is constructed, and the control strategy is derived by solving this problem. Simulation experiments verified our algorithms.
Future work mainly includes the following aspects: (i) to avoid obstacles online, an online incremental GMM algorithm should be used; (ii) in this paper, a time-based trajectory is generated by GMR, that is, a time signal is used as input; a more reasonable idea is to use state information as input, so that the robot can change its decisions according to the current environment, for example, using distance information as input and velocity as output, so that the control inputs are generated directly; (iii) reinforcement learning algorithms can be integrated to strengthen the robot's exploration ability and achieve better generalization performance.
Competing Interests
The authors declare that the grant, scholarship, and/or funding do not lead to any conflict of interests. Additionally, the authors declare that there is no conflict of interests regarding the publication of this manuscript.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (Grant no. 51505470) and the Dr. Startup Fund in Liaoning Province (20141152).
Figure 7 Trajectory encoding and retrieval using GMM and GMR for task 1 (a) represents model encoding with GMM Ellipse in dark bluerepresents projections of Gaussian components (b) represents general trajectory generated by GMR Shades in red color represent generationregion
information 119889rg and both variables are 2-dimensional vec-
tors Number of Gaussian components is set to 6 It isworth noting that the optimal number of components 119870
in a model may not be known beforehand so far Acommon method consists of estimating multiple modelswith increasing number of components and selecting anoptimum based on some model selection criterion suchas Cross Validation Bayesian Information Criterion (BIC)or Minimum Description Length (MDL) In our paper wedecide 119870 by using Cross Validation criterion As the figuresshow variance for starting position is large corresponding
to a big retrieval error It indicates that there are no strictrequirements for starting position Variance for end point ispretty small which indicates that the robot can get the goalstate accurately Thus they show that GMM can successfullyextract useful constraints for characterization task GMR canretrieve barrier-free trajectory by using information encodedby GMM as shown in the second column of Figure 5 Figuresin the second row show encoding and retrieval results forvariable 119889
rg As can be seen from the figures 119889rg shares
similar distribution properties with 119909This is because the twovariables have a specific relationship between each other so
10 Journal of Robotics
minus1
0
1120596
(rad
s)
10 200
t (s)
minus1
0
1
120596(r
ads
)
10 200
t (s)
10 200
t (s)
0
5
dro
(m)
10 200
t (s)
0
2
4
dro
(m)
10 200
t (s)
minus5
0
5
drg x
(m)
minus5
0
5
drg x
(m)
10 200
t (s)
minus5
0
5
10 200
t (s)10 200
t (s)
minus5
0
5
drg y
(m)
drg y
(m)
Figure 8 Encoding and retrieval for orientation and distance information in task 1 sharing the same meaning with charts in Figure 7
domean value and covariance which verify the validity of themethod indirectly
Figure 8 shows encoding results for orientation and dis-tance information Figures in the last two rows displayencoding for 119889rg in 119883 and 119884 direction As we see coordinatevalues in119883 directionmonotone decrease to 0This is becausethe robot keeps moving from left to right in the map whilecoordinates in 119884 direction keep going up at the beginningand then drop to zero This is because the robot encountersobstacles in the middle moment in order to avoid obstacles
the robot has to go away from the goal state temporarily Payattention to the final distance value which is close to (0 0)which means the robot reaches the target successfully
Figures 9 and 10 display encoding and retrieval resultsfor task 2 Compared with task 1 task 2 is more complicatedconsidering the law behind the data In order to encode theselaws more Gaussian components are required In our exper-iments 119870 is set to 10 by using the method described in task1 As illustrated in these figures no matter how complex theenvironment is this method still has very satisfactory effect
[Figure 9: Trajectory encoding (GMM, left column) and retrieval (GMR, right column) for the variables x and d in task 2.]
5.2. Experiments for Finding the Control Strategy. The experiments in this part are mainly conducted to verify the derived control policy. For a Turtlebot robot, the control inputs are the linear velocity ẋ_s and the angular velocity θ̇_s. Assume we have a desired trajectory (the trajectory can be given arbitrarily; in our experiments, we choose the second trajectory in our samples as the reference). Using the theory derived in the previous sections, we can compute ẋ_s and θ̇_s, and iterating at every time step yields a control sequence. Comparing the calculated values with the values recorded by the sensors verifies the validity of the derived results.
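The paper obtains ẋ_s and θ̇_s from the multiobjective optimization of the previous sections. Purely for intuition, under a unicycle model the feedforward commands implied by a sampled reference path can also be recovered by finite differences; the sketch below is our own illustrative helper, not the authors' method.

```python
import numpy as np

def controls_from_path(xs, ys, dt):
    """Recover feedforward unicycle controls (v, omega) from a sampled
    2-D path by finite differences (illustrative sketch only)."""
    dx = np.gradient(xs, dt)                # x velocity
    dy = np.gradient(ys, dt)                # y velocity
    v = np.hypot(dx, dy)                    # forward speed
    theta = np.unwrap(np.arctan2(dy, dx))   # heading, unwrapped to avoid 2π jumps
    omega = np.gradient(theta, dt)          # angular velocity
    return v, omega
```

For a straight path traversed at constant speed, this yields a constant v and zero ω, as expected.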
Figure 12 compares the calculated linear and angular velocities with the desired values. The linear velocity is decomposed into its components in the X and Y directions. As the figure in the second row shows, the expected velocity in the Y direction remains close to 0, because the control inputs of the Turtlebot are a linear velocity along the forward (X) direction and an angular velocity. The line graph for the Y velocity shows a minor fluctuation at the beginning and end stages, but the overall trend is consistent with this fact, and the X velocity follows the expected value almost perfectly. From Figure 12(b), it can be seen that the calculated angular velocity remains consistent with the actual one. It is worth noting that the length of the calculated sequence differs from that of the actual one, since the robot's sensors are read at a lower frequency than the commands are issued. In this case, sequence alignment is an indispensable step before comparing the similarity of the two signals; in this paper, we use dynamic time warping (DTW) for this purpose. The aligned signals show a high correlation, which means the calculated values are correct.
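DTW can be implemented directly from its recurrence. The sketch below is a textbook O(nm) version (not necessarily the implementation used in the paper); it returns the accumulated alignment cost between two 1-D signals of possibly different lengths, which is what makes the calculated and recorded sequences comparable.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D signals
    (naive O(len(a)*len(b)) dynamic program)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # insertion
                                 D[i, j - 1],      # deletion
                                 D[i - 1, j - 1])  # match
    return D[n, m]
```

Two signals that differ only by repeated samples align at zero cost, so a low DTW distance indicates the calculated commands match the recorded ones despite the different sampling rates.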
Figure 11 shows the position trajectory obtained by integrating the velocity signals. The calculated control inputs allow the robot to follow the given trajectory accurately, except for a minor fluctuation at the beginning.
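Recovering position from velocity commands, as done for Figure 11, amounts to integrating the unicycle kinematics ẋ = v cos θ, ẏ = v sin θ, θ̇ = ω. A minimal Euler-integration sketch (our own helper with assumed names, not the authors' code) is:

```python
import numpy as np

def integrate_unicycle(v, omega, dt, pose0=(0.0, 0.0, 0.0)):
    """Integrate unicycle velocity commands into a pose trajectory
    using explicit Euler steps (illustrative sketch)."""
    x, y, th = pose0
    path = [(x, y, th)]
    for vi, wi in zip(v, omega):
        x += vi * np.cos(th) * dt   # advance along current heading
        y += vi * np.sin(th) * dt
        th += wi * dt               # then update the heading
        path.append((x, y, th))
    return np.array(path)
```

A higher-order scheme (e.g., midpoint) would reduce the drift responsible for small deviations such as the initial fluctuation seen in Figure 11.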
6. Conclusion
In this paper, based on the idea of imitation learning, we propose a novel approach in which a robot's obstacle avoidance skill is learned by showing the robot how to avoid an obstacle. Under this formulation, a key issue is extracting the task constraints, that is, making the robot understand the task. To achieve this, a Gaussian mixture model is used to encode the data, and GMR is used to retrieve satisfactory
[Figure 10: Encoding and retrieval of the orientation and distance information in task 2 (ω, d_ro, d_rg,x, and d_rg,y plotted against t).]
[Figure 11: Tracking performance using the derived policy (expected path versus actual path in the x-y plane).]
[Figure 12: Comparisons between the expected and calculated values: (a) linear velocity (V_x and V_y versus t); (b) angular velocity (ω versus t).]
trajectories. Eventually, to obtain the control strategy, a multiobjective optimization problem is constructed, and the control strategy is derived by solving it. Simulation experiments verified our algorithms.

Future work mainly includes the following aspects: (i) in order to avoid obstacles online, an online incremental GMM algorithm should be used; (ii) in this paper, a time-indexed trajectory is generated by GMR, that is, the time signal is used as the input; a more reasonable idea is to use state information as the input, so that the robot can change its decision according to the current environment, for example, using distance information as the input and velocity as the output, in which case the control inputs can be generated directly; (iii) reinforcement learning algorithms can be integrated to strengthen the robot's exploration ability, so that better generalization performance is achieved.
Competing Interests
The authors declare that the grant, scholarship, and/or funding do not lead to any conflict of interests. Additionally, the authors declare that there is no conflict of interests regarding the publication of this manuscript.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (Grant no. 51505470) and the Dr. Startup Fund in Liaoning Province (20141152).
Figure 8 Encoding and retrieval for orientation and distance information in task 1 sharing the same meaning with charts in Figure 7
domean value and covariance which verify the validity of themethod indirectly
Figure 8 shows encoding results for orientation and dis-tance information Figures in the last two rows displayencoding for 119889rg in 119883 and 119884 direction As we see coordinatevalues in119883 directionmonotone decrease to 0This is becausethe robot keeps moving from left to right in the map whilecoordinates in 119884 direction keep going up at the beginningand then drop to zero This is because the robot encountersobstacles in the middle moment in order to avoid obstacles
the robot has to go away from the goal state temporarily Payattention to the final distance value which is close to (0 0)which means the robot reaches the target successfully
Figures 9 and 10 display encoding and retrieval resultsfor task 2 Compared with task 1 task 2 is more complicatedconsidering the law behind the data In order to encode theselaws more Gaussian components are required In our exper-iments 119870 is set to 10 by using the method described in task1 As illustrated in these figures no matter how complex theenvironment is this method still has very satisfactory effect
Journal of Robotics 11
GMM GMR
1
2
3
4
y(m
)
3 4 5 62
x (m)
1
2
3
4
y(m
)
3 4 5 62
x (m)
minus2
minus1
0
1
drg y
(m)
10 3 42
drgx (m)
10 3 42
drgx (m)
minus2
minus1
0
1
drg y
(m)
Figure 9 Trajectory encoding and retrieval using GMM and GMR for variables 119909 and 119889 in task 2
52 Experiments for Finding Control Strategy Experiments inthis part are mainly conducted to verify the derived controlpolicy For a Turtlebot robot the control input is linearvelocity
119909
119904and angular velocity
120579
119904 Assuming we have a
desired trajectory (the trajectory can be given arbitrarily inour experiments in order to find a reference we choose thesecond trajectory in our samples) using theories derived inprevious sections we can compute
119909
119904and
120579
119904 Iterating on
every time we can get a control sequence Comparing thecalculated value with the sensor recorded value we can verifythe validity of the derived results
Figure 12 compares the calculated linear velocity andangular velocity with the desired value Linear velocity isdivided into velocity in 119883 direction and 119884 direction Asthe figure in the second row shows expected velocity in 119884
direction remains close to 0This is because the control inputsfor Turtlebot are a linear velocity in the forward directionthat is 119883 direction and an angular velocity The line graphfor velocity 119884 reveals that there is a minor fluctuation at thebeginning and end stage while the overall trend is consistentwith the fact Velocity 119909 follows the expected value perfectlyConsidering Figure 12(b) it can be seen that the calculatedvalue remains consistent with the actual one It is worth
noting that the length of the calculated sequence differs fromthe actual one since the frequency of the readings fromrobotrsquos sensors is lower than the frequency of commandsIn this case to compare the similarity of the two signalssequence alignment is an indispensable step In this paper weuse dynamic time warping (DTW) to achieve this purposeSignals after alignments show a high correlation whichmeans the calculated values are correct
Figure 11 shows the location information obtained byintegrating velocity signals The calculated control inputsallow the robot to follow the given trajectory accuratelyexcept a minor fluctuation at the beginning time
6 Conclusion
In this paper based on the idea of the imitation learning wepropose a novel idea that the robotrsquos obstacle avoidance skillcan be learned by showing the robot how to avoid an obstacleUnder this consideration task constraints extraction is a keyissue that is how to make robots understand the task Inorder to achieve this purpose the Gaussian mixture model isused to encode the data GMR is used to retrieve satisfactory
12 Journal of Robotics
minus2
0
2120596
(rad
s)
10 20 30 400
t (s)
minus2
0
2
120596(r
ads
)
10 20 30 400
t (s)
0
2
4
dro
(m)
10 20 30 400
t (s)10 20 30 400
t (s)
0
2
4
dro
(m)
minus5
0
5
drg x
(m)
10 20 30 400
t (s)
minus5
0
5
drg x
(m)
10 20 30 400
t (s)
minus5
0
5
10 20 30 400
t (s)10 20 30 400
t (s)
minus5
0
5
drg y
(m)
drg y
(m)
Figure 10 Encoding and retrieval for orientation and distance information in task 2
Expected pathActual path
2 3 4 5 6 71x (m)
05
1
15
2
25
3
35
4
y(m
)
Figure 11 Tracking performance using the derived policy
Journal of Robotics 13
Expected valueCalculated value
50 100 150 2000t (s)
50 100 150 2000t (s)
minus202
Vx
(ms
)
minus101
Vy
(ms
)
(a) Linear velocity
Expected valueCalculated value
50 100 150 2000t (s)
minus202
120596(r
ads
)
(b) Angular velocity
Figure 12 Comparisons between the expected angular velocity andcalculated value
trajectories Eventually to obtain the control strategy a mul-tiobjective optimization problem is constructed The controlstrategy is derived by solving this problem Simulation exper-iments verified our algorithms
For the future improvement work mainly includes thefollowing aspects (i) in order to avoid obstacles onlineonline incremental GMM algorithm should be used (ii) inthis paper a time-based trajectory is generated by GMRthat is use of time signal as input A more reasonable ideais using state information as input so that the robot canchange its decision according to the current environmentfor example using distance information as input and velocityas output thus the control inputs can be directly generated(iii) reinforcement learning algorithms can be integrated tostrengthen robotsrsquo exploration ability so that better general-ization performance is achieved
Competing Interests
The authors declare that the grant scholarship andor fund-ing do not lead to any conflict of interests Additionally theauthors declare that there is no conflict of interests regardingthe publication of this manuscript
Acknowledgments
This work is supported by National Natural Science Founda-tion of China (Grant no 51505470) and the Dr Startup Fundin Liaoning province (20141152)
References
[1] E Masehian and D Sedighizadeh ldquoClassic and heuristicapproaches in robotmotion planningmdasha chronological reviewrdquo
World Academy of Science Engineering and Technology vol 23pp 101ndash106 2007
[2] P Raja and S Pugazhenthi ldquoOptimal path planning of mobilerobots a reviewrdquo International Journal of Physical Sciences vol7 no 9 pp 1314ndash1320 2012
[3] J C Latombe Robot Motion Planning Springer 2012[4] J Borenstein and Y Koren ldquoReal-time obstacle avoidance for
fast mobile robotsrdquo IEEE Transactions on Systems Man andCybernetics vol 19 no 5 pp 1179ndash1187 1989
[5] O Khatib ldquoReal-time obstacle avoidance for manipulators andmobile robotsrdquo International Journal of Robotics Research vol5 no 1 pp 90ndash98 1986
[6] H Mei Y Tian and L Zu ldquoA hybrid ant colony optimizationalgorithm for path planning of robot in dynamic environmentrdquoInternational Journal of Information Technology vol 12 no 3pp 78ndash88 2006
[7] M A P Garcia O Montiel O Castillo R Sepulveda and PMelin ldquoPath planning for autonomous mobile robot navigationwith ant colony optimization and fuzzy cost function evalua-tionrdquo Applied Soft Computing vol 9 no 3 pp 1102ndash1110 2009
[8] C Lopez-Franco J Zepeda N Arana-Daniel and L Lopez-Franco ldquoObstacle avoidance using PSOrdquo in Proceedings of the9th International Conference on Electrical Engineering Comput-ing Science and Automatic Control (CCE rsquo12) pp 1ndash6 MexicoCity Mexico September 2012
[9] H-QMin J-H Zhu andX-J Zheng ldquoObstacle avoidancewithmulti-objective optimization by PSO in dynamic environmentrdquoin Proceedings of the International Conference on MachineLearning and Cybernetics (ICMLC rsquo05) vol 5 pp 2950ndash2956IEEE Guangzhou China August 2005
[10] S Dutta ldquoObstacle avoidance of mobile robot using PSO-basedneuro fuzzy techniquerdquo International Journal of ComputerScience and Engineering vol 2 no 2 pp 301ndash304 2010
[11] W-G Han S-M Baek and T-Y Kuc ldquoGenetic algorithmbased path planning and dynamic obstacle avoidance of mobilerobotsrdquo in Proceedings of the IEEE International Conference onSystems Man and Cybernetics vol 3 pp 2747ndash2751 OrlandoFla USA October 1997
[12] J Tu and S X Yang ldquoGenetic algorithm based path planningfor a mobile robotrdquo in Proceedings of the IEEE InternationalConference on Robotics and Automation (ICRA rsquo03) vol 1 pp1221ndash1226 September 2003
[13] Y Zhang and J Wang ldquoObstacle avoidance for kinematicallyredundant manipulators using a dual neural networkrdquo IEEETransactions on Systems Man and Cybernetics Part B Cyber-netics vol 34 no 1 pp 752ndash759 2004
[14] X Wang V Yadav and S N Balakrishnan ldquoCooperativeUAV formation flying with obstaclecollision avoidancerdquo IEEETransactions on Control Systems Technology vol 15 no 4 pp672ndash679 2007
[15] J Godjevac ldquoA learning procedure for a fuzzy system applica-tion to obstacle avoidancerdquo in Proceedings of the InternationalSymposium on Fuzzy Logic pp 142ndash148 Zurich Switzerland1995
[16] R S Sutton and A G Barto Reinforcement Learning AnIntroduction MIT Press Boston Mass USA 1998
[17] J Michels A Saxena and A Y Ng ldquoHigh speed obstacleavoidance usingmonocular vision and reinforcement learningrdquoin Proceedings of the 22nd International Conference on MachineLearning (ICML rsquo05) pp 593ndash600 ACMBonnGermany 2005
14 Journal of Robotics
[18] C Ye N H C Yung and D Wang ldquoA fuzzy controller withsupervised learning assisted reinforcement learning algorithmfor obstacle avoidancerdquo IEEETransactions on SystemsMan andCybernetics Part B Cybernetics vol 33 no 1 pp 17ndash27 2003
[19] B-Q Huang G-Y Cao and M Guo ldquoReinforcement learningneural network to the problem of autonomous mobile robotobstacle avoidancerdquo in Proceedings of the International Confer-ence on Machine Learning and Cybernetics (ICMLC rsquo05) vol 1pp 85ndash89 August 2005
[20] C F Touzet ldquoNeural reinforcement learning for behavioursynthesisrdquo Robotics and Autonomous Systems vol 22 no 3-4pp 251ndash281 1997
[21] W D Smart and L P Kaelbling ldquoEffective reinforcement learn-ing for mobile robotsrdquo in Proceedings of the IEEE InternationalConference on Robotics and Automation (ICRA rsquo02) vol 4 pp3404ndash3410 Washington DC USA May 2002
[22] K Macek I Petrovic and N Peric ldquoA reinforcement learningapproach to obstacle avoidance ofmobile robotsrdquo inProceedingsof the 7th International Workshop on Advanced Motion Control(AMC rsquo02) pp 462ndash466 IEEE Maribor Slovenia July 2002
[23] U Muller J Ben E Cosatto et al ldquoOff-road obstacle avoidancethrough end-to-end learningrdquo in Advances in Neural Informa-tion Processing Systems pp 739ndash746 2005
[24] B D Argall S Chernova M Veloso and B Browning ldquoAsurvey of robot learning from demonstrationrdquo Robotics andAutonomous Systems vol 57 no 5 pp 469ndash483 2009
[25] C L Nehaniv and K Dautenhahn ldquoLike me-measures ofcorrespondence and imitationrdquo Cybernetics amp Systems vol 32no 1-2 pp 11ndash51 2001
Figure 9 Trajectory encoding and retrieval using GMM and GMR for variables 119909 and 119889 in task 2
52 Experiments for Finding Control Strategy Experiments inthis part are mainly conducted to verify the derived controlpolicy For a Turtlebot robot the control input is linearvelocity
119909
119904and angular velocity
120579
119904 Assuming we have a
desired trajectory (the trajectory can be given arbitrarily inour experiments in order to find a reference we choose thesecond trajectory in our samples) using theories derived inprevious sections we can compute
119909
119904and
120579
119904 Iterating on
every time we can get a control sequence Comparing thecalculated value with the sensor recorded value we can verifythe validity of the derived results
Figure 12 compares the calculated linear velocity andangular velocity with the desired value Linear velocity isdivided into velocity in 119883 direction and 119884 direction Asthe figure in the second row shows expected velocity in 119884
direction remains close to 0This is because the control inputsfor Turtlebot are a linear velocity in the forward directionthat is 119883 direction and an angular velocity The line graphfor velocity 119884 reveals that there is a minor fluctuation at thebeginning and end stage while the overall trend is consistentwith the fact Velocity 119909 follows the expected value perfectlyConsidering Figure 12(b) it can be seen that the calculatedvalue remains consistent with the actual one It is worth
noting that the length of the calculated sequence differs fromthe actual one since the frequency of the readings fromrobotrsquos sensors is lower than the frequency of commandsIn this case to compare the similarity of the two signalssequence alignment is an indispensable step In this paper weuse dynamic time warping (DTW) to achieve this purposeSignals after alignments show a high correlation whichmeans the calculated values are correct
Figure 11 shows the location information obtained byintegrating velocity signals The calculated control inputsallow the robot to follow the given trajectory accuratelyexcept a minor fluctuation at the beginning time
6 Conclusion
In this paper based on the idea of the imitation learning wepropose a novel idea that the robotrsquos obstacle avoidance skillcan be learned by showing the robot how to avoid an obstacleUnder this consideration task constraints extraction is a keyissue that is how to make robots understand the task Inorder to achieve this purpose the Gaussian mixture model isused to encode the data GMR is used to retrieve satisfactory
12 Journal of Robotics
minus2
0
2120596
(rad
s)
10 20 30 400
t (s)
minus2
0
2
120596(r
ads
)
10 20 30 400
t (s)
0
2
4
dro
(m)
10 20 30 400
t (s)10 20 30 400
t (s)
0
2
4
dro
(m)
minus5
0
5
drg x
(m)
10 20 30 400
t (s)
minus5
0
5
drg x
(m)
10 20 30 400
t (s)
minus5
0
5
10 20 30 400
t (s)10 20 30 400
t (s)
minus5
0
5
drg y
(m)
drg y
(m)
Figure 10 Encoding and retrieval for orientation and distance information in task 2
Expected pathActual path
2 3 4 5 6 71x (m)
05
1
15
2
25
3
35
4
y(m
)
Figure 11 Tracking performance using the derived policy
Journal of Robotics 13
Expected valueCalculated value
50 100 150 2000t (s)
50 100 150 2000t (s)
minus202
Vx
(ms
)
minus101
Vy
(ms
)
(a) Linear velocity
Expected valueCalculated value
50 100 150 2000t (s)
minus202
120596(r
ads
)
(b) Angular velocity
Figure 12 Comparisons between the expected angular velocity andcalculated value
trajectories Eventually to obtain the control strategy a mul-tiobjective optimization problem is constructed The controlstrategy is derived by solving this problem Simulation exper-iments verified our algorithms
For the future improvement work mainly includes thefollowing aspects (i) in order to avoid obstacles onlineonline incremental GMM algorithm should be used (ii) inthis paper a time-based trajectory is generated by GMRthat is use of time signal as input A more reasonable ideais using state information as input so that the robot canchange its decision according to the current environmentfor example using distance information as input and velocityas output thus the control inputs can be directly generated(iii) reinforcement learning algorithms can be integrated tostrengthen robotsrsquo exploration ability so that better general-ization performance is achieved
Competing Interests
The authors declare that the grant scholarship andor fund-ing do not lead to any conflict of interests Additionally theauthors declare that there is no conflict of interests regardingthe publication of this manuscript
Acknowledgments
This work is supported by National Natural Science Founda-tion of China (Grant no 51505470) and the Dr Startup Fundin Liaoning province (20141152)
References
[1] E Masehian and D Sedighizadeh ldquoClassic and heuristicapproaches in robotmotion planningmdasha chronological reviewrdquo
World Academy of Science Engineering and Technology vol 23pp 101ndash106 2007
[2] P Raja and S Pugazhenthi ldquoOptimal path planning of mobilerobots a reviewrdquo International Journal of Physical Sciences vol7 no 9 pp 1314ndash1320 2012
[3] J C Latombe Robot Motion Planning Springer 2012[4] J Borenstein and Y Koren ldquoReal-time obstacle avoidance for
fast mobile robotsrdquo IEEE Transactions on Systems Man andCybernetics vol 19 no 5 pp 1179ndash1187 1989
[5] O Khatib ldquoReal-time obstacle avoidance for manipulators andmobile robotsrdquo International Journal of Robotics Research vol5 no 1 pp 90ndash98 1986
[6] H Mei Y Tian and L Zu ldquoA hybrid ant colony optimizationalgorithm for path planning of robot in dynamic environmentrdquoInternational Journal of Information Technology vol 12 no 3pp 78ndash88 2006
[7] M A P Garcia O Montiel O Castillo R Sepulveda and PMelin ldquoPath planning for autonomous mobile robot navigationwith ant colony optimization and fuzzy cost function evalua-tionrdquo Applied Soft Computing vol 9 no 3 pp 1102ndash1110 2009
[8] C Lopez-Franco J Zepeda N Arana-Daniel and L Lopez-Franco ldquoObstacle avoidance using PSOrdquo in Proceedings of the9th International Conference on Electrical Engineering Comput-ing Science and Automatic Control (CCE rsquo12) pp 1ndash6 MexicoCity Mexico September 2012
[9] H-QMin J-H Zhu andX-J Zheng ldquoObstacle avoidancewithmulti-objective optimization by PSO in dynamic environmentrdquoin Proceedings of the International Conference on MachineLearning and Cybernetics (ICMLC rsquo05) vol 5 pp 2950ndash2956IEEE Guangzhou China August 2005
[10] S Dutta ldquoObstacle avoidance of mobile robot using PSO-basedneuro fuzzy techniquerdquo International Journal of ComputerScience and Engineering vol 2 no 2 pp 301ndash304 2010
[11] W-G Han S-M Baek and T-Y Kuc ldquoGenetic algorithmbased path planning and dynamic obstacle avoidance of mobilerobotsrdquo in Proceedings of the IEEE International Conference onSystems Man and Cybernetics vol 3 pp 2747ndash2751 OrlandoFla USA October 1997
[12] J Tu and S X Yang ldquoGenetic algorithm based path planningfor a mobile robotrdquo in Proceedings of the IEEE InternationalConference on Robotics and Automation (ICRA rsquo03) vol 1 pp1221ndash1226 September 2003
[13] Y Zhang and J Wang ldquoObstacle avoidance for kinematicallyredundant manipulators using a dual neural networkrdquo IEEETransactions on Systems Man and Cybernetics Part B Cyber-netics vol 34 no 1 pp 752ndash759 2004
[14] X Wang V Yadav and S N Balakrishnan ldquoCooperativeUAV formation flying with obstaclecollision avoidancerdquo IEEETransactions on Control Systems Technology vol 15 no 4 pp672ndash679 2007
[15] J Godjevac ldquoA learning procedure for a fuzzy system applica-tion to obstacle avoidancerdquo in Proceedings of the InternationalSymposium on Fuzzy Logic pp 142ndash148 Zurich Switzerland1995
[16] R S Sutton and A G Barto Reinforcement Learning AnIntroduction MIT Press Boston Mass USA 1998
[17] J Michels A Saxena and A Y Ng ldquoHigh speed obstacleavoidance usingmonocular vision and reinforcement learningrdquoin Proceedings of the 22nd International Conference on MachineLearning (ICML rsquo05) pp 593ndash600 ACMBonnGermany 2005
14 Journal of Robotics
[18] C Ye N H C Yung and D Wang ldquoA fuzzy controller withsupervised learning assisted reinforcement learning algorithmfor obstacle avoidancerdquo IEEETransactions on SystemsMan andCybernetics Part B Cybernetics vol 33 no 1 pp 17ndash27 2003
[19] B-Q Huang G-Y Cao and M Guo ldquoReinforcement learningneural network to the problem of autonomous mobile robotobstacle avoidancerdquo in Proceedings of the International Confer-ence on Machine Learning and Cybernetics (ICMLC rsquo05) vol 1pp 85ndash89 August 2005
[20] C F Touzet ldquoNeural reinforcement learning for behavioursynthesisrdquo Robotics and Autonomous Systems vol 22 no 3-4pp 251ndash281 1997
[21] W D Smart and L P Kaelbling ldquoEffective reinforcement learn-ing for mobile robotsrdquo in Proceedings of the IEEE InternationalConference on Robotics and Automation (ICRA rsquo02) vol 4 pp3404ndash3410 Washington DC USA May 2002
[22] K Macek I Petrovic and N Peric ldquoA reinforcement learningapproach to obstacle avoidance ofmobile robotsrdquo inProceedingsof the 7th International Workshop on Advanced Motion Control(AMC rsquo02) pp 462ndash466 IEEE Maribor Slovenia July 2002
[23] U Muller J Ben E Cosatto et al ldquoOff-road obstacle avoidancethrough end-to-end learningrdquo in Advances in Neural Informa-tion Processing Systems pp 739ndash746 2005
[24] B D Argall S Chernova M Veloso and B Browning ldquoAsurvey of robot learning from demonstrationrdquo Robotics andAutonomous Systems vol 57 no 5 pp 469ndash483 2009
[25] C L Nehaniv and K Dautenhahn ldquoLike me-measures ofcorrespondence and imitationrdquo Cybernetics amp Systems vol 32no 1-2 pp 11ndash51 2001
Figure 10 Encoding and retrieval for orientation and distance information in task 2
Expected pathActual path
2 3 4 5 6 71x (m)
05
1
15
2
25
3
35
4
y(m
)
Figure 11 Tracking performance using the derived policy
Journal of Robotics 13
Expected valueCalculated value
50 100 150 2000t (s)
50 100 150 2000t (s)
minus202
Vx
(ms
)
minus101
Vy
(ms
)
(a) Linear velocity
Expected valueCalculated value
50 100 150 2000t (s)
minus202
120596(r
ads
)
(b) Angular velocity
Figure 12 Comparisons between the expected angular velocity andcalculated value
trajectories Eventually to obtain the control strategy a mul-tiobjective optimization problem is constructed The controlstrategy is derived by solving this problem Simulation exper-iments verified our algorithms
For the future improvement work mainly includes thefollowing aspects (i) in order to avoid obstacles onlineonline incremental GMM algorithm should be used (ii) inthis paper a time-based trajectory is generated by GMRthat is use of time signal as input A more reasonable ideais using state information as input so that the robot canchange its decision according to the current environmentfor example using distance information as input and velocityas output thus the control inputs can be directly generated(iii) reinforcement learning algorithms can be integrated tostrengthen robotsrsquo exploration ability so that better general-ization performance is achieved
Competing Interests
The authors declare that the grant, scholarship, and/or funding do not lead to any conflict of interests. Additionally, the authors declare that there is no conflict of interests regarding the publication of this manuscript.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (Grant no. 51505470) and the Dr. Startup Fund in Liaoning Province (20141152).
References
[1] E. Masehian and D. Sedighizadeh, "Classic and heuristic approaches in robot motion planning—a chronological review," World Academy of Science, Engineering and Technology, vol. 23, pp. 101–106, 2007.
[2] P. Raja and S. Pugazhenthi, "Optimal path planning of mobile robots: a review," International Journal of Physical Sciences, vol. 7, no. 9, pp. 1314–1320, 2012.
[3] J. C. Latombe, Robot Motion Planning, Springer, 2012.
[4] J. Borenstein and Y. Koren, "Real-time obstacle avoidance for fast mobile robots," IEEE Transactions on Systems, Man, and Cybernetics, vol. 19, no. 5, pp. 1179–1187, 1989.
[5] O. Khatib, "Real-time obstacle avoidance for manipulators and mobile robots," International Journal of Robotics Research, vol. 5, no. 1, pp. 90–98, 1986.
[6] H. Mei, Y. Tian, and L. Zu, "A hybrid ant colony optimization algorithm for path planning of robot in dynamic environment," International Journal of Information Technology, vol. 12, no. 3, pp. 78–88, 2006.
[7] M. A. P. Garcia, O. Montiel, O. Castillo, R. Sepulveda, and P. Melin, "Path planning for autonomous mobile robot navigation with ant colony optimization and fuzzy cost function evaluation," Applied Soft Computing, vol. 9, no. 3, pp. 1102–1110, 2009.
[8] C. Lopez-Franco, J. Zepeda, N. Arana-Daniel, and L. Lopez-Franco, "Obstacle avoidance using PSO," in Proceedings of the 9th International Conference on Electrical Engineering, Computing Science and Automatic Control (CCE '12), pp. 1–6, Mexico City, Mexico, September 2012.
[9] H.-Q. Min, J.-H. Zhu, and X.-J. Zheng, "Obstacle avoidance with multi-objective optimization by PSO in dynamic environment," in Proceedings of the International Conference on Machine Learning and Cybernetics (ICMLC '05), vol. 5, pp. 2950–2956, IEEE, Guangzhou, China, August 2005.
[10] S. Dutta, "Obstacle avoidance of mobile robot using PSO-based neuro fuzzy technique," International Journal of Computer Science and Engineering, vol. 2, no. 2, pp. 301–304, 2010.
[11] W.-G. Han, S.-M. Baek, and T.-Y. Kuc, "Genetic algorithm based path planning and dynamic obstacle avoidance of mobile robots," in Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, vol. 3, pp. 2747–2751, Orlando, Fla, USA, October 1997.
[12] J. Tu and S. X. Yang, "Genetic algorithm based path planning for a mobile robot," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '03), vol. 1, pp. 1221–1226, September 2003.
[13] Y. Zhang and J. Wang, "Obstacle avoidance for kinematically redundant manipulators using a dual neural network," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 34, no. 1, pp. 752–759, 2004.
[14] X. Wang, V. Yadav, and S. N. Balakrishnan, "Cooperative UAV formation flying with obstacle/collision avoidance," IEEE Transactions on Control Systems Technology, vol. 15, no. 4, pp. 672–679, 2007.
[15] J. Godjevac, "A learning procedure for a fuzzy system: application to obstacle avoidance," in Proceedings of the International Symposium on Fuzzy Logic, pp. 142–148, Zurich, Switzerland, 1995.
[16] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, Boston, Mass, USA, 1998.
[17] J. Michels, A. Saxena, and A. Y. Ng, "High speed obstacle avoidance using monocular vision and reinforcement learning," in Proceedings of the 22nd International Conference on Machine Learning (ICML '05), pp. 593–600, ACM, Bonn, Germany, 2005.
[18] C. Ye, N. H. C. Yung, and D. Wang, "A fuzzy controller with supervised learning assisted reinforcement learning algorithm for obstacle avoidance," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 33, no. 1, pp. 17–27, 2003.
[19] B.-Q. Huang, G.-Y. Cao, and M. Guo, "Reinforcement learning neural network to the problem of autonomous mobile robot obstacle avoidance," in Proceedings of the International Conference on Machine Learning and Cybernetics (ICMLC '05), vol. 1, pp. 85–89, August 2005.
[20] C. F. Touzet, "Neural reinforcement learning for behaviour synthesis," Robotics and Autonomous Systems, vol. 22, no. 3-4, pp. 251–281, 1997.
[21] W. D. Smart and L. P. Kaelbling, "Effective reinforcement learning for mobile robots," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '02), vol. 4, pp. 3404–3410, Washington, DC, USA, May 2002.
[22] K. Macek, I. Petrovic, and N. Peric, "A reinforcement learning approach to obstacle avoidance of mobile robots," in Proceedings of the 7th International Workshop on Advanced Motion Control (AMC '02), pp. 462–466, IEEE, Maribor, Slovenia, July 2002.
[23] U. Muller, J. Ben, E. Cosatto, et al., "Off-road obstacle avoidance through end-to-end learning," in Advances in Neural Information Processing Systems, pp. 739–746, 2005.
[24] B. D. Argall, S. Chernova, M. Veloso, and B. Browning, "A survey of robot learning from demonstration," Robotics and Autonomous Systems, vol. 57, no. 5, pp. 469–483, 2009.
[25] C. L. Nehaniv and K. Dautenhahn, "Like me?-measures of correspondence and imitation," Cybernetics & Systems, vol. 32, no. 1-2, pp. 11–51, 2001.