Rewards-Driven Control of Robot Arm by Decoding EEG Signals
Ajay Kumar Tanwani1, José del R. Millán2, Aude Billard1
Abstract— Decoding the user intention from non-invasive EEG signals is a challenging problem. In this paper, we study the feasibility of predicting the goal for controlling a robot arm in self-paced reaching movements, i.e., spontaneous movements that do not require an external cue. Our proposed system continuously estimates the goal throughout a trial, starting before the movement onset, by online classification, and generates optimal trajectories for driving the robot arm to the estimated goal. Experiments using EEG signals of one healthy subject (right arm) yield smooth reaching movements of the simulated 7 degrees of freedom KUKA robot arm in a planar center-out reaching task, with approximately 80% accuracy of reaching the actual goal.
I. INTRODUCTION
The world around us is going to change markedly with the use of wearable robotic devices assisting humans in everyday tasks. Brain-Machine Interfaces (BMIs) are envisioned to facilitate this integration in the most 'natural' way. Decoding brain signals for controlling these devices, however, poses all kinds of challenges to existing machine learning and control techniques, due to the high-dimensional and non-stationary nature of the data along with the large variability across users. Despite these efforts, there has been little focus on understanding the high-level intention of the user in decoding brain signals; a fundamental characteristic for practical implementation of such devices.
This paper investigates the use of slow cortical EEG signals in decoding the intention of the user for self-paced reaching movements of a robot arm. Intention here refers to an early plan to move that represents a high-level state, such as the desired goal to reach, as compared to the low-level muscle activations for executing the movement. Contrary to decoding cue-based movements [1], [2], we consider self-paced reaching movements where the user spontaneously executes the movement without an external cue. Such reaching movements tend to better encapsulate the natural motor behaviour in humans. In this paper, we continuously estimate the current goal/intention throughout the trial, starting prior to the movement onset. Previous studies indicate the modulation of slow cortical EEG signals by the intention to move [3]. The decoded goal is used in the reward function to generate optimal trajectories for driving the robot arm to the goal. Our proposed trajectory decoder is easy to learn and generalizes effectively to unseen parts of the robot workspace. The integrated framework combines the high-level goals encoded in EEG signals with low-level motion plans to control the robot arm in continuous task space. The target application of this work is to use EEG signals for direct motor control of patients with possibly severe upper-limb disabilities.
1 A. K. Tanwani and A. Billard are with the Learning Algorithms and Systems Laboratory (LASA), Ecole Polytechnique Federale de Lausanne (EPFL), Switzerland. 2 J. d. R. Millán is the head of the Defitech Chair on Non-Invasive Brain-Machine Interface (CNBI), Center for Neuroprosthetics, EPFL, Switzerland. {ajay.tanwani, jose.millan, aude.billard} at epfl.ch
II. MATERIALS & METHODS
A. Experiments
Experiments were designed to perform center-out planar reaching movements to four goal targets in cardinal directions located 10 cm away from the center, while holding the PHANTOM robotic arm. Four subjects – two healthy and two stroke patients – participated in the experiment carried out at the San Camillo Hospital, Venice, Italy. One patient had a left paretic arm following a left cerebellar hemorrhagic stroke 2 months earlier, while the other had a right paretic arm resulting from a left nucleo-capsular stroke 2 years earlier. After the target was shown to the subject, the subject was asked to wait for at least 2 seconds before performing a self-paced movement (see [3] for details of the experimental set-up). For each arm, subjects performed three runs of 80 trials each (20 trials per target). Trials were extracted ranging from 2 s before the movement onset until 1 s after the task. For brevity, we only report results of the right arm of the first healthy subject in this work.
The EEG and EOG signals were simultaneously recorded with a portable BioSemi ActiveTwo system using 64 electrodes arranged in an extended 10/20 montage. EOG channels were placed above the nasion and below the outer canthi of both eyes in order to capture horizontal and vertical EOG components. The kinematics data of the robotic arm was recorded at 100 Hz, while EEG signals were captured at 2048 Hz and then downsampled to 256 Hz. Preprocessing of the EEG data included a Common Average Referencing (CAR) procedure to remove the global background activity [5]. Moreover, only 34 EEG channels were selected, excluding the peripheral channels and those having high correlation with the EOG activity. EEG signals were then passed through a zero-phase low-pass Butterworth filter with a cut-off frequency of 120 Hz, further downsampled to 128 Hz, and finally low-pass filtered at 1 Hz to extract slow cortical potentials. Each EEG channel and kinematic signal was normalized to have zero mean and unit standard deviation.
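As an illustration, the preprocessing chain above can be sketched as follows. This is a minimal sketch, not the authors' code; the filter orders (4 and 2) and the array layout are assumptions, and the decimation from 256 Hz to 128 Hz is done by simple subsampling after the zero-phase anti-alias filter.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess(eeg, fs=256.0):
    """eeg: (n_samples, n_channels) array already downsampled to 256 Hz."""
    # Common Average Referencing: remove the mean across channels at each sample
    eeg = eeg - eeg.mean(axis=1, keepdims=True)
    # Zero-phase low-pass Butterworth at 120 Hz (assumed order 4)
    b, a = butter(4, 120.0 / (fs / 2.0), btype='low')
    eeg = filtfilt(b, a, eeg, axis=0)
    # Downsample 256 Hz -> 128 Hz
    eeg, fs = eeg[::2], fs / 2.0
    # Low-pass at 1 Hz to extract slow cortical potentials (assumed order 2)
    b, a = butter(2, 1.0 / (fs / 2.0), btype='low')
    eeg = filtfilt(b, a, eeg, axis=0)
    # Normalize each channel to zero mean and unit standard deviation
    eeg = (eeg - eeg.mean(axis=0)) / eeg.std(axis=0)
    return eeg, fs
```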
B. Framework
1) Intention/Goal Decoder: To decode the intention/goal in the EEG signals, we perform online classification in a sliding window of 250 ms that shifts by 62.5 ms within the
978-1-4244-7929-0/14/$26.00 ©2014 IEEE 1658
Fig. 1: Evolving EEG channels activity in the time interval [−1 1] seconds.
trial period of [−2 1] seconds. Note that we start to decode the goal prior to the movement onset to minimize any delays in controlling the arm (see [3] for details). For each of these windows, features are selected separately using Canonical Variate Analysis (CVA) with 5-fold cross-validation, taking one EEG sample per window at the end. The 10 EEG channels with the best discriminant power are selected in each window to classify among the 4 target goals. For classification, the EEG data is further downsampled to 16 Hz, taking 4 samples of the 10 EEG channels for a total of 40 features. Linear Discriminant Analysis (LDA) [6] is then used for predicting the goal estimate x_g in every time window from the given EEG feature vector. For the EEG feature vector represented by u_t at time instant t, the classification of the goal x_g^t is based on the probability of belonging to each of the goals:
x_g^t = f(u_t) = arg max_{i=1,...,4} P(C = x_g^(i) | u_t)    (1)
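A minimal sketch of the window-wise classification of Eq. (1), using scikit-learn's LDA; the CVA-based channel selection is omitted here, and the 40-dimensional feature vectors (4 time samples × 10 channels) are assumed to be precomputed. This is an illustration, not the authors' implementation.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def decode_goal(features_train, labels_train, feature_test):
    """features_train: (n_trials, 40) window features; labels_train: goal index
    in {0,1,2,3}; feature_test: (40,) feature vector u_t for the current window."""
    lda = LinearDiscriminantAnalysis()
    lda.fit(features_train, labels_train)
    probs = lda.predict_proba(feature_test.reshape(1, -1))[0]
    # Eq. (1): pick the goal with the highest posterior probability
    return int(np.argmax(probs)), probs
```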
2) Trajectory Decoder: The goal of the trajectory decoder is to continuously generate motion plans to drive the robot arm to the goal. In this paper, we represent this decoder with a dynamical system of the form:
ẋ̄ = f(x̄) + ε    (2)
where f is a continuously differentiable function that maps the 2D planar Cartesian position of the robot arm x to its Cartesian velocity ẋ. For simplicity, we transform the coordinates to x̄ = x − x_g, which shifts all goal positions to the origin of the transformed system. The evolution of the robot motion can be computed by integrating Eq. 2. Let α ∈ R^n represent the parameters of the function f. We are required to learn the parameters α such that the robot follows the intended movement of the user. To this end, we take a two-step methodology: 1) learn the initial function from demonstrations of the hand kinematics recorded from the subjects using Programming by Demonstration (PbD) [7], and 2) optimize the function parameters for effective generalization using Reinforcement Learning (RL) [8].
In the first stage, we use Support Vector Regression (SVR) to estimate the initial function f_i given data samples {x̄, ẋ̄} from the experiments, represented as:
ẋ̄ = f_i(x̄) = α^T φ(x̄) + b    (3)
where α represents the weights of the support vectors, φ(x̄) is the projection of the data x̄ into the n-dimensional feature space, and b is a constant bias. Note that each output dimension is learned separately in this model. To speed up the learning process, we downsample the kinematic data to 5 Hz for a total of 750 samples, corresponding to the right arm of the first subject in the training set. Hyper-parameters of the SVR are obtained by grid-search: size of the epsilon-tube ε = 0.5, width of the radial basis kernel function γ = 0.5, and complexity parameter C = 1.
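The first-stage regression can be sketched with scikit-learn's epsilon-SVR, one regressor per output dimension as described above, using the reported hyper-parameters (C = 1, ε = 0.5, γ = 0.5); the data layout is an assumption.

```python
import numpy as np
from sklearn.svm import SVR

def fit_initial_dynamics(X_bar, Xdot_bar):
    """X_bar: (n, 2) goal-centred positions; Xdot_bar: (n, 2) velocities.
    Returns a callable f_i mapping positions to predicted velocities (Eq. 3)."""
    # One epsilon-SVR with RBF kernel per output dimension
    models = [SVR(kernel='rbf', C=1.0, epsilon=0.5, gamma=0.5).fit(X_bar, Xdot_bar[:, d])
              for d in range(Xdot_bar.shape[1])]

    def f_i(x_bar):
        x_bar = np.atleast_2d(np.asarray(x_bar, dtype=float))
        return np.column_stack([m.predict(x_bar) for m in models])

    return f_i
```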
In the second stage, we modify the landscape of the learned function to generate optimal trajectories in the whole state space by maximizing a reward function. The rationale here is to decode the movement effectively far from the training data (see Fig. 4 for clarity). Moreover, the optimization in the second stage caters for imperfection or sub-optimality in the recorded demonstrations (for example, demonstrations of subjects suffering from stroke). We express the reward function r(x̄) as:
r(x̄) = −w_1 x̄_f^T x̄_f − w_2 ẋ̄_f^T ẋ̄_f − w_3 ẍ̄_t^T ẍ̄_t    (4)
where w_1 weighs the cost for the distance from the goal/origin at the end of the trial, w_2 penalizes any non-zero velocity at the end of the trial, and w_3 is responsible for ensuring a smooth movement in reaching the goal by minimizing the norm of the acceleration vector. The weights of the reward function after manual tuning are: w_1 = 5, w_2 = 0.01, w_3 = 0.0001. The maximum velocity ẋ_max is set to 30 cm/s, and the simulations are carried out until t = 2 seconds to prolong the penalty of w_1 and w_2 after the end of the trial at t = 1 second.
Support vectors of the initial function act as basis functions for the optimized function f_o in the second stage. The weights of the support vectors α are optimized by stochastic gradient ascent on the value function, J(x̄) = (1/T) Σ_{t=0}^{T} r(x̄_t). More precisely, we add noise η, sampled from a multivariate Gaussian with mean 0 and covariance matrix σ²I with σ = 0.1, to the parameters α; evaluate the value function, J(α + η), from episodic roll-outs of the current optimized function, ẋ̄ = f_o(x̄); and adjust the parameter vector in the direction of increasing value function, i.e.,
∆α = β (J(α + η) − J(α))    (5)
where β is a small step-size parameter, set to 0.05 in our experiments. The procedure is repeated until the parameter vector stops changing. In our experiments, the parameter vector is improved for 1500 iterations, which increases the
Fig. 2: Decoding goal direction from EEG signals of the first healthy subject (right arm). The red line shows the chance level; the green line indicates the time instant when the classification accuracy significantly exceeds the chance level; the shaded region shows the variation in accuracy over 5 folds.
value J(α) of the function parameters from −118.1 to −4.81.
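The second-stage optimization loop can be sketched as a weight-perturbation method. In this sketch the scalar return difference is multiplied by the probe direction η so that the update to the parameter vector is well defined (an assumption on top of Eq. (5)), and the hypothetical `evaluate_J` stands in for the episodic roll-outs of the perturbed dynamics.

```python
import numpy as np

def optimize_weights(alpha, evaluate_J, beta=0.05, sigma=0.1, n_iters=1500, seed=0):
    """Weight-perturbation ascent on the episodic return J (cf. Eq. 5).
    evaluate_J: callable returning the return J for a given parameter vector."""
    rng = np.random.default_rng(seed)
    for _ in range(n_iters):
        eta = rng.normal(0.0, sigma, size=alpha.shape)   # eta ~ N(0, sigma^2 I)
        dJ = evaluate_J(alpha + eta) - evaluate_J(alpha)  # finite difference of J
        alpha = alpha + beta * dJ * eta                   # ascend along the probe
    return alpha
```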
In the proposed framework, the attractor of the optimized dynamical system is shifted from the origin to the goal estimated from Eq. (1), which is updated after every time window of 250 milliseconds. After the end of the trial, the optimized dynamical system moves the robot arm to the goal last estimated at t = 1 second. Mathematically, the optimized dynamical system takes the form:
ẋ = f_o(x − x_g^t)    (6)
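Driving the arm with the optimized dynamics of Eq. (6) amounts to forward integration while the goal estimate is refreshed; a simple Euler sketch follows (the integration scheme and step size are assumptions, and `goal_at` is a hypothetical hook returning the current goal estimate x_g^t).

```python
import numpy as np

def rollout(f_o, x0, goal_at, dt=0.01, T=2.0):
    """Euler-integrate xdot = f_o(x - x_g^t); goal_at(t) returns the current
    goal estimate, which may change every 250 ms classification window."""
    x = np.asarray(x0, dtype=float)
    traj = [x.copy()]
    for k in range(int(round(T / dt))):
        x = x + dt * f_o(x - goal_at(k * dt))  # Euler step of Eq. (6)
        traj.append(x.copy())
    return np.array(traj)
```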
III. RESULTS
A. Decoding Goal
To analyse the performance of the goal decoder from EEG signals, we show topographic plots of the selected channels to depict their discriminatory power at different time instants, starting 1 second before the movement onset
TABLE I: Performance comparison of the initial and optimized dynamical systems using: MSE on the testing set; average correlation in time between simulated and demonstrated position trajectories on the testing set; and end-point distance from the goal for different initial conditions.

Trajectory Decoder   MSE (cm/s²)   Correlation [0 1]   End-Point Distance (cm)
Initial SVR              2.49            0.51                  5.157
Optimized SVR             –              0.23                  0.09
in Fig. 1. As the exact time when movement intent occurs in a self-paced movement is unclear, the plots can provide insights about movement-related modulations in different brain regions during planning, and how they evolve over time. It is seen that the activity is dominant in the frontal-parietal regions of the brain, consistent with earlier reported studies [3].
Fig. 2 reports the classification accuracy of the goal decoder in the time window [−1 1] seconds. Classification accuracy is computed as the ratio between the sum of correctly classified diagonal entries in the confusion matrix and the total number of instances. The time instant when the classification accuracy significantly exceeds the chance level is used as a
Fig. 3: Performance of the initial learned function with SVR. Black crosses indicate the initial positions, while green circles denote the position at the end of the trial.
Fig. 4: Performance of the optimized function with SVR. Black crosses indicate the initial positions, while green circles denote the position at the end of the trial. Different initial conditions converge to the goal.
Fig. 5: Simulated trajectories of the KUKA robot performing the center-out reaching task.
metric to initiate the movement with the trajectory decoder. The chance level is calculated by training the classifier on a randomized permutation of the class labels of the training set. Results are then averaged across 10 iterations, each with 5-fold cross-validation. The best time for subject 1 is 687.5 ms before the movement onset, with a classification accuracy of 0.34 (marked with a green line in Fig. 2). It is seen that the classification accuracy gradually improves afterwards, with a peak accuracy of 0.85 at 0.5 seconds. Further experiments to evaluate the false positive rate, i.e., detecting an intention to move when there is actually no movement, are the subject of future work.
B. Decoding Trajectories
We evaluate the performance of our trajectory decoder using three metrics: 1) Mean-Square Error (MSE) on the training/testing set, 2) correlation in time of the simulated position trajectories with the demonstrated ones, and 3) distance to the goal at the end of the trial, computed by simulating the system from 12 different initial conditions. Table I summarizes the performance of the initial and the optimized SVR. The initial dynamical system learned using SVR performs well in terms of MSE, with training and testing errors of 2.66 and 2.49 cm/s² respectively, and a high position correlation of 0.51 with the demonstrated trajectories. To evaluate the performance of the system far from the training data, we sample 12 different initial points in the plane (shown in Fig. 3 with crosses) and integrate the system forward in time for a period of 2 seconds. As seen in Fig. 3, the initial dynamical system with SVR is not able to generalize away from the training data, yielding a high end-point distance error of 5.157 cm. Note that the initial conditions in the cardinal directions correspond to the training set. On the other hand, the optimized SVR is able to drive the robot arm to the goal from all the sampled initial conditions (see Fig. 4). This comes at the cost of a relatively low position correlation of 0.23, suggesting the need to further improve the reward function. This generalization is very desirable in our application, since the user is expected to control the arm from all parts of the state space.
In Fig. 5, we test the performance of the integrated system on the simulated 7 degrees of freedom KUKA robotic arm. The optimized dynamical system starts to move the robot arm 687.5 milliseconds before the movement onset and finally guides the robot arm to the last estimated goal at the end of the trial. Across all the trials, the robot arm reaches the actual goal with a net accuracy of 79.5% on average. The figure shows simulated trajectories of the robotic arm reaching different goal positions, following the predicted goal from the intention decoder and the optimal motion plans from the trajectory decoder.
IV. CONCLUSIONS
In this paper, we have presented a system that decodes the intention of the user using non-invasive slow cortical EEG signals and generates optimal motion plans to drive the robot arm to the goal. The most desirable properties of the system include detection of the goal direction before the movement onset and generalization of the motion plans away from the training data. In the future, we would like to evaluate the performance of the system when the user changes his intention/goal direction during execution of the movement. It will also be interesting to incorporate the EEG signals in our trajectory decoder, similar to the works in [9] and [10].
ACKNOWLEDGEMENT
The authors thank E. Lew, R. Chavarriaga and I. Iturrate for their helpful insights. This work is supported by the National Center of Competence in Research in Robotics (NCCR Robotics).
REFERENCES
[1] S. Waldert, H. Preissl, E. Demandt, C. Braun, N. Birbaumer, A. Aertsen et al., "Hand movement direction decoded from MEG and EEG," Journal of Neuroscience, 28(4), pp. 1000-1008, 2008.
[2] S. Musallam, B. D. Corneil, B. Greger, H. Scherberger, and R. A. Andersen, "Cognitive control signals for neural prosthetics," Science, 305(5681), pp. 258-262, 2004.
[3] E. Lew, R. Chavarriaga, S. Silvoni, and J. d. R. Millán, "Detection of self-paced reaching movement intention from EEG signals," Front Neuroeng, 5(13), 2012.
[4] E. Lew, R. Chavarriaga, S. Silvoni, and J. d. R. Millán, "Single trial prediction of self-paced reaching directions from EEG signals," Front Neuroprosthesis, 2014 (submitted).
[5] O. Bertrand, F. Perrin, and J. Pernier, "A theoretical justification of the average reference in topographic evoked potential studies," Electroencephalography and Clinical Neurophysiology, 62, pp. 462-464, 1985.
[6] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, New York: John Wiley, Section 10.1, 2001.
[7] A. Billard, S. Calinon, R. Dillmann, and S. Schaal, "Survey: Robot Programming by Demonstration," Handbook of Robotics, chapter 59, 2008.
[8] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.
[9] T. J. Bradberry, R. J. Gentili, and J. L. Contreras-Vidal, "Reconstructing three-dimensional hand movements from noninvasive electroencephalographic signals," J. Neurosci., 30(9), pp. 3432-3437, 2010.
[10] N. J. Beuchat, R. Chavarriaga, S. Degallier, and J. d. R. Millán, "Offline Decoding of Upper Limb Muscle Synergies from EEG Slow Cortical Potentials," 35th Annual Conference on Engineering in Medicine and Biology Society, 2013.