Rewards-Driven Control of Robot Arm by Decoding EEG Signals
Ajay Kumar Tanwani1, José del R. Millán2, Aude Billard1
Abstract— Decoding the user intention from non-invasive EEG signals is a challenging problem. In this paper, we study the feasibility of predicting the goal for controlling a robot arm in self-paced reaching movements, i.e., spontaneous movements that do not require an external cue. Our proposed system continuously estimates the goal throughout a trial, starting before the movement onset, by online classification, and generates optimal trajectories for driving the robot arm to the estimated goal. Experiments using EEG signals of one healthy subject (right arm) yield smooth reaching movements of the simulated 7 degrees of freedom KUKA robot arm in a planar center-out reaching task, with approximately 80% accuracy of reaching the actual goal.
I. INTRODUCTION
The world around us is going to change markedly with the use of wearable robotic devices assisting humans in everyday tasks. Brain-Machine Interfaces (BMIs) are envisioned to facilitate this integration in the most 'natural' way. Decoding brain signals for controlling these devices, however, poses all kinds of challenges to existing machine learning and control techniques, due to the high-dimensional and non-stationary nature of the data along with the large variability across users. Despite these efforts, there has been little focus on understanding the high-level intention of the user in decoding brain signals; a fundamental characteristic for practical implementation of such devices.
This paper investigates the use of slow cortical EEG signals in decoding the intention of the user for self-paced reaching movements of a robot arm. Intention here refers to an early plan to move that represents a high-level state, such as the desired goal to reach, as compared to the low-level muscle activations for executing the movement. Contrary to decoding cue-based movements [1], [2], we consider self-paced reaching movements where the user spontaneously executes the movement without an external cue. Such reaching movements tend to better encapsulate the natural motor behaviour in humans. In this paper, we continuously estimate the current goal/intention throughout the trial, starting prior to the movement onset. Previous studies indicate the modulation of slow cortical EEG signals by the intention to move [3]. The decoded goal is used in the reward function to generate optimal trajectories for driving the robot arm to the goal. Our proposed trajectory decoder is easy to learn and generalizes effectively to unseen parts of the robot workspace. The integrated framework combines the high-level goals encoded in EEG signals with low-level motion plans to control the robot arm in continuous task space. The target application of this work is to use EEG signals for direct motor control of patients with possibly severe upper-limb disabilities.
1 A. K. Tanwani and A. Billard are with the Learning Algorithms and Systems Laboratory (LASA), Ecole Polytechnique Federale de Lausanne (EPFL), Switzerland. 2 J. d. R. Millán is the head of the Defitech Chair on Non-Invasive Brain-Machine Interface (CNBI), Center for Neuroprosthetics, EPFL, Switzerland. {ajay.tanwani, jose.millan, aude.billard} at epfl.ch
II. MATERIALS & METHODS
A. Experiments
Experiments were designed to perform center-out planar reaching movements to four goal targets in cardinal directions located 10 cm away from the center, while holding the PHANTOM robotic arm. Four subjects – two healthy and two stroke patients – participated in the experiment carried out at the San Camillo Hospital, Venice, Italy. One patient had a left paretic arm following a left cerebellar hemorrhagic stroke 2 months earlier, while the other had a right paretic arm resulting from a left nucleo-capsular stroke 2 years earlier. After the target was shown to the subject, the subject was asked to wait for at least 2 seconds before performing a self-paced movement (see [3] for details of the experimental set-up). For each arm, subjects performed three runs of 80 trials each (20 trials per target). Trials were extracted ranging from 2 s before the movement onset until 1 s after the task. For brevity, we only report results of the right arm of the first healthy subject in this work.
The EEG and EOG signals were simultaneously recorded with a portable BioSemi ActiveTwo system using 64 electrodes arranged in an extended 10/20 montage. EOG channels were placed above the nasion and below the outer canthi of both eyes in order to capture horizontal and vertical EOG components. The kinematics data of the robotic arm was recorded at 100 Hz, while EEG signals were captured at 2048 Hz and then downsampled to 256 Hz. Preprocessing of the EEG data included a Common Average Referencing (CAR) procedure to remove the global background activity [5]. Moreover, only 34 EEG channels were selected, excluding the peripheral channels and those having high correlation with the EOG activity. EEG signals were then passed through a zero-phase low-pass Butterworth filter with a cut-off frequency of 120 Hz, further downsampled to 128 Hz, and finally low-pass filtered at 1 Hz to extract slow cortical potentials. Each EEG channel and kinematic signal was normalized to have zero mean and unit standard deviation.
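As an illustration, the preprocessing chain above can be sketched as follows. This is a minimal sketch, not the authors' code; the filter orders (4 and 2) and the array layout are assumptions, and the decimation from 256 Hz to 128 Hz is done by simple subsampling after the zero-phase anti-alias filter.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess(eeg, fs=256.0):
    """eeg: (n_samples, n_channels) array already downsampled to 256 Hz."""
    # Common Average Referencing: remove the mean across channels at each sample
    eeg = eeg - eeg.mean(axis=1, keepdims=True)
    # Zero-phase low-pass Butterworth at 120 Hz (assumed order 4)
    b, a = butter(4, 120.0 / (fs / 2.0), btype='low')
    eeg = filtfilt(b, a, eeg, axis=0)
    # Downsample 256 Hz -> 128 Hz
    eeg, fs = eeg[::2], fs / 2.0
    # Low-pass at 1 Hz to extract slow cortical potentials (assumed order 2)
    b, a = butter(2, 1.0 / (fs / 2.0), btype='low')
    eeg = filtfilt(b, a, eeg, axis=0)
    # Normalize each channel to zero mean and unit standard deviation
    eeg = (eeg - eeg.mean(axis=0)) / eeg.std(axis=0)
    return eeg, fs
```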
B. Framework
1) Intention/Goal Decoder: To decode the intention/goal in the EEG signals, we perform online classification in a sliding window of 250 ms that shifts by 62.5 ms within the
978-1-4244-7929-0/14/$26.00 ©2014 IEEE 1658
Fig. 1: Evolving EEG channels activity in the time interval [−1 1] seconds.
trial period of [−2 1] seconds. Note that we start to decode the goal prior to the movement onset to minimize any delays in controlling the arm (see [3] for details). For each of these windows, features are selected separately using Canonical Variate Analysis (CVA) with 5-fold cross-validation, taking one EEG sample per window at the end. The 10 EEG channels with the best discriminant power are selected in each window to classify among the 4 target goals. For classification, the EEG data is further downsampled to 16 Hz, taking 4 samples of the 10 EEG channels for a total of 40 features. Linear Discriminant Analysis (LDA) [6] is then used for predicting the goal estimate x_g in every time window from the given EEG feature vector. For the EEG feature vector represented by u_t at time instant t, the classification of the goal x_g^t is based on the probability of belonging to each of the goals:
x_g^t = f(u_t) = arg max_{i=1,...,4} P(C = x_g^(i) | u_t)    (1)
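A minimal sketch of the window-wise classification of Eq. (1), using scikit-learn's LDA; the CVA-based channel selection is omitted here, and the 40-dimensional feature vectors (4 time samples × 10 channels) are assumed to be precomputed. This is an illustration, not the authors' implementation.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def decode_goal(features_train, labels_train, feature_test):
    """features_train: (n_trials, 40) window features; labels_train: goal index
    in {0,1,2,3}; feature_test: (40,) feature vector u_t for the current window."""
    lda = LinearDiscriminantAnalysis()
    lda.fit(features_train, labels_train)
    probs = lda.predict_proba(feature_test.reshape(1, -1))[0]
    # Eq. (1): pick the goal with the highest posterior probability
    return int(np.argmax(probs)), probs
```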
2) Trajectory Decoder: The goal of the trajectory decoder is to continuously generate motion plans to drive the robot arm to the goal. In this paper, we represent this decoder with a dynamical system of the form:
ẋ̄ = f(x̄) + ε    (2)
where f is a continuously differentiable function that maps the 2D planar Cartesian position of the robot arm x to its Cartesian velocity ẋ. For simplicity, we transform the coordinates to x̄ = x − x_g, which shifts all goal positions to the origin of the transformed system. The evolution of the robot motion can be computed by integrating Eq. 2. Let α ∈ R^n represent the parameters of the function f. We are required to learn the parameters α such that the robot follows the intended movement of the user. To this end, we take a two-step methodology: 1) learn the initial function from demonstrations of the hand kinematics recorded from the subjects using Programming by Demonstration (PbD) [7], and 2) optimize the function parameters for effective generalization using Reinforcement Learning (RL) [8].
In the first stage, we use Support Vector Regression (SVR) to estimate the initial function f_i given data samples {x̄, ẋ̄} from the experiments, represented as:
ẋ̄ = f_i(x̄) = α^T φ(x̄) + b    (3)
where α represents the weights of the support vectors, φ(x̄) is the projection of the data x̄ into the n-dimensional feature space, and b is a constant bias. Note that each output dimension is learned separately in this model. To speed up the learning process, we downsample the kinematic data to 5 Hz for a total of 750 samples, corresponding to the right arm of the first subject in the training set. Hyper-parameters of the SVR are obtained by grid-search: size of the epsilon-tube ε = 0.5, width of the radial basis kernel function γ = 0.5, and complexity parameter C = 1.
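The first-stage regression can be sketched with scikit-learn's epsilon-SVR, one regressor per output dimension as described above, using the reported hyper-parameters (C = 1, ε = 0.5, γ = 0.5); the data layout is an assumption.

```python
import numpy as np
from sklearn.svm import SVR

def fit_initial_dynamics(X_bar, Xdot_bar):
    """X_bar: (n, 2) goal-centred positions; Xdot_bar: (n, 2) velocities.
    Returns a callable f_i mapping positions to predicted velocities (Eq. 3)."""
    # One epsilon-SVR with RBF kernel per output dimension
    models = [SVR(kernel='rbf', C=1.0, epsilon=0.5, gamma=0.5).fit(X_bar, Xdot_bar[:, d])
              for d in range(Xdot_bar.shape[1])]

    def f_i(x_bar):
        x_bar = np.atleast_2d(np.asarray(x_bar, dtype=float))
        return np.column_stack([m.predict(x_bar) for m in models])

    return f_i
```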
In the second stage, we modify the landscape of the learned function to generate optimal trajectories in the whole state space by maximizing a reward function. The rationale here is to decode the movement effectively far from the training data (see Fig. 4 for clarity). Moreover, the optimization in the second stage caters for imperfection or sub-optimality in the recorded demonstrations (for example, demonstrations of subjects suffering from stroke). We express the reward function r(x̄) as:
r(x̄) = −w_1 x̄_f^T x̄_f − w_2 ẋ̄_f^T ẋ̄_f − w_3 ẍ̄_t^T ẍ̄_t    (4)
where w_1 weighs the cost for the distance from the goal/origin at the end of the trial, w_2 penalizes any non-zero velocity at the end of the trial, and w_3 is responsible for ensuring a smooth movement in reaching the goal by minimizing the norm of the acceleration vector. The weights of the reward function after manual tuning are: w_1 = 5, w_2 = 0.01, w_3 = 0.0001. The maximum velocity ẋ_max is set to 30 cm/s, and the simulations are carried out until t = 2 seconds to prolong the penalty of w_1 and w_2 after the end of the trial at t = 1 second.
Support vectors of the initial function act as basis functions for the optimized function f_o in the second stage. The weights of the support vectors α are optimized by stochastic gradient ascent on the value function, J(x̄) = (1/T) Σ_{t=0}^{T} r(x̄_t). More precisely, we add noise η, sampled from a multivariate Gaussian with mean 0 and covariance matrix σ²I with σ = 0.1, to the parameters α; evaluate the value function, J(α + η), from episodic roll-outs of the current optimized function, ẋ̄ = f_o(x̄); and adjust the parameter vector in the direction of increasing value function, i.e.,
∆α = β (J(α + η) − J(α))    (5)
where β is a small step-size parameter, set to 0.05 in our experiments. The procedure is repeated until the parameter vector stops changing. In our experiments, the parameter vector is improved for 1500 iterations, which increases the
Fig. 2: Decoding goal direction from EEG signals of the first healthy subject (right arm). The red line shows the chance level; the green line indicates the time instant when the classification accuracy significantly exceeds the chance level; the shaded region shows the variation in accuracy over 5 folds.
value J(α) of the function parameters from −118.1 to −4.81.
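The second-stage optimization loop can be sketched as a weight-perturbation method. In this sketch the scalar return difference is multiplied by the probe direction η so that the update to the parameter vector is well defined (an assumption on top of Eq. (5)), and the hypothetical `evaluate_J` stands in for the episodic roll-outs of the perturbed dynamics.

```python
import numpy as np

def optimize_weights(alpha, evaluate_J, beta=0.05, sigma=0.1, n_iters=1500, seed=0):
    """Weight-perturbation ascent on the episodic return J (cf. Eq. 5).
    evaluate_J: callable returning the return J for a given parameter vector."""
    rng = np.random.default_rng(seed)
    for _ in range(n_iters):
        eta = rng.normal(0.0, sigma, size=alpha.shape)   # eta ~ N(0, sigma^2 I)
        dJ = evaluate_J(alpha + eta) - evaluate_J(alpha)  # finite difference of J
        alpha = alpha + beta * dJ * eta                   # ascend along the probe
    return alpha
```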
In the proposed framework, the attractor of the optimized dynamical system is shifted from the origin to the goal estimated from Eq. (1), which is updated after every time window of 250 milliseconds. After the end of the trial, the optimized dynamical system moves the robot arm to the goal last estimated at t = 1 second. Mathematically, the optimized dynamical system takes the form:
ẋ = f_o(x − x_g^t)    (6)
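Driving the arm with the optimized dynamics of Eq. (6) amounts to forward integration while the goal estimate is refreshed; a simple Euler sketch follows (the integration scheme and step size are assumptions, and `goal_at` is a hypothetical hook returning the current goal estimate x_g^t).

```python
import numpy as np

def rollout(f_o, x0, goal_at, dt=0.01, T=2.0):
    """Euler-integrate xdot = f_o(x - x_g^t); goal_at(t) returns the current
    goal estimate, which may change every 250 ms classification window."""
    x = np.asarray(x0, dtype=float)
    traj = [x.copy()]
    for k in range(int(round(T / dt))):
        x = x + dt * f_o(x - goal_at(k * dt))  # Euler step of Eq. (6)
        traj.append(x.copy())
    return np.array(traj)
```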
III. RESULTS
A. Decoding Goal
To analyse the performance of the goal decoder from EEG signals, we show topographic plots of the selected channels to depict their discriminatory power at different time instants, starting 1 second before the movement onset
TABLE I: Performance comparison of the initial and optimized dynamical systems using: MSE on the testing set; average correlation in time between simulated and demonstrated position trajectories on the testing set; and end-point distance from the goal for different initial conditions.

Trajectory Decoder   MSE (cm/s²)   Correlation [0 1]   End-Point Distance (cm)
Initial SVR              2.49            0.51                  5.157
Optimized SVR             –              0.23                  0.09
in Fig. 1. As the exact time when movement intent occurs in a self-paced movement is unclear, the plots can provide insights about movement-related modulations in different brain regions during planning, and how they evolve over time. It is seen that the activity is dominant in the frontal-parietal regions of the brain, consistent with earlier reported studies [3].
Fig. 2 reports the classification accuracy of the goal decoder in the time window [−1 1] seconds. Classification accuracy is computed as the ratio between the sum of correctly classified diagonal entries in the confusion matrix and the total number of instances. The time instant when the classification accuracy significantly exceeds the chance level is used as a
Fig. 3: Performance of the initial learned function with SVR. Black crosses indicate the initial positions, while green circles denote the position at the end of the trial.
Fig. 4: Performance of the optimized function with SVR. Black crosses indicate the initial positions, while green circles denote the position at the end of the trial. Different initial conditions converge to the goal.
Fig. 5: Simulated trajectories of the KUKA robot performing the center-out reaching task.
metric to initiate the movement with the trajectory decoder. The chance level is calculated by training the classifier on a randomized permutation of the class labels of the training set. Results are then averaged across 10 iterations, each with 5-fold cross-validation. The best time for subject 1 is 687.5 ms before the movement onset, with a classification accuracy of 0.34 (marked with a green line in Fig. 2). It is seen that the classification accuracy gradually improves afterwards, with a peak accuracy of 0.85 at 0.5 seconds. Further experiments to evaluate the false positive rate, i.e., detecting an intention to move when there is actually no movement, are the subject of future work.
B. Decoding Trajectories
We evaluate the performance of our trajectory decoder using three metrics: 1) Mean-Square Error (MSE) on the training/testing set, 2) correlation in time of the simulated position trajectories with the demonstrated ones, and 3) distance to the goal at the end of the trial, computed by simulating the system from 12 different initial conditions. Table I summarizes the performance of the initial and the optimized SVR. The initial dynamical system learned using SVR performs well in terms of MSE, with training and testing errors of 2.66 and 2.49 cm/s² respectively, and a high position correlation of 0.51 with the demonstrated trajectories. To evaluate the performance of the system far from the training data, we sample 12 different initial points in the plane (shown in Fig. 3 with crosses) and integrate the system forward in time for a period of 2 seconds. As seen in Fig. 3, the initial dynamical system with SVR is not able to generalize away from the training data, yielding a high end-point distance error of 5.157 cm. Note that the initial conditions in the cardinal directions correspond to the training set. On the other hand, the optimized SVR is able to drive the robot arm to the goal from all the sampled initial conditions (see Fig. 4). This comes at the cost of a relatively low position correlation of 0.23, suggesting the need to further improve the reward function. This generalization is very desirable in our application, since the user is expected to control the arm from all parts of the state space.
In Fig. 5, we test the performance of the integrated system on the simulated 7 degrees of freedom KUKA robotic arm. The optimized dynamical system starts to move the robot arm 687.5 milliseconds before the movement onset and finally guides the robot arm to the last estimated goal at the end of the trial. Across all the trials, the robot arm reaches the actual goal with a net accuracy of 79.5% on average. The figure shows simulated trajectories of the robotic arm reaching different goal positions, following the predicted goal from the intention decoder and the optimal motion plans from the trajectory decoder.
IV. CONCLUSIONS
In this paper, we have presented a system that decodes the intention of the user using non-invasive slow cortical EEG signals and generates optimal motion plans to drive the robot arm to the goal. The most desirable properties of the system include detection of the goal direction before the movement onset and generalization of the motion plans away from the training data. In the future, we would like to evaluate the performance of the system when the user changes his intention/goal direction during execution of the movement. It will also be interesting to incorporate the EEG signals in our trajectory decoder, similar to the works in [9] and [10].
ACKNOWLEDGEMENT
The authors thank E. Lew, R. Chavarriaga and I. Iturrate for their helpful insights. This work is supported by the National Center of Competence in Research in Robotics (NCCR Robotics).
REFERENCES
[1] S. Waldert, H. Preissl, E. Demandt, C. Braun, N. Birbaumer, A. Aertsen et al., "Hand movement direction decoded from MEG and EEG," Journal of Neuroscience, 28(4), pp. 1000-1008, 2008.
[2] S. Musallam, B. D. Corneil, B. Greger, H. Scherberger, and R. A. Andersen, "Cognitive control signals for neural prosthetics," Science, 305(5681), pp. 258-262, 2004.
[3] E. Lew, R. Chavarriaga, S. Silvoni, and J. d. R. Millán, "Detection of self-paced reaching movement intention from EEG signals," Front Neuroeng, 5(13), 2012.
[4] E. Lew, R. Chavarriaga, S. Silvoni, and J. d. R. Millán, "Single trial prediction of self-paced reaching directions from EEG signals," Front Neuroprosthesis, 2014 (submitted).
[5] O. Bertrand, F. Perrin, and J. Pernier, "A theoretical justification of the average reference in topographic evoked potential studies," Electroencephalography and Clinical Neurophysiology, 62, pp. 462-464, 1985.
[6] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, New York: John Wiley, Section 10.1, 2001.
[7] A. Billard, S. Calinon, R. Dillmann, and S. Schaal, "Survey: Robot Programming by Demonstration," Handbook of Robotics, chapter 59, 2008.
[8] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.
[9] T. J. Bradberry, R. J. Gentili, and J. L. Contreras-Vidal, "Reconstructing three-dimensional hand movements from noninvasive electroencephalographic signals," J. Neurosci., 30(9), pp. 3432-3437, 2010.
[10] N. J. Beuchat, R. Chavarriaga, S. Degallier, and J. d. R. Millán, "Offline Decoding of Upper Limb Muscle Synergies from EEG Slow Cortical Potentials," 35th Annual Conference on Engineering in Medicine and Biology Society, 2013.