
Biol Res 40: 415-437, 2007

Classification methods for ongoing EEG and MEG signals

MICHEL BESSERVE 1, 2, KARIM JERBI *, FRANCOIS LAURENT 1, 2, SYLVAIN BAILLET 1, 2, JACQUES MARTINERIE 1, 2 and LINE GARNERO 1, 2

1 CNRS, UPR 640-LENA, Laboratoire Neurosciences Cognitives et Imagerie Cérébrale, 75013 Paris Cedex 13, France
2 UPMC Univ Paris 06, F-75005, Paris, France
* Current affiliation: LPPA, CNRS-UMR7152, Collège de France, Paris, & Brain Dynamics and Cognition, INSERM U821, Lyon, France

ABSTRACT

Classification algorithms help predict the qualitative properties of a subject’s mental state by extracting useful information from the highly multivariate non-invasive recordings of his brain activity. In particular, applying them to magnetoencephalography (MEG) and electroencephalography (EEG) is a challenging and promising task with prominent practical applications to e.g. Brain Computer Interface (BCI). In this paper, we first review the principles of the major classification techniques and discuss their application to MEG and EEG data classification. Next, we investigate the behavior of classification methods using real data recorded during a MEG visuomotor experiment. In particular, we study the influence of the classification algorithm, of the quantitative functional variables used in this classifier, and of the validation method. In addition, our findings suggest that by investigating the distribution of classifier coefficients, it is possible to infer knowledge and construct functional interpretations of the underlying neural mechanisms of the performed tasks. Finally, the promising results reported here (up to 97% classification accuracy on 1-second time windows) reflect the considerable potential of MEG for the continuous classification of mental states.

Key terms: brain computer interface, electroencephalography, magnetoencephalography, visuomotor control, Support Vector Machine.

Corresponding author: Michel Besserve, CNRS UPR 640-LENA, 47 bld de l’Hôpital, 75013 Paris Cedex. Phone: +33 1 42 16 11 72. Fax: +33 1 45 86 25 37. Email: [email protected]

Received: June 1, 2007. Accepted: April 2, 2008

1. INTRODUCTION

The non-invasive detection of task-related neurophysiological changes occurring within the human brain is a significant challenge in biomedical engineering. There is a growing interest in using classification techniques to estimate the mental state of a subject, related to a performed task, from ongoing, single-trial epochs of multivariate brain functional imaging signals such as functional magnetic resonance imaging (fMRI) (Carlson et al., 2003; Laconte et al., 2006; Haynes and Rees, 2006) or magneto- or electroencephalography (MEG/EEG) (Kubler et al., 2001; Wolpaw et al., 2002; Lal et al., 2005; Lotte et al., 2007). In principle, the related algorithms are statistical tools that can be trained to estimate a qualitative variable, the class label, from a set of quantitative variables. Classification techniques therefore aim at predicting the qualitative mental state of the human brain, and complement the estimation of functional brain responses per se.

There are two major reasons why these methods should be considered. First, classification tools let us evaluate the predictive power of functional signals and thus quantify the information they convey about the tasks being performed. Classification methods complement massively univariate statistical tools like Statistical Parametric Mapping (Frackowiak et al., 1997) by mining the information content of multidimensional signals. These approaches are more powerful than univariate statistical tests since the former seek information in many variables at the same time, whereas the latter can only process each variable independently (Carlson et al., 2003). The second reason why classification tools are pertinent to neuroscience is their potential application to real-time prediction of brain states. Their most popular application is undoubtedly Brain Computer Interfaces (BCI), a field which has witnessed considerable progress during the past decade (Kubler et al., 2001; Wolpaw et al., 2002; Lebedev and Nicolelis, 2006). A further promising real-time application is neurofeedback for the treatment of motor or cognitive dysfunctions such as Attention-Deficit/Hyperactivity Disorder (ADHD) (Fuchs, 2003; Strehl, 2006).

For some of these applications, directly exploiting invasive recordings of neural populations can be more efficient than using fMRI, EEG or MEG. In particular, invasive neural interfaces have seen considerable progress (Taylor et al., 2002; Wessberg and Nicolelis, 2004), but a lot of work remains to be done to achieve a safe and stable invasive BCI (Lebedev and Nicolelis, 2006). Thus, even if they provide a slower information transfer, devices using non-invasive imaging modalities are still competitive for this application. Among these imaging techniques, magnetoencephalography (MEG) and electroencephalography (EEG) provide the highest temporal resolution, making them ideal for the aforementioned real-time applications. Several parameters, however, need to be carefully taken into account when using classification methods with magneto-electroencephalographic (MEEG) data. First, the choice and design of the experimental paradigm is an important factor that will partially determine subsequent data processing procedures. The most commonly used paradigms involve stimulus-locked subject responses, where the spatial and temporal properties of brain responses linked to a stimulus are investigated and referenced in direct relationship to the stimulus onset. In the context of BCI, a synchronous set-up is directed toward evoked brain responses triggered by an external stimulus. By contrast, asynchronous BCI set-ups do not require any external stimulation to infer mental states. Asynchronous approaches are therefore a worthwhile challenge, because they are expected to yield a less tedious, more natural and more efficient communication device. In general, such systems operate with data acquired while subjects alternate between two (or more) sustained mental tasks. A major difference between continuous data acquisition paradigms and classical stimulus-locked experiments is that the task is maintained over a longer time span (typically tens of seconds). Henceforth, we will refer to this type of paradigm as a continuous experiment.

The two types of experimental paradigms, stimulus-locked and continuous, involve different types of data processing and investigate different quantitative measures estimated from the signals. Nevertheless, it is noteworthy that while some techniques, like the analysis of evoked potentials, are only useful for synchronous BCI applications (Farwell and Donchin, 1988; Wolpaw et al., 2002), other task-related phenomena, like sustained oscillatory activities, can be used with both synchronous and asynchronous paradigms (Keirn et al., 1990; Anderson et al., 1998; Pfurtscheller et al., 1997; Borisoff et al., 2004; Milan et al., 2004; Scherer et al., 2004). To elucidate these task-related brain rhythms, spectral analysis has become a standard procedure with two measurements: (a) power spectral density, which provides an estimation of signal amplitude at different frequencies or within various frequency bands, and (b) long-range signal correlation in the frequency domain, which can be measured via coherence or phase synchrony. Whether they are performed at the sensor or at the cortical level, power estimations are assumed to reflect the synchronization of local neural populations (Singer et al., 1999), while long-range signal coupling represents long-distance interactions between two signals recorded in distinct brain regions.


The first objective of this paper is to provide an overview of the major principles of classification approaches applied to EEG and MEG data; the second is to investigate the practical behavior of such techniques with real MEG data acquired during a continuous visuomotor task. The results shed some light on important issues such as the selection of a classifier and the evaluation of its performance, as well as the identification of physiologically relevant discrimination features for continuous sensorimotor MEEG data.

The layout of the paper is as follows. In Section 2, we review the general principles of classification methods by considering different classification schemes and various measures generally extracted from MEEG recordings for the purpose of classification. In Section 3, we report on the application of classification tools to MEG data recorded during a continuous visuomotor task, and we conclude by discussing the utility, interpretation and limitations of the results obtained.

2. USING CLASSIFICATION METHODS FOR MEEG

2.1. The different steps

Applying statistical analysis tools to MEEG signals usually requires temporal segmentation of the recordings according to precise events of the experimental paradigm. For stimulus-locked paradigms, segmentation bounds are set according to the stimulus onset. In particular, a time window preceding the stimulus onset is usually taken as a reference epoch (or baseline) to which the temporal evolution of the cortical signals after stimulus onset is compared. Obviously, this scheme can be temporally reversed if the reference point is the subject’s response instead of a stimulus (as in movement preparation or anticipation paradigms). Conversely, although continuous paradigms may contain specific events (for example the beginning of the task), they do not systematically impose a temporal ordering on the whole recording. In this case, if we assume that signal properties are largely invariant over the duration of a given continuous behavioral condition, a long-lasting epoch corresponding to the continuous task can be segmented into successive time windows. This segmentation provides a large number of time windows, which can then be processed as multiple trials of the same experimental condition (although all reference-free).

To characterize brain activities from recordings, several features are then computed from the segmented data (details in the next section). These features allow each segment to be represented as a point in a normed vector space, whose coordinates are the feature values. Therefore, by applying this quantification procedure to multiple trials, we obtain a set of distributed points that may form several ‘clouds’ in the multi-dimensional feature space (Fig. 1). Each cloud of points, corresponding to the data of a specific behavioral condition, represents a class. The next step of the classification procedure consists in exploiting a first dataset as a “training set” to learn to partition the feature space in a way that optimally distinguishes the various classes from each other. Most of the widely used techniques (described below) fit a separation surface, called the decision boundary, between the domains corresponding to each class. Once the classifier is “trained”, new trials can be classified as belonging to one class or another according to their position in the feature space. In this study, we restrict the discussion to binary classification, where only two classes have to be distinguished. In this case, the decision boundary is usually determined using a discriminant function $f^*(x)$ (Duda and Hart, 2000) with the following rule: if $f^*(x) > 0$, the trial associated with feature vector $x$ is assigned to class 1; otherwise it is assigned to class 2. We will briefly present a few classification methods, together with common problems that arise from these approaches. A schematic overview of the MEEG classification process in the binary classification case of a continuous paradigm is depicted in Figure 1.
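The segmentation and decision steps described above can be illustrated with a minimal sketch. It assumes a recording stored as a NumPy array of shape (channels, samples), uses the per-window variance of each channel as a placeholder feature, and applies a hand-picked linear discriminant; the function names `segment` and `classify`, the toy data and the coefficients `w` and `b` are all hypothetical and are not taken from the study.

```python
import numpy as np

def segment(recording, window_len):
    """Cut a (channels, samples) array into non-overlapping windows.

    Returns an array of shape (n_windows, channels, window_len); trailing
    samples that do not fill a whole window are dropped.
    """
    n_channels, n_samples = recording.shape
    n_windows = n_samples // window_len
    trimmed = recording[:, :n_windows * window_len]
    return trimmed.reshape(n_channels, n_windows, window_len).transpose(1, 0, 2)

def classify(features, w, b):
    """Apply the rule: class 1 if f(x) = w.x + b > 0, otherwise class 2."""
    f = features @ w + b
    return np.where(f > 0, 1, 2)

# Hypothetical data: 3 channels, 10 s sampled at 100 Hz, 1-second windows.
rng = np.random.default_rng(0)
recording = rng.standard_normal((3, 1000))
windows = segment(recording, window_len=100)

# One toy feature per channel: the variance of the signal in each window.
features = windows.var(axis=2)           # shape (10, 3)

# Hypothetical, hand-picked linear discriminant (not fitted to data).
w = np.array([1.0, -1.0, 0.0])
b = 0.0
labels = classify(features, w, b)        # one label (1 or 2) per window
```

Each row of `features` is one point in the feature space, so the ten windows play the role of ten trials of the same condition.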


2.2. Characterizing brain activity from MEEG measurements

To quantify task-related cortical activations from recordings, the ongoing signals are usually segmented into time windows on which various computations are performed. Most of the estimated parameters can be divided into families depending on the spatial extent of the physiological phenomenon under investigation: local or long range. Local measurements generally provide a measure of task-related activity picked up at a single sensor or electrode. This measure is assumed to reflect the modulation of neural activity at a focal brain area located next to the detector. By contrast, measurements of long distance interactions quantify the coupling between signals detected at two distinct sensors, possibly, although not necessarily, revealing an information transfer between two distant neural ensembles. The physiological interpretation of such long-range phenomena is the focus of a large body of literature, ranging from realistic models of neural networks to experimental evidence obtained using both invasive and non-invasive recordings, in both human and non-human primates (Rodriguez et al., 1999; Varela et al., 2001; Brovelli et al., 2004; Jerbi et al., 2007; Lachaux et al., 1999).

Given that the electromagnetic activity recorded on the scalp is an indirect and attenuated signature of the underlying neural processes, the main difference between MEEG and invasive recording techniques is that task-related effects must be sufficiently strong if they are to be observed non-invasively at the scalp surface. The strength of these signals presumably depends on various parameters, including the signal-to-noise ratio of the system, the size of the neural populations involved, the experimental paradigm and possibly the alertness or motivation of the subject. To help detect these signals, a large number of signal processing techniques have been exploited in non-invasive BCI systems (Bashashati et al., 2007), but most of them make use of the same physiological phenomena and of measures similar to those reviewed in the following sections.

Figure 1: Principle of a binary classification of MEEG recordings for a continuous experiment. First, signals are segmented into successive time windows; then features are computed, representing each time window as a point in a multidimensional space. Finally, the classification algorithm separates the feature space into two domains according to the sign of a discriminant function $f^*(x)$, making it possible to predict the class of a time window from its feature vector.

Local activity measurements

Given that stimulus-locked experimental paradigms have been widely used both for basic cognitive studies and for BCI applications, a large range of analysis tools have been developed and tested over the years specifically to analyze event-related activities. The best-studied family of event-related activities is Event Related Potentials (ERP). Even if, to date, the physiological processes underlying ERP generation remain debated, many types of ERP induced by specific stimulation set-ups are robustly observed and can be exploited. For instance, a virtual keyboard, called the P300 speller, has been developed based on a specific ERP arising after the onset of an attended stimulus (Barett, 1996). Due to baseline fluctuations, ERP can only be detected by averaging data obtained over multiple trials. Although, in theory, this requirement makes ERP unusable for the classification of single-trial MEEG activities, the problem has been solved in the P300 speller paradigm by accumulating multiple trials before classification (Farwell and Donchin, 1988).

A further quantification of local modulations is given by measures known as Event Related Desynchronization (ERD) and Synchronization (ERS) (Pfurtscheller and Lopes da Silva, 1999). ERD (or ERS) represents a decrease (or an increase) of oscillatory power at specific frequencies and in specific locations as compared to the baseline power estimated during a reference period (i.e. a time window prior to stimulus onset or, alternatively, an epoch during a control condition). Many studies report the presence of task-specific rhythms in MEG and EEG recordings. In particular, in the 8-12 Hz band, occipital alpha rhythms are involved in vision, whereas mu rhythms (~11 Hz) are present in the motor cortex in resting states and are suppressed during the execution of a movement (Salmelin and Hari, 1994). In higher frequency bands, beta rhythms (15-30 Hz) have been shown to be suppressed during movement and to display a strong increase roughly 1-2 seconds after the movement is terminated (a phenomenon known as beta rebound). Finally, gamma band power modulations (30-90 Hz) seem to play an important role in a wide range of complex cognitive processes (Tallon-Baudry, 1999). Historically, the presence of ERS and ERD in specific frequency bands has been interpreted as an activation or deactivation of the neural assemblies, but recent studies question this theory: the ERD/ERS phenomena are highly dependent on the task and the brain area (Lopes Da Silva, 2006), and rhythms in the same frequency band can correspond either to deactivation or to activation of the underlying network (Pfurtscheller, 2006). ERD in the alpha and beta bands appearing in the motor areas has been extensively used to classify motor imagery tasks (Pfurtscheller, 1997). The usual way to detect ERD/ERS is to apply a baseline normalization and then to compute inter-trial averages (Graimann and Pfurtscheller, 2006); however, studies using BCI protocols have demonstrated that they can be measured in single trials (Pfurtscheller, 1997).

For continuous paradigms, the starting time of the mental process is either not used or not available, which makes ERP too difficult to detect. Moreover, the transient nature of ERP is not compatible with the characterization of continuous states. By contrast, even though ERD/ERS is usually referenced to a baseline, this phenomenon can equally be considered as induced by an event or as a task-related activation. Hence a continuous task can also be characterized by ERD/ERS measurements. Technically speaking, the alternative to stimulus-locked ERD/ERS when processing continuous brain activities is to cut the original recordings into sliding windows and perform the spectral power measurements on these windows in multiple frequency bands.
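The sliding-window procedure described above can be sketched as follows. This is only an illustration under stated assumptions: the band-pass step is done by zeroing FFT coefficients outside the band (chosen for brevity, any band-pass filter could be substituted), the power estimate is the mean squared amplitude of the band-limited window, and the function names, the test signal and the window settings are hypothetical.

```python
import numpy as np

def band_power(window, fs, band):
    """Mean squared amplitude of one window restricted to a frequency band.

    The band-pass step zeroes FFT coefficients outside the band (an
    assumption made for brevity; any band-pass filter could be used).
    """
    spectrum = np.fft.rfft(window)
    freqs = np.fft.rfftfreq(window.size, d=1.0 / fs)
    low, high = band
    spectrum[(freqs < low) | (freqs > high)] = 0.0
    filtered = np.fft.irfft(spectrum, n=window.size)
    return np.mean(filtered ** 2)

def sliding_band_power(signal, fs, bands, window_len, step):
    """Band power of a 1-D signal on sliding windows.

    Returns an array of shape (n_windows, n_bands).
    """
    starts = range(0, signal.size - window_len + 1, step)
    return np.array([[band_power(signal[s:s + window_len], fs, band)
                      for band in bands] for s in starts])

# Hypothetical signal: a 10 Hz oscillation of amplitude 2, sampled at 200 Hz.
fs = 200
t = np.arange(0, 4.0, 1.0 / fs)
signal = 2.0 * np.sin(2 * np.pi * 10 * t)

bands = [(8, 12), (15, 30)]              # alpha/mu and beta ranges from the text
powers = sliding_band_power(signal, fs, bands, window_len=fs, step=fs // 2)
```

With 1-second windows and a half-window step, the 4-second toy signal yields seven overlapping windows, each described by one power value per band. All the power of the 10 Hz oscillation falls in the 8-12 Hz column.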

The usual tools to extract frequency information from EEG and MEG signals are the Fourier transform and band-pass filtering. In this case, the feature considered is the spectral power in different frequency bands $f$, obtained by calculating the average power of the filtered signal $z_i^f(t)$ of channel $i$ during each time window of length $T$:

$$P_i(f) = \frac{1}{T} \sum_{t=1}^{T} z_i^f(t)^2 \qquad (1)$$

These tools can be seen as non-parametric spectral analysis methods, as no assumption is made on the power spectrum of the signals. Another method, which has proved efficient for EEG classification, is autoregressive modeling (Anderson, 1998). This is a parametric approach, as it assumes that the power spectrum of the signals is described by a few coefficients (typically 3 to 10).

Spectral measurements are also a convenient tool to detect Steady State Visual Evoked Potentials (SSVEP) induced by repetitive visual stimulations in new BCI paradigms (Middendorf et al., 2000).

Long distance interactions

It is reasonable to assume that any mechanism for brain integration must involve interactions between the participating local neural ensembles (Varela et al., 2001). These interactions have been investigated via multiple techniques that quantify the presence of common information in signals originating from different brain areas. Assessing this functional relationship between distant signals has been named “functional connectivity”, by contrast to anatomical connectivity. Functional connectivity investigates the structure of the brain network at a particular time sample and is assumed to characterize the cooperation between many specialized regions. The most widely used measures for quantifying long-range interactions are coherence and phase synchrony (Lachaux et al., 1999).

Coherence measures the interaction of two signals in a particular frequency band by computing the correlation between their Fourier transforms $Z_1(f)$ and $Z_2(f)$ at this frequency (bars in the following equation indicate averaging over successive data segments, and $^*$ denotes complex conjugation):

$$\mathrm{Coh}_{12}(f) = \frac{\left| \overline{Z_1(f)\, Z_2^*(f)} \right|^2}{\overline{|Z_1(f)|^2}\;\; \overline{|Z_2(f)|^2}}$$

Thus, the resulting quantity can reflect both the correlation between the spectral power (i.e. amplitude) of the signals and their temporal synchronization (phase relationship).

By contrast, phase synchrony concentrates on assessing the common phase information between two signals. It is calculated by converting the filtered data of each channel $i$ in the frequency band $f$, $z_i^f(t)$, into an analytic signal $a_i^f(t)$ by means of the Hilbert transform $H$, with $a_i^f(t) = z_i^f(t) + \mathrm{i}\, H[z_i^f(t)]$. This defines the instantaneous phase $\varphi_i^f(t)$ of the filtered signal (Pikovsky et al., 2001) through the modulus-argument decomposition $a_i^f(t) = |a_i^f(t)|\, e^{\mathrm{i} \varphi_i^f(t)}$, where $\mathrm{i}$ is the complex number of modulus 1 and argument $\pi/2$. The phase locking value between channels 1 and 2 in a given frequency band $f$ is thus:

$$S_{12}(f) = \left| \frac{1}{T} \sum_{t=1}^{T} e^{\mathrm{i} \left( \varphi_1^f(t) - \varphi_2^f(t) \right)} \right| \qquad (2)$$

In a previous study (Rodriguez et al., 1999), we estimated phase synchrony on surface EEG recordings performed during a visual task. Task-related phase synchrony modulations between widely separated electrodes showed significantly different patterns between “face perception” and “no face perception”. Additionally, in an intra-cerebral study (Le Van Quyen et al., 2005) we provided evidence for the use of phase synchrony to anticipate epileptic seizures. Moreover, recent findings in the field of Brain-Computer Interfaces suggest that combining spectral power measurements with phase synchrony improves accuracy when classifying EEG signals (Gysels et al., 2004, 2005; Wei et al., 2007).

In the present study, we also investigate the combination of power and phase synchrony measures for classification purposes, applied to MEG visuomotor data (details in Section 3). An argument for using both these tools is that they can be considered within the same biological framework of “Resonant Cell Assemblies” (Varela, 1995). In recent years, it has become evident that the concept of neuronal synchrony can improve our characterization and our understanding of the global aspect of brain dynamics. Indeed, neuroscience has provided abundant evidence of synchronization at all levels of the nervous system (Varela et al., 2001), ranging from individual pairs of neurons to larger scales within or in between different local neural assemblies. Generally, oscillations are thought to reflect local synchronization of such assemblies, the underlying principle being that it is the phase-locked neural activity that predominantly gives rise to measurable scalp oscillations. By contrast, the long-range phase synchronization between two electrodes is assumed to be a signature of the phase-locked activity between distinct brain regions. Therefore these two measures are complementary in characterizing brain activity and essential for the global integration hypothesis.
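The two long-range measures described in this section, the phase locking value and coherence, can be sketched as follows. The sketch assumes already band-limited input signals, computes the analytic signal through the FFT (the standard discrete construction of the Hilbert transform), and evaluates coherence at a single FFT bin averaged over equal-length segments. The function names and the sinusoidal test signals are hypothetical, noise-free illustrations.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal of a real 1-D signal, built through the FFT."""
    n = x.size
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                        # keep the DC component
    if n % 2 == 0:
        weights[n // 2] = 1.0               # keep the Nyquist component
        weights[1:n // 2] = 2.0             # double positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)  # negative frequencies are zeroed

def phase_locking_value(z1, z2):
    """Modulus of the time-averaged unit phasor of the phase difference."""
    phi1 = np.angle(analytic_signal(z1))
    phi2 = np.angle(analytic_signal(z2))
    return np.abs(np.mean(np.exp(1j * (phi1 - phi2))))

def coherence(x1, x2, n_segments, freq_bin):
    """Magnitude-squared coherence at one FFT bin, averaged over segments."""
    Z1 = np.fft.rfft(x1.reshape(n_segments, -1), axis=1)[:, freq_bin]
    Z2 = np.fft.rfft(x2.reshape(n_segments, -1), axis=1)[:, freq_bin]
    cross = np.mean(Z1 * np.conj(Z2))
    return np.abs(cross) ** 2 / (np.mean(np.abs(Z1) ** 2) * np.mean(np.abs(Z2) ** 2))

fs = 200
t = np.arange(0, 1.0, 1.0 / fs)
z1 = np.sin(2 * np.pi * 10 * t)
z2 = np.sin(2 * np.pi * 10 * t - np.pi / 4)   # same rhythm, constant lag
z3 = np.sin(2 * np.pi * 13 * t)               # different rhythm

plv_locked = phase_locking_value(z1, z2)      # close to 1: constant phase lag
plv_drifting = phase_locking_value(z1, z3)    # close to 0: phase difference drifts

# Coherence over four 1-second segments; bin 10 corresponds to 10 Hz here.
t4 = np.arange(0, 4.0, 1.0 / fs)
x1 = np.sin(2 * np.pi * 10 * t4)
x2 = 0.5 * np.sin(2 * np.pi * 10 * t4 - np.pi / 4)
coh = coherence(x1, x2, n_segments=4, freq_bin=10)   # close to 1
```

Note that the phase locking value ignores the amplitude ratio between the two signals, whereas coherence depends on how stable both amplitude and phase relations are across segments.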

2.3. Classification algorithms

Data classification is an issue of primary importance in the field of data mining: it consists in estimating a qualitative variable, the class label, using a set of other variables. Numerous algorithms and methods have been proposed in order to achieve data classification and to improve its efficiency. Most standard classification methods can be broadly described as a two-step procedure, which consists first of a learning phase, followed by the actual estimation of unknown class labels. During the learning phase, a discriminant function is fitted to a portion of the data generally called the training data set; then, in the second phase, the trained model (achieving optimized separation on the training set) is used to discriminate between the classes in new data sets. Two main considerations distinguish classification algorithms:

– Their decision boundary can be either linear or non-linear.

– The fitted discriminant function can result from a probabilistic model (model-based or generative algorithms) or be expressed directly using the data points from the training set (data-based or discriminative algorithms).

We will present three algorithms illustrating these fundamental characteristics.

Linear Discriminant Analysis (LDA)

Linear classification algorithms, which represent a large portion of the available techniques, are based on fitting a linear discriminant function to the data. This linear decision function is of the form $f(x) = w^T x + b$, where $b$ is the bias and $w$ the normal vector to the decision boundary $f(x) = 0$. One approach to fit this linear function to the data is Linear Discriminant Analysis, which amounts to fitting a Gaussian probabilistic model to the data (Duda and Hart, 2001). In its simplest formulation, assuming the feature vector $x$ has a Gaussian distribution with different class-conditional means $\mu_1$ and $\mu_2$ for classes 1 and 2 respectively and the same covariance matrix $\Sigma_1 = \Sigma_2 = \Sigma$ in the two cases, the optimal decision function in a Bayesian framework is of the form:

$$f^*_{LDA}(x) = (\mu_1 - \mu_2)^T \Sigma^{-1} \left( x - \frac{\mu_1 + \mu_2}{2} \right) = w_{LDA}^T x + b_{LDA}$$

An illustration of this technique on toy data is provided in Fig. 2 a), showing the affine decision boundary between the two clouds of points of the training set. LDA is an interesting algorithm for many reasons: it is fast, and simple to implement and to understand. As a consequence, it has been widely used in the BCI community (Garett, 2003; Bostanov, 2004; Scherer, 2004).

Linear Support Vector Machine (SVM)

Contrary to LDA, which constructs a probabilistic model for each class using all data points, the underlying principle of linear Support Vector Machines (SVM) is to minimize a linear separation error focusing on the points neighboring the linear separation surface (Vapnik, 1998). Points that lie sufficiently far from the separation surface are thus ignored in the learning process. More precisely, the linear discriminant function has the same form as in LDA: $f(x) = w^T x + b$. We denote by $(x_i, y_i)$ the $i$-th data point of the training set, with feature vector $x_i$ and class label $y_i$ ($y_i = +1$ if the point belongs to class 1, $y_i = -1$ if it belongs to class 2). A margin is defined as the domain between the two hyperplanes of equations $f(x) = +1$ and $f(x) = -1$ on either side of the decision boundary. The SVM algorithm seeks to maximize the width of that margin (of value $2 / \|w\|$), which amounts to minimizing the Euclidean norm of the vector $w$, while keeping most of the data points outside the margin. The optimal discriminant function $f^*(x) = w^{*T} x + b^*$ is given by:

$$(w^*, b^*) = \arg\min_{w,\, b} \; \frac{\|w\|^2}{2} + C \sum_{\forall i} \varepsilon_i \quad \text{subject to} \quad \begin{cases} y_i \left( w^T x_i + b \right) \geq 1 - \varepsilon_i, & \forall i \\ \varepsilon_i \geq 0, & \forall i \end{cases}$$

where $\sum_{\forall i} \varepsilon_i$ is the separation error term, $\|w\|^2 / 2$ is the regularization term and $C$ is a user-defined regularization parameter (Muller et al., 2001). As shown in Fig. 2 b), some points close to the surface, called support vectors, define the boundaries of the margin on both sides of the surface.

Figure 2: Examples of decision boundaries obtained with the same training set and different classification algorithms: a) LDA classifier, b) SVM classifier, c) classifier with five nearest neighbors using the Euclidean distance.
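The two linear classifiers described above can be compared on toy data with the following minimal sketch. Several choices here are assumptions made for illustration and are not taken from the paper: the LDA fit uses equal class priors and a pooled covariance estimate, the SVM is fitted by plain batch subgradient descent on the equivalent hinge-loss form of the constrained problem, and the function names, hyperparameters and Gaussian toy data are hypothetical.

```python
import numpy as np

def fit_lda(X, y):
    """Fit the LDA discriminant for labels y in {+1, -1}.

    Returns (w, b) such that f(x) = w @ x + b > 0 predicts class +1.
    Assumes equal class priors and a shared covariance matrix.
    """
    X1, X2 = X[y == 1], X[y == -1]
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    # Pooled covariance estimate shared by the two classes.
    centered = np.vstack([X1 - mu1, X2 - mu2])
    sigma = centered.T @ centered / (len(X) - 2)
    w = np.linalg.solve(sigma, mu1 - mu2)
    b = -w @ (mu1 + mu2) / 2.0
    return w, b

def fit_linear_svm(X, y, C=1.0, n_epochs=200, lr=0.01):
    """Soft-margin linear SVM fitted by batch subgradient descent.

    Minimizes ||w||^2 / 2 + C * sum_i max(0, 1 - y_i (w @ x_i + b)),
    i.e. the constrained problem with the slack variables eliminated.
    The optimizer and its settings are illustrative choices only.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_epochs):
        margins = y * (X @ w + b)
        active = margins < 1                  # points violating the margin
        grad_w = w - C * (y[active, None] * X[active]).sum(axis=0)
        grad_b = -C * y[active].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical toy data: two Gaussian clouds in a 2-D feature space.
rng = np.random.default_rng(0)
n = 100
X = np.vstack([rng.normal(+2.0, 1.0, size=(n, 2)),
               rng.normal(-2.0, 1.0, size=(n, 2))])
y = np.concatenate([np.ones(n), -np.ones(n)])

def accuracy(w, b):
    """Fraction of training points on the correct side of f(x) = 0."""
    return np.mean(np.sign(X @ w + b) == y)

acc_lda = accuracy(*fit_lda(X, y))
acc_svm = accuracy(*fit_linear_svm(X, y))
```

In the SVM update, only points with a margin below 1 contribute to the gradient, which mirrors the statement that points far from the separation surface are ignored during learning, whereas the LDA fit uses every training point through the class means and the pooled covariance.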

K-Nearest Neighbor

A further data-driven method is the k-nearest-neighbors (KNN) algorithm (Hastie et al., 2001). The principle of this third method is as follows: the class label of an unlabeled point is set to the predominant class among the k nearest labeled points of the training set (k is defined by the experimenter). The KNN algorithm does not rely on any model but rather on the metric used to assess the similarity of any pair of points (usually the Euclidean distance). Using the aforementioned notation (x_i, y_i) for the points in the learning set, the discriminant function associated with KNN is of the form:

$$f_{KNN}(x) = \sum_{n \in N_k(x)} y_n$$

where N_k(x) is the set of indices of the k nearest neighbors of x in the training set. From an algorithmic point of view, KNN does not require an iterative learning phase since there is no need to fit a model to the data; however, the whole training data set has to be kept in memory to classify new data. Compared to linear methods, the decision boundary f_KNN(x) = 0 is strongly nonlinear, as shown in the example of Fig. 2 c).
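The KNN discriminant function described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `knn_discriminant` and the toy training set are hypothetical, labels are coded +1/-1 as in the text, and the Euclidean distance is used.

```python
import math


def knn_discriminant(x, train_X, train_y, k=5):
    """Sum of the +1/-1 labels of the k training points closest to x."""
    # Sort training indices by Euclidean distance to x and keep the k nearest.
    order = sorted(range(len(train_X)), key=lambda n: math.dist(x, train_X[n]))
    return sum(train_y[n] for n in order[:k])


# Toy training set (assumed): two well-separated clouds of points.
train_X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_y = [+1, +1, +1, -1, -1, -1]

f_near_class1 = knn_discriminant((0.5, 0.5), train_X, train_y, k=3)  # positive: class 1
f_near_class2 = knn_discriminant((5.5, 5.5), train_X, train_y, k=3)  # negative: class 2
```

Choosing an odd k in a two-class problem avoids the undecided case f_KNN(x) = 0. Note that there is no training step: all the work happens at prediction time, which is why the full training set must stay in memory.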

Of course, a wide variety of other techniques have been used, such as neural networks (Hiraiwa et al., 1990; Anderson et al., 1998; Haselsteiner et al., 2000) or Hidden Markov Models (Obermeier et al., 2001). A complete review of the existing techniques is beyond the scope of this paper; the reader can refer to (Duda and Hart, 2001) or (Hastie et al., 2001) for a general view of the field of Pattern Recognition, and to (Lotte et al., 2007) for an exhaustive review of the algorithms already used for EEG-based BCI.

2.4. Validating classifier performance

Measuring classification accuracy

Training a classifier aims at minimizing the classification error; performance is generally quantified by the classification accuracy, i.e. the ratio of the number of well-classified samples to the total number of samples. In order to interpret it, one should take the chance level as a reference. When classifying one sample by randomly choosing between two balanced classes, one may expect a mean accuracy of 50%. By analogy, for a three-class problem, chance accuracy is asymptotically about 33%. However, since raw accuracy may be an insufficient measure on its own, other criteria have been introduced. One criterion, known as kappa, represents the rate of well-classified examples after subtraction of the asymptotic chance accuracy (Townsend et al., 2006). A further measure of classifier performance is the Area Under the ROC Curve (AUC). The receiver operating characteristic (ROC) curve represents the true positive rate as a function of the false positive rate obtained when the discriminant function is thresholded at varying threshold values (Green and Swets, 1974; Duda and Hart, 2001). A value of the discriminant function above the threshold predicts class 1, whereas a value below the threshold yields a class 2 prediction. In practice, this measure allows for a control of the acceptable false negative rate of a diagnosis. In summary, the area under the ROC curve is a global measure of the discriminant power of the discriminant function, regardless of the threshold to be chosen; if the area under this curve reaches 1 (i.e. AUC = 1), the discrimination is considered to be perfect.
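The threshold independence of the ROC area can be illustrated with a short Python sketch. It relies on the standard equivalence between the area under the ROC curve and the probability that a randomly drawn class 1 sample receives a higher discriminant value than a randomly drawn class 2 sample (ties counted as one half). The function name `roc_auc` and the toy scores are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve computed as a pairwise ranking statistic."""
    pos = [s for s, lab in zip(scores, labels) if lab == +1]  # class 1 scores
    neg = [s for s, lab in zip(scores, labels) if lab == -1]  # class 2 scores
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # pair ranked correctly
            elif p == n:
                wins += 0.5   # tie counts as one half
    return wins / (len(pos) * len(neg))


labels = [+1, +1, +1, -1, -1, -1]
scores = [2.0, 1.5, 0.2, 0.4, -1.0, -3.0]   # toy discriminant values (assumed)

auc = roc_auc(scores, labels)
# Adding a constant offset to every score changes which threshold is optimal,
# but leaves the ranking, and therefore the area, unchanged.
auc_shifted = roc_auc([s + 10.0 for s in scores], labels)
```

Here eight of the nine (class 1, class 2) pairs are ranked correctly, so the area is 8/9 for both the original and the shifted scores.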

Finally, for BCI applications, the performance of a classifier also depends on the time length of the data to be classified and on the number of classes to be recognized. For example, given two classifiers with the same accuracy, if the first performs classification on 1 s time windows whereas the other uses 2 s windows, then the first can transmit twice as much information during the same period of time. Similarly, for equal accuracies, a four-class classifier sends two times more binary information than a two-class classifier with the same number of trials. To allow comparison of classification results obtained with different experimental parameters, the notion of information transfer rate (ITR) has been borrowed from communication theory (Shannon 1964) and can be computed from the classifier accuracy p, the number of classes N and the time length T of a trial according to:

$$ITR = \frac{1}{T} \left( \log_2 N + p \log_2 p + (1-p) \log_2 \frac{1-p}{N-1} \right)$$

This equation enables the comparison of all kinds of BCI paradigms: according to the literature, the current reachable limit of ITR is approximately 25 bits/minute for non-invasive BCIs (Krepki et al. 2006).

Generalization

Generalization is the crucial notion needed to truly quantify the performance of a classifier: it corresponds to the ability of a trained classifier to accurately classify new data (outside the training set). To illustrate this point, consider a set of learning algorithms with a parameter that makes it possible to increase their fit to the training data (for example by using more and more non-linear discriminant functions). By increasing this fit, the classification error on the training data will steadily decrease, since the algorithm's aim is to minimize it. But the classification error on another data set (validation data) will stop decreasing and even start increasing when the increased fit to the training set no longer represents robust properties of the data to classify: this is called overfitting or overtraining. This scenario is illustrated in Fig. 3, where the validation error curve appears useful to select the step at which the learning process should be stopped. Because of this issue, classifier performance always has to be computed on a validation or test data set to assess generalization ability.

Figure 3: Schematic evolution of learning error (E_learn) and validation error (E_val) as a function of the fit to learning data. After a regular decrease, E_val reaches a minimum and a better fit to learning data leads to a worse performance on validation data.

Although they appear under various names in the literature, such as cross-validation, leave-one-out, leave-k-out or jackknife (Stone, 1974; Hastie et al., 2001), existing frameworks for assessing generalization ability share a common principle. They all consist in depriving the training set of some data samples in order to provide two kinds of sets, training sets and validation sets, with several samplings of the whole data set. For each sampling, the accuracy of the trained classifier is estimated on the items of data that were not used by the learning algorithm.


Complexity

A last issue the authors would like to address here is the problem of complexity, a notion related to multiple mathematical quantities. In statistical learning, complexity quantifies the ability of a set of discriminant functions to separate a training set with perfect accuracy when the clouds of points of the respective classes are very intricate. According to this definition, the discriminant functions of a KNN classifier are clearly more complex than LDA's affine functions. The more complex the discriminant functions of a classifier are, the better they can fit training data, and thus an increasing fit to the data can be interpreted as an increase in complexity. As previously mentioned, an overly good fit to the training data may severely impair the generalization accuracy of the method. In particular, for affine classifiers such as LDA or linear SVM, complexity increases with the dimension of the feature space. Thus, in MEEG classification, a high dimension due to the huge number of features used to describe each segment of signal may cause problems. This problem is sometimes referred to as the curse of dimensionality, arising from the fact that the more dimensions one introduces into the classification problem, the more data is needed to tackle the subsequent increase of possible variability (degrees of freedom). The point is that including an additional dimension would also require adding many more than just one sample to each data set. One should therefore pay attention to the concept of feature selection, which allows us to control the dimension of the feature space. A standard way to select variables is univariate selection: a reduced number of features are drawn from the whole set according to their individual discriminative power. This discriminative power is usually quantified by a Fisher t-test. More advanced multivariate selection methods take into account the whole set of features simultaneously in order to choose the optimized subset of variables for the classifier (Garett et al., 2003). Another method that can be used to limit classifier complexity is Principal Component Analysis (PCA), which reduces the high dimensional data to a few components used for classification.
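Univariate selection as described above can be sketched as follows. This is a minimal illustration, not the exact procedure of this study: the function names and the toy data are hypothetical, and the unequal-variance (Welch) form of the two-sample t statistic is an assumption made here for simplicity.

```python
import math
import statistics


def t_scores(X, y):
    """Absolute two-sample t statistic (Welch form) of each feature."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == +1]  # class 1 values
        b = [row[j] for row, lab in zip(X, y) if lab == -1]  # class 2 values
        se = math.sqrt(statistics.variance(a) / len(a)
                       + statistics.variance(b) / len(b))
        scores.append(abs(statistics.mean(a) - statistics.mean(b)) / se)
    return scores


def select_features(X, y, n_keep):
    """Indices of the n_keep features with the largest |t|, best first."""
    scores = t_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:n_keep]


# Toy training set (assumed): only feature 0 differs between the classes.
X = [
    (1.0, 5.0, 0.3),
    (1.2, 4.0, 0.1),
    (0.8, 6.0, 0.2),
    (3.0, 5.5, 0.2),
    (3.1, 4.5, 0.4),
    (2.9, 5.0, 0.0),
]
y = [+1, +1, +1, -1, -1, -1]
selected = select_features(X, y, n_keep=1)
```

To keep the estimate of generalization unbiased, such a ranking should be computed on the training set only and then applied unchanged to the validation data.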

3. APPLICATION TO VISUOMOTOR MEG DATA

We will now illustrate and investigate the use of such procedures by applying linear SVM classification to continuous epochs of MEG data recorded during ongoing visuomotor coordination and during resting states. The aim of this analysis was to assess the ability of the classifier to differentiate between the two conditions, thereby demonstrating its application to real data, and to indirectly investigate possible physiological processes reflected by the parameters shown to provide the best discrimination between the two conditions.

3.1 Data

Behavioral task

The visuomotor experiment required subjects to continuously manipulate a trackball to compensate for the random rotations of a cube projected on a display screen. The visuomotor (VM) task was alternated with a resting condition (R) during which the subjects relaxed while looking at a motionless cube. The subjects were cued to switch between the two conditions every 8 to 12 s, yielding continuous epochs of steady-state MEG data. All subjects gave informed consent and the study was approved by the local medical ethics committee (Jerbi et al. 2007).

Recordings and Preprocessing

The cerebral activity was recorded with a whole-head MEG system (151 sensors, VSM MedTech Ltd.) and digitized at 1.25 kHz. The acquired data were first low-pass filtered (100 Hz cut-off), down-sampled to 312.5 Hz and subjected to visual inspection. All data segments contaminated by eye blinks or by unwanted swallowing, coughing or movement artifacts were rejected, and heart-beat artifacts were corrected using template matching. All 8-12 s continuous MEG records of each condition (VM or R) were split into non-overlapping 1-s epochs. This yielded about 150 to 300 artifact-free trials for each of the two conditions per subject.

3.2 Methods

Feature quantification

After multiplication by a Hamming window (to avoid edge effects), the 1-s segment of raw MEG signal z(t) corresponding to the k-th time window W_k was band-pass filtered in 6 frequency bands using zero-phase FIR filters (computed by frequency sampling (Oppenheim 89)). The spectral power was determined for each MEG sensor using Equation 1 in each of the following standard physiological frequency bands: delta (2-4 Hz), theta (5-8 Hz), alpha (8-12 Hz), beta (15-30 Hz), gamma1 (30-60 Hz) and gamma2 (60-90 Hz). In addition to spectral power, we also computed phase synchrony (i.e. phase locking value, (Lachaux 1999)) between each pair of sensors in each frequency band using Equation 2.
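As an illustration of the synchrony feature, the sketch below computes a phase locking value over the samples of one time window. It is a hedged example rather than a transcription of Equation 2: it assumes the commonly used time-averaged form |mean(exp(i(φ_a - φ_b)))| and assumes that the instantaneous phases of the two band-pass filtered signals have already been extracted (for instance with a Hilbert transform). The function name and the synthetic phases are hypothetical.

```python
import cmath
import math


def phase_locking_value(phase_a, phase_b):
    """|mean(exp(i * (phase_a - phase_b)))| over the samples of one window."""
    total = sum(cmath.exp(1j * (pa - pb)) for pa, pb in zip(phase_a, phase_b))
    return abs(total) / len(phase_a)


n_samples = 312  # hypothetical number of samples in a 1-s window
t = [k / n_samples for k in range(n_samples)]

phase_20hz = [2 * math.pi * 20 * tk for tk in t]
phase_20hz_lagged = [p - math.pi / 4 for p in phase_20hz]  # constant phase lag
phase_23hz = [2 * math.pi * 23 * tk for tk in t]           # drifting phase lag

plv_locked = phase_locking_value(phase_20hz, phase_20hz_lagged)  # close to 1
plv_drifting = phase_locking_value(phase_20hz, phase_23hz)       # close to 0
```

A constant phase lag, whatever its value, yields a value close to 1, whereas a phase difference that drifts through whole cycles within the window averages out to a value close to 0.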

Classification process

The features computed to quantify the brain state were separated into 12 subsets: 6 subsets containing spectral power features in each of the 6 frequency bands and 6 subsets containing phase synchrony features, also in each frequency band. An exhaustive comparison of these 12 feature subsets was carried out by using one feature type at a time to predict the brain state (i.e. the behavioral condition) with one of the 3 classifiers: LDA, SVM and KNN. The regularization constant of the SVM was set to C = 0.1 (based on a preliminary investigation) and the number of neighbors for KNN was set to 5. The predictive power of the power and synchrony features was assessed using two methods:

1) Ten-fold cross-validation (intra-session): the training set of each session of each subject was split into 10 parts, learning was performed on 9 parts and the classifier was tested on the remaining part.

2) Inter-session cross-validation: for each subject, learning was performed on one full session and the classifier was tested on the remaining session(s) (at least two sessions were available for each subject).
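The two validation schemes listed above differ only in how sample indices are assigned to training and test sets, which the following minimal Python sketch makes explicit. The function names are hypothetical, and the interleaved assignment of samples to folds is an arbitrary choice made for this illustration (the way the 10 parts were formed is not specified here).

```python
def ten_fold_splits(n_samples, n_folds=10):
    """Yield (train_idx, test_idx); each sample is tested exactly once."""
    indices = list(range(n_samples))
    for fold in range(n_folds):
        test_idx = indices[fold::n_folds]       # every n_folds-th sample
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        yield train_idx, test_idx


def inter_session_splits(session_of_sample):
    """Yield (train_idx, test_idx): train on one session, test on the others."""
    for train_session in sorted(set(session_of_sample)):
        train_idx = [i for i, s in enumerate(session_of_sample) if s == train_session]
        test_idx = [i for i, s in enumerate(session_of_sample) if s != train_session]
        yield train_idx, test_idx


# 150 time windows from one session, split into 10 folds of 15 test samples.
intra = list(ten_fold_splits(150))

# Toy recording (assumed): 4 windows from session 0 and 3 from session 1.
inter = list(inter_session_splits([0] * 4 + [1] * 3))
```

In both cases the classifier, and any feature selection step preceding it, only ever sees the training indices of the current split.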

Moreover, the amount of information about the mental state provided by the classifier was evaluated by two quantities:

1) Classification accuracy: this is the average of the percentages of correctly classified points in each class. This measure gives an average rate of 50% for a random guess of the classifier even if the proportions of the classes in the data set are unbalanced.

2) Area under the ROC curve (AUR, denoted AUC in Section 2.4): this quantity measures the class information carried by the discriminant function resulting from the classifier, regardless of the threshold to be applied to actually predict the class. The advantage of this method is that it allows for an evaluation of the classifier independently of any possible offset between the features of the test set and those of the training set. For a random guess the value of AUR is 0.5 and a perfect (i.e. systematically correct) guess gives AUR = 1. In our results, AUR is computed from the test set to evaluate the generalization ability of the classifier.
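The reason for averaging per-class percentages in the first measure can be seen on a small example. The sketch below is illustrative only; the function name `class_averaged_accuracy` and the unbalanced toy labels are hypothetical.

```python
def class_averaged_accuracy(y_true, y_pred):
    """Mean over classes of the fraction of correctly classified points."""
    per_class = []
    for c in sorted(set(y_true)):
        preds = [p for t, p in zip(y_true, y_pred) if t == c]
        per_class.append(sum(p == c for p in preds) / len(preds))
    return sum(per_class) / len(per_class)


# Unbalanced toy data set (assumed): 90 points of class +1, 10 of class -1.
y_true = [+1] * 90 + [-1] * 10
always_class1 = [+1] * 100   # trivial classifier that ignores its input

plain = sum(t == p for t, p in zip(y_true, always_class1)) / len(y_true)
balanced = class_averaged_accuracy(y_true, always_class1)
```

The trivial classifier reaches a plain accuracy of 0.9 simply by exploiting the class proportions, while its class-averaged accuracy stays at the chance level of 0.5.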

3.3 Results

SVM versus LDA and feature selection

To gain preliminary insight into the effect of the number of features (i.e. possible discrimination parameters) on classification accuracy, we first performed variable selection on the features linked to the beta power. Using an intra-session 10-fold cross-validation, only the most discriminative features according to a t-test on the training set were fed into a linear SVM. For comparison, the same procedure was carried out with an LDA classifier and a 5-nearest-neighbor classifier (KNN, K = 5).


The accuracy results of all three algorithms are shown in Fig. 4. Our findings suggest that, when using a small training set (150 time windows), the accuracy of linear SVM and KNN remains stable as the number of selected variables grows. In contrast, LDA accuracy decreases drastically when more than 100 features are selected. Linear SVM and KNN may therefore be considered more robust than the LDA approach with this high dimensional data. Our results also indicate that the drop in LDA accuracy is smaller when a larger training set (300 points) is used. This is in line with the idea that high dimensional data is better classified using more data points. Moreover, the results in Fig. 4 also suggest that, even when using linear SVM, it might not be necessary to select a high number of features in order to obtain optimal accuracy with the data at hand. Based on these first results from Fig. 4, in the rest of the study investigating the MEG visuomotor data, we chose to select 36 features (with a Fisher t-test) in each subset before training a linear SVM classifier.

Figure 4: Classifier accuracy as a function of the number of features and training samples. Average classifier accuracy in inter-session cross-validation for the 3 subjects as a function of the number of selected power features (from 5 to 151 variables) for LDA, KNN and SVM, for two different sizes of the learning set: 150 points (left panel) and 300 points (right panel). Note that these results were obtained by only investigating beta power features, as a preliminary analysis aimed at comparing the three techniques.


Classifier validation

To assess the stability of classifier performance across time, we estimated the accuracy of the SVM classifier derived from each feature subset both within the same session and between sessions, using 10-fold cross-validation and inter-session cross-validation respectively. The average classification results across the 3 subjects are given in Fig. 5. Our results show that, for the MEG visuomotor data used here, the best accuracy was obtained using beta activities as discriminant features. This result depended neither on the type of feature (power or synchrony) nor on the type of validation used (inter- or intra-session). In particular, the highest score (86%) was obtained for beta power in intra-session cross-validation (Fig. 5). Note that nearly all features are less predictive in inter-session than in intra-session validation. In particular, the classification accuracy based on phase synchrony in the beta and gamma bands was reduced (by up to 25%) in inter-session validation. Furthermore, while the predictive power was equivalent for spectral power features and phase synchrony in intra-session validation, spectral power was clearly more predictive in inter-session validation.

Figure 5: Inter-session versus intra-session validation. Classification accuracy of the classifiers derived from power (upper panel) and synchrony (lower panel) features in different frequency bands and for two validation methods: intra-session (using ten-fold cross-validation) and inter-session (learning on one session and testing on the other).


The information content of each feature set was also investigated with the AUR, which we computed from the values of the discriminant functions on the test set resulting from each previously computed SVM classifier. The mean AUR across the three subjects for each feature subset is given in Fig. 6. The same general tendency as in Fig. 5 is observed, except that the ROC area does not drop in inter-session compared to intra-session validation, in particular for the beta and gamma1 power features. This confirms that, with the data at hand, beta power clearly carries discriminative information which can be used to infer the mental state from data in another recording session.

Figure 6: Area under the ROC curve of the decision function of classifiers derived from each feature subset in intra- and inter-session cross-validation. Upper and lower panels depict the results obtained using power and synchrony features respectively in all six selected frequency bands. The mean classifier accuracy obtained using the selected features from all the frequency bands is also represented on the right.

The same two cross-validations were also carried out to compute the classification accuracy of feature subsets that combine both spectral power and phase synchrony for a given frequency range. The comparison of the results obtained with classifications based only on power, only on synchrony and on the combination of both is given in Fig. 7. Our findings suggest that whether this combination is beneficial or not to the classifier strongly depends on the frequency band: in the three lower frequency bands, including phase synchrony seems to decrease the classifier accuracy achieved with spectral power alone, but conversely the combination of power and synchrony in higher frequency bands outperforms the results obtained with each taken separately. Moreover, combining the selected features from all frequency bands also improves classifier accuracy, reaching 97% as shown on the right of Fig. 7. This accuracy leads to an excellent ITR of 48 bits/minute.
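The ITR value quoted above can be checked with a short calculation based on the information transfer rate formula of Section 2.4, ITR = (1/T)(log2 N + p log2 p + (1 - p) log2((1 - p)/(N - 1))). The sketch below is a minimal illustration; the function name is hypothetical, and the parameters used (two classes, 1-s windows, 97% accuracy) are those of the present experiment.

```python
import math


def itr_bits_per_minute(p, n_classes, trial_seconds):
    """Information transfer rate in bits/minute for accuracy p, N classes, trial length T."""
    bits_per_trial = math.log2(n_classes)
    if p > 0.0:                                   # p * log2(p) tends to 0 as p -> 0
        bits_per_trial += p * math.log2(p)
    if p < 1.0:                                   # (1 - p) term vanishes as p -> 1
        bits_per_trial += (1.0 - p) * math.log2((1.0 - p) / (n_classes - 1))
    return bits_per_trial * 60.0 / trial_seconds


itr_vm_rest = itr_bits_per_minute(p=0.97, n_classes=2, trial_seconds=1.0)  # about 48
itr_perfect = itr_bits_per_minute(p=1.0, n_classes=2, trial_seconds=1.0)   # 60.0
itr_chance = itr_bits_per_minute(p=0.5, n_classes=2, trial_seconds=1.0)    # 0.0
```

With two classes and 1-s windows the ceiling is 60 bits/minute, so a 3% error rate already costs roughly a fifth of the achievable rate.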

Distribution of classifier coefficients

The final point we investigated in this study using the MEG data set is the spatial distribution of classifier coefficients across the sensor space. When using a binary linear classifier such as LDA or SVM, the discriminant function is an affine function of the features. Therefore, when dealing with features of comparable amplitude, one would expect the features associated with the coefficients of highest absolute value to be the most discriminant features for the classification. If a coefficient is positive, an increase of the corresponding feature draws the decision towards 'class 1' (i.e. VM), while its decrease draws it towards 'class 2' (Rest). Such interpretations are not completely rigorous, as the function is highly multivariate, but they may provide useful information on the task-specific features.
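This reading of the coefficients of a linear classifier can be sketched as follows. The example is purely illustrative: the function name, the sensor names and the weight values are hypothetical and are not taken from the results of this study.

```python
def rank_coefficients(weights, top=3):
    """Return (feature, coefficient, class favored by an increase), largest |w| first."""
    ranked = sorted(weights.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return [(name, w, "VM" if w > 0 else "Rest") for name, w in ranked[:top]]


# Hypothetical coefficients of a linear classifier trained on power features
# of comparable amplitude (class 1 = VM, class 2 = Rest).
weights = {"sensor_A": -0.80, "sensor_B": 0.05, "sensor_C": 0.45, "sensor_D": -0.10}
top_two = rank_coefficients(weights, top=2)
```

Such a ranking is only meaningful when the features have comparable amplitudes, for example after normalization, and it ignores the correlations between features mentioned above.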

To illustrate this with an example, Fig. 8 shows the scalp distribution of the SVM classifier coefficients obtained for delta, beta and gamma2 power, as well as for delta and alpha synchrony, for one representative subject. The upper row shows the topographic distribution of the classifier coefficients related to spectral power features. The beta band power classifier topography shows that this frequency range is associated with negative classifier coefficients in frontal central areas (blue blob in the central topography). In other words, a decrease in beta power over these sensors draws the prediction towards classifying the state as an ongoing visuomotor state (VM), whereas an increase is associated with the resting state (Rest). Moreover, positive coefficients, which correspond to labeling a data sample VM (i.e. a brain state corresponding to visuomotor control), show prominent peaks over frontal areas contralateral to the moving limb in both the delta and gamma2 frequency bands. In contrast to the rather central beta suppression, the location of these delta and high gamma peaks most likely corresponds to the sensors most sensitive to neural activation originating in the contralateral sensorimotor cortex. Furthermore, although one might be tempted to interpret the positive coefficients over posterior regions as being task-related and reflecting an induced power increase in all three frequency bands during visuomotor coordination, this should be done with great caution. Indeed, gamma power modulations over posterior occipital areas are thought to be involved in visual processing, but one cannot exclude that the posterior modulations at the bottom of the sensor array are, at least in part, caused by muscle artifacts. Muscle artifacts are known to be broadband (extending over gamma, beta and even lower frequencies) and may well be selectively present during the visuomotor task but not during the resting condition, and thus picked up by our SVM classifier as a discriminant feature. This said, the classifier coefficient peaks over central and contralateral sensorimotor areas are too far from muscle activities to be contaminated by any such artifacts, and they clearly represent task-related neural power modulations. Interestingly, these findings are in line with our previous results on task-related power modulation in visuomotor control, which were not obtained by any classification procedure but rather by contrasting power topographies across conditions at both sensor level and source level (Jerbi et al. 2004, 2005). This converging evidence from analysis of task-related cortical power modulations on the one hand and linear classification based on multi-frequency power on the other underlines the fact that changes in oscillatory power represent a strong candidate for classification methods because they represent fundamental modulations in the neurophysiological processes associated with various states.

As far as delta and alpha synchrony are concerned (middle and lower panels of Fig. 8), the spatial distribution of positive and negative coefficients seems to be more complex than the power topographies. This may be in part due to the high number of features available. The delta synchrony distribution shows a high number of positive coefficients for synchronization couples between the left motor cortex and other areas, also in line with previous results (Jerbi et al., 2007). Conversely, it is noteworthy that the right motor cortex yields negative synchrony coefficients with other distant areas. Whether this reflects the disengagement of the ipsilateral motor cortex (versus the engagement of the contralateral motor cortex) remains to be checked on a complete group of subjects. Also interesting are the positive alpha synchronization coefficients between contralateral parietal cortex and contralateral frontal areas, which could reflect the enhanced interactions between visual and frontal areas during the visuomotor task.

Figure 7: Mean classifier accuracy obtained by combining power and synchrony features, compared to power-only or synchrony-only classification. The histograms depict the relationship between classifier accuracy (in all three modes) and the selected frequency band. The bar on the right side shows the classification accuracy obtained using the selected features from all the frequency bands, combining power and synchrony.

Figure 8: Spatial distribution of SVM classifier coefficients for power and synchrony features in various frequency bands for one subject. Upper row: distribution of delta (δ), beta (β) and gamma2 (γ2) power classifier coefficients: positive coefficients are represented by black squares and negative ones by gray stars (marker size is proportional to amplitude). Middle and lower panels: distribution of delta and alpha synchrony classifier coefficients. Note that each small topography in these panels represents the non-zero coefficients of phase synchrony involving the sensor at the current scalp position. Positive coefficients are represented by black lines linking the current sensor and the other sensors, negative ones by gray lines. The close-ups indicate interesting sensors showing many long-distance synchronies.

3.4 Discussion

The results reported here illustrate several important issues related to the classification of MEG states. First of all, we have shown that the number of chosen features can influence classification performance drastically. This is especially true for LDA and any other model-based classifier that requires the estimation of many parameters of the probability distribution of the data. In practice, the number of parameters to estimate has to be small compared to the number of data points in the training set. In LDA, for example, all the elements of the covariance matrix of the data have to be estimated: their number is proportional to the square of the number of features. Using LDA in high dimensions thus requires a large number of data points in the training set. Moreover, even though SVM seems to be less sensitive to the dimension of the data set used here, this is not necessarily true in general. In fact, if we had chosen a larger regularization constant C, classification accuracy would have dropped due to the high complexity of the classifier. Finally, a non-linear classifier like KNN can achieve competitive classification accuracy levels and may very well be chosen for MEEG classification, but does not clearly outperform a linear SVM. It thus seems that, in the high dimensional feature space considered here, linear classifiers are robust and flexible enough to perform the classification task with acceptable accuracy.
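As a rough worked example of the quadratic growth of the number of LDA parameters mentioned above (a back-of-the-envelope count, assuming a symmetric covariance matrix over d features, which has d(d+1)/2 distinct elements):

```latex
\[
N_{\mathrm{cov}}(d) = \frac{d(d+1)}{2}, \qquad
N_{\mathrm{cov}}(151) = \frac{151 \times 152}{2} = 11476, \qquad
N_{\mathrm{cov}}(36) = \frac{36 \times 37}{2} = 666 .
\]
```

With all 151 beta power features of Fig. 4, the number of covariance elements thus exceeds the size of the training sets (150 or 300 points) by more than an order of magnitude, whereas with 36 selected features it comes much closer to the number of available data points.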

After having chosen a reasonable number of parameters and a reliable classifier, the quality of the results is still subject to the validation method. We have shown for example that predicting the mental state of data from one session after training the classifier on another session is more difficult than predicting within the same session using cross-validation. This can be due to many factors related to the reproducibility of the experiment and to session-specific artifacts or parameter variations (e.g. variable head position within the MEG helmet). This may lead to significant changes in the MEG recordings that could then impair classification results. In some cases, a simple change of the threshold of the discriminant function may be sufficient to improve classification. This is precisely the case we discussed when we observed that in inter-session validation the ROC area did not drop whereas classifier accuracy did (i.e. the ROC area is insensitive to offset variations related to variable baseline levels between sessions). Such simple corrections might not be sufficient when using phase synchrony measurements as discriminant features, especially in the higher frequency ranges, which can be extremely sensitive to artifacts. A further possible cause for the drop in classification accuracy between recording sessions is a change in the state of the subjects themselves, due to learning, a change of strategy or simply alertness and motivation levels. This is particularly relevant for continuous mental states, which are internally paced and thus less stable than exogenous activities triggered by a stimulus. In general, the difficulty of predicting one session using another is a crucial and challenging issue which has to be examined carefully for applications such as Brain Computer Interfaces.

In this paper, we discussed the use of power and phase synchrony to perform data classification. A crucial limitation of phase synchrony analysis is that it is extremely difficult to differentiate between true physiological coupling between distinct neural assemblies and false synchronization that appears due to the spreading on the scalp of the electromagnetic field of a single cortical source. According to Fig. 7, combining power and synchrony features in the upper frequency bands improves classification accuracy, which suggests that the recorded synchrony does not carry exactly the same information as power, and thus may not be explained by field spreading alone. Whether, and if so to what extent, synchrony between neighboring sensors reflects local amplitude modulations of the same neuronal assembly is not the aim of this specific study and remains of course an open issue. What is more important, however, is that long-range synchronization between widely separated sensors is less likely to represent a single source, especially if one can show that the coupling between the two sensors is not zero-phase (Nolte et al. 2004). Finally, it is worth recalling that, although insight into the dynamics of local and large-scale synchrony and into their respective roles would be valuable from a basic physiological standpoint, the ultimate goal of a classification method is above all to separate two or more conditions in the most robust and efficient way possible.

The scalp distributions of classifiercoefficients obtained here with thevisuomotor MEG data set are in agreementwith previous work in the field of ERD andERS analysis in particular in thesensorimotor cortex using non-invasive(Pfurtscheller and Lopes Da Silva, 1999)and invasive (Crone et al. 1998a, 1998b)recordings. Although the spatialdistribution of classifier coefficients was

Page 20: Classification methods for ongoing EEG and MEG signals

BESSERVE ET AL. Biol Res 40, 2007, 415-437434

reported here with real visuomotor MEG data in a subset of frequency bands for illustrative purposes, the obtained topographies are in remarkable agreement with our knowledge of the modulation and distribution of cortical oscillatory activity during sensorimotor behavior. First, the distribution of the central negative coefficients for beta power is in line with the known suppression of beta power during movement over bilateral sensorimotor cortices. The power suppression, which seems to be enhanced over the contralateral cortex, probably involves primary sensory, motor and premotor areas, including Brodmann Area 4 (BA4) and most likely the supplementary motor area (SMA) (Jerbi et al. 2004, 2005). Furthermore, in the light of the positive coefficients revealed here in the high gamma range over the contralateral sensorimotor area, and bearing in mind previous findings on the presence of focal gamma activity in the activated motor cortex (Crone et al. 1998b), it is also tempting to suggest that the bilateral beta band activations might be less task-specific or functionally relevant than the gamma power modulations. More studies will be needed in order to systematically address the functional relationship between beta suppression and gamma power increase during motor tasks. Finally, a further frequency range that showed strong positive classification coefficients between visuomotor and resting conditions is the delta (2-4 Hz) band. Although its frequency is surprisingly low, this band has recently been shown to host slow sensorimotor oscillations related to the ongoing control of hand speed (Jerbi et al. 2007). Whether an increase in delta-range cortical power in the motor cortex represents modulation of an intrinsic sensorimotor rhythm or whether it is rather a direct reflection of task parameters is still an open question. Similarly, it would be difficult to draw qualitative conclusions by visual inspection of the complex synchronization topographies (middle and lower panels of Fig. 8), and our observations about enhanced contralateral synchronization in the delta and alpha bands have to be confirmed by future work on a larger population of subjects. What matters here is that the classification method we used identified the sensorimotor delta power as one of the major features that best separate the two conditions, a finding fully in line with a separate analysis of the data. Again, these converging findings from basic research analysis and data classification procedures confirm that successful classification inherently relies on distinguishing task-related physiological phenomena. Moreover, although classification techniques can be improved by taking a priori neuroscience knowledge into account (such as frequency bands or regions of interest), they also hold the potential of indirectly enhancing our understanding of brain dynamics. Indeed, the features revealed by such methods to be most relevant for the optimization of classification may lead to new hypotheses about the dynamics of human cognition in specific tasks.
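The way the sign of a linear classifier coefficient reflects a task-related power change can be illustrated on synthetic data. The sketch below is not the classifier used in this study: it is a plain linear discriminant with a shared diagonal covariance, applied to two simulated features whose names, shifts and noise levels are arbitrary assumptions chosen only to mimic a beta power decrease and a gamma power increase during the task.

```python
import random
from statistics import mean, pvariance


def simulate_features(n_windows, beta_shift, gamma_shift, rng):
    """Simulate rows of [beta_power, gamma_power] in arbitrary log units."""
    return [[rng.gauss(beta_shift, 1.0), rng.gauss(gamma_shift, 1.0)]
            for _ in range(n_windows)]


def fit_diagonal_lda(rest, task):
    """Linear discriminant with a shared diagonal covariance and equal priors.

    Returns (weights, bias); a positive score predicts the 'task' class.
    """
    weights, midpoints = [], []
    for j in range(len(rest[0])):
        r = [row[j] for row in rest]
        t = [row[j] for row in task]
        # Pooled within-class variance of feature j.
        pooled = (pvariance(r) * len(r) + pvariance(t) * len(t)) / (len(r) + len(t))
        weights.append((mean(t) - mean(r)) / pooled)
        midpoints.append(0.5 * (mean(t) + mean(r)))
    bias = -sum(w * m for w, m in zip(weights, midpoints))
    return weights, bias


def predict_task(weights, bias, row):
    """Return True when the linear score falls on the 'task' side."""
    return sum(w * x for w, x in zip(weights, row)) + bias > 0


rng = random.Random(0)  # fixed seed so the illustration is reproducible
rest = simulate_features(200, beta_shift=0.0, gamma_shift=0.0, rng=rng)
task = simulate_features(200, beta_shift=-1.5, gamma_shift=1.0, rng=rng)

weights, bias = fit_diagonal_lda(rest, task)
n_correct = (sum(not predict_task(weights, bias, row) for row in rest)
             + sum(predict_task(weights, bias, row) for row in task))
accuracy = n_correct / (len(rest) + len(task))  # training accuracy only

print("beta weight:", round(weights[0], 2))
print("gamma weight:", round(weights[1], 2))
print("training accuracy:", round(accuracy, 2))
```

In this toy setting the beta coefficient comes out negative and the gamma coefficient positive, mirroring the simulated suppression and increase. The accuracy printed here is measured on the training data and would need cross-validation before being read as a performance estimate.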

CONCLUSION

Beyond reviewing the basic principles of BCI-oriented data classification, we used MEG data to illustrate the limitations of, and differences between, several approaches and to shed some light on the effect of various user-defined parameters. Furthermore, the successful MEG data classification reported here in offline analysis demonstrates that the real-time detection of visuomotor activity using power and synchrony measurements is feasible, and suggests that classification of motor or visuomotor imagery (without actual movement) can be obtained using the same methodology. This is all the more plausible since motor imagery is known to activate (although to a lesser extent) predominantly the same areas as those involved in real movement execution (Beisteiner et al., 1995; Leocani et al., 1999), and the subject’s ability to enhance the corresponding oscillatory modulations can also be trained. On the whole, the accuracy of the classification achieved here during a continuous task (i.e. not event-related) suggests the possibility of achieving an asynchronous BCI with a high transmission rate. In addition, the agreement between the extracted classification features on the one hand and the knowledge of the physiological processes underlying the visuomotor control data on the other suggests that basic neuroscience research can also benefit from M/EEG data classification methods, as they may yield novel insights into the mechanisms underlying ongoing brain dynamics during mental tasks.
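To give a rough order of magnitude for such a transmission rate, the bit-rate definition commonly used in the BCI literature (Wolpaw et al., 2002) can be applied to the best accuracy reported in this paper (97% on 1-second windows, two classes). The sketch below is only an illustration under strong assumptions: equiprobable classes, one independent decision per non-overlapping window, and errors spread evenly over the wrong classes. The function name is ours.

```python
import math


def bits_per_decision(n_classes: int, accuracy: float) -> float:
    """Information conveyed by one decision, in bits.

    B = log2(N) + P*log2(P) + (1 - P)*log2((1 - P) / (N - 1))
    """
    if n_classes < 2 or not 0.0 <= accuracy <= 1.0:
        raise ValueError("need n_classes >= 2 and accuracy in [0, 1]")
    bits = math.log2(n_classes)
    if accuracy > 0.0:  # the term p*log2(p) tends to 0 as p tends to 0
        bits += accuracy * math.log2(accuracy)
    if accuracy < 1.0:
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_classes - 1))
    return bits


window_seconds = 1.0  # assumed: one decision per non-overlapping window
bits = bits_per_decision(n_classes=2, accuracy=0.97)
bits_per_minute = bits * 60.0 / window_seconds

print(f"{bits:.3f} bits per decision, {bits_per_minute:.1f} bits/min")
```

Under these assumptions, a two-class decision at 97% accuracy carries roughly 0.8 bit. Overlapping windows, unequal class frequencies or temporally correlated errors would all change this figure.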

ACKNOWLEDGEMENTS

The authors would like to thank Dr Mario Chavez and Fréderique Amor for their useful comments on the manuscript.

BIBLIOGRAPHY

ANDERSON C, STOLZ E, SHAMSUNDER S (1998) Multivariate Autoregressive Models for Classification of Spontaneous Electroencephalogram During Mental Tasks. IEEE Trans Biomed Eng 45: 277-286

BARRET G (1996) Event-related potentials (ERPs) as a measure of complex cognitive function. Electroencephalogr Clin Neurophysiol 46: 53-63

BASHASHATI A, FATOURECHI M, WARD RK, BIRCH GE (2007) A survey of signal processing algorithms in brain-computer interfaces based on electrical brain signals. J Neural Eng 4: R32-R57

BEISTEINER R, HOLLINGER P, LINDINGER G, LANG W, BERTHOZ A (1995) Mental representations of movements. Brain potentials associated with imagination of hand movements. Electroenceph Clin Neurophysiol 96: 183-193

BORISOFF JF, MASON SG, BASHASHATI A, BIRCH GE (2004) Brain-Computer Interface Design for Asynchronous Control Applications: Improvements to the LF-ASD Asynchronous Brain Switch. IEEE Trans Biomed Eng 51: 985-992

BOSTANOV V (2004) BCI competition 2003-data sets Ib and IIb: feature extraction from event-related brain potentials with the continuous wavelet transform and the t-value scalogram. IEEE Trans Biomed Eng 51: 1057-1061

BROVELLI A, DING M, LEDBERG A, CHEN Y, NAKAMURA R, BRESSLER S (2004) Beta oscillations in a large-scale sensorimotor cortical network: directional influences revealed by Granger causality. Proc Natl Acad Sci USA 101: 9849-9854

CARLSON TA, SCHRATER P, HE S (2003) Patterns of activity in the categorical representation of objects. J Cogn Neurosci 15: 704-717

CRONE N, MIGLIORETTI D, GORDON B, LESSER R (1998a) Functional mapping of human sensorimotor cortex with electrocorticographic spectral analysis. II. Event-related synchronization in the gamma band. Brain 121: 2301-2315

CRONE N, MIGLIORETTI D, GORDON B, SIERACKI J, WILSON M, UEMATSU S, LESSER R (1998b) Functional mapping of human sensorimotor cortex with electrocorticographic spectral analysis. I. Alpha and beta event-related desynchronization. Brain 121: 2271-2299

DUDA RO, HART PE, STORK DG (2001) Pattern Classification. Wiley-Interscience

FARWELL LA, DONCHIN E (1988) Talking off the top of your head: toward a mental prosthesis utilizing event-related brain potentials. Electroenceph Clin Neurophysiol 70: 510-523

FRACKOWIAK RSJ, FRISTON KJ, FRITH CD, DOLAN RJ, MAZZIOTTA JC (1997) Human Brain Function. San Diego, Calif.: Academic Press

FUCHS T, BIRBAUMER N, LUTZENBERGER W, GRUZELIER JH, KAISER J (2003) Neurofeedback Treatment for Attention-Deficit/Hyperactivity Disorder in Children: A Comparison with Methylphenidate. Applied Psychophysiology and Biofeedback 28: 1-12

GARRETT D, PETERSON DA, ANDERSON CW, THAUT MH (2003) Comparison of linear, nonlinear, and feature selection methods for EEG signal classification. IEEE Trans Neural Syst Rehabil Eng 11: 141-144

GEORGOPOULOS AP, LANGHEIM FJ, LEUTHOLD AC, MERKLE ANMSPMTISEBR (2005) Magnetoencephalographic signals predict movement trajectory in space. Exp Brain Res 167: 132-135

GRAIMANN B, PFURTSCHELLER G (2006) Quantification and visualization of event-related changes in oscillatory brain activity in the time-frequency domain. In: NEUPER C, KLIMESCH W (eds) Progress in Brain Research 159: 79-97

GREEN DM, SWETS JA (1974) Signal Detection Theory and Psychophysics. Huntington, NY: R. E. Krieger

GYSELS E, CELKA P (2004) Phase Synchronisation for the recognition of Mental Tasks in a Brain Computer Interface. IEEE Trans Neural Syst Rehabil Eng 12: 206-415

GYSELS E, RENEVEY P, CELKA P (2005) SVM-based recursive feature elimination to compare phase synchronization computed from broadband and narrowband EEG signals in Brain-Computer Interfaces. Signal Processing 85: 2178-2189

HASELSTEINER E, PFURTSCHELLER G (2000) Using time-dependent neural networks for EEG classification. IEEE Trans Rehab Eng 8: 457-463

HASTIE T, TIBSHIRANI R, FRIEDMAN J (2001) The Elements of Statistical Learning. Springer Verlag

HAYNES JD, REES G (2006) Decoding mental states from brain activity in humans. Nat Rev Neurosci 7: 523-534

HIRAIWA A, SHIMOHARA K, TOKUNAGA Y (1990) EEG topography recognition by neural networks. IEEE Eng Med Biol Mag 9: 39-42

JERBI K, BAILLET S, LACHAUX JP, PANTAZIS D, LEAHY R, GARNERO L (2005) Modulations of Power and Synchronization of Neural Activity during Sustained Visuomotor Coordination, a MEG Study. Proc. 11th Int Conf on Human Brain Map

JERBI K, LACHAUX JP, N’DIAYE K, PANTAZIS D, LEAHY RM, GARNERO L, BAILLET S (2007) Coherent neural representation of hand speed in humans revealed by MEG imaging. Proc Natl Acad Sci USA 104: 7676-7681

JERBI K, LACHAUX J-P, BAILLET S, GARNERO L (2004) Imaging Cortical Oscillations during Sustained Visuomotor Coordination In MEG. Proc. IEEE International Symposium on Biomedical Imaging: From Nano to Macro

KEIRN ZA, AUNON JI (1990) A new mode of communication between man and his surroundings. IEEE Trans Biomed Eng 37: 1209-1214

KREPKI R (2006) Berlin Brain-Computer Interface. The HCI communication channel discovery. Int J Human-Computer Studies 65: 460-477

KUBLER A, KOTCHOUBEY B, KAISER J, WOLPAW JR, BIRBAUMER N (2001) Brain-computer communication: unlocking the locked in. Psychol Bull 127: 358-375

LACHAUX JP, RODRIGUEZ E, MARTINERIE J, VARELA FJ (1999) Measuring phase synchrony in brain signals. Hum Brain Mapp 8: 194-208

LACONTE S, STROTHER S, CHERKASSKY V, ANDERSON J, HU X (2005) Support vector machines for temporal classification of block design fMRI data. Neuroimage 26: 317-329

LAL TN, SCHRODER M, HILL J, PREISSL H, HINTERBERGER T, MELLINGER J, BOGDAN M, ROSENSTIEL W, HOFMANN T, BIRBAUMER N, SCHOLKOPF B (2005) A Brain Computer Interface with Online Feedback based on Magnetoencephalography. In: DE RAEDT L, WROBEL S (eds) Proceedings of the 22nd International Conference on Machine Learning: 465-472

LEBEDEV M, NICOLELIS MA (2006) Brain-machine interfaces: past, present and future. Trends Neurosci 29: 536-546

LEOCANI L, MAGNANI G, COMI G (1999) Event Related Desynchronisation during execution, imagination and withholding of movement. Handbook of Electroencephalography and Clinical Neurophysiology. PFURTSCHELLER GALDS, F H 6: 245-268

LOPES DA SILVA FH (2006) Event related oscillations: what about phase? In: NEUPER C, KLIMESCH W (eds) Progress in Brain Research 159: 1-17

LOTTE F, CONGEDO M, LECUYER A, ARNALDI B (2007) A review of classification algorithms for EEG-based Brain-Computer Interfaces. J Neural Eng 4: R1-R13

MIDDENDORF M, MCMILLAN G, CALHOUN G, JONES KS (2000) Brain-computer interfaces based on the steady-state visual-evoked response. IEEE Trans Rehabil Eng 8: 211-214

MILLAN JR, RENKENS F, MOURINO J, GERSTNER W (2004) Noninvasive brain-actuated control of a mobile robot by human EEG. IEEE Trans Biomed Eng 51: 1026-1033

MULLER KR, MIKA S, RATSCH G, TSUDA K, SCHOLKOPF B (2001) An introduction to kernel-based learning algorithms. IEEE Trans Neural Netw 12: 181-202

MULLER-GERKING J, PFURTSCHELLER G, FLYVBJERG H (1999) Designing optimal filters for single-trial EEG classification in a movement task. Clin Neurophysiol 110: 787-798

NOLTE G, BAI O, WHEATON L, MARI Z, VORBACH S, HALLETT M (2004) Identifying true brain interaction from EEG data using the imaginary part of coherency. Clin Neurophysiol 115: 2292-2307

OBERMEIER B, GUGER C, NEUPER C, PFURTSCHELLER G (2001) Hidden Markov models for online classification of single trial EEG. Pattern Recognit Lett 22: 1299-1309

OPPENHEIM AV, SCHAEFER RW (1989) Digital signal processing. Prentice Hall (original edition 1975)

PFURTSCHELLER G (2006) The cortical activation model (CAM). In: NEUPER C, KLIMESCH W (eds) Progress in Brain Research 159: 19-27

PFURTSCHELLER G, NEUPER C, FLOTZINGER D, PREZENGER M (1997) EEG-based discrimination between imagination of right and left hand movement. Electroencephalogr Clin Neurophysiol 103: 642-651

PFURTSCHELLER G, LOPES DA SILVA FH (1999) Event-related EEG/MEG synchronization and desynchronization: basic principles. Clin Neurophysiol 110: 1842-1857

PIKOVSKI A, ROSENBLUM M, KURTHS J (2001) Synchronization: A Universal Concept in Nonlinear Science

WEI Q, WANG Y, GAO X, GAO S (2007) Amplitude and phase coupling measures for feature extraction in an EEG-based brain-computer interface. J Neural Eng 4: 120-129

QUYEN MLV, SOSS J, NAVARRO V, ROBERTSON R, CHAVEZ M, BAULAC M, MARTINERIE J (2005) Preictal state identification by synchronization changes in long-term intracranial EEG recordings. Clin Neurophysiol 116: 559-568

RODRIGUEZ E, GEORGE N, LACHAUX JP, MARTINERIE J, RENAULT B (1999) Perception shadow: long distance gamma band synchronisation and desynchronisation on the human scalp. Nature 397: 430-433

SALMELIN R, HARI R (1994) Spatiotemporal characteristics of sensorimotor neuromagnetic rhythms related to thumb movement. Neuroscience 60: 537-550

SCHERER R, MÜLLER GR, NEUPER C, GRAIMANN B, PFURTSCHELLER G (2004) An Asynchronously Controlled EEG-Based Virtual Keyboard: Improvement of the Spelling Rate. IEEE Trans Biomed Eng 51: 979-984

SHANNON CE, WEAVER W (1964) Mathematical Theory of Communication. Champaign, IL: University of Illinois Press

STONE M (1974) Cross-validatory choice and assessment of statistical predictions. J Roy Statist Soc 36: 111-147

STREHL U, LEINS U, GOTH G, KLINGER C, HINTERBERGER T, BIRBAUMER N (2006) Self-regulation of Slow Cortical Potentials: A New Treatment for Children With Attention-Deficit/Hyperactivity Disorder. Pediatrics 118: 1530-1540

TALLON-BAUDRY C, BERTRAND O (1999) Oscillatory gamma activity in humans and its role in object representation. Trends Cogn Sci 3: 151-162

TALLON-BAUDRY C, BERTRAND O, DELPUECH C, PERMIER J (1997) Oscillatory gamma-band (30-70 Hz) activity induced by a visual search task in humans. J Neurosci 17: 722-734

TAYLOR DM, TILLERY SI, SCHWARTZ AB (2002) Direct cortical control of 3D neuroprosthetic devices. Science 296: 1829-1832

TOWNSEND G, GRAIMANN B, PFURTSCHELLER G (2006) A comparison of common spatial patterns with complex band power features in a four-class BCI experiment. IEEE Trans Biomed Eng 53: 642-651

VAPNIK V (1998) Statistical Learning Theory. John Wiley, New York

VARELA F, LACHAUX JP, RODRIGUEZ E, MARTINERIE J (2001) The Brainweb: Phase synchronisation and Large-scale integration. Nature Rev Neurosci 2: 229-239

VARELA FJ (1995) Resonant cell assemblies: a new approach to cognitive function and neuronal synchrony. Biol Res 28: 81-95

WALTER DO (1968) Coherence as a measure of relationship between EEG records. Electroenceph Clin Neurophysiol 24: 282

WESSBERG J, NICOLELIS MA (2004) Optimizing a linear algorithm for real-time robotic control using chronic cortical ensemble recordings in monkeys. J Cogn Neurosci 16: 1022-1035

WOLPAW JR, BIRBAUMER N, MCFARLAND DJ, PFURTSCHELLER G, VAUGHAN TM (2002) Brain-computer interfaces for communication and control. Clin Neurophysiol 113: 767-791
