IOP PUBLISHING JOURNAL OF NEURAL ENGINEERING

J. Neural Eng. 5 (2008) 9–23 doi:10.1088/1741-2560/5/1/002

A self-paced brain–computer interface system with a low false positive rate

M Fatourechi 1, R K Ward 1,2 and G E Birch 1,2,3

1 Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
2 Institute for Computing, Information and Cognitive Systems, Vancouver, BC, Canada
3 Neil Squire Society, Burnaby, BC V5M 3Z3, Canada

E-mail: [email protected], [email protected] and [email protected]

Received 2 August 2007
Accepted for publication 9 November 2007
Published 10 December 2007
Online at stacks.iop.org/JNE/5/9

Abstract

The performance of current EEG-based self-paced brain–computer interface (SBCI) systems is not suitable for most practical applications. In this paper, an improved SBCI that uses features extracted from three neurological phenomena (movement-related potentials, changes in the power of Mu rhythms and changes in the power of Beta rhythms) to detect an intentional control command in noisy EEG signals is proposed. The proposed system achieves a high true positive (TP) to false positive (FP) ratio. To extract features for each neurological phenomenon in every EEG signal, a method that consists of a stationary wavelet transform followed by matched filtering is developed. For each neurological phenomenon in every EEG channel, features are classified using a support vector machine (SVM) classifier. For each neurological phenomenon, a multiple classifier system (MCS) then combines the outputs of the SVMs. Another MCS combines the outputs of the MCSs designed for the three neurological phenomena. Various configurations for combining the outputs of these MCSs are considered. A hybrid genetic algorithm (HGA) is proposed to simultaneously select the features, the values of the classifiers' parameters and the configuration for combining MCSs that yield near optimal performance. Analysis of the data recorded from four able-bodied subjects shows a significant performance improvement over previous SBCIs.

(Some figures in this article are in colour only in the electronic version)

Abbreviations

ANOVA  analysis of variance
AUC  area under the ROC
BCI  brain–computer interface
CPBR  changes in the power of Beta rhythms
CPMR  changes in the power of Mu rhythms
DWT  discrete wavelet transform
EEG  electroencephalogram
ERP  event-related potential
FA  false activation
FAR  false activation rate
FP  false positive
FPR  false positive rate
FIR  finite impulse response
FN  false negative
GA  genetic algorithm
HGA  hybrid genetic algorithm
IC  intentional control
LF-ASD  low frequency-asynchronous switch design
LFASD_Path  a variation of the LF-ASD that uses the knowledge of the path of features
LFASD_User-Customized  a variation of the LF-ASD whose design parameter values are user-customized
MCS  multiple classifier system
MF  matched filtering
MRP  movement-related potential
NC  no control
NN  nearest neighbor

ROC  receiver operating characteristic
SBCI  self-paced brain–computer interface
SBCI_Fully-Automated  the SBCI system proposed in this study
SBCI_Semi-Automated  the SBCI system proposed in [1]
SVM  support vector machine
SWT  stationary wavelet transform
TN  true negative
TP  true positive
TPR  true positive rate

1. Introduction

Self-paced brain–computer interface (SBCI) systems allow individuals to control a device or an object using their brain signals only, and at their own pace, i.e. whenever they wish. This is unlike the traditional synchronized approach, where the user is only able to control the device during periods specified by the system [2].

The performance of SBCIs is usually determined via two objective functions: (1) the true positive (TP) rate, i.e. the percentage of intentional control (IC) commands that are correctly detected by the SBCI system, and (2) the false positive (FP) rate, i.e. the rate of false positives generated by the system during the periods for which the user does not intend control (no control (NC) periods). In other words, the FP rate is calculated as the percentage of decisions in the NC periods that are false.

Currently, the performance of EEG-based SBCIs is not suitable for most practical applications. For example, the latest variation of an SBCI system, called the low frequency-asynchronous switch design (the LF-ASD), generates a false positive every 12 s on average (with average TP rate = 41.1%) [3]. Such frequent false activations may cause user frustration and limit the application of the system. In this paper, we focus on improving the performance of SBCI systems in terms of decreasing the FP rate (FPR) so that the system is more suitable for practical applications. There are several ways of improving the performance of EEG-based SBCIs. These include the use of sophisticated signal-processing schemes; exploring spatial, temporal and frequency-related information of EEG signals; and taking advantage of the information provided by different control sources (neurological phenomena) of the brain (see [4] for a review of current neurological phenomena used in BCI systems).

To improve the performance of SBCIs, the simultaneous use of three neurological phenomena as sources of control has been recently proposed [1]. These phenomena consisted of movement-related potentials (MRP) [5–7], changes in the power of Mu rhythms (CPMR) and changes in the power of Beta rhythms (CPBR) [8, 9]. The main rationale behind using these specific neurological phenomena is that they are time locked to the onset of a movement. Thus, when a movement occurs, they are expected to be present in the EEG. A number of papers provide some evidence that MRPs and changes in the Mu rhythms provide complementary information to explore the cognitive functions of the brain [7, 10–14]. There is also some evidence regarding the differences between the Mu and Beta rhythms [15, 16]. See [1] for more details.

In [1], an EEG-based SBCI is proposed that uses information extracted from these three neurological phenomena and achieves low FP rates. One feature is extracted for each phenomenon in each EEG channel, resulting in the generation of three features per EEG channel. Each feature is extracted by matched filtering (MF) the signal with a template of the corresponding neurological phenomenon (created through averaging the IC epochs). Each feature is classified using a K-nearest neighbor (K-NN) classifier. Increasing the number of neurological phenomena from one to three has the disadvantage of tripling the dimensionality of the feature space. To reduce the dimensionality of the feature space, therefore, a new algorithm is developed in [1] that uses a two-stage multiple classifier system (MCS) to classify the features. An MCS forms a strong classifier by using an ensemble of 'weaker' classifiers. For an SBCI, the number of training IC patterns is usually limited. Therefore, the proposed two-stage MCS allows the system to examine a large number of features, thus exploring as much information as possible. To reduce the dimensionality of the feature space, a genetic algorithm (GA) is used to select a subset of features that yield near optimal performance. For simplicity, the parameter values of all classifiers are assumed to be the same and are selected through an exhaustive search. The proposed system is shown to achieve low FP rates (an average FP rate of 0.5% for four subjects). The TP rate, however, is also low (the average TP rate is 27.3%). To improve the performance of the system, we note that it has a total of 3N classifiers, where N is the number of EEG channels. In [1], it is assumed that the parameters of all classifiers have the same value. The parameter values are then found using an exhaustive search. This process is clearly suboptimal. Furthermore, because of the computational complexity involved, the corresponding MCS for each neurological phenomenon is designed separately. For each MCS, a separate GA is employed to select the features that produce the best performance. A better design would be to have the process of feature selection carried out simultaneously for all three MCSs.

In this paper, we have expanded the methodology proposed in [17]. We propose improvements to the SBCI system designed using three neurological phenomena to boost its performance. A method that uses a stationary wavelet transform (SWT) and matched filtering is developed for feature extraction. Support vector machines (SVMs) are used for classification because they have the advantage of minimizing the empirical risk (the training error), as well as the confidence error (the test error) [18]. We also used bipolar EEG signals instead of monopolar EEG signals as in [1]. This is done by first recording the EEG signals in a monopolar fashion (e.g., electrodes F1 and FC1, referenced to ear electrodes). Then the bipolar EEG signals are generated by calculating the differences between each adjacent pair of EEG electrodes (e.g. F1–FC1 in the above example). Bipolar signals were calculated because it has been shown that bipolar EEG signals may result in the generation of more discriminant wavelet features (extracted from MRPs) than when monopolar EEG signals are used [19]. Since using bipolar EEG signals leads to an increase in the number of EEG signals, the dimensionality of the feature space as well as the number of classifier parameters whose values need to be estimated increase. A hybrid genetic algorithm (HGA) is proposed to automate the design process of the improved SBCI. The proposed HGA simultaneously selects the features, estimates the classifiers' parameters and chooses how the outputs of the MCSs developed for each neurological phenomenon should be combined together. Analysis of the data obtained from four able-bodied subjects (coded as AB1 to AB4) shows that the improved SBCI performs significantly better than previous EEG-based SBCIs.

Figure 1. The overall structure of the improved SBCI.

2. Methods

The structure of the improved SBCI is shown in figure 1. For each neurological phenomenon in every EEG signal (there are N EEG signals in total), features are extracted using an SWT. To reduce the dimensionality of the wavelet feature space, we propose the use of a matched filter. For each neurological phenomenon in an EEG channel, an SVM is designed (resulting in a total of 3N classifiers). The output of each SVM is the logical state '1' when an IC pattern is detected and '0' in other cases. For each neurological phenomenon, an MCS classifies the outputs of the N SVMs using the majority voting rule. A second-stage MCS uses the outputs of the three MCSs to decide which MCS outputs should be combined together and how this combination shall be done. An HGA is employed to simultaneously find (1) the subset of features, (2) the parameter values for each SVM and (3) the configuration of the three MCSs that leads to near optimal performance (defined as the TPR/FPR ratio). In the rest of this section, we describe the details of the components of this two-stage MCS.

2.1. Feature extraction

The discrete wavelet transform (DWT) is a powerful tool for extracting time-frequency features. It has been extensively applied in the analysis of event-related potentials (ERPs) [20, 21], as well as in the design of BCI systems [22–26].

The DWT is defined as the convolution of a signal x(t) with a wavelet function ψ_{a,b}(t), where ψ_{a,b}(t) is the dilated and shifted version of the wavelet function ψ(t) and is defined as follows:

\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\, \psi\!\left(\frac{t-b}{a}\right),   (1)

where a and b are the scale and translation parameters, respectively. The DWT thus maps a signal of one independent variable t into a function of two independent variables a, b, such that

a_j = 2^{-j}, \qquad b_{j,k} = 2^{-j} \cdot k \quad (j, k \text{ are integers}).   (2)

The contracted versions of the wavelet function match the high-frequency components of the original signal and the dilated versions match the low-frequency oscillations. Then, by correlating the original signal with wavelet functions of different sizes, the details of the signal at different scales are obtained. The resulting correlation features can be arranged in a hierarchical scheme called multi-resolution decomposition [27], which separates the signal into 'details' at different frequency bands and a coarser representation of the signal called an 'approximation'. See [27] for more details.

The DWT, however, is shift variant, and the values of the wavelet coefficients may vary even with small shifts in time [28]. Therefore, we propose using the shift-invariant stationary wavelet transform (SWT) to detect the neurological phenomenon of interest. The SWT resolves the shift-variance problem associated with the DWT by eliminating the downsampling operator from the multi-resolution analysis [29]. We first describe the application of the SWT to extract features from MRPs and then discuss feature extraction from CPMR and CPBR.

Consider the set of all training data consisting of N_IC IC commands. Suppose that each of the training epochs is decomposed using a wavelet function ψ(t). If the wavelet coefficients are to be used as features, the number of features becomes

N_{\text{Features}} = (N_{\text{Level}} + 1) \times N_{\text{Samples}},   (3)

where N_Features, N_Level and N_Samples denote the total number of wavelet features per EEG signal, the number of decomposition levels and the number of samples per epoch, respectively. It is apparent that the size of the feature space becomes very large, even for a small number of EEG signals. To reduce the dimensionality of the feature space, we propose using a matched filter. A linear matched filter is known to be a simple yet useful tool for measuring the similarities between two sequences. Assuming that c_{j,k,p} and d_{j,k,p} are the approximation and detail coefficients at scale j and translation k of the pth epoch in the training set of the IC commands, the averages of the approximation and detail coefficients at scale j and translation k (\bar{c}_{j,k}, \bar{d}_{j,k}) are

\bar{c}_{j,k} = \frac{1}{N_{IC}} \sum_{p=1}^{N_{IC}} c_{j,k,p}   (4)

\bar{d}_{j,k} = \frac{1}{N_{IC}} \sum_{p=1}^{N_{IC}} d_{j,k,p}.   (5)

The approximation template at scale j (Template_C_j) and the detail template at scale j (Template_D_j) are then obtained using the following formulae:

\text{Template\_C}_j = \{\bar{c}_{j,k}\} \quad (j = 1, 2, \ldots, N_{\text{Level}},\; k = 1, 2, \ldots, N_{\text{Samples}})   (6)

\text{Template\_D}_j = \{\bar{d}_{j,k}\} \quad (j = 1, 2, \ldots, N_{\text{Level}},\; k = 1, 2, \ldots, N_{\text{Samples}}).   (7)

Let C_{j,p} = {c_{j,k,p}} (j = 1, 2, ..., N_Level, k = 1, 2, ..., N_Samples, and p = 1, 2, ..., N) be the set of all approximation coefficients at scale j of the pth epoch. The cross-covariance between Template_C_j and C_{j,p} is then calculated as follows:

\text{XCOR}_{j,p}(n) = E\big[\big(\text{Template\_C}_j(m) - \mu_{\text{Template\_C}_j}\big)\big(C_{j,p}(m+n) - \mu_{C_{j,p}}\big)^{*}\big],   (8)

where E is the expected value operator. After calculating XCOR_{j,p}(n) for each epoch, the following features, representing the maximum of the cross-correlogram over a period of 0.125 s, are extracted [1]:

F_{j,p} = \max_{n}\big[\text{XCOR}_{j,p}(n)\big], \quad n \in \big[(t_{\text{finish}} - t_{\text{start}}) - 0.0625,\; (t_{\text{finish}} - t_{\text{start}}) + 0.0625\big],   (9)

where (t_finish − t_start) is the length of the epoch, and t_start and t_finish mark the start and finish of an epoch, as discussed in detail in section 3.

Figure 2. An example of how features are extracted using the proposed cross-covariance method.

Figure 2 demonstrates an example of this feature extraction method, assuming that the length of the epoch and the template are both t_finish − t_start = 2 s (the duration of the cross-covariance signal will thus be 4 s). The feature extractor considers a window of width 0.125 s around the middle point of the cross-correlogram (i.e. at time t = t_finish − t_start = 2 s). This window covers from t_1 = 0.125/2 = 0.0625 s before to t_2 = 0.125/2 = 0.0625 s after t = 2 s. The maximum value in (9) is then calculated over this window with a width of 0.125 s, because MRPs lie in frequencies below 4 Hz [19], and the sampling rate is 128 Hz. Features are then generated by sliding a window over the EEG signal by shifts of 0.125 s.

Apart from the above features, the following features are also extracted:

T_{j,p} = t\big(\text{XCOR}_{j,p}(n) = F_{j,p}\big),   (10)

where t is the time operator. This feature provides information about the time instant when the maximum of the cross-correlogram occurs. Similar formulae can be obtained for the detail coefficients as well as for the features extracted from the NC epochs. This process is repeated for all EEG channels. We select the features belonging to the coarsest approximation and detail levels. As a result, four MRP features are generated for each EEG channel.
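To make the pipeline above concrete, the following Python sketch (an illustration under our own assumptions, not the authors' code) uses the PyWavelets package to compute the coarsest SWT approximation and detail coefficients, builds the IC templates by averaging as in equations (4)–(7), and extracts the matched-filter peak and its time of occurrence within the 0.125 s window of equations (9) and (10). The wavelet name 'db4', the helper names and the array layout are illustrative assumptions.

    # Illustrative sketch (not the authors' implementation) of the SWT + matched-filter
    # feature extraction.  Requires numpy and PyWavelets; the epoch length must be a
    # multiple of 2**level for pywt.swt (256 samples for a 2 s epoch at 128 Hz).
    import numpy as np
    import pywt

    FS = 128                      # sampling rate (Hz)
    WIN = int(0.125 * FS)         # 0.125 s search window (16 samples)

    def swt_coarsest(epoch, wavelet="db4", level=5):
        """Coarsest approximation and detail coefficients of a 1-D epoch."""
        coeffs = pywt.swt(epoch, wavelet, level=level)   # (cA, cD) pairs, coarsest level first
        cA, cD = coeffs[0]
        return cA, cD

    def build_templates(ic_epochs, wavelet="db4", level=5):
        """Average the coarsest SWT coefficients over the IC training epochs."""
        cAs, cDs = zip(*(swt_coarsest(e, wavelet, level) for e in ic_epochs))
        return np.mean(cAs, axis=0), np.mean(cDs, axis=0)

    def mf_features(epoch, template, wavelet="db4", level=5, use_detail=False):
        """Matched-filter features: peak of the cross-covariance within the 0.125 s
        window around the middle of the cross-correlogram, and the time of that peak."""
        cA, cD = swt_coarsest(epoch, wavelet, level)
        x = cD if use_detail else cA
        xc = np.correlate(x - x.mean(), template - template.mean(), mode="full")
        centre = len(x) - 1                      # zero lag sits in the middle of 'full' output
        lo, hi = centre - WIN // 2, centre + WIN // 2 + 1
        k = lo + int(np.argmax(xc[lo:hi]))
        return xc[k], (k - centre) / FS          # (F_{j,p}, T_{j,p})

In a real design the same routine would be run for every bipolar channel and both coefficient sets, giving the four MRP features per channel described above.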

For the CPMR and CPBR phenomena, all epochs are band-pass filtered before feature extraction. For CPMR, the band pass is chosen from 8 to 12 Hz, as recommended by other BCI studies [30–32]. For CPBR, because of the relatively wide range of the Beta rhythms, a user-customized band pass is chosen for each individual, as explained below. Both filters are linear-phase 32-point FIR filters. The amplitudes of the bandpass-filtered signals are squared to obtain the power values. The SWT is then applied and the wavelet coefficients of the power signals are calculated. The rest of the feature extraction process is similar to that used for MRPs and it yields four CPMR features and four CPBR features for each EEG channel.
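A minimal sketch of this pre-processing step, assuming SciPy is available; the band edges, tap count and variable names simply restate the description above and are not taken from any published code.

    # Band-pass filter an epoch (e.g. 8-12 Hz for Mu) with a linear-phase 32-point FIR
    # filter and square the amplitudes to obtain the power signal; the SWT/matched-filter
    # routine of the previous sketch is then applied to this power signal.
    import numpy as np
    from scipy.signal import firwin, lfilter

    FS = 128

    def band_power_signal(epoch, band=(8.0, 12.0), numtaps=32):
        taps = firwin(numtaps, band, pass_zero=False, fs=FS)   # linear-phase FIR band-pass
        filtered = lfilter(taps, [1.0], np.asarray(epoch, dtype=float))
        return filtered ** 2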

2.1.1. The choice of the proper wavelet function. In the analysis of ERPs, the wavelet function is usually chosen solely based on the similarity between the neurological phenomenon and the shape of the wavelet function [22, 33, 34]. The downside of this approach is that the choice of wavelet function may become subjective. Moreover, it has been shown that the shape of the neurological phenomenon may vary from one subject to another [35]. As a result, to achieve a better performance, this process needs to be carried out separately for each subject. Even if a separate wavelet function is chosen for each subject, the use of a single wavelet function for all channels may not be optimal because the amount of information varies from one EEG channel to another. For each subject, and for each neurological phenomenon in each EEG channel, a Fisher ratio is defined as follows [36]:

C(p, q, r, s) = \frac{\big(\mu_{IC}(p, q, r, s) - \mu_{NC}(p, q, r, s)\big)^2}{\sigma^2_{IC}(p, q, r, s) + \sigma^2_{NC}(p, q, r, s)}
\quad (p = 1, 2, 3;\; q = 1, 2, \ldots, N;\; r = 1, 2, \ldots, N_{\text{Features}};\; s = 1, 2, \ldots, N_{\text{Wavelet Functions}}),   (11)

where μ_IC(p, q, r, s) and μ_NC(p, q, r, s) are the means, and σ²_IC(p, q, r, s) and σ²_NC(p, q, r, s) are the variances, of the IC and NC classes for feature r of neurological phenomenon p and channel q extracted using wavelet function s. For each pair of channel q and neurological phenomenon p, the wavelet function that achieves the following objective is chosen for that particular pair:

\max_{r,s}\big[C(p, q, r, s)\big] \quad (p = 1, 2, 3,\; q = 1, 2, \ldots, N).   (12)

The wavelet functions are selected from a pool of Daubechies, Biorthogonal, Symlet and Coiflet wavelet functions (46 wavelet functions in total). Some of these wavelet functions were chosen because of their similarities with the shape of the neurological phenomena. As an example, in [23] Symlet wavelets were found to be suitable for the analysis of the event-related desynchronization of the brain rhythms. Similarly, in [22], Daubechies wavelets were chosen for the analysis of event-related potentials. Biorthogonal wavelets also bear a resemblance to the shape of bipolar MRPs. However, as stated earlier, we used an automatic method for selecting the type of wavelet function to minimize the subjectivity in the choice of wavelet functions. Features are normalized prior to the calculation of the Fisher ratios.
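The selection criterion of equations (11) and (12) reduces to a simple search; the sketch below (our illustration, with an arbitrary four-wavelet candidate list standing in for the 46 wavelets named above) assumes the normalized features have already been computed per wavelet.

    # Fisher-ratio-based wavelet selection for one (channel, phenomenon) pair.
    import numpy as np

    def fisher_ratio(ic_feat, nc_feat):
        """(mu_IC - mu_NC)^2 / (var_IC + var_NC) for a single feature (equation (11))."""
        num = (np.mean(ic_feat) - np.mean(nc_feat)) ** 2
        den = np.var(ic_feat) + np.var(nc_feat) + 1e-12     # guard against zero variance
        return num / den

    def pick_wavelet(ic_features, nc_features,
                     candidates=("db4", "sym5", "bior3.3", "coif3")):
        """ic_features[w], nc_features[w]: arrays of shape (n_epochs, n_features) of
        normalized features extracted with wavelet w.  Returns the wavelet whose best
        single feature maximizes the Fisher ratio (equation (12))."""
        best_w, best_c = None, -np.inf
        for w in candidates:
            for r in range(ic_features[w].shape[1]):
                c = fisher_ratio(ic_features[w][:, r], nc_features[w][:, r])
                if c > best_c:
                    best_w, best_c = w, c
        return best_w, best_c

The same criterion, averaged over features, is what section 2.1.2 uses to pick the subject-specific Beta band.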

2.1.2. The choice of the proper CPBR frequency band. Because of the wide range of the Beta rhythms, and to select more discriminant features for CPBR, equation (11) is calculated for seven frequency bands. These frequency bands are selected based on recommendations from papers in the literature, as follows:

– [14–18] Hz ([37, 38])
– [18–22] Hz ([39, 40])
– [22–26] Hz ([41])
– [18–26] Hz ([42])
– [22–30] Hz ([43, 44])
– [14–30] Hz ([45, 46]), as well as
– the [26–30] Hz frequency band.

Each frequency band is analyzed separately. The averages of the Fisher ratios are compared, and the frequency band that results in the highest average is selected. The reason different Beta frequency bands were considered in this study was to find subject-specific frequency bands that resulted in more discriminant features (based on Fisher's ratio). Please note that although some of the frequency bands described above are covered by other bands (e.g., f1 = [22–26] is covered by f2 = [18–26]), this does not mean that features extracted from f2 are necessarily more discriminant. This is because, if the features extracted from the frequency band f3 = [18–22] do not provide discriminant information, adding f3 to f1 may even reduce the discriminability between the classes.

2.2. Feature classification

The features for each neurological phenomenon in an EEG channel are classified as an IC or NC state using an SVM classifier. For each neurological phenomenon, the classifiers' outputs are combined using an MCS. Prior to classification, outliers were removed as follows. The Mahalanobis distance for a feature vector with K variables, x = [x_1, x_2, ..., x_K], with an assumed central point μ = [μ_1, μ_2, ..., μ_K], is defined as

\text{Mahal}(x, \mu) = (x - \mu)\, \Sigma^{-1} (x - \mu)^{T},   (13)

where Σ is the covariance matrix evaluated from the data. The outliers are then removed using the following algorithm [47, 48]:

(1) Round p: if there exists x such that Mahal(x, μ) > λ, let FS = {x | Mahal(x, μ) ≤ λ} and retain only the points in FS. The value of λ was chosen such that the training samples that were further than four standard deviations from the mean were considered as outliers [49].
(2) Repeat until the above condition is no longer met.

After applying this algorithm, the maximum percentage of features recognized as outliers was 3% for NC features and 1% for IC features.
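A minimal sketch of this outlier-removal loop, assuming NumPy; interpreting the 'four standard deviations' threshold as a squared-distance cutoff of 4² = 16 is our assumption, not a value quoted in the paper.

    # Iterative Mahalanobis outlier removal (equation (13)).
    import numpy as np

    def remove_outliers(X, lam=16.0, max_rounds=100):
        """Drop rows of X (n_samples x K) whose squared Mahalanobis distance to the
        current mean exceeds lam, and repeat until no such rows remain."""
        X = np.asarray(X, dtype=float)
        for _ in range(max_rounds):
            mu = X.mean(axis=0)
            cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))
            diff = X - mu
            d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)   # squared distances
            keep = d2 <= lam
            if keep.all():
                break
            X = X[keep]
        return X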

2.2.1. Support vector machines (SVMs). A total of 3N SVM classifiers are used for each subject. Kernel-based learning retains the beneficial properties of linear classification methods, such as simplicity, yet because the feature space is nonlinearly related to the input space, the overall classification is nonlinear in the input space [50]. We used the LIBSVM software for implementing the SVMs [51], with a Gaussian kernel as the kernel function. The classifier's performance depends on the regularization parameter C and the bandwidth σ of the kernel. Since there are 3N classifiers, 3N values had to be estimated for each parameter. The output of each SVM is a binary label that indicates whether the input pattern belongs to the IC or the NC class.
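As a sketch of a single per-channel classifier, the snippet below uses scikit-learn's SVC (which wraps LIBSVM, the library cited above); the mapping gamma = 1/(2σ²) from the kernel bandwidth σ to scikit-learn's gamma parameter, and the parameter grid of powers of two, are our reading of sections 2.2.1 and 2.3 rather than published code.

    # One of the 3N Gaussian-kernel SVMs returning binary IC(1)/NC(0) labels.
    import numpy as np
    from sklearn.svm import SVC

    PARAM_GRID = 2.0 ** np.arange(-8, 8)        # candidate C and sigma values, 2^-8 ... 2^7

    def make_svm(C=1.0, sigma=1.0):
        return SVC(C=C, kernel="rbf", gamma=1.0 / (2.0 * sigma ** 2))

    # Usage (X_train: feature matrix, y_train: 0/1 labels):
    # clf = make_svm(C=PARAM_GRID[10], sigma=PARAM_GRID[6]).fit(X_train, y_train)
    # labels = clf.predict(X_test)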


2.2.2. Multiple classifier systems (MCSs). For each neurological phenomenon, an MCS with a majority voting rule classifies the binary outputs of the SVMs (there are N SVMs for each neurological phenomenon). If the number of classifiers is even and both classes receive an equal number of votes, the more frequent class (NC) is chosen as the label for the input pattern. The outputs of the three MCSs are then combined using a second-stage MCS, as shown in figure 1. This MCS can have five configurations for combining the outputs of the three MCSs, as follows: (1) configuration 1 uses the AND rule to combine the binary outputs of MCS1 and MCS2, related to MRP and CPBR, respectively; the default class is NC (the logical state '0'), unless both MCS1 and MCS2 identify an IC command (the logical state '1'); (2) configuration 2 uses the AND rule to combine the binary outputs of MCS1 and MCS3, related to MRP and CPMR, respectively; (3) for configuration 3, the AND rule is used to combine the binary outputs of MCS2 and MCS3, related to CPBR and CPMR, respectively; (4) for configuration 4, the outputs of all three MCSs are combined according to the majority voting rule; (5) for configuration 5, the AND rule is used to combine the outputs of all three MCSs. The choice of the best configuration is made by an HGA, as explained in the next section.
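The combination logic amounts to a few Boolean rules; the sketch below restates it in Python under our own naming (0 = NC, 1 = IC), purely for illustration.

    # First-stage majority voting over the N channel SVMs, then one of the five
    # second-stage configurations described above.
    import numpy as np

    def majority_vote(svm_outputs):
        """svm_outputs: length-N array of 0/1 labels; ties resolve to NC (0)."""
        return 1 if int(np.sum(svm_outputs)) > len(svm_outputs) / 2 else 0

    def second_stage(mcs_mrp, mcs_cpbr, mcs_cpmr, config):
        if config == 1:                       # AND of MRP and CPBR
            return mcs_mrp & mcs_cpbr
        if config == 2:                       # AND of MRP and CPMR
            return mcs_mrp & mcs_cpmr
        if config == 3:                       # AND of CPBR and CPMR
            return mcs_cpbr & mcs_cpmr
        if config == 4:                       # majority vote over all three
            return 1 if (mcs_mrp + mcs_cpbr + mcs_cpmr) >= 2 else 0
        return mcs_mrp & mcs_cpbr & mcs_cpmr  # configuration 5: AND of all three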

2.3. Hybrid genetic algorithm (HGA)

A hybrid genetic algorithm (HGA) is designed so that it (1) selects the best features; (2) determines the values of the classifiers' parameters; and (3) selects the best of the five MCS configurations described in section 2.2.2. In applying genetic algorithms to optimize the performance of the system, each parameter of interest is first coded in the form of a randomly generated binary string. Each bit in this binary string is called a gene. The concatenation of all the binary strings forms a 'chromosome', and the set of 'chromosomes' forms a 'population'. Each chromosome is then evaluated and a fitness value assigned. The chromosomes are then combined using operators such as 'selection', 'crossover' and 'mutation' to generate new chromosomes. The 'selection' operator selects a proportion of the existing population to breed a new generation. The selected chromosomes are usually those with higher fitness compared to other chromosomes in the population. After selection of the 'fitter' chromosomes, a pair of 'parent' chromosomes is selected for generating the 'child' chromosomes. A child chromosome is a new solution that typically shares many of the characteristics of its 'parents'. The 'crossover' operator ensures that this is the case by copying some of the genes of each parent to the child. The 'mutation' operator is used to maintain genetic diversity from one generation of a population to the next. This process is repeated until a new population of chromosomes is generated. It is expected that the population evolves gradually and that fitness improves over generations. This process is continued until some criteria for stopping the GA are met [52].

To represent each possible combination of features, a binary chromosome of length L_Chromosome is defined (see figure 3(a)). Bit i of the first N_features bits of the binary chromosome specifies whether or not feature i is selected by the HGA. A value of '1' indicates the presence of feature i and a value of '0' indicates its absence in the chromosome. The second part of the chromosome is used to select the parameter values of the classifiers. For each of the 3N SVM classifiers, two parameter values need to be determined: the regularization parameter C and the bandwidth of the Gaussian kernel (σ). A portion of the chromosome with a length of 8 bits is used for these two parameter values (see figure 3(b)). The first four bits are used to represent the value of C and the second half is used to represent the value of σ. Exponentially growing sequences are used for C and σ, i.e., their values vary from 2^−8 to 2^7.

Figure 3. (a) The structure of a chromosome; (b) representation of the parameter values for each SVM in a chromosome.
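A possible decoding of such a chromosome, sketched under our own layout assumptions (feature mask first, then 8 bits per SVM with the first four bits indexing C and the last four indexing σ):

    # Decode a chromosome into a feature mask and per-SVM (C, sigma) values.
    import numpy as np

    PARAM_VALUES = 2.0 ** np.arange(-8, 8)       # 16 values, addressable with 4 bits

    def decode_chromosome(bits, n_features, n_classifiers):
        """bits: sequence of 0/1 of length n_features + 8 * n_classifiers."""
        mask = np.asarray(bits[:n_features], dtype=bool)
        params = []
        for i in range(n_classifiers):
            chunk = list(bits[n_features + 8 * i : n_features + 8 * (i + 1)])
            c_idx = int("".join(str(int(b)) for b in chunk[:4]), 2)   # 4 bits -> C
            s_idx = int("".join(str(int(b)) for b in chunk[4:]), 2)   # 4 bits -> sigma
            params.append((PARAM_VALUES[c_idx], PARAM_VALUES[s_idx]))
        return mask, params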

For each chromosome, a local exhaustive search is then carried out to find the best of the five configurations in the second-stage MCS. Suppose x denotes a model in figure 1.

In order to add a larger weight to solutions with lower false activation rates, the objective function for the HGA is defined as in equation (14):

f_1: \max_{x}\left[\frac{K \times \text{mean}(\text{TPR}(x))}{\text{mean}(\text{FAR}(x)) \times \big(\text{Var}(\text{TPR}(x)) + \text{Var}(\text{FAR}(x))\big)}\right]

K = \begin{cases} 1, & \text{if mean}(\text{TPR}(x)) \geq T_2 \\ \dfrac{\text{mean}(\text{TPR}(x)) - T_1}{T_2 - T_1}, & \text{if } T_1 \leq \text{mean}(\text{TPR}(x)) < T_2 \\ 0, & \text{if mean}(\text{TPR}(x)) < T_1, \end{cases}   (14)

where the false activation rate (FAR) is the percentage of NC epochs that are affected by one or more false detections. The main difference between the FAR and the FPR is that multiple FPs in an epoch are counted as one FA. The values of T_1 and T_2 in equation (14) are selected as 50% and 80%, respectively, for all subjects except for subject AB3. Please note that currently there is no consensus amongst BCI researchers as to what the acceptable threshold for the performance of a BCI system is (this is especially the case for SBCI systems). For this study, the value of T_1 was chosen as 50% for two reasons: first, we wanted to prevent solutions with low TP rates that also had low FA rates from becoming dominant in the population. An example of such a solution is one with TPR = 20% and FAR = 1%. In this example, although the FAR is very low, the TPR is also low (corresponding to the successful identification of only one out of every five IC commands). Second, we postulated that T_1 = 50% would be a reasonable minimum requirement for the TP rate, as it corresponds to the identification of one out of every two IC commands on average (please note that IC commands should be separated from the periods of NC). Any configuration that resulted in an average TP rate of less than T_1 = 50% was then penalized with a zero fitness value. The value of T_2 = 80% was chosen as it has been stated that (at least for synchronized BCI systems) accuracies above 70% are considered to be acceptable [53, 54]. We thus chose T_2 = 80%, and none of the solutions whose performance yielded TP ≥ 80% was penalized. Solutions whose TP rates lay between these two extremes were penalized according to the formula described by (14). For subject AB3, this value of T_1 resulted in the generation of chromosomes with high FA values. For this subject, the values of T_1 and T_2 were chosen as 33% and 50%, respectively. The 'mean' operator was applied over the inner-validation sets (see section 3).
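The first objective can be written down directly; the sketch below is our reading of equation (14), with a small epsilon added to guard against division by zero when the FA rate is exactly zero (that guard is an assumption, not part of the paper).

    # HGA objective f1 (equation (14)) with the TP-rate penalty term K.
    import numpy as np

    def fitness_f1(tpr, far, T1=50.0, T2=80.0, eps=1e-6):
        """tpr, far: arrays of TP and FA rates (%) over the inner-validation folds."""
        m_tpr, m_far = np.mean(tpr), np.mean(far)
        if m_tpr >= T2:
            K = 1.0
        elif m_tpr >= T1:
            K = (m_tpr - T1) / (T2 - T1)
        else:
            K = 0.0
        return K * m_tpr / (m_far * (np.var(tpr) + np.var(far)) + eps)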

We implemented a lexicographic approach for sorting the chromosomes in the HGA population [55]. In this approach, the chromosomes are compared and ranked according to the values of f_1(x) in (14). Any ties were resolved by comparing the relevant chromosomes again with respect to another objective. If there was still a tie, a third objective function was used for comparison, and so on. A total of six objective functions were used, as follows (in order of priority):

f_2: \min_{x}\big[\text{mean}(\text{FAR}(x)) \times \text{Var}(\text{FAR}(x))\big]   (15)

f_3: \max_{x}\left[\frac{\text{mean}(\text{TPR}(x))}{\text{Var}(\text{TPR}(x))}\right]   (16)

f_4: \min_{x}\big[\text{mean}(\text{FPR}(x)) \times \text{Var}(\text{FPR}(x))\big]   (17)

f_5: \min_{x}\big[N(x)\big]   (18)

f_6: \min_{x}\big[N_{\text{Features}}(x)\big],   (19)

where N_Features is the number of features. The 'mean' operator is applied over the results obtained from the inner-validation sets (see section 3).
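Because each objective is a scalar, the lexicographic ranking can be expressed as ordinary tuple comparison once the minimization objectives are negated; the sketch below reuses fitness_f1 from the previous sketch, and the small variance guard, the interpretation of N(x) as a channel count and the field names are our own assumptions.

    # Lexicographic ranking key for a chromosome: f1 first, ties broken by f2, then f3, ...
    import numpy as np

    def lexicographic_key(tpr, far, fpr, n_channels, n_features):
        f1 = fitness_f1(tpr, far)                              # maximize
        f2 = -(np.mean(far) * np.var(far))                     # minimize -> negate
        f3 = np.mean(tpr) / (np.var(tpr) + 1e-6)               # maximize
        f4 = -(np.mean(fpr) * np.var(fpr))                     # minimize
        f5 = -float(n_channels)                                # minimize N(x) (assumed channel count)
        f6 = -float(n_features)                                # minimize
        return (f1, f2, f3, f4, f5, f6)

    # population.sort(key=lambda c: lexicographic_key(c.tpr, c.far, c.fpr,
    #                                                 c.n_channels, c.n_features),
    #                 reverse=True)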

The remaining operators of the HGA are tournament-based selection (tournament size = 3), uniform crossover (p = 0.9) and uniform mutation (p = 0.01). The sizes of the initial population and of the subsequent populations are chosen as 200 and 100, respectively. The HGA is randomly initialized. Elitism is used to keep the best-performing chromosome of each population in the subsequent populations. The number of evaluations is set to 5000. If, for more than 10 consecutive generations, the improvement in the first objective of the best solution is less than 1%, the algorithm is terminated.

3. Experimental results

In this section, the results of the experimental analysis of the data of four able-bodied subjects are presented and the results are compared to those reported in previous EEG-based BCI studies. A theoretical analysis of the performance of the proposed two-stage MCS is addressed in the appendix.

3.1. Data collection and evaluation

Data from three male and one female able-bodied subjects (denoted as subjects AB1 to AB4) were used in this study. The subjects were right handed and between 31 and 56 years old. They had signed consent forms prior to participation in the experiment.

IC data were collected as subjects performed a guided right index finger flexion movement. At random intervals of 5.6–7.0 s (mean of 6.7 s), a white circle of 2 cm diameter was displayed on the subject's monitor for 1/4 s, prompting the subjects to perform a movement. In response to this cue, the subject had to perform a right index finger flexion 1 s after the cue appeared. The 1 s delay was used to avoid visual evoked potential (VEP) effects caused by the cue. This is the time that the subject is expected to attempt the movement, but this time may vary from one subject to another and from one movement attempt to another (see [56] for more details).

As mentioned in the introduction, an SBCI should differentiate between IC and NC epochs (in contrast to synchronized BCI systems, which need to differentiate different IC commands from each other). For this reason, data from NC sessions are also needed to represent the epochs for which the user did NOT intend to perform a control. During an NC session, subjects were asked to count the number of times that a white ball bounced off the monitor's screen. The NC sessions thus contained attentive as well as non-attentive NC data. Each NC session lasted for approximately 2 min, and during each recording day up to two such NC sessions were recorded.

EEG signals were recorded from 13 monopolar electrodes positioned over the F1, Fz, F2, FC3, FC1, FCz, FC2, FC4, C3, C1, Cz, C2 and C4 locations according to the International 10–20 system. The cutoff frequency of the amplifier was set at 30 Hz. An ocular artifact was detected when the difference between the electro-oculogram (EOG) electrodes (placed at the corner of and below the right eye) exceeded ±25 µV. This threshold was determined during data recording and by carefully monitoring the recorded EOG activity during the calibration stage. It was chosen such that most of the prominent eye movement activities were captured (see [56] for details). All signals were sampled at 128 Hz and referenced to linked ear electrodes.

The recorded signals were converted to bipolar EEG signals, since it has been shown that bipolar EEG signals may result in the generation of more discriminant wavelet features for MRPs compared to the case where monopolar EEG signals are used [19]. The conversion was carried out by calculating the difference between adjacent EEG channels, and resulted in the following 18 bipolar EEG signals: F1–FC1, F1–Fz, F2–Fz, F2–FC2, FC3–FC1, FC3–C3, FC1–FCz, FC1–C1, FCz–FC2, C1–Cz, C2–C4, FC2–FC4, FC4–C4, FC2–C2, FCz–Cz, C3–C1, Cz–C2 and Fz–FCz. Table 1 shows the timetable of recording the data for all subjects. For each subject, 'Day 1' was considered as the origin date, and the dates on which the rest of the data were collected were numbered relative to 'Day 1'.

An IC epoch consisted of data collected over an interval containing the onset of movement (measured as the switch activation), as long as no artifact was detected in that particular interval. The interval started at t_start = −1 s, i.e. 1 s before the onset of movement, and ended at t_finish, i.e. 1 s after the onset of movement. In this study, NC epochs were collected from NC sessions only. The NC epochs were selected as follows: a window of width (t_finish − t_start) s was slid over each EEG signal collected during an NC session in steps of 16 time samples (0.1250 s), resulting in eight classifier decisions per second. For each 1 s window where artifacts were not detected, features were extracted. The last two columns of table 1 show the total number of IC and 1 s NC epochs that are not contaminated with artifacts.

Table 1. The time schedule of recording the data and the numbers of artifact-free IC and NC epochs.

Subject ID | 1st session (day) | 2nd session | 3rd session | 4th session | 5th session | Total number of IC epochs | Total number of 1-second NC epochs
AB1 | 1 | 3 | 5 | 8 | 10 | 70 | 194
AB2 | 1 | 3 | 4 | 8 | 9 | 59 | 254
AB3 | 1 | 2 | 4 | 8 | 9 | 48 | 182
AB4 | 1 | 3 | 5 | 8 | 10 | 89 | 206

Figure 4. Method of calculating the TP rate: (a) EEG signal; (b) output of the finger switch; (c) output of the SBCI.

The method of calculating the TP rate is shown in figure 4. Figure 4(a) shows a sample EEG signal and figure 4(b) shows the output of the physical switch. As stated earlier, data (from 1 s before to 1 s after a decision point) are used for classification. Assuming that the system has no processing delay and that the SBCI system has the ideal detection rate, the output of the SBCI system should be as demonstrated in figure 4(c). In other words, the IC command should be detected 1 s after pressing the switch. Although the exact timing of the switch activation is known, the neurological phenomena may not be completely time-locked to the switch activation. As a result, we have also considered any activation in the time range [−0.125, +0.125] s around the expected activation of the switch as a true positive (see figure 4(c)). The rest of the activations were treated as false positives.
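A small sketch of this scoring convention, under our own naming; the ±0.125 s tolerance and the FA-versus-FP distinction restate sections 3.1 and 2.3, and the function names are illustrative.

    # Score SBCI decisions against the switch activations.
    import numpy as np

    TOL = 0.125            # tolerance around the expected activation (s)

    def score_ic_trial(detection_times, switch_time):
        """detection_times: times (s) at which the SBCI output switched to '1'.
        A detection within +/-TOL of (switch_time + 1 s) counts as the TP for this
        trial; all other detections count as false positives."""
        expected = switch_time + 1.0
        hits = [t for t in detection_times if abs(t - expected) <= TOL]
        fps = [t for t in detection_times if abs(t - expected) > TOL]
        return (1 if hits else 0), len(fps)

    def false_activation(nc_epoch_outputs):
        """One or more positive decisions in an NC epoch count as a single FA."""
        return int(np.any(np.asarray(nc_epoch_outputs) == 1))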

3.2. Results

A five-level SWT decomposition resulted in the generation of wavelet coefficients in the following frequency bands: [32–64], [16–32], [8–16], [4–8], [2–4] and [0–2] Hz. For all neurological phenomena, the features were calculated for the lowest approximation and detail levels (which are attributed to the [0–2] and [2–4] Hz frequency bands, respectively). For subjects AB1–AB4, the selected CPBR frequency bands were [22–30], [14–30], [22–30] and [14–18] Hz, respectively (see section 2.1.2 for details). Although the selected frequency bands resulted in more discriminant features compared to features selected from other frequency bands, the differences were not necessarily significant (p > 0.05). This observation was consistent among all subjects.

We used a nested cross-validation to analyze the performance of the SBCI. In a nested cross-validation, the inner cross-validation set is used for training the classifier and for model selection (in our proposed design, the model selection procedure involves the selection of the best chromosome). The outer cross-validation set is used to test the performance. The data collected over all five sessions were first combined. IC and NC epochs were then randomly selected and divided into five sets. For each outer cross-validation set, 20% of the data were used for testing and the rest were used for training. The training datasets were further divided into five folds. For each fold, 80% of the data were used for training the SVM and 20% were used for choosing the best chromosome. Since the size of the NC data is much larger than the size of the IC data, this results in an imbalanced classification problem. For this reason, during the training of the SVM classifier, the NC features were randomly sub-sampled so that the number of training samples for both the IC and NC classes remained the same.
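The evaluation loop can be sketched as follows, assuming scikit-learn's KFold; run_hga_fold and evaluate are placeholders for the HGA-driven training/model selection and the TPR/FAR/FPR bookkeeping, which the paper does not specify in code form.

    # Nested cross-validation with random sub-sampling of the NC class.
    import numpy as np
    from sklearn.model_selection import KFold

    def balance_nc(X, y, rng):
        """Randomly sub-sample NC (label 0) down to the number of IC (label 1) samples."""
        ic_idx = np.flatnonzero(y == 1)
        nc_idx = rng.choice(np.flatnonzero(y == 0), size=len(ic_idx), replace=False)
        keep = np.concatenate([ic_idx, nc_idx])
        return X[keep], y[keep]

    def nested_cv(X, y, run_hga_fold, evaluate, seed=0):
        rng = np.random.default_rng(seed)
        outer = KFold(n_splits=5, shuffle=True, random_state=seed)
        results = []
        for tr, te in outer.split(X):
            inner = KFold(n_splits=5, shuffle=True, random_state=seed)
            for fit, val in inner.split(tr):
                Xb, yb = balance_nc(X[tr][fit], y[tr][fit], rng)
                run_hga_fold(Xb, yb, X[tr][val], y[tr][val])   # train SVMs, score chromosomes
            results.append(evaluate(X[te], y[te]))             # test the selected model
        return results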

The test results are shown in table 2. The first row shows the selected configuration for the two-stage MCS. For subjects AB1 and AB4, the combination of MRP and CPBR led to superior results (configuration 1), while for subjects AB2 and AB3, the combination of all three neurological phenomena using the AND rule was the best configuration (configuration 5).

Table 2. The performance results for the proposed SBCI system.

Measure | AB1 | AB2 | AB3 | AB4
Selected configuration | 1 | 5 | 5 | 1
Average number of selected bipolar channels | 18 | 18 | 18 | 18
Average number of selected channel-neurological phenomenon combinations | 32.8 (1.8) | 50.8 (2.2) | 50.8 (1.5) | 34.4 (1.5)
Average number of selected features | 71.0 (5.7) | 112.0 (3.8) | 119.0 (8.3) | 81.4 (5.3)
Average TPR | 58.6 (8.6) | 64.2 (7.5) | 46.9 (10.4) | 55.1 (5.3)
Average FAR | 1.2 (1.1) | 0.2 (0.3) | 2.3 (1.4) | 0.8 (0.8)
Average FPR | 0.1 (0.1) | 0.0 (0.0) | 0.3 (0.2) | 0.1 (0.1)

The next three rows in table 2 show the total number of selected bipolar signals, the number of selected channel-neurological phenomenon combinations (please note that there are three neurological phenomena and 18 bipolar EEG signals, resulting in a total of 18 × 3 = 54 EEG channel-neurological phenomenon combinations) and the total number of selected features (averaged over the five outer validation sets). The last three rows show the averages of the TP, FA and FP rates on the five outer cross-validation test sets. The numbers in parentheses show the standard deviations. These results indicate that the proposed SBCI achieves a very low FP rate at a modest TP rate. In the next section, we will show that these results are significantly better than those of previous related EEG-based SBCI systems.

Table 3. Comparison of the performance results. Please note that in the LFASD_User-Customized study, the TP rates were approximately estimated from the ROC curves. Each entry gives TPR, FPR and the TPR/FPR ratio (rates in %).

Subject ID | SBCI_Fully-Automated | SBCI_Semi-Automated | LFASD_User-Customized | LFASD_Path
AB1 | 58.6, 0.1, 390.9 | 26.0 (9.5), 0.1 (0.1), 200 | <10.0, 0.2, <50.0 | 42.7, 1.0, 42.7
AB2 | 64.2, 0.0, 3256.8 | 25.2 (5.7), 0.3 (0.1), 87.0 | <10.0, 0.2, <50.0 | 47.6, 1.0, 47.6
AB3 | 46.9, 0.3, 167.5 | 36.7 (3.7), 1.1 (0.4), 33.30 | <8.0, 0.2, <40.0 | 45.8, 1.0, 45.8
AB4 | 55.1, 0.1, 458.8 | 21.4 (6.2), 0.4 (0.4), 49.80 | ?, 0.2, ? | 28.3, 1.0, 28.3
Average | 56.2, 0.1, 401.3 | 27.3, 0.5, 55.8 | <10.0 (a), 0.2, <50.0 (a) | 41.1, 1.0, 41.1

(a) Averaged over the data of three subjects.

4. Discussion and conclusions

It is theoretically possible to design a multiple classifier system such that very good classification accuracy can be obtained (see the appendix). This can be achieved even if the performance of the individual classifiers is only slightly better than chance. To achieve high performance, the classifiers need to be diverse. In this paper, we explored the information from three neurological phenomena (movement-related potentials and changes in the power of Mu and Beta rhythms) and the locations of the EEG channels to create diverse classifiers. A hybrid genetic algorithm (HGA) was designed to maximize the performance under the computational constraints (i.e., time and computational resources). The proposed design is denoted as SBCI_Fully-Automated, as in this design the features, the parameter values of the classifiers and the method of classifier combination have all been automatically determined.

We showed that the proposed SBCI achieves low FP rates at a modest TP rate. To our knowledge, this is the first time that such a low FP rate has been reported at a modest TP rate in an EEG-based SBCI. This brings the design of a practical EEG-based SBCI system with a low false positive rate closer to reality.

It is, however, difficult to directly compare the results of this study with those of other SBCI studies. This is because the recording protocol, the neurological phenomena used, the decision rate and the evaluation methodology vary amongst different studies. Furthermore, the method of labeling the output samples varies between different SBCI studies. This difficulty in comparing SBCI systems has been discussed in detail in a technical report recently published by researchers from leading research laboratories in the field of SBCI systems [57].

We compared the results of SBCI_Fully-Automated with those of SBCI_Semi-Automated as reported in [1]. Both studies use similar experimental paradigms (we denote the latter design as SBCI_Semi-Automated since, in that design, only the feature extraction was automated for each neurological phenomenon). The performance of both studies is summarized in table 3. A two-way analysis of variance (ANOVA) using 'subject' and 'study' as independent variables was carried out. The samples were the TP and FP rates for the different outer cross-validation folds (five samples per individual). The results of our analysis show that the TP rate increases from 27.3% to 56.2% (p < 10^−5) in SBCI_Fully-Automated and the average FP rate decreases from 0.5% to 0.1% (p < 10^−5). These results indicate that SBCI_Fully-Automated achieves a superior performance compared to SBCI_Semi-Automated.

Compared to SBCI_Semi-Automated, the performance of the SBCI was improved because of:

(1) Automation of the design. In [1], the classifier parameter values and the structure of the second-stage classifier were not automatically determined. The proposed method in this paper achieves full automation by reformulating the chromosomes and incorporating these parameters in the structure of each chromosome. We have proposed a hybrid GA for this automation process.


(2) New feature extraction method. We have proposed a new feature extraction method that applies a stationary wavelet transform (SWT) as a pre-processing stage and matched filtering (MF) as the final feature extraction stage. We also proposed a criterion for the automatic selection of the wavelet function (which is usually done subjectively by the designer).

However, this improvement comes at the expense of increased system complexity.

The LF-ASD is another state-of-the-art SBCI, previously developed by the brain interface laboratory of the Neil Squire Society [19]. During the past few years, different variations of the LF-ASD have been proposed by members of our research group [3, 35, 58] as well as by others [59]. The LF-ASD uses features extracted from six bipolar EEG channels to distinguish an IC command (if present) from the background NC states. In table 3, we also compare our results with those of two of the latest variations of the LF-ASD. These studies used the same experimental paradigm and the same datasets. In the first variation (denoted as LFASD_User-Customized), the effects of user-customization of the system's parameter values by an expert are studied [35]. In the second (denoted as LFASD_Path), the knowledge of the path of features is used to improve the performance [3]. Both papers focused on improving the TP rate at a fixed FP rate. Please note that in the LFASD_User-Customized study, the TP rates were estimated from the ROC curves and thus only approximate values can be derived. For one participant (AB4), the ROC curves were not plotted in [35] and thus we could not estimate the TP rate at FPR = 0.2%.

As shown in table 3, the TP rates of LFASD_User-Customized drop below 10% for FP rates equal to or below 0.2%. The values of the TP rates at 0.2% were estimated from the receiver-operating characteristic (ROC) curves plotted in [35]. Since the ROC curves for able-bodied subjects were not available in [3], we used the reported TP results for FP = 1%. The results in table 3 show the improvement achieved in terms of the TP/FP rates. As seen from this table, for low FP rates, SBCI_Fully-Automated achieves higher TP rates than both LFASD_User-Customized and LFASD_Path. A t-test between the performance obtained by SBCI_Fully-Automated and that achieved by LFASD_Path shows that the TP rates of SBCI_Fully-Automated are higher (p < 0.02), while the FP rates are lower (p = 0). The same comparison with the results obtained for FP = 0.2% in LFASD_User-Customized shows a highly significant improvement in the TP rates (p < 10^−5), while the decrease in the FP rates is not statistically significant (p = 0.16).

The results in table 3 also show that SBCI_Fully-Automated has an average of 1.2 FPs every 100 s. The original design of the LF-ASD had an average of one FP every 6 s [58] and the improved design had an average of one FP every 12 s [3]. Thus, SBCI_Fully-Automated is able to recognize a longer period of the NC state without generating a false positive.

Although the results in table 3 show that SBCI_Fully-Automated achieved a superior performance compared to the rest of the SBCIs examined in this study, the results also indicate a great deal of inter-subject variability in terms of performance. As an example, the TP and FP rates for subject AB2 were 64.2% and 0.0%, respectively, while the values obtained for subject AB3 were TP = 46.9% and FP = 0.3%, respectively. One reason that can be stated for this is the variability of the quality of the neurological phenomena from one subject to another. For example, when the IC epochs of subject AB2 were averaged, very distinct MRP patterns emerged; for subject AB3, however, the MRP patterns were less pronounced (see [35] for more discussion on the variability of MRPs amongst different individuals). An interesting area that needs further exploration is to see how the quality of the neurological phenomena improves after subjects receive more training. This is left to future studies.

One concern in BCI studies is the effect of artifacts on the performance of the system. In particular, systems that use slow potentials such as MRPs may be vulnerable to the presence of eye movement artifacts. One advantage of our proposed system is that it uses three neurological phenomena, each belonging to a different frequency band. While eye movements are mostly low-frequency components that may affect MRPs, their effect on the changes in the power of the Mu and Beta rhythms is much less significant. Since our system depends on observing movement-related patterns in more than one neurological phenomenon when detecting an IC pattern, it is thus more robust to the presence of artifacts. Nevertheless, when detecting EOG artifacts using a thresholding scheme, smaller EOG artifacts may not be detected. Thus, in our future studies, we plan to explore the use of more sophisticated artifact-removal methods, such as independent component analysis (ICA), to improve the artifact-monitoring system.

Another research area that needs further attention is the choice of a suitable evaluation metric for SBCIs. The evaluation of the performance of any SBCI system greatly depends on the evaluation metric used. Currently, there is no consensus amongst BCI researchers as to which performance metric summarizes the performance of a given SBCI most efficiently [57]. As an example, although in a number of SBCI papers the receiver-operating characteristic (ROC) curve and the area under the ROC (AUC) have been used for evaluating the performance, the suitability of this metric in the field of SBCI was recently questioned. This is because, when the ROC curve is plotted over the whole range of the (TPR, FPR) domain, the solution looks like a perfect answer, which is usually not the case [57]. Future research in this area can result in the generation and selection of more suitable cost functions that guide the model search procedure more efficiently.

Table 2 also shows that in SBCI_Fully-Automated almost all bipolar EEG channels are selected. The use of fewer EEG channels is preferable, since it reduces the complexity of the feature space and may also speed up the setup of the data recording. Future work will explore decreasing the number of bipolar EEG channels used by SBCI_Fully-Automated.

Scale-up is another important issue that is part of our future research. Scaling can be done in two stages. In stage 1, the system detects IC commands (using the proposed method). In the second stage, a second detector differentiates different IC patterns (e.g., related to left/right movements) from each other. This approach has been successfully implemented by another research group [60].


This study is based on the use of executed movements. Future studies will investigate the performance of the proposed SBCI system using the data of individuals with motor disabilities (attempted movements).

An important future study is the online testing of the system. So far, only a couple of SBCI studies have been conducted in an online fashion, and under specific conditions [60, 61]. The main reason can be attributed to the high FP rates of SBCI systems. Since our proposed SBCI system has resulted in much lower false positive rates compared to other EEG-based SBCI systems, future research should focus on the online testing of the performance of the system. Because neurological phenomena may vary over time, methods for adapting the system need to be developed. These methods can be divided into two parts: the adaptation of the classifier and the adaptation of the parameter values of the system. To adapt the SVM classifiers, data recorded during an online test of the system can be used to re-train the classifiers offline. Since the training data need to be labeled, reporting methods such as a sip-and-puff switch can be used to mark new IC and NC epochs [61]. Tuning of the system's model (i.e., adaptation of the parameter values) can be done using local search algorithms. Both adaptation procedures can be done offline on a separate computer, and the 'updated' SBCI system can then replace the existing system. Investigating these methods is left for future studies.

Acknowledgments

This work was supported in part by NSERC under Grant 90278-06 and CIHR under Grant MOP-72711. The authors also would like to thank Mr. Craig Wilson for his valuable comments on this paper. This research has been enabled by the use of WestGrid computing resources, which are funded in part by the Canada Foundation for Innovation, Alberta Innovation and Science, BC Advanced Education, and the participating research institutions.

Appendix. Theoretical analysis of the proposed SBCI

The majority of studies that have theoretically analyzed combinations of classifiers make some assumption about the independence of the classifiers (for more details, see [62–64]). In this appendix, the performance of the proposed SBCI is analyzed theoretically using the framework developed in [64, 65]. This framework applies linear programming to determine the lower and upper bounds of the performance of an MCS, without making any assumption about the independence of the classifiers. For simplicity, we focus on the upper and lower bounds of the fitness function formulated as a TP/FP ratio (instead of the more complex function defined in (14)). To obtain these bounds, the maximum and minimum of the TP and FP rates of the two-stage MCS are determined by linear programming.

Figure A1. The Venn diagram for three MCSs (the regions x_0, x_1, ..., x_7 correspond to the joint outcomes of MCS1, MCS2 and MCS3).

A.1. Formulating the problem

Let us denote the first-stage MCSs by {MCS1, MCS2, ..., MCSK}, where K is the number of MCSs in the first stage (in the proposed SBCI, K is 2 or 3). To calculate the maximum and minimum of the TP/FP ratio of the SBCI system, we represent the classification labels generated by all K classifiers with a binary string. Let bit(j, K) denote the K-bit binary expansion of j; each classifier is represented by one bit of bit(j, K). A value of '0' indicates that the classifier did not correctly classify an IC command (an FN) and a value of '1' indicates that the classifier correctly identified an IC command (a TP). We use the convention that, for K classifiers MCS1, MCS2, ..., MCSK, MCS1 is the least significant bit (LSB) and MCSK is the most significant bit (MSB). Let $x = [x_0, x_1, \ldots, x_{2^K-1}]^T$ be the vector of joint probabilities of the correct detection of an IC command. Since for K classifiers there are $2^K$ possible combinations of correct/incorrect classifications, the vector x is of length $2^K$. These combinations can be shown using a Venn diagram, as in figure A1. In this figure, x_0 is the percentage of IC commands that all MCSs failed to identify correctly, x_1 is the percentage of IC commands that the first classifier (MCS1) identified correctly but that the remaining classifiers (MCS2 and MCS3) did not, and so on.

A.2. Constraints

Let TP_SBCI(x) represent the probability of the correct classification of the IC commands by the proposed SBCI. We wish to find the values of x that yield the maximum and the minimum of TP_SBCI(x). The constraints of this optimization problem are as follows:

(1) The values x_i are non-negative and do not exceed 1:

$0 \le x_i \le 1, \quad i = 0, 1, \ldots, 2^K - 1.$   (A.1)

(2) The joint probabilities sum to 1:

$\sum_{i=0}^{2^K - 1} x_i = 1.$   (A.2)


(3) The sum of the joint probabilities for which classifier r correctly identifies an IC command must equal p_r, the normalized TP rate of classifier r. Mathematically,

$A_{eq}\, x = d$   (A.3)

where d is the vector of the normalized TP rates of the classifiers,

$d = [p_1, p_2, \ldots, p_K]^T$   (A.4)

$p_j = \frac{TP_j}{N_{IC}}, \quad j = 1, 2, \ldots, K.$   (A.5)

N_IC is the number of IC commands, TP_j is the number of TPs of the jth classifier, and A_eq is a $K \times 2^K$ matrix whose rth row corresponds to the rth classifier. A_eq is defined as

$A_{eq} = [b_1, b_2, \ldots, b_K]^T$   (A.6)

where b_1, b_2, ..., b_K are bit strings of length $2^K$ given by

$b_1 = [0\,1\,0\,1 \ldots 0\,1]^T$
$b_2 = [0\,0\,1\,1 \ldots 0\,0\,1\,1]^T$
$\vdots$
$b_K = [\underbrace{0 \ldots 0}_{2^{K-1}}\ \underbrace{1 \ldots 1}_{2^{K-1}}]^T.$   (A.7)
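For illustration, the constraint matrix of (A.6) and (A.7) can be generated directly from the bit-string convention of section A.1. The sketch below is a transcription of these definitions with hypothetical function and variable names, not the code used to produce the results in this appendix.

```python
import numpy as np

def build_equality_constraints(tp_counts, n_ic):
    """A_eq and d for (A.3)-(A.7).

    tp_counts : list of TP counts, one per first-stage MCS (length K)
    n_ic      : number of IC commands N_IC
    Row r of A_eq has a 1 in column j iff bit r of the K-bit expansion
    of j is 1, i.e. iff MCS_(r+1) correctly detects the IC command.
    """
    K = len(tp_counts)
    A_eq = np.zeros((K, 2 ** K))
    for j in range(2 ** K):
        for r in range(K):
            if (j >> r) & 1:          # MCS_1 is the least significant bit
                A_eq[r, j] = 1.0
    d = np.asarray(tp_counts, dtype=float) / n_ic   # normalized TP rates (A.5)
    return A_eq, d
```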

A.3. Objective functions

Let f_TP(SBCI)(i) denote the entry at the ith position of TP_SBCI(x). For the two-stage MCS (configurations 1–3 and 5 in section 4), we define

$f_{TP(SBCI)}(i) = \begin{cases} 1, & \text{if } N_1 = K \\ 0, & \text{if } N_1 = 0 \\ P_{IC}^{N_0}, & \text{otherwise} \end{cases}$   (A.8)

where N_1 is the number of ones in bit(i, K), N_0 is the number of zeros in bit(i, K) and P_IC is the probability of the IC state,

$P_{IC} = \frac{N_{IC}}{N_{Total}}$   (A.9)

where N_IC is the number of IC epochs and N_Total is the total number of epochs,

$N_{Total} = N_{IC} + N_{NC}.$   (A.10)

Equation (A.8) implies that the output of the two-stage MCS is '1' only when all the classifiers participating in the two-stage MCS correctly identify an IC command. If all of them fail to recognize an IC command, the output is zero. In the other cases, the decision is made based on the probability of the IC state. It can be seen that as N_0 increases, the SBCI has a higher probability of generating an FN. When P_IC is sufficiently small (e.g., P_IC < 0.01) and N_0 is sufficiently large, $P_{IC}^{N_0} \approx 0$, and f_TP(SBCI) takes the form of an AND operator, as described in section 2.2.2.

Figure A2. (a) TN_Max and TN_Min for two and three MCSs; (b) TP_Max and TP_Min for two and three MCSs. (Axes: TN or TP of an individual MCS (%) versus TN or TP of the two-stage MCS (%).)

For configuration 4 in section 2.2.2, (A.8) becomes

$f_{TP(SBCI)}(i) = \begin{cases} 1, & \text{if } N_1 > K/2 \\ 0, & \text{otherwise.} \end{cases}$   (A.11)

Similarly, the optimization problem for the NC state can be formulated by applying the following replacements in equations (A.1)–(A.11):

$x \to y, \quad P_{IC} \to P_{NC}, \quad TP \to TN, \quad N_{IC} \to N_{NC}, \quad N_{NC} \to N_{IC}.$   (A.12)

Figure A3. (a) TP/FP of an individual MCS; (b) min(TP/FP) of the two-stage MCS (for two MCSs in the first stage); (c) max(TP/FP) of the two-stage MCS (for two MCSs in the first stage).

As can be seen, the formulation is the same as that for the TP rate; the main difference is in (A.8). Here, because the probability P_NC is high (e.g., P_NC > 0.99), the number of entries close to '1' grows compared with the optimization problem for the TP rate. Similar formulations can be developed for each of the MCSs in the first stage. The only difference arises when the number of classifiers is even. In this case, the function f for the TP rate is formulated as

$f_{even}(i) = \begin{cases} 1, & \text{if } N_1 > K/2 \\ P_{IC}^{N_0}, & \text{if } N_0 = K/2 \\ 0, & \text{otherwise.} \end{cases}$   (A.13)

In the case of a tie and for a relatively large N0, the probabilityof correct identification of an IC command will be close tozero. The opposite case is true for NC trials.

A.4. Results

Figure A2 depicts the TN_Max and TN_Min values of the SBCI as functions of the TN rates of the individual MCSs in the first stage (for simplicity, it is assumed that all individual MCSs have the same TN rates). The optimal values are found by maximizing and minimizing the TP and TN values using linear programming. The value of P_NC is estimated from the experimental protocol (described in section 3) to be P_NC ≈ 0.99. Figure A2(a) shows that even for MCSs with high FP rates (e.g., 20% < FP < 50%), it is theoretically possible for the proposed SBCI to achieve low FP rates. For FP rates below 10%, the FP rate of the proposed SBCI can theoretically approach zero.
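For completeness, these bounds can be reproduced with an off-the-shelf LP solver. The sketch below uses scipy.optimize.linprog with the objective of (A.8); the function name, solver choice and example values are assumptions, and the code illustrates the framework of [64, 65] rather than our implementation.

```python
import numpy as np
from scipy.optimize import linprog

def tp_bounds(tp_rates, p_ic):
    """Lower/upper bounds on the TP rate of the two-stage MCS under (A.8),
    given the normalized TP rates of the K first-stage MCSs."""
    K = len(tp_rates)
    n = 2 ** K
    # Objective vector f of (A.8): f(i) = 1 if all K bits of i are 1,
    # 0 if none are, and p_ic ** N0 otherwise (N0 = number of zero bits).
    f = np.empty(n)
    for i in range(n):
        n1 = bin(i).count("1")
        n0 = K - n1
        f[i] = 1.0 if n1 == K else (0.0 if n1 == 0 else p_ic ** n0)
    # Equality constraints (A.2)-(A.7): the marginals match the individual
    # TP rates, and the joint probabilities sum to one.
    A_eq = np.zeros((K + 1, n))
    for j in range(n):
        for r in range(K):
            A_eq[r, j] = (j >> r) & 1        # MCS_1 is the LSB
    A_eq[K, :] = 1.0
    d = np.append(np.asarray(tp_rates, dtype=float), 1.0)
    bounds = [(0.0, 1.0)] * n                # constraint (A.1)
    lo = linprog(f, A_eq=A_eq, b_eq=d, bounds=bounds, method="highs")
    hi = linprog(-f, A_eq=A_eq, b_eq=d, bounds=bounds, method="highs")
    return lo.fun, -hi.fun                   # (TP_min, TP_max)
```

For example, tp_bounds([0.6, 0.6, 0.6], p_ic=0.01) should return the minimum and maximum achievable TP rates of the two-stage MCS when all three first-stage MCSs have a 60% TP rate, i.e., one point on the three-MCS curves of figure A2(b); applying the substitutions of (A.12) gives the corresponding TN bounds of figure A2(a).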

Figure A2(b) shows the TP_Max and TP_Min values, where it is assumed that all individual MCSs have the same TP rate. As the figure shows, the theoretically low FP rate of the SBCI comes at the expense of a lower TP rate. Assuming all MCSs have the same TP and FP rates, the TP/FP ratio of the SBCI can be demonstrated graphically. Figure A3(a) shows the TP/FP of an individual MCS, while figures A3(b) and (c) show the min(TP/FP) and max(TP/FP) of the SBCI, respectively, for two MCSs in the first stage. Similar figures were obtained for three MCSs in the first stage. As figure A3(c) shows, the proposed SBCI can theoretically attain a much higher TP/FP than that of an individual MCS (figure A3(a)).

References

[1] Fatourechi M, Birch G E and Ward R K 2007 A self-paced brain interface system that uses movement related potentials and changes in the power of brain rhythms J. Comput. Neurosci. 23 21–37
[2] Mason S G and Birch G E 2005 Temporal control paradigms for direct brain interfaces—rethinking the definition of asynchronous and synchronous Proc. HCI Conference (Las Vegas)
[3] Bashashati A, Mason S, Ward R K and Birch G E 2006 An improved asynchronous brain interface: making use of the temporal history of the LF-ASD feature vectors J. Neural Eng. 3 87–94
[4] Mason S G, Bashashati A, Fatourechi M, Navarro K F and Birch G E 2007 A comprehensive survey of brain interface technology designs Ann. Biomed. Eng. 35 137–69
[5] Deecke L, Grozinger B and Kornhuber H H 1976 Voluntary finger movement in man: cerebral potentials and theory Biol. Cybern. 23 99–119
[6] Hallett M 1994 Movement-related cortical potentials Electromyogr. Clin. Neurophysiol. 34 5–13
[7] Babiloni C et al 1999 Human movement-related potentials vs desynchronization of EEG alpha rhythm: a high-resolution EEG study Neuroimage 10 658–65


[8] Pfurtscheller G and Aranibar A 1977 Event-related cortical desynchronization detected by power measurements of scalp EEG Electroencephalogr. Clin. Neurophysiol. 42 817–26
[9] Leocani L, Toro C, Manganotti P, Zhuang P and Hallett M 1997 Event-related coherence and event-related desynchronization/synchronization in the 10 Hz and 20 Hz EEG during self-paced movements Electroencephalogr. Clin. Neurophysiol. 104 199–206
[10] Pfurtscheller G and Lopes da Silva F H 1999 Event-related EEG/MEG synchronization and desynchronization: basic principles Clin. Neurophysiol. 110 1842–57
[11] Toro C, Deuschl G, Thatcher R, Sato S, Kufta C and Hallett M 1994 Event-related desynchronization and movement-related cortical potentials on the ECoG and EEG Electroencephalogr. Clin. Neurophysiol. 93 380–9
[12] Arroyo S, Lesser R P, Gordon B, Uematsu S, Jackson D and Webber R 1993 Functional significance of the mu rhythm of human cortex: an electrophysiologic study with subdural electrodes Electroencephalogr. Clin. Neurophysiol. 87 76–87
[13] Narici L, Pizzella V, Romani G L, Torrioli G, Traversa R and Rossini P M 1990 Evoked alpha- and mu-rhythm in humans: a neuromagnetic study Brain Res. 520 222–31
[14] Feige B, Kristeva-Feige R, Rossi S, Pizzella V and Rossini P M 1996 Neuromagnetic study of movement-related changes in rhythmic brain activity Brain Res. 734 252–60
[15] Pfurtscheller G 1981 Central beta rhythm during sensorimotor activities in man Electroencephalogr. Clin. Neurophysiol. 51 253–64
[16] Szurhaj W, Derambure P, Labyt E, Cassim F, Bourriez J L, Isnard J, Guieu J D and Mauguiere F 2003 Basic mechanisms of central rhythms reactivity to preparation and execution of a voluntary movement: a stereoelectroencephalographic study Clin. Neurophysiol. 114 107–19
[17] Fatourechi M, Birch G E and Ward R K 2007 Proc. IEEE ICASSP'07 (Hawaii, USA, April 2007) pp 1157–60
[18] Yoon H, Yang K and Shahabi C 2005 Feature subset selection and feature ranking for multivariate time series IEEE Trans. Knowl. Data Eng. 17 1186–98
[19] Mason S G and Birch G E 2000 A brain-controlled switch for asynchronous control applications IEEE Trans. Biomed. Eng. 47 1297–307
[20] Demiralp T, Yordanova J, Kolev V, Ademoglu A, Devrim M and Samar V J 1999 Time-frequency analysis of single-sweep event-related potentials by means of fast wavelet transform Brain Lang. 66 129–45
[21] Samar V J, Bopardikar A, Rao R and Swartz K 1999 Wavelet analysis of neuroelectric waveforms: a conceptual tutorial Brain Lang. 66 7–60
[22] Hinterberger T, Kubler A, Kaiser J, Neumann N and Birbaumer N 2003 A brain–computer interface (BCI) for the locked-in: comparison of different EEG classifications for the thought translation device Clin. Neurophysiol. 114 416–25
[23] Graimann B, Huggins J E, Levine S P and Pfurtscheller G 2004 Toward a direct brain interface based on human subdural recordings and wavelet-packet analysis IEEE Trans. Biomed. Eng. 51 954–62
[24] Glassman E L 2005 A wavelet-like filter based on neuron action potentials for analysis of human scalp electroencephalographs IEEE Trans. Biomed. Eng. 52 1851–62
[25] Fukuda S, Tatsumi D, Tsujimoto H and Inokuchi S 1998 Proc. IEEE EMBS Conf. Engineering (Hong Kong, Oct./Nov. 1998) vol 3 pp 1458–60
[26] Jansen B H, Allam A, Kota P, Lachance K, Osho A and Sundaresan K 2004 An exploratory study of factors affecting single trial P300 detection IEEE Trans. Biomed. Eng. 51 975–8
[27] Mallat S G 1989 Multifrequency channel decompositions of images and wavelet models IEEE Trans. ASSP 37 2091–106
[28] Bradley A P and Wilson W J 2004 On wavelet analysis of auditory evoked potentials Clin. Neurophysiol. 115 1114–28
[29] Nason G P and Silverman B W 1995 The stationary wavelet transform and some statistical applications Wavelets Stat. 103 281–99
[30] Wolpaw J R, McFarland D J, Vaughan T M and Schalk G 2003 The Wadsworth Center brain-computer interface (BCI) research and development program IEEE Trans. Neural Syst. Rehab. Eng. 11 204–7
[31] Blankertz B et al 2004 The BCI Competition 2003: progress and perspectives in detection and discrimination of EEG single trials IEEE Trans. Biomed. Eng. 51 1044–51
[32] Roberts S J, Penny W and Rezek I 2003 Temporal and spatial complexity measures for EEG-based brain computer interfacing Med. Biol. Eng. Comput. 37 93–9
[33] Citi L, Poli R, Cinel C and Sepulveda F 2004 Proc. GECCO 2004 (Seattle, USA, June 2004)
[34] Graimann B, Huggins J E, Levine S P and Pfurtscheller G 2003 Proc. IEEE EMBS Conference on Neural Engineering (Capri Island, Italy, March 2003) pp 614–7
[35] Bashashati A, Fatourechi M, Ward R K and Birch G E 2006 User customization of the feature generator of an asynchronous brain interface Ann. Biomed. Eng. 34 1051–60
[36] Duda R O and Hart P E 1973 Pattern Classification and Scene Analysis (New York: Wiley)
[37] Egner T, Zech T F and Gruzelier J H 2004 The effects of neurofeedback training on the spectral topography of the electroencephalogram Clin. Neurophysiol. 115 2452–60
[38] Shr-Da Wu P C L and Wu Y C 2004 Proc. IEEE Int. Conf. Networking, Sensing and Control (Taipei, Taiwan, March 2004) pp 825–8
[39] Hill N J, Lal T N, Schroder M, Hinterberger T, Widman G, Elger C E, Scholkopf B and Birbaumer N 2006 Classifying event-related desynchronization in EEG, ECoG and MEG signals ed G Dornhege, J R Millan, T Hinterberger, D J McFarland and K R Muller (Lecture Notes in Computer Science vol 4174) (Cambridge, MA: MIT Press) pp 404–13
[40] Lal T N, Schroder M, Hill N J, Preissl H, Hinterberger T, Mellinger J, Bogdan M, Rosenstiel W, Hofmann T and Birbaumer N 2005 Proc. ACM Int. Conf. Machine Learning (Bonn, Germany, 2005) pp 465–72
[41] Li Y, Guan C and Qin J 2005 Enhancing feature extraction with sparse component analysis for brain-computer interface Proc. IEEE EMBS Conf. (Shanghai, China, September 2005) vol 5 pp 5335–8
[42] McFarland D J, Miner L A, Vaughan T M and Wolpaw J R 2000 Mu and beta rhythm topographies during motor imagery and actual movements Brain Topogr. 12 177–86
[43] Ramoser H, Muller-Gerking J and Pfurtscheller G 2000 Optimal spatial filtering of single trial EEG during imagined hand movement IEEE Trans. Rehabil. Eng. 8 441–6
[44] Foffani G and Croci S 2003 Proc. IEEE EMBS Conf. (Cancun, Mexico, September 2003) pp 2292–4
[45] Wang T, Deng J and He B 2004 Classifying EEG-based motor imagery tasks by means of time-frequency synthesized spatial patterns Clin. Neurophysiol. 115 2744–53
[46] Slobounov S and Sebastianelli W 2006 Introductory chapter Concussion in Athletics: Ongoing Controversy (Berlin: Springer) pp 1–16


[47] Blum A, Frieze A M, Kannan R and Vempala S 1998 A polynomial-time algorithm for learning noisy linear threshold functions Algorithmica 22 35–52
[48] Dunagan J and Vempala S 2004 Optimal outlier removal in high-dimensional spaces J. Comput. Syst. Sci. 68 335–73
[49] Prastawa M, Bullitt E, Ho S and Gerig G 2004 A brain tumor segmentation framework based on outlier detection Med. Image Anal. 8 275–83
[50] Muller K R, Anderson C W and Birch G E 2003 Linear and nonlinear methods for brain-computer interfaces IEEE Trans. Neural Syst. Rehab. Eng. 11 165–9
[51] Chang C and Lin C 2001 LIBSVM: a library for support vector machines Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
[52] Goldberg D E 1989 Genetic Algorithms in Search, Optimization and Machine Learning (Reading, MA: Addison-Wesley)
[53] Sellers E W, Kubler A and Donchin E 2006 Brain-computer interface research at the University of South Florida Cognitive Psychophysiology Laboratory: the P300 Speller IEEE Trans. Neural Syst. Rehabil. Eng. 14 221–4
[54] Kubler A, Mushahwar V K, Hochberg L R and Donoghue J P 2006 BCI Meeting 2005-Workshop on clinical issues and applications IEEE Trans. Neural Syst. Rehabil. Eng. 14 131–4
[55] Back T, Fogel D B and Michalewicz T 2000 Evolutionary Computation (Bristol: Institute of Physics Publishing)
[56] Birch G E, Bozorgzadeh Z and Mason S G 2002 Initial on-line evaluations of the LF-ASD brain-computer interface with able-bodied and spinal-cord subjects using imagined voluntary motor potentials IEEE Trans. Neural Syst. Rehabil. Eng. 10 219–24
[57] Mason S G, Kronegg J, Huggins J, Fatourechi M and Schloegl A 2006 Evaluating the performance of self-paced BCI technology Tech. Report (available online at http://www.bci-info.tugraz.at/Research Info/documents/articles/self paced tech report-2006-05-19.pdf)
[58] Borisoff J F, Mason S G, Bashashati A and Birch G E 2004 Brain-computer interface design for asynchronous control applications: improvements to the LF-ASD asynchronous brain switch IEEE Trans. Biomed. Eng. 51 985–92
[59] Yom-Tov E and Inbar G F 2003 Detection of movement-related potentials from the electro-encephalogram for possible use in a brain-computer interface Med. Biol. Eng. Comput. 41 85–93
[60] Scherer R, Muller G R, Neuper C, Graimann B and Pfurtscheller G 2004 An asynchronously controlled EEG-based virtual keyboard: improvement of the spelling rate IEEE Trans. Biomed. Eng. 51 979–84
[61] Bozorgzadeh Z, Birch G E and Mason S G 2000 Proc. IEEE ICASSP 2000 (Istanbul, Turkey, June 2000) pp 2385–8
[62] Kittler J, Hatef M, Duin R P W and Matas J 1998 On combining classifiers IEEE Trans. Pattern Anal. Mach. Intell. 20 226–39
[63] Lam L and Suen S Y 1997 Application of majority voting to pattern recognition: an analysis of its behavior and performance IEEE Trans. Syst. Man Cybern. 27 553–68
[64] Narasimhamurthy A 2005 Theoretical bounds of majority voting performance for a binary classification problem IEEE Trans. Pattern Anal. Mach. Intell. 27 1988–95
[65] Demrekler M and Altincay H 2002 Plurality voting-based multiple classifier systems: statistically independent with respect to dependent classifier sets Pattern Recognit. 35 2365–79
