Using Deep Learning To Classify Emotions in
Tibetan Monks
Bachelor’s Project Thesis
Pim van der Loos, [email protected],
Supervisors: Dr M.K. Van Vugt & P. Kaushik
Abstract: Monastic debate is a form of practice that plays an important role in the training of Tibetan Buddhist monks. The goal of these debates is to deepen the monks' understanding of their study materials and the world. During these debates, many emotions such as anger and happiness arise, which makes this a good environment to study naturally occurring emotions. For this study, the electroencephalography (EEG) data for both participants and the videos of 46 debates were recorded. After manually annotating the debates for anger and happiness, two deep learning algorithms were used to classify the emotions of happiness and anger from the EEG data in a subject-independent approach: a long short-term memory (LSTM) network and a 1-dimensional convolutional neural network (1D CNN). The LSTM achieved the highest accuracy at 91.0%, with the 1D CNN following at 88.8%. These findings show that deep learning can be used to create a robust classifier for anger and happiness.
1 Introduction
Emotions play an important role in human life, for example in bonding with other people and strengthening social ties (Rime, 2009). When looking specifically at positive versus negative emotions, Gino and Schweitzer (2008) have shown that these emotions play a role in taking advice from other people. They found that negative emotions such as anger made the participants less likely to follow useful advice than a positive emotion (gratitude), which resulted in lower performance for those ignoring the advice.
However, despite the big role emotions play in our lives, not much is known about emotions in real-life situations, as described by Trampe et al. (2015). In their study, aimed at discovering what emotions we experience in everyday life, they used a phone application that asked participants to fill in a short questionnaire to monitor their emotions at random times throughout the day. In total, over half a million questionnaires were completed. They found that the participants experienced positive emotional states such as joy, love, and gratitude 2.5 times more often than negative emotional states such as anger, anxiety, and sadness. However, they suspect that certain emotions might be under- or over-represented in their results because the decision to respond to the survey would be influenced by that very emotion.
One way of overcoming this limitation could be to decode the emotions experienced by the participants using deep learning and wearable EEG devices such as those described by Casson (2019). This method would also have the advantage of providing a continuous stream of data, rather than unevenly distributed data points.
Matiko et al. (2015) found promising results in classifying emotions using a wearable, solar-powered EEG headband. Their fuzzy logic-based emotion classifier reached an accuracy of 90% (±9%) when classifying positive versus negative emotions in a subject-dependent approach.
One thing almost all research into emotion classification using EEG data has in common is that it uses artificially induced emotions. Matiko et al. (2015), for example, used randomly selected sequences of 24 images from the Geneva affective picture database (Dan-Glauser and Scherer, 2011). These images are supposed to elicit a specific emotional response. For instance, images of snakes are supposed to cause a negative emotional response, while images of human babies should result in a positive one. Lin et al. (2014), on the other hand, used music to induce the desired emotions. After the experiment, the participants had to describe their emotions in terms of valence (positive vs negative) and arousal (high vs low). Using an SVM, they obtained a valence classification accuracy of 61.09% in a subject-independent approach. Subject-independent here means that their model did not account for individual variability between participants. This has an advantage over a subject-dependent approach because you do not have to train a classifier for every participant, but it usually results in lower accuracy because the classifier cannot account for the individual variability.
Because artificially induced emotions may not be the same as naturally occurring emotions, the classification of naturally occurring emotions will be explored in this paper. I expect classifiers trained on naturally occurring emotions to generalize better than those in studies that used emotions evoked in a laboratory setting, as I think that letting people get angry at each other or have fun together naturally results in more 'real' and stronger emotions.
The dataset used in this paper was obtained during the real-life setting of monastic debate as practiced by Tibetan monks. This is a good environment to study naturally occurring emotions, as the debates can get heated at times, involving a lot of shouting, laughing, and wild movements.
Monastic debate, as described by Sera Jey Science Centre (2015), is a kind of clearly structured dialectical debate that is centered around logical reasoning and does not rely on scriptural authority. The debate involves two people: a defender and a challenger. The role of the defender is to state and subsequently defend a hypothesis. Through the use of logic, the challenger must then construct counter-arguments for the hypothesis in such a way that the defender is forced into a corner of either accepting absurd statements or contradicting himself (Dreyfus, 2008). The defender can only reply to the challenger's statements with one of four responses: "I agree", "Please state a reason / Why", "Reason is not established", or "No pervasion". The goal of these debates is not to convince the opponent or any potential spectators of a certain point, but rather to deepen the understanding of the current topic through the use of logic.
Figure 2.1: Still taken from a debate recording. The sitting participant on the left is the defender; the standing participant on the right is the challenger. The red caps both participants are wearing are the EEG caps.
This study aims to examine whether it is possible to use deep learning to classify the emotions of happiness and anger from EEG obtained in a real-life setting. More specifically, a long short-term memory (LSTM) and a 1D convolutional neural network (1D CNN) will be used in a subject-independent approach. Deep learning was selected because it shows great promise in the area of EEG classification, as shown by Roy et al. (2019). My hypothesis is that this is indeed possible with good accuracy.
2 Methods
Participants
To find out whether we can classify emotions from EEG data, we used EEG data from a group of monks while they were practicing monastic debate. Twenty-four monks from the Sera Jey Monastery in India participated. All monks were male and between the ages of 20 and 30. Of the 24 monks, ten were experienced monks with at least 15 years (18750 hours) of experience, and 14 were inexperienced monks who had at least three years (3750 hours) of experience.
Data Gathering
The monks were recorded during 46 separate one-on-one debates lasting either 10 or 15 minutes. The 10-minute debates were about easy topics and the 15-minute debates were about more difficult topics. For every debate, audio and video were recorded in addition to the EEG signals from both the defender and the challenger. The camera captured the face of the defender and the back of the challenger, as demonstrated in Figure 2.1. To synchronize the video feed with the EEG data, two separate methods were employed. The first method was to start recording audio and video first and then count down from 3 to 1. The EEG recording was started at 1. This method was used for all debates. The second method was to let the challenger blink five times in a row at the start of the debate. These purposefully created EEG artifacts allowed us to more accurately synchronize the data. This method was used for only 15 debates, as the idea came up after several debates had already been recorded.
Video Annotation
The videos were analyzed for emotions using BORIS: Behavioral Observation Research Interactive Software (Friard and Gamba, 2016). In BORIS, we annotated anger and happiness as periods with a start point and an endpoint. The emotions were recorded separately for the challenger and the defender. We classified the emotions by reading the body language and listening for audible cues. For example, laughing loudly was considered to be a clear indicator of happiness, while shouting with clenched fists was considered an indication of anger.
Because of the subjective nature of this process, every recording was annotated by three separate students. Once the annotation was completed, the results were merged. Only the moments at which at least two of the raters agreed on a certain emotion were used.
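The majority-vote merge can be sketched as follows. This is a minimal illustration with hypothetical per-frame boolean masks; the actual annotations were interval-based BORIS events, which would first be rasterized onto a common timeline.

```python
import numpy as np

def merge_annotations(rater_masks):
    """Keep only the moments where at least two of the raters agree.
    Each mask is one boolean entry per frame (True = emotion observed)."""
    votes = np.sum(rater_masks, axis=0)   # number of raters marking each frame
    return votes >= 2                     # majority of the three raters

# Three hypothetical raters annotating a 10-frame clip:
r1 = np.array([0, 1, 1, 1, 0, 0, 1, 0, 0, 0], dtype=bool)
r2 = np.array([0, 1, 1, 0, 0, 0, 1, 1, 0, 0], dtype=bool)
r3 = np.array([0, 0, 1, 1, 0, 0, 0, 1, 0, 0], dtype=bool)

merged = merge_annotations([r1, r2, r3])  # frames 1, 2, 3, 6, 7 survive
```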
EEG Data
Continuous EEG data were recorded from 32 electrodes (Fp1/2, AF3/4, F3/4/7/8/z, FC1/2/5/6, C3/4/z, T7/8, CP1/2/5/6, P3/4/7/8/z, PO3/4, O1/2/z). A Biosemi EEG cap was used at a sampling rate of 256 Hz. We received the EEG data of the debates after it had already been processed. A low-pass filter of 45 Hz was used because the frequencies higher than that contained too much noise, caused by the monks' wild movements. Independent component analysis (ICA) (Iriarte et al., 2003) was used to detect and remove artifacts such as those caused by movements and eye blinks from the EEG data.
For this research, the following frequency bands were used: delta (0.5-4 Hz), theta (4-8 Hz), alpha (8-12 Hz), beta (12-30 Hz), and gamma (30-80 Hz). Because of the low-pass filter, the gamma channel was limited to 30-45 Hz. These bands were obtained using a discrete wavelet transform. Additionally, the 'raw' EEG data was used. 'Raw' here means that the data was not treated further after it had been cleaned and therefore includes all frequencies.
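To illustrate what band extraction does, here is a sketch using a zero-phase Butterworth band-pass filter from SciPy. This is a simpler stand-in for the discrete wavelet transform actually used in the study, not the study's code; the band edges follow the definitions above.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 256  # sampling rate used in this study (Hz)

# Frequency bands as defined in the text; gamma is capped at 45 Hz
# because of the low-pass filter.
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 12),
         "beta": (12, 30), "gamma": (30, 45)}

def extract_band(signal, low, high, fs=FS, order=4):
    """Zero-phase Butterworth band-pass (stand-in for the DWT)."""
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, signal)

# Sanity check with a pure 10 Hz sine, which lies inside the alpha band:
t = np.arange(FS) / FS
sine = np.sin(2 * np.pi * 10 * t)
alpha = extract_band(sine, *BANDS["alpha"])   # retains most of the energy
```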
The Dataset
The dataset consists of 1759 batches of 256 timesteps with 192 features each. The 192 features can be broken up into the raw, alpha, beta, gamma, delta, and theta channels for each of the 32 electrodes. There are 256 timesteps per batch because the sampling rate of the EEG equipment was set to 256 Hz, meaning that every batch contains the data of 1 second.
The 1759 batches consist of 1205 instances of happiness and 554 instances of anger. Because these data are clearly imbalanced, every measurement of anger was duplicated, resulting in a total of 1108 instances of anger. We chose this approach over balancing the dataset by removing instances of happiness, as the dataset is already relatively small for deep learning, and removing about a third of the data would have given the models even less data to work with.
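The duplication-based balancing can be sketched like this. A toy single-feature array stands in for the real 256 × 192 batches to keep the example small; the batch counts match those in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the dataset: the real batches are 256 timesteps x 192
# features; a single feature is used here to keep the example small.
X = rng.standard_normal((1759, 256, 1)).astype(np.float32)
y = np.array([1] * 1205 + [0] * 554)   # 1 = happiness, 0 = anger

# Balance the classes by duplicating every anger batch once.
anger = y == 0
X_bal = np.concatenate([X, X[anger]])
y_bal = np.concatenate([y, y[anger]])
# y_bal now holds 1205 instances of happiness and 1108 of anger.
```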
Deep Learning Algorithms
Two different algorithms were used to classify the data. To find the best settings, many different models were constructed for both algorithms and then compared based on accuracy, sensitivity, and specificity. In the dataset, '1' (true) denotes happiness and '0' (false) anger. This means that the sensitivity, also known as the true positive rate, is the percentage of correct classifications of happiness. Specificity, also known as the true negative rate, is therefore the percentage of correct classifications of anger. Because the two classes were approximately balanced after duplication, the accuracy is close to the average of the sensitivity and specificity.
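The three scores can be computed from the confusion-matrix counts as follows. This is a generic sketch, not the evaluation code used in the study.

```python
import numpy as np

def scores(y_true, y_pred):
    """Sensitivity, specificity, and accuracy for a binary classifier
    where 1 (true) denotes happiness and 0 (false) denotes anger."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))   # correctly labelled happiness
    tn = np.sum((y_true == 0) & (y_pred == 0))   # correctly labelled anger
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    sensitivity = tp / (tp + fn)                 # true positive rate
    specificity = tn / (tn + fp)                 # true negative rate
    accuracy = (tp + tn) / len(y_true)
    return sensitivity, specificity, accuracy

# Toy example with 4 happy and 4 angry labels (balanced classes, so the
# accuracy coincides with the mean of sensitivity and specificity):
sens, spec, acc = scores([1, 1, 1, 1, 0, 0, 0, 0],
                         [1, 1, 1, 0, 0, 0, 1, 1])
```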
In all cases, the models were tested using k-fold cross-validation with k=5 in a 60:20:20 split: 60% of the data were used for training, 20% for validation, and 20% for testing. For each model, the average of the k results from the k-fold cross-validation is used as its final result.
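One possible way to realize such a split is sketched below. The exact fold assignment used in the study is not specified, so the rotation of the validation fold here is an assumption.

```python
import numpy as np

def five_fold_indices(n, seed=0):
    """Yield (train, val, test) index arrays for k=5 folds of n samples.
    Each fold uses one fifth for testing and one fifth for validation,
    giving the 60:20:20 split described in the text."""
    folds = np.array_split(np.random.default_rng(seed).permutation(n), 5)
    for i in range(5):
        test, val = folds[i], folds[(i + 1) % 5]
        train = np.concatenate([folds[j] for j in range(5)
                                if j != i and j != (i + 1) % 5])
        yield train, val, test

# One pass over the five folds of the 1759-batch dataset:
for train, val, test in five_fold_indices(1759):
    pass  # a model would be trained and evaluated here
```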
Algorithm 1: LSTM
The first model we used was a long short-term memory (LSTM) neural network (Hochreiter and Schmidhuber, 1997). This algorithm was selected because it is well-suited for time series data. LSTMs are a specialized type of recurrent neural network (RNN). RNNs can use previous information for the current task, as they use the output of the previous calculation for the next one. However, the performance of RNNs decreases as the gap between dependent pieces of information increases, as shown by Bengio et al. (1994). This means that the information persistence or 'memory' in RNNs is limited to the short term.
LSTMs, on the other hand, expand on RNNs by making use of a cell state. The cell state is the long-term memory aspect of the system, as it is used to hold information over long periods. The cell state is accessed by a set of gates that control what is added to and removed from it. At each step, a forget gate is used to determine how much of the cell state should be discarded. The forget gate does this using a sigmoid function applied to the output of the previous step and the current input. Then, an input gate decides which values to update. This is achieved using a sigmoid function to determine which values will be updated and a tanh function to supply the updated values. These updated values are then added to the cell state. Lastly, there is an output gate that calculates the result of the step from the new cell state, the output of the previous step, and the current input.
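A single LSTM step with these gates can be written out in NumPy. This is a bare-bones sketch with random weights, omitting training; the variable names (`W`, `U`, `b`) are illustrative, not taken from the study.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b hold the stacked weights of the forget,
    input, candidate, and output gates (each of size `units`)."""
    z = W @ x + U @ h_prev + b
    f, i, g, o = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)   # gate activations
    c = f * c_prev + i * np.tanh(g)                # update the cell state
    h = o * np.tanh(c)                             # output of this step
    return h, c

units, features = 4, 3
rng = np.random.default_rng(0)
W = rng.standard_normal((4 * units, features)) * 0.1
U = rng.standard_normal((4 * units, units)) * 0.1
b = np.zeros(4 * units)

# Run the cell over a toy 1-second sequence of 256 timesteps:
h = c = np.zeros(units)
for x in rng.standard_normal((256, features)):
    h, c = lstm_step(x, h, c, W, U, b)
```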
Algorithm 2: 1D CNN
The second algorithm we used was a one-dimensional convolutional neural network (1D CNN). This algorithm was selected because of its strengths in identifying relevant patterns in time series data. For example, 1D CNNs have been useful in voice analysis, as shown by Fujimura et al. (2020). A CNN uses convolutional layers to extract features from the data, which are put in feature maps. A fully connected layer then uses these feature maps to classify the data.
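The feature-map extraction of a 1D convolutional layer can be sketched as follows. This is a naive NumPy implementation with toy shapes, not the trained network; the filter count and kernel size mirror those reported in the Results.

```python
import numpy as np

def conv1d(x, kernels, bias):
    """'Valid' 1D convolution: x has shape (timesteps, features) and
    kernels has shape (n_filters, kernel_size, features). Returns one
    feature map per filter, followed by a ReLU."""
    n_filters, k, _ = kernels.shape
    steps = x.shape[0] - k + 1
    out = np.empty((steps, n_filters))
    for t in range(steps):
        window = x[t:t + k]   # (kernel_size, features) slice of the input
        out[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1])) + bias
    return np.maximum(out, 0.0)   # ReLU activation

rng = np.random.default_rng(0)
x = rng.standard_normal((256, 6))                # toy batch with 6 channels
kernels = rng.standard_normal((24, 3, 6)) * 0.1  # 24 filters, kernel size 3
maps = conv1d(x, kernels, np.zeros(24))          # shape (254, 24)
```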
3 Results
Both the LSTM and the 1D CNN models were able to classify the EEG data with high levels of accuracy. The results of the LSTM will be discussed first, followed by the results of the 1D CNN and a comparison of the two.
LSTM Results
The best result obtained by the LSTM was an average accuracy of 91.0% with a specificity of 90.8% and a sensitivity of 91.3%. The model that obtained these results used 2 LSTM layers with 32 units each and a tanh activation function. The model used a dropout layer of 40% to avoid overfitting and a single-unit dense layer with a sigmoid activation function to get binary output, as the only available options are 'happy' and 'angry'. The RMSProp optimizer was used with a learning rate of 0.001, and the model was trained over 32 epochs with a batch size of 32.
To find out which selection of the data gave the best results, the model was run for every combination of the available channels: the raw data and the alpha, beta, gamma, delta, and theta waves, resulting in 63 different data sets. The average results of 5-fold cross-validation for the ten best combinations are shown in Figure 3.1. These results clearly show the importance of the raw EEG data for this model: not only does using only the raw channel give the fourth-highest accuracy, but all ten of the best models are combinations that include the raw data. The best results were obtained using the beta and theta channels in combination with the raw data. The results of all 63 combinations can be seen in Appendix A.
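Enumerating the 63 data sets amounts to taking every non-empty subset of the six channels, for example:

```python
from itertools import combinations

channels = ["raw", "alpha", "beta", "gamma", "delta", "theta"]

# Every non-empty subset of the six channels, ignoring order.
# There are 2**6 - 1 = 63 of them.
subsets = [c for r in range(1, len(channels) + 1)
           for c in combinations(channels, r)]
```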
When only looking at single-channel performance, as shown in Figure 3.2, we can see that the raw data channel is indeed the most important channel, with an average accuracy of 90.7% and an average sensitivity and specificity of 88.0% and 93.5%.

Figure 3.1: Comparison of the 10 best combinations of data for the LSTM, ordered by highest average accuracy. [Bar chart of average accuracy, sensitivity, and specificity per channel combination.]

Figure 3.2: Comparison of the standalone channels for the LSTM, ordered by frequency. [Bar chart of average accuracy, sensitivity, and specificity per channel.]

Figure 3.3: Comparison of the 10 best combinations of data for the 1D CNN, ordered by highest average accuracy. [Bar chart of average accuracy, sensitivity, and specificity per channel combination.]

Figure 3.4: Comparison of the standalone channels for the 1D CNN, ordered by frequency. [Bar chart of average accuracy, sensitivity, and specificity per channel.]

Figure 3.5: Results of the LSTM compared to the 1D CNN when using their optimal dataset. [Bar chart of average accuracy, sensitivity, and specificity per model.]
1D CNN Results
The best results obtained by the 1D CNN were obtained using 2 layers of 24 filters with a kernel size of 3 each. Both layers had a ReLU activation function and used the L1L2 bias regularizer. These 1D CNN layers were followed by a 20% dropout layer and a single-unit dense layer with a sigmoid activation function. The model used the binary cross-entropy loss function and the RMSProp optimizer.
This model reached an average accuracy of 88.8% with a sensitivity of 85.7% and a specificity of 92.2%. As shown in Figure 3.3, the combination of the raw, beta, and theta channels gave the best results, as it did for the LSTM. When looking at the single-channel performance shown in Figure 3.4, the raw EEG data is the best predictor, as was the case for the LSTM. Using only the raw channel resulted in an accuracy of 87.0% and a sensitivity and specificity of 84.4% and 89.5%. The results of all 63 combinations can be seen in Appendix B.
Comparison of the models
As shown in Figure 3.5, the LSTM model slightly outperforms the 1D CNN model by 2.2 percentage points when only looking at the accuracy of the models. However, the LSTM also has a smaller gap between its sensitivity and specificity scores of only 0.5 percentage points, while the 1D CNN has a gap of 6.5 percentage points.
4 Discussion
The results confirm our hypothesis that deep neural networks, and specifically an LSTM and a 1D CNN, can be used to accurately classify the emotions of anger and happiness from EEG data obtained in the real-life setting of monastic debate using a subject-independent approach. Of the two models, the LSTM has the highest accuracy at 91.0% and the smallest difference between its sensitivity and specificity results, at 91.3% and 90.8% respectively. Therefore, when computing power is not a factor, the LSTM is the clear winner. However, because the 1D CNN is faster than the LSTM, the 1D CNN might be the algorithm of choice in situations where little processing power is available, such as the wearable EEG headset used by Matiko et al. (2015).
Furthermore, we explored the effects of the different combinations of available data on the performance of the algorithms. In both cases, the raw data performed much better than any of the other channels but was beaten by the combination of the raw data and the beta and theta bands. This makes sense because the raw EEG data contains the most information, which gives the models more to work with. Given the high performance of the raw data alone, the trade-off of slightly lower results in exchange for a much smaller dataset that requires both less computing power and less memory is worth considering for lower-power devices.
When comparing these results to Matiko et al. (2015), the approach presented here has a clear advantage in the subject-independent setting, which eliminates the need for training the classifier on each individual subject. While their fuzzy logic-based classifier had an accuracy of 90% (±9%) in their subject-dependent method, the same system only had an accuracy of 62.62% in a subject-independent approach. Lin et al. (2014) reached a similar accuracy of 61.09% using an SVM in a subject-independent approach. Both results are much lower than those achieved by the models discussed here. However, it should be noted that these direct comparisons between classifiers are not always completely fair, given the wildly different setups of each study. Matiko et al. (2015), for example, used a real-time classifier and a wearable EEG headset, which generally results in much lower quality data than EEG caps such as those used in this study.
During our research, we ran into several issues. While annotating the videos, it was especially difficult to assess the emotion of the challengers, as the camera was positioned behind them, so we could only see their faces on the rare occasions they turned around. Further complicating the annotation process was the fact that none of the annotators knew more than a few words of Tibetan, so we did not know what the participants were saying. At times, this lack of context made it even harder to evaluate their emotions.
It should also be noted that this debate is a kind of game for the monks, and one that even the inexperienced monks have practiced for at least 3750 hours. Therefore, it is conceivable that what we considered to be anger was not actually anger. This is supported by the fact that the monks would regularly start laughing after or even between bouts of shouting. Additionally, there was no instance where at least two annotators agreed on when the defender was angry; all instances of anger were taken from the challenger.
If it was not actually anger, there are a few explanations of what our models found instead. The models might have picked up on a difference between happiness and another emotion or mixture of emotions that, to us, looked like anger. Alternatively, there might be a difference in how much or how wildly the monks moved while they were 'angry', resulting in more noise and artifacts in the EEG data, which were then picked up by the models. The higher frequencies used for the gamma band are especially susceptible to this, which might explain why it scored so high compared to the other frequency bands. If either explanation is true and our models did not actually classify anger, this would likely impact the generalizability of our models in other real-world situations.
Without the raw EEG and the gamma frequency band, the average accuracy of the LSTM drops to 78.8%, with a sensitivity and specificity of 76.0% and 81.7%, for the combination of the alpha, beta, and theta frequency bands, as can be seen in Figure A.3. The results of the CNN drop to an accuracy of 77.3%, with a sensitivity and specificity of 76.4% and 78.1%, for the combination of the alpha, beta, delta, and theta frequency bands, as shown in Figure B.3. Future research may explore whether the good results obtained using the raw EEG data and the gamma frequency band can be attributed to the emotions themselves or to other factors such as movement.
This is the first step in decoding naturally occurring emotions using deep learning with high reliability. Future research could look at how well these models generalize to other real-life situations.
5 References
Y. Bengio, P. Simard, and P. Frasconi. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2):157–166, 1994. URL https://rug.on.worldcat.org/oclc/264469100.

Alexander J. Casson. Wearable EEG and beyond. Biomedical Engineering Letters, 9(1):53–71, 2019. doi:10.1007/s13534-018-00093-6. URL https://rug.on.worldcat.org/oclc/8031624869.

Elise S. Dan-Glauser and Klaus R. Scherer. The Geneva affective picture database (GAPED): a new 730-picture database focusing on valence and normative significance. Behavior Research Methods, 43(2):468–477, 2011. doi:10.3758/s13428-011-0064-1. URL https://rug.on.worldcat.org/oclc/5660412651.

G. B. Dreyfus. What is debate for? The rationality of Tibetan debates and the role of humor. Argumentation, 22(1):43–58, 2008. URL https://rug.on.worldcat.org/oclc/229725701.

Olivier Friard and Marco Gamba. BORIS: a free, versatile open-source event-logging software for video/audio coding and live observations. Methods in Ecology and Evolution, 7(11):1325–1330, 2016. doi:10.1111/2041-210X.12584. URL https://rug.on.worldcat.org/oclc/6869782890.

Shintaro Fujimura, Tsuyoshi Kojima, Yusuke Okanoue, Kazuhiko Shoji, Masato Inoue, Koichi Omori, and Ryusuke Hori. Classification of voice disorders using a one-dimensional convolutional neural network. Journal of Voice, 2020. doi:10.1016/j.jvoice.2020.02.009. URL https://rug.on.worldcat.org/oclc/8553814172.

Francesca Gino and Maurice E. Schweitzer. Blinded by anger or feeling the love: How emotions influence advice taking. Journal of Applied Psychology, 93(5):1165–1173, 2008. doi:10.1037/0021-9010.93.5.1165. URL https://rug.on.worldcat.org/oclc/4643329145.

S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997. URL https://rug.on.worldcat.org/oclc/120718902.

J. Iriarte, E. Urrestarazu, M. Valencia, M. Alegre, A. Malanda, C. Viteri, and J. Artieda. Independent component analysis as a tool to eliminate artifacts in EEG: a quantitative study. Journal of Clinical Neurophysiology, 20(4):249–257, 2003. URL https://rug.on.worldcat.org/oclc/112440307.

Y. P. Lin, Y. H. Yang, and T. P. Jung. Fusion of electroencephalographic dynamics and musical contents for estimating emotional responses in music listening. Frontiers in Neuroscience, 8:94, 2014. doi:10.3389/fnins.2014.00094. URL https://rug.on.worldcat.org/oclc/5579182259.

Joseph W. Matiko, Yang Wei, Russel Torah, Neil Grabham, Gordon Paul, Stephen Beeby, and John Tudor. Wearable EEG headband using printed electrodes and powered by energy harvesting for emotion monitoring in ambient assisted living. Smart Materials and Structures, 24(12), 2015. doi:10.1088/0964-1726/24/12/125028. URL https://rug.on.worldcat.org/oclc/5914133367.

Bernard Rime. Emotion elicits the social sharing of emotion: Theory and empirical review. Emotion Review, 1(1):60–85, 2009. doi:10.1177/1754073908097189. URL https://rug.on.worldcat.org/oclc/4634563507.

Y. Roy, J. Faubert, H. Banville, A. Gramfort, I. Albuquerque, and T. H. Falk. Deep learning-based electroencephalography analysis: A systematic review. Journal of Neural Engineering, 16(5), 2019. doi:10.1088/1741-2552/ab260c. URL https://rug.on.worldcat.org/oclc/8371875269.

Sera Jey Science Centre. Science - brief introduction to science debate, 2015.

Debra Trampe, Jordi Quoidbach, Maxime Taquet, and Alessio Avenanti. Emotions in everyday life. PLOS ONE, 10(12):e0145450, 2015. doi:10.1371/journal.pone.0145450. URL https://rug.on.worldcat.org/oclc/5954511641.
A Appendix: LSTM Results
Figure A.1: The best 16 combinations of the data for the LSTM, ordered by highest average accuracy. [Bar chart of average accuracy, sensitivity, and specificity per channel combination.]
Figure A.2: Data combinations 17-32 for the LSTM, ordered by highest average accuracy. [Bar chart of average accuracy, sensitivity, and specificity per channel combination.]
Figure A.3: Data combinations 33-48 for the LSTM, ordered by highest average accuracy. [Bar chart of average accuracy, sensitivity, and specificity per channel combination.]
Figure A.4: Data combinations 49-63 for the LSTM, ordered by highest average accuracy. [Bar chart of average accuracy, sensitivity, and specificity per channel combination.]
B Appendix: CNN Results
Figure B.1: The best 16 combinations of the data for the CNN, ordered by highest average accuracy. [Bar chart of average accuracy, sensitivity, and specificity per channel combination.]
Figure B.2: Data combinations 17-32 for the CNN, ordered by highest average accuracy. [Bar chart of average accuracy, sensitivity, and specificity per channel combination.]
Figure B.3: Data combinations 33-48 for the CNN, ordered by highest average accuracy. [Bar chart of average accuracy, sensitivity, and specificity per channel combination.]
Figure B.4: Data combinations 49-63 for the CNN, ordered by highest average accuracy. [Bar chart of average accuracy, sensitivity, and specificity per channel combination.]