Hindawi Publishing Corporation
Mathematical Problems in Engineering
Volume 2013, Article ID 265819, 9 pages
http://dx.doi.org/10.1155/2013/265819

Research Article
Practical Speech Emotion Recognition Based on Online Learning: From Acted Data to Elicited Data

Chengwei Huang, Ruiyu Liang, Qingyun Wang, Ji Xi, Cheng Zha, and Li Zhao

School of Information Science and Engineering, Southeast University, Nanjing 210096, China

Correspondence should be addressed to Chengwei Huang; huangcwx@126.com

Received 7 March 2013; Revised 26 May 2013; Accepted 4 June 2013

Academic Editor: Saeed Balochian

Copyright © 2013 Chengwei Huang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

We study cross-database speech emotion recognition based on online learning. How to apply a classifier trained on acted data to naturalistic data, such as elicited data, remains a major challenge in today's speech emotion recognition systems. We introduce three different types of data sources: first, a basic speech emotion dataset collected from acted speech by professional actors and actresses; second, a speaker-independent dataset containing a large number of speakers; third, an elicited speech dataset collected from a cognitive task. Acoustic features are extracted from the emotional utterances and evaluated using the maximal information coefficient (MIC). A baseline valence and arousal classifier is designed based on Gaussian mixture models. The online training module is implemented using AdaBoost. While the offline recognizer is trained on the acted data, the online testing data includes the speaker-independent data and the elicited data. Experimental results show that by introducing the online learning module, our speech emotion recognition system can be better adapted to new data, which is an important characteristic for real-world applications.

1. Introduction

The state-of-the-art speech emotion recognition (SER) system depends largely on its training data. Emotional vocal behavior is personality dependent, situation dependent, and language dependent. Therefore, emotional models trained on a specific database may not fit other databases. To solve this problem, we introduce an online learning framework into the SER system. Online speech data is used to retrain and improve the classifier. By adopting the online learning framework, we may better adapt our SER system to different speakers and different data sources.

Many achievements have been reported on acted speech emotion databases [1–3]. Tawari and Trivedi [4] considered the role of context and detected seven emotions on the Berlin Emotional Database [5]. Ververidis and Kotropoulos [6] studied a gender-based speech emotion recognition system for five different emotional states. A number of machine learning algorithms have been studied in SER using acted emotional data. Only recently has the need for naturalistic data been pointed out. Several naturalistic speech emotion databases have been developed, such as the AIBO emotional speech database [7] and the VAM database [8]. Many researchers have noticed that real-world data plays a key role in the SER system [9] and that a model trained on acted data does not fit naturalistic data very well. Incremental learning may provide a good solution to this problem under an online learning framework.
The models pretrained on the acted data may be updated using very little online data. Since naturalistic emotion data is very difficult to collect, acted speech data still plays an important role, especially in studying rare emotion types such as fear-type emotion [1], confidence, and anxiety [10]. By using incremental learning, we can use the available acted databases to build a baseline recognizer and then retrain the classifier online for specific purposes.

Many successful algorithms have been proposed for incremental learning, such as Learn++ [11] and Bagging++ [12]. Incremental learning algorithms may be classified into two categories. In the first category, a single classifier is updated by reestimating its parameters. This type of learning algorithm depends on the specific classifier, an example being the incremental learning algorithm for support vector machines


proposed by Xiao et al. [13]. The techniques used in such parameter estimation may not generalize. In the second category, the incremental learning algorithm does not depend on a specific type of classifier: multiple classifiers are created and combined by a certain fusion rule, such as majority vote. Boosting is a typical family of algorithms that falls into the second category. By creating weak classifiers on selected data, we may add new training data to the learning procedure and gradually adapt the SER system in an online environment.

In this paper, we explore the possibility of transferring a pretrained SER system from acted data to more naturalistic data in an online learning framework. Section 2 describes our acted data and elicited data. Section 3 provides an acoustic analysis of emotional features. In Section 4, we introduce our speech emotion recognizer and the online learning methodology. Finally, in Section 5, we provide the experimental results, which show that combining the acted data and the elicited data through online learning brings the best result.

2. Three Types of Data Sources

In this paper, we use three types of data sources to validate our SER system: (i) an acted basic emotion database, (ii) a speaker-independent emotion database, and (iii) an elicited emotion database.

The first database contains the basic emotions, including happiness, anger, surprise, sadness, fear, and neutrality. The emotional speech is recorded by professional actors and actresses, six males and six females. This acted database may be used as a standard training dataset for our baseline recognizer. However, in real-world applications, naturalistic emotional speech is different from acted speech.

The second database is designed for the speaker-independent test and includes fifty-one different speakers. Besides the large number of speakers, a special type of emotion is considered, namely, fidgetiness. Fidgetiness is an important type of emotion in cognitive-related tasks. It may be induced by repeated work, environmental noise, and stress. The second database contains five emotions, as shown in Table 1. This database may be used for testing the ability of speaker adaptation. When using training data from the first database, it is challenging to test our SER system on the second database due to the many unknown speakers.

The third database contains elicited speech from a cognitive task, as shown in Table 2. The first row shows the emotion types collected in our experiments, such as fidgetiness, confidence, and tiredness. The second row is the number of speakers for each type of emotion. The third row is the male and female proportion in the emotion data. The last row is the number of utterances in each emotion class. The data was collected locally in our lab. We carried out a cognitive experiment and collected the emotional speech related to cognitive performance. Each subject was required to work on a set of math calculations and to report the results orally. During the cognitive task, the speech signals were recorded and annotated with emotional labels.

In the third database, "correct answer" or "false answer" labels are marked on each utterance in the oral report by listeners who did not participate in the eliciting experiment.

Table 1: The speaker-independent emotion dataset.

Emotion type      Happiness   Anger   Fidgetiness   Sadness   Neutrality
Speaker number    51          51      51            51        51
Male/female       23/28       23/28   23/28         23/28     23/28
Utterance size    2200        2200    2200          2200      2200

Table 2: The elicited emotion dataset.

Emotion type      Confidence   Tiredness   Fidgetiness   Happiness   Neutrality
Speaker number    6            6           6             6           6
Male/female       3/3          3/3         3/3           3/3         3/3
Utterance size    1200         1200        1200          1200        1200

Therefore, we may calculate the percentage of false answers among the negative emotion samples and the percentage of negative emotion among the "false answer" samples. The results show that the proportion of mistakes made in the math calculation is higher in the presence of negative emotions, as shown in Figures 1 and 2. The purpose of this database is to study the cognition-related emotions in speech, and the analysis shows the dependency between the mistakes made in the math calculation and the negative emotions in the speech.
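These are two different conditional proportions. A toy tally (hypothetical counts, not the paper's data) makes the distinction concrete:

# Hypothetical counts for illustration only.
negative_total = 400       # utterances labeled with a negative emotion
false_total = 250          # utterances labeled "false answer"
false_and_negative = 180   # utterances carrying both labels

pct_false_given_negative = 100.0 * false_and_negative / negative_total  # 45.0%
pct_negative_given_false = 100.0 * false_and_negative / false_total     # 72.0%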

3. Feature Analysis

3.1. Acoustic Feature Extraction. Emotional information is hidden in the speech signals. Unlike linguistic information, it is difficult to find the related acoustic features. Therefore, feature analysis and selection are very important steps in building an SER system.

We selected typical utterances to study the feature variation caused by emotional change, as shown in Figures 3 through 11. To better reflect the change caused by emotional information, we fix the context of these utterances.

The utterances shown in the figures are recorded from the same speaker. By comparing utterances under different emotional states from the same speaker, we can exclude the influence brought by different speaking habits and personalities. This reveals the changes in the acoustic features caused only by the emotional information.

We induced three types of practical emotions from a cognitive task, namely, fidgetiness, confidence, and tiredness. We also studied the basic emotions, such as happiness, anger, surprise, sadness, and fear. The intensity feature and the pitch contour are extracted and demonstrated in Figures 3 through 11.

Under the fear emotional state, the first syllable is not normal speech: the pitch feature is missing, and the speech is whispered. Under the tiredness emotional state, the pitch contour is low and flat, which is quite distinguishable from the other emotional states.


Figure 1: The percentage of negative emotions when a mistake occurs in the cognitive task.

Figure 2: The percentage of correct answers and false answers when negative emotion occurs in the cognitive task.

In the neutral speech, the pitch contour is also flat, but at the end of the sentence the pitch frequency increases. In comparison with plain speaking, the pitch frequency is not consistent at the end of the sentence. Under the sadness emotional state, the pitch contour is smooth and decreases at the end of the sentence. Furthermore, in the happiness sample, the variance of the pitch frequency is higher. The pitch frequency also increases in the confidence and surprise samples.

We also notice that under the anger emotional state, the variance of the intensity is lower and the intensity contour is smooth. However, in the sadness sample, the variance of the intensity is higher. Sadness and tiredness may cause a longer duration and a lower speech rate, while fidgetiness and anger may cause a higher speech rate.

A quantitative statistical analysis is shown in Figure 12, where pitch and formant features are compared under various emotional states.

For modeling and recognition purposes, 481 dimensions of acoustic features are constructed.

Figure 3: Intensity and pitch contour of happiness.

Figure 4: Intensity and pitch contour of sadness.

Statistical functions over the entire utterance, such as maximum, minimum, mean, and range, are applied to the basic speech features listed below; a short illustrative sketch follows the list. "d" stands for difference, and "d2" stands for the second-order difference.

Feature 1–6: mean, maximum, minimum, median, range, and variance of Short-time Energy (SE).

Feature 7–18: mean, maximum, minimum, median, range, and variance of dSE and d2SE.


Figure 5: Intensity and pitch contour of fidgetiness.

Figure 6: Intensity and pitch contour of surprise.

Feature 19–24: mean, maximum, minimum, median, range, and variance of pitch frequency (F0).

Feature 25–36: mean, maximum, minimum, median, range, and variance of dF0 and d2F0.

Feature 37–42: mean, maximum, minimum, median, range, and variance of Zero-Crossing Rate (ZCR).

Figure 7: Intensity and pitch contour of fear.

Figure 8: Intensity and pitch contour of tiredness.

Feature 43–54: mean, maximum, minimum, median, range, and variance of dZCR and d2ZCR.

Feature 55: speech rate (SR).

Feature 56–57: Pitch Jitter1 (PJ1) and Pitch Jitter2 (PJ2).

Feature 58–61: 0–250 Hz Energy Ratio (ER), 0–650 Hz ER, above-4-kHz ER, and Energy Shimmer (ESH).


Figure 9: Intensity and pitch contour of anger.

Figure 10: Intensity and pitch contour of neutrality.

Feature 62–65: Voiced Frames (VF), Unvoiced Frames (UF), UF/VF, and VF/(UF+VF).

Feature 66–69: Voiced Segments (VS), Unvoiced Segments (US), US/VS, and VS/(US+VS).

Feature 70–71: Maximum Voiced Duration (MVD) and Maximum Unvoiced Duration (MUD).

Figure 11: Intensity and pitch contour of confidence.

Feature 72–77: mean, maximum, minimum, median, range, and variance of Harmonic-to-Noise Ratio (HNR).

Feature 78–95: mean, maximum, minimum, median, range, and variance of HNR in the 0–400 Hz, 400–2000 Hz, and 2000–5000 Hz bands.

Feature 96–119: mean, maximum, minimum, median, range, and variance of the 1st formant frequency (F1), 2nd formant frequency (F2), 3rd formant frequency (F3), and 4th formant frequency (F4).

Feature 120–143: mean, maximum, minimum, median, range, and variance of dF1, dF2, dF3, and dF4.

Feature 144–167: mean, maximum, minimum, median, range, and variance of d2F1, d2F2, d2F3, and d2F4.

Feature 168–171: Jitter1 of F1, F2, F3, and F4.

Feature 172–175: Jitter2 of F1, F2, F3, and F4.

Feature 176–199: mean, maximum, minimum, median, range, and variance of the F1, F2, F3, and F4 bandwidths.

Feature 200–223: mean, maximum, minimum, median, range, and variance of the dF1, dF2, dF3, and dF4 bandwidths.

Feature 224–247: mean, maximum, minimum, median, range, and variance of the d2F1, d2F2, d2F3, and d2F4 bandwidths.

Feature 248–325: mean, maximum, minimum, median, range, and variance of MFCC (0–12th order).


Figure 12: Feature distribution over various emotional states (fidgetiness, happiness, confidence, tiredness, and neutrality). (a) Normalized mean pitch frequency; (b) mean first formant frequency (Hz); (c) mean second formant frequency (Hz); (d) mean third formant frequency (Hz).

Feature 326–403: mean, maximum, minimum, median, range, and variance of dMFCC (0–12th order).

Feature 404–481: mean, maximum, minimum, median, range, and variance of d2MFCC (0–12th order).
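As referenced above, here is a minimal sketch (in Python; the function names and toy values are our own illustration, not the authors' code) of how the six statistical functions and the difference operators turn one frame-level contour into utterance-level features:

import numpy as np

def contour_statistics(contour):
    # The six statistical functions: mean, maximum, minimum,
    # median, range, and variance of one frame-level contour.
    c = np.asarray(contour, dtype=float)
    return [c.mean(), c.max(), c.min(),
            np.median(c), c.max() - c.min(), c.var()]

def delta(contour):
    # First-order difference ("d"); apply twice to obtain "d2".
    return np.diff(np.asarray(contour, dtype=float))

# Example: features 19-36 for one utterance, given its F0 contour
# over the voiced frames (toy pitch values in Hz).
f0 = np.array([210.0, 215.5, 220.1, 218.4, 205.9, 198.7])
features = (contour_statistics(f0)                   # Features 19-24: F0
            + contour_statistics(delta(f0))          # Features 25-30: dF0
            + contour_statistics(delta(delta(f0))))  # Features 31-36: d2F0

Repeating the same recipe over energy, ZCR, jitter, voicing statistics, HNR, formants, bandwidths, and MFCCs, together with their first and second differences, yields the 481-dimensional vector.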

3.2. Feature Selection Based on MIC. In this section, we introduce the feature selection algorithm in our speech emotion classifier. Feature selection algorithms may be roughly classified into two groups, namely, "wrapper" and "filter" methods. Algorithms in the former group depend on specific classifiers; sequential forward selection (SFS) is one example. The final selection result is tied to a specific classifier: if we replace the classifier, the result changes. In the second group, feature selection is done by a certain evaluation criterion, such as the Fisher Discriminant Ratio (FDR).

Figure 13: The arousal and the valence dimensions of emotions. Anger, fidgetiness, fear, surprise, neutrality, happiness, confidence, sadness, and tiredness are placed in the valence-arousal plane.

The feature selection result achieved by this type of method does not depend on specific classifiers and bears better generality across different databases.

The maximal information coefficient (MIC) based feature selection algorithm falls into the second group. MIC is a recently proposed statistical tool, invented by Reshef et al. [14], that measures linear and nonlinear relationships between paired variables.

MIC is based on the idea that if a relationship exists between two variables, then a grid can be drawn on the scatterplot of the two variables that partitions the data to encapsulate that relationship [14]. We may calculate the MIC of a certain acoustic feature and the emotional state by exploring all possible grids on the two variables. First, we compute, for every pair of integers (x, y), the largest possible mutual information achieved by any x-by-y grid [14]. Second, for a fair comparison, we normalize these values so that the MIC scores are comparable between all acoustic features and the emotional state. A detailed study of MIC may be found in [14].

Since MIC can treat linear and nonlinear associations at the same time, we do not need to make any assumption about the distribution of our original features. It is therefore especially suitable for evaluating a large number of emotional features. Starting from the large set of basic features described in Section 3.1, we apply MIC to measure the contribution of these features in correlation with the emotional states. Finally, a subset of features is selected for our emotion classifier.
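As a sketch of how this filter-style ranking could be implemented, the following uses the MINE estimator from the open-source minepy package (one publicly available MIC implementation; its use here, and all variable names, are our illustration rather than the paper's code):

import numpy as np
from minepy import MINE  # open-source MIC estimator

def rank_features_by_mic(X, y, alpha=0.6, c=15):
    # Score each feature column of X by its MIC with the emotion
    # label vector y; return indices from most to least associated.
    mine = MINE(alpha=alpha, c=c)
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        mine.compute_score(X[:, j], y)  # searches the x-by-y grid partitions
        scores[j] = mine.mic()          # normalized maximal mutual information
    return np.argsort(scores)[::-1], scores

# Keep a subset of the 481 features for the emotion classifier, e.g.:
# order, scores = rank_features_by_mic(X, y)
# X_selected = X[:, order[:100]]       # illustrative cutoff, not the paper's

Because MIC is normalized to [0, 1], the scores are comparable across features with very different scales and units.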

4. Recognition Methodology

4.1. Baseline GMM Classifier. The Gaussian mixture model (GMM) based classifier is a state-of-the-art recognition method in speaker and language identification. In this paper, we build the baseline classifier using a Gaussian mixture model, so that the baseline can be compared with the online learning method.


A GMM may be defined by a weighted sum of several Gaussian distributions:

\[ p(\mathbf{X}_t \mid \lambda) = \sum_{i=1}^{M} a_i b_i(\mathbf{X}_t), \tag{1} \]

where $\mathbf{X}_t$ is a $D$-dimensional random vector, $b_i(\mathbf{X}_t)$ is the $i$th Gaussian member, $t$ is the index of the utterance sample, $a_i$ is the mixture weight, and $M$ is the number of Gaussian mixture members. Each member is a $D$-dimensional variable that follows the Gaussian distribution with mean $\mathbf{U}_i$ and covariance $\Sigma_i$:

\[ b_i(\mathbf{X}_t) = \frac{1}{(2\pi)^{D/2} |\Sigma_i|^{1/2}} \exp\left[ -\frac{1}{2} (\mathbf{X}_t - \mathbf{U}_i)^{T} \Sigma_i^{-1} (\mathbf{X}_t - \mathbf{U}_i) \right]. \tag{2} \]

Note that

\[ \sum_{i=1}^{M} a_i = 1. \tag{3} \]

Emotion classification can be done by maximizing the posterior probability:

\[ \text{EmotionLabel} = \arg\max_{k} \, p(\mathbf{X}_t \mid \lambda_k). \tag{4} \]

Expectation-Maximization (EM) is adopted for GMM parameter estimation [15]. At iteration $i$, the parameters of the $m$th mixture member are reestimated as

\[ a_m^{i} = \frac{\sum_{t=1}^{T} \gamma_{tm}^{i}}{\sum_{t=1}^{T} \sum_{m=1}^{M} \gamma_{tm}^{i}}, \qquad \mathbf{U}_m^{i} = \frac{\sum_{t=1}^{T} \gamma_{tm}^{i} \mathbf{X}_t}{\sum_{t=1}^{T} \gamma_{tm}^{i}}, \]

\[ \Sigma_m^{i} = \frac{\sum_{t=1}^{T} \gamma_{tm}^{i} (\mathbf{X}_t - \mathbf{U}_m^{i})(\mathbf{X}_t - \mathbf{U}_m^{i})^{T}}{\sum_{t=1}^{T} \gamma_{tm}^{i}}, \qquad \gamma_{tm}^{i} = \frac{a_m^{i-1} N(\mathbf{X}_t \mid \mathbf{U}_m^{i-1}, \Sigma_m^{i-1})}{\sum_{m=1}^{M} a_m^{i-1} N(\mathbf{X}_t \mid \mathbf{U}_m^{i-1}, \Sigma_m^{i-1})}, \tag{5} \]

where $\gamma_{tm}^{i}$ is the posterior probability that sample $\mathbf{X}_t$ belongs to member $m$, computed from the parameters of the previous iteration.

Because the datasets contain different sets of emotions, we unify them by categorizing the emotions into positive and negative regions of the valence and arousal dimensions, as shown in Figure 13. We may then verify the ability of the emotion classifier by classifying the emotional utterances into the different regions of the valence and arousal space.
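As an illustration, a minimal version of such a baseline can be built with scikit-learn's GaussianMixture, which performs the EM estimation of (5) internally: one mixture is fit per class (for example, per valence region), and prediction follows the maximum-likelihood rule of (4). The mixture size and covariance type below are illustrative choices, not the paper's settings:

import numpy as np
from sklearn.mixture import GaussianMixture

class GMMEmotionClassifier:
    # One GMM per emotion class; predict by maximum likelihood, as in (4).
    def __init__(self, n_components=8):
        self.n_components = n_components
        self.models = {}

    def fit(self, X, y):
        # EM training (cf. (5)) of a separate mixture on each class's data.
        for label in np.unique(y):
            gmm = GaussianMixture(n_components=self.n_components,
                                  covariance_type="diag", max_iter=200)
            gmm.fit(X[y == label])
            self.models[label] = gmm
        return self

    def predict(self, X):
        labels = list(self.models)
        # score_samples gives log p(X_t | lambda_k) for every utterance.
        loglik = np.stack([self.models[k].score_samples(X) for k in labels])
        return np.asarray(labels)[np.argmax(loglik, axis=0)]

Diagonal covariances keep the parameter count manageable for high-dimensional (or MIC-reduced) feature vectors.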

4.2. Online Learning Using AdaBoost. While the offline GMM classifier is trained using the EM algorithm, the online training algorithm based on AdaBoost is introduced in this section. AdaBoost is a powerful algorithm in ensemble learning [16]. The idea behind AdaBoost is that weak classifiers may be combined into a powerful classifier. Multiple classifiers trained on randomly selected datasets perform quite differently from each other on the same testing dataset.

Therefore, we may reduce the misclassification rate by a proper decision fusion rule.

The AdaBoost algorithm consists of several iterations. In each iteration, a new training set is selected for a new weak classifier, and a weight is assigned to the new weak classifier. Based on the testing results of the new weak classifier, the weights of all the data samples are modified for the next iteration. In the final step, the assembled classifier is obtained by combining the multiple weak classifiers through a weighted voting rule.

Let us suppose the current training set is [17]

\[ T = \{s_1, s_2, \ldots, s_N\}, \tag{6} \]

where the weights of the samples are

\[ W = \{w_1, w_2, \ldots, w_N\}, \qquad \sum_{i=1}^{N} w_i = 1. \tag{7} \]

The error rate of the new weak classifier is

\[ e = \sum_{i:\, c(s_i) \neq y_i} w_i, \tag{8} \]

where $c(s_i)$ is the classification result and $y_i$ is the class label, so the sum runs over the misclassified samples. The fusion weight assigned to each classifier is defined by the error rate:

\[ \alpha = \ln\left(\frac{1 - e}{e}\right). \tag{9} \]

At the beginning of the algorithm, each sample is assigned an equal weight. During the iterations, the sample weights are updated as

\[ w_i \leftarrow \begin{cases} w_i \times \beta, & c(s_i) = y_i, \\ w_i, & c(s_i) \neq y_i, \end{cases} \tag{10} \]

where $\beta < 1$ (typically $\beta = e/(1-e)$, consistent with (9)), so that correctly classified samples lose weight and the next weak classifier focuses on the harder samples; the weights are then renormalized to sum to one.

On the arrival of new data, assuming that we know the label information for each sample, the pretrained classifiers from the offline data are used as initial weak classifiers. The AdaBoost algorithm is applied to the new online data, and fusion weights are reassigned to the offline-trained classifiers.

In the first m initial iterations, the m pretrained classifiers are used as the weak classifiers and added to the final ensemble classifier, instead of training new weak classifiers on randomly selected data. After the m initial iterations, new weak classifiers are trained on the new online data and added to the final ensemble classifier in the AdaBoost algorithm.

The major difference between online training and offline training is the data used for learning. Offline training uses a large amount of acted data, while online training uses a small amount of naturalistic data. The offline training is independent of the online training and ready to use, while the online training depends on the offline training and only retrains the existing model to fit specific purposes, such as tuning for a large number of speakers. The purpose of online training is to quickly adapt the existing offline model to a small amount of new data.
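The sketch below conveys the flavor of this scheme under our own simplifying assumptions: binary labels in {-1, +1} (e.g., positive versus negative valence), pretrained offline classifiers exposing predict(), and decision stumps as the new weak learners. It follows the definitions in (7)-(10) but is not the authors' implementation:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def online_adaboost(X, y, pretrained, n_new_rounds=10):
    # Adapt an ensemble to new online data (X, y), y in {-1, +1}.
    # The first m iterations reuse the m pretrained offline classifiers;
    # later iterations train new weak classifiers on the reweighted data.
    n = len(X)
    w = np.full(n, 1.0 / n)  # equal initial sample weights, cf. (7)
    ensemble, alphas = [], []

    def boost_round(clf):
        nonlocal w
        pred = clf.predict(X)
        e = np.clip(w[pred != y].sum(), 1e-10, 1 - 1e-10)  # error, cf. (8)
        alphas.append(np.log((1 - e) / e))                 # fusion weight, cf. (9)
        w[pred == y] *= e / (1 - e)  # shrink correctly classified samples, cf. (10)
        w /= w.sum()                 # renormalize the weights
        ensemble.append(clf)

    for clf in pretrained:           # the m initial iterations
        boost_round(clf)
    for _ in range(n_new_rounds):    # new weak classifiers from online data
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        boost_round(stump)

    def predict(X_new):              # weighted voting rule
        votes = sum(a * clf.predict(X_new) for a, clf in zip(alphas, ensemble))
        return np.sign(votes)

    return predict

In practice, each pretrained classifier would wrap an offline GMM pair so that predict() returns the binarized valence or arousal decision.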


5. Experimental Results

In our experiment, the offline training is carried out on the acted basic emotion dataset. The speaker-independent dataset and the elicited practical emotion dataset are used for the online training and the online testing. Although the datasets used in online testing are preprocessed utterances rather than real-time online data, our experiments still provide a simulated online situation. We divide dataset 2 and dataset 3 into smaller sets (2a/2b and 3a/3b); datasets 2a and 3a are used for the simulated online initialization.

Speech utterances from different sources are organized into several datasets, as shown in Table 3.

The online learning algorithm is verified on both the speaker-independent data and the elicited data. The results are shown in Table 4. A large number of speakers brings difficulties in modeling emotional behavior, since emotion expression is highly dependent on individual habit and personality. By extending the offline-trained classifier to the online data that contains a large number of speakers, we improved the generality of our SER system. The elicited data is collected in a cognitive experiment that is closer to the real-world situation. During the cognitive task, emotional speech is induced. We observed that the different nature of the acted data and the speech induced during a cognitive task caused a significant decrease in the recognition rate. By using the online training technique, we may transfer the offline-trained SER system to the elicited data. Extending our SER system to different data sources may bring emotion recognition closer to real-world applications.

The major challenge in our online learning algorithm is how to combine the existing offline classifier and efficiently adapt the model parameters to a small amount of new online data. We adopted the incremental learning idea and solved this problem by modifying the initial stage in the AdaBoost framework. One contribution of our online learning algorithm is that we may reuse the existing offline training data and make the online learning stage more efficient. We make use of a large amount of available offline training data and only require a small amount of data for online training, as shown in Table 3. The weight of each weak classifier is an important parameter. The proposed method may be further improved by using a fuzzy membership function to evaluate the confidence of the GMM classifiers and reestimate the weight of each weak classifier.

6. Discussions

Acted data is often considered unsuitable for real-world applications. However, traditional research has focused on acted emotional speech, and many acted databases are available. How to transfer an SER system trained on acted data to new naturalistic data in the real world remains an unsolved challenge.

Many feature selection algorithms may be applied to an SER system. MIC is a newly proposed and powerful tool for exploring nonlinear relationships between variables.

AdaBoost is a popular algorithm for assembling multiple weak classifiers into a strong classifier.

Table 3: Selected datasets for online and offline experiments.

Dataset index   Data source           Number of utterances   Purpose of use
Dataset 1       Acted speech          12000                  Offline training
Dataset 2a      Speaker independent   1000                   Online training
Dataset 2b      Speaker independent   10000                  Testing
Dataset 3a      Elicited speech       1000                   Online training
Dataset 3b      Elicited speech       5000                   Testing

Table 4: Online and offline experimental results.

Experiment index   Offline training set   Online training set   Testing set   Classification result
Experiment 1       Dataset 1              N/A                   Dataset 2b    63.3%
Experiment 2       Dataset 1              Dataset 2a            Dataset 2b    75.6%
Experiment 5       Dataset 2a             N/A                   Dataset 2b    70.0%
Experiment 3       Dataset 1              N/A                   Dataset 3b    61.2%
Experiment 4       Dataset 1              Dataset 3a            Dataset 3b    73.1%
Experiment 6       Dataset 3a             N/A                   Dataset 3b    68.5%

By applying AdaBoost in the online setting, we train multiple weak classifiers based on the newly arrived online data, with the offline pretrained classifiers used for initialization. We may explore other incremental learning algorithms in future work.

Acknowledgments

This work was partially supported by the China Postdoctoral Science Foundation (no. 2012M520973), the National Nature Science Foundation (nos. 61231002, 61273266, and 51075068), and the Doctoral Fund of the Ministry of Education of China (no. 20110092130004). The authors would like to thank the anonymous reviewers for their valuable comments and helpful suggestions.

References

[1] C. Clavel, I. Vasilescu, L. Devillers, G. Richard, and T. Ehrette, "Fear-type emotion recognition for future audio-based surveillance systems," Speech Communication, vol. 50, no. 6, pp. 487–503, 2008.

[2] C. Huang, Y. Jin, Y. Zhao, Y. Yu, and L. Zhao, "Speech emotion recognition based on re-composition of two-class classifiers," in Proceedings of the 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops (ACII '09), Amsterdam, The Netherlands, September 2009.

[3] K. R. Scherer, "Vocal communication of emotion: a review of research paradigms," Speech Communication, vol. 40, no. 1-2, pp. 227–256, 2003.

[4] A. Tawari and M. M. Trivedi, "Speech emotion analysis: exploring the role of context," IEEE Transactions on Multimedia, vol. 12, no. 6, pp. 502–509, 2010.

[5] F. Burkhardt, A. Paeschke, M. Rolfes, W. Sendlmeier, and B. Weiss, "A database of German emotional speech," in Proceedings of the 9th European Conference on Speech Communication and Technology, pp. 1517–1520, Lisbon, Portugal, September 2005.

[6] D. Ververidis and C. Kotropoulos, "Automatic speech classification to five emotional states based on gender information," in Proceedings of the 12th European Signal Processing Conference, pp. 341–344, Vienna, Austria, 2004.

[7] S. Steidl, Automatic Classification of Emotion-Related User States in Spontaneous Children's Speech, Department of Computer Science, Friedrich-Alexander-Universitaet Erlangen-Nuernberg, Berlin, Germany, 2008.

[8] M. Grimm, K. Kroschel, and S. Narayanan, "The Vera am Mittag German audio-visual emotional speech database," in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '08), pp. 865–868, Hannover, Germany, June 2008.

[9] K. P. Truong, How Does Real Affect Affect Affect Recognition in Speech, Center for Telematics and Information Technology, University of Twente, Enschede, The Netherlands, 2009.

[10] C. Huang, Y. Jin, Y. Zhao, Y. Yu, and L. Zhao, "Recognition of practical emotion from elicited speech," in Proceedings of the 1st International Conference on Information Science and Engineering (ICISE '09), pp. 639–642, Nanjing, China, December 2009.

[11] R. Polikar, L. Udpa, S. S. Udpa, and V. Honavar, "Learn++: an incremental learning algorithm for supervised neural networks," IEEE Transactions on Systems, Man and Cybernetics C, vol. 31, no. 4, pp. 497–508, 2001.

[12] Q. L. Zhao, Y. H. Jiang, and M. Xu, "Incremental learning by heterogeneous Bagging ensemble," Lecture Notes in Computer Science, vol. 6441, no. 2, pp. 1–12, 2010.

[13] R. Xiao, J. Wang, and F. Zhang, "An approach to incremental SVM learning algorithm," in Proceedings of the IEEE International Conference on Tools with Artificial Intelligence, pp. 268–273, 2000.

[14] D. N. Reshef, Y. A. Reshef, H. K. Finucane et al., "Detecting novel associations in large data sets," Science, vol. 334, no. 6062, pp. 1518–1524, 2011.

[15] D. A. Reynolds and R. C. Rose, "Robust text-independent speaker identification using Gaussian mixture speaker models," IEEE Transactions on Speech and Audio Processing, vol. 3, no. 1, pp. 72–83, 1995.

[16] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," Journal of Computer and System Sciences, vol. 55, no. 1, part 2, pp. 119–139, 1997.

[17] Q. Zhao, The research on ensemble pruning and its application in on-line machine learning [Ph.D. thesis], National University of Defense Technology, Changsha, China, 2010.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 2: Research Article Practical Speech Emotion Recognition ...

2 Mathematical Problems in Engineering

proposed by Xiao et al [13] The techniques used in suchparameter estimation may not be generalized In the secondcategory the incremental learning algorithm is not depen-dent on a specific type of classifiers Multiple classifiers arecreated and combined by a certain fusion rule such as major-ity vote Boosting is a typical type of algorithms that fall intothe second category By creating weak classifiers using se-lected data we may add new training data to the learningprocedure and gradually adapt the SER system in an onlineenvironment

In this paper we explore the possibility of transferringpretrained SER system from acted data to more naturalisticdata in an online learning framework Section 2 describesour acted data and elicited data Section 3 provides acousticanalysis of emotional features In Section 4 we introduce ourspeech emotion recognizer and the online learning method-ology Finally in Section 5 we provide the experimentalresults which show that combining the acted data and theelicited data using online learning brings us the best result

2 Three Types of Data Sources

In this paper we use three types of data sources to validate ourSER system (i) acted basic emotion database (ii) speaker-independent emotion database and (iii) elicited emotiondatabase

The first database contains the basic emotions includinghappiness anger surprise sadness fear and neutrality Theemotional speech is recorded by professional actors andactress six males and six femalesThis acted database may beused as a standard training dataset for our baseline recog-nizer However in real world applications the naturalisticemotional speech is different from the acted speech

The second database is designed for speaker-independenttest which includes fifty-one different speakers Other thana large number of speakers a special type of emotion isconsidered namely fidgetiness Fidgetiness is an importanttype of emotion in cognitive related tasks It may be inducedby repeated work environmental noise and stress The sec-ond database contains five emotions as shown in Table 1This database may be used for testing the ability of speakeradaptationWhen using training data from the first databaseit is challenging to test our SER system on the second data-base due to many unknown speakers

The third database contains elicited speech in a cognitivetask as shown in Table 2 The first row shows the emotiontypes collected in our experiments such as fidgetiness con-fidence and tirednessThe second row is the speaker numberrelated to each type of emotion The third row is the maleand female proportion in the emotion data The last row isthe number of utterances in each emotion class The datais collected locally in our lab We carried out a cognitiveexperiment and collected the emotional speech related tocognitive performance Subject was required to work on aset of math calculations and to report the results orallyDuring the cognitive task the speech signals were recordedand annotated with emotional labels

In the third database ldquocorrect answerrdquo or ldquofalse answerrdquolabels are marked on each utterance in the oral report by

Table 1 The Speaker-independent emotion dataset

Emotion type Happiness Anger Fidgetiness Sadness NeutralitySpeakernumber 51 51 51 51 51

Malefemale 2328 2328 2328 2328 2328Utterance size 2200 2200 2200 2200 2200

Table 2 The Elicited Emotion Dataset

Emotiontype Confidence Tiredness Fidgetiness Happiness Neutrality

Speakernumber 6 6 6 6 6

Malefemale 33 33 33 33 33

Utterancesize 1200 1200 1200 1200 1200

the listeners who have not participated in the eliciting exper-iment Therefore we may calculate the percentage of falseanswers in the negative emotion samples and the percentageof negative emotion in the ldquofalse answerrdquo samples Resultsshow that the proportion of the mistake made in the mathcalculation is higher with the presence of negative emotionsas shown in Figures 1 and 2The purpose of this database is tostudy the cognitive related emotions in speech The analysisshows the dependency between the mistakes made in themath calculation and the negative emotions in the speech

3 Feature Analysis

31 Acoustic Feature Extraction Emotional information ishidden in the speech signals Unlike the linguistic informa-tion it is difficult to find the related acoustic features There-fore feature analysis and selection are very important steps inbuilding an SER system

We selected typical utterances to study the feature vari-ance caused by emotional change as shown in Figures 3 45 6 7 8 9 10 and 11 To better reflect the change caused byemotional information we fix the context of these utterances

The utterances shown in the figures are recorded from thesame speaker By comparing the utterances under differentemotional state from the same speaker we can exclude theinfluence brought by different speaking habits and personali-ties It reveals the changes in the acoustic features caused onlyby the emotional information

We induced three types of practical emotions from acognitive task namely fidgetiness confidence and tirednessWe also studied the basic emotions like happiness angersurprise sadness and fearThe intensity feature and the pitchcontour are extracted and demonstrated in Figure 3 throughFigure 11

The first syllable is not normal speech under the fear emo-tional state The pitch feature is missing and it is whisperedspeech under the emotional state of fear Under the tirednessemotion state the pitch contour is low and flat which is quitedistinguishable from other emotion states

Mathematical Problems in Engineering 3

Mistakes

Negative emotions

Positive emotions

Figure 1The percentage of negative emotions whenmistake occursin the cognitive task

Negative emotions

Correct answersFalse answers

Figure 2The percentage of correct answers and false answers whennegative emotion occurs in the cognitive task

In the neutral speech the pitch contour is also flat but atthe end of the sentence the pitch frequency increases Com-paring speaking the pitch frequency is not consistent at theend of the sentence Under the sadness emotion state thepitch contour is smooth and decreases at the end of the sen-tence Furthermore in the happiness sample the varianceof the pitch frequency is higher The pith frequency also in-creases in the confidence and surprise samples

We also notice that under the angry emotion state thevariance of the intensity is lower and the intensity contouris smooth However in the sadness sample the varianceof the intensity is higher Sadness and tiredness may havecaused longer time duration and a lower speech rate whilefidgetiness and anger may have caused a higher speech rate

Quantitative statistical analysis is shown in Figure 12Pitch and formants features are compared under variousemotional states

For modeling and recognition purposes 481 dimensionsof acoustic features are constructed Statistic functions over

Time (s)0 2337

0

600

Freq

uenc

y (H

z)

Time (s)0 2337

minus06864

06402

0

Time (s)0 2337

5374

8552

Inte

nsity

(dB)

Figure 3 Intensity and pitch contour of happiness

Time (s)0 3587

minus0708

07079

0

Time (s)0 3587

0

600

Freq

uenc

y (H

z)

Time (s)0 3587

1393

8779

Inte

nsity

(dB)

Figure 4 Intensity and pitch contour of sadness

the entire utterance such as maximum minimum meanrange are applied to the basic speech features as listed belowldquodrdquo stands for difference and ldquod2rdquo stands for the second orderof difference

Feature 1ndash6 mean maximum minimum medianrange and variance of Short-time Energy (SE)Feature 7ndash18 mean maximum minimum medianrange and variance of dSE and d2SE

4 Mathematical Problems in Engineering

Time (s)0 2716

minus07079

07079

0

Time (s)0 2716

0

600

Freq

uenc

y (H

z)

Time (s)0 2716

minus2878

8675

Inte

nsity

(dB)

Figure 5 Intensity and pitch contour of fidgetiness

Time (s)0 286

minus0708

06984

0

Time (s)0 286

0

600

Freq

uenc

y (H

z)

Time (s)0 286

1434

8812

Inte

nsity

(dB)

Figure 6 Intensity and pitch contour of surprise

Feature 19ndash24 mean maximum minimum medianrange and variance of pitch frequency (F

0)

Feature 25ndash36 mean maximum minimum medianrange and variance of dF

0and d2F

0

Feature 37ndash42 mean maximum minimum medianrange and variance of Zero-Crossing Rate (ZCR)

Time (s)0 2575

-06931

07073

0

Time (s)0 2575

0

600

Freq

uenc

y (H

z)Time (s)

0 25751474

8628

Inte

nsity

(dB)

Figure 7 Intensity and pitch contour of fear

Time (s)0 4061

minus0708

07079

0

Time (s)0 4061

0

600

Freq

uenc

y (H

z)

Time (s)0 4061

minus300

8682

Inte

nsity

(dB)

Figure 8 Intensity and pitch contour of tiredness

Feature 43ndash54 mean maximum minimum medianrange and variance of dZCR and d2ZCR

Feature 55 speech rate (SR)

Feature 56ndash57 Pitch Jitter1 (PJ1) Pitch Jitter2 (PJ2)

Feature 58ndash61 0ndash250HzEnergyRatio (ER) 0ndash650HzER and 4 kHz above ER and Energy Shimmer (ESH)

Mathematical Problems in Engineering 5

Time (s)0 17

minus07079

07079

0

Time (s)0 17

0

500

Freq

uenc

y (H

z)

Time (s)0 17

1539

845

Inte

nsity

(dB)

Figure 9 Intensity and pitch contour of anger

Time (s)0 1919

minus07599

06849

0

Time (s)0 1919

0

600

Freq

uenc

y (H

z)

Time (s)0 1919

154

8567

Inte

nsity

(dB)

Figure 10 Intensity and pitch contour of neutrality

Feature 62ndash65Voiced Frames (VF)Unvoiced Frames(UF) UFVF and VF(UF+VF)

Feature 66ndash69 Voiced Segments (VS) Unvoiced Seg-ments (US) USVS and VS(US+VS)

Feature 70-71 Maximum Voiced Duration (MVD)Maximum Unvoiced Duration (MUD)

Time (s)0 2354

minus07079

07079

0

Time (s)0 2354

0

600

Freq

uenc

y (H

z)

Time (s)0 2354

minus09446

8677

Inte

nsity

(dB)

Figure 11 Intensity and pitch contour of confidence

Feature 72ndash77 mean maximum minimum medianrange and variance of Harmonic-to-Noise Ratio(HNR)

Feature 78ndash95 mean maximum minimum medianrange and variance of HNR (0ndash400Hz 400ndash2000Hz and 2000ndash5000Hz)

Feature 96ndash119 meanmaximumminimummedianrange and variance of 1st formant frequency (F1) 2ndformant frequency (F2) 3rd formant frequency (F3)and 4th formant frequency (F4)

Feature 120ndash143 mean maximum minimum me-dian range and variance of dF1 dF2 dF3 and dF4

Feature 144ndash167 mean maximum minimum me-dian range and variance of d2F1 d2F2 d2F3 andd2F4

Feature 168ndash171 Jitter1 of F1 F2 F3 and F4

Feature 172ndash175 Jitter2 of F1 F2 F3 and F4

Feature 176ndash199 mean maximum minimum me-dian range and variance of F1 F2 F3 and F4 Band-width

Feature 200ndash223 mean maximum minimum me-dian range and variance of dF1 Bandwidth dF2Bandwidth dF3 Bandwidth and dF4 Bandwidth

Feature 224ndash247 mean maximum minimum me-dian range and variance of d2F1 Bandwidth d2F2Bandwidth d2F3 Bandwidth and d2F4 Bandwidth

Feature 248ndash325 mean maximum minimum me-dian range and variance of MFCC (0ndash12th-order)

6 Mathematical Problems in Engineering

1 09521

067120801407877

002040608

112

Fidgetiness Happiness Confidence Tiredness Neutrality

Nor

mal

ized

mea

n pi

tch

frequ

ency

(a) Normalised mean pitch frequency

735852

789705

791

0100200300400500600700800900

Fidgetiness Happiness Confidence Tiredness Neutrality

Mea

n fir

st fo

rman

t fre

quen

cy

(b) Mean first formant frequency (Hz)

Fidgetiness Happiness Confidence Tiredness Neutrality

Mea

n se

cond

form

ant

frequ

ency 1780

1842 1853

17751746

16801700172017401760178018001820184018601880

(c) Mean second formant frequency (Hz)

Fidgetiness Happiness Confidence Tiredness Neutrality

Mea

n th

ird fo

rman

t fre

quen

cy

2998 2985

3164

3078

2949

280028502900295030003050310031503200

(d) Mean third formant frequency (Hz)

Figure 12 Feature distribution over various emotional states

Feature 326ndash403 mean maximum minimum me-dian range and variance of dMFCC (0ndash12th-order)Feature 404ndash481 mean maximum minimum me-dian range and variance of d2MFCC (0ndash12th-order)

32 Feature Selection Based onMIC In this section we intro-duce the feature selection algorithm in our speech emotionclassifier Feature selection algorithms may be roughly clas-sified into two groups namely ldquowrapperrdquo and ldquofilterrdquo Algo-rithms in the former group are dependent on the specific clas-sifiers such as sequential forward selection (SFS) The finalselection result is dependent on a specific classifier If we re-place the specific classifier the results will change In thesecond group feature selection is done by a certain evaluationcriteria such as FisherDiscriminant Ratio (FDR)The feature

Angry

Fidgetiness

Fear

Surprise

Neutrality

Happiness

Confidence

Sadness

Tiredness

minus1

minus08

minus06

minus04

minus02

0

02

04

06

08

1

minus1 minus08 minus06 minus04 minus02 0 02 04 06 08 1Aro

usal

Valence

Figure 13 The arousal and the valence dimensions of emotions

selection result achieved in this type of method is not de-pendent on specific classifiers and bears a better generalityacross different databases

Maximal information coefficient (MIC) based feature se-lection algorithm falls into the second group MIC is a newstatistic tool that measures linear and nonlinear relationshipsbetween paired variables invented by Reshef et al [14]

MIC is based on the idea that if a relationship existsbetween two variables then a grid can be drawn on the scat-terplot of the two variables that partitions the data to encap-sulate that relationship [14] We may calculate the MIC of acertain acoustic feature and the emotional state by exploringall possible grids on the two variables First we computefor every pair of integers (119909 119910) that largest possible mutualinformation achieved by any 119909-by-119910 grid [14] Second for afair comparison we normalize these MIC values between allacoustic features and the emotional state Detailed study ofMIC may be found in [14]

Since MIC can treat linear and nonlinear associations atthe same time we do not need tomake any assumption on thedistribution of our original features Therefore it is especiallysuitable for evaluating a large number of emotional featuresBased on a large number of basic features as described inSection 31 we apply MIC to measure the contribution ofthese features in correlation with emotion states Finally asubset of features is selected for our emotion classifier

4 Recognition Methodology

41 Baseline GMM Classifier The Gaussian mixture model(GMM) based classifier is the state-of-the-art recognitionmethod in speaker and language identification In this paperwe built the baseline classifier using Gaussianmixturemodeland we may compare the baseline classifier with the onlinelearning method

Mathematical Problems in Engineering 7

GMM may be defined by the sum of several Gaussiandistributions

119901 (X119905| 120582) =

119872

sum

119894=1

119886119894119887119894(X119905) (1)

where X119905is a 119863-dimension random vector 119887

119894(X119905) is the 119894th

member of Gaussian distribution 119905 is the index of utterancesample 119886

119894is the mixture weight and 119872 is the number of

Gaussian mixture members Each member is a119863-dimensionvariable which follows the Gaussian distribution with themean U

119894and the covariance Σ

119894

119887119894(X119905) =

1

(2120587)119863210038161003816

10038161003816Σ119894

1003816100381610038161003816

12exp minus1

2

(X119905minus U119894)119879

Σminus1

119894(X119905minus U119894)

(2)

Note that119872

sum

119894=1

119886119894= 1 (3)

Emotion classification can be done by maximizing theposterior probability

EmotionLable = argmax119896

(119901 (X119905| 120582119896)) (4)

ExpectationMaximization (EM) is adopted forGMMparam-eter estimation [15]

119886119894

119898=

sum119879

119905=1120574119894

119905119898

sum119879

119905=1sum119872

119898=1120574119894

119905119898

U119894119898=

sum119879

119905=1120574119894

119905119898X119905

sum119879

119905=1120574119894

119905119898

Σ119894

119898=

sum119879

119905=1120574119894

119905119898(X119905minus U119894119898) (X119905minus U119894119898)

119879

sum119879

119905=1120574119894

119905119898

120574119894

119905119898=

119886119894minus1

119898119873(X119905| U119894minus1119898Σ119894minus1

119898)

sum119872

119898=1119886119894minus1

119898119873(X119905| U119894minus1119898Σ119894minus1

119898)

(5)

Due to the different types of emotions among the datasetswe unify the emotional datasets by categorizing them intopositive and negative regions in the valence and arousal di-mensions as shown in Figure 13 We may verify the ability ofthe emotion classifier by classifying the emotional utterancesinto different regions in the valence and arousal space

42 Online LearningUsingAdaBoost While the offlineGMMclassifier is trained using EM algorithm the online trainingalgorithmusingAdabBoost will be introduced in this sectionAdaBoost is a powerful algorithm in assemble learning [16]The belief in this AdaBoost is that weak classifiers may becombined into a powerful classifier Multiple classifierstrained on randomly selected datasets perform quiet differ-ently from each other on the same testing dataset therefore

we may reduce the misclassification rate by a proper decisionfusion rule

AdaBoost algorithm consists of several iterations In eachiteration a new training set is selected for a new weak clas-sifier A weight is assigned to the new weak classifier Basedon the testing results of the newweak classifier the weights ofall the data samples are modified for the next iteration At thefinal step the assembled classifier is achieved by combinationof themultipleweak classified through aweighted voting rule

Let us suppose that the current training set is [17]

$$T = \{s_1, s_2, \ldots, s_N\}, \quad (6)$$

where the weights of the samples are

$$W = \{w_1, w_2, \ldots, w_N\}, \qquad \sum_{i=1}^{N} w_i = 1. \quad (7)$$

The error rate of the new weak classifier is

$$e = \sum_{i:\, c(s_i) \neq y_i} w_i, \quad (8)$$

where $c(s_i)$ is the classification result and $y_i$ is the class label. The fusion weight assigned to each classifier is defined by the error rate:

$$\alpha = \ln\left(\frac{1 - e}{e}\right). \quad (9)$$

At the beginning of the algorithm, each sample is assigned an equal weight. During iteration $k$ the sample weights are updated:

$$w_i^{(k+1)} = \begin{cases} w_i^{(k)} \times \beta, & c(s_i) = y_i, \\ w_i^{(k)}, & c(s_i) \neq y_i, \end{cases} \quad (10)$$

where, following [16], $\beta = e/(1-e)$, so that correctly classified samples are down-weighted.

When new data arrives, assuming that we know the label information for each sample, the classifiers pretrained on the offline data are used as initial weak classifiers. The AdaBoost algorithm is applied to the new online data, and fusion weights are reassigned to the offline-trained classifiers.

In the first $m$ initial iterations, the $m$ pretrained classifiers are used as the weak classifiers and added to the final ensemble classifier, instead of training new weak classifiers from randomly selected datasets. After the $m$ initial iterations, new weak classifiers are trained from the new online data and added to the final ensemble classifier in the AdaBoost algorithm.
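The following is a schematic sketch of this seeded AdaBoost procedure, not the authors' implementation. It assumes binary labels in {-1, +1}, weak classifiers exposing a predict method (such as the hypothetical GMMEmotionClassifier above), and the down-weighting factor beta = e/(1-e) from [16]; train_weak is a hypothetical callback that fits a fresh weak classifier.

```python
import numpy as np

def online_adaboost(pretrained, train_weak, samples, labels, n_rounds):
    """AdaBoost over newly arrived online data, seeded with m pretrained
    offline classifiers: the first m rounds reuse those models as the weak
    learners (only their fusion weights are computed on the online data);
    later rounds train new weak classifiers on weight-resampled data."""
    n = len(samples)
    w = np.full(n, 1.0 / n)                     # (7): equal initial weights
    ensemble, alphas = [], []
    m = len(pretrained)
    for k in range(n_rounds):
        if k < m:
            clf = pretrained[k]                 # initial iterations: offline models
        else:
            idx = np.random.choice(n, size=n, p=w)
            clf = train_weak(samples[idx], labels[idx])
        pred = np.asarray(clf.predict(samples))
        e = w[pred != labels].sum()             # (8): weighted error rate
        e = np.clip(e, 1e-10, 1.0 - 1e-10)      # guard against e = 0 or 1
        alpha = np.log((1.0 - e) / e)           # (9): fusion weight
        beta = e / (1.0 - e)                    # assumed, following [16]
        w[pred == labels] *= beta               # (10): shrink correct samples
        w /= w.sum()                            # renormalize to sum to 1
        ensemble.append(clf)
        alphas.append(alpha)

    def predict(X):
        # Weighted vote of all weak classifiers, with (9) supplying the weights.
        votes = sum(a * np.asarray(c.predict(X)) for a, c in zip(alphas, ensemble))
        return np.sign(votes)

    return predict
```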

The major difference between the online training and the offline training is the data used for learning. Offline training uses a large amount of acted data, while online training uses a small amount of naturalistic data. Offline training is independent of the online training and ready to use, while the online training depends on the offline training and only retrains the existing model to fit specific purposes, such as tuning to a large number of speakers. The purpose of online training is to quickly adapt the existing offline model to a small amount of new data.


5. Experimental Results

In our experiment the offline training is carried out on the acted basic emotion dataset. The speaker-independent dataset and the elicited practical emotion dataset are used for the online training and the online testing. Although the datasets used in online testing are preprocessed utterances rather than real-time online data, our experiments still provide a simulated online situation. We divide dataset 2 and dataset 3 into smaller sets (datasets 2a, 2b, 3a, and 3b, listed in Table 3), of which datasets 2a and 3a are used for the simulated online initialization.

Speech utterances from different sources are organizedinto several datasets as shown in Table 2
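As a usage illustration only, the simulated online protocol of Experiments 2 and 4 could be wired up as below; load_dataset, random_splits, and train_gmm_weak are hypothetical helpers, and online_adaboost is the sketch from Section 4.2.

```python
import numpy as np

# Hypothetical loaders: each returns (features, labels) with labels in {-1, +1}
# for the negative/positive valence regions.
X_off, y_off = load_dataset("dataset1")    # acted speech, offline training
X_on,  y_on  = load_dataset("dataset2a")   # speaker independent, online training
X_te,  y_te  = load_dataset("dataset2b")   # speaker independent, testing

# Offline stage: several weak GMM classifiers trained via EM on acted data.
pretrained = [train_gmm_weak(Xs, ys)
              for Xs, ys in random_splits(X_off, y_off, n_splits=5)]

# Online stage: seeded AdaBoost over the small online training set.
predict = online_adaboost(pretrained, train_weak=train_gmm_weak,
                          samples=X_on, labels=y_on, n_rounds=20)

print("accuracy on the held-out split:", np.mean(predict(X_te) == y_te))
```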

The online learning algorithm is verified both on the speaker-independent data and on the elicited data. The results are shown in Table 4. A large number of speakers makes emotional behavior difficult to model, since emotion expression is highly dependent on individual habit and personality. By extending the offline-trained classifier to the online data that contains a large number of speakers, we improved the generality of our SER system. The elicited data is collected in a cognitive experiment, which is closer to the real-world situation; during the cognitive task, emotional speech is induced. We observed that the different nature of the acted data and of the speech induced during a cognitive task caused a significant decrease of the recognition rate. By using the online training technique we may transfer the offline-trained SER system to the elicited data. Extending our SER system to different data sources may bring emotion recognition closer to real-world applications.

The major challenge in our online learning algorithm is how to combine the existing offline classifier and efficiently adapt the model parameters to a small amount of new online data. We adopted the incremental learning idea and solved this problem by modifying the initial stage in the AdaBoost framework. One of the contributions of our online learning algorithm is that we may reuse the existing offline training data and make the online learning stage more efficient. We make use of a large amount of available offline training data and only require a small amount of data for online training, as shown in Table 3. The weight of each weak classifier is an important parameter; the proposed method may be further improved by using a fuzzy membership function to evaluate the confidence in the GMM classifiers and reestimate the weight of each weak classifier.

6. Discussions

Acted data is often considered unsuitable for real-world applications. However, traditional research has focused on acted emotional speech, and many acted databases are available. How to transfer an SER system trained on acted data to new naturalistic data in the real world remains an unsolved challenge.

Many feature selection algorithms may be applied to the SER system. MIC is a newly proposed and powerful algorithm for exploring nonlinear relationships between variables.
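As a pointer to how such a selection step might look in code, the sketch below assumes the third-party minepy package, an implementation of the MIC estimator from [14]; the grid parameters, the cutoff, and the numeric label encoding are illustrative assumptions.

```python
import numpy as np
from minepy import MINE  # assumed third-party MIC implementation

def rank_features_by_mic(X, y, top_k=50):
    """Score each acoustic feature column of X against the (numerically
    encoded) emotion labels y with MIC, and keep the top_k features."""
    mine = MINE(alpha=0.6, c=15)        # grid-search parameters suggested in [14]
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        mine.compute_score(X[:, j], y)  # explore the x-by-y grids
        scores[j] = mine.mic()          # normalized maximal value
    keep = np.argsort(scores)[::-1][:top_k]
    return keep, scores[keep]
```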

AdaBoost is a popular algorithm that ensembles multiple weak classifiers to establish a strong classifier. By applying AdaBoost in the online setting, we train multiple weak classifiers based on the newly arrived online data, while the offline pretrained classifiers are used for initialization. We may explore other incremental learning algorithms in future work.

Table 3: Selected datasets for online and offline experiments.

Dataset index   Data source           Number of utterances   Purpose of use
Dataset 1       Acted speech          12000                  Offline training
Dataset 2a      Speaker independent   1000                   Online training
Dataset 2b      Speaker independent   10000                  Testing
Dataset 3a      Elicited speech       1000                   Online training
Dataset 3b      Elicited speech       5000                   Testing

Table 4: Online and offline experimental results.

Experiment index   Offline training set   Online training set   Testing set   Classification result (%)
Experiment 1       Dataset 1              N/A                   Dataset 2b    63.3
Experiment 2       Dataset 1              Dataset 2a            Dataset 2b    75.6
Experiment 5       Dataset 2a             N/A                   Dataset 2b    70.0
Experiment 3       Dataset 1              N/A                   Dataset 3b    61.2
Experiment 4       Dataset 1              Dataset 3a            Dataset 3b    73.1
Experiment 6       Dataset 3a             N/A                   Dataset 3b    68.5

Acknowledgments

This work was partially supported by the China Postdoctoral Science Foundation (no. 2012M520973), the National Natural Science Foundation of China (nos. 61231002, 61273266, and 51075068), and the Doctoral Fund of the Ministry of Education of China (no. 20110092130004). The authors would like to thank the anonymous reviewers for their valuable comments and helpful suggestions.

References

[1] C. Clavel, I. Vasilescu, L. Devillers, G. Richard, and T. Ehrette, "Fear-type emotion recognition for future audio-based surveillance systems," Speech Communication, vol. 50, no. 6, pp. 487–503, 2008.

[2] C. Huang, Y. Jin, Y. Zhao, Y. Yu, and L. Zhao, "Speech emotion recognition based on re-composition of two-class classifiers," in Proceedings of the 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops (ACII '09), Amsterdam, The Netherlands, September 2009.

[3] K. R. Scherer, "Vocal communication of emotion: a review of research paradigms," Speech Communication, vol. 40, no. 1-2, pp. 227–256, 2003.

[4] A. Tawari and M. M. Trivedi, "Speech emotion analysis: exploring the role of context," IEEE Transactions on Multimedia, vol. 12, no. 6, pp. 502–509, 2010.

[5] F. Burkhardt, A. Paeschke, M. Rolfes, W. Sendlmeier, and B. Weiss, "A database of German emotional speech," in Proceedings of the 9th European Conference on Speech Communication and Technology, pp. 1517–1520, Lisbon, Portugal, September 2005.

[6] D. Ververidis and C. Kotropoulos, "Automatic speech classification to five emotional states based on gender information," in Proceedings of the 12th European Signal Processing Conference, pp. 341–344, Vienna, Austria, 2004.

[7] S. Steidl, Automatic Classification of Emotion-Related User States in Spontaneous Children's Speech, Department of Computer Science, Friedrich-Alexander-Universitaet Erlangen-Nuernberg, Berlin, Germany, 2008.

[8] M. Grimm, K. Kroschel, and S. Narayanan, "The Vera am Mittag German audio-visual emotional speech database," in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '08), pp. 865–868, Hannover, Germany, June 2008.

[9] K. P. Truong, How Does Real Affect Affect Affect Recognition in Speech? Center for Telematics and Information Technology, University of Twente, Enschede, The Netherlands, 2009.

[10] C. Huang, Y. Jin, Y. Zhao, Y. Yu, and L. Zhao, "Recognition of practical emotion from elicited speech," in Proceedings of the 1st International Conference on Information Science and Engineering (ICISE '09), pp. 639–642, Nanjing, China, December 2009.

[11] R. Polikar, L. Udpa, S. S. Udpa, and V. Honavar, "Learn++: an incremental learning algorithm for supervised neural networks," IEEE Transactions on Systems, Man and Cybernetics C, vol. 31, no. 4, pp. 497–508, 2001.

[12] Q. L. Zhao, Y. H. Jiang, and M. Xu, "Incremental learning by heterogeneous Bagging ensemble," Lecture Notes in Computer Science, vol. 6441, no. 2, pp. 1–12, 2010.

[13] R. Xiao, J. Wang, and F. Zhang, "An approach to incremental SVM learning algorithm," in Proceedings of the IEEE International Conference on Tools with Artificial Intelligence, pp. 268–273, 2000.

[14] D. N. Reshef, Y. A. Reshef, H. K. Finucane et al., "Detecting novel associations in large data sets," Science, vol. 334, no. 6062, pp. 1518–1524, 2011.

[15] D. A. Reynolds and R. C. Rose, "Robust text-independent speaker identification using Gaussian mixture speaker models," IEEE Transactions on Speech and Audio Processing, vol. 3, no. 1, pp. 72–83, 1995.

[16] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," Journal of Computer and System Sciences, vol. 55, no. 1, part 2, pp. 119–139, 1997.

[17] Q. Zhao, The research on ensemble pruning and its application in on-line machine learning [Ph.D. thesis], National University of Defense Technology, Changsha, China, 2010.





Page 5: Research Article Practical Speech Emotion Recognition ...


[Figure 9: Intensity and pitch contour of anger; panels show the waveform, the pitch contour (0–500 Hz), and the intensity contour (dB) against time.]

[Figure 10: Intensity and pitch contour of neutrality; panels show the waveform, the pitch contour (0–600 Hz), and the intensity contour (dB) against time.]

Feature 62–65: Voiced Frames (VF), Unvoiced Frames (UF), UF/VF, and VF/(UF+VF).

Feature 66–69: Voiced Segments (VS), Unvoiced Segments (US), US/VS, and VS/(US+VS).

Feature 70–71: Maximum Voiced Duration (MVD) and Maximum Unvoiced Duration (MUD).

[Figure 11: Intensity and pitch contour of confidence; panels show the waveform, the pitch contour (0–600 Hz), and the intensity contour (dB) against time.]

Feature 72–77: mean, maximum, minimum, median, range, and variance of the Harmonic-to-Noise Ratio (HNR).

Feature 78–95: mean, maximum, minimum, median, range, and variance of the HNR in three bands (0–400 Hz, 400–2000 Hz, and 2000–5000 Hz).

Feature 96–119: mean, maximum, minimum, median, range, and variance of the 1st formant frequency (F1), 2nd formant frequency (F2), 3rd formant frequency (F3), and 4th formant frequency (F4).

Feature 120–143: mean, maximum, minimum, median, range, and variance of dF1, dF2, dF3, and dF4.

Feature 144–167: mean, maximum, minimum, median, range, and variance of d2F1, d2F2, d2F3, and d2F4.

Feature 168–171: Jitter1 of F1, F2, F3, and F4.

Feature 172–175: Jitter2 of F1, F2, F3, and F4.

Feature 176–199: mean, maximum, minimum, median, range, and variance of the F1, F2, F3, and F4 bandwidths.

Feature 200–223: mean, maximum, minimum, median, range, and variance of the dF1, dF2, dF3, and dF4 bandwidths.

Feature 224–247: mean, maximum, minimum, median, range, and variance of the d2F1, d2F2, d2F3, and d2F4 bandwidths.

Feature 248–325: mean, maximum, minimum, median, range, and variance of the MFCCs (0–12th order).


[Figure 12: Feature distribution over various emotional states (fidgetiness, happiness, confidence, tiredness, and neutrality): (a) normalised mean pitch frequency; (b) mean first formant frequency (Hz); (c) mean second formant frequency (Hz); (d) mean third formant frequency (Hz).]

Feature 326–403: mean, maximum, minimum, median, range, and variance of the dMFCCs (0–12th order).

Feature 404–481: mean, maximum, minimum, median, range, and variance of the d2MFCCs (0–12th order).
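To make the construction above concrete, the following is a minimal sketch of features 248–481: the six summary statistics applied to the MFCC contours and their first and second deltas. It assumes librosa for the low-level analysis; the loading and framing choices are illustrative, not the authors' exact configuration.

```python
import numpy as np
import librosa

# The six summary statistics applied throughout the feature list above.
STATS = (np.mean, np.max, np.min, np.median, np.ptp, np.var)

def summarize(contour):
    """Mean, maximum, minimum, median, range, and variance of one contour."""
    return [f(contour) for f in STATS]

def mfcc_statistics(path):
    """Features 248-481: statistics of the 0-12th order MFCCs, dMFCCs,
    and d2MFCCs (3 blocks x 13 orders x 6 statistics = 234 values)."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # 13 coefficient tracks
    d1 = librosa.feature.delta(mfcc)                     # first delta
    d2 = librosa.feature.delta(mfcc, order=2)            # second delta
    feats = []
    for block in (mfcc, d1, d2):
        for track in block:                              # one order's contour
            feats.extend(summarize(track))
    return np.array(feats)
```

The same `summarize` helper would be applied to the pitch, intensity, HNR, formant, and bandwidth contours to produce the remaining entries of the list.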

3.2. Feature Selection Based on MIC. In this section we introduce the feature selection algorithm used in our speech emotion classifier. Feature selection algorithms may be roughly classified into two groups, namely "wrapper" and "filter" methods. Algorithms in the former group depend on a specific classifier, such as sequential forward selection (SFS): the final selection is tied to that classifier, and if we replace the classifier the results will change. In the latter group, feature selection is guided by an evaluation criterion, such as the Fisher Discriminant Ratio (FDR). The feature selection result achieved by this type of method does not depend on a specific classifier and therefore generalizes better across different databases.

[Figure 13: The arousal and the valence dimensions of emotions; nine emotional states (anger, fidgetiness, fear, surprise, neutrality, happiness, confidence, sadness, and tiredness) are placed in the valence-arousal plane, with both axes spanning -1 to 1.]

Maximal information coefficient (MIC) based feature selection falls into the second group. MIC is a new statistical tool, introduced by Reshef et al. [14], that measures both linear and nonlinear relationships between paired variables.

MIC is based on the idea that if a relationship exists between two variables, then a grid can be drawn on the scatterplot of the two variables that partitions the data so as to encapsulate that relationship [14]. We may calculate the MIC of a given acoustic feature and the emotional state by exploring all possible grids on the two variables. First, for every pair of integers (x, y), we compute the largest possible mutual information achieved by any x-by-y grid [14]. Second, for a fair comparison, we normalize these values so that the scores are comparable across all acoustic features. A detailed study of MIC may be found in [14].
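Stated compactly in the notation of [14] (a summary of that paper's definition, not a formula from the present text): for a data set $D$ of $n$ points, the normalized grid score and the resulting MIC are

$$M(D)_{x,y} = \frac{I^{*}(D, x, y)}{\log \min(x, y)}, \qquad \mathrm{MIC}(D) = \max_{xy < B(n)} M(D)_{x,y},$$

where $I^{*}(D, x, y)$ is the largest mutual information achieved by any $x$-by-$y$ grid and $B(n) = n^{0.6}$ is the resolution bound recommended in [14].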

Since MIC can capture linear and nonlinear associations at the same time, we do not need to make any assumption about the distribution of our original features. It is therefore especially suitable for evaluating a large number of emotional features. Starting from the large set of basic features described in Section 3.1, we apply MIC to measure the contribution of each feature in correlation with the emotional states, and finally a subset of features is selected for our emotion classifier.
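The sketch below illustrates the grid search just described. It is a simplified approximation, not the authors' implementation: it only scans equal-frequency grids (the exact MIC of [14] also optimizes grid placement), while keeping the resolution bound and the log-normalization.

```python
import numpy as np
from itertools import product

def mic_approx(x, y, max_bins=8):
    """Simplified MIC-style score between two paired variables.

    Tries equal-frequency grids at every resolution (a, b) within the
    bound a*b <= n**0.6 from [14], normalizes each grid's mutual
    information by log(min(a, b)), and keeps the maximum.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    best = 0.0
    for a, b in product(range(2, max_bins + 1), repeat=2):
        if a * b > n ** 0.6:                     # resolution bound B(n) = n^0.6
            continue
        # Equal-frequency binning on each axis defines one candidate grid.
        xe = np.quantile(x, np.linspace(0, 1, a + 1)[1:-1])
        ye = np.quantile(y, np.linspace(0, 1, b + 1)[1:-1])
        joint = np.zeros((a, b))
        np.add.at(joint, (np.searchsorted(xe, x), np.searchsorted(ye, y)), 1)
        joint /= n
        px, py = joint.sum(axis=1), joint.sum(axis=0)
        nz = joint > 0
        mi = np.sum(joint[nz] * np.log(joint[nz] / np.outer(px, py)[nz]))
        best = max(best, mi / np.log(min(a, b)))  # normalized mutual information
    return best
```

For feature selection, such a score would be computed between each acoustic feature column and a numeric coding of the emotional state, keeping the top-ranked features.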

4. Recognition Methodology

4.1. Baseline GMM Classifier. The Gaussian mixture model (GMM) based classifier is the state-of-the-art recognition method in speaker and language identification. In this paper we build the baseline classifier using a Gaussian mixture model, so that the baseline can be compared against the online learning method.


A GMM may be defined as a weighted sum of several Gaussian distributions:

$$p(\mathbf{X}_t \mid \lambda) = \sum_{i=1}^{M} a_i\, b_i(\mathbf{X}_t), \tag{1}$$

where $\mathbf{X}_t$ is a $D$-dimensional random vector, $b_i(\mathbf{X}_t)$ is the $i$th Gaussian component, $t$ is the index of the utterance sample, $a_i$ is the mixture weight, and $M$ is the number of Gaussian mixture components. Each component is a $D$-dimensional Gaussian with mean $\mathbf{U}_i$ and covariance $\boldsymbol{\Sigma}_i$:

$$b_i(\mathbf{X}_t) = \frac{1}{(2\pi)^{D/2}\, |\boldsymbol{\Sigma}_i|^{1/2}} \exp\left\{ -\frac{1}{2} (\mathbf{X}_t - \mathbf{U}_i)^{T} \boldsymbol{\Sigma}_i^{-1} (\mathbf{X}_t - \mathbf{U}_i) \right\}. \tag{2}$$

Note that

$$\sum_{i=1}^{M} a_i = 1. \tag{3}$$

Emotion classification can be done by maximizing the posterior probability:

$$\text{EmotionLabel} = \arg\max_{k}\, p(\mathbf{X}_t \mid \lambda_k). \tag{4}$$

Expectation-Maximization (EM) is adopted for GMM parameter estimation [15]; with the superscript $i$ denoting the iteration, the updates are

$$a_m^{i} = \frac{\sum_{t=1}^{T} \gamma_{tm}^{i}}{\sum_{t=1}^{T} \sum_{m=1}^{M} \gamma_{tm}^{i}}, \qquad \mathbf{U}_m^{i} = \frac{\sum_{t=1}^{T} \gamma_{tm}^{i}\, \mathbf{X}_t}{\sum_{t=1}^{T} \gamma_{tm}^{i}},$$

$$\boldsymbol{\Sigma}_m^{i} = \frac{\sum_{t=1}^{T} \gamma_{tm}^{i}\, (\mathbf{X}_t - \mathbf{U}_m^{i})(\mathbf{X}_t - \mathbf{U}_m^{i})^{T}}{\sum_{t=1}^{T} \gamma_{tm}^{i}}, \qquad \gamma_{tm}^{i} = \frac{a_m^{i-1}\, N(\mathbf{X}_t \mid \mathbf{U}_m^{i-1}, \boldsymbol{\Sigma}_m^{i-1})}{\sum_{m=1}^{M} a_m^{i-1}\, N(\mathbf{X}_t \mid \mathbf{U}_m^{i-1}, \boldsymbol{\Sigma}_m^{i-1})}. \tag{5}$$

Because the types of emotions differ among the datasets, we unify the emotional datasets by categorizing them into positive and negative regions along the valence and arousal dimensions, as shown in Figure 13. We may then verify the ability of the emotion classifier by classifying emotional utterances into the different regions of the valence-arousal space.
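As a minimal sketch of the baseline classifier in (1)–(4), the snippet below fits one mixture $\lambda_k$ per class by EM (cf. (5)) and predicts by maximum likelihood. It assumes scikit-learn; the component count and covariance type are illustrative choices, not the paper's exact configuration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

class GMMEmotionClassifier:
    """One GMM per emotion class; decisions follow the rule in (4)."""

    def __init__(self, n_components=16):
        self.n_components = n_components
        self.models = {}

    def fit(self, features, labels):
        # Fit a separate mixture lambda_k to each class k by EM, as in (5) [15].
        for k in np.unique(labels):
            gmm = GaussianMixture(n_components=self.n_components,
                                  covariance_type="diag", max_iter=100)
            self.models[k] = gmm.fit(features[labels == k])
        return self

    def predict(self, features):
        # argmax_k p(X_t | lambda_k): score every sample under every class model.
        classes = sorted(self.models)
        loglik = np.column_stack([self.models[k].score_samples(features)
                                  for k in classes])
        return np.asarray(classes)[np.argmax(loglik, axis=1)]
```

For the valence-arousal evaluation of Figure 13, `labels` would encode the positive or negative region of each utterance rather than its raw emotion category.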

4.2. Online Learning Using AdaBoost. While the offline GMM classifier is trained using the EM algorithm, the online training algorithm based on AdaBoost is introduced in this section. AdaBoost is a powerful algorithm in ensemble learning [16]. The idea behind AdaBoost is that weak classifiers may be combined into a powerful classifier: multiple classifiers trained on randomly selected datasets perform quite differently from each other on the same testing dataset; therefore, we may reduce the misclassification rate by a proper decision fusion rule.

The AdaBoost algorithm consists of several iterations. In each iteration, a new training set is selected for a new weak classifier, and a weight is assigned to that classifier. Based on the testing results of the new weak classifier, the weights of all the data samples are modified for the next iteration. At the final step, the ensemble classifier is obtained by combining the multiple weak classifiers through a weighted voting rule.

Let us suppose the current training set is [17]

$$T = \{s_1, s_2, \ldots, s_N\}, \tag{6}$$

where the weights of the samples are

$$W = \{w_1, w_2, \ldots, w_N\}, \qquad \sum_{i=1}^{N} w_i = 1. \tag{7}$$

The error rate of the new weak classifier is

$$e = \sum_{i:\, c(s_i) \neq y_i} w_i, \tag{8}$$

where $c(s_i)$ is the classification result for sample $s_i$ and $y_i$ is its class label. The fusion weight assigned to each classifier is defined by the error rate:

$$\alpha = \ln\left(\frac{1 - e}{e}\right). \tag{9}$$

At the beginning of the algorithm, each sample is assigned an equal weight. During the iterations, the sample weights are updated:

$$w_i \leftarrow \begin{cases} w_i \times \beta, & c(s_i) = y_i, \\ w_i, & c(s_i) \neq y_i, \end{cases} \tag{10}$$

where, as in the standard AdaBoost of [16], $\beta = e/(1 - e)$, so that correctly classified samples are down-weighted (note that $\alpha = \ln(1/\beta)$) and the weights are renormalized afterwards.

Upon the arrival of new data, and assuming that the label information of each sample is known, the classifiers pretrained on the offline data are used as the initial weak classifiers: the AdaBoost algorithm is applied to the new online data, and fusion weights are reassigned to the offline-trained classifiers.

In the first m initial iterations, the m pretrained classifiers are used as the weak classifiers and added to the final ensemble, instead of training new weak classifiers from randomly selected datasets. After these m initial iterations, new weak classifiers are trained on the new online data and added to the final ensemble in the usual AdaBoost manner, as sketched below.
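The sketch below summarizes this modified initialization. It is written under stated assumptions, not as the authors' exact implementation: labels are 0/1 (one valence or arousal region versus the other), each pretrained classifier exposes a `predict` method, and `train_weak(X, y, w)` is a caller-supplied routine that fits a weak classifier on weighted online data.

```python
import numpy as np

def online_adaboost(pretrained, X, y, n_new_rounds, train_weak):
    """Online AdaBoost retraining: reuse the m offline classifiers in the
    first m iterations, then boost new weak classifiers on the online data."""
    w = np.full(len(y), 1.0 / len(y))   # equal initial sample weights, eq. (7)
    ensemble, alphas = [], []

    def boost_round(clf):
        nonlocal w
        miss = clf.predict(X) != y
        e = np.clip(w[miss].sum(), 1e-10, 1 - 1e-10)  # error rate, eq. (8)
        alphas.append(np.log((1 - e) / e))            # fusion weight, eq. (9)
        w[~miss] *= e / (1 - e)         # down-weight correct samples, eq. (10)
        w /= w.sum()                    # renormalize
        ensemble.append(clf)

    for clf in pretrained:              # m initial iterations: offline models
        boost_round(clf)
    for _ in range(n_new_rounds):       # then boost new weak classifiers
        boost_round(train_weak(X, y, w))
    return ensemble, alphas

def ensemble_predict(ensemble, alphas, X):
    """Weighted vote of all weak classifiers (binary labels in {0, 1})."""
    votes = sum(a * (2 * clf.predict(X) - 1)
                for clf, a in zip(ensemble, alphas))
    return (votes >= 0).astype(int)
```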

The major difference between the online training and the offline training is the data used for learning. Offline training uses a large amount of acted data, while online training uses a small amount of naturalistic data. Offline training is independent of the online training and ready to use, whereas online training depends on the offline training and only retrains the existing model to fit specific purposes, such as tuning to a large number of speakers. The purpose of online training is to quickly adapt the existing offline model to a small amount of new data.


5. Experimental Results

In our experiments, the offline training is carried out on the acted basic emotion dataset, while the speaker-independent dataset and the elicited practical emotion dataset are used for the online training and the online testing. Although the datasets used in online testing are preprocessed utterances rather than real-time online data, our experiments still provide a simulated online situation. We divide dataset 2 and dataset 3 into smaller sets (datasets 2a/2b and 3a/3b; see Table 3), with datasets 2a and 3a used for the simulated online initialization.

Speech utterances from different sources are organized into several datasets, as shown in Table 2.

The online learning algorithm is verified both on the speaker-independent data and on the elicited data; the results are shown in Table 4. A large number of speakers brings difficulties in modeling emotional behavior, since emotion expression is highly dependent on individual habits and personality. By extending the offline-trained classifier to online data containing a large number of speakers, we improved the generality of our SER system. The elicited data is collected in a cognitive experiment that is closer to the real-world situation: during the cognitive task, emotional speech is induced. We observed that the difference in nature between the acted data and the speech induced during the cognitive task caused a significant decrease in the recognition rate. By using the online training technique, we may transfer the offline-trained SER system to the elicited data. Extending our SER system to different data sources may bring emotion recognition closer to real-world applications.

The major challenge in our online learning algorithm is how to build on the existing offline classifier and efficiently adapt the model parameters to a small amount of new online data. We adopted the incremental learning idea and addressed this problem by modifying the initial stage of the AdaBoost framework. One contribution of our online learning algorithm is that we may reuse the existing offline training data and make the online learning stage more efficient: we exploit a large amount of available offline training data and only require a small amount of data for online training, as shown in Table 3. The weight of each weak classifier is an important parameter; the proposed method may be further improved by using a fuzzy membership function to evaluate the confidence of the GMM classifiers and reestimate the weight of each weak classifier.

6. Discussions

Acted data is often considered unsuitable for real-world applications. However, traditional research has focused on acted emotional speech, and many acted databases are available. How to transfer an SER system trained on acted data to new naturalistic data in the real world remains an unsolved challenge.

Many feature selection algorithms may be applied to an SER system. MIC is a newly proposed and powerful statistic for exploring nonlinear relationships between variables.

AdaBoost is a popular algorithm for combining multiple weak classifiers into a strong classifier. By applying AdaBoost in the online setting, we train multiple weak classifiers on the newly arrived online data, with the offline pretrained classifiers used for initialization. We may explore other incremental learning algorithms in future work.

Table 3: Selected datasets for online and offline experiments.

Dataset index | Data source         | Number of utterances | Purpose of use
Dataset 1     | Acted speech        | 12000                | Offline training
Dataset 2a    | Speaker independent | 1000                 | Online training
Dataset 2b    | Speaker independent | 10000                | Testing
Dataset 3a    | Elicited speech     | 1000                 | Online training
Dataset 3b    | Elicited speech     | 5000                 | Testing

Table 4: Online and offline experimental results.

Experiment index | Offline training set | Online training set | Testing set | Classification result
Experiment 1     | Dataset 1            | N/A                 | Dataset 2b  | 63.3%
Experiment 2     | Dataset 1            | Dataset 2a          | Dataset 2b  | 75.6%
Experiment 5     | Dataset 2a           | N/A                 | Dataset 2b  | 70.0%
Experiment 3     | Dataset 1            | N/A                 | Dataset 3b  | 61.2%
Experiment 4     | Dataset 1            | Dataset 3a          | Dataset 3b  | 73.1%
Experiment 6     | Dataset 3a           | N/A                 | Dataset 3b  | 68.5%

Acknowledgments

This work was partially supported by the China Postdoctoral Science Foundation (no. 2012M520973), the National Natural Science Foundation of China (no. 61231002, no. 61273266, and no. 51075068), and the Doctoral Fund of the Ministry of Education of China (no. 20110092130004). The authors would like to thank the anonymous reviewers for their valuable comments and helpful suggestions.

References

[1] C. Clavel, I. Vasilescu, L. Devillers, G. Richard, and T. Ehrette, "Fear-type emotion recognition for future audio-based surveillance systems," Speech Communication, vol. 50, no. 6, pp. 487–503, 2008.

[2] C. Huang, Y. Jin, Y. Zhao, Y. Yu, and L. Zhao, "Speech emotion recognition based on re-composition of two-class classifiers," in Proceedings of the 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops (ACII '09), Amsterdam, The Netherlands, September 2009.

[3] K. R. Scherer, "Vocal communication of emotion: a review of research paradigms," Speech Communication, vol. 40, no. 1-2, pp. 227–256, 2003.

[4] A. Tawari and M. M. Trivedi, "Speech emotion analysis: exploring the role of context," IEEE Transactions on Multimedia, vol. 12, no. 6, pp. 502–509, 2010.

[5] F. Burkhardt, A. Paeschke, M. Rolfes, W. Sendlmeier, and B. Weiss, "A database of German emotional speech," in Proceedings of the 9th European Conference on Speech Communication and Technology, pp. 1517–1520, Lisbon, Portugal, September 2005.

[6] D. Ververidis and C. Kotropoulos, "Automatic speech classification to five emotional states based on gender information," in Proceedings of the 12th European Signal Processing Conference, pp. 341–344, Vienna, Austria, 2004.

[7] S. Steidl, Automatic Classification of Emotion-Related User States in Spontaneous Children's Speech, Department of Computer Science, Friedrich-Alexander-Universitaet Erlangen-Nuernberg, Berlin, Germany, 2008.

[8] M. Grimm, K. Kroschel, and S. Narayanan, "The Vera am Mittag German audio-visual emotional speech database," in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '08), pp. 865–868, Hannover, Germany, June 2008.

[9] K. P. Truong, How Does Real Affect Affect Affect Recognition in Speech, Center for Telematics and Information Technology, University of Twente, Enschede, The Netherlands, 2009.

[10] C. Huang, Y. Jin, Y. Zhao, Y. Yu, and L. Zhao, "Recognition of practical emotion from elicited speech," in Proceedings of the 1st International Conference on Information Science and Engineering (ICISE '09), pp. 639–642, Nanjing, China, December 2009.

[11] R. Polikar, L. Udpa, S. S. Udpa, and V. Honavar, "Learn++: an incremental learning algorithm for supervised neural networks," IEEE Transactions on Systems, Man, and Cybernetics C, vol. 31, no. 4, pp. 497–508, 2001.

[12] Q. L. Zhao, Y. H. Jiang, and M. Xu, "Incremental learning by heterogeneous Bagging ensemble," Lecture Notes in Computer Science, vol. 6441, no. 2, pp. 1–12, 2010.

[13] R. Xiao, J. Wang, and F. Zhang, "An approach to incremental SVM learning algorithm," in Proceedings of the IEEE International Conference on Tools with Artificial Intelligence, pp. 268–273, 2000.

[14] D. N. Reshef, Y. A. Reshef, H. K. Finucane et al., "Detecting novel associations in large data sets," Science, vol. 334, no. 6062, pp. 1518–1524, 2011.

[15] D. A. Reynolds and R. C. Rose, "Robust text-independent speaker identification using Gaussian mixture speaker models," IEEE Transactions on Speech and Audio Processing, vol. 3, no. 1, pp. 72–83, 1995.

[16] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," Journal of Computer and System Sciences, vol. 55, no. 1, part 2, pp. 119–139, 1997.

[17] Q. Zhao, The research on ensemble pruning and its application in on-line machine learning [Ph.D. thesis], National University of Defense Technology, Changsha, China, 2010.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 6: Research Article Practical Speech Emotion Recognition ...

6 Mathematical Problems in Engineering

1 09521

067120801407877

002040608

112

Fidgetiness Happiness Confidence Tiredness Neutrality

Nor

mal

ized

mea

n pi

tch

frequ

ency

(a) Normalised mean pitch frequency

735852

789705

791

0100200300400500600700800900

Fidgetiness Happiness Confidence Tiredness Neutrality

Mea

n fir

st fo

rman

t fre

quen

cy

(b) Mean first formant frequency (Hz)

Fidgetiness Happiness Confidence Tiredness Neutrality

Mea

n se

cond

form

ant

frequ

ency 1780

1842 1853

17751746

16801700172017401760178018001820184018601880

(c) Mean second formant frequency (Hz)

Fidgetiness Happiness Confidence Tiredness Neutrality

Mea

n th

ird fo

rman

t fre

quen

cy

2998 2985

3164

3078

2949

280028502900295030003050310031503200

(d) Mean third formant frequency (Hz)

Figure 12 Feature distribution over various emotional states

Feature 326ndash403 mean maximum minimum me-dian range and variance of dMFCC (0ndash12th-order)Feature 404ndash481 mean maximum minimum me-dian range and variance of d2MFCC (0ndash12th-order)

32 Feature Selection Based onMIC In this section we intro-duce the feature selection algorithm in our speech emotionclassifier Feature selection algorithms may be roughly clas-sified into two groups namely ldquowrapperrdquo and ldquofilterrdquo Algo-rithms in the former group are dependent on the specific clas-sifiers such as sequential forward selection (SFS) The finalselection result is dependent on a specific classifier If we re-place the specific classifier the results will change In thesecond group feature selection is done by a certain evaluationcriteria such as FisherDiscriminant Ratio (FDR)The feature

Angry

Fidgetiness

Fear

Surprise

Neutrality

Happiness

Confidence

Sadness

Tiredness

minus1

minus08

minus06

minus04

minus02

0

02

04

06

08

1

minus1 minus08 minus06 minus04 minus02 0 02 04 06 08 1Aro

usal

Valence

Figure 13 The arousal and the valence dimensions of emotions

selection result achieved in this type of method is not de-pendent on specific classifiers and bears a better generalityacross different databases

Maximal information coefficient (MIC) based feature se-lection algorithm falls into the second group MIC is a newstatistic tool that measures linear and nonlinear relationshipsbetween paired variables invented by Reshef et al [14]

MIC is based on the idea that if a relationship existsbetween two variables then a grid can be drawn on the scat-terplot of the two variables that partitions the data to encap-sulate that relationship [14] We may calculate the MIC of acertain acoustic feature and the emotional state by exploringall possible grids on the two variables First we computefor every pair of integers (119909 119910) that largest possible mutualinformation achieved by any 119909-by-119910 grid [14] Second for afair comparison we normalize these MIC values between allacoustic features and the emotional state Detailed study ofMIC may be found in [14]

Since MIC can treat linear and nonlinear associations atthe same time we do not need tomake any assumption on thedistribution of our original features Therefore it is especiallysuitable for evaluating a large number of emotional featuresBased on a large number of basic features as described inSection 31 we apply MIC to measure the contribution ofthese features in correlation with emotion states Finally asubset of features is selected for our emotion classifier

4 Recognition Methodology

41 Baseline GMM Classifier The Gaussian mixture model(GMM) based classifier is the state-of-the-art recognitionmethod in speaker and language identification In this paperwe built the baseline classifier using Gaussianmixturemodeland we may compare the baseline classifier with the onlinelearning method

Mathematical Problems in Engineering 7

GMM may be defined by the sum of several Gaussiandistributions

119901 (X119905| 120582) =

119872

sum

119894=1

119886119894119887119894(X119905) (1)

where X119905is a 119863-dimension random vector 119887

119894(X119905) is the 119894th

member of Gaussian distribution 119905 is the index of utterancesample 119886

119894is the mixture weight and 119872 is the number of

Gaussian mixture members Each member is a119863-dimensionvariable which follows the Gaussian distribution with themean U

119894and the covariance Σ

119894

119887119894(X119905) =

1

(2120587)119863210038161003816

10038161003816Σ119894

1003816100381610038161003816

12exp minus1

2

(X119905minus U119894)119879

Σminus1

119894(X119905minus U119894)

(2)

Note that119872

sum

119894=1

119886119894= 1 (3)

Emotion classification can be done by maximizing theposterior probability

EmotionLable = argmax119896

(119901 (X119905| 120582119896)) (4)

ExpectationMaximization (EM) is adopted forGMMparam-eter estimation [15]

119886119894

119898=

sum119879

119905=1120574119894

119905119898

sum119879

119905=1sum119872

119898=1120574119894

119905119898

U119894119898=

sum119879

119905=1120574119894

119905119898X119905

sum119879

119905=1120574119894

119905119898

Σ119894

119898=

sum119879

119905=1120574119894

119905119898(X119905minus U119894119898) (X119905minus U119894119898)

119879

sum119879

119905=1120574119894

119905119898

120574119894

119905119898=

119886119894minus1

119898119873(X119905| U119894minus1119898Σ119894minus1

119898)

sum119872

119898=1119886119894minus1

119898119873(X119905| U119894minus1119898Σ119894minus1

119898)

(5)

Due to the different types of emotions among the datasetswe unify the emotional datasets by categorizing them intopositive and negative regions in the valence and arousal di-mensions as shown in Figure 13 We may verify the ability ofthe emotion classifier by classifying the emotional utterancesinto different regions in the valence and arousal space

42 Online LearningUsingAdaBoost While the offlineGMMclassifier is trained using EM algorithm the online trainingalgorithmusingAdabBoost will be introduced in this sectionAdaBoost is a powerful algorithm in assemble learning [16]The belief in this AdaBoost is that weak classifiers may becombined into a powerful classifier Multiple classifierstrained on randomly selected datasets perform quiet differ-ently from each other on the same testing dataset therefore

we may reduce the misclassification rate by a proper decisionfusion rule

AdaBoost algorithm consists of several iterations In eachiteration a new training set is selected for a new weak clas-sifier A weight is assigned to the new weak classifier Basedon the testing results of the newweak classifier the weights ofall the data samples are modified for the next iteration At thefinal step the assembled classifier is achieved by combinationof themultipleweak classified through aweighted voting rule

Let us suppose the current training set is [17]

119879 = 1199041 1199042 119904

119873 (6)

where the weights of the samples are

119882 = 1199081 1199082 119908

119873

119873

sum

119894=0

119908119894= 1

(7)

The error rate of the new weak classifier is

119890 = sum

119894119888(119904119894) = 119910119894

119908119894 (8)

where 119888(119904119894) is the classification result and 119910

119894is the class label

The fusion weight assigned to each classifier is defined by theerror rate

120572 = ln((1 minus 119890)119890

) (9)

At the beginning of the algorithm each sample is assignedby equal weight During the iteration the sample weights areupdated

119908119894+1

=

119908119894times 120573 119888 (119904

119894) = 119910119894

119908119894 119888 (119904

119894) = 119910119894

(10)

At the arrival of the new data assuming that we knowthe label information for each sample pretrained classifiersfrom the offline data are used as initial weak classifiers Ada-Boost algorithm is applied to the new online data and fusionweights are reassigned to the offline trained classifiers

At the first119898 initial iterations119898 pretrained classifiers areused as the weak classifiers and added to the final ensembleclassifier instead of training new weak classifiers from therandomly selected dataset After the119898 initial iterations newweak classifiers are trained from the new online data andadded to the final ensemble classifier in the AdaBoostalgorithm

The major difference between the online training and theoffline training is the data used for learning Offline train-ing uses large acted data while online training uses small andnatural data Offline training is independent of the onlinetraining and ready to use while the online training is depen-dent on the offline training and only retrains the existingmodel to fit specific purposes such as to tune on a largenumber of speakers The purpose of online training is toquickly adapt the existing offline model to a small amountof new data

8 Mathematical Problems in Engineering

5 Experimental Results

In our experiment the offline training is carried out on theacted basic emotion dataset The speaker-independent data-set and the elicited practical emotion dataset are used for theonline training and the online testing Although the datasetsused in online testing are preprocessed utterances rather thanreal time online data our experiments still provide a simu-lated online situation We divide dataset 2 and dataset 3 intosmaller sets dataset 2a and dataset 2b which are used as thesimulated online initialization

Speech utterances from different sources are organizedinto several datasets as shown in Table 2

The online learning algorithm is verified both on thespeaker-independent data and the elicited data The resultsare shown in Table 4 A large number of speakers bring dif-ficulties in modeling emotional behavior since emotionexpression is highly dependent on individual habit and per-sonality By extending the offline trained classifier to theonline data that contains a large number of speakers weimproved the generality of our SER system The elicited datais collected in a cognitive experiment that is more close tothe real world situation During the cognitive task emotionalspeech is induced We observed that the different naturebetween the acted data and the induced speech during acognitive task caused a significant decrease of the recognitionrate By using the online training technique we may transferthe offline trained SER system to the elicited data Extendingour SER system to different data sources may bring emotionrecognition closer to real world applications

The major challenge in our online learning algorithm ishow to combine the existing offline classifier and efficientlyadapt the model parameters to a small number of new onlinedata We adopted the incremental learning idea and solvedthis problem by modifying the initial stage in the AdaBoostframework One of the contributions of our online learningalgorithm is that we may reuse the existing offline trainingdata and make the online learning stage more efficiently Wemake use of a large amount of available offline training dataand only require a small amount of data for online trainingas shown in Table 3 The weight of each weak classifier is animportant parameter The proposed method may be furtherimproved by using fuzzy membership function to evaluatethe confidence in GMM classifiers and reestimate the weightof each weak classifier

6 Discussions

Acted data is often considered not suitable for real worldapplications However traditional researches have been fo-cused on the acted emotion speech andmany acted databasesare available How to transfer an SER system that trained onthe acted data to the new naturalistic data in real world is anunsolved challenge

Many feature selection algorithms may be applied to SERsystem MIC is a newly proposed and powerful algorithm forexploring nonlinear relationship between variables

AdaBoost is a popular algorithm to ensemble multipleweak classifiers to establish a strong classifier By applying

Table 3 Selected datasets for online and offline experiments

Datasets index Data source Number ofutterances Purpose of use

Dataset 1 Acted speech 12000 Offline training

Dataset 2a Speakerindependent 1000 Online training

Dataset 2b Speakerindependent 10000 Testing

Dataset 3a Elicited speech 1000 Online trainingDataset 3b Elicited speech 5000 Testing

Table 4 Online and offline experimental results

Experimentindex

Offlinetraining set

Onlinetraining set Testing set Classification

result Experiment 1 Dataset 1 NA Dataset 2b 633Experiment 2 Dataset 1 Dataset 2a Dataset 2b 756Experiment 5 Dataset 2a NA Dataset 2b 700Experiment 3 Dataset 1 NA Dataset 3b 612Experiment 4 Dataset 1 Dataset 3a Dataset 3b 731Experiment 6 Dataset 3a NA Dataset 3b 685

AdaBoost in the online occasion we train multiple weakclassifiers based on the newly arrived online data The offlinepretrained classifiers are used for initialization We may ex-plore other incremental learning algorithms in the futurework

Acknowledgments

This work was partially supported by China Postdoctoral Sci-ence Foundation (no 2012M520973) National Nature Sci-ence Foundation (no 61231002 no 61273266 no 51075068)and Doctoral Fund of Ministry of Education of China (no20110092130004)The authors would like to thank the anony-mous reviewers for their valuable comments and helpfulsuggestions

References

[1] C Clavel I Vasilescu L Devillers G Richard and T EhretteldquoFear-type emotion recognition for future audio-based surveil-lance systemsrdquo Speech Communication vol 50 no 6 pp 487ndash503 2008

[2] C Huang Y Jin Y Zhao Y Yu and L Zhao ldquoSpeech emotionrecognition based on re-composition of two-class classifiersrdquoin Proceedings of the 3rd International Conference on AffectiveComputing and Intelligent Interaction andWorkshops (ACII rsquo09)Amsterdam The Netherlands September 2009

[3] K R Scherer ldquoVocal communication of emotion a review ofresearch paradigmsrdquo SpeechCommunication vol 40 no 1-2 pp227ndash256 2003

[4] A Tawari andMM Trivedi ldquoSpeech emotion analysis explor-ing the role of contextrdquo IEEE Transactions on Multimedia vol12 no 6 pp 502ndash509 2010

[5] F Burkhardt A Paeschke M Rolfes W Sendlmeier and BWeiss ldquoA database of German emotional speechrdquo inProceedings

Mathematical Problems in Engineering 9

of the 9th European Conference on Speech Communication andTechnology pp 1517ndash1520 Lissabon Portugal September 2005

[6] D Ververidis and C Kotropoulos ldquoAutomatic speech classifi-cation to five emotional states based on gender informationrdquo inProceedings of the 12th European Signal Processing Conferencepp 341ndash344 Vienna Austria 2004

[7] S SteidlAutomatic Classification of Emotion-RelatedUser Statesin Spontaneous Childrenrsquos Speech Department of Computer Sci-ence Friedrich-Alexander-Universitaet Erlangen-NuermbergBerlin Germany 2008

[8] M Grimm K Kroschel and S Narayanan ldquoThe Vera am Mit-tag German audio-visual emotional speech databaserdquo in Pro-ceedings of the IEEE International Conference onMultimedia andExpo (ICME rsquo08) pp 865ndash868 Hannover Germany June 2008

[9] K P Truong How Does Real Affect Affect Affect Recognitionin Speech Center for Telematics and Information TechnologyUniversity of Twente Enschede The Netherlands 2009

[10] C Huang Y Jin Y Zhao Y Yu and L Zhao ldquoRecognition ofpractical emotion from elicited speechrdquo in Proceedings of the 1stInternational Conference on Information Science and Engineer-ing (ICISE rsquo09) pp 639ndash642 Nanjing China December 2009

[11] R Polikar L Udpa S S Udpa and V Honavar ldquoLearn++an incremental learning algorithm for supervised neural net-worksrdquo IEEE Transactions on Systems Man and Cybernetics Cvol 31 no 4 pp 497ndash508 2001

[12] Q L Zhao Y H Jiang and M Xu ldquoIncremental learning byheterogeneous Bagging ensemblerdquo Lecture Notes in ComputerScience vol 6441 no 2 pp 1ndash12 2010

[13] R Xiao J Wang and F Zhang ldquoAn approach to incrementalSVM learning algorithmrdquo in Proceedings of the IEEE Interna-tional Conference on Tools with Artificial Intelligence pp 268ndash273 2000

[14] D N Reshef Y A Reshef H K Finucane et al ldquoDetectingnovel associations in large data setsrdquo Science vol 334 no 6062pp 1518ndash1524 2011

[15] D A Reynolds and R C Rose ldquoRobust text-independentspeaker identification using Gaussian mixture speaker modelsrdquoIEEE Transactions on Speech and Audio Processing vol 3 no 1pp 72ndash83 1995

[16] Y Freund and R E Schapire ldquoA decision-theoretic generaliza-tion of on-line learning and an application to boostingrdquo Journalof Computer and System Sciences vol 55 no 1 part 2 pp 119ndash139 1997

[17] Q ZhaoThe research on ensemble pruning and its application inon-line machine learning [PhD thesis] National University ofDefense Technology Changsha China 2010

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 7: Research Article Practical Speech Emotion Recognition ...

Mathematical Problems in Engineering 7

GMM may be defined by the sum of several Gaussiandistributions

119901 (X119905| 120582) =

119872

sum

119894=1

119886119894119887119894(X119905) (1)

where X119905is a 119863-dimension random vector 119887

119894(X119905) is the 119894th

member of Gaussian distribution 119905 is the index of utterancesample 119886

119894is the mixture weight and 119872 is the number of

Gaussian mixture members Each member is a119863-dimensionvariable which follows the Gaussian distribution with themean U

119894and the covariance Σ

119894

119887119894(X119905) =

1

(2120587)119863210038161003816

10038161003816Σ119894

1003816100381610038161003816

12exp minus1

2

(X119905minus U119894)119879

Σminus1

119894(X119905minus U119894)

(2)

Note that119872

sum

119894=1

119886119894= 1 (3)

Emotion classification can be done by maximizing theposterior probability

EmotionLable = argmax119896

(119901 (X119905| 120582119896)) (4)

ExpectationMaximization (EM) is adopted forGMMparam-eter estimation [15]

119886119894

119898=

sum119879

119905=1120574119894

119905119898

sum119879

119905=1sum119872

119898=1120574119894

119905119898

U119894119898=

sum119879

119905=1120574119894

119905119898X119905

sum119879

119905=1120574119894

119905119898

Σ119894

119898=

sum119879

119905=1120574119894

119905119898(X119905minus U119894119898) (X119905minus U119894119898)

119879

sum119879

119905=1120574119894

119905119898

120574119894

119905119898=

119886119894minus1

119898119873(X119905| U119894minus1119898Σ119894minus1

119898)

sum119872

119898=1119886119894minus1

119898119873(X119905| U119894minus1119898Σ119894minus1

119898)

(5)

Due to the different types of emotions among the datasetswe unify the emotional datasets by categorizing them intopositive and negative regions in the valence and arousal di-mensions as shown in Figure 13 We may verify the ability ofthe emotion classifier by classifying the emotional utterancesinto different regions in the valence and arousal space

42 Online LearningUsingAdaBoost While the offlineGMMclassifier is trained using EM algorithm the online trainingalgorithmusingAdabBoost will be introduced in this sectionAdaBoost is a powerful algorithm in assemble learning [16]The belief in this AdaBoost is that weak classifiers may becombined into a powerful classifier Multiple classifierstrained on randomly selected datasets perform quiet differ-ently from each other on the same testing dataset therefore

we may reduce the misclassification rate by a proper decisionfusion rule

AdaBoost algorithm consists of several iterations In eachiteration a new training set is selected for a new weak clas-sifier A weight is assigned to the new weak classifier Basedon the testing results of the newweak classifier the weights ofall the data samples are modified for the next iteration At thefinal step the assembled classifier is achieved by combinationof themultipleweak classified through aweighted voting rule

Let us suppose the current training set is [17]

119879 = 1199041 1199042 119904

119873 (6)

where the weights of the samples are

119882 = 1199081 1199082 119908

119873

119873

sum

119894=0

119908119894= 1

(7)

The error rate of the new weak classifier is

119890 = sum

119894119888(119904119894) = 119910119894

119908119894 (8)

where 119888(119904119894) is the classification result and 119910

119894is the class label

The fusion weight assigned to each classifier is defined by theerror rate

120572 = ln((1 minus 119890)119890

) (9)

At the beginning of the algorithm each sample is assignedby equal weight During the iteration the sample weights areupdated

119908119894+1

=

119908119894times 120573 119888 (119904

119894) = 119910119894

119908119894 119888 (119904

119894) = 119910119894

(10)

At the arrival of the new data assuming that we knowthe label information for each sample pretrained classifiersfrom the offline data are used as initial weak classifiers Ada-Boost algorithm is applied to the new online data and fusionweights are reassigned to the offline trained classifiers

At the first119898 initial iterations119898 pretrained classifiers areused as the weak classifiers and added to the final ensembleclassifier instead of training new weak classifiers from therandomly selected dataset After the119898 initial iterations newweak classifiers are trained from the new online data andadded to the final ensemble classifier in the AdaBoostalgorithm

The major difference between the online training and theoffline training is the data used for learning Offline train-ing uses large acted data while online training uses small andnatural data Offline training is independent of the onlinetraining and ready to use while the online training is depen-dent on the offline training and only retrains the existingmodel to fit specific purposes such as to tune on a largenumber of speakers The purpose of online training is toquickly adapt the existing offline model to a small amountof new data

8 Mathematical Problems in Engineering

5 Experimental Results

In our experiment the offline training is carried out on theacted basic emotion dataset The speaker-independent data-set and the elicited practical emotion dataset are used for theonline training and the online testing Although the datasetsused in online testing are preprocessed utterances rather thanreal time online data our experiments still provide a simu-lated online situation We divide dataset 2 and dataset 3 intosmaller sets dataset 2a and dataset 2b which are used as thesimulated online initialization

Speech utterances from different sources are organizedinto several datasets as shown in Table 2

The online learning algorithm is verified both on thespeaker-independent data and the elicited data The resultsare shown in Table 4 A large number of speakers bring dif-ficulties in modeling emotional behavior since emotionexpression is highly dependent on individual habit and per-sonality By extending the offline trained classifier to theonline data that contains a large number of speakers weimproved the generality of our SER system The elicited datais collected in a cognitive experiment that is more close tothe real world situation During the cognitive task emotionalspeech is induced We observed that the different naturebetween the acted data and the induced speech during acognitive task caused a significant decrease of the recognitionrate By using the online training technique we may transferthe offline trained SER system to the elicited data Extendingour SER system to different data sources may bring emotionrecognition closer to real world applications

The major challenge in our online learning algorithm ishow to combine the existing offline classifier and efficientlyadapt the model parameters to a small number of new onlinedata We adopted the incremental learning idea and solvedthis problem by modifying the initial stage in the AdaBoostframework One of the contributions of our online learningalgorithm is that we may reuse the existing offline trainingdata and make the online learning stage more efficiently Wemake use of a large amount of available offline training dataand only require a small amount of data for online trainingas shown in Table 3 The weight of each weak classifier is animportant parameter The proposed method may be furtherimproved by using fuzzy membership function to evaluatethe confidence in GMM classifiers and reestimate the weightof each weak classifier

6 Discussions

Acted data is often considered not suitable for real worldapplications However traditional researches have been fo-cused on the acted emotion speech andmany acted databasesare available How to transfer an SER system that trained onthe acted data to the new naturalistic data in real world is anunsolved challenge

Many feature selection algorithms may be applied to SERsystem MIC is a newly proposed and powerful algorithm forexploring nonlinear relationship between variables

AdaBoost is a popular algorithm to ensemble multipleweak classifiers to establish a strong classifier By applying

Table 3 Selected datasets for online and offline experiments

Datasets index Data source Number ofutterances Purpose of use

Dataset 1 Acted speech 12000 Offline training

Dataset 2a Speakerindependent 1000 Online training

Dataset 2b Speakerindependent 10000 Testing

Dataset 3a Elicited speech 1000 Online trainingDataset 3b Elicited speech 5000 Testing

Table 4 Online and offline experimental results

Experimentindex

Offlinetraining set

Onlinetraining set Testing set Classification

result Experiment 1 Dataset 1 NA Dataset 2b 633Experiment 2 Dataset 1 Dataset 2a Dataset 2b 756Experiment 5 Dataset 2a NA Dataset 2b 700Experiment 3 Dataset 1 NA Dataset 3b 612Experiment 4 Dataset 1 Dataset 3a Dataset 3b 731Experiment 6 Dataset 3a NA Dataset 3b 685

AdaBoost in the online occasion we train multiple weakclassifiers based on the newly arrived online data The offlinepretrained classifiers are used for initialization We may ex-plore other incremental learning algorithms in the futurework

Acknowledgments

This work was partially supported by China Postdoctoral Sci-ence Foundation (no 2012M520973) National Nature Sci-ence Foundation (no 61231002 no 61273266 no 51075068)and Doctoral Fund of Ministry of Education of China (no20110092130004)The authors would like to thank the anony-mous reviewers for their valuable comments and helpfulsuggestions

References

[1] C Clavel I Vasilescu L Devillers G Richard and T EhretteldquoFear-type emotion recognition for future audio-based surveil-lance systemsrdquo Speech Communication vol 50 no 6 pp 487ndash503 2008

[2] C Huang Y Jin Y Zhao Y Yu and L Zhao ldquoSpeech emotionrecognition based on re-composition of two-class classifiersrdquoin Proceedings of the 3rd International Conference on AffectiveComputing and Intelligent Interaction andWorkshops (ACII rsquo09)Amsterdam The Netherlands September 2009

[3] K R Scherer ldquoVocal communication of emotion a review ofresearch paradigmsrdquo SpeechCommunication vol 40 no 1-2 pp227ndash256 2003

[4] A Tawari andMM Trivedi ldquoSpeech emotion analysis explor-ing the role of contextrdquo IEEE Transactions on Multimedia vol12 no 6 pp 502ndash509 2010

[5] F Burkhardt A Paeschke M Rolfes W Sendlmeier and BWeiss ldquoA database of German emotional speechrdquo inProceedings

Mathematical Problems in Engineering 9

of the 9th European Conference on Speech Communication andTechnology pp 1517ndash1520 Lissabon Portugal September 2005

[6] D Ververidis and C Kotropoulos ldquoAutomatic speech classifi-cation to five emotional states based on gender informationrdquo inProceedings of the 12th European Signal Processing Conferencepp 341ndash344 Vienna Austria 2004

[7] S SteidlAutomatic Classification of Emotion-RelatedUser Statesin Spontaneous Childrenrsquos Speech Department of Computer Sci-ence Friedrich-Alexander-Universitaet Erlangen-NuermbergBerlin Germany 2008

[8] M Grimm K Kroschel and S Narayanan ldquoThe Vera am Mit-tag German audio-visual emotional speech databaserdquo in Pro-ceedings of the IEEE International Conference onMultimedia andExpo (ICME rsquo08) pp 865ndash868 Hannover Germany June 2008

[9] K P Truong How Does Real Affect Affect Affect Recognitionin Speech Center for Telematics and Information TechnologyUniversity of Twente Enschede The Netherlands 2009

[10] C Huang Y Jin Y Zhao Y Yu and L Zhao ldquoRecognition ofpractical emotion from elicited speechrdquo in Proceedings of the 1stInternational Conference on Information Science and Engineer-ing (ICISE rsquo09) pp 639ndash642 Nanjing China December 2009

[11] R Polikar L Udpa S S Udpa and V Honavar ldquoLearn++an incremental learning algorithm for supervised neural net-worksrdquo IEEE Transactions on Systems Man and Cybernetics Cvol 31 no 4 pp 497ndash508 2001

[12] Q L Zhao Y H Jiang and M Xu ldquoIncremental learning byheterogeneous Bagging ensemblerdquo Lecture Notes in ComputerScience vol 6441 no 2 pp 1ndash12 2010

[13] R Xiao J Wang and F Zhang ldquoAn approach to incrementalSVM learning algorithmrdquo in Proceedings of the IEEE Interna-tional Conference on Tools with Artificial Intelligence pp 268ndash273 2000

[14] D N Reshef Y A Reshef H K Finucane et al ldquoDetectingnovel associations in large data setsrdquo Science vol 334 no 6062pp 1518ndash1524 2011

[15] D A Reynolds and R C Rose ldquoRobust text-independentspeaker identification using Gaussian mixture speaker modelsrdquoIEEE Transactions on Speech and Audio Processing vol 3 no 1pp 72ndash83 1995

[16] Y Freund and R E Schapire ldquoA decision-theoretic generaliza-tion of on-line learning and an application to boostingrdquo Journalof Computer and System Sciences vol 55 no 1 part 2 pp 119ndash139 1997

[17] Q ZhaoThe research on ensemble pruning and its application inon-line machine learning [PhD thesis] National University ofDefense Technology Changsha China 2010

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 8: Research Article Practical Speech Emotion Recognition ...

8 Mathematical Problems in Engineering

5 Experimental Results

In our experiment the offline training is carried out on theacted basic emotion dataset The speaker-independent data-set and the elicited practical emotion dataset are used for theonline training and the online testing Although the datasetsused in online testing are preprocessed utterances rather thanreal time online data our experiments still provide a simu-lated online situation We divide dataset 2 and dataset 3 intosmaller sets dataset 2a and dataset 2b which are used as thesimulated online initialization

Speech utterances from different sources are organizedinto several datasets as shown in Table 2

The online learning algorithm is verified both on thespeaker-independent data and the elicited data The resultsare shown in Table 4 A large number of speakers bring dif-ficulties in modeling emotional behavior since emotionexpression is highly dependent on individual habit and per-sonality By extending the offline trained classifier to theonline data that contains a large number of speakers weimproved the generality of our SER system The elicited datais collected in a cognitive experiment that is more close tothe real world situation During the cognitive task emotionalspeech is induced We observed that the different naturebetween the acted data and the induced speech during acognitive task caused a significant decrease of the recognitionrate By using the online training technique we may transferthe offline trained SER system to the elicited data Extendingour SER system to different data sources may bring emotionrecognition closer to real world applications

The major challenge in our online learning algorithm ishow to combine the existing offline classifier and efficientlyadapt the model parameters to a small number of new onlinedata We adopted the incremental learning idea and solvedthis problem by modifying the initial stage in the AdaBoostframework One of the contributions of our online learningalgorithm is that we may reuse the existing offline trainingdata and make the online learning stage more efficiently Wemake use of a large amount of available offline training dataand only require a small amount of data for online trainingas shown in Table 3 The weight of each weak classifier is animportant parameter The proposed method may be furtherimproved by using fuzzy membership function to evaluatethe confidence in GMM classifiers and reestimate the weightof each weak classifier

6. Discussion

Acted data is often considered unsuitable for real-world applications. However, traditional research has focused on acted emotional speech, and many acted databases are available. How to transfer an SER system trained on acted data to new naturalistic data in the real world remains an unsolved challenge.

Many feature selection algorithms may be applied to an SER system. MIC is a recently proposed and powerful measure for exploring nonlinear relationships between variables.
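As an illustration of MIC-based feature screening, the sketch below ranks acoustic features with the minepy package, a third-party MIC implementation that the paper does not name; treating the discrete emotion label as a numeric variable is a simplification of the paper's evaluation, and the function name is hypothetical.

```python
import numpy as np
from minepy import MINE  # third-party MIC implementation

def rank_features_by_mic(X, y, alpha=0.6, c=15):
    """Rank features by their MIC with the emotion label.
    X: (n_utterances, n_features) matrix, y: integer labels."""
    mine = MINE(alpha=alpha, c=c)
    scores = []
    for j in range(X.shape[1]):
        mine.compute_score(X[:, j], y.astype(float))
        scores.append(mine.mic())
    order = np.argsort(scores)[::-1]  # highest MIC first
    return order, np.asarray(scores)[order]

# e.g. keep the 50 features most associated with the label:
# top, mic_scores = rank_features_by_mic(X_train, y_train)
# X_train_sel = X_train[:, top[:50]]
```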

AdaBoost is a popular algorithm that ensembles multiple weak classifiers into a strong classifier. By applying AdaBoost in the online setting, we train multiple weak classifiers on the newly arrived online data, while the offline pretrained classifiers are used for initialization. We may explore other incremental learning algorithms in future work.
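For reference, each online round repeats the standard AdaBoost update of Freund and Schapire [16]; with labels $y_i \in \{-1,+1\}$, sample weights $w_t(i)$, and weak classifier $h_t$, one round reads:

\[
\epsilon_t = \sum_{i:\, h_t(x_i) \neq y_i} w_t(i), \qquad
\alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t},
\]
\[
w_{t+1}(i) = \frac{w_t(i)\, e^{-\alpha_t y_i h_t(x_i)}}{Z_t}, \qquad
H(x) = \operatorname{sign}\!\Bigl( \sum_{t} \alpha_t\, h_t(x) \Bigr),
\]

where $Z_t$ normalizes the weights; in the online stage, the sum in $H(x)$ runs over both the offline and the newly trained weak classifiers.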

Table 3: Selected datasets for online and offline experiments.

Dataset index   Data source           Number of utterances   Purpose of use
Dataset 1       Acted speech          12,000                 Offline training
Dataset 2a      Speaker-independent   1,000                  Online training
Dataset 2b      Speaker-independent   10,000                 Testing
Dataset 3a      Elicited speech       1,000                  Online training
Dataset 3b      Elicited speech       5,000                  Testing

Table 4: Online and offline experimental results.

Experiment index   Offline training set   Online training set   Testing set   Classification result (%)
Experiment 1       Dataset 1              N/A                   Dataset 2b    63.3
Experiment 2       Dataset 1              Dataset 2a            Dataset 2b    75.6
Experiment 5       Dataset 2a             N/A                   Dataset 2b    70.0
Experiment 3       Dataset 1              N/A                   Dataset 3b    61.2
Experiment 4       Dataset 1              Dataset 3a            Dataset 3b    73.1
Experiment 6       Dataset 3a             N/A                   Dataset 3b    68.5


Acknowledgments

This work was partially supported by the China Postdoctoral Science Foundation (no. 2012M520973), the National Natural Science Foundation of China (nos. 61231002, 61273266, and 51075068), and the Doctoral Fund of the Ministry of Education of China (no. 20110092130004). The authors would like to thank the anonymous reviewers for their valuable comments and helpful suggestions.

References

[1] C. Clavel, I. Vasilescu, L. Devillers, G. Richard, and T. Ehrette, "Fear-type emotion recognition for future audio-based surveillance systems," Speech Communication, vol. 50, no. 6, pp. 487–503, 2008.

[2] C. Huang, Y. Jin, Y. Zhao, Y. Yu, and L. Zhao, "Speech emotion recognition based on re-composition of two-class classifiers," in Proceedings of the 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops (ACII '09), Amsterdam, The Netherlands, September 2009.

[3] K. R. Scherer, "Vocal communication of emotion: a review of research paradigms," Speech Communication, vol. 40, no. 1-2, pp. 227–256, 2003.

[4] A. Tawari and M. M. Trivedi, "Speech emotion analysis: exploring the role of context," IEEE Transactions on Multimedia, vol. 12, no. 6, pp. 502–509, 2010.

[5] F. Burkhardt, A. Paeschke, M. Rolfes, W. Sendlmeier, and B. Weiss, "A database of German emotional speech," in Proceedings of the 9th European Conference on Speech Communication and Technology, pp. 1517–1520, Lisbon, Portugal, September 2005.

[6] D. Ververidis and C. Kotropoulos, "Automatic speech classification to five emotional states based on gender information," in Proceedings of the 12th European Signal Processing Conference, pp. 341–344, Vienna, Austria, 2004.

[7] S. Steidl, Automatic Classification of Emotion-Related User States in Spontaneous Children's Speech, Department of Computer Science, Friedrich-Alexander-Universitaet Erlangen-Nuernberg, Berlin, Germany, 2008.

[8] M. Grimm, K. Kroschel, and S. Narayanan, "The Vera am Mittag German audio-visual emotional speech database," in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '08), pp. 865–868, Hannover, Germany, June 2008.

[9] K. P. Truong, How Does Real Affect Affect Affect Recognition in Speech, Center for Telematics and Information Technology, University of Twente, Enschede, The Netherlands, 2009.

[10] C. Huang, Y. Jin, Y. Zhao, Y. Yu, and L. Zhao, "Recognition of practical emotion from elicited speech," in Proceedings of the 1st International Conference on Information Science and Engineering (ICISE '09), pp. 639–642, Nanjing, China, December 2009.

[11] R. Polikar, L. Udpa, S. S. Udpa, and V. Honavar, "Learn++: an incremental learning algorithm for supervised neural networks," IEEE Transactions on Systems, Man, and Cybernetics C, vol. 31, no. 4, pp. 497–508, 2001.

[12] Q. L. Zhao, Y. H. Jiang, and M. Xu, "Incremental learning by heterogeneous Bagging ensemble," Lecture Notes in Computer Science, vol. 6441, no. 2, pp. 1–12, 2010.

[13] R. Xiao, J. Wang, and F. Zhang, "An approach to incremental SVM learning algorithm," in Proceedings of the IEEE International Conference on Tools with Artificial Intelligence, pp. 268–273, 2000.

[14] D. N. Reshef, Y. A. Reshef, H. K. Finucane et al., "Detecting novel associations in large data sets," Science, vol. 334, no. 6062, pp. 1518–1524, 2011.

[15] D. A. Reynolds and R. C. Rose, "Robust text-independent speaker identification using Gaussian mixture speaker models," IEEE Transactions on Speech and Audio Processing, vol. 3, no. 1, pp. 72–83, 1995.

[16] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," Journal of Computer and System Sciences, vol. 55, no. 1, part 2, pp. 119–139, 1997.

[17] Q. Zhao, The research on ensemble pruning and its application in on-line machine learning [Ph.D. thesis], National University of Defense Technology, Changsha, China, 2010.
