Non intrusive vision and acoustic based emotion recognition of driver in Advanced Driver Assistance System

Emotion recognition using facial expressions and speech

Sep 13, 2014

Page 1: Emotion recognition using facial expressions and speech

Non intrusive vision and acoustic based emotion recognition of driver in Advanced Driver Assistance System

Page 2: Emotion recognition using facial expressions and speech

Motivation

• Driving is one of the most dangerous tasks in our everyday lives.

• Some statistics in Vijayawada

http://www.aptransport.org/html/accidents.htm

Page 3: Emotion recognition using facial expressions and speech

• The majority of road accidents are due to the driver’s inattentiveness.

• The main cause of poor driver attention is the driver’s emotional state or mood (for example sadness, anger, joy, pleasure, despair or irritation).

• Emotions are generally measured by analyzing head movement patterns, eyelid movements, facial expressions, or all of these together.

• In this project, we develop a system to identify the emotions of the driver using non-intrusive methods.

Page 4: Emotion recognition using facial expressions and speech

Emotion

• Researchers have crisply identified more than 300 emotions; however, not all of them are experienced in day-to-day life.

• Palette theory states that any emotion is a composition of six primary emotions, just as any color is a combination of three primary colors.

• Anger, disgust, fear, happiness, sadness and surprise are considered the primary or basic emotions, and are also referred to as archetypal emotions.

http://2wanderlust.files.wordpress.com/2009/03/picture-2.png

Page 5: Emotion recognition using facial expressions and speech

Face recognition techniques

Different face recognition techniques are:

• Model based: a 3D model is constructed based on the facial variations in the image.
  Disadvantages:
  * Requires an expensive (stereo vision) camera.
  * Constructing the 3D model is difficult and time-consuming.

• Appearance based: performance depends on the quality of the extracted features.

• Feature based: describes the position and size of each feature (eye, nose, mouth or face outline).
  Disadvantages:
  * Extracting features under different poses (viewing conditions) and lighting conditions is a very complex task.
  * For applications with a large database, with a large set of features of different sizes and positions, identifying the feature points is difficult.

Page 6: Emotion recognition using facial expressions and speech

Feature Extraction from the Visual Information

• Appearance-based linear subspace techniques extract global features, since they use statistical properties of the image such as its mean and variance.

• Challenge: the major difficulty in applying these techniques to large databases is that the computational load and memory required to calculate the features grow dramatically with database size.

• Solution: to improve the performance of feature extraction, nonlinear feature extraction techniques were introduced.
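As a sketch only (not the implementation used in this project), an appearance-based linear subspace method such as PCA ("eigenfaces") can be written in a few lines of NumPy; the random "face" matrix and component count below are illustrative placeholders:

```python
import numpy as np

def pca_features(faces, k):
    """Project flattened face images onto the top-k principal components.

    faces: (n_samples, n_pixels) matrix, one flattened image per row.
    Returns (features, mean image, components).
    """
    mean = faces.mean(axis=0)              # statistical property: the mean image
    centered = faces - mean
    # SVD of the centered data yields the principal axes ("eigenfaces")
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:k]                    # (k, n_pixels)
    features = centered @ components.T     # global features, (n_samples, k)
    return features, mean, components

# toy usage with random stand-in "images"
rng = np.random.default_rng(0)
faces = rng.normal(size=(10, 64))          # 10 flattened 8x8 images
feats, mean, comps = pca_features(faces, k=5)
```

The per-image feature vector is short (k values) regardless of image size, which is exactly where the memory pressure mentioned above comes from: the SVD itself must still process the whole database.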

Page 7: Emotion recognition using facial expressions and speech

Nonlinear feature extraction techniques

• Radon transform
• Wavelet transform

The Radon-transform-based nonlinear feature extraction gives the direction of the local features. When features are extracted using the Radon transform, the variations in these facial frequencies are also boosted. The wavelet transform gives the spatial and frequency components present in an image.
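A minimal sketch of the Radon projection idea (rotate the image, then sum along one axis) in plain NumPy, using nearest-neighbour rotation; the test image and angle set are made up for illustration, and a real system would use an optimized transform:

```python
import numpy as np

def rotate_nn(img, theta):
    """Rotate an image by theta radians about its centre (nearest neighbour)."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    c, s = np.cos(theta), np.sin(theta)
    # inverse-map each output pixel back to its source coordinate
    sx = np.clip(np.round(c * (xs - cx) + s * (ys - cy) + cx).astype(int), 0, w - 1)
    sy = np.clip(np.round(-s * (xs - cx) + c * (ys - cy) + cy).astype(int), 0, h - 1)
    return img[sy, sx]

def radon(img, angles_deg):
    """Each sinogram column is the projection (column sums) at one angle."""
    return np.stack(
        [rotate_nn(img, np.deg2rad(a)).sum(axis=0) for a in angles_deg], axis=1)

# toy usage: a vertical bar produces a sharp peak in the 0-degree projection
img = np.zeros((32, 32))
img[8:24, 14:18] = 1.0
sinogram = radon(img, angles_deg=[0, 45, 90, 135])
```

The angle whose projection is most sharply peaked indicates the dominant direction of a local feature, which is the directional information referred to above.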

Page 8: Emotion recognition using facial expressions and speech

Performance comparison of different face recognition approaches

Page 9: Emotion recognition using facial expressions and speech

Feature Extraction from Acoustic Information

The important voice features to consider for emotion classification are:
• Fundamental frequency (F0), or pitch
• Intensity (energy)
• Speaking rate
• Voice quality

Many other features may be extracted or calculated from the voice signal:
• Formants
• Vocal tract cross-section areas
• Mel frequency cepstral coefficients (MFCC)
• Linear frequency cepstral coefficients (LFCC)
• Linear predictive coding (LPC)
• Teager-energy-operator-based features

Pitch is the fundamental frequency of the audio signal (the highness or lowness of a sound). The MFCC is a “spectrum of the spectrum”, used to find the number of voices in the speech. The Teager energy operator is used to find the number of harmonics due to nonlinear air flow in the vocal tract. LPC provides an accurate and economical representation of the envelope of the short-time power spectrum. LFCC is similar to MFCC but without the perceptually oriented transformation onto the Mel frequency scale; it emphasizes changes or periodicity in the spectrum while being relatively robust against noise. These features are measured via the mean, range, variance and transmission duration between utterances.
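The two simplest features on the list, pitch and intensity, can be sketched per analysis frame with NumPy; the autocorrelation approach and the 50–500 Hz search band below are common conventions, not taken from this project, and the 200 Hz test tone is synthetic:

```python
import numpy as np

def frame_features(frame, sr, fmin=50.0, fmax=500.0):
    """Estimate pitch (F0, via autocorrelation) and intensity (RMS energy)."""
    x = frame - frame.mean()
    energy = np.sqrt(np.mean(x ** 2))                 # intensity (RMS)
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # autocorrelation, lag >= 0
    lag_lo, lag_hi = int(sr / fmax), int(sr / fmin)    # plausible pitch periods
    lag = lag_lo + np.argmax(ac[lag_lo:lag_hi + 1])    # strongest periodicity
    return sr / lag, energy

# toy usage: a 200 Hz sine sampled at 8 kHz
sr = 8000
t = np.arange(1024) / sr
f0, energy = frame_features(np.sin(2 * np.pi * 200 * t), sr)
```

Real front ends add voicing detection (an unvoiced frame has no meaningful F0) before tracking pitch statistics such as mean, range and variance across an utterance.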

Page 10: Emotion recognition using facial expressions and speech

Advantages and Disadvantages of using acoustic features for detecting emotions

Advantages:
• We can often detect a speaker’s emotion even if we cannot understand the language.
• Speech is easy to record even under extreme environmental conditions (temperature, high humidity and bright light), and requires cheap, durable and maintenance-free sensors.

Disadvantages:
• Acoustic cues depend on age and gender. Angry males show higher energy levels than angry females, and males have been found to express anger with a slow speech rate, as opposed to females, who employ a fast speech rate.

Page 11: Emotion recognition using facial expressions and speech

Previous Work On Emotion Detection From Speech

• Schuller et al. [3] used a Hidden Markov Model based approach for speech emotion recognition and achieved an overall accuracy of about 87%.

• In [4], using spectral features and GMM-supervector-based SVMs, emotion recognition reached an accuracy level of more than 90% in some cases.

• Many other approaches to emotion recognition have been tried, such as a decision-tree-based approach in [5] and a rough-set-and-SVM-based approach in [6].

• ANN- and HMM-based multilevel speech emotion recognition work was done in [7].

• Some authors have carried out comparative studies of two or more approaches to emotion detection using speech [8] [9].

• Speaker-dependent and speaker-independent studies have also been done [9], showing that different approaches give different accuracy levels in the two cases.

• The features used affect emotion recognition [10], so a proper feature set must be chosen.

• Since a large number of features can be extracted from audio, some work on feature selection methods has also been done [11].
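The HMM-based approach of [3] is not reproduced here, but the likelihood computation at the heart of any HMM classifier (train one model per emotion, then pick the model with the highest likelihood for an utterance) can be sketched with the scaled forward algorithm; every matrix below is a made-up toy value, not data from the cited work:

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM.

    obs: sequence of symbol indices
    pi:  (n_states,) initial state distribution
    A:   (n_states, n_states) transitions, A[i, j] = P(state j | state i)
    B:   (n_states, n_symbols) emission probabilities
    """
    alpha = pi * B[:, obs[0]]
    log_lik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()          # rescale each step to avoid underflow
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        log_lik += np.log(alpha.sum())
        alpha = alpha / alpha.sum()
    return log_lik

# toy classification between two hypothetical emotion models
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B_angry = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
B_neutral = np.array([[0.1, 0.2, 0.7], [0.3, 0.4, 0.3]])
obs = [0, 0, 1, 0]
scores = {name: forward_log_likelihood(obs, pi, A, B)
          for name, B in [("angry", B_angry), ("neutral", B_neutral)]}
best = max(scores, key=scores.get)
```

In a real system the observations would be vector-quantized (or Gaussian-modelled) acoustic feature frames such as the MFCCs from the previous slides, and the model parameters would be trained with Baum-Welch.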

Page 12: Emotion recognition using facial expressions and speech

Recent work

• Using 3D shape information: increased availability of 3D databases and affordable 3D sensors. 3D shape information provides invariance against head pose and illumination conditions.

• Using thermal cameras.

• Integration of audio, video and body language.

Page 13: Emotion recognition using facial expressions and speech

References

• [1] H. D. Vankayalapati and K. Kyamakya, “Nonlinear Feature Extraction Approaches for Scalable Face Recognition Applications,” ISAST Transactions on Computers and Intelligent Systems, vol. 2, 2009.
• [2] Sandeep Kotte, “Extraction of Visual and Acoustic Features of the Driver for Real-Time Driver Monitoring System.”
• [3] B. Schuller, G. Rigoll and M. Lang, “Hidden Markov Model-based Speech Emotion Recognition,” IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’03), 2003.
• [4] Hao Hu, Ming-Xing Xu and Wei Wu, “GMM Supervector Based SVM with Spectral Features for Speech Emotion Recognition,” IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2007), 2007.
• [5] Chi-Chun Lee, Emily Mower, Carlos Busso, Sungbok Lee and Shrikanth S. Narayanan, “Emotion Recognition Using a Hierarchical Binary Decision Tree Approach,” Proceedings of InterSpeech, 2009.
• [6] Jian Zhou, Guoyin Wang, Yong Yang and Peijun Chen, “Speech Emotion Recognition Based on Rough Set and SVM,” 5th IEEE International Conference on Cognitive Informatics (ICCI 2006), 2006.
• [7] Xia Mao, Lijiang Chen and Liqin Fu, “Multi-Level Speech Emotion Recognition Based on HMM and ANN,” World Congress on Computer Science and Information Engineering, 2009.
• [8] A. A. Razak, R. Komiya, M. Izani and Z. Abidin, “Comparison between Fuzzy and NN Method for Speech Emotion Recognition,” Third International Conference on Information Technology and Applications (ICITA 2005), 2005.
• [9] Theodoros Iliou and Christos-Nikolaos Anagnostopoulos, “SVM-MLP-PNN Classifiers on Speech Emotion Recognition Field: A Comparative Study,” Fifth International Conference on Digital Telecommunications, 2010.
• [10] Anton Batliner, Stefan Steidl, Bjorn Schuller, Dino Seppi, Thurid Vogt, Johannes Wagner, Laurence Devillers, Laurence Vidrascu, Vered Aharonson, Loic Kessous and Noam Amir, “Searching for the Most Important Feature Types Signalling Emotion-Related User States in Speech,” Computer Speech & Language, 2009.
• [11] Ling Cen, Wee Ser and Zhu Liang Yu, “Speech Emotion Recognition Using Canonical Correlation Analysis and Probabilistic Neural Network,” Seventh International Conference on Machine Learning and Applications, 2008.
• [12] Dimitrios Ververidis and Constantine Kotropoulos, “Emotional Speech Recognition: Resources, Features, and Methods,” Speech Communication, 48(9):1162–1181, 2006.
• [13] K. Sreenivasa Rao and Shashidhar G. Koolagudi, “Emotion Recognition Using Speech Features.”