multimodal emotion recognition and expressivity analysis
ICME 2005 Special Session
Stefanos Kollias, Kostas Karpouzis
Image, Video and Multimedia Systems Lab, National Technical University of Athens
July 7, 2005, multimodal emotion recognition and expressivity analysis, ICME 2005 Special Session
expressivity and emotion recognition
• affective computing
  – the capability of machines to recognize, express, model, communicate and respond to emotional information
• computers need the ability to recognize human emotion
  – everyday HCI is emotional: three-quarters of computer users admit to swearing at computers
  – user input and system reaction are important to pinpoint problems or provide natural interfaces
the targeted interaction framework
• Generating intelligent interfaces with affective, learning, reasoning and adaptive capabilities.
• Multidisciplinary expertise is the basic means for novel interfaces, including perception and emotion recognition, semantic analysis, cognition, modelling and expression generation, and production of multimodal avatars capable of adapting to the goals and context of the interaction.
• Humans function through four primary modes of being: affect, motivation, cognition, and behavior; these are related to feeling, wanting, thinking, and acting.
• Affect is particularly difficult, requiring us to understand and model the causes and consequences of emotions; the latter, especially as realized in behavior, is a daunting task.
• detect specific incidents/situations that need human intervention
  – e.g. anger detection in a call center
• naturalistic interfaces
  – the keyboard/mouse/pointer paradigm can be difficult for the elderly, people with disabilities, or children
  – speech and gesture interfaces can be useful
the EU perspective
• Until 2002, related research was dominated by mid-scale projects
  – ERMIS: multimodal emotion recognition (facial expressions, linguistic and prosody analysis)
  – NECA: networked affective ECAs
  – SAFIRA: affective input interfaces
  – NICE: Natural Interactive Communication for
moving forward
• future EU orientations include (extracted from Call 1 evaluation, 2004):
  – adaptability and re-configurable interfaces
  – collaborative technologies and interfaces in the arts
  – less explored modalities, e.g. haptics, bio-sensing
  – affective computing, including character and facial expression recognition and animation
  – more product innovation and industrial impact
the special session
• segment-based approach to the recognition of emotions in speech
  – M. Shami, M. Kamel, University of Waterloo
• comparing feature sets for acted and spontaneous speech in view of automatic emotion recognition
  – T. Vogt, E. Andre, University of Augsburg
• annotation and detection of blended emotions in real human-human dialogs recorded in a call center
  – L. Vidrascu, L. Devillers, LIMSI-CNRS, France
• a real-time lip sync system using a genetic algorithm for automatic neural network configuration
  – G. Zoric, I. Pandzic, University of Zagreb
• visual/acoustic emotion recognition
  – Cheng-Yao Chen, Yue-Kai Huang, Perry Cook, Princeton University
• an intelligent system for facial emotion recognition
  – R. Cowie, E. Douglas-Cowie, Queen’s University of Belfast, J. Taylor, King's College, S. Ioannou, M. Wallace, IVML/NTUA
the big picture
• feature extraction from multiple modalities
  – prosody, words, face, gestures, biosignals…
• unimodal recognition
• multimodal recognition
• using detected features to cater for affective interaction
other visual features
• visemes, eye gaze, head pose
  – movement patterns, temporal correlations
• hand gestures, body movements
  – deictic/conversational gestures
  – “body language”
• measurable parameters to render expressivity on affective ECAs
  – spatial extent, repetitiveness, volume, etc.
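Such expressivity parameters can be estimated directly from tracked motion. The sketch below computes three of them from a 2D hand trajectory; the concrete definitions (bounding-box diagonal for spatial extent, mean frame-to-frame speed for activation, direction reversals for repetitiveness) are illustrative assumptions, not the formulas used by any particular ECA system:

```python
import numpy as np

def expressivity_parameters(trajectory):
    """Estimate simple expressivity parameters from a 2D hand
    trajectory of shape (frames, 2). Definitions are illustrative."""
    traj = np.asarray(trajectory, dtype=float)
    # spatial extent: diagonal of the bounding box covered by the gesture
    extent = float(np.linalg.norm(traj.max(axis=0) - traj.min(axis=0)))
    # activation: mean frame-to-frame displacement (average speed)
    speed = np.linalg.norm(np.diff(traj, axis=0), axis=1)
    activation = float(speed.mean()) if len(speed) else 0.0
    # repetitiveness: number of direction reversals along the x axis
    dx = np.diff(traj[:, 0])
    reversals = int(np.sum(np.signbit(dx[1:]) != np.signbit(dx[:-1])))
    return {"spatial_extent": extent,
            "activation": activation,
            "repetitiveness": reversals}
```

A back-and-forth waving gesture would score low on spatial extent but high on repetitiveness, matching the intuition behind the parameters.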
video analysis using 3D
• Step 1: Scan or approximate a 3D model (in this case estimated from video data only, using a face space approach)
video analysis using 3D
• Step 2: Represent the 3D model using a predefined template geometry; the same template is used for expressions.
  The template has higher density around the eyes and mouth, and lower density around flatter areas such as the cheeks and forehead.
video analysis using 3D
• Step 3: Construct a database of facial expressions by recording various actors. The statistics derived from these performances are stored as a “Dynamic Face Space”.
• Step 4: Apply the expressions to the actor in the video data.
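One plausible way to build such a statistical space of expressions is principal component analysis over the recorded vertex data. The sketch below is an assumption about the construction, not the actual "Dynamic Face Space" implementation: stack per-frame vertex coordinates, subtract the mean face, and keep the leading deformation modes.

```python
import numpy as np

def build_face_space(expression_frames, n_components=2):
    """Sketch: rows are flattened per-frame vertex coordinates.
    Returns the mean face and the leading principal modes."""
    X = np.asarray(expression_frames, dtype=float)
    mean_face = X.mean(axis=0)
    # SVD of the centred data yields the principal deformation modes
    U, S, Vt = np.linalg.svd(X - mean_face, full_matrices=False)
    return mean_face, Vt[:n_components]

def project(frame, mean_face, modes):
    """Expression coefficients of one frame in the face space."""
    return modes @ (np.asarray(frame, dtype=float) - mean_face)
```

New expressions can then be synthesized as the mean face plus a weighted sum of the modes, which is what makes Step 4 (re-applying expressions to the actor) possible.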
video analysis using 3D
• Step 5: Matching: rotate the head, apply various expressions, and match the current state against the 2D video frame
  – global minimization process
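The matching step can be sketched as analysis-by-synthesis over a discrete grid of candidates: render the model under each (pose, expression) pair and keep the pair whose rendering is closest to the video frame. Here `render` is a hypothetical callback standing in for the actual 3D renderer, and exhaustive search stands in for the global minimization:

```python
import numpy as np

def match_frame(frame, render, poses, expressions):
    """Analysis-by-synthesis sketch: minimize pixel MSE between a
    synthetic rendering and the observed 2D video frame."""
    best, best_err = None, np.inf
    for pose in poses:
        for expr in expressions:
            synth = render(pose, expr)          # hypothetical renderer
            err = float(np.mean((synth - frame) ** 2))  # pixel MSE
            if err < best_err:
                best, best_err = (pose, expr), err
    return best, best_err
```

This brute-force loop also makes the cost structure visible: every candidate requires a full rendering plus an image comparison, which is why the real minimization runs to minutes per frame, as noted on a later slide.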
video analysis using 3D
• the global matching/minimization process is complex
• it is sensitive to
  – illumination, which may vary across the sequence
  – shading and shadowing effects on the face
  – color changes or color differences
  – variability in expressions: some expressions cannot be generated from the statistics of the a priori recorded sequences
• it is time consuming (several minutes per frame)
video analysis using 3D
• local template matching + pose estimation
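Local template matching of this kind is commonly implemented with normalized cross-correlation. The brute-force sketch below illustrates the idea (it is not the system's actual tracker): slide the template over the image and return the offset with the highest correlation score.

```python
import numpy as np

def find_template(image, template):
    """Return the (row, col) offset where the template best matches
    the image, scored by normalized cross-correlation in [-1, 1]."""
    ih, iw = image.shape
    th, tw = template.shape
    t = template - template.mean()
    tn = np.linalg.norm(t) + 1e-12          # avoid division by zero
    best, best_score = (0, 0), -np.inf
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            patch = image[y:y+th, x:x+tw]
            p = patch - patch.mean()
            score = float((p * t).sum() / ((np.linalg.norm(p) + 1e-12) * tn))
            if score > best_score:
                best, best_score = (y, x), score
    return best, best_score
```

Mean subtraction and normalization make the score insensitive to local brightness and contrast, which addresses part of the illumination sensitivity listed on the previous slide.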
video analysis using 3D
video analysis using 3D
• 3D models
video analysis using 3D
• add expressions
auditory module
• linguistic analysis aims to extract the words that the speaker produces
• paralinguistic analysis aims to extract significant variations in the way words are produced - mainly in pitch, loudness, timing, and ‘voice quality’
• both are designed to cope with the less than perfect signals that are likely to occur in real use
linguistic analysis
(a) The Linguistic Analysis Subsystem: the speech signal passes through a Signal Enhancement/Adaptation Module (short-term spectral domain, singular value decomposition); the enhanced speech signal is fed to the Speech Recognition Module, and the resulting text goes through a Text Post-Processing Module, which outputs the linguistic parameters.
(b) The Speech Recognition Module: the enhanced speech signal enters a Parameter Extraction Module, whose output drives a Search Engine backed by a Dictionary, Acoustic Modeling, and Language Modeling, producing text.
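The subsystem's data flow can be sketched as a straightforward composition of the three stages; each module is left as a hypothetical callback, since the slides describe the architecture rather than the internals:

```python
def linguistic_analysis(speech_signal, enhance, recognize, post_process):
    """Wire the three stages of the linguistic analysis subsystem
    as shown in the block diagram. All three stages are callbacks."""
    enhanced = enhance(speech_signal)   # signal enhancement/adaptation
    text = recognize(enhanced)          # speech recognition module
    return post_process(text)           # -> linguistic parameters
```

Keeping the stages as pluggable callbacks mirrors the modular design of the diagram: the recognizer can be swapped without touching enhancement or post-processing.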
paralinguistic analysis
• ASSESS, developed by QUB, describes speech at multiple levels
  – intensity & spectrum; edits, pauses, frication; raw pitch estimates & a smooth fitted curve; rises & falls in intensity & pitch
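A toy version of the lowest levels of such a description, per-frame intensity and a raw pitch estimate, might look like the sketch below; it uses RMS energy and a plain autocorrelation peak search, which is only a stand-in for the richer multi-level analysis ASSESS performs:

```python
import numpy as np

def frame_features(frame, sr):
    """Illustrative paralinguistic features for one speech frame:
    RMS intensity and an autocorrelation-based pitch estimate."""
    frame = np.asarray(frame, dtype=float)
    intensity = float(np.sqrt(np.mean(frame ** 2)))   # RMS energy
    # autocorrelation; search lags corresponding to 50-400 Hz
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / 400), int(sr / 50)
    lag = lo + int(np.argmax(ac[lo:hi]))
    pitch = sr / lag
    return intensity, pitch
```

Running such a feature extractor over successive frames yields the raw pitch and intensity contours from which rises, falls, and pauses can then be detected.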
integrating the evidence
Facial, phonetic, and linguistic emotion state detection feed the overall emotional state detection.
• level 1:
  – facial emotion
  – phonetic emotion
  – linguistic emotion
• level 2:
  – “total” emotional state (the result of combining the level 1 emotions)
• modeling technique: fuzzy set theory (research by Massaro suggests this models the way humans integrate signs)
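As an illustration of the two-level scheme, the sketch below combines per-modality membership degrees into a "total" emotional state. Note the aggregation here is a simple weighted average with made-up weights; the actual system integrates the evidence with fuzzy set theory:

```python
def fuse_emotions(facial, phonetic, linguistic, weights=(0.5, 0.3, 0.2)):
    """Level-1 modules each return membership degrees per emotion
    label (dicts); level 2 combines them and picks the strongest.
    Weights are illustrative, not those of the actual system."""
    labels = set(facial) | set(phonetic) | set(linguistic)
    fused = {}
    for label in labels:
        fused[label] = (weights[0] * facial.get(label, 0.0)
                        + weights[1] * phonetic.get(label, 0.0)
                        + weights[2] * linguistic.get(label, 0.0))
    best = max(fused, key=fused.get)
    return best, fused
```

Because each modality contributes a graded degree rather than a hard label, conflicting evidence (e.g. an angry voice with neutral wording) degrades the fused score gracefully instead of forcing an early decision.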