Facial expression as an input annotation modality for affective speech-to-speech translation
Éva Székely, Zeeshan Ahmed, Ingmar Steiner, Julie Carson-Berndsen
University College Dublin
Transcript
Slide 1
Slide 2
Facial expression as an input annotation modality for affective speech-to-speech translation
Éva Székely, Zeeshan Ahmed, Ingmar Steiner, Julie Carson-Berndsen
University College Dublin
Slide 3
Introduction
- Expressive speech synthesis in human interaction
- Speech-to-speech translation has audiovisual input, so the affective state does not need to be predicted from the text
Slide 4
Introduction
- Goal: transfer paralinguistic information from the source to the target language by means of an intermediate, symbolic representation: facial expression as an input annotation modality
- FEAST: Facial Expression-based Affective Speech Translation
Slide 5
System architecture of FEAST
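The slide shows the architecture as a diagram. Below is a minimal Python sketch of the same pipeline, with the stage callables standing for the components described on the following slides (SHORE-based face analysis, utterance-level SVM classification, style selection, MARY TTS synthesis). Since speech recognition and machine translation are listed as future work on slide 21, the translated text is assumed to be given.

```python
from typing import Callable, Iterable

# Mapping from classified emotion to MARY TTS voice style (slide 8).
EMOTION_TO_STYLE = {"happy": "cheerful", "sad": "depressed",
                    "angry": "aggressive", "neutral": "neutral"}

def feast_pipeline(frames: Iterable,        # decoded video frames
                   translated_text: str,    # MT output; future work, assumed given
                   analyse_face: Callable,  # per-frame analysis (SHORE, slide 6)
                   classify: Callable,      # utterance-level SVM (slides 7 and 11)
                   synthesise: Callable) -> bytes:
    """One utterance through the FEAST stages:
    analysis -> classification -> style selection -> synthesis."""
    scores = [analyse_face(f) for f in frames]
    emotion = classify(scores)
    return synthesise(translated_text, EMOTION_TO_STYLE[emotion])
```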
Slide 6
Face detection and analysis
- SHORE library for real-time face detection and analysis
- http://www.iis.fraunhofer.de/en/bf/bsy/produkte/shore/
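SHORE is a closed-source Fraunhofer C++ library, so the analysis call below is a hypothetical stand-in, not its real API; only the kind of output (a detected face plus per-frame expression scores) follows Fraunhofer's description. The frame extraction uses OpenCV for illustration.

```python
import cv2  # OpenCV, used here only to decode the input video

def extract_frames(video_path: str):
    """Yield video frames for per-frame face analysis."""
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        yield frame
    cap.release()

def analyse_face(frame):
    """Hypothetical stand-in for a SHORE call: a real binding would return a
    face box plus expression scores (e.g. happy/sad/angry/surprised)."""
    raise NotImplementedError("replace with a binding to the SHORE engine")
```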
Slide 7
Emotion classification and style selection
- Aim of the facial expression analysis in the FEAST system: a single decision regarding the emotional state of the speaker over each utterance
- Visual emotion classifier, trained on segments of the SEMAINE database, with input features from SHORE
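The slides do not specify how per-frame SHORE scores become one decision per utterance; a plausible sketch, assuming the frame-level expression scores are pooled with simple statistics before a single SVM prediction (the pooling choice is an assumption, only the one-decision-per-utterance aim is from the slide):

```python
import numpy as np

def utterance_features(frame_scores):
    """Pool per-frame expression scores (n_frames x n_expressions) into one
    fixed-length vector; mean/std/max pooling is an assumption here."""
    s = np.asarray(frame_scores, dtype=float)
    return np.concatenate([s.mean(axis=0), s.std(axis=0), s.max(axis=0)])

def classify_utterance(frame_scores, svm):
    """Single emotion label for the whole utterance, using the trained SVM
    from Experiment 1 (slide 11)."""
    return svm.predict([utterance_features(frame_scores)])[0]
```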
Slide 8
Expressive speech synthesis
- Expressive unit-selection synthesis using the open-source synthesis platform MARY TTS
- German male voice dfki-pavoque-styles with four styles: cheerful, depressed, aggressive, neutral
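A minimal sketch of requesting one of the pavoque styles from a running MARY TTS server over its HTTP interface. The host/port and the STYLE request parameter are assumptions based on the generic MARY 5 interface, not taken from the slides; the voice name and the four styles are from the slide.

```python
import requests  # assumes a local MARY TTS server on its default port

def synthesise(text: str, style: str,
               host: str = "http://localhost:59125") -> bytes:
    """Request expressive German synthesis from MARY TTS and return WAV bytes.

    STYLE as the parameter name is an assumption based on MARY's
    generic HTTP interface.
    """
    params = {
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE_FILE",
        "LOCALE": "de",
        "VOICE": "dfki-pavoque-styles",
        "STYLE": style,  # cheerful | depressed | aggressive | neutral
    }
    resp = requests.get(host + "/process", params=params)
    resp.raise_for_status()
    return resp.content
```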
Slide 9
The SEMAINE database (semaine-db.eu)
- Audiovisual database collected to study natural social signals occurring in English conversations
- Conversations with four emotionally stereotyped characters:
  - Poppy (happy, outgoing)
  - Obadiah (sad, depressive)
  - Spike (angry, confrontational)
  - Prudence (even-tempered, sensible)
Slide 10
Evaluation experiments
1. Does the system accurately classify emotion at the utterance level, based on the facial expression in the video input?
2. Do the synthetic voice styles succeed in conveying the target emotion category?
3. Do listeners agree with the cross-lingual transfer of paralinguistic information from the multimodal stimuli to the expressive synthetic output?
Slide 11
Experiment 1: Classification of facial expressions
- Support Vector Machine (SVM) classifier trained on utterances of the male operators from the SEMAINE database
- 535 utterances used for training, 107 for testing
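The slide gives only the classifier type and the split sizes; a sketch of the corresponding training and evaluation step with scikit-learn, where the feature matrices stand for the pooled SHORE features (slide 7) and the RBF kernel is an assumption:

```python
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix

def train_and_evaluate(X_train, y_train, X_test, y_test):
    """Train and evaluate an utterance-level emotion SVM.

    X_* are utterance feature vectors, y_* the emotion labels; in the
    paper's setup there are 535 training and 107 test utterances from
    the SEMAINE male operators. The kernel choice is an assumption,
    the slide only states that an SVM is used.
    """
    svm = SVC(kernel="rbf")
    svm.fit(X_train, y_train)
    pred = svm.predict(X_test)
    print("accuracy:", accuracy_score(y_test, pred))
    print(confusion_matrix(y_test, pred))
    return svm
```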
Slide 12
Experiment 2: Perception of expressive synthesis
- Perception experiment with 20 subjects
- Subjects listened to natural and synthesised stimuli and chose which voice style describes the utterance best: cheerful, depressed, aggressive, neutral
Slide 13
Experiment 2: Results
Slide 14
Experiment 3: Adequacy for speech-to-speech translation
- Perceptual experiment with 14 bilingual participants
- 24 utterances from the SEMAINE operator data and their corresponding translations in each voice style
- Listeners were asked to choose which German translation matches the original video best
Slide 15
Examples - Poppy (happy)
Slide 16
Examples - Prudence (neutral)
Slide 17
Examples - Spike (angry)
Slide 18
Examples - Obadiah (sad)
Slide 19
Experiment 3: Results
Slide 20
Conclusion
- Preserving the paralinguistic content of a message across languages is possible with significantly greater-than-chance accuracy
- The visual emotion classifier performed with an overall accuracy of 63.5%
- Cheerful/happy is often mistaken for neutral (conditioned by the voice)
Slide 21
Future Work
- Extend the classifier to predict the affective state of the user from acoustic and prosodic analysis as well as facial expressions
- Demonstrate the prototype system with live input through a webcam and microphone
- Integrate a speech recogniser and a machine translation component