Post on 31-Mar-2015
JC Martin - LIMSI/CNRS - WP5 WS
1
Manual Annotation of Multimodal Behaviors in Emotional TV Interviews
J.-C. Martin, S. Abrilian, L. Devillers
LIMSI-CNRS, France
Outline
- Introduction
  - Goals
  - Requirements on annotation
  - Emotional parameters of mm behaviors
- Coding scheme
  - 1st coding scheme and annotation
  - 2nd coding scheme and example on 1 video
- Future directions
Introduction
Introduction: Goals
- How do modalities correlate in non-acted emotions?
- Annotations and models: one source of knowledge
  - Coordination between modalities during non-acted emotion
  - Synthesis of non-acted spontaneous multimodal emotions in ECAs
- How to code/represent multimodal emotional behavior?
  - Methodology (which attributes can easily be annotated manually)
  - Trade-off / intermediate level
    - Manual global text free whole video
    - Manual medium/high order signs
    - Automatic low level signs
- WP5 + WP6 + WP4 + (WP3)
Introduction: Requirements on coding scheme
- Enable annotation (or computation)
  - Literature: main attributes of emotional behaviors
  - Corpus-based approach: cover behaviors observed in EmoTV
- Multi-level annotation of temporal data
  - Global annotation
    - Manual annotation of multimodal signs for the global sequence
    - Computations from manual annotations in each modality (mono, red, comp)
  - Emotional segment level
    - Computations from manual annotations in each modality (mono, red, comp)
- Provide one source of knowledge for ECA specification
- Enable reliability and readability
- Annotation time
Introduction: Emotional parameters of mm behaviors
- Psychology & behavior
  - Montepare, J., Koff, E., Zaitchik, D. and Albert, M. (1999). "The use of body movements and gestures as cues to emotions in younger and older adults." Journal of Nonverbal Behavior.
  - Wallbott, H. G. (1998). "Bodily expression of emotion." European Journal of Social Psychology.
- Detection of emotions + relevant non-verbal behaviors
- Acted data +/- Basic emotions
- Age, Gender
- Facial expression masked
- Expressivity in ECAs (Hartman & Pelachaud 2004)
Introduction: Emotional parameters of mm behaviors
(Boone and Cunningham 1996; Boone and Cunningham 1998), acted data:
- Changes in tempo
- Directional changes
- Frequency
- Muscle tension
- Duration
(DeMeijer 1991), acted data:
- Trunk (stretching, bowing)
- Arm (opening, closing)
- Vertical direction (upward, downward)
- Sagittal direction (forward, backward)
- Force (strong, light)
- Velocity (fast, slow)
- Directness
Introduction: Multimodal corpora from TV clips
- Communicative functions
  - Kipp (2003)
  - MUMIN (Allwood et al. 2004)
  - Musical Score (Magno Caldognetto et al. 2004)
- Emotions / informal annotation
  - Orage (Atifi and Marcoccia 2001)
Coding Scheme
Current status
- 1st: annotation of 35 clips from EmoTV with 2 coders
- 2nd:
  - Iterative definition and application to 1 clip of EmoTV using Anvil (SA, JCM)
  - Annotation guide written
  - 1 meeting with Catherine Pelachaud (Paris 8) to investigate use for WP6
Movement quality: annotated vs. computed
- Quality (annotated)
  - Number of repetitions
  - Fluidity: smooth / normal / jerky
  - Strength: soft / normal / hard
  - Speed: slow / normal / fast
  - Spatial expansion: contracted / normal / expanded
- Computed
  - Start / end / duration
  - Movement direction, type, angle approximation
  - Torso: computed from Pose track
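The split between annotated and computed attributes can be sketched as a small data structure. This is only an illustration, not the actual Anvil specification of the scheme: the class name `MovementAnnotation`, the field names, and the use of seconds for time stamps are assumptions. The value sets are the ones listed on the slide.

```python
from dataclasses import dataclass

# Value sets as listed on the slide; the Python identifiers are illustrative.
ALLOWED_VALUES = {
    "fluidity": ("smooth", "normal", "jerky"),
    "strength": ("soft", "normal", "hard"),
    "speed": ("slow", "normal", "fast"),
    "spatial_expansion": ("contracted", "normal", "expanded"),
}


@dataclass
class MovementAnnotation:
    """One annotated movement segment (hypothetical structure)."""
    start: float            # segment boundaries, assumed to be in seconds
    end: float
    repetitions: int        # annotated by the coder
    fluidity: str
    strength: str
    speed: str
    spatial_expansion: str

    def __post_init__(self):
        if self.end < self.start:
            raise ValueError("end must not precede start")
        if self.repetitions < 0:
            raise ValueError("repetitions must be non-negative")
        for name, allowed in ALLOWED_VALUES.items():
            value = getattr(self, name)
            if value not in allowed:
                raise ValueError(f"{name}={value!r} not in {allowed}")

    @property
    def duration(self) -> float:
        """Computed attribute: derived from the boundaries, never annotated."""
        return self.end - self.start
```

Keeping `duration` as a derived property mirrors the idea that coders only enter the quality attributes, while timing information follows from the segment boundaries.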
Annotation #1: Multimodal coding scheme
- Speech: transcription including non-verbal events (laughter, cry, …)
- Posture: pose; posture shift including speed and action (4 cues with 3 to 10 attributes per cue, for instance: cue = action, attribute = walk)
- Gestures: phases of gesture (preparation, stroke, retraction), handedness, speed, energy, spatial region, hand shape, direction of gesture, gesture type (beats, adaptors, deictic…)
- Facial expressions: subset of Facial Animation Parameters (FAPs)
Annotation #1: Statistics
- Most frequently annotated behaviors: facial expressions (78.6% of annotated multimodal behaviors for coder1, 80.4% for coder2), gestures (11.3% for coder1, 11.9% for coder2), posture (10% for coder1, 7.7% for coder2).
- Most frequent attributes: gaze direction (26.8% for coder1, 17% for coder2), head movements (23.5% for coder1, 21% for coder2), blinking (15.8% for coder1, 17.6% for coder2), eyebrow movements (10% for coder1, 9.3% for coder2).
- The two coders agreed quantitatively on some attributes (number of annotations of preparation and stroke gesture phases, number of annotations of speed of posture shift).
- Coder1 was more sensitive than coder2 in all the modalities. Disagreements occurred on body poses, and on gesture type and energy.
- Coder1 annotated subtle body moves, whereas coder2 annotated clearly visible movements. Coder2 associated a gesture's energy with its speed, while coder1 differentiated the two attributes, perceiving that a gesture might have high energy and slow motion.
Annotation #1: Statistics
- Many cues in coder1's annotations are shared by several emotion labels (blinking, head movements…), but there are also typical cues for some emotions, such as lowering the hands for despair and slow body movement for serenity.
- Behaviors linked to strong emotions (anger, exaltation…) differ from those linked to weak emotions (irritation, serenity). The discriminating attributes are speed and energy for gestures, and speed for body movement.
- Serenity involves no gestures, whereas exaltation is often accompanied by fast and energetic gestures.
- Anger is correlated with fast and intense gestures, whereas irritation involves slow and low-intensity gestures.
Annotation #1: Quantitative analysis
[Two pie charts, both titled "Gesture - Phase - Speed". First chart: fast 56%, moderate 27%, slow 17%. Second chart: fast 47%, moderate 52%, slow 1%.]
- Low intercoder agreement on some attributes
- Reduce the number of values: 7 => 3
- Improve annotation protocol & guide
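One way to check whether reducing the number of values helps is to compute a chance-corrected agreement measure before and after collapsing the scale. The sketch below uses Cohen's kappa, which the slides do not name, so the choice of measure is an assumption. The 7-value scale and its grouping into 3 values are also hypothetical, since the slides only state "7 => 3".

```python
from collections import Counter

# Hypothetical 7-value speed scale and its reduction to 3 values.
# The slides give neither the labels nor the grouping.
SEVEN_TO_THREE = {
    "very slow": "slow", "slow": "slow", "rather slow": "slow",
    "moderate": "moderate",
    "rather fast": "fast", "fast": "fast", "very fast": "fast",
}


def collapse(labels):
    """Map labels from the 7-value scale onto the 3-value scale."""
    return [SEVEN_TO_THREE[label] for label in labels]


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty item list")
    n = len(labels_a)
    # Proportion of items on which the two coders chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each coder's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one and the same label throughout
    return (observed - expected) / (1 - expected)


# Made-up example data, for illustration only.
coder1 = ["very fast", "fast", "slow", "moderate"]
coder2 = ["fast", "rather fast", "rather slow", "moderate"]
kappa_7 = cohen_kappa(coder1, coder2)
kappa_3 = cohen_kappa(collapse(coder1), collapse(coder2))
```

On this made-up data the collapsed scale yields higher agreement, because near-miss labels such as "fast" and "very fast" fall into the same class.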
Tracks or groups of tracks
- Torso
- Head
- Facial expressions
- Global body
- Shoulders
- (Arms)
- (Gestures)
Alternation of pose and movements: torso, head, shoulders
Common values for attributes: asymmetry, other
Methodology
- Annotation guide
- Track per track
- Annotate emotion vs. communication
  - emotionally rich clips
  - reduced interaction (monologue in interviews)
  - exaggerated mouth / brow movements
Torso
- Movement direction to be computed from pose
- Poses
  - 3 dimensions
    - twist, side-side, bend
    - rotational, lateral, sagittal
  - Labels + approximation of angles
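Computing the movement direction from two successive poses could look like the sketch below. Everything beyond the three dimensions named on the slide is an assumption: the class `TorsoPose`, angles expressed in degrees with an arbitrary sign convention, and the 5-degree threshold below which the torso is treated as not having moved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TorsoPose:
    """Approximate torso angles for one annotated pose (hypothetical)."""
    time: float   # assumed to be in seconds
    twist: float  # rotational dimension, degrees
    side: float   # lateral dimension (side-side), degrees
    bend: float   # sagittal dimension, degrees


def movement_between(previous, following, threshold=5.0):
    """Derive the movement between two successive poses.

    Returns the angle change per dimension and the name of the dimension
    with the largest change, or None if no change reaches the threshold.
    """
    deltas = {
        "twist": following.twist - previous.twist,
        "side": following.side - previous.side,
        "bend": following.bend - previous.bend,
    }
    dominant = max(deltas, key=lambda name: abs(deltas[name]))
    if abs(deltas[dominant]) < threshold:
        dominant = None
    return deltas, dominant


# Illustrative values only.
rest = TorsoPose(time=0.0, twist=0.0, side=0.0, bend=0.0)
turned = TorsoPose(time=1.0, twist=30.0, side=-5.0, bend=10.0)
deltas, dominant = movement_between(rest, turned)
```

The sign of each delta gives the direction along that dimension, so the movement track does not need to be annotated by hand once the poses carry angle approximations.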
Torso Pose: Twist
Torso Pose: Side-side / Bend
Example: Torso fast movement
Head
- Movements: numerous and combined
  => direction annotated in movement track
- Primary & secondary
- Position
- Movement
- FACS
Example: Head, 2 directions - speed
Gestures: structural transcription (Kipp 04; Efron 1941; McNeill 92)
- Preparation: Bringing arm and hand into stroke position. Note that changing hand shape before/after moving the arm belongs to the preparation.
- Stroke: The most energetic part of the gesture.
- SequenceOfStroke: A number of successive strokes; all strokes should be covered by this phase.
- Retract: Movement back to rest position; in sitting position this is usually the arm rest, the lap or folded arms.
- Hold: A phase of stillness just before or just after the stroke, usually used to defer the stroke so that it coincides with a certain word.
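The phase definitions above imply a simple consistency check: a Hold should sit directly before or after a stroke. The sketch below flags holds that do not. The names `Phase` and `misplaced_holds` are illustrative, and treating SequenceOfStroke as equivalent to Stroke for this check is an assumption.

```python
from enum import Enum


class Phase(Enum):
    PREPARATION = "Preparation"
    STROKE = "Stroke"
    SEQUENCE_OF_STROKE = "SequenceOfStroke"
    RETRACT = "Retract"
    HOLD = "Hold"


# Assumption: a sequence of strokes counts as a stroke for this check.
STROKE_LIKE = {Phase.STROKE, Phase.SEQUENCE_OF_STROKE}


def misplaced_holds(phases):
    """Return indices of Hold phases that are not adjacent to a stroke."""
    flagged = []
    for index, phase in enumerate(phases):
        if phase is not Phase.HOLD:
            continue
        before = index > 0 and phases[index - 1] in STROKE_LIKE
        after = index + 1 < len(phases) and phases[index + 1] in STROKE_LIKE
        if not (before or after):
            flagged.append(index)
    return flagged


# A hold that defers the stroke is accepted.
ok = [Phase.PREPARATION, Phase.HOLD, Phase.STROKE, Phase.RETRACT]
# A hold with no neighbouring stroke is flagged for the coder to review.
suspicious = [Phase.PREPARATION, Phase.HOLD, Phase.RETRACT]
```

A check of this kind only flags candidates for review; it cannot decide whether the annotation or the definition should give way.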
Gestures: functional transcription
- Manipulator: Contact with body or object. Movements which serve functions of drive reduction or other non-communicative functions, like scratching oneself.
- Beat: Synchronised with the emphasis of the speech.
- Deictic: Arm/hand is used to point at an existing or imaginary object.
- Representational: Represents attributes, actions, relationships of objects and characters (concrete or abstract).
- Emblem: Movement with a precise, culturally defined meaning, like the eye-wink, gestures signalling the intellectual deficiency of another person or obscene gestures.
Example: Homogeneous sequence of stroke
Example: Manipulator gesture
Gesture annotation attributes
- Deictic target: self / Camera
- Manipulator target: Chest / Hairs / Eyebrows / Nose / Mouth
- Object in hand: If the character is holding an object, enter the name of the object.
- Spatial region: Up / Head / Chest / Down / Extreme periphery
- Directness: Linear / Shaped pathway
- Vertical direction: Upward / Downward
- Horizontal direction: Leftward / Rightward
- Sagittal direction: Forward / Backward
- Hands relationship: Independent / Mirror / Asymmetric
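The attribute list above amounts to a closed vocabulary per attribute, plus one free-text field. A lookup table is enough to check annotations against it, as sketched below. The snake_case keys, the lower-casing of values, and the function `validate_gesture` are assumptions for illustration and not part of the annotation guide.

```python
# Closed value sets from the slide, lower-cased for comparison.
GESTURE_ATTRIBUTES = {
    "deictic_target": {"self", "camera"},
    "manipulator_target": {"chest", "hairs", "eyebrows", "nose", "mouth"},
    "spatial_region": {"up", "head", "chest", "down", "extreme periphery"},
    "directness": {"linear", "shaped pathway"},
    "vertical_direction": {"upward", "downward"},
    "horizontal_direction": {"leftward", "rightward"},
    "sagittal_direction": {"forward", "backward"},
    "hands_relationship": {"independent", "mirror", "asymmetric"},
}

# "Object in hand" takes the name of the object, so any text is accepted.
FREE_TEXT_ATTRIBUTES = {"object_in_hand"}


def validate_gesture(annotation):
    """Return a list of problems found in one gesture annotation (a dict)."""
    problems = []
    for name, value in annotation.items():
        if name in FREE_TEXT_ATTRIBUTES:
            continue
        allowed = GESTURE_ATTRIBUTES.get(name)
        if allowed is None:
            problems.append(f"unknown attribute: {name}")
        elif str(value).lower() not in allowed:
            problems.append(f"{name}: {value!r} not in {sorted(allowed)}")
    return problems


# Illustrative annotation of a pointing gesture.
example = {
    "deictic_target": "Camera",
    "directness": "Linear",
    "sagittal_direction": "Forward",
    "object_in_hand": "pen",
}
issues = validate_gesture(example)
```

Attributes that are absent from an annotation are not reported, since several of them (deictic or manipulator target, for instance) only apply to some gesture types.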
Other annotations
- Limited set of annotations for:
  - Facial expression
    - Label + Action Unit (combination)
    - Gaze, brows, mouth, chin, nose
  - Shoulders
  - Arms
  - Global pose and movement
Future directions
- Modifications for potential use as one source of knowledge for WP6 / WP4
  - Adding temporal evolution in segments
  - Wrist position
  - Fluidity only between gestures or repetitions?
  - Integration with other sources of knowledge (temporal)
- Validation of annotation
  - Perceptual tests at the different levels of multimodal annotation
  - Segment of multimodal behavior
  - Annotate common segments + intercoder agreement
- Annotation of several videos
  - Evaluation of annotation time
- Correlations between emotions and multimodal annotations
Architectural Principles of a Software Platform for the Management of Multimodal Emotional Corpora
Goals
- Guidelines
- Illustrative combinations of tools
Surveys of annotation tools for multimodal corpora
- Tools
  - Anvil, TasX
- Surveys
  - ISLE D10, NITE, Harper Eurospeech
  - NISLab LREC 2004 paper
  - LREC WS 2002 / 2004
Anvil (Kipp 2001)
http://www.dfki.uni-sb.de/~kipp/research/index.html
TASX
http://tasxforce.lili.uni-bielefeld.de/
- Tiers
- Panel switch
- Start/End-point
Meta-data: MPI tools
- Editor
- Browser
Platform examples: Wizard of Oz (Buisine et al. 2003)
[Diagram of the processing chain. Labels: 34 videos; video recordings; annotations (PRAAT, ANVIL); coding scheme; Java / JAXP; metrics; SPSS; statistics.]
Requirements / description
- Requirements of such a platform for emotion
  - Continuous / discrete
  - Replay / validation
- Description
  - Software
  - Data files: media, meta-data
  - Annotations: manual, automatic, mixed
  - Coding schemes
  - Documentation files
  - Paper forms
Architecture
- Tools
  - Input / output
- Use during various iterations
  - Segmentation
  - Agreement / vote / reduce number of classes
  - Re-annotation
  - Audio only, video only, audio-video
Introduction: Emotional parameters of mm behaviors
(Montepare et al. 1999), acted data:
- Hand positions
- Gait
- Fluidity
- Stiffness
- Strength
- Speed
- Spatial expansion
- Activity
(Wallbott 1998), acted data:
- Upper body
- Shoulders (up, backward, forward)
- Head (downward, backward, turned sideways, bent sideways)
- Arms
- Hands
- Movement quality (activity, spatial expansion, movement dynamics, energy, power)
- Symmetry