Manual Annotation of Multimodal Behaviors in Emotional TV Interviews
J.-C. Martin, S. Abrilian, L. Devillers
LIMSI-CNRS, France
JC Martin - LIMSI/CNRS - WP5 WS

Mar 31, 2015



Outline

- Introduction
  - Goals
  - Requirements on annotation
  - Emotional parameters of multimodal (mm) behaviors
- Coding scheme
  - 1st coding scheme and annotation
  - 2nd coding scheme and example on 1 video
- Future directions


Introduction


Introduction: Goals

- How do modalities correlate in non-acted emotions?
  - Annotations and models: one source of knowledge about
    - the coordination between modalities during non-acted emotions
    - the synthesis of non-acted, spontaneous multimodal emotions in ECAs (embodied conversational agents)
- How to code/represent multimodal emotional behavior?
  - Methodology (which attributes can easily be annotated manually)
  - Trade-off / intermediate level between:
    - manual, global, free-text annotation of the whole video
    - manual annotation of medium/high-order signs
    - automatic annotation of low-level signs
- WP5 + WP6 + WP4 + (WP3)


Introduction: Requirements on coding scheme

- Enable annotation (or computation) of:
  - the main attributes of emotional behaviors described in the literature
  - the behaviors observed in EmoTV (corpus-based approach)
- Multi-level annotation of temporal data
  - Global annotation: manual annotation of multimodal signs for the global sequence; computations from manual annotations in each modality (mono, red, comp)
  - Emotional segment level: computations from manual annotations in each modality (mono, red, comp)
- Provide one source of knowledge for ECA specification
- Enable reliability and readability
- Annotation time


Introduction: Emotional parameters of multimodal behaviors

- Psychology & behavior
  - Montepare, J., Koff, E., Zaitchik, D. and Albert, M. (1999). "The use of body movements and gestures as cues to emotions in younger and older adults." Journal of Nonverbal Behavior.
  - Wallbott, H. G. (1998). "Bodily expression of emotion." European Journal of Social Psychology.
  - Detection of emotions + relevant non-verbal behaviors
  - Acted data; +/- basic emotions; age, gender; facial expression masked
- Expressivity in ECAs (Hartman & Pelachaud 2004)


Introduction: Emotional parameters of multimodal behaviors

(Boone and Cunningham 1996; Boone and Cunningham 1998), acted data:
- Changes in tempo
- Directional changes
- Frequency
- Muscle tension
- Duration

(DeMeijer 1991), acted data:
- Trunk (stretching, bowing)
- Arm (opening, closing)
- Vertical direction (upward, downward)
- Sagittal direction (forward, backward)
- Force (strong, light)
- Velocity (fast, slow)
- Directness


Introduction: Multimodal corpora from TV clips

- Communicative functions
  - Kipp (2003)
  - MUMIN (Allwood et al. 2004)
  - Musical Score (Magno Caldognetto et al. 2004)
- Emotions / informal annotation
  - Orage (Atifi and Marcoccia 2001)


Coding Scheme


Current status

- 1st annotation: 35 clips from EmoTV, 2 coders
- 2nd annotation: iterative definition and application to 1 clip of EmoTV using Anvil (SA, JCM)
  - Annotation guide written
  - 1 meeting with Catherine Pelachaud (Paris 8) to investigate its use for WP6


Movement quality: annotated vs. computed

- Quality (annotated)
  - Number of repetitions
  - Fluidity: smooth / normal / jerky
  - Strength: soft / normal / hard
  - Speed: slow / normal / fast
  - Spatial expansion: contracted / normal / expanded
- Computed
  - Start / end / duration
  - Movement direction, type, angle approximation
- Torso: computed from the Pose track
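The split between annotated and computed attributes can be sketched as a record in which only the quality values are entered by hand, while the duration is derived from the time stamps. This is a minimal Python sketch; the class and field names and the use of seconds are assumptions, not the format of the authors' Anvil specification.

```python
from dataclasses import dataclass
from enum import Enum

# Three-valued scales for the manually annotated quality attributes.
class Fluidity(Enum):
    SMOOTH = "smooth"
    NORMAL = "normal"
    JERKY = "jerky"

class Strength(Enum):
    SOFT = "soft"
    NORMAL = "normal"
    HARD = "hard"

class Speed(Enum):
    SLOW = "slow"
    NORMAL = "normal"
    FAST = "fast"

class Expansion(Enum):
    CONTRACTED = "contracted"
    NORMAL = "normal"
    EXPANDED = "expanded"

@dataclass
class MovementAnnotation:
    # Annotated by hand.
    repetitions: int
    fluidity: Fluidity
    strength: Strength
    speed: Speed
    expansion: Expansion
    # Time stamps of the annotated element (seconds; hypothetical unit).
    start: float
    end: float

    @property
    def duration(self) -> float:
        """Computed attribute: derived from start/end, never typed in."""
        if self.end < self.start:
            raise ValueError("end must not precede start")
        return self.end - self.start

m = MovementAnnotation(2, Fluidity.JERKY, Strength.HARD, Speed.FAST,
                       Expansion.EXPANDED, start=12.5, end=13.75)
print(m.duration)  # 1.25
```

Keeping duration as a property rather than a stored field means it can never disagree with the time stamps.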


Annotation #1: Multimodal coding scheme

- Speech: transcription including non-verbal events (laughter, cry, …);
- Posture: pose; posture shift including speed and action (4 cues with 3 to 10 attributes per cue, for instance: cue = action, attribute = walk);
- Gestures: phases of gesture (preparation, stroke, retraction), handedness, speed, energy, spatial region, hand shape, direction of gesture, gesture type (beats, adaptors, deictic…);
- Facial expressions: a subset of the MPEG-4 Facial Animation Parameters (FAPs).


Annotation #1: Statistics

- The most frequently annotated behaviors were facial expressions (78.6% of annotated multimodal behaviors for coder1, 80.4% for coder2), gestures (11.3% for coder1, 11.9% for coder2) and posture (10% for coder1, 7.7% for coder2).
- The most frequent attributes were gaze direction (26.8% for coder1, 17% for coder2), head movements (23.5% for coder1, 21% for coder2), blinking (15.8% for coder1, 17.6% for coder2) and eyebrow movements (10% for coder1, 9.3% for coder2).
- The coders agreed quantitatively on some attributes (number of annotations of preparation and stroke gesture phases, number of annotations of the speed of posture shifts).
- Coder1 was more sensitive than coder2 in all modalities. Disagreements occurred on body poses, gesture type and gesture energy.
- Coder1 annotated subtle body movements, whereas coder2 annotated clearly visible movements. Coder2 associated a gesture's energy with its speed, while coder1 distinguished the two attributes, perceiving that a gesture might have high energy and slow motion.
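Each percentage above is the share of one coder's annotations that falls into a given modality. A minimal sketch of that computation, assuming every annotation has already been reduced to a modality label; the labels and the toy counts are hypothetical, not EmoTV data:

```python
from collections import Counter

def modality_shares(modalities):
    """Percentage of annotations falling in each modality, rounded to 0.1."""
    counts = Counter(modalities)
    total = sum(counts.values())
    return {mod: round(100 * n / total, 1) for mod, n in counts.items()}

# Toy annotation list for one coder (hypothetical):
# one modality label per annotated behavior.
coder = ["face"] * 8 + ["gesture", "posture"]
print(modality_shares(coder))  # {'face': 80.0, 'gesture': 10.0, 'posture': 10.0}
```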


Annotation #1: Statistics

- Many cues in coder1's annotations are shared by several emotion labels (blinking, head movements…), but some cues are typical of particular emotions, such as lowering the hands for despair, or slow body movements for serenity.
- The attributes that discriminate behaviors linked to strong emotions (anger, exaltation…) from those linked to weak emotions (irritation, serenity) are speed and energy for gestures, and speed for body movements.
- Serenity involves no gestures, whereas exaltation is often accompanied by fast and energetic gestures.
- Anger is correlated with fast and intense gestures, whereas irritation involves slow and low-intensity gestures.
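Separating cues shared by several emotion labels from cues seen with a single label amounts to building a cue-to-emotion map. A minimal sketch under that reading; the segment list, cue names and labels below are hypothetical illustrations, not the annotated EmoTV data:

```python
from collections import defaultdict

def cue_profile(segments):
    """Map each cue to the set of emotion labels it was annotated with."""
    profile = defaultdict(set)
    for emotion, cues in segments:
        for cue in cues:
            profile[cue].add(emotion)
    return profile

def split_cues(profile):
    """Return (cues shared by several emotions, {typical cue: its only emotion})."""
    shared = sorted(cue for cue, emotions in profile.items() if len(emotions) > 1)
    typical = {cue: next(iter(emotions))
               for cue, emotions in profile.items() if len(emotions) == 1}
    return shared, typical

# Hypothetical segments: (emotion label, set of cues annotated in the segment).
segments = [
    ("despair", {"blinking", "hands_lowering"}),
    ("serenity", {"blinking", "slow_body_movement"}),
    ("anger", {"head_movement", "fast_gesture"}),
    ("exaltation", {"head_movement", "fast_gesture"}),
]
shared, typical = split_cues(cue_profile(segments))
print(shared)                   # ['blinking', 'fast_gesture', 'head_movement']
print(sorted(typical.items()))  # [('hands_lowering', 'despair'), ('slow_body_movement', 'serenity')]
```

On real data a cue would rarely occur with exactly one label, so a frequency threshold would likely replace the strict "one label only" test.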


Annotation #1: Quantitative analysis

Gesture - Phase - Speed (two pie charts):

Speed      Chart 1   Chart 2
fast       56%       47%
moderate   27%       52%
slow       17%       1%

- Low intercoder agreement on some attributes
  - Reduce the number of values (7 => 3)
  - Improve the annotation protocol & guide
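The slide does not name the agreement measure used. As an illustration only, the sketch below uses Cohen's kappa, a common chance-corrected measure for two coders, and shows how collapsing a 7-value scale onto 3 values can remove disagreements between neighbouring values. The 7-point scale, the cut-off points in `collapse` and the ratings are hypothetical.

```python
from collections import Counter

def cohen_kappa(a, b):
    """Cohen's kappa for two coders who labelled the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1:  # both coders used one and the same label throughout
        return 1.0
    return (observed - expected) / (1 - expected)

def collapse(value):
    """Hypothetical mapping of a 7-point speed scale (1 = slowest) to 3 values."""
    if value <= 2:
        return "slow"
    if value <= 5:
        return "moderate"
    return "fast"

coder1 = [1, 2, 3, 4, 6, 7, 5, 2]
coder2 = [2, 1, 4, 3, 7, 6, 5, 2]
print(round(cohen_kappa(coder1, coder2), 3))  # 0.111
print(cohen_kappa([collapse(v) for v in coder1],
                  [collapse(v) for v in coder2]))  # 1.0
```

Collapsing helps here only because every disagreement is between neighbouring values; it also discards distinctions the coders could make, so the trade-off has to be judged per attribute.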


Tracks or groups of tracks

- Torso, Head, Facial expressions, Global body, Shoulders, (Arms), (Gestures)
- Alternation of poses and movements: torso, head, shoulders
- Common values for attributes: Asymmetry, Other


Methodology

- Annotation guide
- Annotation track per track
- Annotate emotion vs. communication:
  - emotionally rich clips
  - reduced interaction (monologue in interviews)
  - exaggerated mouth / brow movements


Torso
- Movement: direction to be computed from pose
- Poses
  - 3 dimensions: twist, side-side, bend (rotational, lateral, sagittal)
  - Labels + approximation of angles
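Computing movement direction from the pose track could work by comparing the approximate angles of two successive poses on each of the three dimensions. A minimal sketch, assuming angles in degrees, hypothetical sign conventions and a hypothetical 5-degree threshold below which a change is ignored:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TorsoPose:
    # Approximate angles in degrees; sign conventions are assumptions.
    twist: float      # rotational axis, positive = turned left
    side_side: float  # lateral axis, positive = leaning left
    bend: float       # sagittal axis, positive = leaning forward

# Label for a negative / positive change on each axis (hypothetical wording).
DIRECTION_LABELS = {
    "twist": ("turn right", "turn left"),
    "side_side": ("lean right", "lean left"),
    "bend": ("lean backward", "lean forward"),
}

def movement_between(p0, p1, threshold=5.0):
    """Label the movement from pose p0 to pose p1 on every axis whose
    angle changes by more than `threshold` degrees."""
    labels = []
    for axis, (neg, pos) in DIRECTION_LABELS.items():
        delta = getattr(p1, axis) - getattr(p0, axis)
        if abs(delta) > threshold:
            labels.append((pos if delta > 0 else neg, delta))
    return labels

print(movement_between(TorsoPose(0, 0, 0), TorsoPose(20, 2, -10)))
# [('turn left', 20), ('lean backward', -10)]
```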


Torso pose: twist


Torso pose: side-side / bend


Example: torso fast movement


Head
- Movements: numerous and combined
  => direction annotated in the movement track
- Primary & secondary
- Position, movement
- FACS


Example: head, 2 directions, speed
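A head movement combining two directions, as in this example, can be stored as a primary and an optional secondary direction plus a speed value. The sketch maps directions to the standard FACS head-position action units (AU51 to AU58); whether the coding scheme uses these codes this way is an assumption, and the field names and time stamps are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Standard FACS head-position action units.
HEAD_AUS = {
    "turn_left": 51, "turn_right": 52,
    "up": 53, "down": 54,
    "tilt_left": 55, "tilt_right": 56,
    "forward": 57, "back": 58,
}

@dataclass
class HeadMovement:
    start: float
    end: float
    primary: str                     # dominant direction
    secondary: Optional[str] = None  # combined direction, if any
    speed: str = "normal"            # slow / normal / fast (assumed scale)

    def action_units(self):
        """FACS codes for the primary, then the secondary direction."""
        dirs = [self.primary] + ([self.secondary] if self.secondary else [])
        return [HEAD_AUS[d] for d in dirs]

nod_turn = HeadMovement(3.2, 3.9, primary="down", secondary="turn_left", speed="fast")
print(nod_turn.action_units())  # [54, 51]
```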


Gestures: structural transcription (Kipp 04; Efron 1941; McNeill 92)

- Preparation: bringing the arm and hand into stroke position; note that changing the hand shape before/after moving the arm belongs to the preparation.
- Stroke: the most energetic part of the gesture.
- SequenceOfStroke: a number of successive strokes; all strokes should be covered by this phase.
- Retract: movement back to rest position; in a sitting position this is usually the arm rest, the lap or folded arms.
- Hold: a phase of stillness just before or just after the stroke, usually used to defer the stroke so that it coincides with a certain word.
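The phase labels lend themselves to a simple consistency check on each annotated gesture. The rule encoded below (optional preparation, optional hold, one stroke or stroke sequence, optional hold, optional retraction) is a simplified assumption for illustration, not Kipp's full gesture grammar:

```python
import re

# One-letter codes for the phase labels of the coding scheme.
CODES = {"Preparation": "P", "Hold": "H", "Stroke": "S",
         "SequenceOfStroke": "Q", "Retract": "R"}

# Simplified rule (assumption): optional preparation, optional hold,
# exactly one stroke or stroke sequence, optional hold, optional retraction.
GESTURE = re.compile(r"P?H?[SQ]H?R?")

def is_well_formed(phases):
    """True if the ordered list of phase labels fits the simplified rule."""
    code = "".join(CODES[p] for p in phases)
    return GESTURE.fullmatch(code) is not None

print(is_well_formed(["Preparation", "Stroke", "Hold", "Retract"]))  # True
print(is_well_formed(["Preparation", "Retract"]))                    # False
```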


Gestures: functional transcription

- Manipulator: contact with the body or an object. Movement which serves functions of drive reduction or other non-communicative functions, like scratching oneself.
- Beat: synchronised with the emphasis of the speech.
- Deictic: the arm/hand is used to point at an existing or imaginary object.
- Representational: represents attributes, actions, relationships of objects and characters (concrete or abstract).
- Emblem: movement with a precise, culturally defined meaning, like the eye-wink, gestures signalling the intellectual deficiency of another person, or obscene gestures.


Example: homogeneous sequence of strokes


Example: manipulator gesture


Gesture annotation attributes

- Deictic target: Self / Camera
- Manipulator target: Chest / Hair / Eyebrows / Nose / Mouth
- Object in hand: if the character is holding an object, enter the name of the object.
- Spatial region: Up / Head / Chest / Down / Extreme periphery
- Directness: Linear / Shaped pathway
- Vertical direction: Upward / Downward
- Horizontal direction: Leftward / Rightward
- Sagittal direction: Forward / Backward
- Hands relationship: Independent / Mirror / Asymmetric
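Attributes with closed value sets can be checked automatically before analysis. A minimal validator, assuming each gesture annotation is a dict; the snake_case attribute keys and lowercase values are hypothetical encodings of the list above:

```python
# Attribute -> allowed values (hypothetical encoding of the attribute list).
# None marks a free-text attribute.
GESTURE_ATTRIBUTES = {
    "deictic_target": {"self", "camera"},
    "manipulator_target": {"chest", "hair", "eyebrows", "nose", "mouth"},
    "object_in_hand": None,
    "spatial_region": {"up", "head", "chest", "down", "extreme periphery"},
    "directness": {"linear", "shaped pathway"},
    "vertical_direction": {"upward", "downward"},
    "horizontal_direction": {"leftward", "rightward"},
    "sagittal_direction": {"forward", "backward"},
    "hands_relationship": {"independent", "mirror", "asymmetric"},
}

def validate(annotation):
    """Return a list of problems found in one gesture annotation."""
    problems = []
    for attr, value in annotation.items():
        if attr not in GESTURE_ATTRIBUTES:
            problems.append(f"unknown attribute: {attr}")
            continue
        allowed = GESTURE_ATTRIBUTES[attr]
        if allowed is None:
            if not str(value).strip():
                problems.append(f"{attr}: empty free-text value")
        elif value.lower() not in allowed:
            problems.append(f"{attr}: {value!r} not in {sorted(allowed)}")
    return problems

print(validate({"deictic_target": "Camera", "vertical_direction": "sideways"}))
```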


Other annotations

- Limited set of annotations for:
  - Facial expression: label + Action Unit (combination); gaze, brows, mouth, chin, nose
  - Shoulders
  - Arms
  - Global pose and movement


Future directions

- Modifications for potential use as one source of knowledge for WP6 / WP4
  - Adding temporal evolution within segments
  - Wrist position
  - Fluidity only between gestures, or for repetitions?
  - Integration with other (temporal) sources of knowledge
- Validation of the annotation
  - Perceptual tests at the different levels of multimodal annotation
  - Segments of multimodal behavior: annotate common segments + intercoder agreement
- Annotation of several videos
  - Evaluation of annotation time
- Correlations between emotions and multimodal annotations
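Annotating common segments and measuring intercoder agreement requires deciding when two coders' segments match in time. One possible rule, shown as a sketch: a segment counts as matched if the other coder has a segment with the same label covering at least half of it. The 50% threshold and the toy segments are assumptions.

```python
def overlap(a, b):
    """Length of the temporal overlap between two (start, end) intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def matched_segments(ref, other, min_ratio=0.5):
    """Count segments of `ref` that overlap a same-label segment of `other`
    by at least `min_ratio` of the ref segment's duration."""
    hits = 0
    for start, end, label in ref:
        dur = end - start
        if any(lab == label and overlap((start, end), (s, e)) >= min_ratio * dur
               for s, e, lab in other):
            hits += 1
    return hits

# Hypothetical segments: (start s, end s, label).
coder1 = [(0.0, 2.0, "stroke"), (2.0, 3.0, "retract"), (5.0, 6.0, "stroke")]
coder2 = [(0.5, 2.2, "stroke"), (2.2, 3.5, "retract")]
print(matched_segments(coder1, coder2), "of", len(coder1))  # 2 of 3
```

The measure is asymmetric, so it would normally be computed in both directions (coder1 against coder2 and vice versa).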


Architectural Principles of a Software Platform for the Management of Multimodal Emotional Corpora


Goals

- Guidelines
- Illustrative combinations of tools


Surveys of annotation tools for multimodal corpora

- Tools: Anvil, TasX
- Surveys: ISLE D10, NITE, Harper Eurospeech, NISLab LREC 2004 paper, LREC WS 2002 / 2004


Anvil (Kipp 2001): http://www.dfki.uni-sb.de/~kipp/research/index.html


TASX: http://tasxforce.lili.uni-bielefeld.de/
Screenshot callouts: Tiers, Panel switch, Start/End-point


Meta-data: MPI tools (Editor, Browser)


Platform examples: Wizard of Oz (Buisine et al. 2003)

[Diagram components: video recordings (34 videos); coding scheme; annotations with PRAAT and ANVIL; Java / JAXP; metrics; SPSS; statistics]
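In this platform example, annotations are processed with Java/JAXP to produce metrics that are then analysed in SPSS. The same step is sketched here in Python with the standard `xml.etree.ElementTree` module, computing per-track counts and durations; the XML layout is a hypothetical simplification, not Anvil's actual file format.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, simplified annotation file; NOT the actual Anvil XML schema.
SAMPLE = """
<annotation>
  <track name="gesture">
    <el start="1.0" end="1.8" type="beat"/>
    <el start="2.0" end="3.5" type="deictic"/>
  </track>
  <track name="head">
    <el start="0.5" end="0.9" type="nod"/>
  </track>
</annotation>
"""

def track_metrics(xml_text):
    """Per-track number of elements and total annotated duration (seconds)."""
    root = ET.fromstring(xml_text.strip())
    metrics = defaultdict(lambda: {"count": 0, "duration": 0.0})
    for track in root.iter("track"):
        m = metrics[track.get("name")]
        for el in track.iter("el"):
            m["count"] += 1
            m["duration"] += float(el.get("end")) - float(el.get("start"))
    return dict(metrics)

print(track_metrics(SAMPLE))
```

The resulting table could then be exported (for example as CSV) to a statistics package such as SPSS.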


Requirements / description

- Requirements of such a platform for emotion
  - Continuous / discrete
  - Replay / validation
- Description
  - Software
  - Data files: media, meta-data
  - Annotations: manual, automatic, mixed
  - Coding schemes
  - Documentation files
  - Paper forms


Architecture

- Tools: input / output
- Use during various iterations
  - Segmentation
  - Agreement / vote / reduce number of classes
  - Re-annotation
  - Audio only, video only, audio-video
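The "agreement / vote" iteration can be illustrated with a simple majority vote over several coders' labels for the same segment, leaving undecided segments for re-annotation. A minimal sketch; the strict-majority rule and the example labels are assumptions:

```python
from collections import Counter

def majority_vote(labels, min_share=0.5):
    """Return the label chosen by more than `min_share` of the coders,
    or None when no label reaches that share (candidate for re-annotation)."""
    if not labels:
        return None
    label, n = Counter(labels).most_common(1)[0]
    return label if n / len(labels) > min_share else None

# Hypothetical labels from three coders for each of two segments.
votes = [["anger", "anger", "irritation"], ["serenity", "despair", "anger"]]
print([majority_vote(v) for v in votes])  # ['anger', None]
```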


Manual Annotation of Multimodal Behaviors in Emotional TV Interviews

J.-C. Martin, S. Abrilian, L. Devillers

LIMSI-CNRS, France


Introduction: Emotional parameters of multimodal behaviors

(Montepare et al. 1999), acted data:
- Hand positions
- Gait
- Fluidity
- Stiffness
- Strength
- Speed
- Spatial expansion
- Activity

(Wallbott 1998), acted data:
- Upper body
- Shoulders (up, backward, forward)
- Head (downward, backward, turned sideways, bent sideways)
- Arms
- Hands
- Movement quality (activity, spatial expansion, movement dynamics, energy, power)
- Symmetry