Page 1: Thrust 2 07-aug-2013_v3 (1)

Thrust 2: Interaction Modeling (2013-14 academic years)

Defined last year by Shri:

1. Analyze and model spatio-temporal context surrounding (leading to, caused by) behavioral events.

2. Quantitative description and understanding of the underlying structure (e.g., timing, phasing of salient events) and emergence and evolution of interactions.

Page 2: Thrust 2 07-aug-2013_v3 (1)

Progress since Last Thrust 2 Con Call: (April 4, 2013)

1. CVPR paper

2. Nine Expeditions-related papers (or more?) in Interspeech 2013 (Lyon, France) (some are Thrust 2)

3. Synchronization issues for multimodal RABC data-base (all but Kinect).

Page 3: Thrust 2 07-aug-2013_v3 (1)

Thrust 1 / Thrust 2 dependencies

• Thrust 1 is concerned with extracting high/low-level features for further analysis.

• Thrust 1’s raison d’être is to serve Thrust 2 with descriptors.

• Thrust 2’s obligation is to convey its needs to Thrust 1.

• We have already developed single-mode processing to analyze behavior; now it is time to use multi-mode methods.

Page 4: Thrust 2 07-aug-2013_v3 (1)

Thrust 1 / Thrust 2 dependencies

• We need a comprehensive list of Thrust 1 deliverables.

1. Speech and Audio examples:
• Voicing intervals
• Diarization (who is talking when, with cross-talk)
• Word locations (KWS)
• Non-linguistic events (laughter, whining, crying, sighs)
• Prosodic features
• Spectral/articulatory features

2. Q-sensors
• EDA
• Temperature
• Accelerometers
• Statistical measures associated with various states and timings

Page 5: Thrust 2 07-aug-2013_v3 (1)

Thrust 1 / Thrust 2 dependencies

• We need a comprehensive list of Thrust 1 deliverables.

3. Vision
• …. (many)

4. Gaze
• --- (many)

5. Kinect
• --- (many)

Thrust 1: Please fill this out for us.

Page 6: Thrust 2 07-aug-2013_v3 (1)

Thrust 2 projects using RABC (all multi-modal)

1. Engagement score analysis (revisited)
– We have engagement scores on all 157 sessions (only 43 are fully annotated).
– Develop optimal categorization using “ground truth” in the annotated sessions.
– Test using non-annotated sessions with Thrust 1 primitives.

Page 7: Thrust 2 07-aug-2013_v3 (1)

Thrust 2 projects using RABC (all multi-modal)

2. Parsing of stages (revisited)
– The only attempt was using KWS (~80% correct)
– Obvious extension uses Kinect object tracking
– Others include EDA, para-linguistic detection, gaze, posture, etc.
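The slides do not say how the KWS-based stage parser worked. As a rough illustration only, one simple possibility is to start a new stage whenever a keyword tied to a different stage is spotted. The keyword-to-stage map, the stage names, and the hit times below are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical keyword-to-stage map; names and keywords are illustrative only.
STAGE_KEYWORDS: Dict[str, str] = {
    "hello": "greeting",
    "ball": "ball",
    "book": "book",
    "hat": "hat",
}


def parse_stages(
    hits: List[Tuple[float, str]], session_end_s: float
) -> List[Tuple[str, float, float]]:
    """Turn time-stamped keyword hits into contiguous (stage, start_s, end_s) segments."""
    segments: List[Tuple[str, float, float]] = []
    current = None
    start = 0.0
    for t, word in sorted(hits):
        stage = STAGE_KEYWORDS.get(word)
        # Ignore words with no stage and repeats of the current stage's keyword.
        if stage is None or stage == current:
            continue
        if current is not None:
            segments.append((current, start, t))
        current, start = stage, t
    if current is not None:
        segments.append((current, start, session_end_s))
    return segments


# Made-up KWS output: (time in seconds, spotted word).
hits = [(2.0, "hello"), (30.0, "ball"), (35.0, "ball"),
        (80.0, "book"), (95.0, "duck"), (140.0, "hat")]
stages = parse_stages(hits, session_end_s=180.0)
```

A parser this simple flips stage whenever an earlier stage's keyword is mentioned again, which is one reason extra modalities such as Kinect object tracking could help.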

Page 8: Thrust 2 07-aug-2013_v3 (1)

Thrust 2 projects using RABC (all multi-modal)

3. Response to name:
– Every RABC session starts with greeting.
– After formal session, child’s name is uttered twice (once from side, once from front)
a. What responses are predictable? (All modes)
b. Can we ascertain when name was called from child’s reactions w/o audio cues?
c. This will require gaze, head-orientation, EDA, posture, etc.

Page 9: Thrust 2 07-aug-2013_v3 (1)

Multimodal Retrieval

• We currently have the ability to retrieve interesting regions of RABC session through key-word spotting and diarization.

• Easy extension using any of the other modalities.

• Challenging extensions would be:
– Find when child is unhappy (requires multi-modes)
– Find when child is not responding.
– …

Page 10: Thrust 2 07-aug-2013_v3 (1)

Multimodal Retrieval

Examples:

Is a child making eye contact while an examiner is talking? (gaze and diarization)

Where is a child looking while an examiner is talking? (voice activity detector + gaze tracker + head orientation)
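A query such as "eye contact while the examiner is talking" can be sketched as an intersection of time intervals from two modalities. The sketch below assumes each Thrust 1 detector delivers sorted, non-overlapping (start, end) intervals in seconds on a shared session clock. The function name and the sample values are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_s, end_s) on a shared session clock


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the time spans covered by both interval lists.

    Each list must be sorted by start time and non-overlapping within itself.
    """
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


# Made-up detector outputs for one session.
examiner_speech = [(0.0, 4.0), (10.0, 15.0), (20.0, 22.0)]  # from diarization
child_gaze_on_examiner = [(3.0, 11.0), (14.0, 21.0)]         # from gaze tracker

eye_contact_while_talking = intersect(examiner_speech, child_gaze_on_examiner)
```

Adding a third modality (for example head orientation) amounts to intersecting the result with one more interval list, provided the streams are synchronized.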

Page 11: Thrust 2 07-aug-2013_v3 (1)

Multimodal Linguistic Analysis

Classify examiner’s speech into the four sentence types:
1. Statement
2. Question (“Are you ready to play with new toys?”, “where is the yellow duck?”, etc.)
3. Exclamation (“It’s a hat! It’s on my head!”)
4. Command (“Look at my book”)

• Useful for retrieval and parsing.
• Child’s responses and turn-taking to each type can be tracked.
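As a text-only baseline for the four sentence types, a few surface rules already separate the examples above. This is a minimal sketch that assumes a punctuated transcript is available. The word lists are hypothetical and incomplete, and a real system would likely also need prosodic cues.

```python
import re

# Hypothetical, incomplete word lists used only for illustration.
QUESTION_WORDS = {"who", "what", "where", "when", "why", "how", "which"}
AUXILIARIES = {"are", "is", "do", "does", "did", "can", "could", "will", "would", "should"}
IMPERATIVE_VERBS = {"look", "give", "come", "show", "put", "take", "sit", "watch"}


def sentence_type(utterance: str) -> str:
    """Classify an utterance as statement, question, exclamation, or command."""
    text = utterance.strip()
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return "statement"
    first = words[0]
    # Questions: final "?" or a leading question word / auxiliary verb.
    if text.endswith("?") or first in QUESTION_WORDS or first in AUXILIARIES:
        return "question"
    # Exclamations: final "!" (checked before commands, so "Look!" lands here).
    if text.endswith("!"):
        return "exclamation"
    # Commands: utterance opens with a bare imperative verb.
    if first in IMPERATIVE_VERBS:
        return "command"
    return "statement"
```

For example, `sentence_type("Look at my book")` returns `"command"`, while an emphatic command ending in "!" would be labeled an exclamation under these rules, which shows where text alone is ambiguous.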

Page 12: Thrust 2 07-aug-2013_v3 (1)

What Happens During Cross-Talk?

1) Terminal overlaps: a speaker assumes the other speaker has finished or is about to finish their turn. Can include head-nodding and gestures.
2) Continuers: examples of continuer phrases are “mm hm” or “uh huh.” (speech and para-linguistics)
3) Conditional access to the turn: the current speaker yields their turn or invites another speaker to interject in the conversation, usually as a collaborative effort. (all sorts of gestures)
4) Chordal: a non-serial occurrence of turns, such as laughter, smiling, and gesturing.

Page 13: Thrust 2 07-aug-2013_v3 (1)

Example of Multi-Modal Retrieval/Analysis

During the ball/book play in RABC, what is the child’s response when a ball/book is presented? Is the child looking back and forth between the ball/book and the examiner? (word spotting + gaze tracker + head orientation)

Is a child making vocalizations? If so, is it a positive or negative response? (word spotting + emotion classifier)

Does a child smile when a ball/book is presented? (word spotting + smile detector)

Page 14: Thrust 2 07-aug-2013_v3 (1)

Previous Project Example

Uni-modal paralinguistic event detection
– Detection of laughter in children’s speech
– Training data from FAU Aibo Emotion Corpus (AEC) consists of spontaneous vocalizations/verbalizations by adolescents
– Testing on ~10 Rapid ABC sessions yielded accuracy of 70.58%
– Robust laughter detector with good generalization properties that can be used for paralinguistic event diarization

Page 15: Thrust 2 07-aug-2013_v3 (1)

Current Project Example

Multi-modal paralinguistic event detection
– Laughter detection using audio
– Smile detection using Omron.
– These have different time scales and frame rates.
– Fusion of confidence scores is likely to improve accuracy.
– Might EDA also be useful for this?
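One simple way to fuse confidence scores that arrive at different frame rates is to resample both streams onto a common time grid and then combine them. The sketch below uses sample-and-hold resampling and a weighted average. The frame rates, weights, threshold, and score values are all assumptions for illustration, not the detectors' actual settings.

```python
from typing import List


def resample(scores: List[float], rate_hz: float, grid_hz: float,
             duration_s: float) -> List[float]:
    """Sample-and-hold a score stream onto a common time grid."""
    n = int(round(duration_s * grid_hz))
    out = []
    for k in range(n):
        t = k / grid_hz
        # Index of the most recent source frame at time t, clamped to the last frame.
        idx = min(int(t * rate_hz), len(scores) - 1)
        out.append(scores[idx])
    return out


def fuse(audio: List[float], video: List[float], w_audio: float = 0.5) -> List[float]:
    """Weighted average of two score streams that share the same grid."""
    return [w_audio * a + (1.0 - w_audio) * v for a, v in zip(audio, video)]


# Made-up confidence scores over a 2-second window.
laughter_scores = [0.0, 1.0, 1.0, 0.0]                    # audio stream at 2 Hz (assumed)
smile_scores = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]   # video stream at 4 Hz (assumed)

audio_on_grid = resample(laughter_scores, rate_hz=2.0, grid_hz=4.0, duration_s=2.0)
video_on_grid = resample(smile_scores, rate_hz=4.0, grid_hz=4.0, duration_s=2.0)
fused = fuse(audio_on_grid, video_on_grid, w_audio=0.5)
detected = [score >= 0.5 for score in fused]  # threshold is an arbitrary choice
```

The weight and the threshold would normally be tuned on annotated sessions. A further stream such as EDA could be added by resampling it onto the same grid and extending the weighted sum.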

Page 16: Thrust 2 07-aug-2013_v3 (1)

Basic Science Required for Multi-Modal Analyses: Fusion

Example from USC

Page 17: Thrust 2 07-aug-2013_v3 (1)

Basic Science Required for Multi-Modal Analyses: Fusion

Example from GT: binary fusion method with different analysis lengths

Page 18: Thrust 2 07-aug-2013_v3 (1)

Opportunistic use of other data as it may become available

• BU children’s diagnostic data. (Helen Tager-Flusberg)
– Includes audio and video of at-risk ASD children with more vocalizations.
• RABC Floor time (natural but very uncontrolled)
• ADOS (only available to USC)
• CFD table sessions.

Page 19: Thrust 2 07-aug-2013_v3 (1)

Thrust 2 action items

• Set up blog like Thrust 1
• Identify who is participating
• Monthly con-call for everyone.
• Set up time-table for new / revisited challenges.

