Data collection and Multimodal Annotation Tools Dagstuhl 2001 Workgroup 2

Dec 18, 2015


Stanley Potter
Transcript
Page 1: Data collection and Multimodal Annotation Tools Dagstuhl 2001 Workgroup 2.

Data collection and Multimodal Annotation Tools

Dagstuhl 2001

Workgroup 2

Page 2

Members

Permanents
• Lisa Harper
• Michael Kipp
• Emiel Krahmer
• Jean-Claude Martin
• Dagmar Schmauks

Visiting Scientists
• Harry Bunt
• Kioto Hasida
• John Lee
• Thomas Rist
• Laurent Romary

Page 3

Needs

• What is a multimodal corpus?

• What corpora exist?

• How to collect a corpus?

• How to develop/choose a coding scheme?

• What tool to develop/choose?

• What is the organizational infrastructure?

• What is the future?

Page 4

Multimodal Corpus

• Sound
  – human speech (e.g. MPEG7)
    • transcription, (morphology)
    • part-of-speech
    • syntax (linguistic DS)
    • binary relations
      – thematic roles
      – rhetorical relations
      – co-reference
  – computer voice, sound, music
  – environmental sounds

Page 5

Multimodal Corpus (2)

• Vision
  – head: movement, gaze, facial expression
  – gesture: hands/arms
    • basic phases
    • formal features (handshape, trajectories, direction, location etc.)
    • encode qualities (Laban efforts?)
    • functional/semiotic categories (emblem, iconic, deictic, self-adaptors etc.)
  – posture: including feet/legs
  – computer graphics (charts/tables), characters
  – static/dynamic environment (people/objects):
    • moving camera

Page 6

Multimodal Corpus (3)

• Haptic
  – pressure of feet/hands/back/on seat, texture
  – force feedback
• Biometric
  – heart rate, eye dilation, skin sensitivity, eyebrow movement, breathing
• Smell & taste (VR)
• Balance (VR)
• Thermal (VR)
  – body/object temperature, conduit properties

Page 7

Multimodal Corpus (4)

• Within-modality/cross-modality relations
  – mirror behavior, synchronized behavior, repeated behavior, postural congruence
  – distance and touch
• Behavioral/Social units? Often across modalities!

Page 8

Needs

• What is a multimodal corpus?

• What corpora exist?

• How to collect a corpus?

• How to develop/choose a coding scheme?

• What tool to develop/choose?

• What is the organizational infrastructure?

• What is the future?

Page 9

Existing Corpora: Meta Survey

• Existing surveys:
  – ISLE and NIMM (D8, EU & US)
  – ELRA (EU)
  – COCOSDA (Japan)
  – LDC (US)
  – TalkBank (US)

Page 10

Existing Corpora: Dagstuhl 2001

• Survey with Dagstuhl participants

• Collected 28 questionnaires

• From 24 different institutes

number of corpora    number of participants
0                    6
1                    12
2-9                  8
10+                  2
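
The table above can be cross-checked with a few lines of arithmetic. The sketch below only restates the slide's counts; the variable and function names are illustrative and not part of the original survey.

```python
# Counts copied from the slide: number of corpora -> number of participants.
corpora_per_participant = {"0": 6, "1": 12, "2-9": 8, "10+": 2}

# The participant counts should add up to the number of questionnaires collected.
total = sum(corpora_per_participant.values())

# Participants who reported at least one corpus.
with_corpus = total - corpora_per_participant["0"]


def share(count, total):
    """Percentage of respondents, rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {bucket: share(n, total) for bucket, n in corpora_per_participant.items()}

print(f"questionnaires: {total}, with at least one corpus: {with_corpus}")
for bucket, pct in shares.items():
    print(f"{bucket:>4} corpora: {pct}%")
```

The total matches the 28 questionnaires reported on the slide, which suggests that each questionnaire corresponds to one row entry.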

Page 11

Questionnaire

• annotated modalities
  – speech: 20
  – gestures: 17
  – facial expression: 5
  – gaze: 3
  – posture: 3
• file format
  – analogue: 4
  – digital: 12
  – I don't know: 4
• tool
  – own tool: 9
  – other tool: 3
  – no tool: 8
  – I don't know: 1
• application areas
  – tourism/navigation (10), consumer electronics, info kiosk, realty, storytelling, instruction, cinema, graphical design, everyday gestures, education, car, face guessing, games, talk shows

Page 12

Questionnaire (2)

• Languages
  – English: 11
  – German: 5
  – French: 2
  – Japanese: 3
  – Italian: 2
  – Dutch, Swedish, Finnish: 1
• Planning to collect: 21

Page 13

Needs

• What is a multimodal corpus?

• What corpora exist?

• How to collect a corpus?

• How to develop/choose a coding scheme?

• What tool to develop/choose?

• What is the organizational infrastructure?

• What is the future?

Page 14

Data Collection: Methodology?

• Legal issues:
  – ethical
  – commercial
  – country-dependent legislation
• Practical guidelines (best practice)
  – technical setup for recording
  – field-specific coder training, models for coding manuals
• Specify meta-data

Page 15

Needs

• What is a multimodal corpus?

• What corpora exist?

• How to collect a corpus?

• How to develop/choose a coding scheme?

• What tool to develop/choose?

• What is the organizational infrastructure?

• What is the future?

Page 16

Coding Schemes

• Survey on existing schemes: ISLE D9
• Guidelines for developing schemes:
  – encoding vs. inference
  – can scheme accommodate semantics or generation languages for MM players (MPEG)
• Standardization
  – partial standards like in speech
  – standards for computer output log files (graphics output, locations, XML, trajectories, time-stamping, granularity)
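
To make the idea of a time-stamped XML log of computer output concrete, here is a minimal sketch. The slides do not define any log format: the element and attribute names (`log`, `event`, `location`, `t`, `kind`, `timeunit`) and the helper `log_event` are assumptions invented for this illustration, not a proposed standard.

```python
import xml.etree.ElementTree as ET


def log_event(log, t_ms, kind, x, y):
    """Append one time-stamped output event with a screen location.

    All element and attribute names are hypothetical.
    """
    event = ET.SubElement(log, "event", {"t": str(t_ms), "kind": kind})
    ET.SubElement(event, "location", {"x": str(x), "y": str(y)})
    return event


# The time granularity is declared once on the root element (here: milliseconds).
log = ET.Element("log", {"timeunit": "ms"})
log_event(log, 0, "show-chart", 120, 80)
log_event(log, 450, "move-pointer", 200, 95)

xml_text = ET.tostring(log, encoding="unicode")
print(xml_text)

# Reading the log back: time stamps let the events be aligned with human annotations.
parsed = ET.fromstring(xml_text)
times = [int(e.get("t")) for e in parsed.findall("event")]
print(times)
```

Declaring the time unit explicitly is one way to address the granularity question raised on the slide; other designs are equally possible.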

Page 17

Needs

• What is a multimodal corpus?

• What corpora exist?

• How to collect a corpus?

• How to develop/choose a coding scheme?

• What tool to develop/choose?

• What is the organizational infrastructure?

• What is the future?

Page 18

Tools

• Surveys of existing tools:
  – ISLE D11
  – (Bigbee, Loehr, Harper 2001)
  – TalkBank proposal
• Underlying frameworks:
  – track-based
  – annotation graphs
  – spatial annotation?
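
As an illustration of the track-based framework named above, the following sketch models one track per modality, each holding time-aligned labels. It is a simplified assumption about what "track-based" means, not the data model of any particular tool; the class names `Annotation` and `Track` and the function `overlapping` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    start: float  # seconds from the start of the recording
    end: float
    label: str


@dataclass
class Track:
    name: str  # one track per modality, e.g. "speech" or "gesture"
    annotations: list = field(default_factory=list)

    def add(self, start, end, label):
        if end <= start:
            raise ValueError("end must be later than start")
        self.annotations.append(Annotation(start, end, label))
        # Keep annotations ordered by start time.
        self.annotations.sort(key=lambda a: a.start)


def overlapping(track_a, track_b):
    """Return pairs of annotations from two tracks whose time spans overlap."""
    return [
        (a, b)
        for a in track_a.annotations
        for b in track_b.annotations
        if a.start < b.end and b.start < a.end
    ]


speech = Track("speech")
speech.add(0.0, 1.2, "put that")
speech.add(1.2, 2.0, "there")

gesture = Track("gesture")
gesture.add(0.9, 1.8, "deictic")

pairs = overlapping(speech, gesture)
for a, b in pairs:
    print(f"{a.label!r} overlaps {b.label!r}")
```

Cross-modality relations then reduce to queries over time spans on different tracks. An annotation-graph model would instead share time-anchored nodes between labelled arcs.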

Page 19

Tools: Checklists

• Checklist for coding support
  – fast and efficient annotation
  – efficient view, search & find, customizable
  – extensibility of annotation
  – easy access to scheme definitions (online)
  – automatic extraction of modality-specific specimens (images, sound bits, transcription sequences)
• Checklist for multi-coder support
  – update/merge, concurrent coding, reliability
• Checklist for Import/Export
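
The "reliability" item under multi-coder support is commonly checked with an agreement statistic. The slides do not name one; the sketch below uses Cohen's kappa for two coders as an assumed example, and the label sequences are made up for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labelled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: from each coder's own label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:
        # Both coders used one identical label throughout; kappa is undefined,
        # so report full agreement by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


# Invented example: two coders assigning gesture categories to six segments.
coder_a = ["deictic", "iconic", "deictic", "emblem", "iconic", "deictic"]
coder_b = ["deictic", "iconic", "iconic", "emblem", "iconic", "deictic"]

kappa = cohen_kappa(coder_a, coder_b)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw agreement for agreement expected by chance. A tool supporting more than two coders would need a different statistic.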

Page 20

Tools: Visions

• Bootstrapping (semi-automatic or fully automatic annotation)
• Use MM techniques for coding tools (3D, haptic, VR)
• Standardized analysis (e.g. metrics) and visualization (metaphors)
• Modular generic framework for
  – tools
  – schemes

Page 21

Tools (4)

[Diagram: Annotation Framework (tracks, types, objects etc.) with the components coding tool, coding scheme, specific analysis, ML classifier, parser, logical layer data, viewer, general analysis]

Page 22

[Diagram: Annotation Framework (tracks, types, objects etc.) with the components coding tool, scheme framework, analysis module, ML classifier, parser; modalities shown: speech, gaze, gesture]

Page 23

Needs

• What is a multimodal corpus?

• What corpora exist?

• How to collect a corpus?

• How to develop/choose a coding scheme?

• What tool to develop/choose?

• What is the organizational infrastructure?

• What is the future?

Page 24

Organizations

• Initiatives:
  – EAGLES/ISLE
  – ATLAS, MATE/NITE
  – TalkBank, CHILDES
• International:
  – ELRA/ELDA, US? Asia?
• National agencies (Eurospeech):
  – BAS, LDC, MPI Nijmegen

Page 25

Needs

• What is a multimodal corpus?

• What corpora exist?

• How to collect a corpus?

• How to develop/choose a coding scheme?

• What tool to develop/choose?

• What is the organizational infrastructure?

• What is the future?

Page 26

Future

• Data collection project
  – sample videos with illustrative MM data
  – pre-coded minimal data (speech transcription)
• Comparison/integration of schemes
• Encourage collaborative coding?

Page 27

Future (2)

• Workshop at LREC (Language Resources and Evaluation), Canary Islands! May 2002
  – deadline: 20 Nov 2001
  – paper on Dagstuhl and follow-ups
  – coding exercise based on data collection
  – questionnaire based on Dagstuhl survey