Intelligent Multimodal Interaction: Challenges and Promise
Mark T. Maybury [email protected]
Schloss Dagstuhl, Germany, 29 October 2001
MITRE
www.mitre.org/resources/centers/it/maybury/mark.html
This data is the copyright and proprietary data of the MITRE Corporation. It is made available subject to Limited Rights, as defined in paragraph (a) (15) of the clause at
DFAR 252.227-7013. The restrictions governing the use and disclosure of these materials are set forth in the aforesaid clause.
What are we talking about?
Information, Perception, Cognition, Emotion
[Figure labels: Visualization, Cognition, Speech, Haptics/Gesture, Facial, See, Smell]
Image Source: Dr. Nahum Gershon and Ellaine Mullen, Copyright The MITRE Corporation
Why Multimedia?
Direct Manipulation, Natural Language
Above modified from Cohen, P. 1992. The role of natural language in a multimodal interface. In Proceedings of ACM SIGGRAPH Symposium on User Interface Software and Technology (UIST), Monterey, CA, 143-149.
Why Multimedia?
Evidence users prefer both:
- Flexibility (user, task, situation)
- e.g., speech text, pen #’s
- Efficiency and expressive power
Handwriting/Pen Speech
Strengths
1. Intuitive
2. Visual feedback
3. Persistent/ease of editing
4. Multifunctional
Mixed media (e.g., text, graphics, video, speech and non-speech audio) and mode (e.g., linguistic, visual, auditory) displays tailored to the user and context. Lifelike animated characters.
Dialog Control
- Pre-scripted interactions with standard dialogue presentations (e.g., windows, menus, buttons)
- Mixed-initiative natural interaction that deals robustly with context shift, interruptions, feedback, and shift of locus of control.
- Ability to tailor flow and control of interactions and facilitate interactions, including error detection and correction tailored to individual physical, perceptual and cognitive differences. Motivational and engaging lifelike agents.
Agent/User Modeling
- Limited models of user interests (e.g., via explicitly solicited user models). Recommender technology.
- Unobtrusive learning, representation, and use of models of user/agents, including models of perception, cognition, and emotion.
- Enables tracking of user characteristics, skills and goals in order to adapt and enhance interaction.
Simplification of functionality, possibly limited by user and/or context models. Automated task completion. Task help tailored to situation, context, and user. Mobile and substitutable interfaces for disabled users.
Multimedia Presentation Generation: “No Presentation without Representation”
DATA
Philosopher   Aristotle   Plato      Socrates
Born          384 BC      428 BC     470 BC
Died          322 BC      348 BC     399 BC
Works         Poetics     Republic   None
Emphasis      Science     Virtue     Conduct
MAPS: Athens
VIDEO: Plato, Aristotle
NATURAL LANGUAGE: Socrates, Plato, and Aristotle were Greek philosophers ...
TABLES
Philosopher   Born   Died
Socrates      470    399
Plato         428    348
Aristotle     384    322
GRAPHS
[Figure: Lifespan timeline, axis 500 to 300 BC, for Plato, Aristotle, Socrates]
[Figure: Lifespan bar chart, Age axis 0 to 100, for Socrates, Plato, Aristotle]
ANIMATED AGENTS
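The idea on this slide, one underlying representation realized in several media, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data values come from the DATA panel above, while the function names (`lifespan`, `render_table`, `render_sentence`, `render_bars`) and the text-only renderings are hypothetical and not part of any system described in the talk.

```python
# Data taken from the slide's DATA panel (years are BC).
PHILOSOPHERS = [
    {"name": "Socrates", "born": 470, "died": 399},
    {"name": "Plato", "born": 428, "died": 348},
    {"name": "Aristotle", "born": 384, "died": 322},
]


def lifespan(p):
    """Years lived; BC years count down, so this is born minus died."""
    return p["born"] - p["died"]


def render_table(rows):
    """Realize the data as a plain-text table (compare the TABLES panel)."""
    lines = [f"{'Philosopher':<12}{'Born':>6}{'Died':>6}"]
    for p in rows:
        lines.append(f"{p['name']:<12}{p['born']:>6}{p['died']:>6}")
    return "\n".join(lines)


def render_sentence(rows):
    """Realize the same data as one natural-language sentence."""
    names = [p["name"] for p in rows]
    return f"{', '.join(names[:-1])}, and {names[-1]} were Greek philosophers."


def render_bars(rows, scale=10):
    """Realize lifespans as a crude text bar chart (compare the GRAPHS panel)."""
    return "\n".join(
        f"{p['name']:<10}{'#' * (lifespan(p) // scale)} {lifespan(p)}"
        for p in rows
    )


if __name__ == "__main__":
    print(render_table(PHILOSOPHERS))
    print(render_sentence(PHILOSOPHERS))
    print(render_bars(PHILOSOPHERS))
```

All three renderers read the same `PHILOSOPHERS` list, so a change to the representation shows up in every presentation.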
Common Presentation Design Tasks
Co-constraining cascaded processes:
- Communication Management
- Content Selection
- Presentation Design
- Media Allocation
- Media Realization
- Media Coordination
- Media Layout
Length affects layout in space or time (e.g., EYP, audio)
Information, task, user …
Expressivity of different languages, e.g., “ven aca” gesture
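A toy sketch of what "co-constraining cascaded processes" might look like in code: the tasks run in sequence, and a length limit discovered at layout time feeds back to constrain content selection. Everything here is an assumption made for illustration: the `Presentation` class, the stage functions, and the digit-based allocation rule are hypothetical, and only four of the listed tasks are modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Presentation:
    """Hypothetical working state handed from one design task to the next."""
    facts: list
    selected: list = field(default_factory=list)
    media: dict = field(default_factory=dict)     # fact -> medium
    realized: dict = field(default_factory=dict)  # fact -> rendered string
    layout: list = field(default_factory=list)    # ordered rendered strings


def select_content(p, max_items):
    p.selected = p.facts[:max_items]
    return p


def allocate_media(p):
    # Toy rule (an assumption): facts containing digits go to a table.
    p.media = {
        f: ("table" if any(c.isdigit() for c in f) else "text")
        for f in p.selected
    }
    return p


def realize_media(p):
    p.realized = {f: f"[{m}] {f}" for f, m in p.media.items()}
    return p


def lay_out(p):
    p.layout = [p.realized[f] for f in p.selected]
    return p


def design(facts, max_chars):
    """Run the cascade; if the layout is too long, tighten selection and retry."""
    max_items = len(facts)
    while True:
        p = Presentation(list(facts))
        p = lay_out(realize_media(allocate_media(select_content(p, max_items))))
        total = sum(len(s) for s in p.layout)
        if total <= max_chars or max_items <= 1:
            return p
        max_items -= 1  # feedback: the layout's length limit constrains selection


if __name__ == "__main__":
    facts = [
        "Socrates born 470 BC",
        "Plato wrote the Republic",
        "Aristotle wrote the Poetics",
    ]
    for line in design(facts, max_chars=70).layout:
        print(line)
```

The retry loop is the simplest possible form of co-constraint; a real planner would more likely negotiate among stages than rerun the whole cascade.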
Common Representations: Communicative Acts [Maybury, 1993; Wahlster, Andre, Rist 1993]
PHYSICAL ACT: DEICTIC ACT (point, tap, circle); indicate direction
LINGUISTIC ACT: REFERENTIAL/ATTENTIONAL ACT; ILLOCUTIONARY ACT (inform)
GRAPHICAL ACT: DEICTIC ACT (highlight, blink, circle, etc.); indicate direction
Architecture of the SmartKom Agent (cf. Maybury/Wahlster 1998)
[Figure labels: Presentation, Dialog Control, Application Interface, Integration, Request, Initiation, Response]
DARPA Galaxy Communicator
[Figure: a central Hub connected to Language Generation, Text-to-Speech Conversion, Audio Server, Dialogue Management, Application Backend, Context Tracking, Frame Construction, and Speech Recognition]
The Galaxy Communicator Software Infrastructure (GCSI) is a distributed, message-based, hub-and-spoke infrastructure optimized for constructing spoken dialogue systems
Open source and documentation available at fofoca.mitre.org and sourceforge.net/projects/communicator
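To make the hub-and-spoke, message-based idea concrete, here is a toy Python sketch in which every message between servers passes through a central hub. It does not use or imitate the actual GCSI API: the `Hub` class, the server names, the stub handlers, and the dictionary "frames" are all hypothetical stand-ins chosen for illustration.

```python
class Hub:
    """Toy hub: servers register handlers; every message is routed through here."""

    def __init__(self):
        self.servers = {}
        self.log = []  # names of servers contacted, in order

    def register(self, name, handler):
        self.servers[name] = handler

    def send(self, target, frame):
        self.log.append(target)
        return self.servers[target](frame)


# Stub servers (assumptions): each takes a frame (dict) and returns a new one.
def recognize(frame):
    return {**frame, "text": frame["audio"].lower()}


def construct_frame(frame):
    return {**frame, "slots": {"destination": frame["text"].split()[-1]}}


def manage_dialogue(frame):
    return {**frame, "act": ("confirm", frame["slots"]["destination"])}


def generate(frame):
    _, destination = frame["act"]
    return {**frame, "reply": f"You want to go to {destination}?"}


def build_hub():
    hub = Hub()
    hub.register("speech_recognition", recognize)
    hub.register("frame_construction", construct_frame)
    hub.register("dialogue_management", manage_dialogue)
    hub.register("language_generation", generate)
    return hub


def run_turn(hub, audio):
    """Route one user turn through the servers; they never call each other."""
    frame = {"audio": audio}
    for server in ("speech_recognition", "frame_construction",
                   "dialogue_management", "language_generation"):
        frame = hub.send(server, frame)
    return frame


if __name__ == "__main__":
    hub = build_hub()
    print(run_turn(hub, "FLY TO BOSTON")["reply"])
    print(hub.log)
```

Because servers only talk to the hub, one of them can be swapped for another implementation by changing a single `register` call, which is the main appeal of the hub-and-spoke layout.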
An Example: Communicator-Compliant Emergency Management Interface