Multimodal Interaction for Distributed Interactive Simulation
Philip R. Cohen, Michael Johnston, David McGee, Sharon Oviatt, Jay Pittman, Ira Smith, Liang Chen and Josh Clow
Center for Human Computer Communication, Oregon Graduate Institute of Science and Technology
http://www.cse.ogi.edu/CHCC
Presenter: Keita Fujii
• Based on the common knowledge, the agent’s goal, and its beliefs
– Dialog includes
  • The timing of the phonemes and pauses
  • The type and place of the accents
  • The type and place of the gestures
• Speech Synthesizer
  – Generates sound data from the dialogs
Gesture generation
• Gesture is generated through three steps
A) Symbolic Gesture Specification
  – Decides what type of gesture to use for each word
B) PaT-Nets (Parallel Transition Networks)
  – Determines shape, position, transition, and timing of gestures
C) Gesture Generator
  – Generates actual motion from the information sent by the PaT-Nets
Symbolic Gesture Specification
• Determines the type of gesture
  – Words with literally spatial content (“check”) → iconic
  – Words with metaphorically spatial content (“account”) → metaphoric
  – Words with physically spatializable content (“this bank”) → deictic
  – Other new references → beat
  – Also based on the annotations from the dialog planner and the classification of the reference (new to speaker and listener, new to speaker but not to listener, or old)
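The classification above can be sketched as a simple lookup from a word's annotated content class and newness to a gesture type. The content classes and word list here are illustrative assumptions, not the paper's actual lexicon.

```python
# Sketch of symbolic gesture specification: map each word to a gesture
# type based on its (hand-annotated) content class and the reference
# classification from the dialog planner. Word lists are invented.

CONTENT_CLASS = {
    "check": "literal-spatial",       # literally spatial content -> iconic
    "account": "metaphoric-spatial",  # metaphorically spatial -> metaphoric
    "bank": "spatializable",          # physically spatializable -> deictic
}

def gesture_type(word, newness):
    """Return the gesture type for one word.

    newness: 'new-to-both', 'new-to-speaker', or 'old' -- the
    reference classification supplied by the dialog planner.
    """
    if newness == "old":
        return None                   # old references get no gesture
    cls = CONTENT_CLASS.get(word)
    if cls == "literal-spatial":
        return "iconic"
    if cls == "metaphoric-spatial":
        return "metaphoric"
    if cls == "spatializable":
        return "deictic"
    return "beat"                     # other new references get a beat

print(gesture_type("check", "new-to-both"))   # iconic
print(gesture_type("car", "new-to-both"))     # beat
```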
PaT-Nets
• A PaT-Net is a finite state machine
  – Each state represents an action to be invoked
  – State transitions are made either conditionally or probabilistically
  – Thus, a traversal of the net generates a sequence of actions
• The gesture PaT-Net generates gestures; the facial PaT-Net generates facial expressions
[State diagram: during parsing, get gesture info; when gesture info is found/complete, send it to the gesture PaT-Net; when a beat is signaled, send the beat to the beat PaT-Net]
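The PaT-Net described above can be sketched as a small state machine whose edges fire either on a condition or with a probability. The states, conditions, and wiring below are invented for illustration, not taken from the paper.

```python
import random

# Minimal sketch of a PaT-Net-style finite state machine: each state
# carries an action to invoke, and each transition fires either when
# its condition holds or probabilistically. All names are invented.

class PatNet:
    def __init__(self):
        self.transitions = {}   # state -> list of (next, condition, probability)
        self.actions = {}       # state -> action label

    def add_state(self, name, action):
        self.actions[name] = action
        self.transitions.setdefault(name, [])

    def add_edge(self, src, dst, cond=None, prob=None):
        self.transitions[src].append((dst, cond, prob))

    def run(self, start, context, max_steps=10):
        """Walk the net, collecting the sequence of actions invoked."""
        state, trace = start, []
        for _ in range(max_steps):
            trace.append(self.actions[state])
            for dst, cond, prob in self.transitions[state]:
                if cond is not None and cond(context):
                    state = dst
                    break
                if prob is not None and random.random() < prob:
                    state = dst
                    break
            else:
                return trace            # no transition fired: stop
        return trace

net = PatNet()
net.add_state("parse", "get_gesture_info")
net.add_state("gesture", "send_to_gesture_patnet")
net.add_state("beat", "send_to_beat_patnet")
net.add_edge("parse", "gesture", cond=lambda c: c["gesture_info_found"])
net.add_edge("parse", "beat", cond=lambda c: c["beat_signaled"])

print(net.run("parse", {"gesture_info_found": True, "beat_signaled": False}))
```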
Coarticulation
• The structure of a PaT-Net allows coarticulation
  – Two gestures occur without an intermediary relaxation
    • I.e., the next gesture starts without waiting for the first one to finish
  – Coarticulation occurs when there is not sufficient time to finish a gesture
[Timing diagram: gesture A starts; gesture B starts before gesture A has finished, pausing A’s relaxation]
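The timing rule above can be sketched as: start gesture B on schedule, and skip A's relaxation phase when there is not enough time for it. The relaxation duration is an illustrative assumption.

```python
# Sketch of the coarticulation rule: gesture A normally ends with a
# relaxation phase back to rest, but if gesture B is due before that
# phase could finish, B starts directly from A's end pose.

RELAX_TIME = 0.4  # seconds A needs to relax back to rest (assumed value)

def schedule(a_stroke_end, b_start):
    """Decide whether A relaxes fully or coarticulates into B."""
    if b_start - a_stroke_end >= RELAX_TIME:
        return "relax-then-start-B"      # enough time: full relaxation
    return "coarticulate-into-B"         # B starts without A relaxing

print(schedule(a_stroke_end=1.0, b_start=2.0))  # relax-then-start-B
print(schedule(a_stroke_end=1.0, b_start=1.2))  # coarticulate-into-B
```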
Gesture Generator
• The animation of a gesture is created as a combination of
  – Hand shape
  – Wrist control
  – Arm positioning
• The system tries to get as close as possible to the gesture goals, but may fail because of coarticulation effects
Facial expression generation
• Facial expression is generated through the same steps as gesture
A) Symbolic Facial Expression/Gaze Specification
  – Decides what type of expression to use for each word
B) Facial/Gaze PaT-Nets
  – Determines shape, position, transition, and timing of expressions
C) Facial Expression/Gaze Generator
  – Generates actual motion from the information sent by the PaT-Nets
• Symbolic Gaze Specification
  – Generates the following types of gaze expression
    • Planning – e.g., look away while organizing a thought
    • Comment – e.g., look toward the listener when asking a question
    • Control – e.g., gaze at the listener when ending speech
    • Feedback – e.g., look toward the listener to obtain feedback
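The four gaze functions can be sketched as a mapping from dialog events to gaze behaviors; the event names below are assumptions made for illustration.

```python
# Sketch of symbolic gaze specification: map a dialog event to one of
# the four gaze functions and a concrete behavior. The gaze functions
# come from the slide; the event names are invented.

GAZE_RULES = {
    "organizing_thought": ("planning", "look away"),
    "asking_question":    ("comment",  "look toward listener"),
    "ending_speech":      ("control",  "gaze at listener"),
    "seeking_feedback":   ("feedback", "look toward listener"),
}

def gaze_for(event):
    """Return (gaze function, behavior) for a dialog event."""
    return GAZE_RULES.get(event, ("none", "no gaze change"))

print(gaze_for("asking_question"))  # ('comment', 'look toward listener')
```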
PaT-Nets
• Facial expression PaT-Nets
  – No information in the paper
• Gaze PaT-Net
  – Each node is characterized by a probability
    • The action of a node is invoked probabilistically
[Gaze PaT-Net diagram: nodes for the planning, comment, control, and feedback gaze functions, with transitions labeled “beginning of turn,” “short turn,” “within turn,” “accent,” “utterance: question,” “utterance: answer,” “end of turn,” “turn request,” “back channel,” and “configuration signal”]
Facial Expression/Gaze Generator
• Facial expression generator
  – Classifies an expression into functional groups
    • Lip shape, conversational signal, punctuator, manipulator, and emblem
  – Uses FACS
    • Represents an expression as a pair of timing and type
• Gaze and head motion generator
  – Generates motion of the eyes and head
    • Based on the direction of gaze, timing, and duration
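The generator's representation (an expression as a pair of timing and type, placed in a functional group) might look like the sketch below. The group names come from the slide; the class, fields, and concrete values are assumptions.

```python
from dataclasses import dataclass

# Sketch of the generator's expression representation: a facial
# expression is a (timing, type) pair tagged with a functional group.
# AU1+AU2 (inner + outer brow raiser) is a standard FACS label.

GROUPS = {"lip shape", "conversational signal", "punctuator",
          "manipulator", "emblem"}

@dataclass
class Expression:
    group: str        # one of the functional groups above
    type: str         # e.g., a FACS action-unit label such as "AU1+AU2"
    start: float      # timing: onset in seconds
    duration: float   # timing: how long the expression is held

    def __post_init__(self):
        assert self.group in GROUPS, f"unknown group: {self.group}"

raise_brows = Expression("conversational signal", "AU1+AU2", 0.8, 0.3)
print(raise_brows.type, raise_brows.start)
```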
Direct Manipulation vs. Interface Agents
Ben Shneiderman and Pattie Maes
Interactions, Nov. and Dec. 1997
Presenter: Keita Fujii
Introduction
• This article is about a debate session at IUI* ’97 and CHI** ’97
• Topic
  – Direct Manipulation vs. Interface Agents
• Speakers
  – Ben Shneiderman
    • University of Maryland, Human-Computer Interaction Lab
    • Proponent of direct manipulation
  – Pattie Maes
    • MIT Media Laboratory
    • Proponent of intelligent agents
* Intelligent User Interface Workshop **Conference on Human Factors in Computing Systems
Overview
• Direct Manipulation
• Software Agent
  – Benefits
  – Criticisms
  – Misconceptions
• Objections to agent systems
• Agreement
• Q & A
Direct Manipulation (Ben Shneiderman)
• A user interface using information-visualization techniques that provides
  – Overview
    • How much / what kind of information is in the system
  – Great control
    • E.g., zoom in, scroll, filter out
  – Predictability
    • The user can anticipate what happens next
  – Detail-on-demand
• Benefits
  – Reduces errors and encourages exploration
Examples of Direct Manipulation
• FilmFinder
  – Organizes movies in a 2D plane by year and popularity
• LifeLines
  – Shows a case history graphically
• Visible Human Explorer
  – Displays coronal and cross sections of a human body
Software Agent (Pattie Maes)
• A software agent is a program that is
  – Personalized
    • Knows the individual user’s habits, preferences, and interests
  – Proactive
    • Provides or suggests information before the user requests it
  – Long-lived
    • Keeps running autonomously
  – Adaptive
    • Monitors the user’s interests as they change over time
  – Delegated to
    • The user can delegate some tasks to the agent
    • The agent acts on the user’s behalf
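The five properties above can be read as an interface contract; one hypothetical sketch follows. None of these class or method names come from the article.

```python
# Hypothetical sketch of Maes's five agent properties as a class.
# All names and the trivial "learning" rule are invented.

class SoftwareAgent:
    def __init__(self, user):
        # Personalized: keeps a per-user model. (Long-lived: in a real
        # system this object would run autonomously in the background.)
        self.profile = {"user": user, "interests": set()}

    def observe(self, interest):
        """Adaptive: update the user model as interests change."""
        self.profile["interests"].add(interest)

    def suggest(self):
        """Proactive: offer information before being asked."""
        return sorted(self.profile["interests"])

    def delegate(self, task):
        """Delegation: act on the user's behalf."""
        return f"doing {task!r} for {self.profile['user']}"

agent = SoftwareAgent("alice")
agent.observe("jazz")
print(agent.suggest())               # ['jazz']
print(agent.delegate("filter mail"))
```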
Examples of Software Agent
• Letizia
  – Pre-loads web pages that the user may be interested in
• Remembrance Agent
  – Remembers who sent an email or whether an email was replied to
• Firefly
  – Personal filters / personal critics
• Yenta
  – Matchmaking agent
  – Introduces users who share the same interests
Benefits of software agent (Pattie Maes)
• Software agents are necessary because
  – Computer systems are getting more complex, unstructured, and dynamic
    • E.g., the WWW
  – Users are becoming more naïve
    • End users are not trained to use computers
  – The number of tasks to be managed with a computer is increasing
    • Some tasks need to be delegated to somebody
Criticisms of Agents (Pattie Maes)
• “Well-designed interfaces are better”
  – Even with a perfect interface, you may simply not want to perform some tasks yourself
• “Agents make the user dumb”
  – Yes, that’s true. But as long as an agent is always available, it’s not a problem
• “Using agents implies giving up all control”
  – You don’t have to have full control. As long as your task is done satisfactorily, that’s fine
  – However, the system must allow the user to choose between direct manipulation and delegating the task to the agent
Misconceptions about Agents (Pattie Maes)
• Agents replace the user interface
• Agents need to be personified or anthropomorphized
• Agents need to rely on traditional AI
– All of these are NOT true
Objections to Agent Systems and Responses (Both)
• “Agent” is not a realistic solution for making a good user interface because
  – Agents cannot be smart and fast enough to make intelligent decisions for a human user
• Direct manipulation is
  – For professional users, not for end users
  – For very well-structured and organized domains, not for ill-structured and dynamic domains
• An agent system can cooperate with direct manipulation
  – E.g., FilmFinder with an agent making movie suggestions
• Anthropomorphic interfaces/representations are not appropriate
  – Agents do not have to be visible
    • There are no visible “agents” on the Firefly Web site
  – “Agent” has a broader meaning than “software agent,” so you need to distinguish different types of “agents”
    • Autonomous robots, synthetic characters, software agents, etc.
Direct Manipulation & Agent Systems (Both)
• An agent is NOT an alternative but a complementary technique to direct manipulation (interfaces)
  – An agent system needs a good user interface that provides good understanding (overview) and control
  – Agent designers must pay attention to user-interface issues such as understanding and control
• Two-layer model
  – The user interface level
    • Predictable and controllable
  – The agent level
    • An adaptive, proactive system to increase usability
[Diagram: the agent system layered on top of the visualization / user interface]
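The two-layer model can be sketched as an adaptive agent layer that acts only through a predictable direct-manipulation layer underneath it. All names and the trivial suggestion rule below are illustrative assumptions.

```python
# Illustrative sketch of the two-layer model: the UI layer stays
# predictable and controllable; the agent layer adds adaptive,
# proactive suggestions on top of it. All names are invented.

class UILayer:
    """Predictable, controllable: only does exactly what it is told."""
    def __init__(self):
        self.shown = []

    def show(self, item):
        self.shown.append(item)
        return item

class AgentLayer:
    """Adaptive, proactive: suggests, but acts through the UI layer."""
    def __init__(self, ui):
        self.ui = ui
        self.history = []

    def record(self, item):
        self.history.append(item)

    def suggest(self):
        # trivial "adaptive" rule: re-suggest the most recent interest
        if self.history:
            return self.ui.show(f"suggested: {self.history[-1]}")
        return None

ui = UILayer()
agent = AgentLayer(ui)
ui.show("FilmFinder results")   # the user stays in direct control
agent.record("sci-fi movies")
print(agent.suggest())          # suggested: sci-fi movies
```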
Q & A
• Q. How do speech technologies affect direct manipulation and agent systems?
  – A. Speech won’t be a generally usable tool because
    • It disrupts cognitive processes
    • It is a low-bandwidth communication channel
    • It is ambiguous
  – A. Speech can be used as a supportive medium
• Q. How can user interfaces and/or agent systems support time-critical decision-support environments where mistakes are critical?
  – A. Agent systems are not suitable for such environments because it is very hard to make agents that never make a mistake
• Q. How can we build a direct-manipulation system for vision-challenged or blind users?
  – A. Direct manipulation can be used to make an interface for such users, because direct manipulation depends on spatial relationships and blind users are often strong at spatial processing
• Q. What is it about agents that you dislike? (to Ben Shneiderman)
  – A. The “intelligent agent” notion avoids dealing with the interface