German Research Center for Artificial Intelligence (DFKI GmbH)
Stuhlsatzenhausweg 3, 66123 Saarbruecken, Germany
phone: (+49 681) 302-5252/4162, fax: (+49 681) 302-5341
e-mail: wahlster@dfki.de
WWW: http://www.dfki.de/~wahlster
Wolfgang Wahlster
Language Technologies for the Mobile Internet Era
© W. Wahlster
Multimodal Interfaces to 3G Mobile Services
Market studies (May 2002) predict:
- Cumulative revenues of almost 1 trillion € from launch until 2010
- Non-voice service revenues will dominate voice revenues by year 3 and comprise 66% of 3G service revenues by 2010
- 322 billion € in revenues in 2010
- In 2010 the average 3G subscriber will spend about 30 € per month on 3G data services
Multimodal UMTS Systems
Intelligent Interaction with Mobile Internet Services
- Access to web content and web services anywhere and anytime
- Access to corporate networks and virtual private networks from any device
- Access to edutainment and infotainment services
- Access to all messages (voice, email, multimedia, MMS) from any single device
- Personalization
- Localization
Mobile Messaging Services Evolution: From SMS to MMS
[Chart: the evolution from SMS via EMS to MMS along four dimensions. Infrastructure: SS7 with SMSC for SMS and EMS; UMTS with IP/MPLS protocols and MMS relay and servers for MMS. Terminals: standard phones, EMS phones, MMS phones, and smart phones with integrated image capture. Customer expectation: ubiquity and youth focus for SMS; limited enhancement for EMS; personalized services, location-based services, and an emotional experience for MMS. Applications: text, then enhanced text and enhanced message creation, then pictures, audio, video, and multimedia.]
Language Technologies for MMS:
- Speech Synthesis (with Affect)
- Multimodal Authoring Interface
- Speech-based Retrieval of Media Objects
From Spoken Dialogue to Multimodal Dialogue
Verbmobil (today's cell phone): speech only
SmartKom (third-generation UMTS phone): speech, graphics, and gesture
Merging Various User Interface Paradigms into Multimodal Interaction:
- Spoken Dialogue
- Graphical User Interfaces
- Gestural Interaction
- Facial Expressions
- Haptic Input
Using All Human Senses for Intuitive Interaction: Code, Media and Modalities

[Diagram: the user and the system exchange information over input and output channels. CODE (systems of symbols): language, graphics, gesture, facial expressions. MEDIA (physical information carriers): e.g. storage on HD drive or DVD. MODALITIES (human senses): visual, auditory, tactile, haptic.]
Symbolic and Subsymbolic Fusion of Multiple Modes
Input analyzers: speech recognition, gesture recognition, prosody recognition, facial expression recognition, and lip reading.

Subsymbolic fusion: neural networks, hidden Markov models.
Symbolic fusion: graph unification, Bayesian networks.

The fused hypotheses undergo reference resolution and disambiguation, yielding a semantic representation.
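The graph-unification approach to symbolic fusion named above can be sketched in a few lines: two partial semantic structures, one contributed by speech and one by gesture, are merged when they are compatible and rejected when they clash. The feature structures and slot names below are invented for illustration; SmartKom's actual representations are far richer.

```python
def unify(a, b):
    """Unify two partial semantic feature structures (dicts).

    Returns the merged structure, or None if the inputs clash
    on an atomic value.
    """
    result = dict(a)
    for key, value in b.items():
        if key not in result:
            result[key] = value
        elif isinstance(result[key], dict) and isinstance(value, dict):
            merged = unify(result[key], value)
            if merged is None:
                return None          # nested clash
            result[key] = merged
        elif result[key] != value:
            return None              # conflicting atomic values
    return result

# Speech contributes the intent, gesture contributes the referent.
speech = {"act": "reserve", "object": {"type": "seat"}}
gesture = {"object": {"type": "seat", "id": "row3-seat12"}}

print(unify(speech, gesture))
# {'act': 'reserve', 'object': {'type': 'seat', 'id': 'row3-seat12'}}
```

A failed unification (e.g. speech refers to a movie while the gesture points at a seat) signals the dialogue manager to ask a clarification question instead of guessing.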
Mutual Disambiguation of Multiple Input Modes
The combination of speech and vision analysis increases the robustness and understanding capabilities of multimodal user interfaces.
- Speech Recognition + Lip Reading: increases robustness in noisy environments
- Speech Recognition + Gesture Recognition (XTRA, SmartKom): referential disambiguation and focus control
- Speech Recognition + Facial Expression Recognition (SmartKom): recognition of irony and sarcasm, and scope disambiguation
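The referential disambiguation described here can be sketched as filtering: the gesture recognizer supplies the type of the object being pointed at, and that constraint is used to reorder or reject hypotheses in the speech recognizer's n-best list. The hypothesis fields and scores below are invented for illustration.

```python
def disambiguate(speech_nbest, pointed_object):
    """Pick the highest-scoring speech hypothesis whose referent
    type matches the object the user is pointing at."""
    for hyp in sorted(speech_nbest, key=lambda h: h["score"], reverse=True):
        if hyp["referent_type"] == pointed_object["type"]:
            return hyp
    return None  # no hypothesis is consistent with the gesture

# An acoustically better-scoring misrecognition is rejected because
# it does not fit the pointing gesture.
nbest = [
    {"text": "show me this movie", "referent_type": "movie", "score": 0.48},
    {"text": "show me this mover", "referent_type": "unknown", "score": 0.52},
]
pointing = {"type": "movie", "id": "a-little-christmas-story"}

print(disambiguate(nbest, pointing)["text"])  # show me this movie
```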
SmartKom: A Transportable Interface Agent

- SmartKom-Public: a multimodal communication kiosk
- SmartKom-Mobile: a handheld communication assistant
- SmartKom-Home/Office: a multimodal portal to information services

The kernel of the SmartKom interface agent comprises media analysis, interaction management, application management, and media design.
SmartKom's SDDP Interaction Metaphor

SDDP = Situated Delegation-oriented Dialogue Paradigm

The user specifies a goal and delegates the task to a personalized interaction agent. User and agent cooperate on problems: the agent asks questions and presents results, drawing on web services (Service 1, Service 2, Service 3).

See: Wahlster et al. 2001, Eurospeech
Multimodal Input and Output in the SmartKom System
User: I'd like to reserve tickets for this movie.
Smartakus: Where would you like to sit?
User: I'd like these two seats.
Multimodal Interaction with a Life-like Character
User input: speech and gesture
Smartakus output: speech, gesture, and facial expressions
Using Facial Expression Recognition for Affective Personalization
(1) Smartakus: Here you see the CNN program for tonight.
(2) User: That’s great.
(3) Smartakus: I’ll show you the program of another channel for tonight.
(2’) User: That’s great.
(3’) Smartakus: Which of these features do you want to see?
Processing ironic or sarcastic comments
SmartKom: Intuitive Multimodal Interaction
The SmartKom Consortium:

[Map: partners include MediaInterface, European Media Lab, Univ. of Munich, Univ. of Stuttgart, and Univ. of Erlangen, with sites in Saarbrücken, Aachen, Dresden, Stuttgart, Munich, Heidelberg, Ulm, and Berkeley]

Main contractor: DFKI Saarbrücken
Project budget: € 25.5 million
Project duration: 4 years (September 1999 – September 2003)
The SmartKom Demonstrator System
Camera for Gestural Input
Microphone
Multimodal Control of TV-Set
Multimodal Control of VCR/DVD Player
A Demonstration of SmartKom's Multimodal Interface for the German President Dr. Rau
• Seamless integration and mutual disambiguation of multimodal input and output on semantic and pragmatic levels
• Situated understanding of possibly imprecise, ambiguous, or incomplete multimodal input
• Context-sensitive interpretation of dialog interaction on the basis of dynamic discourse and context models
• Adaptive generation of coordinated, cohesive and coherent multimodal presentations
• Semi- or fully automatic completion of user-delegated tasks through the integration of information services
• Intuitive personification of the system through a presentation agent
Salient Characteristics of SmartKom
Multimodal Input and Output in SmartKom: Fusion and Fission of Multiple Modalities

[Table: speech, gesture, and facial expressions each serve both as input by the user and as output by the presentation agent]
Combination of Speech and Gesture in SmartKom

User: Which feature films are shown tonight on TV?
Smartakus: I show you a survey of tonight's TV films.
User: I can't find anything interesting. Then I'll go to the movies.
Smartakus: Here you see a programme listing of the movies shown in Heidelberg today.
User: This one I would like to see. Where is it shown?
Smartakus: On this map, all movie theatres showing "A Little Christmas Story" are highlighted.
Multimodal Input and Output in SmartKom
User: There I would like to get a reservation.
Smartakus: In this movie theatre a reservation is not possible.
User: Then let's check another theatre. What about this one?
Smartakus: This overview lists all show times for the movie "A Little Christmas Story" in the movie theatre "Castle".
User: Here I would like to get a reservation.
Smartakus: Please show me where you would like to be seated.
User: I would like to get two seats here.
Smartakus: Is this okay?
User: Sure.
Smartakus: I have reserved the seats. Your confirmation number is 635. You can pick up the tickets until half an hour before the show at the ticket box.
User: Okay. Thank you. Good bye.
Smartakus: Good bye.
Personalized Interaction with WebTVs via SmartKom (DFKI with Sony, Philips, Siemens)
User: Switch on the TV.
Smartakus: Okay, the TV is on.
User: Which channels are presenting the latest news right now?
Smartakus: CNN and NTV are presenting news.
User: Please record this news channel on a videotape.
Smartakus: Okay, the VCR is now recording the selected program.
Example: Multimodal Access to Electronic Program Guides for TV
[Figure: the same content must adapt from e.g. a 60 × 90 pixel b/w phone display to a 1024 × 768 pixel 24-bit color screen]
The Need for Personalization: Adaptive Interaction with Mobile Devices
PEACH: "Beaming" a Life-Like Character from a Large Public Display to a Mobile Personal Device
PEACH: Personalized Edutainment in Museums (IRST – DFKI)
A "Web of Meaning" has more Personalization Potential than a "Web of Links"

Three Layers of Webpage Annotations (with personalization potential):
- Content: OWL, DAML+OIL (high)
- Structure: XML (medium)
- Layout: HTML (low)

cf.: Dieter Fensel, James Hendler, Henry Lieberman, Wolfgang Wahlster (eds.): Spinning the Semantic Web, MIT Press, November 2002
Personalization
Mapping Web Content Onto a Variety of Structures and Layouts
From the "one-size-fits-all" approach of static webpages to the "perfect personal fit" approach of adaptive webpages
[Diagram: one content layer (OWL) maps onto several structures (XML1 ... XMLn), each of which maps onto several layouts (HTML11 ... HTML3p)]
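A minimal sketch of this one-content, many-layouts mapping: a single content record is rendered by different layout functions depending on the device profile. The renderer names, profiles, and content record below are hypothetical illustrations, not SmartKom's actual pipeline.

```python
# One content record, several layout renderers (illustrative only).
CONTENT = {"title": "A Little Christmas Story", "start": "20:15", "cinema": "Castle"}

def render_wap(item):
    """Layout for a tiny monochrome phone display."""
    return f"{item['title']} {item['start']}"

def render_html(item):
    """Layout for a full-color desktop page fragment."""
    return (f"<h1>{item['title']}</h1>"
            f"<p>{item['cinema']}, {item['start']}</p>")

RENDERERS = {"phone-bw": render_wap, "desktop": render_html}

def personalize(item, device_profile):
    """Select the layout that fits the user's device."""
    return RENDERERS[device_profile](item)

print(personalize(CONTENT, "phone-bw"))   # A Little Christmas Story 20:15
```

The point of keeping content in a layout-neutral form is exactly that adding a new device means adding one renderer, not rewriting every page.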
SmartKom: Towards Multimodal and Mobile Dialogue Systems for Indoor and Outdoor Navigation
Seamless Integration of Various Positioning Technologies
• GSM/UMTS cells
• GPS
• Infrared
• WaveLAN, Bluetooth
Using the same device for driving and walking directions
Speech and Gesture Input Graphics and Speech Output
Spoken Dialogues with the Car Navigation System: SENECA
Product Announcement for E-Class Mercedes: End of 2002
Getting Driving and Walking Directions via SmartKom
User: I want to drive to Heidelberg.
Smartakus: Do you want to take the fastest or the shortest route?
User: The fastest.
Smartakus: Here you see a map with your route from Saarbrücken to Heidelberg.
SmartKom can be used for Multimodal Navigation Dialogues in a Car
Getting Driving and Walking Directions via SmartKom
Smartakus: You are now in Heidelberg. Here is a sightseeing map of Heidelberg.
User: I would like to know more about this church!
Smartakus: Here is some information about the St. Peter's Church.
User: Could you please give me walking directions to this church?
Smartakus: In this map, I have highlighted your walking route.
SmartKom: Multimodal Dialogues with a Hybrid Navigation System
Spoken Navigation Dialogues with SmartKom

User: SmartKom, please look for the nearest parking lot.
Smartakus: The parking garage at the main station provides 300 slots. Opening hours are from 6 am to 11 pm. Do you want to get there?
User: No, please tell me about the next parking option.
Smartakus: The Market parking lot provides 150 slots. It is open 24 hours a day. Do you want to get there?
User: Yes, please.
Smartakus: I'll bring you to the Market parking lot.
The High-Level Control Flow of SmartKom
A Spectrum of Client/Server Architectures for Mobile Multimodal Systems: From Thin to Fat Clients

- Embedded speech understanding: on-device processing, with content access (e.g. map updates) from web services
- Distributed speech understanding: Aurora speech features sent to a server-side speech understanding system with a feature interface
- Remote speech understanding: Java-based voice streaming to a server-side speech understanding system
M3I: A Mobile, Multimodal, and Modular Interface of DFKI

- IBM Embedded ViaVoice
- iPAQ, JORNADA
- C++, Embedded Java
- Java-based voice streaming
- SmartKom's multimodal dialogue engine
1. Hybrid Speech Understanding = Embedded + Remote/Distributed Speech Understanding
   (embedded: small vocabulary; remote: large vocabulary, topic detection)
2. Resource-Adaptive Speech Processing: the availability of a server improves coverage and quality
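The resource-adaptive scheme can be sketched as a simple fallback policy: the embedded small-vocabulary recognizer always produces a result, and a server-side large-vocabulary result replaces it when the server is reachable and more confident. The recognizer stubs and confidence values below are invented for illustration.

```python
def hybrid_recognize(audio, embedded_asr, server_asr, server_available):
    """Resource-adaptive recognition: always run the small embedded
    recognizer; when a server is reachable, prefer its large-vocabulary
    result if it is more confident."""
    text, conf = embedded_asr(audio)
    if server_available:
        s_text, s_conf = server_asr(audio)
        if s_conf > conf:
            return s_text, "server"
    return text, "embedded"

# Stub recognizers standing in for real engines (illustrative only).
embedded = lambda audio: ("call home", 0.6)
server = lambda audio: ("show the fastest route to Heidelberg", 0.9)

print(hybrid_recognize(b"...", embedded, server, server_available=True))
```

Losing network coverage degrades the vocabulary gracefully instead of breaking the interface, which is the point of the hybrid design.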
Example of Embedded Multimodal Dialogue System M3I for Pedestrian Navigation (DFKI)

Spoken and gestural input combined with graphics and speech output on an iPAQ
Java-Based Voice Streaming for Hybrid Speech Understanding in M3I (DFKI)
SmartKom's Added-Value Mobile Service ActiveList

"Please let me know when I pass a shop selling batteries."

SmartKom sends a note to the user or activates an alarm as soon as the user approaches an exhibit that matches the specification of an item on the ActiveList.

ActiveList's spatial alarm can be combined with:
- route planning and navigation
- temporal and spatial optimization of a visit
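The spatial alarm behind ActiveList can be sketched as a geofence check: match each nearby point of interest against the user's wish list. The flat-earth distance approximation and the POI records below are illustrative assumptions, not SmartKom's actual service.

```python
import math

def within(p, q, radius_m):
    """Approximate planar distance check for nearby (lat, lon) points.

    Good enough at city scale; ~111,320 m per degree of latitude."""
    dx = (p[1] - q[1]) * 111_320 * math.cos(math.radians(p[0]))
    dy = (p[0] - q[0]) * 111_320
    return math.hypot(dx, dy) <= radius_m

def check_active_list(position, active_list, pois, radius_m=50):
    """Return the points of interest near `position` that match
    an entry on the user's ActiveList."""
    return [poi for poi in pois
            if poi["sells"] in active_list and within(position, poi["pos"], radius_m)]

pois = [
    {"name": "Station Kiosk", "sells": "batteries", "pos": (49.2402, 6.9969)},
    {"name": "Castle Cafe", "sells": "coffee", "pos": (49.2500, 7.0100)},
]
hits = check_active_list((49.2401, 6.9970), {"batteries"}, pois)
print([h["name"] for h in hits])   # ['Station Kiosk']
```

A real deployment would feed `position` from the positioning technologies listed earlier (GPS, GSM/UMTS cells, infrared, WaveLAN/Bluetooth).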
SmartKom's Added-Value Mobile Service SpotInspector

"What's going on at the castle right now?"

SmartKom allows the user to have remote visual access to various interesting spots via a selection of webcams, showing current waiting queues, special events, and activities.

SpotInspector can be combined with:
- multimedia presentations of the expected program for these spots
- route planning and navigation to these spots
SmartKom's Added-Value Mobile Service PartnerRadar

"Where are Lisa and Tom? What are they looking at?"

SmartKom helps to locate and bring together members of the same party.

Involved technologies:
- navigation and tour instructions
- monitoring of group activity
- additional information on exhibits that are interesting for the whole party
Ultimate Simplicity: One-Button Mobile Devices

[Device photos: reflectors, photo detector, speaker, command button, microphone, fingerprint recognizer]

8hertz technologies, Germany
CARC Cyber Assist Research Center, Japan
UMTS-Doit: The First Test and Evaluation Center for UMTS-based Multimodal Speech Services in Germany
[Diagram: a Node B at DFKI Saarbrücken connects via E1/ATM to an RNC and a Gigastream UMTS navigation switch in Munich, linking the mobile network to Internet content providers, the PSTN/telephone system, and the UMTS-Doit server. Cooperation between two partners (logos).]
UMTS Applications in a Mercedes: Webcam Providing a Look-Ahead of the Traffic Situation
Embassi: Multimodal Music Selection in a Car
UMTS Application in a Mercedes: Language-based Music Download
DFKI Spin-off: Natural Language Music Search
Personalized Car Entertainment (DFKI for Bosch)

MP3 music files from the Web
Rist & Herzog for Blaupunkt
Research Roadmap of Multimodality 2002-2005

[Roadmap chart, 2002 to 2005: from empirical and data-driven models of multimodality, via computational models of multimodality and advanced methods for multimodal communication, towards mobile, human-centered, and intelligent multimodal interfaces. Milestones include: adequate corpora for MM research; XML-encoded MM human-human and human-machine corpora; standards for the annotation of MM training corpora; situated and task-specific MM corpora; corpora with multimodal artefacts and new multimodal input devices; collection of hardest and most frequent/relevant phenomena; examples of the added value of multimodality; multimodal barge-in; markup languages for multimodal dialogue semantics; common representation of multimodal content; models of MM mutual disambiguation; decision-theoretic, symbolic, and hybrid modules for MM input fusion; reusable components for multimodal analysis and generation; plug-and-play infrastructure; multimodal interface toolkit; mobile multimodal interaction tools; multimodal toolkit for universal access; models for effective and trustworthy MM HCI; task-, situation-, and user-aware multimodal interaction; multiparty MM interaction]

Source: Dagstuhl Seminar "Fusion and Coordination in Multimodal Interaction", 2 Nov. 2001, edited by W. Wahlster
Research Roadmap of Multimodality 2006-2010

[Roadmap chart, 2006 to 2010: from toolkits for multimodal systems, via advanced methods for multimodal communication and empirical and data-driven models of multimodality, towards ecological multimodal interfaces. Milestones include: usability evaluation methods for MM systems; multimodal feedback and grounding; tailored and adaptive MM interaction; incremental feedback between modalities during generation; models of MM collaboration; parametrized models of multimodal behaviour; demonstration of performance advances through multimodal interaction; real-time localization and motion/eye tracking technology; multimodality in VR and AR environments; resource-bounded multimodal interaction; users' theories of the system's multimodal capabilities; multicultural adaptation of multimodal presentations; affective MM communication; test suites and benchmarks for multimodal interaction; multimodal models of engagement and floor management; non-monotonic MM input interpretation; computational models of the acquisition of MM communication skills; non-intrusive and invisible MM input sensors; biologically-inspired intersensory coordination models]

Source: Dagstuhl Seminar "Fusion and Coordination in Multimodal Interaction", 2 Nov. 2001, edited by W. Wahlster
Burning Issues in Multimodal Interaction
• Multimodality: from alternate modes of interaction towards mutual disambiguation and synergistic combinations
• Discourse Models: from information-seeking dialogs towards argumentative dialogs and negotiations
• Domain Models: from closed-world assumptions towards the open world of web services
• Dialog Behaviour: from automata models towards a combination of probabilistic and plan-based models
• SmartKom is a multimodal dialog system that combines speech, gesture, and facial expressions for both input and output.
• Spontaneous speech understanding is combined with the video-based recognition of natural gestures.
• One of the major scientific goals of SmartKom is to design new computational methods for the seamless integration and mutual disambiguation of multimodal input and output on a semantic and pragmatic level.
• SmartKom is based on the situated delegation-oriented dialog paradigm, in which the user delegates a task to a virtual communication assistant, visualized as a life-like character on a graphical display.
Conclusions
http://smartkom.dfki.de/
URL of this Presentation: http://www.dfki.de/~wahlster/LangTech-2002
© 2002 DFKI, Design by R.O.
Thank you very much for your attention