German Research Center for Artificial Intelligence (DFKI GmbH)
Stuhlsatzenhausweg 3, 66123 Saarbruecken, Germany
phone: (+49 681) 302-5252/4162, fax: (+49 681) 302-5341
e-mail: wahlster@dfki.de
WWW: http://www.dfki.de/~wahlster
Wolfgang Wahlster
Language Technologies for the Mobile Internet Era
© W. Wahlster
Multimodal Interfaces to 3G Mobile Services
Market studies (May 2002) predict:
- Cumulative revenues of almost 1 trillion € from launch until 2010
- Non-voice service revenues will dominate voice revenues by year 3 and comprise 66% of 3G service revenues by 2010
- 322 billion € in revenues in 2010
- In 2010 the average 3G subscriber will spend about 30 € per month on 3G data services
Multimodal UMTS Systems
Intelligent Interaction with Mobile Internet Services
- Access to web content and web services anywhere and anytime
- Access to corporate networks and virtual private networks from any device
- Access to edutainment and infotainment services
- Access to all messages (voice, email, multimedia, MMS) from any single device
- Personalization
- Localization
Mobile Messaging Services Evolution: From SMS to MMS
[Chart: the evolution from SMS via EMS to MMS along four dimensions. Infrastructure: SS7 with SMSC for SMS and EMS; UMTS with IP/MPLS protocols and MMS relay and servers for MMS. Terminals: standard phones, EMS phones, MMS phones, and smart phones with integrated image capture. Customer expectation: ubiquity and youth focus for SMS; limited enhancement for EMS; personalized services, location-based services, and an emotional experience for MMS. Applications: text, then enhanced text and enhanced message creation, then pictures, audio, video, and multimedia.]
Language Technologies for MMS:
- Speech Synthesis (with Affect)
- Multimodal Authoring Interface
- Speech-based Retrieval of Media Objects
From Spoken Dialogue to Multimodal Dialogue
Verbmobil (today's cell phone): speech only
SmartKom (third-generation UMTS phone): speech, graphics, and gesture
Merging Various User Interface Paradigms into Multimodal Interaction:
- Spoken Dialogue
- Graphical User Interfaces
- Gestural Interaction
- Facial Expressions
- Haptic Input
Using All Human Senses for Intuitive Interaction: Code, Media and Modalities

[Diagram: the user and the system exchange information over input and output channels. CODE (systems of symbols): language, graphics, gesture, facial expressions. MEDIA (physical information carriers): e.g. storage on HD drive or DVD. MODALITIES (human senses): visual, auditory, tactile, haptic.]
Symbolic and Subsymbolic Fusion of Multiple Modes
Input analyzers: speech recognition, gesture recognition, prosody recognition, facial expression recognition, and lip reading.

Subsymbolic fusion: neural networks, hidden Markov models.
Symbolic fusion: graph unification, Bayesian networks.

The fused hypotheses undergo reference resolution and disambiguation, yielding a semantic representation.
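The graph-unification approach to symbolic fusion named above can be sketched in a few lines: two partial semantic structures, one contributed by speech and one by gesture, are merged when they are compatible and rejected when they clash. The feature structures and slot names below are invented for illustration; SmartKom's actual representations are far richer.

```python
def unify(a, b):
    """Unify two partial semantic feature structures (dicts).

    Returns the merged structure, or None if the inputs clash
    on an atomic value.
    """
    result = dict(a)
    for key, value in b.items():
        if key not in result:
            result[key] = value
        elif isinstance(result[key], dict) and isinstance(value, dict):
            merged = unify(result[key], value)
            if merged is None:
                return None          # nested clash
            result[key] = merged
        elif result[key] != value:
            return None              # conflicting atomic values
    return result

# Speech contributes the intent, gesture contributes the referent.
speech = {"act": "reserve", "object": {"type": "seat"}}
gesture = {"object": {"type": "seat", "id": "row3-seat12"}}

print(unify(speech, gesture))
# {'act': 'reserve', 'object': {'type': 'seat', 'id': 'row3-seat12'}}
```

A failed unification (e.g. speech refers to a movie while the gesture points at a seat) signals the dialogue manager to ask a clarification question instead of guessing.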
Mutual Disambiguation of Multiple Input Modes
The combination of speech and vision analysis increases the robustness and understanding capabilities of multimodal user interfaces.
- Speech Recognition + Lip Reading: increases robustness in noisy environments
- Speech Recognition + Gesture Recognition (XTRA, SmartKom): referential disambiguation and focus control
- Speech Recognition + Facial Expression Recognition (SmartKom): recognition of irony and sarcasm, and scope disambiguation
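The referential disambiguation described here can be sketched as filtering: the gesture recognizer supplies the type of the object being pointed at, and that constraint is used to reorder or reject hypotheses in the speech recognizer's n-best list. The hypothesis fields and scores below are invented for illustration.

```python
def disambiguate(speech_nbest, pointed_object):
    """Pick the highest-scoring speech hypothesis whose referent
    type matches the object the user is pointing at."""
    for hyp in sorted(speech_nbest, key=lambda h: h["score"], reverse=True):
        if hyp["referent_type"] == pointed_object["type"]:
            return hyp
    return None  # no hypothesis is consistent with the gesture

# An acoustically better-scoring misrecognition is rejected because
# it does not fit the pointing gesture.
nbest = [
    {"text": "show me this movie", "referent_type": "movie", "score": 0.48},
    {"text": "show me this mover", "referent_type": "unknown", "score": 0.52},
]
pointing = {"type": "movie", "id": "a-little-christmas-story"}

print(disambiguate(nbest, pointing)["text"])  # show me this movie
```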
SmartKom: A Transportable Interface Agent

- SmartKom-Public: a multimodal communication kiosk
- SmartKom-Mobile: a handheld communication assistant
- SmartKom-Home/Office: a multimodal portal to information services

The kernel of the SmartKom interface agent comprises media analysis, interaction management, application management, and media design.
SmartKom's SDDP Interaction Metaphor

SDDP = Situated Delegation-oriented Dialogue Paradigm

The user specifies a goal and delegates the task to a personalized interaction agent. User and agent cooperate on problems: the agent asks questions and presents results, drawing on web services (Service 1, Service 2, Service 3).

See: Wahlster et al. 2001, Eurospeech
Multimodal Input and Output in the SmartKom System
User: I'd like to reserve tickets for this movie.
Smartakus: Where would you like to sit?
User: I'd like these two seats.
Multimodal Interaction with a Life-like Character
User input: speech and gesture
Smartakus output: speech, gesture, and facial expressions
Using Facial Expression Recognition for Affective Personalization
(1) Smartakus: Here you see the CNN program for tonight.
(2) User: That’s great.
(3) Smartakus: I’ll show you the program of another channel for tonight.
(2’) User: That’s great.
(3’) Smartakus: Which of these features do you want to see?
Processing ironic or sarcastic comments
SmartKom: Intuitive Multimodal Interaction
The SmartKom Consortium:

[Map: partners include MediaInterface, European Media Lab, Univ. of Munich, Univ. of Stuttgart, and Univ. of Erlangen, with sites in Saarbrücken, Aachen, Dresden, Stuttgart, Munich, Heidelberg, Ulm, and Berkeley]

Main contractor: DFKI Saarbrücken
Project budget: € 25.5 million
Project duration: 4 years (September 1999 – September 2003)
The SmartKom Demonstrator System
Camera for Gestural Input
Microphone
Multimodal Control of TV-Set
Multimodal Control of VCR/DVD Player
A Demonstration of SmartKom's Multimodal Interface for the German President Dr. Rau
• Seamless integration and mutual disambiguation of multimodal input and output on semantic and pragmatic levels
• Situated understanding of possibly imprecise, ambiguous, or incomplete multimodal input
• Context-sensitive interpretation of dialog interaction on the basis of dynamic discourse and context models
• Adaptive generation of coordinated, cohesive and coherent multimodal presentations
• Semi- or fully automatic completion of user-delegated tasks through the integration of information services
• Intuitive personification of the system through a presentation agent
Salient Characteristics of SmartKom
Multimodal Input and Output in SmartKom: Fusion and Fission of Multiple Modalities

[Table: speech, gesture, and facial expressions each serve both as input by the user and as output by the presentation agent]
Combination of Speech and Gesture in SmartKom

User: Which feature films are shown tonight on TV?
Smartakus: I show you a survey of tonight's TV films.
User: I can't find anything interesting. Then I'll go to the movies.
Smartakus: Here you see a programme listing of the movies shown in Heidelberg today.
User: This one I would like to see. Where is it shown?
Smartakus: On this map, all movie theatres showing "A Little Christmas Story" are highlighted.
Multimodal Input and Output in SmartKom
User: There I would like to get a reservation.
Smartakus: In this movie theatre a reservation is not possible.
User: Then let's check another theatre. What about this one?
Smartakus: This overview lists all show times for the movie "A Little Christmas Story" in the movie theatre "Castle".
User: Here I would like to get a reservation.
Smartakus: Please show me where you would like to be seated.
User: I would like to get two seats here.
Smartakus: Is this okay?
User: Sure.
Smartakus: I have reserved the seats. Your confirmation number is 635. You can pick up the tickets until half an hour before the show at the ticket box.
User: Okay. Thank you. Good bye.
Smartakus: Good bye.
Personalized Interaction with WebTVs via SmartKom (DFKI with Sony, Philips, Siemens)
User: Switch on the TV.
Smartakus: Okay, the TV is on.
User: Which channels are presenting the latest news right now?
Smartakus: CNN and NTV are presenting news.
User: Please record this news channel on a videotape.
Smartakus: Okay, the VCR is now recording the selected program.
Example: Multimodal Access to Electronic Program Guides for TV
[Figure: the same content must adapt from e.g. a 60 × 90 pixel b/w phone display to a 1024 × 768 pixel 24-bit color screen]
The Need for Personalization: Adaptive Interaction with Mobile Devices
PEACH: "Beaming" a Life-Like Character from a Large Public Display to a Mobile Personal Device
PEACH: Personalized Edutainment in Museums (IRST – DFKI)
A "Web of Meaning" has more Personalization Potential than a "Web of Links"

Three Layers of Webpage Annotations (with personalization potential):
- Content: OWL, DAML+OIL (high)
- Structure: XML (medium)
- Layout: HTML (low)

cf.: Dieter Fensel, James Hendler, Henry Lieberman, Wolfgang Wahlster (eds.): Spinning the Semantic Web, MIT Press, November 2002
Personalization
Mapping Web Content Onto a Variety of Structures and Layouts
From the "one-size-fits-all" approach of static webpages to the "perfect personal fit" approach of adaptive webpages
[Diagram: one content layer (OWL) maps onto several structures (XML1 ... XMLn), each of which maps onto several layouts (HTML11 ... HTML3p)]
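A minimal sketch of this one-content, many-layouts mapping: a single content record is rendered by different layout functions depending on the device profile. The renderer names, profiles, and content record below are hypothetical illustrations, not SmartKom's actual pipeline.

```python
# One content record, several layout renderers (illustrative only).
CONTENT = {"title": "A Little Christmas Story", "start": "20:15", "cinema": "Castle"}

def render_wap(item):
    """Layout for a tiny monochrome phone display."""
    return f"{item['title']} {item['start']}"

def render_html(item):
    """Layout for a full-color desktop page fragment."""
    return (f"<h1>{item['title']}</h1>"
            f"<p>{item['cinema']}, {item['start']}</p>")

RENDERERS = {"phone-bw": render_wap, "desktop": render_html}

def personalize(item, device_profile):
    """Select the layout that fits the user's device."""
    return RENDERERS[device_profile](item)

print(personalize(CONTENT, "phone-bw"))   # A Little Christmas Story 20:15
```

The point of keeping content in a layout-neutral form is exactly that adding a new device means adding one renderer, not rewriting every page.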
SmartKom: Towards Multimodal and Mobile Dialogue Systems for Indoor and Outdoor Navigation
Seamless Integration of Various Positioning Technologies
• GSM/UMTS cells
• GPS
• Infrared
• WaveLAN, Bluetooth
Using the same device for driving and walking directions
Speech and Gesture Input Graphics and Speech Output
Spoken Dialogues with the Car Navigation System: SENECA
Product Announcement for E-Class Mercedes: End of 2002
Getting Driving and Walking Directions via SmartKom
User: I want to drive to Heidelberg.
Smartakus: Do you want to take the fastest or the shortest route?
User: The fastest.
Smartakus: Here you see a map with your route from Saarbrücken to Heidelberg.
SmartKom can be used for Multimodal Navigation Dialogues in a Car
Getting Driving and Walking Directions via SmartKom
Smartakus: You are now in Heidelberg. Here is a sightseeing map of Heidelberg.
User: I would like to know more about this church!
Smartakus: Here is some information about the St. Peter's Church.
User: Could you please give me walking directions to this church?
Smartakus: In this map, I have highlighted your walking route.
SmartKom: Multimodal Dialogues with a Hybrid Navigation System
Spoken Navigation Dialogues with SmartKom

User: SmartKom, please look for the nearest parking lot.
Smartakus: The parking garage at the main station provides 300 slots. Opening hours are from 6 am to 11 pm. Do you want to get there?
User: No, please tell me about the next parking option.
Smartakus: The Market parking lot provides 150 slots. It is open 24 hours a day. Do you want to get there?
User: Yes, please.
Smartakus: I'll bring you to the Market parking lot.
The High-Level Control Flow of SmartKom
A Spectrum of Client/Server Architectures for Mobile Multimodal Systems: From Thin to Fat Clients

- Embedded speech understanding: on-device processing, with content access (e.g. map updates) from web services
- Distributed speech understanding: Aurora speech features sent to a server-side speech understanding system with a feature interface
- Remote speech understanding: Java-based voice streaming to a server-side speech understanding system
M3I: A Mobile, Multimodal, and Modular Interface of DFKI

- IBM Embedded ViaVoice
- iPAQ, JORNADA
- C++, Embedded Java
- Java-based voice streaming
- SmartKom's multimodal dialogue engine
1. Hybrid Speech Understanding = Embedded + Remote/Distributed Speech Understanding
   (embedded: small vocabulary; remote: large vocabulary, topic detection)
2. Resource-Adaptive Speech Processing: the availability of a server improves coverage and quality
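The resource-adaptive scheme can be sketched as a simple fallback policy: the embedded small-vocabulary recognizer always produces a result, and a server-side large-vocabulary result replaces it when the server is reachable and more confident. The recognizer stubs and confidence values below are invented for illustration.

```python
def hybrid_recognize(audio, embedded_asr, server_asr, server_available):
    """Resource-adaptive recognition: always run the small embedded
    recognizer; when a server is reachable, prefer its large-vocabulary
    result if it is more confident."""
    text, conf = embedded_asr(audio)
    if server_available:
        s_text, s_conf = server_asr(audio)
        if s_conf > conf:
            return s_text, "server"
    return text, "embedded"

# Stub recognizers standing in for real engines (illustrative only).
embedded = lambda audio: ("call home", 0.6)
server = lambda audio: ("show the fastest route to Heidelberg", 0.9)

print(hybrid_recognize(b"...", embedded, server, server_available=True))
```

Losing network coverage degrades the vocabulary gracefully instead of breaking the interface, which is the point of the hybrid design.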
Example of Embedded Multimodal Dialogue System M3I for Pedestrian Navigation (DFKI)

Spoken and gestural input combined with graphics and speech output on an iPAQ
Java-Based Voice Streaming for Hybrid Speech Understanding in M3I (DFKI)
SmartKom's Added-Value Mobile Service ActiveList

"Please let me know when I pass a shop selling batteries."

SmartKom sends a note to the user or activates an alarm as soon as the user approaches an exhibit that matches the specification of an item on the ActiveList.

ActiveList's spatial alarm can be combined with:
- route planning and navigation
- temporal and spatial optimization of a visit
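The spatial alarm behind ActiveList can be sketched as a geofence check: match each nearby point of interest against the user's wish list. The flat-earth distance approximation and the POI records below are illustrative assumptions, not SmartKom's actual service.

```python
import math

def within(p, q, radius_m):
    """Approximate planar distance check for nearby (lat, lon) points.

    Good enough at city scale; ~111,320 m per degree of latitude."""
    dx = (p[1] - q[1]) * 111_320 * math.cos(math.radians(p[0]))
    dy = (p[0] - q[0]) * 111_320
    return math.hypot(dx, dy) <= radius_m

def check_active_list(position, active_list, pois, radius_m=50):
    """Return the points of interest near `position` that match
    an entry on the user's ActiveList."""
    return [poi for poi in pois
            if poi["sells"] in active_list and within(position, poi["pos"], radius_m)]

pois = [
    {"name": "Station Kiosk", "sells": "batteries", "pos": (49.2402, 6.9969)},
    {"name": "Castle Cafe", "sells": "coffee", "pos": (49.2500, 7.0100)},
]
hits = check_active_list((49.2401, 6.9970), {"batteries"}, pois)
print([h["name"] for h in hits])   # ['Station Kiosk']
```

A real deployment would feed `position` from the positioning technologies listed earlier (GPS, GSM/UMTS cells, infrared, WaveLAN/Bluetooth).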
SmartKom's Added-Value Mobile Service SpotInspector

"What's going on at the castle right now?"

SmartKom allows the user to have remote visual access to various interesting spots via a selection of webcams, showing current waiting queues, special events, and activities.

SpotInspector can be combined with:
- multimedia presentations of the expected program for these spots
- route planning and navigation to these spots
SmartKom's Added-Value Mobile Service PartnerRadar

"Where are Lisa and Tom? What are they looking at?"

SmartKom helps to locate and bring together members of the same party.

Involved technologies:
- navigation and tour instructions
- monitoring of group activity
- additional information on exhibits that are interesting for the whole party
Ultimate Simplicity: One-Button Mobile Devices

[Device photos: reflectors, photo detector, speaker, command button, microphone, fingerprint recognizer]

8hertz technologies, Germany
CARC Cyber Assist Research Center, Japan
UMTS-Doit: The First Test and Evaluation Center for UMTS-based Multimodal Speech Services in Germany
[Diagram: a Node B at DFKI Saarbrücken connects via E1/ATM to an RNC and a Gigastream UMTS navigation switch in Munich, linking the mobile network to Internet content providers, the PSTN/telephone system, and the UMTS-Doit server. Cooperation between two partners (logos).]
UMTS Applications in a Mercedes: Webcam Providing a Look-Ahead of the Traffic Situation
Embassi: Multimodal Music Selection in a Car
UMTS Application in a Mercedes: Language-based Music Download
DFKI Spin-off: Natural Language Music Search
Personalized Car Entertainment (DFKI for Bosch)

MP3 music files from the Web
Rist & Herzog for Blaupunkt
Research Roadmap of Multimodality 2002-2005

[Roadmap chart, 2002 to 2005: from empirical and data-driven models of multimodality, via computational models of multimodality and advanced methods for multimodal communication, towards mobile, human-centered, and intelligent multimodal interfaces. Milestones include: adequate corpora for MM research; XML-encoded MM human-human and human-machine corpora; standards for the annotation of MM training corpora; situated and task-specific MM corpora; corpora with multimodal artefacts and new multimodal input devices; collection of hardest and most frequent/relevant phenomena; examples of the added value of multimodality; multimodal barge-in; markup languages for multimodal dialogue semantics; common representation of multimodal content; models of MM mutual disambiguation; decision-theoretic, symbolic, and hybrid modules for MM input fusion; reusable components for multimodal analysis and generation; plug-and-play infrastructure; multimodal interface toolkit; mobile multimodal interaction tools; multimodal toolkit for universal access; models for effective and trustworthy MM HCI; task-, situation-, and user-aware multimodal interaction; multiparty MM interaction]

Source: Dagstuhl Seminar "Fusion and Coordination in Multimodal Interaction", 2 Nov. 2001, edited by W. Wahlster
Research Roadmap of Multimodality 2006-2010

[Roadmap chart, 2006 to 2010: from toolkits for multimodal systems, via advanced methods for multimodal communication and empirical and data-driven models of multimodality, towards ecological multimodal interfaces. Milestones include: usability evaluation methods for MM systems; multimodal feedback and grounding; tailored and adaptive MM interaction; incremental feedback between modalities during generation; models of MM collaboration; parametrized models of multimodal behaviour; demonstration of performance advances through multimodal interaction; real-time localization and motion/eye tracking technology; multimodality in VR and AR environments; resource-bounded multimodal interaction; users' theories of the system's multimodal capabilities; multicultural adaptation of multimodal presentations; affective MM communication; test suites and benchmarks for multimodal interaction; multimodal models of engagement and floor management; non-monotonic MM input interpretation; computational models of the acquisition of MM communication skills; non-intrusive and invisible MM input sensors; biologically-inspired intersensory coordination models]

Source: Dagstuhl Seminar "Fusion and Coordination in Multimodal Interaction", 2 Nov. 2001, edited by W. Wahlster
Burning Issues in Multimodal Interaction
• Multimodality: from alternate modes of interaction towards mutual disambiguation and synergistic combinations
• Discourse Models: from information-seeking dialogs towards argumentative dialogs and negotiations
• Domain Models: from closed-world assumptions towards the open world of web services
• Dialog Behaviour: from automata models towards a combination of probabilistic and plan-based models
• SmartKom is a multimodal dialog system that combines speech, gesture, and facial expressions for both input and output.
• Spontaneous speech understanding is combined with the video-based recognition of natural gestures.
• One of the major scientific goals of SmartKom is to design new computational methods for the seamless integration and mutual disambiguation of multimodal input and output on a semantic and pragmatic level.
• SmartKom is based on the situated delegation-oriented dialog paradigm, in which the user delegates a task to a virtual communication assistant, visualized as a life-like character on a graphical display.
Conclusions
http://smartkom.dfki.de/
URL of this Presentation: http://www.dfki.de/~wahlster/LangTech-2002
© 2002 DFKI, Design by R.O.
Thank you very much for your attention