Prof. Wolfgang Wahlster German Research Center for Artificial Intelligence, DFKI GmbH Stuhlsatzenhausweg 3 66123 Saarbruecken, Germany phone: (+49 681) 302- 5252/4162 fax: (+49 681) 302-5341 e-mail: [email protected] WWW:http://www.dfki.de/ ~wahlster Brain and Communication Mainz Friday, 24 November 2000 Computers that read, hear and understand

Dec 26, 2015

Page 1: Computers that read, hear and understand

Prof. Wolfgang Wahlster

German Research Center for Artificial Intelligence, DFKI GmbH

Stuhlsatzenhausweg 3, 66123 Saarbruecken, Germany

phone: (+49 681) 302-5252/4162
fax: (+49 681) 302-5341
e-mail: [email protected]
WWW: http://www.dfki.de/~wahlster

Brain and Communication, Mainz

Friday, 24 November 2000

Page 2: Pervasive Speech and Language Technology

© Wolfgang Wahlster, DFKI

Speech-controlled coffee machine: "A cappuccino in 10 minutes, please!"

Dictation: "Send the following email to Mark Maybury: Hi Mark, please forward the following agenda to your project partners!"

Speech-based car navigation: "Let's go to Baker Street in Berkeley!"

Speech-enabled music selection: "I would like to hear Mozart's piano concerto No. 3!"

Page 3: Pervasive Speech and Language Technology

Information on demand: "Show me all CNN news of the last 3 months that feature Bill Clinton discussing health care!"

Speech-to-speech translation: "I would like to make an appointment with Dr. Kurematsu in Kyoto next week!"

Audio mining: "What has Jim Hendler said about DAML during our recent Dagstuhl seminar?"

Page 4: Three Levels of Language Processing

Reduction of uncertainty, from speech input to understanding:

Speech recognition: What has the speaker said? (100 alternatives)
Knowledge sources: acoustic language models, word lists

Speech analysis: What has the speaker meant? (10 alternatives)
Knowledge sources: grammar, lexical meaning

Speech understanding: What does the speaker want? (unambiguous understanding in the dialog context)
Knowledge sources: discourse context, knowledge about the domain of discourse
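The narrowing from many recognition alternatives to a single reading can be illustrated with a toy filter cascade. This is only a sketch under loud assumptions: the word list, the two-word "grammar" check, the domain objects, and all function names are hypothetical stand-ins for the knowledge sources named on the slide, not the method of any actual system.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical toy word list standing in for the recognizer's vocabulary.
VOCABULARY: Set[str] = {"i", "need", "a", "car", "cart", "next", "monday", "knee"}


def recognize(candidates: Iterable[str]) -> List[str]:
    """Level 1 (what was said): keep word sequences built only from known words."""
    return [h for h in candidates if all(w in VOCABULARY for w in h.split())]


def analyze(candidates: Iterable[str]) -> List[str]:
    """Level 2 (what was meant): keep sequences a toy grammar accepts.

    The 'grammar' here only checks that the utterance starts with 'i need'.
    """
    return [h for h in candidates if h.split()[:2] == ["i", "need"]]


def understand(candidates: Iterable[str], domain_objects: Set[str]) -> List[str]:
    """Level 3 (what is wanted): keep readings that mention a domain object."""
    return [h for h in candidates if any(w in domain_objects for w in h.split())]


def interpret(candidates: List[str], domain_objects: Set[str]) -> Tuple[List[int], List[str]]:
    """Run all three levels and report how many alternatives survive each one."""
    said = recognize(candidates)
    meant = analyze(said)
    wanted = understand(meant, domain_objects)
    return [len(candidates), len(said), len(meant), len(wanted)], wanted


hypotheses = [
    "i need a car next monday",
    "i knee a car next monday",
    "i need a cart next monday",
    "eye need a car next monday",
]
counts, reading = interpret(hypotheses, {"car"})
print(counts, reading)
```

Each level only removes alternatives, so the counts can never increase from one level to the next; real systems score and rank alternatives rather than filtering them with hard rules.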

Page 5: Challenges for Language Engineering

Each dimension is listed in order of increasing complexity.

Input conditions: close-speaking microphone/headset with push-to-talk; telephone with pause-based segmentation; open microphone with GSM quality

Naturalness: isolated words; read continuous speech; spontaneous speech

Adaptability: speaker dependent; speaker independent; speaker adaptive

Dialog capabilities: monolog dictation; information-seeking dialog; multiparty negotiation

Page 6: Context-Sensitive Speech-to-Speech Translation

Verbmobil Server

German: Wann fährt der nächste Zug nach Hamburg ab?
English: When does the next train to Hamburg depart?

German: Wo befindet sich das nächste Hotel?
English: Where is the nearest hotel?

Page 7: Mobile Speech-to-Speech Translation of Spontaneous Dialogs

Solution: a conference call in which the Verbmobil Speech Translation Server is accessed by GSM mobile phones.

Page 8: Speech-to-Speech Translation

Page 9: The Control Panel of Verbmobil

Page 10: General Speech Recognition Task

Audio signal (German, English, Japanese), processed by language-specific recognizers (German, English, Japanese), yields a word hypotheses graph.
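A word hypotheses graph can be pictured as a directed acyclic graph whose edges carry competing word candidates between points in time. The sketch below is a minimal illustration, assuming integer node ids that increase with time and a single log-probability score per edge; the `WordEdge` class, the `best_path` function, and the scores are hypothetical and do not describe Verbmobil's actual data format.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class WordEdge:
    start: int       # node where the word hypothesis begins
    end: int         # node where it ends; assumed to be greater than start
    word: str
    log_prob: float  # hypothetical recognizer score, higher is better


def best_path(edges: List[WordEdge], first: int, last: int) -> Tuple[float, List[str]]:
    """Return the highest-scoring word sequence from node `first` to node `last`."""
    best: Dict[int, Tuple[float, List[str]]] = {first: (0.0, [])}
    # Because every edge points forward in time, processing edges in order of
    # their start node guarantees a node's best score is final before it is used.
    for edge in sorted(edges, key=lambda e: e.start):
        if edge.start not in best:
            continue  # start node not reachable from `first`
        score, words = best[edge.start]
        candidate = (score + edge.log_prob, words + [edge.word])
        if edge.end not in best or candidate[0] > best[edge.end][0]:
            best[edge.end] = candidate
    if last not in best:
        raise ValueError("no path between the given nodes")
    return best[last]


graph = [
    WordEdge(0, 1, "when", -0.2),
    WordEdge(0, 1, "then", -1.5),
    WordEdge(1, 2, "does", -0.1),
    WordEdge(1, 2, "dust", -2.0),
]
score, words = best_path(graph, 0, 2)
print(words)
```

Keeping the whole graph rather than only the single best word string lets later stages, such as parsing, choose a different path when additional knowledge favors it.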

Page 11: Extracting Statistical Properties from Large Corpora

Machine learning for the integration of statistical properties into symbolic models for speech recognition, parsing, dialog processing, and translation

Corpora: transcribed speech data; segmented speech with prosodic labels; annotated dialogs with dialog acts; treebanks and predicate-argument structures; aligned bilingual corpora

Learned models: hidden Markov models; neural nets, multilayered perceptrons; probabilistic automata; probabilistic grammars; probabilistic transfer rules
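As one small, generic example of extracting a statistical property from transcribed data, the sketch below estimates bigram probabilities by maximum likelihood. It is an illustration only: the three-sentence corpus, the `<s>` and `</s>` boundary markers, and the function name are assumptions, and the slide does not state that this particular estimator was used.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def bigram_probabilities(sentences: Iterable[str]) -> Dict[Tuple[str, str], float]:
    """Estimate P(next_word | word) as count(word, next_word) / count(word)."""
    context_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for sentence in sentences:
        # Sentence boundary markers let the model learn how utterances start and end.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        context_counts.update(tokens[:-1])          # every token that has a successor
        pair_counts.update(zip(tokens, tokens[1:]))  # adjacent word pairs
    return {
        pair: count / context_counts[pair[0]]
        for pair, count in pair_counts.items()
    }


corpus = [
    "when does the train depart",
    "when does the flight depart",
    "where is the hotel",
]
model = bigram_probabilities(corpus)
print(model[("when", "does")], model[("the", "train")])
```

With so little data most word pairs are never observed and receive no probability at all, which is a tiny instance of the data sparseness problem that affects corpus-based approaches.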

Page 12: The Use of Prosodic Information at All Processing Stages

Input: speech signal, word hypotheses graph

Multilingual Prosody Module
Prosodic features: duration, pitch, energy, pause

Information passed on by the prosody module: boundary information, sentence mood, accented words, prosodic feature vector

Use at each processing stage:
Parsing: search space restriction
Dialog understanding: dialog act segmentation and recognition
Translation: constraints for transfer
Generation: lexical choice
Speech synthesis: speaker adaptation

Page 13: The Understanding of Spontaneous Speech Repairs

Input: "I need a car next Tuesday oops Monday"

Original utterance: "I need a car next Tuesday" (reparandum: "Tuesday")
Editing phase: "oops" (hesitation)
Repair phase: "Monday" (reparans)

Processing: recognition of substitutions, then transformation of the word hypothesis graph

Output: "I need a car next Monday"

Verbmobil technology understands speech repairs and extracts the intended meaning.

Dictation systems such as ViaVoice, VoiceXpress, FreeSpeech, and NaturallySpeaking cannot deal with spontaneous speech and transcribe the corrupted utterances.
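The reparandum, hesitation, and reparans structure can be made concrete with a deliberately simplified token-level sketch. The real system is described as transforming the word hypothesis graph; the code below instead works on a flat, punctuation-free token list, and its editing-term set, its retrace heuristic, and the function name `repair` are all assumptions made for illustration.

```python
from typing import List, Sequence

# Hypothetical set of editing terms (hesitations) that signal a repair.
EDITING_TERMS = {"oops", "äh", "uh", "um"}


def repair(tokens: Sequence[str], max_retrace: int = 3) -> List[str]:
    """Replace the reparandum with the reparans for the first repair found.

    Heuristic: by default the single word before the editing term is the
    reparandum. If the first word after the editing term repeats one of the
    last `max_retrace` words before it, the reparandum starts at that word.
    """
    for i, token in enumerate(tokens):
        if token.lower() in EDITING_TERMS and 0 < i < len(tokens) - 1:
            before, after = list(tokens[:i]), list(tokens[i + 1:])
            cut = len(before) - 1  # default: one-word substitution
            for back in range(1, min(max_retrace, len(before)) + 1):
                if before[-back].lower() == after[0].lower():
                    cut = len(before) - back  # speaker retraced to this word
                    break
            return before[:cut] + after
    return list(tokens)  # no repair detected


print(" ".join(repair("I need a car next Tuesday oops Monday".split())))
print(" ".join(repair("Wir treffen uns in Mannheim äh in Saarbrücken".split())))
```

The first call drops "Tuesday" in favor of "Monday"; the second uses the repeated "in" to remove "in Mannheim" as a whole. Only the first repair in an utterance is handled, and a word that merely looks like an editing term would trigger a false repair, which is one reason a graph-based treatment with scored alternatives is more robust.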

Page 14: Automatic Understanding and Correction of Speech Repairs in Spontaneous Telephone Dialogs

German: Wir treffen uns in Mannheim, äh, in Saarbrücken.
(We are meeting in Mannheim, oops, in Saarbruecken.)

English: We are meeting in Saarbruecken.

Page 15: Spoken Dialogs about Schedules

Fielded applications

Train schedules (German Railway System, DB)
TABA (Philips): +49 241 60 40 20
OSCAR (DaimlerChrysler): +49 1805 99 66 22

Flight schedules (Lufthansa)
ALF (Philips): +49 1803 00 00 74

Technical challenges: phone-based dialogs, many proper names, clarification subdialogs

Page 16: Linguatronic: Spoken Dialogs with Mercedes-Benz

Input: microphone, push-to-talk switch

"Please call Doris Wahlster."
"Open the left window in the back."
"I want to hear the weather channel."
"When will I reach the next gas station?"
"Where is the next parking lot?"

Speech control of: cellular phone, radio, windows / AC, route guidance system
Option for S-, C-, and E-Class of Mercedes and BMW
Speaker-independent; garbage models for non-speech (blinker, AC, wheels)

Page 17: Speech-based Interaction with an Organizer on a WAP Phone (Voice In - WML Out)

"With Maier on 25 October, with Tetzlaff, and with Streit too."
"Oops, not with Streit."
"From 2 to 3."
"Okay!"

Page 18: Augmented Reality: Combining Speech, Gestures and Graphics for Mobile Access to a Digital Library

Mobile dialog with a virtual tourist guide for the Heidelberg Castle

Location-adaptive query interpretation

Page 19: Augmented Reality: Combining Speech, Gestures and Graphics for Mobile Access to a Digital Library

Multimodal route description

Mobile speech translation and multilingual information access

Page 20: Augmented Reality: Combining Speech, Gestures and Graphics for Mobile Access to a Digital Library

Speech-based access to 3D virtual views

Multimodal output from a digital library and speech-based access to Internet content

Page 21: International Research Trends in Multilingual Systems

Dialog translation (Verbmobil): call centers, e-commerce, mobile travel assistance, telephone translations

Multilingual and mobile communication assistants (SmartKom): multimodal interfaces

Speech-based Web access to multilingual Web pages: WAP phones, WebTV

Multilingual audio retrieval and audio mining: discussions, lecture notes, organizers

Multilingual indexing and annotation of videos: video archives, news archives

Multilingual language technology: speech recognition, language understanding, language generation, and speech synthesis

Spontaneous speech, robust processing and translation, semantic and pragmatic understanding

Page 22: Open Problems for the Next Decade

Problems with current machine learning approaches

Expensive data collection

Cognitively unrealistic training data

Data sparseness

Problems with current hand-crafted knowledge sources

Brittleness

Domain dependence

Limited scalability

Page 23: A Speculative Conclusion (+50 years)

-500 years: Oral society. News and knowledge are passed orally. No mass storage, no automatic processing, no automatic retrieval.

Today: Textual society. News and knowledge are passed textually. Mass storage of texts, text processing, text retrieval.

+50 years: Oral society. News and knowledge are passed orally. Mass storage of speech, speech processing, audio retrieval.