Page 1: Spoken Language Understanding, the Research/Industry Chasm

© 2002 IBM Corporation

Natural Language Technologies

May 5, 2004 | SLU for Conversational Systems

Spoken Language Understanding, the Research/Industry Chasm

Roberto Pieraccini

IBM T.J. Watson Research Center

Page 2: Spoken Language Understanding, the Research/Industry Chasm


SLU, The Research Industry Chasm | Roberto Pieraccini | HLT/NAACL 2004 © 2002 IBM Corporation

A Brief History of SLU

Timeline: 1970, 1980, 1990, 2000

RESEARCH

FSM-based SLU

Phrase Structure, ATN, Semantic Grammars (mainly for understanding text)

Robust Parsing, Spontaneous Speech

Call Routing, Sentence Classification, Statistical Parsing

INDUSTRY

FSM-based SLU

Call Routing

VoiceXML, SRGS

Programs: ARPA SUR, DARPA RESOURCE MANAGEMENT, ATIS, COMMUNICATOR

Page 3: Spoken Language Understanding, the Research/Industry Chasm


SLU RESEARCH

Open NL understanding

Few deployed systems

Little data available

Artificial tasks

Lack of evaluation paradigm

Little funding for SLU research

COMMERCIAL SLU

Mostly directed dialog

100s of deployed systems

Lots of proprietary data

Customer driven tasks

Task completion evaluation

Revenue based on license or per-minute

Page 4: Spoken Language Understanding, the Research/Industry Chasm


EuroSpeech 2003 – Paper Breakdown (800 papers)

BASIC (309): Signal Processing, Speech Modeling, Acoustics, Speech Enhancement, Prosody, Emotions, Speech Coding, Corpora, Phonetics

NOISE (93): Noise Robustness, Robust ASR

ENABLING (237): Speech Recognition, Synthesis, Language Modeling, Speaker/Language ID, Speaker Verification

APP (28): Non-dialog applications

DIALOG (86): Dialog and Multimodal systems

NL (9): Summarization, Title extraction, Topic detection, NE recognition, ...

TRANS (14): Speech-to-Speech Translation

UND (24): Spoken Language Understanding

Spoken Language Understanding papers: 24/800 = 3% (14 academic, 10 industrial)
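The breakdown can be checked with a few lines of Python. This sketch sums the category counts read from the chart and computes each category's share of the total. The names `PAPER_COUNTS` and `shares` are illustrative and do not come from the slide.

```python
# Paper counts per category, as read from the EuroSpeech 2003 breakdown.
PAPER_COUNTS = {
    "BASIC": 309,
    "NOISE": 93,
    "ENABLING": 237,
    "APP": 28,
    "DIALOG": 86,
    "NL": 9,
    "TRANS": 14,
    "UND": 24,
}

def shares(counts):
    """Return each category's share of all papers, in percent."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}

if __name__ == "__main__":
    print("total papers:", sum(PAPER_COUNTS.values()))
    # Largest categories first.
    for name, pct in sorted(shares(PAPER_COUNTS).items(), key=lambda kv: -kv[1]):
        print(f"{name:9s} {PAPER_COUNTS[name]:4d} {pct:5.1f}%")
```

Run as a script, it reports 800 papers in total and a 3.0% share for UND. This matches the 24/800 figure on the slide.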

Page 5: Spoken Language Understanding, the Research/Industry Chasm


SLU is difficult

Page 6: Spoken Language Understanding, the Research/Industry Chasm


SLU is difficult to evaluate

End-to-end evaluation

– Based on task completion measures

– Needs the full conversational system

– Needs real, motivated users

Semantic evaluation

– Based on semantic annotation

– Costly

– Subjective

– Needs interpretation principles

– Highly domain/application dependent

Page 7: Spoken Language Understanding, the Research/Industry Chasm


The ATIS evaluation

I am flying between New York and Washington tomorrow, early in the afternoon

DEP. CITY  DEP. AIRPORT  ARR. CITY  ARR. AIRPORT  AIRLINE  FLIGHT #  DEP. TIME
NNYC       JFK           DDEN       DEN           CO       156       12:37
NNYC       LGA           DDEN       DEN           DL       8901      12:58
NNYC       LGA           DDEN       DEN           DL       8903      13:45
NNYC       JFK           DDEN       DEN           AA       578       13:57
NNYC       LGA           DDEN       DEN           UA       187       14:15
NNYC       JFK           DDEN       DEN           DL       987       15:27

Systems evaluated on the basis of the data retrieved from the relational database

Reference min and max answers
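The min/max idea can be illustrated with a simplified scorer. This is a sketch under stated assumptions, not the official ATIS comparator. It assumes an answer is correct when two conditions hold:

- its columns include every column of the min answer and none outside the max answer;
- its rows, projected onto those columns, equal the reference rows.

The min and max column sets are hypothetical. The flight rows are a column subset of the table above.

```python
# Simplified min/max answer check (assumption-based sketch, not the official ATIS comparator).

def project(rows, columns):
    """Return the set of row tuples restricted to the given columns."""
    return {tuple(row[c] for c in columns) for row in rows}

def is_correct(answer, reference, min_cols, max_cols):
    """Assumed rule: min_cols <= answer columns <= max_cols, and projected rows match."""
    # Assumes every answer row has the same columns.
    cols = set(answer[0]) if answer else set()
    if not (set(min_cols) <= cols <= set(max_cols)):
        return False
    ordered = sorted(cols)
    return project(answer, ordered) == project(reference, ordered)

# Reference rows: a subset of the columns of the ATIS table above.
REFERENCE = [
    {"AIRLINE": "CO", "FLIGHT #": "156", "DEP. AIRPORT": "JFK", "DEP. TIME": "12:37"},
    {"AIRLINE": "DL", "FLIGHT #": "8901", "DEP. AIRPORT": "LGA", "DEP. TIME": "12:58"},
    {"AIRLINE": "DL", "FLIGHT #": "8903", "DEP. AIRPORT": "LGA", "DEP. TIME": "13:45"},
    {"AIRLINE": "AA", "FLIGHT #": "578", "DEP. AIRPORT": "JFK", "DEP. TIME": "13:57"},
    {"AIRLINE": "UA", "FLIGHT #": "187", "DEP. AIRPORT": "LGA", "DEP. TIME": "14:15"},
    {"AIRLINE": "DL", "FLIGHT #": "987", "DEP. AIRPORT": "JFK", "DEP. TIME": "15:27"},
]
MIN_COLS = {"AIRLINE", "FLIGHT #"}   # hypothetical min answer
MAX_COLS = set(REFERENCE[0])         # hypothetical max answer: all columns shown

if __name__ == "__main__":
    short = [{"AIRLINE": r["AIRLINE"], "FLIGHT #": r["FLIGHT #"], "DEP. TIME": r["DEP. TIME"]}
             for r in REFERENCE]
    print(is_correct(short, REFERENCE, MIN_COLS, MAX_COLS))       # True: all rows, allowed columns
    print(is_correct(short[:-1], REFERENCE, MIN_COLS, MAX_COLS))  # False: one flight missing
```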

Page 8: Spoken Language Understanding, the Research/Industry Chasm


The ATIS evaluation

I am flying between New York and Washington tomorrow, early in the afternoon

Evaluation regulated by Principles of Interpretation (PofI)

Edited by the PofI committee over the 5 years of the project

Regular weekly meetings

About 100 principles

Page 9: Spoken Language Understanding, the Research/Industry Chasm


The ATIS evaluation


I am flying between New York and Washington tomorrow, early in the afternoon

2.2.1 A flight "between X and Y" means a flight "from X to Y".
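Principle 2.2.1 is the kind of rule an ATIS system had to hard-wire. This minimal Python sketch maps both "between X and Y" and "from X to Y" to the same (origin, destination) pair. It assumes a small hypothetical city list and simple regular expressions.

```python
import re

# Hypothetical city list; a real system would take it from the flight database.
CITIES = ["new york", "washington", "boston", "denver"]
# Longest names first so multi-word cities are preferred in the alternation.
CITY = "|".join(re.escape(c) for c in sorted(CITIES, key=len, reverse=True))

BETWEEN = re.compile(rf"\bbetween ({CITY}) and ({CITY})\b")
FROM_TO = re.compile(rf"\bfrom ({CITY}) to ({CITY})\b")

def origin_destination(utterance):
    """Return (origin, destination) or None; "between X and Y" is read as "from X to Y"."""
    text = utterance.lower()
    for pattern in (BETWEEN, FROM_TO):
        match = pattern.search(text)
        if match:
            return match.group(1), match.group(2)
    return None

if __name__ == "__main__":
    print(origin_destination(
        "I am flying between New York and Washington tomorrow, early in the afternoon"))
```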

Page 10: Spoken Language Understanding, the Research/Industry Chasm


The ATIS evaluation


I am flying between New York and Washington tomorrow, early in the afternoon

2.2.8 The location of a departure, stop, or arrival should always be taken to be an airport.
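Principle 2.2.8 means a city name must be resolved to its airports before the database is queried. The sketch below uses the city codes and flights from the table on page 7: NNYC is served by JFK and LGA, and DDEN by DEN. The dictionary and function names are hypothetical.

```python
# City codes and their airports, as they appear in the ATIS table on page 7.
CITY_AIRPORTS = {"NNYC": {"JFK", "LGA"}, "DDEN": {"DEN"}}

# (departure airport, arrival airport, airline, flight number) rows from that table.
FLIGHTS = [
    ("JFK", "DEN", "CO", 156),
    ("LGA", "DEN", "DL", 8901),
    ("LGA", "DEN", "DL", 8903),
    ("JFK", "DEN", "AA", 578),
    ("LGA", "DEN", "UA", 187),
    ("JFK", "DEN", "DL", 987),
]

def airports_for(location):
    """Resolve a location to airport codes: a city code expands to all its airports,
    anything else is assumed to be an airport code already."""
    return CITY_AIRPORTS.get(location, {location})

def flights(origin, destination):
    """Flights whose departure and arrival airports match the resolved locations."""
    dep, arr = airports_for(origin), airports_for(destination)
    return [f for f in FLIGHTS if f[0] in dep and f[1] in arr]

if __name__ == "__main__":
    for flight in flights("JFK", "DDEN"):
        print(flight)
```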

Page 11: Spoken Language Understanding, the Research/Industry Chasm


The ATIS evaluation


I am flying between New York and Washington tomorrow, early in the afternoon

2.2.3.3 "Stopovers" will mean "stops" unless the context clearly indicates that the subject intended "stopover", as in "Can I make a two day stopover on that flight?". In that case the query is answered using the stopover column of the restrictions table.

Page 12: Spoken Language Understanding, the Research/Industry Chasm


The ATIS evaluation


I am flying between New York and Washington tomorrow, early in the afternoon

2.2.6 A "red-eye" flight is one that leaves between 9 P.M. and 3 A.M. and arrives between 5 A.M. and 12 noon.
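Principle 2.2.6 reduces to two time-window tests, one of which wraps past midnight. The sketch assumes times are 24-hour hhmm integers. It also assumes that "between" includes its endpoints, as the endpoint table on page 14 states.

```python
def in_window(t, start, end):
    """Inclusive window [start, end] in 24-hour hhmm; wraps past midnight when start > end."""
    if start <= end:
        return start <= t <= end
    return t >= start or t <= end

def is_red_eye(departure, arrival):
    """Leaves between 9 P.M. and 3 A.M. and arrives between 5 A.M. and 12 noon."""
    return in_window(departure, 2100, 300) and in_window(arrival, 500, 1200)

if __name__ == "__main__":
    print(is_red_eye(2330, 630))  # True: departs 11:30 P.M., arrives 6:30 A.M.
    print(is_red_eye(2000, 630))  # False: departs before 9 P.M.
```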

Page 13: Spoken Language Understanding, the Research/Industry Chasm


The ATIS evaluation


I am flying between New York and Washington tomorrow, early in the afternoon

Period of day   Start  End
morning         0000   1200
afternoon       1200   1800
evening         1800   2200
day             0600   1800
night           1800   0600
early morning   0000   0800
mid-morning     0800   1000
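The period table can be applied as inclusive time windows, with "night" wrapping past midnight. This sketch assumes hhmm integers and includes endpoints, as the endpoint table on page 14 prescribes for periods of the day. As a result, a boundary time such as 1200 falls in both "morning" and "afternoon".

```python
# Periods of the day from the table above, as (start, end) in 24-hour hhmm.
PERIODS = {
    "morning": (0, 1200),
    "afternoon": (1200, 1800),
    "evening": (1800, 2200),
    "day": (600, 1800),
    "night": (1800, 600),
    "early morning": (0, 800),
    "mid-morning": (800, 1000),
}

def in_window(t, start, end):
    """Inclusive window on a 24-hour clock; wraps past midnight when start > end."""
    if start <= end:
        return start <= t <= end
    return t >= start or t <= end

def periods_containing(t):
    """All named periods that contain time t (hhmm), endpoints included."""
    return [name for name, (start, end) in PERIODS.items() if in_window(t, start, end)]

if __name__ == "__main__":
    print(periods_containing(1430))  # ['afternoon', 'day']
```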

Page 14: Spoken Language Understanding, the Research/Industry Chasm


The ATIS evaluation


I am flying between New York and Washington tomorrow, early in the afternoon

INCLUDES TERM ENDPOINTS?
before T1             No
after T1              No
between T1 and T2     Yes
arriving by T1        Yes
departing by T1       Yes
periods of the day    Yes
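The endpoint rules translate directly into comparisons. This minimal sketch assumes times are hhmm integers on a single day, with no wrap past midnight. The caller passes the arrival or departure time as appropriate. The function name `matches` is hypothetical.

```python
def matches(expression, t, t1, t2=None):
    """Apply the endpoint rules to time t (hhmm, 24-hour clock, no midnight wrap).

    For "arriving by" pass the arrival time as t; for "departing by", the departure time.
    """
    if expression == "before":
        return t < t1            # endpoint excluded
    if expression == "after":
        return t > t1            # endpoint excluded
    if expression in ("arriving by", "departing by"):
        return t <= t1           # endpoint included
    if expression == "between":
        if t2 is None:
            raise ValueError("'between' needs two times")
        return t1 <= t <= t2     # both endpoints included
    raise ValueError(f"unknown expression: {expression!r}")

if __name__ == "__main__":
    print(matches("before", 1200, 1200))       # False
    print(matches("arriving by", 1200, 1200))  # True
```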

Page 15: Spoken Language Understanding, the Research/Industry Chasm


S. Oviatt, "Predicting spoken disfluencies during human-computer interaction," Computer Speech and Language, 1995, 9:19-35.

Do we need SLU in commercial applications?

Disfluencies = self-corrections, false starts, repetitions, filled pauses

The disfluency rate more than doubles going from constrained to unconstrained interactions.

The disfluency rate grows linearly with utterance length.

Page 16: Spoken Language Understanding, the Research/Industry Chasm


S.M. Witt, J.D. Williams, "Two Studies of Open vs. Directed Dialog Strategies in Spoken Dialog Systems," Proc. of EUROSPEECH 2003, Geneva, CH, September 2003

Do we need SLU in commercial applications?

Applications with one-time users do poorly with open-prompt systems.

Applications with repeat users do almost as well with open prompts as with directed dialog.

OPEN: What would you like to do?

DIRECTED: ...choose from the following options: web password reset, course enrollment, direct deposit or benefits.

Page 17: Spoken Language Understanding, the Research/Industry Chasm


Do we need SLU in commercial applications?

[Figure: frequency (0 to 60) vs. number of words per utterance (1, 2, 3, 4, 5, 6, >6) for three data sets: Human to Human – AMEX (SRI) (2082); DARPA Communicator Data, Dec 2000 (11168); SpeechWorks deployed applications, Directed dialog + NL (136447). Average number of words per sentence: 2.0, 2.9, 6.0.]

Page 18: Spoken Language Understanding, the Research/Industry Chasm


Do we need SLU in commercial applications?

Yes, but it depends on the application

Page 19: Spoken Language Understanding, the Research/Industry Chasm


Do we need SLU in commercial applications?

[Figure: applications placed along two axes. Sentence structure (natural-language-ness): Simple Phrases to Natural Language. Dialog structure (initiative): System Initiative to Mixed Initiative. Applications shown: PIZZA ORDERING, FLIGHT STATUS, STOCK TRADING, BANKING, HELP DESK, PROBLEM SOLVING, ROUTING.]

Page 20: Spoken Language Understanding, the Research/Industry Chasm


An architectural chasm?

Page 21: Spoken Language Understanding, the Research/Industry Chasm


Research Conversational Architecture

[Figure: SPEECH RECOGNIZER (with Language Model), NATURAL LANGUAGE UNDERSTANDING (with Semantic Model), DIALOG MANAGER]

Page 22: Spoken Language Understanding, the Research/Industry Chasm


Commercial Conversational Architecture

[Figure: VoiceXML Browser and Application Server, with Grammar 1 to Grammar 5; GRAMMAR 2, GRAMMAR 3 and the ASR result pass between them.]
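In the commercial architecture, each dialog step activates its own small grammar rather than one broad language model. The Python sketch below shows that control flow. The steps, phrases and values are invented for illustration. A real system would reprompt on a no-match rather than silently consume the next turn.

```python
# Hypothetical dialog steps: each has its own grammar (phrase -> semantic value),
# standing in for the per-step grammars in the figure.
DIALOG = [
    ("Say the departure city.", {"boston": "BOS", "new york": "JFK"}),
    ("Say the arrival city.", {"boston": "BOS", "new york": "JFK"}),
    ("Morning or afternoon?", {"morning": "AM", "afternoon": "PM"}),
]

def run_dialog(user_turns):
    """Walk the steps in order, matching each turn only against the active step's grammar.

    Returns the list of semantic values, or None if the turns run out first.
    """
    results = []
    turns = iter(user_turns)
    for _prompt, grammar in DIALOG:
        for text in turns:
            value = grammar.get(text.lower().strip())
            if value is not None:
                results.append(value)
                break
            # No match: this sketch just moves on to the next user turn.
        else:
            return None  # ran out of user turns before this step matched
    return results

if __name__ == "__main__":
    print(run_dialog(["Boston", "tomorrow", "New York", "afternoon"]))
```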

Page 23: Spoken Language Understanding, the Research/Industry Chasm


Current industrial SLU

$ROOT = $ITINERARY;
$ITINERARY = $FROM $TO;
$FROM = from $AIRPORT;
$TO = to $AIRPORT;
$NY = (new york) | (J F K) | kennedy;
$BOS = boston | logan;
$AIRPORT = ($NY | $BOS) [airport];

Semantic tags:

$ROOT:
    direction = "";
    if (origin == "JFK" && destination == "BOS") {
        direction = "north";
    } else if (origin == "BOS" && destination == "JFK") {
        direction = "south";
    }
$FROM (in $ITINERARY): origin = airport;
$TO (in $ITINERARY): destination = airport;
$NY: airport = "JFK";
$BOS: airport = "BOS";

Example: "From Boston to New York"

Boston -> $BOS (airport = "BOS") -> $AIRPORT -> $FROM
New York -> $NY (airport = "JFK") -> $AIRPORT -> $TO
$FROM $TO -> $ITINERARY (origin = "BOS", destination = "JFK") -> $ROOT (direction = "south")
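To make the semantic tags concrete, here is a hand-written Python interpreter for this toy grammar. It is not an SRGS engine. It compiles the grammar into one regular expression and then performs the same assignments as the tags above. The function and variable names are illustrative.

```python
import re

# $NY and $BOS: each phrase maps to the value its tag assigns to `airport`.
AIRPORT_PHRASES = {
    "new york": "JFK", "j f k": "JFK", "kennedy": "JFK",
    "boston": "BOS", "logan": "BOS",
}
# $AIRPORT = ($NY | $BOS) [airport]
AIRPORT = "(" + "|".join(re.escape(p) for p in AIRPORT_PHRASES) + r")(?: airport)?"
# $ROOT = $ITINERARY; $ITINERARY = $FROM $TO; $FROM = from $AIRPORT; $TO = to $AIRPORT
ITINERARY = re.compile(rf"from {AIRPORT} to {AIRPORT}")

def interpret(utterance):
    """Return the semantic result for an in-grammar utterance, or None if it does not parse."""
    match = ITINERARY.fullmatch(utterance.lower().strip())
    if match is None:
        return None
    origin = AIRPORT_PHRASES[match.group(1)]       # $FROM: origin = airport
    destination = AIRPORT_PHRASES[match.group(2)]  # $TO: destination = airport
    direction = ""                                 # $ROOT tag
    if origin == "JFK" and destination == "BOS":
        direction = "north"
    elif origin == "BOS" and destination == "JFK":
        direction = "south"
    return {"origin": origin, "destination": destination, "direction": direction}

if __name__ == "__main__":
    print(interpret("From Boston to New York"))
```

Word order matters. "to Boston from New York" is out of grammar and returns None, which is exactly the brittleness hand-crafted grammars trade for predictability.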

Page 24: Spoken Language Understanding, the Research/Industry Chasm


SRGS standard for grammars

<?xml version="1.0" encoding="ISO-8859-1"?>
<grammar version="1.0" xml:lang="en-US" root="ROOT"
         xmlns="http://www.w3.org/2001/06/grammar" tag-format="semantics/1.0">
  <rule id="ROOT" scope="public">
    <ruleref uri="#ITINERARY"/>
    <tag><![CDATA[
      out.direction = "";
      if (rules.ITINERARY.origin == "JFK" && rules.ITINERARY.destination == "BOS") {
        out.direction = "north";
      } else if (rules.ITINERARY.origin == "BOS" && rules.ITINERARY.destination == "JFK") {
        out.direction = "south";
      }
    ]]></tag>
  </rule>
  <rule id="ITINERARY" scope="public">
    <ruleref uri="#FROM"/> <tag>out.origin = rules.FROM.airport;</tag>
    <ruleref uri="#TO"/> <tag>out.destination = rules.TO.airport;</tag>
  </rule>
  <rule id="FROM">
    from <ruleref uri="#AIRPORT"/> <tag>out.airport = rules.AIRPORT.airport;</tag>
  </rule>
  <rule id="NY">
    <one-of>
      <item>new york</item>
      <item>JFK</item>
      <item>kennedy</item>
    </one-of>
    <tag>out.airport = "JFK";</tag>
  </rule>
  <!-- TO, AIRPORT and BOS rules omitted -->
</grammar>

Page 25: Spoken Language Understanding, the Research/Industry Chasm


Difficult problems for commercial systems

No data for training in the design/development phase

– System development with no data

– Tools for fast grammar handcrafting

– Tools for content word normalization/speech-ification

Oodles of data after deployment

– Tools for automatic or semi-automatic adaptation/learning

Page 26: Spoken Language Understanding, the Research/Industry Chasm


The problem of content words

Sentence structure variations:

I need to go to Phoenix from New York leaving on February 4th

I need to go from New York to Phoenix on February 4th

On February 4th leaving from New York and going to Phoenix

Content word variations: Newark, Boston, Denver, Dallas, Baltimore, San Francisco, Los Angeles, Philadelphia
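A small enumeration shows why content words dominate. Three sentence structures combined with ten cities and one date already give hundreds of distinct sentences. A grammar covers all of them with a few rules plus a city list. This Python sketch uses the sentences and city names from the slide. The slot names are illustrative.

```python
from itertools import permutations

# Sentence structures from the slide, with the content words turned into slots.
TEMPLATES = [
    "I need to go to {dest} from {orig} leaving on {date}",
    "I need to go from {orig} to {dest} on {date}",
    "On {date} leaving from {orig} and going to {dest}",
]
# Content words from the slide: New York and Phoenix plus the listed alternatives.
CITIES = ["New York", "Phoenix", "Newark", "Boston", "Denver", "Dallas",
          "Baltimore", "San Francisco", "Los Angeles", "Philadelphia"]

def expand(templates, cities, dates):
    """Yield every sentence from each template, ordered city pair and date."""
    for template in templates:
        for orig, dest in permutations(cities, 2):
            for date in dates:
                yield template.format(orig=orig, dest=dest, date=date)

if __name__ == "__main__":
    sentences = list(expand(TEMPLATES, CITIES, ["February 4th"]))
    print(len(sentences))  # 3 structures x 90 ordered city pairs = 270
    print(sentences[0])
```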

Page 27: Spoken Language Understanding, the Research/Industry Chasm


The problem of content words

Large lists of content words need to have priors

– How can priors be estimated with no data (or even when data is available)?

– e.g. airport names, flight numbers, street names

Large lists of content words often come from proprietary databases

– Spelling to phonemes

– Acronym expansion

– Word normalization

– Synonym/paraphrase generation

Example: 14" display w/ anti-glr scrn

– A fourteen inches display with anti-glare screen

– A display of fourteen inches size with an anti-reflection screen
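Here is a toy version of the normalization step for the catalog example. It is a sketch under stated assumptions:

- the abbreviation table is hypothetical and hand-written for this one entry;
- number expansion only covers 0 to 99.

It produces one possible speakable form but does not attempt paraphrase generation.

```python
import re

# Hypothetical abbreviation table for this one catalog entry.
ABBREVIATIONS = {"w/": "with", "glr": "glare", "scrn": "screen"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

def number_to_words(n):
    """Spell out 0-99; anything else is out of scope for this sketch."""
    if not 0 <= n < 100:
        raise ValueError(f"number_to_words only handles 0-99, got {n}")
    if n < 20:
        return ONES[n]
    tens, rest = divmod(n, 10)
    return TENS[tens] if rest == 0 else f"{TENS[tens]}-{ONES[rest]}"

def normalize(text):
    """Expand inch marks, numbers and known abbreviations into speakable words."""
    # 14" -> fourteen inch
    text = re.sub(r'(\d+)"', lambda m: number_to_words(int(m.group(1))) + " inch", text)
    words = []
    for token in text.split():
        # Expand each hyphen-separated part, so "anti-glr" becomes "anti-glare".
        parts = [ABBREVIATIONS.get(part, part) for part in token.split("-")]
        words.append("-".join(parts))
    return " ".join(words)

if __name__ == "__main__":
    print(normalize('14" display w/ anti-glr scrn'))
```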

Page 28: Spoken Language Understanding, the Research/Industry Chasm


Exploiting real data

[Figure: a semantic model linking Flight Number (e.g. 679) to Area Code (Area Code 1, 2, 3, …), Time (7 AM, 8 AM, 9 AM, 10 AM) and Day (Mon, Tue, Wed, Thu) through P(Flight Number | Area Code), P(Flight Number | Time) and P(Flight Number | Day). Chart: Flight Number Identification (75% to 90%) on the Dev Set and the Eval Set for the N-Best Result, the Pure BN Result and the Global Result.]

Wai, C., Pieraccini, R., Meng, H., “A Dynamic Semantic Model for Rescoring Recognition Hypothesis,” Proceedings of the IEEE Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2001

TRAINING: 2.8M utterances; TEST: 1485 utterances
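The idea of using context to re-rank recognition hypotheses can be sketched with a simple log-linear combination. This is a deliberately simplified stand-in, not the dynamic semantic model of Wai et al. It adds the log of each P(flight | context variable) to the ASR log score. All probabilities and the area code value are hypothetical.

```python
import math

def rescore(nbest, context, conditionals, floor=1e-6):
    """Re-rank N-best flight-number hypotheses.

    nbest: list of (flight_number, asr_log_prob) pairs.
    context: observed variables, e.g. {"day": "Mon"}.
    conditionals: conditionals[var][value][flight] = P(flight | var = value).
    Missing or tiny probabilities are clamped to `floor` so log() never sees zero.
    """
    def score(hypothesis):
        flight, asr_log_prob = hypothesis
        total = asr_log_prob
        for var, value in context.items():
            p = conditionals.get(var, {}).get(value, {}).get(flight, 0.0)
            total += math.log(max(p, floor))
        return total
    return sorted(nbest, key=score, reverse=True)

# Hypothetical numbers: the recognizer slightly prefers "697",
# but the caller's context makes "679" much more likely.
NBEST = [("697", -1.0), ("679", -1.2)]
CONDITIONALS = {
    "area_code": {"212": {"679": 0.30, "697": 0.01}},
    "day": {"Mon": {"679": 0.20, "697": 0.05}},
}

if __name__ == "__main__":
    print(rescore(NBEST, {}, CONDITIONALS)[0][0])                                   # 697
    print(rescore(NBEST, {"area_code": "212", "day": "Mon"}, CONDITIONALS)[0][0])  # 679
```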

Page 29: Spoken Language Understanding, the Research/Industry Chasm


Conclusions

There is very little research in SLU today

– Lack of data, funding, motivation

SLU is difficult and difficult to evaluate

– Semantic vs. task completion

Certain speech-based applications do not need SLU, others do.

– Be aware of competing technologies, even if they are not so advanced

There are difficult problems in commercial SLU that are not addressed by the research community.

– Realignment of academic and industrial research

Page 30: Spoken Language Understanding, the Research/Industry Chasm


Advertising Campaign for SLU on Google