
Spoken Conversational Agents - Marco Ronchetti

Feb 14, 2022

Page 1: Spoken Conversational Agents - Marco Ronchetti

Spoken Conversational Agents

Prof. Giuseppe Riccardi
Department of Information Engineering and Computer Science
University of Trento


Page 3: Spoken Conversational Agents - Marco Ronchetti

Spring 2012, LPSMT, G. Riccardi

Outline
• Talking to Computers
• Research and Technological Challenges
• Spoken Language Understanding
• Speech Technology on Smartphones
• Speech Demo App

Page 4: Spoken Conversational Agents - Marco Ronchetti


Talking to Machines

Page 5: Spoken Conversational Agents - Marco Ronchetti

The AI dream
• Design computers that are able to
    • Parse human language (speech, text…)
    • Understand users’ intentions
    • Execute simple/complex tasks
    • Interact autonomously or cooperatively
    • Relate to humans socially, emotionally
    • Operate in a virtual and physical space

Page 6: Spoken Conversational Agents - Marco Ronchetti


Talking to Computers

1968

Page 7: Spoken Conversational Agents - Marco Ronchetti

Vision (“2001: A Space Odyssey”, 1968, by Stanley Kubrick)

HAL 9000: “Heuristically programmed ALgorithmic computer”

Page 8: Spoken Conversational Agents - Marco Ronchetti


Present

2006

Page 9: Spoken Conversational Agents - Marco Ronchetti

Interactive Systems

Machine: How may I help you?

Customer: Uh hi, I need a flight tomorrow from Boston...

Page 10: Spoken Conversational Agents - Marco Ronchetti

“892424” Application (2006)

Pronto PagineGialle 892424

Page 11: Spoken Conversational Agents - Marco Ronchetti

Travel Directions

Departure

Arrival

Page 12: Spoken Conversational Agents - Marco Ronchetti

Retrieving directions from the Web

http://maps.google.it/maps?f=d&hl=it&saddr=".$partenza."&daddr=".$arrivo."&sll=41.895888,12.489052&sspn=14.60869,29.882813&layer=&ie=UTF8

($partenza and $arrivo are the departure and arrival addresses.)

onclick=\"this.blur()\"\u003e27\u003c/a\u003e.\u003c/td\u003e\u003ctd_class=\"dirsegtext\"id=\"dirsegtext_0_223\"\u003eSvolta_a\u003cb\u003esinistra\u003c/b\u003ea\u003cb\u003ePiazza_Raffaello_Sanzio\u003c/b\u003e\u003c/td\u003e\u003ctdclass=\"sdist\"\u003e\u003cdivid=\"sxdist\"class=\"nw\"\u003e54\u0026#160;m\u003c/div\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctrclass=\"dirsegment\"id=\"panel_0_225\"polypoint=\"225\"\u003e\u003ctdclass=\"iconpw\"\u003e\u003cimgsrc=\"/
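The two steps on this slide (build the query URL, then pull the direction text out of the escaped response) can be sketched in Python. This is a minimal sketch, not the original application code: the function names are hypothetical, only the parameters visible in the slide's URL are used, the endpoint is unlikely to still answer in this form, and treating underscores as stand-ins for spaces is an assumption based on the fragment above.

```python
import html
import re
from urllib.parse import urlencode

def directions_url(departure, arrival):
    """Build the directions URL from the slide (hypothetical helper)."""
    params = {"f": "d", "hl": "it", "saddr": departure, "daddr": arrival, "ie": "UTF8"}
    return "http://maps.google.it/maps?" + urlencode(params)

def fragment_to_text(raw):
    """Turn an escaped response fragment into plain direction text."""
    # Decode \u003c-style escapes into the characters they stand for.
    decoded = re.sub(r"\\u([0-9a-fA-F]{4})", lambda m: chr(int(m.group(1), 16)), raw)
    decoded = decoded.replace('\\"', '"')
    # Drop the markup, resolve entities such as &#160;, and normalise spacing.
    no_tags = re.sub(r"<[^>]*>", " ", decoded)
    # Assumption: underscores in the captured fragment replace spaces.
    return " ".join(html.unescape(no_tags).replace("_", " ").split())

fragment = r'\u003ctd_class=\"dirsegtext\"\u003eSvolta_a\u003cb\u003esinistra\u003c/b\u003ea\u003cb\u003ePiazza_Raffaello_Sanzio\u003c/b\u003e\u003c/td\u003e'
print(directions_url("Trento", "Piazza Raffaello Sanzio, Roma"))
print(fragment_to_text(fragment))  # Italian for "Turn left at Piazza Raffaello Sanzio"
```

Scraping a rendered page like this is brittle: any change in the markup breaks the extraction, which is why the sketch keeps the parsing in one small function.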


Page 13: Spoken Conversational Agents - Marco Ronchetti

Human-Machine Spoken Dialog

User voice request
→ ASR (Automatic Speech Recognition) → Words
→ SLU (Spoken Language Understanding) → Concepts
→ DM (Dialogue Management) → Action
→ LG (Language Generation) → Words
→ TTS (Text-to-Speech Synthesis)
→ Voice reply to User
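The component chain can be read as a composition of five functions. The sketch below is only an illustration of that data flow: every `toy_*` stage is a hypothetical stand-in (the "audio" is simply UTF-8 text), not a real recognizer, understanding module, or synthesizer.

```python
def run_turn(audio, asr, slu, dm, lg, tts):
    """Run one dialogue turn through the five stages of the diagram."""
    words = asr(audio)        # ASR: speech  -> words
    concepts = slu(words)     # SLU: words   -> concepts
    action = dm(concepts)     # DM:  concepts -> action
    reply = lg(action)        # LG:  action  -> words
    return tts(reply)         # TTS: words   -> speech

# Hypothetical toy stages, for illustration only.
def toy_asr(audio):
    return audio.decode("utf-8")          # pretend the audio is already text

def toy_slu(words):
    return {"destination": "Paris"} if "paris" in words.lower() else {}

def toy_dm(concepts):
    if "destination" in concepts:
        return ("confirm", concepts["destination"])
    return ("ask", "destination")

def toy_lg(action):
    kind, value = action
    if kind == "confirm":
        return f"Flying to {value}, correct?"
    return f"What is your {value}?"

def toy_tts(text):
    return text.encode("utf-8")           # pretend bytes are synthesized speech

stages = (toy_asr, toy_slu, toy_dm, toy_lg, toy_tts)
print(run_turn(b"I need a flight to Paris", *stages))
```

Keeping each stage behind a plain function boundary means any one of them can be swapped for a real component without touching the others.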

Page 14: Spoken Conversational Agents - Marco Ronchetti

Automatic Speech Recognition (ASR)
• An ASR system converts the speech signal into words
• Rich transcript (words, speaker, language, dialect, etc.)
• The recognized words can be
    • the final output, or
    • the input to natural language processing

Automatic Speech Recognition → “yes I would like to make a reservation...”

Page 15: Spoken Conversational Agents - Marco Ronchetti

ASR Challenges
• Transducers
    • Telephone handset
    • Close-talking mic
    • Open/desktop mic
• Channel
    • Landline
    • Wireless
    • VoIP
• Speaker
    • Dialect
    • Accent
    • Children
• Noise (SNR)
    • Office room
    • Airplane cockpit
    • Cocktail party problem

Automatic Speech Recognition → “yes I would like to make a reservation...”

Page 16: Spoken Conversational Agents - Marco Ronchetti

ASR - Overview

Feature Extraction → A → Pattern Matching → Ŵ

Given the acoustic observation sequence A = a_1, a_2, …, a_m, what is the most likely “word” sequence W = w_1, w_2, …, w_n?

Example: L EH T AH S P R EY → “Let us pray” or “Lettuce spray”
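The question on this slide is usually written as a maximum a posteriori decision. The decomposition below is the standard textbook formulation rather than something stated on the slide; it uses the slide's symbols A, W and Ŵ, and the split into an acoustic model P(A | W) and a language model P(W) is the assumption being made.

$$
\hat{W} = \arg\max_{W} P(W \mid A)
        = \arg\max_{W} \frac{P(A \mid W)\, P(W)}{P(A)}
        = \arg\max_{W} P(A \mid W)\, P(W)
$$

The denominator P(A) can be dropped because it does not depend on W, so it does not change which word sequence wins.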

Page 17: Spoken Conversational Agents - Marco Ronchetti

Sound Units (American English)

Phonemes
• Vowels
    • Front (EVE)
    • Mid (UP)
    • Back (BOOT)
• Diphthongs
• Semivowels
    • Liquids
    • Glides
• Consonants
    • Nasals
    • Stops
    • Fricatives
    • Whisper
    • Affricates

Page 18: Spoken Conversational Agents - Marco Ronchetti

Dictionary: From Phonemes to Words

Let = L EH T
Us = AH S
Pray = P R EY
Lettuce = L EH T AH S
Spray = S P R EY

• Not a unique mapping!
• Prosodic information (stress markers)
    • Us = AH1 S vs Us = AH0 S
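The non-unique mapping can be made concrete by enumerating every way a phoneme string splits into dictionary words. This is a small sketch using only the five entries on this slide; the function name is hypothetical. Note that a strict lookup needs two S phones to produce "lettuce spray", so the earlier "Let us pray / Lettuce spray" example presumably relies on the two adjacent S sounds merging in fluent speech.

```python
LEXICON = {
    "let": ("L", "EH", "T"),
    "us": ("AH", "S"),
    "pray": ("P", "R", "EY"),
    "lettuce": ("L", "EH", "T", "AH", "S"),
    "spray": ("S", "P", "R", "EY"),
}

def segmentations(phones, lexicon=LEXICON):
    """Return every word sequence whose pronunciations concatenate to phones."""
    phones = tuple(phones)
    if not phones:
        return [[]]                       # one way to segment nothing
    results = []
    for word, pron in lexicon.items():
        if phones[:len(pron)] == pron:    # word matches the front of the string
            for rest in segmentations(phones[len(pron):], lexicon):
                results.append([word] + rest)
    return results

print(segmentations("L EH T AH S P R EY".split()))
# [['let', 'us', 'pray'], ['lettuce', 'pray']]
print(segmentations("L EH T AH S S P R EY".split()))
# [['let', 'us', 'spray'], ['lettuce', 'spray']]
```

Even with five words the phoneme string has more than one reading, which is why the recognizer needs a language model to rank the alternatives.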

Page 19: Spoken Conversational Agents - Marco Ronchetti

A Statistical Approach to ASR
• (Almost) no linguistic knowledge is required
• Variable “recognition units” are supported
    • Phoneme, syllable, word, phrase
    • Non-linguistically motivated
• Automatic training of statistical models

Page 20: Spoken Conversational Agents - Marco Ronchetti

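Language Modeling
• Probability of word sequences
• W = “I wanna fly to Boston”

$$
\begin{aligned}
P(W) &= P(\text{I}) \times P(\text{wanna} \mid \text{I}) \times \dots \times P(\text{Boston} \mid \text{I}, \text{wanna}, \text{fly}, \text{to}) \\
     &= P(\text{I}) \times P(\text{wanna} \mid \text{I}) \times \dots \times P(\text{Boston} \mid \text{to}) \quad \text{(bigram assumption)}
\end{aligned}
$$

• Shannon’s game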
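A bigram model of this kind can be estimated by counting. The sketch below is a minimal maximum-likelihood version under stated assumptions: the three training sentences are invented for illustration, there is no smoothing (unseen bigrams get probability zero), and a start symbol `<s>` replaces the slide's unconditional P(I) for the first word.

```python
from collections import Counter

def train_bigram(sentences):
    """Count histories and bigrams over whitespace-tokenized sentences."""
    histories, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        histories.update(tokens[:-1])              # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return histories, bigrams

def sentence_prob(sentence, histories, bigrams):
    """P(W) as a product of maximum-likelihood bigram probabilities."""
    tokens = ["<s>"] + sentence.split()
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        if histories[prev] == 0:
            return 0.0                             # history never seen in training
        prob *= bigrams[(prev, word)] / histories[prev]
    return prob

# Invented toy corpus, for illustration only.
corpus = [
    "I wanna fly to Boston",
    "I wanna fly to Paris",
    "I need a flight to Boston",
]
histories, bigrams = train_bigram(corpus)
print(sentence_prob("I wanna fly to Boston", histories, bigrams))  # 1 * 2/3 * 1 * 1 * 2/3
```

The zero assigned to any unseen word pair is the main practical weakness of this estimate, which is why real systems smooth the counts.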

Page 21: Spoken Conversational Agents - Marco Ronchetti

Syntax
• Relation amongst words
    • Not random! (frequency-rank plot → Zipf’s law)
• Word classes
    • Verb, Noun, Determiner
• Words grouped into constituents
    • “Al mattino faccio una passeggiata” vs “Faccio una passeggiata al mattino” vs “Faccio al mattino una passeggiata” (Italian: “In the morning I take a walk”, with the constituents in three different orders)
• Constituents
    • Verb Phrase (VP), Noun Phrase (NP), Prepositional Phrase (PP)
    • PP: “Al mattino”. VP: “faccio”. NP: “una passeggiata”.

Page 22: Spoken Conversational Agents - Marco Ronchetti


Languages: how many?

Page 23: Spoken Conversational Agents - Marco Ronchetti


Languages-Speakers Statistics

From www.ethnologue.com (5/2012)

Page 24: Spoken Conversational Agents - Marco Ronchetti

Search Problem
• Search space of ASR
    • Vocabulary = 10^4
    • Sentence length = 25 (Wall Street Journal)
    • Number of candidate word strings: 10^100 !
• Weighted strings
    • Acoustic and language models
• Dynamic programming
• Beam search
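The size of the search space and the idea of pruning it can both be shown in a few lines. This is a toy sketch, not a recognizer: the candidate words, acoustic probabilities and bigram probabilities are all invented, and `beam_search` is a hypothetical helper that keeps only the `beam_width` best partial strings at each position.

```python
import math

# Size of the unpruned search space quoted on the slide.
vocabulary_size = 10 ** 4
sentence_length = 25
search_space = vocabulary_size ** sentence_length   # equals 10 ** 100

def beam_search(candidates, lm_logprob, beam_width=2):
    """candidates: one dict per position mapping word -> acoustic log-probability."""
    beam = [(0.0, [])]                               # (total log score, words so far)
    for position in candidates:
        expanded = []
        for score, words in beam:
            prev = words[-1] if words else "<s>"
            for word, acoustic_logprob in position.items():
                total = score + acoustic_logprob + lm_logprob(prev, word)
                expanded.append((total, words + [word]))
        expanded.sort(key=lambda item: item[0], reverse=True)
        beam = expanded[:beam_width]                 # prune everything else
    return beam[0]

# Invented toy scores: the acoustics slightly prefer "bass", the language model "best".
BIGRAMS = {("the", "best"): 0.3, ("the", "bass"): 0.05}

def toy_lm(prev, word):
    return math.log(BIGRAMS.get((prev, word), 0.1))

CANDIDATES = [
    {"find": math.log(0.9), "fine": math.log(0.1)},
    {"the": math.log(1.0)},
    {"best": math.log(0.4), "bass": math.log(0.6)},
    {"flight": math.log(1.0)},
]

score, words = beam_search(CANDIDATES, toy_lm)
print(words)  # ['find', 'the', 'best', 'flight']
```

Pruning trades exactness for speed: a string dropped from the beam early can never be recovered, so the beam width has to be tuned against the error rate.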

Page 25: Spoken Conversational Agents - Marco Ronchetti


ASR Performance (Word Error Rate)
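Word error rate is conventionally the word-level edit distance (substitutions, deletions, insertions) between the reference and the recognizer output, divided by the number of reference words. The sketch below is one straightforward way to compute it; the function name is hypothetical, and the example pair is the flight query and its misrecognition used on the Machine Understanding slide of this deck.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i                                # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j                                # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                   # deletion
                dist[i][j - 1] + 1,                   # insertion
                dist[i - 1][j - 1] + substitution,    # match or substitution
            )
    return dist[-1][-1] / len(ref)

reference = "Find the best flight from New York to Paris tomorrow business class"
hypothesis = "Find the bass flight from Newark to Paris tomorrow business class"
print(word_error_rate(reference, hypothesis))  # 3 errors over 12 words = 0.25
```

Because insertions count as errors, the rate can exceed 1.0 when the hypothesis is much longer than the reference.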

Page 26: Spoken Conversational Agents - Marco Ronchetti

Understanding Natural Language

Natural Language Query to DB

“Find the best flight from New York to Paris tomorrow business class”

Flight Database

?

Page 27: Spoken Conversational Agents - Marco Ronchetti

Natural Language Query to DB

“Find the best flight from New York to Paris tomorrow business class”

Flight Database

(JFK, CDG, Z, 1300, 0700, V, ..) ?

Page 28: Spoken Conversational Agents - Marco Ronchetti

Natural Language Query to DB

“Find the best flight from New York to Paris tomorrow business class”

Interactive Machine

Flight Database

Page 29: Spoken Conversational Agents - Marco Ronchetti

Natural Language Query to DB

Find the best flight from New York to Paris tomorrow business class

Page 30: Spoken Conversational Agents - Marco Ronchetti

Natural Language Query to DB

Find the best flight from New York to Paris tomorrow business class

Object

Page 31: Spoken Conversational Agents - Marco Ronchetti

Natural Language Query to DB

Find the best flight from New York to Paris tomorrow business class

TASK: Transactional

Object

Page 32: Spoken Conversational Agents - Marco Ronchetti

Natural Language Query to DB

Find the best flight from New York to Paris tomorrow business class

TASK: Transactional

USER CONSTRAINTS

Object
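Once the query is annotated with a task, an object and user constraints, it can be held in a small frame and matched against the flight database. The sketch below is purely illustrative: the table rows, field names and prices are invented, "New York" is assumed to resolve to JFK and "Paris" to CDG as in the tuple shown earlier, "tomorrow" is left out, and "best" is interpreted as "cheapest", which is only one possible reading.

```python
# Invented toy table, for illustration only.
FLIGHTS = [
    {"origin": "JFK", "destination": "CDG", "cabin": "business", "price": 2400},
    {"origin": "JFK", "destination": "CDG", "cabin": "business", "price": 1900},
    {"origin": "JFK", "destination": "CDG", "cabin": "economy", "price": 600},
    {"origin": "EWR", "destination": "CDG", "cabin": "business", "price": 1700},
]

# Hypothetical frame for the example query.
FRAME = {
    "task": "find",
    "object": "flight",
    "constraints": {"origin": "JFK", "destination": "CDG", "cabin": "business"},
}

def run_query(frame, table, rank_key="price"):
    """Keep rows that satisfy every constraint, best (lowest rank_key) first."""
    constraints = frame["constraints"]
    matches = [
        row for row in table
        if all(row.get(field) == value for field, value in constraints.items())
    ]
    return sorted(matches, key=lambda row: row[rank_key])

for flight in run_query(FRAME, FLIGHTS):
    print(flight)
```

Separating the frame from the lookup keeps the understanding step independent of how the database is actually stored.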

Page 33: Spoken Conversational Agents - Marco Ronchetti

Machine Understanding
• User: “Find the best flight from New York to Paris tomorrow business class”
• Speech Recognition: “Find the bass flight from Newark to Paris tomorrow business class”
• Speech Understanding:
    • @action=Request-Reservation (0.9)
    • @origin=Newark (0.5)
    • @time-departure=Tuesday (0.7)
    • @destination=Paris (0.8)

Page 34: Spoken Conversational Agents - Marco Ronchetti

Handling Uncertainty
• How likely is it that this was a “Request about a Reservation”, given the speech utterance U?
    • @action=Request-Reservation (0.9)
    • P(@action=Request-Reservation | U) = 0.9
• How likely is it that the “Departure city is Newark”?
    • @origin=Newark (0.5)
• And so on:
    • @time-departure=Tuesday (0.7)
    • @destination=Paris (0.8)
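One common way to act on such confidences is to accept high-scoring concepts, ask the user to confirm middling ones, and discard the rest. The sketch below applies that idea to the four hypotheses on this slide; the two thresholds (0.8 and 0.4) and the function name are assumptions chosen for illustration, not values from the course.

```python
# Concept hypotheses and confidences taken from the slide.
HYPOTHESES = {
    "@action": ("Request-Reservation", 0.9),
    "@origin": ("Newark", 0.5),
    "@time-departure": ("Tuesday", 0.7),
    "@destination": ("Paris", 0.8),
}

def decide(hypotheses, accept=0.8, reject=0.4):
    """Map each concept to accept / confirm / reject (thresholds are assumptions)."""
    decisions = {}
    for concept, (value, confidence) in hypotheses.items():
        if confidence >= accept:
            decisions[concept] = ("accept", value)
        elif confidence >= reject:
            decisions[concept] = ("confirm", value)   # ask the user before using it
        else:
            decisions[concept] = ("reject", value)
    return decisions

for concept, (decision, value) in decide(HYPOTHESES).items():
    print(concept, value, decision)
```

With these thresholds the system would ask about the origin, which is exactly the concept the recognizer got wrong (Newark instead of New York).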

Page 35: Spoken Conversational Agents - Marco Ronchetti


Present

2011

Page 36: Spoken Conversational Agents - Marco Ronchetti

Machine Handling Uncertainty IBM Watson in Jeopardy! Game

D. Ferrucci et al. “Building Watson: An Overview of the DeepQA Project”, AI Magazine, v. 31, n. 3, 2010

Page 37: Spoken Conversational Agents - Marco Ronchetti

Conversational Agents

Dinarelli M., Stepanov E., Varges S. and Riccardi G., “The LUNA Spoken Dialogue System: Beyond Utterance Classification”, ICASSP, 2010.

Page 38: Spoken Conversational Agents - Marco Ronchetti

Voice Applications: Smartphones/Tablets
• Natural interface replacing typing
    • For task execution (“finding the telephone numbers of restaurants nearby”)
• Voice Search
    • Spoken query translation into text for search
    • Intent recognition triggered by target words/phrases (“weather”, “restaurants”)
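Trigger-word intent recognition can be as simple as a lookup over the recognized text. The sketch below is a deliberately naive illustration: the trigger table, the intent names and the `web_search` fallback are all hypothetical, and a real system would need to handle recognition errors and multi-word phrases.

```python
# Hypothetical trigger table: intent name -> words that activate it.
TRIGGERS = {
    "weather": ("weather", "forecast"),
    "restaurants": ("restaurant", "restaurants"),
}

def recognize_intent(query_text):
    """Return the first intent whose trigger word occurs in the query."""
    tokens = [token.strip(".,!?") for token in query_text.lower().split()]
    for intent, trigger_words in TRIGGERS.items():
        if any(token in trigger_words for token in tokens):
            return intent
    return "web_search"   # no trigger found: fall back to plain voice search

print(recognize_intent("What's the weather in Trento tomorrow?"))
print(recognize_intent("find restaurants nearby"))
print(recognize_intent("Spoken Conversational Agents"))
```

The fallback matters: when no target word is spotted, the transcribed query is still useful as ordinary search text.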

Page 39: Spoken Conversational Agents - Marco Ronchetti

Conversational Agents
• Multimodal agent
    • Speech, touch, sensory information...
    • Requires interaction to resolve/negotiate/commit to the task requirements
    • Task resolution/execution (“send email to my brother”, “make a reservation at the best Indian restaurant nearby”)

Page 40: Spoken Conversational Agents - Marco Ronchetti

Example 1 (Voice Search) Android

Page 41: Spoken Conversational Agents - Marco Ronchetti

Example 2 (Agent-like app) Android

Page 42: Spoken Conversational Agents - Marco Ronchetti

Speech Technology on Android
android.speech, android.speech.tts
• Automatic Speech Recognition
    • API Level 3 (Cupcake)
        • N-best, two types of LM, error tags
    • API Level 8 (Froyo)
        • Language support, partial results, speech timeouts
    • API Level 14 (Ice Cream Sandwich)
        • Confidence scores

Page 43: Spoken Conversational Agents - Marco Ronchetti

Speech Demo App
• Available on the course website
• Starter app
    • Experiment with different ASR parameters and ASR output
• Use it as a base for developing your class/master project

Page 44: Spoken Conversational Agents - Marco Ronchetti

References (Research)
1. Tutorial, “Talking to Computers: From Speech Sounds to Human Computer Interaction”, sisl.disi.unitn.it/~riccardi/
2. G. Tur and R. De Mori (eds.), Spoken Language Understanding: Systems for Extracting Semantic Information from Speech, Wiley, 2011
3. D. Jurafsky and J. Martin, Speech and Language Processing,