Page 1
SPOKEN LANGUAGE SYSTEMS
Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology
SPEECH GROUP
Machine Intelligence Laboratory, Information Engineering Division
Cambridge University
SCILL: Spoken Conversational Interaction for Language Learning
Stephanie Seneff ([email protected]), Jim Glass ([email protected])
Spoken Language Systems Group, MIT Computer Science and Artificial Intelligence Lab
Steve Young ([email protected])
Speech Group, CUED Machine Intelligence Lab
Page 2
SLS
MIT Computer Science and Artificial Intelligence Laboratory
MIL Speech Group
CUED Machine Intelligence Laboratory
Conversational Interfaces
[Figure: components of a conversational interface: Audio, Speech Recognition, Language Understanding, Context Resolution, Dialogue Management, Database, Language Generation, Speech Synthesis.]
Page 3
Conversational Interfaces
[Figure: Galaxy architecture. A Hub connects the Audio, Speech Recognition, Language Understanding, Context Resolution, Dialogue Management, Database, Language Generation, and Speech Synthesis components.]
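The hub-mediated arrangement shown on this slide can be sketched in a few lines. This is only an illustration of the routing idea, not the Galaxy API: the `Hub` class, the stub handlers, and the fixed processing order in `PIPELINE` are assumptions introduced here.

```python
# Hypothetical component order; the slide does not specify one.
PIPELINE = [
    "audio", "speech_recognition", "language_understanding",
    "context_resolution", "dialogue_management", "database",
    "language_generation", "speech_synthesis",
]

class Hub:
    """Toy hub: components never call each other, only the hub calls them."""

    def __init__(self):
        self.servers = {}

    def register(self, name, handler):
        self.servers[name] = handler

    def run(self, order, frame):
        # Pass a shared frame through each registered component in turn.
        for name in order:
            frame = self.servers[name](dict(frame))
            frame.setdefault("trace", []).append(name)
        return frame

def make_stub(name):
    # Stand-in for a real server: it only records that it ran.
    def handler(frame):
        frame[name] = "done"
        return frame
    return handler

hub = Hub()
for component in PIPELINE:
    hub.register(component, make_stub(component))

result = hub.run(PIPELINE, {"utterance": "Will it rain tomorrow in Boston?"})
```

Because every exchange goes through the hub, a component such as the synthesizer can be swapped by registering a different handler under the same name.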
Page 4
Bilingual Weather Domain: Video Clip
Page 5
Computer Aids through Conversational Interaction
• Language teachers have limited time to interact with students in dialogue exchanges
• Computers provide a non-threatening environment in which to practice communicating
• A three-phase interaction framework is envisioned:
– Preparation: practice phrases, simulated dialogues
– Conversational Interaction
* Telephone conversation with graphical support
* Seamless translation aid
– Assessment
* Review dialogue interaction
* Feedback and fluency scores
Page 6
SCILL: A Spoken Computer Interface for Language Learning
Conversational systems providing an interactive environment for language learning
Domain Expert: speaks only the target language; has access to information sources
Tutor: can provide translations for both user queries and system responses
MIT SLS: Bilingual Conversational Dialogue Systems
CU Speech Group: Speech Recognition and Pronunciation Scoring
Page 7
Technology Requirements
• Robust recognition and understanding of foreign-accented speech
– If recognition is too poor, student may become frustrated
– Customize vocabulary and linguistic constructs to lesson plans
• High quality cross-lingual language generation
• Natural and fluent speech synthesis
• Ability to automatically generate simulated dialogues
– System should be able to generate multiple dialogues based on a given lesson topic on the fly
– Allows the student to see example sentence constructs for a particular lesson
• Ability to reconfigure quickly and easily to new lessons
• Automatic scoring for fluency, pronunciation, tone quality, use of vocabulary, etc.
Page 8
SCILL System Overview
[Figure: system overview diagram; surviving labels: User Interface, Web Server.]
Page 9
Bilingual Spoken Dialogue Interaction: Current Status
• Initial version of end-to-end system is in place for the weather domain
– Rain, snow, wind, temperature, warnings (e.g., tornado), etc.
• MIT recognizer supports both English and Mandarin
– Seamless language switching
• English queries are translated into Mandarin
• Mandarin queries are answered in Mandarin
– User can ask for a translation into English of the response at any time
• Currently using off-the-shelf Mandarin synthesizer from ITRI
– Plan to develop high quality domain-dependent Mandarin synthesis using our Envoice tools
• System can be configured as telephone-only or as telephone augmented with a Web-based GUI
Page 10
Bilingual Recognizer Construction
Automatically induce language models for both English and Mandarin recognizers using the NL grammar
Create Mandarin corpus by automatically translating existing English corpus
Two recognizers compete in a common search space
[Figure: the English corpus is parsed into an interlingua, from which the Chinese corpus is generated; the English and Chinese corpora yield the English and Chinese recognizer language models; the recognizer contains an English network and a Chinese network.]
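A rough way to picture two recognizers competing is to pool their hypotheses and keep the best-scoring one, so that language identification falls out of the search result. The sketch below is a deliberate simplification: the slide describes competition inside a common search space, whereas this merges finished hypotheses after the fact. The function name and the log scores are made up for illustration.

```python
def best_hypothesis(hypotheses):
    """Return the (language, words, log_score) triple with the highest score.

    Assumes scores from both networks are directly comparable, which a
    shared search space would provide but separate decoders would not.
    """
    return max(hypotheses, key=lambda h: h[2])

# Illustrative scores only; not measured values.
pooled = [
    ("english", "will it rain tomorrow in dallas", -151.0),
    ("mandarin", "da2 la1 si1 ming2 tian1 hui4 xia4 yu3 ma5", -128.4),
]

language, words, score = best_hypothesis(pooled)
```

The winning hypothesis carries its language label with it, so no separate language-identification step is needed before understanding.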
Page 11
HTK Mandarin Speech Recognizer
Standard HTK LVCSR Setup:
• PLP front-end with 1st/2nd/3rd derivatives, transformed using HLDA
• 3-state cross-word hidden Markov models
• Decision-tree-clustered context-dependent triphones
• N-gram language model smoothed with class-based language model
Except:
• Standard PLP front-end augmented with F0 + derivatives (F0 added after HLDA transformation)
• 46-phone acoustic model set with long final phones split, e.g. uang -> ua ng
• Questions about tone added to decision tree context clustering
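The 1st/2nd/3rd derivatives in the front-end can be illustrated with the usual regression formula for delta coefficients, d_t = Σ θ (c_{t+θ} − c_{t−θ}) / (2 Σ θ²). The sketch below assumes a window of 2 frames and replication of the first and last frames at the edges; the slides do not state these settings, and the function names are introduced here. The HLDA projection and the F0 stream are not shown.

```python
def deltas(frames, window=2):
    """Regression-based time derivative of a list of feature vectors."""
    n = len(frames)
    denom = 2 * sum(t * t for t in range(1, window + 1))
    out = []
    for i in range(n):
        vec = []
        for d in range(len(frames[i])):
            num = 0.0
            for t in range(1, window + 1):
                ahead = frames[min(i + t, n - 1)][d]   # replicate last frame
                behind = frames[max(i - t, 0)][d]      # replicate first frame
                num += t * (ahead - behind)
            vec.append(num / denom)
        out.append(vec)
    return out

def add_derivatives(frames, order=3, window=2):
    """Append derivatives up to the given order to each static vector."""
    streams = [frames]
    for _ in range(order):
        streams.append(deltas(streams[-1], window))
    return [sum((s[i] for s in streams), []) for i in range(len(frames))]

ramp = [[0.0], [1.0], [2.0], [3.0], [4.0]]
augmented = add_derivatives(ramp)
```

Each augmented vector is four times the static dimension, which is the kind of expanded feature vector an HLDA transform would then project down.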
Page 12
HMM-Based Pronunciation Scoring
Basic approach:
• estimate posterior probabilities (i.e., confidence scores) of each phone or syllable given the acoustics
• map confidence scores to a good/bad decision using data labelled by experts
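A simple approximation:
$$P(p \mid A) \approx \frac{P(A \mid p)}{\sum_{p'} P(A \mid p')}$$
[Figure: example phone sequence sh ih d ax . . .]
Relates confidence scores to human perception
[Figure: P(p | A) plotted against expert rankings, with each axis divided into Good and Bad.]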
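The two steps of the basic approach can be sketched as follows: normalize each phone's acoustic likelihood by the sum over competing phones to get a confidence score, then choose a good/bad threshold from expert-labelled examples. This is a minimal sketch under stated assumptions: the log-likelihood values and labels are invented, and picking the threshold by maximizing agreement is one simple choice, not necessarily the mapping used in the project.

```python
import math

def phone_posterior(log_likes, phone):
    """P(p | A) approximated as P(A | p) / sum over p' of P(A | p')."""
    m = max(log_likes.values())  # subtract the max for numerical stability
    total = sum(math.exp(v - m) for v in log_likes.values())
    return math.exp(log_likes[phone] - m) / total

def fit_threshold(scores, labels):
    """Pick the score threshold that agrees most often with expert labels."""
    best_t, best_correct = 0.0, -1
    for t in sorted(set(scores)):
        correct = sum((s >= t) == good for s, good in zip(scores, labels))
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t

# Invented acoustic log-likelihoods for one segment.
log_likes = {"sh": -10.0, "s": -12.0, "zh": -13.0}
confidence = phone_posterior(log_likes, "sh")

# Invented expert judgements: True means the phone was rated good.
threshold = fit_threshold([0.9, 0.8, 0.3, 0.2], [True, True, False, False])
is_good = confidence >= threshold
```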
Page 13
Multilingual Translation Framework
Common meaning representation: semantic frame
[Figure: Recognition and NLU (parsing rules) map English, Chinese, Spanish, or Japanese input to a semantic frame; NLG (generation rules) and Synthesis map the frame to English, Chinese, Spanish, or Japanese output. Other labels: Models, Speech Corpora.]
Page 14
Content Understanding and Translation
English: Some thunderstorms may be accompanied by gusty winds and hail
Semantic frame:
clause: weather_event
topic: precip_act, name: thunderstorm, num: pl
quantifier: some
pred: accompanied_by
adverb: possibly
topic: wind, num: pl, pred: gusty
and: precip_act, name: hail
Frame indexed under weather, wind, rain, storm, and hail
Japanese:
Spanish: Algunas tormentas posiblemente acompañadas por vientos racheados y granizo
Chinese: 一些雷雨可能會伴有陣風和冰雹
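One way to picture the semantic frame and its topic index is as a nested dictionary plus an inverted index. The nesting below is an assumption: the slide lists the keys but its layout was lost, so which keys belong to which sub-frame is a guess. The helper names `index_frame` and `lookup` are introduced here.

```python
# Assumed nesting of the keys shown on the slide.
frame = {
    "clause": "weather_event",
    "topic": {
        "type": "precip_act", "name": "thunderstorm",
        "num": "pl", "quantifier": "some",
    },
    "pred": {
        "name": "accompanied_by",
        "adverb": "possibly",
        "topic": {"type": "wind", "num": "pl", "pred": "gusty"},
        "and": {"type": "precip_act", "name": "hail"},
    },
}

index = {}

def index_frame(frame, keys):
    """File the same frame under every topic it is relevant to."""
    for key in keys:
        index.setdefault(key, []).append(frame)

def lookup(topic):
    """Return all frames filed under a topic (empty list if none)."""
    return index.get(topic, [])

index_frame(frame, ["weather", "wind", "rain", "storm", "hail"])
```

A later query about hail or wind retrieves the same language-independent frame, which can then be rendered in whichever language the user asked for.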
Page 15
Audio Demonstration
• User asks: “Will it rain tomorrow in Boston?”
• System paraphrases query, then responds in Chinese
• “Please repeat that” in English or Chinese interpreted identically
• System repeats response in Chinese
• User speaks query in English: seamless language switching
• System paraphrases, then translates query into Chinese
• User attempts to repeat translation
– Recognition error: hallucinates an erroneous date (February 30) which will be remembered
• System supplies known cities in England
• User chooses London
• System has no weather for London on February 30
• User asks “how about today?”
• System provides London’s weather today
• User asks for a translation into English, which is provided
Page 16
Proposed Translation Procedure
English query: “what is your name”
Linguistic frame (English):
{c wh_question
   :topic {q name
      :poss “you” }
   :auxil “link”
   :complement {q object :trace “what” } }
Chinese query: “ni3 jiao4 shen2_me5 ming2_zi4”
Linguistic frame (Chinese):
{c wh_question
   :topic {q name }
   :pro “you”
   :verb “call”
   :complement {q object :trace “what” } }
Key-value representation:
{c eform
   :attribute “name”
   :person “you” }
[Figure: each query is parsed into a linguistic frame; the diagram labels a transfer step between linguistic frames and generate steps involving the key-value representation.]
If generated query fails to parse, simplify interlingua and generation
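The fallback rule on this slide (try the full route, and if the generated query does not parse, fall back to a simpler representation) can be sketched as below. Everything here is a toy stand-in: the dictionary form of the frame, the `to_key_value` mapping, and the one-entry template table are hypothetical, and the generator and parser are passed in as plain functions.

```python
# Hypothetical dictionary form of the English linguistic frame.
ENGLISH_FRAME = {
    "clause": "wh_question",
    "topic": {"name": "name", "poss": "you"},
    "auxil": "link",
    "complement": {"trace": "what"},
}

def to_key_value(frame):
    """Reduce a linguistic frame to the flat key-value form on the slide."""
    topic = frame["topic"]
    return {"clause": "eform", "attribute": topic["name"], "person": topic["poss"]}

# Toy generation table for the simplified route (one entry only).
TEMPLATES = {("name", "you"): "ni3 jiao4 shen2_me5 ming2_zi4"}

def generate_from_key_value(kv):
    return TEMPLATES[(kv["attribute"], kv["person"])]

def translate(frame, generate_full, parses):
    """Use the full route if its output parses, else the simplified route."""
    candidate = generate_full(frame)
    if candidate is not None and parses(candidate):
        return candidate, "full"
    return generate_from_key_value(to_key_value(frame)), "simplified"

# A generator whose output fails to parse triggers the fallback.
output, route = translate(ENGLISH_FRAME, lambda f: "unparseable", lambda s: False)
```

Checking the generated string with the target-language parser gives a cheap well-formedness test before anything is shown or spoken to the student.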
Page 17
Proposed Exercise using Typed Inputs
Type-in Window
Input: Da2 la2 si4 hui4 xia4 yu3 ming2 tian1 ma5?
Reply Window
Query: Da2 la1 si1 ming2 tian1 hui4 xia4 yu3 ma5?
Response: Da2 la1 si1 ming2 tian1 xia4 wu3 xia4 te4 da4 yu3
Next: Dallas rain tomorrow
Next: Los Angeles wind Saturday
System color codes errors in tone and in syntactic constructs
System is able to parse query in spite of tone errors and (limited) syntax errors
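The tone and word-order feedback in this exercise can be approximated by comparing the typed pinyin with the system's corrected query. The sketch below is a simplification and not the system's actual method, which the slides do not describe: it matches syllables by their toneless spelling and treats any difference in syllable order as a syntax flag. The function names are introduced here.

```python
import re

def split_syllable(syllable):
    """Split tone-numbered pinyin such as 'la1' into ('la', 1)."""
    match = re.fullmatch(r"([a-z]+)([1-5])", syllable.lower())
    if match is None:
        raise ValueError(f"not tone-numbered pinyin: {syllable!r}")
    return match.group(1), int(match.group(2))

def parse(sentence):
    return [split_syllable(s) for s in sentence.rstrip("?").split()]

def tone_errors(typed, reference):
    """List (position, typed syllable, expected tones) for wrong tones."""
    expected = {}
    for base, tone in parse(reference):
        expected.setdefault(base, set()).add(tone)
    errors = []
    for pos, (base, tone) in enumerate(parse(typed)):
        if base in expected and tone not in expected[base]:
            errors.append((pos, f"{base}{tone}", sorted(expected[base])))
    return errors

def order_differs(typed, reference):
    """True if the toneless syllable sequences are not identical."""
    return [b for b, _ in parse(typed)] != [b for b, _ in parse(reference)]

typed = "Da2 la2 si4 hui4 xia4 yu3 ming2 tian1 ma5?"
corrected = "Da2 la1 si1 ming2 tian1 hui4 xia4 yu3 ma5?"
errors = tone_errors(typed, corrected)
```

On the slide's example this flags the two mistyped tones in the city name and reports that the word order differs from the corrected query.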
Page 18
Testing the Effectiveness of Training on Typed Input: Proposed Measures
• Compare the quality of spoken dialogue recorded before and after a Web-based training session
• Measures of fluency:
– Syntactic well-formedness
– Tone production accuracy
– Frequency of pauses, edits, and filler words
– Phonetic quality, etc.
• Measures of communication success:
– Frequency of usage of translation assistance
– Understanding error rate
– Task completion
– Time to completion, etc.
Page 19
Technology Goal: Automated Language Understanding
Once translation ability exists from English to the target language, the reverse system can be created almost effortlessly
Utilizes English parse tree and Mandarin generation lexicon to induce Mandarin parse tree
[Figure: an English sentence is parsed into an interlingual representation, from which a Mandarin sentence is generated; the resulting corpus pairs feed grammar induction, which produces a Mandarin parsing grammar.]
Page 20
Building NxN Translation Efficiently
[Figure: English, Japanese, Mandarin, French, Arabic, Korean, Spanish, and Urdu connected through a shared Interlingua.]
Automatic Grammar Induction
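The efficiency argument behind the interlingua can be made concrete with a little arithmetic: direct translation needs one system per ordered language pair, while an interlingua needs one parser and one generator per language. The function names below are introduced for illustration; the language list is the one shown on the slide.

```python
LANGUAGES = [
    "English", "Japanese", "Mandarin", "French",
    "Arabic", "Korean", "Spanish", "Urdu",
]

def direct_systems(n):
    """One translator per ordered (source, target) pair."""
    return n * (n - 1)

def interlingua_components(n):
    """One parser into and one generator out of the interlingua per language."""
    return 2 * n

n = len(LANGUAGES)
direct = direct_systems(n)
shared = interlingua_components(n)
```

For the eight languages on the slide this is 56 directed translators against 16 components, and adding a ninth language costs 2 new components rather than 16 new translators.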
Page 21
Future Plans (Near Term and Long Term)
• Install current version of system at Cambridge University
• Incorporate CU Mandarin recognizer
• Add support for audio input at the computer
• Build high quality synthesis capability
• Improve understanding, dialogue, and translation performance
• Collect and transcribe data from language learners and assess both system and students
• Develop various scoring algorithms for student fluency
• Refine all aspects of system based on collected data