Computer Science and Artificial Intelligence Laboratory
Multilingual Conversational Systems
[Figure: system architecture diagram. Component labels: Speech Recognition, Language Understanding, Language Generation, Dialogue Manager, Database, Graphs & Tables, Meaning Representation, Discourse Context, Speech Synthesis (shown three times). Annotations: Language Independent, Language Transparent, Language Dependent. Other labels: Rules (twice), Models (three times).]
4. Train LM statistics for both recognizers from corpora
5. Develop parsing grammar for Mandarin queries and generation rules for Mandarin responses
Not yet completed:
1. Develop domain-specific user simulation capability
2. Generate thousands of dialogues in both languages
3. Train recognizers and users from simulated dialogues
Activities over the Last Nine Months
• Translation from English to Mandarin
– Mainly focused on user queries (as contrasted with responses)
– Integrating generation-based translation with example-based approach
– Exploring the use of statistical machine translation
* Used the phrase-based statistical translation framework developed by Philipp Koehn
* Utilized the formal methods to generate a domain-specific parallel corpus in the weather query domain
* Implemented a finite-state transducer version of the decoder and integrated it with Galaxy
• Translation from Mandarin to English
– Use statistical method to obtain Chinese to English translation capability
– Explore grammar induction techniques to create a parsing grammar for Mandarin queries, towards developing formal methods for Mandarin to English translation
Activities over the Last Nine Months, Cont’d
• System Development
– Upgraded weather harvesting process
– Upgraded database server to support Postgres in addition to Oracle
– Improved dialogue management
* Better handling of meta queries
– Developed a new GUI interface overcoming firewall limitations
* Support automatic checking and correction of typed tone errors
* Better display of tones as diacritics
– Developed a new concatenative speech synthesis capability for high quality translation of user queries spoken in English using Envoice
– Developed a batch-mode capability to process synthetic speech through dialogue interaction to aid system development
Activities over the Last Nine Months, Cont’d
Presentations:
– Three talks at InSTIL Workshop in Venice
* Wang and Seneff: Translation
* Seneff et al.: LL Systems
* Peabody et al.: Web based interface for tone acquisition
– ISCSLP:
* Seneff et al.: Focused on MuXing system overall
– SigDial Demo Session
* Wang and Seneff: Presentation and live demonstration
– One hour seminar at Microsoft China’s Speech Group
– One hour seminar at Defense Language Institute in Monterey
– Demonstrated system to Julian Wheatley, head of Chinese department at MIT and to Henry Jenkins, director of MIT Comparative Media Studies
Activities over the Last Nine Months, Cont’d
Data collection initiatives:
– Eight subjects have completed Web-based exercise at MIT
– Two visits by Stephanie Seneff to Defense Language Institute in Monterey, California
* One successful class participation exercise
* Another attempted but aborted due to power outage
– Installed Web-based exercise system on computers at MIT Language Lab
* Julian Wheatley has agreed to support data collection initiatives with students in the MIT Chinese classes
Bilingual Recognizer Construction
• Two languages compete in common search space
• Automatically translate existing English corpus into Mandarin
• Use NL grammar to automatically induce language model for both English and Mandarin recognizers
[Figure labels: English corpus, Parse, Semantic Frame, Generate, Chinese corpus; English Recognizer Language Model, Chinese Recognizer Language Model; English Network, Chinese Network, Recognizer]
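The idea of two languages competing in a common search space can be illustrated with a minimal sketch. It assumes word-level bigram language models with add-one smoothing, scored over whitespace-separated text; the class name `BigramLM`, the function `pick_language`, and the toy corpora are all hypothetical and are not taken from the system described here, which induces its language models from the NL grammar and runs the competition inside the recognizer.

```python
import math
from collections import Counter


class BigramLM:
    """Toy bigram language model (illustrative only, not the recognizer's LM)."""

    def __init__(self, sentences):
        self.context_counts = Counter()  # how often each word starts a bigram
        self.bigram_counts = Counter()   # how often each (prev, cur) pair occurs
        self.vocab = set()
        for sentence in sentences:
            tokens = ["<s>"] + sentence.split() + ["</s>"]
            self.vocab.update(tokens)
            for prev, cur in zip(tokens, tokens[1:]):
                self.context_counts[prev] += 1
                self.bigram_counts[(prev, cur)] += 1

    def logprob(self, sentence, vocab_size):
        """Add-one smoothed log probability of a whitespace-tokenized sentence."""
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        total = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            numerator = self.bigram_counts[(prev, cur)] + 1
            denominator = self.context_counts[prev] + vocab_size
            total += math.log(numerator / denominator)
        return total


def pick_language(sentence, models):
    """Return the name of the model that scores the sentence highest.

    A shared vocabulary size keeps the smoothed scores comparable.
    """
    shared_vocab = set().union(*(m.vocab for m in models.values()))
    size = len(shared_vocab)
    return max(models, key=lambda name: models[name].logprob(sentence, size))


# Hypothetical one-sentence corpora, with pinyin standing in for Mandarin text.
models = {
    "english": BigramLM(["what is the weather in boston"]),
    "mandarin": BigramLM(["bo1 shi4 dun4 tian1 qi4 zen3 me5 yang4"]),
}
print(pick_language("weather in boston", models))
```

In a real recognizer the competition happens during decoding over acoustic hypotheses rather than over finished text; the sketch only shows how scoring under each language's model selects a winner.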
Automatic Grammar Induction
• Once translation ability exists from English to target language, can create reverse system almost effortlessly
• Utilizes English parse tree and Mandarin generation lexicon to induce Mandarin parse tree
[Figure labels: English Sentence, parse, Interlingua, generate, Mandarin Sentence, Corpus Pairs, Grammar Induction, Mandarin Parsing Grammar]
Multilingual Spoken Translation Framework
• Common meaning representation: semantic frame
[Figure labels: Recognition, NLU, Semantic Frame, NLG, Synthesis; Parsing Rules, Generation Rules, Models, Speech Corpora; English, Chinese, Spanish, Japanese (language list shown twice)]
Challenges in Cross-language Generation for Translation
• Some expressions have very different syntactic structures in different languages
• Syntactic features are expressed in many different ways
– Determiners (English but not Chinese)
– Particles (Chinese but not English)
– Gender (extensive in Spanish)
Examples:
What is your name? 你 (you) 叫 (call) 什么 (what) 名字 (name)?
I like her. Ella me gusta.
Where is a bank nearby? 附近 (vicinity) 哪儿 (where) 有 (have) 银行 (bank)?
that hotel 那 (that) 家 (<particle>) 旅馆 (hotel)
I lost my key. 我 (I) 丢 (lose) 了 (<past tense>) 我的 (my) 钥匙 (key).
An Example: English/Chinese
How long does it take to take a taxi there
( take taxi go there need how long )
坐 出租车 去 那里 要 多久
• Function words disappear in Chinese
• Sentence structure is very different
• Verb “go” omitted in English
• Two instances of “take” have different translations
Intermediate glosses shown on the slide: How long take take taxi there → How long need take taxi there → How long need take taxi go there
Semantic Frame for Example
• Semantic frame is identical for both inputs, except for missing function words in Mandarin
• Where necessary, constituent movement is invoked to render the same hierarchical structure
• English generation predicts missing function words
• Mandarin generation infers “go” from “destination” predicate
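The points above can be made concrete with a small sketch. The frame layout, key names, lexicons, and both generator functions below are hypothetical stand-ins, since the actual semantic frame is not reproduced in this transcript; the only behavior taken from the slides is that English generation supplies the function words and Mandarin generation inserts 去 ("go") when a destination is present.

```python
# Hypothetical language-neutral frame for "How long does it take to take a taxi there".
FRAME = {
    "question": "how_long",
    "action": {"verb": "ride", "vehicle": "taxi", "destination": "there"},
}

# Toy lexicons; the real system uses generation rules, not word tables.
ZH_LEXICON = {"ride": "坐", "taxi": "出租车", "there": "那里", "how_long": "要 多久"}
EN_LEXICON = {"ride": "take", "taxi": "taxi", "there": "there"}


def generate_mandarin(frame):
    """Mandarin word order: action first, question phrase last."""
    action = frame["action"]
    words = [ZH_LEXICON[action["verb"]], ZH_LEXICON[action["vehicle"]]]
    if "destination" in action:
        # "go" is not in the frame; it is inferred from the destination.
        words += ["去", ZH_LEXICON[action["destination"]]]
    words.append(ZH_LEXICON[frame["question"]])
    return " ".join(words)


def generate_english(frame):
    """English generation supplies function words: does, it, to, a."""
    action = frame["action"]
    words = ["How long does it take to", EN_LEXICON[action["verb"]],
             "a", EN_LEXICON[action["vehicle"]]]
    if "destination" in action:
        words.append(EN_LEXICON[action["destination"]])
    return " ".join(words)


print(generate_mandarin(FRAME))
print(generate_english(FRAME))
```

Both surface strings come from the same frame, which is the property that lets one meaning representation serve translation in either direction.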
Stage 1: Drill Exercises
• Web-based Interface to provide practice in typing queries in the weather domain
• 10 weather scenarios to be solved using typed pinyin: “Boston, rain, tomorrow”
– Student given feedback on both query completeness and tone accuracy
• Separate recording sessions allow user to practice both read and spontaneous spoken queries
– Recordings will be used to train the system on accented speech
– Recordings will also be assessed for tone quality
• The Defense Language Institute in Monterey conducted a successful experiment using this Web-based interface in a class of 30 students
• We are planning to introduce the exercise in the language laboratory at MIT
Lexical Tone Correction
• Character representation does not explicitly encode tone:
– 洛杉矶星期一刮风吗?
• Exploit pinyin to help student acquire tonal knowledge:
– Diacritic: luò shān jī xīng qī yī guā fēng ma?
– Percentage of utterance containing pauses and disfluencies
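Rendering pinyin with tone diacritics, as in the example above, can be sketched as follows. The sketch assumes the student types tone numbers after each syllable (for example `luo4`), which the slides do not state, and it uses the standard pinyin placement convention: the mark goes on a or e if present, on the o of "ou", and otherwise on the last vowel. The function names are hypothetical.

```python
# Precomposed vowels for tones 1-4; tone 5 (neutral) carries no mark.
TONE_MARKS = {
    "a": "āáǎà", "e": "ēéěè", "i": "īíǐì",
    "o": "ōóǒò", "u": "ūúǔù", "ü": "ǖǘǚǜ",
}


def mark_syllable(syllable):
    """Convert one tone-numbered syllable, e.g. 'luo4' -> 'luò'."""
    syllable = syllable.lower().replace("u:", "ü").replace("v", "ü")
    if not syllable or not syllable[-1].isdigit():
        return syllable  # no tone number typed; leave unchanged
    base, tone = syllable[:-1], int(syllable[-1])
    if tone not in (1, 2, 3, 4):
        return base  # neutral tone: drop the digit, add no mark
    if "a" in base:
        index = base.index("a")
    elif "e" in base:
        index = base.index("e")
    elif "ou" in base:
        index = base.index("o")
    else:
        vowel_positions = [i for i, ch in enumerate(base) if ch in TONE_MARKS]
        if not vowel_positions:
            return base
        index = vowel_positions[-1]
    marked = TONE_MARKS[base[index]][tone - 1]
    return base[:index] + marked + base[index + 1:]


def numbers_to_diacritics(text):
    """Convert a space-separated, tone-numbered pinyin string."""
    return " ".join(mark_syllable(s) for s in text.split())


print(numbers_to_diacritics("luo4 shan1 ji1 xing1 qi1 yi1 gua1 feng1 ma5"))
```

Checking a typed tone then reduces to comparing the digit the student entered with the digit stored in a reference lexicon for that syllable.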
Tone analysis: Native vs Non-Native Mandarin
• Creating pitch contours
– F0 extracted using algorithm in (Wang and Seneff, 2000)
– Statistics of each pitch contour over each syllable considered without regard for left or right contexts
• Normalization
– Duration normalized by sampling at 10% intervals
– Pitch normalized according to:
$$T(x) = 5 \cdot \frac{\lg x - \lg L}{\lg H - \lg L}$$
• Comparisons based on (Wang et al., 2003)
– Include normalized F0 value, peak, valley, range, peak position, valley position, falling range, and rising range
• Corpus (from the Defense Language Institute)
– 2065 utterances from 4 native speakers
– 4657 utterances from 20 non-native speakers
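The two normalization steps can be sketched together. This is an illustration under stated assumptions, not the authors' implementation: the pitch formula is read as the five-point logarithmic scale T(x) = 5 (lg x − lg L) / (lg H − lg L), with L and H assumed to be the low and high ends of the speaker's F0 range; "sampling at 10% intervals" is assumed to mean 11 points from 0% to 100% with linear interpolation; and the function names are hypothetical.

```python
import math


def resample_contour(f0_values, num_points=11):
    """Sample a syllable's F0 contour at evenly spaced fractions of its duration.

    num_points=11 gives 0%, 10%, ..., 100% (an assumed reading of the slide).
    """
    if len(f0_values) == 1:
        return [float(f0_values[0])] * num_points
    last = len(f0_values) - 1
    sampled = []
    for i in range(num_points):
        position = i * last / (num_points - 1)
        low_index = int(position)
        high_index = min(low_index + 1, last)
        fraction = position - low_index
        value = (1 - fraction) * f0_values[low_index] + fraction * f0_values[high_index]
        sampled.append(value)
    return sampled


def normalize_pitch(f0, low, high):
    """Map F0 in Hz onto a 0-5 scale relative to the speaker's range."""
    return 5 * (math.log10(f0) - math.log10(low)) / (math.log10(high) - math.log10(low))


def normalize_contour(f0_values, low, high):
    """Duration-normalize, then pitch-normalize one syllable contour."""
    return [normalize_pitch(v, low, high) for v in resample_contour(f0_values)]


# Hypothetical rising contour for a speaker whose range is 100-400 Hz.
contour = normalize_contour([100.0, 200.0], low=100.0, high=400.0)
print(len(contour), round(contour[0], 3), round(contour[-1], 3))
```

Because the scale is logarithmic, a doubling of F0 covers the same distance anywhere in the range, which makes contours from speakers with different pitch ranges comparable.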
Tonal averages over all syllables: Native Example
[Figure not reproduced in transcript]
Tonal averages over all syllables: Non-Native Example
[Figure not reproduced in transcript]
Capturing Phonological Errors
• Leverage phonological modeling capabilities of SUMMIT
– Model typical pronunciation errors explicitly
– Direct and intuitive mapping from linguistic rules
– Support both within-language and cross-language substitutions
• Initial experiments completed on Koreans learning English (Kim et al., ICSLP 2004)
– Phonological rules capture typical problems such as schwa insertion and confusions between /dh/ and /d/
– Best path in alignment used to detect errors
– Verbal feedback given to student
• Current research to apply to Americans learning Mandarin
– Build single recognizer to support both languages
– Use data-driven approaches to discover most likely cross-language phone substitution errors
– Explicitly encode such errors in formal phonological rules
– Side benefit may be improved recognition for English-accented Mandarin
Detecting Phonological Errors
{} dh {} => dh | [dcl] d ; // Becomes an onset stop as in 'they'. No [dh] in Korean phonemes.
{} dd {} => dcl [d [ax]] ; // A vowel may be inserted after a coda consonant (Staccato Rhythm)
{CONSONANT} td {CONSONANT} => [tcl] [t] | tcl t [ax]; // No CCC allowed in Korean
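How rules of this kind support error detection can be sketched in a few lines. The sketch does not use SUMMIT's rule syntax or its alignment search: the rule table is a simplified, context-free, hypothetical stand-in, and exact matching against enumerated variants replaces the best-path alignment the slides describe.

```python
from itertools import product

# Hypothetical simplified rules: canonical phone -> possible realizations,
# each paired with an error label (None means the expected pronunciation).
RULES = {
    "dh": [(("dh",), None), (("d",), "dh realized as d")],
    "d": [(("d",), None), (("d", "ax"), "schwa inserted after d")],
}


def expand(canonical):
    """Yield every pronunciation variant licensed by the rules, with its errors."""
    options = [RULES.get(phone, [((phone,), None)]) for phone in canonical]
    for combination in product(*options):
        phones = tuple(p for realization, _ in combination for p in realization)
        errors = [label for _, label in combination if label is not None]
        yield phones, errors


def detect_errors(canonical, observed):
    """Return the error labels of the variant matching the observed phones.

    Returns [] for a correct pronunciation and None if no variant matches.
    """
    target = tuple(observed)
    for phones, errors in expand(canonical):
        if phones == target:
            return errors
    return None


# 'they' is canonically dh ey; a learner who says d ey triggers the dh rule.
print(detect_errors(["dh", "ey"], ["d", "ey"]))
```

Encoding each expected error as a labeled alternative is what makes the feedback specific: the matched path names the rule that fired, which can then be turned into verbal feedback for the student.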
Future Plans
• Develop tools to rapidly port to new domains and languages
– Automatic grammar induction
– Generic dialogue modeling
– Simulated dialogue interactions
• Develop various scoring algorithms for quality assessment of student’s speech
• Develop high quality synthesis capability for Mandarin translations, for multiple domains of knowledge
• Collect and transcribe data from language learners and evaluate both system and students
– Begin with weather domain, our most mature system
– Extend to other domains once they are better developed
• Refine all aspects of systems based on collected data