Open Problems in Speech Recognition Nelson Morgan, EECS and ICSI

Dec 20, 2015

Transcript
Page 1: Open Problems in Speech Recognition

Nelson Morgan, EECS and ICSI

Page 2: ICSI and EECS

• International Computer Science Institute

• Nonprofit, closely affiliated with UCB-EECS:

- faculty (e.g., Morgan, Feldman)
- board (Berlekamp, Karp, Malik)
- students (PhD, MS)

• Focus areas in speech, language, theory, internet research; CITRIS involvement

Page 3: A working speech recognizer (circa 1920)

Page 4: A working speech recognizer (circa 2002)

Page 5: Current Applications

• Toys

• Telephone queries (operator/touch tone replacement)

• Voice dialing (for cell phones)

• Dictation (esp. for specific domains)

Page 6: Major Reasons for Success

• Late 60’s statistical methodology (HMMs, developed for cryptography) applied to speech in 70’s and 80’s

• Moore’s Law + engineering refinements to HMM training/recognition (1986-now)

• Normalization approaches (mean norms, RASTA filtering, vocal tract length approx)
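The "mean norms" item above can be illustrated with a minimal sketch. It assumes the normalization is a per-utterance mean subtraction on log-domain (e.g. cepstral) feature vectors; the slides do not specify the exact variant, and the function name and toy data are hypothetical.

```python
def mean_normalize(frames):
    """Subtract the per-utterance mean of each coefficient from every frame.

    frames: list of equal-length lists of floats, one feature vector per frame.
    Assumption: features are in a log domain, so a fixed channel shows up as
    an additive offset that the mean subtraction removes.
    """
    if not frames:
        return []
    n_frames = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n_frames for d in range(dim)]
    return [[f[d] - means[d] for d in range(dim)] for f in frames]


# Toy data (hypothetical): the same utterance through two channels.
clean = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]
channel_offset = [0.5, -1.5]
shifted = [[x + o for x, o in zip(f, channel_offset)] for f in clean]

normalized_clean = mean_normalize(clean)
normalized_shifted = mean_normalize(shifted)
```

After normalization the two versions of the utterance carry the same features, which is why this kind of step helps when the training and test channels differ.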

Page 7: Two examples of things that helped

• RASTA: digit error rose from 2% to 60% on a different phone system; RASTA brought it back down to 3%. Now used for voice dialing in millions of cell phones

• Vocal tract length normalization: 1 parameter for each speaker, significant effect on errors; now used in all large research systems
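To make the RASTA example concrete, here is a rough sketch of RASTA-style filtering applied to one log-spectral band over time. The coefficients follow the commonly cited transfer function H(z) = 0.1 (2 + z^-1 - z^-3 - 2 z^-4) / (1 - p z^-1). The pole value p is an assumption (both 0.98 and 0.94 appear in published descriptions and implementations), and the function name is hypothetical. This is not the ICSI implementation.

```python
def rasta_filter(trajectory, pole=0.94):
    """Band-pass filter the time trajectory of one log-spectral band.

    trajectory: list of floats, one log-energy value per frame.
    pole: feedback coefficient (assumed value, see lead-in).
    The numerator sums to zero, so a constant offset in the log domain
    (a fixed channel) is suppressed once the start-up transient decays.
    """
    numer = [0.2, 0.1, 0.0, -0.1, -0.2]
    out = []
    prev_y = 0.0
    for n in range(len(trajectory)):
        # FIR part: weighted difference over the last five frames.
        fir = sum(b * trajectory[n - k] for k, b in enumerate(numer) if n - k >= 0)
        # IIR part: leaky integration of the FIR output.
        y = fir + pole * prev_y
        out.append(y)
        prev_y = y
    return out


# Toy trajectory (hypothetical) and the same trajectory with a channel offset.
band = [0.1 * (n % 7) for n in range(400)]
band_other_channel = [v + 5.0 for v in band]

filtered = rasta_filter(band)
filtered_other = rasta_filter(band_other_channel)
```

Late in the utterance the two filtered trajectories agree closely, which is the property that makes the features less sensitive to a change of phone system.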

Page 8: Major Technical Challenges

• Speaker variability for fluent/conversational speech (pronunciation, rate, overlaps): 25-40% error on conversations

• Acoustic variability for general environments (noise, reverb, talker movement): 3-10% error on read digits (vs <1% in clean conditions)

Page 9: Modern ASR Systems

• From 50,000 ft, all ASR systems the same:

- compute local spectral envelope
- determine likelihoods of speech sounds
- search for most likely HMMs

• Spectral envelope distorted by many things

- Alternatives often are bad fits to the statistical models
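The third step in the list above, searching for the most likely HMM state sequence, is typically done with Viterbi decoding. The following is a minimal sketch, assuming a small fully connected HMM with per-frame log likelihoods already computed. All names and the toy numbers are hypothetical, and real recognizers add lexicon and grammar constraints plus pruning.

```python
import math


def viterbi(log_init, log_trans, log_emit):
    """Return the most likely state path and its log probability.

    log_init[s]:     log P(first state = s)
    log_trans[r][s]: log P(next state = s | current state = r)
    log_emit[t][s]:  log likelihood of frame t under state s
    """
    n_states = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    backptr = []
    for t in range(1, len(log_emit)):
        new_score, ptr = [], []
        for s in range(n_states):
            # Best predecessor for state s at frame t.
            best = max(range(n_states), key=lambda r: score[r] + log_trans[r][s])
            ptr.append(best)
            new_score.append(score[best] + log_trans[best][s] + log_emit[t][s])
        score = new_score
        backptr.append(ptr)
    # Trace back from the best final state.
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]


# Toy two-state example (hypothetical numbers).
log = math.log
log_init = [log(0.5), log(0.5)]
log_trans = [[log(0.9), log(0.1)], [log(0.1), log(0.9)]]
log_emit = [
    [log(0.9), log(0.1)],
    [log(0.7), log(0.3)],
    [log(0.2), log(0.8)],
    [log(0.1), log(0.9)],
]
path, logp = viterbi(log_init, log_trans, log_emit)
```

Working in log probabilities avoids numerical underflow on long utterances, since the products of many small likelihoods become sums.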

Page 10: ASR in Brief

[Block diagram, labels only: Speech, Signal Processing, Phonetic Probability Estimator, Decoder (word search), Words, Pronunciation Lexicon, Grammar]

Page 11: ASR is half-deaf

• Phonetic classification very poor

• Success due to constraints (domain, speaker, noise-canceling mic, etc)

• These constraints can mask the underlying weakness of the technology

Page 12: Rethinking Acoustic Processing for ASR

• Escape dependence on spectral envelope

• Use multiple front ends across time/freq

• Modify statistical models to accommodate new front ends

• Design optimal combination schemes for multiple models
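As one illustration of a combination scheme for multiple front ends, the sketch below merges per-frame class posteriors from several classifiers with a weighted geometric mean (a log-linear combination). The slides do not say which scheme is intended, so this choice, the function name, and the weights are assumptions.

```python
import math


def combine_posteriors(streams, weights=None):
    """Combine per-class posteriors from several classifiers for one frame.

    streams: list of probability vectors, one per front end.
    weights: per-stream exponents (hypothetical; equal weights by default).
    Returns the renormalized weighted geometric mean of the streams.
    """
    if weights is None:
        weights = [1.0 / len(streams)] * len(streams)
    n_classes = len(streams[0])
    floor = 1e-12  # guards against log(0)
    log_scores = [
        sum(w * math.log(max(p[c], floor)) for p, w in zip(streams, weights))
        for c in range(n_classes)
    ]
    # Subtract the peak before exponentiating for numerical stability.
    peak = max(log_scores)
    unnorm = [math.exp(s - peak) for s in log_scores]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Two hypothetical front ends scoring three sound classes for one frame.
combined = combine_posteriors([[0.7, 0.2, 0.1], [0.5, 0.4, 0.1]])
```

A geometric mean lets any one stream veto a class by assigning it a very low probability. An arithmetic mean would be more forgiving, and which behaves better depends on how the streams' errors are correlated.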

Page 13: The DARPA (IAO) “EARS” Program

• New 5 year program to radically reduce errors in conversational speech-to-text

• Two components:

- Rich Transcription (large reductions in error rate, improvements in readability and portability to new languages)
- Novel Approaches (radical changes)

Page 14: EARS: Effective Affordable Reusable Speech-to-text

• Rich Transcription: 4 teams

- SRI/ICSI/UW
- BBN/U.Pitt/UW/LIMSI
- Cambridge U.
- IBM

• Novel Approaches: 2 teams

- ICSI/SRI/UW/OGI/Columbia/IDIAP
- Microsoft

Page 15: Novel Approach 1: Pushing the Envelope (aside)

• Problem: Spectral envelope is a fragile information carrier

• Solution: Probabilities from multiple time-frequency patches

[Figure, labels only: time axis; OLD: 10 ms, estimate of sound identity; PROPOSED: up to 1 s, i-th estimate, k-th estimate, n-th estimate, information fusion, estimate of sound identity]

Page 16: Novel Approach 2: Beyond Frames…

• Problem: Features & models interact, new features may require different models

• Solution: Advanced features require advanced models, not limited by fixed-frame-rate paradigm

[Figure, labels only: OLD: short-term features, conventional HMM; PROPOSED: advanced features, multi-rate / dynamic scale classifier]

Page 17: Other speech-to-text projects

• Dialog systems: DARPA Communicator/Symphony, German SmartKom

• Noise/reverberation for cell phone, military environments: DARPA SPINE program, various European projects (EU, ETSI)

• Recognition/retrieval/summarization for multiparty meetings: Swiss IM2, EU m4, ICSI/UW/SRI/Columbia NSF-ITR

Page 18: Resource generation from Berkeley researchers

• gmtk - a new graphical model toolkit specialized for speech (extension of 2 PhD theses, Bilmes [UW] and Zweig [IBM])

• Publicly available speech/neural network software (RASTA, speech neural network training system)

• Soon: a “meeting data” corpus

Page 19: Campus interaction

• Within EECS (CIS):

- Feldman (also ICSI), NLU
- Jordan and Russell, machine learning

• Linguists:

- Ohala, phonology
- Fillmore (ICSI), semantic lexicography

Page 20: Natural Speech + Language Projects at ICSI/EECS

• Berkeley Restaurant Project (BeRP) - online stochastic context free grammar probabilities with natural mixed initiative

• SmartKom - tourist information query system w/American pronunciations of German place names

Page 21: Summary

• Progress in speech recognition research led to working systems in particular domains

• Performance still severely limited for conversational speech, noisy/reverberant conditions

• We and others are working to transcend these limitations with novel approaches