Page 1: What I did on my Summer “Vacation” Jeremy Morris 10/06/2006.

What I did on my Summer “Vacation”

Jeremy Morris

10/06/2006

Page 2

Summer at AFRL - DAGSI

AFRL
• Air Force Research Laboratory
• Wright-Patterson AFB, Dayton OH

DAGSI Student/Faculty Research Fellowship program
• Dayton Area Graduate Studies Institute
• Effort to encourage collaboration between Ohio universities and AFRL

Page 3

Summer at AFRL - SCREAM Lab

SCREAM Lab
• Speech and Communication Research, Engineering, Analysis and Modeling Lab
• Interest in a wide variety of speech research issues for the military
  • Speech-to-speech translation, rapid development of speech recognition systems, etc.

Page 4

Summer at AFRL - Why us?

SCREAM Lab members were interested in collaborating with OSU

SCREAM Lab working on research in using phonological features in speech recognition
• Perceived overlap with ASAT project

Page 5

Review - Phonological Features

For the ASAT Project, we have been using phonological feature detectors

We train detectors on a particular phonological feature
• e.g. manner or place for consonants; height, frontness, etc. for vowels

We then combine these features together for ASR purposes

Page 6

Phonological Features (cont.)

SCREAM Lab very interested in phonological feature detectors
• Need for quick development of new ASR systems for new languages
• A full set of phonological feature detectors would allow reuse of acoustic data for training across new languages
  • Multi-lingual detectors are clearly needed to get full coverage of all features

Page 7

Phonological Features (cont.)

Our phonological feature detectors
• Monolingual (English only)
• Trained using a set of multi-layer perceptron neural networks
• Output a set of phonological feature class probabilities

SCREAM Lab feature detectors
• Monolingual and multilingual
• Trained using Gaussian Mixture Models
• Output a set of likelihoods
• Based on work by Tanja Schultz (CMU)

Page 8

Summer at AFRL - Proposal

Besides acoustic models, new ASR systems for new languages have other needs

An ASR system needs a lexicon mapping phones-to-words
• Normally hand-constructed
• Requires time and expertise

Page 9

Summer at AFRL - Proposal

Our proposal: look at methods of bootstrapping new lexicons from:
• Acoustic data
• Word-level transcripts
• Phonological feature detector outputs

How?
• Start by looking at work on deriving Acoustic Sub-Word Units

Page 10

Summer at AFRL - Proposal

Acoustic Sub-Word Units (ASWUs)
• Similar to phones in that they are smaller pieces of words
• BUT: automatically derived from acoustics instead of manually defined
• Used to derive both a sub-word unit set and a lexicon for that set simultaneously
• Research in this area has been mainly to improve ASR performance

Page 11

Summer at AFRL - Proposal

Can we use these methods along with phonological features as inputs to induce new lexicons?
• Using phonological features, the sub-word units may be mappable to standard IPA phone labels

Page 12

Summer at AFRL - Proposal

The proposed system is inspired by an ASWU approach (Singh et al., 2002)
• Notable for not requiring word boundaries to be marked for training

Start with a basic dictionary (including a starting phoneset size)

Train a set of acoustic models on the training data with that dictionary

Alter the basic dictionary in a manner that improves your pronunciations

Repeat until a stopping criterion is reached
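The four steps on this slide can be read as a simple loop. The sketch below is only an illustration of that control flow, not the system built at AFRL: `bootstrap_lexicon`, `train`, `alter`, `should_stop`, and `max_iters` are all hypothetical names, and the actual training and alteration steps are passed in as plain callables.

```python
from typing import Any, Callable, Dict, List

# A lexicon maps each word to one pronunciation (a list of unit labels).
Lexicon = Dict[str, List[str]]


def bootstrap_lexicon(
    lexicon: Lexicon,
    train: Callable[[Lexicon], Any],
    alter: Callable[[Lexicon, Any], Lexicon],
    should_stop: Callable[[Lexicon, Lexicon, int], bool],
    max_iters: int = 20,
) -> Lexicon:
    """Iterate train -> alter until the stopping criterion fires.

    All four arguments after `lexicon` are placeholders for steps the
    slides describe only at a high level.
    """
    for iteration in range(max_iters):
        models = train(lexicon)               # train acoustic models with current dictionary
        new_lexicon = alter(lexicon, models)  # split/delete units to improve pronunciations
        if should_stop(lexicon, new_lexicon, iteration):
            return new_lexicon
        lexicon = new_lexicon                 # repeat with the altered dictionary
    return lexicon
```

Passing the steps in as callables keeps the loop independent of any particular HMM toolkit; `max_iters` is a safety cap added for the sketch.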

Page 13

Summer at AFRL - Proposal

Start with a basic dictionary
• Start with an assumption that the number of phones in a word is related to the number of letters in the orthography
  • Basic dictionary maps word to sequence of letters in that word:

ABLE    A B L E

BANNED  B A N N E D
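A minimal sketch of building such a letter-based starting dictionary follows. The function name `basic_lexicon` is hypothetical, and upper-casing words and dropping non-letter characters (apostrophes, hyphens) are assumptions made for the example rather than details from the slides.

```python
from typing import Dict, Iterable, List


def basic_lexicon(words: Iterable[str]) -> Dict[str, List[str]]:
    """Map each word to the sequence of its letters, one 'phone' per letter."""
    lexicon = {}
    for word in words:
        key = word.upper()
        # Assumption: only alphabetic characters become initial units.
        lexicon[key] = [ch for ch in key if ch.isalpha()]
    return lexicon


if __name__ == "__main__":
    for word, pron in basic_lexicon(["able", "banned"]).items():
        print(word, " ".join(pron))
```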

Page 14

Summer at AFRL - Proposal

Train a set of acoustic models
• Using the basic dictionary, map words in the transcript to these “pronunciations”
• Train an HMM using the output of the feature detectors as its input, and the above mapping as training labels

Page 15

Summer at AFRL - Proposal

Alter the basic dictionary
• Using some metric, find a candidate “phone” to be modified
  • We’ve looked at a couple of metrics (more on this later)
• Once the phone is identified, see if the phone should be “split” or “deleted”
  • A “split” indicates that the given phone label actually represents two different sounds, and so should be replaced with two different phone labels
  • A “delete” indicates that for a particular word or words the model fits better if that phone label is removed from the pronunciation

Page 16

Summer at AFRL - Proposal

Split example:

BE       B E

DEVELOP  D E1 V E1 L O P

Delete examples:

ABLE       A B L E :: ABLE  A B L

ABANDONED  A B A N D O N D
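As a rough illustration of these two alterations, the sketch below enumerates candidate pronunciations for a single word. The function names and the `E1` naming for the new label are assumptions for the example. Note one simplification: the slides describe handling deletes with a deletion arc in the HMM, whereas this sketch simply lists the shortened pronunciations.

```python
from itertools import product
from typing import List


def split_candidates(pron: List[str], phone: str, new_label: str) -> List[List[str]]:
    """Every pronunciation in which each occurrence of `phone` is either
    kept or relabelled as `new_label`."""
    options = [(p, new_label) if p == phone else (p,) for p in pron]
    return [list(choice) for choice in product(*options)]


def delete_candidates(pron: List[str], phone: str) -> List[List[str]]:
    """The original pronunciation plus each variant with one occurrence
    of `phone` removed."""
    variants = [list(pron)]
    for i, p in enumerate(pron):
        if p == phone:
            variants.append(pron[:i] + pron[i + 1:])
    return variants


if __name__ == "__main__":
    for cand in split_candidates(list("DEVELOP"), "E", "E1"):
        print("DEVELOP", " ".join(cand))
    for cand in delete_candidates(list("ABLE"), "E"):
        print("ABLE", " ".join(cand))
```

Forced alignment against the acoustic data would then choose among these candidates; that step is not shown here.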

Page 17

Summer at AFRL - Proposal

For splits, all possible alterations are added to temporary lexicon

For deletes, we alter the HMM to add a possible deletion arc for the phone

After lexicon or HMM is altered, word transcript is force aligned using new possible pronunciations
• Best pronunciations are pulled from this alignment and used to build new lexicon
• Steps are repeated using the new lexicon in place of the basic lexicon

Page 18

Summer at AFRL - Proposal

How do we determine the candidate “phone label” to alter?
• Initially, modelled each phone with two Gaussians in the HMM
• Compared the two Gaussians to each other using their KL-divergences
  • Took the phone label with the largest KL divergence as the one to alter
  • Idea was that each Gaussian described a cluster: the further these centers were from each other, the more probable they were describing two different phones
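To make the selection rule concrete, here is a small sketch of scoring each phone by the KL divergence between its two Gaussians and taking the largest. Two things are assumptions, since the slides do not specify them: the Gaussians are treated as diagonal-covariance, and the divergence is symmetrised by summing both directions. All function names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Gaussian = Tuple[Sequence[float], Sequence[float]]  # (means, variances)


def kl_diag(p: Gaussian, q: Gaussian) -> float:
    """KL(p || q) for two diagonal-covariance Gaussians."""
    (mu_p, var_p), (mu_q, var_q) = p, q
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )


def symmetric_kl(p: Gaussian, q: Gaussian) -> float:
    """Assumed symmetrisation: KL(p || q) + KL(q || p)."""
    return kl_diag(p, q) + kl_diag(q, p)


def pick_candidate(phones: Dict[str, Tuple[Gaussian, Gaussian]]) -> str:
    """Return the phone label whose two Gaussians are furthest apart."""
    return max(phones, key=lambda label: symmetric_kl(*phones[label]))
```

Because the sum runs over every feature dimension, a dimension that is irrelevant to a phone can still contribute heavily to its score.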

Page 19

Summer at AFRL - Proposal

KL-divergence metric did not work well
• System would pick candidates that a human would find unreasonable (such as “F” or “Q”)
• System would split or delete these phones multiple times, continually returning to the same phone label

Page 20

Summer at AFRL - Proposal

Why did the KL divergence perform this way?
• Suspicion: Large variations in the two Gaussians in areas that do not matter for that phone pushed up the scores (e.g. vowel features for consonants)
• Splitting these phones only allowed the coverage to spread wider, drawing the system back to those phones

Page 21

Summer at AFRL - Proposal

What next?

Tried Mahalanobis distance metric, with poor results also

Returned to Acoustic Sub-Word papers for inspiration
• Instead of looking at cluster stats, multiple papers use an average frame likelihood metric for each phone cluster to determine candidate phone for altering
• Have started moving my code to use this framework; preliminary passes show promise, but no results quite yet
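A minimal sketch of an average frame likelihood metric is given below. It assumes that a forced alignment has already produced one (phone label, frame log-likelihood) pair per frame, and that the phone with the lowest average is taken as the candidate; the slides do not state the selection direction, so that choice and the function names are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def average_frame_loglik(frames: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average per-frame log-likelihood for each phone label."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for phone, loglik in frames:
        totals[phone] += loglik
        counts[phone] += 1
    return {phone: totals[phone] / counts[phone] for phone in totals}


def worst_fit_phone(frames: Iterable[Tuple[str, float]]) -> str:
    """Assumed rule: the phone whose frames fit its model worst on average."""
    averages = average_frame_loglik(frames)
    return min(averages, key=averages.get)
```

Averaging per frame rather than summing keeps frequent phones from being penalised simply for covering more frames.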

Page 22

Conclusion - It’s 75 miles to Dayton

Advice for those thinking of doing work at WPAFB
• Working in the SCREAM Lab was great
  • Hundreds of processors, tons of multi-lingual corpora
  • Friendly people, decent work environment (if a bit dark)
• Many hoops to jump through, even just for a summer student
  • ID badges, computer usage training, etc.
• Sometimes feels like you’re working at a corporation…
  • until the guys in uniform come around
• The base is built like a campus crossed with a prison
  • Cinderblock is the building material of choice.
• Don’t forget your ID Badge
  • It’s 75 miles from Columbus to Dayton