Javier Macías-Guarasa International Computer Science Institute Berkeley, CA - USA Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Jan 13, 2016

Page 1: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Javier Macías-Guarasa

International Computer Science Institute

Berkeley, CA - USA

Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Page 2: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Overview

• Introduction
• Acoustic adaptation
  – MR SI task
  – MR SD task
• Accent identification
  – MR SI task
  – FAE task
• Conclusions
• Future work

Page 3: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Introduction (I)

• Work on improving WER for non-native speakers in the ICSI MR corpus
• General details on the Meeting Recording corpus:
  – Number of speakers: 61
  – Speech segmented: 85:08:21
  – Number of accents: 15
  – ‘Workable’ accents:

    Accent     Speech (hh:mm:ss)   Speakers
    American   53:12:35            15m + 8f
    German     11:37:01            10m + 2f
    Spanish    04:38:24            4m + 1f
    British    01:03:45            2m + 0f   (just for reference)

Page 4: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Introduction (II)

• Initial idea:
  – Pronunciation modeling for non-native speakers
• Acoustic adaptation techniques to be tested first:
  – SRI Decipher system capabilities:
    • MAP/MLLR/PhoneLoop
  – Analyze different strategies
• Speaker-dependent and speaker-independent tasks

Page 5: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Introduction (III)

• Accent identification:
  – Needed to effectively use accent-dependent models in a real-world system
  – Emphasis on ‘practical’ approaches using, again, SRI Decipher capabilities
• MR task is a difficult acoustic environment:
  – Low number of speakers and little speech material
  – Dominance of certain speakers (more details?)
• FAE task also approached

Page 6: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Introduction (VI) Baseline WERs

• Using SRI 2003 system, WER:

                     All     American   German   Spanish   British
  New SI partition   40.3%   34.1%      52.3%    104.2%    95.6%
  New SD partition   41.4%   33.0%      51.6%    88.2%     65.0%

Page 7: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Acoustic adaptation (I)

• Initial studies with the old partitioning show that global task adaptation through MAP is the best approach:
  – Accent-dependent MAP adaptation is also promising
• Initial attempt to do full retraining using 16 kHz speech (with 8 kHz speech as reference):
  – Very bad results (more details?)
    • Worse than baseline!
  – Too few speakers in the training set given the task partition (speaker independent)

Page 8: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Acoustic Adaptation (II) Previous work

• Interest in language learning tools (CALL)
• Standard acoustic adaptation techniques:
  – MAP/MLLR using L1 or L2 speech data
  – Model interpolation
  – Clustering
  – Sufficient for high-proficiency speakers
• Pronunciation modeling:
  – Little (if any) success reported

Page 9: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Acoustic Adaptation (IV) Objectives

• Strategies for SI task (combined improvement?):
  – Task MAP adaptation (TaskMAP)
  – Accent-dependent MAP (AccMAP)
  – TaskMAP followed by AccMAP/MLLR
• Strategies for SD task:
  – Task MAP adaptation (TaskMAP) (includes speaker adaptation)
  – Per-speaker MAP adaptation (SpkMAP)

Page 10: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Acoustic adaptation (V)

• Strategies for acoustic adaptation:
  – Adaptation weights tuned per accent (held-out data)
  – Final phoneloop stage

[Diagram: adaptation flow. SWB models → MAP (task adaptation) on the full DB → global MAP models (TaskMAP). MAP/MLLR on the Am, Ge, Sp, ... DBs → Am, Ge, Sp, ... MAP models, applied to the SWB models (AccMAP) OR to the global MAP models (Task+AccMAP).]
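The MAP step in the flow above can be illustrated with the textbook MAP update of a single Gaussian mean. This is a minimal sketch, not the SRI Decipher implementation: the function name, its arguments and the prior weight `tau` are assumptions made for illustration, and the slides do not say how Decipher defines its adaptation weight.

```python
def map_adapt_mean(prior_mean, frames, posteriors, tau):
    """Textbook MAP update of one Gaussian mean (hypothetical helper).

    prior_mean: mean vector of the seed model (e.g. an SWB-trained Gaussian)
    frames:     adaptation feature vectors
    posteriors: occupation probability of this Gaussian for each frame
    tau:        prior weight (assumed); larger values keep the mean near the prior
    """
    occupancy = sum(posteriors)
    dim = len(prior_mean)
    weighted_sum = [0.0] * dim
    for frame, gamma in zip(frames, posteriors):
        for d in range(dim):
            weighted_sum[d] += gamma * frame[d]
    # Interpolate between the prior mean and the adaptation-data statistics.
    return [(tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
            for d in range(dim)]


# Two adaptation frames, fully assigned to this Gaussian, with tau = 2:
print(map_adapt_mean([0.0, 0.0], [[2.0, 4.0], [2.0, 4.0]], [1.0, 1.0], 2.0))
# [1.0, 2.0]
```

With little adaptation data the result stays close to the seed model, and with more data it moves toward the sample mean of the adaptation frames. Chaining TaskMAP and AccMAP amounts to using the output of one update as the prior of the next.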

Page 11: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

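Acoustic Adaptation (VII) MR Speaker Independent Task

• SI adaptation summary, WER:

[Bar chart: WER by accent for Baseline SI, TaskMAP-optimal, AccMAP-optimal, TaskMAP + AccMAP-optimal and + phoneloop pass2. Two sets of data labels survive; the second set equals the ‘Best single system’ row of the SRI 5xRT slide.]

                       American   German   Spanish   British
  Baseline SI          34.1%      52.3%    104.2%    95.6%
  Second labeled set   30.4%      42.3%    93.2%     87.9%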

Page 12: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Acoustic Adaptation (VIII) MR Speaker Independent Task

• SRI 5xRT system:
  – Using new dictionary and interpolated LMs
  – Using best MAP-adapted models for mel features
  – Still some bugs in the process (more details?)

                                        American    German     Spanish     British
  Best single system                    30.4%       42.3%      93.2%       87.9%
  SRI 5xRT system adapted               33.6%       44.9%      86.6%       79.7%
  SRI 5xRT system STD                   31.0%       44.1%      93.4%       78.7%
  err (%rel), adapted vs. best single   [ 10.5% ]   [ 6.1% ]   [ -7.1% ]   [ -9.3% ]

Page 13: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Acoustic Adaptation (X) MR Speaker Dependent Task

• SD adaptation summary, WER:

[Bar chart: WER by accent for Baseline SD, TaskMAP-optimal, SpkMAP-optimal and + phoneloop pass2. Two sets of data labels survive; the series of the second set is not recoverable from the transcript.]

                       American   German   Spanish   British
  Baseline SD          33.0%      51.6%    88.2%     65.0%
  Second labeled set   29.5%      37.1%    54.3%     60.4%

Page 14: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Accent Identification (I)

• Background:
  – Techniques similar to Language Identification (LID)
  – GMM based:
    • Broad collection of features
    • GMM tokenizers
  – Broad phonetic classes + HMMs
  – LM/AM score comparison
  – Based on phonotactic characteristics:
    • PPRLM, PRLM
  – More complex than LID
  – Hard to compare rates: no previous work in MR/FAE

Page 15: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Accent Identification (II) Objectives

• Strategy: use SRI Decipher characteristics
  – Practical approach: reasonable run times
  – GMM classification module (for gender detection)
    • Evaluate standard features and normalizations
  – Hypothesis driven, phone recognition:
    • CD/CI models
    • Recognition using flat Phone LM or flat LM
    • View as a text classification problem
  – Phone LM driven:
    • PRLM/PPRLM
  – Combination using NNs

Page 16: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Accent Identification (III) MR data: GMM classification approach

• GMM results for MR corpus:
  – Unbalanced data: tested over- and undersampling
  – Use different features & normalization:
    • No significant differences (except when using voicing features):
      – Lack of data
      – ~Uniform channels

                        American   German   Spanish   British   ID rate   Naive rate   err (%rel)
  Count                 23855      5814     2335      288
  fc downsampling 256   96.1%      65.9%    0.0%      0.0%      82.9%     73.9%        -34.5%
  fc 2048               96.1%      71.6%    0.0%      0.0%      83.9%     73.9%        -38.2%
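The slide does not define the ID rate, Naive rate and err (%rel) columns. The sketch below is one reading that reproduces the first row of figures: the four leading numbers are taken as per-accent item counts, the naive rate as the share of the largest class, the ID rate as the count-weighted average of the per-accent rates, and err (%rel) as the relative change in classification error against the naive guess. All function names are made up for illustration.

```python
def naive_rate(class_counts):
    """Accuracy (in percent) of always guessing the largest class."""
    return 100.0 * max(class_counts) / sum(class_counts)


def overall_id_rate(class_counts, per_class_rates):
    """Count-weighted average of per-class ID rates (all in percent)."""
    total = sum(class_counts)
    return sum(n * r for n, r in zip(class_counts, per_class_rates)) / total


def relative_error_change(id_rate, naive):
    """Relative change (in percent) of the error rate versus the naive guess."""
    error = 100.0 - id_rate
    naive_error = 100.0 - naive
    return 100.0 * (error - naive_error) / naive_error


counts = [23855, 5814, 2335, 288]   # American, German, Spanish, British
rates = [96.1, 65.9, 0.0, 0.0]      # per-accent rates, 'fc downsampling 256' row

print(round(naive_rate(counts), 1))                 # 73.9
print(round(overall_id_rate(counts, rates), 1))     # 82.9
print(round(relative_error_change(82.9, 73.9), 1))  # -34.5
```

Under this reading, a negative err (%rel) means fewer errors than always answering American. The 0.0% Spanish and British rates show that the gain comes entirely from separating American from German.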

Page 17: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Accent Identification (IV) MR data: GMM classification approach

• GMM results for MR corpus:
  – As a function of utterance length, task AM-GE-SP-BR

[Plot: AI rate (60% to 100%) vs. > utterance length in seconds (log scale, 1 to 100); curves: Chance, 256.]
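A GMM classifier of this kind usually scores an utterance by summing per-frame log-likelihoods under each accent model and picking the best total, which is one reason utterance length matters. The sketch below illustrates that scoring rule only. The one-dimensional models, labels and parameters are invented, and nothing here reflects the actual Decipher GMM module or its features.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def log_gmm(x, gmm):
    """Log-likelihood of one frame under a GMM of (weight, mean, var) triples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)
    # log-sum-exp over mixture components, for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def classify(frames, models):
    """Return the label with the highest summed frame log-likelihood, plus all scores."""
    scores = {label: sum(log_gmm(f, gmm) for f in frames)
              for label, gmm in models.items()}
    return max(scores, key=scores.get), scores


# Toy two-component models; labels and parameters are hypothetical.
models = {
    "accent_a": [(0.5, [0.0], [1.0]), (0.5, [1.0], [1.0])],
    "accent_b": [(0.5, [4.0], [1.0]), (0.5, [5.0], [1.0])],
}
label, scores = classify([[0.2], [0.9], [0.4]], models)
print(label)  # accent_a
```

Each extra frame adds one more term to every score, so longer utterances give the comparison more evidence to work with.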

Page 18: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Accent Identification (V) MR data: Hypothesis driven approach

• Text classification view using MR data:
  – Input from phone recognition:
    • From free phone recognition (CD/CI models, full/flat PLM)
  – Rainbow: CMU tool for text classification
    • Naive Bayes classification technique
    • N-grams (1..6)
    • No further restrictions (feature selection, stop list, etc.)
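The text classification view can be sketched as a multinomial naive Bayes classifier over phone n-grams. This is a small stand-in written for illustration, not Rainbow itself: the function names, the add-one smoothing and the toy phone strings are all assumptions.

```python
import math
from collections import Counter


def ngrams(phones, n):
    """All n-grams of a phone sequence, each joined into one token."""
    return ["_".join(phones[i:i + n]) for i in range(len(phones) - n + 1)]


def features(phones, orders=(2, 3)):
    """Bag of phone n-grams for the requested orders (bigrams and trigrams here)."""
    return [g for n in orders for g in ngrams(phones, n)]


def train(labeled_sequences):
    """Count n-gram features and training sequences per accent label."""
    counts, docs = {}, Counter()
    for label, phones in labeled_sequences:
        counts.setdefault(label, Counter()).update(features(phones))
        docs[label] += 1
    vocab = {g for c in counts.values() for g in c}
    return counts, docs, vocab


def classify(phones, model):
    """Multinomial naive Bayes with add-one smoothing; returns the best label."""
    counts, docs, vocab = model
    total_docs = sum(docs.values())
    best_label, best_score = None, -math.inf
    for label, c in counts.items():
        denom = sum(c.values()) + len(vocab)
        score = math.log(docs[label] / total_docs)  # class prior
        for g in features(phones):
            score += math.log((c[g] + 1) / denom)   # unseen n-grams count as 0
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented phone strings standing in for phone-recognizer output.
model = train([
    ("accent_a", "dh ah k ae t s ae t".split()),
    ("accent_b", "d e k a t s a t".split()),
])
print(classify("dh ah k ae t".split(), model))  # accent_a
```

Changing `orders` to `(1, 2, 3, 4, 5, 6)` would cover the 1..6 n-gram range mentioned on the slide.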

Page 19: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Accent Identification (VI) MR data: Hypothesis driven approach

• Text classification view using MR data:
  – Best results using CI models + flat PLM (bigrams & trigrams)
  – Chunk based classification rates (simulation):

    Chunks             ID rate   Naive rate   err (%rel)
    Full task          84.6%     65.8%        -55.0%
    american nonnat    83.8%     68.0%        -49.3%
    american german    95.9%     81.8%        -77.4%
    american spanish   92.8%     90.2%        -26.6%
    american british   -         98.1%
    am ger bri spa     89.3%     75.0%        -57.2%
    am ger spa         90.8%     76.1%        -61.3%

Page 20: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Accent Identification (VII) MR data: Hypothesis driven approach

• Text classification view using MR data:
  – Utterance based classification rates (simulation):

    Utterances         ID rate   Chunk rate   Naive rate   err (%rel)
    Full task          64.88%    84.6%        66.8%        5.7%
    american nonnat    65.21%    83.8%        69.1%        12.6%
    american german    80.16%    95.9%        92.1%        152.4%
    american spanish   91.79%    92.8%        92.1%        4.5%
    american british   80.42%    -            97.0%        544.1%
    am ger bri spa     73.19%    89.3%        74.8%        6.3%
    am ger spa         74.94%    90.8%        76.6%        7.0%

  – Need longer sequences!

Page 21: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Accent Identification (VIII) MR data: Hypothesis driven approach

• Text classification view using MR data:
  – Real partition classification rates:

    Chunks, real partition   ID rate   ID simul   Naive rate   err (%rel)
    Full task
    american nonnat          75.46%    83.8%      71.2%        -14.9%
    american german          75.16%    95.9%      73.9%        -5.0%
    american spanish         95.24%    92.8%      92.1%        -40.0%
    american british         91.45%    -          92.2%        10.2%
    am ger bri spa           67.68%    89.3%      64.0%        -10.2%
    am ger spa               73.01%    90.8%      69.3%        -12.0%

  – Worse-than-chance rates if utterance based (length-dependent AI task still pending)

Page 22: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Accent Identification (IX) Phone LM approach

• PRLM: Phone recognition & LM

[Diagram: Speech → phone recognizer (AM) → LM scoring with PLM accent 1 ... PLM accent N → score comparison → decision.]
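The PRLM decision rule in the diagram can be sketched as follows: one decoded phone string is scored by a phone LM per accent, and the accent with the highest score wins. The code is an illustrative assumption, not the system used in the talk. It uses add-one smoothed bigram PLMs to stay short, whereas the next slide reports trigrams as the best order, and the phone strings are invented.

```python
import math
from collections import Counter


def train_bigram_plm(sequences):
    """Collect bigram and history counts for one accent's phone LM."""
    bigrams, histories, vocab = Counter(), Counter(), set()
    for phones in sequences:
        padded = ["<s>"] + phones + ["</s>"]
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            histories[prev] += 1
    return bigrams, histories, vocab


def log_prob(phones, plm, vocab_size):
    """Add-one smoothed bigram log-probability of a phone string."""
    bigrams, histories, _ = plm
    padded = ["<s>"] + phones + ["</s>"]
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (histories[prev] + vocab_size))
        for prev, cur in zip(padded, padded[1:])
    )


def identify(phones, plms):
    """Score one phone string with every accent PLM and return the best accent."""
    vocab_size = len(set().union(*(plm[2] for plm in plms.values())))
    scores = {accent: log_prob(phones, plm, vocab_size)
              for accent, plm in plms.items()}
    return max(scores, key=scores.get), scores


# Hypothetical training strings, one PLM per accent.
plms = {
    "accent_a": train_bigram_plm([["dh", "ah", "k", "ae", "t"], ["dh", "ae", "t"]]),
    "accent_b": train_bigram_plm([["d", "e", "k", "a", "t"], ["d", "a", "t"]]),
}
best, scores = identify(["dh", "ah", "t"], plms)
print(best)  # accent_a
```

Raw log-probabilities from different PLMs are compared directly here, with no score normalization.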

Page 23: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Accent Identification (X) Phone LM approach

• PRLM: Phone recognition & LM
  – Tested different AMs for phonetic string generation:
    • Std forced
    • Std SWB
    • MAP adapted per accent
    • Best is Std SWB
  – Tested 1- to 6-grams:
    • Best is trigram
  – But very poor results

Page 24: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Accent Identification (XI) MR data: Phone LM approach

• PRLM: Phone recognition & LM:
  – As a function of utterance length, task AM-GE-SP-BR: very bad results

[Plot: AI rate (40% to 110%) vs. > utterance length in seconds (0 to 100); curves: Chance, StdAM-trigram.]

Page 25: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Accent Identification (XII) Phone LM approach

• PPRLM: Parallel Phone recognition & LM

[Diagram: Speech → parallel phone recognizers (models A ... Z); each recognizer output → LM scoring for accent a ... accent z; scores averaged per accent (avg accent a ... avg accent z) → score comparison → decision.]

Page 26: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Accent Identification (XIII) FAE database

• Experiments with the FAE database:
  – 4500 speakers: more acoustic context
  – 20 seconds per speaker
  – Proficiency is labeled
• Strategy:
  – Apply standard techniques
  – Possibly:
    • Use FAE-generated models in MR data

Page 27: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Accent Identification (XIV) FAE database: GMM classification

• GMM:
  – Gender independent classification (16-2048)
  – FAE results in GE-SP task:

               No norm         CMN             Norm                        Naive rate
    GMM size   fc      fasf    fc      fasf    fc      fasf    fasf+ffvf
    128        54.4%   59.2%   59.2%   63.2%   53.6%   64.8%   61.6%       51.6%
    256        59.2%   59.2%   59.2%   65.6%   58.4%   72.0%   63.2%       51.6%
    512        51.2%   60.0%   60.8%   60.0%   60.0%   68.0%   65.6%       51.6%
    1024       52.8%   64.8%   54.4%   63.2%   56.0%   64.4%   61.6%       51.6%

  – Norm better than CMN. CMN better than plain features
  – GD models still to be tested

Page 28: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Accent Identification (XV) FAE database: GMM classification

• GMM:
  – Combining FAE models with MR data:
    • Using frame_cepstrum + CMN (GMM 256)

    GE-SP task         German   Spanish   ID rate   Naive rate   err (%rel)
    MR models          81.1%    40.0%     72.3%     66.0%        -18.6%
    FAE models (cmn)   100.0%   21.9%     73.4%     66.0%        -21.8%

  – Combination is possible, but more experiments are needed!

Page 29: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Accent Identification (XVI) FAE database: hypothesis driven

• Text classification view:
  – FAE results:

                  ID rate   Naive rate   err (%rel)
    GE-SP         58.9%     51.6%        -15.0%
    FR-GE-IT-SP   36.2%     28.9%        -10.3%
    ALL accents   13.2%     9.4%         -4.3%
    13 accents    17.7%     12.2%        -6.2%

  – Better than chance but still far from useful
  – FAE models still to be tested on MR data

Page 30: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Accent Identification (XVII) FAE database: Phone LM approach

• PRLM/PPRLM:
  – Pending
• GMM better than text-based classification. GE-SP task, for example:
  – GMM: 72.0%
  – Text-based: 58.9%
• Results as a function of speech length to be evaluated

Page 31: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Conclusions

• Acoustic adaptation is important for handling non-native accents:
  – MAP adaptation provided the best results:
    • Task adaptation + accent adaptation
  – Work on tuning adaptation weights for SD & SI tasks (magnitude differences)
  – Low-proficiency speakers need additional improvements
    • Non-native speech recognition may not be solvable!

Page 32: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Conclusions

• Accent identification:
  – Proved to be more difficult than LID
  – Different techniques applied:
    • GMM techniques and text classification techniques showed promising results
    • Standard PRLM strategy didn’t work as expected (score normalization needed?)
  – PPRLM to be tested
  – Integration to be tested

Page 33: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Future work

• Finish current experimentation:
  – Accent identification:
    • Test features and normalizations in GMM and phone LM based approaches
    • Test acoustic score ratios
    • Test LM scores
    • Test NN-based combination
  – NonNat speech characterization:
    • Phone/word errors
    • Model ‘usage’ distributions

Page 34: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Future work

• Pronunciation modeling:
  – Evaluation of pronunciation variants found in the SRI SWB dictionary for NonNat speech
  – Rule based:
    • Rules in German (from Silke Goronzy’s work)
    • Rules in Spanish
    • ‘Speaking mode’ probability estimation (accent + …)
• Use of new databases (FAE, TED, Fisher)

Page 35: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Future work

• A note on work on pronunciation modeling in the MR task:
  – The MR corpus is not suitable for data-driven pronunciation modeling:
    • High error rates for non-native speakers & limited number of them
    • Rule-based methods are to be tested first
  – Initial work on evaluating current pronunciation alternatives is needed
  – I got relevant rules for initial testing in German and Spanish

Page 36: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Thank you!

• To ICSI and the ICSI Speech Group, with special thanks to:
  – Morgan
  – Andreas
  – Qifeng, Barry, Adam, Yang, Yan, Dave, Jeremy, …
  – Sven & all international visitors
  – The FrameNet people (Miriam, Michael & Co.)
  – Staff, especially Lila, María Eugenia and Diane

Page 38: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

MR Partitioning

• Speaker independent (SI subtask); durations in hh:mm:ss, speaker counts in parentheses

  Accent     Total      Male            Female         Train                Test
  American   51:05:47   36:07:02 (15)   14:58:45 (8)   33:12:06 (9m + 5f)   17:53:41 (6m + 3f)
  German     11:13:39   11:06:43 (10)   0:06:56 (2)    7:12:09 (6m + 1f)    4:01:30 (4m + 1f)
  Spanish    4:18:27    3:05:47 (4)     1:12:39 (1)    2:46:57 (2m + 1f)    1:31:29 (2m + 0f)
  British    1:03:45    1:03:45 (2)     0:00:00 (0)    0:54:53 (1m + 0f)    0:08:51 (1m + 0f)

Page 39: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Full retraining

• Initial attempt to do full retraining using 16 kHz speech:
  – With old partitioning
  – Too few speakers in the training set given the task partition (speaker independent)

                                   All     American   NonNat
  SWB models                       44.1%   34.5%      82.6%
  Retrain16K+SWBwordnets           53.0%   46.2%      80.1%
  Retrain16K+SWBwordnets+newHLDA   51.7%   44.6%      79.8%
  Retrain8K+SWBwordnets            46.8%   40.5%      72.1%
  Retrain8K+SWBwordnets+newHLDA

Page 40: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Speaker dominance

• A few speakers account for most of the speech material:

  spkID   #length
  me013   13:53
  fe008   6:32
  me011   5:37
  mn015   4:55
  me018   4:19
  mn007   4:16
  fe016   3:55
  me010   3:47
  mn017   3:01
  total   50:15

Page 41: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Acoustic adaptation (IV)

• SI TaskMAP adaptation, WER:
  – Optimal MAP weight ~proportional to size of accented speech subset
  – Bigger improvements in non-native accents
  – Bigger improvements for bigger data size

                    All         NonNat       American    German       Spanish     British
  Baseline SI       40.3%       64.5%        34.1%       52.3%        104.2%      95.6%
  TaskMAP-optimal   37.8%       57.9%        32.5%       46.4%        95.2%       88.5%
  err (%rel)        [ -6.2% ]   [ -10.2% ]   [ -4.7% ]   [ -11.3% ]   [ -8.6% ]   [ -7.4% ]
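The err (%rel) row can be reproduced from the two WER rows as the relative change against the baseline. The function name below is made up for illustration; the figures are the ones in the table.

```python
def relative_wer_change(baseline_wer, adapted_wer):
    """Relative WER change in percent; negative values mean fewer errors."""
    return 100.0 * (adapted_wer - baseline_wer) / baseline_wer


baseline = {"All": 40.3, "NonNat": 64.5, "American": 34.1,
            "German": 52.3, "Spanish": 104.2, "British": 95.6}
task_map = {"All": 37.8, "NonNat": 57.9, "American": 32.5,
            "German": 46.4, "Spanish": 95.2, "British": 88.5}

changes = {accent: round(relative_wer_change(baseline[accent], task_map[accent]), 1)
           for accent in baseline}
print(changes)
# {'All': -6.2, 'NonNat': -10.2, 'American': -4.7, 'German': -11.3, 'Spanish': -8.6, 'British': -7.4}
```

WER can exceed 100%, as in the Spanish baseline, because insertions count as errors on top of substitutions and deletions.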

Page 42: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Acoustic adaptation (IV)

• SI AccMAP adaptation, WER:
  – Similar trends to TaskMAP, but no further improvements, except German benefits from task data!

                    American    German       Spanish     British
  Baseline SI       34.1%       52.3%        104.2%      95.6%
  TaskMAP-optimal   32.5%       46.4%        95.2%       88.5%
  err (%rel)        [ -4.7% ]   [ -11.3% ]   [ -8.6% ]   [ -7.4% ]
  AccMAP-optimal    32.5%       46.0%        96.8%       91.9%
  err (%rel)        [ -4.9% ]   [ -13.6% ]   [ -7.8% ]   [ -4.2% ]
  Optimal Weight    5           40           40          50

Page 43: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Acoustic adaptation (IV)

• SI TaskMAP+AccMAP adaptation, WER:
  – Small improvements over TaskMAP
  – Also tested MLLR instead (TaskMAP+AccMLLR), but no improvements

                     American    German       Spanish     British
  Baseline SI        34.1%       52.3%        104.2%      95.6%
  TaskMAP-optimal    32.5%       46.4%        95.2%       88.5%
  err (%rel)         [ -4.7% ]   [ -11.3% ]   [ -8.6% ]   [ -7.4% ]
  + AccMAP-optimal   32.5%       45.8%        95.0%       88.6%
  err (%rel)         [ 0.0% ]    [ -1.3% ]    [ -0.2% ]   [ 0.1% ]
  Optimal Weight     5           20           30          40

Page 44: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Gender ID issues

• Gender identification:
  – Per chunk gender ID:

                  #chunks   Male     Female   ID rate
    True male     350       100.0%   0.0%
    True female   97        8.3%     91.8%
    Overall                                   98.2%

  – Per utterance gender ID:

                  #utterances   Male     Female   ID rate
    True male     68304         100.0%   0.0%
    True female   19563         2.9%     97.1%
    Overall                                       99.4%

                  #utterances   Male     Female   ID rate
    True male     68304         88.5%    11.5%
    True female   19563         22.6%    77.4%
    Overall                                       86.0%

Page 45: Acoustic Adaptation and Accent Identification in the ICSI MR and FAE Corpora

Acoustic Adaptation SRI 5xRT System results

British      002-mel   002-mel-expanded   005-plp   005-plp-rescored   rover
STD          91.5      89.3               78.2      80.6               78.7
Adapted      87.9      84.3               78.8      79.7               79.7
BestSimple   87.9
------------------------------------------------------------------------------
Spanish      002-mel   002-mel-expanded   005-plp   005-plp-rescored   rover
STD          97.5      94.2               95.9      95.1               93.4
Adapted      88.0      85.1               88.4      88.2               86.6
BestSimple   93.2
------------------------------------------------------------------------------
German       002-mel   002-mel-expanded   005-plp   005-plp-rescored   rover
STD          50.7      47.6               44.7      44.5               44.1
Adapted      45.8      45.8               44.4      44.5               44.9
BestSimple   42.3
------------------------------------------------------------------------------
American     002-mel   002-mel-expanded   005-plp   005-plp-rescored   rover
STD          36.3      33.2               31.2      30.8               31.0
Adapted      33.6      34.3               33.3      33.1               33.6
BestSimple   30.4