Student: Chao-Hong Meng Advisor: Lin-Shan Lee June 15, 2009

Thesis Defense 2007

Jul 04, 2015



Paul Meng

Page 2

Input: MFCC/PLP features

Model: Structural SVM / HMM

Output: phone sequence, e.g. sh iy hv ae

Page 3

Motivation

Baseline Model

Structural SVM

Experiments

Conclusion

Page 4

Speech Recognition:

Modeling is hard

Traditional approach:

Acoustic Model

Language Model

Page 5

Structural SVM:

◦ A model that can handle structured output

◦ Formulated by Joachims

◦ Directly models the mapping from the input x to the output y

As preliminary research:

◦ x is an MFCC/PLP/posterior feature sequence

◦ y is a phone sequence

Page 6: Outline (next: Baseline Model)

Page 7

Baseline models:

◦ Monophone HMM

◦ Tandem

The two differ in their inputs:

◦ HMM: MFCC/PLP

◦ Tandem: posterior probabilities

Page 8: Outline (next: Structural SVM)

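Page 9

Define a compatibility function $F(x, y)$:

◦ $x$ is a feature vector sequence

◦ $y$ is the correct phone sequence of $x$

The goal is to find an $F$ such that the correct $y$ scores highest among all candidate phone sequences, i.e. $y = \arg\max_{y'} F(x, y')$

[Figure: candidate phone sequences ranked by score, from high to low]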

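Page 10

We assume that the compatibility function has the form $F(x, y) = \langle w, \Psi(x, y) \rangle$, where

◦ $\langle \cdot, \cdot \rangle$ is the inner product

◦ $\Psi(x, y)$ is a combined feature function that encodes the relationship between $x$ and $y$

In this research, $\Psi$ considers the transition counts and the emission counts.

The output of $\Psi(x, y)$ is the vector (transition counts, emission counts).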

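Page 11

Settings:

◦ 3 different phone labels (A, B, C)

◦ 2-dim feature vectors (z1, z2)

A training sample:

[Figure: a training sample]

Calculate:

◦ Transition count matrix A

◦ Emission count matrix B

[Figure: transition count matrix A, with rows and columns indexed by the labels A, B, C; emission count matrix B, with rows indexed by the labels A, B, C and columns by the feature dimensions z1, z2]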

Page 12

Concatenate the rows of A and B into a single vector.

The output of the combined feature function therefore consists of:

◦ transition counts

◦ emission counts
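The counting on the last two slides can be sketched in Python. This is a minimal sketch under an assumption: with continuous features, the "emission count" of label k is taken to be the sum of the feature vectors of the frames labelled k, as in the usual hidden Markov SVM formulation. The function names and the toy sample are hypothetical, not taken from the thesis.

```python
# Toy setting from the slides: 3 phone labels (A, B, C), 2-dim features (z1, z2).
LABELS = ["A", "B", "C"]
INDEX = {label: k for k, label in enumerate(LABELS)}


def transition_counts(y):
    """K x K matrix A: A[i][j] counts how often label i is followed by label j."""
    K = len(LABELS)
    A = [[0] * K for _ in range(K)]
    for prev, cur in zip(y, y[1:]):
        A[INDEX[prev]][INDEX[cur]] += 1
    return A


def emission_counts(x, y):
    """K x d matrix B: row k sums the feature vectors of the frames labelled k."""
    K, d = len(LABELS), len(x[0])
    B = [[0.0] * d for _ in range(K)]
    for frame, label in zip(x, y):
        for j, value in enumerate(frame):
            B[INDEX[label]][j] += value
    return B


def joint_feature(x, y):
    """Concatenate the rows of A and then the rows of B into one flat vector."""
    rows = transition_counts(y) + emission_counts(x, y)
    return [value for row in rows for value in row]


# Hypothetical sample: 4 frames with labels A, A, B, C.
x = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.0, 2.0]]
y = ["A", "A", "B", "C"]
print(joint_feature(x, y))
# [1, 1, 0, 0, 0, 1, 0, 0, 0, 1.5, 0.5, 0.0, 1.0, 0.0, 2.0]
```

The vector has K·K + K·d = 9 + 6 = 15 entries: the first 9 are transition counts, the last 6 are per-label feature sums.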

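Page 13

Output label set (phone set) of size K

Kronecker delta: $\delta(a, b) = 1$ if $a = b$, and $0$ otherwise

Define a K-dimensional indicator vector for each label y, whose k-th element is the Kronecker delta between the k-th label and y

◦ Only one element of this vector is nonzero

Tensor product: for vectors a and b, $(a \otimes b)_{ij} = a_i b_j$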
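A small Python sketch of these three definitions, assuming the tensor product of two vectors is their outer product (entries a_i·b_j), kept here as a matrix rather than flattened. The three-label set is hypothetical. The sketch also checks that summing the tensor products of consecutive one-hot vectors reproduces a transition count matrix.

```python
# Hypothetical label set of size K = 3.
LABELS = ["A", "B", "C"]


def kronecker_delta(a, b):
    """delta(a, b) = 1 if a == b, else 0."""
    return 1 if a == b else 0


def one_hot(y):
    """K-dim indicator vector of label y: element k is delta(LABELS[k], y)."""
    return [kronecker_delta(label, y) for label in LABELS]


def tensor_product(a, b):
    """Tensor (outer) product of two vectors as a matrix: entry [i][j] = a[i] * b[j]."""
    return [[ai * bj for bj in b] for ai in a]


# Summing the tensor products of consecutive one-hot vectors counts transitions.
y = ["A", "A", "B", "C"]
K = len(LABELS)
counts = [[0] * K for _ in range(K)]
for prev, cur in zip(y, y[1:]):
    outer = tensor_product(one_hot(prev), one_hot(cur))
    for i in range(K):
        for j in range(K):
            counts[i][j] += outer[i][j]
print(counts)  # [[1, 1, 0], [0, 0, 1], [0, 0, 0]]
```

Because each one-hot vector has a single nonzero entry, each tensor product has exactly one 1, at the (previous label, current label) position.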

Page 14

Define the combined feature function as the stack of two terms:

◦ Emission count

◦ Transition count
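One way to write this definition, following the usual hidden Markov SVM formulation (an assumption, since the slide's own symbols are not preserved). Here $\sigma_1, \dots, \sigma_K$ are the labels, $\Lambda(y)$ is the one-hot vector of label $y$, $x_t$ is the feature vector of frame $t$, $T$ is the number of frames, and $\Psi(x, y)$ is the combined feature function:

```latex
\Lambda(y) = \bigl(\delta(\sigma_1, y), \dots, \delta(\sigma_K, y)\bigr)^{\top},
\qquad
\Psi(x, y) =
\begin{pmatrix}
  \sum_{t=2}^{T} \Lambda(y_{t-1}) \otimes \Lambda(y_t) \\[4pt]
  \sum_{t=1}^{T} x_t \otimes \Lambda(y_t)
\end{pmatrix}
```

The first block has $K \times K$ entries (transition counts); the second has $d \times K$ entries, where entry $(j, k)$ sums feature $j$ over the frames labelled $\sigma_k$ (emission counts). The block order is only a convention: swapping it permutes the entries of $w$ accordingly.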

Page 15

Each pair (x, y) gives a different value of the combined feature vector.

Recall that the compatibility function is the inner product of w and the combined feature vector.

So, for a given x, the model has a different preference (score) for each y.

w is to be estimated from the training data.
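A minimal Python sketch of the scoring rule: F(x, y) is the inner product of w with the combined feature vector, and the prediction is the candidate y with the highest score. It assumes the count-based features of the previous slides (emission = summed feature values per label). The names are hypothetical, and the weight vector is hand-picked for illustration, not a trained model.

```python
from itertools import product

# Hypothetical toy setting: 3 labels and 2-dim features, as on the earlier slides.
LABELS = ["A", "B", "C"]
INDEX = {label: k for k, label in enumerate(LABELS)}


def joint_feature(x, y):
    """Flat combined feature vector: K*K transition counts, then K*d emission sums."""
    K, d = len(LABELS), len(x[0])
    trans = [0.0] * (K * K)
    emit = [0.0] * (K * d)
    for prev, cur in zip(y, y[1:]):
        trans[INDEX[prev] * K + INDEX[cur]] += 1.0
    for frame, label in zip(x, y):
        for j, value in enumerate(frame):
            emit[INDEX[label] * d + j] += value
    return trans + emit


def compatibility(w, x, y):
    """F(x, y) = <w, Psi(x, y)>."""
    return sum(wi * fi for wi, fi in zip(w, joint_feature(x, y)))


def predict(w, x):
    """Highest-scoring label sequence, by brute force over all K**T candidates."""
    candidates = product(LABELS, repeat=len(x))
    return list(max(candidates, key=lambda y: compatibility(w, x, y)))


# Hand-picked (untrained) weights: zero transition weights; emission weights
# that favour label A for dimension z1 and label B for dimension z2.
w = [0.0] * 9 + [1.0, -1.0, -1.0, 1.0, -1.0, -1.0]
x = [[1.0, 0.0], [0.0, 1.0]]
print(predict(w, x))  # ['A', 'B']
```

Brute force is used only to keep the sketch short. Because the combined feature vector decomposes over adjacent label pairs, a Viterbi search finds the same argmax in O(T·K²) time instead of enumerating K^T sequences.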

Page 16

The combined feature vector contains the transition and emission information.

[Figure: w, Y, X]

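Page 17

Purpose:

◦ Given a training sample, we want the pair with the correct answer to be ranked first.

◦ The gap between the first pair and the others should be as large as possible.

Margin: the gap between the score of the correct pair and the highest score among the incorrect pairs.

[Figure: candidate rankings for training samples 1, 2, and 3]

Maximise the margin.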
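The margin on this slide can be written out as follows. This is a sketch in assumed notation (training samples $(x_i, y_i)$, $i = 1, \dots, n$, and compatibility function $F(x, y) = \langle w, \Psi(x, y) \rangle$), not the slide's own formula:

```latex
\gamma_i(w) = F(x_i, y_i) - \max_{y \neq y_i} F(x_i, y),
\qquad
w^{*} = \arg\max_{\lVert w \rVert = 1} \; \min_{1 \le i \le n} \gamma_i(w)
```

The norm constraint is needed because $F$ is linear in $w$, so scaling $w$ scales every margin. When the training data can be separated with a positive margin, fixing the scale instead by requiring $\gamma_i(w) \ge 1$ for all $i$ and minimising $\tfrac{1}{2}\lVert w \rVert^2$ gives the same direction of $w$. This is the hard-margin problem that the slack variables of the primal form relax.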

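Page 18

The primal form of the structural SVM:

Here the slack variables allow the margin of a training sample to fall below its target, even to become negative.

The constraints require each margin to be at least a target value, relaxed by the slack.

C: error tolerance weight (the weight of the slack variables)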
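For reference, one standard way to write this primal problem (the n-slack form with margin rescaling) is shown below. Here $\Psi(x, y)$ is the combined feature vector, $\xi_i$ is the slack of training sample $i$, and $\Delta(y_i, y)$ is a loss such as the number of mismatched labels. Two details are assumptions: whether the thesis used $\Delta$ or a constant target of 1, and whether $C$ is divided by $n$. Implementations differ on both.

```latex
\min_{w,\;\xi \ge 0} \; \frac{1}{2}\lVert w \rVert^{2} + \frac{C}{n} \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad
\langle w, \Psi(x_i, y_i) \rangle - \langle w, \Psi(x_i, y) \rangle \ge \Delta(y_i, y) - \xi_i
\quad \forall i,\; \forall y \neq y_i
```

A larger $C$ penalises slack more heavily, so the solution tolerates fewer margin violations on the training data.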

Page 19: Outline (next: Experiments)

Page 20

Corpus: TIMIT

◦ Continuous English speech

◦ Phone set: 48 phones

Frontend:

◦ MFCC/PLP + delta + delta-delta (39 dimensions in total)

◦ Processed by CMS (cepstral mean subtraction)

HMM:

◦ 3-state HMM for each phone

◦ 32 Gaussian mixtures in each state

Tandem:

◦ MLP with 1000 hidden nodes

◦ Input window: 4 previous frames, the current frame, and 4 following frames

◦ Reduced to 37 dimensions with PCA

Number of sentences: 3696 in the training set, 192 in the testing set
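The Tandem input window (4 previous frames, the current frame, and 4 following frames) can be sketched as frame stacking. This sketch assumes the 39-dimensional MFCC/PLP vectors are concatenated into a 9 × 39 = 351-dimensional MLP input. The function name and the edge-padding rule are assumptions, and the MLP and PCA steps are omitted.

```python
def stack_context(frames, left=4, right=4):
    """Concatenate each frame with its `left` previous and `right` next frames.

    Positions outside the utterance are clamped to the first or last frame
    (edge padding; an assumption, since the slides do not say how edges are handled).
    """
    T = len(frames)
    stacked = []
    for t in range(T):
        window = []
        for offset in range(-left, right + 1):
            s = min(max(t + offset, 0), T - 1)  # clamp index to [0, T - 1]
            window.extend(frames[s])
        stacked.append(window)
    return stacked


# Ten dummy 39-dim frames -> ten 9 * 39 = 351-dim MLP input vectors.
frames = [[float(t)] * 39 for t in range(10)]
inputs = stack_context(frames)
print(len(inputs), len(inputs[0]))  # 10 351
```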

Page 21

Structural SVM:

◦ Define the combined feature function as previously stated

◦ Its output dimension is 48 × 48 + 48 × 39 = 4176 (transition matrix: 48 × 48; emission matrix: 48 × 39)

◦ This assumes a first-order hypothesis; it could be extended to second order
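The dimension arithmetic above can be checked with a short helper. The second-order case assumes that transitions are counted over phone triples, giving K³ transition entries instead of K². The function name is hypothetical.

```python
def joint_feature_dim(num_phones, feature_dim, order=1):
    """Length of the combined feature vector.

    Transition block: one count per (order + 1)-tuple of phones, i.e.
    num_phones ** (order + 1). Emission block: num_phones * feature_dim.
    """
    transition = num_phones ** (order + 1)
    emission = num_phones * feature_dim
    return transition + emission


print(joint_feature_dim(48, 39))           # 4176 = 48*48 + 48*39
print(joint_feature_dim(48, 39, order=2))  # 112464 = 48**3 + 48*39
print(joint_feature_dim(48, 37))           # 4080, e.g. with a 37-dim PCA-reduced input
```

The transition block grows by a factor of 48 per extra order, so a second-order model is about 27 times larger than the first-order one.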

Page 22

[Figure: accuracy (%) of the baselines, HMM vs. HMM Tandem, annotated "Better". Best: 70.42%]

Page 23

[Figure: accuracy (%) of the structural SVM with C = 10, solving the primal form vs. solving the dual form]

In theory, the two should give the same result.

Page 24

[Figure: accuracy (%) vs. slack variable weight C (C = 1, 10, 100, 1000), with posterior input and with MFCC/PLP input. Best: 71.75%]

Accuracy increases with C.

Page 25

[Figure: accuracy (%) vs. C (C = 1, 10, 100, 1000) for the input features MFCC, PLP, MLP-MFCC, MLP-PLP, PCA-37-MLP-MFCC, and PCA-37-MLP-PLP]

Features without dimensionality reduction perform better than those with it.

Page 26

[Figure: accuracy (%), HMM (70.42%) vs. SVM-struct (71.75%), an absolute improvement of 1.33%. Annotation: "Getting worse"]

Page 27

[Figure: accuracy (%) of the first-order vs. second-order structural SVM at C = 1 and C = 10, annotated "Better" at both settings]

Page 28: Outline (next: Conclusion)

Page 29

The structural SVM performs poorly when the input is MFCC/PLP.

However, with posterior probabilities as input, the structural SVM improves over Tandem by 1.33% absolute (71.75% vs. 70.42%).
