Thesis Defense 2007

Post on 04-Jul-2015


Student: Chao-Hong Meng

Advisor: Lin-Shan Lee

June 15, 2009

[Figure: MFCC/PLP features are mapped by a Structural SVM or an HMM to a phone sequence, e.g. sh iy hv ae]

Motivation

Baseline Model

Structural SVM

Experiments

Conclusion

Speech Recognition:
◦ Modeling the mapping from the acoustic input to the output sequence directly is hard
Traditional approach:
◦ Acoustic Model
◦ Language Model

Structural SVM:
◦ A model that can handle structured output
◦ Formulated by Joachims
◦ Directly models the mapping from input $x$ to output $y$
As a preliminary study:
◦ $x$ is MFCC/PLP/Posterior
◦ $y$ is the phone sequence


Baseline Model:
◦ Monophone HMM
◦ Tandem
The difference is their inputs:
◦ HMM: MFCC/PLP
◦ Tandem: Posterior

Structural SVM

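Define a compatibility function $F(x, y)$
◦ $x$ is the feature vector sequence
◦ $y$ is the correct phone sequence of $x$
The goal is to find an $F$ such that, when all candidate phone sequences for $x$ are ranked by $F$ from high to low, the correct $y$ comes first.
We assume that $F(x, y; w) = \langle w, \Psi(x, y) \rangle$
◦ $\langle \cdot, \cdot \rangle$ is the inner product
◦ $\Psi(x, y)$ is the combined feature function, which encodes the relationship between $x$ and $y$
In this research, the transition count and the emission count are considered.
The output of $\Psi(x, y)$:
◦ (transition count, emission count)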

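Settings:
◦ 3 different phone labels (A, B, C)
◦ 2-dim feature vectors (components z1, z2)
For a training sample $(x, y)$, calculate:
◦ Transition count matrix A (indexed by the phone labels A, B, C on both sides)
◦ Emission count matrix B (indexed by the phone labels A, B, C and the feature components z1, z2)
Concatenate the rows of A and B.
The output of the combined feature function $\Psi(x, y)$ consists of:
◦ transition count
◦ emission count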
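The toy setting above can be sketched in a few lines of Python. The sample values below are hypothetical (the slide's own sample did not survive in the transcript), and the "emission count" is read here as the sum of the feature vectors assigned to each label, which is an assumption consistent with the 48 * 39 emission size quoted in the experiments.

```python
import numpy as np

PHONES = ["A", "B", "C"]  # 3 phone labels, as in the slide's setting

def count_features(x, y, num_labels):
    """Return (transition counts, emission sums) for one labelled sequence."""
    x = np.asarray(x, dtype=float)
    trans = np.zeros((num_labels, num_labels))
    emit = np.zeros((num_labels, x.shape[1]))
    for t, label in enumerate(y):
        emit[label] += x[t]              # accumulate the frame under its label
        if t > 0:
            trans[y[t - 1], label] += 1  # count the transition y[t-1] -> y[t]
    return trans, emit

# Hypothetical sample: 4 frames of 2-dim features, labelled A A B C.
x = [[1.0, 0.0], [2.0, 1.0], [0.0, 3.0], [1.0, 1.0]]
y = [0, 0, 1, 2]
A, B = count_features(x, y, len(PHONES))
psi = np.concatenate([A.ravel(), B.ravel()])  # rows of A, then rows of B
print(A)
print(B)
print(psi.shape)  # (15,) = 3*3 + 3*2
```

With this sample, A records one A-to-A, one A-to-B, and one B-to-C transition, and each row of B holds the summed feature vectors of the frames carrying that label.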

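Output label set (phone set) with size $K$: $\Sigma = \{s_1, \dots, s_K\}$
Kronecker delta: $\delta(a, b) = 1$ if $a = b$, and $0$ otherwise
Define $\Lambda(y) = (\delta(s_1, y), \dots, \delta(s_K, y))^\top \in \{0, 1\}^K$
◦ Only one element is not zero in $\Lambda(y)$
Tensor product: $a \otimes b$ is the vector of all products $a_i b_j$
Define $\Psi(x, y)$, for frames $x = (x_1, \dots, x_T)$ with labels $y = (y_1, \dots, y_T)$, as the concatenation of:
◦ Emission count: $\sum_{t=1}^{T} \Lambda(y_t) \otimes x_t$
◦ Transition count: $\sum_{t=1}^{T-1} \Lambda(y_t) \otimes \Lambda(y_{t+1})$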
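As a rough illustration of the definition in terms of the Kronecker delta and the tensor product, the sketch below builds the combined feature vector from one-hot label vectors and `np.kron`. The helper names `one_hot` and `psi`, the sample data, and the choice to place the transition block before the emission block are all assumptions made for this example.

```python
import numpy as np

def one_hot(label, K):
    """Lambda(y): K-dim vector whose only non-zero element marks the label."""
    v = np.zeros(K)
    v[label] = 1.0
    return v

def psi(x, y, K):
    """Combined feature vector built from tensor (Kronecker) products."""
    x = np.asarray(x, dtype=float)
    # Emission part: sum over frames of Lambda(y_t) (x) x_t, length K * d.
    emission = sum(np.kron(one_hot(y[t], K), x[t]) for t in range(len(y)))
    # Transition part: sum over adjacent pairs of Lambda(y_t) (x) Lambda(y_{t+1}).
    transition = sum(
        (np.kron(one_hot(y[t], K), one_hot(y[t + 1], K)) for t in range(len(y) - 1)),
        np.zeros(K * K),
    )
    return np.concatenate([transition, emission])

# Hypothetical sample: 4 frames of 2-dim features with labels 0, 0, 1, 2.
x = [[1.0, 0.0], [2.0, 1.0], [0.0, 3.0], [1.0, 1.0]]
y = [0, 0, 1, 2]
features = psi(x, y, K=3)
print(features.shape)  # (15,)
```

Because each `one_hot` vector has a single non-zero entry, every Kronecker product simply drops a frame (or a count of 1) into the slot selected by the label, so the sums reproduce the flattened count matrices.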

For different pairs $(x, y)$, $\Psi(x, y)$ takes a different value.
Recall that we define the compatibility function as $F(x, y; w) = \langle w, \Psi(x, y) \rangle$
◦ $w$: to be estimated, obtained from training data
◦ $\Psi(x, y)$: contains the information about transition and emission
So, for a given $x$, we have a different preference for each $y$.
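Since the compatibility function is an inner product with transition and emission features, the preferred phone sequence for a given input can be found by dynamic programming. The sketch below uses Viterbi decoding, which is an assumption here (the slides do not name the decoding algorithm); the weights and input are random placeholders, and the result is checked against brute-force enumeration.

```python
import itertools
import numpy as np

def score(w_trans, w_emit, x, y):
    """F(x, y; w) = <w, Psi(x, y)>, written out per frame."""
    s = sum(w_emit[y[t]] @ x[t] for t in range(len(y)))
    s += sum(w_trans[y[t - 1], y[t]] for t in range(1, len(y)))
    return float(s)

def viterbi(w_trans, w_emit, x):
    """Return argmax_y F(x, y; w) by dynamic programming."""
    T, K = len(x), w_trans.shape[0]
    emit_scores = x @ w_emit.T           # (T, K): score of each label per frame
    best = emit_scores[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = best[:, None] + w_trans   # cand[i, j]: best path ending in i, then i -> j
        back[t] = cand.argmax(axis=0)
        best = cand.max(axis=0) + emit_scores[t]
    y = [int(best.argmax())]
    for t in range(T - 1, 0, -1):        # follow the back-pointers
        y.append(int(back[t][y[-1]]))
    return y[::-1]

# Hypothetical weights and input: 3 labels, 2-dim features, 5 frames.
rng = np.random.default_rng(0)
K, d, T = 3, 2, 5
w_trans, w_emit = rng.normal(size=(K, K)), rng.normal(size=(K, d))
x = rng.normal(size=(T, d))

y_hat = viterbi(w_trans, w_emit, x)
y_brute = max(itertools.product(range(K), repeat=T),
              key=lambda y: score(w_trans, w_emit, x, y))
print(y_hat, score(w_trans, w_emit, x, y_hat))
```

Brute force is only feasible for a toy problem like this one (3^5 sequences); with 48 phones and hundreds of frames, the dynamic program is what keeps the search tractable.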

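Purpose:
◦ Given a training sample $(x_i, y_i)$, we want the pair with the correct answer $y_i$ to rank first among all pairs $(x_i, y)$.
◦ The gap between the first and the others should be as large as possible.
Margin: the gap between the score of the correct pair and the best score among the others, $F(x_i, y_i; w) - \max_{y \ne y_i} F(x_i, y; w)$
[Figure: margins of training samples 1, 2, and 3; the margin is to be maximised]
The primal form of the Structural SVM:
◦ $\xi_i$ are slack variables, which allow the margin to be negative.
◦ Each constraint puts a lower bound on the margin of a training sample, relaxed by its slack $\xi_i$.
◦ $C$ is the error tolerance weight.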
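For reference, one common way of writing the primal problem is given below. It assumes $n$ training samples, a fixed required margin of 1, and an unnormalised slack penalty $C \sum_i \xi_i$; the variant actually shown on the slide (for example, one with a loss-dependent margin) may differ.

```latex
\begin{aligned}
\min_{w,\,\xi}\quad & \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i \\
\text{s.t.}\quad & \langle w, \Psi(x_i, y_i) \rangle - \langle w, \Psi(x_i, y) \rangle \ge 1 - \xi_i
  \quad \forall i,\ \forall y \ne y_i \\
& \xi_i \ge 0 \quad \forall i
\end{aligned}
```

In this form, the left-hand side of each constraint is the margin of sample $i$ against a competing sequence $y$, and a larger $C$ penalises violations more heavily.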

Experiments

Corpus: TIMIT
◦ Continuous English
◦ Phone set: 48 phones
Frontend:
◦ MFCC/PLP + delta + delta-delta (39 dim in total)
◦ Processed by CMS
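Reading CMS as cepstral mean subtraction, the normalisation step can be sketched as follows. The per-utterance scope and the random input are assumptions for illustration; the slides do not say whether the mean is removed before or after the delta features are appended.

```python
import numpy as np

def cepstral_mean_subtraction(features):
    """Subtract the per-utterance mean of each coefficient from every frame."""
    features = np.asarray(features, dtype=float)
    return features - features.mean(axis=0, keepdims=True)

# Hypothetical utterance: 200 frames of 39-dim features (random values).
rng = np.random.default_rng(0)
utt = rng.normal(loc=5.0, size=(200, 39))
normalized = cepstral_mean_subtraction(utt)
print(normalized.shape)  # (200, 39)
```

After this step, every coefficient has zero mean over the utterance, which removes constant channel offsets from the cepstral features.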

HMM
◦ 3-state HMM for each phone
◦ 32 Gaussian mixtures in each state
Tandem
◦ 1000 hidden nodes
◦ Looks at the 4 previous frames, the current frame, and the 4 next frames
◦ Reduced to 37 dimensions with PCA
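The 9-frame context window of the Tandem network can be illustrated with a small frame-stacking routine. This is a sketch under assumptions: the input is taken to be the 39-dim frontend features, and utterance edges are padded by repeating the first and last frame, neither of which is stated in the slides.

```python
import numpy as np

def stack_context(features, left=4, right=4):
    """Concatenate each frame with its `left` previous and `right` next frames."""
    features = np.asarray(features, dtype=float)
    T = features.shape[0]
    # Assumption: utterance edges are padded by repeating the first/last frame.
    padded = np.pad(features, ((left, right), (0, 0)), mode="edge")
    windows = [padded[k:k + T] for k in range(left + right + 1)]
    return np.concatenate(windows, axis=1)

frames = np.arange(10 * 39, dtype=float).reshape(10, 39)  # hypothetical 10 frames
mlp_input = stack_context(frames)
print(mlp_input.shape)  # (10, 351): 9 frames * 39 dims per network input
```

Under these assumptions each network input has 351 values, with the current frame sitting in the middle block of the stacked vector.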

                  Training set    Testing set
# of Sentences    3696            192

Structural SVM
◦ Define the combined feature function $\Psi(x, y)$ as previously stated
◦ The output dimension of $\Psi(x, y)$: 48 * 48 + 48 * 39 = 4176
◦ Transition matrix: 48 * 48 (assumes a first-order hypothesis; could be second order)
◦ Emission matrix: 48 * 39
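The dimension arithmetic can be checked directly. The second-order figure below is only an illustration: it assumes a second-order model counts label triples (48^3 entries) in place of label pairs, which the slides do not spell out.

```python
NUM_PHONES = 48   # phone set size from the experimental setup
FEATURE_DIM = 39  # MFCC/PLP + delta + delta-delta

transition_dim = NUM_PHONES * NUM_PHONES  # first-order transition matrix
emission_dim = NUM_PHONES * FEATURE_DIM   # emission matrix
first_order_dim = transition_dim + emission_dim
print(transition_dim, emission_dim, first_order_dim)  # 2304 1872 4176

# Assumption: a second-order model counts label triples instead of pairs.
second_order_dim = NUM_PHONES ** 3 + emission_dim
print(second_order_dim)  # 112464
```

Under that assumption, moving to second order grows the weight vector by roughly a factor of 27.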

[Figure: accuracy of the baseline models, HMM vs. HMM Tandem. Best: 70.42%. Annotation: "Better"]

[Figure: accuracy when solving the primal form vs. solving the dual form, both with C=10]
In theory, they should be the same.

[Figure: accuracy vs. slack variable weight C (1, 10, 100, 1000), for input MFCC/PLP and input Posterior. Best: 71.75%]
Accuracy increases with C.

[Figure: accuracy vs. C (1, 10, 100, 1000) for inputs MFCC, PLP, MLP-MFCC, MLP-PLP, PCA-37-MLP-MFCC, and PCA-37-MLP-PLP]
Inputs without dimension reduction perform better than those with dimension reduction.

[Figure: accuracy of HMM vs. SVM-struct. 71.75% vs. 70.42%, a 1.33% absolute improvement. Annotation: "Getting Worse"]

[Figure: accuracy of first-order vs. second-order models, for C = 1 and C = 10. Annotation: "Better" in both cases]

Conclusion

The Structural SVM performs badly when the input is MFCC/PLP.

However, with posterior probabilities as input, the Structural SVM improves on Tandem by about 1% absolute (71.75% vs. 70.42%).
