Speaker Identification using Gaussian Mixture Model

Feb 03, 2016

gerodi

Transcript
Page 1: Speaker Identification using          Gaussian Mixture Model

2000/05/03 1

Speaker Identification using Gaussian Mixture Model

Presented by CWJ

Page 2: Speaker Identification using          Gaussian Mixture Model

Reference

D. A. Reynolds and R. C. Rose, “Robust Text-Independent Speaker Identification Using Gaussian Mixture Speaker Models”, IEEE Trans. on Speech and Audio Processing, vol. 3, no. 1, pp. 72-83, January 1995.

Page 3: Speaker Identification using          Gaussian Mixture Model

Outline

1. Introduction to Speaker Recognition

2. Gaussian Mixture Speaker Model (GMM)

3. Experimental Evaluation

Page 4: Speaker Identification using          Gaussian Mixture Model

Introduction to Speaker Recognition

A. Some definitions of speaker recognition (S.R.)

1. Two tasks of Speaker Recognition

-- Speaker Identification (this paper)

e.g. voice mail labeling

-- Speaker Verification

e.g. financial transactions

Page 5: Speaker Identification using          Gaussian Mixture Model

2. Two forms of spoken input

-- Text-dependent

-- Text-independent (this paper)

3. System Range

-- Closed Set (this paper)

-- Open Set

Page 6: Speaker Identification using          Gaussian Mixture Model

B. Several Methods used in Speaker Recognition

[Figure: timeline from 1985 to 1995 of the methods used in speaker recognition: VQ, NN, HMM, GMM]

Page 7: Speaker Identification using          Gaussian Mixture Model

1. Long-term averages of acoustic features (spectrum, pitch, …): the earliest approach

Idea : average out the factors that influence intra-speaker variation, leaving only the speaker-dependent component.

Drawback : requires long speech utterances (> 20 s)
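
As a rough illustration of the long-term averaging idea, the sketch below averages per-frame feature vectors over a whole utterance. The function name `long_term_average` and the toy frames are assumptions made for illustration; they are not taken from the slides.

```python
from typing import List, Sequence


def long_term_average(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average each feature dimension over all frames of one utterance."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    totals = [0.0] * dim
    for frame in frames:
        for d in range(dim):
            totals[d] += frame[d]
    return [total / len(frames) for total in totals]


# Toy 2-dimensional frames (made-up numbers, not real speech features).
frames = [[1.0, 4.0], [3.0, 6.0], [2.0, 5.0]]
print(long_term_average(frames))
```

A single averaged vector per utterance is only stable when many frames are available, which is consistent with the drawback noted on this slide.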

Page 8: Speaker Identification using          Gaussian Mixture Model

2. Training a speaker-dependent (SD) model for each speaker

-- Explicit segmentation : HMM

-- Implicit segmentation : VQ, GMM

Page 9: Speaker Identification using          Gaussian Mixture Model

HMM:

Advantage : Text-independent

Drawback : a significant increase in computational complexity

VQ:

Advantage : unsupervised clustering

Drawback : Text-dependent

Page 10: Speaker Identification using          Gaussian Mixture Model

3. The use of a discriminative Neural Network (NN)

※ models the decision function that best discriminates between speakers

Advantage : fewer parameters and higher performance compared to the VQ model

Drawback : the network must be retrained when a new speaker is added to the system.

Page 11: Speaker Identification using          Gaussian Mixture Model

GMM :

Advantages :

-- Text-independent

-- probabilistic framework (robust)

-- computationally efficient

-- easy to implement

Page 12: Speaker Identification using          Gaussian Mixture Model

The Gaussian mixture model (GMM)

A. Model Interpretations

Speech Recognition

(GMM) State Level

Page 13: Speaker Identification using          Gaussian Mixture Model

Speaker Recognition : Speaker k

[Figure: GMM of speaker k; component i has mean $\mu_i$, covariance $\Sigma_i$, and mixture weight $p_i$, and corresponds to one acoustic class]

1. Each Gaussian component models an acoustic class

Page 14: Speaker Identification using          Gaussian Mixture Model

2. A GMM gives a better approximation of arbitrarily shaped densities.

Page 15: Speaker Identification using          Gaussian Mixture Model

Page 16: Speaker Identification using          Gaussian Mixture Model

B. Signal Analysis

Page 17: Speaker Identification using          Gaussian Mixture Model

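C. Model Description

Gaussian Mixture Density

$$p(x \mid \lambda) = \sum_{i=1}^{M} p_i \, b_i(x)$$

where $x$ is a D-dimensional random vector and

$$b_i(x) = \frac{1}{(2\pi)^{D/2} \, |\Sigma_i|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu_i)' \, \Sigma_i^{-1} (x - \mu_i) \right\}$$

$$\lambda = \{ p_i, \mu_i, \Sigma_i \}, \quad i = 1, \ldots, M$$

Covariance choices : nodal, grand, global

This paper : nodal, diagonal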
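
To make the mixture density concrete, here is a minimal sketch that evaluates $p(x \mid \lambda)$ for a model with nodal, diagonal covariances. The function names and the toy one-dimensional model are assumptions made for illustration; the slides do not prescribe an implementation.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log of b_i(x) for one Gaussian with a diagonal covariance matrix."""
    dim = len(x)
    log_det = sum(math.log(v) for v in var)  # log |Sigma_i| for a diagonal matrix
    quad = sum((xj - mj) ** 2 / vj for xj, mj, vj in zip(x, mean, var))
    return -0.5 * (dim * math.log(2.0 * math.pi) + log_det + quad)


def gmm_density(x, weights, means, variances):
    """p(x | lambda) = sum over i of p_i * b_i(x)."""
    return sum(
        w * math.exp(log_gaussian_diag(x, m, v))
        for w, m, v in zip(weights, means, variances)
    )


# Toy model (made-up values): M = 2 components, D = 1.
weights = [0.5, 0.5]
means = [[0.0], [2.0]]
variances = [[1.0], [1.0]]
print(gmm_density([0.0], weights, means, variances))
```

With a diagonal covariance the determinant is the product of the variances and the quadratic form is a per-dimension sum, so no matrix inversion is needed.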

Page 18: Speaker Identification using          Gaussian Mixture Model

D. ML Parameter Estimation

Steps :

1. Begin with an initial model $\lambda$.

2. Estimate a new model $\bar{\lambda}$ such that $p(X \mid \bar{\lambda}) \ge p(X \mid \lambda)$.

3. Repeat step 2 until convergence is reached.

Page 19: Speaker Identification using          Gaussian Mixture Model

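Mixture Weights

$$\bar{p}_i = \frac{1}{T} \sum_{t=1}^{T} p(i \mid x_t, \lambda)$$

Means

$$\bar{\mu}_i = \frac{\sum_{t=1}^{T} p(i \mid x_t, \lambda) \, x_t}{\sum_{t=1}^{T} p(i \mid x_t, \lambda)}$$

Variances

$$\bar{\sigma}_i^2 = \frac{\sum_{t=1}^{T} p(i \mid x_t, \lambda) \, x_t^2}{\sum_{t=1}^{T} p(i \mid x_t, \lambda)} - \bar{\mu}_i^2$$

A posteriori probability of acoustic class i

$$p(i \mid x_t, \lambda) = \frac{p_i \, b_i(x_t)}{\sum_{k=1}^{M} p_k \, b_k(x_t)}$$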
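
The re-estimation formulas on this slide can be sketched as one pass over the training vectors. This is a minimal pure-Python sketch under stated assumptions: diagonal covariances, and the names `component_density` and `em_step` are hypothetical. It applies no variance floor and does not guard against a component that receives no frames.

```python
import math


def component_density(x, mean, var):
    """b_i(x) for a diagonal-covariance Gaussian."""
    log_b = -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xj - m) ** 2 / v
        for xj, m, v in zip(x, mean, var)
    )
    return math.exp(log_b)


def em_step(data, weights, means, variances):
    """One re-estimation pass; returns the new (weights, means, variances)."""
    n_frames, n_mix, dim = len(data), len(weights), len(data[0])
    # A posteriori probability p(i | x_t, lambda) for every frame and component.
    post = []
    for x in data:
        joint = [w * component_density(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(joint)
        post.append([j / total for j in joint])

    new_w, new_m, new_v = [], [], []
    for i in range(n_mix):
        occ = sum(post[t][i] for t in range(n_frames))  # soft frame count
        mean_i = [sum(post[t][i] * data[t][d] for t in range(n_frames)) / occ
                  for d in range(dim)]
        var_i = [sum(post[t][i] * data[t][d] ** 2 for t in range(n_frames)) / occ
                 - mean_i[d] ** 2
                 for d in range(dim)]
        new_w.append(occ / n_frames)
        new_m.append(mean_i)
        new_v.append(var_i)
    return new_w, new_m, new_v


# Toy 1-dimensional data and starting model (made-up values).
data = [[-1.1], [-0.9], [1.0], [1.2]]
model = ([0.5, 0.5], [[-1.0], [1.0]], [[1.0], [1.0]])
for _ in range(5):
    model = em_step(data, *model)
print(model)
```

Each call corresponds to one "estimate a new model" step; in practice the loop would stop once the likelihood stops improving rather than after a fixed number of passes.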

Page 20: Speaker Identification using          Gaussian Mixture Model

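E. Speaker Identification

A group of speakers S = {1, 2, …, S} is represented by GMMs $\lambda_1, \lambda_2, \ldots, \lambda_S$.

$$\hat{S} = \arg\max_{1 \le k \le S} \Pr(\lambda_k \mid X) = \arg\max_{1 \le k \le S} \frac{p(X \mid \lambda_k) \Pr(\lambda_k)}{p(X)}$$

Assuming equally likely speakers, and since $p(X)$ is the same for all speaker models :

$$\hat{S} = \arg\max_{1 \le k \le S} p(X \mid \lambda_k)$$

Take the log, assuming independent observations :

$$\hat{S} = \arg\max_{1 \le k \le S} \sum_{t=1}^{T} \log p(x_t \mid \lambda_k)$$

in which

$$p(x_t \mid \lambda_k) = \sum_{i=1}^{M} p_i \, b_i(x_t)$$

with $p_i$ and $b_i$ taken from model $\lambda_k$.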
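
The decision rule can be sketched as follows: score the test vectors against every speaker model and pick the model with the largest summed log-likelihood. The data layout (each model as a list of `(weight, mean, variance)` triples with diagonal covariance), the function names, and the two toy speakers are assumptions made for illustration.

```python
import math


def log_gmm(x, model):
    """log p(x | lambda) for a model given as (weight, mean, variance) triples."""
    terms = []
    for weight, mean, var in model:
        log_b = -0.5 * sum(
            math.log(2.0 * math.pi * v) + (xj - m) ** 2 / v
            for xj, m, v in zip(x, mean, var)
        )
        terms.append(math.log(weight) + log_b)
    peak = max(terms)  # log-sum-exp keeps small likelihoods from underflowing
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def identify(frames, models):
    """Return the key of the model with the highest summed log-likelihood."""
    scores = {
        speaker: sum(log_gmm(x, model) for x in frames)
        for speaker, model in models.items()
    }
    return max(scores, key=scores.get)


# Two toy 1-dimensional speaker models (made-up values).
models = {
    "speaker_a": [(0.5, [0.0], [1.0]), (0.5, [1.0], [1.0])],
    "speaker_b": [(0.5, [5.0], [1.0]), (0.5, [6.0], [1.0])],
}
print(identify([[0.2], [0.8], [0.5]], models))  # prints speaker_a
```

Summing log-likelihoods instead of multiplying likelihoods avoids numerical underflow when T is in the hundreds.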

Page 21: Speaker Identification using          Gaussian Mixture Model

Experimental Evaluation

A. Performance Evaluation

$$\underbrace{x_1, x_2, \ldots, x_T}_{\text{Segment 1}}, x_{T+1}, x_{T+2}, \ldots$$

$$x_1, \underbrace{x_2, \ldots, x_T, x_{T+1}}_{\text{Segment 2}}, x_{T+2}, \ldots$$

e.g. frame rate = 10 ms, T = 500, so the length of a test utterance = 5 seconds

Page 22: Speaker Identification using          Gaussian Mixture Model

$$\%\ \text{correct identification} = \frac{\#\ \text{of correctly identified segments}}{\text{total}\ \#\ \text{of segments}} \times 100$$
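
The evaluation measure can be sketched in a few lines, assuming that consecutive test segments are shifted by one frame and that a per-segment decision is already available. The function names and the toy inputs are hypothetical.

```python
def overlapping_segments(frames, seg_len):
    """Segments of seg_len frames, each shifted by one frame from the last."""
    return [frames[start:start + seg_len]
            for start in range(len(frames) - seg_len + 1)]


def percent_correct(decisions, true_speaker):
    """Percentage of segments whose decision matches the true speaker."""
    hits = sum(1 for decision in decisions if decision == true_speaker)
    return 100.0 * hits / len(decisions)


frames = list(range(1, 7))  # stand-ins for x_1 ... x_6
print(overlapping_segments(frames, 4))
print(percent_correct(["a", "a", "b", "a"], "a"))
```

In a full evaluation, each segment would be passed to the identification rule and the resulting decisions pooled over all test speakers.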

Page 23: Speaker Identification using          Gaussian Mixture Model

C. Algorithmic Issues

1. Model Initialization :

-- Use the means of speaker-independent (SI), context-dependent subword HMMs and their global variance.

-- Randomly choose 50 vectors as the initial model means, and use an identity matrix as the starting covariance matrix.
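
The random initialization can be sketched as below. The equal starting weights are an assumption, since the slide does not say how the mixture weights are initialized; the function name `init_model` and the fixed seed are also hypothetical.

```python
import random


def init_model(data, n_mix, seed=0):
    """Pick n_mix training vectors as means; start from identity covariance."""
    rng = random.Random(seed)
    means = [list(vector) for vector in rng.sample(data, n_mix)]
    dim = len(data[0])
    weights = [1.0 / n_mix] * n_mix  # assumption: equal initial weights
    variances = [[1.0] * dim for _ in range(n_mix)]  # diagonal of the identity
    return weights, means, variances


# Toy training vectors (made-up values); the slide uses 50 means.
data = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
print(init_model(data, 2))
```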

Page 24: Speaker Identification using          Gaussian Mixture Model

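2. Variance Limiting :

When training a nodal variance GMM, the magnitude of a variance can become very small,

so, give the constraint

$$\bar{\sigma}_i^2 = \begin{cases} \sigma_i^2 & \text{if } \sigma_i^2 > \sigma_{\min}^2 \\ \sigma_{\min}^2 & \text{if } \sigma_i^2 \le \sigma_{\min}^2 \end{cases}$$

The minimum variance, $\sigma_{\min}^2$, is determined empirically.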
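
Applied after each re-estimation pass, the constraint amounts to flooring every variance element. A minimal sketch follows; the function name and the floor value 0.01 are made up for illustration, since the slide only says the minimum is determined empirically.

```python
def limit_variances(variances, var_min):
    """Replace every variance element below var_min with var_min."""
    return [[max(v, var_min) for v in component] for component in variances]


# Two components with 2-dimensional diagonal variances (made-up values).
print(limit_variances([[0.5, 1e-6], [0.02, 0.3]], 0.01))
```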

Page 25: Speaker Identification using          Gaussian Mixture Model

3. Model Order :

I. Performance versus model order.

1,2,4,8,16,32,64

Page 26: Speaker Identification using          Gaussian Mixture Model

II. Performance for different amounts of training data and model orders

III. Performance versus model order for models trained with 30, 60, and 90 s of speech.

Page 27: Speaker Identification using          Gaussian Mixture Model

4. Spectral Variability Compensation :

1) Frequency Warping :

$$f' = \frac{f_{\max} - f_{\min}}{f_N} \, f + f_{\min}$$

$f_N$ : original Nyquist frequency
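
Assuming the warping is the linear map $f' = \frac{f_{\max} - f_{\min}}{f_N} f + f_{\min}$, which sends the range 0 to $f_N$ onto $f_{\min}$ to $f_{\max}$, it can be sketched as a one-line function. The band edges used in the example (300 Hz, 3300 Hz) and the 4000 Hz Nyquist frequency are made-up illustration values.

```python
def warp_frequency(f, f_min, f_max, f_nyquist):
    """Linearly map f in [0, f_nyquist] onto [f_min, f_max]."""
    return (f_max - f_min) / f_nyquist * f + f_min


for f in (0.0, 2000.0, 4000.0):
    print(f, warp_frequency(f, f_min=300.0, f_max=3300.0, f_nyquist=4000.0))
```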

Page 28: Speaker Identification using          Gaussian Mixture Model

2) Spectral Shape Compensation :

Assumption :

[Figure: Speaker → Channel → Signal Processing, with the channel frequency response plotted against f]

mel-cepstral feature vector : $z = x + h$ (x : speech, h : channel)

Page 29: Speaker Identification using          Gaussian Mixture Model

‧mean normalization for a time-invariant (T.I.) channel filter (CMS)

$$m = \frac{1}{T} \sum_{t=1}^{T} z_t$$

$$z_t^{comp} = z_t - m$$

‧use a “channel invariant” feature (delta-cepstral)
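
The mean normalization step can be sketched as follows: subtract the utterance mean from every cepstral vector, which cancels any constant additive offset. The function name and the toy vectors are assumptions made for illustration.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-dimension utterance mean from every frame."""
    n_frames = len(frames)
    dim = len(frames[0])
    mean = [sum(frame[d] for frame in frames) / n_frames for d in range(dim)]
    return [[frame[d] - mean[d] for d in range(dim)] for frame in frames]


# Toy cepstral vectors (made-up values) and the same vectors with a
# constant channel offset h = [0.5, -1.0] added to every frame.
clean = [[1.0, 2.0], [3.0, 6.0]]
shifted = [[1.5, 1.0], [3.5, 5.0]]
print(cepstral_mean_subtraction(clean))
print(cepstral_mean_subtraction(shifted))
```

Both calls give the same result, which is the point of the compensation: a time-invariant channel contributes only to the mean.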

Page 30: Speaker Identification using          Gaussian Mixture Model

5. Large Population Performance :