1 April 2012 Dereverberation in the STFT and log mel-frequency feature domains Takuya Yoshioka
Aug 07, 2015

Page 1: Dereverberation in the stft and log mel frequency feature domains

Dereverberation in the STFT and log mel-frequency feature domains

Takuya Yoshioka

1 April 2012

Page 2: Dereverberation in the stft and log mel frequency feature domains

“Dereverberation is necessary for many speech applications”

Page 3: Dereverberation in the stft and log mel frequency feature domains

[Figure: ASR (connected digit recognition). Word error rate in % (axis 0 to 30) versus T60 in seconds (0.2 to 0.6).]

Page 4: Dereverberation in the stft and log mel frequency feature domains

[Figure: ASR (LVCSR using WSJ-20K). Word error rate in % (axis 0 to 100); category labels as extracted: Clean training +MLLR, Multi-style training.]

Page 5: Dereverberation in the stft and log mel frequency feature domains

[Figure: Source separation. SNR in dB (axis 0 to 12) for T60 = 0.3 s and T60 = 0.5 s.]

Page 6: Dereverberation in the stft and log mel frequency feature domains

And others…

• Source localization

• Adaptive beamforming

• VAD

Page 7: Dereverberation in the stft and log mel frequency feature domains

“Dereverberation is necessary for many speech applications”

Page 8: Dereverberation in the stft and log mel frequency feature domains

Acoustic feature extraction process

Microphone → STFT → $|\cdot|^2$ → Mel FB → Log compression → DCT → Δ, ΔΔ → Decoder
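The front half of this chain, up to log compression, can be sketched in plain Python. This is a minimal illustration, not the front-end used in the talk: the function names, the naive DFT, the triangular filter shape, the 8 kHz sampling rate and the flooring constant are all assumptions, and the DCT and Δ/ΔΔ stages are left out.

```python
import cmath
import math

def power_spectrum(frame):
    """Naive DFT of one (already windowed) frame; returns |X[k]|^2 for k = 0..N/2."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spec.append(abs(acc) ** 2)
    return spec

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale (assumed shape)."""
    top = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [sample_rate / 2.0 * b / (n_bins - 1) for b in range(n_bins)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def log_mel(frame, n_filters=24, sample_rate=8000, floor=1e-10):
    """STFT frame -> power spectrum -> mel filterbank -> log compression."""
    power = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(power), sample_rate)
    return [math.log(max(sum(w * p for w, p in zip(row, power)), floor)) for row in bank]
```

A 512-sample frame gives the 257 bins mentioned later in the deck, and `n_filters=24` matches the 24 log mel coefficients; shorter frames are convenient for quick experiments.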

Page 9: Dereverberation in the stft and log mel frequency feature domains

Acoustic feature extraction process

STFT coefficients: fully benefit from the use of microphone arrays

Page 10: Dereverberation in the stft and log mel frequency feature domains

Acoustic feature extraction process

Power spectra: easy to combine with noise suppressors

Page 11: Dereverberation in the stft and log mel frequency feature domains

Acoustic feature extraction process

Log mel-frequency features: efficient for reducing the acoustic mismatch between observations and training data

Page 12: Dereverberation in the stft and log mel frequency feature domains

Notations

$n$: frame index

$y_n$: corrupted vector

$x_n$: clean vector

$\hat{x}_n$: estimate of $x_n$

Page 13: Dereverberation in the stft and log mel frequency feature domains

Optimal estimation in the MMSE sense

$$\hat{x}_n = \int x_n \, p_{X|Y,Y_{past}}(x_n | y_n, \dots, y_1) \, dx_n$$

Page 14: Dereverberation in the stft and log mel frequency feature domains

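Generative approach (using Bayes' rule)

$$\hat{x}_n = \int x_n \, p_{X|Y,Y_{past}}(x_n | y_n, \dots, y_1) \, dx_n$$

$$p_{X|Y,Y_{past}}(x_n | y_n, \dots, y_1) \propto p_{Y|X,Y_{past}}(y_n | x_n, y_{n-1}, \dots, y_1) \times p_X(x_n)$$

Reverberation model: $p_{Y|X,Y_{past}}(y_n | x_n, y_{n-1}, \dots, y_1)$

Clean speech model: $p_X(x_n)$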

Page 16: Dereverberation in the stft and log mel frequency feature domains

STFT domain (linear prediction)

• Clean speech model

• Reverberation model

• Posterior distribution

• Parameter estimation

Log mel-frequency feature domain (VTS)

• Clean speech model

• Reverberation model

• Posterior distribution

• Parameter estimation

Page 18: Dereverberation in the stft and log mel frequency feature domains

Notations

$n$: frame index

$y_n$: corrupted complex-valued spectrum (consisting of 257 bins)

$x_n$: clean complex-valued spectrum

$\hat{x}_n$: estimate of $x_n$

Page 19: Dereverberation in the stft and log mel frequency feature domains

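Clean STFT coefficients: normally distributed

$$p_X(x_n; \Lambda_{X,n}) = \prod_j f_{CN}(x_{n,j}; 0, \lambda^X_{n,j})$$

Model: form of the clean PSD; parameters

• No model: $\lambda^X_{n,j}$ unconstrained; parameters $\lambda^X_{n,1}, \dots, \lambda^X_{n,J}$

• All-pole model: $\lambda^X_{n,j} = \sigma^X_n / \left| 1 - \sum_p a^X_{n,p} e^{-i \omega_j p} \right|^2$; parameters $(a^X_{n,p})_{p=1,\dots,P}$, $\sigma^X_n$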
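As an illustration of the all-pole form, the sketch below evaluates a clean PSD from a set of pole coefficients. The function name, the choice of bin frequencies spread evenly over [0, π], and the sign convention in the denominator are assumptions made for this example rather than details taken from the slide.

```python
import cmath
import math

def all_pole_psd(a, sigma, n_bins):
    """lambda_j = sigma / |1 - sum_p a[p-1] * exp(-i * omega_j * p)|^2 for each bin j."""
    psd = []
    for j in range(n_bins):
        omega = math.pi * j / (n_bins - 1)  # assumed bin frequency in [0, pi]
        denom = 1.0 - sum(a_p * cmath.exp(-1j * omega * (p + 1)) for p, a_p in enumerate(a))
        psd.append(sigma / abs(denom) ** 2)
    return psd
```

With an empty coefficient list the PSD is flat and equal to `sigma`, while a single positive coefficient boosts the low-frequency bins. The "no model" alternative simply treats every bin variance as a free parameter.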

Page 21: Dereverberation in the stft and log mel frequency feature domains

1-source 1-microphone case: multi-step LP

$$y_{n,j} = x_{n,j} + \sum_{p \ge \Delta} g_{p,j} \, y_{n-p,j}$$

Observed sequence: $(y_{n,j})_{n=1,2,\dots}$

Clean sequence: $(x_{n,j})_{n=1,2,\dots}$
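The multi-step linear prediction model for one frequency bin can be simulated directly. In this sketch the function name and the indexing are assumptions: `g[p]` is the coefficient applied at delay `delta + p`, so the list starts at the first predicted delay Δ.

```python
def reverberate(x, g, delta):
    """y[n] = x[n] + sum_p g[p] * y[n - delta - p] for a single frequency bin."""
    y = []
    for n, x_n in enumerate(x):
        # Only past observations at least `delta` frames old contribute.
        tail = sum(g_p * y[n - delta - p] for p, g_p in enumerate(g) if n - delta - p >= 0)
        y.append(x_n + tail)
    return y
```

Feeding in an impulse shows the recursive structure: each observed frame re-enters the sum Δ frames later, so a single coefficient already produces an exponentially decaying tail.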

Page 23: Dereverberation in the stft and log mel frequency feature domains

$$p_{Y|X,Y_{past}}(y_{n,j} | x_{n,j}, y_{n-1,j}, \dots, y_{1,j}; \Lambda_R) = \delta\Big(y_{n,j} - \sum_p g_{p,j} \, y_{n-p,j} - x_{n,j}\Big)$$

Page 25: Dereverberation in the stft and log mel frequency feature domains

When model parameters are known

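$$p_{X|Y,Y_{past}}(x_{n,j} | y_{n,j}, \dots, y_{1,j}; \hat{\Lambda}_X, \hat{\Lambda}_R) = \delta\Big(x_{n,j} - y_{n,j} + \sum_p \hat{g}_{p,j} \, y_{n-p,j}\Big)$$

$$\hat{x}_{n,j} = y_{n,j} - \sum_p \hat{g}_{p,j} \, y_{n-p,j}$$

Inverse filtering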
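Inverse filtering with known prediction coefficients amounts to subtracting the predicted reverberant part from each observed frame. The sketch below assumes the same hypothetical indexing as before, with `g[p]` applied at delay `delta + p`.

```python
def inverse_filter(y, g, delta):
    """x_hat[n] = y[n] - sum_p g[p] * y[n - delta - p] for a single frequency bin."""
    x_hat = []
    for n, y_n in enumerate(y):
        # Prediction of the reverberant component from sufficiently old observations.
        tail = sum(g_p * y[n - delta - p] for p, g_p in enumerate(g) if n - delta - p >= 0)
        x_hat.append(y_n - tail)
    return x_hat
```

Because the posterior is a delta function when the parameters are known, this estimate is exact under the model: applying it to a sequence generated with the same coefficients returns the clean sequence.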

Page 27: Dereverberation in the stft and log mel frequency feature domains

ML for parameter estimation

$$L(\Lambda_X, \Lambda_R) = \sum_j \sum_n \log p_{Y|Y_{past}}(y_{n,j} | y_{n-1,j}, \dots, y_{1,j}; \Lambda_X, \Lambda_R)$$

Page 28: Dereverberation in the stft and log mel frequency feature domains

ML for parameter estimation

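$$L(\Lambda_X, \Lambda_R) = \sum_j \sum_n \log p_{Y|Y_{past}}(y_{n,j} | y_{n-1,j}, \dots, y_{1,j}; \Lambda_X, \Lambda_R)$$

Each term combines the reverberation model and the clean speech model:

$$p_{Y|X,Y_{past}}(y_{n,j} | x_{n,j}, y_{n-1,j}, \dots, y_{1,j}; \Lambda_R) = \delta\Big(y_{n,j} - \sum_p g_{p,j} \, y_{n-p,j} - x_{n,j}\Big)$$

$$p_X(x_n; \Lambda_{X,n}) = \prod_j f_{CN}(x_{n,j}; 0, \lambda^X_{n,j})$$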

Page 29: Dereverberation in the stft and log mel frequency feature domains

ML for parameter estimation

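$$L(\Lambda_X, \Lambda_R) = \sum_j \sum_n \log p_{Y|Y_{past}}(y_{n,j} | y_{n-1,j}, \dots, y_{1,j}; \Lambda_X, \Lambda_R)$$

$$= -\sum_j \sum_n \left( \log \lambda^X_{n,j} + \frac{\left| y_{n,j} - \sum_p g_{p,j} \, y_{n-p,j} \right|^2}{\lambda^X_{n,j}} \right) + \text{const.}$$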
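The closed-form objective can be evaluated for one frequency bin as follows. This is a sketch under assumptions: the function name is hypothetical, the additive constant is dropped, the sign is flipped so that smaller is better, and `g[p]` is applied at delay `delta + p`.

```python
import math

def neg_log_likelihood(y, g, delta, lam):
    """Sum over frames of log(lam[n]) + |y[n] - prediction|^2 / lam[n] for one bin."""
    total = 0.0
    for n, y_n in enumerate(y):
        pred = sum(g_p * y[n - delta - p] for p, g_p in enumerate(g) if n - delta - p >= 0)
        total += math.log(lam[n]) + abs(y_n - pred) ** 2 / lam[n]
    return total
```

Summing this quantity over all bins and negating it gives the log-likelihood up to a constant. Coefficients that explain the reverberant tail shrink the residual energy and therefore lower the value.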

Page 30: Dereverberation in the stft and log mel frequency feature domains

ML for parameter estimation

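$$L(\Lambda_X, \Lambda_R) = \sum_j \sum_n \log p_{Y|Y_{past}}(y_{n,j} | y_{n-1,j}, \dots, y_{1,j}; \Lambda_X, \Lambda_R)$$

$$= -\sum_j \sum_n \left( \log \lambda^X_{n,j} + \frac{\left| y_{n,j} - \sum_p g_{p,j} \, y_{n-p,j} \right|^2}{\lambda^X_{n,j}} \right) + \text{const.}$$

If $\hat{\lambda}^X_{n,j}$ is known:

$$\hat{\Lambda}_{R,j} = \arg\min_{\Lambda_{R,j}} \sum_n \frac{\left| y_{n,j} - \sum_p g_{p,j} \, y_{n-p,j} \right|^2}{\hat{\lambda}^X_{n,j}}$$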
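With the variances fixed, the minimization is a weighted least-squares problem. The sketch below handles the simplest case of a single prediction coefficient at delay `delta`, which has a closed-form solution; the restriction to one coefficient and the function name are assumptions made to keep the example short, and a multi-coefficient version would solve the corresponding normal equations instead.

```python
def update_tap(y, delta, lam):
    """argmin over complex g of sum_n |y[n] - g * y[n - delta]|^2 / lam[n]."""
    num = 0j
    den = 0.0
    for n in range(delta, len(y)):
        past = y[n - delta]
        # Setting the derivative with respect to conj(g) to zero gives num / den.
        num += complex(past).conjugate() * y[n] / lam[n]
        den += abs(past) ** 2 / lam[n]
    return num / den if den > 0.0 else 0j
```

Frames with a large clean-speech variance receive a small weight, so the estimate is driven mainly by frames where the observation is dominated by reverberation.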

Page 31: Dereverberation in the stft and log mel frequency feature domains

Iterative optimization

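1. Initialize $\hat{\Lambda}_R$.

2. Inverse filtering with the current $\hat{\Lambda}_R$.

3. Update $\hat{\Lambda}_X$.

4. Convergent? If not, update $\hat{\Lambda}_R$ and return to step 2.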
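The alternating procedure can be sketched for one frequency bin. Several simplifications here are assumptions rather than details from the slide: a single prediction coefficient at delay `delta`, the "no model" clean PSD (whose maximum-likelihood variance is the squared magnitude of the current estimate), a small variance floor, and a fixed iteration count in place of a convergence test.

```python
def dereverberate_bin(y, delta, n_iter=5, floor=1e-6):
    """Alternate inverse filtering, variance update and coefficient update for one bin."""
    def inverse_filter(g):
        return [y[n] - (g * y[n - delta] if n >= delta else 0) for n in range(len(y))]

    g = 0j  # initial Lambda_R (assumed starting point: no reverberation)
    for _ in range(n_iter):
        x_hat = inverse_filter(g)
        # Update Lambda_X: per-frame variance of the current clean estimate, floored.
        lam = [max(abs(v) ** 2, floor) for v in x_hat]
        # Update Lambda_R: weighted least squares with the variances held fixed.
        num = sum(complex(y[n - delta]).conjugate() * y[n] / lam[n]
                  for n in range(delta, len(y)))
        den = sum(abs(y[n - delta]) ** 2 / lam[n] for n in range(delta, len(y)))
        if den > 0:
            g = num / den
    return g, inverse_filter(g)
```

In practice the loop would run independently for every frequency bin and stop once the likelihood no longer improves.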

Page 32: Dereverberation in the stft and log mel frequency feature domains

Why LP model for reverberation?

The chain rule is applicable to derive the likelihood function

Page 33: Dereverberation in the stft and log mel frequency feature domains

Drawback

Non-minimum phase terms cannot be accurately modeled

“Solution: using extra microphones”

Page 34: Dereverberation in the stft and log mel frequency feature domains

Extensions

• Integration with source separation

• Integration with additive noise reduction

• Adaptive inverse filtering

 – Using an RLS-like algorithm

• Application to music signals

 – Using a clean source model accounting for strong harmonic structures

• Exploiting prior knowledge on room properties

Page 36: Dereverberation in the stft and log mel frequency feature domains

Notations

$n$: frame index

$y_n$: corrupted log mel-frequency feature (consisting of 24 coefficients)

$x_n$: clean log mel-frequency feature

$\hat{x}_n$: estimate of $x_n$

Page 37: Dereverberation in the stft and log mel frequency feature domains

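Clean features: pre-trained GMM

$$p_X(x_n; \Lambda_X) = \sum_k \pi_k \, f_N(x_n; \mu^X_k, \Sigma^X_k)$$

The component density $f_N(x_n; \mu^X_k, \Sigma^X_k)$ is denoted by $p_{X|K}(x_n | k; \Lambda_X)$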
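Evaluating the clean-feature GMM at a feature vector can be sketched as below. Diagonal covariance matrices are an assumption made for brevity (the slide does not state the covariance structure), and the function and argument names are hypothetical.

```python
import math

def gmm_pdf(x, weights, means, variances):
    """p(x) = sum_k pi_k * N(x; mu_k, diag(var_k)) with assumed diagonal covariances."""
    total = 0.0
    for pi_k, mu, var in zip(weights, means, variances):
        log_n = 0.0
        for x_d, m_d, v_d in zip(x, mu, var):
            # Log-density of one dimension of the k-th Gaussian component.
            log_n += -0.5 * (math.log(2.0 * math.pi * v_d) + (x_d - m_d) ** 2 / v_d)
        total += pi_k * math.exp(log_n)
    return total
```

For the 24-dimensional features and large mixtures used later in the deck, an implementation would normally stay in the log domain throughout to avoid underflow.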

Page 39: Dereverberation in the stft and log mel frequency feature domains

Reverberation model

[Figure: room impulse response divided into direct sound, early reflections, and late reverberation.]

Page 40: Dereverberation in the stft and log mel frequency feature domains

Reverberation model

[Figure: room impulse response divided into direct sound, early reflections, and late reverberation.]

$$Y_n = H X_n + R_n$$

Page 41: Dereverberation in the stft and log mel frequency feature domains

Reverberation model

[Figure: room impulse response divided into direct sound, early reflections, and late reverberation.]

$$Y_n = H X_n + R_n$$

$R_n$: clean speech * RIR (> 50 ms)

Page 42: Dereverberation in the stft and log mel frequency feature domains
Page 43: Dereverberation in the stft and log mel frequency feature domains

Reverberation model

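[Figure: room impulse response divided into direct sound, early reflections, and late reverberation.]

$$y_n = x_n + h + \log(1 + \exp(r_n - x_n - h)) = g(x_n, r_n, h)$$

$$p_{Y|X,R}(y_n | x_n, r_n; \Lambda_R) = \delta(y_n - g(x_n, r_n, h))$$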
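The mismatch function in the log mel domain is the logarithm of a sum taken in the linear domain, which the following sketch makes explicit. The function name `mismatch` is hypothetical; the vectors are plain Python lists with one entry per mel channel.

```python
import math

def mismatch(x, r, h):
    """g(x, r, h) = x + h + log(1 + exp(r - x - h)), applied per mel channel."""
    return [x_d + h_d + math.log1p(math.exp(r_d - x_d - h_d))
            for x_d, r_d, h_d in zip(x, r, h)]
```

Each output equals `log(exp(x + h) + exp(r))`: when the late reverberation `r` is far below `x + h` the observation is close to `x + h`, and when it dominates the observation approaches `r`.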

Page 44: Dereverberation in the stft and log mel frequency feature domains

Reverberation model

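$$p_{Y|X,R}(y_n | x_n, r_n; \Lambda_R) = \delta(y_n - g(x_n, r_n, h))$$

$$p_{R|Y_{past}}(r_n | y_{n-1}, \dots, y_1; \Lambda_R) = f_N(r_n; y_{n-\Delta} + \beta^R, \Sigma^R)$$

The two densities are multiplied and integrated over $r_n$ (∫ ×).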

Page 45: Dereverberation in the stft and log mel frequency feature domains

Reverberation model

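$$p_{Y|X,Y_{past},K}(y_n | x_n, y_{n-1}, \dots, y_1, k; \Lambda_R) = f_N(y_n; \mu^{Y|X}_{n,k}, \Sigma^{Y|X}_{n,k})$$

$$\mu^{Y|X}_{n,k} = g(\mu^X_k, y_{n-\Delta} + \beta^R, h) + G(\mu^X_k, y_{n-\Delta} + \beta^R, h)(x_n - \mu^X_k)$$

$$\Sigma^{Y|X}_{n,k} = \big(I - G(\mu^X_k, y_{n-\Delta} + \beta^R, h)\big)^2 \, \Sigma^R$$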
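The Gaussian approximation comes from a first-order (VTS) expansion of the mismatch function around the component mean and the mean of the late reverberation. The per-channel sketch below assumes that `G` is the derivative of `g` with respect to `x`, so that the derivative with respect to `r` is `1 - G`; the function names are hypothetical, and `r_mean` stands for the late-reverberation mean built from the delayed observation.

```python
import math

def mismatch(x, r, h):
    """Scalar mismatch function g(x, r, h) for one mel channel."""
    return x + h + math.log1p(math.exp(r - x - h))

def jacobian_x(x, r, h):
    """G = dg/dx = 1 / (1 + exp(r - x - h)); dg/dr equals 1 - G."""
    return 1.0 / (1.0 + math.exp(r - x - h))

def vts_moments(x, mu_x, r_mean, r_var, h):
    """Mean and variance of y given x under the linearized model, one channel."""
    G = jacobian_x(mu_x, r_mean, h)
    mean = mismatch(mu_x, r_mean, h) + G * (x - mu_x)
    var = (1.0 - G) ** 2 * r_var
    return mean, var
```

When the late reverberation is negligible, `G` approaches 1 and the variance vanishes, so the observation tracks `x + h`; when it dominates, `G` approaches 0 and the observation inherits the variance of the late reverberation.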

Page 47: Dereverberation in the stft and log mel frequency feature domains

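Relationship among pdfs

[Diagram relating $g$, $p_{X|K}$, $p_{R|Y_{past}}$, $p_{Y|X,Y_{past},K}$, $p_{Y|R,K}$, $p_{Y|Y_{past},K}$, $p_{Y|Y_{past}}$, $\pi_k$, $p_{K|Y,Y_{past}}$, $p_{X|Y,Y_{past},K}$, $p_{R|Y,Y_{past},K}$, and $p_{X|Y,Y_{past}}$; the arrows are not recoverable from the transcript.]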

Page 48: Dereverberation in the stft and log mel frequency feature domains

Connected digit recognition

• 1024-component GMM for VTS

• Clean complex back-end defined in Aurora2

• Evaluation data set consisting of 4004 reverberant utterances

 – Simulated data

 – Impulse responses measured in a varechoic room

 – Speaker-microphone distance = 3.5 m

 – T60 = 0.2 to 0.6 s

Page 49: Dereverberation in the stft and log mel frequency feature domains

[Figure: word error rate in % (axis 0 to 35) versus T60 in seconds (0.2 to 0.6) for three conditions: Unprocessed, Dereverberated, and Dereverberated (lower bound).]

Page 50: Dereverberation in the stft and log mel frequency feature domains

Concluding remarks

• Dereverberation can be performed in different domains

• The reverberation model must account for the strong statistical dependencies between consecutive observation frames