Direct-to-reverberant ratio estimation using a null-steered beamformer

James Eaton, Alastair H. Moore, Patrick A. Naylor, and Jan Skoglund

ICASSP, Brisbane, Australia

22nd April 2015

Introduction

•  Motivated by hands-free communications and video-conferencing applications

•  Speech enhancement algorithm performance can be improved if the level of reverberation relative to the speech is known

•  Direct-to-reverberant ratio (DRR) is the ratio of the sound energy arriving directly from the source to the energy arriving after reflection

•  We wish to estimate the DRR blindly from noisy reverberant speech

ICASSP 2015 - James Eaton et al.

[Figure: Negative side variance for GT DRR (dB) vs. GT T60 (ms). Horizontal axis: GT T60 (ms); vertical axis: GT DRR (dB).]

Effect of DRR and T60 on speech

TIMIT/TRAIN/DR1/FCJF0/SI648.WAV

“A sailboat may have a bone in her teeth one minute and lie becalmed the next”


Outline

•  Approach

•  Acoustic model

•  Method

•  Experiments

•  Results

•  Conclusion


Approach

•  Where the acoustic impulse response (AIR) is available, the DRR can be estimated directly (e.g. Mosayyebpour 2012)

•  Where the AIR is not available, DRR must be estimated from the reverberant speech

•  Increasingly, devices have multiple microphones

•  Inspired by the beamforming approach to T60 estimation (Dumortier 2014), we seek to exploit spatial cues to estimate DRR


Acoustic model

•  Reverberant signal at the m-th microphone

$$y_m(t) = h_m(t) * s(t) + v_m(t), \tag{1}$$

•  Acoustic impulse responses for direct sound and reverberation

$$h_m(t) = h_{d,m}(t) + h_{r,m}(t), \tag{2}$$

•  Fullband DRR is defined as

$$\bar{\eta}_m = \frac{\int |h_{d,m}(t)|^2 \, dt}{\int |h_{r,m}(t)|^2 \, dt}. \tag{3}$$
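When the AIR is available, definition (3) can be evaluated directly. The sketch below is one illustrative way to do that, not the method of the cited work: it assumes the direct path is the largest tap of the AIR and treats a short window around it as h_d, with everything else as h_r. The function name `drr_from_air` and the 2.5 ms half-width are assumptions made for this example.

```python
import numpy as np

def drr_from_air(h, fs, direct_halfwidth_ms=2.5):
    """Fullband DRR in dB from an AIR h (1-D array), following definition (3).

    Assumption: the direct path is the largest-magnitude tap, and a window of
    +/- direct_halfwidth_ms around it is counted as the direct component.
    """
    h = np.asarray(h, dtype=float)
    peak = int(np.argmax(np.abs(h)))
    half = int(round(direct_halfwidth_ms * 1e-3 * fs))
    lo, hi = max(peak - half, 0), min(peak + half + 1, len(h))
    direct_energy = np.sum(h[lo:hi] ** 2)          # discrete form of int |h_d|^2 dt
    reverb_energy = np.sum(h ** 2) - direct_energy  # discrete form of int |h_r|^2 dt
    return 10.0 * np.log10(direct_energy / reverb_energy)

# Toy AIR: unit direct path plus one reflection of amplitude 0.5
fs = 16000
h = np.zeros(fs // 2)
h[100] = 1.0
h[400] = 0.5
drr_db = drr_from_air(h, fs)
```

For the toy AIR the energies are 1 and 0.25, so the DRR is 10 log10(4) dB.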


Acoustic model

•  Use a null-steered beamformer to remove the direct path from the noisy reverberant speech signal

•  Derive DRR from the ratio of the total power arriving at a microphone to the power of the reverberant signal obtained from the beamformer, i.e.

$$\bar{\eta} = \frac{E\{y_m^2(t)\}}{E\{(h_{r,m} * s)^2(t)\}} - 1$$

•  Account for the frequency dependent gain of the beamformer, and the impact of noise
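The −1 in the power-ratio expression follows from a short derivation. It is sketched here under two assumptions that the slide does not state: noise is negligible, and the direct and reverberant components are uncorrelated. The shorthand $d_m(t) = h_{d,m}(t) * s(t)$ and $r_m(t) = h_{r,m}(t) * s(t)$ is introduced only for this sketch.

```latex
\begin{align}
y_m(t) &= d_m(t) + r_m(t) \\
E\{y_m^2(t)\} &= E\{d_m^2(t)\} + E\{r_m^2(t)\}
  && \text{(cross term vanishes if uncorrelated)} \\
\frac{E\{y_m^2(t)\}}{E\{r_m^2(t)\}} - 1 &= \frac{E\{d_m^2(t)\}}{E\{r_m^2(t)\}}
\end{align}
```

The right-hand side is the ratio of direct to reverberant signal power, which is the quantity the DRR describes.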


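Method overview

•  Frequency dependent DRR

$$\eta_m(j\omega) = \frac{E\{|D_m(j\omega)|^2\}}{E\{|R(j\omega)|^2\}}. \tag{15}$$

$$\eta_m(j\omega) = \frac{E\{|Y_m(j\omega)|^2\} - E\{|V_m(j\omega)|^2\}}{\dfrac{1}{G^2(j\omega)} \left( E\{|Z_y(j\omega)|^2\} - E\{|Z_v(j\omega)|^2\} \right)} - 1. \tag{16}$$

   –  $E\{|Y_m(j\omega)|^2\}$: total signal at microphone m
   –  $E\{|V_m(j\omega)|^2\}$: noise at m-th microphone
   –  $G^2(j\omega)$: beamformer gain
   –  $E\{|Z_y(j\omega)|^2\}$: beamformer output
   –  $E\{|Z_v(j\omega)|^2\}$: noise at beamformer output

•  Integrate across frequency to give fullband DRR

$$\bar{\eta}_m = \frac{1}{\omega_2 - \omega_1} \int_{\omega_1}^{\omega_2} \eta_m(j\omega) \, d\omega, \tag{17}$$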


Beamformer output

•  In frequency domain define microphone signals as vectors

$$\mathbf{y}(j\omega) = [Y_0(j\omega), Y_1(j\omega), \ldots, Y_{M-1}(j\omega)]^T$$

•  Given beamformer weights

$$\mathbf{w}(j\omega) = [W_0(j\omega), W_1(j\omega), \ldots, W_{M-1}(j\omega)]^T$$

•  The beamformer output is therefore

$$Z(j\omega) = (\mathbf{w}(j\omega))^T \mathbf{y}(j\omega), \tag{5}$$
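As a minimal sketch of (5), the beamformer output can be computed for all frequency bins at once. The array layout (`Y[m, k]` for microphone m and bin k), the sizes, and the example weights are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
M, K = 2, 257  # microphones and frequency bins (illustrative sizes)

# Y[m, k] stands in for Y_m(j w_k); random values are used purely as placeholders
Y = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))

# W[m, k] stands in for W_m(j w_k); here a simple difference beamformer
W = np.vstack([np.ones(K), -np.ones(K)]).astype(complex)

# Z(j w_k) = w(j w_k)^T y(j w_k): a plain transpose with no conjugation, as in (5)
Z = np.einsum("mk,mk->k", W, Y)
```

With these weights the output is simply the difference of the two microphone spectra in each bin.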


Beamformer gain

•  The beam pattern of the beamformer is

$$B(j\omega, \Omega) = (\mathbf{w}(j\omega))^T \mathbf{x}(j\omega, \Omega), \tag{6}$$

•  where $X_m(j\omega, \Omega)$ is the signal at the m-th microphone and

$$\mathbf{x}(j\omega, \Omega) = [X_0(j\omega, \Omega), X_1(j\omega, \Omega), \ldots, X_{M-1}(j\omega, \Omega)]^T.$$

•  So for an isotropic sound field the beamformer gain is

$$G^2(j\omega) = \int_{\Omega} |B(j\omega, \Omega)|^2 \, d\Omega. \tag{12}$$
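The integral in (12) can be approximated numerically once a model for $X_m(j\omega, \Omega)$ is chosen. The sketch below assumes a linear array in the free field with unit-amplitude plane waves. It also assumes the integral is normalised by the full solid angle 4π, so that G² is a mean power gain; the slides do not state the normalisation. The function name, the speed of sound and the grid size are illustrative.

```python
import numpy as np

def isotropic_gain(w, mic_pos, omega, c=343.0, n_theta=2001):
    """Approximate G^2(j omega) of (12) for a linear array.

    w       : (M,) complex beamformer weights at this frequency
    mic_pos : (M,) microphone positions in metres along the array axis
    omega   : angular frequency in rad/s
    Assumptions: free-field plane waves, integral normalised by 4*pi.
    """
    theta = np.linspace(0.0, np.pi, n_theta)  # angle measured from the array axis
    # Assumed signal model: X_m = exp(j * omega * p_m * cos(theta) / c)
    X = np.exp(1j * omega * np.outer(mic_pos, np.cos(theta)) / c)
    B = w @ X  # beam pattern (6): w^T x for every direction
    # Axial symmetry reduces the surface integral to 2*pi * int |B|^2 sin(theta) dtheta
    integrand = np.abs(B) ** 2 * np.sin(theta)
    integral = 2 * np.pi * np.sum((integrand[1:] + integrand[:-1]) / 2 * np.diff(theta))
    return integral / (4 * np.pi)

# Two microphones 10 cm apart with a broadside null, evaluated at 1 kHz
d = 0.1
omega = 2 * np.pi * 1000.0
G2 = isotropic_gain(np.array([1.0, -1.0]), np.array([-d / 2, d / 2]), omega)
```

For this two-microphone difference beamformer the normalised integral has the closed form 2(1 − sin(kd)/(kd)) with k = ω/c, which gives a convenient check on the numerical result.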


Frequency domain signal

$$Y_m(j\omega) = D_m(j\omega) + R_m(j\omega) + V_m(j\omega) \tag{8}$$

•  Direct path: $D_m(j\omega) = H_{m,d}(j\omega) S(j\omega)$

•  Reverberation: $R_m(j\omega) = H_{m,r}(j\omega) S(j\omega)$

•  Noise: $V_m(j\omega)$


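Beamformer output

•  The beamformer output in the frequency domain

$$Z_y(j\omega) = Z_d(j\omega) + Z_r(j\omega) + Z_v(j\omega), \tag{9}$$

•  where

$$Z_d(j\omega) = (\mathbf{w}(j\omega))^T \mathbf{d}(j\omega), \quad Z_r(j\omega) = (\mathbf{w}(j\omega))^T \mathbf{r}(j\omega), \quad Z_v(j\omega) = (\mathbf{w}(j\omega))^T \mathbf{v}(j\omega),$$

•  and

$$\mathbf{d}(j\omega) = [D_0(j\omega), D_1(j\omega), \ldots, D_{M-1}(j\omega)]^T,$$

•  and similarly for $\mathbf{r}(j\omega)$ and $\mathbf{v}(j\omega)$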


Null-steered beamformer

•  Point a null in the direction of arrival of direct path

•  i.e. choose $\mathbf{w}(j\omega)$ such that

$$Z_d(j\omega) = 0$$
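The slides do not say how the weights are chosen. One simple possibility for two microphones, sketched here as an assumption, is to swap and negate the direct-path steering vector so that the weighted sum cancels. The name `null_steer_weights` and the example phase shift are hypothetical.

```python
import numpy as np

def null_steer_weights(x_direct):
    """Two-microphone weights w satisfying w^T x_direct = 0.

    x_direct : (2,) complex direct-path steering vector at one frequency
    The unit-norm scaling is an arbitrary choice for this sketch.
    """
    x0, x1 = x_direct
    w = np.array([x1, -x0], dtype=complex)  # w^T x = x1*x0 - x0*x1 = 0
    return w / np.linalg.norm(w)

# Hypothetical direct-path steering vector: inter-microphone phase shift of 0.7 rad
x_direct = np.array([1.0, np.exp(-1j * 0.7)])
w = null_steer_weights(x_direct)
Z_d = w @ x_direct  # direct-path component at the beamformer output

# A source perpendicular to the array gives equal phases, so w is a scaled [1, -1]
w_broadside = null_steer_weights(np.array([1.0, 1.0]))
```

For a source perpendicular to the array this reduces to subtracting one microphone from the other.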


Null-steered beamformer output

•  Beamformer output reduces to

$$Z_y(j\omega) = Z_r(j\omega) + Z_v(j\omega). \tag{9}$$

•  Assume that the reverberant energy is the same at all microphones

$$E\{|R(j\omega)|^2\} = E\{|R_m(j\omega)|^2\} \quad \forall m = 1 : M, \tag{10}$$

•  Assuming the reverberation is isotropic

$$E\{|Z_r(j\omega)|^2\} = G^2(j\omega) E\{|R(j\omega)|^2\}, \tag{11}$$

•  where

$$G^2(j\omega) = \int_{\Omega} |B(j\omega, \Omega)|^2 \, d\Omega. \tag{12}$$


Estimating reverberant and direct signal power

•  Estimate reverberation from beamformer output by subtracting expected value of noise

$$E\{|R(j\omega)|^2\} = \frac{1}{G^2(j\omega)} \left( E\{|Z_y(j\omega)|^2\} - E\{|Z_v(j\omega)|^2\} \right). \tag{13}$$

•  Estimate direct path power at m-th microphone by subtracting estimates of noise and reverberation (assuming reverberation power equal at all sensors)

$$E\{|D_m(j\omega)|^2\} = E\{|Y_m(j\omega)|^2\} - E\{|V_m(j\omega)|^2\} - E\{|R(j\omega)|^2\}. \tag{14}$$


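Method

•  Frequency dependent DRR

$$\eta_m(j\omega) = \frac{E\{|D_m(j\omega)|^2\}}{E\{|R(j\omega)|^2\}}. \tag{15}$$

$$\eta_m(j\omega) = \frac{E\{|Y_m(j\omega)|^2\} - E\{|V_m(j\omega)|^2\}}{\dfrac{1}{G^2(j\omega)} \left( E\{|Z_y(j\omega)|^2\} - E\{|Z_v(j\omega)|^2\} \right)} - 1. \tag{16}$$

   –  $E\{|Y_m(j\omega)|^2\}$: total signal at microphone m
   –  $E\{|V_m(j\omega)|^2\}$: noise at m-th microphone
   –  $G^2(j\omega)$: beamformer gain
   –  $E\{|Z_y(j\omega)|^2\}$: beamformer output
   –  $E\{|Z_v(j\omega)|^2\}$: noise at beamformer output

•  Integrate across frequency to give fullband DRR

$$\bar{\eta}_m = \frac{1}{\omega_2 - \omega_1} \int_{\omega_1}^{\omega_2} \eta_m(j\omega) \, d\omega, \tag{17}$$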
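The chain from power estimates to fullband DRR can be sketched as a few array operations. This assumes the expectations have already been estimated per frequency bin (for example by averaging over time frames, which the slides do not specify). The function and variable names are illustrative, and the trapezoidal rule is one possible discretisation of the frequency integral.

```python
import numpy as np

def drr_from_powers(P_y, P_v, P_zy, P_zv, G2, omega):
    """Frequency dependent and fullband DRR from per-bin power estimates.

    P_y, P_v   : E{|Y_m|^2} and E{|V_m|^2} at microphone m
    P_zy, P_zv : E{|Z_y|^2} and E{|Z_v|^2} at the beamformer output
    G2         : beamformer gain G^2 per bin
    omega      : angular frequencies of the bins (increasing)
    """
    P_r = (P_zy - P_zv) / G2   # reverberant power from the beamformer output
    P_d = P_y - P_v - P_r      # direct-path power at microphone m
    eta = P_d / P_r            # frequency dependent DRR
    # Fullband DRR: trapezoidal integral over omega divided by the bandwidth
    area = np.sum((eta[1:] + eta[:-1]) / 2 * np.diff(omega))
    eta_bar = area / (omega[-1] - omega[0])
    return eta, eta_bar

# Synthetic check: direct power 2, reverberant power 1, so the DRR should be 2
omega = np.linspace(100.0, 200.0, 5)
P_d_true, P_r_true, P_v, P_zv = 2.0, 1.0, 0.1, 0.05
G2 = np.full(5, 0.5)
P_y = np.full(5, P_d_true + P_r_true + P_v)
P_zy = G2 * P_r_true + P_zv
eta, eta_bar = drr_from_powers(P_y, P_v, P_zy, P_zv, G2, omega)
```

Because the estimate subtracts noise power at both the microphone and the beamformer output, errors in those noise estimates feed directly into the result.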


Experiments

•  Room impulse responses simulated using image-source method
   –  3 rooms: 54, 72 and 90 m²
   –  9 T60s: 0.2-1.0 seconds in 0.1 second increments
   –  6 source distances: {0.05, 0.1, 0.5, 1, 2, 3} metres

•  Source perpendicular to array

•  4 random positions/rotations within each room

•  48 clean speech files from TIMIT database for each arrangement

Experiment 1

•  White Gaussian noise added independently to each microphone at 10, 20 and 30 dB SNR

•  Noise estimate at microphone 1 and beamformer output is
   –  oracle
   –  assumed to be 0

•  Baseline is [Jeub2011], which uses spatial coherence

Experiment 2

•  Noise added at 20 dB SNR

•  Noise at microphone 1 and beamformer output is
   –  known with bias (error)
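The noise condition of Experiment 1 can be sketched as follows. This is an illustrative reconstruction: the SNR is assumed to be defined per channel relative to the mean power of the reverberant signal, which the slides do not state, and the test signal is a placeholder rather than TIMIT speech.

```python
import numpy as np

def add_wgn(x, snr_db, rng):
    """Add independent white Gaussian noise to each channel of x at snr_db.

    x : (M, N) array of microphone signals.
    Returns the noisy signals and the noise that was added.
    """
    x = np.asarray(x, dtype=float)
    sig_pow = np.mean(x ** 2, axis=1, keepdims=True)  # per-channel signal power
    noise_pow = sig_pow / 10 ** (snr_db / 10)         # noise power for the target SNR
    noise = rng.standard_normal(x.shape) * np.sqrt(noise_pow)
    return x + noise, noise

# Placeholder two-channel signal: sinusoids standing in for reverberant speech
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
x = np.vstack([np.sin(2 * np.pi * 440 * t), 0.5 * np.sin(2 * np.pi * 440 * t + 0.3)])
noisy, noise = add_wgn(x, 20.0, rng)
```

Returning the noise separately makes it easy to form oracle noise power estimates, or biased ones as in Experiment 2.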


Results – experiment 1












Conclusion

•  DRR estimation error within ±3 dB over -5 to +5 dB range

•  Can be extended to >2 channels

•  Can provide frequency dependent DRR

•  Not limited to speech

•  Requires a good estimate of the background noise

