Direct-to-reverberant ratio estimation using a null-steered beamformer
James Eaton, Alastair H. Moore, Patrick A. Naylor, and Jan Skoglund
ICASSP, Brisbane, Australia
22nd April 2015
Introduction
• Motivated by hands-free communications and video-conferencing applications
• Speech enhancement algorithm performance can be improved if the level of reverberation relative to the speech is known
• Direct-to-reverberant ratio (DRR) is the ratio of the sound energy arriving directly from the source to that arriving after being reflected
• We wish to estimate the DRR blindly from noisy reverberant speech
ICASSP 2015 - James Eaton et al.
Effect of DRR and T60 on speech
[Figure: Negative side variance for GT DRR (dB) vs. GT T60 (ms). x-axis: GT T60 (ms), 100 to 1000; y-axis: GT DRR (dB), −11 to 11.5]
• Example utterance: TIMIT/TRAIN/DR1/FCJF0/SI648.WAV
  “A sailboat may have a bone in her teeth one minute and lie becalmed the next”
Outline
• Approach
• Acoustic model
• Method
• Experiments
• Results
• Conclusion
Approach
• Where the acoustic impulse response (AIR) is available, the DRR can be estimated directly (e.g. Mosayyebpour 2012)
• Where the AIR is not available, the DRR must be estimated from the reverberant speech
• Increasingly, devices have multiple microphones
• Inspired by the beamforming approach to T60 estimation (Dumortier 2014), we seek to exploit spatial cues to estimate DRR
Acoustic model
• Reverberant signal at the m-th microphone
  $y_m(t) = h_m(t) * s(t) + v_m(t)$, (1)
• Acoustic impulse responses for direct sound and reverberation
  $h_m(t) = h_{d,m}(t) + h_{r,m}(t)$, (2)
• Fullband DRR is defined as
  $\bar{\eta}_m = \dfrac{\int |h_{d,m}(t)|^2 \, dt}{\int |h_{r,m}(t)|^2 \, dt}$. (3)
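When a sampled AIR is available, Eq. (3) can be evaluated directly by splitting the response into a direct part and a reverberant part. The sketch below is only illustrative: the function name `drr_from_air`, the choice of the strongest tap as the direct-path arrival, and the ±2.5 ms direct-path window are assumptions, not details taken from the slides.

```python
import math


def drr_from_air(h, fs, direct_window_ms=2.5):
    """Fullband DRR in dB following Eq. (3), for a sampled impulse response h.

    Assumption: the direct path is the strongest tap, and everything outside
    a +/- direct_window_ms window around it counts as reverberation.
    """
    peak = max(range(len(h)), key=lambda n: abs(h[n]))
    half = int(round(direct_window_ms * 1e-3 * fs))
    lo, hi = max(0, peak - half), min(len(h), peak + half + 1)
    direct_energy = sum(x * x for x in h[lo:hi])
    reverb_energy = sum(x * x for x in h[:lo]) + sum(x * x for x in h[hi:])
    return 10.0 * math.log10(direct_energy / reverb_energy)


# Toy AIR: a unit direct impulse followed by two reflections of amplitude 0.5.
fs = 16000
h = [0.0] * 1600
h[100] = 1.0
h[500] = 0.5
h[900] = 0.5
drr_db = drr_from_air(h, fs)  # direct energy 1.0, reverberant energy 0.5
```

The sums stand in for the integrals in Eq. (3); the sampling interval cancels in the ratio, so no scaling by 1/fs is needed.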
Acoustic model
• Use a null-steered beamformer to remove the direct path from the noisy reverberant speech signal
• Derive DRR from a comparison of the power of the reverberant signal obtained from the beamformer with the total power arriving at a microphone, i.e.
  $\bar{\eta} = \dfrac{E\{y_m^2(t)\}}{E\{(h_{r,m} * s)^2(t)\}} - 1$
• Account for the frequency dependent gain of the beamformer, and the impact of noise
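The power-ratio relation on this slide can be checked numerically in the noise-free case. The following is a minimal sketch under stated assumptions: the source is white Gaussian noise rather than speech, the toy impulse responses `h_d` and `h_r` are invented for illustration, and the direct and reverberant components are taken to be uncorrelated so that their powers add.

```python
import random


def convolve(h, s):
    """Direct-form linear convolution, truncated to len(s) output samples."""
    taps = [(k, hk) for k, hk in enumerate(h) if hk != 0.0]
    return [sum(hk * s[n - k] for k, hk in taps if n >= k) for n in range(len(s))]


def mean_power(x):
    return sum(v * v for v in x) / len(x)


rng = random.Random(0)
s = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # stand-in source signal

h_d = [1.0] + [0.0] * 9                    # direct path: single unit tap
h_r = [0.0] * 5 + [0.5] + [0.0] * 3 + [0.5]  # two later reflections
h = [a + b for a, b in zip(h_d, h_r)]

y = convolve(h, s)    # noise-free microphone signal
r = convolve(h_r, s)  # reverberant component only

# Ratio of total power to reverberant power, minus one.
drr_from_powers = mean_power(y) / mean_power(r) - 1.0
# Reference value from the impulse responses, as in Eq. (3).
drr_from_air = sum(v * v for v in h_d) / sum(v * v for v in h_r)
```

With a finite signal the two values agree only approximately, because the sample cross-correlation between the direct and reverberant parts is not exactly zero.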
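Method overview
• Frequency dependent DRR
  $\eta_m(j\omega) = \dfrac{E\{|D_m(j\omega)|^2\}}{E\{|R(j\omega)|^2\}}$. (15)
  $\eta_m(j\omega) = \dfrac{E\{|Y_m(j\omega)|^2\} - E\{|V_m(j\omega)|^2\}}{\frac{1}{G^2(j\omega)}\left(E\{|Z_y(j\omega)|^2\} - E\{|Z_v(j\omega)|^2\}\right)} - 1$. (16)
  – $E\{|Y_m(j\omega)|^2\}$: total signal at microphone m
  – $E\{|V_m(j\omega)|^2\}$: noise at m-th microphone
  – $G^2(j\omega)$: beamformer gain
  – $E\{|Z_y(j\omega)|^2\}$: beamformer output
  – $E\{|Z_v(j\omega)|^2\}$: noise at beamformer output
• Integrate across frequency to give fullband DRR
  $\bar{\eta}_m = \dfrac{1}{\omega_2 - \omega_1} \int_{\omega_1}^{\omega_2} \eta_m(j\omega) \, d\omega$, (17)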
Beamformer output
• In the frequency domain, define the microphone signals as vectors
  $\mathbf{y}(j\omega) = [Y_0(j\omega), Y_1(j\omega), \ldots, Y_{M-1}(j\omega)]^T$
• Given beamformer weights
  $\mathbf{w}(j\omega) = [W_0(j\omega), W_1(j\omega), \ldots, W_{M-1}(j\omega)]^T$
• The beamformer output is therefore
  $Z(j\omega) = (\mathbf{w}(j\omega))^T \mathbf{y}(j\omega)$, (5)
Beamformer gain
• The beam pattern of the beamformer is
  $B(j\omega, \Omega) = (\mathbf{w}(j\omega))^T \mathbf{x}(j\omega, \Omega)$, (6)
• where $X_m(j\omega, \Omega)$ is the signal at the m-th microphone and
  $\mathbf{x}(j\omega, \Omega) = [X_0(j\omega, \Omega), X_1(j\omega, \Omega), \ldots, X_{M-1}(j\omega, \Omega)]^T$.
• So for an isotropic sound field the beamformer gain is
  $G^2(j\omega) = \int_{\Omega} |B(j\omega, \Omega)|^2 \, d\Omega$. (12)
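The integral in Eq. (12) can be approximated numerically. The sketch below assumes a uniform linear array with far-field plane-wave steering, a hypothetical 0.1 m spacing, and a speed of sound of 343 m/s; it also normalises the integral by 4π so that a single omnidirectional sensor has unit gain, a choice the slides do not specify. For a linear array the beam pattern depends only on the angle to the array axis, so the integral over the sphere reduces to a one-dimensional integral.

```python
import cmath
import math

C = 343.0  # speed of sound in m/s (assumed)


def beam_pattern(w, omega, theta, spacing):
    """B(jw, Omega) = w^T x for a plane wave at angle theta to the array axis."""
    x = [cmath.exp(-1j * omega * m * spacing * math.cos(theta) / C)
         for m in range(len(w))]
    return sum(wm * xm for wm, xm in zip(w, x))


def isotropic_gain_sq(w, omega, spacing, steps=2000):
    """Midpoint-rule estimate of Eq. (12), normalised by 4*pi (assumption)."""
    dtheta = math.pi / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * dtheta
        total += abs(beam_pattern(w, omega, theta, spacing)) ** 2 * math.sin(theta) * dtheta
    return 0.5 * total


omega = 2.0 * math.pi * 1000.0   # 1 kHz
spacing = 0.1                    # hypothetical microphone spacing in metres
w = [1.0, -1.0]                  # two-microphone difference beamformer
g_sq = isotropic_gain_sq(w, omega, spacing)

# Closed form for this particular beamformer: 2 - 2 * sin(kd) / (kd).
kd = omega * spacing / C
g_sq_closed_form = 2.0 - 2.0 * math.sin(kd) / kd
```

Comparing against the closed form for the two-microphone difference beamformer is a convenient sanity check before using the numerical routine with other weights.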
Frequency domain signal
  $Y_m(j\omega) = D_m(j\omega) + R_m(j\omega) + V_m(j\omega)$ (8)
• Direct path: $D_m(j\omega) = H_{m,d}(j\omega) S(j\omega)$,
• Reverberation: $R_m(j\omega) = H_{m,r}(j\omega) S(j\omega)$.
• Noise: $V_m(j\omega)$
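Beamformer output
• The beamformer output in the frequency domain
  $Z_y(j\omega) = Z_d(j\omega) + Z_r(j\omega) + Z_v(j\omega)$, (9)
• where
  $Z_d(j\omega) = (\mathbf{w}(j\omega))^T \mathbf{d}(j\omega)$,
  $Z_r(j\omega) = (\mathbf{w}(j\omega))^T \mathbf{r}(j\omega)$,
  $Z_v(j\omega) = (\mathbf{w}(j\omega))^T \mathbf{v}(j\omega)$,
• and
  $\mathbf{d}(j\omega) = [D_0(j\omega), D_1(j\omega), \ldots, D_{M-1}(j\omega)]^T$,
• and similarly for $\mathbf{r}(j\omega)$ and $\mathbf{v}(j\omega)$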
Null-steered beamformer
• Point a null in the direction of arrival of the direct path
• i.e. choose $\mathbf{w}(j\omega)$ such that
  $Z_d(j\omega) = 0$
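For two microphones, one way to satisfy this condition is a delay-and-subtract design. The sketch below is an illustration only: the plane-wave steering model, the 0.1 m spacing, the speed of sound, and the function names are assumptions, and the slides do not say which null-steering design was used.

```python
import cmath
import math

C = 343.0  # speed of sound in m/s (assumed)


def steering_vector(omega, theta, spacing, num_mics=2):
    """Far-field plane-wave response of a uniform linear array (assumed model)."""
    return [cmath.exp(-1j * omega * m * spacing * math.cos(theta) / C)
            for m in range(num_mics)]


def null_weights(omega, theta_null, spacing):
    """Two-microphone weights w such that w^T x(theta_null) = 0."""
    x = steering_vector(omega, theta_null, spacing)
    return [1.0 + 0.0j, -x[0] / x[1]]


def beamform(w, y):
    """Beamformer output Z = w^T y, as in Eq. (5)."""
    return sum(wm * ym for wm, ym in zip(w, y))


omega = 2.0 * math.pi * 1000.0   # 1 kHz
spacing = 0.1                    # hypothetical microphone spacing in metres
theta_direct = math.pi / 2.0     # source perpendicular to the array

w = null_weights(omega, theta_direct, spacing)
z_direct = beamform(w, steering_vector(omega, theta_direct, spacing))
z_other = beamform(w, steering_vector(omega, 0.0, spacing))
```

For a source perpendicular to the array the inter-microphone delay is zero, so the weights reduce to a simple difference of the two channels; sound from other directions, such as `z_other` above, still passes through.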
Null-steered beamformer output
• Beamformer output reduces to
  $Z_y(j\omega) = Z_r(j\omega) + Z_v(j\omega)$. (9)
• Assume that the reverberant energy is the same at all microphones
  $E\{|R(j\omega)|^2\} = E\{|R_m(j\omega)|^2\} \;\; \forall m = 1 : M$, (10)
• Assuming the reverberation is isotropic
  $E\{|Z_r(j\omega)|^2\} = G^2(j\omega) E\{|R(j\omega)|^2\}$, (11)
• where
  $G^2(j\omega) = \int_{\Omega} |B(j\omega, \Omega)|^2 \, d\Omega$. (12)
Estimating reverberant and direct signal power
• Estimate reverberation from the beamformer output by subtracting the expected value of the noise
  $E\{|R(j\omega)|^2\} = \dfrac{1}{G^2(j\omega)} \left( E\{|Z_y(j\omega)|^2\} - E\{|Z_v(j\omega)|^2\} \right)$. (13)
• Estimate direct path power at the m-th microphone by subtracting estimates of noise and reverberation (assuming reverberation power is equal at all sensors)
  $E\{|D_m(j\omega)|^2\} = E\{|Y_m(j\omega)|^2\} - E\{|V_m(j\omega)|^2\} - E\{|R(j\omega)|^2\}$. (14)
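Method
• Frequency dependent DRR
  $\eta_m(j\omega) = \dfrac{E\{|D_m(j\omega)|^2\}}{E\{|R(j\omega)|^2\}}$. (15)
  $\eta_m(j\omega) = \dfrac{E\{|Y_m(j\omega)|^2\} - E\{|V_m(j\omega)|^2\}}{\frac{1}{G^2(j\omega)}\left(E\{|Z_y(j\omega)|^2\} - E\{|Z_v(j\omega)|^2\}\right)} - 1$. (16)
  – $E\{|Y_m(j\omega)|^2\}$: total signal at microphone m
  – $E\{|V_m(j\omega)|^2\}$: noise at m-th microphone
  – $G^2(j\omega)$: beamformer gain
  – $E\{|Z_y(j\omega)|^2\}$: beamformer output
  – $E\{|Z_v(j\omega)|^2\}$: noise at beamformer output
• Integrate across frequency to give fullband DRR
  $\bar{\eta}_m = \dfrac{1}{\omega_2 - \omega_1} \int_{\omega_1}^{\omega_2} \eta_m(j\omega) \, d\omega$, (17)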
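The chain from Eq. (13) to Eq. (17) can be written out for a handful of frequency bins. This is a minimal sketch with invented per-bin powers: the function names, the trapezoidal approximation of the integral, and the test values are assumptions made for illustration, and in practice the expectations would be estimated from short-time spectra.

```python
import math


def drr_per_bin(p_y, p_v, p_zy, p_zv, g_sq):
    """Per-bin DRR from power estimates, following Eqs. (13) to (16)."""
    eta = []
    for py, pv, pzy, pzv, g2 in zip(p_y, p_v, p_zy, p_zv, g_sq):
        p_r = (pzy - pzv) / g2      # Eq. (13): reverberant power
        p_d = py - pv - p_r         # Eq. (14): direct-path power
        eta.append(p_d / p_r)       # Eq. (15), algebraically equal to Eq. (16)
    return eta


def fullband_drr(eta, omegas):
    """Trapezoidal approximation of the frequency average in Eq. (17)."""
    area = sum(0.5 * (eta[i] + eta[i + 1]) * (omegas[i + 1] - omegas[i])
               for i in range(len(eta) - 1))
    return area / (omegas[-1] - omegas[0])


# Invented ground-truth powers for three bins, used to build consistent inputs.
true_p_d = [2.0, 4.0, 1.0]
true_p_r = [1.0, 1.0, 2.0]
p_v = [0.1, 0.1, 0.1]     # noise power at microphone m
p_zv = [0.2, 0.2, 0.2]    # noise power at beamformer output
g_sq = [0.5, 0.8, 1.2]    # beamformer gain per bin
omegas = [0.0, 1.0, 2.0]  # bin centre frequencies (arbitrary units)

p_y = [d + r + v for d, r, v in zip(true_p_d, true_p_r, p_v)]
p_zy = [g * r + zv for g, r, zv in zip(g_sq, true_p_r, p_zv)]

eta = drr_per_bin(p_y, p_v, p_zy, p_zv, g_sq)
eta_bar = fullband_drr(eta, omegas)
eta_bar_db = 10.0 * math.log10(eta_bar)
```

Because the inputs are built from known direct and reverberant powers, the per-bin ratios should be recovered up to rounding. Note that the noise terms `p_v` and `p_zv` enter by subtraction, so errors in them propagate directly into the estimate.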
Experiments
• Room impulse responses simulated using the image-source method
  – 3 rooms: 54, 72 and 90 m²
  – 9 T60s: 0.2 to 1.0 seconds in 0.1 second increments
  – 6 source distances: {0.05, 0.1, 0.5, 1, 2, 3} metres
• Source perpendicular to array
• 4 random positions/rotations within each room
Experiments
• 48 clean speech files from the TIMIT database for each arrangement
Experiment 1
• White Gaussian noise added independently to each microphone at 10, 20 and 30 dB SNR
• Noise estimate at microphone 1 and beamformer output is
  – oracle
  – assumed to be 0
• Baseline is [Jeub2011], which uses spatial coherence
Experiment 2
• Noise added at 20 dB SNR
• Noise at microphone 1 and beamformer output is
  – known with bias (error)
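Adding white Gaussian noise at a target SNR, and keeping the noise so that its power can serve as an oracle estimate, might look like the sketch below. The function names, the sine-wave stand-in for a reverberant speech signal, and the fixed seed are assumptions for illustration; the slides do not describe how the noise was generated or scaled.

```python
import math
import random


def mean_power(x):
    return sum(v * v for v in x) / len(x)


def add_white_noise(signal, snr_db, rng):
    """Return (noisy signal, noise) with the noise scaled to the target SNR."""
    sigma = math.sqrt(mean_power(signal) / 10.0 ** (snr_db / 10.0))
    noise = [rng.gauss(0.0, sigma) for _ in signal]
    noisy = [s + n for s, n in zip(signal, noise)]
    return noisy, noise


rng = random.Random(1)
fs = 16000
# Stand-in for one microphone channel of reverberant speech.
clean = [math.sin(2.0 * math.pi * 440.0 * n / fs) for n in range(fs)]

noisy, noise = add_white_noise(clean, 20.0, rng)

# Oracle noise power: measured from the noise that was actually added.
oracle_noise_power = mean_power(noise)
realised_snr_db = 10.0 * math.log10(mean_power(clean) / oracle_noise_power)
```

Calling `add_white_noise` once per microphone with the same generator gives independent noise in each channel. The realised SNR differs slightly from the target because the noise is a finite random draw.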
Results – experiment 1
Results – experiment 2
Conclusion
• DRR estimation error within ±3 dB over the −5 to +5 dB range
• Can be extended to >2 channels
• Can provide frequency dependent DRR
• Not limited to speech
• Requires a good estimate of the background noise