A noise-estimation algorithm for highly non-stationary environments
Post on 03-Jan-2016
1
A noise-estimation algorithm for highly non-stationary environments
Sundarrajan Rangachari, Philipos C. Loizou
Department of Electrical Engineering, University of Texas at Dallas, P.O. Box 830688, EC 33 Richardson, TX 75083-0688, USA
Presenter: Shih-Hsiang( 士翔 )
SPEECH COMMUNICATION Vol. 48(2), 2006
2
References
Doblinger, G., 1995. Computationally efficient speech enhancement by spectral minima tracking in subbands. Proc. Eurospeech 2, 1513–1516.
Hirsch, H., Ehrlicher, C., 1995. Noise estimation techniques for robust speech recognition. Proc. IEEE Internat. Conf. on Acoust. Speech Signal Process., 153–156.
Martin, R., 2001. Noise power spectral density estimation based on optimal smoothing and minimum statistics. IEEE Trans. Speech Audio Process. 9 (5), 504–512.
Cohen, I., 2002. Noise estimation by minima controlled recursive averaging for robust speech enhancement. IEEE Signal Process. Lett. 9 (1), 12–15.
Hu, Y., Loizou, P., 2004. Speech enhancement based on wavelet thresholding the multitaper spectrum. IEEE Trans. Speech Audio Process. 12 (1), 59–67.
3
Introduction
Most speech-enhancement algorithms assume that an estimate of the noise spectrum is available. This estimate is critical to their performance.
The noise estimate has a major impact on the quality of the enhanced signal:
  If the noise estimate is too low, annoying residual noise will be audible.
  If the noise estimate is too high, speech will be distorted.
The simplest approach is to estimate and update the noise spectrum during the silent segments of the signal, using a voice activity detection (VAD) algorithm.
  This works satisfactorily only in stationary noise; it does not work well in more realistic environments (non-stationary noise).
Hence there is a need to update the noise spectrum continuously over time.
4
Proposed noise-estimation algorithm
Computing the smoothed speech power spectrum
Let the noisy speech signal in the time domain be denoted as
$$ y(n) = x(n) + d(n) $$
where y(n) is the noisy speech, x(n) the clean speech, and d(n) the additive noise.
The smoothed power spectrum of noisy speech is computed using the following first-order recursive equation
$$ P(\lambda,k) = \eta\,P(\lambda-1,k) + (1-\eta)\,|Y(\lambda,k)|^2 $$
where P(λ,k) is the smoothed power spectrum, η is the smoothing constant, λ is the frame index, and k is the frequency index.
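The first-order recursion P(λ,k) = η P(λ−1,k) + (1−η) |Y(λ,k)|² can be sketched for a single frequency bin as below. This is only an illustrative sketch: the function name `smooth_power` and the choice to initialise the recursion with the first frame's power are assumptions, not something stated on the slide. The default η = 0.7 is the value listed among the experimental parameters.

```python
def smooth_power(frame_powers, eta=0.7):
    """Recursively smooth |Y(lambda, k)|^2 over frames for one bin k.

    frame_powers: list of noisy-speech power values, one per frame.
    eta: smoothing constant (0.7 is the value quoted in the slides).
    """
    if not frame_powers:
        return []
    smoothed = []
    prev = frame_powers[0]  # assumption: start the recursion at the first frame
    for power in frame_powers:
        prev = eta * prev + (1.0 - eta) * power
        smoothed.append(prev)
    return smoothed


if __name__ == "__main__":
    # A step in power is followed gradually, not instantly.
    print(smooth_power([1.0, 1.0, 10.0, 10.0, 10.0]))
```

A larger η gives a smoother but slower-reacting power track.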
5
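Proposed noise-estimation algorithm
Tracking the minimum of noisy speech
The local minimum of the noisy speech power spectrum, P_min(λ,k), is tracked as follows:
$$
\begin{aligned}
&\text{IF } P_{\min}(\lambda-1,k) < P(\lambda,k) \text{ THEN}\\
&\qquad P_{\min}(\lambda,k) = \gamma\,P_{\min}(\lambda-1,k) + \frac{1-\gamma}{1-\beta}\,\bigl(P(\lambda,k) - \beta\,P(\lambda-1,k)\bigr)\\
&\text{ELSE}\\
&\qquad P_{\min}(\lambda,k) = P(\lambda,k)\\
&\text{END}
\end{aligned}
$$
β and γ are constants which are determined experimentally.
The look-ahead factor β controls the adaptation time of the local minimum.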
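A minimal sketch of the minimum-tracking rule for one frequency bin follows. When the previous minimum is below the current smoothed power, the minimum is nudged upward by γ P_min(λ−1,k) + ((1−γ)/(1−β)) (P(λ,k) − β P(λ−1,k)); otherwise it drops to the current power. The function name `track_minimum` and the initial value of the minimum (the first smoothed power) are assumptions made for illustration. The defaults β = 0.8 and γ = 0.998 are the values listed among the experimental parameters.

```python
def track_minimum(P, beta=0.8, gamma=0.998):
    """Track the local minimum of a smoothed power sequence P for one bin."""
    if not P:
        return []
    p_min = [P[0]]  # assumption: initialise the minimum with the first value
    gain = (1.0 - gamma) / (1.0 - beta)
    for lam in range(1, len(P)):
        if p_min[-1] < P[lam]:
            # Power is above the minimum: let the minimum creep upward slowly.
            nxt = gamma * p_min[-1] + gain * (P[lam] - beta * P[lam - 1])
        else:
            # Power fell to or below the minimum: follow it immediately.
            nxt = P[lam]
        p_min.append(nxt)
    return p_min


if __name__ == "__main__":
    print(track_minimum([4.0, 4.0, 1.0, 1.0, 9.0]))
```

In this example the minimum drops at once when the power falls from 4 to 1, but rises only slightly when the power jumps to 9.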
6
Proposed noise-estimation algorithm
Tracking the minimum of noisy speech
Figure: plot of the noisy speech power spectrum and its local minimum, computed using (3), for speech degraded by babble noise at 5 dB SNR, at frequency bin k = 5.
7
Proposed noise-estimation algorithm
Speech-presence probability
Let the ratio of the noisy speech power spectrum to its local minimum be defined as
$$ S_r(\lambda,k) = \frac{P(\lambda,k)}{P_{\min}(\lambda,k)} $$
The power spectrum of noisy speech will be nearly equal to its local minimum when speech is absent, so the ratio is compared against a threshold δ(k):
$$
\begin{aligned}
&\text{IF } S_r(\lambda,k) > \delta(k) \text{ THEN}\\
&\qquad I(\lambda,k) = 1 \quad \text{(speech present)}\\
&\text{ELSE}\\
&\qquad I(\lambda,k) = 0 \quad \text{(speech absent)}\\
&\text{END}
\end{aligned}
$$
The speech-presence probability, p(λ,k), is updated using the following first-order recursion
$$ p(\lambda,k) = \alpha_p\,p(\lambda-1,k) + (1-\alpha_p)\,I(\lambda,k) $$
where α_p is a smoothing constant.
The above recursion implicitly exploits the correlation of speech presence in adjacent frames.
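The decision rule and the recursion p(λ,k) = α_p p(λ−1,k) + (1−α_p) I(λ,k) can be sketched for one frequency bin as follows. The function name `speech_presence`, the starting probability of zero, and the small floor that guards against division by zero are assumptions for illustration only. α_p = 0.2 is the value listed among the experimental parameters, and the threshold is passed in by the caller.

```python
def speech_presence(P, P_min, delta, alpha_p=0.2):
    """Smoothed speech-presence probability for one bin.

    P: smoothed noisy-speech power per frame.
    P_min: tracked local minimum per frame.
    delta: decision threshold on the ratio P / P_min for this bin.
    """
    p = 0.0  # assumption: start from "speech absent"
    probs = []
    for power, minimum in zip(P, P_min):
        ratio = power / max(minimum, 1e-12)  # assumption: floor avoids 0-division
        indicator = 1.0 if ratio > delta else 0.0
        p = alpha_p * p + (1.0 - alpha_p) * indicator
        probs.append(p)
    return probs


if __name__ == "__main__":
    print(speech_presence([1.0, 10.0, 10.0, 1.0], [1.0, 1.0, 1.0, 1.0], delta=2.0))
```

With α_p = 0.2 the probability reacts quickly: it reaches 0.8 after a single frame flagged as speech.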
8
Proposed noise-estimation algorithm
Speech-presence probability
Figure, top panel: plot of the estimated speech-presence probability based on the ratio S_r(λ,k).
Figure, bottom panel: spectrogram of the clean signal.
9
Proposed noise-estimation algorithm
Computing frequency-dependent smoothing constants
Using the speech-presence probability estimate, we compute the time-frequency dependent smoothing factor as follows
$$ \alpha_s(\lambda,k) = \alpha_d + (1-\alpha_d)\,p(\lambda,k) $$
where α_d is a constant.
Note that α_s(λ,k) takes values in the range α_d ≤ α_s(λ,k) ≤ 1.
Finally, the noise spectrum estimate is updated as
$$ D(\lambda,k) = \alpha_s(\lambda,k)\,D(\lambda-1,k) + \bigl(1-\alpha_s(\lambda,k)\bigr)\,|Y(\lambda,k)|^2 $$
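A one-step sketch of the noise update may help: the smoothing factor is α_s = α_d + (1−α_d) p, and the new noise estimate is α_s D(λ−1,k) + (1−α_s) |Y(λ,k)|². The function name `update_noise` is hypothetical; α_d = 0.85 is the value listed among the experimental parameters.

```python
def update_noise(D_prev, Y2, p, alpha_d=0.85):
    """One noise-spectrum update for a single time-frequency cell.

    D_prev: previous noise estimate D(lambda-1, k).
    Y2: current noisy-speech power |Y(lambda, k)|^2.
    p: speech-presence probability p(lambda, k), between 0 and 1.
    Returns (new noise estimate, smoothing factor alpha_s).
    """
    alpha_s = alpha_d + (1.0 - alpha_d) * p
    D_new = alpha_s * D_prev + (1.0 - alpha_s) * Y2
    return D_new, alpha_s


if __name__ == "__main__":
    print(update_noise(2.0, 10.0, p=0.0))  # speech absent: estimate moves toward Y2
    print(update_noise(2.0, 10.0, p=1.0))  # speech present: estimate is frozen
```

When p = 1 the factor α_s equals 1 and the estimate is held; when p = 0 the update falls back to plain recursive averaging with constant α_d.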
10
Proposed noise-estimation algorithms
Figure: plot of the true noise spectrum and the noise spectrum estimated using the proposed method, for speech degraded by babble noise at 5 dB SNR, at a single frequency f = 250 Hz.
11
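Comparison with existing algorithms
Minimum statistics (MS) (Martin, 2001)
The noisy speech power spectrum is smoothed with a time-frequency dependent factor:
$$ P(\lambda,k) = \alpha(\lambda,k)\,P(\lambda-1,k) + \bigl(1-\alpha(\lambda,k)\bigr)\,|Y(\lambda,k)|^2 $$
$$ \alpha(\lambda,k) = \frac{1}{1 + \bigl(P(\lambda-1,k)/\sigma_N^2(\lambda,k) - 1\bigr)^2} $$
The power spectral density of the noise signal is obtained from the tracked minimum with a bias correction:
$$ \sigma_N^2(\lambda,k) = B_{\min}\bigl(D, Q_{eq}(\lambda,k)\bigr)\,P_{\min}(\lambda,k) $$
$$ B_{\min}(\lambda,k) \approx 1 + (D-1)\,\frac{2}{\tilde{Q}_{eq}(\lambda,k)} $$
$$ \tilde{Q}_{eq}(\lambda,k) = \frac{Q_{eq}(\lambda,k) - 2M(D)}{1 - M(D)} $$
where Q_eq(λ,k) denotes the equivalent degrees of freedom.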
12
Comparison with existing algorithms
Minimum statistics (MS) (Martin, 2001)
Figure: comparison between the noise spectrum estimated using the proposed algorithm (thick line) and Martin's (2001) algorithm (dashed line) for a sentence corrupted by car noise (t < 1.8 s) followed by a sentence corrupted by multi-talker babble (t > 1.8 s).
13
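Comparison with existing algorithms
Continuous minima tracking (Doblinger, 1995)
The noisy speech power spectrum is smoothed as
$$ P(\lambda,k) = \eta\,P(\lambda-1,k) + (1-\eta)\,|Y(\lambda,k)|^2 $$
and the noise estimate P_N(λ,k) is obtained by tracking its minimum:
$$
\begin{aligned}
&\text{IF } P_N(\lambda-1,k) < P(\lambda,k) \text{ THEN}\\
&\qquad P_N(\lambda,k) = \gamma\,P_N(\lambda-1,k) + \frac{1-\gamma}{1-\beta}\,\bigl(P(\lambda,k) - \beta\,P(\lambda-1,k)\bigr)\\
&\text{ELSE}\\
&\qquad P_N(\lambda,k) = P(\lambda,k)\\
&\text{END}
\end{aligned}
$$
Drawback: the noise estimate increases whenever the noisy speech power increases.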
14
Comparison with existing algorithms
Continuous minima tracking (Doblinger, 1995)
Figure, top panel: plot of the true noise spectrum and the noise spectrum estimated using the proposed method.
Figure, bottom panel: plot of the true noise spectrum and the noise spectrum estimated using (Doblinger, 1995). Arrows indicate regions where noise is overestimated.
15
Comparison with existing algorithms
Weighted average technique (Hirsch and Ehrlicher, 1995)
$$ \hat{N}_i(k) = (1-\alpha)\,X_i(k) + \alpha\,\hat{N}_i(k-1) $$
where, for the i-th subband and frame index k, N̂_i(k) is the estimated noise magnitude and X_i(k) is the spectral magnitude.
Threshold test on $X_i(k) - \beta\,\hat{N}_i(k-1)$:
  positive values: stop updating
  negative values: update the estimate
The method fails when there is a sudden increase in noise level. The threshold is based on past noise estimates, which are already very low, so the noisy speech spectrum never falls below it. As a result, the noise estimate is not updated for as long as the noise power remains at the higher level.
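The failure mode described above can be reproduced with a small sketch for one subband. The update rule is a weighted average of the previous noise estimate and the current spectral magnitude, applied only while the magnitude stays below a multiple of the most recent noise estimate. The function name `hirsch_track` and the values α = 0.9 and β = 2.0 are illustrative assumptions, not values taken from the slides.

```python
def hirsch_track(X, alpha=0.9, beta=2.0):
    """Weighted-average noise tracking with a threshold gate (one subband).

    X: spectral magnitudes per frame.
    alpha, beta: illustrative values chosen for this sketch only.
    """
    if not X:
        return []
    n_hat = X[0]  # assumption: initialise with the first frame
    track = []
    for x in X:
        if x < beta * n_hat:
            # Below the threshold: treat the frame as noise and update.
            n_hat = (1.0 - alpha) * x + alpha * n_hat
        # Otherwise the update is skipped and the old estimate is kept.
        track.append(n_hat)
    return track


if __name__ == "__main__":
    # Noise level jumps from 1 to 10 and stays there.
    frames = [1.0] * 5 + [10.0] * 50
    print(hirsch_track(frames)[-1])  # the estimate stays near 1
```

Because every frame after the jump exceeds the threshold, the estimate never leaves its old level, which is the behaviour the slide warns about.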
16
Comparison with existing algorithms
Weighted average technique (Hirsch and Ehrlicher, 1995)
Figure: comparison of the estimated noise spectrum (f = 500 Hz) of the proposed method (dashed line) with that of Hirsch and Ehrlicher (1995) (solid line) for noisy speech at 20 dB SNR (t < 1.8 s) followed by noisy speech at 5 dB SNR (t > 1.8 s).
17
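Comparison with existing algorithms
Minima controlled recursive averaging (MCRA) (Cohen, 2002)
Given two hypotheses for the l-th frame and the k-th subband:
$$ H_0(k,l):\; Y(k,l) = D(k,l) \quad \text{(speech absence)} $$
$$ H_1(k,l):\; Y(k,l) = X(k,l) + D(k,l) \quad \text{(speech presence)} $$
Let λ_d(k,l) = E[|D(k,l)|²] denote the variance of the noise in the k-th band. With smoothing constant α_d, the noise estimate is updated as
$$ H_0'(k,l):\; \hat{\lambda}_d(k,l+1) = \alpha_d\,\hat{\lambda}_d(k,l) + (1-\alpha_d)\,|Y(k,l)|^2 \quad \text{(speech absence)} $$
$$ H_1'(k,l):\; \hat{\lambda}_d(k,l+1) = \hat{\lambda}_d(k,l) \quad \text{(speech presence)} $$
Let p'(k,l) = P(H_1'(k,l) | Y(k,l)) denote the conditional signal presence probability. Then
$$
\begin{aligned}
\hat{\lambda}_d(k,l+1) &= \hat{\lambda}_d(k,l)\,p'(k,l) + \bigl[\alpha_d\,\hat{\lambda}_d(k,l) + (1-\alpha_d)\,|Y(k,l)|^2\bigr]\bigl(1-p'(k,l)\bigr)\\
&= \tilde{\alpha}_d(k,l)\,\hat{\lambda}_d(k,l) + \bigl[1-\tilde{\alpha}_d(k,l)\bigr]\,|Y(k,l)|^2
\end{aligned}
$$
where
$$ \tilde{\alpha}_d(k,l) = \alpha_d + (1-\alpha_d)\,p'(k,l) $$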
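The two forms of the MCRA update are algebraically the same: weighting the "speech present" and "speech absent" updates by p' and 1−p' gives a single recursion with the factor α_d + (1−α_d) p'. The short check below evaluates both forms numerically. The function names are hypothetical and the value α_d = 0.95 is only an illustrative choice.

```python
def mcra_two_hypotheses(lam_d, Y2, p, alpha_d):
    """Probability-weighted mix of the speech-present and speech-absent updates."""
    present = lam_d                                   # estimate held when speech is present
    absent = alpha_d * lam_d + (1.0 - alpha_d) * Y2   # recursive averaging when absent
    return present * p + absent * (1.0 - p)


def mcra_compact(lam_d, Y2, p, alpha_d):
    """Same update written with the time-varying smoothing factor."""
    alpha_tilde = alpha_d + (1.0 - alpha_d) * p
    return alpha_tilde * lam_d + (1.0 - alpha_tilde) * Y2


if __name__ == "__main__":
    for p in (0.0, 0.3, 1.0):
        a = mcra_two_hypotheses(2.0, 10.0, p, alpha_d=0.95)
        b = mcra_compact(2.0, 10.0, p, alpha_d=0.95)
        print(p, a, b)
```

The proposed method's noise update has the same structure; the two algorithms differ in how the presence probability and the minimum are obtained.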
18
Comparison with existing algorithms
Minima controlled recursive averaging (MCRA) (Cohen, 2002)
Let the local energy of the noisy speech be obtained by smoothing the magnitude squared of its STFT in time and frequency. In frequency, use a window function b whose length is 2w+1:
$$ S_f(k,l) = \sum_{i=-w}^{w} b(i)\,|Y(k-i,l)|^2 $$
In time, the smoothing is performed by a first-order recursive averaging, given by
$$ S(k,l) = \alpha_s\,S(k,l-1) + (1-\alpha_s)\,S_f(k,l) $$
Track the minimum of the local energy:
$$ S_{\min}(k,l) = \min\{S_{\min}(k,l-1),\, S(k,l)\} $$
Speech presence is determined by the ratio between the local energy of the noisy speech and its minimum within a specified time window:
$$ S_r(k,l) = \frac{S(k,l)}{S_{\min}(k,l)}, \qquad
I(k,l) = \begin{cases} 1, & S_r(k,l) > \delta \quad (H_1')\\ 0, & \text{otherwise} \quad (H_0') \end{cases} $$
The conditional signal presence probability is calculated as follows:
$$ \hat{p}'(k,l) = \alpha_p\,\hat{p}'(k,l-1) + (1-\alpha_p)\,I(k,l) $$
19
Comparison with existing algorithms
Minima controlled recursive averaging (MCRA) (Cohen, 2002)
The local minimum in Cohen (2002) is found by tracking the minimum of noisy speech over a search window spanning L frames. This has some drawbacks:
  The minimum is sensitive to outliers.
  The minima tracking may lag by as many as 2L frames.
In this paper:
  The estimate of the noise spectrum in the proposed method is not influenced by the minimum-search window.
  The threshold used in our method for identifying speech presence/absence regions is frequency dependent, while that of Cohen (2002) is fixed for all frequencies.
20
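Experimental setup
The noise estimator is combined with a Wiener-type speech-enhancement algorithm (Hu and Loizou, 2004).
Estimate the spectral gain function
$$ G(\lambda,k) = \frac{C(\lambda,k)}{C(\lambda,k) + \mu_k\,D(\lambda,k)} $$
where C(λ,k) is the estimated clean speech spectrum, computed as follows
$$ C(\lambda,k) = \max\{\,|Y(\lambda,k)|^2 - D(\lambda,k),\; v\,D(\lambda,k)\,\} $$
where v = 0.001 is a small positive number, and
$$ \mu_k = \begin{cases}
\mu_0 - \dfrac{SNR_{dB}}{s}, & -5 \le SNR_{dB} \le 20\\[4pt]
1, & SNR_{dB} > 20\\
\mu_{\max}, & SNR_{dB} < -5
\end{cases} $$
μ_max is the maximum allowable value of μ, which was set to 10
μ_0 = (1 + 4 μ_max)/5
s = 25/(μ_max − 1)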
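The gain computation can be sketched as below: μ falls linearly from μ_max at −5 dB to 1 at 20 dB and is clamped outside that range, and the gain is G = C / (C + μ D) with C = max(|Y|² − D, v D). The function names are hypothetical, and how the per-bin SNR in dB is estimated is not covered by the slides, so it is simply passed in as an argument here.

```python
def oversubtraction_factor(snr_db, mu_max=10.0):
    """Piecewise-linear factor mu as a function of SNR in dB."""
    mu0 = (1.0 + 4.0 * mu_max) / 5.0
    s = 25.0 / (mu_max - 1.0)
    if snr_db < -5.0:
        return mu_max
    if snr_db > 20.0:
        return 1.0
    return mu0 - snr_db / s


def wiener_gain(Y2, D, mu, v=0.001):
    """Wiener-type gain for one time-frequency cell.

    Y2: noisy-speech power |Y|^2, D: noise estimate (assumed > 0 here).
    """
    C = max(Y2 - D, v * D)  # floored clean-speech power estimate
    return C / (C + mu * D)


if __name__ == "__main__":
    mu = oversubtraction_factor(5.0)
    print(mu, wiener_gain(4.0, 1.0, mu))
```

With μ_max = 10 the definitions of μ_0 and s make the linear segment meet the two clamped values exactly at −5 dB and 20 dB.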
21
Experimental setup (cont.)
Obtain the enhanced spectrum
$$ X(\lambda,k) = Y(\lambda,k)\,G(\lambda,k) $$
where X(λ,k) is the enhanced spectrum.
Other parameters
α_d = 0.85, α_p = 0.2, β = 0.8, γ = 0.998, η = 0.7
$$ \delta(k) = \begin{cases}
2, & 1 \le k \le LF\\
2, & LF < k \le MF\\
5, & MF < k \le F_s/2
\end{cases} $$
where LF and MF are the bins corresponding to 1 and 3 kHz, and F_s is the sampling frequency.
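A small sketch of the frequency-dependent threshold: bins up to 1 kHz and bins between 1 and 3 kHz use the value 2, and bins above 3 kHz use the value 5. The function name `threshold_delta` and the mapping from frequency to bin index (bin k corresponds to k·F_s/N for an N-point FFT, rounded down) are assumptions made for this illustration.

```python
def threshold_delta(k, n_fft, fs):
    """Speech-presence threshold for frequency bin k.

    n_fft: FFT length, fs: sampling frequency in Hz.
    Assumes bin k corresponds to frequency k * fs / n_fft.
    """
    lf = int(1000 * n_fft / fs)  # bin at 1 kHz
    mf = int(3000 * n_fft / fs)  # bin at 3 kHz
    if k <= lf:
        return 2.0
    if k <= mf:
        return 2.0
    return 5.0


if __name__ == "__main__":
    # 512-point FFT at 8 kHz: LF is bin 64 and MF is bin 192.
    print([threshold_delta(k, 512, 8000) for k in (10, 100, 200)])
```

The low and middle bands share the same value in the slides; they are kept as separate branches so that either band can be tuned independently.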
22
Experimental results
Subjective evaluation
Formal listening tests were used.
Single noise: sentences were degraded by either multi-talker babble noise or factory noise.
Triplet noise: three different noise types (multi-talker babble, factory noise, and white noise) appear in order, without any pauses in between.
The listeners were asked to select, from the pair of stimuli presented, the sentence which was more natural, easier to listen to, and free of artifacts.
A preference score of 100% would indicate that listeners preferred the proposed method over the other methods all the time.
23
Experimental results
Subjective evaluation
The outcome is attributed to the fact that the proposed noise-estimation algorithm adapts quickly to highly non-stationary environments.
24
Experimental results
Objective evaluation
Mean squared error (MSE) between the true noise spectrum and the estimated noise spectrum
$$ \mathrm{MSE} = \frac{1}{M} \sum_{\lambda=0}^{M-1} \frac{\sum_k \bigl[\hat{\sigma}_D^2(\lambda,k) - \sigma_D^2(\lambda,k)\bigr]^2}{\sum_k \bigl[\sigma_D^2(\lambda,k)\bigr]^2} $$
where σ_D²(λ,k) is the true noise power spectrum, σ̂_D²(λ,k) is the estimated noise power spectrum, and M is the total number of frames.
Log-likelihood ratio (LLR) measure
$$ d_{LLR}(a_x, a_y) = \log_{10}\!\left(\frac{a_y R_x a_y^T}{a_x R_x a_x^T}\right) $$
where R_x is the autocorrelation matrix of the original (clean) speech frame, a_x is the linear prediction coefficient vector of the original (clean) speech frame, and a_y is the linear prediction coefficient vector of the enhanced speech frame.
The LLR is a spectral distance measure which mainly models the mismatch between the formants of the original and enhanced signals.
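Given the two coefficient vectors and the clean-speech autocorrelation matrix, the LLR for one frame is the base-10 logarithm of the ratio of two quadratic forms, a_y R_x a_y^T over a_x R_x a_x^T. The sketch below assumes those quantities have already been computed elsewhere; the function names `quadratic_form` and `llr` are hypothetical, and estimating the linear prediction coefficients themselves is outside its scope.

```python
import math


def quadratic_form(a, R):
    """Compute a R a^T for a row vector a and a square matrix R (nested lists)."""
    n = len(a)
    return sum(a[i] * R[i][j] * a[j] for i in range(n) for j in range(n))


def llr(a_x, a_y, R_x):
    """Log-likelihood ratio between clean (a_x) and enhanced (a_y) LPC vectors."""
    return math.log10(quadratic_form(a_y, R_x) / quadratic_form(a_x, R_x))


if __name__ == "__main__":
    R = [[1.0, 0.0], [0.0, 1.0]]  # toy autocorrelation matrix for illustration
    print(llr([1.0, 0.0], [1.0, 0.0], R))  # identical vectors
    print(llr([1.0, 0.0], [1.0, 3.0], R))  # mismatched vectors
```

Identical coefficient vectors give a distance of zero, and the distance grows as the enhanced-speech coefficients drift away from the clean-speech ones.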
25
Experimental results
Objective evaluation
Segmental SNR
$$ \mathrm{SegSNR} = \frac{10}{|L|} \sum_{l \in L} \log_{10} \frac{\sum_{k=0}^{N/2} |X(k,l)|^2}{\sum_{k=0}^{N/2} |D(k,l)|^2} $$
where L is the set of frames that contain speech.
26
Experimental results
Objective evaluation (MSE)
The MSE results are not consistent with the preference outcomes, in that lower MSE values did not suggest better preference.
This indicates that the MSE measure might not be a reliable measure for assessing the performance of noise-estimation algorithms:
  1. This measure is sensitive to outlier values.
  2. It treats noise overestimation and noise underestimation errors the same.
27
Experimental results
Objective evaluation (LLR and SNR)
The segmental SNR values and the LLR values shown in Table 3 were found to be more consistent with the subjective evaluation results.
28
Experimental results
Figure panels:
  Panel A: Clean speech
  Panel B: Noisy speech
  Panel C: Martin (2001)
  Panel D: Cohen (2003)
  Panel E: Proposed method
29
Conclusions
The noise estimate was updated continuously in every frame using time-frequency smoothing factors calculated based on the speech-presence probability in each frequency bin of the noisy speech spectrum.
The speech-presence probability was estimated using the ratio of the noisy speech power spectrum to its local minimum.
The update of the noise estimate was faster for very rapidly varying non-stationary noise environments.