-
Perceptual WPT and time-adaptive level thresholding-based enhancement of degraded speech
Presented by Nitesh Kumar Chaudhary
Department of Electronics & Communication Engineering, The LNM Institute of Information Technology, Jaipur
Under the supervision of Dr. Navneet Upadhyay
-
Why speech enhancement?
The presence of noise in speech can significantly reduce its intelligibility and degrade the performance of automatic speech recognition. Noise reduction has therefore become an important issue in speech processing systems such as speech coding and speech recognition. Common sources of degradation include:
(a) Additive acoustic noise - noise added to the speech signal when it is recorded in an environment with noticeable background noise, such as an aircraft cockpit.
(b) Acoustic reverberation - the additive effect of multiple reflections of an acoustic signal.
(c) Convolutive channel effects - an uneven or band-limited response, which arises when the communication channel is not modeled well enough for the channel equalizer to remove the channel impulse response.
-
(d) Electrical interference
(e) Codec distortion - distortion introduced by the compression in the coding algorithm
(f) Distortion introduced by the recording apparatus - for example, a poor microphone response
Keywords: perceptual wavelet packet transform (PWPT), time-adaptive thresholding, Teager energy operator (TEO), probability of detection (Pd) and probability of false alarm (Pf), masking.
-
Block Diagram
-
Perceptual wavelet packet transform: The wavelet packet transform (WPT) is a time-frequency analysis tool. It maps the signal into a domain that contains both time and frequency information.
In wavelet analysis, a signal is split into an approximation and
a detail. The approximation is then itself split into a
second-level approximation and detail, and the process is
repeated.
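The approximation/detail recursion can be illustrated with the Haar filter pair, chosen here only because it is the shortest Daubechies filter. This is a minimal sketch, not the filter bank used in this work; the function names `haar_split` and `dwt` are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_split(x):
    """One analysis step: split an even-length signal into approximation and detail."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    half = len(x) // 2
    approx = [(x[2 * k] + x[2 * k + 1]) / SQRT2 for k in range(half)]
    detail = [(x[2 * k] - x[2 * k + 1]) / SQRT2 for k in range(half)]
    return approx, detail


def dwt(x, levels):
    """Dyadic wavelet decomposition: only the approximation is split again.

    Returns [approx_L, detail_L, ..., detail_1], coarsest level first.
    """
    approx = list(x)
    details = []
    for _ in range(levels):
        approx, detail = haar_split(approx)
        details.append(detail)
    return [approx] + details[::-1]


if __name__ == "__main__":
    signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
    coeffs = dwt(signal, 3)
    print([len(c) for c in coeffs])  # [1, 1, 2, 4]
```

Because the Haar pair is orthonormal, the total energy of the coefficients equals the energy of the input signal, so no information is lost by the splitting.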
In the corresponding perceptual wavelet packet case, each detail coefficient vector is also split into two parts, using the same approach as for the approximation vector. Seventeen critical bands are selected, because for speech sampled at 8 kHz, 17 critical bands are required to cover the entire frequency range (0-4 kHz).
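Both the difference from the dyadic transform and the figure of 17 bands can be checked numerically. The sketch below assumes Haar filters and a uniform (full) packet tree rather than the non-uniform perceptual tree used here, and splits every node at each level. It also evaluates the Zwicker-Terhardt approximation of the critical-band rate (in Bark) at 4 kHz, the Nyquist frequency of 8 kHz speech. The function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_split(x):
    """Split an even-length signal into Haar approximation and detail halves."""
    half = len(x) // 2
    approx = [(x[2 * k] + x[2 * k + 1]) / SQRT2 for k in range(half)]
    detail = [(x[2 * k] - x[2 * k + 1]) / SQRT2 for k in range(half)]
    return approx, detail


def wavelet_packet(x, levels):
    """Full wavelet packet tree: approximation AND detail nodes are split at each level.

    Returns the 2**levels terminal nodes in natural (Paley) order, which is not
    the same as increasing-frequency order.
    """
    if len(x) % (2 ** levels):
        raise ValueError("signal length must be divisible by 2**levels")
    nodes = [list(x)]
    for _ in range(levels):
        nodes = [part for node in nodes for part in haar_split(node)]
    return nodes


def bark(f_hz):
    """Zwicker-Terhardt approximation of the critical-band rate (Bark) at f_hz."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


if __name__ == "__main__":
    nodes = wavelet_packet([float(n) for n in range(32)], 3)
    print(len(nodes), [len(node) for node in nodes])  # 8 [4, 4, 4, 4, 4, 4, 4, 4]
    print(round(bark(4000.0), 2))  # about 17.26 Bark at the 4 kHz Nyquist frequency
```

A critical-band rate of roughly 17.3 Bark at 4 kHz is consistent with the statement that about 17 critical bands span the band of 8 kHz speech.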
-
TEO & level-dependent thresholding
The Teager energy operator (TEO) is a powerful nonlinear operator that has been used successfully in various speech applications. It can be used to estimate the second moment (angular bandwidth) of a signal as well as the moments of the signal duration and of its spectrum, and it can determine the energy of quite complicated signals. For a given band-limited discrete-time signal $x(n)$, the TEO introduced by Kaiser is given by
$$\Psi[x(n)] = x^{2}(n) - x(n+1)\,x(n-1)$$
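As a quick check of Kaiser's discrete operator $\Psi[x(n)] = x^{2}(n) - x(n+1)\,x(n-1)$, the sketch below applies it to a sampled sinusoid $x(n) = A\cos(\Omega n + \phi)$. The identity $\cos(a+b)\cos(a-b) = \cos^{2} a - \sin^{2} b$ gives the constant output $\Psi[x(n)] = A^{2}\sin^{2}\Omega$, so the TEO tracks both amplitude and frequency. The function name `teager_energy` is hypothetical.

```python
import math


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]**2 - x[n+1] * x[n-1].

    Only interior samples have both neighbours, so the result has len(x) - 2 values;
    element i of the result corresponds to input sample i + 1.
    """
    if len(x) < 3:
        raise ValueError("need at least three samples")
    return [x[n] ** 2 - x[n + 1] * x[n - 1] for n in range(1, len(x) - 1)]


if __name__ == "__main__":
    amplitude, omega, phase = 2.0, 0.3, 0.7
    signal = [amplitude * math.cos(omega * n + phase) for n in range(50)]
    psi = teager_energy(signal)
    expected = (amplitude * math.sin(omega)) ** 2
    print(max(abs(p - expected) for p in psi) < 1e-9)  # True
```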
A time-adaptive threshold is computed for the wavelet coefficients so that the variation of the noise over time is taken into account.
-
Masking construction:
For a selected band, the mask is obtained by
The voice activity shape V(n) is calculated by
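The mask and voice activity shape formulas are not reproduced in this text, so the following is only a hypothetical illustration of the general idea, not the method presented here: each band's absolute TEO is smoothed with a centered moving average, and the smoothed envelopes are summed across bands to give an activity contour. Equal band lengths, the window width and all function names are assumptions made for the sketch.

```python
def teager_energy_padded(x):
    """TEO with the edge values repeated so the output has the same length as x."""
    if len(x) < 3:
        raise ValueError("need at least three samples")
    psi = [x[n] ** 2 - x[n + 1] * x[n - 1] for n in range(1, len(x) - 1)]
    return [psi[0]] + psi + [psi[-1]]


def moving_average(x, width):
    """Centered moving average; the window is truncated at the signal edges."""
    half = width // 2
    out = []
    for n in range(len(x)):
        lo, hi = max(0, n - half), min(len(x), n + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out


def activity_shape(bands, width=5):
    """Hypothetical V(n): sum over bands of the smoothed |TEO| envelope."""
    length = len(bands[0])
    if any(len(band) != length for band in bands):
        raise ValueError("this sketch assumes equal-length bands")
    v = [0.0] * length
    for band in bands:
        envelope = moving_average([abs(e) for e in teager_energy_padded(band)], width)
        for n in range(length):
            v[n] += envelope[n]
    return v


if __name__ == "__main__":
    silent = [0.0] * 24
    burst = [0.0] * 8 + [0.0, 1.0, 0.0, -1.0] * 2 + [0.0] * 8
    v = activity_shape([silent, burst])
    print(v[0], v[12])  # 0.0 1.0
```

The contour is zero where every band is silent and rises only where a band carries energy, which is the property a voice activity shape needs before thresholding.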
-
Time-adaptive threshold calculation:
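The threshold formula from this slide is not reproduced in the text. As an assumption, one common way to make a threshold time-adaptive is to apply the Donoho-Johnstone universal threshold frame by frame, estimating the noise level in each frame from the median absolute value of the coefficients (the MAD estimator). The frame length and the function name `frame_thresholds` are hypothetical.

```python
import math
import statistics


def frame_thresholds(coeffs, frame_len):
    """Hypothetical time-adaptive threshold, one universal threshold per frame.

    For frame m: sigma_m = median(|d|) / 0.6745 and
    lambda_m = sigma_m * sqrt(2 * ln(N_m)), with N_m the number of coefficients
    in the frame. Returns one threshold per coefficient.
    """
    if frame_len < 1:
        raise ValueError("frame_len must be positive")
    thresholds = []
    for start in range(0, len(coeffs), frame_len):
        frame = coeffs[start:start + frame_len]
        sigma = statistics.median(abs(c) for c in frame) / 0.6745
        # max(..., 2) avoids a zero threshold for a one-sample tail frame
        lam = sigma * math.sqrt(2.0 * math.log(max(len(frame), 2)))
        thresholds.extend([lam] * len(frame))
    return thresholds


if __name__ == "__main__":
    quiet = [0.1, -0.1] * 8   # low-noise segment
    loud = [1.0, -1.0] * 8    # high-noise segment
    lam = frame_thresholds(quiet + loud, frame_len=16)
    print(round(lam[0], 4), round(lam[-1], 4))
```

The threshold rises in the noisier second frame, which is the behaviour a time-adaptive rule is meant to have; a single global threshold would either over-smooth the quiet part or under-denoise the loud part.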
-
Level 3: node-by-node denoising
-
Level 4: node-by-node denoising
-
Level 5: node-by-node denoising
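Node-by-node denoising at a given level can be sketched as follows: decompose the signal into all wavelet packet nodes of that level, soft-threshold each node with its own threshold, and reconstruct. This is a minimal sketch under stated assumptions, not the exact procedure behind the level 3, 4 and 5 results: it uses Haar filters, a full (uniform) packet tree and a per-node MAD-based universal threshold, and it leaves the lowest-frequency node unthresholded. All function names are hypothetical.

```python
import math
import statistics

SQRT2 = math.sqrt(2.0)


def haar_split(x):
    half = len(x) // 2
    approx = [(x[2 * k] + x[2 * k + 1]) / SQRT2 for k in range(half)]
    detail = [(x[2 * k] - x[2 * k + 1]) / SQRT2 for k in range(half)]
    return approx, detail


def haar_merge(approx, detail):
    """Inverse of haar_split."""
    x = []
    for a, d in zip(approx, detail):
        x.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return x


def packet_decompose(x, level):
    """All 2**level terminal nodes of a full Haar wavelet packet tree."""
    if len(x) % (2 ** level):
        raise ValueError("signal length must be divisible by 2**level")
    nodes = [list(x)]
    for _ in range(level):
        nodes = [part for node in nodes for part in haar_split(node)]
    return nodes


def packet_reconstruct(nodes):
    """Merge sibling nodes pairwise until a single signal remains."""
    while len(nodes) > 1:
        nodes = [haar_merge(nodes[i], nodes[i + 1]) for i in range(0, len(nodes), 2)]
    return nodes[0]


def soft_threshold(c, lam):
    return math.copysign(max(abs(c) - lam, 0.0), c)


def denoise_node_by_node(x, level):
    """Soft-threshold every node except node 0 with its own MAD-based universal threshold."""
    nodes = packet_decompose(x, level)
    cleaned = [nodes[0]]  # lowest-frequency node kept as is (assumption)
    for node in nodes[1:]:
        sigma = statistics.median(abs(c) for c in node) / 0.6745
        lam = sigma * math.sqrt(2.0 * math.log(max(len(node), 2)))
        cleaned.append([soft_threshold(c, lam) for c in node])
    return packet_reconstruct(cleaned)


if __name__ == "__main__":
    x = [math.sin(0.2 * n) for n in range(64)]
    for level in (3, 4, 5):
        y = packet_reconstruct(packet_decompose(x, level))
        error = max(abs(a - b) for a, b in zip(x, y))
        print(level, error < 1e-12, len(denoise_node_by_node(x, level)))
```

Deeper levels give narrower nodes with fewer coefficients each, so every per-node threshold is estimated from less data; this is one trade-off to keep in mind when comparing levels.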
-
Evaluation
To verify the effectiveness of the proposed algorithms, we compared the probabilities of speech detection and false alarm. The proposed methods are evaluated with receiver operating characteristic (ROC) curves, which show how well the VAD discriminates between noise-only and noisy speech frames in terms of the probability of correct detection (Pd), the fraction of speech frames correctly detected as speech, and the probability of false alarm (Pf), the fraction of noise-only frames wrongly detected as speech.
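Using the usual frame-level definitions (Pd: fraction of speech frames detected as speech; Pf: fraction of noise-only frames detected as speech), both probabilities can be computed directly from reference labels and VAD decisions, and sweeping a decision threshold over a per-frame score traces an ROC curve. A minimal sketch; the score, the threshold grid and the function names are hypothetical.

```python
def pd_pf(reference, decision):
    """Frame-level detection and false-alarm probabilities.

    reference, decision: equal-length sequences of booleans (True = speech).
    Pd = speech frames detected as speech / all speech frames
    Pf = noise-only frames detected as speech / all noise-only frames
    """
    if len(reference) != len(decision):
        raise ValueError("reference and decision must have the same length")
    speech = sum(1 for r in reference if r)
    noise = len(reference) - speech
    if speech == 0 or noise == 0:
        raise ValueError("need both speech and noise-only frames")
    hits = sum(1 for r, d in zip(reference, decision) if r and d)
    false_alarms = sum(1 for r, d in zip(reference, decision) if d and not r)
    return hits / speech, false_alarms / noise


def roc_points(reference, scores, thresholds):
    """(Pf, Pd) pairs obtained by declaring speech wherever score > threshold."""
    points = []
    for t in thresholds:
        pd, pf = pd_pf(reference, [s > t for s in scores])
        points.append((pf, pd))
    return points


if __name__ == "__main__":
    reference = [True, True, False, False]
    scores = [0.9, 0.8, 0.3, 0.1]
    print(pd_pf(reference, [True, False, True, False]))  # (0.5, 0.5)
    print(roc_points(reference, scores, [0.0, 0.5, 1.0]))  # [(1.0, 1.0), (0.0, 1.0), (0.0, 0.0)]
```

Multiplying the returned fractions by 100 gives percentages in the same form as the Pd and Pf columns of the results table.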
-
Wavelet filter type (filter length) | Probability of correct detection, Pd (%) | Probability of false alarm, Pf (%) | Computation time, CP (s)
Daubechies 2 | 86.4 | 15.6 | 2.872
Daubechies 4 | 89.3 | 11.7 | 2.884
Daubechies 8 | 91.8 | 9.2 | 3.023
Daubechies 10 | 94.3 | 5.7 | 3.074
Daubechies 12 | 94.5 | 5.5 | 3.898
Daubechies 14 | 94.8 | 5.2 | 3.899
-
References:
Shi-Huang Chen, Hsin-Te Wu, Yukon Chang, and T. K. Truong, "Robust voice activity detection using perceptual wavelet-packet transform and Teager energy operator," Pattern Recognition Letters, vol. 28, pp. 1327-1332, 2007.
I. Daubechies, Ten Lectures on Wavelets, CBMS-NSF Regional Conference Series in Applied Mathematics, SIAM, 1992.
D. L. Donoho and I. M. Johnstone, "Ideal spatial adaptation by wavelet shrinkage," Biometrika, vol. 81, pp. 425-455, 1994.
S. Mallat, "A theory for multiresolution signal decomposition: The wavelet representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, no. 7, pp. 674-693, July 1989.
M. Berouti, R. Schwartz, and J. Makhoul, "Enhancement of speech corrupted by acoustic noise," in Proc. IEEE ICASSP, Apr. 1979, pp. 208-211.
I. M. Johnstone and B. W. Silverman, "Wavelet threshold estimators for data with correlated noise," Journal of the Royal Statistical Society, Series B, vol. 59, pp. 319-351, 1997.
G. D. Forney, Jr., "Exponential error bounds for erasure, list, and decision feedback schemes," IEEE Transactions on Information Theory, vol. 14, no. 2, pp. 206-220, Mar. 1968.
-