Perceptual WPT and time-adaptive level thresholding based enhancement of degraded speech

Presented by Nitesh Kumar Chaudhary, Department of Electronics & Communication Engineering, The LNM Institute of Information Technology, Jaipur

Under the supervision of Dr. Navneet Upadhyay

Why speech enhancement? The presence of noise can significantly reduce the intelligibility of speech and degrade the performance of automatic speech recognition. Noise reduction has therefore become an important issue in speech processing systems such as speech coders and speech recognizers.

Speech can be degraded by:

(a) Additive acoustic noise, such as the background noise picked up when speech is recorded in a noisy environment like an aircraft cockpit.
(b) Acoustic reverberation, which results from the additive effect of multiple reflections of the acoustic signal.
(c) Convolutive channel effects, which produce an uneven or band-limited response when the communication channel is not modeled accurately enough for the channel equalizer to remove the channel impulse response.
(d) Electrical interference.
(e) Codec distortion, caused by the compression in the coding algorithm.
(f) Distortion introduced by the recording apparatus, such as a poor microphone response.

Keywords: Perceptual wavelet packet transform (PWPT), time-adaptive thresholding, Teager energy operator (TEO), probability of detection (Pd), probability of false alarm (Pf), masking.

Block Diagram

Perceptual Wavelet Packet Transform: The Wavelet Packet Transform (WPT) is a time-frequency analysis tool. It maps the signal into a domain that contains both time and frequency information.

In wavelet analysis, a signal is split into an approximation and a detail. The approximation is then itself split into a second-level approximation and detail, and the process is repeated.
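The repeated approximation/detail split can be sketched as follows. This is a minimal illustration, assuming the Haar filter pair (Daubechies 1) for simplicity rather than the longer Daubechies filters evaluated later; the function names `haar_split` and `dwt` are hypothetical.

```python
import numpy as np

def haar_split(x):
    """One analysis step: low-pass approximation and high-pass detail, each half as long as x."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:
        raise ValueError("signal length must be even")
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)  # Haar low-pass filter + downsample by 2
    detail = (even - odd) / np.sqrt(2.0)  # Haar high-pass filter + downsample by 2
    return approx, detail

def dwt(x, levels):
    """Wavelet analysis: only the approximation is split again at each level.
    Returns [cA_L, cD_L, ..., cD_2, cD_1]."""
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        approx, detail = haar_split(approx)
        details.append(detail)
    return [approx] + details[::-1]

x = np.arange(8.0)
coeffs = dwt(x, 3)
print([len(c) for c in coeffs])  # [1, 1, 2, 4]
```

Because the Haar pair is orthonormal, the total energy of the coefficients equals the energy of the input signal.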

In the wavelet packet case, each detail coefficient vector is also split into two parts, using the same approach as for the approximation vector. The perceptual WPT keeps the part of this tree whose nodes approximate the critical bands of human hearing; 17 critical bands are selected because, for speech sampled at 8 kHz (bandwidth 0 to 4 kHz), 17 critical bands are required to cover the entire frequency range.
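A full wavelet packet split, and a quick check of the 17-band figure, can be sketched as below. This is a hedged illustration, not the presentation's 17-node perceptual tree (which is not reproduced here): it assumes Haar filters, returns the full tree at one level, and uses the Zwicker and Terhardt (1980) Bark approximation, a swapped-in formula, only to confirm that 0 to 4 kHz spans about 17 critical bands. The names `wavelet_packet` and `bark` are hypothetical.

```python
import math
import numpy as np

def haar_split(x):
    """Haar analysis step: (approximation, detail), each half as long as x."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    return (even + odd) / math.sqrt(2.0), (even - odd) / math.sqrt(2.0)

def wavelet_packet(x, levels):
    """Full wavelet packet tree: at every level BOTH approximation and detail nodes
    are split. Returns the 2**levels terminal nodes in natural (Paley) order,
    which is not the same as increasing-frequency order."""
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(levels):
        children = []
        for node in nodes:
            children.extend(haar_split(node))
        nodes = children
    return nodes

def bark(f_hz):
    """Zwicker & Terhardt (1980) approximation of critical-band rate, in Bark."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

fs = 8000.0
x = np.random.default_rng(0).standard_normal(64)
nodes = wavelet_packet(x, 3)
print(len(nodes), [len(n) for n in nodes])  # 8 nodes of 8 coefficients each
print(f"{bark(fs / 2):.1f} Bark at {fs / 2:.0f} Hz")  # about 17.3 Bark
```

A perceptual WPT prunes such a tree so that low frequencies get narrow nodes and high frequencies get wide ones, following the Bark scale.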

TEO & level-dependent thresholding: The Teager energy operator (TEO) is a powerful non-linear operator that has been used successfully in various speech applications. It can be used to estimate the second-moment angular bandwidth of a signal and the moments of the signal's duration and of its spectrum, and it can track the energy of quite complicated signals. For a band-limited discrete-time signal x(n), the TEO introduced by Kaiser is given by

$$\Psi[x(n)] = x^2(n) - x(n+1)\,x(n-1)$$
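Kaiser's discrete operator, psi[x(n)] = x(n)^2 - x(n+1) x(n-1), takes one line of NumPy. The sketch below (function name `teager` is hypothetical) also checks the classic property that, for a pure tone A cos(wn + phi), the TEO is the constant A^2 sin^2(w), so it reflects both amplitude and frequency.

```python
import numpy as np

def teager(x):
    """Discrete Teager energy operator: psi[n] = x[n]**2 - x[n+1] * x[n-1].
    Returns len(x) - 2 values (the end samples lack a neighbour)."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[2:] * x[:-2]

# For a pure tone A*cos(w*n + phi) the TEO is the constant A**2 * sin(w)**2,
# so it grows with both the amplitude and the frequency of the oscillation.
n = np.arange(200)
A, w = 0.5, 0.3
psi = teager(A * np.cos(w * n + 0.1))
print(np.allclose(psi, (A * np.sin(w)) ** 2))  # True
```

In a PWPT-based scheme this would be applied to the coefficients of each selected band rather than to the raw waveform.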

A time-adaptive threshold is computed for the wavelet coefficients so that the variation of the noise level over time is taken into account.
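The exact threshold rule is not given in this text, so the following is only a sketch under an assumption: the universal threshold of Donoho and Johnstone, sigma * sqrt(2 ln N), with the noise level sigma re-estimated in a sliding window (median absolute value / 0.6745) so that the threshold follows the noise over time, followed by soft thresholding. The function names and the window length of 64 are hypothetical; in a level-dependent scheme the function would be applied separately to each wavelet packet node.

```python
import numpy as np

def time_adaptive_threshold(c, win=64):
    """Hypothetical time-adaptive universal threshold.
    sigma(n) = median(|c|) / 0.6745 over a window centred on n, and
    thr(n) = sigma(n) * sqrt(2 ln N), with N the number of coefficients."""
    c = np.asarray(c, dtype=float)
    N = len(c)
    half = win // 2
    thr = np.empty(N)
    for n in range(N):
        seg = c[max(0, n - half): min(N, n + half + 1)]
        sigma = np.median(np.abs(seg)) / 0.6745  # robust noise-level estimate
        thr[n] = sigma * np.sqrt(2.0 * np.log(N))
    return thr

def soft_threshold(c, thr):
    """Shrink each coefficient toward zero by thr; coefficients with |c| <= thr become 0."""
    c = np.asarray(c, dtype=float)
    return np.sign(c) * np.maximum(np.abs(c) - thr, 0.0)

# Noise whose level jumps from 0.1 to 1.0 halfway through the node.
rng = np.random.default_rng(0)
c = np.concatenate([0.1 * rng.standard_normal(512), 1.0 * rng.standard_normal(512)])
thr = time_adaptive_threshold(c)
den = soft_threshold(c, thr)
print(thr[:256].mean(), thr[768:].mean())  # the threshold follows the noise level
```

A single global threshold would either over-smooth the quiet half or leave noise in the loud half; recomputing sigma locally avoids that trade-off.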

For a selected band, the mask is obtained by

The voice activity shape V(n) is calculated by

Masking construction:

Time-adaptive threshold calculation:

Level 3, node-by-node denoising

Evaluation: To verify the effectiveness of the proposed algorithms, we compared the speech-detection and false-alarm probabilities obtained with different wavelet filters.

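The proposed methods are evaluated with receiver operating characteristic (ROC) curves, which show how well the voice activity detector (VAD) discriminates between noise-only and noisy-speech frames in terms of the probability of correct detection (Pd) and the probability of false alarm (Pf), where Pd = (speech frames detected as speech) / (total speech frames) and Pf = (noise-only frames detected as speech) / (total noise-only frames).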
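Computing Pd and Pf from frame-level decisions, and tracing an ROC curve by sweeping the decision threshold, can be sketched as follows. The per-frame scores and labels are made-up toy values, and the function names are hypothetical.

```python
import numpy as np

def pd_pf(decisions, labels):
    """Pd = % of speech frames detected as speech;
    Pf = % of noise-only frames wrongly detected as speech."""
    d = np.asarray(decisions, dtype=bool)
    y = np.asarray(labels, dtype=bool)  # True = speech frame, False = noise-only frame
    pd = 100.0 * np.sum(d & y) / np.sum(y)
    pf = 100.0 * np.sum(d & ~y) / np.sum(~y)
    return float(pd), float(pf)

def roc_points(scores, labels, thresholds):
    """Trace an ROC curve by sweeping the decision threshold on a per-frame VAD score."""
    scores = np.asarray(scores, dtype=float)
    return [pd_pf(scores > t, labels) for t in thresholds]

labels    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
decisions = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(pd_pf(decisions, labels))  # Pd = 75 %, Pf is about 16.7 %

scores = [0.9, 0.8, 0.7, 0.2, 0.6, 0.1, 0.3, 0.2, 0.1, 0.05]
print(roc_points(scores, labels, [float("-inf"), 0.5, float("inf")]))
```

A lower threshold raises Pd at the cost of a higher Pf; each threshold gives one point on the ROC curve.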

Wavelet filter (type and length) | Probability of correct detection, Pd (%) | Probability of false alarm, Pf (%) | Computation time (s)
Daubechies 2  | 86.4 | 15.6 | 2.872
Daubechies 4  | 89.3 | 11.7 | 2.884
Daubechies 8  | 91.8 |  9.2 | 3.023
Daubechies 10 | 94.3 |  5.7 | 3.074
Daubechies 12 | 94.5 |  5.5 | 3.898
Daubechies 14 | 94.8 |  5.2 | 3.899

References:
S.-H. Chen, H.-T. Wu, Y. Chang and T. K. Truong, "Robust voice activity detection using perceptual wavelet-packet transform and Teager energy operator," Pattern Recognition Letters, vol. 28, pp. 1327-1332, 2007.
I. Daubechies, Ten Lectures on Wavelets, CBMS-NSF Conference Series in Applied Mathematics, SIAM, 1992.
D. L. Donoho and I. M. Johnstone, "Ideal spatial adaptation via wavelet shrinkage," Biometrika, vol. 81, pp. 425-455, 1994.
S. Mallat, "A theory for multiresolution signal decomposition: The wavelet representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, no. 7, pp. 674-693, July 1989.
M. Berouti, R. Schwartz and J. Makhoul, "Enhancement of speech corrupted by acoustic noise," in Proc. IEEE ICASSP, Apr. 1979, pp. 208-211.
I. M. Johnstone and B. W. Silverman, "Wavelet threshold estimators for data with correlated noise," J. Roy. Stat. Soc. B, vol. 59, pp. 319-351, 1997.
G. D. Forney, Jr., "Exponential error bounds for erasure, list, and decision feedback schemes," IEEE Transactions on Information Theory, vol. 14, no. 2, pp. 206-220, Mar. 1968.
