Subband Feature Statistics Normalization Techniques Based on a Discrete Wavelet Transform for Robust Speech Recognition
Jeih-weih Hung, Member, IEEE, and Hao-Teng Fan
Wen-Yi Chu
Department of Computer Science & Information Engineering
National Taiwan Normal University
Outline
Introduction
Subband Feature Statistics Normalization Method
Experimental Setup
Experimental Results And Discussions
Concluding Remarks And Future Works
Introduction
• This letter proposes a novel scheme that applies feature statistics normalization techniques for robust speech recognition.
• Partially motivated by the above observations, we propose decomposing the feature stream into subband streams and then performing the normalization process on some or all of the subband streams separately. The new feature stream is reconstructed by properly integrating all substreams.
• In particular, the above decomposition and reconstruction procedures are based on the well-known discrete wavelet transform (DWT).
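• Given that the frame rate of the feature stream {x[n]} is F_s (in Hz), and that {x[n]} is within the modulation spectral band [0, F_s/2], the band range of the l-th subband stream {x_l[n]} can be approximately represented as [0, F_s/2^L] for l = 1 and [F_s/2^(L-l+2), F_s/2^(L-l+1)] for l = 2, ..., L.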
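• If MVN is selected as the normalization method, each subband stream {x_l[n]} is mapped to an updated stream {x̃_l[n]} whose mean and variance equal the target mean and variance for that subband.
• If HEQ is selected as the normalization method, each subband stream {x_l[n]} is mapped to an updated stream {x̃_l[n]} whose distribution matches the target distribution function for that subband.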
• Finally, we reconstruct the new feature stream for the utterance from the updated subband streams {x̃_l[n]}, together with the other unchanged streams, using the (L-1)-level inverse discrete wavelet transform (IDWT), as depicted on the right side of Fig. 1.
• In SB-MVN, the streams corresponding to different subbands have different target means and variances. A similar condition holds for SB-HEQ: the streams for different subbands employ different target distribution functions.
• In the proposed methods, the subbands are narrower and more densely placed at the lower frequencies.
• Due to the down-sampling operation in DWT, the total number of data points across all of the subband streams is approximately equal to that of the original stream.
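For reference, the two per-subband mappings can be written in their standard textbook forms. The notation is an assumption introduced here for illustration and is not taken from the slides: x_l[n] is the l-th subband stream of an utterance, x̃_l[n] is its normalized version, μ_l and σ_l are the mean and standard deviation of that stream, the "tgt" superscript marks the target statistics of subband l, F_l is the empirical distribution function of the stream, and F_l^tgt is the target distribution function.

```latex
% MVN on subband l: shift and scale the stream to the target mean and variance
\tilde{x}_l[n] = \frac{\sigma_l^{\mathrm{tgt}}}{\sigma_l}\,\bigl(x_l[n] - \mu_l\bigr) + \mu_l^{\mathrm{tgt}}

% HEQ on subband l: map the stream's distribution onto the target distribution
\tilde{x}_l[n] = \bigl(F_l^{\mathrm{tgt}}\bigr)^{-1}\!\bigl(F_l(x_l[n])\bigr)
```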
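The decompose, normalize, and reconstruct steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the Haar wavelet (the slides do not name the wavelet), assumes the stream length is a multiple of 2^(L-1), and all function names (`haar_analysis`, `decompose`, `sb_mvn`, and so on) are hypothetical.

```python
import math

def haar_analysis(x):
    """One Haar DWT level: return (approximation, detail), each half as long as x."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail

def haar_synthesis(approx, detail):
    """Invert one Haar DWT level."""
    s = math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.extend([(a + d) / s, (a - d) / s])
    return x

def decompose(x, num_subbands):
    """(L-1)-level DWT producing L subband streams, lowest band first."""
    details = []
    approx = list(x)
    for _ in range(num_subbands - 1):
        approx, detail = haar_analysis(approx)
        details.append(detail)          # highest band is produced first
    return [approx] + details[::-1]     # reorder: lowest band first

def reconstruct(subbands):
    """(L-1)-level IDWT from subband streams ordered lowest band first."""
    approx = subbands[0]
    for detail in subbands[1:]:
        approx = haar_synthesis(approx, detail)
    return approx

def mvn(stream, target_mean, target_std):
    """Shift and scale a stream so it has the target mean and standard deviation."""
    n = len(stream)
    mean = sum(stream) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in stream) / n)
    if std == 0.0:                      # constant stream: only the mean can be set
        return [target_mean] * n
    return [target_std * (v - mean) / std + target_mean for v in stream]

def sb_mvn(x, targets, num_subbands=4):
    """Normalize selected subbands; subbands absent from `targets` stay unchanged.

    targets maps a subband index (0 = lowest band) to (target_mean, target_std).
    """
    subbands = decompose(x, num_subbands)
    for idx, (mu, sigma) in targets.items():
        subbands[idx] = mvn(subbands[idx], mu, sigma)
    return reconstruct(subbands)
```

For example, `sb_mvn(stream, {0: (0.0, 1.0), 1: (0.0, 1.0)})` would normalize only the two lowest subbands of one feature dimension and leave the others untouched. A library such as PyWavelets (`pywt.wavedec` / `pywt.waverec`) could replace the hand-written Haar routines and would also handle boundary padding for arbitrary lengths.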
Experimental Setup
• Each feature sequence for each utterance in both the training and testing sets is decomposed into L subband streams. For each subband, the features of all of the utterances in the training set are used to estimate the required target statistics, which will be used for each utterance in the training and testing sets.
• The parameter L is preliminarily set to 4, which indicates that a three-level DWT is performed, and the frequency ranges for the four octave subband streams are approximately [0, 6.25 Hz], [6.25 Hz, 12.5 Hz], [12.5 Hz, 25 Hz], and [25 Hz, 50 Hz], respectively.
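The setup above can be illustrated with a short sketch that computes the approximate octave band edges and pools training data to estimate per-subband target statistics. The function names are hypothetical, and the 100 Hz frame rate is an inference from the 50 Hz upper edge of the top band rather than a value stated in the slides. The statistics shown are the mean and standard deviation needed for MVN; HEQ would instead need a pooled distribution function.

```python
import math

def subband_edges(frame_rate_hz, num_subbands):
    """Approximate octave band edges in Hz, lowest band first."""
    nyquist = frame_rate_hz / 2.0
    # Lowest band: what remains after (L-1) successive halvings of [0, Nyquist].
    edges = [(0.0, nyquist / 2 ** (num_subbands - 1))]
    for l in range(2, num_subbands + 1):
        upper = nyquist / 2 ** (num_subbands - l)
        edges.append((upper / 2.0, upper))
    return edges

def pooled_target_stats(training_subbands):
    """Estimate (mean, std) per subband from all training utterances.

    training_subbands: one entry per utterance, each a list of L subband
    streams (lists of floats), lowest band first.
    """
    num_subbands = len(training_subbands[0])
    stats = []
    for l in range(num_subbands):
        # Pool the l-th subband stream of every training utterance.
        pooled = [v for utt in training_subbands for v in utt[l]]
        mean = sum(pooled) / len(pooled)
        var = sum((v - mean) ** 2 for v in pooled) / len(pooled)
        stats.append((mean, math.sqrt(var)))
    return stats

print(subband_edges(100.0, 4))
# [(0.0, 6.25), (6.25, 12.5), (12.5, 25.0), (25.0, 50.0)]
```

The resulting statistics would then be applied unchanged to every training and testing utterance, so both sets are mapped toward the same per-subband targets.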
Experimental Results And Discussions (1/3)
• The results in Fig. 2 indicate that all of the normalization methods provide a significant accuracy improvement for all noise types.
Experimental Results And Discussions (2/3)
• These results are somewhat consistent with the observation in past research that the modulation frequency components between 1 Hz and 16 Hz are particularly important for speech recognition.
• These results imply that, given a fixed number of subbands, placing more subbands in lower frequencies is more helpful in the proposed methods.
Experimental Results And Discussions (3/3)
Concluding Remarks And Future Works
• In this letter, we propose performing a normalization process on the subband feature streams and show that the subband MVN and HEQ are superior to the conventional full-band MVN and HEQ.
• In future work, we will integrate other normalization techniques, such as HOCMN and CSN, into the subband processing scheme to determine whether better performance can be achieved.
• We will also apply other types of wavelet functions in the DWT and IDWT processes of our approach to investigate whether a different analysis/synthesis operation influences the recognition accuracy.