International Journal of Computer Applications (0975 – 8887)
Volume 74 – No. 5, July 2013

Text Dependent Speaker Recognition using MFCC features and BPANN

Praveen N
Research Scholar
Department of Electronics
Audio and Image Research Laboratory
Cochin University of Science and Technology
Kochi, India 682022

Tessamma Thomas
Professor
Department of Electronics
Cochin University of Science and Technology
Kochi, India 682022

ABSTRACT
Mel-Frequency Cepstral Coefficients (MFCCs) are spectral features that are widely used for speaker recognition, and text dependent speaker recognition systems are the most accurate among voice based authentication systems. In this paper, a text dependent speaker recognition method is developed. MFCCs are computed for a selected sentence. The first 13 MFCCs are considered for each frame of duration 26 ms, and each coefficient is clustered into 5 cluster centres, giving a 65 element speech code vector for the entire utterance. A multi-layer perceptron is trained on the speech code using gradient descent backpropagation, and the network is tested on various test patterns. The performance is measured using the FAR, FRR and EER parameters. The recognition rate achieved is 96.18% for a cluster size of 5 in each coefficient.

General Terms
Speaker Recognition, Artificial Neural Networks, Clustering

Keywords
Mel-Frequency Cepstral Coefficients, False Acceptance Rate, False Rejection Rate, Equal Error Rate

1. INTRODUCTION
Speaker verification and identification are typically pattern recognition problems. The important step in speaker recognition is the extraction of relevant features from the speech data that are used to characterize the speakers. There are two kinds of speaker dependent voice characteristics: high level and low level attributes [1][2].
Low level attributes, which are related to the physical structure of the vocal tract, are categorized as spectral characteristics, whereas high level attributes are prosody (pitch, duration, energy) or behavioral cues such as dialect, word usage, conversation patterns and topics of conversation.

There are two types of speaker recognition methods: text dependent and text independent. The text dependent method uses phoneme context information, and hence high recognition accuracy is easily achieved. The text independent method does not require specially designed utterances and hence is user friendly.

A general speaker recognition system consists of an enrolment phase and a recognition phase. In the enrolment phase, speech samples are collected from the speakers to train their models. The collection of enrolled models is also called a speaker database. In the recognition (identification) phase, a test sample from an unknown speaker is compared against the speaker database. Both phases include a feature extraction step to extract the speaker dependent characteristics from speech.

2. METHODOLOGY
Acoustic analysis based on MFCCs has given good results in speaker recognition. MFCCs have also proved robust to variations such as noise, prosody and intonation. In this paper, speech samples of a given text are recorded for 40 speakers. 13 MFCCs are computed for each of about 40-45 frames of voiced speech. Vector Quantisation (VQ) based K-means clustering is then applied to the entire set of MFCCs with respect to the cluster index. The resulting code vector is used to train a discriminative classifier, a multilayer perceptron ANN with gradient descent backpropagation. The system is tested with a separate data set of test patterns. The performance of the system is measured using the false acceptance rate (FAR), false rejection rate (FRR) and equal error rate (EER). The result is compared with that of a minimum distance based classifier.
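The clustering step can be illustrated with a small sketch. It assumes the MFCCs are arranged as one 13 element list per frame, clusters the values of each coefficient across frames into 5 centres with a plain one dimensional K-means, and concatenates the centres into a 65 element vector. The function names, the random initialisation, the ascending ordering of the centres and the synthetic input are all assumptions made for illustration; the paper's actual clustering procedure may differ.

```python
import random


def kmeans_1d(values, k, iters=50, seed=0):
    """Plain Lloyd's K-means on scalars; returns k centres in ascending order."""
    rng = random.Random(seed)
    centres = rng.sample(values, k)  # assumption: initialise from k data points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centres[j]))
            clusters[nearest].append(v)
        # Keep the old centre if a cluster happens to be empty.
        new_centres = [
            sum(c) / len(c) if c else centres[j] for j, c in enumerate(clusters)
        ]
        if new_centres == centres:
            break
        centres = new_centres
    return sorted(centres)  # assumption: a fixed (ascending) order of centres


def speech_code_vector(mfcc_frames, k=5):
    """Concatenate k cluster centres per coefficient into one code vector."""
    n_coeffs = len(mfcc_frames[0])
    code = []
    for c in range(n_coeffs):
        track = [frame[c] for frame in mfcc_frames]  # one coefficient over time
        code.extend(kmeans_1d(track, k))
    return code


# Synthetic stand-in for real MFCCs: 42 frames of 13 coefficients each.
rng = random.Random(1)
frames = [[rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(42)]
code = speech_code_vector(frames)
print(len(code))  # 13 coefficients x 5 centres = 65
```

Whatever the number of frames, the code vector has a fixed length of 13 x 5 = 65, which is what allows it to be fed to a network with a fixed number of inputs.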
The methodology adopted is depicted in Fig. 1. In the training phase, speech is recorded and pre-processed. MFCCs are computed and clustered using the K-means clustering method. A backpropagation neural network is trained to recognise the speech code vector thus formed, and the trained network is stored. In the testing phase, the test speech samples are pre-processed, and MFCCs are extracted and clustered. The test patterns are given to the trained neural network, which decides whether the speech pattern is present in or absent from the database.

3. FEATURE EXTRACTION
3.1 Speech features
According to Kinnunen [3], a vast number of features have been proposed for speaker recognition, such as:
Spectral features
Dynamic features
Source features
Supra segmental features
High level features

Spectral features describe the short time speech spectrum and hence relate to the physical characteristics of the vocal tract.
International Journal of Computer Applications (0975 – 8887)
Volume 74– No.5, July 2013
Text Dependent Speaker Recognition using MFCC
features and BPANN
Praveen N
Research Scholar
Department of Electronics
Audio and Image Research Laboratory
Cochin University of Science and Technology
Kochi, India 682022
Tessamma Thomas
Professor
Department of Electronics
Cochin University of Science and Technology
Kochi, India 682022
ABSTRACT
Mel-Frequency Cepstral Coefficients (MFCCs) are spectral features
that are widely used for speaker recognition, and text
dependent speaker recognition systems are the most accurate
among voice based authentication systems. In this paper, a text
dependent speaker recognition method is developed. MFCCs
are computed for a selected sentence. The first 13 MFCCs are
considered for each frame of duration 26 ms, and each
coefficient is clustered into 5 cluster centres, giving
a 65 element speech code vector for the
entire utterance. A multi-layer perceptron is trained on the
speech code using gradient descent backpropagation, and the
network is tested on various test patterns. The performance is
measured using the FAR, FRR and EER parameters. The
recognition rate achieved is 96.18% for a cluster size of 5 in