American Journal of Computer Science and Engineering 2015; 2(5): 32-37
Published online August 30, 2015 (http://www.openscienceonline.com/journal/ajcse)

A Hybrid Model of MFCC/MSFLA for Speaker Recognition

Majida Ali Abed 1, Hamid Ali Abed Alasadi 2

1 College of Computers Sciences & Mathematics, University of Tikrit, Tikrit, Iraq
2 Computers Sciences Department, Education for Pure Science College, University of Basra, Basra, Iraq

Email address
[email protected] (M. A. Abed), [email protected] (H. A. A. Alasadi)

To cite this article
Majida Ali Abed, Hamid Ali Abed Alasadi. A Hybrid Model of MFCC/MSFLA for Speaker Recognition. American Journal of Computer Science and Engineering. Vol. 2, No. 5, 2015, pp. 32-37.

Abstract
In this paper, a speaker recognition system is optimized using a Swarm Intelligence algorithm, the Modified Shuffled Frog Leaping Algorithm (MSFLA), together with cepstral analysis and the Mel Frequency Cepstral Coefficients (MFCC) feature extraction approach. The MSFLA search is applied to speaker recognition: a fitness function guides the search so that voice matching is performed only on the optimized features produced by the MSFLA. On the same dataset, the recognition accuracy under white Gaussian noise, car noise and B-noise is 94.02%, 96.78% and 84.33%, respectively, using the hybrid MFCC/MSFLA model.

Keywords
Speaker Recognition, Mel Frequency Cepstral Coefficients (MFCCs), Modified Shuffled Frog Leaping Algorithm (MSFLA)

1. Introduction
Speaker recognition systems became a topic of research in the early 1970s [1]. Some of the first studies of speaker recognition, published in 1971, used feature extraction techniques including pitch contours [2], Linear Prediction (LP), cepstral analysis, linear prediction error energy and autocorrelation coefficients. Current speaker recognition research depends on cepstral analysis, and the Mel Frequency Cepstral Coefficients (MFCC) are the most common short-time feature extraction approach [3]. Speaker recognition comprises speaker identification and speaker verification, both based on the speaker's voice in the form of speech. The speech signal carries information about the spoken message, the speaker and the recording environment. For speaker recognition, speech data from a speaker is collected and used to develop a model that captures the speaker-specific information. For text-independent speaker recognition, the speech data is usually about one minute long. Speaker models fall into two groups [4]:
(1). Statistical models, such as the Gaussian Mixture Model, Hidden Markov Model, Support Vector Machines (SVM) and Vector Quantization (VQ).
(2). Neural network models, such as the feed-forward auto-associative network.
These two kinds of model are now used as classification methods in speaker recognition in combination with evolutionary algorithms, such as genetic algorithms and genetic programming, and with Swarm Intelligence (SI) algorithms, such as Ant Colony Optimization (ACO), Bee Colony Optimization (BCO), Cat Swarm Optimization (CSO), the Shuffled Frog Leaping Algorithm (SFLA) and the Cuckoo Search Algorithm (CSA). The process of speaker recognition is optimized by the fitness function of these algorithms, with voice matching performed only on the optimized features produced by the SI algorithm [5, 6]. In this paper we use the Modified Shuffled Frog Leaping Algorithm (MSFLA). The paper is organized as follows. Section 2 discusses the principle of speaker recognition, and Section 3 the feature extraction used in this paper. In Sections 4 and 5, the principle of MSFLA and the speaker recognition system using the MSFLA are described, respectively. The performance of the recognition systems based on principle of speaker recognition and system features is evaluated, and the
Figure (3). Process of our proposed Speaker Recognition.
5. Voices Speaker Matching
After the feature extraction stage, the extracted voice features
of the speakers are stored, and these stored features must be
matched against the features of the input voice. Matching is
based on the similarity between the two: the stored features
closest to the input features are taken as the match. To avoid
a false match at any stage of the system, especially when the
speaker is not legitimate, a small threshold value is used to
decide whether to reject a speaker. The threshold specifies a
probability ratio that indicates the degree of match. The voice
is then either accepted or rejected. Acceptance means that the
voice is matched and the speaker is legitimate; otherwise the
voice is rejected. A match between the input voice and a
database voice is declared when their similarity is high; a
value below the threshold is disregarded, and the speaker is
denied access. In our paper, text-dependent speaker recognition
is used, in which the enrollment and test passwords are the
same [17]. Figure (3) shows the process of the proposed speaker
recognition using the Modified Shuffled Frog Leaping Algorithm.
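The accept/reject decision described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the use of Pearson correlation as the similarity measure, the threshold of 0.9, the function names and the four-value feature vectors are all assumptions made for the example. The MSFLA search itself is not shown; a plain linear scan over the database stands in for it.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length feature vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant vector carries no correlation information
    return cov / math.sqrt(var_a * var_b)


def match_speaker(input_features, database, threshold=0.9):
    """Return (speaker_id, score) for the best match.

    speaker_id is None when the best score is below the threshold,
    i.e. the speaker is rejected. The threshold value is hypothetical.
    """
    best_id, best_score = None, -1.0
    for speaker_id, stored in database.items():
        score = pearson(input_features, stored)
        if score > best_score:
            best_id, best_score = speaker_id, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score


# Toy stored features (made-up numbers, not real MFCC vectors).
database = {
    "speaker_a": [1.0, 2.0, 3.0, 4.0],
    "speaker_b": [4.0, 1.0, 3.0, 2.0],
}

accepted = match_speaker([1.1, 2.0, 2.9, 4.2], database)  # close to speaker_a
rejected = match_speaker([2.0, 2.0, 3.0, 1.0], database)  # close to nobody
```

In this sketch a higher threshold makes the system stricter: fewer impostors are accepted, at the cost of rejecting more legitimate speakers whose input is noisy.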
6. Simulation and Results
This section describes the simulation, carried out in
MATLAB, and discusses its results. The database of our
system contains different utterances from 40 different
speakers, both male and female (examples are shown in
Figure (4)), and each speaker has spoken 5 different
sentences.
Figure (4). Speaker signal examples: (a) male, (b) female.
The database requires that the extracted features of each user
correspond to different utterances. In our work we use the Mel
Frequency Cepstral Coefficients (MFCC), the popular acoustic
features used in speech recognition systems for different
speech data. The feature database of the utterances is built
using MFCC, both to make a robust speech recognizer for
different users and to let the MSFLA work efficiently. The
MSFLA accesses the extracted features to search for the best
match. Different types of noise (white Gaussian noise, car
noise and B-noise) are added to the utterances, the features of
the noisy signal are extracted, and the MSFLA searches for the
best match for these features with respect to the feature
database and reports the result. The recognition accuracy
obtained using MFCC features with MSFLA for the various noise
conditions, on the same dataset, is shown in Figure (5). The
recognition accuracy for added white Gaussian noise, car noise
and B-noise is 94.02%, 96.78% and 84.33%, respectively.
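As an illustration of the noise-addition step only, the sketch below adds white Gaussian noise to a signal at a chosen signal-to-noise ratio (SNR). The paper does not state the SNR levels used, so the 10 dB value, the 8000 Hz sampling rate, the synthetic sine-wave "utterance" and all function names are assumptions. Car noise and B-noise would require recorded noise samples and are not covered here.

```python
import math
import random


def signal_power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def add_white_gaussian_noise(samples, snr_db, rng):
    """Return a copy of samples with white Gaussian noise at the given SNR (dB)."""
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power(samples) / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)  # standard deviation of zero-mean noise
    return [s + rng.gauss(0.0, sigma) for s in samples]


def measured_snr_db(clean, noisy):
    """SNR in dB actually achieved, computed from the clean and noisy signals."""
    noise = [n - c for c, n in zip(clean, noisy)]
    return 10 * math.log10(signal_power(clean) / signal_power(noise))


rng = random.Random(0)  # fixed seed so the example is repeatable

# One second of a 200 Hz tone sampled at 8000 Hz stands in for an utterance.
clean = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(8000)]
noisy = add_white_gaussian_noise(clean, snr_db=10, rng=rng)
snr = measured_snr_db(clean, noisy)  # close to the requested 10 dB
```

The noisy signal would then go through the same MFCC extraction as the clean enrollment data before the search for the best match.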
Figure (5). Simulation results for different types of noises.
7. Conclusion
This paper is based on a Swarm Intelligence algorithm, the
Modified Shuffled Frog Leaping Algorithm (MSFLA). The aim
of using this algorithm in biometrics is to identify an
individual by a distinguishing characteristic, here the voice.
The MSFLA search was applied to speaker recognition: a
fitness function guides the search so that voice matching is
performed only on the optimized features produced by the
MSFLA. The recognition accuracy was found to be high using
the hybrid MFCC/MSFLA model (MFCC features with MSFLA)
under various noise conditions. This work treats the hybrid
MFCC/MSFLA model as a system reliability optimization with
a multi-criteria approach; in further work, it may provide
useful insights into patterns of interaction among
articulatory-acoustic feature dimensions.
References
[1] D. Ververidis, C. Kotropoulos, “Gaussian mixture modeling by exploiting the Mahalanobis distance”, IEEE Transactions on Signal Processing, Vol. 56, No. 7, July 2008.
[2] K. Sri Rama Murty and B. Yegnanarayana, “Combining evidence from residual phase and MFCC features for speaker recognition”, IEEE Signal Processing Letters, Vol. 13, No. 1, Jan. 2006.
[3] S.R.M. Prasanna, S.G. Cheedella, B. Yegnanarayana, “Extraction of speaker-specific excitation information from linear prediction residual of speech”, Speech Communication, Vol. 48, Issue 10, October 2006.
[4] S. Chakroborty, A. Roy, S. Majumdar, G. Saha, “Capturing Complementary Information via Reversed Filter Bank and Parallel Implementation with MFCC for Improved Text-Independent Speaker Identification”, International Conference on Computing: Theory and Applications, March 2007.
[5] Y. Liu, M. Russell, M. Carey, “The Role of Dynamic Features in Text-Dependent and Independent Speaker Verification”, IEEE International Conf. on Acoustics, Speech and Signal Processing (ICASSP), Vol. 1, May 2006.
[6] E. Elbeltagi, T. Hegazy, and D. Grierson, “Comparison among five evolutionary based optimization algorithms”, Advanced Engineering Informatics, Vol. 19, Jan. 2005.
[7] D. A. Reynolds, “Speaker identification and verification using Gaussian mixture models”, Speech Communication, Vol. 17, Aug. 1995.
[8] W. C. Chu, “Speech Coding Algorithms”, John Wiley & Sons, Vol. 4, USA, 2003.
[9] S. P. Kishore and B. Yegnanarayana, “Speaker verification: Minimizing the channel effects using auto associative neural network models”, in Proceedings of IEEE Int. Conf. Acoust., Speech, and Signal Processing, Istanbul, 2000.
[10] M. Shajith Ikbal, Hemant Misra, and B. Yegnanarayana, “Analysis of auto associative mapping neural networks”, in Int. Joint Conf. on Neural Networks, Washington, USA, 1999.
[11] B. Wildermoth and K. K. Paliwal, “Use of voicing and pitch information for speaker recognition”, 2000.
[12] M. M. Eusuff and K. E. Lansey, “Optimization of water distribution network design using the shuffled frog leaping algorithm”, Journal of Water Resources Planning and Management, Vol. 129, No. 3, 2003.
[13] Taher Niknam, Ehsan Azad Farsani, “A hybrid self-adaptive particle swarm optimization and modified shuffled frog leaping algorithm for distribution feeder reconfiguration”, Engineering Applications of Artificial Intelligence, 2010.
[14] B. Amiri, M. Fathian, A. Maroosi, “Application of shuffled frog-leaping algorithm on clustering”, Journal of International Advanced Manufacturing Technology, Vol. 45, 2009.
[15] X. H. Luo, Y. Yang, and X. Li, “Modified shuffled frog-leaping algorithm to solve traveling salesman problem”, Journal of Communications, Vol. 30, Jul. 2009.
[16] A. Khorsandi, A. Alimardani, B. Vahidi, and S.H. Hosseinian, “Hybrid shuffled frog leaping algorithm and Nelder–Mead simplex search for optimal reactive power dispatch”, IET Generation, Transmission & Distribution, Vol. 5, No. 2, 2011.
[17] H.B. Kekre, Vaishali Kulkarni, Prashant Gaikar and Nishant Gupta, “Speaker Identification using Spectrograms of Varying Frame Sizes”, International Journal of Computer Applications, Vol. 50, No. 20, July 2012.