
American Journal of Computer Science and Engineering 2015; 2(5): 32-37

Published online August 30, 2015 (http://www.openscienceonline.com/journal/ajcse)

A Hybrid Model of MFCC/MSFLA for Speaker Recognition

Majida Ali Abed1, Hamid Ali Abed Alasadi2

1College of Computers Sciences & Mathematics, University of Tikrit, Tikrit, Iraq

2Computers Sciences Department, Education for Pure Science College, University of Basra, Basra, Iraq

Email address

[email protected] (M. A. Abed), [email protected] (H. A. A. Alasadi)

To cite this article

Majida Ali Abed, Hamid Ali Abed Alasadi. A Hybrid Model of MFCC/MSFLA for Speaker Recognition. American Journal of Computer Science and Engineering. Vol. 2, No. 5, 2015, pp. 32-37.

Abstract

In this paper, a speaker recognition system is optimized using a Swarm Intelligence algorithm, the Modified Shuffled Frog Leaping Algorithm (MSFLA), together with Cepstral analysis and the Mel Frequency Cepstral Coefficients (MFCC) feature extraction approach. The MSFLA search is applied to speaker recognition and voice matching: the recognition process is optimized by a fitness function, and voices are matched using only the optimized features produced by the MSFLA. Using the same dataset, the recognition accuracy of the hybrid MFCC/MSFLA model under different noise conditions (white Gaussian noise, car noise and B-noise) is 94.02%, 96.78% and 84.33%, respectively.

Keywords

Speaker Recognition, Mel Frequency Cepstral Coefficients (MFCCs), Modified Shuffled Frog Leaping Algorithm (MSFLA)

1. Introduction

Speaker recognition systems became a topic of research in the early 1970s [1]. Some of the first studies of speaker recognition were published in 1971; they used feature extraction techniques including pitch contours [2], Linear Prediction (LP), Cepstral analysis, linear prediction error energy and autocorrelation coefficients. Current speaker recognition research depends on Cepstral analysis, and the Mel Frequency Cepstral Coefficients (MFCC) are the most common short-time feature extraction approach [3].

Speaker recognition covers speaker identification and speaker verification, both based on the speaker's voice in the form of speech. The speech signal carries information about the spoken message, the speaker and the recording environment. For speaker recognition, speech data from a speaker is collected and used to develop a model that captures the speaker-specific information. For text-independent speaker recognition, the speech data is usually about one minute long. Speaker models fall into two groups [4]:

(1). Statistical models, such as the Gaussian Mixture Model, Hidden Markov Model, Support Vector Machines (SVM) and Vector Quantization (VQ).

(2). Neural network models, such as the feed-forward auto-associative network.

These two kinds of model are now used as classification methods in speaker recognition in combination with evolutionary algorithms, such as genetic algorithms and genetic programming, and with Swarm Intelligence (SI) algorithms, such as Ant Colony Optimization (ACO), Bee Colony Optimization (BCO), Cat Swarm Optimization (CSO), the Shuffled Frog Leaping Algorithm (SFLA) and the Cuckoo Search Algorithm (CSA). The speaker recognition process is optimized by the fitness function of these algorithms, and voices are matched using only the optimized features produced by the SI algorithm [5, 6]. In this paper we use the Modified Shuffled Frog Leaping Algorithm (MSFLA).

The paper is organized as follows. Section 2 discusses the principle of speaker recognition, and Section 3 describes the feature extraction used in this paper. Sections 4 and 5 describe the principle of the MSFLA and the speaker recognition system using the MSFLA, respectively. The performance of the recognition system is evaluated and the results are discussed in Section 6. Section 7 concludes the paper.

2. Speaker Recognition

The speaker recognition task is often divided into two related applications, shown in Figure (1), and is further characterized as text-independent or text-dependent recognition [7]:

• Speaker Identification.

• Speaker Verification.

Speaker identification determines the speaker from a set of registered speakers. When the result is always the best-matching speaker in the set, the task is called closed-set identification; when the result can be either a speaker or a no-match, it is called open-set identification. Speaker verification determines whether the voice matches a particular registered speaker; the result is the probability of a match or a similarity measure [8].
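The difference between the three tasks can be sketched in a few lines of Python. This is only an illustration: the score dictionary, the speaker names and the threshold values are hypothetical, and the paper does not specify how scores are computed at this stage.

```python
def identify_closed_set(scores):
    """Closed set: always return the best-scoring registered speaker."""
    return max(scores, key=scores.get)


def identify_open_set(scores, threshold):
    """Open set: return the best speaker, or None (no match) if the score is too low."""
    best = identify_closed_set(scores)
    return best if scores[best] >= threshold else None


def verify(scores, claimed_speaker, threshold):
    """Verification: accept or reject one claimed identity."""
    return scores[claimed_speaker] >= threshold


# Hypothetical similarity scores of one input voice against three registered speakers.
scores = {"alice": 0.62, "bob": 0.91, "carol": 0.40}

print(identify_closed_set(scores))      # bob
print(identify_open_set(scores, 0.95))  # None
print(verify(scores, "alice", 0.50))    # True
```

Closed-set identification never rejects, which is why the open-set and verification variants need a threshold.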

Figure (1). The two essential tasks of speaker recognition.

3. Feature Extraction

The Modified Shuffled Frog Leaping Algorithm (MSFLA) works only on the best features, so the features must first be extracted from the voices [9]. Many different speech features have been shown to be indicative of speaker identity. These include the following features:

• Linear Prediction Cepstral Coefficients (LPCCs).

• Maximum Autocorrelation Value (MACV).

• Mel Frequency Cepstral Coefficients (MFCCs).

In our research we used the Mel Frequency Cepstral Coefficients (MFCCs), which are extracted from the spectrum. The reason for using this feature is that in many applications speaker identification is a precursor to speech recognition, which identifies what is being said. Among the possible features, MFCCs have proved to be the most successful and robust features for speech recognition [10]. The features are extracted from the input voice, which takes the form of a spectrogram showing the frequency content over time. Fourier-Bessel Cepstral Coefficients (FBCC) based feature extraction indicates improved accuracy and efficiency in comparison with LPCC and MACV features [11].
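A minimal sketch of the usual MFCC pipeline (pre-emphasis, framing, Hamming window, power spectrum, mel filterbank, logarithm, DCT) is given below in Python with NumPy. The paper's simulation was done in MATLAB and does not list its parameters, so every value here (16 kHz sampling, 400-sample frames, 160-sample hop, 26 filters, 13 coefficients, 0.97 pre-emphasis) is an assumed common default, and the function names are illustrative.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale: (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, centre, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, centre):       # rising edge of the triangle
            bank[i, k] = (k - left) / (centre - left)
        for k in range(centre, right):      # falling edge of the triangle
            bank[i, k] = (right - k) / (right - centre)
    return bank


def mfcc(signal, sample_rate=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_coeffs=13):
    """Return an (n_frames, n_coeffs) array of MFCCs for a 1-D NumPy signal."""
    # Pre-emphasis boosts high frequencies (0.97 is an assumed, common value).
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Cut the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # Power spectrum of each frame, then mel filterbank energies.
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)  # small constant avoids log(0)
    # DCT-II (unnormalized) decorrelates the log energies; keep the first n_coeffs.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs), n + 0.5) / n_filters)
    return log_energies @ basis.T


# One second of a 440 Hz tone as a stand-in for a recorded utterance.
t = np.arange(16000) / 16000.0
features = mfcc(np.sin(2.0 * np.pi * 440.0 * t))
print(features.shape)  # (98, 13)
```

Each row of the result is the feature vector of one short-time frame; a search procedure such as the MSFLA would then operate on these vectors rather than on the raw waveform.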

4. Modified Shuffled Frog Leaping Algorithm (MSFLA)

The Shuffled Frog Leaping Algorithm (SFLA) and the Modified Shuffled Frog Leaping Algorithm (MSFLA) are recently developed nature-inspired methods [12-16], characterized by strong global search capability and ease of implementation. MSFLA combines the advantages of the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO), as shown in Figure (2).


Figure (2). Modified Shuffled Frog Leaping Algorithm.
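The text does not spell out the modification used in the MSFLA, so the following Python sketch shows only the basic shuffled frog leaping scheme described in the literature [12]: frogs are ranked, dealt into memeplexes, the worst frog of each memeplex leaps towards the memeplex best (then towards the global best, then to a random position), and the memeplexes are shuffled again. The function name, parameter names and default values are assumptions made for illustration, and the fitness function is minimized.

```python
import numpy as np


def sfla_minimize(fitness, dim, bounds, n_memeplexes=5, frogs_per_memeplex=6,
                  local_steps=10, n_shuffles=50, seed=0):
    """Basic Shuffled Frog Leaping Algorithm; returns (best_position, best_value)."""
    rng = np.random.default_rng(seed)
    low, high = bounds
    n_frogs = n_memeplexes * frogs_per_memeplex
    frogs = rng.uniform(low, high, size=(n_frogs, dim))
    scores = np.array([fitness(f) for f in frogs])

    for _ in range(n_shuffles):
        # Shuffle: rank all frogs (best first); frog i joins memeplex i mod m.
        order = np.argsort(scores)
        frogs, scores = frogs[order], scores[order]
        for m in range(n_memeplexes):
            idx = np.arange(m, n_frogs, n_memeplexes)
            for _ in range(local_steps):
                local = idx[np.argsort(scores[idx])]
                best, worst = local[0], local[-1]
                global_best = np.argmin(scores)
                # Leap towards the memeplex best first, then the global best.
                for leader in (best, global_best):
                    step = rng.random() * (frogs[leader] - frogs[worst])
                    candidate = np.clip(frogs[worst] + step, low, high)
                    value = fitness(candidate)
                    if value < scores[worst]:
                        break
                else:
                    # Neither leap improved the worst frog: replace it at random.
                    candidate = rng.uniform(low, high, size=dim)
                    value = fitness(candidate)
                frogs[worst], scores[worst] = candidate, value

    winner = np.argmin(scores)
    return frogs[winner].copy(), float(scores[winner])


def sphere(x):
    """Toy fitness function with its minimum at the origin."""
    return float(np.sum(x ** 2))


position, value = sfla_minimize(sphere, dim=3, bounds=(-5.0, 5.0))
print(position, value)
```

In a speaker recognition setting the toy `sphere` function would be replaced by a fitness that measures how well a candidate set of features matches the stored voices; that fitness is not defined in enough detail here to reproduce it.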


Figure (3). Process of our proposed Speaker Recognition.

5. Speaker Voice Matching

After the feature extraction stage, the extracted voice features of the registered speakers are stored, and these stored features must be matched against the features of the input voice. Matching is based on the correlation between them: the stored features closest to the extracted features are the ones that are matched. To avoid accepting a match at every stage of the system, especially when the speaker is not legitimate, a threshold value is used to accept or reject a speaker. The threshold stipulates a probability ratio that denotes the degree of match. The voice is then either accepted or rejected. Acceptance means that the voice is matched and the speaker is legitimate; otherwise the voice is rejected. A match between the input voice and a database voice is declared when the correlation is high; a value below the threshold is rejected, and the speaker is denied access. In this paper text-dependent speaker recognition is used, in which the enrollment and test pass-phrases are the same [17]. Figure (3) shows the process of the proposed speaker recognition using the Modified Shuffled Frog Leaping Algorithm.
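The accept/reject decision described in this section can be sketched as follows. The sketch assumes that the "correlation" is the Pearson correlation coefficient between two feature vectors of equal length and that the threshold is 0.8; neither choice is stated in the paper, and the function and speaker names are hypothetical.

```python
import numpy as np


def correlation(a, b):
    """Pearson correlation coefficient between two 1-D feature vectors."""
    return float(np.corrcoef(a, b)[0, 1])


def match_speaker(input_features, database, threshold=0.8):
    """Return (speaker, score) if accepted, or (None, score) if rejected."""
    best_name, best_score = None, float("-inf")
    for name, stored in database.items():
        score = correlation(input_features, stored)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score   # accepted: voice matches a stored speaker
    return None, best_score            # rejected: below the threshold


# Toy stored feature vectors for two registered speakers.
database = {
    "s1": np.array([1.0, 2.0, 3.0, 4.0, 5.0]),
    "s2": np.array([5.0, 1.0, 4.0, 2.0, 3.0]),
}

accepted = match_speaker(np.array([1.1, 2.0, 2.9, 4.2, 5.0]), database)
rejected = match_speaker(np.array([2.0, 5.0, 1.0, 3.0, 4.0]), database)
print(accepted[0], rejected[0])  # s1 None
```

Raising the threshold makes the system stricter (fewer false acceptances, more false rejections), so in practice it would be tuned on held-out recordings.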

6. Simulation and Results

This section describes the simulation, which was carried out in MATLAB, and discusses the results. The database of our system contains different utterances from 40 different speakers, both male and female (examples are shown in Figure (4)); each speaker uttered 5 different sentences.

Figure (4). Speaker Signal examples (a) Male (b) Female.


The database requires that the extracted features of each user cover different utterances. In our work, the Mel Frequency Cepstral Coefficients (MFCC), the most popular acoustic features in speech recognition systems, are used for the different speech data. The feature database of the utterances is built using MFCC, both to make a robust speech recognizer for different users and to let the MSFLA work efficiently. The MSFLA accesses the extracted features to search for the best match. Different types of noise (white Gaussian noise, car noise and B-noise) are added to the utterances; the features of the noisy signal are extracted, and the MSFLA searches for the best match of these features with respect to the feature database and reports the result. Recognition accuracy is found to be best using MFCC features with MSFLA; the results for the various noise conditions, using the same dataset, are shown in Figure (5). The recognition accuracy for added white Gaussian noise, car noise and B-noise is 94.02%, 96.78% and 84.33%, respectively.
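The noise experiment can be illustrated with a short Python sketch that adds white Gaussian noise to a clean signal and computes recognition accuracy as a percentage. The paper does not report the signal-to-noise ratio it used, so the 10 dB value below is purely an assumption, as are the function names and the toy speaker labels.

```python
import numpy as np


def add_white_gaussian_noise(signal, snr_db, seed=0):
    """Return the signal with white Gaussian noise added at the requested SNR (dB)."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(len(signal))
    signal_power = np.mean(signal ** 2)
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    # Rescale the noise so that its measured power equals the target power.
    noise *= np.sqrt(target_noise_power / np.mean(noise ** 2))
    return signal + noise


def recognition_accuracy(true_speakers, predicted_speakers):
    """Percentage of test utterances assigned to the correct speaker."""
    correct = sum(t == p for t, p in zip(true_speakers, predicted_speakers))
    return 100.0 * correct / len(true_speakers)


# Toy clean signal and an assumed SNR of 10 dB.
t = np.arange(16000) / 16000.0
clean = np.sin(2.0 * np.pi * 220.0 * t)
noisy = add_white_gaussian_noise(clean, snr_db=10.0)
measured_snr = 10.0 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))

print(recognition_accuracy(["a", "b", "c", "d"], ["a", "b", "x", "d"]))  # 75.0
```

Car noise and B-noise would be mixed in the same way, with a recorded noise segment taking the place of the generated Gaussian samples.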

Figure (5). Simulation results for different types of noises.

7. Conclusion

This paper is based on a Swarm Intelligence algorithm, the Modified Shuffled Frog Leaping Algorithm (MSFLA). The aim of biometrics is to identify an individual by a distinctive characteristic such as the voice. Here the MSFLA search has been applied to speaker recognition and voice matching. By applying this algorithm, the speaker recognition process is optimized by a fitness function, and voices are matched using only the optimized features produced by the MSFLA. The recognition accuracy is found to be best using the hybrid MFCC/MSFLA model (MFCC features with MSFLA) under various noise conditions. This work treats the hybrid MFCC/MSFLA model as a system reliability optimization problem; in further work, a multi-criteria approach may provide useful insights into patterns of interaction among articulatory-acoustic feature dimensions.

References

[1] D. Ververidis, C. Kotropoulos, “Gaussian mixture modeling by exploiting the Mahalanobis distance,” IEEE Transactions on Signal Processing, Vol. 56, No. 7, July 2008.

[2] K. Sri Rama Murty and B. Yegnanarayana, “Combining evidence from residual phase and MFCC features for speaker recognition,” IEEE Signal Processing Letters, Vol. 13, No. 1, Jan. 2006.

[3] S. R. M. Prasanna, S. G. Cheedella, B. Yegnanarayana, “Extraction of speaker-specific excitation information from linear prediction residual of speech,” Speech Communication, Vol. 48, No. 10, October 2006.

[4] S. Chakroborty, A. Roy, S. Majumdar, G. Saha, “Capturing Complementary Information via Reversed Filter Bank and Parallel Implementation with MFCC for Improved Text-Independent Speaker Identification,” International Conference on Computing: Theory and Applications, March 2007.

[5] Y. Liu, M. Russell, M. Carey, “The Role of Dynamic Features in Text-Dependent and Independent Speaker Verification,” IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vol. 1, May 2006.

[6] E. Elbeltagi, T. Hegazy, and D. Grierson, “Comparison among five evolutionary based optimization algorithms,” Advanced Engineering Informatics, Vol. 19, Jan. 2005.

[7] D. A. Reynolds, “Speaker identification and verification using Gaussian mixture models,” Speech Communication, Vol. 17, Aug. 1995.

[8] W. C. Chu, “Speech Coding Algorithms,” John Wiley & Sons, Vol. 4, USA, 2003.

[9] S. P. Kishore and B. Yegnanarayana, “Speaker verification: Minimizing the channel effects using auto associative neural network models,” in Proceedings of IEEE Int. Conf. Acoust., Speech, and Signal Processing, Istanbul, 2000.

[10] M. Shajith Ikbal, Hemant Misra, and B. Yegnanarayana, “Analysis of auto associative mapping neural networks,” in Int. Joint Conf. on Neural Networks, Washington, USA, 1999.

[11] B. Wildermoth and K. K. Paliwal, “Use of voicing and pitch information for speaker recognition,” 2000.

[12] M. M. Eusuff and K. E. Lansey, “Optimization of water distribution network design using the shuffled frog leaping algorithm,” Journal of Water Resources Planning and Management, Vol. 129, No. 3, 2003.

[13] Taher Niknam, Ehsan Azad Farsani, “A hybrid self-adaptive particle swarm optimization and modified shuffled frog leaping algorithm for distribution feeder reconfiguration,” Engineering Applications of Artificial Intelligence, 2010.

[14] B. Amiri, M. Fathian, A. Maroosi, “Application of shuffled frog-leaping algorithm on clustering,” International Journal of Advanced Manufacturing Technology, Vol. 45, 2009.

[15] X. H. Luo, Y. Yang, and X. Li, “Modified shuffled frog-leaping algorithm to solve traveling salesman problem,” Journal of Communications, Vol. 30, Jul. 2009.

[16] A. Khorsandi, A. Alimardani, B. Vahidi, and S. H. Hosseinian, “Hybrid shuffled frog leaping algorithm and Nelder–Mead simplex search for optimal reactive power dispatch,” IET Generation, Transmission & Distribution, Vol. 5, No. 2, 2011.

[17] H. B. Kekre, Vaishali Kulkarni, Prashant Gaikar and Nishant Gupta, “Speaker Identification using Spectrograms of Varying Frame Sizes,” International Journal of Computer Applications, Vol. 50, No. 20, July 2012.
