SOFTWARE REQUIREMENT SPECIFICATION
6. Introduction
6.1 Purpose
The purpose of this document is to present a detailed description of the Speaker Recognition System. This report discusses each stage of the project: the Requirements Specification phase, the System Design phase, the Implementation phase, and the Testing phase. It also presents recommendations for the project.
6.2 Intended Audience
This document is intended for several types of reader: developers, project managers, testers, documentation writers, and users, where users include faculty members, institute staff, students, and alumni of the institute. All audiences except end users are advised to read through the complete document for a better understanding of this software product.
6.3 Scope of the Project
The Speaker Recognition System is a standalone application. It can be used to restrict access to confidential information, and it can be integrated into other systems to provide security.
6.4 References
IEEE. IEEE Std 830-1998 IEEE Recommended Practice for Software Requirements Specifications. IEEE Computer Society, 1998.
7. Requirement Model
A requirement model is used in systems and software engineering to refer to the process of building up a specification of the system. In this phase we identify actors and produce use cases, interface descriptions, and problem domain objects. UML (Unified Modeling Language) is used to draw all the relevant diagrams and notation.
7.1 User Requirements
7.1.1 Functional Requirements
a. The user should be able to enter records.
b. Each record represents information about a person and contains his/her voice sample.
c. Records may consist of:
   i. First name
   ii. Last name
   iii. Phone
   iv. Address (city, street address)
   v. Voiceprint
   vi. ID number
d. The system can filter noise from the voice signal, which may come from environmental noise or the sensitivity of the microphone.
e. The system must be able to take a voiceprint and a user ID (in the case of speaker verification) as input, search for a match in the database, and then show the result.
f. The result should be presented by showing the user IDs matching the input.
g. The user should be able to see his/her full information upon successful identification/verification.
7.1.2 Non-Functional Requirements
a. Records are maintained in a database.
b. Every record shall be allocated a unique identifier (id-number).
c. The user should be able to retrieve data by entering an ID and voiceprint, on successful identification/verification.
d. To improve performance, the database should store a compressed codebook for each user instead of the raw voiceprint. The voiceprint is discarded after the codebook is calculated.
7.2 System Requirements
7.2.1 Actors with their description
Users: Provide the system with a voice sample and expect the system to show a match and details about the user.
Administrator: Manages the entire speaker recognition system.
[Use case diagram: the User participates in Enroll, Request Match, Edit Information, and Remove; the Administrator participates in Add Users, Remove Users, and View Statistics.]
Figure 1: Use Case Diagram
7.2.2 Use Cases with their description
Use Case            Description
Add Records         Administrator adds new users to the system. The user must provide his/her details and a voice sample to the system during enrollment.
Request Match       The user requests a voice sample to be matched against a voiceprint in the database and retrieves details about it (on successful verification).
Update Records      Allows the user to add or update (remove) records in the system, such as name, ID, phone number, etc. (on successful verification).
Remove User/Records The system allows the administrator to remove a user.
View Statistics     The administrator can view the performance of the system.
7.2.2.1 Administrator Use Case
[Diagram: the Administrator participates in Add Users, Remove Users, and View Statistics.]
Figure 12: Administrator Use Case Diagram
Use Case Name: Add Users
Brief Description: Administrator enrolls users into the system.
1. Preconditions:
   a. The system must be fully functional and connected to the database.
   b. The administrator must be logged into the system.
2. Main flow of events:
   a. The administrator inputs the user's details into the system.
   b. The administrator inputs the user's voice sample.
   c. A notification appears that the user is enrolled.
3. Post conditions:
   a. The user is enrolled.
   b. A user ID with relevant details is displayed.
4. Special Requirements: None
Use Case Name: Remove Users
Brief Description: Administrator removes users from the system.
1. Preconditions:
   a. The system must be fully functional and connected to the database.
   b. The administrator must be logged into the system.
2. Main flow of events:
   a. The administrator inputs a user ID into the system.
   b. A notification appears asking for confirmation.
   c. The user is removed.
3. Post conditions: The system no longer contains any information about the user.
4. Alternate flows:
   a. The user ID given by the administrator is not found in the system; the administrator enters the user ID again.
   b. On confirmation of removal, the administrator selects "No"; the user is not removed from the system.
5. Special Requirements: None
Use Case Name: View Statistics
Brief Description: Administrator views the performance statistics of the system.
1. Preconditions:
   a. The system must be fully functional and connected to the database.
   b. The administrator must be logged into the system.
2. Main flow of events:
   a. The administrator selects to view the performance statistics.
   b. The statistics are shown.
3. Post conditions: None
4. Alternate flows: None
5. Special Requirements: None
7.2.2.2 User Use Case
[Diagram: the User participates in Enroll, Request Match, Edit Information, and Remove.]
Figure 13: User Use Case Diagram
Use Case Name: Enroll
Brief Description: User enrolls into the system.
1. Preconditions: The system must be fully functional and connected to the database.
2. Main flow of events:
   a. The user inputs his/her details into the system.
   b. The user inputs his/her voice sample.
   c. A notification appears that the user is enrolled.
3. Post conditions:
   a. The user is enrolled.
   b. A user ID with relevant details is displayed.
4. Special Requirements: None
Use Case Name: Remove
Brief Description: User removes himself/herself from the system.
1. Preconditions: The system must be fully functional and connected to the database.
2. Main flow of events:
   a. The user inputs his/her user ID into the system.
   b. A notification appears asking for confirmation.
   c. The user is removed.
3. Post conditions: The system no longer contains any information about the user.
4. Alternate flows:
   a. The user ID given by the user is not found in the system; the user enters the user ID again.
   b. On confirmation of removal, the user selects "No"; the user is not removed from the system.
5. Special Requirements: None
Use Case Name: Request Match
Brief Description: User enters his/her voice sample and runs the test phase.
1. Preconditions: The system must be fully functional and connected to the database.
2. Main flow of events:
   a. The user selects to run a test.
   b. The system asks the user to enter his/her user ID and voice sample.
   c. Matching is performed and the result is shown to the user.
3. Post conditions: The user is allowed to log into the system.
4. Alternate flows: None
5. Special Requirements: None
Use Case Name: Edit Information
Brief Description: User edits his/her information stored in the system. The user is not allowed to edit his/her already stored voice sample.
1. Preconditions: The system must be fully functional and connected to the database. The user must be logged into the system.
2. Main flow of events:
   a. The user selects to edit.
   b. The system displays the full details of the user.
   c. The user edits his/her information and selects Save.
3. Post conditions: The system contains the updated user information.
4. Alternate flows: None
5. Special Requirements: None
7.3 Safety Requirements
There are no safety requirements of concern, such as possible loss, damage, or harm that could result from the use of the Speaker Recognition System.
7.4 User Interfaces
In the main menu, the user will be presented with several buttons: Enroll and Voiceprint Test. After the user logs in, he/she will also be presented with Edit Information and Remove buttons.
The administrator will be presented with Add Users, Remove Users, and View Statistics buttons.
7.4.1 Enroll
Clicking on the new user button will cause a dialog box to open with the title New User. The dialog box will have the following fields:
   i. First name
   ii. Last name
   iii. Phone
   iv. Address (city, street address)
It will also contain two buttons, Enroll and Cancel. Cancel will return the user to the main menu with no user created. Enroll will prompt the user to speak so as to record his/her voice, and will give a countdown starting from 2 seconds. Note: the New User dialog box will remain in the foreground when recording begins. After the countdown is complete, the system will record from the microphone for 10 seconds. If the recording is successful, the user will be returned to the main menu. If an error occurs during recording (for example, silence), a descriptive message will be displayed (for example, "no sound recorded") and the dialog box will remain open.
7.4.2 Voiceprint Match
Clicking on Test will allow users to test their voiceprint with the implemented verification algorithm. A dialog box will pop up with the title Voiceprint Test. Two buttons will give the option to return to the main menu (OK button) or perform the test. Recording will be carried out in the same way as specified for enrollment, but the responses at the end will be different. Note: the Voiceprint Test dialog box will remain in the foreground when recording begins. At the end of the recording, the program will respond with a success, fail, or error.
7.4.3 Remove
The Remove option will delete a user's profile (user ID and voiceprint). Upon clicking Remove, a dialog box will pop up with the title Remove User for confirmation, containing the buttons Cancel and Delete. Cancel will bring the user back to the main menu. If Delete is clicked, the user will be prompted with "Are you sure?" and must press 'y' for yes or 'n' for no. Pressing 'y' and then Enter will delete the profile and return the user to the main menu. Pressing 'n' will return the user to the dialog box without deleting the profile.
7.4.4 Statistics
The display of statistics is an element that will be given flexibility. At this time, the only requirement is that performance statistics be available.
7.5 Hardware Interfaces
The Speaker Recognition System requires access to the system's microphone to capture the user's voice.
7.6 Software Interfaces
The Speaker Recognition System is built for the Windows operating system. It requires Microsoft Windows XP Service Pack 3 or later to run. Since the software is built in MATLAB, it requires the MATLAB runtime to function properly.
Problem Domain Object (PDO)
[Diagram: the problem domain objects are User, Administrator, and VoicePrint.]
Figure 14: Problem Domain Object
8. Analysis Model
The analysis model describes the structure of the system or application that you are modeling. It aims to structure the system independently of the actual implementation environment. Focus is on the logical structure of the system.
The following are the three types of analysis objects into which the use cases can be classified:
Figure 15: Types of Analysis Object
Entity objects: Information to be held for a longer time; all behavior is naturally coupled to the information. Example: a person with the associated data and behavior.
Interface objects: Model behavior and information that depend on the interface to the system. Example: user interface functionality for requesting information about a person.
Control objects: Model functionality that is not naturally tied to any other object. Behavior consists of operating on several different entity objects, doing some computation, and then returning the result to an interface object. Example: calculating taxes using several different factors.
The analysis model identifies the main classes in the system and contains a set of use case realizations that describe how the system will be built. Sequence diagrams realize the use cases by describing the flow of events in the use cases when they are executed. These use case realizations model how the parts of the system interact within the context of a specific use case. It can be used as the foundation of the design model since it describes the logical structure of the system, but not how it will be implemented.
Interface Objects
[Diagram: the interface objects include the microphone interface.]
Figure 16: Interface Objects
Entity Objects
[Diagram: the entity objects are User Information and Voice Sample.]
Figure 17: Entity Objects
Control Objects:
Figure 18: Control Objects
[Diagram: analysis model relating the Start Panel, Receive Information, User Information, Voice Sample, Generate Result, Request Match, Add/Remove/Edit User, User Panel, Admin Interface, and View Statistics objects.]
Figure 19: Analysis Model
9. SEQUENCE DIAGRAMS
A sequence diagram in the Unified Modeling Language (UML) is a kind of interaction diagram that shows how processes operate with one another and in what order. It is a construct of a Message Sequence Chart. A sequence diagram shows object interactions arranged in time sequence. It depicts the objects and classes involved in the scenario and the sequence of messages exchanged between the objects needed to carry out the functionality of the scenario. Sequence diagrams are typically associated with use case realizations in the Logical View of the system under development. Sequence diagrams are sometimes called event diagrams, event scenarios, or timing diagrams.
Sequence Diagram: User Enrollment
[Sequence diagram: the Enrollment, Profile, Feature Extract, and Codebook Calculation objects exchange the messages Create, Add To User List, Request for Voice Sample, Voice Sample/Training Speech, Acoustic Vectors, Codebook, and Return User Id.]
Figure 20: Sequence diagram for user enrollment
Sequence Diagram: Voice Match
[Sequence diagram: the User, Match Voice, Feature Extractor, Feature Comparator, and Codebook objects exchange the messages Request to initiate match, Requests for voice and user id input, VoicePrint and User Id, Acoustic Vector, Requests for user's codebook (input: UserId), Codebook, and Result.]
Figure 21: Sequence diagram for Voice Match
Sequence Diagram: Edit Information
[Sequence diagram: the User, Authenticator, Edit User, and Database objects exchange the messages Voice Sample and UserId, Successfully logged in, User Id & request to retrieve information, User Information, Updated Information, and Success.]
Figure 22: Sequence Diagram for Editing Information
10. Activity Diagrams
An activity diagram is like a state diagram, except that it has a few additional symbols and is used in a different context. In a state diagram, most transitions are caused by external events; however, in an activity diagram, most transitions are caused by internal events, such as the completion of an activity.
An activity diagram is used to understand the flow of work that an object or component performs. It can also be used to visualize the interaction between different use cases.
1. Enroll new user
Figure 23: Activity Diagram for enrolling new user
2. Request Matching
Figure 24: Activity Diagram for voice matching
3. Remove User
Figure 25: Activity Diagram for removing user
4. Update user information
Figure 26: Activity Diagram for updating user information
11. Design Model
11.1 High Level Design
There are two main modules in this speaker recognition system: the User Enrollment Module and the User Verification Module.
Figure 27: High Level Block Diagram of Speaker Verification System
11.1.1 User Enrollment Module
The User Enrollment Module is used when a new user is added to the system. This module is used to essentially “teach” the system the new user’s voice. The input of this module is a voiceprint of the user along with other details. By analyzing this training speech, the module outputs a model that parameterizes the user’s voice. This model will be used later in the User Verification Module.
11.1.1.1 Signal Preprocessing Subsystem
The signal preprocessing subsystem conditions the raw speech signal and prepares it for subsequent manipulation and analysis. This subsystem performs analog-to-digital conversion and any signal conditioning necessary.
11.1.1.2 Feature Extraction Subsystem
The feature extraction subsystem analyzes the user's digitized voice signal and creates a series of values to use as a model for the user's speech pattern.
11.1.1.3 Feature Data Compression Subsystem
The disk size required for the model created in the Feature Extraction subsystem will be significant when many users are enrolled in the system. In order to store this data effectively, a form of data compression is used. After the model is compressed, it will be stored for later use in the User Verification Module.
11.1.2 Threshold Generation Module
This module is used to set the sensitivity level of the system for each user enrolled in the system. This sensitivity value is called the threshold and needs to be generated whenever a new user is enrolled. This module can also be invoked when a user feels they are receiving too many false rejections and wants to recalculate an appropriate sensitivity level.
After a user enrolls with the system, running this module will essentially invoke a user verification session. However, instead of producing a pass or fail verdict, the system will take the similarity factor found in the Feature Comparison Subsystem and use it to determine the threshold value. This similarity factor will be scaled up and then saved as the threshold value. Scaling the value up should account for any variances in future verification sessions.
This module is required for speaker verification functionality. As of now, implementation of this module is suspended due to time constraints.
11.1.2.1 Threshold Generation Subsystem
This subsystem will set the user threshold to a scaled-up version of the similarity factor determined in the Feature Comparison Subsystem.
11.1.3 Verification Module
The User Verification Module is used when the system tries to verify a user. The user informs the system that he or she is a certain user. The system will then prompt the user to say something. This utterance is referred to as the "testing speech." The module performs the same signal preprocessing and feature extraction as the User Enrollment Module. The extracted speech parameterization data is then compared to the stored model. Based on the similarity, a verdict is given to indicate whether the user has passed or failed the voice verification test.
11.1.3.1 Feature Comparison Subsystem
After the Feature Extraction Subsystem parameterizes the testing speech, this data is compared to the model of the user stored on disk. After comparing all the data, a similarity factor is produced.
11.1.3.2 Decision Subsystem
Based on the similarity factor produced by the Feature Comparison Subsystem and the user's threshold value, a verdict will be given by this subsystem to indicate whether the user has passed or failed the voice verification test.
11.2 Low Level Design
The following sections describe the information used for the implementation of each subsystem.
11.2.1 Signal Preprocessing Subsystem
Input: Raw speech signal
Output: Digitized and conditioned speech signal (one vector containing all sampled values)
Figure 28: Signal Preprocessing Subsystem Low-Level Block Diagram
The sampling will produce a digital signal in the form of a vector or array. The silence at the beginning and end of the speech sample will be removed.
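The silence-removal step can be sketched as follows. This is an illustrative Python sketch (the project itself is implemented in MATLAB); the amplitude threshold of 0.01 and the function name `trim_silence` are assumptions, not values fixed by this design.

```python
import numpy as np

def trim_silence(signal, threshold=0.01):
    # Keep the span between the first and last samples whose magnitude
    # exceeds the threshold; everything outside it is treated as silence.
    voiced = np.flatnonzero(np.abs(signal) > threshold)
    if voiced.size == 0:
        return signal[:0]              # the whole signal is silence
    return signal[voiced[0]:voiced[-1] + 1]

sig = np.concatenate([np.zeros(100), np.array([0.5, -0.4, 0.3]), np.zeros(50)])
print(len(trim_silence(sig)))          # -> 3
```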
11.2.2 Feature Extraction Subsystem
Input: Digital speech signal (one vector containing all sampled values)
Output: A set of acoustic vectors
Figure 29: Feature Extraction Subsystem Low-Level Block Diagram
Mel-cepstral coefficients will be used to parameterize the speech sample and voice.
The original vector of sampled values will be framed into overlapping blocks. Each block will be windowed to minimize spectral distortion and discontinuities. A Hamming window will be used. The Fast Fourier Transform will then be applied to each windowed block as the beginning of the Mel-Cepstral Transform. After this stage, the spectral coefficients of each block are generated.
The Mel Frequency Transform will then be applied to each spectral block to convert thescale to a mel-scale. The mel-scale is a logarithmic scale similar to the way the human ear perceives sound. Finally, the Discrete Cosine Transform will be applied to each Mel Spectrum to convert the values back to real values in the time domain.
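The framing, windowing, FFT, mel filterbank, and DCT steps described above can be sketched in Python for illustration (the project itself uses MATLAB with the Voicebox `melcepst` function). The frame length of 256 samples, hop of 128, 20 mel filters, and 12 coefficients are illustrative assumptions, not values fixed by this specification.

```python
import numpy as np

def mel(f):
    # Hz -> mel (logarithmic scale resembling human pitch perception)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_filterbank(n_filters, n_fft, fs):
    # Triangular filters spaced evenly on the mel scale
    pts = np.linspace(mel(0.0), mel(fs / 2.0), n_filters + 2)
    hz = 700.0 * (10.0 ** (pts / 2595.0) - 1.0)
    bins = np.floor((n_fft + 1) * hz / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        lo, ce, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, ce):
            fb[m - 1, k] = (k - lo) / max(ce - lo, 1)
        for k in range(ce, hi):
            fb[m - 1, k] = (hi - k) / max(hi - ce, 1)
    return fb

def mfcc(signal, fs, frame_len=256, hop=128, n_filters=20, n_coeffs=12):
    # 1. Frame the signal into overlapping blocks and apply a Hamming window
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)
    # 2. FFT of each windowed block -> magnitude spectrum
    spec = np.abs(np.fft.rfft(frames, frame_len))
    # 3. Mel filterbank, then log compression
    logmel = np.log(spec @ mel_filterbank(n_filters, frame_len, fs).T + 1e-10)
    # 4. DCT-II back to real values; keep the first n_coeffs coefficients
    n = logmel.shape[1]
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * np.arange(n) + 1) / (2 * n))
    return logmel @ basis.T

fs = 16000
t = np.arange(fs) / fs
sig = np.sin(2 * np.pi * 440 * t)      # one second of a 440 Hz tone
acoustic_vectors = mfcc(sig, fs)
print(acoustic_vectors.shape)          # -> (124, 12): one 12-dim vector per frame
```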
11.2.3 Feature Compression Subsystem
Input: A set of acoustic vectors
Output: Codebook
Figure 30: Feature Data Compression Subsystem Low-Level Block Diagram
The K Means Vector Quantization Algorithm will be used.
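A minimal sketch of codebook training with K-means, in Python for illustration: the 8 centroids match the `C = 8` used in the implementation code, while the iteration count, random seed, and synthetic data are assumptions.

```python
import numpy as np

def kmeans_codebook(vectors, n_centroids=8, n_iter=25, seed=0):
    # Compress a speaker's acoustic vectors into a small codebook of centroids
    rng = np.random.default_rng(seed)
    codebook = vectors[rng.choice(len(vectors), n_centroids, replace=False)]
    for _ in range(n_iter):
        # Assign each vector to its nearest codeword (Euclidean distance)
        d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each codeword to the mean of its assigned vectors
        for k in range(n_centroids):
            if np.any(labels == k):
                codebook[k] = vectors[labels == k].mean(axis=0)
    return codebook

rng = np.random.default_rng(1)
vectors = rng.normal(size=(500, 12))   # stand-in for one speaker's MFCC vectors
codebook = kmeans_codebook(vectors)
print(codebook.shape)                  # -> (8, 12)
```

Only the 8 x 12 codebook needs to be stored per speaker, which is the compression called for in requirement 7.1.2(d).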
11.2.4 Feature Data Comparison Subsystem
Inputs: Set of acoustic vectors from testing speech; codebook
Output: Average distortion factor
Figure 31: Comparison Subsystem Low-Level Block Diagram
The acoustic vectors generated by the testing voice signal will be individually compared to the codebook. The codeword closest to each test vector is found based on Euclidean distance. This minimum Euclidean distance, or distortion factor, is stored until the distortion factor for each test vector has been calculated. The average distortion factor is then found and normalized.
Figure 32: Distortion Calculation Algorithm Flow Chart
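The distortion calculation above reduces to a nearest-codeword search followed by an average. A hedged Python sketch (the two-dimensional codewords and test vectors are made-up illustrative data):

```python
import numpy as np

def average_distortion(test_vectors, codebook):
    # Distance from every test vector to every codeword ...
    d = np.linalg.norm(test_vectors[:, None, :] - codebook[None, :, :], axis=2)
    # ... keep each vector's minimum (its distortion factor), then average
    return d.min(axis=1).mean()

codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
close = np.array([[0.1, 0.0], [9.9, 10.0]])    # vectors near the codewords
far = np.array([[5.0, 5.0], [4.0, 6.0]])       # vectors far from both codewords
assert average_distortion(close, codebook) < average_distortion(far, codebook)
```

A matching speaker's test vectors lie near their own codebook, so their average distortion is small; an impostor's vectors sit farther away and yield a larger value.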
11.2.5 Decision Subsystem
Inputs: Average distortion factor; user-specific threshold
Output: Verdict
Figure 33: Decision Subsystem Low-Level Block Diagram
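Taken together with the threshold described in Section 11.1.2, the decision rule reduces to a single comparison. A hedged Python sketch: the 1.2 scale factor and the function names are illustrative assumptions, since the text only says the enrollment-time similarity factor is "scaled up".

```python
def generate_threshold(enrollment_distortion, scale=1.2):
    # Threshold = scaled-up enrollment distortion; the 1.2 factor is an
    # illustrative assumption (the text only says "scaled-up")
    return scale * enrollment_distortion

def verdict(avg_distortion, threshold):
    # Pass when the test utterance is at least as close to the stored
    # codebook as the scaled enrollment utterance was
    return "pass" if avg_distortion <= threshold else "fail"

threshold = generate_threshold(0.50)   # enrollment-time distortion of 0.50
print(verdict(0.55, threshold))        # prints "pass" (0.55 <= 0.60)
print(verdict(0.80, threshold))        # prints "fail" (0.80 >  0.60)
```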
12. Alternative Options
There is more than one way to perform speaker recognition. The methods chosen for this project were selected mainly for their implementability and low complexity. The list of alternatives below is in no way complete.

12.1 Feature Extraction Alternatives
Linear Prediction Cepstrum: Identifies the vocal tract parameters. Used for text-independent recognition.
Discrete Wavelet Transform.
Delta-Cepstrum: Analyzes changing tones.
12.2 Feature Matching Alternatives
Dynamic Time Warping: Accounts for inconsistencies in the rate of speech by stretching or compressing parts of the signal in the time domain.
AI-based: Hidden Markov Models, Gaussian Mixture Models, and Neural Networks.
13. Implementation
13.1 Platform
MATLAB was chosen as the platform for ease of implementation. A third-party GNU MATLAB toolbox, Voicebox, was used. This toolbox provides functions that calculate mel-frequency cepstral coefficients and perform vector quantization.
Conclusion
In this project, we have developed a text-independent speaker identification system, that is, a system that identifies the person speaking regardless of what he/she is saying. Our speaker recognition system consists of two sections: (i) an enrollment section that builds a database of known speakers, and (ii) an unknown-speaker identification section. The enrollment session is also referred to as the training phase, while unknown-speaker identification is also referred to as the operation session or testing phase.
In the training phase, each registered speaker has to provide samples of their speech so that the system can build, or train, a reference model for that speaker. This consists of two main parts. The first part processes each person's input voice sample to condense and summarize the characteristics of their vocal tract. The second part pulls each person's data together into a single, easily manipulated matrix. In the testing phase, the calculated matrix is used for recognition.
Future Work
Currently, this application lacks an easy-to-use user interface. The application can be extended to provide one, and it can be fine-tuned to meet real-time constraints. Other techniques may be used in implementing this application to minimize the false-acceptance and false-rejection rates.
Snapshots
Figure 34: Matlab Command Window
Figure 35: Matlab Editor
Code
speakerTest.m
function speakerTest(a)
% A speaker recognition program. a is the filename of the wave file to be
% tested against the database of sampled voices; the program evaluates
% whose voice it is.
% Mike Brooks, VOICEBOX, free toolbox for MATLAB:
% www.ncl.ac.uk/CPACTsoftware/MatlabLinks.html
% disteusq.m, enframe.m, kmeans.m, melbankm.m, melcepst.m, rdct.m and
% rfft.m from VOICEBOX are used in this program.

voiceboxpath = 'C:/Users/test/voicebox';
addpath(voiceboxpath);

test.data = wavread(a);       % read the test file
name = ['vimal';'anand'];     % names of people in the database
fs = 16000;                   % sampling frequency
C = 8;                        % number of centroids

% Load data
disp('Reading data for training:')
[train.data] = Load_data(name);

% Calculate mel-frequency cepstral coefficients for the training set
fprintf('\nCalculating mel-frequency cepstral coefficients for training set:\n')
[train.cc] = mfcc(train.data,name,fs);

% Perform the K-means algorithm for clustering (vector quantization)
fprintf('\nApplying Vector Quantization (K-means) for feature extraction:\n')
[train.kmeans] = kmean(train.cc,C);

% Calculate mel-frequency cepstral coefficients for the test set
test.cc = melcepst(test.data,fs,'x');

% Compute average distances between test.cc and all the codebooks in the
% database, and find the lowest distortion
[result index] = distmeasure(train.kmeans,test.cc);

% Display the average distances between the features of the unknown voice
% (test.cc) and all the codebooks, and identify the person with the
% lowest distance
fprintf('\nDisplaying the result:\n')
dispresult(name,result,index)
Load_data.m
function [data] = Load_data(name)
% Training mode - load all the wave files into the database (codebooks)
data = cell(size(name,1),1);
for i = 1:size(name,1)
    temp = [name(i,:) '.wav'];
    tempwav = wavread(temp);
    data{i} = tempwav;
end
distmeasure.m
function [result,index] = distmeasure(x,y)
% Compute, for each codebook in x, the average of the minimum Euclidean
% distances to the test features y, and track the codebook with the
% lowest average distance.
result = cell(size(x,1),1);
dist = cell(size(x,1),1);
mins = inf;
k = size(x,2);
for i = 1:size(x,1)
    dist{i} = disteusq(x{i}(:,1:k),y(:,1:k),'x');
    temp = sum(min(dist{i}))/size(dist{i},2);
    result{i} = temp;
    if temp < mins
        mins = temp;
        index = i;
    end
end
dispresult.m
function dispresult(x,y,z)
% Display the average Euclidean distance for each database entry and
% name the most likely speaker.
disp('The average of Euclidean distances between database and test wave file')
color = ['r'; 'g'; 'c'; 'b'; 'm'; 'k'];
for i = 1:size(x,1)
    disp(x(i,:))
    disp(y{i})
end
disp('The test voice is most likely from')
disp(x(z,:))
mfcc.m
function [cepstral] = mfcc(x,y,fs)
% Calculate MFCCs at sampling frequency fs and store them in the cepstral
% cell; display y(i,:) as x{i} is processed.
cepstral = cell(size(x,1),1);
for i = 1:size(x,1)
    disp(y(i,:))
    cepstral{i} = melcepst(x{i},fs,'x');
end
kmean.m
function [data] = kmean(x,C)
% Calculate k-means for each entry of x with C centroids
train.kmeans.x = cell(size(x,1),1);
train.kmeans.esql = cell(size(x,1),1);
train.kmeans.j = cell(size(x,1),1);
for i = 1:size(x,1)
    [train.kmeans.j{i} train.kmeans.x{i}] = kmeans(x{i}(:,1:12),C);
end
data = train.kmeans.x;
melcepst.m
function c=melcepst(s,fs,w,nc,p,n,inc,fl,fh)
%MELCEPST Calculate the mel cepstrum of a signal C=(S,FS,W,NC,P,N,INC,FL,FH)
%
% Simple use: c=melcepst(s,fs)        % calculate mel cepstrum with 12 coefs, 256 sample frames
%             c=melcepst(s,fs,'e0dD') % include log energy, 0th cepstral coef, delta and delta-delta coefs
%
% Inputs:
%   s    speech signal
%   fs   sample rate in Hz (default 11025)
%   nc   number of cepstral coefficients excluding 0'th coefficient (default 12)
%   n    length of frame in samples (default power of 2 < (0.03*fs))
%   p    number of filters in filterbank (default: floor(3*log(fs)) = approx 2.1 per octave)
%   inc  frame increment (default n/2)
%   fl   low end of the lowest filter as a fraction of fs (default = 0)
%   fh   high end of highest filter as a fraction of fs (default = 0.5)
%
%   w    any sensible combination of the following:
%        'R' rectangular window in time domain
%        'N' Hanning window in time domain
%        'M' Hamming window in time domain (default)
%        't' triangular shaped filters in mel domain (default)
%        'n' hanning shaped filters in mel domain
%        'm' hamming shaped filters in mel domain
%        'p' filters act in the power domain
%        'a' filters act in the absolute magnitude domain (default)
%        '0' include 0'th order cepstral coefficient
%        'E' include log energy
%        'd' include delta coefficients (dc/dt)
%        'D' include delta-delta coefficients (d^2c/dt^2)
%        'z' highest and lowest filters taper down to zero (default)
%        'y' lowest filter remains at 1 down to 0 frequency and
%            highest filter remains at 1 up to nyquist frequency
%
% If 'ty' or 'ny' is specified, the total power in the fft is preserved.
%
% Outputs: c mel cepstrum output: one frame per row. Log energy, if requested, is the
%          first element of each row followed by the delta and then the delta-delta
%          coefficients.

if nargin<2 fs=11025; end
if nargin<3 w='M'; end
if nargin<4 nc=12; end
if nargin<5 p=floor(3*log(fs)); end
if nargin<6 n=pow2(floor(log2(0.03*fs))); end
if nargin<9
   fh=0.5;
   if nargin<8
      fl=0;
      if nargin<7
         inc=floor(n/2);
      end
   end
end

if isempty(w)
   w='M';
end
if any(w=='R')
   z=enframe(s,n,inc);
elseif any (w=='N')
   z=enframe(s,hanning(n),inc);
else
   z=enframe(s,hamming(n),inc);
end
f=rfft(z.');
[m,a,b]=melbankm(p,n,fs,fl,fh,w);
pw=f(a:b,:).*conj(f(a:b,:));
pth=max(pw(:))*1E-20;
if any(w=='p')
   y=log(max(m*pw,pth));
else
   ath=sqrt(pth);
   y=log(max(m*abs(f(a:b,:)),ath));
end
c=rdct(y).';
nf=size(c,1);
nc=nc+1;
if p>nc
   c(:,nc+1:end)=[];
elseif p<nc
   c=[c zeros(nf,nc-p)];
end
if ~any(w=='0')
   c(:,1)=[];
   nc=nc-1;
end
if any(w=='E')
   c=[log(sum(pw)).' c];
   nc=nc+1;
end

% calculate derivative
if any(w=='D')
   vf=(4:-1:-4)/60;
   af=(1:-1:-1)/2;
   ww=ones(5,1);
   cx=[c(ww,:); c; c(nf*ww,:)];
   vx=reshape(filter(vf,1,cx(:)),nf+10,nc);
   vx(1:8,:)=[];
   ax=reshape(filter(af,1,vx(:)),nf+2,nc);
   ax(1:2,:)=[];
   vx([1 nf+2],:)=[];
   if any(w=='d')
      c=[c vx ax];
   else
      c=[c ax];
   end
elseif any(w=='d')
   vf=(4:-1:-4)/60;
   ww=ones(4,1);
   cx=[c(ww,:); c; c(nf*ww,:)];
   vx=reshape(filter(vf,1,cx(:)),nf+8,nc);
   vx(1:8,:)=[];
   c=[c vx];
end

if nargout<1
   [nf,nc]=size(c);
   t=((0:nf-1)*inc+(n-1)/2)/fs;
   ci=(1:nc)-any(w=='0')-any(w=='E');
   imh = imagesc(t,ci,c.');
   axis('xy');
   xlabel('Time (s)');
   ylabel('Mel-cepstrum coefficient');
   map = (0:63)'/63;
   colormap([map map map]);
   colorbar;
end