SOFTWARE REQUIREMENT SPECIFICATION
6. Introduction
6.1 Purpose
The purpose of this document is to present a detailed description of the Speaker Recognition System. This report discusses each stage of the project: the Requirements Specification phase, the System Design phase, the Implementation phase, and the Testing phase. It also presents recommendations for the project.
6.2 Intended Audience
This document is intended for several types of reader: developers, project managers, testers, documentation writers, and users, where users include faculty members, institute staff, students, and alumni of the institute. All audiences except end users are advised to read through the complete document for a better understanding of this software product.
6.3 Scope of the Project
The Speaker Recognition System is a standalone application. It can be used to restrict access to confidential information, and it can be integrated into other systems to provide security.
6.4 References
IEEE. IEEE Std 830-1998 IEEE Recommended Practice for Software Requirements Specifications. IEEE Computer Society, 1998.
7. Requirement Model
A requirement model is used in systems and software engineering to refer to the process of building up a specification of the system. In this phase we identify actors and produce use cases, interface descriptions, and problem domain objects. UML (Unified Modeling Language) is used to draw all the relevant diagrams and notation.
7.1 User Requirements
7.1.1 Functional Requirements
a. The user should be able to enter records.
b. Each record represents information about a person and contains his/her voice sample.
c. Records may consist of:
   i. First name
   ii. Last name
   iii. Phone
   iv. Address (city, street address)
   v. Voiceprint
   vi. ID number
d. The system can filter noise from the voice signal, which may come from environmental noise or the sensitivity of the microphone.
e. The system must be able to take a voiceprint and a user ID (in the case of speaker verification) as input, search for a match in the database, and then show the result.
f. The result should be presented by showing the user IDs matching the input.
g. The user should be able to see his/her full information upon successful identification/verification.
7.1.2 Non-Functional Requirements
a. Records are maintained in a database.
b. Every record shall be allocated a unique identifier (id-number).
c. The user should be able to retrieve data by entering an ID and voiceprint, on successful identification/verification.
d. To improve performance, the database should store a compressed codebook for each user instead of the raw voiceprint. The voiceprint is discarded after the codebook is calculated.
7.2 System Requirements
7.2.1 Actors with their description
Users: Provide the system with a voice sample and expect the system to show a match and details about the user.
Administrator: Manages the entire speaker recognition system.
[Use case diagram: the User participates in Enroll, Request Match, Edit Information, and Remove; the Administrator participates in Add Users, Remove Users, and View Statistics.]
Figure 1: Use Case Diagram
7.2.2 Use Cases with their description
Use Case            Description
Add Records         Administrator adds new users to the system. The user must provide his/her details and a voice sample to the system during enrollment.
Request Match       The user requests a voice sample to be matched against a voiceprint in the database and retrieves details about it (on successful verification).
Update Records      Allows the user to add or update (remove) records in the system, such as name, ID, phone number, etc. (on successful verification).
Remove User/Records The system allows the administrator to remove a user.
View Statistics     The administrator can view the performance of the system.
7.2.2.1 Administrator Use Case
[Diagram: the Administrator participates in Add Users, Remove Users, and View Statistics.]
Figure 12: Administrator Use Case Diagram
Use Case Name: Add Users
Brief Description: Administrator enrolls users into the system.
1. Preconditions:
   a. The system must be fully functional and connected to the database.
   b. The administrator must be logged into the system.
2. Main flow of events:
   a. The administrator inputs the user's details into the system.
   b. The administrator inputs the user's voice sample.
   c. A notification appears that the user is enrolled.
3. Post conditions:
   a. The user is enrolled.
   b. A user ID with relevant details is displayed.
4. Special Requirements: None
Use Case Name: Remove Users
Brief Description: Administrator removes users from the system.
1. Preconditions:
   a. The system must be fully functional and connected to the database.
   b. The administrator must be logged into the system.
2. Main flow of events:
   a. The administrator inputs a user ID into the system.
   b. A notification appears asking for confirmation.
   c. The user is removed.
3. Post conditions: The system no longer contains any information about the user.
4. Alternate flows:
   a. The user ID given by the administrator is not found in the system; the administrator enters the user ID again.
   b. On confirmation of removal, the administrator selects "No"; the user is not removed from the system.
5. Special Requirements: None
Use Case Name: View Statistics
Brief Description: Administrator views the performance statistics of the system.
1. Preconditions:
   a. The system must be fully functional and connected to the database.
   b. The administrator must be logged into the system.
2. Main flow of events:
   a. The administrator selects to view the performance statistics.
   b. The statistics are shown.
3. Post conditions: None
4. Alternate flows: None
5. Special Requirements: None
7.2.2.2 User Use Case
[Diagram: the User participates in Enroll, Request Match, Edit Information, and Remove.]
Figure 13: User Use Case Diagram
Use Case Name: Enroll
Brief Description: User enrolls into the system.
1. Preconditions: The system must be fully functional and connected to the database.
2. Main flow of events:
   a. The user inputs his/her details into the system.
   b. The user inputs his/her voice sample.
   c. A notification appears that the user is enrolled.
3. Post conditions:
   a. The user is enrolled.
   b. A user ID with relevant details is displayed.
4. Special Requirements: None
Use Case Name: Remove
Brief Description: User removes himself/herself from the system.
1. Preconditions: The system must be fully functional and connected to the database.
2. Main flow of events:
   a. The user inputs his/her user ID into the system.
   b. A notification appears asking for confirmation.
   c. The user is removed.
3. Post conditions: The system no longer contains any information about the user.
4. Alternate flows:
   a. The user ID given by the user is not found in the system; the user enters the user ID again.
   b. On confirmation of removal, the user selects "No"; the user is not removed from the system.
5. Special Requirements: None
Use Case Name: Request Match
Brief Description: User enters his/her voice sample and runs the test phase.
1. Preconditions: The system must be fully functional and connected to the database.
2. Main flow of events:
   a. The user selects to run a test.
   b. The system asks the user to enter his/her user ID and voice sample.
   c. Matching is performed and the result is shown to the user.
3. Post conditions: The user is allowed to log into the system.
4. Alternate flows: None
5. Special Requirements: None
Use Case Name: Edit Information
Brief Description: User edits his/her information stored in the system. The user is not allowed to edit his/her already stored voice sample.
1. Preconditions: The system must be fully functional and connected to the database. The user must be logged into the system.
2. Main flow of events:
   a. The user selects to edit.
   b. The system displays the full details of the user.
   c. The user edits his/her information and selects Save.
3. Post conditions: The system contains the updated user information.
4. Alternate flows: None
5. Special Requirements: None
7.3 Safety Requirements
There are no safety requirements of concern, such as possible loss, damage, or harm that could result from the use of the Speaker Recognition System.
7.4 User Interfaces
In the main menu, the user will be presented with several buttons: Enroll and Voiceprint Test. After the user logs in, he/she will also be presented with Edit Information and Remove buttons.
The administrator will be presented with Add Users, Remove Users, and View Statistics buttons.
7.4.1 Enroll
Clicking on the new user button will cause a dialog box to open with the title New User. The dialog box will have the following fields:
   i. First name
   ii. Last name
   iii. Phone
   iv. Address (city, street address)
It will also contain two buttons, Enroll and Cancel. Cancel will return the user to the main menu with no user created. Enroll will prompt the user to speak so as to record his/her voice, and will give a countdown starting from 2 seconds. Note: the New User dialog box will remain in the foreground when recording begins. After the countdown is complete, the system will record from the microphone for 10 seconds. If the recording is successful, the user will be returned to the main menu. If an error occurs during recording (for example, silence), a descriptive message will be displayed (for example, "no sound recorded") and the dialog box will remain open.
7.4.2 Voiceprint Match
Clicking on Test will allow users to test their voiceprint with the implemented verification algorithm. A dialog box will pop up with the title Voiceprint Test. Two buttons will give the option to return to the main menu (OK button) or perform the test. Recording will be carried out in the same way as specified for enrollment, but the responses at the end will be different. Note: the Voiceprint Test dialog box will remain in the foreground when recording begins. At the end of the recording, the program will respond with a success, fail, or error.
7.4.3 Remove
The Remove option will delete a user's profile (user ID and voiceprint). Upon clicking Remove, a dialog box will pop up with the title Remove User for confirmation, containing the buttons Cancel and Delete. Cancel will bring the user back to the main menu. If Delete is clicked, the user will be prompted with "Are you sure?" and must press 'y' for yes or 'n' for no. Pressing 'y' and then Enter will delete the profile and return the user to the main menu. Pressing 'n' will return the user to the dialog box without deleting the profile.
7.4.4 Statistics
The display of statistics is an element that will be given flexibility. At this time, the only requirement is that performance statistics be available.
7.5 Hardware Interfaces
The Speaker Recognition System requires access to the system's microphone to capture the user's voice.
7.6 Software Interfaces
The Speaker Recognition System is built for the Windows operating system. It requires Microsoft Windows XP Service Pack 3 or later to run. Since the software is built in MATLAB, it requires the MATLAB runtime to function properly.
Problem Domain Object (PDO)
[Diagram: the problem domain objects are User, Administrator, and VoicePrint.]
Figure 14: Problem Domain Object
8. Analysis Model
The analysis model describes the structure of the system or application that you are modeling. It aims to structure the system independently of the actual implementation environment. Focus is on the logical structure of the system.
The following are the three types of analysis objects into which the use cases can be classified:
Figure 15: Types of Analysis Object
Entity objects: Information to be held for a longer time; all behavior is naturally coupled to the information. Example: a person with the associated data and behavior.
Interface objects: Model behavior and information that depend on the interface to the system. Example: user interface functionality for requesting information about a person.
Control objects: Model functionality that is not naturally tied to any other object. Behavior consists of operating on several different entity objects, doing some computation, and then returning the result to an interface object. Example: calculating taxes using several different factors.
The analysis model identifies the main classes in the system and contains a set of use case realizations that describe how the system will be built. Sequence diagrams realize the use cases by describing the flow of events in the use cases when they are executed. These use case realizations model how the parts of the system interact within the context of a specific use case. It can be used as the foundation of the design model since it describes the logical structure of the system, but not how it will be implemented.
Interface Objects
[Diagram: the interface objects include the microphone interface.]
Figure 16: Interface Objects
Entity Objects
[Diagram: the entity objects are User Information and Voice Sample.]
Figure 17: Entity Objects
Control Objects:
Figure 18: Control Objects
[Diagram: analysis model relating the Start Panel, Receive Information, User Information, Voice Sample, Generate Result, Request Match, Add/Remove/Edit User, User Panel, Admin Interface, and View Statistics objects.]
Figure 19: Analysis Model
9. SEQUENCE DIAGRAMS
A sequence diagram in the Unified Modeling Language (UML) is a kind of interaction diagram that shows how processes operate with one another and in what order. It is a construct of a Message Sequence Chart. A sequence diagram shows object interactions arranged in time sequence. It depicts the objects and classes involved in the scenario and the sequence of messages exchanged between the objects needed to carry out the functionality of the scenario. Sequence diagrams are typically associated with use case realizations in the Logical View of the system under development. Sequence diagrams are sometimes called event diagrams, event scenarios, or timing diagrams.
Sequence Diagram: User Enrollment
[Sequence diagram: the Enrollment, Profile, Feature Extract, and Codebook Calculation objects exchange the messages Create, Add To User List, Request for Voice Sample, Voice Sample/Training Speech, Acoustic Vectors, Codebook, and Return User Id.]
Figure 20: Sequence diagram for user enrollment
Sequence Diagram: Voice Match
[Sequence diagram: the User, Match Voice, Feature Extractor, Feature Comparator, and Codebook objects exchange the messages Request to initiate match, Requests for voice and user id input, VoicePrint and User Id, Acoustic Vector, Requests for user's codebook (input: UserId), Codebook, and Result.]
Figure 21: Sequence diagram for Voice Match
Sequence Diagram: Edit Information
[Sequence diagram: the User, Authenticator, Edit User, and Database objects exchange the messages Voice Sample and UserId, Successfully logged in, User Id & request to retrieve information, User Information, Updated Information, and Success.]
Figure 22: Sequence Diagram for Editing Information
10. Activity Diagrams
An activity diagram is like a state diagram, except that it has a few additional symbols and is used in a different context. In a state diagram, most transitions are caused by external events; however, in an activity diagram, most transitions are caused by internal events, such as the completion of an activity.
An activity diagram is used to understand the flow of work that an object or component performs. It can also be used to visualize the interaction between different use cases.
1. Enroll new user
Figure 23: Activity Diagram for enrolling new user
2. Request Matching
Figure 24: Activity Diagram for voice matching
3. Remove User
Figure 25: Activity Diagram for removing user
4. Update user information
Figure 26: Activity Diagram for updating user information
11. Design Model
11.1 High Level Design
There are two main modules in this speaker recognition system: the User Enrollment Module and the User Verification Module.
Figure 27: High Level Block Diagram of Speaker Verification System
11.1.1 User Enrollment Module
The User Enrollment Module is used when a new user is added to the system. This module is used to essentially “teach” the system the new user’s voice. The input of this module is a voiceprint of the user along with other details. By analyzing this training speech, the module outputs a model that parameterizes the user’s voice. This model will be used later in the User Verification Module.
11.1.1.1 Signal Preprocessing Subsystem
The signal preprocessing subsystem conditions the raw speech signal and prepares it for subsequent manipulation and analysis. This subsystem performs analog-to-digital conversion and any signal conditioning necessary.
11.1.1.2 Feature Extraction Subsystem
The feature extraction subsystem analyzes the user's digitized voice signal and creates a series of values to use as a model for the user's speech pattern.
11.1.1.3 Feature Data Compression Subsystem
The disk size required for the model created in the Feature Extraction subsystem will be significant when many users are enrolled in the system. In order to store this data effectively, a form of data compression is used. After the model is compressed, it will be stored for later use in the User Verification Module.
11.1.2 Threshold Generation Module
This module is used to set the sensitivity level of the system for each user enrolled in the system. This sensitivity value is called the threshold and needs to be generated whenever a new user is enrolled. This module can also be invoked when a user feels they are receiving too many false rejections and wants to recalculate an appropriate sensitivity level.
After a user enrolls with the system, running this module will essentially invoke a user verification session. However, instead of producing a pass or fail verdict, the system will take the similarity factor found in the Feature Comparison Subsystem and use it to determine the threshold value. This similarity factor will be scaled up and then saved as the threshold value. Scaling the value up should account for any variances in future verification sessions.
This module is required for speaker verification functionality. As of now, implementation of this module is suspended due to time constraints.
11.1.2.1 Threshold Generation Subsystem
This subsystem will set the user threshold to a scaled-up version of the similarity factor determined in the Feature Comparison Subsystem.
11.1.3 Verification Module
The User Verification Module is used when the system tries to verify a user. The user informs the system that he or she is a certain user. The system will then prompt the user to say something. This utterance is referred to as the "testing speech." The module performs the same signal preprocessing and feature extraction as the User Enrollment Module. The extracted speech parameterization data is then compared to the stored model. Based on the similarity, a verdict is given to indicate whether the user has passed or failed the voice verification test.
11.1.3.1 Feature Comparison Subsystem
After the Feature Extraction Subsystem parameterizes the testing speech, this data is compared to the model of the user stored on disk. After comparing all the data, a similarity factor is produced.
11.1.3.2 Decision Subsystem
Based on the similarity factor produced by the Feature Comparison Subsystem and the user's threshold value, a verdict will be given by this subsystem to indicate whether the user has passed or failed the voice verification test.
11.2 Low Level Design
The following sections describe the information used for the implementation of each subsystem.
11.2.1 Signal Preprocessing Subsystem
Input: Raw speech signal
Output: Digitized and conditioned speech signal (one vector containing all sampled values)
Figure 28: Signal Preprocessing Subsystem Low-Level Block Diagram
The sampling will produce a digital signal in the form of a vector or array. The silence at the beginning and end of the speech sample will be removed.
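The silence-removal step can be sketched as follows. This is an illustrative Python sketch (the project itself is implemented in MATLAB); the amplitude threshold of 0.01 and the function name `trim_silence` are assumptions, not values fixed by this design.

```python
import numpy as np

def trim_silence(signal, threshold=0.01):
    # Keep the span between the first and last samples whose magnitude
    # exceeds the threshold; everything outside it is treated as silence.
    voiced = np.flatnonzero(np.abs(signal) > threshold)
    if voiced.size == 0:
        return signal[:0]              # the whole signal is silence
    return signal[voiced[0]:voiced[-1] + 1]

sig = np.concatenate([np.zeros(100), np.array([0.5, -0.4, 0.3]), np.zeros(50)])
print(len(trim_silence(sig)))          # -> 3
```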
11.2.2 Feature Extraction Subsystem
Input: Digital speech signal (one vector containing all sampled values)
Output: A set of acoustic vectors
Figure 29: Feature Extraction Subsystem Low-Level Block Diagram
Mel-cepstral coefficients will be used to parameterize the speech sample and voice.
The original vector of sampled values will be framed into overlapping blocks. Each block will be windowed to minimize spectral distortion and discontinuities. A Hamming window will be used. The Fast Fourier Transform will then be applied to each windowed block as the beginning of the Mel-Cepstral Transform. After this stage, the spectral coefficients of each block are generated.
The Mel Frequency Transform will then be applied to each spectral block to convert thescale to a mel-scale. The mel-scale is a logarithmic scale similar to the way the human ear perceives sound. Finally, the Discrete Cosine Transform will be applied to each Mel Spectrum to convert the values back to real values in the time domain.
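The framing, windowing, FFT, mel filterbank, and DCT steps described above can be sketched in Python for illustration (the project itself uses MATLAB with the Voicebox `melcepst` function). The frame length of 256 samples, hop of 128, 20 mel filters, and 12 coefficients are illustrative assumptions, not values fixed by this specification.

```python
import numpy as np

def mel(f):
    # Hz -> mel (logarithmic scale resembling human pitch perception)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_filterbank(n_filters, n_fft, fs):
    # Triangular filters spaced evenly on the mel scale
    pts = np.linspace(mel(0.0), mel(fs / 2.0), n_filters + 2)
    hz = 700.0 * (10.0 ** (pts / 2595.0) - 1.0)
    bins = np.floor((n_fft + 1) * hz / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        lo, ce, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, ce):
            fb[m - 1, k] = (k - lo) / max(ce - lo, 1)
        for k in range(ce, hi):
            fb[m - 1, k] = (hi - k) / max(hi - ce, 1)
    return fb

def mfcc(signal, fs, frame_len=256, hop=128, n_filters=20, n_coeffs=12):
    # 1. Frame the signal into overlapping blocks and apply a Hamming window
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)
    # 2. FFT of each windowed block -> magnitude spectrum
    spec = np.abs(np.fft.rfft(frames, frame_len))
    # 3. Mel filterbank, then log compression
    logmel = np.log(spec @ mel_filterbank(n_filters, frame_len, fs).T + 1e-10)
    # 4. DCT-II back to real values; keep the first n_coeffs coefficients
    n = logmel.shape[1]
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * np.arange(n) + 1) / (2 * n))
    return logmel @ basis.T

fs = 16000
t = np.arange(fs) / fs
sig = np.sin(2 * np.pi * 440 * t)      # one second of a 440 Hz tone
acoustic_vectors = mfcc(sig, fs)
print(acoustic_vectors.shape)          # -> (124, 12): one 12-dim vector per frame
```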
11.2.3 Feature Compression Subsystem
Input: A set of acoustic vectors
Output: Codebook
Figure 30: Feature Data Compression Subsystem Low-Level Block Diagram
The K Means Vector Quantization Algorithm will be used.
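A minimal sketch of codebook training with K-means, in Python for illustration: the 8 centroids match the `C = 8` used in the implementation code, while the iteration count, random seed, and synthetic data are assumptions.

```python
import numpy as np

def kmeans_codebook(vectors, n_centroids=8, n_iter=25, seed=0):
    # Compress a speaker's acoustic vectors into a small codebook of centroids
    rng = np.random.default_rng(seed)
    codebook = vectors[rng.choice(len(vectors), n_centroids, replace=False)]
    for _ in range(n_iter):
        # Assign each vector to its nearest codeword (Euclidean distance)
        d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each codeword to the mean of its assigned vectors
        for k in range(n_centroids):
            if np.any(labels == k):
                codebook[k] = vectors[labels == k].mean(axis=0)
    return codebook

rng = np.random.default_rng(1)
vectors = rng.normal(size=(500, 12))   # stand-in for one speaker's MFCC vectors
codebook = kmeans_codebook(vectors)
print(codebook.shape)                  # -> (8, 12)
```

Only the 8 x 12 codebook needs to be stored per speaker, which is the compression called for in requirement 7.1.2(d).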
11.2.4 Feature Data Comparison Subsystem
Inputs: Set of acoustic vectors from testing speech; codebook
Output: Average distortion factor
Figure 31: Comparison Subsystem Low-Level Block Diagram
The acoustic vectors generated by the testing voice signal will be individually compared to the codebook. The codeword closest to each test vector is found based on Euclidean distance. This minimum Euclidean distance, or distortion factor, is stored until the distortion factor for each test vector has been calculated. The average distortion factor is then found and normalized.
Figure 32: Distortion Calculation Algorithm Flow Chart
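The distortion calculation above reduces to a nearest-codeword search followed by an average. A hedged Python sketch (the two-dimensional codewords and test vectors are made-up illustrative data):

```python
import numpy as np

def average_distortion(test_vectors, codebook):
    # Distance from every test vector to every codeword ...
    d = np.linalg.norm(test_vectors[:, None, :] - codebook[None, :, :], axis=2)
    # ... keep each vector's minimum (its distortion factor), then average
    return d.min(axis=1).mean()

codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
close = np.array([[0.1, 0.0], [9.9, 10.0]])    # vectors near the codewords
far = np.array([[5.0, 5.0], [4.0, 6.0]])       # vectors far from both codewords
assert average_distortion(close, codebook) < average_distortion(far, codebook)
```

A matching speaker's test vectors lie near their own codebook, so their average distortion is small; an impostor's vectors sit farther away and yield a larger value.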
11.2.5 Decision Subsystem
Inputs: Average distortion factor; user-specific threshold
Output: Verdict
Figure 33: Decision Subsystem Low-Level Block Diagram
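Taken together with the threshold described in Section 11.1.2, the decision rule reduces to a single comparison. A hedged Python sketch: the 1.2 scale factor and the function names are illustrative assumptions, since the text only says the enrollment-time similarity factor is "scaled up".

```python
def generate_threshold(enrollment_distortion, scale=1.2):
    # Threshold = scaled-up enrollment distortion; the 1.2 factor is an
    # illustrative assumption (the text only says "scaled-up")
    return scale * enrollment_distortion

def verdict(avg_distortion, threshold):
    # Pass when the test utterance is at least as close to the stored
    # codebook as the scaled enrollment utterance was
    return "pass" if avg_distortion <= threshold else "fail"

threshold = generate_threshold(0.50)   # enrollment-time distortion of 0.50
print(verdict(0.55, threshold))        # prints "pass" (0.55 <= 0.60)
print(verdict(0.80, threshold))        # prints "fail" (0.80 >  0.60)
```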
12. Alternative Options
There is more than one way to perform speaker recognition. The methods chosen for this project were selected mainly for their implementability and low complexity. The list of alternatives below is in no way complete.

12.1 Feature Extraction Alternatives
Linear Prediction Cepstrum: Identifies the vocal tract parameters. Used for text-independent recognition.
Discrete Wavelet Transform.
Delta-Cepstrum: Analyzes changing tones.
12.2 Feature Matching Alternatives
Dynamic Time Warping: Accounts for inconsistencies in the rate of speech by stretching or compressing parts of the signal in the time domain.
AI-based: Hidden Markov Models, Gaussian Mixture Models, and Neural Networks.
13. Implementation
13.1 Platform
MATLAB was chosen as the platform for ease of implementation. A third-party GNU MATLAB toolbox, Voicebox, was used. This toolbox provides functions that calculate mel-frequency cepstral coefficients and perform vector quantization.
Conclusion
In this project, we have developed a text-independent speaker identification system, that is, a system that identifies the person speaking regardless of what he/she is saying. Our speaker recognition system consists of two sections: (i) an enrollment section that builds a database of known speakers, and (ii) an unknown-speaker identification section. The enrollment session is also referred to as the training phase, while unknown-speaker identification is also referred to as the operation session or testing phase.
In the training phase, each registered speaker has to provide samples of their speech so that the system can build, or train, a reference model for that speaker. This consists of two main parts. The first part processes each person's input voice sample to condense and summarize the characteristics of their vocal tract. The second part pulls each person's data together into a single, easily manipulated matrix. In the testing phase, the calculated matrix is used for recognition.
Future Work
Currently, this application lacks an easy-to-use user interface. The application can be extended to provide one, and it can be fine-tuned to meet real-time constraints. Other techniques may be used in implementing this application to minimize the false-acceptance and false-rejection rates.
Snapshots
Figure 34: Matlab Command Window
Figure 35: Matlab Editor
Code
speakerTest.m
function speakerTest(a)
% A speaker recognition program. a is the filename of the wave file to be
% tested against the database of sampled voices; the program evaluates
% whose voice it is.
% Mike Brooks, VOICEBOX, free toolbox for MATLAB:
% www.ncl.ac.uk/CPACTsoftware/MatlabLinks.html
% disteusq.m, enframe.m, kmeans.m, melbankm.m, melcepst.m, rdct.m and
% rfft.m from VOICEBOX are used in this program.

voiceboxpath = 'C:/Users/test/voicebox';
addpath(voiceboxpath);

test.data = wavread(a);       % read the test file
name = ['vimal';'anand'];     % names of people in the database
fs = 16000;                   % sampling frequency
C = 8;                        % number of centroids

% Load data
disp('Reading data for training:')
[train.data] = Load_data(name);

% Calculate mel-frequency cepstral coefficients for the training set
fprintf('\nCalculating mel-frequency cepstral coefficients for training set:\n')
[train.cc] = mfcc(train.data,name,fs);

% Perform the K-means algorithm for clustering (vector quantization)
fprintf('\nApplying Vector Quantization (K-means) for feature extraction:\n')
[train.kmeans] = kmean(train.cc,C);

% Calculate mel-frequency cepstral coefficients for the test set
test.cc = melcepst(test.data,fs,'x');

% Compute average distances between test.cc and all the codebooks in the
% database, and find the lowest distortion
[result index] = distmeasure(train.kmeans,test.cc);

% Display the average distances between the features of the unknown voice
% (test.cc) and all the codebooks, and identify the person with the
% lowest distance
fprintf('\nDisplaying the result:\n')
dispresult(name,result,index)
Load_data.m
function [data] = Load_data(name)
% Training mode - load all the wave files into the database (codebooks)
data = cell(size(name,1),1);
for i = 1:size(name,1)
    temp = [name(i,:) '.wav'];
    tempwav = wavread(temp);
    data{i} = tempwav;
end
distmeasure.m
function [result,index] = distmeasure(x,y)
% Compute, for each codebook in x, the average of the minimum Euclidean
% distances to the test features y, and track the codebook with the
% lowest average distance.
result = cell(size(x,1),1);
dist = cell(size(x,1),1);
mins = inf;
k = size(x,2);
for i = 1:size(x,1)
    dist{i} = disteusq(x{i}(:,1:k),y(:,1:k),'x');
    temp = sum(min(dist{i}))/size(dist{i},2);
    result{i} = temp;
    if temp < mins
        mins = temp;
        index = i;
    end
end
dispresult.m
function dispresult(x,y,z)
% Display the average Euclidean distance for each database entry and
% name the most likely speaker.
disp('The average of Euclidean distances between database and test wave file')
color = ['r'; 'g'; 'c'; 'b'; 'm'; 'k'];
for i = 1:size(x,1)
    disp(x(i,:))
    disp(y{i})
end
disp('The test voice is most likely from')
disp(x(z,:))
mfcc.m
function [cepstral] = mfcc(x,y,fs)
% Calculate MFCCs at sampling frequency fs and store them in the cepstral
% cell; display y(i,:) as x{i} is processed.
cepstral = cell(size(x,1),1);
for i = 1:size(x,1)
    disp(y(i,:))
    cepstral{i} = melcepst(x{i},fs,'x');
end
kmean.m
function [data] = kmean(x,C)
% Calculate k-means for each entry of x with C centroids
train.kmeans.x = cell(size(x,1),1);
train.kmeans.esql = cell(size(x,1),1);
train.kmeans.j = cell(size(x,1),1);
for i = 1:size(x,1)
    [train.kmeans.j{i} train.kmeans.x{i}] = kmeans(x{i}(:,1:12),C);
end
data = train.kmeans.x;
melcepst.m
function c=melcepst(s,fs,w,nc,p,n,inc,fl,fh)
%MELCEPST Calculate the mel cepstrum of a signal C=(S,FS,W,NC,P,N,INC,FL,FH)
%
% Simple use: c=melcepst(s,fs)        % calculate mel cepstrum with 12 coefs, 256 sample frames
%             c=melcepst(s,fs,'e0dD') % include log energy, 0th cepstral coef, delta and delta-delta coefs
%
% Inputs:
%   s    speech signal
%   fs   sample rate in Hz (default 11025)
%   nc   number of cepstral coefficients excluding 0'th coefficient (default 12)
%   n    length of frame in samples (default power of 2 < (0.03*fs))
%   p    number of filters in filterbank (default: floor(3*log(fs)) = approx 2.1 per octave)
%   inc  frame increment (default n/2)
%   fl   low end of the lowest filter as a fraction of fs (default = 0)
%   fh   high end of highest filter as a fraction of fs (default = 0.5)
%
%   w    any sensible combination of the following:
%        'R' rectangular window in time domain
%        'N' Hanning window in time domain
%        'M' Hamming window in time domain (default)
%        't' triangular shaped filters in mel domain (default)
%        'n' hanning shaped filters in mel domain
%        'm' hamming shaped filters in mel domain
%        'p' filters act in the power domain
%        'a' filters act in the absolute magnitude domain (default)
%        '0' include 0'th order cepstral coefficient
%        'E' include log energy
%        'd' include delta coefficients (dc/dt)
%        'D' include delta-delta coefficients (d^2c/dt^2)
%        'z' highest and lowest filters taper down to zero (default)
%        'y' lowest filter remains at 1 down to 0 frequency and
%            highest filter remains at 1 up to nyquist frequency
%
% If 'ty' or 'ny' is specified, the total power in the fft is preserved.
%
% Outputs: c mel cepstrum output: one frame per row. Log energy, if requested, is the
%          first element of each row followed by the delta and then the delta-delta
%          coefficients.

if nargin<2 fs=11025; end
if nargin<3 w='M'; end
if nargin<4 nc=12; end
if nargin<5 p=floor(3*log(fs)); end
if nargin<6 n=pow2(floor(log2(0.03*fs))); end
if nargin<9
   fh=0.5;
   if nargin<8
      fl=0;
      if nargin<7
         inc=floor(n/2);
      end
   end
end

if isempty(w)
   w='M';
end
if any(w=='R')
   z=enframe(s,n,inc);
elseif any (w=='N')
   z=enframe(s,hanning(n),inc);
else
   z=enframe(s,hamming(n),inc);
end
f=rfft(z.');
[m,a,b]=melbankm(p,n,fs,fl,fh,w);
pw=f(a:b,:).*conj(f(a:b,:));
pth=max(pw(:))*1E-20;
if any(w=='p')
   y=log(max(m*pw,pth));
else
   ath=sqrt(pth);
   y=log(max(m*abs(f(a:b,:)),ath));
end
c=rdct(y).';
nf=size(c,1);
nc=nc+1;
if p>nc
   c(:,nc+1:end)=[];
elseif p<nc
   c=[c zeros(nf,nc-p)];
end
if ~any(w=='0')
   c(:,1)=[];
   nc=nc-1;
end
if any(w=='E')
   c=[log(sum(pw)).' c];
   nc=nc+1;
end

% calculate derivative
if any(w=='D')
   vf=(4:-1:-4)/60;
   af=(1:-1:-1)/2;
   ww=ones(5,1);
   cx=[c(ww,:); c; c(nf*ww,:)];
   vx=reshape(filter(vf,1,cx(:)),nf+10,nc);
   vx(1:8,:)=[];
   ax=reshape(filter(af,1,vx(:)),nf+2,nc);
   ax(1:2,:)=[];
   vx([1 nf+2],:)=[];
   if any(w=='d')
      c=[c vx ax];
   else
      c=[c ax];
   end
elseif any(w=='d')
   vf=(4:-1:-4)/60;
   ww=ones(4,1);
   cx=[c(ww,:); c; c(nf*ww,:)];
   vx=reshape(filter(vf,1,cx(:)),nf+8,nc);
   vx(1:8,:)=[];
   c=[c vx];
end

if nargout<1
   [nf,nc]=size(c);
   t=((0:nf-1)*inc+(n-1)/2)/fs;
   ci=(1:nc)-any(w=='0')-any(w=='E');
   imh = imagesc(t,ci,c.');
   axis('xy');
   xlabel('Time (s)');
   ylabel('Mel-cepstrum coefficient');
   map = (0:63)'/63;
   colormap([map map map]);
   colorbar;
end