MAJOR PROJECT FINAL PRESENTATION : TEXT PROMPTED REMOTE SPEAKER AUTHENTICATION Project Members: Ganesh Tiwari (75010) Madhav Pandey(75014) Manoj Shrestha(75018) Project Supervisor : Dr. Subarna Shakya Associate Professor Internal Examiner: Er. Manoj Ghimire External Examiner Er. Bimal Acharya Tribhuvan University Institute of Engineering Pulchowk Campus Department of Electronics and Computer Engineering

Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recognition/Verification System Final Presentation Slide

Dec 05, 2014

gt_ebuddy

Joint speech and speaker recognition using Hidden Markov Model/Vector Quantization (HMM/VQ) for speaker-independent speech recognition and a Gaussian Mixture Model (GMM) for text-independent speaker recognition. MFCC (Mel-Frequency Cepstral Coefficients) were used for feature extraction (with delta, delta-delta and energy: 39 coefficients).
Developed in Java with a client/server architecture; the web interface was developed in Adobe Flex.
This project was done at TU, IOE - Pulchowk Campus, Nepal.
For more details visit http://ganeshtiwaridotcomdotnp.blogspot.com

ABSTRACT OF PROJECT

A biometric is a physical characteristic unique to each individual. Biometrics are very useful in authentication and access control.
The designed system is a text-prompted version of a voice biometric. It incorporates a text-independent speaker verification system and a speaker-independent speech verification system, implemented independently. The foundation for this joint system is that the speech signal conveys both the speech content and the speaker identity. Such systems are more secure against playback attacks, since the word to speak during authentication is not set in advance.
During the course of the project, various digital signal processing and pattern classification algorithms were studied. Short-time spectral analysis was performed to obtain MFCC, energy and their deltas as features. The feature extraction module is the same for both systems. Speaker modeling was done with a GMM, and a left-to-right discrete HMM with VQ was used for isolated word modeling. The results of both systems were combined to authenticate the user.
The speech model for each word was pre-trained using utterances of 45 English words. The speaker models were trained on about 2 minutes of speech from each of 15 speakers. For individual words, the recognition rate of the speech recognition system is 92% and that of the speaker recognition system is 66%. For longer utterances (>5 sec), the recognition rate of the speaker recognition system improves to 78%.
Transcript
Page 1

MAJOR PROJECT FINAL PRESENTATION :

TEXT PROMPTED REMOTE

SPEAKER AUTHENTICATION

Project Members:

Ganesh Tiwari (75010)

Madhav Pandey(75014)

Manoj Shrestha(75018)

Project Supervisor :

Dr. Subarna Shakya

Associate Professor

Internal Examiner:

Er. Manoj Ghimire

External Examiner

Er. Bimal Acharya

Tribhuvan University

Institute of Engineering

Pulchowk Campus

Department of Electronics and Computer Engineering

Page 2

INTRODUCTION

Voice biometric system

User login

Text-Prompted system

Claimant is asked to speak a prompted (random) text

Speech and Speaker Recognition

Why text-prompted?

Playback attack

Page 3

OUR SYSTEM

Feature: MFCC

Modeling and classification: both statistical

GMM: speaker modeling

HMM/VQ: speech modeling

Page 4

PROPERTIES OF SPEECH SIGNAL

Carries both speech content and speaker identity

What makes a speech signal unique?

Each phoneme resonates at its own fundamental frequency and its harmonics

Studied over a short period: short-time spectral analysis

What is speaker-dependent information?

Fundamental frequency: primarily a function of the dimensions and tension of the vocal cords

Size and shape of the mouth, throat, nose, and teeth

Studied over a long period: all the variations from that speaker

Page 5

UNIQUENESS IN PHONEME

[Figure: waveforms of phoneme /ah/ and phoneme /i:/; x-axis: Samples, y-axis: Amplitude]

Page 6

Pre-Processing and Feature Extraction

Page 7

PREPROCESSING: STEPS

1) Silence Removal

[Figure: waveform panels "Silence Signal" and "Silence Removed"]
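The slides do not say how silence was detected. One common approach is a short-time energy threshold, sketched below under that assumption. The function name, `frame_len` and `threshold` are illustrative and are not taken from the project.

```python
def remove_silence(samples, frame_len=256, threshold=0.01):
    """Drop fixed-size frames whose mean energy is below `threshold`.

    `frame_len` and `threshold` are illustrative values (assumptions),
    not parameters reported in the slides.
    """
    voiced = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / len(frame)
        if energy >= threshold:
            voiced.extend(frame)
    return voiced


# Toy signal: silence, a loud alternating segment, then silence again.
signal = [0.0] * 512 + [0.5, -0.5] * 256 + [0.0] * 512
trimmed = remove_silence(signal)
```

In practice the threshold would usually be tuned to the recording conditions, for example from the noise level of the first few frames.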

Page 8

PREPROCESSING: STEPS (CONTD.)

1) Silence Removal 2) Pre-Emphasis

[Figure: two magnitude spectra, |Y(f)| against Frequency (Hz), annotated "Boosted high frequencies" and "Suppressed high frequencies"]
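Pre-emphasis is usually implemented as a first-order high-pass filter, y[n] = x[n] - a·x[n-1]. The sketch below assumes that form with a = 0.97, a commonly used value; the slides do not state the filter or the coefficient used in the project.

```python
def pre_emphasis(samples, alpha=0.97):
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1].

    `alpha` = 0.97 is a typical choice and an assumption here.
    """
    if not samples:
        return []
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(samples[n] - alpha * samples[n - 1])
    return out


# A constant (low-frequency) input is almost cancelled,
# while a rapidly alternating (high-frequency) input is amplified.
flat = pre_emphasis([1.0, 1.0, 1.0])
alternating = pre_emphasis([1.0, -1.0, 1.0])
```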

Page 9

PREPROCESSING: STEPS (CONTD.)

1) Silence Removal 2) Pre-Emphasis 3) Framing

Frames of 23 ms, 50% overlapped
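A minimal sketch of the framing step with 23 ms frames and 50% overlap, as stated on the slide. The sample rate is not given in the slides, so 16000 Hz in the example is an assumption, as is the choice to drop a trailing partial frame.

```python
def frame_signal(samples, sample_rate, frame_ms=23.0, overlap=0.5):
    """Split `samples` into overlapping frames of `frame_ms` milliseconds.

    A trailing partial frame is dropped (an assumption of this sketch).
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop = max(1, int(round(frame_len * (1.0 - overlap))))
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames


# One second of dummy audio at an assumed 16 kHz sample rate.
samples = [float(i) for i in range(16000)]
frames = frame_signal(samples, sample_rate=16000)
```

With these values each frame holds 368 samples and consecutive frames start 184 samples apart.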

Page 10

PREPROCESSING: STEPS (CONTD.)

1) Silence Removal 2) Pre-Emphasis 3) Framing 4) Windowing

[Figure: panels "Hamming Window" and "Windowed Signal"]
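The Hamming window has the standard form w[n] = 0.54 - 0.46·cos(2πn/(N-1)). The sketch below applies it to one frame; the function names are illustrative.

```python
import math


def hamming(n_points):
    """Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    if n_points == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def apply_window(frame):
    """Multiply a frame sample by sample with a Hamming window."""
    return [s * w for s, w in zip(frame, hamming(len(frame)))]


window = hamming(5)
windowed = apply_window([1.0] * 5)
```

The window tapers the frame edges to about 0.08 of their original amplitude, which reduces spectral leakage in the FFT that follows.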

Page 11

FEATURE EXTRACTION

MFCC: Mel-Frequency Cepstral Coefficients

Perceptual approach

The human ear processes audio signals on the mel scale

Mel scale: linear up to 1 kHz and logarithmic above 1 kHz
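The slides do not give a formula for the mel scale. A widely used approximation is mel = 2595·log10(1 + f/700), which is assumed in the sketch below. It maps 1 kHz to roughly 1000 mel and grows logarithmically above that.

```python
import math


def hz_to_mel(freq_hz):
    """Common mel-scale approximation (an assumption, not from the slides)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


for f in (250, 500, 1000, 2000, 4000, 8000):
    print(f, round(hz_to_mel(f), 1))
```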

Page 12

MFCC EXTRACTION (CONTD.)

Steps:

FFT → Mel Filter → Log → DCT → CMS

Mel Filter (12): filtering of the absolute FFT coefficients using a triangular filter bank on the mel scale

MFCC gives the distribution of energy across the filters in the mel frequency bands

[Figure: Mel Filter Bank]
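The chain of steps above (FFT, mel filter, log, DCT, CMS) can be sketched as follows. This is an illustration under several assumptions, not the project's implementation: a naive DFT stands in for the FFT, CMS is read as cepstral mean subtraction, the mel formula is the common 2595·log10(1 + f/700) approximation, coefficient 0 of the DCT is dropped, and `n_filters=20` is an arbitrary default.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (an FFT would be used in practice)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]


def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    # n_filters + 2 edge points, expressed as fractional FFT bin positions.
    edges = [mel_to_hz(top * i / (n_filters + 1)) / nyquist * (n_bins - 1)
             for i in range(n_filters + 2)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for k in range(n_bins):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))   # rising slope
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc(frame, sample_rate, n_filters=20, n_coeffs=12):
    spectrum = magnitude_spectrum(frame)
    bank = mel_filterbank(n_filters, len(spectrum), sample_rate)
    # Log of each filter output; the small floor avoids log(0).
    log_energies = [math.log(max(sum(w * s for w, s in zip(row, spectrum)), 1e-10))
                    for row in bank]
    # DCT-II, keeping coefficients 1..n_coeffs.
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_energies))
            for c in range(1, n_coeffs + 1)]


def cepstral_mean_subtraction(frames_mfcc):
    """Subtract the per-coefficient mean taken over all frames."""
    n = len(frames_mfcc)
    means = [sum(col) / n for col in zip(*frames_mfcc)]
    return [[v - m for v, m in zip(vec, means)] for vec in frames_mfcc]
```

The input frame is expected to be already pre-emphasised and windowed. Subtracting the cepstral mean over an utterance removes a constant channel offset from every coefficient.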

Page 13

EXTRA FEATURES: ENERGY AND DELTAS

For achieving a high recognition rate

Energy feature

Delta and delta-delta

Delta: velocity feature

Delta-delta: acceleration feature

Co-articulation

Page 14

COMPOSITION OF FEATURE VECTOR

12 MFCC Features

12 Δ MFCC

12 Δ Δ MFCC

1 Energy Feature

1 Δ Energy

1 Δ Δ Energy

39 Features from each frame
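The composition above (13 static values plus their deltas and delta-deltas) can be sketched as follows. The slides do not give the delta formula; the usual regression form over ±2 frames is assumed here, with edge frames clamped. Function names are illustrative.

```python
def deltas(features, window=2):
    """Regression-based deltas over +/- `window` frames.

    `features` is a list of per-frame lists. Frames beyond the ends are
    clamped to the first or last frame (an assumption of this sketch).
    """
    n = len(features)
    denom = 2.0 * sum(k * k for k in range(1, window + 1))
    out = []
    for t in range(n):
        row = []
        for d in range(len(features[0])):
            num = 0.0
            for k in range(1, window + 1):
                ahead = features[min(t + k, n - 1)][d]
                behind = features[max(t - k, 0)][d]
                num += k * (ahead - behind)
            row.append(num / denom)
        out.append(row)
    return out


def build_feature_vectors(static):
    """static: per-frame lists of 13 values (12 MFCC + 1 energy).

    Returns per-frame lists of 39 values: static, delta, delta-delta.
    """
    d1 = deltas(static)
    d2 = deltas(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]


# Toy input: every static feature grows by 1 per frame.
static = [[float(t)] * 13 for t in range(10)]
vectors = build_feature_vectors(static)
```

For this linearly increasing toy input, the delta (velocity) of an interior frame is 1 and its delta-delta (acceleration) is 0.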

Page 15

Speech Recognition/Verification by HMM/VQ

Page 16

HIDDEN MARKOV MODEL (HMM)

An HMM is an extension of the Markov process

A Markov process consists of observable states

An HMM has hidden states and observable symbols per state

An HMM is a stochastic model

Page 17

HMM (CONTD.)

Parameters

1) The initial state distribution (π)

2) State transition probability distribution (A)

3) Observation symbol probability distribution (B)

The HMM model

$\lambda = (A, B, \pi)$
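Given the three parameters π, A and B, the probability that a discrete HMM produced a sequence of observation symbols can be computed with the forward algorithm. The sketch below is a textbook version on a toy two-state model; it is not the project's code, and it omits the scaling or log arithmetic that long sequences would need.

```python
def forward_probability(pi, A, B, observations):
    """P(O | model) for a discrete HMM via the forward algorithm.

    pi[i]    : initial probability of state i
    A[i][j]  : probability of moving from state i to state j
    B[i][k]  : probability of emitting symbol k in state i
    """
    n_states = len(pi)
    alpha = [pi[i] * B[i][observations[0]] for i in range(n_states)]
    for symbol in observations[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][symbol]
                 for j in range(n_states)]
    return sum(alpha)


# Toy two-state, two-symbol model (values chosen only for illustration).
pi = [1.0, 0.0]
A = [[0.5, 0.5],
     [0.0, 1.0]]
B = [[0.9, 0.1],
     [0.2, 0.8]]
p = forward_probability(pi, A, B, [0, 1])
```

For isolated-word recognition, one such model per word would be scored against the observation sequence and the word with the highest probability chosen.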

Page 18

EXAMPLE:

PRONUNCIATION MODEL OF WORD TOMATO

$\lambda = (A, B, \pi)$

Page 19

HMM IMPLEMENTATION

Feature vectors → observation symbols, 256

Phonemes → hidden states, 6

Left-to-right HMM

Discrete Hidden Markov Model (DHMM) with Vector Quantization (VQ) technique
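A left-to-right discrete HMM with 6 states and 256 observation symbols could be initialised as in the sketch below before training. The slides do not say whether state skips are allowed or how the parameters were initialised, so the self-loop/next-state structure and the uniform starting values are assumptions.

```python
def init_left_to_right_hmm(n_states=6, n_symbols=256):
    """Return (pi, A, B) for a left-to-right discrete HMM.

    Assumptions of this sketch: the model always starts in state 0, each
    state may only stay or advance to the next state, and emission
    probabilities start out uniform.
    """
    pi = [1.0] + [0.0] * (n_states - 1)
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i + 1 < n_states:
            A[i][i] = 0.5
            A[i][i + 1] = 0.5
        else:
            A[i][i] = 1.0  # final state only loops on itself
    B = [[1.0 / n_symbols] * n_symbols for _ in range(n_states)]
    return pi, A, B


pi, A, B = init_left_to_right_hmm()
```

The zeros below the diagonal of A stay zero under standard re-estimation, which is what keeps the model left-to-right during training.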

Page 20

SPEECH RECOGNITION SYSTEM

Page 21

VECTOR QUANTIZATION
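Vector quantization turns each continuous feature vector into the index of its nearest codeword, which is what lets a discrete HMM consume the features. The sketch below shows only this lookup step with squared Euclidean distance (an assumption); how the project trained its codebook is not described in the slides.

```python
def quantize(vector, codebook):
    """Return the index of the codeword closest to `vector`."""
    best_index, best_dist = 0, float("inf")
    for index, codeword in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, codeword))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def quantize_sequence(vectors, codebook):
    """Map a sequence of feature vectors to a sequence of symbol indices."""
    return [quantize(v, codebook) for v in vectors]


# Toy 2-D codebook with three codewords (illustrative values only).
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
symbols = quantize_sequence([[0.9, 1.2], [4.0, 6.0], [-0.1, 0.2]], codebook)
```

In the system described in these slides the codebook would hold 256 codewords over the 39-dimensional feature vectors.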

Page 22

Speaker Recognition/Verification by GMM

Page 23

SPEAKER VERIFICATION SYSTEM

Page 24

SPEAKER MODELING (GMM)

Gaussian Mixture Model

Parametric probability density function

Based on soft clustering technique

Mixture of Gaussian components

$\lambda = (w_m, \mu_m, C_m)$
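A GMM scores a feature vector as a weighted sum of Gaussian densities, one per component, each with a weight, a mean and a covariance. The sketch below evaluates the log-likelihood under the assumption of diagonal covariances, a common simplification that the slides neither confirm nor rule out.

```python
import math


def gmm_log_likelihood(x, weights, means, variances):
    """log p(x | model) for a GMM with diagonal covariances.

    weights[m]   : mixture weight of component m
    means[m]     : mean vector of component m
    variances[m] : per-dimension variances of component m
    """
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        log_terms.append(log_p)
    # Log-sum-exp over components for numerical stability.
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))


def utterance_log_likelihood(frames, weights, means, variances):
    """Average per-frame log-likelihood of an utterance."""
    return sum(gmm_log_likelihood(x, weights, means, variances)
               for x in frames) / len(frames)
```

Averaging over frames makes scores comparable between utterances of different lengths.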

Page 25

SPEAKER MODEL TRAINING

Estimate the model parameters

Expectation Maximization algorithm
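One iteration of Expectation Maximization for a GMM alternates a soft assignment of the data to components (E-step) with a re-estimation of weights, means and variances (M-step). The sketch below uses one-dimensional data for brevity, whereas the project's features are 39-dimensional; the variance floor and the starting values are assumptions.

```python
import math


def gaussian(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_step(data, weights, means, variances):
    """One EM iteration for a 1-D Gaussian mixture."""
    n_comp = len(weights)
    # E-step: responsibility of each component for each data point.
    resp = []
    for x in data:
        joint = [weights[m] * gaussian(x, means[m], variances[m])
                 for m in range(n_comp)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    # M-step: re-estimate parameters from the soft assignments.
    new_w, new_mu, new_var = [], [], []
    for m in range(n_comp):
        n_m = sum(r[m] for r in resp)
        mu = sum(r[m] * x for r, x in zip(resp, data)) / n_m
        var = sum(r[m] * (x - mu) ** 2 for r, x in zip(resp, data)) / n_m
        new_w.append(n_m / len(data))
        new_mu.append(mu)
        new_var.append(max(var, 1e-6))  # variance floor (assumption)
    return new_w, new_mu, new_var


# Toy data with two clear clusters around -2 and +2.
data = [-2.1, -1.9, -2.0, 2.0, 1.9, 2.1]
w, mu, var = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
for _ in range(20):
    w, mu, var = em_step(data, w, mu, var)
```

After a few iterations on this toy data the two means settle near the cluster centres.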

Page 26

SPEAKER VERIFICATION

Based on likelihood ratio

$\text{Likelihood ratio} = \dfrac{\text{likelihood that } S \text{ comes from the speaker's model}}{\text{likelihood that } S \text{ comes from the impostor's model}}$
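In the log domain the likelihood ratio becomes a difference of two log-likelihoods that is compared with a threshold. The sketch below also shows one possible way to combine the speech and speaker results: accept only if the prompted word was recognised and the speaker test passes. The threshold value and this AND rule are assumptions, since the slides do not state how the two results were combined.

```python
def log_likelihood_ratio(log_lik_speaker, log_lik_impostor):
    """log of (speaker-model likelihood / impostor-model likelihood)."""
    return log_lik_speaker - log_lik_impostor


def verify_speaker(log_lik_speaker, log_lik_impostor, threshold=0.0):
    """Accept the claimed identity if the ratio reaches `threshold`.

    `threshold` = 0.0 is a placeholder; in practice it would be tuned.
    """
    return log_likelihood_ratio(log_lik_speaker, log_lik_impostor) >= threshold


def authenticate(prompted_word, recognized_word,
                 log_lik_speaker, log_lik_impostor, threshold=0.0):
    """Hypothetical combination rule: both checks must pass."""
    word_ok = prompted_word == recognized_word
    speaker_ok = verify_speaker(log_lik_speaker, log_lik_impostor, threshold)
    return word_ok and speaker_ok


accepted = authenticate("seven", "seven", -40.0, -45.0)
rejected_wrong_word = authenticate("seven", "eleven", -40.0, -45.0)
rejected_wrong_speaker = authenticate("seven", "seven", -50.0, -45.0)
```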

Page 27

TOOLS USED

Languages: Adobe Flex, Java

RPC: BlazeDS

Servers: Apache Tomcat, MySQL

Versioning: TortoiseSVN

Page 28

OUTPUT : SNAPSHOT (GUI)

Page 29

APPLICATION AREAS

Telephone transaction

Telephone credit card purchase

Telephone stock trading

Access control

Physical facilities

Computer networks

Information retrieval

Customer information

Forensics

Voice sample matching

Page 30

LIMITATION AND FUTURE ENHANCEMENT

Noise reduction

Training on more data

Combine with

other features

other classification methods

Page 31

Thanks

Any queries?