Speaker recognition systems

Jun 20, 2015

This presentation describes speaker verification systems and speaker identification systems.
Transcript
Page 1: Speaker recognition systems

SPEAKER RECOGNITION SYSTEMS

BY

NAMRATHA D’CRUZ

Page 2: Speaker recognition systems

Sub areas of speaker recognition

• Speaker verification system

• Speaker identification system

Page 3: Speaker recognition systems

Speaker recognition problem

Block diagram: the input speech x is processed by a signal processor to produce a pattern vector; a comparison (distance measurement) stage compares it with stored reference patterns to give a distance Ds(n); decision logic then produces the identification decision.

General representation of the speaker recognition problem

Page 4: Speaker recognition systems

A representation of the speech signal is obtained using digital speech processing techniques which preserve the features of the speech signal that are relevant to speaker identity.

The resulting pattern is compared to previously prepared reference patterns.

Decision logic is used to make a choice among the available alternatives.

Page 5: Speaker recognition systems

For a speaker verification system, if we denote the PDF of the measurement vector x for the ith speaker as pi(x), then the decision rule is given by the expression shown below,

where ci is a constant for the ith speaker and pav(x) is the average PDF of the measurement vector x.

For a speaker identification system, the decision rule is also given below.
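A hedged reconstruction of the two decision rules, using only the quantities defined above (the slide's own equations are not reproduced in this transcript); the exact thresholding form and the assumption of N enrolled speakers are assumptions:

\[ \text{verification: accept the claimed speaker } i \quad \text{if} \quad \frac{p_i(x)}{p_{\mathrm{av}}(x)} \ge c_i \]

\[ \text{identification: choose } \hat{i} = \arg\max_{1 \le j \le N} \, p_j(x) \]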

Page 6: Speaker recognition systems

Speaker verification system: computer verification of speakers

Block diagram of a speaker verification system

Page 7: Speaker recognition systems

An online digital speaker verification system was developed by Rosenberg and others.

The person wishing to be verified first enters his claimed identity.

On request from the verification system, he utters his verification phrase and requests some transaction to be made in the event that he is verified.

The spoken utterance is processed to obtain a pattern which is compared to the stored reference patterns for the claimed identity.

Page 8: Speaker recognition systems

On the basis of the transaction requested, the error-mix constant (Ci) is determined.

Based on the error-mix constant, the decision to accept or reject is made.

Page 9: Speaker recognition systems

Signal processing aspects of the speaker verification system (decision outputs: Accept / Reject)

Page 10: Speaker recognition systems

Signal Processing Parts Of The Speaker Verification System

End-point detection: the sample utterance, which occurs somewhere within a pre-selected time interval, is located.

Pitch detector: used to measure the pitch contour of the utterance.

Energy measurements: short-time energy measurements are made to give energy contours.
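As a rough illustration of the energy-measurement and end-point-detection steps above, here is a minimal Python sketch; the frame length, hop size, and relative energy threshold are illustrative assumptions, not values from the slides.

```python
import numpy as np

def short_time_energy(x, frame_len=240, hop=80):
    """Short-time energy contour of a speech signal x (e.g. 30 ms frames,
    10 ms hop at 8 kHz). Returns one energy value per frame."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.array([np.sum(x[i * hop : i * hop + frame_len] ** 2)
                     for i in range(n_frames)])

def detect_endpoints(energy, rel_threshold=0.02):
    """Crude end-point detector: the utterance spans the first and last
    frames whose energy exceeds a fraction of the peak energy."""
    active = np.where(energy > rel_threshold * energy.max())[0]
    return (active[0], active[-1]) if active.size else (None, None)
```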

Page 11: Speaker recognition systems

Signal Processing Parts Of The Speaker Verification System

LPC analysis: used to give predictor-parameter contours. LPC is a tool for representing the spectral envelope of a digital speech signal in compressed form, using the information of a linear predictive model. The autocorrelation formulation method is used.

Formant analysis: estimates of the formant locations are made.

LPF: a 16 Hz low-pass filter is used.
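A minimal sketch of LPC analysis by the autocorrelation method (Levinson-Durbin recursion), which the slide names; the Hamming window and the predictor order of 8 (matching the 8 coefficients plotted on a later slide) are assumptions.

```python
import numpy as np

def lpc_autocorrelation(frame, order=8):
    """LPC coefficients of one speech frame via the autocorrelation
    method (Levinson-Durbin recursion). Returns a[0..order] with a[0]=1
    and the final prediction error."""
    w = frame * np.hamming(len(frame))                 # taper the frame
    r = np.correlate(w, w, mode="full")[len(w) - 1:]   # R(0)..R(N-1)
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])     # sum_j a[j] R(i-j)
        k = -acc / err                                 # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a, err
```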

Page 12: Speaker recognition systems

Measurement contours for the test utterance “we were away a year ago”.

The data are estimated 100 times per second and smoothed by a 16 Hz low-pass, linear-phase FIR digital filter.
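A minimal sketch of the contour-smoothing step, assuming a linear-phase FIR low-pass filter designed with scipy's firwin at the stated 16 Hz cutoff and 100 Hz contour rate; the 31-tap length and the group-delay compensation are assumptions.

```python
import numpy as np
from scipy.signal import firwin, lfilter

def smooth_contour(contour, fs=100.0, cutoff=16.0, numtaps=31):
    """Smooth a measurement contour (sampled at fs frames/sec) with a
    linear-phase FIR low-pass filter and remove its group delay."""
    taps = firwin(numtaps, cutoff, fs=fs)               # linear-phase FIR design
    padded = np.concatenate([np.asarray(contour, dtype=float),
                             np.zeros(numtaps // 2)])
    y = lfilter(taps, [1.0], padded)
    return y[numtaps // 2:]                             # compensate the delay
```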

Page 13: Speaker recognition systems

Pitch period and intensity contours of an utterance used in speaker verification

Page 14: Speaker recognition systems

Plot of the first 3 formants, pitch and intensity for a speaker verification utterance

Page 15: Speaker recognition systems

Plots of the first 8 LPC coefficients for a speaker verification utterance

Page 16: Speaker recognition systems

After the desired parametric representation has been computed, it is compared with the corresponding reference patterns for the speaker whose identity is claimed.

A speaker is generally not able to speak at precisely the same rate for different repetitions of the verification phrase.

As a solution to this problem, non-linear time warping of the input patterns is done to obtain the best possible registration between the stored patterns and the measured patterns for the speaker's sample utterance.

Page 17: Speaker recognition systems

Time warping

The time scale t of a reference utterance is warped so that significant events in some measurement contour a(t) line up with the same significant events in the reference contour r(t).

The warping function is assumed to be

τ = α t + q(t)

where q(t) is the non-linear time-warp function and α is the average slope of the time-warp function.

Page 18: Speaker recognition systems

Time warping

Boundary conditions are imposed to ensure that the beginning and ending points of both the sample and reference utterances line up properly.

The boundary conditions are:

τ1 = α t1 + q(t1)

τ2 = α t2 + q(t2)

The function q(t) and the constant α have to be chosen so as to best align the measured contours.

A simpler and faster solution is to utilize the method of dynamic programming to optimally choose a constrained warping function.

Page 19: Speaker recognition systems

Illustration of time warping

Page 20: Speaker recognition systems

Time warping

Consider time warping for a pair of contours which are sampled at a discrete set of points.

Let the points in the measured contour be labeled n = 1, 2, …, N.

Let the points in the reference contour be labeled m = 1, 2, …, M.

The time-warping function w is chosen as

m = w(n)

Page 21: Speaker recognition systems

Time warping

The boundary conditions on w(n) are:

w(1) = 1 (beginning points)

w(N) = M (ending points)

To limit the degree of non-linearity of the warping function, a mild continuity condition is imposed: the warping function w cannot change by more than 2 grid points at any index n,

w(n+1) − w(n) = 0, 1, 2 if w(n) ≠ w(n−1)

w(n+1) − w(n) = 1, 2 if w(n) = w(n−1)

Thus the slope of the warping function is either 0, 1 or 2.

Page 22: Speaker recognition systems

Time warping

Determining which of the continuity conditions to use at grid index n requires a similarity measure between the reference data at grid index n and the test data at grid index m.

The similarity measure is used to determine the path of the warping function which minimizes the total distance, subject to the continuity constraints.
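A minimal sketch of the dynamic-programming warping described above, under the slope constraints from the previous slide (steps of 0, 1 or 2 in m, with no two consecutive 0-steps); the squared frame difference used as the local similarity measure is an assumption.

```python
import numpy as np

def dtw_slope_constrained(test, ref):
    """Dynamic-programming time warping m = w(n) aligning a test contour
    (length N) to a reference contour (length M) under the constraints:
    w(n) - w(n-1) in {0, 1, 2}, and a horizontal step (difference 0) may
    not follow another horizontal step. Returns the minimum accumulated
    frame-wise squared distance."""
    test = np.asarray(test, dtype=float)
    ref = np.asarray(ref, dtype=float)
    N, M = len(test), len(ref)
    d = (test[:, None] - ref[None, :]) ** 2        # local distance d(n, m)

    # D[n, m, s]: best accumulated distance reaching (n, m), where s = 1
    # if the step into (n, m) was horizontal (w unchanged), else 0.
    D = np.full((N, M, 2), np.inf)
    D[0, 0, 0] = d[0, 0]                           # boundary: w(1) = 1

    for n in range(1, N):
        for m in range(M):
            # step of +1 or +2 in m: allowed from either previous state
            for dm in (1, 2):
                if m - dm >= 0:
                    prev = min(D[n - 1, m - dm, 0], D[n - 1, m - dm, 1])
                    D[n, m, 0] = min(D[n, m, 0], prev + d[n, m])
            # horizontal step (dm = 0): only if the previous step was not horizontal
            D[n, m, 1] = D[n - 1, m, 0] + d[n, m]

    return min(D[N - 1, M - 1, 0], D[N - 1, M - 1, 1])   # boundary: w(N) = M
```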

Page 23: Speaker recognition systems

An example of a typical time warping

Page 24: Speaker recognition systems

Time warping

The figure shows the possible grid coordinates (n, m) and a warping function w(n).

Consider N = 20 reference frames and M = 15 test-utterance frames.

Because of the continuity constraints, the warping function must lie within the parallelogram.

The final step is to compute the overall distance measures and then compare the distance to an appropriately chosen threshold.

The simplest distance contour measure is a normalized sum of squares.

Page 25: Speaker recognition systems

Distance measure

For the jth measurement contour, the distance dj would be of the form shown below,

where ajs(i) is the value of the jth measurement contour at time i, ajr(i) is the value of the jth reference contour at time i, and σaj(i) is the standard deviation of the jth measurement at time i.
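A hedged reconstruction of the normalized sum-of-squares distance described above (the slide's own equation is not reproduced in this transcript); averaging over the N time-aligned frames is an assumption:

\[ d_j = \frac{1}{N} \sum_{i=1}^{N} \left[ \frac{a_j^{s}(i) - a_j^{r}(i)}{\sigma_{a_j}(i)} \right]^{2} \]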

Page 26: Speaker recognition systems

Distance measure

The overall distance function is given by the weighted sum shown below,

where wj is the jth weight, chosen on the basis of the effectiveness of the jth measurement in verifying the speaker.
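A hedged reconstruction of the overall distance (the slide's equation is not reproduced here), assuming a simple weighted sum of the per-contour distances dj:

\[ D = \sum_{j} w_j \, d_j \]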

Page 27: Speaker recognition systems

SPEAKER IDENTIFICATION SYSTEMS

Almost similar to speaker verification systems.

The main difference is the choice of parameters used to make the distance measurements.

N distance measurements have to be made (one per enrolled speaker) rather than 1.

The final decision is to choose the speaker whose reference patterns are closest in distance to the sample patterns.
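A minimal sketch of the identification decision just described; the per-speaker distance function is passed in as a callable (for instance, the warped, weighted contour distance sketched earlier), and taking the closest reference within each speaker's set is an assumption.

```python
import numpy as np

def identify_speaker(distance, sample, reference_sets):
    """Identification decision: compute one distance per enrolled speaker
    (the distance from the sample pattern to that speaker's closest
    reference pattern) and choose the speaker with the smallest distance.

    distance        -- callable d(sample, reference) returning a scalar
    reference_sets  -- list of per-speaker lists of reference patterns
    """
    per_speaker = [min(distance(sample, ref) for ref in refs)
                   for refs in reference_sets]
    return int(np.argmin(per_speaker)), per_speaker
```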

Page 28: Speaker recognition systems

SPEAKER IDENTIFICATION SYSTEMS

A more sophisticated and robust distance measure is used.

Let x be an L-dimensional column vector representing the input pattern, in which the kth component of x is the kth measurement.

It is assumed that the joint PDF of the measurements for the ith speaker is a multi-dimensional Gaussian distribution with mean mi and covariance matrix Wi. Thus, the L-dimensional Gaussian density function for x is given below.
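A hedged reconstruction of the L-dimensional Gaussian density referred to above (the slide's own equation is not reproduced in this transcript):

\[ p_i(x) = \frac{1}{(2\pi)^{L/2}\,\lvert W_i\rvert^{1/2}} \exp\!\left[-\tfrac{1}{2}\,(x - m_i)^{t}\,W_i^{-1}\,(x - m_i)\right] \]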

Page 29: Speaker recognition systems

SPEAKER IDENTIFICATION SYSTEMS

where Wi^-1 is the inverse of the covariance matrix Wi (assuming Wi is non-singular), |Wi| is the determinant of Wi, and the superscript t denotes the transpose of a vector.

The decision rule which minimizes the probability of error states that the measurement vector x should be assigned to class i if the condition shown below holds, where pi is the a priori probability that x belongs to the ith class.

Since ln y is a monotonically increasing function of its argument y, the decision rule can be simplified by taking logarithms.

Decide class i if the logarithmic condition shown below is satisfied.
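A hedged reconstruction of the minimum-error decision rule and its logarithmic simplification; the prior (written pi on the slide) is denoted P_i here to distinguish it from the density p_i(x), and the expanded log form is an assumption:

\[ \text{assign } x \text{ to class } i \quad \text{if} \quad p_i(x)\,P_i \ge p_j(x)\,P_j \quad \text{for all } j \ne i \]

\[ \text{decide class } i \quad \text{if} \quad -\tfrac{1}{2}(x - m_i)^{t} W_i^{-1}(x - m_i) - \tfrac{1}{2}\ln\lvert W_i\rvert + \ln P_i \;\ge\; -\tfrac{1}{2}(x - m_j)^{t} W_j^{-1}(x - m_j) - \tfrac{1}{2}\ln\lvert W_j\rvert + \ln P_j \quad \text{for all } j \ne i \]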

Page 30: Speaker recognition systems

SPEAKER IDENTIFICATION SYSTEMS

The bias term does not provide any significant advantage in the decision rule, so it can be dropped. Thus the distance measure is defined as the quadratic form shown below.

The mean vector and covariance matrix are estimated from the reference (training) data as shown below.
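A hedged reconstruction of the resulting distance measure and of the sample estimates of the mean and covariance (assuming K training vectors x_k for the ith speaker; the 1/K averaging convention is an assumption):

\[ d_i = (x - m_i)^{t}\, W_i^{-1}\, (x - m_i) \]

\[ m_i = \frac{1}{K} \sum_{k=1}^{K} x_k, \qquad W_i = \frac{1}{K} \sum_{k=1}^{K} (x_k - m_i)(x_k - m_i)^{t} \]

This d_i is the Mahalanobis distance between the input vector and the ith speaker's mean.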

Page 31: Speaker recognition systems

Examples of some measured parameters

Page 32: Speaker recognition systems

Speaker identification accuracy

Page 33: Speaker recognition systems

Speaker identification accuracy (using cepstrum parameters)

Page 34: Speaker recognition systems