Speaker recognition systems

Jun 20, 2015

This presentation describes speaker verification systems and speaker identification systems.
Transcript
Page 1: Speaker recognition systems

SPEAKER RECOGNITION SYSTEMS

BY

NAMRATHA D’CRUZ

Page 2: Speaker recognition systems

Sub areas of speaker recognition

• Speaker verification system

• Speaker identification system

Page 3: Speaker recognition systems

Speaker recognition problem

Block diagram: the input speech x is processed by a signal processor to produce a pattern vector; a comparison (distance measurement) stage compares it with stored reference patterns to give a distance Ds(n); decision logic then produces the identification decision.

General representation of the speaker recognition problem

Page 4: Speaker recognition systems

A representation of the speech signal is obtained using digital speech processing techniques which preserve the features of the speech signal that are relevant to speaker identity.

The resulting pattern is compared to previously prepared reference patterns.

Decision logic is used to make a choice among the available alternatives.

Page 5: Speaker recognition systems

For a speaker verification system, if we denote the PDF of the measurement vector x for the ith speaker as pi(x), then the decision rule is given by the expression shown below,

where ci is a constant for the ith speaker and pav(x) is the average PDF of the measurement vector x.

For a speaker identification system, the decision rule is also given below.
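A hedged reconstruction of the two decision rules, using only the quantities defined above (the slide's own equations are not reproduced in this transcript); the exact thresholding form and the assumption of N enrolled speakers are assumptions:

\[ \text{verification: accept the claimed speaker } i \quad \text{if} \quad \frac{p_i(x)}{p_{\mathrm{av}}(x)} \ge c_i \]

\[ \text{identification: choose } \hat{i} = \arg\max_{1 \le j \le N} \, p_j(x) \]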

Page 6: Speaker recognition systems

Speaker verification system: computer verification of speakers

Block diagram of a speaker verification system

Page 7: Speaker recognition systems

An online digital speaker verification system was developed by Rosenberg and others.

The person wishing to be verified first enters his claimed identity.

On request from the verification system, he utters his verification phrase and requests some transaction to be made in the event that he is verified.

The spoken utterance is processed to obtain a pattern which is compared to the stored reference patterns for the claimed identity.

Page 8: Speaker recognition systems

On the basis of the transaction requested, the error-mix constant (Ci) is determined.

Based on the error-mix constant, the decision to accept or reject is made.

Page 9: Speaker recognition systems

Signal processing aspects of the speaker verification system (decision outputs: Accept / Reject)

Page 10: Speaker recognition systems

Signal Processing Parts Of The Speaker Verification System

End-point detection: the sample utterance, which occurs somewhere within a pre-selected time interval, is located.

Pitch detector: used to measure the pitch contour of the utterance.

Energy measurements: short-time energy measurements are made to give energy contours.
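As a rough illustration of the energy-measurement and end-point-detection steps above, here is a minimal Python sketch; the frame length, hop size, and relative energy threshold are illustrative assumptions, not values from the slides.

```python
import numpy as np

def short_time_energy(x, frame_len=240, hop=80):
    """Short-time energy contour of a speech signal x (e.g. 30 ms frames,
    10 ms hop at 8 kHz). Returns one energy value per frame."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.array([np.sum(x[i * hop : i * hop + frame_len] ** 2)
                     for i in range(n_frames)])

def detect_endpoints(energy, rel_threshold=0.02):
    """Crude end-point detector: the utterance spans the first and last
    frames whose energy exceeds a fraction of the peak energy."""
    active = np.where(energy > rel_threshold * energy.max())[0]
    return (active[0], active[-1]) if active.size else (None, None)
```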

Page 11: Speaker recognition systems

Signal Processing Parts Of The Speaker Verification System

LPC analysis: used to give predictor-parameter contours. LPC is a tool for representing the spectral envelope of a digital speech signal in compressed form, using the information of a linear predictive model. The autocorrelation formulation method is used.

Formant analysis: estimates of the formant locations are made.

LPF: a 16 Hz low-pass filter is used.
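A minimal sketch of LPC analysis by the autocorrelation method (Levinson-Durbin recursion), which the slide names; the Hamming window and the predictor order of 8 (matching the 8 coefficients plotted on a later slide) are assumptions.

```python
import numpy as np

def lpc_autocorrelation(frame, order=8):
    """LPC coefficients of one speech frame via the autocorrelation
    method (Levinson-Durbin recursion). Returns a[0..order] with a[0]=1
    and the final prediction error."""
    w = frame * np.hamming(len(frame))                 # taper the frame
    r = np.correlate(w, w, mode="full")[len(w) - 1:]   # R(0)..R(N-1)
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])     # sum_j a[j] R(i-j)
        k = -acc / err                                 # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a, err
```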

Page 12: Speaker recognition systems

Measurement contours for the test utterance “we were away a year ago”.

The data are estimated 100 times per second and smoothed by a 16 Hz low-pass, linear-phase FIR digital filter.
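A minimal sketch of the contour-smoothing step, assuming a linear-phase FIR low-pass filter designed with scipy's firwin at the stated 16 Hz cutoff and 100 Hz contour rate; the 31-tap length and the group-delay compensation are assumptions.

```python
import numpy as np
from scipy.signal import firwin, lfilter

def smooth_contour(contour, fs=100.0, cutoff=16.0, numtaps=31):
    """Smooth a measurement contour (sampled at fs frames/sec) with a
    linear-phase FIR low-pass filter and remove its group delay."""
    taps = firwin(numtaps, cutoff, fs=fs)               # linear-phase FIR design
    padded = np.concatenate([np.asarray(contour, dtype=float),
                             np.zeros(numtaps // 2)])
    y = lfilter(taps, [1.0], padded)
    return y[numtaps // 2:]                             # compensate the delay
```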

Page 13: Speaker recognition systems

Pitch period and intensity contours of an utterance used in speaker verification

Page 14: Speaker recognition systems

Plot of the first 3 formants, pitch and intensity for a speaker verification utterance

Page 15: Speaker recognition systems

Plots of the first 8 LPC coefficients for a speaker verification utterance

Page 16: Speaker recognition systems

After the desired parametric representation has been computed, it is compared with the corresponding reference patterns for the speaker whose identity is claimed.

A speaker is generally not able to speak at precisely the same rate for different repetitions of the verification phrase.

As a solution to this problem, non-linear time warping of the input patterns is done to obtain the best possible registration between the stored patterns and the measured patterns for the speaker's sample utterance.

Page 17: Speaker recognition systems

Time warping

The time scale t of a reference utterance is warped so that significant events in some measurement contour a(t) line up with the same significant events in the reference contour r(t).

The warping function is assumed to be

τ = α t + q(t)

where q(t) is the non-linear time-warp function and α is the average slope of the time-warp function.

Page 18: Speaker recognition systems

Time warping

Boundary conditions are imposed to ensure that the beginning and ending points of both the sample and reference utterances line up properly.

The boundary conditions are:

τ1 = α t1 + q(t1)

τ2 = α t2 + q(t2)

The function q(t) and the constant α have to be chosen so as to best align the measured contours.

A simpler and faster solution is to utilize the method of dynamic programming to optimally choose a constrained warping function.

Page 19: Speaker recognition systems

Illustration of time warping

Page 20: Speaker recognition systems

Time warping

Consider time warping for a pair of contours which are sampled at a discrete set of points.

Let the points in the measured contour be labeled n = 1, 2, …, N.

Let the points in the reference contour be labeled m = 1, 2, …, M.

The time-warping function w is chosen as

m = w(n)

Page 21: Speaker recognition systems

Time warping

The boundary conditions on w(n) are:

w(1) = 1 (beginning points)

w(N) = M (ending points)

To limit the degree of non-linearity of the warping function, a mild continuity condition is imposed: the warping function w cannot change by more than 2 grid points at any index n,

w(n+1) − w(n) = 0, 1, 2 if w(n) ≠ w(n−1)

w(n+1) − w(n) = 1, 2 if w(n) = w(n−1)

Thus the slope of the warping function is either 0, 1 or 2.

Page 22: Speaker recognition systems

Time warping

Determining which of the continuity conditions to use at grid index n requires a similarity measure between the reference data at grid index n and the test data at grid index m.

The similarity measure is used to determine the path of the warping function which minimizes the total distance, subject to the continuity constraints.
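A minimal sketch of the dynamic-programming warping described above, under the slope constraints from the previous slide (steps of 0, 1 or 2 in m, with no two consecutive 0-steps); the squared frame difference used as the local similarity measure is an assumption.

```python
import numpy as np

def dtw_slope_constrained(test, ref):
    """Dynamic-programming time warping m = w(n) aligning a test contour
    (length N) to a reference contour (length M) under the constraints:
    w(n) - w(n-1) in {0, 1, 2}, and a horizontal step (difference 0) may
    not follow another horizontal step. Returns the minimum accumulated
    frame-wise squared distance."""
    test = np.asarray(test, dtype=float)
    ref = np.asarray(ref, dtype=float)
    N, M = len(test), len(ref)
    d = (test[:, None] - ref[None, :]) ** 2        # local distance d(n, m)

    # D[n, m, s]: best accumulated distance reaching (n, m), where s = 1
    # if the step into (n, m) was horizontal (w unchanged), else 0.
    D = np.full((N, M, 2), np.inf)
    D[0, 0, 0] = d[0, 0]                           # boundary: w(1) = 1

    for n in range(1, N):
        for m in range(M):
            # step of +1 or +2 in m: allowed from either previous state
            for dm in (1, 2):
                if m - dm >= 0:
                    prev = min(D[n - 1, m - dm, 0], D[n - 1, m - dm, 1])
                    D[n, m, 0] = min(D[n, m, 0], prev + d[n, m])
            # horizontal step (dm = 0): only if the previous step was not horizontal
            D[n, m, 1] = D[n - 1, m, 0] + d[n, m]

    return min(D[N - 1, M - 1, 0], D[N - 1, M - 1, 1])   # boundary: w(N) = M
```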

Page 23: Speaker recognition systems

An example of a typical time warping

Page 24: Speaker recognition systems

Time warping

The figure shows the possible grid coordinates (n, m) and a warping function w(n).

Consider N = 20 reference frames and M = 15 test-utterance frames.

Because of the continuity constraints, the warping function must lie within the parallelogram.

The final step is to compute the overall distance measures and then compare the distance to an appropriately chosen threshold.

The simplest distance contour measure is a normalized sum of squares.

Page 25: Speaker recognition systems

Distance measure

For the jth measurement contour, the distance dj would be of the form shown below,

where ajs(i) is the value of the jth measurement contour at time i, ajr(i) is the value of the jth reference contour at time i, and σaj(i) is the standard deviation of the jth measurement at time i.
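A hedged reconstruction of the normalized sum-of-squares distance described above (the slide's own equation is not reproduced in this transcript); averaging over the N time-aligned frames is an assumption:

\[ d_j = \frac{1}{N} \sum_{i=1}^{N} \left[ \frac{a_j^{s}(i) - a_j^{r}(i)}{\sigma_{a_j}(i)} \right]^{2} \]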

Page 26: Speaker recognition systems

Distance measure

The overall distance function is given by the weighted sum shown below,

where wj is the jth weight, chosen on the basis of the effectiveness of the jth measurement in verifying the speaker.
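A hedged reconstruction of the overall distance (the slide's equation is not reproduced here), assuming a simple weighted sum of the per-contour distances dj:

\[ D = \sum_{j} w_j \, d_j \]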

Page 27: Speaker recognition systems

SPEAKER IDENTIFICATION SYSTEMS

Almost similar to speaker verification systems.

The main difference is the choice of parameters used to make the distance measurements.

N distance measurements have to be made (one per enrolled speaker) rather than 1.

The final decision is to choose the speaker whose reference patterns are closest in distance to the sample patterns.
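A minimal sketch of the identification decision just described; the per-speaker distance function is passed in as a callable (for instance, the warped, weighted contour distance sketched earlier), and taking the closest reference within each speaker's set is an assumption.

```python
import numpy as np

def identify_speaker(distance, sample, reference_sets):
    """Identification decision: compute one distance per enrolled speaker
    (the distance from the sample pattern to that speaker's closest
    reference pattern) and choose the speaker with the smallest distance.

    distance        -- callable d(sample, reference) returning a scalar
    reference_sets  -- list of per-speaker lists of reference patterns
    """
    per_speaker = [min(distance(sample, ref) for ref in refs)
                   for refs in reference_sets]
    return int(np.argmin(per_speaker)), per_speaker
```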

Page 28: Speaker recognition systems

SPEAKER IDENTIFICATION SYSTEMS

A more sophisticated and robust distance measure is used.

Let x be an L-dimensional column vector representing the input pattern, in which the kth component of x is the kth measurement.

It is assumed that the joint PDF of the measurements for the ith speaker is a multi-dimensional Gaussian distribution with mean mi and covariance matrix Wi. Thus, the L-dimensional Gaussian density function for x is given below.
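A hedged reconstruction of the L-dimensional Gaussian density referred to above (the slide's own equation is not reproduced in this transcript):

\[ p_i(x) = \frac{1}{(2\pi)^{L/2}\,\lvert W_i\rvert^{1/2}} \exp\!\left[-\tfrac{1}{2}\,(x - m_i)^{t}\,W_i^{-1}\,(x - m_i)\right] \]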

Page 29: Speaker recognition systems

SPEAKER IDENTIFICATION SYSTEMS

where Wi^-1 is the inverse of the covariance matrix Wi (assuming Wi is non-singular), |Wi| is the determinant of Wi, and the superscript t denotes the transpose of a vector.

The decision rule which minimizes the probability of error states that the measurement vector x should be assigned to class i if the condition shown below holds, where pi is the a priori probability that x belongs to the ith class.

Since ln y is a monotonically increasing function of its argument y, the decision rule can be simplified by taking logarithms.

Decide class i if the logarithmic condition shown below is satisfied.
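A hedged reconstruction of the minimum-error decision rule and its logarithmic simplification; the prior (written pi on the slide) is denoted P_i here to distinguish it from the density p_i(x), and the expanded log form is an assumption:

\[ \text{assign } x \text{ to class } i \quad \text{if} \quad p_i(x)\,P_i \ge p_j(x)\,P_j \quad \text{for all } j \ne i \]

\[ \text{decide class } i \quad \text{if} \quad -\tfrac{1}{2}(x - m_i)^{t} W_i^{-1}(x - m_i) - \tfrac{1}{2}\ln\lvert W_i\rvert + \ln P_i \;\ge\; -\tfrac{1}{2}(x - m_j)^{t} W_j^{-1}(x - m_j) - \tfrac{1}{2}\ln\lvert W_j\rvert + \ln P_j \quad \text{for all } j \ne i \]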

Page 30: Speaker recognition systems

SPEAKER IDENTIFICATION SYSTEMS

The bias term does not provide any significant advantage in the decision rule, so it can be dropped. Thus the distance measure is defined as the quadratic form shown below.

The mean vector and covariance matrix are estimated from the reference (training) data as shown below.
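A hedged reconstruction of the resulting distance measure and of the sample estimates of the mean and covariance (assuming K training vectors x_k for the ith speaker; the 1/K averaging convention is an assumption):

\[ d_i = (x - m_i)^{t}\, W_i^{-1}\, (x - m_i) \]

\[ m_i = \frac{1}{K} \sum_{k=1}^{K} x_k, \qquad W_i = \frac{1}{K} \sum_{k=1}^{K} (x_k - m_i)(x_k - m_i)^{t} \]

This d_i is the Mahalanobis distance between the input vector and the ith speaker's mean.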

Page 31: Speaker recognition systems

Examples of some measured parameters

Page 32: Speaker recognition systems

Speaker identification accuracy

Page 33: Speaker recognition systems

Speaker identification accuracy (using cepstrum parameters)

Page 34: Speaker recognition systems