SPEAKER RECOGNITION SYSTEMS BY NAMRATHA D’CRUZ
Jun 20, 2015
Sub areas of speaker recognition
• Speaker verification system
• Speaker identification system
Speaker recognition problem
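Figure: General representation of the speaker recognition problem. The speech signal s(n) enters a signal processor that produces a pattern vector x; a comparison (distance measurement) stage compares x with the reference patterns to give a distance D; decision logic then outputs the identification.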
A representation of the speech signal is obtained using digital speech processing techniques that preserve the features of the speech signal relevant to speaker identity.
The resulting pattern is compared to previously prepared reference patterns.
Decision logic is used to make a choice among the available alternatives.
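For a speaker verification system, if we denote the PDF of the measurement vector x for the ith speaker as p_i(x), then the decision rule is given by
$$\text{verify speaker } i \text{ if } p_i(x) > c_i\, p_{av}(x), \qquad \text{reject speaker } i \text{ if } p_i(x) < c_i\, p_{av}(x)$$
where c_i is a constant for the ith speaker and p_av(x) is the average PDF of the measurement vector x.
For a speaker identification system, the decision rule is given by
$$\text{choose speaker } i \text{ such that } p_i(x) > p_j(x), \quad j = 1, 2, \ldots, N, \; j \neq i$$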
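The two decision rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: verification accepts when p_i(x) exceeds c_i times p_av(x), identification picks the speaker with the largest p_i(x), and p_av(x) is taken as the plain average over the enrolled speakers. The speaker names, PDF values, and function names are hypothetical.

```python
from typing import Dict


def verify(p_claimed: float, p_average: float, c: float) -> bool:
    """Accept the claimed speaker when p_i(x) > c_i * p_av(x)."""
    return p_claimed > c * p_average


def identify(pdf_values: Dict[str, float]) -> str:
    """Return the speaker whose PDF value p_i(x) is largest."""
    return max(pdf_values, key=pdf_values.get)


# Hypothetical PDF values p_i(x) for one measurement vector x.
pdf_values = {"alice": 0.020, "bob": 0.005, "carol": 0.011}

# Assumption: the average PDF is the mean over the enrolled speakers.
p_av = sum(pdf_values.values()) / len(pdf_values)

print(verify(pdf_values["alice"], p_av, c=1.5))  # claim "alice"
print(verify(pdf_values["carol"], p_av, c=1.5))  # claim "carol"
print(identify(pdf_values))
```

Raising c makes the verifier stricter (fewer false acceptances, more false rejections), which is why the constant can be tied to the kind of transaction being requested.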
Speaker verification system
Computer verification of speakers
Block diagram of a speaker verification system
An online digital speaker verification system was developed by Rosenberg and others.
The person wishing to be verified first enters his claimed identity.
On request from the verification system, he utters his verification phrase and requests some transaction to be made in the event he is verified.
The spoken utterance is processed to obtain a pattern, which is compared to the stored reference patterns for the claimed identity.
On the basis of the transaction requested, the error mix constant (c_i) is determined.
Based on the error mix constant, the decision to accept or reject is made.
Signal processing aspects of the speaker verification system
Signal Processing Parts Of The Speaker Verification System
• End point detection: the sample utterance, which occurs somewhere within a preselected time interval, is located.
• Pitch detector: measures the pitch contour of the utterance.
• Energy measurements: short-time energy measurements are made to give energy contours.
• LPC analysis: gives predictor parameter contours. LPC represents the spectral envelope of a digital speech signal in compressed form, using the parameters of a linear predictive model. The autocorrelation formulation is used.
• Formant analysis: estimates of the formant locations are made.
• LPF: a 16 Hz low-pass filter is used to smooth the contours.
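As a rough illustration of the energy measurement and end point detection steps, the sketch below computes a short-time energy contour and locates the utterance by thresholding it. The thresholding approach, the frame sizes, and the function names are assumptions made for this example; the slides do not say how the end point detector works.

```python
from typing import List, Optional, Sequence, Tuple


def short_time_energy(signal: Sequence[float], frame_len: int, hop: int) -> List[float]:
    """Sum of squared samples in each frame, one value per hop."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(sum(s * s for s in frame))
    return energies


def find_endpoints(energies: Sequence[float], threshold: float) -> Optional[Tuple[int, int]]:
    """First and last frame whose energy exceeds the threshold, or None."""
    active = [i for i, e in enumerate(energies) if e > threshold]
    return (active[0], active[-1]) if active else None


# Toy signal: silence, a burst of activity, silence.
signal = [0.0] * 8 + [1.0, -1.0] * 4 + [0.0] * 8
contour = short_time_energy(signal, frame_len=4, hop=4)
print(contour)                       # [0.0, 0.0, 4.0, 4.0, 0.0, 0.0]
print(find_endpoints(contour, 1.0))  # (2, 3)
```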
Measurement contours for the test utterance “we were away a year ago”
The data are estimated 100 times per second and smoothed by a 16 Hz low-pass, linear-phase FIR digital filter.
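A linear-phase FIR smoother of this kind can be sketched as follows. The design method (Hamming-windowed sinc) and the filter length of 21 taps are assumptions chosen for illustration; only the 100 Hz contour rate and the 16 Hz cutoff come from the slides. Symmetric taps are what give the filter its linear phase.

```python
import math
from typing import List, Sequence


def lowpass_fir(cutoff_hz: float, fs_hz: float, num_taps: int) -> List[float]:
    """Hamming-windowed sinc low-pass taps, normalized to unit gain at DC."""
    if num_taps < 3 or num_taps % 2 == 0:
        raise ValueError("num_taps must be odd and at least 3")
    fc = cutoff_hz / fs_hz            # cutoff in cycles per sample
    mid = (num_taps - 1) // 2
    taps = []
    for k in range(num_taps):
        t = k - mid
        ideal = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * k / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [h / total for h in taps]


def smooth(contour: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Filter a contour (zero-padded at the edges) with the delay removed."""
    half = len(taps) // 2
    out = []
    for n in range(len(contour)):
        acc = 0.0
        for k, h in enumerate(taps):
            idx = n + half - k
            if 0 <= idx < len(contour):
                acc += h * contour[idx]
        out.append(acc)
    return out


taps = lowpass_fir(cutoff_hz=16.0, fs_hz=100.0, num_taps=21)
pitch_contour = [100.0 + (3.0 if n % 2 else -3.0) for n in range(60)]  # jittery toy contour
smoothed = smooth(pitch_contour, taps)
print(round(smoothed[30], 2))  # the frame-to-frame jitter is largely removed
```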
Pitch period and intensity contours of an utterance used in speaker verification
Plot of the first 3 formants, pitch, and intensity for a speaker verification utterance
Plots of the first 8 LPC coefficients for a speaker verification utterance
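LPC coefficient contours like these come from repeating an LPC analysis on successive frames. A minimal sketch of the autocorrelation method for one frame, solved with the Levinson-Durbin recursion, is shown below. The function names are illustrative, and the usual windowing of the frame (for example with a Hamming window) is left out for brevity.

```python
from typing import List, Sequence


def autocorrelation(frame: Sequence[float], max_lag: int) -> List[float]:
    """Short-time autocorrelation R(0..max_lag) of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(frame: Sequence[float], order: int) -> List[float]:
    """Predictor coefficients a[1..order] such that s[n] ~ sum_k a[k] * s[n-k]."""
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)
    err = r[0]                         # prediction error energy
    for i in range(1, order + 1):
        if err <= 0.0:
            break                      # silent or perfectly predictable frame
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:]


# A decaying exponential is predicted almost exactly by s[n] = 0.9 * s[n-1].
frame = [0.9 ** n for n in range(200)]
print(lpc(frame, order=2))  # approximately [0.9, 0.0]
```

Calling `lpc(frame, order=8)` on each frame of an utterance would give eight coefficient tracks of the kind plotted in the slides.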
After the desired parametric representation has been computed, it is compared with the corresponding reference patterns for the speaker whose identity is claimed.
A speaker is generally not able to speak at precisely the same rate for different repetitions of the verification phrase.
To solve this problem, nonlinear time warping of the input patterns is applied to obtain the best possible registration between the stored reference patterns and the patterns measured from the speaker's sample utterance.
Time warping
The time scale t of the sample utterance is warped so that significant events in some measurement contour a(t) line up with the same significant events in the reference contour r(t).
The warping function is assumed to be
τ = α t + q(t)
where
q(t) is the nonlinear time warp function
α is the average slope of the time warp function
Boundary conditions are imposed to ensure that the beginning and ending points of both the sample and reference utterances line up properly.
The boundary conditions are:
τ1 = α t1 + q(t1)
τ2 = α t2 + q(t2)
The function q(t) and the constant α have to be chosen so as to best align the measured contours.
A simpler and faster solution is to use dynamic programming to optimally choose a constrained warping function.
Illustration of time warping
Consider time warping for a pair of contours that are sampled at a discrete set of points.
Let the points in the measured contour be labeled n = 1, 2, …, N.
Let the points in the reference contour be labeled m = 1, 2, …, M.
The time warping function w is chosen as
m = w(n)
The boundary conditions on w(n) are:
w(1) = 1 (beginning points)
w(N) = M (ending points)
To limit the degree of nonlinearity of the warping function, a mild continuity condition is imposed: the warping function w cannot change by more than 2 grid points at any index n.
w(n+1) - w(n) = 0, 1, 2 if w(n) != w(n-1)
w(n+1) - w(n) = 1, 2 if w(n) = w(n-1)
Thus the slope of the warping function is either 0, 1, or 2, and a slope of 0 cannot occur twice in a row.
Deciding which of the allowed steps to take at grid index n requires a similarity measure between the test data measured at grid index n and the reference data measured at grid index m.
The similarity measure is used to determine the path of the warping function that minimizes the total distance, subject to the constraints of the continuity equation.
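The constrained search can be sketched with dynamic programming as below. This is an illustration rather than the system's actual implementation: indices are 0-based (so the boundary conditions become w[0] = 0 and w[N-1] = M-1), the local distance is a squared difference of scalar contour values, and the first point is treated as not following a flat step. All of these choices, and the function name, are assumptions.

```python
import math
from typing import Callable, List, Sequence, Tuple


def time_warp(test: Sequence[float], ref: Sequence[float],
              dist: Callable[[float, float], float] = lambda a, b: (a - b) ** 2
              ) -> Tuple[float, List[int]]:
    """Return (total distance, w), where w[n] is the reference index for test index n."""
    N, M = len(test), len(ref)
    INF = math.inf
    # cost[n][m][f]: best total distance of a path ending at (n, m);
    # f = 1 if the last step was flat (w[n] == w[n-1]), otherwise f = 0.
    cost = [[[INF, INF] for _ in range(M)] for _ in range(N)]
    back = [[[None, None] for _ in range(M)] for _ in range(N)]
    cost[0][0][0] = dist(test[0], ref[0])
    for n in range(1, N):
        for m in range(M):
            d = dist(test[n], ref[m])
            # Slope 0 is allowed only if the previous step was not flat.
            if cost[n - 1][m][0] < INF:
                cost[n][m][1] = cost[n - 1][m][0] + d
                back[n][m][1] = (m, 0)
            # Slopes 1 and 2 are allowed from either previous state.
            best, arg = INF, None
            for step in (1, 2):
                pm = m - step
                if pm < 0:
                    continue
                for f in (0, 1):
                    if cost[n - 1][pm][f] < best:
                        best, arg = cost[n - 1][pm][f], (pm, f)
            if arg is not None:
                cost[n][m][0] = best + d
                back[n][m][0] = arg
    f = 0 if cost[N - 1][M - 1][0] <= cost[N - 1][M - 1][1] else 1
    total = cost[N - 1][M - 1][f]
    if total == INF:
        raise ValueError("no warping path satisfies the constraints")
    # Trace the best path back from the end point.
    w = [0] * N
    m = M - 1
    for n in range(N - 1, -1, -1):
        w[n] = m
        if n > 0:
            m, f = back[n][m][f]
    return total, w


# The test contour lingers on one value; the warp absorbs it with a flat step.
print(time_warp([0.0, 1.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0]))
# (0.0, [0, 1, 1, 2, 3])
```

Carrying the "last step was flat" flag in the state is what enforces the rule that a slope of 0 cannot be taken twice in succession.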
An example of a typical time warping
The figure shows the possible grid coordinates (n, m) and a warping function w(n) for N = 20 test points and M = 15 reference points.
Because of the continuity constraints, the warping function must lie within the parallelogram.
The final step is to compute an overall distance measure and compare it to an appropriately chosen threshold.
The simplest contour distance measure is a normalized sum of squares.
Distance measure
For the jth measurement contour, the distance d_j would be of the form
$$d_j = \sum_{i=1}^{N} \frac{\left[a_{js}(i) - a_{jr}(i)\right]^2}{\sigma_{a_j}^2(i)}$$
where a_js(i) is the value of the jth measurement contour at time i, a_jr(i) is the value of the jth reference contour at time i, and σ_aj(i) is the standard deviation of the jth measurement at time i.
The overall distance function is given by
$$D = \sum_{j} w_j\, d_j$$
where w_j is the jth weight, chosen on the basis of the effectiveness of the jth measurement in verifying the speaker.
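A small numerical sketch of this distance computation follows. It assumes the per-contour distance is a sum of squared differences, each divided by the variance at that time index, and that the speaker is accepted when the weighted total falls below the threshold. The contour values, weights, threshold, and function names are made up for illustration.

```python
from typing import Sequence


def contour_distance(sample: Sequence[float], reference: Sequence[float],
                     sigma: Sequence[float]) -> float:
    """Sum over time of squared differences, each normalized by the variance."""
    return sum(((s - r) / sd) ** 2 for s, r, sd in zip(sample, reference, sigma))


def overall_distance(distances: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of the per-contour distances."""
    return sum(w * d for w, d in zip(weights, distances))


# Hypothetical time-aligned contours (three time points each).
d_pitch = contour_distance([100.0, 110.0, 120.0], [102.0, 110.0, 116.0], [2.0, 5.0, 4.0])
d_energy = contour_distance([1.0, 2.0, 3.0], [1.5, 2.0, 3.0], [0.5, 1.0, 1.0])

D = overall_distance([d_pitch, d_energy], weights=[0.5, 2.0])
print(d_pitch, d_energy, D)  # 2.0 1.0 3.0

THRESHOLD = 4.0  # hypothetical; in practice chosen from the desired error mix
print("accept" if D < THRESHOLD else "reject")  # accept
```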
SPEAKER IDENTIFICATION SYSTEMS
These are very similar to speaker verification systems.
The main difference is the choice of parameters used to make the distance measurements.
N distance measurements have to be made rather than 1.
The final decision is to choose the speaker whose reference patterns are closest in distance to the sample patterns.
A more sophisticated and robust distance measure is used.
Let x be an L-dimensional column vector representing the input pattern, in which the kth component of x is the kth measurement.
It is assumed that the joint PDF of the measurements for the ith speaker is a multidimensional Gaussian distribution with mean m_i and covariance matrix W_i. Thus, the L-dimensional Gaussian density function for x is given by
$$p_i(x) = (2\pi)^{-L/2}\, |W_i|^{-1/2} \exp\left[-\tfrac{1}{2}(x - m_i)^t\, W_i^{-1}\, (x - m_i)\right]$$
where W_i^{-1} is the inverse of the matrix W_i (assuming W_i is nonsingular), |W_i| is the determinant of W_i, and t denotes the transpose of a vector. The decision rule that minimizes the probability of error states that the measurement vector x should be assigned to class i if
$$P_i\, p_i(x) \ge P_j\, p_j(x) \quad \text{for all } j \ne i$$
where P_i is the a priori probability that x belongs to the ith class. Since ln y is a monotonically increasing function of its argument y, the decision rule can be simplified as
Decide class i if
$$(x - m_i)^t W_i^{-1} (x - m_i) + \ln|W_i| - 2\ln P_i \le (x - m_j)^t W_j^{-1} (x - m_j) + \ln|W_j| - 2\ln P_j \quad \text{for all } j \ne i$$
The bias term ln|W_i| - 2 ln P_i does not provide any advantage in the decision rule. Thus the distance measure is defined as
$$d_i(x) = (x - m_i)^t\, W_i^{-1}\, (x - m_i)$$
The mean vector and covariance matrix are defined as
$$m_i = \langle x \rangle_i, \qquad W_i = \langle (x - m_i)(x - m_i)^t \rangle_i$$
where ⟨·⟩_i denotes an average over the measurements of the ith speaker.
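As an illustration, the sketch below estimates a mean vector and covariance matrix for each speaker from training vectors and then identifies a test vector by the smallest quadratic distance (x - m_i)^t W_i^{-1} (x - m_i). It is a pure-Python sketch under assumptions: the covariance is averaged over n samples (not n - 1), the linear system is solved by Gaussian elimination instead of forming the inverse explicitly, and the training data and function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def mean_and_covariance(samples: Sequence[Vector]) -> Tuple[Vector, Matrix]:
    """Sample estimates of the mean vector and covariance matrix for one speaker."""
    n, dim = len(samples), len(samples[0])
    mean = [sum(v[k] for v in samples) / n for k in range(dim)]
    cov = [[sum((v[a] - mean[a]) * (v[b] - mean[b]) for v in samples) / n
            for b in range(dim)] for a in range(dim)]
    return mean, cov


def solve(matrix: Matrix, rhs: Vector) -> Vector:
    """Solve matrix * y = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        y[r] = (aug[r][n] - sum(aug[r][c] * y[c] for c in range(r + 1, n))) / aug[r][r]
    return y


def distance(x: Vector, mean: Vector, cov: Matrix) -> float:
    """Quadratic distance (x - m)^t W^{-1} (x - m)."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    return sum(d * y for d, y in zip(diff, solve(cov, diff)))


def identify(x: Vector, models: Dict[str, Tuple[Vector, Matrix]]) -> str:
    """Choose the speaker whose model gives the smallest distance."""
    return min(models, key=lambda name: distance(x, *models[name]))


# Hypothetical two-dimensional training vectors for two speakers.
models = {
    "A": mean_and_covariance([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]),
    "B": mean_and_covariance([[4.0, 4.0], [6.0, 4.0], [4.0, 6.0], [6.0, 6.0]]),
}
print(identify([2.0, 1.0], models))  # A
```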
Examples of some measured parameters
Speaker identification accuracy
Speaker identification accuracy (using cepstrum parameters)