
Unsupervised adaptation for speaker detection

Jean-François Bonastre
LIA, Avignon

[email protected]

IBM Seminar 14th September 2006

Outline

Introduction to UBM/GMM
ALIZE/LIA_SpkDet toolkit
Unsupervised adaptation: why and how?
Soft, continuous speaker model adaptation
Is NIST SRE really a speaker detection task?
Artificially modified impostor voice

Introduction to UBM/GMM

Speaker detection task (NIST, 1conv)

[Diagram: a test segment is compared with Joe's train segment (one train segment per speaker) to answer "Joe?", giving a score and a yes/no decision; separate Dev and Eval databases.]

More at www.nist.gov/speech

Introduction to UBM/GMM

UBM/GMM approach

[Diagram]
Training: acoustic parameters from a set of speakers -> EM-ML -> UBM
Adaptation: acoustic parameters from 1 speaker X, adapted from the UBM -> target speaker model
Test: acoustic parameters from test segment Y -> comparison -> decision
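The training / adaptation / test chain of the UBM/GMM approach can be sketched in a few lines. This is a minimal illustration, not the ALIZE implementation: it assumes mean-only MAP adaptation with a relevance factor, diagonal covariances, and an average per-frame log-likelihood ratio as the comparison. All function names and the toy 1-D values are hypothetical.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def component_logs(x, weights, means, variances):
    """Per-component log(weight * density) for one frame."""
    return [math.log(w) + log_gauss(x, m, v)
            for w, m, v in zip(weights, means, variances)]

def log_sum_exp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of the UBM toward one speaker's frames."""
    occ = [0.0] * len(means)
    first = [[0.0] * len(m) for m in means]
    for x in frames:
        logs = component_logs(x, weights, means, variances)
        norm = log_sum_exp(logs)
        for k, lk in enumerate(logs):
            gamma = math.exp(lk - norm)  # posterior of component k
            occ[k] += gamma
            for d, xd in enumerate(x):
                first[k][d] += gamma * xd
    # Interpolate between the data statistics and the UBM mean.
    return [[(first[k][d] + relevance * m[d]) / (occ[k] + relevance)
             for d in range(len(m))] for k, m in enumerate(means)]

def llr(frames, target_means, weights, ubm_means, variances):
    """Average per-frame log-likelihood ratio: target model vs. UBM."""
    total = 0.0
    for x in frames:
        total += log_sum_exp(component_logs(x, weights, target_means, variances))
        total -= log_sum_exp(component_logs(x, weights, ubm_means, variances))
    return total / len(frames)

# Toy 1-D UBM with two components (illustrative values only).
ubm_w, ubm_m, ubm_v = [0.5, 0.5], [[-1.0], [1.0]], [[1.0], [1.0]]
train_x = [[1.8], [2.0], [2.2]]  # training frames of speaker X
model_x = map_adapt_means(train_x, ubm_w, ubm_m, ubm_v, relevance=1.0)
score_same = llr([[2.1]], model_x, ubm_w, ubm_m, ubm_v)
score_other = llr([[-1.0]], model_x, ubm_w, ubm_m, ubm_v)
```

In this toy case a test frame close to the training data gets a positive score and a frame far from it gets a negative one; the decision is then a threshold on that score.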

Introduction to UBM/GMM

ALIZE/LIA_SpkDet toolkit

Classical Gaussian mixture toolkit
  EM ML/MAP, Gaussian component sharing
  Small HMM / Viterbi (designed for segmentation)
ALIZE = low-level model/feature
LIA_SpkDet = "high-level" system
  Current LIA research system (evaluated during NIST-SRE)
  Feature norm/warping/mapping
  Z/H/T Norm
  Bayes factor analysis soon…
  Designed for NIST and demos
Open source: www.lia.univ-avignon.fr/heberges/ALIZE


Unsupervised adaptation: why and how?

In the NIST SRE 1side-1side task, only one training segment is available per speaker
  It can be short, noisy, or content-specific
  Only one session
In real-world commercial applications
  It is difficult to ask a client for a lot of time
  It is attractive to launch the system with little data and then improve the speaker representation
  It is difficult to obtain multiple sessions for a given speaker's training set

Unsupervised adaptation: why and how?

Build X, a basic speaker model, using the available training data
When a verification is requested with a test segment Y1
  Compute the score between X and Y1
  If (score > AdaptT), adapt X using Y1 to obtain X1
  else X1 = X
When a second query Y2 comes in
  Compute score(Y2, X1)
  If (score > AdaptT) adapt …
See Barras (OD04), Mirghafori (ICSLP02), van Leeuwen (NIST04)
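The thresholded loop described on this slide can be written generically. The sketch below only shows the control flow; `score` and `adapt` are placeholders for the real scoring and MAP adaptation, and the toy model (a running mean) is purely hypothetical.

```python
def run_hard_adaptation(model, queries, score, adapt, adapt_threshold):
    """Score each query with the current model; adapt only above the threshold."""
    scores = []
    for query in queries:
        s = score(model, query)      # score between the current model and Y_i
        scores.append(s)
        if s > adapt_threshold:      # hard decision: treat Y_i as client data
            model = adapt(model, query)
        # otherwise the model is left unchanged
    return model, scores

# Toy stand-ins (hypothetical): a model is (mean, count), a query is one number.
def toy_score(model, query):
    return -abs(query - model[0])

def toy_adapt(model, query):
    mean, count = model
    return ((mean * count + query) / (count + 1), count + 1)

final_model, scores = run_hard_adaptation(
    (0.0, 1), [0.2, 5.0, 0.4], toy_score, toy_adapt, adapt_threshold=-1.0)
```

Here the second query scores below the threshold and is ignored, so the final model has absorbed only the first and third queries.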

Unsupervised adaptation: why and how?

ORACLE performance is very good
"Unsupervised Online Adaptation for Speaker Verification over the Telephone", Claude Barras, Sylvain Meignier, Jean-Luc Gauvain, Speaker Odyssey 2004

Unsupervised adaptation: why and how?

"Unsupervised model adaptation for speaker verification", Alexandre Preti, Jean-François Bonastre, ICSLP04

[Figure: Baseline vs. Oracle]

Unsupervised adaptation: why and how?

Problem: results during the evaluation campaigns are quite poor

Unsupervised adaptation: why and how?

"Unsupervised model adaptation for speaker verification", Alexandre Preti, Jean-François Bonastre, ICSLP04

[Figure: Baseline and hard-decision adaptation vs. Oracle]

Unsupervised adaptation: why and how?

Problem: results during the evaluation campaigns are quite poor
Why?
  Adaptation is done only when the score is good
    i.e. when the current speaker model matches the test file well
  A model is adapted only if the initial model is good enough
    and if the mismatch between the sessions is not too large
  Good (only) if used on a large dataset (not test-by-test adaptation)…
Problem with the NIST protocol

Unsupervised adaptation: why and how?

[Figure: the area for a sure hard decision, where we are "sure" not to make a mistake, versus the area of interest]

We don't have enough client tests!

Unsupervised adaptation: why and how?

Suppress the hard decision
  No decision = continuous adaptation
  Adaptation with good client data
  But also with bad, impostor data
Weighted adaptation

Soft, continuous speaker model adaptation

Ongoing work - Unpublished

Alexandre Preti and JF Bonastre

The idea

Always adapt the client models with incoming data
  But weight the data by P(X=Y)
  A posteriori probability (and not LLK): P(X=Y | LRx)
What is needed?
  Traditional score: LLR(test data / client model)
  Something to transform the score into a probability, P(X=Y | LRx)

WMAP

$$P(X=Y \mid LR_x) = \frac{p(LR_x \mid X=Y)\,p(X=Y)}{p(LR_x \mid X=Y)\,p(X=Y) + p(LR_x \mid X \neq Y)\,p(X \neq Y)}$$

p(X=Y) = the prior probability of a target
p(X≠Y) = the prior probability of an impostor
p(LRx | X=Y) = the score on the target distribution
p(LRx | X≠Y) = the score on the impostor distribution

We model the target/impostor score distributions by a GMM learned on a dev set.
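The Bayes rule above turns a raw score into an adaptation weight. A minimal sketch follows, assuming each score distribution is a 1-D GMM given as (weight, mean, standard deviation) triples; the parameter values and the prior are invented for illustration and are not the ones learned on the dev set.

```python
import math

def gmm_pdf(score, components):
    """Density of a 1-D GMM given as (weight, mean, std) triples."""
    return sum(w * math.exp(-0.5 * ((score - m) / s) ** 2)
               / (s * math.sqrt(2.0 * math.pi))
               for w, m, s in components)

def target_posterior(score, target_gmm, impostor_gmm, prior_target):
    """P(X=Y | LRx): posterior probability that the trial is a target trial."""
    num = gmm_pdf(score, target_gmm) * prior_target
    den = num + gmm_pdf(score, impostor_gmm) * (1.0 - prior_target)
    return num / den

# Hypothetical score models: one Gaussian per class, symmetric around 0.
target_gmm = [(1.0, 2.0, 1.0)]
impostor_gmm = [(1.0, -2.0, 1.0)]
prior_target = 0.1

w_low = target_posterior(-4.0, target_gmm, impostor_gmm, prior_target)
w_mid = target_posterior(0.0, target_gmm, impostor_gmm, prior_target)
w_high = target_posterior(4.0, target_gmm, impostor_gmm, prior_target)
```

At a score where both class likelihoods are equal (0.0 in this toy setup), the posterior falls back to the prior; it then rises smoothly with the score, which is what replaces the hard threshold.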

WMAP

[Figure: probability of a target (weight) as a function of the score]

Updating a model

New target model = MAP(UBM, {selected trials; weights} + initial training data)

Weights are integrated in the EM/ML estimation (thanks to the ALIZE toolkit)
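One plausible way to integrate the weights is to scale each segment's sufficient statistics by its weight before the MAP interpolation. The sketch below makes that assumption explicitly; it is not taken from ALIZE, it uses scalar frames and mean-only adaptation, and every name in it is hypothetical.

```python
import math

def posteriors(x, gmm):
    """Component posteriors of scalar frame x under a GMM of (weight, mean, var)."""
    dens = [w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
            for w, m, v in gmm]
    total = sum(dens)
    return [d / total for d in dens]

def weighted_map_means(ubm, segments, relevance=16.0):
    """MAP-adapt the UBM means; segments is a list of (frames, weight) pairs."""
    occ = [0.0] * len(ubm)
    first = [0.0] * len(ubm)
    for frames, seg_weight in segments:
        for x in frames:
            for k, gamma in enumerate(posteriors(x, ubm)):
                occ[k] += seg_weight * gamma        # weighted occupancy
                first[k] += seg_weight * gamma * x  # weighted first-order stat
    return [(first[k] + relevance * mean) / (occ[k] + relevance)
            for k, (_, mean, _) in enumerate(ubm)]

ubm = [(0.5, -1.0, 1.0), (0.5, 1.0, 1.0)]
train = [1.8, 2.0, 2.2]   # initial training data, weight 1.0
trial = [3.0, 3.2]        # one test trial, weighted by its target posterior

only_train = weighted_map_means(ubm, [(train, 1.0)])
ignored = weighted_map_means(ubm, [(train, 1.0), (trial, 0.0)])
weak = weighted_map_means(ubm, [(train, 1.0), (trial, 0.1)])
strong = weighted_map_means(ubm, [(train, 1.0), (trial, 0.9)])
```

A trial with weight 0 leaves the model untouched, and the pull toward the trial data grows with its weight, so probable impostor trials contribute little.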

Protocols

Two different protocols:
BATCH: all the trials involving a target are used to adapt its model
  The model is adapted before computing the final scores
NIST: SRE unsupervised adaptation mode
  A speaker model may be updated only with the previous trial segments of this speaker (done ndx line by ndx line)
Obviously, more adaptation data is available in the BATCH protocol
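The difference between the two protocols is only which trial segments may be used when a given trial is scored. The sketch below lists that adaptation data for each trial, following the slide's wording for the NIST mode (previous segments only). The trial list format, (target id, segment id) pairs in index order, is an assumption.

```python
def adaptation_data_batch(trials):
    """BATCH: every trial segment of a target may adapt its model before scoring."""
    by_target = {}
    for target, segment in trials:
        by_target.setdefault(target, []).append(segment)
    return [list(by_target[target]) for target, _ in trials]

def adaptation_data_nist(trials):
    """NIST mode: only earlier trial segments of the same target, in index order."""
    seen = {}
    available = []
    for target, segment in trials:
        available.append(list(seen.get(target, [])))
        seen.setdefault(target, []).append(segment)
    return available

trials = [("spk1", "a"), ("spk2", "b"), ("spk1", "c")]
batch = adaptation_data_batch(trials)
nist = adaptation_data_nist(trials)
```

For the third trial, BATCH can use both segments "a" and "c" while the NIST mode can use only "a", which is why BATCH has more adaptation data.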

TNORM score normalization

Basic TNORM (with 2.5 min of train data) is not well suited to the BATCH protocol
  The amount of train data is now far from 2.5 min.
  A TNORM with different lengths of training data is needed
    All targets of NIST SRE 2004 using unsupervised adaptation
As the amount of train data is limited in the NIST protocol, basic T-NORM should perform well (160 targets of NIST SRE 2004).
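For reference, T-norm rescales a trial score with the mean and standard deviation of the scores that the same test segment obtains against a cohort of impostor models. A minimal sketch, assuming the cohort scores are already computed and using the population standard deviation (a choice made here for illustration):

```python
import statistics

def tnorm(raw_score, cohort_scores):
    """Test normalization: standardize a score with cohort-model statistics."""
    mu = statistics.mean(cohort_scores)
    sigma = statistics.pstdev(cohort_scores)
    return (raw_score - mu) / sigma

# Hypothetical scores of one test segment against three cohort models.
cohort = [-1.0, 0.0, 1.0]
normalized = tnorm(2.0, cohort)
centered = tnorm(0.0, cohort)
```

The slide's point is that the cohort models should be trained on amounts of data comparable to the adapted target model; otherwise the cohort statistics no longer describe the score range of that model.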

Results: BATCH protocol

Score distributions are learnt on NIST SRE 2004 for the NIST SRE 2005 results.

10% relative DCF improvement
35% relative EER improvement

Results: NIST protocol

15% relative DCF improvement
33% relative EER improvement

Analysis, target trials

Target trials = 1231

Accepted trials    Total    Non-target    Target
Baseline           1103     115           978
NIST unsup. A.     1170     130           1040
Batch A.           1064     89            975

We accept more target trials
We reject more impostor trials

Conclusion

Ongoing work
Unsupervised adaptation method without a hard decision threshold
DCF and EER improvements
  Larger for commercial applications, as the target/impostor ratio is better!
Two protocols: same gain but two different behaviours
Other problems should be addressed
  Score normalisation (using score models?)
  Threshold estimation/adaptation

NIST-SRE
Is it really a speaker detection task?

Is it really a speaker detection task?

The evaluation campaign is a very interesting framework for research
  Large dataset
  Clear protocols
  Focusing on one problem = large improvements year after year
But
  One sort of data
  One task
  Little analysis of the results in terms of data/phonetics/knowledge (focused only on "performance")

Is it really a speaker detection task?

Improvements thanks to
  HNorm = ZNorm + something linked to 2 sorts of phones
  Tnorm = environment/phone mismatch
  Feature warping/mapping…
    Also reduce environment/phone mismatch (but have an effect on the classifiers)

Is it really a speaker detection task?

Example: Bayesian Factor Analysis (Patrick Kenny)
  Quite recent but already large improvements
  Dedicated (currently) to channel effects
  At least 10 labs are implementing PK's approach or something close to it
  Who has tested whether the channel is the key factor?

Artificially modified impostor voice

We have all been using UBM/GMM for x years
Which sort of information is modeled by UBM/GMM classifiers?
Experiment
  If we know a voice example of a targeted speaker
  If we know the speaker recognition technique
  Is it possible to transform someone else's voice in order to cheat the system?
"Transfer Function-Based Voice Transformation for Speaker Recognition", Jean-François Bonastre, Driss Matrouf, Corinne Fredouille, Speaker Odyssey 2006

Experimental context

NIST-SRE 2005 (1conv-1conv), male only
  1231 target tests
  12317 non-target tests
Test (segment Y, client S)
  Classical test: Y=S?
  If it is a non-target test, transform Y using S: Y' = Vtrans(Y, S)
  New test (segment Y', client S): Y'=S?

Voice transformation

Frame by frame + overlap-add
Estimate a new cepstral target for an impostor frame y, as a combination of the component means of the target speaker model
From the cepstral target, derive a transfer function
Filter in order to replace the original transfer function by the new one
Other parameters are taken from the original signal
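The "combination of the component means" step can be illustrated as a posterior-weighted average. This is a sketch under the assumption that the combination weights are the a posteriori probabilities of the components for the impostor frame (as the annex slides suggest); the diagonal-covariance GMM and all values are toy examples.

```python
import math

def component_posteriors(frame, weights, means, variances):
    """A posteriori probability of each GMM component for one cepstral frame."""
    logs = []
    for w, mean, var in zip(weights, means, variances):
        ll = sum(-0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
                 for x, m, v in zip(frame, mean, var))
        logs.append(math.log(w) + ll)
    top = max(logs)
    expd = [math.exp(l - top) for l in logs]
    total = sum(expd)
    return [e / total for e in expd]

def cepstral_target(posteriors, target_means):
    """Cepstral target: component means combined with the probability vector."""
    dim = len(target_means[0])
    return [sum(p * mean[d] for p, mean in zip(posteriors, target_means))
            for d in range(dim)]

# Toy target speaker model with two components in a 2-D cepstral space.
weights = [0.5, 0.5]
means = [[0.0, 2.0], [4.0, 6.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

post = component_posteriors([3.5, 5.0], weights, means, variances)
target = cepstral_target(post, means)
fixed = cepstral_target([0.25, 0.75], means)
```

With the explicit probability vector [0.25, 0.75], the target is [3.0, 5.0]: each coefficient is simply the weighted average of the two component means.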

Results (1): impostor score distribution

[Figure: impostor score distributions]
Normal (1)
Transformed (2), using a non-train segment for X
Transformed (3), using speaker X's training segment

Results (2): DET and error rates

[Figure: DET curves (false acceptance vs. miss) for conditions 1, 2 and 3]

Condition       F. Accept.    Miss Det.
1 - baseline    0.88 %        27.45 %
2 - !=          49.72 %       27.45 %
3 - =           96.55 %       27.45 %

FA goes from 0.88 % to 96.55 % (same threshold)!

Examples

7396 8049

NCFB_A -1.94 4.84 0.46 5.47

French BN (Alain Passerel)Driss Redragui Fabrice Drouelle Franck Mathevon Joel Collado

NIST SRE

Conclusion

We are using efficient UBM/GMM systems
  But we don't know which information is used
We now know that it is possible to cheat the system
  Caution is needed for forensic/national security applications (Bonastre et al., Eurospeech 2003)
In this experiment, we used knowledge of
  The feature extraction
  The method (UBM/GMM, number of components, number of top components)
  The world model
To be extended

Thanks!

For inviting me
For your attention

Questions?

Annex: UBM-GMM and protocol

Paradigm

P(target speaker | speech data)
Bayesian hypothesis test: LR
UBM (Universal Background Model) represents the inverse hypothesis
  It is usually learned with hundreds of hours of speech
  The EM algorithm with multiple iterations is used
  Some tricks: variance flooring (?)
Front-end is a cepstral analysis
  Speech (MFCC, LFCC) + derivatives…

Experimental protocol

Based upon the NIST SRE 2005 database:
  Primary, 1side-1side, male set
  280 speakers
  Utterances 2.5 min long (contain speech)
  13624 tests (951 target tests)
  Impostors: 200 speakers from the BM model
Commonalities: (16+16d) LFCC features (300 Hz-3 kHz), Tnormed systems, 2048-component UBM.

The LIA_SpkDet system

GMM/UBM, 2048 diagonal components
16 cepstral coefficients + 16 delta (50 coefficients in 2006)
Frame selection based on a 3-component GMM modeling of the energy
Feature normalization: mean removal and variance normalization
Score normalization: Tnorm
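The feature normalization step (mean removal and variance normalization) can be sketched as follows. It assumes the statistics are computed per coefficient over the frames of one segment, which is a common choice but is not stated on the slide.

```python
import statistics

def mean_variance_normalize(frames):
    """Per-coefficient mean removal and variance normalization over a segment."""
    columns = list(zip(*frames))
    means = [statistics.mean(c) for c in columns]
    # A constant coefficient has zero deviation; divide by 1.0 in that case.
    stds = [statistics.pstdev(c) or 1.0 for c in columns]
    return [[(x - m) / s for x, m, s in zip(frame, means, stds)]
            for frame in frames]

# Three toy frames with two coefficients each.
frames = [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]]
normalized = mean_variance_normalize(frames)
first_column = [frame[0] for frame in normalized]
```

After normalization every coefficient has zero mean and unit variance over the segment, whatever its original scale.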

Annex: Voice transformation

Voice transformation (1): independent processing and overlap-add

Frame-by-frame processing, with overlap
  Y is split into overlapping frames y0, y1, …, yn
  Each frame is transformed independently: Y'0 = VT(y0, S), Y'1 = VT(y1, S), …, Y'n = VT(yn, S)
Adding
  The transformed frames Y'0, Y'1, …, Y'n are added to give Y' = VT(Y, S)
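The independent processing and overlap-add scheme can be sketched as below. The window type (periodic Hann), the 50% overlap and the `transform` callback standing in for VT(y, S) are assumptions chosen for illustration; the slides do not specify them.

```python
import math

def overlap_add(signal, frame_len, hop, transform):
    """Window each frame, transform it independently, then add with overlap."""
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_len)
              for n in range(frame_len)]  # periodic Hann window
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        new_frame = transform(frame)      # stands in for VT(y_i, S)
        for n, value in enumerate(new_frame):
            out[start + n] += value
    return out

def identity(frame):
    return list(frame)

# With an identity transform and 50% overlap, interior samples are rebuilt.
rebuilt = overlap_add([1.0] * 16, frame_len=8, hop=4, transform=identity)
```

With this window and hop, overlapping windows sum to one, so an identity transform returns the original signal away from the edges; any change in the output then comes only from the per-frame transformation.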

Voice transformation (2): find the target transfer function (frame)

Inputs: the target speaker model and the acoustic parameter vector (cepstral) of the frame
Compute the a posteriori probabilities for each component
Build the target by combining the means of the components using the probability vector
Cepstral target -> transfer function

Voice transformation (3): 2 // GMM

ASR uses several feature normalization techniques: it is not possible to come back to the signal
= 2 // (parallel) models for a target, with 1-to-1 tying of the components
  One to estimate the a posteriori probabilities (master)
  One to estimate the target (filtering)
Master: in the ASR feature space
  Compute the a posteriori probabilities for Y
Filtering: in the filtering feature space
  Combine the means to build the target

Voice transformation (4): filtering

Build the target for the frame: target transfer function $H_x(f)$
Original TF of the frame y: $H_y(f)$
Filter applied to y to obtain Y':

$$H(f) = \frac{H_x(f)}{H_y(f)}$$
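The filtering step replaces the original transfer function by the target one, which amounts to multiplying the frame spectrum by the ratio of the two. The sketch below works on per-bin magnitude values given as plain lists; that representation, the flooring constant and the function names are assumptions for illustration.

```python
def transformation_filter(target_tf, original_tf, floor=1e-8):
    """Per-bin filter H(f) = Hx(f) / Hy(f), with a floor against division by zero."""
    return [hx / max(hy, floor) for hx, hy in zip(target_tf, original_tf)]

def apply_filter(frame_spectrum, filt):
    """Multiply each bin of the frame spectrum by the filter."""
    return [s * h for s, h in zip(frame_spectrum, filt)]

# Toy three-bin envelopes (illustrative values only).
original_tf = [1.0, 2.0, 4.0]   # Hy(f): transfer function of the impostor frame
target_tf = [2.0, 2.0, 1.0]     # Hx(f): transfer function built from the target
filt = transformation_filter(target_tf, original_tf)
new_envelope = apply_filter(original_tf, filt)
```

Applying the filter to the original envelope gives back the target envelope, while the remaining parameters of the frame are carried over from the original signal.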

Distributions: client and impostor

[Figure: client and impostor score distributions]

DET using the transformation for all the tests

[Figure: DET curves]

// Models

ASR feature domain: ASR target speaker model
  Get the EM E-step hidden variables
Filtering feature domain: filtering target speaker model
  Apply the EM M-step in the filtering domain
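The parallel-model idea can be sketched as an EM step split across two feature spaces: component posteriors (the hidden variables) are computed with the master model on ASR-domain features, then used to re-estimate the means on the filtering-domain features of the same frames. The code assumes scalar master features, time-aligned frame pairs and a mean-only M-step; all of this is an illustrative simplification, not the authors' implementation.

```python
import math

def posteriors(x, gmm):
    """E-step: component posteriors of scalar x under a GMM of (weight, mean, var)."""
    dens = [w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
            for w, m, v in gmm]
    total = sum(dens)
    return [d / total for d in dens]

def estimate_filtering_means(master_gmm, master_frames, filtering_frames):
    """M-step in the filtering domain using posteriors from the master domain."""
    n_comp = len(master_gmm)
    dim = len(filtering_frames[0])
    occ = [0.0] * n_comp
    acc = [[0.0] * dim for _ in range(n_comp)]
    for x_master, x_filt in zip(master_frames, filtering_frames):
        for k, gamma in enumerate(posteriors(x_master, master_gmm)):
            occ[k] += gamma
            for d in range(dim):
                acc[k][d] += gamma * x_filt[d]
    # Components keep a 1-to-1 correspondence with the master model.
    return [[a / max(occ[k], 1e-12) for a in acc[k]] for k in range(n_comp)]

master_gmm = [(0.5, -5.0, 1.0), (0.5, 5.0, 1.0)]
master_frames = [-5.0, -5.2, 5.0, 5.1]              # ASR-domain features
filtering_frames = [[1.0], [1.2], [9.0], [9.2]]     # same frames, filtering domain
filtering_means = estimate_filtering_means(master_gmm, master_frames,
                                           filtering_frames)
```

Because the posteriors come from the master model, component k of the filtering model describes the same frames as component k of the ASR model, which is what makes the 1-to-1 tying usable at transformation time.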