Page 1
Deep Neural Networks for Acoustic Modeling in Speech
Recognition
Hinton, Geoffrey, et al. "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups." IEEE Signal Processing Magazine 29.6 (2012): 82-97.
Presented by Peidong Wang, 04/04/2016
1
Page 2
Content
• Speech Recognition System
• GMM-HMM Model
• Training Deep Neural Networks
• Generative Pretraining
• Experiments
• Discussion
2
Page 3
Content
• Speech Recognition System
• GMM-HMM Model
• Training Deep Neural Networks
• Generative Pretraining
• Experiments
• Discussion
3
Page 4
Speech Recognition System
• Goal
  • Converting speech to text
• A Mathematical Perspective

  w = \arg\max_w P(w \mid Y)
  or
  w = \arg\max_w P(Y \mid w) P(w)
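As a toy illustration of the decoding rule, the sketch below picks the word sequence maximizing log P(Y|w) + log P(w). The candidate sequences and scores are invented for the example, not from the paper.

```python
# Toy decoder: argmax over w of log P(Y|w) + log P(w).
# The scores below are made-up values for two hypothetical hypotheses.
acoustic_ll = {"hello world": -12.0, "yellow whirled": -11.5}  # log P(Y|w)
lm_logprob = {"hello world": -2.0, "yellow whirled": -8.0}     # log P(w)

def decode(acoustic_ll, lm_logprob):
    # Working in the log domain turns the product P(Y|w)P(w) into a sum.
    return max(acoustic_ll, key=lambda w: acoustic_ll[w] + lm_logprob[w])

best = decode(acoustic_ll, lm_logprob)
```

Here the language model prior overrides the slightly better acoustic score of the implausible hypothesis, which is exactly why the P(w) term matters.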
4
Page 5
Content
• Speech Recognition System
• GMM-HMM Model
• Training Deep Neural Networks
• Generative Pretraining
• Experiments
• Discussion
5
Page 6
GMM-HMM Model
• GMM and HMM
  • GMM is short for Gaussian Mixture Model, and HMM is short for Hidden Markov Model.
• Predecessor of DNNs
  • Before Deep Neural Networks (DNNs), the most commonly used speech recognition systems consisted of GMMs and HMMs.
6
Page 7
GMM-HMM Model
• HMM
  • The HMM is used to deal with the temporal variability of speech.
• GMM
  • The GMM is used to represent the relationship between HMM states and the acoustic input.
7
Page 8
GMM-HMM Model
• Features
  • The features are typically formed by concatenating Mel-frequency cepstral coefficients (MFCCs) or perceptual linear predictive coefficients (PLPs) computed from the raw waveform with their first- and second-order temporal differences.
8
Page 9
GMM-HMM Model
• Shortcoming
  • GMM-HMM models are statistically inefficient for modeling data that lie on or near a nonlinear manifold in the data space.
  • For example, modeling the set of points that lie very close to the surface of a sphere.
9
Page 10
Content
• Speech Recognition System
• GMM-HMM Model
• Training Deep Neural Networks
• Generative Pretraining
• Experiments
• Discussion
10
Page 11
Training Deep Neural Networks
• Deep Neural Network (DNN)
  • A DNN is a feed-forward artificial neural network that has more than one layer of hidden units between its inputs and its outputs.
  • With nonlinear activation functions, a DNN is able to model an arbitrary nonlinear function (a projection from inputs to outputs). [*]

[*] Added by the presenter.
11
Page 12
Training Deep Neural Networks
• Activation Function of the Output Units
  • The activation function of the output units is the "softmax" function.
  • The mathematical expression is as follows.

  p_j = \frac{\exp(x_j)}{\sum_k \exp(x_k)}
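The softmax can be sketched in a few lines of Python; subtracting the maximum before exponentiating is the standard trick to avoid overflow and does not change the result.

```python
import math

def softmax(x):
    # p_j = exp(x_j) / sum_k exp(x_k), with max subtraction for stability.
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    s = sum(exps)
    return [e / s for e in exps]

p = softmax([1.0, 2.0, 3.0])  # probabilities over three output classes
```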
12
Page 13
Training Deep Neural Networks
• Objective Function
  • When using the softmax output function, the natural objective function (cost function) C is the cross-entropy between the target probabilities d and the outputs of the softmax, p.
  • The mathematical expression is as follows.

  C = -\sum_j d_j \log p_j
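A minimal sketch of the cross-entropy cost; with a one-hot target, as in frame classification, the cost reduces to minus the log probability assigned to the correct class.

```python
import math

def cross_entropy(d, p, eps=1e-12):
    # C = -sum_j d_j * log p_j; eps guards against log(0).
    return -sum(dj * math.log(pj + eps) for dj, pj in zip(d, p))

# One-hot target: only the probability of the correct class matters.
c = cross_entropy([0.0, 1.0, 0.0], [0.2, 0.7, 0.1])
```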
13
Page 14
Training Deep Neural Networks
• Weight Penalties and Early Stopping
  • To reduce overfitting, large weights can be penalized in proportion to their squared magnitude, or the learning can simply be terminated at the point at which performance on a held-out validation set starts getting worse.
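Both ideas can be sketched in Python; the learning rate, penalty strength, and patience values below are arbitrary illustrations, not values from the paper.

```python
def sgd_step(weights, grads, lr=0.1, l2=0.01):
    # Gradient step with an L2 penalty: each weight is pulled toward zero
    # in proportion to its magnitude (the gradient of 0.5*l2*w^2 is l2*w).
    return [w - lr * (g + l2 * w) for w, g in zip(weights, grads)]

def early_stop(val_errors, patience=1):
    # Stop once validation error has worsened `patience` checks in a row.
    worse = 0
    for i in range(1, len(val_errors)):
        worse = worse + 1 if val_errors[i] > val_errors[i - 1] else 0
        if worse >= patience:
            return i  # index at which training would be stopped
    return len(val_errors) - 1

w = sgd_step([1.0, -2.0], [0.0, 0.0])          # pure weight decay: shrink
stop = early_stop([0.5, 0.4, 0.35, 0.37, 0.4])  # stops when error rises
```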
14
Page 15
Training Deep Neural Networks
• Overfitting Reduction
  • Generally speaking, there are three methods.
  • Weight penalties and early stopping can reduce overfitting, but only by removing much of the modeling power.
  • Very large training sets can reduce overfitting, but only by making training very computationally expensive.
  • Generative pretraining.
15
Page 16
Content
• Speech Recognition System
• GMM-HMM Model
• Training Deep Neural Networks
• Generative Pretraining
• Experiments
• Discussion
16
Page 17
Generative Pretraining
• Purpose
  • The multiple layers of feature detectors (the result of this step) can be used as a good starting point for a discriminative "fine-tuning" phase, during which backpropagation through the DNN slightly adjusts the weights and improves the performance.
  • In addition, this step can significantly reduce overfitting.
17
Page 18
Generative Pretraining
• Restricted Boltzmann Machine (RBM)
  • An RBM consists of a layer of stochastic binary "visible" units that represent binary input data, connected to a layer of stochastic binary hidden (latent) units that learn to model significant nonindependencies between the visible units.
  • There are undirected connections between visible and hidden units but no visible-visible or hidden-hidden connections.
18
Page 19
Generative Pretraining
• Restricted Boltzmann Machine (RBM) (Cont'd)
  • The framework of an RBM is shown below.

From: Slides in CSE 5526 Neural Networks
Page 20
Generative Pretraining
• Restricted Boltzmann Machine (RBM) (Cont'd)
  • An RBM uses a single set of parameters, W, to define the joint probability of a vector of values of the observable variables, v, and a vector of values of the latent variables, h, via an energy function, E.
20
p(v,h;W) = \frac{1}{Z} e^{-E(v,h;W)}, \quad Z = \sum_{v',h'} e^{-E(v',h';W)}

E(v,h) = -\sum_{i \in \mathrm{visible}} a_i v_i - \sum_{j \in \mathrm{hidden}} b_j h_j - \sum_{i,j} v_i h_j w_{ij}
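The binary-RBM energy can be computed directly from its definition; the parameter values below are hand-picked for illustration.

```python
def rbm_energy(v, h, a, b, W):
    # E(v,h) = -sum_i a_i v_i - sum_j b_j h_j - sum_{i,j} v_i h_j w_ij
    vis = sum(ai * vi for ai, vi in zip(a, v))        # visible bias term
    hid = sum(bj * hj for bj, hj in zip(b, h))        # hidden bias term
    pair = sum(v[i] * h[j] * W[i][j]
               for i in range(len(v)) for j in range(len(h)))
    return -(vis + hid + pair)

# Two visible units, one hidden unit, toy parameters.
E = rbm_energy([1, 0], [1], a=[0.5, -0.5], b=[0.2], W=[[1.0], [0.3]])
```

Lower energy means higher joint probability, since p(v,h) ∝ e^{-E(v,h)}.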
Page 21
Generative Pretraining
• Restricted Boltzmann Machine (RBM) (Cont'd)
  • The probability that the network assigns to a visible vector, v, is given by summing over all possible hidden vectors.

  p(v) = \frac{1}{Z} \sum_h e^{-E(v,h)}

  • The derivative of the log probability of a training set with respect to a weight is surprisingly simple. The angle brackets denote expectations under the corresponding distribution.

  \frac{1}{N} \sum_{n=1}^{N} \frac{\partial \log p(v_n)}{\partial w_{ij}} = \langle v_i h_j \rangle_{\mathrm{data}} - \langle v_i h_j \rangle_{\mathrm{model}}
21
Page 22
Generative Pretraining
• Restricted Boltzmann Machine (RBM) (Cont'd)
  • The learning rule is thus as follows.

  \Delta w_{ij} = \varepsilon (\langle v_i h_j \rangle_{\mathrm{data}} - \langle v_i h_j \rangle_{\mathrm{model}})

  • A better learning procedure is contrastive divergence (CD), shown below. The subscript "recon" denotes a step in CD when the states of the visible units are set to 0 or 1 according to the current states of the hidden units.

  \Delta w_{ij} = \varepsilon (\langle v_i h_j \rangle_{\mathrm{data}} - \langle v_i h_j \rangle_{\mathrm{recon}})
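A minimal CD-1 sketch for a tiny binary RBM, written as an illustration of the update rule rather than the authors' implementation; biases are held fixed and sampling uses Python's random module.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def cd1_update(v0, W, a, b, lr=0.1, rng=random):
    # One CD-1 step: up-pass, sample h, reconstruct v, up-pass again,
    # then Delta w_ij = lr * (<v_i h_j>_data - <v_i h_j>_recon).
    nv, nh = len(v0), len(b)
    ph0 = [sigmoid(b[j] + sum(v0[i] * W[i][j] for i in range(nv)))
           for j in range(nh)]
    h0 = [1 if rng.random() < p else 0 for p in ph0]
    # Reconstruction: visible states set to 0/1 from the hidden states.
    pv1 = [sigmoid(a[i] + sum(h0[j] * W[i][j] for j in range(nh)))
           for i in range(nv)]
    v1 = [1 if rng.random() < p else 0 for p in pv1]
    ph1 = [sigmoid(b[j] + sum(v1[i] * W[i][j] for i in range(nv)))
           for j in range(nh)]
    for i in range(nv):
        for j in range(nh):
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
    return W

random.seed(0)
W = cd1_update([1, 0, 1], [[0.0] * 2 for _ in range(3)],
               a=[0.0] * 3, b=[0.0] * 2)
```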
22
Page 23
Generative Pretraining
• Modeling Real-Valued Data
  • Real-valued data, such as MFCCs, are more naturally modeled by linear variables with Gaussian noise, and the RBM energy function can be modified to accommodate such variables, giving a Gaussian-Bernoulli RBM (GRBM).

  E(v,h) = \sum_{i \in \mathrm{vis}} \frac{(v_i - a_i)^2}{2\sigma_i^2} - \sum_{j \in \mathrm{hid}} b_j h_j - \sum_{i,j} \frac{v_i}{\sigma_i} h_j w_{ij}
23
Page 24
Generative Pretraining
• Stacking RBMs to Make a Deep Belief Network
  • After training an RBM on the data, the inferred states of the hidden units can be used as data for training another RBM that learns to model the significant dependencies between the hidden units of the first RBM.
  • This can be repeated as many times as desired to produce many layers of nonlinear feature detectors that represent progressively more complex statistical structure in the data.
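The greedy layer-wise stacking can be sketched as a loop in which each layer's hidden activations become the "data" for the next layer. The train_rbm function here is a placeholder returning fixed weights, standing in for real CD training.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def up(v, W):
    # Deterministic up-pass: hidden probabilities given visible values.
    return [sigmoid(sum(vi * W[i][j] for i, vi in enumerate(v)))
            for j in range(len(W[0]))]

def train_rbm(data, n_hidden):
    # Placeholder: pretend CD training produced small constant weights.
    return [[0.1] * n_hidden for _ in range(len(data[0]))]

data = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]  # toy "training set"
layers = []
for n_hidden in (4, 2):                    # two stacked RBMs
    W = train_rbm(data, n_hidden)
    layers.append(W)
    data = [up(v, W) for v in data]        # activations feed the next RBM
```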
24
Page 25
Generative Pretraining
• Stacking RBMs to Make a Deep Belief Network (Cont'd)

From: The paper
Page 26
Generative Pretraining
• Interfacing a DNN with an HMM
  • In an HMM framework, the hidden variables denote the states of the phone sequence, and the "visible" variables denote the feature vectors. [*]

[*] Added by the presenter.

From: Gales, Mark, and Steve Young. "The application of hidden Markov models in speech recognition." Foundations and Trends in Signal Processing 1.3 (2008): 195-304.
Page 27
Generative Pretraining
• Interfacing a DNN with an HMM (Cont'd)
  • To compute a Viterbi alignment or to run the forward-backward algorithm within the HMM framework, we require the likelihood p(AcousticInput | HMM state).
  • A DNN, however, outputs probabilities of the form p(HMM state | AcousticInput).
27
Page 28
Generative Pretraining
• Interfacing a DNN with an HMM (Cont'd)
  • The posterior probabilities that the DNN outputs can be converted into scaled likelihoods by dividing them by the frequencies of the HMM states in the forced alignment that is used for fine-tuning the DNN.
  • Forced alignment is a procedure used to generate labels for the training process. [*]

[*] Added by the presenter.
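In the log domain, dividing the posteriors by the state frequencies becomes a subtraction of log priors; a minimal sketch with made-up numbers for two states.

```python
import math

def scaled_log_likelihoods(log_posteriors, state_priors, eps=1e-12):
    # log p(acoustics|state) + const = log p(state|acoustics) - log p(state);
    # the priors are state frequencies counted from the forced alignment.
    return [lp - math.log(prior + eps)
            for lp, prior in zip(log_posteriors, state_priors)]

# A frequent state (prior 0.9) and a rare state (prior 0.1), toy posteriors.
ll = scaled_log_likelihoods([math.log(0.6), math.log(0.4)], [0.9, 0.1])
```

Note how the rare state's score is boosted relative to its raw posterior, which is the point of the correction when training labels are unbalanced.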
28
Page 29
Generative Pretraining
• Interfacing a DNN with an HMM (Cont'd)
  • All of the likelihoods produced in this way are scaled by the same unknown factor of p(AcousticInput).
  • Although this appears to have little effect on some recognition tasks, it can be important for tasks where training labels are highly unbalanced.
29
Page 30
Content
• Speech Recognition System
• GMM-HMM Model
• Training Deep Neural Networks
• Generative Pretraining
• Experiments
• Discussion
30
Page 31
Experiments
• Phonetic Classification and Recognition on TIMIT
  • The TIMIT dataset is a relatively small dataset that provides a simple and convenient way of testing new approaches to speech recognition.
31
Page 32
Experiments
• Phonetic Classification and Recognition on TIMIT (Cont'd)

From: The paper
Page 33
Experiments
• Bing-Voice-Search Speech Recognition Task
  • This task used 24 h of training data with a high degree of acoustic variability caused by noise, music, side-speech, accents, sloppy pronunciation, etc.
  • The best DNN-HMM acoustic model achieved a sentence accuracy of 69.6% on the test set, compared with 63.8% for a strong, minimum phone error (MPE)-trained GMM-HMM baseline.
33
Page 34
Experiments
• Bing-Voice-Search Speech Recognition Task (Cont'd)

From: The paper
Page 35
Experiments
• Other Large Vocabulary Tasks
  • Switchboard Speech Recognition Task (a corpus containing over 300 h of training data)
  • Google Voice Input Speech Recognition Task
  • YouTube Speech Recognition Task
  • English Broadcast News Speech Recognition Task
35
Page 36
Experiments
• Other Large Vocabulary Tasks (Cont'd)

From: The paper
Page 37
Content
• Speech Recognition System
• GMM-HMM Model
• Training Deep Neural Networks
• Generative Pretraining
• Experiments
• Discussion
37
Page 38
Discussion
• Convolutional DNNs for Phone Classification and Recognition
  • Although convolutional models along the temporal dimension achieved good classification results on the TIMIT corpus, applying them to phone recognition is not straightforward.
  • This is because temporal variations in speech can already be partially handled by the dynamic programming procedure in the HMM component and by hidden trajectory models.
38
Page 39
Discussion
• Speeding Up DNNs at Recognition Time
  • The time that a DNN-HMM system requires to recognize 1 s of speech can be reduced from 1.6 s to 210 ms, without decreasing recognition accuracy, by quantizing the weights down to 8 bits on a CPU.
  • Alternatively, it can be reduced to 66 ms by using a graphics processing unit (GPU).
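One simple uniform 8-bit quantization scheme is sketched below; the paper does not specify this exact recipe, so the range-based scale/offset encoding here is an illustrative assumption.

```python
def quantize8(weights):
    # Map floats to 256 integer levels over the observed range; keep the
    # offset and scale so the weights can be approximately recovered.
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255.0 or 1.0  # guard against a constant weight list
    q = [round((w - lo) / scale) for w in weights]
    return q, lo, scale

def dequantize8(q, lo, scale):
    return [lo + qi * scale for qi in q]

q, lo, scale = quantize8([-1.0, 0.0, 0.5, 1.0])
approx = dequantize8(q, lo, scale)
```

Each weight now needs one byte instead of four, and the rounding error is at most half a quantization step.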
39
Page 40
Discussion
• Alternative Pretraining Methods for DNNs
  • It is possible to learn a DNN by starting with a shallow neural net with a single hidden layer. Once this net has been trained discriminatively, a second hidden layer is interposed between the first hidden layer and the softmax output units, and the whole network is again discriminatively trained. This can be continued until the desired number of hidden layers is reached, after which full backpropagation fine-tuning is applied.
40
Page 41
Discussion
• Alternative Pretraining Methods for DNNs (Cont'd)
  • Purely discriminative training of the whole DNN from random initial weights works well, too.
  • Various types of autoencoder with one hidden layer can also be used in the layer-by-layer generative pretraining process.
41
Page 42
Discussion
• Alternative Fine-Tuning Methods for DNNs
  • Most DBN-DNN acoustic models are fine-tuned by applying stochastic gradient descent with momentum to small minibatches of training cases.
  • More sophisticated optimization methods can be used, but it is not clear that they are worthwhile, since the fine-tuning process is typically stopped early to prevent overfitting.
42
Page 43
Discussion
• Using DBN-DNNs to Provide Input Features for GMM-HMM Systems
  • This class of methods uses neural networks to provide the feature vectors for the training process of the GMM in a GMM-HMM system.
  • The most common approach is to train a randomly initialized neural net with a narrow bottleneck middle layer and to use the activations of the bottleneck hidden units as features.
43
Page 44
Discussion
• Using DNNs to Estimate Articulatory Features for Detection-Based Speech Recognition
  • DBN-DNNs are effective for detecting subphonetic speech attributes (also known as phonological or articulatory features).
44
Page 45
Discussion
• Summary
  • Most of the gain comes from using DNNs to exploit information in neighboring frames and from modeling tied context-dependent states.
  • There is no reason to believe that the optimal types of hidden units or the optimal network architectures are being used, and it is highly likely that both the pretraining and fine-tuning algorithms can be modified to reduce the amount of overfitting and the amount of computation.
45
Page 47
Investigation of Speech Separation as a Front-End for
Noise Robust Speech Recognition
Narayanan, Arun, and DeLiang Wang. "Investigation of speech separation as a front-end for noise robust speech recognition." IEEE/ACM Transactions on Audio, Speech, and Language Processing 22.4 (2014): 826-835.
Presented by Peidong Wang, 04/04/2016
47
Page 48
Content
• Introduction
• System Description
• Evaluation Results
• Discussion
48
Page 49
Content
• Introduction
• System Description
• Evaluation Results
• Discussion
49
Page 50
Introduction
• Background
  • Although automatic speech recognition (ASR) systems have become fairly powerful, the inherent variability of speech can still pose challenges.
  • Typically, ASR systems that work well in clean conditions suffer a drastic loss of performance in the presence of noise.
50
Page 51
Introduction
• Feature-Based Methods
  • This class of methods focuses on feature extraction or feature normalization.
  • Feature-based techniques have the potential to generalize well, but do not always produce the best results.
51
Page 52
Introduction
• Two Groups of Feature-Based Methods
  • When stereo [*] data is unavailable, prior knowledge about speech and/or noise is used, as in spectral-reconstruction-based missing feature methods, direct masking methods, and feature enhancement methods.
  • When stereo data is available, feature mapping methods and recurrent neural networks have been used.

[*] By stereo we mean noisy and the corresponding clean signals.
52
Page 53
Introduction
• Model-Based Methods
  • The ASR model parameters are adapted to match the distribution of noisy or enhanced features.
  • Model-based methods work well when the underlying assumptions are met, but typically involve significant computational overhead.
  • The best performances are usually obtained by combining feature-based and model-based methods.
53
Page 54
Introduction
• Supervised Classification Based Speech Separation
  • Stereo training data is also used by supervised classification based speech separation algorithms.
  • Such algorithms typically estimate the ideal binary mask (IBM), a binary mask defined in the time-frequency (T-F) domain that identifies speech-dominant and noise-dominant T-F units.
  • The above method can be extended to the ideal ratio mask (IRM), which represents the ratio of speech to mixture energy.
54
Page 55
Content
• Introduction
• System Description
• Evaluation Results
• Discussion
55
Page 56
System Description
• Block Diagram of the Proposed System

From: The paper
Page 57
System Description
• Addressing Additive Noise and Convolutional Distortion
  • The additive noise and the convolutional distortion are dealt with in two separate stages: noise removal followed by channel compensation.
  • Noise is removed via T-F masking using the IRM. To compensate for channel mismatch and the errors introduced by masking, a non-linear mapping function that undoes these distortions is learned.
57
Page 58
System Description
• Time-Frequency Masking
58
Page 59
System Description
• Time-Frequency Masking (Cont'd)
  • Here the authors perform T-F masking in the mel-frequency domain, unlike some of the other systems that operate in the gammatone feature domain.
  • To obtain the mel-spectrogram of a signal, it is first pre-emphasized and transformed to the linear frequency domain using a 320-point fast Fourier transform (FFT) with a 20 ms Hamming window. The 161-dimensional spectrogram is then converted to a 26-channel mel-spectrogram.
59
Page 60
System Description
• Time-Frequency Masking (Cont'd)
  • The authors use DNNs to estimate the IRM, as DNNs show good performance and training using stochastic gradient descent scales well compared to other nonlinear discriminative classifiers.
60
Page 61
System Description
• Time-Frequency Masking (Cont'd)
  • Target Signal
    • The ideal ratio mask is defined as the ratio of the clean signal energy to the mixture energy at each time-frequency unit.
    • The mathematical expression is shown below.

  IRM(t,f) = \frac{10^{SNR(t,f)/10}}{10^{SNR(t,f)/10} + 1}

  SNR(t,f) = 10 \log_{10}(X(t,f) / N(t,f))
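The IRM definition translates directly into code; the speech and noise energy values below are made up for the example.

```python
import math

def snr_db(speech_energy, noise_energy, eps=1e-12):
    # SNR(t,f) = 10 log10(X(t,f) / N(t,f)); eps avoids division by zero.
    return 10.0 * math.log10(speech_energy / (noise_energy + eps))

def irm(snr_db_value):
    # IRM(t,f) = 10^(SNR/10) / (10^(SNR/10) + 1)
    r = 10.0 ** (snr_db_value / 10.0)
    return r / (r + 1.0)

# Speech energy 4x the noise energy in one T-F unit -> mask value 0.8.
m = irm(snr_db(4.0, 1.0))
```

Note that the IRM is simply the fraction of the unit's energy attributed to speech, so it always lies in (0, 1).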
61
Page 62
System Description
• Time-Frequency Masking (Cont'd)
  • Target Signal
    • Rather than estimating the IRM directly, the authors estimate a transformed version of the SNR.
    • The mathematical expression of the sigmoidal transformation is shown below.

  d(t,f) = \frac{1}{1 + \exp(-\alpha (SNR(t,f) - \beta))}
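A sketch of the sigmoidal transform and its inverse, which maps values back from the (0, 1) range; the α and β values here are placeholders, not the paper's settings.

```python
import math

def snr_to_target(snr_db, alpha=1.0, beta=0.0):
    # d(t,f) = 1 / (1 + exp(-alpha * (SNR - beta)))
    return 1.0 / (1.0 + math.exp(-alpha * (snr_db - beta)))

def target_to_snr(d, alpha=1.0, beta=0.0):
    # Inverse sigmoid: recovers the SNR (and hence the IRM) at test time.
    return beta - math.log(1.0 / d - 1.0) / alpha

t = snr_to_target(2.0)     # 2 dB SNR squashed into (0, 1)
back = target_to_snr(t)    # round-trips to the original SNR
```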
62
Page 63
System Description
• Time-Frequency Masking (Cont'd)
  • Target Signal
    • During testing, the values output by the DNN are mapped back to their corresponding IRM values.
63
Page 64
System Description
• Time-Frequency Masking (Cont'd)
  • Features
    • Feature extraction is performed both at the fullband and the subband level.
    • A combination of features is used: 31-dimensional MFCCs, 13-dimensional RASTA-filtered PLPs, and 15-dimensional amplitude modulation spectrogram (AMS) features.
64
Page 65
System Description
• Time-Frequency Masking (Cont'd)
  • Features
    • The fullband features are derived by splicing together fullband MFCCs and RASTA-PLPs, along with their delta and acceleration components, and subband AMS features.
    • The subband features are derived by splicing together subband MFCCs, RASTA-PLPs, and AMS features. Some auxiliary components are also added.
65
Page 66
System Description
• Time-Frequency Masking (Cont'd)
  • Supervised Learning
    • IRM estimation is performed in two stages. In the first stage, multiple DNNs are trained using fullband and subband features. The final estimate is obtained using an MLP that combines the outputs of the fullband and the subband DNNs.
66
Page 67
System Description
• Time-Frequency Masking (Cont'd)
  • Supervised Learning
    • The fullband DNNs would be cognizant of the overall spectral shape of the IRM and the information conveyed by the fullband features, whereas the subband DNNs are expected to be more robust to noise occurring at frequencies outside their passband.
67
Page 68
System Description
• Time-Frequency Masking (Cont'd)

From: The paper
Page 69
System Description
• Feature Mapping
69
Page 70
System Description
• Feature Mapping (Cont'd)
  • Even after T-F masking, channel mismatch can still significantly impact performance.
  • This happens for two reasons. Firstly, the algorithm learns to estimate the ratio mask using mixtures of speech and noise recorded with a single microphone. Secondly, because channel mismatch is convolutional, speech and noise, which now includes both background noise and convolutive noise, are clearly not uncorrelated.
70
Page 71
System Description
• Feature Mapping (Cont'd)
  • The goal of feature mapping in this work is to learn the spectro-temporal correlations that exist in speech in order to undo the distortions introduced by unseen microphones and by the first stage of the algorithm.
71
Page 72
System Description
• Feature Mapping (Cont'd)
  • Target Signal
    • The target is the clean log-mel spectrogram (LMS). The "clean" LMS here corresponds to that obtained from the clean signals recorded using a single microphone in a single filter setting.
72
Page 73
System Description
• Feature Mapping (Cont'd)
  • Target Signal
    • Instead of using the LMS directly as the target, the authors apply a linear transform to limit the target values to the range [0,1], so that the sigmoidal transfer function can be used for the output layer of the DNN.
    • The mathematical expression is as follows.

  X_d(t,f) = \frac{\ln(X(t,f)) - \min(\ln(X(\cdot,f)))}{\max(\ln(X(\cdot,f))) - \min(\ln(X(\cdot,f)))}
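The per-channel min-max normalization and its inverse (used at test time to restore the training-set dynamic range) can be sketched as follows; the input here is assumed to be already-log values for one frequency channel.

```python
def normalize_lms(lms_column):
    # X_d = (lnX - min lnX(.,f)) / (max lnX(.,f) - min lnX(.,f)),
    # applied per frequency channel over time.
    lo, hi = min(lms_column), max(lms_column)
    return [(x - lo) / (hi - lo) for x in lms_column], lo, hi

def denormalize_lms(norm, lo, hi):
    # Maps DNN outputs back to the training-set dynamic range.
    return [lo + n * (hi - lo) for n in norm]

norm, lo, hi = normalize_lms([-3.0, 0.0, 2.0])  # toy log-mel values
back = denormalize_lms(norm, lo, hi)            # round-trips the inputs
```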
73
Page 74
System Description
• Feature Mapping (Cont'd)
  • Target Signal
    • During testing, the output of the DNN is mapped back to the dynamic range of the utterances in the training set.
74
Page 75
System Description
• Feature Mapping (Cont'd)
  • Features
    • The authors use both the noisy and the masked LMS.
  • Supervised Learning
    • Unlike the DNNs used for IRM estimation, the hidden layers of the DNN for this task use rectified linear units (ReLUs). In addition, the output layer uses sigmoid activations.
75
Page 76
System Description
• Feature Mapping (Cont'd)

From: The paper
Page 77
System Description
• Acoustic Modeling
77
Page 78
System Description
• Acoustic Modeling (Cont'd)
  • The acoustic models are trained using the Aurora-4 dataset.
  • Aurora-4 is a 5000-word closed vocabulary recognition task based on the Wall Street Journal database. The corpus has two training sets, clean and multi-condition, both with 7138 utterances.
78
Page 79
System Description
• Acoustic Modeling (Cont'd)
  • Gaussian Mixture Models
    • The HMMs and the GMMs are initially trained using the clean training set. The clean models are then used to initialize the multi-condition models; both clean and multi-condition models have the same structure and differ only in transition and observation probability densities.
79
Page 80
System Description
• Acoustic Modeling (Cont'd)
  • Deep Neural Networks
    • The authors first align the clean training set to obtain senone labels at each time frame for all utterances in the training set. DNNs are then trained to predict the posterior probability of senones using either clean features or features extracted from the multi-condition set.
80
Page 81
System Description
• Diagonal Feature Discriminant Linear Regression
81
Page 82
System Description
• Diagonal Feature Discriminant Linear Regression (Cont'd)
  • dFDLR is a semi-supervised feature adaptation technique.
  • The motivation for developing dFDLR is to address the problem of generalization to unseen microphone conditions in the dataset, which is where the DNN-HMM systems perform the worst.
82
Page 83
System Description
• Diagonal Feature Discriminant Linear Regression (Cont'd)
  • To apply dFDLR, we first obtain an initial senone-level labeling for the test utterances using the unadapted models. Features are then transformed to minimize the cross-entropy error in predicting these labels.
  • The mathematical expressions are as follows.

  \hat{O}_t(f) = w_f \cdot O_t(f) + b_f

  \min \sum_t E(s_t, D_{out}(\hat{O}_{t-5} \ldots \hat{O}_{t+5}))
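The dFDLR transform itself is just a per-dimension affine map of each feature frame; a minimal sketch (starting from w = 1, b = 0, i.e. the identity, is a natural initialization before backpropagation adapts the parameters).

```python
def dfdlr_transform(frame, w, b):
    # Per-dimension affine transform: O_hat(f) = w_f * O(f) + b_f.
    return [wf * of + bf for wf, of, bf in zip(w, frame, b)]

frame = [0.5, -1.0, 2.0]                                   # one toy frame
identity = dfdlr_transform(frame, [1.0] * 3, [0.0] * 3)    # unadapted
adapted = dfdlr_transform(frame, [0.9, 1.1, 1.0], [0.1, 0.0, -0.2])
```

Because each dimension has only one scale and one offset, very few parameters need to be learned per utterance, which is what makes the adaptation practical at test time.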
83
Page 84
System Description
• Diagonal Feature Discriminant Linear Regression (Cont'd)
  • The parameters can easily be learned within the DNN framework by adding a layer between the input layer and the first hidden layer of the original DNN. After initialization, the standard backpropagation algorithm is run for 10 epochs to learn the parameters of the dFDLR model. During backpropagation, the weights of the original hidden layers are kept unchanged and only the parameters of the dFDLR layer are updated.
84
Page 85
Content
• Introduction
• System Description
• Evaluation Results
• Discussion
85
Page 86
Evaluation Results

From: The paper
Page 87
Evaluation Results

From: The paper
Page 88
Content
• Introduction
• System Description
• Evaluation Results
• Discussion
88
Page 89
Discussion
• Several interesting observations can be made from the results presented in the previous section.
• Firstly, the results clearly show that the speech separation front-end does a good job of removing noise and handling channel mismatch.
• Secondly, with no channel mismatch, T-F masking alone works well in removing noise.
89
Page 90
Discussion
• Finally, directly performing feature mapping from noisy features to clean features performs reasonably, but not as well as the proposed front-end.
90