Perceptual Evaluation of Singing Quality (PESnQ) · 2020. 12. 13.

Chitralekha Gupta1,2, Haizhou Li3, and Ye Wang1

[email protected], [email protected], [email protected]

2 NUS Graduate School for Integrative Sciences and Engineering, 3 Department of Electrical and Computer Engineering, National University of Singapore

1. Introduction
•  Singing pedagogy is dependent on human music experts, and is not always accessible to the masses
•  A perceptually-valid automatic singing evaluation score could serve as a complement to singing lessons, and make singing training more accessible to learners

7. Conclusions
•  We propose perceptually relevant features to objectively evaluate singing quality
•  We adopt the cognitive modeling theory of PESQ to design a PESnQ score which performs better than distance features
•  PESnQ shows 96% improvement over baseline scores in correlating with the music-expert human judges

5. PESnQ Formulation

Sound & Music Computing Lab, http://www.smcnus.org/

2. How do experts perceptually evaluate singing quality?

Experimental Dataset
•  20 audio recordings collected from 20 singers with varied singing abilities, from professional to poor
•  Subjective evaluation for singing quality by 5 professionally trained musicians; inter-judge agreement was 0.82

[Figure panels: Reference, Good, Poor]

[Figure: PESnQ block diagram. Blocks and labels: Test signal, Reference signal, Disturbance Features Computation, Cognitive Modeling, PESnQ score]

3. Objective Characterization of Singing Quality

Rhythm Consistency
•  Use DTW of MFCC vectors between frame-equalized reference and test. A uniformly faster or slower tempo should not be penalized

[Figure panels: Reference vs. Good, Reference vs. Poor]

Intonation Accuracy
•  Compare post-processed pitch contours from rhythm-aligned reference and test
•  Key transposition should be allowed → pitch derivative, and median-subtracted pitch

Appropriate Vibrato
•  Vibrato oscillations: Rate: 5-8 Hz; Extent: 30-150 cents
•  Features: vibrato likeliness, rate, extent

Voice Quality and Pronunciation

Pitch Dynamic Range
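As a rough illustration of the pitch-contour features in this section, the sketch below converts a voiced pitch contour to cents, derives the two transposition-tolerant representations (median-subtracted pitch and pitch derivative), measures the min-to-max pitch range, and estimates vibrato rate and extent from the strongest spectral peak of a contour segment. The function names, the 440 Hz reference, the FFT-based vibrato estimate, and the reading of "extent" as peak deviation from the mean are assumptions made for illustration and are not given on the poster; vibrato likeliness is not covered.

```python
import numpy as np


def hz_to_cents(f0_hz, ref_hz=440.0):
    """Convert a voiced pitch contour from Hz to cents (ref_hz is an assumed reference)."""
    return 1200.0 * np.log2(np.asarray(f0_hz, dtype=float) / ref_hz)


def key_invariant_pitch(f0_hz):
    """Return (median-subtracted pitch, frame-to-frame pitch derivative), both in cents.

    A key transposition adds a constant offset in cents, which both
    representations cancel.
    """
    cents = hz_to_cents(f0_hz)
    return cents - np.median(cents), np.diff(cents)


def pitch_dynamic_range(f0_hz):
    """Difference between the maximum and minimum pitch values, in cents."""
    cents = hz_to_cents(f0_hz)
    return float(cents.max() - cents.min())


def vibrato_rate_extent(cents, frame_rate_hz):
    """Estimate vibrato rate (Hz) and extent (cents) of a contour segment.

    Assumption: the vibrato is the strongest non-DC spectral component of the
    mean-removed contour, and extent is the amplitude of that component.
    """
    x = np.asarray(cents, dtype=float)
    x = x - x.mean()
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / frame_rate_hz)
    k = int(np.argmax(spectrum[1:])) + 1  # skip the DC bin
    rate = float(freqs[k])
    extent = float(2.0 * spectrum[k] / len(x))
    return rate, extent


# Toy melody and the same melody transposed up three semitones.
melody_hz = np.array([220.0, 247.0, 262.0, 294.0, 262.0, 247.0, 220.0])
transposed_hz = melody_hz * 2.0 ** (3.0 / 12.0)
ref_centered, ref_delta = key_invariant_pitch(melody_hz)
test_centered, test_delta = key_invariant_pitch(transposed_hz)

# Synthetic vibrato: 6 Hz oscillation of 50 cents, sampled at 100 frames/s for 1 s.
t = np.arange(100) / 100.0
vib_rate, vib_extent = vibrato_rate_extent(50.0 * np.sin(2 * np.pi * 6.0 * t), 100.0)
in_typical_range = (5.0 <= vib_rate <= 8.0) and (30.0 <= vib_extent <= 150.0)
```

A test contour that is a pure key transposition of the reference yields the same median-subtracted and derivative contours as the reference, so only genuine intonation differences remain to be compared.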

4. PESQ-based Feature Modeling

Combine frame-disturbances of these features with cognitive modeling inspired by the telecommunication standard PESQ [Rix2001]: a localized error in time has a larger subjective impact than a distributed error

•  Localized error: L6-norm over split-second intervals (320 ms)
•  Distributed error: L2-norm over all split-second intervals
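The two-stage aggregation above can be sketched as follows. This is a minimal sketch, assuming a 10 ms frame hop, non-overlapping 320 ms intervals, and mean-normalized norms; none of these choices is specified on the poster, and `pesq_style_aggregate` is a hypothetical name.

```python
import numpy as np


def pesq_style_aggregate(frame_disturbance, frame_hop_ms=10.0, interval_ms=320.0):
    """Aggregate per-frame disturbances into one score.

    Stage 1: L6-norm within each split-second interval (emphasizes bursts).
    Stage 2: L2-norm across all intervals.
    Assumptions: non-overlapping intervals, norms normalized by element count.
    """
    d = np.abs(np.asarray(frame_disturbance, dtype=float))
    if d.size == 0:
        raise ValueError("frame_disturbance must not be empty")
    frames_per_interval = max(1, int(round(interval_ms / frame_hop_ms)))
    # Split into consecutive intervals; a shorter trailing interval is kept.
    intervals = [d[i:i + frames_per_interval]
                 for i in range(0, len(d), frames_per_interval)]
    l6_per_interval = np.array(
        [np.mean(seg ** 6) ** (1.0 / 6.0) for seg in intervals]
    )
    return float(np.sqrt(np.mean(l6_per_interval ** 2)))


# Two toy disturbance tracks with the same summed disturbance (32.0):
distributed = np.full(320, 0.1)   # error spread evenly over all frames
localized = np.zeros(320)
localized[5] = 32.0               # the same total error in a single frame

score_distributed = pesq_style_aggregate(distributed)
score_localized = pesq_style_aggregate(localized)
```

With the same summed disturbance, the single burst receives a higher score than the evenly spread error, which is the behaviour the PESQ-inspired principle calls for.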

System | Description
Baselines | Pitch distance [Tsai2012], pitch-aligned rhythm distance [Molina2013], volume distance [Chang2007, Tsai2012]
PESnQ systems | Combinations of L2-norm, L6+L2-norm and distance features for the various MFCC-aligned perceptual features

6. Results

System | Correlation of objective score with avg. overall human score | Leave-one-judge-out avg. correlation score
Human Judge | – | 0.87
Baseline | 0.30 | 0.38
PESnQ | 0.59 | 0.66
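The table can be cross-checked with a little arithmetic, and the leave-one-judge-out idea can be sketched for the human-judge row. The sketch below assumes Pearson correlation and assumes that each held-out judge is compared with the mean score of the remaining judges; the poster does not spell out either detail. The judge matrix is toy data for illustration only, not the study's ratings.

```python
import numpy as np


def relative_improvement(baseline, proposed):
    """Relative gain of the proposed correlation over the baseline correlation."""
    return (proposed - baseline) / baseline


def leave_one_judge_out(scores):
    """Average correlation of each judge with the mean of the other judges.

    scores: array of shape (n_judges, n_singers).
    Assumption: Pearson correlation, held-out judge vs. mean of the rest.
    """
    scores = np.asarray(scores, dtype=float)
    corrs = []
    for j in range(scores.shape[0]):
        others_mean = np.delete(scores, j, axis=0).mean(axis=0)
        corrs.append(np.corrcoef(scores[j], others_mean)[0, 1])
    return float(np.mean(corrs))


# Values taken from the results table.
gain_overall = relative_improvement(0.30, 0.59)        # about 0.967
gain_leave_one_out = relative_improvement(0.38, 0.66)  # about 0.737

# Toy ratings (3 judges x 4 singers) that agree perfectly up to scale and offset.
toy_scores = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [1.5, 2.5, 3.5, 4.5],
])
toy_agreement = leave_one_judge_out(toy_scores)
```

The first figure, (0.59 - 0.30) / 0.30, is roughly 0.967, which is in line with the 96% improvement quoted in the conclusions.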

•  Rhythm Consistency
•  Intonation Accuracy
•  Appropriate Vibrato
•  Voice Quality
•  Pitch Dynamic Range

[Figure labels: Baseline, PESnQ, Regression]

•  DTW distance between MFCC feature vectors

•  Comparison of difference between min and max pitch values

Disturbance Features
•  Frame-level deviation of the optimal path from the diagonal in DTW for rhythm and intonation features
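The DTW-based quantities above can be sketched as follows: a textbook dynamic time warping between two feature sequences, followed by the per-step deviation of the optimal path from the diagonal. This is a minimal sketch, assuming Euclidean frame distances, the basic three-direction step pattern, and equal-length (frame-equalized) inputs for the deviation measure. MFCC or pitch extraction is not shown; any (frames x dims) array can stand in for the features, and the function names are hypothetical.

```python
import numpy as np


def dtw_path(ref, test):
    """Plain DTW between ref (n x d) and test (m x d).

    Returns (total alignment cost, optimal path as a list of (i, j) pairs).
    """
    ref = np.asarray(ref, dtype=float)
    test = np.asarray(test, dtype=float)
    n, m = len(ref), len(test)
    # Pairwise Euclidean distance between every ref frame and every test frame.
    cost = np.linalg.norm(ref[:, None, :] - test[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )
    # Backtrack from the end to the start along the cheapest predecessors.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [
            (acc[i - 1, j - 1], i - 1, j - 1),
            (acc[i - 1, j], i - 1, j),
            (acc[i, j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    path.reverse()
    return float(acc[n, m]), path


def diagonal_deviation(path):
    """Per-step distance |i - j| of the warping path from the diagonal.

    Assumes both sequences were equalized to the same number of frames.
    """
    return np.array([abs(i - j) for i, j in path])


# Toy one-dimensional "features": the test lingers on the first value.
ref_feat = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0]).reshape(-1, 1)
test_feat = np.array([0.0, 0.0, 0.0, 1.0, 3.0, 5.0]).reshape(-1, 1)

same_cost, same_path = dtw_path(ref_feat, ref_feat)
warp_cost, warp_path = dtw_path(ref_feat, test_feat)
warp_deviation = diagonal_deviation(warp_path)
```

An identical rendition stays on the diagonal with zero deviation, while a rendition with local timing errors pulls the path away from it; the frame-level deviation values can then be aggregated into a disturbance score.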
