CENTER FOR SPOKEN LANGUAGE UNDERSTANDING 1 PREDICTION AND SYNTHESIS OF PROSODIC EFFECTS ON SPECTRAL BALANCE OF VOWELS Jan P.H. van Santen and Xiaochuan Niu Center for Spoken Language Understanding OGI School of Science & Technology at OHSU

Dec 17, 2015

Page 1:

PREDICTION AND SYNTHESIS OF PROSODIC EFFECTS ON SPECTRAL BALANCE OF VOWELS

Jan P.H. van Santen and Xiaochuan Niu

Center for Spoken Language Understanding
OGI School of Science & Technology at OHSU

Page 2:

OVERVIEW

1. IMPORTANCE OF SPECTRAL BALANCE
2. MEASUREMENT OF SPECTRAL BALANCE
3. ANALYSIS METHODS
4. RESULTS
5. SYNTHESIS
6. CONCLUSIONS

Page 3:

1. IMPORTANCE OF SPECTRAL BALANCE

• Linguistic Control Factors
– Stress-like factors
– Positional factors
– Phonemic factors

• Acoustic Correlates
– Traditionally TTS-controlled:
  • Pitch, timing, amplitude
– Demonstrated in natural speech, but usually not TTS-controlled:
  • Spectral tilt, balance
  • Formant dynamics
  • …

Page 4:

2. MEASUREMENT OF SPECTRAL BALANCE

• Data:
– 472 greedily selected sentences
  • Genre: newspaper
  • Greedy features: linguistic control factors
– One female speaker
– Manual segmentation
– Accent: independent rating by 3 judges
  • 0-3 score

Page 5:

2. MEASUREMENT OF SPECTRAL BALANCE

• Energy in 5 formant-range frequency bands
– B0: 100-300 Hz [~F0]
– B1: 300-800 Hz [~F1]
– B2: 800-2500 Hz [~F2]
– B3: 2500-3500 Hz [~F3]
– B4: 3500-max Hz [~fricative noise]

• In other words, a multidimensional measure
• Filter bank → Square → Average [1 ms rect.] → 20 log10(B_i)
• Subtract estimated per-utterance means
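The measurement chain above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the filter bank is replaced by summing DFT bins per band, "max" is taken to mean the Nyquist frequency, and the function names are hypothetical. The 20 log10 scaling is applied to the averaged squared signal exactly as the slide lists it.

```python
import cmath
import math

# Band edges in Hz as listed above; None stands for the slide's "max",
# assumed here to mean everything up to the Nyquist frequency.
BAND_EDGES_HZ = [(100, 300), (300, 800), (800, 2500), (2500, 3500), (3500, None)]


def band_levels_db(frame, sample_rate):
    """Return one level per band (B0..B4) for a single frame of samples."""
    n = len(frame)
    # Naive DFT over non-negative frequencies; adequate for short frames.
    # Assumption: DFT bin sums stand in for the slide's filter bank.
    spectrum = [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n // 2 + 1)
    ]
    levels = []
    for lo, hi in BAND_EDGES_HZ:
        hi = float("inf") if hi is None else hi
        energy = 0.0
        for k, value in enumerate(spectrum):
            if lo <= k * sample_rate / n < hi:
                # Count mirrored negative-frequency bins too (Parseval).
                weight = 1 if k == 0 or (n % 2 == 0 and k == n // 2) else 2
                energy += weight * abs(value) ** 2
        mean_square = energy / (n * n)  # "Square" then "Average"
        # Small floor avoids log10(0) for empty bands.
        levels.append(20 * math.log10(max(mean_square, 1e-12)))
    return levels


def subtract_utterance_means(frame_levels):
    """Subtract, per band, the mean level over all frames of one utterance."""
    count = len(frame_levels)
    bands = len(frame_levels[0])
    means = [sum(f[i] for f in frame_levels) / count for i in range(bands)]
    return [[f[i] - means[i] for i in range(bands)] for f in frame_levels]
```

For example, a 500 Hz sine sampled at 8 kHz puts essentially all of its energy in B1, so `band_levels_db` returns its largest value at index 1.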

Page 6:

2. MEASUREMENT OF SPECTRAL BALANCE

• Details:
– Confounding with F0
  • Measure pitch-corrected and raw
    – For certain wave shapes, pitch directly related to fixed-frame energy
    – Why do both: wave shapes may change in unknown ways
  • F0 not confined to B0 [female speech]
– Vowel formants not quite confined to bands [e.g., F1 for /EE/ and F3 for /ER/]

Page 7:

2. MEASUREMENT OF SPECTRAL BALANCE

• Why not more or different bands?
– Multiple interacting Linguistic Control Factors
  • Need measurements that minimize interactions
– 5 bands → different vowels “behave similarly”
  • Can model vowels as a class

• Why not simply spectral tilt?
– 5 bands carry more information than a single measure
– Supply more information for synthesis

Page 8:

3. ANALYSIS METHODS

• Measures likely to behave like segmental duration:
– Multiple interacting, confounded factors:
  • Interaction: magnitude of the effects of one factor may depend on other factors
  • Confounding: unequal frequencies of control factor combinations
– “Directional Invariance”
  • Direction of the effects of one factor is independent of other factors

Page 9:

3. ANALYSIS METHODS

• Need method that
– can handle multiple interacting, confounded factors and
– takes advantage of Directional Invariance

• Used: Sums of Products Model:

$$B_i(c_0, \ldots, c_n) = \sum_{k \in K} \prod_{j \in I_k} S_{k,j}(c_j)$$

Page 10:

3. ANALYSIS METHODS

• Special cases:
– Multiplicative model: K = {1}, I_1 = {0,…,n}

$$B_i(c_0, \ldots, c_n) = S_{1,0}(c_0) \times \cdots \times S_{1,n}(c_n)$$

– Additive model: K = {0,…,n}, I_k = {k}

$$B_i(c_0, \ldots, c_n) = S_{0,0}(c_0) + \cdots + S_{n,n}(c_n)$$
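To make the index sets concrete, here is a minimal Python sketch that evaluates a sums-of-products model and recovers both special cases. The factor names, levels, and parameter values are invented for illustration only; they are not results from the study.

```python
from math import prod


def sums_of_products(K, I, S, c):
    """Sum over k in K of the product over j in I[k] of S[(k, j)][c[j]].

    K: iterable of term indices
    I: dict mapping each term index k to the factor indices in that term
    S: dict mapping (k, j) to a table {factor level: parameter value}
    c: dict mapping factor index j to the level of that factor
    """
    return sum(prod(S[(k, j)][c[j]] for j in I[k]) for k in K)


# Hypothetical two-factor example: factor 0 = stress, factor 1 = position.
c = {0: "stressed", 1: "final"}

# Multiplicative model: K = {1}, I_1 = {0, 1} (a single product term).
S_mult = {
    (1, 0): {"stressed": 1.5, "unstressed": 1.0},
    (1, 1): {"final": 2.0, "medial": 1.0},
}
mult = sums_of_products([1], {1: [0, 1]}, S_mult, c)  # 1.5 * 2.0

# Additive model: K = {0, 1}, I_k = {k} (one single-factor term per factor).
S_add = {
    (0, 0): {"stressed": 1.5, "unstressed": 0.0},
    (1, 1): {"final": -2.0, "medial": 0.0},
}
add = sums_of_products([0, 1], {0: [0], 1: [1]}, S_add, c)  # 1.5 + (-2.0)
```

The same function covers both cases; only the index sets K and I change.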

Page 11:

3. ANALYSIS METHODS

• Used additive model

• Note: Parameter estimates are:
– Estimates of marginal means …
– … in balanced design:

$$S_{k,k}(c_k) = \operatorname{Mean}_{c_j \in C_j,\; j \neq k} \, B_i(c_0, \ldots, c_k, \ldots, c_n)$$
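A small Python sketch can illustrate the note about marginal means. It assumes a toy two-factor design with made-up effect values, and shows that the marginal means equal the additive parameters when every factor combination occurs equally often, and drift away from them when the design is unbalanced (the confounding case).

```python
from itertools import product
from statistics import mean

# Made-up additive effects, chosen to sum to zero within each factor.
stress_effect = {"stressed": 2.0, "unstressed": -2.0}
position_effect = {"initial": 1.0, "medial": 0.0, "final": -1.0}

# Balanced design: every (stress, position) combination occurs exactly once.
balanced = {
    (s, p): stress_effect[s] + position_effect[p]
    for s, p in product(stress_effect, position_effect)
}


def marginal_means(cells, factor_index):
    """Mean of the cell values for each level of one factor."""
    levels = sorted({key[factor_index] for key in cells})
    return {
        level: mean(v for key, v in cells.items() if key[factor_index] == level)
        for level in levels
    }


# In the balanced design the marginal means recover the additive parameters.
balanced_stress = marginal_means(balanced, 0)

# Removing one cell unbalances the design and biases the marginal mean.
unbalanced = {k: v for k, v in balanced.items() if k != ("unstressed", "final")}
unbalanced_stress = marginal_means(unbalanced, 0)
```

Here `balanced_stress` matches `stress_effect`, while `unbalanced_stress["unstressed"]` no longer does, which is why raw marginal means cannot be trusted when factor combinations occur with unequal frequencies.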

Page 12:

3. ANALYSIS METHODS

• Pitch correction:

$$B_i^{[c]} = 20\log_{10}(B_i) - 20\log_{10}(f_0\, t_w)$$

• Confounding with F0: Show both <B0, B1, B2, B3, B4> and <B0 + B1, B2, B3, B4>

Page 13:

4. RESULTS: (A) POSITIONAL EFFECTS

5 Bands, not pitch-corrected
Solid: right position, dashed: left position. Y-axis: corrected mean

Page 14:

4. RESULTS: (A) POSITIONAL EFFECTS

5 Bands, pitch-corrected

Page 15:

4. RESULTS: (A) POSITIONAL EFFECTS

4 Bands, not pitch-corrected

Page 16:

4. RESULTS: (A) POSITIONAL EFFECTS

4 Bands, pitch-corrected

Page 17:

4. RESULTS: (B) STRESS/ACCENT EFFECTS

5 Bands, not pitch-corrected
Solid: stressed syllable, dashed: unstressed. Y-axis: corrected mean

Page 18:

4. RESULTS: (B) STRESS/ACCENT EFFECTS

5 Bands, pitch-corrected

Page 19:

4. RESULTS: (B) STRESS/ACCENT EFFECTS

4 Bands, not pitch-corrected

Page 20:

4. RESULTS: (B) STRESS/ACCENT EFFECTS

4 Bands, pitch-corrected

Page 21:

4. RESULTS: (C) TILT EFFECTS

$$\mathrm{Tilt} = (-2, -1, 0, 1, 2) \cdot (B_0, B_1, B_2, B_3, B_4)^{T}$$
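The tilt measure on this slide is a linear contrast over the five band levels. The Python sketch below assumes the weights are (-2, -1, 0, 1, 2) applied to B0 through B4 in that order; the signs and the ordering are an assumption, and the function name is hypothetical.

```python
# Assumed contrast weights for B0..B4; signs and ordering are an assumption.
TILT_WEIGHTS = (-2, -1, 0, 1, 2)


def spectral_tilt(band_levels):
    """Weighted sum of the five band levels (a linear contrast)."""
    if len(band_levels) != len(TILT_WEIGHTS):
        raise ValueError("expected one level per band B0..B4")
    return sum(w * b for w, b in zip(TILT_WEIGHTS, band_levels))
```

Because the weights sum to zero, adding the same constant to every band (an overall level change) leaves the tilt unchanged; with these signs, a spectrum that falls off toward high frequencies gives a negative value.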

Page 22:

5. SYNTHESIS

• Use ABS/OLA sinusoidal model:
– s[n] = sum of overlapped short-time signal frames s_k[n]
– s_k[n] = sum of quasi-harmonic sinusoidal components:

$$s_k[n] = \sum_l A_{k,l} \cos(\omega_{k,l}\, n + \phi_{k,l})$$

• Each frame of a unit is represented by a set of quasi-harmonic sinusoidal parameters.

• Given the desired F0 contour, pitch shift is applied to the sinusoidal parameters of the unit to obtain the target parameters A_{k,l}.
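The frame model above can be sketched directly in Python. This is a bare illustration, not the ABS/OLA system itself: it assumes frequencies are given in radians per sample, that frames are already windowed before overlap-add, and the function names are hypothetical. Pitch shifting is not shown.

```python
import math


def synthesize_frame(amplitudes, frequencies, phases, length):
    """One frame: sum over components of A * cos(omega * n + phi).

    frequencies are in radians per sample (an assumption of this sketch).
    """
    return [
        sum(a * math.cos(w * n + p) for a, w, p in zip(amplitudes, frequencies, phases))
        for n in range(length)
    ]


def overlap_add(frames, hop):
    """Sum equal-length frames, each shifted by `hop` samples from the last.

    Frames are assumed to be windowed already; no window is applied here.
    """
    frame_length = len(frames[0])
    output = [0.0] * (hop * (len(frames) - 1) + frame_length)
    for k, frame in enumerate(frames):
        for n, value in enumerate(frame):
            output[k * hop + n] += value
    return output
```

For instance, `synthesize_frame([1.0], [0.0], [0.0], 4)` yields a constant frame of ones, and overlapping two such frames with a hop of 2 doubles the samples where they coincide.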

Page 23:

5. SYNTHESIS

• Considering the differences in prosodic factors between the original and target unit, compute the band differences:

$$\Delta_i = \hat{B}_i - B_i$$

• Transform the band differences into weights applied to the sinusoidal parameters:

$$w_i = 10^{\Delta_i / 20}$$

• $A'_{k,j} = w_i A_{k,j}$, when the j'th harmonic is located in the i'th band.

• Spectral smoothing across unit boundaries.
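The band-to-weight step can be sketched in Python as follows. It assumes the band levels are in dB, that the difference is target minus original, and that harmonics below the lowest band edge are left unchanged; the band edges come from the measurement slides, and the function names are hypothetical. Spectral smoothing across unit boundaries is not shown.

```python
# Lower edges of B0..B4 in Hz, from the measurement slides.
BAND_LOWER_EDGES_HZ = [100, 300, 800, 2500, 3500]


def band_index(freq_hz):
    """Index of the band containing freq_hz, or None below the lowest edge."""
    index = None
    for i, lower in enumerate(BAND_LOWER_EDGES_HZ):
        if freq_hz >= lower:
            index = i
    return index


def reweight_harmonics(amplitudes, freqs_hz, original_db, target_db):
    """Scale each harmonic amplitude by the weight of the band it falls in."""
    # w_i = 10 ** (delta_i / 20), with delta_i = target - original (in dB).
    weights = [10 ** ((t - o) / 20) for o, t in zip(original_db, target_db)]
    scaled = []
    for amplitude, freq in zip(amplitudes, freqs_hz):
        i = band_index(freq)
        # Assumption: harmonics outside all bands keep their amplitude.
        scaled.append(amplitude if i is None else amplitude * weights[i])
    return scaled
```

As a usage example, raising B0 by 20 dB and lowering B2 by 20 dB multiplies harmonics in B0 by 10 and those in B2 by 0.1, leaving the other bands untouched.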

Page 24:

5. SYNTHESIS

5 Bands modification example [i:]

Page 25:

6. CONCLUSIONS

• Described simple methods for predicting and synthesizing spectral balance

• But: Spectral balance is only one “non-standard acoustic correlate”

• Others that remain to be addressed:
– Spectral dynamics
– Phase