Linguistic Voice Quality Patricia Keating University of California, Los Angeles Christina Esposito Macalester College, St. Paul

Dec 14, 2015


Kaila Higley
Page 1:

Linguistic Voice Quality

Patricia Keating, University of California, Los Angeles

Christina Esposito, Macalester College, St. Paul

Page 2:

Phonation

• Production of sound by vibration of the vocal folds

• Phonation type contrasts on vowels and/or consonants

Page 3:

Ladefoged’s (simplified) glottal constriction model

• Size of glottal opening varies from that for voiceless sounds (no phonation) to zero (glottal closure)

• Phonation is possible at a variety of constrictions, but with voice quality differences

• These are the most common contrasts

Page 4:

Breathy vs. creaky glottis

(from Ladefoged)

Page 5:

3 phonations of San Lucas Quiaviní Zapotec

• Modal: ‘gets bitter’ rdaa

• Breathy: ‘gets ripe’ rah

• Creaky: ‘lets go of’ rdààà

Page 6:

A continuum like VOT

• (pre)voiced: lead VOT

• voiceless unaspirated: short lag VOT

• voiceless aspirated: long lag VOT

Page 7:

This talk

• Language differences in acoustic dimensions of phonation contrasts

• Perception of phonation contrasts

• A bit on phonation and tones

Page 8:

Spectral measures of voice quality

• Given the F0, the frequencies of all the harmonics are determined and cannot vary

• But the amplitudes of the harmonics do not depend on the F0 and can vary

• Relative amplitudes of harmonics can be readily seen in a spectrum

Page 9:

Most popular measure: H1-H2

• Relative amplitude of first two harmonics H1 and H2

• Breathy voice: strong H1

• Creaky voice: H1 weaker than H2
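As a rough illustration of the measure, the sketch below synthesizes two toy signals with only two harmonics each (at F0 and 2 x F0) and reads H1-H2 off as the level difference in dB. The 100 Hz F0, the 8000 Hz sample rate, the amplitudes, and all function names are assumptions chosen for the example; real speech would first need F0 tracking and a proper spectrum.

```python
import math

SAMPLE_RATE = 8000  # Hz, assumed for this toy example
F0 = 100.0          # Hz, assumed; the harmonics then sit at 100 Hz and 200 Hz


def harmonic_amplitude(signal, sample_rate, freq_hz):
    """Amplitude of the sinusoidal component at freq_hz (single-bin DFT)."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(signal))
    return 2 * math.hypot(re, im) / n


def h1_minus_h2(signal, sample_rate, f0_hz):
    """Level of the first harmonic minus level of the second, in dB."""
    h1 = harmonic_amplitude(signal, sample_rate, f0_hz)
    h2 = harmonic_amplitude(signal, sample_rate, 2 * f0_hz)
    return 20 * math.log10(h1 / h2)


def synth(a1, a2, n=800):
    """Two-harmonic toy signal; 800 samples is exactly 10 periods of F0."""
    return [a1 * math.sin(2 * math.pi * F0 * i / SAMPLE_RATE)
            + a2 * math.sin(2 * math.pi * 2 * F0 * i / SAMPLE_RATE)
            for i in range(n)]


strong_h1 = synth(1.0, 0.5)  # H1 stronger than H2, as described for breathy voice
weak_h1 = synth(0.5, 1.0)    # H1 weaker than H2, as described for creaky voice

print(round(h1_minus_h2(strong_h1, SAMPLE_RATE, F0), 2))
print(round(h1_minus_h2(weak_h1, SAMPLE_RATE, F0), 2))
```

Because the analysis window holds a whole number of periods, the two harmonics do not leak into each other, so the toy result depends only on the chosen amplitudes.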

Page 10:

Related to Open Quotient

• Vocal fold vibration cycle divides into open vs. closed portions

• Open portion of cycle as proportion of total cycle: Open Quotient (OQ)

• The more time the vocal folds are open, the more air gets through, so the breathier the voice

• Most extreme OQ would be 1.00: folds don’t close completely and are always letting some air through
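The definition of OQ can be sketched on a sampled glottal airflow cycle: count the samples where air is flowing and divide by the cycle length. The two ten-sample cycles below are hypothetical shapes invented for illustration, and the zero threshold is an assumption.

```python
def open_quotient(flow_cycle, threshold=0.0):
    """Fraction of one glottal cycle during which airflow exceeds threshold."""
    open_samples = sum(1 for u in flow_cycle if u > threshold)
    return open_samples / len(flow_cycle)


# Hypothetical airflow samples over one cycle (arbitrary units).
closing_cycle = [0, 1, 2, 3, 2, 1, 0, 0, 0, 0]      # folds close for half the cycle
never_closed_cycle = [1, 2, 3, 4, 3, 2, 1, 1, 1, 1]  # some air always gets through

print(open_quotient(closing_cycle))       # 0.5
print(open_quotient(never_closed_cycle))  # 1.0
```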

Page 11:

Glottal constriction and OQ (from Klatt & Klatt 1990)

(figure: glottal volume velocity, Ug)

Page 12:

H1-H2 and breathy voice in several languages

from Esposito 2006

Page 13:

Spectral tilt differences

• More abrupt closing of the folds, typical of greater glottal constriction, gives stronger high-frequency components

• Several ways to quantify overall tilt

Page 14:

H1 and A1, A2, A3

(figure: spectrum with H1, H2, A1, A2, and A3 labeled)

Page 15:

H1-A1 in Mazatec (from Blankenship 1997)

(figure panels: modal, creaky, breathy)

Page 16:

Cepstral Peak Prominence (from Hillenbrand et al. 1994)

• Well-defined harmonics give strong peak in cepstrum

• Harmonics and cepstral peak less defined in breathy noise (on the right)

Page 17:

H1-H2 and CPP in Mazatec (from Blankenship 1997)

(figure panels: H1-H2 and CPP; means of 12 speakers x 3 reps)

Page 18:

Summary

CREAKY

• H1-H2 is low

• Higher frequencies are strong

• Cepstral Peak Prominence can be low due to irregular vibration

BREATHY

• H1-H2 is high

• Higher frequencies are weak

• Cepstral Peak Prominence is low due to noise

Page 19:

Within-language difference in phonations

• OQ and tilt measures are generally correlated, but not always

• Contrasts that are not distinguished by H1-H2 are necessarily distinguished by one or more other measures (e.g. H1-A3, H1-A2, CPP in Esposito 2006)

• But even within a language, speakers can differ: Esposito (2003, 2005) on Santa Ana del Valle Zapotec

Page 20:

Santa Ana del Valle Zapotec

• Spoken in Santa Ana del Valle, Oaxaca, Mexico

• Related to: San Lucas Quiaviní Zapotec, San Juan Guelavía Zapotec, Tlacolula Zapotec

Mexico City

Oaxaca

Page 21:

Santa Ana del Valle Zapotec

Minimal triple

• Modal: ‘can’ lat

• Breathy: ‘place’ la̤t

• Creaky: ‘field’ la̰ts

Page 22:

Spectral measures

• The three phonations are distinguished by:

– H1-A3 for the male speakers

– H1-H2 for the female speakers

(bar chart, in dB: H1-H2 and H1-A3 for Breathy lat, Modal lat, and Creaky lats; separate panels for Male Speakers and Female Speakers)

Page 23:

Compare with EGG

• Are the speakers really producing the contrasts differently as suggested by the acoustic measures?

• Electroglottograph recordings using Glottal Enterprises EGG

Page 24:

EGG Closing Quotient CQ

– Reflects the portion of time the vocal folds are closing during each glottal cycle

– Measured automatically

CQ = Tc / (Tc + To)
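A worked instance of the CQ formula, assuming Tc and To are the durations of the two portions of a single cycle in milliseconds. The 4 ms and 6 ms values are made up for the example, and the function name is hypothetical.

```python
def closing_quotient(tc_ms, to_ms):
    """CQ = Tc / (Tc + To), with both durations taken from one glottal cycle."""
    return tc_ms / (tc_ms + to_ms)


# Hypothetical 10 ms cycle (a 100 Hz F0): Tc = 4 ms, To = 6 ms.
cq = closing_quotient(4.0, 6.0)
print(cq)  # 0.4

# If the cycle divides only into these two portions, the remainder is the
# share of the cycle taken up by To.
print(round(1.0 - cq, 2))  # 0.6
```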

Page 25:

EGG Max-Min Velocity

– A measure of pulse symmetry

– Measured manually from the derivative of the EGG signal

Page 26:

EGG measures: males

(charts: CQ and Velocity symmetry for the three phonation categories)

• Velocity symmetry (not CQ) distinguishes the 3 phonation categories for the male speakers

– Suggesting that the male speakers’ phonations arise from differences in closing abruptness

Page 27:

EGG measures: females

(charts: CQ and Velocity symmetry for the three phonation categories)

• CQ (not Velocity symmetry) distinguishes the 3 phonation categories for the female speakers

– Suggesting that the female speakers’ phonations differ in the proportion of time the vocal folds are open during each glottal cycle

Page 28:

Summary, Zapotec

• Male speakers

– Successful measures of phonation: H1-A3, Max-Min Velocity

– Suggested manner of phonation production: abruptness of vocal fold closure

• Female speakers

– Successful measures of phonation: H1-H2, Closing Quotient

– Suggested manner of phonation production: proportion of cycle the vocal folds are open

Page 29:

From a continuum to a multidimensional space

• Phonation categories can be made in multiple ways

• How independent are different dimensions in a given language?

• How important is each dimension?

• Perception tests as a way to answer

Page 30:

Perception of modal vs. breathy (Esposito 2006)

• 2 experiments using different tasks and contrasting vowels from different languages

• 3 listener groups

– 12 Gujarati (with contrast)

– 18 American English (no contrast, but allophonic breathiness)

– 18 Mexican Spanish (no contrast)

Page 31:

Experiment 1: Classification

• 2 breathy and 2 modal tokens from each of 10 languages, NOT Gujarati

• Male speakers, /a/ vowels after coronals

• Discriminant analysis of this sample identifies CPP, H1-H2, H1-A2, and H1-A3 (in that order) as most useful in distinguishing breathy vs modal
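One simple way to see how a measure can be ranked by how well it separates two categories is a per-measure Fisher ratio: squared difference of the category means over the summed within-category variances. This is only a stand-in for the discriminant analysis named above, not the procedure used in the study, and every number below is a made-up toy value, not data from the talk.

```python
from statistics import mean, variance


def fisher_ratio(group_a, group_b):
    """Squared mean difference divided by the summed within-group variances."""
    return (mean(group_a) - mean(group_b)) ** 2 / (variance(group_a) + variance(group_b))


# Hypothetical toy values (dB), two breathy and two modal tokens per measure.
toy_measures = {
    "H1-H2": {"breathy": [8.0, 10.0], "modal": [2.0, 4.0]},
    "CPP": {"breathy": [18.0, 19.0], "modal": [23.0, 24.0]},
}

scores = {name: fisher_ratio(groups["breathy"], groups["modal"])
          for name, groups in toy_measures.items()}

# Measures sorted from most to least separating on this toy data.
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores)
print(ranked)
```

A full discriminant analysis also weighs the measures jointly, which this one-measure-at-a-time ratio does not attempt.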

Page 32:

A mix of talkers and languages

• Gujarati contrast is made simply by H1-H2, so the sample offers a greater variety of dimensions than Gujarati listeners are used to attending to

• Languages/talkers differ in breathiness:

– Fuzhou (breathier) vs. Mong

Page 33:

Stimulus control

• Eliminate differences in duration and F0 by resynthesizing all tokens to 250 ms and 115-110 Hz F0

• Audio comparison of a Mazatec example:

– Original (whole word)

– Manipulated (vowel)

Page 34:

Visual free sort, schematic screen display

(schematic: Stimulus 1, Stimulus 2, Stimulus 3, Stimulus 4, ... placed into Box 1 and Box 2; arrows represent one possible sorting of the stimuli)

Page 35:

Comparing responses across listeners

• For each pair of stimuli, how often did listeners put them in the same box, vs. in different boxes? (perceptual similarity)

• For each pair of stimuli, how different are they along each of the physical dimensions measured? (acoustic similarity)

• How are these related for each listener group? (correlations)
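The three questions above can be sketched on toy data: count, per stimulus pair, how many listeners sorted the two stimuli into different boxes; take the absolute difference of the pair on one acoustic measure; then correlate the two lists. The stimulus names, box assignments, and H1-H2 values are all hypothetical, and a plain Pearson correlation is an assumption about the kind of correlation meant.

```python
import math
from itertools import combinations


def different_box_counts(sorts, pairs):
    """For each pair, the number of listeners who put its members in different boxes."""
    return [sum(1 for s in sorts if s[a] != s[b]) for a, b in pairs]


def acoustic_differences(values, pairs):
    """For each pair, the absolute difference on one acoustic measure."""
    return [abs(values[a] - values[b]) for a, b in pairs]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical free-sort results: one dict per listener, stimulus -> box.
toy_sorts = [
    {"s1": 1, "s2": 1, "s3": 2},
    {"s1": 1, "s2": 1, "s3": 2},
    {"s1": 1, "s2": 2, "s3": 2},
]
toy_h1h2 = {"s1": 2.0, "s2": 3.0, "s3": 12.0}  # hypothetical H1-H2 values in dB

pairs = list(combinations(sorted(toy_h1h2), 2))
perceptual = different_box_counts(toy_sorts, pairs)
acoustic = acoustic_differences(toy_h1h2, pairs)
r = pearson_r(perceptual, acoustic)
print(pairs, perceptual, acoustic, round(r, 3))
```

Running the same comparison once per acoustic measure and per listener group would show which measures each group's sorting tracks.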

Page 36:

H1-H2 and perception, Gujarati listeners

(scatter plot: H1-H2 Diff. (dB) against Perceptual Judgements, with axis annotations "Physically different" and "Perceptually different"; fitted line y = -3.7248x + 36.143, R2 = 0.8113)

Page 37:

H1-H2 and perception

(two scatter plots, English listeners and Spanish listeners: H1-H2 Diff. (dB) against Perceptual Judgements; fitted lines, in the order they appear: y = -0.4357x + 11.191, R2 = 0.1416, and y = -0.5561x + 12.446, R2 = 0.2225)

Weak correlation in both languages; Spanish listeners also used H1-A1
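For a straight-line fit with one predictor, R2 is the square of the correlation coefficient and the coefficient takes the sign of the slope. The sketch below applies that to the three fitted lines reported on this slide and the previous one. The two panels of this slide are labeled by order only, since which fit belongs to which listener group is an assumption left open here.

```python
import math


def r_from_fit(slope, r_squared):
    """Correlation coefficient implied by a simple linear fit."""
    return math.copysign(math.sqrt(r_squared), slope)


# (slope, R2) pairs as reported on the slides.
fits = {
    "Gujarati listeners": (-3.7248, 0.8113),
    "first panel (English/Spanish slide)": (-0.4357, 0.1416),
    "second panel (English/Spanish slide)": (-0.5561, 0.2225),
}

correlations = {name: r_from_fit(slope, r2) for name, (slope, r2) in fits.items()}
for name, r in correlations.items():
    print(f"{name}: r = {r:.3f}")
```

The Gujarati fit corresponds to a correlation of about -0.90, against roughly -0.38 and -0.47 for the other two panels.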

Page 38:

Summary, Experiment 1

• High cross-listener consistency for Gujarati listeners

• Gujarati listeners relied only on H1-H2

• English and Spanish listeners also used H1-H2, but not consistently or well

• No listener groups used Cepstral Peak Prominence; though it was highest in the discriminant analysis, the total range of values was small

Page 39:

Experiment 2

• Multidimensional perceptual spaces derived from similarity ratings of every pair of tokens in the stimulus set

• Same listeners as Experiment 1

• Different stimuli

Page 40:

Experiment 2 stimuli

• All stimuli from Mazatec

– 3 male speakers

– 16 tokens from each speaker

– Resynthesized as in Experiment 1

• Discriminant analysis identifies H1-A2 as the best discriminator for these 40 tokens, followed by H1-H2

• Talkers differ in degree of breathiness

Page 41:

Similarity ratings and MDS

• All possible pairs (within each talker separately) presented for similarity ratings (using an on-screen slider)

• Multidimensional scaling of ratings to derive perceptual spaces for individuals and for groups

• Perceptual dimensions related to acoustic measures by correlation
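To make the scaling step concrete, here is a minimal sketch of classical (Torgerson) MDS reduced to a single dimension, using power iteration to find the leading eigenvector of the double-centered matrix. The talk does not say which MDS variant was used, so this choice is an assumption, as are the function name and the toy dissimilarities; slider similarity ratings would first have to be converted to dissimilarities.

```python
import math


def classical_mds_1d(dissimilarity, iterations=100):
    """One-dimensional classical MDS coordinates from a symmetric dissimilarity matrix."""
    n = len(dissimilarity)
    squared = [[d * d for d in row] for row in dissimilarity]
    row_mean = [sum(row) / n for row in squared]
    total_mean = sum(row_mean) / n
    # Double centering turns squared dissimilarities into inner products.
    b = [[-0.5 * (squared[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    # Power iteration for the leading eigenvector (assumes it is not
    # orthogonal to the starting vector and that b is not all zeros).
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    return [math.sqrt(eigenvalue) * x for x in v]


# Hypothetical dissimilarities among three tokens lying on one dimension.
toy_dissimilarity = [
    [0.0, 1.0, 3.0],
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
]
coords = classical_mds_1d(toy_dissimilarity)
print([round(c, 3) for c in coords])
```

The recovered coordinates are only defined up to reflection and shift, so it is the distances between them that should be compared with the input, and it is the coordinates that would then be correlated with acoustic measures.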

Page 42:

Results

• No listeners used H1-A2 (the best one)

• English and Spanish listeners were inconsistent

• English listeners weakly used H1-H2 and CPP; Spanish listeners weakly used H1-A1 and H1-H2

• Gujarati listeners consistently relied on H1-H2, but distinguished 3 perceptual clusters rather than 2 (modal, breathy, beyond breathy)

Page 43:

Sample Gujarati space with three clusters

(figure labels: H1-H2 > 15 dB; H1-H2 < 5 dB)

Page 44:

Summary of perception experiments

• Gujarati listeners, experienced with a modal-breathy vowel contrast based on H1-H2, consistently relied on H1-H2 in perceiving vowels from several other languages

• English and Spanish listeners were inconsistent, relying weakly on a variety of (sometimes weak) correlates

Page 45:

Phonation, F0, tones

• Phonation varies with tone in some tone languages

• Perhaps more general variation across languages, subtle because within modal range?

• Voice quality might be used to recognize tones

• Voice quality might be used to calibrate speaker’s F0 range

Page 46:

Preliminary foray

• 4 tones of Mandarin (tones 3 and 4 known to occur with creak)

• High, Low level tones of Bura (Chadic, Nigeria)

• One male speaker of each language

Page 47:

Refining harmonic measures

• H1 and H2 are very sensitive to the frequency of F1, which limits vowel comparisons

• Inverse filtering recovers the voice source, but is not always practical

• Iseli & Alwan (2004), Iseli, Shue & Alwan (2006) provide corrections for higher formant frequencies and bandwidths

• H1*-H2*, H2*-H4*, H1*-A3*

Page 48:

Sample output: Mandarin

(figure tracks: audio, F0, H1*-H2*, H2*-H4*, H1*-A3*)

Page 49:

Mandarin

• H1*-H2* is most related to F0

• Low F0 is creakier and high F0 breathier

• [Compare Iseli et al.’s similar result for English: below 175 Hz, F0 is positively correlated with H1*-H2*]

• H1*-H2* is positive, zero, or negative with high, mid, or low F0 tone onset (next slide)

Page 50:

F0 and H1*-H2*, 4 timepoints per vowel

(two line charts, "F0 of 4 tones" and "H1-H2 of 4 tones", each plotted against timepoints 1 to 4)

Page 51:

Bura tone samples

• Larger sample, multiple tokens per tone

– Examples of High tone

– Examples of Low tone

– Example of Low-High sequence

• 1 measure per vowel at mid-vowel

• Compared by exploratory ANOVAs

Page 52:

Bura tones and H1*-H2*

• H1*-H2* does NOT vary with tone or with F0

• Even though speaker’s F0 is in the range identified by Iseli et al.

• Different from English, and from the Mandarin sample

Page 53:

Bura tones and other measures

• H2*-H4*, H1*-A3* are higher for Low tones, though not correlated with F0

• i.e. Low tones are breathier: opposite result from Mandarin, and based on abruptness of closing rather than OQ

• Discriminant analysis with just voice measures uses H1*-A3* to get 57% of tokens’ tones correct

Page 54:

Summary, tone foray

• Mandarin and Bura tones have opposite relations of tone and voice, on different dimensions

• Within each language, voice quality could offer information to listeners about a speaker’s tones

Page 55:

Conclusion

Linguistic voice quality is a rich yet relatively under-studied area.

Phonation contrasts are multi-dimensional, and listeners with different language experience attend to different dimensions.

Better understanding of linguistic contrasts could help with other areas in which voice quality is important.