Linking Vowel Height and Creaky Voice Laura M. Panfili - [email protected] The University of Washington April 23, 2016 Research Q & Hypotheses – Background – Methods – Results – Discussion – Conclusion 1/20
Research Questions and Hypotheses
• Is creaky voice more likely to occur on low vowels than on high vowels in English?
Background – Phonation
• Phonation: the process of using air pressure to set the vocal folds into vibration
Background – Phonation Types
• The phonation continuum (Gordon and Ladefoged, 2001):
Linking Vowel Height and Creaky Voice
Laura M. Panfili
Spring 2015
1 Introduction
Phonation has been well studied in languages that use it phonemically, such as Jalapa Mazatec (Esposito, 2012), Hainan Cham (Thurgood, 2015), and Montana Salish (Flemming et al., 1994). However, relatively little is known about its acoustic properties and uses in English. This study examines one aspect of the acoustics of phonation by investigating whether creaky voice is more likely to occur on low vowels than on high vowels.
Previous studies have observed patterns linking phonation types and vowel qualities (Podesva et al., 2015; Szakay, 2012), though the findings have varied and were side observations rather than methodically researched questions. The present study hypothesizes that, in English, creaky voice is more likely to occur on low vowels than on high vowels. This pattern is demonstrated in a corpus of spontaneous conversations in Pacific Northwest English and is theoretically linked to Intrinsic Fundamental Frequency (IF0). Though further study and the inclusion of more dialects and languages are needed, the results of this study potentially have significant implications for experimental design in studies of phonation, as well as for the discussion of IF0 and its mechanisms.
2 Background
2.1 Phonation
Phonation is the process of using air pressure to set the vocal folds into vibration, producing a quasi-periodic sound wave. We are able to manipulate our vocal folds in various ways (their thickness, length, and separation) using the muscles around them. These manipulations change the quality of the sound produced (Raphael et al., 2007). Gordon and Ladefoged (2001) describe a continuum of phonation, ranging from spread vocal folds to closed vocal folds and with different types of vibration in between, as seen in Figure 1. These laryngeal parameters and phonation types are reviewed in Section 2.1.1; they produce different acoustic properties, which are reviewed in Section 2.1.2.
Figure 1: The Phonation Continuum, after Gordon and Ladefoged (2001)
Voiceless (spread): no vocal fold vibration because vocal folds are spread
Breathy: vocal folds spend more time open than closed
Modal: vocal folds spend approximately equal amounts of time open and closed, maximum vibration
Creaky: vocal folds spend more time closed than open
Voiceless (closed): no vocal fold vibration because vocal folds are closed (glottal stop)
Graphics by Dan McCloy
Voiced Sounds
Background – Phonation
• Voice quality changes based on manipulation of the vocal folds (Raphael et al., 2007):
– Thickness
– Length
– Separation
• Phonation is used in different ways by different languages
Breathy
Modal
Creaky
“YEAH”
Methods – The Corpus
• ATAROS (Automatic Tagging and Recognition of Stance) Corpus (Freeman, 2015)
• Pairs of native PNW English speakers
– Roughly matched for age
– Matched or crossed for gender
• Five collaborative tasks designed to elicit changes in stance
• Recorded at the UW Phonetics Lab
Methods – The Sample
• 8 pairs (16 speakers)
– 11 female, 5 male
– 21–70 years old
• “Budget Task”
– Asked to work together to balance an imaginary town budget
– The final of five tasks → most natural speech
• ~95 minutes of conversation
Table 1: Gender and Age of Speakers
Dyad 1    F, 21    M, 24
Dyad 2    F, 70    F, 68
Dyad 3    M, 26    F, 24
Dyad 4    M, 24    F, 23
Dyad 5    F, 21    F, 27
Dyad 6    F, 49    M, 49
Dyad 8    F, 39    M, 38
Dyad 9    F, 23    F, 19
3.2 Annotating Phonation
Conversations in the ATAROS corpus were previously manually transcribed and force-aligned. Following the phone-level boundaries already indicated in the corpus, all stressed vowels (primary and secondary stress, for a total of 13,834 vowels) were tagged for their phonation type.
Two raters trained in phonetics listened to all stressed vowels in the eight ATAROS dyads. In order to accurately represent what we hear as different phonation types (as opposed to the acoustic properties phoneticians are trained to recognize in spectrograms), the raters were instructed to rely on their ears to make a judgement; the spectrogram, pitch track, intensity track, and formant tracks were all turned off, and raters were discouraged from looking at the waveform. They were trained by listening to multiple examples of vowels exhibiting the acoustic properties of each of the three phonation types. Example waveforms and spectrograms for the three phonation types with each of the four corner vowels are provided in Figure A.2 in the appendix. The two raters had good inter-rater reliability (Cohen’s kappa = 0.85, with the raters overlapping on 12.07% of the data).
Stressed vowels were given one of the following five tags:
• B: Breathy
• M: Modal
• C: Creaky
• 0: Flaw in recording or alignment (e.g. clipping)
• 1: Something interesting but irrelevant or problematic (e.g. laugh-speech)
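The inter-rater agreement figure can be illustrated with a minimal sketch of Cohen's kappa over two raters' tags. The tag sequences below are invented for illustration and are not the ATAROS ratings; the function name `cohens_kappa` is likewise hypothetical and not taken from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical tags."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must tag the same non-empty set of tokens")
    n = len(rater_a)
    # Observed agreement: share of tokens given the same tag by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: from each rater's marginal tag frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[tag] * counts_b[tag] for tag in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical tag throughout
    return (observed - expected) / (1 - expected)


# Hypothetical tags (B = breathy, M = modal, C = creaky) for ten vowels.
rater_a = ["M", "M", "C", "B", "M", "C", "M", "M", "C", "M"]
rater_b = ["M", "M", "C", "M", "M", "C", "M", "B", "C", "M"]
kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which matters here because modal tokens dominate the data.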
Two types of data were excluded from the analysis. First, vowels tagged as 0 or 1 were not included, as they did not represent either usable audio or any of the phonation types. Second, function words that tend to include reduced vowels (phonetic ‘stop words’) were excluded to ensure that only stressed vowels were part of the analysis. A complete list of these phonetic stop words can be found in Figure A.1 in the appendix. After removing vowels tagged as 0 or 1 and those belonging to function words, the remaining 7,605 vowels were included in the analysis.¹
¹ This study did not control for position in utterance. Creaky voice is often found phrase-finally, but it seems unlikely that all the low vowels and none of the high vowels included in this study were also found phrase-finally.
Methods – Tagging Phonation
• Vowels were tagged for phonation type based on auditory judgments
• Two phonetically trained raters
– Cohen’s Kappa 0.85
• Tags:
– B: Breathy
– M: Modal
– C: Creaky
– 0: Flaw in the recording or alignment
– 1: Something interesting but irrelevant or problematic
Methods – Data in the Analysis
• The final sample excludes:
– Vowels tagged as 0 or 1
– Unstressed vowels
– Reduced vowels
– All but the four “corner vowels” /i u æ ɑ/
• The analyzed data includes /i u æ ɑ/ tagged as B, M, C in stressed syllables
→ 2,459 vowels
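The exclusion criteria above could be sketched as a single filter over tagged tokens. The token layout (dictionaries with `word`, `vowel`, `stressed`, and `tag` keys), the example tokens, and the function name `keep_token` are assumptions made for illustration; the stop-word set is only a small subset of the list in Figure A.1.

```python
CORNER_VOWELS = {"i", "u", "æ", "ɑ"}
PHONATION_TAGS = {"B", "M", "C"}  # tags 0 and 1 are excluded
# Small illustrative subset of the phonetic stop words in Figure A.1.
STOP_WORDS = {"a", "about", "and", "of", "the", "to"}


def keep_token(token):
    """Return True if a vowel token meets all four inclusion criteria."""
    return (
        token["stressed"]
        and token["tag"] in PHONATION_TAGS
        and token["word"].lower() not in STOP_WORDS
        and token["vowel"] in CORNER_VOWELS
    )


# Hypothetical tokens; the real corpus format may differ.
tokens = [
    {"word": "cat", "vowel": "æ", "stressed": True, "tag": "C"},     # kept
    {"word": "the", "vowel": "i", "stressed": True, "tag": "M"},     # stop word
    {"word": "see", "vowel": "i", "stressed": True, "tag": "0"},     # recording flaw
    {"word": "boat", "vowel": "o", "stressed": True, "tag": "M"},    # not a corner vowel
    {"word": "food", "vowel": "u", "stressed": True, "tag": "B"},    # kept
    {"word": "happy", "vowel": "i", "stressed": False, "tag": "M"},  # unstressed
]
analyzed = [t for t in tokens if keep_token(t)]
print([t["word"] for t in analyzed])
```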
Results – Vowel Spaces
• Plotted vowels to verify that they are representative of a typical PNW vowel space
• Modal tokens only for these vowel plots
4 Results and Analysis
4.1 Vowel Spaces
To verify that the speakers produced a set of vowels representative of a typical Pacific Northwest vowel space (as in Wassink, 2015 and Wright and Souza, 2012), particularly that high vowels are high and low vowels are low, the first and second formants of each speaker’s four corner vowels were plotted. These plots were created using the formants for only modal tokens.² All modal tokens of /æ/, /ɑ/, /i/ and /u/, with formants measured at the midpoint, were included, for a total of 1,705 vowels (see Table 2 for a complete breakdown of tokens of vowel qualities).
Figures 4 and 5 show the average vowel spaces across all male and female speakers, respectively. While there is considerable overlap between the front and back vowels for both men and women, high and low vowels remain distinct, making these data useful for studying the relationship between vowel height and voice quality. Note that the considerable overlap between /i/ and /u/ is typical of the Pacific Northwest, where /u/ is fronted (Wassink, 2015; Freeman, 2015). A vowel space for each speaker can be found in Figure A.3, and a Nearey2-normalized aggregate vowel space for all speakers in Figure A.4 of the Appendix. Vowel spaces were normalized and produced using the phonR package (McCloy, 2015) in R.
Figure 4: Average Vowel Space, Male Speakers
² The fundamental frequency of non-modal vowels is extremely difficult to calculate accurately, which makes formants difficult to determine as well.
Figure 5: Average Vowel Space, Female Speakers
4.2 Vowel Quality and Voice Quality - Descriptive Results
To examine the relationship between phonation type and vowel quality, the frequency of each phonation type was calculated for each of the corner vowels. The descriptive results are summarized in Table 2 and illustrated in Figure 6, a stacked bar graph showing the relative frequencies of the three phonation types for the four corner vowels /æ/, /ɑ/, /i/ and /u/. Of the 850 tokens of the low front vowel /æ/, 7.41% were breathy, 58% were modal, and 34.59% were creaky. Of the 496 tokens of the low back vowel /ɑ/, 5.24% were breathy, 68.35% were modal, and 26.41% were creaky. Of the 698 tokens of the high front vowel /i/, 6.88% were breathy, 77.79% were modal, and 15.33% were creaky. Of the 415 tokens of the high back vowel /u/, 3.37% were breathy, 79.52% were modal, and 17.11% were creaky.
Table 2: Phonation Types, by Vowel Quality, Totals (Percentages)
Vowel   Breathy        Modal            Creaky          Total
æ       63 (7.41%)     493 (58%)        294 (34.59%)    850 (34.57%)
ɑ       26 (5.24%)     339 (68.35%)     131 (26.41%)    496 (20.17%)
i       48 (6.88%)     543 (77.79%)     107 (15.33%)    698 (28.39%)
u       14 (3.37%)     330 (79.52%)     71 (17.11%)     415 (16.88%)
Total   151 (6.14%)    1705 (69.33%)    603 (24.53%)    2459
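The within-vowel percentages in Table 2 follow directly from the raw counts. As a minimal sketch, the counts below are copied from the table, while the dictionary layout and the function name `row_percentages` are illustrative assumptions.

```python
# Raw counts from Table 2: vowel -> phonation tag -> number of tokens.
counts = {
    "æ": {"B": 63, "M": 493, "C": 294},
    "ɑ": {"B": 26, "M": 339, "C": 131},
    "i": {"B": 48, "M": 543, "C": 107},
    "u": {"B": 14, "M": 330, "C": 71},
}


def row_percentages(row):
    """Convert one vowel's tag counts into percentages of that vowel's tokens."""
    total = sum(row.values())
    return {tag: 100 * n / total for tag, n in row.items()}


grand_total = sum(sum(row.values()) for row in counts.values())
for vowel, row in counts.items():
    pct = {tag: round(p, 2) for tag, p in row_percentages(row).items()}
    print(vowel, sum(row.values()), pct)
print("all corner vowels:", grand_total)
```

Recomputing the rows this way is a quick consistency check between the token totals quoted in the prose and the counts in the table.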
Male Speakers Female Speakers
Results – Vowel Height and Creak
• Low vowels are significantly more likely to be creaky than high vowels
• (χ²(1, N = 2459) = 83.58, p < .001)
Vowel Height   Breathy      Modal          Creaky        Total
Low            89 (6.6%)    832 (61.8%)    425 (31.6%)   1346 (54.7%)
High           62 (5.6%)    873 (78.4%)    178 (16%)     1113 (45.3%)
Total          151 (6.1%)   1705 (69.3%)   603 (24.5%)   2459
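A chi-square test of independence on these counts can be sketched in plain Python. The slides do not state which phonation categories entered the test or whether a continuity correction was applied, so the creaky vs. non-creaky grouping below is an assumption, and the statistic it yields need not equal the reported value. The function name `chi_square_2x2` is hypothetical.

```python
def chi_square_2x2(table, yates=False):
    """Pearson chi-square statistic for a 2x2 table of observed counts.

    With yates=True, the continuity correction subtracts 0.5 from each
    absolute difference between observed and expected counts.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                diff = max(0.0, diff - 0.5)
            stat += diff * diff / expected
    return stat


# Assumed grouping: creaky vs. non-creaky (breathy + modal), by vowel height.
height_table = [
    [425, 89 + 832],  # low vowels
    [178, 62 + 873],  # high vowels
]
stat = chi_square_2x2(height_table)
CRITICAL_DF1_P001 = 10.828  # chi-square critical value for df = 1, alpha = .001
print(f"chi-square = {stat:.2f}, exceeds critical value: {stat > CRITICAL_DF1_P001}")
```

Under any of the plausible groupings, the association between height and creak is far beyond the p < .001 threshold, which is the substantive point of the slide.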
Results – Vowel Height and Creak
Phonation by vowel height (figure panels: low, high; creaky phonation marked)
Results - Gender
• Do women use creaky voice more than men?
         Breathy      Modal          Creaky        Total
Male     22 (3.1%)    514 (72.1%)    177 (24.8%)   713
Female   129 (7.4%)   1191 (68.2%)   426 (24.4%)   1746
Phonation by gender (figure panels: Women, Men)
Discussion
• Intrinsic Fundamental Frequency (IF0): low vowels have a lower pitch than high vowels (Whalen and Levitt, 1995)
– The tongue position required in high vowels pulls on the larynx, increasing tension on the vocal folds → higher F0
– Creaky voice is produced with low longitudinal tension on the vocal folds, so it would be harder to achieve on high vowels
Conclusion
• Low vowels are more likely to be creaky than high vowels in this corpus of PNW English
• Men and women creak at nearly the same rate (24.8% and 24.4% of tokens)
• Physiology may underpin this pattern
– High vowels cause more vocal fold tension than low vowels, and creaky voice requires low tension
Future Directions
• What about breathy voice?
• Does this pattern hold in other dialects or languages?
• In languages that use phonation contrastively, are creaky high vowels as frequent as creaky low vowels in the inventory? (Check back with me on this in a few weeks! ☺)
References
Esling, J. H. (2006). Voice Quality. In Encyclopedia of Language and Linguistics, pages 470–474. Oxford: Elsevier.
Freeman, V. (2015). The Phonetics of Stance-Taking. PhD thesis, University of Washington.
Gordon, M. and Ladefoged, P. (2001). Phonation types: a cross-linguistic overview. Journal of Phonetics, 29:283–406.
Ladefoged, P. and Johnson, K. (2015). A Course in Phonetics. Wadsworth, 7th edition.
Laver, J. (1980). The Phonetic Description of Voice Quality. Cambridge University Press, 1st edition.
McCloy, D. (2015). phonR: tools for phoneticians and phonologists. R package version 1.0-3.
Ohala, J. J. and Eukel, B. W. (1987). Explaining the intrinsic pitch of vowels. In Channon, R. and Shockey, L., editors, In honor of Ilse Lehiste, pages 207–215. Dordrecht.
Raphael, L. J., Borden, G. J., and Harris, K. S. (2007). Speech Science Primer: Physiology, Acoustics, and Perception of Speech. Lippincott Williams & Wilkins, 5th edition.
Wassink, A. (2015). Sociolinguistic patterns in Seattle English. Language Variation and Change, 27:31–58.
Whalen, D. H. and Levitt, A. G. (1995). The universality of intrinsic f0 of vowels. Journal of Phonetics, 23:349–366.
Phonetic Stop Words Excluded from Analysis
A Appendices
Figure A.1: Phonetic Stop Words
a, about, all, am, an, and, any, are, as, at, be, been, before, being, but, by, can, cant, can’t,
cause, could, cuz, did, do, does, doing, dont, don’t, dunno, else, em, cept, few, for, from, get, gets, going,
gonna, got, gotten, had, has, have, haven’t, havent, havin, he, her, hers, him, his, how, i, i’d, id, if,
i’m, im, in, is, it, its, just, k, kay, let, let’s, lets, like, lot, may, me, my, nd, of,
on, or, our, ours, out, own, r, she, should, so, some, still, that, thats, that’s, the, their, theirs, them,
then, there, theres, there’s, these, they, this, those, til, till, to, uh, us, um, very, wanna, want, wants, was,
we, well, went, were, what, when, where, which, while, who, will, with, would, you, your, yours
Phonation Type – Corner Vowels
Figure 6: Phonation Type for Corner Vowels
No strong relationship emerges between vowel quality and breathy voice; front vowels seem to be very slightly more frequently breathy than back vowels, though this difference appears trivial. More noticeable are differences in creaky voice: low vowels are more frequently creaky than high vowels. Because phonation appears to pattern by height, and the focus of this study is creaky voice, I will continue my analysis with data grouped into low (/æ/ & /ɑ/) and high (/i/ & /u/) vowels.
4.3 Vowel Height and Voice Quality - Results
The results of Section 4.2 support collapsing /æ/ & /ɑ/ into low vowels and /i/ & /u/ into high vowels, as they pattern similarly regarding voice quality. These two categories were submitted to a chi-square test of independence to assess the relationship between vowel height and phonation.
The descriptive statistics regarding vowel height and voice quality are summarized in Table 3 and illustrated in Figure 7. Of the 1346 tokens of low vowels, 6.61% were breathy, 61.81% were modal, and 31.58% were creaky. Of the 1113 tokens of high vowels, 5.57% were breathy, 78.44% were modal, and 15.99% were creaky.
Figure A.5: Phonation Types by Vowel Height, For Each Speaker
Longitudinal Tension
• “the degree of stretching force” on the vocal folds (Zemlin, 1998)
• Controlled by the thyroarytenoid muscle
• Creaky voice involves low LT
– Shorter vocal folds
– More mass per unit length
– Slower vibration → lower F0
Laryngeal Cartilages and Parameters
2.1.1 Physiology of Phonation Types
Laver (1980) describes three “laryngeal parameters” that, when combined in various permutations, produce different phonation types. These parameters are longitudinal tension, medial compression, and adductive tension; they are determined by actions of the muscles controlling the cartilages around the vocal folds: the thyroid, cricoid, arytenoid, and posterior cricoarytenoid cartilages. Figure 2 (after Laver p. 109) illustrates the cartilages and laryngeal parameters.
Figure 2: Laryngeal Cartilages and Parameters, after Laver
Cartilages labeled: Thyroid Cartilage, Cricoid Cartilage, Arytenoid Cartilage, Posterior Cricoarytenoid Cartilage
Parameters labeled: Longitudinal Tension, Medial Compression, Adductive Tension
Phonation continuum labels: Spread (Voiceless), Breathy, Modal, Creaky, Closed (Voiceless)
Medial compression is the amount of force bringing the vocal folds together at the midline. This compression determines how closely the vocal folds are approximated (Zemlin, 1998), and is controlled by various muscles. The lateral cricoarytenoid muscle (connecting the cricoid and arytenoid cartilages) rotates the arytenoid cartilages, bringing them towards the midline. The arytenoid cartilages are also brought together by adductive tension, caused by the interarytenoid muscles. These two forces bringing the arytenoid cartilages together at one end of the glottis, combined with increased tension in the thyroarytenoid muscle, increase medial compression.
Longitudinal tension is “the degree of stretching force” on the vocal folds (Zemlin, 1998). It is controlled by the thyroarytenoid muscle, which connects the thyroid and arytenoid cartilages. When unopposed, its contraction reduces longitudinal tension by shortening the vocal folds, causing them to have more mass per unit length and therefore to vibrate more slowly, resulting in a lower fundamental frequency (Raphael et al., 2007). (However, when the contraction of the thyroarytenoid muscle is opposed, vocal fold tension increases (Zemlin, 1998).) The cricothyroid muscle, connecting the cricoid and thyroid cartilages, also affects the length of (and therefore tension on) the vocal folds; its contraction decreases the distance between the cricoid and thyroid cartilages, which increases longitudinal tension (Zemlin, 1998).
These three forces on the vocal folds (medial compression, adductive tension, and longitudinal tension) work together in various ways to produce the full range of phonation types seen in Figure 1. The following is an overview of the laryngeal settings involved in each phonation type.