Contextually dependent cue realization and cue weighting for a laryngeal contrast in Shanghai Wu a)
Jie Zhang b)
Department of Linguistics, University of Kansas, 1541 Lilac Lane, Lawrence, Kansas 66045, USA
Hanbo Yan
School of Chinese Studies and Exchange, Shanghai International Studies University, Shanghai 200083, China
(Received 2 November 2017; revised 3 August 2018; accepted 16 August 2018; published online 11 September 2018)
Phonological categories are often differentiated by multiple phonetic cues. This paper reports a pro-
duction and perception study of a laryngeal contrast in Shanghai Wu that is not only cued in multi-
ple dimensions, but also cued differently on different manners (stops, fricatives, sonorants) and in
different positions (non-sandhi, sandhi). Acoustic results showed that, although this contrast has
been described as phonatory in earlier literature, its primary cue is in tone in the non-sandhi con-
text, with vowel phonation and consonant properties appearing selectively for specific manners of
articulation. In the sandhi context where the tonal distinction is neutralized, these other cues may
remain depending on the manner of articulation. Sonorants, in both contexts, embody the weakest
cues. The perception results were largely consistent with the aggregate acoustic results, indicating
that speakers adjust the perceptual weights of individual cues for a contrast according to manner
and context. These findings support the position that phonological contrasts are formed by the
integration of multiple cues in a language-specific, context-specific fashion and should be represented as such.
© 2018 Acoustical Society of America. https://doi.org/10.1121/1.5054014
[MS] Pages: 1293–1308
I. INTRODUCTION
A standard assumption about phonological contrast is
that it is categorical, based on either segments (/p/ vs /b/) or
features ([−voice] for /p/, [+voice] for /b/; Jakobson et al., 1952; Chomsky and Halle, 1968; Stevens, 2002; Clements,
2009). A major challenge for phoneticians and phonologists
alike is to account for how speakers categorize gradient and
variable acoustic signals into such discrete entities. Two
salient aspects of this challenge relate to how featural con-
trasts are instantiated acoustically. First, contrasts are often
differentiated by multiple acoustic cues. The stop voicing
contrast in English, for example, is associated with differ-
ences in voice-onset time (VOT), closure duration, f0 of the
following vowel, and a host of other acoustic properties
(Lisker, 1986). Second, the acoustic cues for the same con-
trast often depend on the phonological context in which the
contrast appears. For instance, the English voicing contrast
would not benefit from the f0 cue of the following vowel in
the final position, but would benefit from a duration differ-
ence on the vowel preceding it (Chen, 1970; Raphael, 1972).
The investigations of how a contrast is acoustically realized
in a multidimensional fashion, how the different acoustic
cues are weighted in the perception of the contrast, and how
the weighting is affected by the acoustic dimensions along
which the cues vary, the distributional characteristics of the
acoustic cues, the context in which the contrast appears, and
the listeners’ language background have contributed to sig-
nificant theoretical issues in phonetics and phonology, such
as the mode of speech perception (Repp, 1983; Parker et al.,
1986; Massaro, 1987), the nature of distinctive features
(Halle and Stevens, 1971; Kingston, 1992; Stevens and
Keyser, 2010), the production-perception link (Newman,
2003; Shultz et al., 2012; DiCanio, 2014), the influence of
phonological knowledge of a language on perception
(Massaro and Cohen, 1983; Flege and Wang, 1989; Dupoux
et al., 1999; Hallé and Best, 2007), the theories of perceptual
contribution of secondary cues (Holt et al., 2001; Francis
et al., 2008; Kingston et al., 2008; Llanos et al., 2013), and
the mechanisms of phonetic category learning (Clayards
et al., 2008; Toscano and McMurray, 2010; McMurray
et al., 2011).
This paper contributes to this scholarship by presenting
a case study on the cue realization and cue weighting of a
laryngeal contrast on different segments in different contexts
in Shanghai Wu. Like many Wu dialects of Chinese,
Shanghai has a three-way distinction among voiceless aspi-
rated, voiceless unaspirated, and voiced stops. The voiced
series, however, is not realized with typical closure voicing,
but is known as “voiceless with voiced aspiration” (Chao,
1967), indicating the involvement of breathy phonation. On
fricatives, there is a two-way voicing contrast, whereby the
voiced fricatives are truly voiced, and on sonorants, there is
a modal-murmured distinction that corresponds to the
a) Portions of this work were presented at the 18th International Congress of
Phonetic Sciences, Glasgow, Scotland, UK; the 89th annual meeting of the
Linguistic Society of America, Portland, OR; and the 22nd annual meeting
of the International Association of Chinese Linguistics in conjunction with
the 26th North American Conference on Chinese Linguistics, College Park, MD.
b) Electronic mail: [email protected]
indicating that voiceless fricatives again induced a sharper
peak for the CPP curve on the following vowel. These results
are given in Figs. 4 and 5.
FIG. 1. Duration of onset consonants in monosyllables and the second syllable of disyllables. *: p < 0.05; **: p < 0.01; ***: p < 0.001.
FIG. 2. H1*-H2* results over the duration of the vowels after stops, fricatives, and sonorants for monosyllables. Symbols represent observed data (vertical
lines indicate ±SE) and lines represent growth curve model fits using cubic orthogonal polynomials. *: p < 0.05; **: p < 0.01; ***: p < 0.001.
1298 J. Acoust. Soc. Am. 144 (3), September 2018 Jie Zhang and Hanbo Yan
For the spectral and periodicity measures on the sonorant
consonants themselves, for monosyllables, the model for
H1*-H2* did not significantly improve with the addition of
voicing or its interactions with the linear, quadratic, and
cubic time terms (p > 0.75 for all comparisons), but the
model for CPP did improve with the addition of voicing on
the intercept [χ²(1) = 4.818, p = 0.028] and the quadratic
time term [χ²(1) = 4.064, p = 0.044]. Parameter estimates
indicated that the modal sonorants had an overall higher
CPP value than the murmured sonorants (voicing intercept:
estimate = −1.815, SE = 0.510, t = 3.561, p = 0.005), and
the murmured sonorants had a more U-shaped curve than the
modal sonorants (voicing and quadratic time term interaction: ...).
For sonorant onsets on the second syllable of disyllables, the
models for H1*-H2* and CPP did not significantly improve
with the addition of voicing or its interactions with the linear,
quadratic, and cubic time terms (p > 0.33 for all comparisons).
The monosyllabic and disyllabic results are given in
Figs. 6 and 7, respectively.
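As a rough illustration of the time terms used in these growth curve models, the sketch below builds linear, quadratic, and cubic orthogonal polynomial terms for a normalized time course. It is a minimal Python sketch and not the authors' analysis code: the function name `orthogonal_time_terms`, the choice of 10 equally spaced time points, and the Gram-Schmidt construction are all assumptions made for illustration.

```python
def orthogonal_time_terms(n_points, degree=3):
    """Build orthonormal polynomial time terms (linear, quadratic, cubic, ...).

    Raw powers of time are orthogonalized against each other and against
    the constant term with a Gram-Schmidt pass. Requires n_points > degree.
    """
    time = [float(i) for i in range(n_points)]
    raw_powers = [[t ** d for t in time] for d in range(degree + 1)]

    basis = []
    for column in raw_powers:
        vector = column[:]
        # Remove the part of this column already explained by lower-order terms.
        for earlier in basis:
            projection = sum(v * e for v, e in zip(vector, earlier))
            vector = [v - projection * e for v, e in zip(vector, earlier)]
        norm = sum(v * v for v in vector) ** 0.5
        basis.append([v / norm for v in vector])

    # Drop the constant term; the model intercept plays that role.
    return basis[1:]


linear, quadratic, cubic = orthogonal_time_terms(10)
print([round(v, 3) for v in linear])
print([round(v, 3) for v in quadratic])
print([round(v, 3) for v in cubic])
```

Because the resulting terms are mutually orthogonal and orthogonal to the intercept, a voicing effect on the intercept can be read as a difference in the overall height of a curve, and a voicing effect on the quadratic term as a difference in how U-shaped the curve is, which is how the parameter estimates above are interpreted.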
3. f0
The f0 results for the monosyllables and the second syl-
lable of disyllables are given in Figs. 8 and 9, respectively.
For monosyllables, the addition of voicing improved the
model for the stops [χ²(1) = 8.350, p = 0.004] and fricatives
[χ²(1) = 15.153, p < 0.001], and the addition of its interaction
with the linear time term improved the model for the fricatives
[χ²(1) = 11.224, p < 0.001] and sonorants [χ²(1) = 4.472,
p = 0.034]. Parameter estimates for the full model,
which include the effects of voicing and its interaction with
the linear, quadratic, and cubic time terms for the three man-
ners are summarized in Table III. With the voiceless/modal
category as the baseline, the negative intercepts indicated
that the f0s after the voiced/murmured consonants were sig-
nificantly lower than those after the voiceless/modal conso-
nants, and the positive coefficients for the interaction
between voicing and the linear time term indicated that the
f0s after the voiced/murmured consonants had sharper rising
slopes than those after the voiceless/modal consonants;
therefore, the f0 difference between the two types of onsets
decreased over the duration of the vowel. For the second syl-
lable in disyllables, however, only for the fricatives did the
addition of the laryngeal feature significantly improve the
model [χ²(1) = 3.849, p = 0.050]. No other model comparisons
were significant (all p > 0.12). Parameter estimates for
the full models indicated that the effects of voicing on the
intercept or higher time terms were not significant for any
manner, including the fricatives.
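The model comparisons in this section are reported as chi-square statistics with one degree of freedom, which is what a likelihood ratio test between two nested models differing by a single parameter would produce. The Python sketch below shows how such a statistic maps to a p-value. The function names and the log-likelihood values are hypothetical, and the assumption that the reported statistics come from likelihood ratio tests is ours; only the chi-square values are taken from the text.

```python
import math


def lrt_statistic(loglik_reduced, loglik_full):
    """Likelihood ratio statistic for two nested models."""
    return 2.0 * (loglik_full - loglik_reduced)


def p_value_df1(chi_square):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom.

    For df = 1, P(X > x) equals erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi_square / 2.0))


# Chi-square values quoted in the text for the monosyllable f0 models.
reported = {
    "stops, voicing": 8.350,
    "sonorants, voicing x linear time": 4.472,
}
for label, chi_square in reported.items():
    print(f"{label}: chi2(1) = {chi_square:.3f}, p = {p_value_df1(chi_square):.3f}")

# Hypothetical log-likelihoods, for illustration only.
print(lrt_statistic(loglik_reduced=-1204.6, loglik_full=-1200.4))
```

Under these assumptions the computed p-values agree with the reported ones to three decimal places, which serves as a quick consistency check on the statistics quoted above.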
4. Linear discriminant analysis
Consonant duration and CPP and f0 values averaged
over the entire vowel duration were used as the acoustic vari-
ables in the linear discriminant analysis. These variables
were selected as representatives of the acoustic properties of
the consonant, vowel phonation, and vowel f0. Consonant
duration was selected as the consonant cue as previous stud-
ies have primarily shown the perceptual effect of duration
(e.g., Wang, 2011; Gao and Hallé, 2013; Gao, 2015), and
Wang (2011) has shown that listeners did not use closure
voicing as a perceptual cue for stops. CPP was selected as
the phonation cue as our acoustic results above showed
stronger CPP effects than H1*-H2*. The variables were cen-
tered and scaled before being submitted to the discriminant
analysis.
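Centering and scaling puts the three predictors on a common footing, so that the discriminant coefficients can be compared across variables measured in different units. A minimal Python sketch of this step follows. The measurement values and the function name `center_and_scale` are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which scaling was applied.

```python
import statistics


def center_and_scale(values):
    """Return z-scores: subtract the mean, divide by the sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator), an assumption
    return [(v - mean) / sd for v in values]


# Hypothetical per-token measurements, for illustration only.
tokens = {
    "consonant_duration": [95.0, 110.0, 102.0, 88.0, 120.0],
    "cpp": [21.5, 19.8, 22.1, 20.4, 18.9],
    "f0": [0.42, -0.35, 0.51, 0.10, -0.60],
}

scaled = {name: center_and_scale(values) for name, values in tokens.items()}
for name, z_scores in scaled.items():
    print(name, [round(z, 2) for z in z_scores])
```

After this transformation each variable has a mean of zero and a standard deviation of one, so a larger absolute coefficient in the discriminant function indicates a larger contribution to the discrimination.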
Table IV summarizes the coefficients for the variables
for the linear discriminant functions as well as the Wilks's lambda, F, and p values for the discriminations. Significant
FIG. 3. CPP results over the duration of the vowels after stops, fricatives, and sonorants for monosyllables. Symbols represent observed data (vertical lines
indicate ±SE) and lines represent growth curve model fits using cubic orthogonal polynomials. *: p < 0.05; **: p < 0.01; ***: p < 0.001.
FIG. 4. H1*-H2* results over the duration of the vowels after stops, fricatives, and sonorants for the second syllable of disyllables. Symbols represent observed
data (vertical lines indicate ±SE) and lines represent growth curve model fits using cubic orthogonal polynomials. *: p < 0.05; **: p < 0.01; ***: p < 0.001.
FIG. 6. H1*-H2* and CPP results over the duration of sonorant onsets for monosyllables. Symbols represent observed data (vertical lines indicate ±SE)
and lines represent growth curve model fits using cubic orthogonal polynomials. *: p < 0.05; **: p < 0.01; ***: p < 0.001.
FIG. 7. H1*-H2* and CPP results over the duration of sonorant onsets for the second syllable of disyllables. Symbols represent observed data (vertical lines
indicate ±SE) and lines represent growth curve model fits using cubic orthogonal polynomials. *: p < 0.05; **: p < 0.01; ***: p < 0.001.
FIG. 8. Normalized f0 results over the duration of the vowels after stops, fricatives, and sonorants for monosyllables. Symbols represent observed data (vertical
lines indicate ±SE) and lines represent growth curve model fits using cubic orthogonal polynomials. *: p < 0.05; **: p < 0.01; ***: p < 0.001.
FIG. 9. Normalized f0 results over the duration of the vowels after stops, fricatives, and sonorants for the second syllable of disyllables. Symbols represent
observed data (vertical lines indicate ±SE) and lines represent growth curve model fits using cubic orthogonal polynomials. *: p < 0.05; **: p < 0.01; ***: p < 0.001.
FIG. 5. CPP results over the duration of the vowels after stops, fricatives, and sonorants for the second syllable of disyllables. Symbols represent observed
data (vertical lines indicate ±SE) and lines represent growth curve model fits using cubic orthogonal polynomials. *: p < 0.05; **: p < 0.01; ***: p < 0.001.
predictors, as indicated by stepwise variable selection, are
given in bold. “Voiceless/modal” was dummy coded as 0.
Therefore, a negative coefficient for a factor indicates that a
higher value for that factor is more likely to lead to a
“voiceless/modal” classification. For monosyllables (non-
sandhi), the only consistent predictor was f0; but for frica-
tives, both CPP and duration were significant as well, and
the stepwise analysis selected f0 first, then CPP, followed by
duration. For the second syllable in disyllables (sandhi), only
the fricatives could be significantly discriminated, and the
stepwise analysis selected duration first, then CPP.
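The sign convention described above can be made concrete with a small sketch. In the Python below, the coefficient values, the cut-off of zero, and the function names are all hypothetical and chosen only to illustrate the direction of the effects; they are not the values in Table IV, and the zero cut-off is an assumption that is only reasonable for centered predictors and balanced groups.

```python
# Hypothetical coefficients for standardized predictors (not the Table IV values).
COEFFICIENTS = {"f0": -1.10, "cpp": -0.45, "duration": -0.30}


def discriminant_score(z_scores, coefficients=COEFFICIENTS):
    """Weighted sum of standardized predictors."""
    return sum(coefficients[name] * z_scores[name] for name in coefficients)


def classify(score, threshold=0.0):
    """Map a score to a category, with "voiceless/modal" coded as 0.

    With negative coefficients, higher predictor values lower the score
    and therefore favor the "voiceless/modal" classification.
    """
    return "voiced/murmured" if score > threshold else "voiceless/modal"


# Hypothetical tokens expressed as z-scores.
high_f0_token = {"f0": 1.2, "cpp": 0.3, "duration": 0.1}
low_f0_token = {"f0": -1.0, "cpp": -0.4, "duration": -0.2}

for token in (high_f0_token, low_f0_token):
    score = discriminant_score(token)
    print(round(score, 3), classify(score))
```

In this sketch the token with above-average f0 receives a negative score and is classified as voiceless/modal, while the token with below-average f0 is classified as voiced/murmured, mirroring the interpretation of negative coefficients given in the text.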
C. Discussion
The acoustic results above indicate that this laryngeal
contrast in Shanghai is primarily a tone contrast in the non-
sandhi context (monosyllables), as although the H1*-H2* and
CPP comparisons between the voiceless/modal and voiced/
murmured categories were generally in the expected direction,
with the voiceless/modal consonants exhibiting numerically
lower H1*-H2* and higher CPP on the following vowel than
the voiced/murmured ones, only the CPP comparison for fri-
catives reached significance under the growth curve analysis;
f0 curves on the vowels after voiceless/modal and voiced/mur-
mured consonants, however, differed significantly on both the
intercept and slope for all three manners except for the slope
for stops. There are indications that the consonants themselves
still played a role in the contrast as the fricatives exhibited a
duration difference, while the sonorants exhibited a CPP dif-
ference based on the contrast. Moreover, the attenuation of
the f0 difference over the vowel after voiceless/modal vs
voiced/murmured consonants also suggests that the f0 differ-
ence, at least in part, stems from the onset consonants. The
LDAs provided the relative weighting of the acoustic cues
from consonant duration, vowel phonation, and vowel f0 and
corroborated the acoustic finding that the laryngeal contrast in
the non-sandhi context is primarily tonal, with secondary cues
from CPP and consonant duration for the fricatives.
In the sandhi context (second syllable of disyllables), the f0 difference
was neutralized, but the stops gained a voicing difference
despite losing the closure duration difference, and the fricatives
exhibited both duration and voicing differences. For the
sonorants, however, no difference between the modal and mur-
mured categories was detected in consonant duration, consonant
phonation, vowel phonation, or f0. The LDAs did not encode
the effect of voicing, but confirmed that f0 cannot be used to dis-
criminate the contrast, and that fricatives have enough second-
ary cues in duration and CPP to be differentiated.
These results show that the acoustic cues for the contrast
indeed vary by the manner and position in which the contrast
is realized. In the sandhi position where a phonological pro-
cess presumably neutralizes the main cue for the contrast—
f0, the contrast itself is incompletely neutralized for frica-
tives and arguably for stops, but completely neutralized for
sonorants as far as the measures included here are concerned.
The weakness of this contrast on sonorants hence finds some
support in the results.
Unlike in previous studies (e.g., Cao and Maddieson, 1992;
Ren, 1992; Gao, 2015), the H1*-H2* and CPP results here gen-
erally did not show a significant effect of the laryngeal feature.
For f0, although we showed that it significantly covaried with
the consonant feature in the non-sandhi context—a result shared
by all previous research—we did not find incomplete neutraliza-
tion in the sandhi context indicated by Ren (1992), Chen
(2011), and Wang (2011). There are two potential reasons for
these disparities. One is that, given our speakers were consider-
ably younger than the speakers used in earlier studies, it is possi-
ble that Shanghai is gradually losing the phonation difference,
and the contrast is now primarily cued by tone in the younger
generations (see Gao, 2015; Gao and Hallé, 2016, 2017, for age
and gender-based differences that support this contention).
Another possibility is that the different results are partly due to
the different statistical methods used. In the linear mixed-
effects-based growth curve analyses, the random effects struc-
ture included not only subject and item, but also subject-by-
voicing interaction. This helps reduce the type I error in hypoth-
esis testing (Barr et al., 2013), in this case, the effect of voicing.
TABLE III. Parameter estimates for the monosyllable f0 analysis. Baseline
TABLE IV. Coefficients for the variables for the linear discriminant functions, as well as the Wilks’s lambda, F, and p values for the discriminations.
TABLE VI. Acoustic measures of the base tokens for the perception experiment as well as when the f0 of the base tokens was switched to that of the other
laryngeal category (given in parentheses). H1*-H2*, CPP, and f0 were the average values over the vowel.
Kuznetsova, A., Brockhoff, B., and Christensen, H. (2016). “Tests in linear
mixed effects models,” available at https://cran.r-project.org/web/packages/
lmerTest/index.html (Last viewed August 3, 2018).
Laver, J. (1980). The Phonetic Description of Voice Quality (Cambridge
University Press, Cambridge, UK).
Lemon, S. C., Roy, J., Clark, M. A., Friedmann, P. D., and Rakowski, W.
(2003). “Classification and Regression Tree analysis in public health:
Methodological review and comparison with logistic regression,” Ann.
Behav. Med. 36, 172–180.
Lisker, L. (1986). “‘Voicing’ in English: A catalogue of acoustic features
signalling /b/ versus /p/ in trochees,” Lang. Speech 29, 3–11.
Llanos, F., Dmitrieva, O., Shultz, A., and Francis, A. L. (2013). “Auditory
enhancement and second language experience in Spanish and English
weighting of secondary voicing cues,” J. Acoust. Soc. Am. 134,
2213–2224.
Massaro, D. W. (1987). “Psychophysics versus specialized processes in
speech perception: An alternative perspective,” in The Psychophysics of Speech Perception, edited by M. E. H. Schouten (Martinus Nijhoff,
Boston), pp. 46–65.
Massaro, D., and Cohen, M. (1983). “Phonological context in speech
perception,” Percept. Psychophys. 34, 338–348.
McMurray, B., Cole, J. S., and Munson, C. (2011). “Features as an emergent
product of computing perceptual cues relative to expectations,” in Where Do Phonological Features Come From?: Cognitive, Physical and Developmental Bases of Distinctive Speech Categories, edited by G. N.
Clements, and R. Ridouane (John Benjamins, Amsterdam/Philadelphia),
pp. 197–235.
Mikuteit, S., and Reetz, H. (2007). “Caught in the ACT: The timing of aspi-
ration and voicing in Bengali,” Lang. Speech 50, 247–277.
Miller, A. L. (2007). “Guttural vowels and guttural co-articulation in
Ju|’hoansi,” J. Phonetics 35, 56–84.
Mirman, D. (2014). Growth Curve Analysis and Visualization Using R (CRC Press, Boca Raton, FL).
Newman, R. S. (2003). “Using links between speech perception and speech
production to evaluate different acoustic metrics: A preliminary report,”
J. Acoust. Soc. Am. 113, 2850–2860.
Parker, E. M., Diehl, R. L., and Kluender, K. R. (1986). “Trading relations
in speech and nonspeech,” Percept. Psychophys. 39, 129–142.
Port, R., and Crawford, P. (1989). “Incomplete neutralization and pragmat-
ics in German,” J. Phonetics 17, 257–282.
R Core Team (2014). “R: A language and environment for statistical com-
puting (version 3.1.0),” (R Foundation for Statistical Computing, Vienna),
available at http://www.R-project.org/ (Last viewed October 10, 2017).
Raphael, L. J. (1972). “Preceding vowel duration as a cue to the perception
of the voicing characteristic of word-final consonants in American
English,” J. Acoust. Soc. Am. 51, 1296–1303.
Ren, N.-Q. (1992). “Phonation types and stop consonant distinctions:
Shanghai Chinese,” Ph.D. dissertation, University of Connecticut, Storrs.
Repp, B. H. (1983). “Trading relations among acoustic cues in speech per-
ception are largely a result of phonetic categorization,” Speech Commun.
2, 341–361.
Shen, Z.-W., and Wang, W. S. (1995). “Wuyu zhuoseyin de yanjiu—Tongji
shang de fenxi he lilun shang de kaolü” (“A study of voiced stops in the
Wu dialects—Statistical analysis and theoretical considerations”), in
Wuyu Yanjiu (Studies of the Wu Dialects), edited by E. Zee (New Asia
Books, Hong Kong), pp. 219–238.
Shue, Y.-L., Keating, P., Vicenik, C., and Yu, K. (2011). “VoiceSauce: A
Program for Voice Analysis,” available at http://www.ee.ucla.edu/~spapl/
voicesauce/ (Last viewed November 1, 2015).
Shultz, A. A., Francis, A. L., and Llanos, F. (2012). “Differential cue
weighting in perception and production of consonant voicing,” J. Acoust.
Soc. Am. 132, EL95–EL101.
Sjölander, K. (2004). “The Snack Sound Toolkit,” available at http://
www.speech.kth.se/snack/ (Last viewed March 2, 2018).
Steriade, D. (1997). “Phonetics in phonology: The case of laryngeal neu-