3rd tone sandhi in Standard Chinese: A corpus approach
Jiahong Yuan¹ and Yiya Chen²
University of Pennsylvania¹, Leiden University²
Abstract
In Standard Chinese, a low tone (Tone3) is often realized with a rising F0 contour
before another low tone; this tone change is known as the 3rd tone sandhi. This study
investigated the acoustic characteristics of the 3rd tone sandhi in Standard Chinese in
telephone conversations and broadcast news speech. The sandhi rising tone was found to
differ from the lexical rising tone (Tone2) in disyllabic words on two measures: the
magnitude of the F0 rise and the time span of the F0 rise. We also found that word
frequency affected the realization of the sandhi rising tone. Specifically, the sandhi rising
tone in high-frequency words exhibited a smaller F0 rise (i.e., a greater difference from
the lexical rising tone) than in low-frequency words. This result suggests that
different processes may be involved in producing high- vs. low-frequency words in
Chinese.
Key Words: Tone, Tone sandhi, Conversation, Radio news, Corpus
1. Introduction
In lexical tone languages, in which fundamental frequency (F0) changes differentiate
word meanings, tones frequently undergo changes in connected speech, and surface with
F0 contours that differ from the canonical tonal shapes produced in isolation. This tonal
change process is commonly referred to as tone sandhi. During the last two decades, a
substantial body of research has been conducted on tone sandhi in various
Chinese dialects; this research culminated in the work by M. Chen (2000). Although
previous studies have greatly improved our understanding of tone sandhi phenomena
in general, a weakness of most (if not all) of them is that their generalizations are based
primarily on introspective judgments or on laboratory speech from a few speakers. It is
therefore desirable to complement the existing literature by examining the realization of
tone sandhi in large corpora of naturally occurring speech. The specific sandhi
phenomenon on which we focus in this paper is the 3rd (low) tone sandhi in Standard
Chinese, in which the first tone in a sequence of two low tones surfaces with a rising F0
that is comparable to, or neutralized with, the 2nd (rising) lexical tone of the language.
Previous linguistic studies of the 3rd tone sandhi have mainly been concerned with two
aspects of the phenomenon. The first aspect concerns the formation of the tone sandhi
domain (e.g., Shih, 1986; Zhang, 1988; Chen, 2000; Duanmu, 2000). The general
consensus in the literature is that disyllabic words with two low tones form a 3rd tone
sandhi domain, in which the first low tone changes to a sandhi rising (SR) tone. The
application of the 3rd tone sandhi across linguistic boundaries above the word level is
known to be determined by a number of factors such as syntactic structure, information
structure, speech prosody, and speaking rate (Speer et al., 1989; Shen, 1994; Shih, 1997;
Chen, 2003; Kuo et al., 2007).
The second aspect of 3rd tone sandhi concerns the exact phonetic nature of the derived
SR tone as compared with the lexical rising (LR) tone. The first well-known report
pertaining to the 3rd tone sandhi is Chao (1948), who described the change as the
replacement of the low tone with an LR tone. This view was challenged by two reports
that were published during the same period (Hockett, 1947; Martin, 1957); both
researchers described the SR tone in a stressed position as a new category that is similar
but not identical to the LR tone. In recent decades, the debate has centered on whether
the SR and LR tones are indeed completely neutralized and, if they are not, which
acoustic parameters differentiate them. Zee (1980), who to our knowledge conducted the
first instrumental investigation of the 3rd tone sandhi, showed, on the basis of data from
two native speakers of Beijing Mandarin, that derived SR tones are produced with a
lower dip and a lower ending F0 than LR tones. This subtle difference between SR and LR
tones was supported by later acoustic studies (Kratochvil, 1984; Shen, 1990; Xu, 1993,
1997; Peng, 2000; Kuo et al., 2007), although the reported magnitude of the difference
between the two tones has varied. Based on a review of the literature and an acoustic
study of the 3rd tone sandhi
in Taiwan Mandarin, Myers and Tsai (2003) proposed that the 3rd tone sandhi is
processed differently by different groups of Mandarin speakers: native speakers of
Beijing Mandarin apply the 3rd tone sandhi by phonetically modifying Tone3 so that it
sounds more similar to Tone2 whereas speakers of other varieties of Mandarin
categorically replace Tone3 with Tone2.
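The acoustic parameters discussed above (the F0 dip, the ending F0, and the size and duration of the rise) can be made concrete with a small sketch. It assumes an F0 track has already been extracted for one syllable at known frame times, and it defines the rise as running from the F0 minimum to the last voiced frame. These definitions, and the function and field names, are our illustrative assumptions, not necessarily the operationalization used in this study or in the studies cited.

```python
from dataclasses import dataclass


@dataclass
class RiseMeasures:
    dip_hz: float       # lowest F0 in the syllable (the "dip")
    end_hz: float       # F0 at the last voiced frame (the ending F0)
    rise_hz: float      # magnitude of the rise: ending F0 minus dip
    rise_span_s: float  # time span of the rise: from the dip to the last voiced frame


def rise_measures(times, f0):
    """Dip, ending F0, and F0-rise magnitude/time span for one syllable.

    times: frame times in seconds, ascending.
    f0:    F0 values in Hz, one per frame; unvoiced frames are 0 or None and skipped.
    """
    voiced = [(t, f) for t, f in zip(times, f0) if f]
    if not voiced:
        raise ValueError("syllable has no voiced frames")
    dip_t, dip_f = min(voiced, key=lambda p: p[1])  # first frame with the lowest F0
    end_t, end_f = voiced[-1]
    return RiseMeasures(dip_f, end_f, end_f - dip_f, end_t - dip_t)


# Toy contour (invented numbers): falls to a 150 Hz dip at 0.10 s, then rises to 200 Hz.
m = rise_measures([0.00, 0.05, 0.10, 0.15, 0.20], [180, 160, 150, 170, 200])
print(m.rise_hz, round(m.rise_span_s, 2))  # 50 0.1
```

Under these definitions, an SR tone with a lower dip and a lower ending F0 than an LR tone could still show a similar rise magnitude, which is why the dip, the endpoint, and the rise are kept as separate measures.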
Despite the consistent trend of differences reported between SR and LR tones, it has
remained unclear whether listeners can hear the difference. Wang and Li (1967)
conducted the first perceptual experiment to test the ability of listeners to differentiate
between SR and LR tones. In the experiment, the subjects were asked to identify whether
a prerecorded word was an SR-Tone3 word (e.g., qi3ma3, ‘at least’) or an LR-Tone3 word
(e.g., qi2ma3, ‘to ride on a horse’). For the 14 listeners who did not participate in the
recording of the stimuli, overall accuracy ranged from 49.2% to 54.2%, suggesting that
listeners cannot differentiate SR and LR tones in word identification experiments.
However, for the two subjects who recorded the stimuli, overall accuracy was above
chance, at 56.9% and 67.3%, respectively. Peng (2000) conducted a similar word
identification experiment and analyzed the identification results using signal detection
theory (Macmillan and Creelman, 2005). In her results, the mean sensitivity index A’ of
the 15 listeners was 0.50, which corresponds to chance performance. However, her
conclusion is problematic for two reasons. First, the standard deviation of A’ was very
high (0.17), indicating considerable variability in performance among the listeners.
Second, she calculated the rates of hits and false alarms in a manner that differs from
that typically applied in signal detection theory¹. Speer and Xu (2008) examined the
time course of resolving the lexical ambiguity created by the 3rd tone sandhi by tracking
listeners’ eye movements during a word-monitoring task. Surprisingly, they found that
when listeners heard an LR-Tone3 sequence, they made early glances at the character for
an SR tone, and when they heard an SR-Tone3 sequence, they made early glances at the
character for an LR tone. Their result suggested that the listeners were sensitive to the
fine-grained phonetic differences between LR and SR tones.
¹ In the study by Peng (2000), both the identification of Tone3 for underlying Tone2 and
the identification of Tone2 for underlying Tone3 were counted as false alarms. In standard
signal detection theory, however, only one of them should be treated as a false alarm,
depending on which tone is treated as the ‘positive’ (signal) category.
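To illustrate the standard treatment of hits and false alarms referred to in the footnote on Peng (2000), the sketch below computes hit and false-alarm rates for an SR/LR identification task and derives the nonparametric sensitivity index A' from them using its commonly used formula. The response data are invented, the function names are ours, and treating Tone2 as the signal category is an arbitrary choice (treating Tone3 as the signal works the same way with the roles swapped). This is not a reconstruction of Peng's (2000) analysis.

```python
def hit_and_fa_rates(responses):
    """responses: (underlying_tone, answered_tone) pairs, each "T2" or "T3".

    Tone2 (the LR word) is treated as the signal. A hit is answering T2 to an
    underlying-T2 word; a false alarm is answering T2 to an underlying-T3
    (sandhi) word. Answering T3 to an underlying-T2 word is a miss, not a
    second kind of false alarm.
    """
    t2_answers = [ans for und, ans in responses if und == "T2"]
    t3_answers = [ans for und, ans in responses if und == "T3"]
    hit_rate = sum(a == "T2" for a in t2_answers) / len(t2_answers)
    fa_rate = sum(a == "T2" for a in t3_answers) / len(t3_answers)
    return hit_rate, fa_rate


def a_prime(h, f):
    """Nonparametric sensitivity A' from hit rate h and false-alarm rate f.
    0.5 means chance performance; 1.0 means perfect discrimination."""
    if h == f:
        return 0.5
    if h > f:
        return 0.5 + (h - f) * (1 + h - f) / (4 * h * (1 - f))
    return 0.5 - (f - h) * (1 + f - h) / (4 * f * (1 - h))


# Invented data for one listener: 10 LR (T2) words and 10 SR (T3) words.
responses = [("T2", "T2")] * 8 + [("T2", "T3")] * 2 + [("T3", "T2")] * 4 + [("T3", "T3")] * 6
h, f = hit_and_fa_rates(responses)
print(h, f, round(a_prime(h, f), 3))  # 0.8 0.4 0.792
```

Because A' is computed per listener, a mean of 0.50 across listeners can hide individual listeners who are well above or below chance, which is consistent with the concern about the large standard deviation raised above.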
The studies reviewed above were all based on laboratory speech, with the exception of
Kratochvil (1984), which analyzed only one speaker. While we generally agree on the
importance and validity of laboratory speech for uncovering phonological patterns and
phonetic realizations (Xu, 2010), the small acoustic differences between the SR and LR
tones found in previous studies must be examined in more naturally
occurring speech. The same argument has been made regarding the nature of the
“incomplete neutralization” of the voicing contrast in a number of languages, such as
Dutch, German, and Catalan, in which underlying voiced word-final obstruents are
devoiced by a phonological process, yet phonetic studies have found small
differences between underlying voiced and voiceless word-final obstruents. There has
been extensive debate in the literature about whether the incomplete neutralization
of final voicing is an experimental artifact of orthography or of laboratory speech
(Fourakis and Iverson, 1984; Jassem and Richter, 1989; Port and Crawford, 1989;
Ernestus and Baayen, 2006; Warner et al., 2006; Kleber et al., 2010).
The goal of this study was to examine the acoustic difference between SR and LR
tones in large corpora of natural speech, extending our preliminary work reported in
Chen and Yuan (2007). The use of large corpora also provides an opportunity to examine
possible word frequency effects on the acoustic realization of SR and LR tones. Effects
of word frequency on speech production have been repeatedly reported in corpus studies
(e.g., Bybee 2002, on word-final /t/ and /d/ deletion rates; Patterson and Connine 2001,
on flap production; and Aylett and Turk 2004, on syllable duration). Zhao and Jurafsky
(2009) found that low-frequency words with mid-range tones in Cantonese are produced
with higher F0 than high-frequency words and that the F0 trajectories of less frequent
words are more dispersed than those of their more frequent counterparts. Zhang and Lai
(2010) demonstrated that, for Mandarin speakers, “wug” words (i.e., pseudowords) are
more resistant to the application of the 3rd tone sandhi than real words. In this paper,
we examined the possible effect of word frequency on the acoustic realization of SR
tones as compared with that of LR tones.
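One simple way to look for such a frequency effect is to compare the mean F0 rise for tokens of high- vs. low-frequency words. The sketch below does this under assumptions of our own: per-token rise values are already measured, word frequency is a raw corpus count, and word types are split at the median frequency. The split criterion, names, and data are hypothetical; the study may have treated frequency differently (for example, as a continuous predictor).

```python
from statistics import mean, median


def mean_rise_by_frequency(tokens, word_freq):
    """tokens:    (word, rise_hz) pairs, one per token of a sandhi (T3+T3) word.
    word_freq: word -> corpus frequency count.

    Hypothetical criterion: word types are split at the median frequency, and
    tokens of words above the median count as high-frequency.
    Returns (mean rise for high-frequency words, mean rise for low-frequency words).
    """
    cutoff = median(word_freq[w] for w in {w for w, _ in tokens})
    high = [r for w, r in tokens if word_freq[w] > cutoff]
    low = [r for w, r in tokens if word_freq[w] <= cutoff]
    return mean(high), mean(low)


# Invented words, counts, and rise values, for illustration only.
tokens = [("a", 10), ("a", 12), ("b", 30), ("c", 28), ("d", 40)]
word_freq = {"a": 1000, "b": 500, "c": 20, "d": 5}
hi_rise, lo_rise = mean_rise_by_frequency(tokens, word_freq)
print(round(hi_rise, 1), lo_rise)
```

Splitting word types (rather than tokens) at the median keeps a few very frequent words from deciding the cutoff on their own.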
2. Method
2.1. Data
Two large speech corpora were used in this study: the HKUST Mandarin Telephone
Speech (LDC2005S15) and the HUB4 Mandarin Broadcast News Speech (LDC98S73).
The broadcast news speech is formal read speech produced by well-trained
professional speakers of Standard Chinese; the telephone conversations were produced by
ordinary speakers of Standard Chinese, who may have various dialectal accents. Syllable
boundaries were obtained automatically through forced alignment using the Penn
Phonetics Lab Forced Aligner (Yuan and Liberman 2008). The CALLHOME Mandarin
Chinese Lexicon (LDC96L15) was used to identify words and their tonal sequences in the
corpora.
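Identifying tonal sequences amounts to looking up each word in a pronunciation lexicon and reading off the tone of each syllable. The sketch below assumes a hypothetical entry format of space-separated pinyin syllables, each with a trailing tone digit; it does not reproduce the actual file format of the CALLHOME lexicon, and the function names are ours.

```python
def tone_sequence(pron):
    """Tone digits of a space-separated pinyin string, e.g. "qi3 ma3" -> (3, 3).

    Assumes every syllable ends in a tone digit 1-5 (5 = neutral tone).
    """
    tones = []
    for syllable in pron.split():
        if syllable[-1] not in "12345":
            raise ValueError(f"no tone digit in syllable {syllable!r}")
        tones.append(int(syllable[-1]))
    return tuple(tones)


def build_tone_lookup(entries):
    """Map each word to its tone sequence.

    entries: (word, pinyin) pairs. If a word has several pronunciations,
    the last one wins; a real pipeline would need to disambiguate them.
    """
    return {word: tone_sequence(pron) for word, pron in entries}


lookup = build_tone_lookup([("起码", "qi3 ma3"), ("骑马", "qi2 ma3")])
print(lookup["起码"], lookup["骑马"])  # (3, 3) (2, 3)
```

The lexicon gives citation (underlying) tones, so 起码 'at least' is listed as T3+T3 even though its first syllable surfaces with a sandhi rise.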
We analyzed disyllabic words with four tonal sequences: low-low (T3+T3), low-rising
(T3+T2), rising-low (T2+T3), and rising-rising (T2+T2). The main comparison in this
paper is between the realization of Tone3 and Tone2 when each is followed by Tone3
(T3+T3 vs. T2+T3). As a control, we compared T3+T2 and T2+T2 sequences, in which the
3rd tone sandhi does not apply. Table 1 lists the total number of tokens of each
tonal sequence used in the study.
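The selection and grouping described above can be sketched as follows, assuming each word token has already been mapped to a tuple of citation tones (e.g., through a lexicon lookup). The names and the toy input are illustrative only; the counts printed are for the toy input, not the figures in Table 1.

```python
from collections import Counter

# The four disyllabic tonal sequences analyzed, keyed by (first tone, second tone).
SEQUENCES = {(3, 3): "T3+T3", (3, 2): "T3+T2", (2, 3): "T2+T3", (2, 2): "T2+T2"}

# Main comparison: Tone3 vs. Tone2 before Tone3 (the sandhi context).
# Control: Tone3 vs. Tone2 before Tone2, where the 3rd tone sandhi does not apply.
COMPARISONS = {"main": ("T3+T3", "T2+T3"), "control": ("T3+T2", "T2+T2")}


def count_sequences(token_tones):
    """token_tones: one tone tuple per word token, e.g. (3, 3).
    Tokens that are not disyllabic, or whose sequence is not one of the
    four analyzed, are skipped."""
    return Counter(SEQUENCES[t] for t in token_tones if t in SEQUENCES)


counts = count_sequences([(3, 3), (2, 3), (3, 3), (1, 4), (3, 2), (2, 2, 3)])
for name, (a, b) in COMPARISONS.items():
    print(name, a, counts[a], b, counts[b])
# main T3+T3 2 T2+T3 1
# control T3+T2 1 T2+T2 0
```

Keeping the control pair alongside the main pair means that any difference found only in the main comparison can be attributed to the Tone3 context rather than to a general Tone3 vs. Tone2 difference in first syllables.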
Table 1: Total number of tokens for different tonal sequences.