  • 7/27/2019 Rythmic Metrics of Korean

    1/18

    Rhythm Metrics of Spoken Korean*


    Tae-Yeoub Jang

    (Hankuk University of Foreign Studies)

    Tae-Yeoub Jang. 2009. Rhythm metrics of spoken Korean.

    Language and Linguistics. 46, 169-186. In this paper, the rhythm

    characteristics of spoken Korean are investigated. Instead of

    conventional methods in which rhythm events are examined by

    researchers' impressionistic judgments and interpretation,

    calculation of numerical measures is performed so that a large

    amount of data tokens can be processed. Besides, the technique of

    automatic speech recognition (ASR) is adopted for the purpose of

    future application of measured values to practical systems. Results show that some metrics are useful for characterizing the

    rhythm structure of Korean utterances as compared to that of other

    languages, especially those traditionally classified as stress-based

    languages. It is also discussed whether the conventional dichotomy of

    rhythm classification is valid and plausible.

    Keywords: Korean rhythm, rhythm metrics, syllable-based rhythm, stress-based

    rhythm

    1. Introduction

    The traditional binary distinction of speech rhythm, i.e.,

    syllable-timed or stress-timed (Pike 1945, Abercrombie 1967), has

    been considerably weakened by phonetic studies

    providing experimental evidence against inter-stress or inter-syllable isochrony (Pointon 1980, Dauer 1983, Roach 1982,

    among others). The terminology has also become more generalized

    to syllable-based and stress-based rhythm (Laver 1994: 528),

    which will be adopted in the current study.

    * This work was supported by Hankuk University of Foreign Studies Research Fund

    of 2008.

    Recent studies on speech rhythm tend to be focused on

    discovering acoustic characteristics of rhythm that will distinguish

    one language from another regardless of whether it is classified as either stress-based or syllable-based. Metrics such as %V,

    ΔV, ΔC (Ramus et al. 1999), VarcoC, VarcoV (Dellwo 2006, 2008),

    nPVI-V, rPVI-C (Grabe and Low 2002) have been found to be

    useful for characterizing the rhythmic structure of various spoken

    languages.

    A number of studies have investigated, through one or more

    metrics, various languages other than English, including Chinese (Lin and Wang 2007), Japanese (Murty et al. 2007),

    Thai-Polish (Grabe and Low 2002), Estonian (Grabe and Low 2002,

    Asu and Nolan 2005), and Dutch-French-Spanish (White and

    Mattys 2007). These studies attempt to show that cross-linguistic

    comparisons of rhythm structures between various languages are

    convenient and valid when variability metrics are employed.

    Speech rhythm of Korean has most frequently been categorized into the class of languages with syllable-based rhythm. However,

    it has also been classified as stress-based (Lee 1982) or even

    mora-based (Cho 2004). Moreover, it has not yet been

    systematically investigated in terms of various rhythm metrics

    which were discovered relatively recently. Consequently, in the

    current study, eight metrics (%V, ΔV, ΔC, VarcoC, VarcoV, nPVI-V,

    rPVI-C, and Speech rate) are utilized in an attempt to characterize

    the rhythm structure of spoken Korean. As it will be hard to


    interpret the results alone, which are in the form of numerical values, they will be compared with corresponding values extracted

    from previous studies on rhythm metrics of other languages such as

    English, Dutch, Spanish, and French. This will lead to further

    discussions and tentative determination of which type of rhythm

    better describes spoken Korean.

    The conventional way of rhythm investigation mainly depends

    upon the impressionistic judgment and interpretation of speech data. Even most of the studies that characterize speech rhythm

    using fairly recent rhythm metrics calculate the measures

    from manually annotated data.

    A shortcoming of this method is that only a relatively

    small amount of data can be examined. In contrast, in the

    current study, a large number of speech tokens will be analyzed fully

    automatically so that more reliable and robust results can be derived. These results can be directly applied to practical systems such as speech

    recognition/synthesis machines and pronunciation education tools.

    2. Rhythm measures

    The basic idea underlying various rhythm metrics is that

    stress-based languages have more variable temporal characteristics

    among their vocalic and/or consonantal intervals of spoken

    utterances than syllable-based languages. The phonological contrast

    triggered by stressing or de-stressing of syllables will bring about

    such differences in acoustic quality, especially temporal

    characteristics, accompanying elongation of stressed syllables or

    shortening of unstressed or reduced syllables. Therefore, duration of

    syllables will become more variable in stress-based languages than


    in syllable-based languages. The syllable structure of stress-based languages is also expected to be more complex as assigning stress is

    usually related to the weight of the syllables, for instance, by

    allowing consonant clusters in the onset and/or coda position of a

    syllable. This will generate greater variability among consonantal

    intervals.

    Eight rhythm metrics that are most frequently investigated in

    many languages have been employed in the current study. The notion of each metric can be summarized as in (1).

    (1) a. %V: proportion of vowel intervals (Ramus et al. 1999)
        b. ΔV: raw variation of vowel intervals (Ramus et al. 1999)
        c. ΔC: raw variation of consonantal intervals (Ramus et al. 1999)
        d. VarcoC: rate-normalized variability of consonantal intervals (Dellwo 2006)
        e. VarcoV: rate-normalized variability of vocalic intervals (Dellwo 2006, White and Mattys 2007)
        f. nPVI-V: normalised Pairwise Variability Index (Grabe and Low 2002)
        g. rPVI-C: raw Pairwise Variability Index (Grabe and Low 2002)
        h. Speech rate: number of syllables per second

    The metric %V is calculated by dividing the total duration of

    vowel intervals by the total duration of the utterance. Utterance-internal

    pauses and non-speech parts, which are frequently found in

    non-native speech, are excluded in the current study. The

    measures ΔV and ΔC are obtained by calculating the standard

    deviations of vowel and consonant interval durations, respectively. VarcoV and

    VarcoC are rate-normalised versions of ΔV and ΔC. They are derived by

    dividing the raw values by the mean duration of the corresponding intervals. The PVI values

    represent variations between adjacent intervals of vowels (nPVI-V) and consonants


    (rPVI-C). These pair-wise indices are expected to capture the complexities of neighboring syllables. The last metric, speech rate, has been

    included in order to check whether the other metrics play a sustained role as

    rhythm type discriminators regardless of the articulation rate of utterances, as well

    as to examine the characteristics of normal speech rate of Korean in

    comparison to other languages.
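    The definitions above can be sketched in code. The following is a minimal illustration, not the program used in the study: the function names, the input format (a list of vocalic and consonantal intervals with durations in milliseconds, pauses already removed) and the use of the population standard deviation are all assumptions made for the example.

```python
from statistics import mean, pstdev


def pairwise_raw(durations):
    """rPVI: mean absolute difference between successive intervals."""
    return mean(abs(a - b) for a, b in zip(durations, durations[1:]))


def pairwise_normalised(durations):
    """nPVI: each difference is divided by the mean of the pair, times 100."""
    return 100 * mean(
        abs(a - b) / ((a + b) / 2) for a, b in zip(durations, durations[1:])
    )


def rhythm_metrics(intervals, n_syllables):
    """Compute the eight metrics for one utterance.

    intervals: list of (kind, duration_ms), kind is "V" or "C" (assumed format).
    n_syllables: syllable count of the utterance.
    Needs at least two vocalic and two consonantal intervals.
    """
    v = [d for kind, d in intervals if kind == "V"]
    c = [d for kind, d in intervals if kind == "C"]
    total_ms = sum(v) + sum(c)
    delta_v = pstdev(v)  # assumption: population SD
    delta_c = pstdev(c)
    return {
        "%V": 100 * sum(v) / total_ms,
        "deltaV": delta_v,
        "deltaC": delta_c,
        "VarcoV": 100 * delta_v / mean(v),
        "VarcoC": 100 * delta_c / mean(c),
        "nPVI-V": pairwise_normalised(v),
        "rPVI-C": pairwise_raw(c),
        "rate": n_syllables / (total_ms / 1000),  # syllables per second
    }


# Toy utterance with two syllables (made-up durations, for illustration only).
example = rhythm_metrics([("C", 100), ("V", 100), ("C", 50), ("V", 200)], 2)
```

    Passing intervals rather than phones keeps the sketch independent of any particular label set; merging adjacent vowels or consonants into single intervals is assumed to happen beforehand.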

    The basic expectation is that values of variability metrics (b to g,

    in (1) above) of Korean, if categorized as a syllable-based language, will be less than those of English or other stress-based languages,

    presumably due to lack of duration-sensitive prosodic events such

    as stress and pitch accents. On the other hand, %V is expected to

    be greater in Korean than that of stress-based languages

    considering that relatively complicated syllable-internal consonant

    clusters (e.g., three consecutive consonants at the onset position or

    four at the coda position in English syllables) are not phonetically permissible in Korean. Finally, the speech rate, measured by the

    number of syllables per unit time, of Korean utterances is

    expected to be closer to that of other syllable-based languages such

    as Spanish or French than stress-based languages, assuming that

    syllables of syllable-based languages in general comprise a smaller

    number of segments than those of stress-based languages. These hypotheses

    will be tested in the experiment.

    3. Data and methods

    Speech data tokens are extracted from a large database named A

    read speech corpus of Seoul Korean (2003) developed and released

    by The National Institute of the Korean Language.1) After

    sentence-unit tokenization and other necessary preprocessing, phonetic annotation is performed to generate detailed phone-level

    boundary information. Given the size of the data, automatic

    segmentation and labeling techniques are employed to avoid the

    excessive time and effort required for manual annotation.

    1) The corpus is publicly available at http://www.korean.go.kr.

    3.1. Data

    Assuming no rhythm variation by speakers' age, I pick 40

    speakers whose age varies from 20's to 40's: 10 males and 10

    females in their 20's, 10 males in their 30's and 10 females in their

    40's. The recording prompt of the corpus, according to the corpus

    documentation, is composed of passages from 19 Korean fairy tales

    and short novels. The number of sentences for each speaker to read

    is 930 and there is a total of 36,410 speech tokens used in the

    current analysis.2)

    Speech data files are stored as digital waveform files with

    16-bit quantization and a 16 kHz sampling rate.

    3.2. Automatic segmentation

    As the rhythm metrics calculation is based on temporal

    information of each phone in the speech tokens, phone-level

    segmentation and annotation is the most important procedure prior

    to metrics calculation. As is mostly the case, this is achieved by

    constructing a phone-level automatic speech recognizer, whose

    implementation is summarised in Table 1.

    2) The total number should be 37,200 (930 sentences x 40 speakers), but 790 tokens are excluded: erroneous files caused by defective recording, and files with

    sentences so short (e.g., a-ni-o) that the extraction of utterance-level

    rhythm structure is considered to be inappropriate.

    Training data        KAIST data (Park et al. 1995) with 10,863 read speech tokens by 89 Korean speakers
    Units                34 phoneme-like units including silence
    Modeling             3-state continuous left-to-right Hidden Markov Models
    Features             39-dimensional feature vectors: 12 MFCC + 1 energy + 13 deltas + 13 delta-deltas
    System enhancement   utterance-internal pause/silence modeling;
                         dictionary expansion through pronunciation variation modelling

    Table 1. Automatic phone recognizer specification
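    To illustrate how 13 static coefficients (12 MFCC + 1 energy) become a 39-dimensional vector, the sketch below appends deltas and delta-deltas computed with a commonly used regression formula. The window size of 2, the edge handling by frame replication, and the function names are assumptions for this example; the paper does not state these settings.

```python
def deltas(frames, window=2):
    """Regression deltas over +/- `window` frames (edges replicate the end frames).

    frames: list of equal-length lists of floats, one list per time frame.
    """
    n = len(frames)
    denom = 2 * sum(t * t for t in range(1, window + 1))
    out = []
    for i in range(n):
        row = []
        for d in range(len(frames[i])):
            num = 0.0
            for t in range(1, window + 1):
                nxt = frames[min(i + t, n - 1)][d]
                prv = frames[max(i - t, 0)][d]
                num += t * (nxt - prv)
            row.append(num / denom)
        out.append(row)
    return out


def stack_features(static):
    """Concatenate static, delta and delta-delta coefficients per frame."""
    d1 = deltas(static)
    d2 = deltas(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]


# Five toy frames of 13 static coefficients that rise by 1 per frame.
static = [[float(i)] * 13 for i in range(5)]
stacked = stack_features(static)  # each frame now has 13 * 3 = 39 values
```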

    Automatic segmentation is conducted by the tool named Hidden

    Markov Model Tool Kit (HTK) version 3.2 (Young et al. 2003). Figure 1 is an example of autolabels compared with the

    corresponding labels produced by hand.

    Figure 1. Comparison between autolabel (upper tier) and handlabel (lower

    tier): labels of a single utterance "tuk wi-lo ol-la seoss-ta". This autolabel file

    has been randomly selected from the whole set of autolabels and the

    corresponding handlabeling was performed by myself without referring to the

    already-produced autolabels.

    As illustrated, the two labeling methods agree closely. It

    has been verified that over 94% of autolabels are less than 10 msec apart


    from their corresponding handlabels. This does not necessarily mean that the autolabeler is responsible for the 6% mismatch. In fact, errors caused by

    hand tend to be frequent and unpredictable enough that it is premature to

    determine which method is more reliable, apart from the time required. As a

    consequence, it is assumed, in the current analysis, that autolabeling does not

    significantly undermine the quality of measurement.
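    The agreement figure reported above (the share of autolabel boundaries within 10 msec of the handlabel boundaries) could be computed along the following lines. This is a sketch under the assumption that both labelings contain the same number of boundaries in the same order; the function name and the sample times are made up for illustration.

```python
def boundary_agreement(auto_ms, hand_ms, tolerance_ms=10.0):
    """Fraction of boundaries whose auto and hand times differ by < tolerance."""
    if len(auto_ms) != len(hand_ms) or not auto_ms:
        raise ValueError("need two non-empty, equally long boundary lists")
    within = sum(
        1 for a, h in zip(auto_ms, hand_ms) if abs(a - h) < tolerance_ms
    )
    return within / len(auto_ms)


# Hypothetical boundary times in msec for one short utterance.
auto = [0.0, 105.0, 212.0, 330.0]
hand = [0.0, 100.0, 200.0, 335.0]
agreement = boundary_agreement(auto, hand)  # 3 of 4 boundaries agree
```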

    3.3. Calculation of metrics

    Once phonetic labels are provided, all the rhythm metrics are

    calculated, again automatically, for each utterance token.

    An overall value for each of the eight metrics is then generated by averaging the

    corresponding metric values over all tokens.

    A program is created to enable this task to be performed fully

    automatically, as shown in Figure 2.

    Figure 2. Metrics calculation procedure
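    The averaging step can be sketched as follows. The data layout (one dictionary of metric values per token) and the use of the sample standard deviation are assumptions for the example, not details given in the paper.

```python
from statistics import mean, stdev


def summarize(per_token):
    """Return {metric: (mean, standard deviation)} over all tokens.

    per_token: list of dicts, each mapping a metric name to its value
    for one utterance token (assumed layout). Needs at least two tokens.
    """
    summary = {}
    for name in per_token[0]:
        values = [token[name] for token in per_token]
        summary[name] = (mean(values), stdev(values))
    return summary


# Two made-up tokens, for illustration only.
tokens = [
    {"%V": 50.0, "nPVI-V": 60.0},
    {"%V": 58.0, "nPVI-V": 62.0},
]
overall = summarize(tokens)
```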

    3.4. Cross-linguistic comparison

    Interpretation of metrics is conducted by comparing their values with


    corresponding metric values of English, Dutch, Spanish and French that are reported in White and Mattys (2007). Although there are other studies

    measuring rhythm metrics of various languages, it is not appropriate to

    employ their results for comparison since the methods used are not uniform

    as they occasionally modify the formula of rhythm metrics for the purpose of

    language specific adaptation.

    However, validity of the present comparison is still restricted

    mainly because the data used in the current experiment and the data of the other languages in White and Mattys (2007) are not

    homogeneous. No more than 30 tokens are used for each of four

    languages in their experimentation while a much greater number of

    data tokens are currently being analyzed for Korean. Inevitably, the

    standard deviation of each metric value for Korean will be a lot

    greater than that for other languages. Certainly, it would have been

    a more parallel comparison if the metric values for other languages had been calculated from a larger database, but this difference does not

    make it inappropriate to compare those results as the algorithm for

    each rhythm metric has been consistent in both studies.

    Nevertheless, due to the difference in data size, the cross-linguistic

    comparison is to be performed without statistical significance tests.

    4. Experimental results

    Table 2 shows the metric values of Korean and four other languages.

    Each value is the mean (and standard deviation) of values averaged

    over all tokens.


                             Syllable-based           ?            Stress-based
                             Spanish     French       Korean       English     Dutch
    %V                       48 (0.8)    45 (0.5)     54 (6.9)     38 (0.5)    41 (1.2)
    ΔV                       32 (1.9)    44 (2.2)     64 (22.5)    49 (2.2)    49 (2.6)
    ΔC                       40 (2.3)    51 (3.6)     49 (18.3)    59 (2.4)    49 (4.1)
    VarcoV                   41 (2.0)    50 (0.9)     64 (14.0)    64 (1.7)    65 (1.5)
    VarcoC                   46 (2.0)    44 (0.8)     59 (14.0)    47 (1.0)    44 (1.8)
    nPVI-V                   36 (1.6)    50 (1.8)     61 (13.2)    73 (1.2)    82 (2.4)
    rPVI-C                   43 (2.1)    56 (4.3)     55 (18.6)    70 (2.8)    52 (4.2)
    Speech rate (syl/sec)    8.0 (0.3)   5.6 (0.3)    6.4 (0.9)    5.2 (0.2)   6.0 (0.3)

    Table 2. Mean values (and their standard deviations) of rhythm metrics of Korean as compared to Spanish, French, English and Dutch. Values of languages other than Korean

    are based on White and Mattys (2007).

    Results show that the metrics %V and nPVI-V seem to verify the

    hypothesis that the Korean speech rhythm is syllable-based.

    In particular, the %V value of Korean is greater than that of all the

    other languages considered. This indicates that Korean utterances are

    composed of a relatively larger proportion of vocalic intervals than

    English utterances, conforming to the notion that a language closer

    to the ideal syllable-based language requires its syllables to contain

    relatively less complex consonantal clusters than languages with the

    ideal stress-based rhythm.

    Unexpectedly, VarcoV of Korean is closer to that of stress-based

    languages, and ΔV, ΔC and rPVI-C do not seem to give useful

    information on classification, either. On the other hand, the speech

    rate of Korean seems, at a glance, to be meaningful as it is greater than that of English and smaller than that of Spanish. However, further

    investigations are necessary to confirm its crucial role in


    characterizing speech rhythm, as various other factors including speaker style, text type, and recording environment are believed to

    affect the speech rate to some degree. Thus, it is premature to regard

    speech rate as a critical cue for distinguishing between syllable-based and

    stress-based rhythm.

    All in all, on the assumption that Korean has a more

    syllable-based than stress-based rhythm, the two metrics

    %V and nPVI-V are helpful for describing its rhythm structure. Consequently, the rhythm structures of Korean and the other four

    languages are represented in Figure 3 in terms of those two rhythm

    measures.

    Figure 3. Rhythm structure of five languages (Korean, Spanish, French, Dutch and

    English) based on rhythm metrics with respect to vocalic intervals: the

    coordinate values of each language represent (nPVI-V, %V).

    At a glance, based on the imaginary separation line, Korean

    appears to be classified as a syllable-based language together with


    Spanish and French. This observation, however, can be somewhat illusory when we consider each of the two metrics independently of

    the other. While %V locates Korean uppermost, resulting in a greater

    distance from English and Dutch, the other metric, nPVI-V, puts

    Korean somewhere in the middle on a continuum between two

    extreme types of rhythm. This implies that those two measures are

    not sufficient to define the rhythm class of Korean, although it is

    more likely that spoken Korean bears a fairly different rhythm structure from conventionally classified syllable-based languages.
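    The point about the two metrics disagreeing can be checked directly from the mean values in Table 2. The short sketch below orders the five languages along each axis separately; the variable names are illustrative only.

```python
# (nPVI-V, %V) mean values taken from Table 2.
TABLE2 = {
    "Spanish": (36, 48),
    "French": (50, 45),
    "Korean": (61, 54),
    "English": (73, 38),
    "Dutch": (82, 41),
}

# Ascending nPVI-V: low values are expected for syllable-based rhythm.
by_npvi = sorted(TABLE2, key=lambda lang: TABLE2[lang][0])

# Descending %V: high values are expected for syllable-based rhythm.
by_percent_v = sorted(TABLE2, key=lambda lang: TABLE2[lang][1], reverse=True)

print(by_npvi)       # Korean falls in the middle of this ordering
print(by_percent_v)  # Korean comes first in this ordering
```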

    Before concluding, on the basis of these two metrics, that the

    rhythm dichotomy is misleading or that characterization of rhythm

    varies depending on individual rhythm metrics, it is necessary to

    check if those metrics are seriously affected by another factor. The

    most suspicious factor is speech rate, as it has been argued by other

    studies that such metrics as ΔV and ΔC are susceptible to the rate of speech (Barry et al. 2003, Dellwo and Wagner 2003). If values of

    %V and nPVI-V are also found to change significantly in accordance

    with variation of speech rate, they should not be considered to play

    an important role in characterizing rhythm as they might be

    vulnerable to other factors.

    In order to perform this verification, 100 slow speech tokens (4-5

    syl/sec) and 100 fast (6-7 syl/sec) speech tokens are randomly picked and the metrics for each token are calculated. Then, the

    significance of the difference between metrics from the data with the two different

    speech rates is assessed through two-tailed t-tests. The result is shown

    in Table 3.
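    A rough sketch of this verification step is given below. It is not the procedure actually used: the paper does not say which variant of the t-test was applied, so Welch's unequal-variance form is assumed, and the two-tailed p value is approximated with the normal distribution, which is reasonable only for samples as large as the 100 tokens per group used here. The token layout and function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, variance


def pick_by_rate(tokens, low, high, n, seed=0):
    """Randomly pick n tokens whose speech rate lies in [low, high] syl/sec."""
    pool = [t for t in tokens if low <= t["rate"] <= high]
    return random.Random(seed).sample(pool, n)


def welch_t(a, b):
    """Return (t, degrees of freedom, approximate two-tailed p value)."""
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    # Assumption: normal approximation to the t distribution (large samples).
    p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p_approx
```

    With per-token metric dictionaries that also carry a "rate" entry, one would call pick_by_rate(tokens, 4, 5, 100) and pick_by_rate(tokens, 6, 7, 100), then pass the two lists of %V (or nPVI-V) values to welch_t.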


                      Slow (4-5 syl/sec)    Fast (6-7 syl/sec)    p value
    No. of Tokens     100                   100
    V                 55.1                  54.7                  p


    of prosody processing systems and pronunciation education tools. Based on the metrics utilized in the current experiment, Korean

    seems to be categorized as a language with the syllable-based

    rhythm structure, but not as distinct from stress-based languages as

    Spanish or French. More comprehensive comparisons with metrics

    extracted from other languages are expected to give further clues to

    discover whether it makes better sense to describe speech rhythm in relative

    terms on a continuum between the two stereotypes of speech rhythm, i.e., extremely syllable-timed and extremely stress-timed, instead of relying on

    the traditional stronger version of binary classification. Still another

    possibility, not yet obvious, is that even a weak version of the dichotomy is not

    appropriate as individual metrics can categorize each language in quite

    different manners. To come closer to the clarification, more languages need to

    be tested in terms of other rhythm measures that have been employed in

    previous research, or that are still to be discovered in the future.


    References

    Abercrombie, D. (1967) Elements of General Phonetics. Edinburgh:

    Edinburgh University Press.

    Asu, E. L. and F. Nolan (2005) Estonian rhythm and the Pairwise

    Variability Index. In Proceedings, Fonetik 2005, 29-32.

    Göteborg University, Sweden.

    Barry, W. J., B. Andreeva, M. Russo, S. Dimitrova and T. Kostadinova (2003) Do rhythm measures tell us anything

    about language type? In Proceedings of the 15th

    International Congress of Phonetic Sciences, 2693-2696.

    Barcelona.

    Cho, Moon-Hwan. (2004) Rhythm typology of Korean speech.

    Cognitive Process 5: 249-253.

    Dauer, R. M. (1983) Stress-timing and syllable-timing reanalyzed. Journal of Phonetics 11: 51-62.

    Dellwo, V. (2006) Rhythm and speech rate: A variation coefficient

    for delta C. In P. Karnowski and I. Szigeti (eds.), Language

    and Language Processing: Proceedings of the 38th

    Linguistic Colloquium, 231-241, Piliscsaba 2003. Frankfurt:

    Peter Lang.

    Dellwo, V. (2008) The role of speech rate in perceiving speech

    rhythm. In Speech Prosody 2008, Proceedings of Speech

    Prosody 2008 series. Campinas, 375-378.

    Dellwo, V. and Wagner, P. (2003). Relations between language

    rhythm and speech rate. In Proceedings of the 15th

    International Congress of Phonetic Sciences, 471-474.

    Barcelona.

    Grabe, E. and E. L. Low (2002) Durational variability in speech

    and the rhythm class hypothesis. Papers in Laboratory


    Phonology 7: 515-546. Berlin: Mouton.

    Laver, J. (1994) Principles of Phonetics. Cambridge: Cambridge

    University Press.

    Lee, Hyun-Bok. (1982) A phonetic study on Korean rhythm.

    Malsori [Speech Sounds] 4: 31-48. The Korean Society of

    Phonetic Sciences and Speech Technology (in Korean).

    Lin, Hua and Qian Wang (2007) Mandarin rhythm: An acoustic

    study. Journal of Chinese Linguistics and Computing 17(3): 127-140.

    Murty, L., T. Otake and Anne Cutler (2007) Perceptual tests of

    rhythmic similarity: I. mora rhythm. Language and Speech

    50(1): 77-99.

    Nazzi, T., J. Bertoncini and J. Mehler (1998) Language

    discrimination by newborns: toward an understanding of the

    role of rhythm. Journal of Experimental Psychology 24(3): 756-766.

    Park, J., O. Kwon, D. Kim, I. Choi, H. Jeong and C. Un (1995)

    Speech data collection for Korean speech recognition. The

    Journal of the Acoustic Society of Korea 14(4): 74-81. (in

    Korean).

    Pike, K. (1945) The Intonation of American English. Ann Arbor:

    University of Michigan Press.

    Pointon, G. (1980) Is Spanish really syllable-timed? Journal of

    Phonetics 8: 293-305.

    Ramus, F., M. Nespor and J. Mehler (1999) Correlates of linguistic

    rhythm in the speech signal. Cognition 73: 265-292.

    Roach, P. (1982) On the distinction between stress-timed and

    syllable-timed languages. In D. Crystal (ed.), Linguistic

    Controversies, 73-79. London: Edward Arnold.

    Young, S., G. Evermann, D. Kershaw, G. Moore, J. Odell, D.


    Ollason, D. Povey, V. Valtchev and P. Woodland (2003) HTK Book (for version 3.2). Cambridge University Press.

    White, L. and S. L. Mattys (2007) Calibrating rhythm: first

    language and second language studies. Journal of Phonetics

    35: 501-522.

    Department of English Linguistics

    Hankuk University of Foreign Studies

    270 Imun-dong, Dongdaemun-gu, Seoul 130-791, Korea

    e-mail: [email protected]

    Received: Aug. 31, 2009

    Revised: Oct. 15, 2009

    Accepted: Oct. 18, 2009
