W3C Workshop on Internationalizing SSML
SSML Extension for Korean
Workshop: 2005/11/02 (Wed)
Sang-Jin Kim [email protected]
Dec 29, 2015

Contents
  Characteristics of Korean
  SSML Extension for Chinese Characters in Korean
  SSML Extension for Homograph Words in Korean
  Conclusion

Characteristics of Korean
Hangul, the Korean Character
  Consists of forty letters: 21 vowels (including 13 diphthongs) and 19 consonants
  Syllable types: V, CV, VC, and CVC (C: consonant, V: vowel)
  Eojeol, the word phrase, is different from a phrase in English
  Korean is completely different from Japanese except for the grammatical structure
  Korean is completely different from Chinese, although it has borrowed many Chinese words and some Chinese characters
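
The V, CV, VC, and CVC syllable types listed above can be read off a written Hangul syllable with a little arithmetic. The sketch below is only an illustration and is not part of the proposal: it assumes precomposed Unicode syllables (U+AC00 to U+D7A3), which Unicode lays out as 19 initial consonants x 21 vowels x 28 final slots, and it treats the placeholder letter ㅇ in initial position as "no consonant". The function name `syllable_type` is hypothetical, and the classification follows the spelling only, before any pronunciation rules are applied.

```python
# Unicode composes each precomposed Hangul syllable as
#   0xAC00 + (initial * 21 + medial) * 28 + final
HANGUL_BASE = 0xAC00
NUM_INITIALS = 19    # initial consonant letters
NUM_MEDIALS = 21     # vowel letters
NUM_FINALS = 28      # 27 final consonant letters/clusters + "no final"
SILENT_INITIAL = 11  # index of the placeholder ㅇ, silent in initial position


def syllable_type(syllable: str) -> str:
    """Return 'V', 'CV', 'VC', or 'CVC' for one precomposed Hangul syllable."""
    if len(syllable) != 1:
        raise ValueError("expected exactly one character")
    offset = ord(syllable) - HANGUL_BASE
    if not 0 <= offset < NUM_INITIALS * NUM_MEDIALS * NUM_FINALS:
        raise ValueError("not a precomposed Hangul syllable")
    initial, rest = divmod(offset, NUM_MEDIALS * NUM_FINALS)
    final = rest % NUM_FINALS
    onset = "" if initial == SILENT_INITIAL else "C"
    coda = "C" if final != 0 else ""
    return onset + "V" + coda


if __name__ == "__main__":
    for s in "아가안한":
        print(s, syllable_type(s))
```

The constants 19 and 21 match the consonant and vowel counts given on the slide; the 28 final slots are a property of the Unicode encoding rather than of the alphabet itself.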

Characteristics of Korean
Vowels in Hangul, the Korean Character
  Monophthong vowels classified according to tongue position and height

Characteristics of Korean
Consonants in Hangul, the Korean Character
  Consonants classified according to place and manner of articulation

SSML Extension for Chinese Characters in Korean
Chinese Characters in Korean
  At present, Korean and Japanese use many Chinese characters
  But the pronunciation of the characters is different
  The same character is written differently according to the country
  The simplified characters are not used in Korea

SSML Extension for Chinese Characters in Korean
Chinese Characters in Korean
  We can write text only with Korean characters
  It is not unusual to use Chinese characters as well
  The pronunciation of the two written forms is exactly the same

SSML Extension for Chinese Characters in Korean
Chinese Characters in Korean TTS
  The input text for a text-to-speech (TTS) system has to be converted into a phonetic list
  If Chinese characters are mixed with Korean characters, they have to be replaced with Korean characters
  We don't use all Chinese characters; rather, there is a list of frequently used Chinese characters recommended by the Korean government, and its size is 2000
  We need to use this list and its pronunciations in the Korean TTS system, since the pronunciations differ from those in Chinese and Japanese
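
The substitution step described above can be sketched as a per-character lookup. This is a minimal illustration rather than the presenter's implementation: the table `HANJA_READINGS` is a hypothetical six-entry excerpt standing in for the recommended character list, the function name `hanja_to_hangul` is made up, and characters whose reading depends on context are not handled.

```python
# Toy excerpt of a Korean-reading lexicon for Chinese characters.
# A real system would load the full recommended list instead (assumption).
HANJA_READINGS = {
    "大": "대",
    "韓": "한",
    "民": "민",
    "國": "국",
    "學": "학",
    "校": "교",
}


def hanja_to_hangul(text: str, readings: dict = HANJA_READINGS) -> str:
    """Replace each known Chinese character with its Korean reading.

    Characters missing from the table (Hangul, spaces, punctuation,
    unlisted Chinese characters) pass through unchanged.
    """
    return "".join(readings.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(hanja_to_hangul("大韓民國"))
    print(hanja_to_hangul("學校에 간다"))
```

After this step the text contains only Korean characters and can go on to the usual phonetic conversion.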

SSML Extension for Chinese Characters in Korean
SSML Extension for Chinese Characters in Korean
  The same Chinese characters are pronounced differently according to the country
  <lexicon xml:lang="ko" uri="http://www.multilingual.org/lexicon.file"/>
  <lexicon xml:lang="ko-CN" uri="http://www.multilingual.org/Chinese_lexicon_freq_KR.file"/>
  <lexicon xml:lang="ko-CN" uri="http://www.multilingual.org/Chinese_lexicon_technical.file"/>
  <lexicon xml:lang="ja-KR" uri="http://www.multilingual.org/Chinese_lexicon_JP.file"/>
  <lexicon xml:lang="cn-KR" uri="http://www.multilingual.org/Chinese_lexicon_CN.file"/>
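
To show how a processor might consume the proposed markup, the following sketch groups lexicon URIs by their xml:lang value using Python's standard `xml.etree.ElementTree`. Several things here are assumptions: the xml:lang attribute on lexicon is the extension being proposed, not existing SSML; the bare `<speak>` wrapper omits the namespace and version a real SSML document carries; and the function name `lexicons_by_lang` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Simplified wrapper for illustration; lexicon lines are taken from the slide.
SSML = """<speak>
  <lexicon xml:lang="ko" uri="http://www.multilingual.org/lexicon.file"/>
  <lexicon xml:lang="ko-CN" uri="http://www.multilingual.org/Chinese_lexicon_freq_KR.file"/>
  <lexicon xml:lang="ko-CN" uri="http://www.multilingual.org/Chinese_lexicon_technical.file"/>
  <lexicon xml:lang="ja-KR" uri="http://www.multilingual.org/Chinese_lexicon_JP.file"/>
  <lexicon xml:lang="cn-KR" uri="http://www.multilingual.org/Chinese_lexicon_CN.file"/>
</speak>"""


def lexicons_by_lang(ssml_text: str) -> dict:
    """Map each xml:lang value to the lexicon URIs declared for it, in document order."""
    root = ET.fromstring(ssml_text)
    grouped = defaultdict(list)
    for lexicon in root.iter("lexicon"):
        grouped[lexicon.get(XML_LANG, "")].append(lexicon.get("uri"))
    return dict(grouped)


if __name__ == "__main__":
    for lang, uris in lexicons_by_lang(SSML).items():
        print(lang, uris)
```

With such a grouping, a Korean TTS engine could pick the "ko-CN" lexicons when it meets Chinese characters in Korean text and ignore the ones tagged for other readings.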

SSML Extension for Homograph Words in Korean
Homograph Words in Korean
  Same word, different pronunciation, different meaning
  The difference is "duration"

SSML Extension for Homograph Words in Korean
SSML Extension for Homograph Words in Korean
  The only difference between these words is the duration in pronunciation
  It is necessary to give the duration information to a TTS system for these kinds of words
  The SSML recommendation supports the "say-as" and "sub" elements, but these elements cannot handle the above problem successfully

SSML Extension for Homograph Words in Korean
SSML Extension for Homograph Words in Korean
  We suggest a "tone" tag for this problem
  The attribute values 'long', 'short', and 'default' for the tone element would be enough for Korean
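
As an illustration of the suggested tag, the sketch below wraps a word in a tone element with one of the three values. It assumes the attribute is called `type`, as written on the conclusion slide; the tone element is a proposal and not part of the SSML recommendation, and the helper name `tone_markup` is hypothetical. The example word 눈 is a well-known Korean pair: with a long vowel it means "snow", with a short vowel it means "eye".

```python
import xml.etree.ElementTree as ET

# The three values suggested on the slide.
ALLOWED_TYPES = {"long", "short", "default"}


def tone_markup(word: str, tone_type: str = "default") -> str:
    """Wrap a word in the proposed (non-standard) tone element."""
    if tone_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported tone type: {tone_type!r}")
    elem = ET.Element("tone", {"type": tone_type})
    elem.text = word
    return ET.tostring(elem, encoding="unicode")


if __name__ == "__main__":
    print(tone_markup("눈", "long"))   # long vowel: "snow"
    print(tone_markup("눈", "short"))  # short vowel: "eye"
```

A TTS engine that understood this markup would lengthen or shorten the vowel accordingly; how much is left to the engine.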

Conclusion
SSML Extension for Chinese Characters in Korean
  The lexicon element doesn't support the "xml:lang" attribute
  We suggest the values xml:lang="ko", xml:lang="ko-CN", xml:lang="ja-KR", and xml:lang="cn-KR"
SSML Extension for Homograph Words in Korean
  The "say-as" and "sub" elements cannot handle the homograph problem successfully
  We suggest a "tone" element
  The attribute values type="long", type="short", and type="default" would be enough for Korean