W3C Workshop on Internationalizing SSML: SSML Extension for Korean. Workshop: 2005/11/02 (Wed). Sang-Jin Kim, sangjin@icu.ac.kr

Page 1:

W3C Workshop on Internationalizing SSML

SSML Extension for Korean

Workshop : 2005/11/02 (Wed)

Sang-Jin Kim sangjin@icu.ac.kr

Page 2:

Contents

Characteristic of Korean
SSML Extension for Chinese Characters in Korean
SSML Extension for Homograph Words in Korean
Conclusion

Page 3:

Characteristic of Korean

Hangul, the Korean script
Consists of forty letters: 21 vowels (including 13 diphthongs) and 19 consonants
Syllable structures: V, CV, VC, and CVC (C: consonant, V: vowel)

Eojeol, the Korean word phrase, is different from a phrase in English
Korean is completely different from Japanese except for the grammatical structure
Korean is completely different from Chinese, although it has borrowed many Chinese words and some Chinese characters
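The V, CV, VC, and CVC shapes can be read directly off the Unicode code point of a precomposed Hangul syllable, because Unicode lays the syllables out arithmetically as 19 initial consonants by 21 vowels by 28 final positions (27 finals plus "no final"). The following is a minimal sketch under that encoding; the function name `syllable_shape` is hypothetical, it describes the written shape only (a final cluster such as ㄳ counts as a single C), and it treats the placeholder letter ㅇ in initial position as "no consonant".

```python
# Unicode encodes each precomposed Hangul syllable as:
#   code = 0xAC00 + (lead * 21 + vowel) * 28 + tail
HANGUL_BASE = 0xAC00
LEAD_COUNT = 19      # initial consonants
VOWEL_COUNT = 21     # vowels, matching the 21 vowels on the slide
TAIL_COUNT = 28      # 27 final consonants plus "no final"
SILENT_LEAD = 11     # index of the placeholder letter ㅇ in initial position


def syllable_shape(ch: str) -> str:
    """Return 'V', 'CV', 'VC' or 'CVC' for one precomposed Hangul syllable."""
    offset = ord(ch) - HANGUL_BASE
    if not 0 <= offset < LEAD_COUNT * VOWEL_COUNT * TAIL_COUNT:
        raise ValueError(f"{ch!r} is not a precomposed Hangul syllable")
    lead, rest = divmod(offset, VOWEL_COUNT * TAIL_COUNT)
    tail = rest % TAIL_COUNT
    onset = "" if lead == SILENT_LEAD else "C"
    coda = "C" if tail else ""
    return onset + "V" + coda


if __name__ == "__main__":
    for syllable in "아가안한":
        print(syllable, syllable_shape(syllable))
```

For the four syllables in the usage loop, 아 is V, 가 is CV, 안 is VC, and 한 is CVC.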

Page 4:

Characteristic of Korean

Vowels in Hangul, the Korean script
Monophthong vowels are classified according to tongue position and height

Page 5:

Characteristic of Korean

Consonants in Hangul, the Korean script
Consonants are classified according to place and manner of articulation

Page 6:

SSML Extension for Chinese Characters in Korean

Chinese Characters in Korean
Present-day Korean and Japanese use many Chinese characters, but the pronunciation of the characters is different
The same characters are written differently depending on the country
The simplified character forms are not used in Korea

Page 7:

SSML Extension for Chinese Characters in Korean

Chinese Characters in Korean
We can write text with Korean characters only
It is not unusual to use Chinese characters as well
The pronunciation of the two written forms is exactly the same

Page 8:

SSML Extension for Chinese Characters in Korean

Chinese Characters in Korean TTS
The input text for a text-to-speech (TTS) system has to be converted into a phonetic list
If Chinese characters are mixed with Korean characters, they have to be replaced with their Korean equivalents
Not all Chinese characters are used; the Korean government recommends a list of 2000 frequently used Chinese characters
The Korean TTS system needs this list and its pronunciations, since they differ from the Chinese and Japanese pronunciations
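The substitution step described on this slide can be sketched as a table lookup from each Chinese character to its Korean reading. This is only an illustration under stated assumptions: the six-entry table `HANJA_READINGS` and the function `substitute_hanja` are hypothetical stand-ins for the full recommended list, and the sketch ignores characters with more than one reading and readings that change with position in a word.

```python
# Tiny illustrative table (assumption); a real system would load the full
# recommended list of frequently used characters with their Korean readings.
HANJA_READINGS = {
    "韓": "한",
    "國": "국",
    "大": "대",
    "學": "학",
    "漢": "한",
    "字": "자",
}


def substitute_hanja(text: str, readings: dict[str, str] = HANJA_READINGS) -> str:
    """Replace each listed Chinese character with its Korean reading.

    Characters missing from the table are passed through unchanged so that
    a later stage can decide how to handle them.
    """
    return "".join(readings.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(substitute_hanja("大學에서 漢字를 배운다"))
```

Passing unknown characters through unchanged, rather than dropping them, keeps the gap visible to whatever fallback lexicon comes next.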

Page 9:

SSML Extension for Chinese Characters in Korean

The same Chinese characters have different pronunciations depending on the country

<lexicon xml:lang="ko" uri="http://www.multilingual.org/lexicon.file"/>
<lexicon xml:lang="ko-CN" uri="http://www.multilingual.org/Chinese_lexicon_freq_KR.file"/>
<lexicon xml:lang="ko-CN" uri="http://www.multilingual.org/Chinese_lexicon_technical.file"/>

<lexicon xml:lang="ja-KR" uri="http://www.multilingual.org/Chinese_lexicon_JP.file"/>
<lexicon xml:lang="cn-KR" uri="http://www.multilingual.org/Chinese_lexicon_CN.file"/>
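One way a processor might consume the proposed attribute is to group the lexicon URIs by their `xml:lang` value, so that a Chinese character inside Korean text is looked up in the `ko-CN` lexicons before any others. The sketch below assumes the five elements are wrapped in a bare `<speak>` root without the SSML namespace, and the helper name `lexicons_by_lang` is hypothetical; it only collects the declarations and does not fetch the files.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# The slide's five declarations, wrapped in a namespace-free <speak> root
# (an assumption made to keep the sketch short).
SSML = """
<speak>
  <lexicon xml:lang="ko" uri="http://www.multilingual.org/lexicon.file"/>
  <lexicon xml:lang="ko-CN" uri="http://www.multilingual.org/Chinese_lexicon_freq_KR.file"/>
  <lexicon xml:lang="ko-CN" uri="http://www.multilingual.org/Chinese_lexicon_technical.file"/>
  <lexicon xml:lang="ja-KR" uri="http://www.multilingual.org/Chinese_lexicon_JP.file"/>
  <lexicon xml:lang="cn-KR" uri="http://www.multilingual.org/Chinese_lexicon_CN.file"/>
</speak>
"""


def lexicons_by_lang(ssml: str) -> dict[str, list[str]]:
    """Map each xml:lang value to its lexicon URIs, in document order."""
    table: dict[str, list[str]] = {}
    for element in ET.fromstring(ssml).iter("lexicon"):
        lang = element.get(XML_LANG, "")
        table.setdefault(lang, []).append(element.get("uri", ""))
    return table


if __name__ == "__main__":
    for lang, uris in lexicons_by_lang(SSML).items():
        print(lang, uris)
```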

Page 10:

SSML Extension for Homograph Words in Korean

Homograph Words in Korean
Same word, different pronunciation, different meaning
The difference is “duration”

Page 11:

SSML Extension for Homograph Words in Korean

The only difference between these words is the duration in pronunciation
It is necessary to give the duration information to a TTS system for these kinds of words
The SSML recommendation supports the “say-as” and “sub” elements, but these elements cannot handle the above problem successfully

Page 12:

SSML Extension for Homograph Words in Korean

We suggest a “tone” tag for this problem
The attribute values ‘long’, ‘short’ and ‘default’ for the tone element would be enough for Korean
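To make the proposal concrete, the sketch below marks up the homograph 눈, which means "snow" with a long vowel and "eye" with a short one, and splits the text into segments labelled with the requested duration class. Several things here are assumptions: the attribute name `type` is taken from the conclusion slide, the sample sentence and the function `tone_segments` are hypothetical, and the `<speak>` root omits the SSML namespace. How a synthesizer would turn "long" or "short" into actual durations is left open, since the slides give no numeric values.

```python
import xml.etree.ElementTree as ET

ALLOWED_TYPES = {"long", "short", "default"}

# "Snow fell, so I closed my eyes": the first 눈 is long, the second short.
SAMPLE = (
    '<speak><tone type="long">눈</tone>이 내려서 '
    '<tone type="short">눈</tone>을 감았다.</speak>'
)


def tone_segments(ssml: str) -> list[tuple[str, str]]:
    """Split a <speak> fragment into (text, duration class) pairs.

    Text outside any <tone> element is labelled "default".
    """
    root = ET.fromstring(ssml)
    segments: list[tuple[str, str]] = []
    if root.text and root.text.strip():
        segments.append((root.text.strip(), "default"))
    for child in root:
        if child.tag == "tone" and child.text:
            kind = child.get("type", "default")
            if kind not in ALLOWED_TYPES:
                raise ValueError(f"unsupported tone type: {kind!r}")
            segments.append((child.text, kind))
        if child.tail and child.tail.strip():
            segments.append((child.tail.strip(), "default"))
    return segments


if __name__ == "__main__":
    for text, kind in tone_segments(SAMPLE):
        print(kind, text)
```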

Page 13:

Conclusion

SSML Extension for Chinese Characters in Korean
The lexicon element doesn’t support the “xml:lang” attribute
We suggest the values xml:lang="ko", xml:lang="ko-CN", xml:lang="ja-KR", and xml:lang="cn-KR"

SSML Extension for Homograph Words in Korean
The “say-as” and “sub” elements cannot handle the homograph problem successfully
We suggest a “tone” element
The attribute values type="long", type="short", and type="default" would be enough for Korean