SSML Extensions for Chinese Voice Browsing
Helen MENG, Wai-Kit LO, Tien-Ying FUNG, Yuk-Chi LI and Zhiyong WU
Human-Computer Communications Laboratory, Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong
2nd November, 2005

Page 1: SSML Extensions  for  Chinese Voice Browsing

SSML Extensions for Chinese Voice Browsing

Helen MENG, Wai-Kit LO, Tien-Ying FUNG, Yuk-Chi LI and Zhiyong WU

Human-Computer Communications Laboratory
Department of Systems Engineering and Engineering Management
The Chinese University of Hong Kong

2nd November, 2005

Page 2: SSML Extensions  for  Chinese Voice Browsing

Outline

• Characteristics of Chinese
• Proposed attributes for existing elements
  – "dialect-accent"
• Proposed elements
  – <phrase> and <word>
  – <tone>
• Proposed attribute values
  – for the "interpret-as" attribute in the <say-as> element
• Summary

Page 3: SSML Extensions  for  Chinese Voice Browsing

Characteristics of Chinese

• Rich in dialects, e.g., Cantonese, Shanghainese, Mandarin
  – Write alike, speak differently
    • similar writing system; e.g., 中国 and 中國
    • significantly different pronunciations
  – Mandarin with different accents
• No explicit phrase and word boundaries
  – e.g., 我們現在在開電話會議: 我們 (we are) 現在 (now) 在開 (having) 電話會議 (a teleconference)
  – proper segmentation is critical for prosodic control, pronunciation selection for homographs and resolution of semantic ambiguity
• Monosyllabic and tonal
  – Syllable + Lexical Tone → lexical meaning of Chinese character
    • tone can change according to meaning, context, mode of speaking

Page 4: SSML Extensions  for  Chinese Voice Browsing

Phonetic Transcription Schemes

• Pronunciation of a character = tonal syllable = syllable + tone
• Many transcription schemes developed for different dialects
  – syllable in the Roman alphabet
  – tone as a one-digit Arabic number
  – Popular schemes are
    • pinyin (for Mandarin): 銀行 (bank): /yin2 hang2/
    • jyutping (for Cantonese): 銀行 (bank): /ngan4 hong4/
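The "syllable + tone" decomposition above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the pattern assumes plain ASCII letters followed by a single tone digit, which covers the pinyin and jyutping examples on this page but not every spelling either scheme allows.

```python
import re
from typing import List, Tuple

# A tonal syllable is a Roman-letter syllable followed by a one-digit tone,
# e.g. "yin2" (pinyin) or "ngan4" (jyutping).
TONAL_SYLLABLE = re.compile(r"^([a-z]+)([0-9])$")


def split_tonal_syllable(token: str) -> Tuple[str, int]:
    """Split one tonal syllable into (syllable, tone)."""
    match = TONAL_SYLLABLE.match(token.strip().lower())
    if match is None:
        raise ValueError(f"not a tonal syllable: {token!r}")
    return match.group(1), int(match.group(2))


def parse_transcription(text: str) -> List[Tuple[str, int]]:
    """Parse a slash-delimited transcription such as '/yin2 hang2/'."""
    return [split_tonal_syllable(tok) for tok in text.strip("/ ").split()]


print(parse_transcription("/yin2 hang2/"))   # Mandarin, pinyin
print(parse_transcription("/ngan4 hong4/"))  # Cantonese, jyutping
```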

Page 5: SSML Extensions  for  Chinese Voice Browsing

Chinese Tone Systems

Figure 1. Mandarin tone system (4 tones + 1 'light' tone)
  (1). 陰平 /yin ping/, high level, e.g., 媽
  (2). 陽平 /yang ping/, low level, e.g., 麻
  (3). 上 /shang/, rising, e.g., 馬
  (4). 去 /qu/, going, e.g., 罵

Figure 2. Cantonese tone system (9 tones, specified in 6 classes)
  (1). 陰平, high level, e.g., 詩
  (2). 陰上, high rising, e.g., 史
  (3). 陰去, high going, e.g., 試
  (4). 陽平, low level, e.g., 時
  (5). 陽上, low rising, e.g., 市
  (6). 陽去, low going, e.g., 事
  7 (class 1). 陰入, high entering, e.g., 色
  8 (class 3). 中入, middle entering, e.g., 舌
  9 (class 6). 陽入, low entering, e.g., 蝕

Axes of the figures: level of F0, time.
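The Cantonese figure labels the entering tones 7, 8 and 9 with the classes 1, 3 and 6. A minimal sketch of that nine-to-six mapping follows; the function and table names are illustrative and not part of the proposal.

```python
# Entering tones 7, 8 and 9 are specified by the classes 1, 3 and 6,
# as labelled in the Cantonese tone figure.
ENTERING_TO_CLASS = {7: 1, 8: 3, 9: 6}


def tone_class(tone: int) -> int:
    """Map a Cantonese tone number (1-9) to its class (1-6)."""
    if 1 <= tone <= 6:
        return tone
    if tone in ENTERING_TO_CLASS:
        return ENTERING_TO_CLASS[tone]
    raise ValueError(f"Cantonese tones are numbered 1-9, got {tone}")


for tone in range(1, 10):
    print(tone, "->", tone_class(tone))
```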

Page 6: SSML Extensions  for  Chinese Voice Browsing

"dialect-accent"

Beijing Mandarin

Guangdong Mandarin

Hong Kong Cantonese

Page 7: SSML Extensions  for  Chinese Voice Browsing

Proposed "dialect-accent" Attribute

• Specify dialects and accents in a language
  – use with xml:lang [XML1.0]
  – dialect-accent = primary-subtag ["-" optional-subtag]
  – primary-subtag = 2ALPHA
    • specifies the dialect
    • e.g., MD for Mandarin, CT for Cantonese
  – optional-subtag = 2ALPHA
    • specifies the accent
    • e.g., BJ for Beijing, GD for Guangdong, HK for Hong Kong
    • follows the abbreviations of Chinese provinces, autonomous regions and special administrative regions listed in the EDU.CN Domain Policy (中國教育和科研計算機網 EDU.CN 網絡域名註冊辦法) (note 1)
  – examples
    • Mandarin with Beijing and Guangdong accents: MD-BJ, MD-GD
    • Cantonese with Hong Kong and Guangdong accents: CT-HK, CT-GD

1 Defined by the China Education and Research Network Information Centre (CERNET 網絡信息中心)
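The grammar above (a two-letter primary subtag, optionally followed by "-" and a two-letter subtag) is small enough to check with a regular expression. The sketch below is one possible reading: the function name is hypothetical, and upper-casing the result is an assumption, since the slide does not say whether the value is case-sensitive.

```python
import re
from typing import Optional, Tuple

# dialect-accent = primary-subtag ["-" optional-subtag], each subtag 2ALPHA.
DIALECT_ACCENT = re.compile(r"^([A-Za-z]{2})(?:-([A-Za-z]{2}))?$")


def parse_dialect_accent(value: str) -> Tuple[str, Optional[str]]:
    """Return (dialect, accent); accent is None when the subtag is omitted."""
    match = DIALECT_ACCENT.match(value)
    if match is None:
        raise ValueError(f"invalid dialect-accent value: {value!r}")
    dialect, accent = match.group(1), match.group(2)
    # Assumption: normalise to upper case, as in the examples MD-BJ, CT-HK.
    return dialect.upper(), (accent.upper() if accent else None)


print(parse_dialect_accent("MD-BJ"))  # Mandarin with Beijing accent
print(parse_dialect_accent("CT"))     # Cantonese, accent unspecified
```

Checking the accent subtag against the EDU.CN abbreviation list would need that list as data, which is left out here.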

Page 8: SSML Extensions  for  Chinese Voice Browsing

"dialect-accent" Attribute (continued)

<p>Hello, where are you from?</p>

<p xml:lang="zh-CN" dialect-accent="MD-BJ">我從北京來的。</p>
  Mandarin with Beijing accent: 我 (I am) 從 (from) 北京 (Beijing) 來的。

<p xml:lang="zh-CN" dialect-accent="MD-GD">我從廣東來的。</p>
  Mandarin with Guangdong accent: 我 (I am) 從 (from) 廣東 (Guangdong) 來的。

<p xml:lang="zh-CN" dialect-accent="CT-HK">我從香港來的。</p>
  Cantonese with Hong Kong accent: 我 (I am) 從 (from) 香港 (Hong Kong) 來的。

xml:lang value   Dialect     Accent      "dialect-accent" value
zh-HK            Cantonese   Hong Kong   CT-HK
zh-HK            Cantonese   Guangdong   CT-GD
zh-HK            Mandarin    Hong Kong   MD-HK
zh-HK            Mandarin    Beijing     MD-BJ
zh-HK            Mandarin    Taiwan      MD-TW

Page 9: SSML Extensions  for  Chinese Voice Browsing

<phrase> and <word> elements

Page 10: SSML Extensions  for  Chinese Voice Browsing

Enrich <p>, <s> with <phrase>, <word>

• Current SSML 1.0: <p> and <s>
• Proposed elements: <phrase> and <word>
  – Serve as cues for prosodic control (e.g., pause)
  – Assist correct pronunciation selection for homographs
• A Cantonese example
  – The character 行 has FIVE pronunciations
    /haang4/ 行山 (hiking)
    /hang6/ 品行 (discipline)
    /hong2/ 洋行 (foreign trading company)
    /hong4/ 銀行 (bank)
    /hang4/ 行人 (pedestrian)
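To illustrate why word boundaries help with homographs, the five readings of 行 listed above can be keyed by the word that contains the character. This is a toy lookup, not the authors' method: the table holds only the five words from this page, and the function name is made up for the example.

```python
from typing import List, Optional, Tuple

# Jyutping reading of 行 in each word listed on this page.
HANG_READINGS = {
    "行山": "haang4",  # hiking
    "品行": "hang6",   # discipline
    "洋行": "hong2",   # foreign trading company
    "銀行": "hong4",   # bank
    "行人": "hang4",   # pedestrian
}


def readings_of_hang(words: List[str]) -> List[Tuple[str, Optional[str]]]:
    """For each segmented word containing 行, look up the reading of 行.

    Words missing from the toy table map to None.
    """
    return [(word, HANG_READINGS.get(word)) for word in words if "行" in word]


print(readings_of_hang(["銀行", "行人"]))
```

Once a sentence has been segmented into words, the lookup is unambiguous; on an unsegmented character string the same character would have five candidate readings.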

Page 11: SSML Extensions  for  Chinese Voice Browsing

Proposed <phrase> Element

• Definition:
  – Defines the extent of a Chinese phrase
  – No attributes
  – Occurs within <s>
  – These elements can be nested within <phrase>:
    • <audio>, <break>, <emphasis>, <mark>, <phoneme>, <prosody>, <say-as>, <sub>, <voice>, <word>
• Example (an ancient poem): 終年倒運少有餘財
  – Pessimistic phrasing
    • <phrase>終年倒運</phrase> <phrase>少有餘財</phrase>
    • 終年倒運: "Whole year unlucky"; 少有餘財: "Not much money left"
  – Optimistic phrasing
    • <phrase>終年倒運少</phrase> <phrase>有餘財</phrase>
    • 終年倒運少: "Only with a few unlucky events in the year"; 有餘財: "Have money left"
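As a rough sketch of how an authoring tool might emit the two phrasings, the snippet below wraps a list of phrases in the proposed <phrase> element inside an <s>. It uses Python's standard xml.etree.ElementTree; the helper name is hypothetical and <phrase> is the proposed element, not part of SSML 1.0.

```python
import xml.etree.ElementTree as ET
from typing import List


def phrase_markup(phrases: List[str]) -> str:
    """Wrap each phrase in a proposed <phrase> element inside one <s>."""
    sentence = ET.Element("s")
    for text in phrases:
        ET.SubElement(sentence, "phrase").text = text
    return ET.tostring(sentence, encoding="unicode")


pessimistic = ["終年倒運", "少有餘財"]
optimistic = ["終年倒運少", "有餘財"]

# Both phrasings cover the same character string; only the boundary moves.
assert "".join(pessimistic) == "".join(optimistic)

print(phrase_markup(pessimistic))
print(phrase_markup(optimistic))
```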

Page 12: SSML Extensions  for  Chinese Voice Browsing

Proposed <word> Element

• Definition:
  – Defines the extent of a Chinese word
  – No attributes
  – Occurs within <s> and <phrase>
  – These elements can be nested within <word>:
    • <audio>, <break>, <emphasis>, <mark>, <phoneme>, <prosody>, <say-as>, <sub>, <voice>
• Example: 這一晚會如常舉行
  – Segmentation 1
    • <word>這一</word> <word>晚會</word> <word>如常</word> <word>舉行</word>
    • 這一晚會 (this banquet) 如常 (as usual) 舉行 (hold): "This banquet is held as usual"
  – Segmentation 2
    • <word>這一晚</word> <word>會</word> <word>如常</word> <word>舉行</word>
    • 這一晚 (tonight) 會 (will) 如常 (as usual) 舉行 (hold): "Tonight will be held as usual"
  – 會 is pronounced /wui3/ or /wui2/ depending on the segmentation
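Going the other way, a processor receiving the markup needs to recover the word list. The sketch below parses a sentence with the standard library's ElementTree and collects the text of each proposed <word> element; the function name is illustrative, and stripping the whitespace around each word is an assumption about how such markup would be treated.

```python
import xml.etree.ElementTree as ET
from typing import List


def words_in(sentence_xml: str) -> List[str]:
    """Return the text of every <word> element, in document order."""
    root = ET.fromstring(sentence_xml)
    return [(word.text or "").strip() for word in root.iter("word")]


segmentation_1 = (
    "<s><word> 這一 </word> <word> 晚會 </word> "
    "<word> 如常 </word> <word> 舉行 </word></s>"
)
segmentation_2 = (
    "<s><word> 這一晚 </word> <word>會 </word> "
    "<word> 如常 </word> <word> 舉行 </word></s>"
)

print(words_in(segmentation_1))
print(words_in(segmentation_2))
```

The two word lists join to the same character string, so the difference between the readings lies entirely in where the boundaries fall.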

Page 13: SSML Extensions  for  Chinese Voice Browsing

<tone> element

Page 14: SSML Extensions  for  Chinese Voice Browsing

Proposed <tone> Element

• Tone
  – Important in Chinese pronunciation
  – Tones can vary according to differences in meaning, context and mode of speaking
  – 相
    • in tone 2 means photo
    • in tone 3 means facial appearance / minister
• Current SSML 1.0: <phoneme>
  – Requires pronunciation transcription
  – Example
    <phoneme alphabet="x-lshk-jyutping" ph="soeng2">相</phoneme>
    <phoneme alphabet="x-lshk-jyutping" ph="soeng3">相</phoneme>
• Proposed <tone> element
  – with the required "value" attribute
    <tone value="2">相</tone> (photo)
    <tone value="3">相</tone> (facial appearance)
  – inherits the alphabet attribute, or specifies it explicitly
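One way to read the proposal is that a <tone> element is shorthand for the corresponding <phoneme> element once the base syllable is known. The sketch below performs that expansion under stated assumptions: BASE_SYLLABLE is a hypothetical one-entry lexicon, the function name is invented, and the alphabet defaults to the "x-lshk-jyutping" value used in the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical lexicon: toneless jyutping syllable for each character.
BASE_SYLLABLE = {"相": "soeng"}


def tone_to_phoneme(tone_xml: str, alphabet: str = "x-lshk-jyutping") -> str:
    """Expand a proposed <tone> element into an SSML 1.0 <phoneme> element."""
    tone = ET.fromstring(tone_xml)
    value = tone.get("value")
    if value is None:
        raise ValueError('<tone> requires a "value" attribute')
    char = (tone.text or "").strip()
    phoneme = ET.Element(
        "phoneme", {"alphabet": alphabet, "ph": BASE_SYLLABLE[char] + value}
    )
    phoneme.text = char
    return ET.tostring(phoneme, encoding="unicode")


print(tone_to_phoneme('<tone value="2">相</tone>'))
print(tone_to_phoneme('<tone value="3">相</tone>'))
```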

Page 15: SSML Extensions  for  Chinese Voice Browsing

Examples of Using the <tone> Element

• Tone changes with meaning
  – 糖 (candy / sugar)
    <tone value="2">糖</tone> (tone 2 /tong2/: means candy)
    <tone value="4">糖</tone> (tone 4 /tong4/: means sugar)
• Tone changes with context
  – 爺 (grandfather)
    阿<tone value="4">爺</tone> (tone 4 /je4/: preceded by 阿)
    爺<tone value="2">爺</tone> (tone 2 /je2/: preceded by 爺)
• Tone changes with mode of speaking
  – 英文 (English)
    英<tone value="4">文</tone> (tone 4 /man4/: formal mode)
    英<tone value="2">文</tone> (tone 2 /man2/: colloquial mode)

Page 16: SSML Extensions  for  Chinese Voice Browsing

Values for "interpret-as" in <say-as>

Page 17: SSML Extensions  for  Chinese Voice Browsing

Proposed Legal Values for "interpret-as" Attribute

• VoiceXML2.0 Appendix P
  – boolean, date, digits, currency, number, phone, time
• SSML 1.0 <say-as> attribute values (W3C Working Group Note 2005)
  – date, time, telephone, characters, cardinal, ordinal
• Propose 6 new values:
  – Chinese-name
  – fraction
  – measure
  – net
  – percentage
  – ratio

Page 18: SSML Extensions  for  Chinese Voice Browsing

"Chinese-name" Value

• Specify as name to aid pronunciation selection
  – 單明明:
    • 單: /daan1/, but /sin6/ as a surname
    • 明明: /ming4 ming4/, but /ming4 ming2/ as a given name
• Format: S*G*
  – S: surname, G: given name
  – Examples
    • <say-as interpret-as="Chinese-name" format="SG">姚明</say-as> (Yao Ming)
    • <say-as interpret-as="Chinese-name" format="SGG">單明明</say-as> (Sin Ming Ming)
    • <say-as interpret-as="Chinese-name" format="SSG">歐陽修</say-as> (Au-yeung Sau)
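The S*G* format string can be applied mechanically: each S marks one surname character and each G one given-name character. The following sketch splits a name accordingly; the function name is hypothetical, and dropping whitespace between characters is an assumption made so that both "姚明" and "姚 明" are accepted.

```python
import re
from typing import Tuple


def split_chinese_name(name: str, fmt: str) -> Tuple[str, str]:
    """Split a name into (surname, given name) using an S*G* format string."""
    chars = "".join(name.split())  # assumption: ignore spaces between characters
    if re.fullmatch(r"S*G*", fmt) is None:
        raise ValueError(f"format must match S*G*: {fmt!r}")
    if len(fmt) != len(chars):
        raise ValueError("format length must equal the number of characters")
    n_surname = fmt.count("S")
    return chars[:n_surname], chars[n_surname:]


print(split_chinese_name("姚明", "SG"))     # Yao Ming
print(split_chinese_name("單明明", "SGG"))  # Sin Ming Ming
print(split_chinese_name("歐陽修", "SSG"))  # Au-yeung Sau
```

With the surname isolated, a synthesizer could consult a surname-specific pronunciation table, for instance to choose /sin6/ for 單.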

Page 19: SSML Extensions  for  Chinese Voice Browsing

"fraction" Value

• Specify as fraction
  – e.g. 3/4
• Verbalization of fraction in Chinese:
  – with an additional word: 分之 (out of)
  – A / B (A out of B): B 分之 A [note that the order is reversed!]
  – e.g. 3/4 is verbalized as 四 (four) 分之 (out of) 三 (three)
• "format" and "detail" attributes not required
• Example
  – Text: 我吃了 3/4 個橙, with 我 (I) 吃了 (ate) 橙 (orange)
  – Markup: 我吃了 <say-as interpret-as="fraction">3/4</say-as> 個橙
  – Verbalization: 我吃了四分之三個橙 (I ate three-fourths of the orange)
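The reversed order of the fraction rule is easy to get wrong, so a small sketch may help. It assumes numerator and denominator are integers from 0 to 99; both function names are made up for the example, and the numeral conversion is deliberately limited to that range.

```python
DIGITS = "零一二三四五六七八九"


def to_chinese(n: int) -> str:
    """Verbalize an integer in the range 0-99 with Chinese numerals."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only handles 0-99")
    if n < 10:
        return DIGITS[n]
    tens, ones = divmod(n, 10)
    head = "十" if tens == 1 else DIGITS[tens] + "十"
    return head + (DIGITS[ones] if ones else "")


def verbalize_fraction(text: str) -> str:
    """Verbalize 'A/B' as 'B 分之 A': the denominator is spoken first."""
    numerator, denominator = (int(part) for part in text.split("/"))
    return to_chinese(denominator) + "分之" + to_chinese(numerator)


print(verbalize_fraction("3/4"))
```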

Page 20: SSML Extensions  for  Chinese Voice Browsing

"measure" Value

• Specify as measurement
  – e.g. 10cm, 30ml
• measurement = number + unit
  – number [VoiceXML2.0]; e.g. 10 is ten (not one zero)
  – unit: translated and pronounced in Chinese, e.g. cm is 厘米, g is 克, oz is 安士, yd is 碼
• "format" and "detail" attributes not required
• Example
  – Text: 他的身高是 180cm, with 他的 (his) 身高 (height) 是 (is)
  – Markup: 他的身高是 <say-as interpret-as="measure">180cm</say-as>
  – Verbalization: 他的身高是一百八十厘米 (his height is 180cm)
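The "number + unit" rule can be sketched as a number verbalizer followed by a unit lookup. The unit table below contains only the four translations given on this page; the function names and the 0-999 limit on the number are assumptions made to keep the example short.

```python
import re

DIGITS = "零一二三四五六七八九"
# Unit translations listed on this page.
UNITS = {"cm": "厘米", "g": "克", "oz": "安士", "yd": "碼"}
MEASURE = re.compile(r"^\s*(\d+)\s*([A-Za-z]+)\s*$")


def to_chinese(n: int) -> str:
    """Verbalize an integer in the range 0-999 as a number, not as digits."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0-999")
    if n < 10:
        return DIGITS[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        head = "十" if tens == 1 else DIGITS[tens] + "十"
        return head + (DIGITS[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    out = DIGITS[hundreds] + "百"
    if rest == 0:
        return out
    if rest < 10:
        return out + "零" + DIGITS[rest]
    tens, ones = divmod(rest, 10)
    return out + DIGITS[tens] + "十" + (DIGITS[ones] if ones else "")


def verbalize_measure(text: str) -> str:
    """Verbalize e.g. '180cm' as the number followed by the Chinese unit."""
    match = MEASURE.match(text)
    if match is None:
        raise ValueError(f"not a measurement: {text!r}")
    number, unit = int(match.group(1)), match.group(2)
    return to_chinese(number) + UNITS[unit]  # KeyError for unlisted units


print(verbalize_measure("180cm"))
```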

Page 21: SSML Extensions  for  Chinese Voice Browsing

"net" Value

• Specify as URI or email address
• Possible ways to verbalize a URI:
  1. Read the whole string in English, including punctuation
  2. Omit http:// (ftp://, etc.), read the rest in English
  3. Read letters in English, punctuation in Chinese
• "format" attribute value: "email" or "uri"
• Example
  – Text: 詳情請瀏覽 http://www.w3.org, with 詳情 (for details) 請 (please) 瀏覽 (browse)
  – Markup: 詳情請瀏覽 <say-as interpret-as="net" format="uri">http://www.w3.org</say-as>
  – Possible verbalizations:
    1. H T T P colon slash slash W W W dot W three dot O R G
    2. W W W dot W three dot O R G
    3. W W W 點 W 三 點 O R G (點: dot, 三: three)
       [Similarly the protocol part may be kept as another option]
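Verbalization 3 (protocol omitted, letters in English, punctuation and digits in Chinese) can be sketched as a character-by-character mapping. Only the dot and the digits are translated, as in the example above; passing any other punctuation through unchanged is an assumption, and the function name is hypothetical.

```python
import re

DIGITS = "零一二三四五六七八九"
# Leading protocol part such as "http://" or "ftp://".
SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def verbalize_uri(uri: str) -> str:
    """Spell a URI: letters in English, dots and digits in Chinese."""
    rest = SCHEME.sub("", uri.strip())
    spoken = []
    for ch in rest:
        if ch == ".":
            spoken.append("點")
        elif ch in "0123456789":
            spoken.append(DIGITS[int(ch)])
        elif ch.isalpha():
            spoken.append(ch.upper())
        else:
            spoken.append(ch)  # assumption: other symbols are left as-is
    return " ".join(spoken)


print(verbalize_uri(" http://www.w3.org "))
```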

Page 22: SSML Extensions  for  Chinese Voice Browsing

"percentage" Value

• Specify as percentage
• Verbalization of percentage in Chinese
  – with an additional word: 百分之 (out of a hundred)
  – A%: 百分之 A
  – e.g. 70% is verbalized as 百分之 (out of a hundred) 七十 (seventy)
• "format" and "detail" attributes not required
• Example
  – Text: 海洋約佔全球總面積的 70%, with 海洋 (ocean) 約佔 (covers) 全球 (global) 總面積 (surface)
  – Markup: 海洋約佔全球總面積的 <say-as interpret-as="percentage">70%</say-as>
  – Verbalization: 海洋約佔全球總面積的百分之七十 (ocean covers 70% of global surface)

Page 23: SSML Extensions  for  Chinese Voice Browsing

"ratio" Value

• Specify as ratio
  – e.g. 1:3
• Verbalization of ratio in Chinese:
  – with an additional word: 比 (to)
  – A:B (A to B): A 比 B
  – e.g. 1:99 is verbalized as 一 (one) 比 (to) 九十九 (ninety nine)
• "format" and "detail" attributes not required
• Example
  – Text: 用 1:99 的稀釋漂白水, with 用 (use) 稀釋 (diluted) 漂白水 (bleach water)
  – Markup: 用 <say-as interpret-as="ratio">1:99</say-as> 的稀釋漂白水
  – Verbalization: 用一比九十九的稀釋漂白水 (use diluted bleach at a ratio of 1:99)
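The percentage rule (百分之 placed before the number) and the ratio rule (比 placed between the two numbers) differ only in where the extra word goes. The sketch below covers both, assuming whole numbers from 0 to 99; the function names are illustrative and the numeral helper is restricted to that range.

```python
DIGITS = "零一二三四五六七八九"


def to_chinese(n: int) -> str:
    """Verbalize an integer in the range 0-99 with Chinese numerals."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only handles 0-99")
    if n < 10:
        return DIGITS[n]
    tens, ones = divmod(n, 10)
    head = "十" if tens == 1 else DIGITS[tens] + "十"
    return head + (DIGITS[ones] if ones else "")


def verbalize_percentage(text: str) -> str:
    """Verbalize 'A%' as '百分之 A'."""
    return "百分之" + to_chinese(int(text.strip().rstrip("%")))


def verbalize_ratio(text: str) -> str:
    """Verbalize 'A:B' as 'A 比 B'; unlike fractions, the order is kept."""
    first, second = (int(part) for part in text.split(":"))
    return to_chinese(first) + "比" + to_chinese(second)


print(verbalize_percentage("70%"))
print(verbalize_ratio("1:99"))
```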

Page 24: SSML Extensions  for  Chinese Voice Browsing

Summary

• "dialect-accent" attribute to enrich the xml:lang attribute
• <phrase> and <word> for text processing
• <tone> for pronunciation
• 6 values for "interpret-as" attribute
  – Chinese-name
  – fraction
  – measure
  – net
  – percentage
  – ratio

Page 25: SSML Extensions  for  Chinese Voice Browsing

Thank You