On Different Perspectives of Utilizing the Fujisaki Model to Mandarin Speech Prosody
Zhao-yu Su, Phonetics Lab, Institute of Linguistics, Academia Sinica

Jan 27, 2016

Page 1: On Different Perspectives of Utilizing the Fujisaki Model to Mandarin Speech Prosody

On Different Perspectives of Utilizing the Fujisaki Model to Mandarin Speech Prosody

Zhao-yu Su, Phonetics Lab, Institute of Linguistics,

Academia Sinica

Page 2: On Different Perspectives of Utilizing the Fujisaki Model to Mandarin Speech Prosody

Applying the Fujisaki model to Mandarin

– 1. Phonetics Lab, Academia Sinica, Taiwan (http://phslab.ling.sinica.edu.tw/) PI: Prof. Chiu-yu Tseng

• Mandarin: automatic extraction of Fujisaki parameters (Mixdorff, 2003)

– 2. Hirose Lab, Tokyo University, Japan (http://www.gavo.t.u-tokyo.ac.jp/) PI: Prof. Keikichi Hirose

• Mandarin: manual extraction of Fujisaki parameters

• Japanese: automatic extraction of Fujisaki parameters

– 3. DSP and Speech Technology Lab, CUHK, Hong Kong (http://dsp.ee.cuhk.edu.hk/) PI: Prof. CHING Pak-Chung, Prof. LEE Tan, Prof. WANG Shi-Yuan, William

• Mandarin: manual extraction of Fujisaki parameters

Page 3: On Different Perspectives of Utilizing the Fujisaki Model to Mandarin Speech Prosody

Outline

• Introduction -- the Fujisaki model

• Auto-extraction comparison – methods used at two labs to generate the Fujisaki parameters

1. Phonetics Lab, Academia Sinica, Taiwan -- on Mandarin (Tseng 2004, 2005, 2006)

2. Hirose Lab, Tokyo University, Japan -- on Japanese (Hirose and Narusawa 2002, 2003)

• Manual extraction -- method used at CUHK to extract Fujisaki parameters

– DSP and Speech Technology Lab – on Mandarin (Wentao Gu 2004, 2005)

Page 4: On Different Perspectives of Utilizing the Fujisaki Model to Mandarin Speech Prosody

The Fujisaki Model (Fujisaki & Hirose 1984)

log(F0) = log(base frequency) + phrase components + accent components

[Figure: phrase components + accent components = superposed model]
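The superposition on this slide can be sketched in a few lines of Python. This is a minimal illustration, not any of the three labs' implementations. It uses the standard textbook form of the model: a phrase command of magnitude Ap at time T0 contributes Ap·Gp(t − T0) with Gp(t) = α²·t·exp(−αt), and an accent command of amplitude Aa between T1 and T2 contributes Aa·[Ga(t − T1) − Ga(t − T2)] with Ga(t) = min(1 − (1 + βt)·exp(−βt), γ). The default values α = 2.0, β = 20.0 and γ = 0.9 are commonly quoted typical values and are assumptions here, as are all function and parameter names.

```python
import math

def phrase_response(t, alpha=2.0):
    """Gp(t): impulse response of the phrase control mechanism (0 for t < 0)."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0

def accent_response(t, beta=20.0, gamma=0.9):
    """Ga(t): step response of the accent control mechanism, capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def fujisaki_f0(t, fb, phrase_cmds, accent_cmds, alpha=2.0, beta=20.0, gamma=0.9):
    """Return F0 in Hz at time t (seconds).

    fb          -- base frequency in Hz
    phrase_cmds -- list of (T0, Ap): onset time and magnitude
    accent_cmds -- list of (T1, T2, Aa): onset, offset and amplitude
    """
    log_f0 = math.log(fb)
    for t0, ap in phrase_cmds:
        log_f0 += ap * phrase_response(t - t0, alpha)
    for t1, t2, aa in accent_cmds:
        log_f0 += aa * (accent_response(t - t1, beta, gamma)
                        - accent_response(t - t2, beta, gamma))
    return math.exp(log_f0)
```

Sampling `fujisaki_f0` every 10 ms over an utterance gives the superposed contour; leaving `accent_cmds` empty gives the phrase component alone on top of the base frequency.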

Page 5: On Different Perspectives of Utilizing the Fujisaki Model to Mandarin Speech Prosody

Auto-extraction based on Mixdorff’s method (2000, 2003)

Original F0 contour

High-pass filter (stop frequency at 0.5 Hz)

High-frequency contour (HFC)

Low-frequency contour (LFC)
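The split into HFC and LFC can be illustrated with a rough sketch. The slide only states that a high-pass filter with a stop frequency of 0.5 Hz is used; the filter design itself is not given. The code below therefore swaps in a centered moving average as a crude low-pass stand-in and takes the HFC as the remainder. It also assumes the log-F0 contour has already been interpolated across unvoiced gaps and is sampled at `frame_rate_hz` frames per second. All names are hypothetical.

```python
import math

def moving_average(values, window):
    """Centered moving average; the window shrinks near the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def split_contour(log_f0, frame_rate_hz=100.0, cutoff_hz=0.5):
    """Split a log-F0 contour into (HFC, LFC).

    The LFC is a moving average whose window spans one period of the
    cutoff frequency (a stand-in for a real filter design); the HFC is
    whatever remains, so HFC + LFC reproduces the input.
    """
    window = max(1, int(round(frame_rate_hz / cutoff_hz)))
    lfc = moving_average(log_f0, window)
    hfc = [x - l for x, l in zip(log_f0, lfc)]
    return hfc, lfc
```

In this reading the slowly varying LFC is the material from which phrase commands are estimated, while the HFC carries the faster accent or tone movements.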

Page 6: On Different Perspectives of Utilizing the Fujisaki Model to Mandarin Speech Prosody

Decision of phrase commands

Low-frequency contour (LFC) from Mixdorff’s method

Position of local minimum optimization

Perceptual phrase boundary

The method based on perceptual labels -- Phonetics Lab, Academia Sinica, Taiwan
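One way to read this slide is that each perceptually labelled phrase boundary is paired with a nearby local minimum of the LFC, which then serves as the starting position for a phrase command before optimization. The sketch below illustrates that reading only; the search radius, the fallback rule, and all names are assumptions and are not taken from the slides.

```python
def local_minima(values):
    """Indices where a value is lower than the previous one and not higher than the next."""
    return [i for i in range(1, len(values) - 1)
            if values[i] < values[i - 1] and values[i] <= values[i + 1]]

def candidate_onsets(lfc, boundary_frames, search_radius=20):
    """For each labelled boundary frame, pick the closest LFC minimum within the radius.

    If no minimum lies within search_radius frames, the labelled boundary
    itself is kept as the candidate (an assumed fallback).
    """
    minima = local_minima(lfc)
    onsets = []
    for b in boundary_frames:
        nearby = [m for m in minima if abs(m - b) <= search_radius]
        onsets.append(min(nearby, key=lambda m: abs(m - b)) if nearby else b)
    return onsets
```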

Evaluation: $\mathrm{MSE} = \frac{1}{T}\sum_{t=1}^{T}\bigl(\ln F_0(t) - \ln \hat{F}_0(t)\bigr)^2$
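The evaluation measure on this slide is a mean squared error computed on ln F0. A minimal sketch follows, assuming the observed and model-generated contours are sampled at the same frames and that unvoiced frames are marked with non-positive F0 values and skipped. Both conventions, and the function name, are assumptions rather than details given on the slide.

```python
import math

def log_f0_mse(f0_observed, f0_model):
    """Mean squared error between two F0 contours in the natural-log domain.

    Frames where either value is not positive (e.g. unvoiced) are skipped.
    """
    if len(f0_observed) != len(f0_model):
        raise ValueError("contours must have the same number of frames")
    pairs = [(o, m) for o, m in zip(f0_observed, f0_model) if o > 0 and m > 0]
    if not pairs:
        raise ValueError("no voiced frames to compare")
    return sum((math.log(o) - math.log(m)) ** 2 for o, m in pairs) / len(pairs)
```

Working in the log domain makes the error depend on frequency ratios rather than differences in Hz, which matches the model's own formulation in log(F0).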

Page 7: On Different Perspectives of Utilizing the Fujisaki Model to Mandarin Speech Prosody

Phonetics Lab, Academia Sinica -- Auto-extraction results of Mandarin (Mixdorff 2003)

Page 8: On Different Perspectives of Utilizing the Fujisaki Model to Mandarin Speech Prosody

Hirose Lab -- Auto-extraction (Narusawa 2002, 2003)

Residual contour -- target of phrase components

Original f0 contour

Derivative -- target of phrase components

Page 9: On Different Perspectives of Utilizing the Fujisaki Model to Mandarin Speech Prosody

Decision of phrase commands

The optimum I is the one for which c(I) is maximum.

Dynamic Programming (DP)

Residual contour
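The slides do not define c(I) or the DP recurrence, so the following is only a generic illustration of how dynamic programming can pick a best set of command positions. It assumes, hypothetically, that each candidate onset on the residual contour already has a score, that the total score is the sum of the chosen candidates' scores, and that two phrase commands must be at least `min_gap` seconds apart. None of these assumptions come from Narusawa's method.

```python
def select_commands(candidates, min_gap):
    """Choose candidate onsets maximizing the total score.

    candidates -- list of (time, score) sorted by time
    min_gap    -- minimum allowed distance in seconds between chosen onsets
    Returns (best_total_score, chosen_times).
    """
    n = len(candidates)
    best = [0.0] * (n + 1)    # best[i]: best total using the first i candidates
    take = [False] * (n + 1)  # take[i]: whether candidate i-1 is used in best[i]
    prev = [0] * (n + 1)      # prev[i]: prefix length compatible with candidate i-1
    for i in range(1, n + 1):
        t_i, s_i = candidates[i - 1]
        p = i - 1
        # Shrink the prefix until its last candidate is far enough from t_i.
        while p > 0 and t_i - candidates[p - 1][0] < min_gap:
            p -= 1
        prev[i] = p
        with_i = best[p] + s_i
        if with_i > best[i - 1]:
            best[i], take[i] = with_i, True
        else:
            best[i] = best[i - 1]
    chosen = []
    i = n
    while i > 0:  # backtrack through the decisions
        if take[i]:
            chosen.append(candidates[i - 1][0])
            i = prev[i]
        else:
            i -= 1
    chosen.reverse()
    return best[n], chosen
```

With candidates at 0.0, 0.3, 0.6 and 1.5 s scored 1.0, 3.0, 1.5 and 2.0 and a 0.5 s minimum gap, the sketch keeps the onsets at 0.3 and 1.5 s.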

Page 10: On Different Perspectives of Utilizing the Fujisaki Model to Mandarin Speech Prosody

Hirose Lab -- Compensation from text analysis to aid auto-extraction

Using parsed text to adjust the extracted Fujisaki parameters

Page 11: On Different Perspectives of Utilizing the Fujisaki Model to Mandarin Speech Prosody

Hirose Lab -- Auto-extraction of Japanese (Narusawa 2002, 2003)

• Original method

– An accent component should be located on a phrase component.

• New method

– Pause is considered.

– Correction using information from parsed text.

Page 12: On Different Perspectives of Utilizing the Fujisaki Model to Mandarin Speech Prosody

Auto-extraction of phrase components -- Comparison of 2 labs

• Phrase components

– Phonetics Lab, IL, AS (modified Mixdorff 2003):

Pre-extraction of phrase components -- relatively close.

– Hirose Lab:

Pre-extraction -- not as close, but the final output can be compensated by text analysis.

1. Auto-extract from the acoustic signal (f0 contour)

2. Compensate the phrase component with parsed text -- unit used: bunsetsu (lexical definition)

Page 13: On Different Perspectives of Utilizing the Fujisaki Model to Mandarin Speech Prosody

Manual adjustment--Gu, CUHK

• Note:

1. Insertion of phrase components is subjective.

2. Boundary identification is NOT explicitly specified -- perception (duration? or f0 reset?)

Page 14: On Different Perspectives of Utilizing the Fujisaki Model to Mandarin Speech Prosody

Manual adjustment--Gu, CUHK

Page 15: On Different Perspectives of Utilizing the Fujisaki Model to Mandarin Speech Prosody

Possible Future Considerations (1/2)

• 1. Is the distinguishing acoustic feature only pause? Duration? Or f0?

• 2. Or a combination of acoustic features: pause, duration, and/or f0?

– E.g. test whether duration can compensate for F0 reset
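To test whether duration can compensate for F0 reset, both cues first need to be measured at each boundary. The sketch below shows one possible pair of measures, both assumptions rather than definitions from the slides: F0 reset as the jump in mean F0 across the boundary, expressed in semitones, and duration as the ratio of the pre-boundary syllable's length to the speaker's mean syllable length.

```python
import math

def f0_reset_semitones(f0_before, f0_after):
    """Jump in mean F0 (Hz) from the frames before a boundary to the frames after it."""
    mean_before = sum(f0_before) / len(f0_before)
    mean_after = sum(f0_after) / len(f0_after)
    return 12.0 * math.log2(mean_after / mean_before)

def lengthening_ratio(final_syllable_dur, mean_syllable_dur):
    """Pre-boundary lengthening: values above 1.0 mean the final syllable is lengthened."""
    return final_syllable_dur / mean_syllable_dur
```

Boundaries with a small reset but a large lengthening ratio would be the cases of interest for the compensation question.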

Page 16: On Different Perspectives of Utilizing the Fujisaki Model to Mandarin Speech Prosody

Possible Future Considerations (2/2)

Improving auto-extraction of tone components

• 3. The concept of tone nucleus

– By retaining only the nucleus of the syllable while ignoring vertical f0 variation (from Hirose’s tone nucleus and Gu’s manual adjustment)

– By ignoring horizontal f0 variation (from Gu’s manual adjustment)
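As a very rough illustration of retaining only the nucleus, the sketch below keeps a fixed central portion of each syllable's F0 frames and discards the transitions at both edges. This is a crude stand-in: the slides do not say how the nucleus is located, and the fixed `keep_fraction` and the function name are assumptions.

```python
def nucleus_frames(syllable_f0, keep_fraction=0.5):
    """Return the central keep_fraction of a syllable's F0 frames."""
    n = len(syllable_f0)
    if n == 0:
        return []
    keep = max(1, int(round(n * keep_fraction)))
    start = (n - keep) // 2
    return syllable_f0[start:start + keep]
```

Fitting tone commands only to these central frames would keep the extraction from chasing F0 movements in the onset and offset transitions.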

Page 17: On Different Perspectives of Utilizing the Fujisaki Model to Mandarin Speech Prosody

One major ambiguity among 3 labs -- phrase component unit selection

1. Phonetics Lab, Academia Sinica, Taiwan – Mandarin prosodic phrase (intonation and phrase)

2. Hirose Lab, Tokyo University, Japan – Japanese lexical word (bunsetsu)

3. DSP and Speech Technology Lab, CUHK, Hong Kong – Manually selected:

PPh -- adjusted from visual display

PW -- adjusted from perceptual decision

Page 18: On Different Perspectives of Utilizing the Fujisaki Model to Mandarin Speech Prosody

Why can prosodic unit selection be a problem unique to Mandarin?

Japanese: Bunsetsu -- compound word consisting of two or more content words

Mandarin:

1. Phonetics Lab, IL, AS -- Length of prosodic phrase: sometimes too long to maintain the tendency of one application of the phrase component function.

2. CUHK -- Manual adjustment can be accurate but not systematic enough, e.g. a phrase component sometimes corresponds to a prosodic phrase, sometimes to something shorter.

Page 19: On Different Perspectives of Utilizing the Fujisaki Model to Mandarin Speech Prosody

Concluding Remarks

• 1. Manual adjustment of Fujisaki parameters is more precise but too time-consuming.

• 2. What possible improvements can auto-extraction borrow from manual adjustment?

– Focusing on the nucleus (syllable)

– Understanding more of the acoustic properties (F0, duration…)

• 3. More linguistic and cognitive knowledge, in addition to acoustic information, could help improve the prosody model.

– Linguistic information: parsing (text analysis and syntax), semantics and pragmatics

– Cognitive information: speech planning and processing