On Different Perspectives of Utilizing the Fujisaki Model to Mandarin Speech Prosody

Zhao-yu Su

Phonetics Lab, Institute of Linguistics, Academia Sinica

Applying the Fujisaki model to Mandarin

– 1. Phonetics Lab, Academia Sinica, Taiwan (http://phslab.ling.sinica.edu.tw/)
PI: Prof. Chiu-yu Tseng

• Mandarin: automatic extraction of Fujisaki parameters (Mixdorff, 2003)

– 2. Hirose Lab, Tokyo University, Japan (http://www.gavo.t.u-tokyo.ac.jp/)
PI: Prof. Keikichi Hirose

• Mandarin: manual extraction of Fujisaki parameters

• Japanese: automatic extraction of Fujisaki parameters

– 3. DSP and Speech Technology Lab, CUHK, Hong Kong (http://dsp.ee.cuhk.edu.hk/)
PIs: Prof. CHING Pak-Chung, Prof. LEE Tan, Prof. WANG Shi-Yuan, William

• Mandarin: manual extraction of Fujisaki parameters

Outline

• Introduction: the Fujisaki model

• Auto-extraction comparison: methods used at two labs to generate the Fujisaki parameters

1. Phonetics Lab, Academia Sinica, Taiwan, on Mandarin (Tseng 2004, 2005, 2006)

2. Hirose Lab, Tokyo University, Japan, on Japanese (Hirose and Narusawa 2002, 2003)

• Manual extraction: method used at CUHK to extract Fujisaki parameters

– DSP and Speech Technology Lab, on Mandarin (Wentao Gu 2004, 2005)

The Fujisaki Model (Fujisaki & Hirose 1984)

log(F0) = base frequency + phrase components + accent components

[Figure: phrase components + accent components = superposed model]
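The superposition on this slide can be written out numerically. The sketch below follows the standard published form of the model: each phrase command is an impulse passed through a critically damped second-order filter, and each accent command is a pedestal between an onset and an offset time. The function names, the command tuple layouts, and the default constants (alpha = 2.0/s, beta = 20.0/s, gamma = 0.9) are illustrative assumptions based on commonly quoted values; they are not taken from the slides.

```python
import math

def phrase_response(t, alpha=2.0):
    """Gp(t): impulse response of the phrase control mechanism (0 for t < 0)."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)

def accent_response(t, beta=20.0, gamma=0.9):
    """Ga(t): step response of the accent control mechanism, capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def fujisaki_log_f0(t, fb, phrase_cmds, accent_cmds,
                    alpha=2.0, beta=20.0, gamma=0.9):
    """ln F0(t) = ln Fb + phrase components + accent components.

    phrase_cmds: list of (T0, Ap)      -- onset time, magnitude (assumed layout)
    accent_cmds: list of (T1, T2, Aa)  -- onset, offset, amplitude (assumed layout)
    """
    value = math.log(fb)
    for t0, ap in phrase_cmds:
        value += ap * phrase_response(t - t0, alpha)
    for t1, t2, aa in accent_cmds:
        value += aa * (accent_response(t - t1, beta, gamma)
                       - accent_response(t - t2, beta, gamma))
    return value
```

Sampling `fujisaki_log_f0` at a fixed frame rate and exponentiating gives an F0 contour in Hz. Before the first command the contour sits exactly at the base frequency Fb.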

Auto-extraction based on Mixdorff’s method (2000, 2003)

[Figure: the original F0 contour is passed through a highpass filter (stop frequency at 0.5 Hz), splitting it into a high-frequency contour (HFC) and a low-frequency contour (LFC)]
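The HFC/LFC split can be illustrated with a small filter sketch. The slide specifies only a highpass filter with a stop frequency at 0.5 Hz, so the filter design here is an assumption: the LFC is obtained with a one-pole smoother run forward and backward (to avoid phase shift), and the HFC is the complementary highpass output, i.e. the input minus the LFC. The function name, the 100 Hz frame rate, and the requirement that the input be a gap-free (already interpolated) log-F0 contour are likewise assumptions.

```python
import math

def split_contour(log_f0, frame_rate_hz=100.0, cutoff_hz=0.5):
    """Split a gap-free log-F0 contour into (hfc, lfc), with hfc + lfc == input."""
    if not log_f0:
        return [], []
    # Smoothing coefficient of a one-pole lowpass at the chosen cutoff (assumed design).
    a = math.exp(-2.0 * math.pi * cutoff_hz / frame_rate_hz)

    # Forward pass.
    forward = []
    state = log_f0[0]
    for x in log_f0:
        state = a * state + (1.0 - a) * x
        forward.append(state)

    # Backward pass over the forward output cancels the phase delay.
    lfc = [0.0] * len(forward)
    state = forward[-1]
    for i in range(len(forward) - 1, -1, -1):
        state = a * state + (1.0 - a) * forward[i]
        lfc[i] = state

    # Complementary highpass: whatever the smoother removed.
    hfc = [x - low for x, low in zip(log_f0, lfc)]
    return hfc, lfc
```

In this reading the slowly varying LFC is the search space for phrase commands, while the faster movements in the HFC are left to the accent (tone) commands.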

The method based on perceptual labels: Phonetics Lab, Academia Sinica, Taiwan

Decision of phrase commands

[Figure labels: low-frequency contour (LFC) from Mixdorff’s method; position of local minimum; optimization; perceptual phrase boundary]
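The slide names two ingredients for placing phrase commands: local minima of the LFC and perceptually labelled phrase boundaries. How the lab combined them is not stated, so the sketch below is only one possible reading: it finds interior local minima and keeps those lying within a tolerance of a labelled boundary. The function names, the 0.5 s tolerance, and the pairing rule itself are assumptions made for illustration.

```python
def local_minima(contour, frame_rate_hz=100.0):
    """Return the times (in seconds) of interior local minima of a contour."""
    times = []
    for i in range(1, len(contour) - 1):
        if contour[i] < contour[i - 1] and contour[i] <= contour[i + 1]:
            times.append(i / frame_rate_hz)
    return times

def minima_near_boundaries(minima_s, boundaries_s, tolerance_s=0.5):
    """Keep only minima within tolerance_s of some labelled boundary (assumed rule)."""
    return [m for m in minima_s
            if any(abs(m - b) <= tolerance_s for b in boundaries_s)]
```

The surviving times would serve as initial phrase-command positions, to be refined by the optimization step named on the slide.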

Evaluation:

$$\mathrm{MSE} = \frac{1}{T}\sum_{t}\bigl(\ln F_0(t) - \ln \hat{F}_0(t)\bigr)^2$$
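The evaluation criterion on this slide appears to be a mean squared error between two F0 contours in the log domain. A minimal sketch of that measure follows, assuming both contours are sampled at the same frames and given in Hz; skipping frames where either value is non-positive (unvoiced) is an added assumption, as is the function name.

```python
import math

def log_f0_mse(observed_hz, modelled_hz):
    """Mean squared difference of ln F0 over frames where both contours are voiced."""
    pairs = [(o, m) for o, m in zip(observed_hz, modelled_hz) if o > 0 and m > 0]
    if not pairs:
        raise ValueError("no frames with F0 > 0 in both contours")
    return sum((math.log(o) - math.log(m)) ** 2 for o, m in pairs) / len(pairs)
```

Because the error is taken on ln F0, a given ratio between observed and modelled F0 contributes the same amount regardless of the speaker's pitch range.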

Phonetics Lab, Academia Sinica: auto-extraction results for Mandarin (Mixdorff 2003)

Hirose Lab: auto-extraction (Narusawa 2002, 2003)

[Figure panels: original F0 contour; residual contour (target of phrase components); derivative (target of phrase components)]

Decision of phrase commands

The optimum I is the one for which c(I) is maximum.

[Figure labels: dynamic programming (DP); residual contour]

Hirose Lab: compensation from text analysis to aid auto-extraction

Using parsed text to adjust the extracted Fujisaki parameters

Hirose Lab: auto-extraction of Japanese (Narusawa 2002, 2003)

• Original method

– An accent component should be located on a phrase component.

• New method

– Pause is considered.

– The result is corrected using information from parsed text.

Auto-extraction of phrase components: comparison of 2 labs

• Phrase components

– Phonetics Lab, IL, AS (modified Mixdorff 2003):

Pre-extraction of phrase components is relatively close.

– Hirose Lab:

Pre-extraction is not as close, but the final output can be compensated by text analysis.

1. Auto-extract from the F0 contour of the acoustic signal

2. Compensate the phrase component with parsed text; unit used: bunsetsu (lexical definition)

Manual adjustment--Gu, CUHK

• Note:

1. Insertion of phrase components is subjective.

2. Boundary identification is not explicitly specified; it relies on perception (of duration? or of F0 reset?).

Possible Future Considerations (1/2)

• 1. Is the distinguishing acoustic feature pause alone, duration alone, or F0 alone?

• 2. Or is it a combination of acoustic features: pause, duration, and/or F0?

– E.g., test whether duration can compensate for F0 reset

Possible Future Considerations (2/2): improving auto-extraction of tone components

• 3. The concept of tone nucleus

– By retaining only the nucleus of the syllable while ignoring vertical F0 variation (from Hirose’s tone nucleus and Gu’s manual adjustment)

– By ignoring horizontal F0 variation (from Gu’s manual adjustment)

One major ambiguity among 3 labs: phrase component unit selection

1. Phonetics Lab, Academia Sinica, Taiwan: Mandarin prosodic phrase (intonation and phrase)

2. Hirose Lab, Tokyo University, Japan: Japanese lexical word (bunsetsu)

3. DSP and Speech Technology Lab, CUHK, Hong Kong: manually selected

PPh: adjusted from visual display

PW: adjusted from perceptual decision

Why can prosodic unit selection be a problem unique to Mandarin?

Japanese: Bunsetsu, a compound word consisting of two or more content words

Mandarin:

1. Phonetics Lab, IL, AS: the prosodic phrase is sometimes too long to sustain the tendency of one phrase component per phrase.

2. CUHK: manual adjustment can be accurate but is not systematic enough; e.g., a phrase component sometimes corresponds to a prosodic phrase and is sometimes shorter.

Concluding Remarks

• 1. Manual adjustment of Fujisaki parameters is more precise but too time-consuming.

• 2. What possible improvements can auto-extraction borrow from manual adjustment?

– Focusing on the nucleus (syllable)

– Understanding more of the acoustic properties (F0, duration…)

• 3. More linguistic and cognitive knowledge, in addition to acoustic information, could help improve the prosody model.

– Linguistic information: parsing (text analysis and syntax), semantics and pragmatics

– Cognitive information: speech planning and processing
