8/3/2019 Prosody Modeling
1/31
4/30/2012 1
Representing Intonational Variation
Julia Hirschberg
CS 4706
Today
How can we represent meaningful speech variation so we can assign it in TTS?
Expanded vs. compressed pitch range?
Louder vs. softer speech? Faster vs. slower speech?
Differences in intonational prominence?
Differences in intonational phrasing? Differences in pitch contours?
Schemes for Representing Intonational Variation
An early proposal: Joshua Steele
Language Learning Approaches
/ IS it INteresting /
/ d'you feel ANGry? /
/ WHAT'S the PROBlem? / (McCarthy, 1991:106)
What aspects of speech to capture? Continuous or categorical?
If categorical, what are the possible classes?
http://www2.hawaii.edu/~hunterh/Docs/JoshuaSteel.pdf
http://www.cels.bham.ac.uk/resources/essays/KumakiDiss.pdf
Many Models
Auditory: Language teachers: what representations can
learners understand
Acoustic:
Examine the speech signal for critical vs. accidental variation
Experimental approaches
Identify potential meaningful variation
Design production or perception studies to test
E.g. what does a contour mean?
F0 Models
Basic division into linear and superpositional models
Linear models: sequence of independent choices from an intonation lexicon
Superpositional models: hierarchy of phonological components (utterance to prosodic phrase)
Intonation Models
Linear or tone sequence models
British school (Kingdon 58, O'Connor & Arnold 73, Cruttenden 97): based on auditory analysis
American School (Pierrehumbert 80, ToBI): mainly acoustic analysis
Dutch school ('t Hart, Collier and Cohen 1990): perceptual data
Superpositional models (Fujisaki 1983, Möbius et al. 1993): acoustic/physiological
Superpositional models
Pitch pattern of intonation modeled with two components: phrase component and accent component.
Phrase has basic shape, and pitch movements for individual accents are superimposed over basic shape:
[Figure: phrase component plus accent component = contour for "Apples, oranges and tomatoes"]
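The superposition of a phrase component and accent components can be sketched with the standard Fujisaki formulation, in which log F0 is a baseline plus phrase impulse responses plus accent step responses. This is a minimal sketch: the constants (alpha, beta, gamma), the baseline of 100 Hz, and the command timings for "Apples, oranges and tomatoes" are illustrative assumptions, not values from the lecture.

```python
import math

def phrase_response(t, alpha=2.0):
    """Phrase impulse response: alpha^2 * t * exp(-alpha * t) for t >= 0."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0

def accent_response(t, beta=20.0, gamma=0.9):
    """Accent step response, capped at the ceiling gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def log_f0(t, base_hz, phrases, accents):
    """ln F0(t) = ln(base) + sum of phrase components + sum of accent components.

    phrases: list of (onset_time, amplitude)
    accents: list of (onset_time, offset_time, amplitude)
    """
    value = math.log(base_hz)
    for onset, amp in phrases:
        value += amp * phrase_response(t - onset)
    for onset, offset, amp in accents:
        value += amp * (accent_response(t - onset) - accent_response(t - offset))
    return value

# Hypothetical commands: one phrase, three accents (one per content word).
phrases = [(0.0, 0.5)]
accents = [(0.1, 0.4, 0.4), (0.7, 1.0, 0.3), (1.4, 1.7, 0.3)]

# F0 in Hz sampled every 10 ms over 2.5 s.
contour = [math.exp(log_f0(i * 0.01, 100.0, phrases, accents)) for i in range(250)]
```

Passing an empty accent list gives the bare phrase shape, so the two components can be inspected separately before they are summed.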
Lily and Rosa thought this was divine.
Prince William was gorgeous and he was looking for a bride.
They dreamed of wedding bells.
Declination: downtrend in f0 over the course of an utterance
Best seen as statistical abstraction: if one takes f0 measurements from enough utterances, over time, a downtrend in f0 will emerge
Good for modeling utterance-level trends
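One way to see declination as a statistical abstraction is to pool F0 samples from several utterances and fit a straight line: individual utterances rise and fall, but the fitted slope comes out negative. The sketch below uses ordinary least squares; the F0 values are made up for illustration and are not measurements from any corpus.

```python
def fit_line(times, values):
    """Ordinary least-squares line fit; returns (slope, intercept)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    var = sum((t - mean_t) ** 2 for t in times)
    slope = cov / var
    return slope, mean_v - slope * mean_t

# Invented (time in s, F0 in Hz) samples from three utterances.
utterances = [
    [(0.0, 210), (0.5, 230), (1.0, 190), (1.5, 200), (2.0, 170)],
    [(0.0, 200), (0.5, 185), (1.0, 205), (1.5, 175), (2.0, 165)],
    [(0.0, 220), (0.5, 205), (1.0, 195), (1.5, 185), (2.0, 180)],
]

# Pool all samples, then fit one trend line across utterances.
pooled = [point for utt in utterances for point in utt]
times = [t for t, _ in pooled]
f0 = [v for _, v in pooled]
slope, intercept = fit_line(times, f0)  # slope is in Hz per second
```

A linear fit in Hz is only one possible choice; a fit in log F0 or semitones would follow the same pattern.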
Superpositional models
Advantages
Good at modeling declination in intonation languages
Successful in speech synthesis for languages like Japanese (little variation in accent type, e.g.)
Capture prosodic structure in languages which have both tone and intonation (e.g. Mandarin)
Disadvantages
Too rigid: All contours must be modeled with an accent and a phrase component
Many SAE contours cannot be captured easily
No account of different accent types, or variations in phrase endings
No notation system which allows users to share observations from large speech corpora or to compare contours
Used primarily for synthesis
Tone sequence models
Claim: Intonation is generated from sequences of (possibly) categorically different and phonologically distinctive tones
Types of Tone-sequence Models
Type 1: based on pitch movements
The British School
The Dutch School
Type 2: based on pitch levels (H and L targets)
The American School
The British School
Tone sequence model and pitch movement analysis (e.g. falling vs. rising intonation)
Basic unit of intonational description: intonation phrase (tone unit)
Delimited by pauses, phrase-final lengthening, pitch movement
Syllables within a tone unit can be stressed or accented
telephone
Accented syllables are stressed and pitch prominent
An example
...a POINT where you have to
CLEAN it
and I think it's HOrrible
There's a point where you have to clean it and I think it's horrible...
Intonation Phrases
Internal structure
Determined by location of accents in an IP
Each accent defines the beginning of a
prosodic constituent
Intonation phrase structure
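But JOHN's never BEEN to Jamaica
[Figure: the phrase divided into prehead ("But"), head (prenuclear accent unit, "JOHN's never"), and nucleus (nuclear accent unit, "BEEN to Jamaica"), with a stressed syllable marked]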
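Since each accent opens a new prosodic constituent, the internal structure of an intonation phrase follows from the accent locations alone. The sketch below is a simplification that works on whole words rather than syllables; the function name and the index-based input are hypothetical choices made for illustration.

```python
def split_accent_units(words, accented):
    """Split an intonation phrase into a prehead and accent units.

    words:    list of word strings
    accented: sorted indices of the accented words
    Returns (prehead, units). Each unit starts at an accented word;
    the last unit is the nuclear accent unit.
    """
    if not accented:
        # No accent: nothing opens a unit, so everything is prehead.
        return list(words), []
    prehead = list(words[:accented[0]])
    bounds = list(accented) + [len(words)]
    units = [list(words[start:end]) for start, end in zip(bounds, bounds[1:])]
    return prehead, units

words = ["But", "JOHN's", "never", "BEEN", "to", "Jamaica"]
prehead, units = split_accent_units(words, [1, 3])
# prehead is ["But"]; units[0] is the prenuclear accent unit,
# units[-1] the nuclear accent unit.
```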
Six nuclear choices in English
[Figure: each nuclear contour drawn over the word "Jamaica"]
falling
rising
rising-falling
falling-rising
rising-falling-rising
level
The American School
American school-type models make a distinction between accents (what makes a particular word prominent) and boundary tones (how a phrase ends)
Autosegmental metrical or two-tone models
Only two tones, which may be combined
H = high target
L = low target
Pierrehumbert 1980
Contours = pitch accents, phrase accents, boundary tones
Pitch Accents*: H*, L*, L*+H, L+H*, H*+L, H+L*
Phrase Accents*: L-, H-
Boundary Tone: L%, H%
Price, Ostendorf et al
Break indices: degree of juncture between words
0 to 8 (none to a lot)
What I'd like is a nice roast beef sandwich.
To(nes and)B(reak)I(ndices)
Developed by prosody researchers in four meetings over 1991-94
Putting Pierrehumbert 80 and Price, Ostendorf, et al together
Goals:
devise common labeling scheme for Standard American English that is robust and reliable
promote collection of large, prosodically labeled, shareable corpora
ToBI standards also proposed for Japanese, German, Italian, Spanish, British and Australian English, ...
Minimal ToBI transcription:
Recording of speech
F0 contour
ToBI tiers:
orthographic tier: words
break-index tier: degrees of junction (Price et al 89)
tonal tier: pitch accents, phrase accents, boundary tones (Pierrehumbert 80)
miscellaneous tier: disfluencies, non-speech sounds, etc.
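The parts of a minimal ToBI transcription map naturally onto a small record type with one time-stamped list per tier. The sketch below is a hypothetical in-memory layout, not the file format of any ToBI labeling tool; the field names, the example file name, and the sample labels are invented, and treating the miscellaneous tier as allowed to be empty is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ToBITranscription:
    recording: str                                  # path to the audio file
    f0: List[Tuple[float, float]]                   # (time in s, F0 in Hz)
    words: List[Tuple[float, str]] = field(default_factory=list)   # orthographic tier
    breaks: List[Tuple[float, int]] = field(default_factory=list)  # break-index tier
    tones: List[Tuple[float, str]] = field(default_factory=list)   # tonal tier
    misc: List[Tuple[float, str]] = field(default_factory=list)    # miscellaneous tier

def missing_parts(t: ToBITranscription) -> List[str]:
    """Names of required parts that are empty (misc may be empty)."""
    required = ["recording", "f0", "words", "breaks", "tones"]
    return [name for name in required if not getattr(t, name)]

# Invented one-word example.
example = ToBITranscription(
    recording="utt01.wav",
    f0=[(0.10, 180.0), (0.20, 200.0)],
    words=[(0.30, "hello")],
    breaks=[(0.30, 4)],
    tones=[(0.15, "H*"), (0.30, "L-L%")],
)
```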
Sample ToBI Labeling
Online training material, available at: http://anita.simmons.edu/~tobi/index.html
Evaluation
Good inter-labeler reliability for expert and naive labelers: 88% agreement on presence/absence of tonal category, 81% agreement on category label, 91% agreement on break indices to within 1 level (Silverman et al. 92, Pitrelli et al 94)
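Agreement figures of this kind can be computed by comparing every pair of labelers on every item under a chosen notion of "same". The sketch below shows the three notions named above (presence/absence, exact label, break index within one level) on invented labels; it is not necessarily the exact procedure used in the cited evaluations.

```python
from itertools import combinations

def pairwise_agreement(labelings, same):
    """Fraction of (labeler pair, item) comparisons where same(a, b) holds."""
    agree = total = 0
    for first, second in combinations(labelings, 2):
        for a, b in zip(first, second):
            agree += bool(same(a, b))
            total += 1
    return agree / total

# Invented labels from three labelers for five words (None = no accent).
accents = [
    ["H*", None, "L+H*", None, "H*"],
    ["H*", None, "H*",   None, "H*"],
    ["H*", "H*", "L+H*", None, "H*"],
]
breaks = [
    [1, 1, 3, 1, 4],
    [1, 1, 3, 1, 4],
    [1, 2, 4, 1, 4],
]

presence = pairwise_agreement(accents, lambda a, b: (a is None) == (b is None))
label = pairwise_agreement(accents, lambda a, b: a == b)
within_one = pairwise_agreement(breaks, lambda a, b: abs(a - b) <= 1)
```

Note that the exact-label measure here also counts two labelers agreeing on "no accent"; restricting it to items that both labelers marked as accented would be an equally reasonable variant.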
Pitch Accent/Prominence in ToBI
Which items are made intonationally prominent and how: tonal targets/levels not movement
Accent type:
H* simple high (declarative)
L* simple low (yes-no question)
L*+H scooped, late rise (uncertainty/incredulity)
L+H* early rise to stress (contrastive focus)
H+!H* fall onto stress (implied familiarity)
Downstepped accents:
!H*,
L+!H*,
L*+!H
Degree of prominence:
within a phrase: HiF0 (~nuclear accent)
across phrases ??
Prosodic Phrasing in ToBI
Levels of phrasing:
intermediate phrase: one or more pitch accents plus a phrase accent, H- or L-
intonational phrase: 1 or more intermediate phrases + boundary tone, H% or L%
ToBI break-index tier
0 no word boundary
1 word boundary
2 strong juncture with no tonal markings
3 intermediate phrase boundary
4 intonational phrase boundary
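The two levels of phrasing amount to a small grammar over tone labels: an intermediate phrase is one or more pitch accents followed by a phrase accent, and an intonational phrase is one or more intermediate phrases followed by a boundary tone. The checker below is a minimal sketch of that grammar. It assumes the phrase accent and boundary tone arrive as separate tokens (so "L-L%" is written as "L-", "L%"), and it takes its accent inventory from the ToBI accent types listed in this lecture.

```python
PITCH_ACCENTS = {"H*", "L*", "L*+H", "L+H*", "H+!H*", "!H*", "L+!H*", "L*+!H"}
PHRASE_ACCENTS = {"H-", "L-"}
BOUNDARY_TONES = {"H%", "L%"}

def is_intonational_phrase(tones):
    """Check a tone sequence such as ["H*", "L-", "L%"] against the grammar."""
    # Shortest valid phrase: pitch accent, phrase accent, boundary tone.
    if len(tones) < 3 or tones[-1] not in BOUNDARY_TONES:
        return False
    body = tones[:-1]
    accents_seen = 0  # pitch accents in the current intermediate phrase
    for tone in body:
        if tone in PITCH_ACCENTS:
            accents_seen += 1
        elif tone in PHRASE_ACCENTS:
            if accents_seen == 0:
                return False  # intermediate phrase with no pitch accent
            accents_seen = 0  # phrase accent closes the intermediate phrase
        else:
            return False      # unknown or misplaced label
    # The last intermediate phrase must be closed by a phrase accent.
    return body[-1] in PHRASE_ACCENTS
```

For example, `is_intonational_phrase(["H*", "H-", "L*", "L-", "H%"])` accepts a phrase made of two intermediate phrases, while a sequence that lacks a phrase accent before the boundary tone is rejected.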
[Figure: contour chart with pitch accents L*+H, L*, H* and phrase endings H-H%, H-L%, L-H%, L-L%]
[Figure: contour chart with pitch accents H* !H*, H+!H*, L+H* and phrase endings H-H%, H-L%, L-H%, L-L%]
Next Class
Predicting prosodic assignments from text