Prosody in Generation

JH 04/19/23 1

Natural Language Generation (NLG)

• Typical NLG system does:
  • Text planning transforms communicative goal into sequence or structure of elementary goals
  • Sentence planning chooses linguistic resources to achieve those goals
  • Realization produces surface output
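
The three stages above can be sketched as a small pipeline. This is only an illustration: the `ElementaryGoal` type, the function names, and the single fixed sentence template are all assumptions made for the example, not part of any system discussed in these slides.

```python
from dataclasses import dataclass


@dataclass
class ElementaryGoal:
    """One unit of content produced by text planning (hypothetical type)."""
    predicate: str
    attribute: str
    value: str


def plan_text(communicative_goal: dict) -> list:
    # Text planning: break the communicative goal into elementary goals.
    return [ElementaryGoal("inform", attribute, str(value))
            for attribute, value in communicative_goal.items()]


def plan_sentence(goal: ElementaryGoal) -> dict:
    # Sentence planning: choose linguistic resources.
    # Here that is one fixed copular template (an assumption).
    return {"subject": f"the {goal.attribute}",
            "verb": "is",
            "complement": goal.value}


def realize(spec: dict) -> str:
    # Realization: produce the surface string.
    words = f"{spec['subject']} {spec['verb']} {spec['complement']}"
    return words[0].upper() + words[1:] + "."


def generate(communicative_goal: dict) -> list:
    return [realize(plan_sentence(goal))
            for goal in plan_text(communicative_goal)]
```

Calling `generate({"movie": "October Sky"})` runs one attribute/value pair through all three stages and returns a one-sentence list.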

Research Directions in NLG

• Past focus
  • Hand-crafted rules inspired by small corpora
  • Very little evaluation
  • Monologue text generation
• New directions
  • Large-scale corpus-based learning of system components
  • Evaluation important, but how to do it still unclear
  • Spoken monologue and dialogue

AT&T Labs Research

How to produce speech instead of text?

Overview

• Spoken NLG in Dialogue Systems
• Text-to-Speech (TTS) vs. Concept-to-Speech (CTS)
• Current Approaches to CTS
  • Hand-built systems
  • Corpus-based systems
• NLG Evaluation
• Open Questions

Importance of NLG in Dialogue Systems

• Conveying information intonationally for conciseness and naturalness
  • System turns in dialogue systems can be shorter
      S: Did you say you want to go to Boston?
      S: (You want to go to) Boston H-H%
• Not providing misinformation through misleading prosody...
      S: (You want to go to) Boston L-L%

• Silverman et al ‘93: Mimicking human prosody improves transcription accuracy in reverse telephone directory task
• Sanderman & Collier ‘97: Subjects were quicker to respond to ‘appropriately phrased’ ambiguous responses to questions in a monitoring task
    Q: How did I reserve a room? vs. Which facility did the hotel have?
    A: I reserved a room L-H% in the hotel with the fax.
    A: I reserved a room in the hotel L-H% with the fax.

Prosodic Generation for TTS

• Default prosodic assignment from simple text analysis
• Hand-built rule-based system: hard to modify and adapt to new domains
• Corpus-based approaches (Sproat et al ’92)
  • Train prosodic variation on large labeled corpora using machine learning techniques
  • Accent and phrasing decisions
  • Associate prosodic labels with simple features of transcripts

    • # of words in phrase
    • distance from beginning or end of phrase
    • orthography: punctuation, paragraphing
    • part of speech, constituent information
  • Apply learned rules to new text
• Incremental improvements continue:
  • Adding higher-accuracy parsing (Koehn et al ‘00)
    • Collins ‘99 parser
  • More sophisticated learning algorithms (Schapire & Singer ‘00)
  • Better representations: tree based?
• Rules always impoverished
• How to define Gold Standard?
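
A minimal sketch of extracting the simple transcript features listed above (phrase length, distance from phrase edges, punctuation) for each word. The function name and feature names are assumptions made for illustration. Part of speech and constituent information are left out because they would need a tagger or parser, and the learner that would consume these features is not shown.

```python
import string


def word_position_features(phrase_words):
    """Return one feature dict per token of a phrase (a list of strings).

    Tokens are assumed to keep any trailing punctuation, e.g. "room,".
    """
    n = len(phrase_words)
    features = []
    for i, word in enumerate(phrase_words):
        features.append({
            "word": word,
            "phrase_length": n,            # number of words in phrase
            "dist_from_start": i,          # distance from beginning of phrase
            "dist_from_end": n - 1 - i,    # distance from end of phrase
            # orthography: does the token end in a punctuation mark?
            "ends_with_punct": bool(word) and word[-1] in string.punctuation,
        })
    return features
```

Each dict would then be paired with a hand-labeled accent or boundary tag from the corpus and passed to whatever learning algorithm is used.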

Spoken NLG

• Decisions in Text-to-Speech (TTS) depend on syntax, information status, topic structure, …: information explicitly available to NLG

• Concept-to-Speech (CTS) systems should be able to specify “better” prosody: the system knows what it wants to say and can specify how

• But … generating prosody for CTS isn’t so easy

Relying upon Prior Research

• MIMIC CTS (Nakatani & Chu-Carroll ‘00)
  • Use domain attribute/value distinction to drive phrasing and accent: critical information focussed
      Movie: October Sky
      Theatre: Hoboken Theatre
      Town: Hoboken
    • Attribute names and values always accented
    • Values set off by phrase boundaries
  • Information status conveyed by varying accent type (Pierrehumbert & Hirschberg ‘90)
    • Old (given) L*
    • Inferrable (by MIMIC, e.g. theatre name from town) L*+H

    • Key (to formulating valid query) L+H*
    • New H*
  • Marking Dialogue Acts
    • NotifyFailure:
        U: Where is “The Corrupter” playing in Cranford.
        S: “The Corrupter” [L+H*] is not [L+H*] playing in Cranford [L*+H].
    • Other rules for logical connectives, clarification and confirmation subdialogues
• Contrastive accent for semantic parallelism (Rooth ‘92, Pulman ‘97) used in GoalGetter and OVIS (Theune ‘99)
    The cat eats fish. The dog eats meat.
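
The information-status rule listed for MIMIC (old L*, inferrable L*+H, key L+H*, new H*) amounts to a lookup table. The sketch below applies that table to words that already carry a status label. The function name, the status strings, and the bracketed output format are assumptions for illustration, and how a status is actually assigned to each word is not modeled.

```python
# Accent type per information status, as listed for MIMIC above.
ACCENT_BY_STATUS = {
    "given": "L*",
    "inferrable": "L*+H",
    "key": "L+H*",
    "new": "H*",
}


def annotate_accents(words_with_status):
    """Attach a pitch accent label to each word that has a status.

    words_with_status: list of (word, status) pairs, where status is one of
    the keys of ACCENT_BY_STATUS, or None for an unaccented word.
    """
    annotated = []
    for word, status in words_with_status:
        if status is None:
            annotated.append(word)
        else:
            annotated.append(f"{word} [{ACCENT_BY_STATUS[status]}]")
    return " ".join(annotated)
```

For example, `annotate_accents([("Hoboken", "given"), ("Theatre", "new")])` attaches L* to the first word and H* to the second.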

But … many counterexamples

• Association of prosody with many syntactic, semantic, and pragmatic concepts still an open question
• Prosody generation from (past) observed regularities and assumptions:
  • Information can be ‘chunked’ usefully by phrasing for easier user understanding
    • But in many different ways
  • Information status can be conveyed by accent:
    • Contrastive information is accented?
        S: You want to go to L+H* Nijmegen, L+H* not Eindhoven.

    • Given information is deaccented? Speaker/hearer givenness
        U: I want to go to Nijmegen.
        S: You want to go to H* Nijmegen?
  • Intonational contours can convey speech acts, speaker beliefs:
    • Continuation rise can maintain the floor?
        S: I am going to get you the train information [L-H%].
    • Backchanneling can be produced appropriately?
        S: Okay. Okay? Okaaay… Mhmm..

    • Wh and yes-no questions can be signaled appropriately?
        S: Where do you want to go.
        S: What is your passport number?
  • Discourse/topic structure can be signaled by varying pitch range, pausal duration, rate?

MAGIC

• MM system for presenting cardiac patient data
  • Developed at Columbia by McKeown and colleagues in conjunction with Columbia Presbyterian Medical Center to automate post-operative status reporting for bypass patients
  • Uses mostly traditional NLG hand-developed components
  • Generate text, then annotate prosodically
  • Corpus-trained prosodic assignment component
• Corpus: written and oral patient reports
  • 50min multi-speaker, spontaneous + 11min single speaker, read
  • 1.24M word text corpus of discharge summaries

  • Transcribed, ToBI labeled
  • Generator features labeled/extracted:
    • syntactic function
    • p.o.s.
    • semantic category
    • semantic ‘informativeness’ (rarity in corpus)
    • semantic constituent boundary location and length
    • salience
    • given/new
    • focus
    • theme/rheme
    • ‘importance’
    • ‘unexpectedness’

  • Very hard to label features
• Results: new features to specify TTS prosody
  • Of CTS-specific features only semantic informativeness (likeliness of occurring in a corpus) useful so far (Pan & McKeown ‘99)
  • Looking at context, word collocation for accent placement helps predict accent (Pan & Hirschberg ‘00)
      RED CELL (less predictable) vs. BLOOD cell (more)
    • Most predictable words are accented less frequently (40-46%) and least predictable more (73-80%)
    • Unigram+bigram model predicts accent status w/77% (+/-.51) accuracy
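
A rough sketch of the two corpus statistics mentioned above: a word's informativeness (rarity in the corpus) and how predictable it is from the preceding word. The add-one smoothing, the function names, and the threshold-based decision are assumptions chosen to keep the example short. They are not the models reported by Pan & McKeown or Pan & Hirschberg.

```python
import math
from collections import Counter


def train_counts(sentences):
    """Count unigrams and bigrams over a list of tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        unigrams.update(sentence)
        bigrams.update(zip(sentence, sentence[1:]))
    return unigrams, bigrams


def informativeness(word, unigrams):
    """Negative log of the smoothed relative frequency: rarer is higher."""
    total = sum(unigrams.values())
    vocab = len(unigrams) + 1  # +1 reserves mass for unseen words
    return -math.log((unigrams[word] + 1) / (total + vocab))


def bigram_predictability(prev, word, unigrams, bigrams):
    """Add-one smoothed log P(word | prev): higher is more predictable."""
    vocab = len(unigrams) + 1
    return math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab))


def predict_accent(prev, word, unigrams, bigrams, threshold):
    # Accent a word when its left neighbour predicts it poorly.
    # The threshold is a free parameter that would be tuned on labeled data.
    return bigram_predictability(prev, word, unigrams, bigrams) < threshold
```

On a corpus where "blood cell" is frequent and "red cell" is rare, "cell" gets a higher predictability score after "blood" than after "red", which matches the direction of the RED CELL vs. BLOOD cell contrast.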

Stochastic, Corpus-based NLG

• Generate from a corpus rather than hand-built system
  • For MT task, Langkilde & Knight ‘98 over-generate from traditional hand-built grammar
  • Output composed into lattice
  • Linear (bigram) language model chooses best path
• But … no guarantee of grammaticality
  • How to evaluate/improve results?
  • How to incorporate prosody into this kind of generation model?
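
The over-generate-then-rank idea can be sketched with a toy lattice and a bigram model. This is a simplification under stated assumptions: the lattice is just a sequence of slots with alternative words (real word lattices are general graphs), the bigram model uses add-one smoothing, and all names are hypothetical. It is not the Langkilde & Knight system.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Return a function giving add-one smoothed log P(word | prev)."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence + ["</s>"]
        unigrams.update(tokens[:-1])            # counts of each history word
        bigrams.update(zip(tokens, tokens[1:]))
    vocab = len({w for s in sentences for w in s}) + 2  # plus <s> and </s>

    def logprob(prev, word):
        return math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab))

    return logprob


def best_path(lattice, logprob):
    """Viterbi search over a list of slots, each a list of alternatives."""
    # best maps the last word of a partial path to (score, words so far).
    best = {"<s>": (0.0, [])}
    for slot in lattice + [["</s>"]]:
        new_best = {}
        for word in slot:
            score, path = max(
                ((s + logprob(prev, word), p) for prev, (s, p) in best.items()),
                key=lambda item: item[0],
            )
            new_best[word] = (score, path + [word])
        best = new_best
    return best["</s>"][1][:-1]  # drop the end-of-sentence marker
```

Because the language model only sees adjacent words, nothing here guarantees that the chosen path is grammatical.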

FERGUS (Bangalore & Rambow ‘00)

• Corpus-based learning to refine syntactic, lexical and prosodic choice

• Domain is DARPA Communicator task (air travel information)

• Uses stochastic tree model + linear LM + XTAG (hand-crafted) grammar

• Trained on WSJ dependency trees tagged with p.o.s., morphological information, syntactic SuperTags (grammatical function, subcat frame, arg realization), WordNet sense tags and prosodic labels (accent and boundary)

• Input: Dependency tree of lexemes
  • Any feature can be specified, e.g. syntactic, prosodic

[Figure: dependency tree with root “control”; daughters “poachers” <L+H*>, “now” and “trade”; “trade” has daughters “the” and “underground”]

• Tree Chooser:
  • Selects syntactic/prosodic properties for input nodes based on match with features of mothers and daughters in corpus

[Figure: the same dependency tree, with “poachers” marked <L+H*>]

• Unraveler:
  • Produces lattice of all syntactically possible linearizations of tree using XTAG grammar

[Figure: the dependency tree (control; poachers, now, trade; the, underground) and the word lattice derived from it]
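
To give a feel for what the Unraveler has to constrain, the sketch below enumerates every projective word order of a dependency tree with no grammar at all. This is an assumption-laden simplification: FERGUS uses the XTAG grammar to keep only syntactically possible orders and packs them into a lattice, while this example returns a flat list and filters nothing. The tuple-based tree format and the function name are hypothetical.

```python
from itertools import permutations, product


def linearizations(tree):
    """Yield every projective word order of a dependency tree.

    tree: (head_word, [child_subtrees]). Each subtree stays contiguous,
    and the head may appear anywhere among its children.
    """
    head, children = tree
    child_options = [list(linearizations(child)) for child in children]
    # Pick one ordering for each child subtree, then order head and children.
    for chosen in product(*child_options):
        units = [[head]] + [list(words) for words in chosen]
        for order in permutations(range(len(units))):
            yield [word for index in order for word in units[index]]


# The example tree from the slides.
TREE = ("control", [
    ("poachers", []),
    ("now", []),
    ("trade", [("the", []), ("underground", [])]),
])
```

Without grammatical constraints this small tree already has 144 orders (6 for the "trade" subtree times 24 at the root), which is why a grammar and then a language model are needed to narrow the choice.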

• Linear Precedence Chooser:
  • Finds most likely lattice traversal, using trigram language model
      Now [H*] poachers [L+H*] [L-] control the underground trade [H*] [L-L%].
• Many ways to implement each step
  • How to choose which works ‘best’?
  • How to evaluate output?

Evaluating NLG

• How to judge success/progress in NLG an open question
  • Qualitative measures: preference
  • Quantitative measures:
    • task performance measures: speed, accuracy
    • automatic comparison to a reference corpus (e.g. string edit-distance and variants, tree-similarity-based metrics)
  • Not always a single “best” solution
• Critical for stochastic systems to combine qualitative judgments with quantitative measures (Walker et al ’97)
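
As one concrete instance of automatic comparison to a reference, here is a word-level edit distance with an accuracy score derived from it. Normalizing by reference length is one common choice and is assumed here. The slides do not say which variant was used, and tree-similarity metrics are not covered by this sketch.

```python
def edit_distance(candidate, reference):
    """Word-level Levenshtein distance between two token lists."""
    previous = list(range(len(reference) + 1))
    for i, cand_word in enumerate(candidate, start=1):
        current = [i]
        for j, ref_word in enumerate(reference, start=1):
            current.append(min(
                previous[j] + 1,                           # drop cand_word
                current[j - 1] + 1,                        # add ref_word
                previous[j - 1] + (cand_word != ref_word), # substitute or match
            ))
        previous = current
    return previous[-1]


def string_accuracy(candidate, reference):
    """1 minus edits per reference word (assumes a non-empty reference)."""
    return 1 - edit_distance(candidate, reference) / len(reference)
```

A candidate that only swaps two adjacent words is charged two edits by this metric. That sensitivity to word order is one reason such metrics penalize acceptable paraphrases.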

Qualitative Validation of Quantitative Metrics

• Subjects judged understandability and quality
  • Candidates proposed by 4 evaluation metrics to minimize distance from Gold Standard (Bangalore, Rambow & Whittaker ‘00)
  • Tree-based metrics correlate significantly with understandability and quality judgments; string metrics do not
  • New objective metrics learned
    • Understandability accuracy = (1.31*simple tree accuracy - .10*substitutions - .44)/.87
    • Quality accuracy = (1.02*simple tree accuracy - .08*substitutions - .35)/.67
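
The two learned metrics are plain linear formulas and can be written as functions. The coefficients are the rounded values shown on the slide. The constant in the understandability formula is read as subtracted, by analogy with the quality formula, and the function and parameter names are assumptions.

```python
def understandability_accuracy(simple_tree_accuracy, substitutions):
    """Learned understandability metric, using the slide's rounded values."""
    return (1.31 * simple_tree_accuracy - 0.10 * substitutions - 0.44) / 0.87


def quality_accuracy(simple_tree_accuracy, substitutions):
    """Learned quality metric, using the slide's rounded values."""
    return (1.02 * simple_tree_accuracy - 0.08 * substitutions - 0.35) / 0.67
```

With a simple tree accuracy of 1 and no substitutions, both functions return 1 up to rounding, which fits the divisors acting as normalizers.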

More Open Questions for Spoken NLG

• How much to model human original?
• Planning for appropriate intonational variation even important in recorded prompts
• Timing and backchanneling
• What kind of output is most comprehensible?
• What kind of output elicits most easily understood user response? (Gustafson et al ’97, Clark & Brennan ‘99)
• Implementing variations in dialogue strategy
  • Implicit confirmation
  • Mixed initiative
