A short introduction to Natural Language Generation Kees van Deemter Computing Science University of Aberdeen

Mar 28, 2015


Page 1: A short introduction to Natural Language Generation Kees van Deemter Computing Science University of Aberdeen.

A short introduction to Natural Language Generation

Kees van Deemter

Computing Science

University of Aberdeen


These introductory slides

... owe much to earlier slides by Chris Mellish


First: NLG from a practical perspective

Goal (usually): computer software which produces understandable and appropriate texts in some human language

Input: some non-linguistic representation of information (e.g., tables in database, logical formulas, JAVA code, ...)

Output: documents, reports, explanations, help messages, ...

Knowledge sources required: knowledge of language and of the domain; maybe of the intended audience as well


[Diagram: Language Technology, relating Text, Meaning and Speech via Natural Language Understanding, Natural Language Generation, Speech Recognition and Speech Synthesis]


Example System: FoG

Function: Produces textual weather reports in English and French

Input: Graphical/numerical weather depiction

User: Environment Canada (Canadian Weather Service)

Developer: CoGenTex. [Kittredge, Goldberg and Driedger 1994.]

Status: Fielded, in operational use since 1992


FoG: Input


FoG: Output


Example System: STOP

Function: Produce personalised stop-smoking leaflets

Input: Questionnaire about smoking status, beliefs, etc

Target user: NHS

Developer: Aberdeen University (CS, Medicine, GP Depts)

[Reiter & Robertson 1999] See http://www.csd.abdn.ac.uk/research/stop/onlineQ.htm

Status: Clinical trial suggested not effective


STOP: Input

SMOKING QUESTIONNAIRE

Please answer by marking the most appropriate box for each question like this:

Q1 Have you smoked a cigarette in the last week, even a puff? YES / NO

Please complete the following questions

Please return the questionnaire unanswered in the envelope provided. Thank you.

Please read the questions carefully. If you are not sure how to answer, just give the best answer you can.

Q2 Home situation: Live alone / Live with husband/wife/partner / Live with other adults / Live with children

Q3 Number of children under 16 living at home ………………… boys ………1……. girls

Q4 Does anyone else in your household smoke? (If so, please mark all boxes which apply) husband/wife/partner / other family member / others

Q5 How long have you smoked for? …10… years

Tick here if you have smoked for less than a year


STOP: Output

Dear Ms Cameron

Thank you for taking the trouble to return the smoking questionnaire that we sent you. It appears from your answers that although you're not planning to stop smoking in the near future, you would like to stop if it was easy. You think it would be difficult to stop because smoking helps you cope with stress, it is something to do when you are bored, and smoking stops you putting on weight. However, you have reasons to be confident of success if you did try to stop, and there are ways of coping with the difficulties.


Example System: Dial Your Disc (DYD)

Function: Context-sensitive descriptions of Mozart’s instrumental music

Input: Music database + history of interaction

Target user: Music industry, customers for music-on-demand

Developer: Philips Electronics (Nat Lab – IPO, Eindhoven; 1993-6) [Van Deemter & Odijk 1995]

Status: Not deployed; methods reused in GOALGETTER and other systems


Example System: Dial Your Disc (DYD)

User composes a home-made CD. A number of tracks are on the CD already.

Speech (with keyword spotting) tells system what type of music the user would like to add to the CD

E.g., “I’d like some piano music” or “I’m interested in solo performances”, from which the keywords “piano” and “solo” are spotted.

System chooses one composition with solo piano (at random). The music starts. After a while, a text is spoken (while the music is turned down).

Previous descriptions are taken into account. For example, the second time a piano sonata is selected, the following text may be generated:

(Many choices were randomised, so you would seldom get the same monologue twice)


Example System: Dial Your Disc (DYD)

Example of approximate output, in its most elaborate form:

“The following+ composition+, from which you are going to hear a fragment+ of part three+, was written+ by Mozart in the beginning+ of seventeen+ seventy+ five+, in Munich+. The work is also+ a sonata+ in f+, like the preceding+ composition, but now+ for piano+. The KV+ number of this work is K. two+ eight+ zero+. This sonata+ consists of three+ parts+: allegro assai+, adagio+, and presto+. The presto lasts two+ minutes+ forty+ five+ seconds+. This presto is located on track six+ of first+ CD+ of volume seventeen+. The piano+ is played by Mitsuko Uchida+. The recording+ of the sonata+ was made+ in the Henry Wood+ Hall in London+, England, in the eighties+. The quality+ of its recording is DDD+. The following+ is a fragment+ of the third+ part+.”

[A fragment follows]

Each “+” marks a pitch accent on the preceding word
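The sentence "The work is also a sonata in f, like the preceding composition, but now for piano" shows the interaction history at work. The slides do not say how DYD implemented this, so the following is only a minimal sketch of the idea, assuming each work is a plain dictionary with hypothetical keys `genre`, `key` and `instrument`, and that only the immediately preceding description is remembered. Pitch accents are left out.

```python
def describe_work(current, previous=None):
    """Describe a work, comparing it with the previously described one (if any).

    The dictionary keys and the sentence patterns are assumptions made for
    this illustration; they are not taken from the DYD system itself.
    """
    kind = f"a {current['genre']} in {current['key']}"
    if previous is None:
        # No history yet: plain description.
        return f"The work is {kind}, for {current['instrument']}."

    same_kind = (previous["genre"], previous["key"]) == (current["genre"], current["key"])
    same_instrument = previous["instrument"] == current["instrument"]

    if same_kind and not same_instrument:
        # Shared properties are marked with "also", the contrast with "but now".
        return (f"The work is also {kind}, like the preceding composition, "
                f"but now for {current['instrument']}.")
    if same_kind:
        return (f"The work is also {kind} for {current['instrument']}, "
                f"like the preceding composition.")
    # Nothing worth comparing: fall back to the plain description.
    return f"The work is {kind}, for {current['instrument']}."


# Hypothetical interaction: a violin sonata was described first, then a piano sonata.
first = {"genre": "sonata", "key": "f", "instrument": "violin"}
second = {"genre": "sonata", "key": "f", "instrument": "piano"}
print(describe_work(first))
print(describe_work(second, previous=first))
```

The same work therefore receives a different description depending on what was said before, which is the sense in which the descriptions are context-sensitive.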


Example System: ILEX

Function: Context-sensitive descriptions of museum artefacts

Input: Museum database + history of interaction

Target user: National Museums of Scotland

Developer: Edinburgh University [R.Dale et al. 1998; Oberlander et al. 1998] See http://www.hcrc.ed.ac.uk/ilex/systemintro.html

Status: Commercial application under investigation


When to use NLG?

NLG is better than having people write texts when:

There are many potential documents to be written, differing according to the context (user, situation, language)

There are some general principles behind document design.


Why is NLG hard?

NLG involves making many choices, e.g. which content to include, what order to say it in, what words and syntactic constructions to use.

Linguistics does not yet provide us with a ready-made, precise theory about how to make such choices to produce coherent text


Why is NLG hard?

The choices to be made interact with one another in complex ways

Many results of choices (e.g. length and readability of the text) are only visible at the end of the process


Choices

The Serbian Prime Minister, Zoran Djindjic, has been assassinated in the capital, Belgrade.

The pro-reform, pro-Western leader was shot in the stomach and in the back outside government offices at around 1300 (1200 gmt), and died of his wounds in hospital.

(BBC news, UK edition, 12/3/03)


Tasks and architecture

Most practical NLG systems use a fixed order in which these generation tasks are performed

After Reiter 1994, we often speak of the NLG pipeline

Different systems use slightly different orderings.


Tasks and Architecture in NLG

Document Planning: Content Determination, Document Structuring

Micro-planning: Aggregation, Lexicalisation, Generation of Referring Expressions

Surface Realisation: Linguistic Realisation, Physical Realisation
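The three-stage pipeline can be illustrated with a toy program. This is a minimal sketch under strong assumptions, not how any of the systems in these slides work: facts are hypothetical (subject, predicate) string pairs borrowed from the James Sportler example later in these slides, document planning just keeps the first few facts, micro-planning only does a crude form of aggregation, and surface realisation is string concatenation. All function names are invented for the illustration.

```python
def document_planning(facts, max_facts):
    """Content determination and document structuring.

    Naive assumption: keep the first `max_facts` facts, in input order.
    """
    return facts[:max_facts]


def microplanning(facts):
    """Aggregation only: merge consecutive facts about the same subject.

    Lexicalisation and referring expressions are skipped, because the
    facts in this sketch already come with their words chosen.
    """
    plans = []
    for subject, predicate in facts:
        if plans and plans[-1][0] == subject:
            plans[-1][1].append(predicate)
        else:
            plans.append((subject, [predicate]))
    return plans


def surface_realisation(plans):
    """Linguistic realisation: turn each sentence plan into a string."""
    sentences = []
    for subject, predicates in plans:
        if len(predicates) == 1:
            body = predicates[0]
        else:
            body = ", ".join(predicates[:-1]) + " and " + predicates[-1]
        sentences.append(f"{subject} {body}.")
    return " ".join(sentences)


def generate(facts, max_facts=3):
    """Run the stages in a fixed order, as in a pipeline architecture."""
    return surface_realisation(microplanning(document_planning(facts, max_facts)))


FACTS = [
    ("Sportler", "is a famous British designer"),
    ("Sportler", "drives an ancient pink Jaguar"),
    ("Wendsop", "won the first prize in the FWJG awards"),
    ("Sportler", "works in London with Thomas Wendsop"),
]

print(generate(FACTS))
```

Because each stage only sees the output of the previous one, an early choice (here, dropping the fourth fact) cannot be revisited later, which is one reason why the interaction between choices is hard to handle in a fixed pipeline.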


Example: Noun Phrase design

A noun phrase can convey an arbitrary amount of information:

Someone vs a designer vs an old designer vs an old designer with red hair …

How much information should we “pack into” a given NP?
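The range of options can be made concrete with a small sketch that builds the noun phrases above from the same underlying properties. The function names and the way properties are passed in are assumptions made for this illustration, and the choice between "a" and "an" uses a crude first-letter rule that would fail on words such as "hour" or "university".

```python
def indefinite_np(head, premodifiers=(), postmodifier=None):
    """Build an indefinite noun phrase from a head noun and optional modifiers."""
    words = list(premodifiers) + [head]
    # Crude heuristic (an assumption of this sketch): "an" before a vowel letter.
    article = "an" if words[0][0].lower() in "aeiou" else "a"
    phrase = " ".join([article] + words)
    if postmodifier:
        phrase += " " + postmodifier
    return phrase


def np_variants(head, premodifiers, postmodifier):
    """Return descriptions of the same person, from least to most informative."""
    return [
        "someone",
        indefinite_np(head),
        indefinite_np(head, premodifiers),
        indefinite_np(head, premodifiers, postmodifier),
    ]


for phrase in np_variants("designer", ["old"], "with red hair"):
    print(phrase)
```

Producing the variants is the easy part. The open question raised on this slide is which of them a generator should choose in a given context.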


Some Issues to Consider

Telling the reader what they need to know (e.g., who you’re talking about, and what’s worth knowing about them)

Clarity and readability of the NP; other effects on the reader (e.g., via politeness)

Successful use of pronouns and abbreviated references


Example Content

(NB we assume that words, basic syntax etc have been chosen)

This T-shirt was made by James Sportler. Sportler is a famous British designer. He drives an ancient pink Jaguar. He works in London with Thomas Wendsop. Wendsop won the first prize in the FWJG awards.

Can/should we add more to the NP?


One possible addition

This T-shirt was made by James Sportler, who works in London with Thomas Wendsop.

Sportler is a famous British designer. He drives an ancient pink Jaguar.

Wendsop won the first prize in the FWJG awards.

Facts about Wendsop are now separated from one another (focus).

Wendsop now has greater prominence in the text (ordering)


Another possible addition

This T-shirt was made by James Sportler, a famous British designer who works in London with Thomas Wendsop, who won the first prize in the FWJG awards.

Sportler drives an ancient pink Jaguar.

The NP is now very complex (readability)

“He” now doesn’t seem to work in the second sentence (pronouns)


Another possible addition

This T-shirt was made by James Sportler, a famous British designer.

He drives an ancient pink Jaguar.

He works in London with Thomas Wendsop.

Wendsop won the first prize in the FWJG awards.

Possibly the best solution, but why?


NLG Beyond Words

Plain text: words and punctuation

Printed documents (eg newspapers) need to consider typography, layout, graphics

Online documents (eg Web pages) need to consider hypertext links

Speech (eg radio broadcasts, telephone) need to consider prosody

Visual presentation (eg Embodied Conversational Agents) need to consider animation, facial expressions too


Plain Text

When time is limited, travel by limousine, unless cost is also limited, in which case go by train. When only cost is limited a bicycle should be used for journeys of less than 10 kilometers, and a bus for longer journeys. Taxis are recommended when there are no constraints on time or cost, unless the distance to be travelled exceeds 10 kilometers. For journeys longer than 10 kilometers, when time and cost are not important, journeys should be made by hire car.


With Typography and Layout

When only time is limited:
    travel by Limousine

When only cost is limited:
    travel by Bus if journey more than 10 kilometers
    travel by Bicycle if journey less than 10 kilometers

When both time and cost are limited:
    travel by Train

When time and cost are not limited:
    travel by Hire Car if journey more than 10 kilometers
    travel by Taxi if journey less than 10 kilometers
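Both the plain-text and the laid-out version express the same underlying rules, which can be written down as a small decision function. This is only a sketch of that shared content: the function name and parameters are invented here, and a journey of exactly 10 kilometers is treated as a short journey, which is an assumption because the two versions of the text do not settle that case.

```python
def recommend_transport(time_limited, cost_limited, distance_km):
    """Return the recommended means of transport for the travel rules above.

    Assumption of this sketch: a journey counts as long only when it is
    strictly more than 10 kilometers.
    """
    long_journey = distance_km > 10
    if time_limited and cost_limited:
        return "train"
    if time_limited:
        return "limousine"
    if cost_limited:
        return "bus" if long_journey else "bicycle"
    return "hire car" if long_journey else "taxi"


# Example: cost matters, time does not, and the journey is short.
print(recommend_transport(time_limited=False, cost_limited=True, distance_km=4))
```

Seen this way, the choice between running text and a structured layout is a presentation decision made over a fixed piece of content.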


Plain Text (e.g. Andre and Rist 2000)

Push the code switch S-4 to the right. The code switch is located in front of the transformer.

Text and Graphics


Embodied Conversational Agents (ECAs)

Until recently, textual aspects of ECAs were largely canned

Recent systems use NLG

Example: NECA e-Showroom system for car sales. Input to NLG includes: facts about the car, agent’s interests, interaction history


Second perspective: NLG as a branch of linguistics

The choices made by an NLG system involve the mapping between words and things/ideas. Surely, this is linguistic territory!

If linguists cannot say how the different stories about James Sportler differ, then who can?

An NLG program might be seen as a model of language production (in terms of its output; the human production process may be very different)

This course is neutral between the practical and the theoretical perspective, but I am mostly interested in contributions to (linguistic) theory.


Conclusions

NLG is the (somewhat less investigated) twin brother of NL Understanding

Just like the interpretive perspective (of NLU), the generative perspective (of NLG) poses deep theoretical problems about language and communication

NLG has great potential for applications

In applications and theory alike, NLG and NLU are sometimes difficult to separate


Hidden agenda

Highlight open questions

Get more people to work on Natural Language Generation (NLG)