
A Short History of Two-Level Morphology

Lauri Karttunen, Xerox PARC

Kenneth R. Beesley, XRCE


Lauri Karttunen / 24 Aug 2001 / page 2

Overview

• Introduction
– What is morphology?
– Two strains of finite-state morphology
– State of the art circa 1980

• Two-Level Morphology
– Origins, basic idea
– Implementations, compilers

• Recent Developments


What is Morphology?

• Morphosyntax
Words are composed of smaller units of meaning called morphemes that must be combined in a certain order.
piti-less-ness vs. *piti-ness-less

• Morphological Alternations
The shape of a morpheme depends on its environment.
piti-less vs. *pity-less


Sequential Model

An ordered sequence of rewrite rules (Chomsky & Halle ’68) can be modeled by a cascade of finite-state transducers (Johnson ’72, Kaplan & Kay ’81).

[Figure: Lexical form, fst 1, Intermediate form, fst 2, ..., fst n, Surface form, arranged as a cascade]


Parallel Model

A set of parallel two-level rules compiled into finite-state automata interpreted as transducers (Koskenniemi ’83).

[Figure: fst 1, fst 2, ..., fst n applied side by side between the Lexical form and the Surface form]


Sequential vs. Parallel

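[Figure: the cascade fst 1, fst 2, ..., fst n (Lexical form, Intermediate form, Surface form) is combined by composition ("compose"); the parallel fst 1, fst 2, ..., fst n (Lexical form, Surface form) are combined by intersection ("intersect"). Both yield a single FST.]

The single FST is perhaps too large to be practical.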


State of the Art circa 1980

• Cut-and-paste analysis
leaves --> leave --> leav --> leaf
ad-hoc programs, not reversible for generation

• Paradigm tables
comprendre 45
not reversible for analysis, impractical for morphologically complex languages

• Chomsky-Halle rewrite rules
x -> y / z _ w
computationally complex, no implementation, reversible?


Discovery and Rediscovery

• C. Douglas Johnson (1972) showed that
– phonological rewrite rules are interpreted in a way that makes them less powerful than they appear
– rewrite rules can be modeled by finite transducers
– for any two finite transducers applied in a sequence there exists an equivalent single transducer (Schützenberger 1961).

• Johnson’s result was ignored and forgotten, then rediscovered by Ronald M. Kaplan and Martin Kay at Xerox around 1980.


Sequential Application

N -> m / _ p

p -> m / m _

k a N p a n

k a m p a n

k a m m a n
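
The derivation above can be sketched in a few lines of Python. This is only an illustration, assuming that each rewrite rule can be approximated by a regular-expression substitution whose lookahead or lookbehind encodes the context; the names `RULES` and `apply_in_sequence` are made up for this sketch and do not come from any finite-state toolkit.

```python
import re

# Each rule is (pattern, replacement); the lookaround encodes the rule context.
RULES = [
    (r"N(?=p)", "m"),   # N -> m / _ p
    (r"(?<=m)p", "m"),  # p -> m / m _
]

def apply_in_sequence(lexical, rules=RULES):
    """Return every form in the derivation: lexical, intermediates, surface."""
    forms = [lexical]
    for pattern, replacement in rules:
        # Each rule rewrites the output of the previous rule.
        forms.append(re.sub(pattern, replacement, forms[-1]))
    return forms

print(apply_in_sequence("kaNpan"))  # ['kaNpan', 'kampan', 'kamman']
```

The order of the rules matters: with the two rules reversed, p is not yet preceded by m when the second rule is tried, and the result is kampan instead of kamman.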


Sequential Application in Detail

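[Figure: transducer for N -> m / _ p with states 0, 1, 2 and arcs labeled N:m, N, p, m, ?; transducer for p -> m / m _ with states 0, 1 and arcs labeled p:m, p, m, ?]

k a N p a n
0 0 0 2 0 0 0   (states of the first transducer)
k a m p a n
0 0 0 1 0 0 0   (states of the second transducer)
k a m m a n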


Composition

[Figure: the single transducer obtained by composing the two rule transducers, with states 0, 1, 2, 3 and arcs labeled N:m, N, m, p, p:m, ?]

k a N p a n
0 0 0 3 0 0 0
k a m m a n
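
The claim that a cascade can be replaced by one machine can be checked on this small example. The sketch below is not a transducer composition algorithm. It compares the two-step cascade with a hand-derived single pass that reads only the lexical string, and tests that the two agree on every string up to a given length. The function names and the six-letter alphabet are assumptions made for the illustration.

```python
import re
from itertools import product

ALPHABET = "kaNmpn"  # assumed toy alphabet

def cascade(s):
    """Apply N -> m / _ p, then p -> m / m _, as two separate passes."""
    s = re.sub(r"N(?=p)", "m", s)
    return re.sub(r"(?<=m)p", "m", s)

def single_pass(s):
    """Map lexical to surface in one pass, with no intermediate string."""
    out = []
    for i, ch in enumerate(s):
        prev = s[i - 1] if i > 0 else ""
        nxt = s[i + 1] if i + 1 < len(s) else ""
        if ch == "N" and nxt == "p":
            out.append("m")
        elif ch == "p" and prev in ("m", "N"):
            # A lexical N directly before this p would have become m,
            # so it counts as a preceding m.
            out.append("m")
        else:
            out.append(ch)
    return "".join(out)

def agree_up_to(max_len):
    """True if both mappings agree on all strings of length <= max_len."""
    return all(
        cascade("".join(w)) == single_pass("".join(w))
        for n in range(max_len + 1)
        for w in product(ALPHABET, repeat=n)
    )

print(single_pass("kaNpan"))  # kamman
print(agree_up_to(5))         # True
```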


Building a Compiler

• Requires a finite-state calculus
concatenation, union, intersection, complementation...

• Constraints are regular languages
“if p occurs then q follows”
. . . p . . . q . . .
?* p ?* q ?*
~[ ?* p ~[ ?* q ?* ]]

• The idea of double negation was Kaplan and Kay’s first insight. Many details remained to be worked out.
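
Why the double negation is needed can be seen by testing both expressions on a few strings. The following is a minimal sketch using ordinary Python string predicates in place of a finite-state calculus; the function names are invented for the example. The positive pattern ?* p ?* q ?* rejects strings that contain no p at all and accepts strings in which only some p is followed by a q, whereas the doubly negated form states the constraint exactly.

```python
import re

def naive(s):
    """Membership in ?* p ?* q ?* (some p is followed by some q)."""
    return re.fullmatch(r".*p.*q.*", s) is not None

def contains_q(s):
    """Membership in ?* q ?*."""
    return "q" in s

def violates(s):
    """Membership in ?* p ~[ ?* q ?* ]: some p with no q anywhere after it."""
    return any(ch == "p" and not contains_q(s[i + 1:]) for i, ch in enumerate(s))

def satisfies(s):
    """Membership in ~[ ?* p ~[ ?* q ?* ]]: no p lacks a following q."""
    return not violates(s)

for s in ["xpxqx", "x", "pqp"]:
    print(s, naive(s), satisfies(s))
# xpxqx True True
# x False True    (no p occurs, so the constraint holds vacuously)
# pqp True False  (the last p has no q after it)
```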


The Problem of “Overanalysis”

k a m m a n

k a m p a n

k a m p a n

k a m m a n

k a N p a n
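
The effect can be reproduced by running the two rules backwards by brute force. This sketch assumes the same two rewrite rules as on the earlier slides (N -> m / _ p, then p -> m / m _), approximated with regular-expression substitutions, and an assumed six-letter alphabet. It enumerates every candidate lexical string of the right length and keeps those that generate the given surface form.

```python
import re
from itertools import product

ALPHABET = "kaNmpn"  # assumed toy alphabet

def generate(lexical):
    """Apply N -> m / _ p, then p -> m / m _."""
    intermediate = re.sub(r"N(?=p)", "m", lexical)
    return re.sub(r"(?<=m)p", "m", intermediate)

def analyze(surface):
    """Return every lexical string that the rules map to the surface form."""
    candidates = ("".join(w) for w in product(ALPHABET, repeat=len(surface)))
    return sorted(c for c in candidates if generate(c) == surface)

print(analyze("kamman"))  # ['kaNpan', 'kamman', 'kampan']
```

Without a lexicon to filter them, the rules alone return three lexical candidates for one surface form.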


The Birth of Two-Level Morphology

• In the spring of 1981 Kimmo Koskenniemi came to UT at Austin in search of a dissertation topic.

• Karttunen demoed his TEXFIN analyzer/generator for Finnish.

• Kaplan and Kay briefed him about their discoveries. Koskenniemi visited PARC.

• After a gestation period of about a year, two-level morphology was born.


The Three Ideas of Two-Level Morphology

• Rules are symbol-to-symbol constraints that are applied in parallel, not sequentially like rewrite rules.

• The constraints can refer to the lexical context, to the surface context or to both contexts at the same time.

• Lexical lookup and morphological analysis are performed in tandem.


Two-Level Constraints 1

Lexical: k a N p a n
Surface: k a m m a n

• The N:m correspondence requires a following p on the lexical side. In this context, all other possible realizations of a lexical N are prohibited.

• The p:m correspondence requires a preceding m on the surface side. In this context, all other possible realizations of a lexical p are prohibited.
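
The two constraints can be written as predicates over an aligned lexical/surface pair and checked simultaneously. The sketch below makes several assumptions: the inventory of feasible pairs (identity pairs plus N:m and p:m), the toy alphabet, and all function names are invented for the illustration, and candidates are enumerated by brute force, not by running compiled automata.

```python
from itertools import product

# Assumed feasible pairs: every symbol may surface as itself; N and p may also surface as m.
REALIZATIONS = {ch: {ch} for ch in "kaNmpn"}
REALIZATIONS["N"].add("m")
REALIZATIONS["p"].add("m")

def n_rule(lex, surf):
    """N:m holds if and only if a lexical p follows."""
    for i, (l, s) in enumerate(zip(lex, surf)):
        if l == "N" and (s == "m") != (lex[i + 1:i + 2] == "p"):
            return False
    return True

def p_rule(lex, surf):
    """p:m holds if and only if a surface m precedes."""
    for i, (l, s) in enumerate(zip(lex, surf)):
        if l == "p" and (s == "m") != (i > 0 and surf[i - 1] == "m"):
            return False
    return True

RULES = (n_rule, p_rule)

def accepts(lex, surf):
    """True if the aligned pair is feasible and satisfies every rule at once."""
    return (
        len(lex) == len(surf)
        and all(s in REALIZATIONS.get(l, ()) for l, s in zip(lex, surf))
        and all(rule(lex, surf) for rule in RULES)
    )

def generate(lex):
    """All surface strings that the constraints allow for a lexical string."""
    options = [sorted(REALIZATIONS[ch]) for ch in lex]
    return sorted(s for s in map("".join, product(*options)) if accepts(lex, s))

def analyze(surf):
    """All lexical strings that the constraints allow for a surface string."""
    inverse = {}
    for l, surfaces in REALIZATIONS.items():
        for s in surfaces:
            inverse.setdefault(s, set()).add(l)
    options = [sorted(inverse[ch]) for ch in surf]
    return sorted(l for l in map("".join, product(*options)) if accepts(l, surf))

print(generate("kaNpan"))  # ['kamman']
print(analyze("kamman"))   # ['kaNpan', 'kamman', 'kampan']
```

The same predicates serve for generation and for analysis, and no intermediate form is ever built. The p rule looks at the surface side while the N rule looks at the lexical side.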


Two-Level Constraints 2

Lexical: s p y 0 + s
Surface: s p i e 0 s

y:i <=> _ 0:e

0:e <=> Cons: y: _ +: s:


Parallel Application

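[Figure: the N:m Rule and the p:m Rule applied in parallel between the lexical string and the surface string]

Lexical: k a N p
Surface: k a m m a n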


Lookup and Analysis in Tandem

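[Figure: a path k a N p in the lexicon is followed while the N:m Rule and the p:m Rule check it against the surface string]

Lexicon: k a N p
Surface: k a m m a n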


Two-Level Implementations

• 1982 Koskenniemi (Pascal)
• 1983 Karttunen et al. at UTexas (Lisp)
• 1986- Antworth et al. at SIL (C)
• 1987 Black et al. Alvey Project (Lisp)
• 1989 Beesley Alpnet (Lisp)
• 1991 Pulman et al. ALEP (Prolog)
• 1995 Carter SRI CLE (Prolog)
• 1995 Petitpierre et al. MULTEXT (C)


Two-Level Rule Compilers

• 1985 Kaplan and Koskenniemi: the basic compilation algorithm developed during Koskenniemi’s visit at CSLI at Stanford on a Dandelion (Xerox Lisp machine). It was based on the techniques Kaplan and Kay had developed for compiling rewrite-rules.

• 1985-87 Koskenniemi and Karttunen: the first compiler

• 1992 Current C version (twolc) by Karttunen and Beesley.

• 1996 Grimley-Evans, Kiraz, Pulman: compiler for a “partition-based” two-level formalism


Seeds of Dissatisfaction

• Two-level morphological analyzers became a standard component in natural language processing systems.

• But there was no publicly available compiler until recently.

• Morphotactics was “improved” by adding feature unification. Two-level analyzers acquired a reputation for being slow.

• Two-level rules are notoriously difficult to write, even with a compiler.


Rule Conflicts

Resolution by underspecification:

General rule: k:0 | k:v <=> Vowel _ Vowel
makun / ma un (k:0)

Exception: k:v <=> u _ u
pukun / puvun (k:v)

[Figure: the contexts u _ u and Vowel _ Vowel, labeled with the correspondences k:0 and k:v]
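
The conflict and its resolution can be illustrated with a small sketch. It assumes that both contexts refer to the lexical side, that 0 marks a deleted k, and that only k has more than one realization; the vowel set and all function names are made up for the example. A fully specified general rule (k:0 <=> Vowel _ Vowel) leaves pukun with no legal output once the exception is added, while the underspecified rule (k:0 | k:v) lets both rules hold at once.

```python
VOWELS = set("aeiouy")          # assumed vowel inventory
K_REALIZATIONS = ("k", "0", "v")  # "0" stands for deletion

def general_rule(prev, nxt, surf):
    """k:0 | k:v <=> Vowel _ Vowel (underspecified general rule)."""
    return (surf in ("0", "v")) == (prev in VOWELS and nxt in VOWELS)

def strict_general_rule(prev, nxt, surf):
    """k:0 <=> Vowel _ Vowel (fully specified; conflicts with the exception)."""
    return (surf == "0") == (prev in VOWELS and nxt in VOWELS)

def exception_rule(prev, nxt, surf):
    """k:v <=> u _ u."""
    return (surf == "v") == (prev == "u" and nxt == "u")

def generate(lexical, rules=(general_rule, exception_rule)):
    """Return all surface strings in which every k satisfies every rule."""
    results = [""]
    for i, ch in enumerate(lexical):
        if ch != "k":
            results = [r + ch for r in results]
            continue
        prev = lexical[i - 1] if i > 0 else ""
        nxt = lexical[i + 1] if i + 1 < len(lexical) else ""
        allowed = [s for s in K_REALIZATIONS if all(rule(prev, nxt, s) for rule in rules)]
        results = [r + s for r in results for s in allowed]
    return results

print(generate("makun"))  # ['ma0un']
print(generate("pukun"))  # ['puvun']
print(generate("pukun", rules=(strict_general_rule, exception_rule)))  # []
```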


Recent Developments

• The pioneers of finite-state morphology knew that a cascade of transducers or a set of parallel rules could be combined into a single transducer.

• But the resulting single transducer is typically huge compared to the size of the original rule networks. Impractical in most cases.

• The obvious solution, not seen for a long time, was to compose the rules with the lexicon.


Lexical Transducer

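Karttunen, Kaplan, Zaenen 1992

[Figure: the rules R1, R2, ..., Rn are combined by intersection (&); the result is combined with the Source Lexicon by composition (o) to give the Lexical Transducer]

Lexical side (canonical form, inflection codes): s p y 0 +Noun +PL
Surface side (inflected form): s p i e s 0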


Cascade of Compositions

[Figure: the Source Lexicon is combined by a cascade of compositions (o) with replace rules (R1, ..., Rn) and constraints (Ci, Cj) to give the Lexical Transducer]


Linguistic Issues

• The idea of rules as parallel constraints was not picked up by mainstream linguists in the 80’s.

• Many arguments had been advanced to show that phonological alternations could not be described or explained without sequential rewrite rules.

• The two-level model was perceived as a computational “hack”, not worthy of academic interest.


Rise of Optimality Theory

• Optimality Theory, the dominant paradigm in phonology since 1993, is a two-level model with parallel constraints.

• Most optimality constraints can be encoded trivially as two-level rules.

• The main difference is that OT constraints are ranked and violable.


Back to the Big Picture

[Figure: the sequential model (Lexical form, fst 1, Intermediate form, fst 2, ..., fst n, Surface form) next to the parallel model (fst 1, fst 2, ..., fst n between the Lexical form and the Surface form)]

While the sequential model was popular among mainstream linguists, computational linguists preferred the parallel model. Now it is almost the other way round, although for computational linguists there is no substantive difference.