FST Morphology Miriam Butt October 2002 Based on Beesley and Karttunen 2002
Dec 22, 2015

FST Morphology

Miriam Butt

October 2002

Based on Beesley and Karttunen 2002

Recap

Last Time: Finite State Automata can model almost anything that involves a finite number of states.

We modeled a Coke Machine and saw that it could also be thought of as defining a language.

We will now look at the extension to natural language more closely.

A One-Word Language

[Figure: a network with a single path c-a-n-t-o]

A Three-Word Language

[Figure: a network with three paths, c-a-n-t-o, t-i-g-r-e, and m-e-s-a]

Analysis: A Successful Match

[Figure: the three-word network with paths c-a-n-t-o, t-i-g-r-e, and m-e-s-a]

Input: m e s a

Rejects

The analysis of libro, tigra, cant, mesas will fail.

Why?
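One way to explore the question is to encode the three-word network and run the rejected words through it. The following is only a minimal sketch, assuming a trie-shaped dictionary-of-arcs encoding; the names build_acceptor and accepts are illustrative and do not come from the slides.

```python
def build_acceptor(words):
    """Build a trie-shaped acceptor: arcs[state][symbol] -> next state."""
    arcs = {0: {}}          # state 0 is the start state
    finals = set()
    for word in words:
        state = 0
        for symbol in word:
            if symbol not in arcs[state]:
                new_state = len(arcs)
                arcs[new_state] = {}
                arcs[state][symbol] = new_state
            state = arcs[state][symbol]
        finals.add(state)   # the end of each word is a final state
    return arcs, finals


def accepts(network, word):
    """Follow one arc per input symbol; succeed only in a final state."""
    arcs, finals = network
    state = 0
    for symbol in word:
        if symbol not in arcs[state]:
            return False    # no arc for this symbol: reject
        state = arcs[state][symbol]
    return state in finals  # input used up, but is the state final?


network = build_acceptor(["canto", "tigre", "mesa"])
for word in ["mesa", "libro", "tigra", "cant", "mesas"]:
    print(word, accepts(network, word))
```

In this sketch, libro and tigra fail because an arc is missing, cant fails because the input ends in a non-final state, and mesas fails because input remains after the final state is reached.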

Transducers: Beyond Accept and Reject

[Figure: a transducer with three paths, c-a-n-t-o, t-i-g-r-e, and m-e-s-a; each arc carries an upper-side and a lower-side symbol, here identical]

Transducers: Beyond Accept and Reject

Analysis Process:

• Start at the Start State.

• Match the input symbols of the string against the lower-side symbols on the arcs, consuming the input symbols and finding a path to a final state.

• If successful, return the string of upper-side symbols on the path as the result.

• If unsuccessful, return nothing.
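The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the XRCE implementation: arcs are stored as (upper, lower, target) triples, "return nothing" is modeled as an empty list, and the names build_transducer and lookup are hypothetical.

```python
def build_transducer(paths):
    """paths: one list of (upper, lower) symbol pairs per word."""
    arcs = {0: []}          # state -> list of (upper, lower, target)
    finals = set()
    for path in paths:
        state = 0
        for upper, lower in path:
            new_state = len(arcs)
            arcs[new_state] = []
            arcs[state].append((upper, lower, new_state))
            state = new_state
        finals.add(state)
    return arcs, finals


def lookup(transducer, word):
    """Match word against lower-side symbols; collect upper-side strings."""
    arcs, finals = transducer
    results = []
    agenda = [(0, 0, "")]   # (state, input position, upper string so far)
    while agenda:
        state, pos, output = agenda.pop()
        if pos == len(word) and state in finals:
            results.append(output)
        for upper, lower, target in arcs[state]:
            if pos < len(word) and word[pos] == lower:
                agenda.append((target, pos + 1, output + upper))
    return results          # empty list means the analysis failed


# Identity pairs: upper and lower symbols are the same on every arc.
words = ["canto", "tigre", "mesa"]
transducer = build_transducer([[(c, c) for c in w] for w in words])
print(lookup(transducer, "mesa"))
print(lookup(transducer, "mesas"))
```

The agenda allows several arcs to match the same symbol, so the search still works when paths share a first letter or when one surface string has more than one analysis.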

A Two-Level Transducer

[Figure: a transducer with three paths, c-a-n-t-o, t-i-g-r-e, and m-e-s-a; each arc carries an upper-side and a lower-side symbol]

Input: m e s a

Output: m e s a

A Lexical Transducer

Input: c a n t o

Output: c a n t a r +Verb +PresInd +1P +Sg

Upper side: c a n t a r +Verb +PresInd +1P +Sg

Lower side: c a n t o 0 0 0 0 0

One Possible Path through the Network

A Lexical Transducer

Input: c a n t o

Output: c a n t o +Noun +Masc +Sg

Upper side: c a n t o +Noun +Masc +Sg

Lower side: c a n t o 0 0 0

Another Possible Path through the Network

The Tags

Tags or Symbols like +Noun or +Verb are arbitrary: the naming convention is determined by the (computational) linguist and depends on the larger picture (type of theory/type of application).

One very successful tagging/naming convention is the Penn Treebank Tag Set.

The Tags

What kind of Tags might be useful?

Generation vs. Analysis

The same finite state transducers we have been using for the analysis of a given surface string can also be used in reverse: for generation.

The XRCE people think of analysis as lookup and of generation as lookdown.

Generation --- Lookdown

Input: c a n t a r +Verb +PresInd +1P +Sg

Output: c a n t o

Upper side: c a n t a r +Verb +PresInd +1P +Sg

Lower side: c a n t o 0 0 0 0 0

Generation --- Lookdown

Generation Process:

• Start at the Start State and the beginning of the input string.

• Match the input symbols of the string against the upper-side symbols on the arcs, consuming the input symbols and finding a path to a final state.

• If successful, return the string of lower-side symbols on the path as the result.

• If generation is unsuccessful, return nothing.
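To make the symmetry between lookup and lookdown concrete, here is a hedged sketch that applies the same two canto paths in either direction. It simplifies the network to a plain list of paths, writes the empty symbol as "0" as on the slides, and uses the hypothetical helper names apply_network, lookup, and lookdown.

```python
EPSILON = "0"           # the empty symbol, written 0 on the slides
UPPER, LOWER = 0, 1     # index of each side within an arc's symbol pair

# Each path is a list of (upper, lower) symbol pairs.
VERB_PATH = list(zip(
    ["c", "a", "n", "t", "a", "r", "+Verb", "+PresInd", "+1P", "+Sg"],
    ["c", "a", "n", "t", "o", "0", "0", "0", "0", "0"]))
NOUN_PATH = list(zip(
    ["c", "a", "n", "t", "o", "+Noun", "+Masc", "+Sg"],
    ["c", "a", "n", "t", "o", "0", "0", "0"]))
PATHS = [VERB_PATH, NOUN_PATH]


def apply_network(paths, symbols, match_side):
    """Match symbols on one side of each path; return the other side."""
    out_side = 1 - match_side
    results = []
    for path in paths:
        matched = [pair[match_side] for pair in path
                   if pair[match_side] != EPSILON]
        if matched == list(symbols):
            results.append("".join(pair[out_side] for pair in path
                                   if pair[out_side] != EPSILON))
    return results      # empty list: nothing is returned


def lookup(paths, surface_symbols):
    """Analysis: match the lower side, return the upper side."""
    return apply_network(paths, surface_symbols, LOWER)


def lookdown(paths, lexical_symbols):
    """Generation: match the upper side, return the lower side."""
    return apply_network(paths, lexical_symbols, UPPER)


print(lookup(PATHS, list("canto")))
print(lookdown(PATHS, ["c", "a", "n", "t", "a", "r",
                       "+Verb", "+PresInd", "+1P", "+Sg"]))
```

Because tags such as +Verb are single symbols, the input is passed as a list of symbols rather than as a character string. Lookup on canto returns both the verb and the noun reading, while lookdown on either lexical string returns canto.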

Concatenation

One can also concatenate two existing languages (finite state networks) with one another to build up new words productively/dynamically.

This works nicely, but one has to write extra rules to avoid things like: *trys, *tryed, though trying is okay.
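Treating each language as a plain set of strings gives a quick way to see both the productivity and the overgeneration. This is a sketch over string sets rather than networks, and the function name concatenate is an assumption for illustration.

```python
def concatenate(first, second):
    """Every string of the first language followed by every string of the second."""
    return {a + b for a in first for b in second}


suffixes = {"s", "ed", "ing"}
print(sorted(concatenate({"work"}, suffixes)))  # ['worked', 'working', 'works']
print(sorted(concatenate({"try"}, suffixes)))   # ['tryed', 'trying', 'trys']
```

The second result contains the unwanted forms *tryed and *trys next to the acceptable trying, which is why extra rules are needed on top of plain concatenation.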

Concatenation

[Figure: a network with a single path w-o-r-k]

Network for the Language {“work”}

[Figure: a network with three paths, s, e-d, and i-n-g]

Network for the Language {“s”, “ed”, “ing”}

Concatenation

[Figure: the path w-o-r-k followed by the three paths s, e-d, and i-n-g]

Concatenation of the two networks

What strings/language does this result in?

Composition

Composition is an operation on two relations.

Composition of the two relations <x,y> and <y,z> yields <x, z>

Example: <“cat”, “chat”> with <“chat”, “Katze”> gives <“cat”, “Katze”>
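The definition can be tried out directly on relations represented as sets of pairs. This is a minimal sketch, assuming that representation; the names compose, english_french, and french_german are illustrative only.

```python
def compose(first, second):
    """Pair x with z whenever <x, y> is in first and <y, z> is in second."""
    return {(x, z)
            for (x, y1) in first
            for (y2, z) in second
            if y1 == y2}


english_french = {("cat", "chat")}
french_german = {("chat", "Katze")}
print(compose(english_french, french_german))  # {('cat', 'Katze')}
```

The shared middle string chat is what licenses the new pair and it does not appear in the result.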

Composition

[Figure: two transducers, one pairing c-a-t with c-h-a-t, the other pairing c-h-a-t with K-a-t-z-e]

Composition

[Figure: a three-level network with the strings K-a-t-z-e, c-a-t, and c-h-a-t]

Merging the two networks

Composition

[Figure: a transducer pairing c-a-t with K-a-t-z-e]

The Composition of the Networks

What is this reminiscent of?

Other Uses for the Transducers

[Figure: a transducer pairing upper-case C-A-T with lower-case c-a-t]

Upper/Lower Casing