CMSC 723 / LING 645: Intro to Computational Linguistics September 15, 2004: Dorr More about FSA’s, Finite State Morphology (J&M 3) Prof. Bonnie J. Dorr Dr. Christof Monz TA: Adam Lee

Dec 19, 2015



More about FSAs

Transducers

Equivalence of DFSAs and NFSAs

Recognition as search: depth-first, breadth-first search


Recognition using NFSAs


NFSA Recognition of “baaa!”


Breadth-first Recognition of “baaa!”

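Recognition with an NFSA can be treated as search over (state, tape position) pairs. The sketch below assumes the sheep-talk language b a a+ ! from J&M and a hand-built transition table with illustrative state names q0 to q4; the only difference between depth-first and breadth-first recognition is whether the agenda is used as a stack or as a queue.

```python
from collections import deque

# Assumed encoding of the sheep-talk NFSA for b a a+ ! (state names are illustrative).
# q2 is the nondeterministic state: on "a" it may loop or advance to q3.
TRANSITIONS = {
    "q0": {"b": {"q1"}},
    "q1": {"a": {"q2"}},
    "q2": {"a": {"q2", "q3"}},
    "q3": {"!": {"q4"}},
}
ACCEPT = {"q4"}


def nd_recognize(tape, strategy="depth"):
    """Return True if some path through the NFSA accepts `tape`.

    Search states are (machine state, tape position) pairs kept on an agenda:
    popping from the end gives depth-first search, from the front breadth-first.
    """
    agenda = deque([("q0", 0)])
    visited = set()
    while agenda:
        search_state = agenda.pop() if strategy == "depth" else agenda.popleft()
        if search_state in visited:
            continue
        visited.add(search_state)
        state, pos = search_state
        if pos == len(tape):
            if state in ACCEPT:
                return True
            continue  # input exhausted in a non-accepting state: dead end
        for nxt in TRANSITIONS.get(state, {}).get(tape[pos], set()):
            agenda.append((nxt, pos + 1))
    return False
```

Both strategies give the same accept/reject answers; they differ only in the order in which the alternatives at q2 are explored.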


Regular languages

Regular languages are characterized by FSAs

For every NFSA, there is an equivalent DFSA.

Regular languages are closed under concatenation, Kleene closure, union.
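The NFSA/DFSA equivalence is usually shown with the subset construction: each DFSA state stands for the set of NFSA states the machine could be in after reading the same input. A minimal Python sketch, assuming the same illustrative sheep-talk NFSA and no ε-transitions:

```python
from itertools import chain

# Assumed NFSA for sheep talk (b a a+ !); state names are illustrative.
NFSA = {
    "q0": {"b": {"q1"}},
    "q1": {"a": {"q2"}},
    "q2": {"a": {"q2", "q3"}},
    "q3": {"!": {"q4"}},
}
NFSA_START, NFSA_ACCEPT = "q0", {"q4"}


def subset_construction(nfsa, start, accept):
    """Build an equivalent DFSA whose states are frozensets of NFSA states."""
    start_set = frozenset([start])
    dfsa, todo = {}, [start_set]
    while todo:
        current = todo.pop()
        if current in dfsa:
            continue
        dfsa[current] = {}
        # every symbol that some NFSA state in `current` can read
        symbols = set(chain.from_iterable(nfsa.get(q, {}) for q in current))
        for sym in symbols:
            target = frozenset(
                chain.from_iterable(nfsa.get(q, {}).get(sym, ()) for q in current)
            )
            dfsa[current][sym] = target
            todo.append(target)
    # a DFSA state accepts if it contains any accepting NFSA state
    dfsa_accept = {s for s in dfsa if s & accept}
    return dfsa, start_set, dfsa_accept


def d_recognize(dfsa, start, accept, tape):
    """Deterministic recognition: one current state, no backtracking."""
    state = start
    for sym in tape:
        state = dfsa[state].get(sym)
        if state is None:
            return False
    return state in accept


DFSA, DFSA_START, DFSA_ACCEPT = subset_construction(NFSA, NFSA_START, NFSA_ACCEPT)
```

Here the nondeterministic choice at q2 turns into a single DFSA state {q2, q3}, so recognition never needs to backtrack.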


Concatenation


Kleene Closure


Union
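Concatenation, Kleene closure and union can each be built from smaller automata by adding ε-transitions. The slide diagrams are not reproduced here, so the sketch below uses an assumed, textbook-style (Thompson-like) construction, with an NFSA represented as a (start, finals, arcs) triple and the empty string as the ε label.

```python
from itertools import count

_ids = count()  # supplies fresh state names so combined machines never clash
EPS = ""        # label used for ε-arcs


def literal(word):
    """NFSA accepting exactly `word` (a simple chain of states)."""
    states = [next(_ids) for _ in range(len(word) + 1)]
    arcs = [(states[i], ch, states[i + 1]) for i, ch in enumerate(word)]
    return states[0], {states[-1]}, arcs


def concat(m1, m2):
    """L1 L2: ε-arcs from every final state of m1 to the start of m2."""
    s1, f1, a1 = m1
    s2, f2, a2 = m2
    return s1, set(f2), a1 + a2 + [(f, EPS, s2) for f in f1]


def star(m):
    """L*: a new final start state, ε to the old start, ε from old finals back."""
    s, f, a = m
    new = next(_ids)
    return new, {new}, a + [(new, EPS, s)] + [(q, EPS, new) for q in f]


def union(m1, m2):
    """L1 or L2: a new start state with ε-arcs into both machines."""
    s1, f1, a1 = m1
    s2, f2, a2 = m2
    new = next(_ids)
    return new, f1 | f2, a1 + a2 + [(new, EPS, s1), (new, EPS, s2)]


def eps_closure(states, arcs):
    """All states reachable from `states` using only ε-arcs."""
    stack, closure = list(states), set(states)
    while stack:
        q = stack.pop()
        for src, sym, dst in arcs:
            if src == q and sym == EPS and dst not in closure:
                closure.add(dst)
                stack.append(dst)
    return closure


def accepts(m, word):
    """Simulate the NFSA by tracking the set of reachable states."""
    start, finals, arcs = m
    current = eps_closure({start}, arcs)
    for ch in word:
        moved = {dst for src, sym, dst in arcs if src in current and sym == ch}
        current = eps_closure(moved, arcs)
    return bool(current & finals)


# b a a a* ! built only from the three closure operations
sheep = concat(literal("baa"), concat(star(literal("a")), literal("!")))
```

Each machine should be used only once when combining, since the operations reuse its state names.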


Morphology

Definitions and Problems
– What is Morphology?
– Typology of Morphologies

Approaches to Computational Morphology
– Lexicons and Rules
– Computational Morphology Approaches


Morphology

The study of the way words are built up from smaller meaning units called morphemes

Syntax Lexeme/Inflected Lexeme Grammars sentences

Morphology Morpheme/Allomorph Morphotactics words

Phonology Phoneme/Allophone Phonotactics letters

Abstract versus realized: HOP +PAST → hop +ed → hopped → /hapt/


Phonology and Morphology

Phonology vs. Orthography

Historical spelling
– night, nite
– attention, mission, fish

Script limitations
– Spoken English has 14 vowels
• heed, hid, hayed, head, had, hoed, hood, who'd, hide, how'd, taught, Tut, toy, enough
– The English alphabet has only 5 vowel letters, so spelling compensates with:
• vowel combinations: far, fair, fare
• consonant doubling (hopping vs. hoping)


Syntax and Morphology

Phrase-level agreement
– Subject-Verb
• John studies hard (STUDY+3SG)
– Noun-Adjective
• Las vacas hermosas ("the beautiful cows": article, noun and adjective are all feminine plural)

Sub-word phrasal structures
– שבספרינו
– ש+ב+ספר+ים+נו
– That+in+book+PL+Poss:1PL
– Which are in our books
– Morpheme labels: conj, prep, noun, poss, plural, article


Typology of Morphologies

Concatenative vs. Templatic

Derivational vs. Inflectional

Regular vs. Irregular


Concatenative Morphology

Morpheme+Morpheme+Morpheme+…

Stems: also called lemma, base form, root, lexeme
– hope+ing → hoping, hop+ing → hopping

Affixes
– Prefixes: Antidisestablishmentarianism (anti-, dis-)
– Suffixes: Antidisestablishmentarianism (-ment, -arian, -ism)
– Infixes: hingi (borrow) → humingi (borrower) in Tagalog
– Circumfixes: sagen (say) → gesagt (said) in German

Agglutinative Languages
– uygarlaştıramadıklarımızdanmışsınızcasına
– uygar+laş+tır+ama+dık+lar+ımız+dan+mış+sınız+casına
– Behaving as if you are among those whom we could not cause to become civilized
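The affix types above differ only in where the affix lands relative to the stem. A minimal sketch on romanized strings; the infix rule (insert after the stem's initial consonants) and the German stem sag- are simplifying assumptions chosen to reproduce the slide's examples, not full analyses of Tagalog or German.

```python
VOWELS = set("aeiou")


def prefix(affix, stem):
    """Affix goes before the stem."""
    return affix + stem


def suffix(stem, affix):
    """Affix goes after the stem."""
    return stem + affix


def infix(stem, affix):
    """Affix goes inside the stem, here right before the first vowel (assumed rule)."""
    i = 0
    while i < len(stem) and stem[i] not in VOWELS:
        i += 1
    return stem[:i] + affix + stem[i:]


def circumfix(stem, before, after):
    """Two-part affix wraps around the stem, as in German ge- ... -t."""
    return before + stem + after


def agglutinate(morphemes):
    """Agglutinative words are long chains of concatenated morphemes."""
    return "".join(morphemes)
```

For example, infix("hingi", "um") gives humingi and circumfix("sag", "ge", "t") gives gesagt.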


Templatic Morphology

Roots and Patterns

Root K T B (Arabic كتب, Hebrew כתב)
– Arabic: pattern ma??uu? → مكتوب maktuub "written"
– Hebrew: pattern ??uu? → כתוב ktuuv "written"


Templatic Morphology: Root Meaning

KTB: writing "stuff"

write: Hebrew כתב, Arabic كتب
writer: Hebrew כתב, Arabic كاتب
letter: Hebrew מכתב, Arabic مكتوب
spelling: Hebrew כתיב
address: Hebrew כתובת
book: Arabic كتاب
library: Arabic مكتبة
office: Arabic مكتب
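Root-and-pattern (templatic) word formation can be sketched as filling the consonant slots of a vowel pattern with the root consonants, in order. This works on romanized Arabic forms and uses a hypothetical notation in which C marks a root-consonant slot; it ignores the script and phonological adjustments such as the Hebrew b/v alternation in ktuuv.

```python
# Hypothetical pattern notation: "C" = slot for the next root consonant.
PATTERNS = {
    "maCCuuC": "written",
    "CaaCiC": "writer",
    "CiCaaC": "book",
    "maCCaC": "office",
    "maCCaCa": "library",
}


def interdigitate(root, pattern, slot="C"):
    """Interleave the root consonants with the pattern's vowels and affixes."""
    if pattern.count(slot) != len(root):
        raise ValueError("pattern slots and root consonants must match in number")
    consonants = iter(root)
    return "".join(next(consonants) if ch == slot else ch for ch in pattern)


for pattern, gloss in PATTERNS.items():
    print(interdigitate("ktb", pattern), gloss)
```

With the root ktb this yields maktuub, kaatib, kitaab, maktab and maktaba: one root, several derived words sharing the "writing" meaning.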


Derivational vs. Inflectional

Word Classes
– Parts of speech: noun, verb, adjective, etc.
– Word class dictates how a word combines with morphemes to form new words


Derivational morphology

Nominalization: computerization, appointee, killer, fuzziness

Formation of adjectives: computational, clueless, embraceable

CatVar: Categorial Variation Database http://clipdemos.umiacs.umd.edu/catvar/


Inflectional morphology

Adds: tense, number, person, mood, aspect

Word class doesn't change

Word serves a new grammatical role

Five verb forms in English

Other languages have many more


Nouns and Verbs (in English)

Nouns have simple inflectional morphology
– cat
– cat+s, cat+'s

Verbs have more complex morphology


Regulars and Irregulars

Nouns
– Cat/Cats
– Mouse/Mice, Ox/Oxen, Goose/Geese

Verbs
– Walk/Walked
– Go/Went, Fly/Flew


Regular (English) Verbs

Morphological Form Classes    Regularly Inflected Verbs
Stem                          walk      merge     try       map
-s form                       walks     merges    tries     maps
-ing form                     walking   merging   trying    mapping
Past form or -ed participle   walked    merged    tried     mapped


Irregular (English) Verbs

Morphological Form Classes    Irregularly Inflected Verbs
Stem                          eat       catch     cut
-s form                       eats      catches   cuts
-ing form                     eating    catching  cutting
Past form                     ate       caught    cut
-ed participle                eaten     caught    cut


“To love” in Spanish


Computational Morphology

Finite State Morphology
– Finite State Transducers (FST)

Input/Output

Analysis/Generation


Computational Morphology

WORD      STEM (+FEATURES)*
cats      cat +N +PL
cat       cat +N +SG
cities    city +N +PL
geese     goose +N +PL
ducks     (duck +N +PL) or (duck +V +3SG)
merging   merge +V +PRES-PART
caught    (catch +V +PAST-PART) or (catch +V +PAST)


Building a Morphological Parser

The Rules and the Lexicon
– General versus Specific
– Regular versus Irregular
– Accuracy, speed, space
– The Morphology of a language

Approaches
– Lexicon only
– Lexicon and Rules
• Finite-state Automata
• Finite-state Transducers
– Rules only


Lexicon-only Morphology

acclaim acclaim $N$

acclaim acclaim $V+0$

acclaimed acclaim $V+ed$

acclaimed acclaim $V+en$

acclaiming acclaim $V+ing$

acclaims acclaim $N+s$

acclaims acclaim $V+s$

acclamation acclamation $N$

acclamations acclamation $N+s$

acclimate acclimate $V+0$

acclimated acclimate $V+ed$

acclimated acclimate $V+en$

acclimates acclimate $V+s$

acclimating acclimate $V+ing$

• The lexicon lists all surface level and lexical level pairs

• No rules …?

• Analysis/Generation is easy

• Very large for English

• What about Arabic or Turkish?

• Chinese?
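A lexicon-only analyzer is just a table lookup in both directions. The sketch below indexes a few of the entries listed above; the function names are illustrative, and the point is that any form not listed (such as acclaimable) simply gets no analysis.

```python
from collections import defaultdict

# A few entries from the lexicon above: (surface form, stem, features).
ENTRIES = [
    ("acclaim", "acclaim", "N"), ("acclaim", "acclaim", "V+0"),
    ("acclaimed", "acclaim", "V+ed"), ("acclaimed", "acclaim", "V+en"),
    ("acclaiming", "acclaim", "V+ing"),
    ("acclaims", "acclaim", "N+s"), ("acclaims", "acclaim", "V+s"),
]

ANALYSES = defaultdict(list)  # analysis index: surface -> all lexical readings
SURFACE = {}                  # generation index: (stem, features) -> surface
for surface, stem, feats in ENTRIES:
    ANALYSES[surface].append((stem, feats))
    SURFACE[(stem, feats)] = surface


def analyze(word):
    """Return every listed (stem, features) pair; ambiguity is just multiple rows."""
    return ANALYSES.get(word, [])


def generate(stem, feats):
    """Return the listed surface form, or None if that pair is not in the lexicon."""
    return SURFACE.get((stem, feats))
```

Analysis and generation are trivial, but every inflected form must be listed explicitly, which is what makes this approach so large for English and impractical for Arabic or Turkish.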




Lexicon and Rules: FSA Inflectional Noun Morphology

• English Noun Lexicon

reg-noun: fox, cat, dog
irreg-pl-noun: geese, sheep, mice
irreg-sg-noun: goose, sheep, mouse
plural: -s

• English Noun Rule
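In the lexicon-and-rules approach, the FSA's arcs are labelled with morpheme classes and the lexicon supplies the strings for each class. A minimal recognizer sketch; the state names and exact arcs are an assumption modelled on the noun lexicon above.

```python
# English noun lexicon, grouped by morpheme class (arc label).
LEXICON = {
    "reg-noun": {"fox", "cat", "dog"},
    "irreg-pl-noun": {"geese", "sheep", "mice"},
    "irreg-sg-noun": {"goose", "sheep", "mouse"},
    "plural": {"s"},
}
# Assumed noun FSA: arcs are labelled with morpheme classes, not letters.
ARCS = [
    ("q0", "reg-noun", "q1"),
    ("q1", "plural", "q2"),
    ("q0", "irreg-sg-noun", "q2"),
    ("q0", "irreg-pl-noun", "q2"),
]
FINAL = {"q1", "q2"}


def accepts(word, state="q0"):
    """Recognize `word` by consuming one lexicon entry per arc, backtracking on failure."""
    if word == "" and state in FINAL:
        return True
    for src, cls, dst in ARCS:
        if src != state:
            continue
        for morph in LEXICON[cls]:
            if word.startswith(morph) and accepts(word[len(morph):], dst):
                return True
    return False
```

Because the plural arc simply appends s, this machine accepts foxs and rejects foxes; fixing that is the job of the spelling rules.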


Lexicon and Rules: FSA English Verb Inflectional Morphology

reg-verb-stem: walk, fry, talk, impeach
irreg-verb-stem: cut, speak, sing
irreg-past-verb: caught, ate, eaten, sang, spoken
past: -ed
past-part: -ed
pres-part: -ing
3sg: -s
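The same kind of morpheme-class FSA can also be run as a generator, enumerating every string on a path to a final state. The arcs and state names below are an assumption modelled on the verb lexicon above: regular stems take all four suffixes, irregular stems only -ing and -s, and irregular past forms stand alone.

```python
# Verb lexicon, grouped by morpheme class.
LEXICON = {
    "reg-verb-stem": ["walk", "fry", "talk", "impeach"],
    "irreg-verb-stem": ["cut", "speak", "sing"],
    "irreg-past-verb": ["caught", "ate", "eaten", "sang", "spoken"],
    "past": ["ed"], "past-part": ["ed"], "pres-part": ["ing"], "3sg": ["s"],
}
# Assumed verb FSA over morpheme classes; the state names are illustrative.
ARCS = {
    "q0": [("reg-verb-stem", "q1"), ("irreg-verb-stem", "q2"), ("irreg-past-verb", "q3")],
    "q1": [("past", "q3"), ("past-part", "q3"), ("pres-part", "q3"), ("3sg", "q3")],
    "q2": [("pres-part", "q3"), ("3sg", "q3")],
    "q3": [],
}
FINAL = {"q1", "q2", "q3"}


def generate(state="q0", prefix=""):
    """Collect every string spelled out on a path from `state` to a final state."""
    words = {prefix} if state in FINAL else set()
    for cls, nxt in ARCS[state]:
        for morph in LEXICON[cls]:
            words |= generate(nxt, prefix + morph)
    return words


FORMS = generate()
```

The output contains walked and impeaching but also fryed and cuting: the FSA gets morphotactics right while leaving spelling changes to a separate component.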


FSA for Derivational Morphology: Adjectival Formation


More Complex Derivational Morphology


Using FSAs for Recognition: English Nouns and their Inflection


Morphological Parsing

Finite-state automata (FSA)
– Recognizer
– One-level morphology

Finite-state transducers (FST)
– Two-level morphology
• PC-Kimmo (Koskenniemi 83)
– input-output pair


Terminology for PC-Kimmo

Upper = lexical tape

Lower = surface tape

Characters correspond to pairs, written a:b
– If "a:a", write "a" for shorthand

Two-level lexical entries
– # = word boundary
– ^ = morpheme boundary

Other = "any feasible pair that is not in this transducer"

Final states indicated with ":" and non-final states indicated with "."


Four-Fold View of FSTs

As a recognizer

As a generator

As a translator

As a set relater
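A transducer's arcs carry upper:lower pairs, so the same machine supports all four views. The sketch below is a hypothetical nominal-inflection FST for just cat and dog, writing lexical strings without spaces (cat+N+PL for cat +N +PL) and using the empty string for ε.

```python
EPS = ""  # ε: the arc reads or writes nothing on that side

# Hypothetical FST; arcs are (state, upper, lower, next_state), i.e. upper:lower
# pairs with the lexical tape on top and the surface tape below.
ARCS = [
    (0, "c", "c", 1), (1, "a", "a", 2), (2, "t", "t", 3),
    (0, "d", "d", 4), (4, "o", "o", 5), (5, "g", "g", 3),
    (3, "+N", EPS, 6),
    (6, "+SG", EPS, 7),
    (6, "+PL", "s", 7),
]
FINAL = {7}


def translate(text, direction="down"):
    """Translator view: "down" maps lexical to surface, "up" maps surface to lexical."""
    results = set()

    def walk(state, pos, out):
        if pos == len(text) and state in FINAL:
            results.add(out)
        for src, upper, lower, dst in ARCS:
            if src != state:
                continue
            read, write = (upper, lower) if direction == "down" else (lower, upper)
            if text.startswith(read, pos):
                walk(dst, pos + len(read), out + write)

    walk(0, 0, "")
    return results


def recognize(upper, lower):
    """Recognizer view: accept or reject a pair of strings."""
    return lower in translate(upper, "down")


def relation(state=0, upper="", lower=""):
    """Generator / set-relater view: enumerate every (upper, lower) pair the FST relates."""
    pairs = {(upper, lower)} if state in FINAL else set()
    for src, u, l, dst in ARCS:
        if src == state:
            pairs |= relation(dst, upper + u, lower + l)
    return pairs
```

Running translate in the "up" direction is morphological analysis (cats → cat+N+PL); running it "down" is generation.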


Nominal Inflection FST


Lexical and Intermediate Tapes


Spelling Rules

Name                 Rule Description                                  Example
Consonant doubling   1-letter consonant doubled before -ing/-ed        beg/begging
E-deletion           Silent e dropped before -ing and -ed              make/making
E-insertion          e added after s, z, x, ch, sh before s            watch/watches
Y-replacement        -y changes to -ie before -s, -i before -ed        try/tries
K-insertion          Verbs ending with vowel + -c add -k               panic/panicked
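The rules in the table can be approximated as string rewrites applied when a suffix is attached. This is a simplified sketch, not the transducer formulation: it works directly on spelling, applies the rules in a fixed order chosen here, and ignores stress (so it would wrongly produce visitting).

```python
import re

VOWELS = "aeiou"


def attach(stem, suffix):
    """Join an English stem and one of the suffixes "s", "ing", "ed",
    applying simplified versions of the five spelling rules."""
    vowel_suffix = suffix in ("ing", "ed")
    # K-insertion: vowel + c gains a k (panic -> panicked)
    if vowel_suffix and re.search(r"[aeiou]c$", stem):
        return stem + "k" + suffix
    # E-deletion: silent e dropped (make -> making); "ee" stems such as see are left alone
    if vowel_suffix and stem.endswith("e") and not stem.endswith("ee"):
        return stem[:-1] + suffix
    # Consonant doubling: single vowel + single final consonant (beg -> begging)
    if vowel_suffix and re.search(r"(^|[^aeiou])[aeiou][^aeiouwxy]$", stem):
        return stem + stem[-1] + suffix
    # Y-replacement: consonant + y becomes ie before s, i before ed (try -> tries, tried)
    if len(stem) > 1 and stem.endswith("y") and stem[-2] not in VOWELS:
        if suffix == "s":
            return stem[:-1] + "ies"
        if suffix == "ed":
            return stem[:-1] + "ied"
    # E-insertion: e added after s, z, x, ch, sh before s (watch -> watches)
    if suffix == "s" and re.search(r"(s|z|x|ch|sh)$", stem):
        return stem + "es"
    return stem + suffix
```

Regular forms such as walk + ed and merge + s pass through unchanged, apart from the rule that actually applies.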


Chomsky and Halle Notation

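$$\varepsilon \rightarrow e \;/\; \left\{\begin{matrix} x \\ s \\ z \end{matrix}\right\} \hat{}\ \_\_\ s\,\#$$

Insert e after a morpheme-final x, s or z (^ = morpheme boundary) when the next morpheme is an s at the end of the word (# = word boundary), as in fox^s# → foxes.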
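The rule above can be run as the second stage of a lexical → intermediate → surface cascade. A minimal sketch using Python regular expressions in place of a compiled transducer; the lexical-to-intermediate step and its tag handling are hypothetical simplifications.

```python
import re


def lexical_to_intermediate(lexical):
    """Hypothetical first stage: "fox+N+PL" -> "fox^s#", "fox+N+SG" -> "fox#"."""
    return lexical.replace("+N", "").replace("+PL", "^s").replace("+SG", "") + "#"


def e_insertion(tape):
    """ε -> e / {x, s, z} ^ __ s #: put e right after the morpheme boundary
    that follows x, s or z, when the next morpheme is a word-final s."""
    return re.sub(r"([xsz]\^)(?=s#)", r"\1e", tape)


def intermediate_to_surface(tape):
    """Apply the rule, then erase the boundary symbols ^ and #."""
    return e_insertion(tape).replace("^", "").replace("#", "")


def generate(lexical):
    return intermediate_to_surface(lexical_to_intermediate(lexical))
```

Because the rule's context requires ^ followed by a final s, cat^s# passes through untouched while fox^s# becomes fox^es#.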


Intermediate-to-Surface Transducer


State Transition Table


Two-Level Morphology


Sample Run

KIMMO DEMO


FSTs and ambiguity

Parse Example 1: unionizable
– union +ize +able
– un +ion +ize +able

Parse Example 2: assess
– assess (V)
– ass (N) +ess (N)

Parse Example 3: tender
– tender (AJ)
– ten (Num) +d (AJ) +er (CMP)


What to do about Global Ambiguity?

Accept first successful structure

Run parser through all possible paths

Bias the search in some manner
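The three strategies can be compared on a toy segmenter that splits a string into known morphemes. This is a sketch under stated assumptions: the morpheme list is hypothetical, spelling changes are ignored (so it cannot handle unionizable, where ize loses its e), and "fewest morphemes" is just one possible bias.

```python
# Hypothetical morpheme inventory, just large enough for the assess and tender examples.
MORPHEMES = {"assess", "ass", "ess", "tender", "ten", "d", "er"}


def first_parse(word, i=0):
    """Strategy 1: depth-first, stop at the first complete segmentation."""
    if i == len(word):
        return []
    for j in range(i + 1, len(word) + 1):
        if word[i:j] in MORPHEMES:
            rest = first_parse(word, j)
            if rest is not None:
                return [word[i:j]] + rest
    return None


def all_parses(word, i=0):
    """Strategy 2: follow every path and return all segmentations."""
    if i == len(word):
        return [[]]
    parses = []
    for j in range(i + 1, len(word) + 1):
        if word[i:j] in MORPHEMES:
            for rest in all_parses(word, j):
                parses.append([word[i:j]] + rest)
    return parses


def fewest_morphemes(word):
    """Strategy 3 (one possible bias): prefer the segmentation with fewest pieces."""
    parses = all_parses(word)
    return min(parses, key=len) if parses else None
```

Here first_parse("tender") returns ten + d + er because shorter prefixes are tried first, while the biased strategy picks the single morpheme tender.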




Computational Morphology

The Rules and the Lexicon
– General versus Specific
– Regular versus Irregular
– Accuracy, speed, space
– The Morphology of a language

Approaches
– Lexicon only
– Lexicon and Rules
• Finite-state Automata
• Finite-state Transducers
– Rules only (next time!!)


Readings for next time

J&M Chapter 6