Morphology & FSTs
Shallow Processing Techniques for NLP
Ling570, October 17, 2011
Page 1:

Morphology & FSTs
Shallow Processing Techniques for NLP
Ling570, October 17, 2011

Page 2:

Roadmap
- Two-level morphology summary
- Unsupervised morphology

Page 3:

Combining FST Lexicon & Rules
Two-level morphological system: a ‘cascade’
- Transducer from Lexicon to Intermediate
- Rule transducers from Intermediate to Surface

Page 4:

Integrating the Lexicon
- Replace classes with stems

Page 5:

Using the E-insertion FST
- (fox, fox): q0, q0, q0, q1, accept
- (fox#, fox#): q0, q0, q0, q1, q0, accept
- (fox^s#, foxes#): q0, q0, q0, q1, q2, q3, q4, q0, accept
- (fox^s, foxs): q0, q0, q0, q1, q2, q5, reject
- (fox^z#, foxz#): ?
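As a side note, the rule's surface effect (not the transducer itself) can be sketched in a few lines of Python with a regular expression; the boundary symbols ^ and # follow the slides, and the function name is my own:

```python
import re

# E-insertion: insert 'e' between a stem ending in x, s, or z and the
# plural suffix 's'. '^' marks the morpheme boundary, '#' the word end.
def e_insertion(intermediate: str) -> str:
    surface = re.sub(r"([xsz])\^(s#)", r"\1e\2", intermediate)
    # Any remaining morpheme boundary is simply deleted on the surface.
    return surface.replace("^", "")

for form in ["fox^s#", "cat^s#", "fox#"]:
    print(form, "->", e_insertion(form))   # foxes#, cats#, fox#
```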

Pages 6-8:

Issues
What do you think of creating all the rules for a language by hand?
- Time-consuming, complicated
Proposed approach: unsupervised morphology induction
- Potentially useful for many applications: IR, MT

Pages 9-10:

Unsupervised Morphology
Start from tokenized text (or word frequencies):

  talk 60
  talked 120
  walked 40
  walk 30

Treat as a coding/compression problem: find the most compact representation of the lexicon.
Popular model: MDL (Minimum Description Length). Smallest total encoding:
- Weighted combination of lexicon size & ‘rules’
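To make the MDL intuition concrete, here is a toy Python sketch (my own illustration, not the lecture's code) that scores two encodings of the same four words by character count, a crude stand-in for bits:

```python
# Toy description length: characters as a crude proxy for bits.
words = ["talk", "talked", "walk", "walked"]

# Model 1: list every word whole.
cost_flat = sum(len(w) for w in words)                      # 4+6+4+6 = 20

# Model 2: stems + one affix, plus one flag per word recording
# whether the affix attaches (counted as 1 "character" each).
stems, affixes = ["talk", "walk"], ["ed"]
cost_split = sum(map(len, stems)) + sum(map(len, affixes)) + len(words)

print(cost_flat, cost_split)   # 20 vs 14: the segmented lexicon wins
```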

Pages 11-14:

Approach
Generate initial model:
- Base set of words; compute MDL length
Iterate:
- Generate a new set of words + some model to create a smaller description size
E.g., for talk, talked, walk, walked:
- 4 words
- 2 words (talk, walk) + 1 affix (-ed) + combination info
- 2 words (t, w) + 2 affixes (alk, -ed) + combination info
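A minimal version of this loop, reusing the toy cost function above (again my own sketch, much simpler than real MDL induction systems), greedily accepts a candidate suffix whenever splitting on it shrinks the description:

```python
# Greedy MDL-style induction sketch: accept a suffix split whenever it
# lowers the toy description length (characters + one flag per word).
def cost(stems, affixes, n_words):
    return sum(map(len, stems)) + sum(map(len, affixes)) + n_words

def induce(words, candidates=("ed", "ing", "s")):
    stems, affixes = set(words), set()
    for suf in candidates:
        # For simplicity each candidate is evaluated against the raw words.
        new_stems = {w[:-len(suf)] if w.endswith(suf) else w for w in words}
        if cost(new_stems, affixes | {suf}, len(words)) < cost(stems, affixes, len(words)):
            stems, affixes = new_stems, affixes | {suf}
    return stems, affixes

print(induce(["talk", "talked", "walk", "walked"]))
# ({'talk', 'walk'}, {'ed'})
```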

Page 15:

Successful Applications
- Inducing word classes (e.g., N, V) by affix patterns
- Unsupervised morphological analysis for MT
- Word segmentation in CJK
- Word text/sound segmentation in English

Page 16:

Unit #1 Summary

Pages 17-18:

Formal Languages and Grammars
- Chomsky hierarchy: languages and the grammars that accept/generate them
Equivalences:
- Regular languages
- Regular grammars
- Regular expressions
- Finite-state automata

Pages 19-21:

Finite-State Automata & Transducers
Finite-State Automata:
- Deterministic & non-deterministic automata
- Equivalence and conversion
- Probabilistic & weighted FSAs
Packages and operations: Carmel
FSTs & regular relations:
- Closures and equivalences
- Composition, inversion

Pages 22-23:

FSA/FST Applications
Range of applications:
- Parsing
- Translation
- Tokenization
- ...
Morphology:
- Lexicon: cat: N, +Sg; -s: Pl
- Morphotactics: N+PL
- Orthographic rules: fox + s -> foxes
- Parsing & generation

Page 24:

Implementation
- Tokenizers
- FSA acceptors
- FST acceptors/translators
- Orthographic rule as FST

Page 25:

Language Modeling

Page 26:

Roadmap
- Motivation: LM applications
- N-grams
- Training and testing
- Evaluation: perplexity

Pages 27-29:

Predicting Words
Given a sequence of words, the next word is (somewhat) predictable:
- I’d like to place a collect ...
N-gram models: predict the next word given the previous N-1 words
Language models (LMs): statistical models of word sequences
Approach:
- Build a model of word sequences from a corpus
- Given alternative sequences, select the most probable

Page 30:

N-gram LM Applications
Used in:
- Speech recognition
- Spelling correction
- Augmentative communication
- Part-of-speech tagging
- Machine translation
- Information retrieval

Pages 31-34:

Terminology
Corpus (pl. corpora): an online collection of text or speech
- E.g., Brown corpus: 1M words, balanced text collection
- E.g., Switchboard: 240 hrs of speech; ~3M words
Wordform: full inflected or derived form of a word: cats, glottalized
Word types: # of distinct words in corpus
Word tokens: total # of words in corpus

Page 35:

Corpus Counts
Estimate probabilities by counts in large collections of text/speech.
Should we count:
- Wordform vs. lemma?
- Case? Punctuation? Disfluency?
- Type vs. token?

Pages 36-41:

Words, Counts and Prediction
They picnicked by the pool, then lay back on the grass and looked at the stars.
- Word types (excluding punct): 14
- Word tokens (excluding punct): 16

I do uh main- mainly business data processing
- Utterance (spoken “sentence” equivalent)
What about disfluencies?
- main-: fragment
- uh: filler (aka filled pause)
- Keep, depending on app.: can help prediction; uh vs. um
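The type/token counts above are easy to verify mechanically; here is a small Python check (the regex tokenizer is my own simplification):

```python
import re

sent = ("They picnicked by the pool, then lay back on the grass "
        "and looked at the stars.")

# Crude tokenizer: keep alphabetic runs, drop punctuation.
tokens = re.findall(r"[A-Za-z]+", sent)

print(len(tokens))                        # 16 word tokens
print(len({t.lower() for t in tokens}))   # 14 word types ('the' appears 3x)
```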

Pages 42-44:

LM Task
Training:
- Given a corpus of text, learn probabilities of word sequences
Testing:
- Given trained LM and new text, determine sequence probabilities, or
- Select the most probable sequence among alternatives
LM types: basic, class-based, structured

Pages 45-48:

Word Prediction
Goal: given some history, what is the probability of some next word?
- Formally, P(w|h)
- e.g., P(call | I’d like to place a collect)
How can we compute it? Relative frequency in a corpus:
- C(I’d like to place a collect call) / C(I’d like to place a collect)
Issues?
- Zero counts: language is productive!
- Joint word sequence probability of length N: count of all sequences of length N & count of that sequence
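A direct implementation of this estimate (a sketch of the relative-frequency idea; the function name is my own) makes the zero-count problem easy to see:

```python
def p_next(tokens, history, word):
    """MLE of P(word | history) by raw sequence counts."""
    h = list(history)
    hw = h + [word]
    def count(seq):
        # Count occurrences of seq as a contiguous subsequence.
        return sum(tokens[i:i + len(seq)] == seq
                   for i in range(len(tokens) - len(seq) + 1))
    c_h = count(h)
    return count(hw) / c_h if c_h else 0.0

corpus = "i'd like to place a collect call please".split()
print(p_next(corpus, ["a", "collect"], "call"))   # 1.0
print(p_next(corpus, ["a", "collect"], "case"))   # 0.0  <- zero count
```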

Pages 49-53:

Word Sequence Probability
Notation: P(Xi = the) written as P(the)
Compute the probability of a word sequence by the chain rule (links to word prediction by history):
- P(w1 w2 ... wn) = P(w1) * P(w2|w1) * ... * P(wn|w1...wn-1)
Issues?
- Potentially infinite history
- Language is infinitely productive

Pages 54-57:

Markov Assumptions
Exact computation requires too much data.
Approximate probability given all prior words: assume finite history.
- Unigram: probability of word in isolation (0th order)
- Bigram: probability of word given 1 previous (first-order Markov)
- Trigram: probability of word given 2 previous
N-gram approximation:
- P(wn | w1...wn-1) ~ P(wn | wn-N+1...wn-1)
Bigram sequence:
- P(w1...wn) ~ prod over k=1..n of P(wk | wk-1)

Pages 58-62:

Unigram Models
P(w1 w2 ... wn) ~ P(w1)*P(w2)*...*P(wn)
Training: estimate P(w) given corpus
- Relative frequency: P(w) = C(w)/N, N = # tokens in corpus
- How many parameters?
Testing: for sentence s, compute P(s)
Model with PFA: input symbols? probabilities on arcs? states?
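As a quick sketch (my own illustration in Python, not course code), unigram training and scoring are a few lines:

```python
from collections import Counter
import math

def train_unigram(tokens):
    """Relative-frequency estimate: P(w) = C(w)/N."""
    counts, n = Counter(tokens), len(tokens)
    return {w: c / n for w, c in counts.items()}

def sentence_logprob(model, sent):
    """log P(s) under word independence; -inf if any word is unseen (OOV)."""
    lp = 0.0
    for w in sent:
        p = model.get(w, 0.0)
        if p == 0.0:
            return float("-inf")
        lp += math.log(p)
    return lp

model = train_unigram("the cat sat on the mat".split())
print(model["the"])                              # 2/6 = 0.333...
print(sentence_logprob(model, ["the", "cat"]))   # log(1/3) + log(1/6)
```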

Pages 63-66:

Bigram Models
P(w1 w2 ... wn) = P(BOS w1 w2 ... wn EOS)
~ P(BOS)*P(w1|BOS)*P(w2|w1)*...*P(wn|wn-1)*P(EOS|wn)
Training: relative frequency: P(wi|wi-1) = C(wi-1 wi)/C(wi-1)
- How many parameters?
Testing: for sentence s, compute P(s)
Model with PFA: input symbols? probabilities on arcs? states?
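A matching sketch for bigrams (again my own illustration), with BOS/EOS padding as on the slide:

```python
from collections import Counter

def train_bigram(sentences):
    """P(wi|wi-1) = C(wi-1 wi) / C(wi-1), with BOS/EOS padding."""
    uni, bi = Counter(), Counter()
    for sent in sentences:
        padded = ["<s>"] + sent + ["</s>"]
        uni.update(padded[:-1])             # history counts
        bi.update(zip(padded, padded[1:]))  # adjacent-pair counts
    return {pair: c / uni[pair[0]] for pair, c in bi.items()}

def sentence_prob(model, sent):
    padded = ["<s>"] + sent + ["</s>"]
    p = 1.0
    for pair in zip(padded, padded[1:]):
        p *= model.get(pair, 0.0)           # 0 for unseen bigrams
    return p

model = train_bigram([["i", "am", "sam"], ["sam", "i", "am"]])
print(model[("<s>", "i")])                  # 0.5
print(sentence_prob(model, ["i", "am"]))    # 0.5 * 1.0 * 0.5 = 0.25
```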

Pages 67-70:

Trigram Models
P(w1 w2 ... wn) = P(BOS w1 w2 ... wn EOS)
~ P(BOS)*P(w1|BOS)*P(w2|BOS,w1)*...*P(wn|wn-2,wn-1)*P(EOS|wn-1,wn)
Training: P(wi|wi-2,wi-1) = C(wi-2 wi-1 wi)/C(wi-2 wi-1)
- How many parameters?
- How many states?

Page 71:

An Example (from Speech and Language Processing, Jurafsky and Martin)
<s> I am Sam </s>
<s> Sam I am </s>
<s> I do not like green eggs and ham </s>
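Working the relative-frequency estimates through this mini-corpus gives, for example (these follow directly from the counts above):

P(I|<s>) = 2/3    P(Sam|<s>) = 1/3    P(am|I) = 2/3
P(</s>|Sam) = 1/2    P(Sam|am) = 1/2    P(do|I) = 1/3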

Pages 72-75:

Recap
N-grams:
- # FSA states: |V|^(n-1)
- # Model parameters: |V|^n
Issues:
- Data sparseness, out-of-vocabulary (OOV) elements
- Smoothing
- Mismatches between training & test data
- Other language models
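To see why sparseness bites, plug in a plausible vocabulary size (20,000 is an illustrative figure, not from the slides): a trigram model has |V|^3 = 20,000^3 = 8 x 10^12 parameters and |V|^2 = 4 x 10^8 states, far more parameters than any corpus has tokens, so most n-grams are never observed and smoothing is essential.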