Language Models & Smoothing: Shallow Processing Techniques for NLP (Ling570), October 19, 2011
Page 1: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Language Models & Smoothing

Shallow Processing Techniques for NLPLing570

October 19, 2011

Page 2: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Announcements

Career exploration talk: Bill McNeill
Thursday (10/20): 2:30-3:30pm, Thomson 135 & online (Treehouse URL)

Treehouse meeting: Friday 10/21, 11-12: Thesis topic brainstorming

GP Meeting: Friday 10/21, 3:30-5pm, PCAR 291 & online (…/clmagrad)

Page 3: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Roadmap

Ngram language models

Constructing language models

Generative language models

Evaluation: Training and testing; Perplexity

Smoothing: Laplace smoothing; Good-Turing smoothing; Interpolation & backoff

Page 4: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Ngram Language Models

Independence assumptions moderate data needs.

Approximate probability given all prior words; assume finite history.
Unigram: probability of word in isolation
Bigram: probability of word given 1 previous
Trigram: probability of word given 2 previous

N-gram approximation

P(w_n | w_1^{n-1}) ≈ P(w_n | w_{n-N+1}^{n-1})

Bigram sequence:

P(w_1^n) ≈ ∏_{k=1}^{n} P(w_k | w_{k-1})
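For example, under the bigram approximation one of the Berkeley Restaurant Project sentences on the next slide decomposes as:

P(<s> tell me about chez panisse </s>) ≈ P(tell | <s>) * P(me | tell) * P(about | me) * P(chez | about) * P(panisse | chez) * P(</s> | panisse)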

Page 5: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Berkeley Restaurant Project Sentences

can you tell me about any good cantonese restaurants close by

mid priced thai food is what i’m looking for

tell me about chez panisse

can you give me a listing of the kinds of food that are available

i’m looking for a good place to eat breakfast

when is caffe venezia open during the day

Page 6: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Bigram Counts

Out of 9222 sentences.
E.g., “I want” occurred 827 times.

Page 7: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Bigram Probabilities

Divide bigram counts by prefix unigram counts to get probabilities.
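A minimal Python sketch of this estimation (illustrative only; the helper name and toy corpus are hypothetical):

from collections import Counter

def mle_bigram_probs(sentences):
    # sentences: token lists already wrapped with <s> ... </s>
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        unigrams.update(sent)
        bigrams.update(zip(sent, sent[1:]))
    # P(w2 | w1) = c(w1, w2) / c(w1): bigram count divided by prefix unigram count
    return {(w1, w2): c / unigrams[w1] for (w1, w2), c in bigrams.items()}

probs = mle_bigram_probs([["<s>", "i", "want", "english", "food", "</s>"]])
print(probs[("i", "want")])   # 1.0 in this one-sentence toy corpus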

Page 8: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Bigram Estimates of Sentence Probabilities

P(<s> I want english food </s>) =

P(i|<s>)*

P(want|I)*

P(english|want)*

P(food|english)*

P(</s>|food)

= .000031
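Plugging in bigram estimates: P(i|<s>) = .25 and P(english|want) = .0011 appear on the next slide; the remaining values (P(want|I) = .33, P(food|english) = .5, P(</s>|food) = .68) are the textbook's Berkeley Restaurant Project estimates, quoted here only to make the arithmetic concrete:

.25 * .33 * .0011 * .5 * .68 ≈ .000031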

Page 9: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Kinds of Knowledge

P(english|want) = .0011

P(chinese|want) = .0065

P(to|want) = .66

P(eat | to) = .28

P(food | to) = 0

P(want | spend) = 0

P (i | <s>) = .25

What types of knowledge are captured by ngram models?

Page 10: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Kinds of Knowledge

P(english|want) = .0011

P(chinese|want) = .0065

P(to|want) = .66

P(eat | to) = .28

P(food | to) = 0

P(want | spend) = 0

P (i | <s>) = .25

World knowledge

What types of knowledge are captured by ngram models?

Page 11: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Kinds of Knowledge

P(english|want) = .0011

P(chinese|want) = .0065

P(to|want) = .66

P(eat | to) = .28

P(food | to) = 0

P(want | spend) = 0

P (i | <s>) = .25

World knowledge

Syntax

What types of knowledge are captured by ngram models?

Page 12: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Kinds of Knowledge

P(english|want) = .0011

P(chinese|want) = .0065

P(to|want) = .66

P(eat | to) = .28

P(food | to) = 0

P(want | spend) = 0

P (i | <s>) = .25

World knowledge

Syntax

Discourse

What types of knowledge are captured by ngram models?

Page 13: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Probabilistic Language Generation

Coin-flipping models: A sentence is generated by a randomized algorithm.
The generator can be in one of several “states”.
Flip coins to choose the next state.
Flip other coins to decide which letter or word to output.

Page 14: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Generated Language: Effects of N

1. Zero-order approximation: XFOML RXKXRJFFUJ ZLPWCFWKCYJ

FFJEYVKCQSGHYD QPAAMKBZAACIBZLHJQD

Page 15: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Generated Language: Effects of N

1. Zero-order approximation: XFOML RXKXRJFFUJ ZLPWCFWKCYJ FFJEYVKCQSGHYD QPAAMKBZAACIBZLHJQD

2. First-order approximation: OCRO HLI RGWR NWIELWIS EU LL NBNESEBYA

TH EEI ALHENHTTPA OOBTTVA NAH RBL

Page 16: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Generated Language: Effects of N

1. Zero-order approximation: XFOML RXKXRJFFUJ ZLPWCFWKCYJ FFJEYVKCQSGHYD QPAAMKBZAACIBZLHJQD

2. First-order approximation: OCRO HLI RGWR NWIELWIS EU LL NBNESEBYA TH EEI ALHENHTTPA OOBTTVA NAH RBL

3. Second-order approximation: ON IE ANTSOUTINYS ARE T INCTORE ST BE S DEAMY ACHIND ILONASIVE TUCOOWE AT TEASONARE FUSO TIZIN ANDY TOBE SEACE CTISBE

Page 17: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Word Models: Effects of N

1. First-order approximation:

REPRESENTING AND SPEEDILY IS AN GOOD APT OR COME CAN DIFFERENT NATURAL HERE HE THE A IN CAME THE TO OF TO EXPERT GRAY COME TO FURNISHES THE LINE MESSAGE HAD BE THESE

Page 18: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Word Models: Effects of N

1. First-order approximation:

REPRESENTING AND SPEEDILY IS AN GOOD APT OR COME CAN DIFFERENT NATURAL HERE HE THE A IN CAME THE TO OF TO EXPERT GRAY COME TO FURNISHES THE LINE MESSAGE HAD BE THESE

2. Second-order approximation: THE HEAD AND IN FRONTAL ATTACK ON AN

ENGLISH WRITER THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER METHOD FOR THE LETTERS THAT THE TIME OF WHO EVER TOLD THE PROBLEM FOR AN UNEXPECTED

Page 19: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Shakespeare

Page 20: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

The Wall Street Journal is Not Shakespeare

Page 21: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Evaluation

Page 22: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Evaluation - General

Evaluation crucial for NLP systems

Required for most publishable results

Should be integrated early

Many factors:

Page 23: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Evaluation - General

Evaluation crucial for NLP systems

Required for most publishable results

Should be integrated early

Many factors: Data; Metrics; Prior results; …

Page 24: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Evaluation Guidelines

Evaluate your system

Use standard metrics

Use (standard) training/dev/test sets

Describing experiments: (Intrinsic vs Extrinsic)

Page 25: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Evaluation Guidelines

Evaluate your system

Use standard metrics

Use (standard) training/dev/test sets

Describing experiments: (Intrinsic vs Extrinsic)
Clearly lay out experimental setting

Page 26: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Evaluation Guidelines

Evaluate your system

Use standard metrics

Use (standard) training/dev/test sets

Describing experiments: (Intrinsic vs Extrinsic)
Clearly lay out experimental setting
Compare to baseline and previous results
Perform error analysis

Page 27: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Evaluation Guidelines

Evaluate your system

Use standard metrics

Use (standard) training/dev/test sets

Describing experiments: (Intrinsic vs Extrinsic)
Clearly lay out experimental setting
Compare to baseline and previous results
Perform error analysis
Show utility in real application (ideally)

Page 28: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Data Organization

Training:

Training data: used to learn model parameters

Page 29: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Data Organization

Training:
Training data: used to learn model parameters
Held-out data: used to tune additional parameters

Page 30: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Data Organization

Training:
Training data: used to learn model parameters
Held-out data: used to tune additional parameters

Development (Dev) set: Used to evaluate system during development

Avoid overfitting

Page 31: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Data Organization

Training:
Training data: used to learn model parameters
Held-out data: used to tune additional parameters

Development (Dev) set: Used to evaluate system during development

Avoid overfitting

Test data: Used for final, blind evaluation

Page 32: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Data Organization

Training:
Training data: used to learn model parameters
Held-out data: used to tune additional parameters

Development (Dev) set: Used to evaluate system during development

Avoid overfitting

Test data: Used for final, blind evaluation

Typical division of data: 80/10/10
Tradeoffs
Cross-validation

Page 33: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Evaluating LMs

Extrinsic evaluation (aka in vivo)
Embed alternate models in system
See which improves overall application

MT, IR, …

Page 34: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Evaluating LMs

Extrinsic evaluation (aka in vivo)
Embed alternate models in system
See which improves overall application

MT, IR, …

Intrinsic evaluation: Metric applied directly to model
Independent of larger application
Perplexity

Page 35: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Evaluating LMs

Extrinsic evaluation (aka in vivo)
Embed alternate models in system
See which improves overall application

MT, IR, …

Intrinsic evaluation: Metric applied directly to model
Independent of larger application
Perplexity

Why not just extrinsic?

Page 36: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Perplexity

Page 37: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Perplexity Intuition:

A better model will have tighter fit to test data
Will yield higher probability on test data

Page 38: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Perplexity Intuition:

A better model will have tighter fit to test data
Will yield higher probability on test data

Formally,

Page 39: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Perplexity Intuition:

A better model will have tighter fit to test data
Will yield higher probability on test data

Formally,

Page 40: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Perplexity Intuition:

A better model will have tighter fit to test data
Will yield higher probability on test data

Formally,

Page 41: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Perplexity Intuition:

A better model will have tighter fit to test data
Will yield higher probability on test data

Formally,

For bigrams:

Page 42: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Perplexity Intuition:

A better model will have tighter fit to test data
Will yield higher probability on test data

Formally,

For bigrams:

Inversely related to probability of sequence
Higher probability => lower perplexity

Page 43: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Perplexity Intuition:

A better model will have tighter fit to test data
Will yield higher probability on test data

Formally, PP(W) = P(w_1 w_2 … w_N)^{-1/N}

For bigrams: PP(W) = [ ∏_{i=1}^{N} P(w_i | w_{i-1}) ]^{-1/N}

Inversely related to probability of sequence
Higher probability => lower perplexity

Can be viewed as average branching factor of model
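A minimal sketch of the bigram-perplexity computation in Python (names are hypothetical; it assumes a smoothed model so that every needed bigram probability is available and nonzero):

import math

def bigram_perplexity(tokens, bigram_prob):
    # tokens: the test sequence, already wrapped with <s> ... </s>
    # bigram_prob: dict mapping (w_prev, w) -> P(w | w_prev)
    log_prob, n = 0.0, 0
    for prev, w in zip(tokens, tokens[1:]):
        log_prob += math.log2(bigram_prob[(prev, w)])
        n += 1
    # PP(W) = 2^(-(1/N) log2 P(W)), equivalent to P(W)^(-1/N)
    return 2 ** (-log_prob / n)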

Page 44: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Perplexity Example

Alphabet: 0,1,…,9

Equiprobable

Page 45: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Perplexity Example

Alphabet: 0,1,…,9;

Equiprobable: P(X)=1/10

Page 46: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Perplexity Example

Alphabet: 0,1,…,9;

Equiprobable: P(X)=1/10

PP(W)=

Page 47: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Perplexity Example

Alphabet: 0,1,…,9;

Equiprobable: P(X)=1/10

PP(W)=

If probability of 0 is higher, PP(W) will be

Page 48: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Perplexity Example

Alphabet: 0,1,…,9; Equiprobable: P(X) = 1/10

PP(W) = ((1/10)^N)^{-1/N} = 10

If probability of 0 is higher, PP(W) will be lower
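A quick numeric check of this value (the string length N is arbitrary):

import math
N = 1000                            # length of some test string of digits
log_prob = N * math.log2(1 / 10)    # every digit has probability 1/10
print(2 ** (-log_prob / N))         # 10.0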

Page 49: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Thinking about Perplexity

Given some vocabulary V with a uniform distribution, i.e. P(w) = 1/|V|

Page 50: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Thinking about Perplexity

Given some vocabulary V with a uniform distribution, i.e. P(w) = 1/|V|

Under a unigram LM, the perplexity is

PP(W) =

Page 51: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Thinking about Perplexity

Given some vocabulary V with a uniform distribution, i.e. P(w) = 1/|V|

Under a unigram LM, the perplexity is

PP(W) =

Page 52: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Thinking about Perplexity

Given some vocabulary V with a uniform distribution, i.e. P(w) = 1/|V|

Under a unigram LM, the perplexity is

PP(W) = ((1/|V|)^N)^{-1/N} = |V|

Perplexity is the effective branching factor of the language.

Page 53: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Perplexity and Entropy

Given that

Consider the perplexity equation:

PP(W) = P(W)^{-1/N} =

Page 54: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Perplexity and Entropy

Given that

Consider the perplexity equation:

PP(W) = P(W)^{-1/N} =

Page 55: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Perplexity and Entropy

Given that

Consider the perplexity equation:

PP(W) = P(W)^{-1/N} =

Page 56: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Perplexity and Entropy

Given that H(L,P) = -(1/N) log_2 P(W),

Consider the perplexity equation:

PP(W) = P(W)^{-1/N} = 2^{-(1/N) log_2 P(W)} = 2^{H(L,P)}

Where H is the entropy of the language L

Page 57: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Entropy

Information theoretic measure

Measures information in grammar

Conceptually, lower bound on # bits to encode

Page 58: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Entropy

Information theoretic measure

Measures information in grammar

Conceptually, lower bound on # bits to encode

Entropy H(X), where X is a random variable and p is its probability function:

H(X) = -∑_{x ∈ X} p(x) log_2 p(x)

Page 59: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Entropy

Information theoretic measure

Measures information in grammar

Conceptually, lower bound on # bits to encode

Entropy H(X), where X is a random variable and p is its probability function:

E.g., 8 things: numbering them as a code => 3 bits/transmission.
Alternative: short code if high probability; longer if lower.

Can reduce the average number of bits.

H(X) = -∑_{x ∈ X} p(x) log_2 p(x)

Page 60: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Computing Entropy

Picking horses (Cover and Thomas)

Send message: identify horse - 1 of 8
If all horses equally likely, p(i)

Page 61: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Computing Entropy

Picking horses (Cover and Thomas)

Send message: identify horse - 1 of 8
If all horses equally likely, p(i) = 1/8

Page 62: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Computing Entropy

Picking horses (Cover and Thomas)

Send message: identify horse - 1 of 8
If all horses equally likely, p(i) = 1/8

Some horses more likely: 1: ½; 2: ¼; 3: 1/8; 4: 1/16; 5,6,7,8: 1/64

H(X) = -∑_{i=1}^{8} (1/8) log_2 (1/8) = 3 bits (equiprobable case)

Page 63: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Computing Entropy

Picking horses (Cover and Thomas)

Send message: identify horse - 1 of 8
If all horses equally likely, p(i) = 1/8

Some horses more likely: 1: ½; 2: ¼; 3: 1/8; 4: 1/16; 5,6,7,8: 1/64

H(X) = -∑_{i=1}^{8} p(i) log_2 p(i) = 2 bits (skewed distribution)

H(X) = -∑_{i=1}^{8} (1/8) log_2 (1/8) = 3 bits (equiprobable case)
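A small Python sketch that checks both values (the two distributions are exactly those given above):

import math

def entropy(probs):
    # H(X) = -sum_x p(x) * log2 p(x)
    return -sum(p * math.log2(p) for p in probs if p > 0)

uniform = [1/8] * 8
skewed = [1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64]
print(entropy(uniform))   # 3.0 bits
print(entropy(skewed))    # 2.0 bits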

Page 64: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Entropy of a Sequence

Basic sequence: per-word entropy rate

(1/n) H(W_1^n) = -(1/n) ∑_{W_1^n ∈ L} p(W_1^n) log p(W_1^n)

Entropy of language: infinite lengths; assume stationary & ergodic

H(L) = lim_{n→∞} -(1/n) ∑_{W ∈ L} p(w_1, …, w_n) log p(w_1, …, w_n)

H(L) = lim_{n→∞} -(1/n) log p(w_1, …, w_n)

Page 65: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Computing P(s): s is a sentence

Let s = w1 w2 … wn

Assume a bigram model

P(s) = P(w1 w2 … wn) = P(BOS w1 w2 … wn EOS)

Page 66: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Computing P(s): s is a sentence

Let s = w1 w2 … wn

Assume a bigram model

P(s) = P(w1 w2 … wn) = P(BOS w1 w2 … wn EOS)

≈ P(BOS) * P(w1|BOS) * P(w2|w1) * … * P(wn|wn-1) * P(EOS|wn)

Out-of-vocabulary words (OOV): If n-gram contains OOV word,

Page 67: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Computing P(s): s is a sentence

Let s = w1 w2 … wn

Assume a bigram model

P(s) = P(w1 w2 … wn) = P(BOS w1 w2 … wn EOS)

≈ P(BOS) * P(w1|BOS) * P(w2|w1) * … * P(wn|wn-1) * P(EOS|wn)

Out-of-vocabulary words (OOV): If n-gram contains OOV word,

Remove n-gram from computation
Increment oov_count

N

Page 68: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Computing P(s): s is a sentence

Let s = w1 w2 … wn

Assume a trigram model

P(s) = P(w1 w2 … wn) = P(BOS w1 w2 … wn EOS)

≈ P(w1|BOS) * P(w2|BOS w1) * … * P(wn|wn-2 wn-1) * P(EOS|wn-1 wn)

Out-of-vocabulary words (OOV): If n-gram contains OOV word,

Remove n-gram from computation
Increment oov_count

N = sent_leng + 1 – oov_count
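A minimal Python sketch of this bookkeeping for the bigram case (function and variable names are hypothetical; it assumes the smoothed model covers every in-vocabulary bigram):

import math

def sentence_logprob(words, bigram_prob, vocab):
    # words: the tokens of s, without boundary symbols
    tokens = ["<s>"] + words + ["</s>"]
    oov = {w for w in words if w not in vocab}   # OOV word tokens
    logprob = 0.0
    for prev, w in zip(tokens, tokens[1:]):
        if prev in oov or w in oov:
            continue                  # n-gram contains an OOV word: drop it
        logprob += math.log2(bigram_prob[(prev, w)])
    n = len(words) + 1 - len(oov)     # N = sent_leng + 1 - oov_count
    return logprob, n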

Page 69: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Computing Perplexity

PP(W) =

Where W is a set of m sentences: s1,s2,…,sm

log P(W)

Page 70: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Computing Perplexity

PP(W) =

Where W is a set of m sentences: s1,s2,…,sm

log P(W) =

Page 71: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Computing Perplexity

PP(W) =

Where W is a set of m sentences: s1,s2,…,sm

log P(W) =

N

Page 72: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Computing Perplexity

PP(W) = 2^{-(1/N) log_2 P(W)}

where W is a set of m sentences: s1, s2, …, sm

log P(W) = ∑_{i=1}^{m} log P(s_i)

N = word_count + sent_count – oov_count

Page 73: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Perplexity Model Comparison

Compare models with different history

Page 74: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Homework #4

Page 75: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Building Language Models

Step 1: Count ngrams

Step 2: Build model – compute probabilities: MLE; smoothed (Laplace, GT)

Step 3: Compute perplexity

Steps 2 & 3 depend on model/smoothing choices

Page 76: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Q1: Counting N-grams

Collect real counts from the training data:

ngram_count.* training_data ngram_count_file

Output ngrams and real counts c(w1), c(w1, w2), and c(w1, w2, w3).

Given a sentence: John called Mary
Insert BOS and EOS: <s> John called Mary </s>
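A minimal Python sketch of the counting step (a sketch only, not the assignment's reference solution; reading and writing the files is omitted):

from collections import Counter

def count_ngrams(sentences):
    # sentences: iterable of token lists, e.g. [["John", "called", "Mary"]]
    counts = {1: Counter(), 2: Counter(), 3: Counter()}
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        for n in (1, 2, 3):
            for i in range(len(tokens) - n + 1):
                counts[n][tuple(tokens[i:i + n])] += 1
    return counts

counts = count_ngrams([["John", "called", "Mary"]])
# e.g. counts[2][("John", "called")] == 1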

Page 77: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Q1: Output

count  key

875 a
…
200 the book
…
20 thank you very

In “chunks” – unigrams, then bigrams, then trigrams

Sort in decreasing order of count within chunk

Page 78: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Q2: Create Language Model

build_lm.* ngram_count_file lm_file

Store the logprob of ngrams and other parameters in the lm file.

There are actually three language models: P(w3), P(w3|w2), and P(w3|w1,w2).

The output file is in a modified ARPA format (see next slide).
Lines for n-grams are sorted by n-gram counts.

Page 79: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Modified ARPA Format

\data\

ngram 1: type = xx; token = yy

ngram 2: type = xx; token = yy

ngram 3: type = xx; token = yy

\1-grams:

count prob logprob w1

\2-grams:

count prob logprob w1 w2

\3-grams:

count prob logprob w1 w2 w3

# xx is the type count; yy is the token count
# In the \1-grams section, prob is P(w1)
# In the \2-grams section, prob is P(w2|w1) and count is C(w1 w2)
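One way the layout above might be emitted from the counts (a sketch under assumptions: plain MLE probabilities rather than the smoothed models the assignment asks for, and base-10 logprobs):

import math

def write_modified_arpa(counts, out_path):
    # counts: dict n -> Counter of n-gram tuples, as built by a counting step like count_ngrams
    totals = {n: sum(c.values()) for n, c in counts.items()}
    with open(out_path, "w") as out:
        out.write("\\data\\\n")
        for n in (1, 2, 3):
            out.write(f"ngram {n}: type = {len(counts[n])}; token = {totals[n]}\n")
        for n in (1, 2, 3):
            out.write(f"\n\\{n}-grams:\n")
            # lines sorted by n-gram count, as the previous slide specifies
            for ngram, c in counts[n].most_common():
                if n == 1:
                    prob = c / totals[1]                  # MLE P(w1)
                else:
                    prob = c / counts[n - 1][ngram[:-1]]  # MLE P(w_n | history)
                out.write(f"{c} {prob:.6f} {math.log10(prob):.6f} {' '.join(ngram)}\n")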

Page 80: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Q3: Calculating Perplexity

pp.* lm_file n test_file outfile

Compute perplexity for n-gram history given model

sum = 0; count = 0
for each sentence s in test_file:
    for each word wi in s:
        if the n-gram ending at wi (history of length n-1) exists:
            compute P(wi | wi-n+1 … wi-1)
            sum += log_2 P(wi | wi-n+1 … wi-1)
            count += 1

total = -sum / count
pp(test_file) = 2^total
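The same loop in Python (a sketch; the lookup function is hypothetical and is assumed to return a base-2 log probability, or None when the required n-gram was never seen):

def corpus_perplexity(test_sentences, logprob2, n=3):
    # logprob2(history, word) -> log_2 P(word | history), or None if the n-gram is unseen
    total, count = 0.0, 0
    for sent in test_sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        for i in range(1, len(tokens)):
            history = tuple(tokens[max(0, i - n + 1):i])
            lp = logprob2(history, tokens[i])
            if lp is None:            # unseen n-gram or OOV word: skip it
                continue
            total += lp
            count += 1
    return 2 ** (-total / count)      # pp = 2^(-sum/count)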

Page 81: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Output format

Sent #1: <s> Influential members of the House … </s>

1: log P(Influential | <s>) = -inf (unknown word)

2: log P(members | <s> Influential) = -inf (unseen ngrams)

4: log P(the | members of) = -0.673243382588536

1 sentence, 38 words, 9 OOVs

logprob=-82.8860891791949 ppl=721.341645452964

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

sent_num=50 word_num=1175 oov_num=190

logprob=-2854.78157013778 ave_logprob=-2.75824306293506 pp=573.116699237283

Page 82: Language Models & Smoothing Shallow Processing Techniques for NLP Ling570 October 19, 2011.

Q4: Compute Perplexity

Compute perplexity for different n.