
Natural Language Processing Giuseppe Attardi Introduction to Probability IP notice: some slides from: Dan Jurafsky, Jim Martin, Sandiway Fong, Dan Klein.

Jan 20, 2016

Transcript
Page 1:

Natural Language Processing

Giuseppe Attardi

Introduction to Probability

IP notice: some slides from: Dan Jurafsky, Jim Martin, Sandiway Fong, Dan Klein

Page 2:

Outline

Probability
• Basic probability
• Conditional probability

Page 3:

1. Introduction to Probability

Experiment (trial)

Repeatable procedure with well-defined possible outcomes

Sample Space (S)
• the set of all possible outcomes
• finite or infinite

Example
• coin toss experiment
• possible outcomes: S = {heads, tails}

Example
• die toss experiment
• possible outcomes: S = {1,2,3,4,5,6}

Slides from Sandiway Fong
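As a quick illustration (an added sketch, not part of the original slides), the two sample spaces above can be written as Python sets; the variable names coin_space and die_space are invented for the example:

```python
# Finite sample spaces represented as Python sets.
coin_space = {"heads", "tails"}   # coin toss experiment
die_space = {1, 2, 3, 4, 5, 6}    # die toss experiment

# With equally likely outcomes, each sample point gets probability 1/|S|.
print(1 / len(coin_space))  # 0.5
print(1 / len(die_space))   # 0.1666...
```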

Page 4:

Introduction to Probability

Definition of sample space depends on what we are asking

Sample Space (S): the set of all possible outcomes

Example

• die toss experiment for whether the number is even or odd
• possible outcomes: {even, odd}
• not {1,2,3,4,5,6}

Page 5:

More definitions

Events
An event is any subset of outcomes from the sample space

Example: die toss experiment
Let A represent the event that the outcome of the die toss is divisible by 3
A = {3,6}
A is a subset of the sample space S = {1,2,3,4,5,6}

Example: draw a card from a deck

• suppose sample space S = {heart, spade, club, diamond} (four suits)
• let A represent the event of drawing a heart
• let B represent the event of drawing a red card
• A = {heart}
• B = {heart, diamond}

Page 6:

Introduction to Probability

Some definitions: Counting

• suppose operation o_i can be performed in n_i ways; then
• a sequence of k operations o_1 o_2 … o_k
• can be performed in n_1 · n_2 · … · n_k ways

Example
• die toss experiment, 6 possible outcomes
• two dice are thrown at the same time
• number of sample points in sample space = 6 × 6 = 36
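A short Python sanity check of the counting rule (an added sketch, not from the slides), enumerating the two-dice sample space with itertools.product:

```python
from itertools import product

die = range(1, 7)
# A sequence of k = 2 operations with n1 = n2 = 6 ways each:
sample_space = list(product(die, die))
print(len(sample_space))  # 36 = 6 * 6
```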

Page 7:

Definition of Probability

The probability law assigns to an event a nonnegative number

Called P(A)
Also called the probability of A
That encodes our knowledge or belief about the collective likelihood of all the elements of A
Probability law must satisfy certain properties

Page 8:

Probability Axioms

Nonnegativity
P(A) ≥ 0, for every event A

Additivity
If A and B are two disjoint events, then the probability of their union (either one or the other occurs) satisfies:
P(A ∪ B) = P(A) + P(B)

Monotonicity
P(A) ≤ P(B) for any A ⊆ B

Normalization
The probability of the entire sample space S is equal to 1, i.e. P(S) = 1

[Figure: Venn diagrams of A ∪ B and of disjoint events, A ∩ B = ∅]

Page 9:

An example

An experiment involving a single coin toss
There are two possible outcomes, H and T
Sample space S is {H,T}
If the coin is fair, we should assign equal probabilities to the 2 outcomes, since they have to sum to 1

P({H}) = 0.5
P({T}) = 0.5
P({H,T}) = P({H}) + P({T}) = 1.0

Page 10:

Another example

Experiment involving 3 coin tosses
Outcome is a 3-long string of H or T
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
Assume each outcome is equiprobable

“Uniform distribution”

What is the probability of the event that exactly 2 heads occur?
A = {HHT, HTH, THH}
P(A) = P({HHT}) + P({HTH}) + P({THH})
     = 1/8 + 1/8 + 1/8 = 3/8
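A small added Python sketch (not from the slides) that enumerates the 8 outcomes and confirms P(A) = 3/8 under the uniform distribution:

```python
from itertools import product

# Uniform distribution over the 8 outcomes of 3 coin tosses.
outcomes = list(product("HT", repeat=3))
event = [o for o in outcomes if o.count("H") == 2]  # exactly 2 heads
print(len(event), "/", len(outcomes))               # 3 / 8
print(len(event) / len(outcomes))                   # 0.375
```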

Page 11:

Probability definitions

In summary:

Probability of drawing a spade from 52 well-shuffled playing cards:

P(E) = (number of outcomes corresponding to event E) / (total number of outcomes)

P(spade) = 13/52 = 1/4 = 0.25

Page 12:

Probabilities of two events

If two events A and B are independent
i.e. P(B) is the same whether or not A occurred

Then P(A and B) = P(A) · P(B)

Flip a fair coin twice
What is the probability that they are both heads?

Draw a card from a deck, then put it back, draw a card from the deck again
What is the probability that both drawn cards are hearts?
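An added Python sketch of the product rule for these two questions; it assumes independence (fair coin flips, and card draws with replacement):

```python
from fractions import Fraction

# Two fair-coin flips: P(both heads) = 1/2 * 1/2
p_two_heads = Fraction(1, 2) * Fraction(1, 2)
print(p_two_heads)   # 1/4

# Two card draws *with replacement*: the draws are independent,
# so P(both hearts) = 13/52 * 13/52
p_two_hearts = Fraction(13, 52) * Fraction(13, 52)
print(p_two_hearts)  # 1/16
```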

Page 13:

How about non-uniform probabilities? An example

A biased coin, twice as likely to come up tails as heads, is tossed twice

What is the probability that at least one head occurs?

Sample space = {hh, ht, th, tt} (h = heads, t = tails)
Sample points/probability for the event:

hh 1/3 × 1/3 = 1/9
ht 1/3 × 2/3 = 2/9
th 2/3 × 1/3 = 2/9
tt 2/3 × 2/3 = 4/9

Answer: 5/9 ≈ 0.56 (the sum of the probabilities of hh, ht, th)
= 1 − 4/9 (prob. of complement)
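An added Python sketch for the biased coin, assuming P(h) = 1/3 and P(t) = 2/3 as above:

```python
from fractions import Fraction
from itertools import product

p = {"h": Fraction(1, 3), "t": Fraction(2, 3)}  # tails twice as likely

# P(at least one head) = sum over two-toss outcomes containing an 'h'
total = sum(p[a] * p[b] for a, b in product("ht", repeat=2) if "h" in (a, b))
print(total)            # 5/9
print(1 - p["t"] ** 2)  # same via the complement: 1 - 4/9 = 5/9
```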

Page 14:

Computing Probabilities

Direct counts (when outcomes are equally probable)

Sum of union of disjoint events
P(A or B) = P(A) + P(B)

Product of multiple independent events
P(A and B) = P(A) · P(B)

Indirect probability:
P(A) = 1 − P(S − A)

P(A) = #A / #S   (# = number of elements)

Page 15:

Moving toward language

What’s the probability of drawing a 2 from a deck of 52 cards with four 2s?

What’s the probability of a random word (from a random dictionary page) being a verb?

P(drawing a two) = 4/52 = 1/13 ≈ 0.077

P(drawing a verb) = (# of ways to get a verb) / (# of all words)

Page 16:

Probability and part of speech tags

What’s the probability of a random word (from a random dictionary page) being a verb?

How to compute each of these
• All words = just count all the words in the dictionary
• # of ways to get a verb: number of words which are verbs!
• If a dictionary has 50,000 entries, and 10,000 are verbs…

P(V) is 10000/50000 = 1/5 = .20

P(drawing a verb) = (# of ways to get a verb) / (# of all words)
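A minimal added sketch of this count-based estimate; the toy lexicon and its tags are invented for illustration:

```python
# Toy lexicon mapping words to part-of-speech tags (entries made up
# for illustration; a real dictionary would have ~50,000 entries).
lexicon = {"run": "V", "eat": "V", "dog": "N", "blue": "ADJ", "jump": "V"}

n_verbs = sum(1 for tag in lexicon.values() if tag == "V")
print(n_verbs / len(lexicon))  # 3/5 = 0.6 for this tiny lexicon
```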

Page 17:

Conditional Probability

A way to reason about the outcome of an experiment based on partial information

In a word guessing game the first letter of the word is a “t”. What is the likelihood that the second letter is an “h”?

How likely is it that a person has a disease given that a medical test was negative?

A spot shows up on a radar screen. How likely is it that it corresponds to an aircraft?

Page 18:

More precisely

Given an experiment, a corresponding sample space S, and a probability law

Suppose we know that the outcome is within some given event B

We want to quantify the likelihood that the outcome also belongs to some other given event A

We need a new probability law that gives us the conditional probability of A given B: P(A|B)

Page 19:

An intuition

A is “it’s raining now”
P(A) in Tuscany is .01
B is “it was raining ten minutes ago”

P(A|B) means “what is the probability of it raining now if it was raining 10 minutes ago”

P(A|B) is probably way higher than P(A)
Perhaps P(A|B) is .10

Intuition: The knowledge about B should change our estimate of the probability of A.

Page 20:

Conditional probability

One of the following 30 items is chosen at random

What is P(X), the probability that it is an X?
What is P(X|red), the probability that it is an X given that it is red?

O X X X O O

O X X O X O

O O O X O X

O O O O X O

O X X X X O

Page 21:


Conditional Probability

Let A and B be events
P(B|A) = the probability of event B occurring given event A occurred
Definition: P(B|A) = P(A ∩ B) / P(A)

[Figure: Venn diagram of overlapping events A and B within sample space S]
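An added Python sketch of the definition on a die-toss example; the events A (even) and B (divisible by 3) are chosen here for illustration:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # die toss sample space
A = {2, 4, 6}            # event: outcome is even
B = {3, 6}               # event: outcome divisible by 3

p_A = Fraction(len(A), len(S))       # P(A) = 3/6
p_AB = Fraction(len(A & B), len(S))  # P(A ∩ B) = 1/6
print(p_AB / p_A)                    # P(B|A) = P(A ∩ B) / P(A) = 1/3
```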

Page 22:

Conditional Probability

Note: P(A,B) = P(B|A) · P(A)
also: P(A,B) = P(B,A)
hence: P(B|A) · P(A) = P(A|B) · P(B)
hence: …

[Figure: Venn diagram of events A and B, with the overlap labeled A,B]

P(B|A) = P(A,B) / P(A) = P(A ∩ B) / P(A)

Page 23:

Bayes’ Theorem

P(B): prior probability
P(B|A): posterior probability

P(B|A) = P(A|B) · P(B) / P(A)
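An added Python sketch applying Bayes’ theorem to the earlier disease/test question; all the numbers (prevalence, sensitivity, false positive rate) are invented for illustration:

```python
# Bayes' theorem on the disease/test question from the earlier slide.
# All numbers below are made up for illustration.
p_disease = 0.01               # prior P(B)
p_pos_given_disease = 0.95     # P(A|B): test sensitivity
p_pos_given_healthy = 0.05     # false positive rate, P(A|not B)

# Total probability: P(A) = P(A|B)·P(B) + P(A|not B)·P(not B)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Posterior: P(B|A) = P(A|B) · P(B) / P(A)
print(p_pos_given_disease * p_disease / p_pos)  # ~0.16
```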

Page 24:

Independence

What is P(A, B) if A and B are independent?

P(A,B) = P(A) · P(B) iff A, B independent.

P(heads, tails) = P(heads) · P(tails) = .5 · .5 = .25

Note: P(A|B) = P(A) iff A, B independent
Also: P(B|A) = P(B) iff A, B independent

Page 25:

Independent Events

P(A) = P(A|B)
25/100 = 15/60

P(A ∩ B) = P(A) · P(B)
15/100 = 25/100 · 60/100

[Figure: Venn diagram of sample space S with 100 items, |A| = 25, |B| = 60, and |A ∩ B| = 15]
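An added Python check of both independence tests using the counts from the figure:

```python
from fractions import Fraction

n_S, n_A, n_B, n_AB = 100, 25, 60, 15  # counts from the Venn diagram

p_A = Fraction(n_A, n_S)           # 1/4
p_B = Fraction(n_B, n_S)           # 3/5
p_AB = Fraction(n_AB, n_S)         # 3/20
p_A_given_B = Fraction(n_AB, n_B)  # 15/60 = 1/4

print(p_A == p_A_given_B)  # True: P(A) = P(A|B)
print(p_AB == p_A * p_B)   # True: P(A ∩ B) = P(A) · P(B)
```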

Page 26:

Monty Hall Problem

The contestant is shown three doors.
Two of the doors have goats behind them and one has a car.
The contestant chooses a door.
Before opening the chosen door, Monty Hall opens a door that has a goat behind it.
The contestant can then switch to the other unopened door, or stay with the original choice.

Which is best?

Page 27:

Solution

Consider the sample space of doors: Car, A, B (A and B each hide a goat)
There are three options:

1. Contestant chooses Car. If she switches, she loses; if she stays, she wins.
2. Contestant chooses A with goat. If she switches, she wins; otherwise she loses.
3. Contestant chooses B with goat. If she switches, she wins; otherwise she loses.

Switching wins in 2 of these 3 equally likely cases, so switching gives 2/3 chances of winning
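An added Monty Hall simulation sketch in Python to corroborate the 2/3 result empirically; the function monty_hall is invented for this sketch:

```python
import random

def monty_hall(trials=100_000):
    """Estimate win rates for 'stay' vs 'switch' by simulation."""
    stay_wins = switch_wins = 0
    for _ in range(trials):
        car = random.randrange(3)
        choice = random.randrange(3)
        # Monty opens a goat door that is neither the choice nor the car.
        opened = next(d for d in range(3) if d != choice and d != car)
        switched = next(d for d in range(3) if d != choice and d != opened)
        stay_wins += (choice == car)
        switch_wins += (switched == car)
    return stay_wins / trials, switch_wins / trials

print(monty_hall())  # roughly (0.33, 0.67): switching wins ~2/3 of the time
```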

Page 28:

Summary

Probability
Conditional Probability
Independence

Page 29:

Additional Material

http://onlinestatbook.com/chapter5/probability.html