
Probabilistic inference

Page 1: Probabilistic inference

Probabilistic inference
• Suppose the agent has to make a decision about the value of an unobserved query variable X given some observed evidence E = e
  – Partially observable, stochastic, episodic environment
  – Examples: X = {spam, not spam}, e = email message; X = {zebra, giraffe, hippo}, e = image features
• Bayes decision theory:
  – The agent has a loss function, which is 0 if the value of X is guessed correctly and 1 otherwise
  – The estimate of X that minimizes expected loss is the one that has the greatest posterior probability P(X = x | e)
  – This is the Maximum a Posteriori (MAP) decision
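Why the 0-1 loss leads to the MAP decision: the expected loss of guessing X = x is Σ_{x′ ≠ x} P(x′ | e) = 1 − P(x | e), so minimizing the expected loss is the same as maximizing the posterior P(x | e).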

Page 2: Probabilistic inference

MAP decision
• Value of x that has the highest posterior probability given the evidence e:

  x* = argmax_x P(x | e) = argmax_x P(e | x) P(x) / P(e) = argmax_x P(e | x) P(x)

Page 3: Probabilistic inference

MAP decision
• Value of x that has the highest posterior probability given the evidence e:

  x* = argmax_x P(x | e) = argmax_x P(e | x) P(x) / P(e) = argmax_x P(e | x) P(x)

  posterior P(x | e) ∝ likelihood P(e | x) × prior P(x)

Page 4: Probabilistic inference

MAP decision
• Value of x that has the highest posterior probability given the evidence e:

  x* = argmax_x P(x | e) = argmax_x P(e | x) P(x) / P(e) = argmax_x P(e | x) P(x)

  posterior P(x | e) ∝ likelihood P(e | x) × prior P(x)

• Maximum likelihood (ML) decision:

  x* = argmax_x P(e | x)
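A minimal sketch of the two decision rules on a toy two-class problem; the class names and numbers below are purely illustrative, not from the slides:

```python
# MAP vs. ML decision on a toy spam example (numbers are made up for illustration).
prior = {"spam": 0.33, "not spam": 0.67}            # P(x)
likelihood = {"spam": 0.002, "not spam": 0.0001}    # P(e | x) for one observed message e

# ML decision: ignore the prior, pick the class with the highest likelihood.
ml_decision = max(likelihood, key=likelihood.get)

# MAP decision: pick the class with the highest P(e | x) * P(x);
# P(e) is the same for every class, so it can be dropped from the argmax.
map_decision = max(prior, key=lambda x: likelihood[x] * prior[x])

print(ml_decision, map_decision)
```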

Page 5: Probabilistic inference

Example: Naïve Bayes model
• Suppose we have many different types of observations (symptoms, features) F1, …, Fn that we want to use to obtain evidence about an underlying hypothesis H
• MAP decision involves estimating

  P(H | F1, …, Fn) ∝ P(F1, …, Fn | H) P(H)

  – If each feature can take on k values, how many entries are in the joint probability table?

Page 6: Probabilistic inference

Example: Naïve Bayes model
• Suppose we have many different types of observations (symptoms, features) F1, …, Fn that we want to use to obtain evidence about an underlying hypothesis H
• MAP decision involves estimating

  P(H | F1, …, Fn) ∝ P(F1, …, Fn | H) P(H)

• We can make the simplifying assumption that the different features are conditionally independent given the hypothesis:

  P(F1, …, Fn | H) = ∏_{i=1}^n P(Fi | H)

  – If each feature can take on k values, what is the complexity of storing the resulting distributions?
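To make the storage question concrete, a quick count under assumed values k = 5 and n = 20 (chosen only for illustration):

```python
# Storage needed for the class-conditional distributions, per hypothesis value.
k = 5    # values per feature (illustrative)
n = 20   # number of features (illustrative)

full_joint = k ** n     # P(F1, ..., Fn | H): one entry per joint assignment of all features
naive_bayes = n * k     # P(Fi | H) stored separately for each feature

print(full_joint)    # 95367431640625 entries
print(naive_bayes)   # 100 entries
```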

Page 7: Probabilistic inference

Naïve Bayes Spam Filter
• MAP decision: to minimize the probability of error, we should classify a message as spam if P(spam | message) > P(¬spam | message)

Page 8: Probabilistic inference

Naïve Bayes Spam Filter
• MAP decision: to minimize the probability of error, we should classify a message as spam if P(spam | message) > P(¬spam | message)
• We have P(spam | message) ∝ P(message | spam) P(spam) and P(¬spam | message) ∝ P(message | ¬spam) P(¬spam)

Page 9: Probabilistic inference

Naïve Bayes Spam Filter
• We need to find P(message | spam) P(spam) and P(message | ¬spam) P(¬spam)
• The message is a sequence of words (w1, …, wn)
• Bag of words representation
  – The order of the words in the message is not important
  – Each word is conditionally independent of the others given the message class (spam or not spam)

Page 10: Probabilistic inference

Naïve Bayes Spam Filter
• We need to find P(message | spam) P(spam) and P(message | ¬spam) P(¬spam)
• The message is a sequence of words (w1, …, wn)
• Bag of words representation
  – The order of the words in the message is not important
  – Each word is conditionally independent of the others given the message class (spam or not spam), so

  P(message | spam) = P(w1, …, wn | spam) = ∏_{i=1}^n P(wi | spam)

• Our filter will classify the message as spam if

  P(spam) ∏_{i=1}^n P(wi | spam) > P(¬spam) ∏_{i=1}^n P(wi | ¬spam)
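A minimal sketch of this classification rule, assuming the prior and per-word probabilities have already been estimated; the tiny hand-picked numbers are illustrative only, and log-probabilities are used so the product of many small factors does not underflow:

```python
import math

# Hypothetical, hand-picked parameters for illustration only.
prior = {"spam": 0.33, "not spam": 0.67}
word_probs = {
    "spam":     {"free": 0.05,  "meeting": 0.002, "prize": 0.01},
    "not spam": {"free": 0.005, "meeting": 0.03,  "prize": 0.0001},
}

def classify(words):
    """Return the class maximizing log P(class) + sum_i log P(w_i | class)."""
    scores = {}
    for c in prior:
        score = math.log(prior[c])
        for w in words:
            # Skip words outside this tiny vocabulary; a real filter would smooth instead.
            if w in word_probs[c]:
                score += math.log(word_probs[c][w])
        scores[c] = score
    return max(scores, key=scores.get)

print(classify(["free", "prize"]))    # -> spam
print(classify(["meeting", "free"]))  # -> not spam
```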

Page 11: Probabilistic inference

Bag of words illustration

US Presidential Speeches Tag Cloud: http://chir.ag/projects/preztags/


Page 14: Probabilistic inference

Naïve Bayes Spam Filter

P(spam | w1, …, wn) ∝ P(spam) ∏_{i=1}^n P(wi | spam)

(posterior ∝ prior × likelihood)

Page 15: Probabilistic inference

Parameter estimation
• In order to classify a message, we need to know the prior P(spam) and the likelihoods P(word | spam) and P(word | ¬spam)
  – These are the parameters of the probabilistic model
  – How do we obtain the values of these parameters?

[Slide figure: prior (spam: 0.33, ¬spam: 0.67) alongside tables of P(word | spam) and P(word | ¬spam)]

Page 16: Probabilistic inference

Parameter estimation
• How do we obtain the prior P(spam) and the likelihoods P(word | spam) and P(word | ¬spam)?
  – Empirically: use training data

  P(word | spam) = (# of word occurrences in spam messages) / (total # of words in spam messages)

  – This is the maximum likelihood (ML) estimate, i.e. the estimate that maximizes the likelihood of the training data:

  ∏_{d=1}^D ∏_{i=1}^{n_d} P(w_{d,i} | class_d)

  (d: index of training document, i: index of a word)

Page 17: Probabilistic inference

Parameter estimation
• How do we obtain the prior P(spam) and the likelihoods P(word | spam) and P(word | ¬spam)?
  – Empirically: use training data

  P(word | spam) = (# of word occurrences in spam messages) / (total # of words in spam messages)

• Parameter smoothing: dealing with words that were never seen or seen too few times
  – Laplacian smoothing: pretend you have seen every vocabulary word one more time than you actually did
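A minimal sketch of this estimation step on a made-up labelled corpus; with Laplacian smoothing the count-based formula above becomes (# of word occurrences + 1) / (total # of words in the class + vocabulary size):

```python
from collections import Counter

# Toy labelled training data, purely illustrative.
training = [
    ("spam",     "free money free prize"),
    ("not spam", "meeting agenda for monday"),
    ("not spam", "free for the meeting on monday"),
]

# Prior: fraction of training messages in each class.
class_counts = Counter(label for label, _ in training)
total_msgs = sum(class_counts.values())
prior = {c: class_counts[c] / total_msgs for c in class_counts}

# Word counts per class and the shared vocabulary.
word_counts = {c: Counter() for c in class_counts}
for label, text in training:
    word_counts[label].update(text.split())
vocab = {w for counts in word_counts.values() for w in counts}

def word_prob(word, c):
    """Laplace-smoothed ML estimate of P(word | class)."""
    return (word_counts[c][word] + 1) / (sum(word_counts[c].values()) + len(vocab))

print(prior)                                               # {'spam': 0.33..., 'not spam': 0.66...}
print(word_prob("free", "spam"), word_prob("free", "not spam"))
```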

Page 18: Probabilistic inference

Summary of model and parameters
• Naïve Bayes model:

  P(spam | message) ∝ P(spam) ∏_{i=1}^n P(wi | spam)
  P(¬spam | message) ∝ P(¬spam) ∏_{i=1}^n P(wi | ¬spam)

• Model parameters:
  – prior: P(spam), P(¬spam)
  – likelihood of spam: P(w1 | spam), P(w2 | spam), …, P(wn | spam)
  – likelihood of ¬spam: P(w1 | ¬spam), P(w2 | ¬spam), …, P(wn | ¬spam)

Page 19: Probabilistic inference

Bag-of-word models for images

Csurka et al. (2004), Willamowski et al. (2005), Grauman & Darrell (2005), Sivic et al. (2003, 2005)

Page 20: Probabilistic inference

Bag-of-word models for images
1. Extract image features


Bag-of-word models for images
1. Extract image features
2. Learn "visual vocabulary"

Page 23: Probabilistic inference

Bag-of-word models for images
1. Extract image features
2. Learn "visual vocabulary"
3. Map image features to visual words
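A rough sketch of steps 2 and 3, assuming step 1 has already produced local descriptors for each image (random 128-dimensional stand-ins here); k-means clustering via scikit-learn is one common way to learn the visual vocabulary, used purely for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

# Assume each image has already been reduced to a set of local descriptors
# (step 1); here we just generate random 128-dimensional stand-ins.
rng = np.random.default_rng(0)
descriptors_per_image = [rng.normal(size=(200, 128)) for _ in range(10)]

# Step 2: learn a "visual vocabulary" by clustering all descriptors.
vocab_size = 50
kmeans = KMeans(n_clusters=vocab_size, n_init=10, random_state=0)
kmeans.fit(np.vstack(descriptors_per_image))

# Step 3: map each image's descriptors to visual words and build a
# bag-of-words histogram, analogous to word counts in the spam filter.
def bow_histogram(descriptors):
    words = kmeans.predict(descriptors)
    return np.bincount(words, minlength=vocab_size)

print(bow_histogram(descriptors_per_image[0]))
```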

Page 24: Probabilistic inference

Bayesian decision making: Summary

• Suppose the agent has to make decisions about the value of an unobserved query variable X based on the values of an observed evidence variable E

• Inference problem: given some evidence E = e, what is P(X | e)?

• Learning problem: estimate the parameters of the probabilistic model P(X | E) given a training sample {(x1,e1), …, (xn,en)}