Naïve Bayes Classifier
Adopted from slides by Ke Chen from University of Manchester and YangQiu Song from MSRA

Transcript
Page 1:

Naïve Bayes Classifier

Adopted from slides by Ke Chen from University of Manchester and YangQiu Song from MSRA

Page 2:

Generative vs. Discriminative Classifiers

Training classifiers involves estimating f: X → Y, or P(Y|X)

Discriminative classifiers:

1. Assume some functional form for P(Y|X)

2. Estimate parameters of P(Y|X) directly from training data

Generative classifiers (also called ‘informative’ by Rubinstein & Hastie):

1. Assume some functional form for P(X|Y), P(Y)

2. Estimate parameters of P(X|Y), P(Y) directly from training data

3. Use Bayes rule to calculate P(Y|X = xi)

Page 3:

Bayes Formula
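In the notation used on the Probability Basics slide later in the deck, the formula here is Bayes’ rule:

P(C|X) = P(X|C) P(C) / P(X)

that is, Posterior = Likelihood × Prior / Evidence.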

Page 4:

Generative Model

• Color
• Size
• Texture
• Weight
• …

Page 5:

Discriminative Model

• Logistic Regression

• Color
• Size
• Texture
• Weight
• …

Page 6:

Comparison

• Generative models
– Assume some functional form for P(X|Y), P(Y)
– Estimate parameters of P(X|Y), P(Y) directly from training data
– Use Bayes rule to calculate P(Y|X = x)

• Discriminative models
– Directly assume some functional form for P(Y|X)
– Estimate parameters of P(Y|X) directly from training data

Page 7:

Probability Basics

• Prior, conditional and joint probability for random variables
– Prior probability: P(X)
– Conditional probability: P(X1|X2), P(X2|X1)
– Joint probability: X = (X1, X2), P(X) = P(X1, X2)
– Relationship: P(X1, X2) = P(X2|X1) P(X1) = P(X1|X2) P(X2)
– Independence: P(X2|X1) = P(X2), P(X1|X2) = P(X1), P(X1, X2) = P(X1) P(X2)

• Bayesian rule

P(C|X) = P(X|C) P(C) / P(X)

Posterior = Likelihood × Prior / Evidence

Page 8:

Probability Basics

• Quiz: We have two six-sided dice. When they are rolled, the following events can occur: (A) die 1 lands on side “3”, (B) die 2 lands on side “1”, and (C) the two dice sum to eight. Answer the following questions:

1) P(A) = ?
2) P(B) = ?
3) P(C) = ?
4) P(A|B) = ?
5) P(C|A) = ?
6) P(A, B) = ?
7) P(A, C) = ?
8) Is P(A, C) equal to P(A) P(C)?
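As a quick check on the quiz, a short Python sketch (not part of the original slides) can enumerate the 36 equally likely outcomes of two fair dice and compute each quantity exactly:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (d1, d2) of rolling two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event, given as a predicate over (d1, d2)."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

A = lambda o: o[0] == 3           # die 1 lands on side 3
B = lambda o: o[1] == 1           # die 2 lands on side 1
C = lambda o: o[0] + o[1] == 8    # the two dice sum to eight

print(prob(A))                                  # 1) 1/6
print(prob(B))                                  # 2) 1/6
print(prob(C))                                  # 3) 5/36
print(prob(lambda o: A(o) and B(o)) / prob(B))  # 4) P(A|B) = 1/6
print(prob(lambda o: A(o) and C(o)) / prob(A))  # 5) P(C|A) = 1/6
print(prob(lambda o: A(o) and B(o)))            # 6) P(A,B) = 1/36
print(prob(lambda o: A(o) and C(o)))            # 7) P(A,C) = 1/36
print(prob(A) * prob(C))                        # 8) P(A)P(C) = 5/216 ≠ P(A,C), so A and C are dependent
```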

Page 9:

Probabilistic Classification

• Establishing a probabilistic model for classification
– Discriminative model: P(C|X), where C = c1, …, cL and X = (X1, …, Xn)

[Figure: a discriminative probabilistic classifier takes the input x = (x1, x2, …, xn) and outputs the posteriors P(c1|x), P(c2|x), …, P(cL|x).]

Page 10:

Probabilistic Classification

• Establishing a probabilistic model for classification (cont.)
– Generative model: P(X|C), where C = c1, …, cL and X = (X1, …, Xn)

[Figure: one generative probabilistic model per class; the model for class ci takes x1, x2, …, xn and outputs the likelihood P(x|ci), where x = (x1, x2, …, xn).]

Page 11:

Probabilistic Classification

• MAP classification rule
– MAP: Maximum A Posteriori
– Assign x to c* if P(C = c*|X = x) > P(C = c|X = x) for all c ≠ c*, c = c1, …, cL

• Generative classification with the MAP rule
– Apply Bayes rule to convert the likelihoods into posterior probabilities:

P(C = ci|X = x) = P(X = x|C = ci) P(C = ci) / P(X = x) ∝ P(X = x|C = ci) P(C = ci), for i = 1, 2, …, L

– Then apply the MAP rule

Page 12:

Naïve Bayes

• Bayes classification

P(C|X) ∝ P(X|C) P(C) = P(X1, …, Xn|C) P(C)

Difficulty: learning the joint probability P(X1, …, Xn|C)

• Naïve Bayes classification
– Assumption: all input attributes are conditionally independent given the class!

P(X1, X2, …, Xn|C) = P(X1|X2, …, Xn; C) P(X2, …, Xn|C)
                   = P(X1|C) P(X2, …, Xn|C)
                   = P(X1|C) P(X2|C) ⋯ P(Xn|C)

– MAP classification rule: for x = (x1, x2, …, xn), assign c* if

[P(x1|c*) ⋯ P(xn|c*)] P(c*) > [P(x1|c) ⋯ P(xn|c)] P(c), c ≠ c*, c = c1, …, cL

Page 13:

Naïve Bayes

• Naïve Bayes Algorithm (for discrete input attributes)
– Learning Phase: Given a training set S,

For each target value ci (ci = c1, …, cL):
  P̂(C = ci) ← estimate P(C = ci) with examples in S;
For every attribute value xjk of each attribute Xj (j = 1, …, n; k = 1, …, Nj):
  P̂(Xj = xjk|C = ci) ← estimate P(Xj = xjk|C = ci) with examples in S;

Output: conditional probability tables; for Xj, Nj × L elements

– Test Phase: Given an unknown instance X’ = (a’1, …, a’n), look up the tables to assign the label c* to X’ if

[P̂(a’1|c*) ⋯ P̂(a’n|c*)] P̂(c*) > [P̂(a’1|c) ⋯ P̂(a’n|c)] P̂(c), c ≠ c*, c = c1, …, cL
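A minimal Python sketch of this algorithm (an illustration under my own naming and data layout, not the original authors’ code); it estimates the tables in a learning phase and applies the MAP rule in a test phase:

```python
from collections import Counter, defaultdict

class DiscreteNaiveBayes:
    """Count-based naive Bayes for discrete attributes."""

    def fit(self, X, y):
        """Learning phase: estimate P(C = c) and P(X_j = x | C = c) from (X, y)."""
        class_counts = Counter(y)
        self.prior = {c: cnt / len(y) for c, cnt in class_counts.items()}
        cond_counts = defaultdict(int)  # (attribute index, value, class) -> count
        for xs, c in zip(X, y):
            for j, v in enumerate(xs):
                cond_counts[(j, v, c)] += 1
        self.cond = {k: cnt / class_counts[k[2]] for k, cnt in cond_counts.items()}
        return self

    def predict(self, xs):
        """Test phase: MAP rule over [prod_j P(x_j | c)] * P(c)."""
        def score(c):
            p = self.prior[c]
            for j, v in enumerate(xs):
                p *= self.cond.get((j, v, c), 0.0)  # unseen value -> 0; see the zero-probability remedy later
            return p
        return max(self.prior, key=score)
```

With the PlayTennis data of the next pages, `DiscreteNaiveBayes().fit(X, y).predict(("Sunny", "Cool", "High", "Strong"))` returns "No", matching the worked example.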

Page 14:

Example


• Example: Play Tennis
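The table shown on this slide is presumably the standard 14-example PlayTennis dataset (Mitchell, Machine Learning, 1997); its counts reproduce the conditional probability tables on the next page:

Day  Outlook   Temperature  Humidity  Wind    Play
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No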

Page 15:

Example

• Learning Phase

Outlook      Play=Yes  Play=No
Sunny        2/9       3/5
Overcast     4/9       0/5
Rain         3/9       2/5

Temperature  Play=Yes  Play=No
Hot          2/9       2/5
Mild         4/9       2/5
Cool         3/9       1/5

Humidity     Play=Yes  Play=No
High         3/9       4/5
Normal       6/9       1/5

Wind         Play=Yes  Play=No
Strong       3/9       3/5
Weak         6/9       2/5

P(Play=Yes) = 9/14
P(Play=No) = 5/14

Page 16:

Example

• Test Phase
– Given a new instance x’ = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)

– Look up the tables:

P(Outlook=Sunny|Play=Yes) = 2/9
P(Temperature=Cool|Play=Yes) = 3/9
P(Humidity=High|Play=Yes) = 3/9
P(Wind=Strong|Play=Yes) = 3/9
P(Play=Yes) = 9/14

P(Outlook=Sunny|Play=No) = 3/5
P(Temperature=Cool|Play=No) = 1/5
P(Humidity=High|Play=No) = 4/5
P(Wind=Strong|Play=No) = 3/5
P(Play=No) = 5/14

– MAP rule:

P(Yes|x’) ∝ [P(Sunny|Yes) P(Cool|Yes) P(High|Yes) P(Strong|Yes)] P(Play=Yes) = 0.0053
P(No|x’) ∝ [P(Sunny|No) P(Cool|No) P(High|No) P(Strong|No)] P(Play=No) = 0.0206

Since P(Yes|x’) < P(No|x’), we label x’ as “No”.
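The two scores can be verified with exact arithmetic (a sketch, using the tabulated fractions):

```python
from fractions import Fraction as F

# Unnormalized posteriors for x' = (Sunny, Cool, High, Strong), read off the tables above.
p_yes = F(2, 9) * F(3, 9) * F(3, 9) * F(3, 9) * F(9, 14)   # ~0.0053
p_no  = F(3, 5) * F(1, 5) * F(4, 5) * F(3, 5) * F(5, 14)   # ~0.0206
print(float(p_yes), float(p_no), "No" if p_no > p_yes else "Yes")
```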

Page 17: (repeats the Test Phase slide of Page 16)

Page 18:

Relevant Issues

• Violation of the Independence Assumption
– For many real-world tasks, P(X1, …, Xn|C) ≠ P(X1|C) ⋯ P(Xn|C)
– Nevertheless, naïve Bayes works surprisingly well anyway!

• Zero conditional probability problem
– If no training example contains the attribute value Xj = ajk, then P̂(Xj = ajk|C = ci) = 0
– In this circumstance, during test, P̂(x1|ci) ⋯ P̂(ajk|ci) ⋯ P̂(xn|ci) = 0
– For a remedy, conditional probabilities are estimated with

P̂(Xj = ajk|C = ci) = (nc + m·p) / (n + m)

where
n: number of training examples for which C = ci
nc: number of training examples for which Xj = ajk and C = ci
p: prior estimate (usually p = 1/t for t possible values of Xj)
m: weight given to the prior (number of “virtual” examples, m ≥ 1)
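A sketch of this remedy in Python (the uniform prior p = 1/t is taken from the slide; the example numbers are my own):

```python
def m_estimate(n_c, n, t, m=1.0):
    """m-estimate of P(X_j = a_jk | C = c_i): (n_c + m*p) / (n + m),
    with uniform prior p = 1/t over the t possible values of X_j."""
    p = 1.0 / t
    return (n_c + m * p) / (n + m)

# A value never observed with class c_i (n_c = 0, here with n = 5 examples
# of c_i and t = 3 possible values) no longer zeroes out the whole product:
print(m_estimate(0, 5, 3))   # 0.0555... instead of 0.0
```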

Page 19:

Relevant Issues

• Continuous-valued Input Attributes
– An attribute can take on an unbounded number of values
– Conditional probability is modeled with the normal distribution:

P̂(Xj|C = ci) = 1/(√(2π)·σji) · exp( −(Xj − μji)² / (2σji²) )

μji: mean (average) of attribute values Xj of examples for which C = ci
σji: standard deviation of attribute values Xj of examples for which C = ci

– Learning Phase: for X = (X1, …, Xn), C = c1, …, cL
Output: n × L normal distributions and P(C = ci), i = 1, …, L

– Test Phase: for X’ = (X’1, …, X’n)
• Calculate conditional probabilities with all the normal distributions
• Apply the MAP rule to make a decision
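A minimal Python sketch of the learning phase for continuous attributes (illustrative; it assumes X is a list of numeric tuples, y the labels, and at least two examples per class):

```python
import math
from statistics import mean, stdev

def gaussian_pdf(x, mu, sigma):
    """Class-conditional density P(X_j | C = c_i) modeled as a normal distribution."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def fit_gaussians(X, y):
    """Learning phase: one (mu_ji, sigma_ji) pair per attribute j and class c_i."""
    params = {}
    for c in set(y):
        rows = [xs for xs, label in zip(X, y) if label == c]
        params[c] = [(mean(col), stdev(col)) for col in zip(*rows)]
    return params

# Test phase: score each class by P(c) * prod_j gaussian_pdf(x_j, mu_jc, sigma_jc)
# and apply the MAP rule, exactly as in the discrete case.
```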

Page 20:

Conclusions

• Naïve Bayes is based on the independence assumption
– Training is very easy and fast: it only requires considering each attribute in each class separately
– Testing is straightforward: just look up tables or calculate conditional probabilities with normal distributions

• A popular generative model
– Performance is competitive with most state-of-the-art classifiers, even when the independence assumption is violated
– Many successful applications, e.g., spam mail filtering
– A good candidate as a base learner in ensemble learning
– Apart from classification, naïve Bayes can do more…

Page 21:

Extra Slides


Page 22:

Naïve Bayes (1)

• Revisit
• Which is equal to
• Naïve Bayes assumes conditional independency
• Then the inference of posterior is
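In the notation of the earlier slides, this chain of steps is presumably (a reconstruction, not verbatim from the slide):

P(Y|X) = P(X, Y) / P(X), which is equal to P(X|Y) P(Y) / P(X)

Conditional independency: P(X1, …, Xn|Y) = ∏j P(Xj|Y)

Posterior inference: P(Y = y|x) ∝ P(Y = y) ∏j P(xj|y)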

Page 23:

Naïve Bayes (2)

• Training: the observation model is multinomial; supervised, with label information
– Maximum Likelihood Estimation (MLE)
– Maximum a Posteriori (MAP): put a Dirichlet prior on the parameters

• Classification
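For a multinomial observation model, the standard forms of these estimators are (a reconstruction; notation mine): the MLE is P̂(Xj = x|Y = y) = N(x, y) / N(y), counting co-occurrences in the labeled data; with a symmetric Dirichlet(α) prior, the MAP estimate becomes P̂(Xj = x|Y = y) = (N(x, y) + α − 1) / (N(y) + K(α − 1)) for K possible values of Xj, where α = 2 gives add-one (Laplace) smoothing. Classification then applies the same MAP rule as on Page 11.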

Page 24:

Naïve Bayes (3)

• What if we have continuous Xi?
• Generative training
• Prediction

Page 25:

Naïve Bayes (4)

• Problems
– Features may overlap
– Features may not be independent (e.g., size and weight of a tiger)
– It uses a joint distribution estimate (P(X|Y), P(Y)) to solve a conditional problem (P(Y|X = x))

• Can we discriminatively train? (see the sketch below)
– Logistic regression
– Regularization
– Gradient ascent
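A minimal sketch tying these three bullets together (illustrative; the learning rate, iteration count, and L2 penalty lam are arbitrary choices, not from the slides):

```python
import numpy as np

def train_logreg(X, y, lr=0.1, lam=0.01, steps=1000):
    """Discriminative training: gradient ascent on the L2-regularized
    conditional log-likelihood of logistic regression, P(Y=1|x) = sigmoid(w.x).
    X is an (m, d) feature matrix, y an array of 0/1 labels."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))  # predicted P(Y = 1 | x) for each row
        grad = X.T @ (y - p) - lam * w    # log-likelihood gradient minus L2 penalty
        w += lr * grad                    # ascent step
    return w
```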