
π‘ π‘–π‘”π‘šπ‘Ž 𝜢

Machine Learning

π‘ π‘–π‘”π‘šπ‘Ž 𝜢

2015.08.01.

Naïve Bayes

π‘ π‘–π‘”π‘šπ‘Ž 𝜢 2

Probability Basics

• Prior, conditional, and joint probability for random variables

• Prior probability: P(X)

• Conditional probability: P(X1|X2), P(X2|X1)

• Joint probability: X = (X1, X2), P(X) = P(X1, X2)

• Relationship: P(X1, X2) = P(X2|X1)P(X1) = P(X1|X2)P(X2)

• Independence: P(X2|X1) = P(X2), P(X1|X2) = P(X1), so P(X1, X2) = P(X1)P(X2)

• Bayesian rule: P(A|B) = P(B|A)P(A) / P(B)
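These identities can be checked numerically. A minimal sketch (added here, not from the slides; the toy joint distribution is made up) verifying the product rule and the Bayesian rule for two binary variables:

    # Toy joint distribution over two binary variables X1, X2 (made-up numbers).
    joint = {(0, 0): 0.30, (0, 1): 0.10, (1, 0): 0.20, (1, 1): 0.40}

    # Marginals P(X1), P(X2) by summing out the other variable.
    p_x1 = {v: sum(p for (a, b), p in joint.items() if a == v) for v in (0, 1)}
    p_x2 = {v: sum(p for (a, b), p in joint.items() if b == v) for v in (0, 1)}

    # Conditional probability: P(X2 = x2 | X1 = x1) = P(X1, X2) / P(X1).
    def p_x2_given_x1(x2, x1):
        return joint[(x1, x2)] / p_x1[x1]

    # Relationship (product rule): P(X1, X2) = P(X2|X1)P(X1).
    assert abs(joint[(1, 1)] - p_x2_given_x1(1, 1) * p_x1[1]) < 1e-12

    # Bayesian rule: P(X1=1|X2=1) = P(X2=1|X1=1)P(X1=1) / P(X2=1).
    print(p_x2_given_x1(1, 1) * p_x1[1] / p_x2[1])  # 0.4/0.6 * 0.6 / 0.5 = 0.8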

π‘ π‘–π‘”π‘šπ‘Ž 𝜢 3

Probabilistic Classification

• Establishing a probabilistic model for classification

• Discriminative model

• Estimate the posterior P(C|X) directly, where C = c1, …, cL and X = (X1, …, Xn)

• For a test instance x = (x1, x2, …, xn), the classifier outputs P(c1|x), P(c2|x), …, P(cL|x)

[Figure: a discriminative probabilistic classifier taking inputs x1, x2, …, xn and producing P(c1|x), P(c2|x), …, P(cL|x)]

π‘ π‘–π‘”π‘šπ‘Ž 𝜢 4

Probabilistic Classification

• Establishing a probabilistic model for classification (cont.)

• Generative model

• Classification is based on the patterns of the data: model the class-conditional likelihood P(X|C)

• Given a label, examine the data to capture the relationship between the data and the label

π‘ π‘–π‘”π‘šπ‘Ž 𝜢 5

Bayes' Theorem

• Bayes' theorem (alternatively Bayes' law or Bayes' rule) describes the probability of an event, based on conditions that might be related to the event.

• It expresses the relationship between the prior and posterior probabilities of two random variables.

• It tells us how the posterior probability is updated when new evidence is presented.

• P(A) = prior probability of hypothesis A

• P(B) = prior probability of training data B

• P(A|B) = probability of A given B

• P(B|A) = probability of B given A

P(A|B) = P(B|A)P(A) / P(B)
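A quick numerical illustration of how new evidence updates a prior (a sketch added here, not from the slides; all numbers are made up):

    # Bayes' rule on made-up numbers: P(A|B) = P(B|A)P(A) / P(B).
    p_a = 0.01              # prior P(A)
    p_b_given_a = 0.90      # likelihood P(B|A)
    p_b_given_not_a = 0.05  # P(B|not A)

    # Total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A).
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

    # Posterior: the prior 0.01 is revised upward once B is observed.
    p_a_given_b = p_b_given_a * p_a / p_b
    print(round(p_a_given_b, 3))  # 0.154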

π‘ π‘–π‘”π‘šπ‘Ž 𝜢 6

Bayes' Theorem

• MAP classification rule

• MAP: Maximum A Posteriori

• Assign x to c* if

P(C = c*|X = x) > P(C = c|X = x), c ≠ c*, c = c1, …, cL

• Generative classification with the MAP rule

• Apply the Bayesian rule:

P(C = ci|X = x) = P(X = x|C = ci)P(C = ci) / P(X = x) ∝ P(X = x|C = ci)P(C = ci) for all ci
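Since P(X = x) is the same for every class, the MAP decision only needs the unnormalized products. A minimal sketch (added here; the likelihoods and priors are hypothetical placeholders):

    # MAP decision: argmax over classes of P(x|c) * P(c); P(x) cancels.
    likelihood = {"c1": 0.02, "c2": 0.10, "c3": 0.05}  # P(X = x | C = c), made up
    prior = {"c1": 0.5, "c2": 0.2, "c3": 0.3}          # P(C = c), made up

    posterior = {c: likelihood[c] * prior[c] for c in prior}
    c_star = max(posterior, key=posterior.get)
    print(c_star)  # "c2": 0.10 * 0.2 = 0.020 beats 0.010 (c1) and 0.015 (c3)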

π‘ π‘–π‘”π‘šπ‘Ž 𝜢 7

Naïve Bayes

• Applying the Bayes rule directly requires considering all the data: learning the joint probability P(X1, …, Xn|C) is difficult

• e.g., 10 binary features → 2^10 possible data configurations

• Thus, assume that all input features are conditionally independent → the Naïve Bayes rule

• Assume the conditional probability of each feature is independent given the class

• Number of cases for the conditional probabilities: 2^n → 2n (see the sketch below)
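A two-line check of the parameter counts for n = 10 (an illustration added here, not from the slides):

    n = 10
    print(2 ** n)  # 1024: joint configurations of (X1, ..., Xn) to model per class
    print(2 * n)   # 20: per-feature conditionals under the independence assumption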

π‘ π‘–π‘”π‘šπ‘Ž 𝜢 8

Naïve Bayes

• Naïve Bayes

• The joint likelihood factorizes: apply the probability chain rule, then conditional independence drops the extra conditions at each step:

P(X1, X2, …, Xn|C) = P(X1|X2, …, Xn, C)P(X2, …, Xn|C)
= P(X1|C)P(X2, …, Xn|C)
= P(X1|C)P(X2|C)…P(Xn|C)
= ∏i P(Xi|C)

• MAP classification rule: assign x = (x1, x2, …, xn) to c* if

[P(x1|c*)…P(xn|c*)]P(c*) > [P(x1|c)…P(xn|c)]P(c), c ≠ c*, c = c1, …, cL
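The factorized rule translates directly into code. A sketch (added here, not from the slides; prior and cond are assumed lookup tables, with cond[c][i][v] standing for P(Xi = v|C = c)):

    from math import prod

    # Naive Bayes MAP decision: argmax_c P(c) * prod_i P(x_i | c).
    def classify(x, prior, cond):
        scores = {
            c: prior[c] * prod(cond[c][i][v] for i, v in enumerate(x))
            for c in prior
        }
        return max(scores, key=scores.get)

The Play Tennis example on the following slides instantiates exactly these tables.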

π‘ π‘–π‘”π‘šπ‘Ž 𝜢 9

Example

• Example: Play Tennis (14 training examples: 9 with Play=Yes, 5 with Play=No)

π‘ π‘–π‘”π‘šπ‘Ž 𝜢 10

Example

• Learning Phase

Outlook     Play=Yes  Play=No
Sunny       2/9       3/5
Overcast    4/9       0/5
Rain        3/9       2/5

Temperature  Play=Yes  Play=No
Hot          2/9       2/5
Mild         4/9       2/5
Cool         3/9       1/5

Humidity  Play=Yes  Play=No
High      3/9       4/5
Normal    6/9       1/5

Wind    Play=Yes  Play=No
Strong  3/9       3/5
Weak    6/9       2/5

P(Play=Yes) = 9/14, P(Play=No) = 5/14
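These tables are relative-frequency estimates and can be reproduced by counting. A sketch (added here; the slide's own data table did not survive the transcript, so this assumes the classic 14-day Play Tennis dataset from Mitchell's Machine Learning, whose counts match the tables above):

    from collections import Counter

    # Rows are (Outlook, Temperature, Humidity, Wind, Play).
    data = [
        ("Sunny", "Hot", "High", "Weak", "No"),
        ("Sunny", "Hot", "High", "Strong", "No"),
        ("Overcast", "Hot", "High", "Weak", "Yes"),
        ("Rain", "Mild", "High", "Weak", "Yes"),
        ("Rain", "Cool", "Normal", "Weak", "Yes"),
        ("Rain", "Cool", "Normal", "Strong", "No"),
        ("Overcast", "Cool", "Normal", "Strong", "Yes"),
        ("Sunny", "Mild", "High", "Weak", "No"),
        ("Sunny", "Cool", "Normal", "Weak", "Yes"),
        ("Rain", "Mild", "Normal", "Weak", "Yes"),
        ("Sunny", "Mild", "Normal", "Strong", "Yes"),
        ("Overcast", "Mild", "High", "Strong", "Yes"),
        ("Overcast", "Hot", "Normal", "Weak", "Yes"),
        ("Rain", "Mild", "High", "Strong", "No"),
    ]

    # Priors by relative frequency: P(Play=Yes) = 9/14, P(Play=No) = 5/14.
    label_counts = Counter(row[-1] for row in data)

    # Conditionals: P(feature_i = value | Play = label) by relative frequency.
    pair_counts = Counter(
        (i, value, row[-1]) for row in data for i, value in enumerate(row[:-1])
    )

    def cond(i, value, label):
        return pair_counts[(i, value, label)] / label_counts[label]

    print(cond(0, "Sunny", "Yes"))  # 2/9 = 0.222..., as in the Outlook table
    print(cond(2, "High", "No"))    # 4/5 = 0.8, as in the Humidity table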

π‘ π‘–π‘”π‘šπ‘Ž 𝜢 11

Example

• Test Phase

• Given a new instance, predict its label:

x' = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)

• Look up the tables obtained in the learning phase

• Decision making with the MAP rule

P(Outlook=Sunny|Play=No) = 3/5
P(Temperature=Cool|Play=No) = 1/5
P(Humidity=High|Play=No) = 4/5
P(Wind=Strong|Play=No) = 3/5
P(Play=No) = 5/14

P(Outlook=Sunny|Play=Yes) = 2/9
P(Temperature=Cool|Play=Yes) = 3/9
P(Humidity=High|Play=Yes) = 3/9
P(Wind=Strong|Play=Yes) = 3/9
P(Play=Yes) = 9/14

P(Yes|x') ≈ [P(Sunny|Yes)P(Cool|Yes)P(High|Yes)P(Strong|Yes)]P(Play=Yes) = 0.0053

P(No|x') ≈ [P(Sunny|No)P(Cool|No)P(High|No)P(Strong|No)]P(Play=No) = 0.0206

Given that P(Yes|x') < P(No|x'), we label x' "No".
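The two products are easy to verify (a check added here, not part of the slides):

    from math import prod

    # Unnormalized posteriors for x' = (Sunny, Cool, High, Strong).
    score_yes = prod([2/9, 3/9, 3/9, 3/9]) * (9/14)
    score_no = prod([3/5, 1/5, 4/5, 3/5]) * (5/14)
    print(round(score_yes, 4), round(score_no, 4))  # 0.0053 0.0206 -> predict "No"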

π‘ π‘–π‘”π‘šπ‘Ž 𝜢 12

References

• Naïve Bayes Classifier - Ke Chen

• Advanced Algorithm (Naïve Bayes Classifier) - Leeck

• Machine Learning and Its Applications - Harksoo Kim

• Wikipedia

• http://www.leesanghyun.co.kr/Naive_Bayesian_Classifier

• http://darkpgmr.tistory.com/62

π‘ π‘–π‘”π‘šπ‘Ž 𝜢 13

QA

κ°μ‚¬ν•©λ‹ˆλ‹€.

λ°•μ²œμŒ, λ°•μ°¬λ―Ό, 졜재혁

π‘ π‘–π‘”π‘šπ‘Ž 𝜢 , κ°•μ›λŒ€ν•™κ΅

Email: parkce@kangwon.ac.kr
