Machine Learning
2015.08.01.
Naïve Bayes
Probability Basics
• Prior, conditional and joint probability for random variables
  • Prior probability: P(X)
  • Conditional probability: P(X1|X2), P(X2|X1)
  • Joint probability: X = (X1, X2), P(X) = P(X1, X2)
  • Relationship: P(X1, X2) = P(X2|X1)P(X1) = P(X1|X2)P(X2)
  • Independence: P(X2|X1) = P(X2), P(X1|X2) = P(X1),
    P(X1, X2) = P(X1)P(X2)
• Bayesian Rule
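The relationships above can be checked numerically. Below is a minimal Python sketch using a made-up joint distribution over two binary variables (the numbers are illustrative, not from the slides):

```python
from fractions import Fraction as F

# Illustrative joint distribution P(X1, X2) over two binary variables.
joint = {(0, 0): F(1, 8), (0, 1): F(3, 8),
         (1, 0): F(2, 8), (1, 1): F(2, 8)}

def marginal(var, val):
    # P(X_var = val): sum the joint over the other variable
    return sum(p for pair, p in joint.items() if pair[var] == val)

def conditional(x2, x1):
    # P(X2 = x2 | X1 = x1) = P(X1 = x1, X2 = x2) / P(X1 = x1)
    return joint[(x1, x2)] / marginal(0, x1)

# Relationship: P(X1, X2) = P(X2|X1) P(X1) holds for every cell
for x1 in (0, 1):
    for x2 in (0, 1):
        assert joint[(x1, x2)] == conditional(x2, x1) * marginal(0, x1)
```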
Probabilistic Classification
• Establishing a probabilistic model for classification
• Discriminative model: directly estimate the posterior P(C|X), where C = c1, …, cL and X = (X1, …, Xn)

[Figure: a discriminative probabilistic classifier takes an input x = (x1, x2, …, xn) and produces P(c1|x), P(c2|x), …, P(cL|x)]
Probabilistic Classification
• Establishing a probabilistic model for classification (cont.)
• Generative model
  • Classifies data according to the patterns in the data
  • Given the labels, examines the data and identifies the relationship between the data and the labels
Bayes' Theorem
• Bayes' theorem (alternatively Bayes' law or Bayes' rule) describes the probability of an event, based on conditions that might be related to the event.
  • Expresses the relationship between the prior and posterior probabilities of two random variables
  • Determines how the posterior probability is updated when new evidence is presented
• P(A) = prior probability of hypothesis A
• P(B) = prior probability of training data B
• P(A|B) = probability of A given B
• P(B|A) = probability of B given A

P(A|B) = P(B|A)P(A) / P(B)
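As an illustration of how new evidence updates a prior, the sketch below plugs hypothetical numbers (a 1% prior and an imperfect test; not from the slides) into P(A|B) = P(B|A)P(A) / P(B):

```python
from fractions import Fraction as F

# Hypothetical numbers: hypothesis A has a 1% prior, and the evidence B
# (a positive test) is 90% likely under A and 5% likely otherwise.
p_a = F(1, 100)            # P(A): prior
p_b_given_a = F(9, 10)     # P(B|A): likelihood
p_b_given_not_a = F(1, 20) # P(B|not A)

# P(B) by the law of total probability
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' rule: P(A|B) = P(B|A) P(A) / P(B)
posterior = p_b_given_a * p_a / p_b
print(posterior)  # 2/13 ~ 0.154: the 1% prior is sharply updated by B
```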
Bayes' Theorem
• MAP classification rule (MAP: Maximum A Posteriori)
  • Assign x to c* if
    P(C = c*|X = x) > P(C = c|X = x), c ≠ c*, c = c1, …, cL
• Generative classification with the MAP rule
  • Apply Bayes' rule:
    P(C = ci|X = x) = P(X = x|C = ci)P(C = ci) / P(X = x)
                    ∝ P(X = x|C = ci)P(C = ci), for each ci
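A minimal sketch of the MAP decision: since P(X = x) is common to every class, the classifier can compare P(X = x|C = c)P(C = c) directly (the class names and numbers below are hypothetical):

```python
# Hypothetical priors and likelihoods for a fixed input x.
prior = {"c1": 0.6, "c2": 0.4}         # P(C = c)
likelihood = {"c1": 0.05, "c2": 0.20}  # P(X = x | C = c)

def map_classify(prior, likelihood):
    # argmax over classes of the unnormalized posterior P(x|c) P(c)
    return max(prior, key=lambda c: likelihood[c] * prior[c])

print(map_classify(prior, likelihood))  # c2: 0.20*0.4 = 0.08 > 0.05*0.6 = 0.03
```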
Naïve Bayes
• Applying Bayes' rule directly requires considering every combination of the data: learning the joint probability P(X1, …, Xn|C) is difficult
  • 10 binary features → 2^10 possible inputs
• Thus, assume that all input features are conditionally independent given the class → the naïve Bayes rule
  • The conditional probability of each feature is assumed independent of the other features
  • Number of conditional probabilities needed: 2^n → 2n
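The 2^n → 2n saving can be made concrete with a quick computation (n = 10, as in the slide):

```python
# Parameter-count comparison for n binary features:
# a full joint table needs ~2**n entries per class, while the naive
# Bayes factorization needs only 2*n (one table row per feature value).
n = 10
joint_entries = 2 ** n  # 1024
naive_entries = 2 * n   # 20
print(joint_entries, naive_entries)  # 1024 20
```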
Naïve Bayes
• Naïve Bayes
  • By the probability chain rule and the conditional-independence assumption:
    P(X1, X2, …, Xn|C) = P(X1|X2, …, Xn, C) P(X2, …, Xn|C)
                       = P(X1|C) P(X2, …, Xn|C)
                       = P(X1|C) P(X2|C) … P(Xn|C)
                       = ∏i P(Xi|C)
  • MAP classification rule: for x = (x1, x2, …, xn), assign x to c* if
    [P(x1|c*) … P(xn|c*)] P(c*) > [P(x1|c) … P(xn|c)] P(c),  c ≠ c*, c = c1, …, cL
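The factored score P(c) ∏i P(xi|c) translates directly into code. The sketch below uses a single hypothetical feature table as a stand-in for learned estimates, not the full model:

```python
from fractions import Fraction as F
from math import prod

# Hypothetical learned estimates (illustrative only).
prior = {"yes": F(9, 14), "no": F(5, 14)}          # P(C = c)
cond = {"wind": {("strong", "yes"): F(3, 9),       # P(X_i = v | C = c)
                 ("strong", "no"): F(3, 5)}}

def nb_score(x, c):
    # unnormalized posterior under conditional independence:
    # P(c) * prod_i P(x_i | c)
    return prior[c] * prod(cond[f][(v, c)] for f, v in x.items())

x = {"wind": "strong"}
print(nb_score(x, "yes"))  # (9/14) * (3/9) = 3/14
```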
Example
• Example: Play Tennis
Example
• Learning Phase

Outlook      Play=Yes  Play=No
Sunny        2/9       3/5
Overcast     4/9       0/5
Rain         3/9       2/5

Temperature  Play=Yes  Play=No
Hot          2/9       2/5
Mild         4/9       2/5
Cool         3/9       1/5

Humidity     Play=Yes  Play=No
High         3/9       4/5
Normal       6/9       1/5

Wind         Play=Yes  Play=No
Strong       3/9       3/5
Weak         6/9       2/5

P(Play=Yes) = 9/14   P(Play=No) = 5/14
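The learned tables can be stored and sanity-checked in a few lines. The sketch below encodes the fractions above and verifies that each class column forms a proper distribution:

```python
from fractions import Fraction as F

# Conditional tables from the Learning Phase: value -> (P(.|Yes), P(.|No)).
outlook     = {"Sunny": (F(2, 9), F(3, 5)), "Overcast": (F(4, 9), F(0, 5)),
               "Rain": (F(3, 9), F(2, 5))}
temperature = {"Hot": (F(2, 9), F(2, 5)), "Mild": (F(4, 9), F(2, 5)),
               "Cool": (F(3, 9), F(1, 5))}
humidity    = {"High": (F(3, 9), F(4, 5)), "Normal": (F(6, 9), F(1, 5))}
wind        = {"Strong": (F(3, 9), F(3, 5)), "Weak": (F(6, 9), F(2, 5))}
priors      = {"Yes": F(9, 14), "No": F(5, 14)}

# Sanity check: each class column of each table sums to 1, as do the priors.
for table in (outlook, temperature, humidity, wind):
    assert sum(p_yes for p_yes, _ in table.values()) == 1
    assert sum(p_no for _, p_no in table.values()) == 1
assert sum(priors.values()) == 1
```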
Example
• Test Phase
  • Given a new instance, predict its label
    x' = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
  • Look up the tables obtained in the learning phase:
    P(Outlook=Sunny|Play=Yes) = 2/9       P(Outlook=Sunny|Play=No) = 3/5
    P(Temperature=Cool|Play=Yes) = 3/9    P(Temperature=Cool|Play=No) = 1/5
    P(Humidity=High|Play=Yes) = 3/9       P(Humidity=High|Play=No) = 4/5
    P(Wind=Strong|Play=Yes) = 3/9         P(Wind=Strong|Play=No) = 3/5
    P(Play=Yes) = 9/14                    P(Play=No) = 5/14
  • Decision making with the MAP rule:
    P(Yes|x') ∝ [P(Sunny|Yes)P(Cool|Yes)P(High|Yes)P(Strong|Yes)]P(Play=Yes) = 0.0053
    P(No|x') ∝ [P(Sunny|No)P(Cool|No)P(High|No)P(Strong|No)]P(Play=No) = 0.0206
    Since P(Yes|x') < P(No|x'), we label x' as "No".
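The two posterior scores can be reproduced exactly from the looked-up fractions:

```python
from fractions import Fraction as F
from math import prod

# Test-phase scores for
# x' = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong):
# multiply the four conditional probabilities and the class prior.
score_yes = prod([F(2, 9), F(3, 9), F(3, 9), F(3, 9), F(9, 14)])
score_no  = prod([F(3, 5), F(1, 5), F(4, 5), F(3, 5), F(5, 14)])

print(round(float(score_yes), 4), round(float(score_no), 4))  # 0.0053 0.0206
assert score_no > score_yes  # MAP decision: label x' as "No"
```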
References
• Naïve Bayes Classifier - Ke Chen
• Advanced Algorithm (Naïve Bayes Classifier) - Leeck
• Machine Learning and Its Applications - Harksoo Kim
• Wikipedia
• http://www.leesanghyun.co.kr/Naive_Bayesian_Classifier
• http://darkpgmr.tistory.com/62
QA
Thank you.
๋ฐ์ฒ์, ๋ฐ์ฐฌ๋ฏผ, ์ต์ฌํ
Kangwon National University
Email: [email protected]