Machine Learning, Chapter 6 (CSE 574, Spring 2003)
Bayes Theorem and Concept Learning (6.3)
• Bayes theorem allows calculating the a posteriori probability of each hypothesis (classifier) given the observation and the training data
• This forms the basis for a straightforward learning algorithm
• Brute force Bayesian concept learning algorithm
Example: Two categories, one binary-valued attribute
• Best hypothesis:
  • The most probable hypothesis in hypothesis space H given the training data D
• Bayes Theorem:
  • A method to calculate the posterior probability P(h|D) of h from the prior probability P(h) together with P(D) and P(D|h)
$$P(h|D) = \frac{P(D|h)\,P(h)}{P(D)}$$
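As a quick numeric illustration (my own example, not from the slides), the theorem is a one-line computation:

```python
# Bayes theorem as a one-line computation. The probabilities below are
# made-up illustrative values, not taken from the course material.

def posterior(p_d_given_h, p_h, p_d):
    """P(h|D) = P(D|h) * P(h) / P(D)."""
    return p_d_given_h * p_h / p_d

# Prior P(h) = 0.3, likelihood P(D|h) = 0.9, evidence P(D) = 0.5
print(posterior(0.9, 0.3, 0.5))  # 0.54
```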
Maximum A Posteriori Probability (MAP) hypothesis
• A maximally probable hypothesis is called a maximum a posteriori (MAP) hypothesis
• Can use Bayes to calculate posterior probability of each candidate hypothesis
• hMAP is a MAP hypothesis provided
$$h_{MAP} \equiv \arg\max_{h \in H} P(h|D) = \arg\max_{h \in H} \frac{P(D|h)\,P(h)}{P(D)} = \arg\max_{h \in H} P(D|h)\,P(h)$$

• The final step drops P(D) because it is a constant independent of h
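A minimal sketch of this argmax, assuming each candidate hypothesis is given as a (name, prior, likelihood) tuple; the representation and the numbers are mine, not the slides':

```python
# Selecting the MAP hypothesis by maximizing P(D|h) * P(h); P(D) is
# omitted since it is the same for every h. Numbers are illustrative.

def h_map(hypotheses):
    """hypotheses: list of (name, p_h, p_d_given_h) tuples."""
    return max(hypotheses, key=lambda t: t[2] * t[1])

candidates = [("h1", 0.5, 0.2), ("h2", 0.3, 0.9), ("h3", 0.2, 0.4)]
print(h_map(candidates))  # ('h2', 0.3, 0.9): 0.27 beats 0.10 and 0.08
```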
Maximum Likelihood Hypothesis
• P(D|h) is called the likelihood of the data D given h
• If every hypothesis in H is equally probable a priori (P(hi) = P(hj) for all hi and hj in H), the prior can be dropped from the MAP rule
• Any hypothesis that maximizes P(D|h) is called a maximum likelihood (ML) hypothesis, hML
$$h_{ML} \equiv \arg\max_{h \in H} P(D|h)$$
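A quick check (illustrative values, my own) that MAP and ML pick the same hypothesis under a uniform prior:

```python
# Under a uniform prior, argmax P(D|h) * P(h) and argmax P(D|h) coincide,
# since P(h) is the same constant for every h. Illustrative values only.

def h_map(hypotheses):
    return max(hypotheses, key=lambda t: t[1] * t[2])  # argmax P(h) * P(D|h)

def h_ml(hypotheses):
    return max(hypotheses, key=lambda t: t[2])         # argmax P(D|h)

uniform = [("h1", 1/3, 0.2), ("h2", 1/3, 0.9), ("h3", 1/3, 0.4)]
assert h_ml(uniform)[0] == h_map(uniform)[0] == "h2"
```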
Brute-Force Bayes Concept Learning (6.3.1)
• Finite hypothesis space H
• To learn a target concept c: X → {0, 1}
• Training examples:
  • <<x1, d1>, <x2, d2>, …, <xm, dm>>
  • where xi is an instance from X
  • di is the target value of xi, i.e., di = c(xi)
  • To simplify notation, D = <d1, …, dm>
Brute-Force Bayes Concept Learning (6.3.1)
Brute-Force MAP Learning Algorithm
• For each hypothesis h in H, calculate the posterior probability

$$P(h|D) = \frac{P(D|h)\,P(h)}{P(D)}$$

• Output the hypothesis hMAP with the highest posterior probability

$$h_{MAP} \equiv \arg\max_{h \in H} P(h|D)$$

• Need to calculate P(h|D) for every hypothesis: impractical for large hypothesis spaces!
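A runnable sketch of the brute-force learner under the assumptions introduced on the following slides (uniform prior, noise-free data); representing hypotheses as Python predicates is my assumption, not the slides':

```python
# Brute-force MAP learning: score every h in H by P(D|h) * P(h), then
# normalize by P(D). Uses the slides' later assumptions: uniform prior
# P(h) = 1/|H|, and P(D|h) = 1 if h is consistent with D, else 0.

def brute_force_map(H, examples):
    """H: list of hypotheses (functions x -> 0 or 1); examples: (x, d) pairs."""
    prior = 1.0 / len(H)
    scores = [prior * all(h(x) == d for x, d in examples) for h in H]
    evidence = sum(scores)                      # P(D), by total probability
    best = max(range(len(H)), key=lambda i: scores[i])
    return H[best], scores[best] / evidence     # (h_MAP, P(h_MAP | D))

# All four concepts over one binary attribute
H = [lambda x: 0, lambda x: 1, lambda x: x, lambda x: 1 - x]
h, p = brute_force_map(H, [(0, 0), (1, 1)])
print(h(0), h(1), p)  # 0 1 1.0 -- only the identity concept is consistent
```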
Choice of P(h) and P(D|h): Assumptions
• The training data D is noise-free (i.e., di = c(xi))
• The target concept c is contained in the hypothesis space H
• We have no a priori reason to believe that any hypothesis is more probable than another
Choice of P(h) Given Assumptions
• Given no prior knowledge that one hypothesis (classifier) is more likely than another, same probability is assigned to every hypothesis h in H
• Since target concept is assumed to be contained in H, the prior probabilities should sum to 1
• We should therefore choose, for all h in H:

$$P(h) = \frac{1}{|H|}$$
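A trivial sanity check (my own) that this uniform prior is a valid distribution:

```python
# A uniform prior over a finite H assigns 1/|H| to each hypothesis,
# so the priors sum to 1 as required. |H| = 4 is an arbitrary choice.
H_size = 4
priors = [1.0 / H_size] * H_size
assert abs(sum(priors) - 1.0) < 1e-12
```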
Choice of P(D|h) Given Assumptions
• Probability of observing the target values D = <d1, …, dm> for the fixed set of instances <x1, …, xm>, given a world in which hypothesis h holds (i.e., h is the correct description of the target concept c)
• Assuming noise-free training data
• i.e., the probability of the data D given hypothesis h is 1 if D is consistent with h, and 0 otherwise
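Written out as an equation (implied by the bullets above):

$$P(D|h) = \begin{cases} 1 & \text{if } d_i = h(x_i) \text{ for all } d_i \text{ in } D \\ 0 & \text{otherwise} \end{cases}$$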