CHAPTER 6: Naive Bayes Models for Classification
Transcript
Page 1: CHAPTER 6: Naive Bayes Models for Classification

Page 2: QUESTION????

Page 3: Bayes’ Rule in Bayes Nets

Page 4: Combining Evidence

Page 5: General Naïve Bayes

Page 6: Modeling with Naïve Bayes

Page 7: What to do with Naïve Bayes?

1. Create a Naïve Bayes model:
   – We need local probability estimates.
   – We could elicit them from a human.
   – Better: estimate them from observations! This is called parameter estimation, or more generally, learning.

2. Use a Naïve Bayes model to estimate the probability of causes given observations of effects:
   – This is a specific kind of probabilistic inference.
   – It requires just a simple computation (next slide; sketched below).
   – From this we can also get the most likely cause, which is called prediction, or classification.
   – These are the basic tasks of machine learning!
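The simple computation in step 2 is just: multiply the prior by each evidence likelihood and normalize over the classes. A minimal sketch in Python, assuming tiny hand-set tables (the class names and probability values are invented for illustration, not taken from the slides):

```python
# Naive Bayes inference: P(C | e_1, ..., e_n) is proportional to P(C) * prod_i P(e_i | C).
# All numbers below are made up for illustration.

prior = {"spam": 0.5, "ham": 0.5}               # P(C)
likelihood = {                                   # P(word appears | C)
    "spam": {"free": 0.30, "meeting": 0.02},
    "ham":  {"free": 0.01, "meeting": 0.20},
}

def posterior(words):
    """Return P(C | observed words), by normalizing P(C) * prod P(w | C)."""
    scores = {c: prior[c] for c in prior}
    for c in prior:
        for w in words:
            scores[c] *= likelihood[c][w]
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}

post = posterior(["free"])
print(post)                                      # e.g. {'spam': 0.967..., 'ham': 0.032...}
print(max(post, key=post.get))                   # most likely cause = the classification
```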

Page 8: Building Naïve Bayes Models

• What do we need to specify a Bayesian network?
  – A directed acyclic graph (DAG)
  – Conditional probability tables (CPTs)

• How do we build a Naïve Bayes model?
  – We know the graph structure already (why?).
  – We need estimates of the local conditional probability tables (CPTs): P(C), the prior over causes, and P(E|C) for each evidence variable.
  – These typically come from observed data.
  – These probabilities are collectively called the parameters of the model and denoted by θ. A sketch of estimating them from data follows below.
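A minimal sketch of where those parameters could come from, assuming a tiny invented training set and plain relative-frequency (maximum likelihood) estimation; smoothing, which later slides cover, is deliberately left out here:

```python
from collections import Counter, defaultdict

# Invented toy training data: (label, words observed in the email).
data = [
    ("spam", ["free", "money"]),
    ("spam", ["free", "offer"]),
    ("ham",  ["meeting", "tomorrow"]),
]

# P(C): relative frequency of each label.
label_counts = Counter(label for label, _ in data)
total = sum(label_counts.values())
prior = {c: n / total for c, n in label_counts.items()}

# P(E = w | C): relative frequency of word w among all word occurrences with label C.
word_counts = defaultdict(Counter)
for label, words in data:
    word_counts[label].update(words)

likelihood = {
    c: {w: n / sum(counts.values()) for w, n in counts.items()}
    for c, counts in word_counts.items()
}

print(prior)                        # {'spam': 0.666..., 'ham': 0.333...}
print(likelihood["spam"]["free"])   # 2 / 4 = 0.5
```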

Page 9: Review: Parameter Estimation

Page 10: A Spam Filter

Page 11: Baselines

• First task: get a baseline.
  – Baselines are very simple “straw man” procedures.
  – They help determine how hard the task is and what counts as a “good” accuracy.

• Weak baseline: the most-frequent-label classifier.
  – It gives all test instances whatever label was most common in the training set.
  – E.g. for spam filtering, it might label everything as ham.
  – Its accuracy might be very high if the problem is skewed (see the sketch below).

• For real research, previous work is usually used as a (strong) baseline.
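A minimal sketch of the weak baseline above, with invented labels; note that it never looks at the input at all:

```python
from collections import Counter

# Invented labels; in this skewed example, "ham" dominates the training set.
train_labels = ["ham", "ham", "ham", "ham", "spam"]
test_labels  = ["ham", "ham", "spam", "ham"]

# Weak baseline: always predict the most common training label.
most_common = Counter(train_labels).most_common(1)[0][0]
predictions = [most_common for _ in test_labels]

accuracy = sum(p == y for p, y in zip(predictions, test_labels)) / len(test_labels)
print(most_common, accuracy)   # 'ham', 0.75 -- high only because the data is skewed
```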

Page 12: Naïve Bayes for Text

Page 13: Example: Spam Filtering

• Raw probabilities don’t affect the posteriors; relative probabilities (odds ratios) do.
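One way to read that: the posterior is normalized over the classes, so scaling every per-word likelihood in both classes by the same constant changes the raw probabilities but not the posterior; what does matter is the odds ratio P(w|spam)/P(w|ham) for each word. A small check with invented numbers:

```python
def posterior_spam(prior_spam, spam_likes, ham_likes):
    """P(spam | words) for a two-class model, given per-word likelihoods."""
    s = prior_spam
    h = 1.0 - prior_spam
    for ps, ph in zip(spam_likes, ham_likes):
        s *= ps
        h *= ph
    return s / (s + h)

# Invented per-word likelihoods for one email.
spam_likes = [0.30, 0.02]
ham_likes  = [0.01, 0.20]

p1 = posterior_spam(0.5, spam_likes, ham_likes)
# Scale every likelihood (in both classes) by 10x: the raw probabilities change,
# but the odds ratios P(w|spam)/P(w|ham) do not -- and neither does the posterior.
p2 = posterior_spam(0.5, [10 * p for p in spam_likes], [10 * p for p in ham_likes])
print(p1, p2)   # identical values (0.75 and 0.75)
```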

Page 14: Generalization and Overfitting

• Relative-frequency parameters will overfit the training data!
  – It is unlikely that every occurrence of “minute” is 100% spam.
  – It is unlikely that every occurrence of “seriously” is 100% ham.
  – What about the words that don’t occur in the training set? In general, we can’t go around giving unseen events zero probability (see the sketch below).

• As an extreme case, imagine using the entire email as the only feature:
  – It would get the training data perfect (if the labeling is deterministic).
  – It wouldn’t generalize at all.
  – Just making the bag-of-words assumption gives us some generalization, but it isn’t enough.

• To generalize better, we need to smooth or regularize the estimates.
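The zero-probability problem is easy to see concretely: with relative-frequency estimates, a single test word that never co-occurred with a class in training zeroes out that class’s entire product, no matter how strong the rest of the evidence is. A tiny illustration with invented numbers:

```python
# P(word | spam) estimated by pure relative frequency on the training set;
# "prize" is a word that never occurred in any training spam (invented example).
spam_word_prob = {"free": 0.5, "money": 0.3, "prize": 0.0}

score = 0.5                        # prior P(spam)
for w in ["free", "money", "prize"]:
    score *= spam_word_prob[w]
print(score)                       # 0.0 -- one unseen word wipes out all the other evidence
```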

Page 15: Estimation: Smoothing

• Problems with maximum likelihood estimates:
  – If I flip a coin once and it’s heads, what’s the estimate for P(heads)?
  – What if I flip it 50 times with 27 heads?
  – What if I flip it 10M times with 8M heads?

• Basic idea:
  – We have some prior expectation about parameters (here, the probability of heads).
  – Given little evidence, we should skew towards the prior.
  – Given a lot of evidence, we should listen to the data.
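A sketch of how add-one (Laplace) smoothing, which the next slides cover, behaves on exactly that coin example compared with the maximum likelihood (relative frequency) estimate; add-one is one standard choice, used here only for illustration:

```python
def mle(heads, flips):
    """Maximum likelihood (relative frequency) estimate of P(heads)."""
    return heads / flips

def laplace(heads, flips, k=1):
    """Add-k (Laplace) smoothed estimate: pretend k extra heads and k extra tails were seen."""
    return (heads + k) / (flips + 2 * k)

for heads, flips in [(1, 1), (27, 50), (8_000_000, 10_000_000)]:
    print(f"{flips:>10} flips: MLE={mle(heads, flips):.3f}  Laplace={laplace(heads, flips):.3f}")

# Output:
#          1 flips: MLE=1.000  Laplace=0.667   <- little evidence: pulled toward the uniform (0.5) prior
#         50 flips: MLE=0.540  Laplace=0.538   <- mostly follows the data
#   10000000 flips: MLE=0.800  Laplace=0.800   <- the data dominates
```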

Page 16: Estimation: Smoothing

Page 17: Estimation: Laplace Smoothing

Page 18: Estimation: Laplace Smoothing

Page 19: Estimation: Linear Interpolation

Page 20: Real NB: Smoothing

Page 21: Spam Example