CLASSIFICATION – NAIVE BAYES NIKOLA MILIKIĆ [email protected] UROŠ KRČADINAC [email protected]
CLASSIFICATION – NAIVE BAYESai.fon.bg.ac.rs/wp-content/uploads/2015/04/ML...Eatable Mushrooms dataset based on “National Audubon Society Field Guide to North American Mushrooms”

Jun 19, 2020

Page 1:

CLASSIFICATION – NAIVE BAYES

NIKOLA MILIKIĆ [email protected]

UROŠ KRČADINAC [email protected]

Page 2:

WHAT IS CLASSIFICATION?

§  A supervised learning task of determining the class of an instance; it is assumed that:
   §  feature values for the given instance are known
   §  the set of possible classes is known and given

§  Classes are given as nominal values; for instance:
   §  classification of email messages: spam, not-spam
   §  classification of news articles: politics, sport, culture, etc.

Page 3:

ToPlayOtNotToPlay.arff dataset

Example 1

Page 4:

Suppose you know that it is sunny outside

Then there is a 60% chance that Play = no

Sunny weather
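
The 60% figure is a conditional relative frequency: among the sunny days, the share on which Play = no. A minimal sketch, assuming the sunny-day counts of the standard 14-instance weather dataset (3 days with Play = no, 2 with Play = yes); the table itself is not reproduced in this transcript, and the names `sunny_counts` and `conditional_probability` are illustrative only.

```python
# Counts of Play among the sunny days (assumed from the standard weather dataset).
sunny_counts = {"no": 3, "yes": 2}


def conditional_probability(counts, label):
    """Relative frequency of `label` among the counted instances."""
    return counts[label] / sum(counts.values())


p_no_given_sunny = conditional_probability(sunny_counts, "no")
print(f"P(Play = no | Outlook = sunny) = {p_no_given_sunny:.0%}")
```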

Page 5:

How well does outlook predict play?

Page 6:

For each attribute…

How well does outlook predict play?

Page 7:

Convert values to ratios

2 occurrences of Play = no where Outlook = rainy, out of 5 occurrences of Play = no (2/5 = 0.40)

Values to ratios
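
The conversion from counts to ratios can be sketched as follows. The rows are an assumption: they are the standard 14-instance weather dataset, which agrees with the ratios quoted on these slides, but the original table is an image that did not survive in this transcript. The function name `outlook_ratios` is hypothetical.

```python
from collections import Counter

# Assumed data: (outlook, temperature, humidity, windy, play).
ROWS = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]


def outlook_ratios(rows):
    """Return P(Outlook = value | Play = label) as relative frequencies."""
    class_counts = Counter(row[-1] for row in rows)
    pair_counts = Counter((row[0], row[-1]) for row in rows)
    return {
        (value, label): count / class_counts[label]
        for (value, label), count in pair_counts.items()
    }


ratios = outlook_ratios(ROWS)
# 2 of the 5 Play = no days are rainy.
print(round(ratios[("rainy", "no")], 2))
```

Note that the pair (overcast, no) never occurs in the data, so it gets no entry at all here; that is the zero-count case the Laplace estimator addresses later.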

Page 8:

0.22 x 0.33 x 0.33 x 0.33 x 0.64 = 0.0053

Calculate the likelihood that:
   Outlook = sunny (0.22)
   Temperature = cool (0.33)
   Humidity = high (0.33)
   Windy = true (0.33)
   Play = yes (0.64)

Likelihood of playing under these weather conditions

Page 9:

0.60 x 0.20 x 0.80 x 0.60 x 0.36 = 0.0206

Calculate the likelihood that:
   Outlook = sunny (0.60)
   Temperature = cool (0.20)
   Humidity = high (0.80)
   Windy = true (0.60)
   Play = no (0.36)

Likelihood of NOT playing under these weather conditions
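
Both likelihoods are plain products of the per-attribute ratios and the class prior. A short sketch, assuming the exact fractions behind the rounded values on the slides (for example 2/9 for 0.22 and 3/5 for 0.60); multiplying the rounded decimals instead would give slightly different results.

```python
from math import prod

# Fractions assumed to underlie the rounded ratios on the slides;
# the last factor in each list is the class prior.
factors = {
    "yes": [2 / 9, 3 / 9, 3 / 9, 3 / 9, 9 / 14],  # sunny, cool, high, windy, prior
    "no": [3 / 5, 1 / 5, 4 / 5, 3 / 5, 5 / 14],   # sunny, cool, high, windy, prior
}

likelihoods = {label: prod(values) for label, values in factors.items()}
for label, value in likelihoods.items():
    print(f"Play = {label}: {value:.4f}")
```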

Page 10:

Given these weather conditions:
   Outlook = sunny
   Temperature = cool
   Humidity = high
   Windy = true

Probability of Play = yes: 0.0053 / (0.0053 + 0.0206) = 20.5%

Probability of Play = no: 0.0206 / (0.0053 + 0.0206) = 79.5%

The Bayes Theorem
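
The two likelihoods do not sum to 1, so each is divided by their sum; this is the normalizing denominator of Bayes' theorem, P(class | evidence) = P(evidence | class) P(class) / P(evidence). A minimal sketch using the rounded likelihoods from the slides; the helper name `normalize` is illustrative.

```python
def normalize(likelihoods):
    """Scale class likelihoods so that they sum to 1."""
    total = sum(likelihoods.values())
    return {label: value / total for label, value in likelihoods.items()}


# Likelihoods as rounded on the slides.
posterior = normalize({"yes": 0.0053, "no": 0.0206})
print({label: f"{p:.1%}" for label, p in posterior.items()})
```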

Page 11:

0.00 x 0.20 x 0.80 x 0.60 x 0.36 = 0.0000

Calculate the likelihood that:
   Outlook = overcast (0.00)
   Temperature = cool (0.20)
   Humidity = high (0.80)
   Windy = true (0.60)
   Play = no (0.36)

Likelihood of NOT playing under these weather conditions

Page 12:

Laplace estimator

Laplace estimator: Add 1 to each count

The original dataset

After the Laplace estimator

Page 13:

Laplace estimator

Convert incremented counts to ratios after implementing the Laplace estimator
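
Adding 1 to each count before converting to ratios can be sketched as below. The Outlook counts for Play = no (3 sunny, 0 overcast, 2 rainy) are assumed from the standard weather dataset, since the count tables on these slides are images; `laplace_ratios` is a hypothetical helper name.

```python
def laplace_ratios(counts):
    """Add 1 to every count, then convert the incremented counts to ratios."""
    smoothed = {value: count + 1 for value, count in counts.items()}
    total = sum(smoothed.values())
    return {value: count / total for value, count in smoothed.items()}


# Outlook counts among the 5 Play = no days (assumed).
outlook_given_no = {"sunny": 3, "overcast": 0, "rainy": 2}
print(laplace_ratios(outlook_given_no))
```

The overcast ratio becomes 1/8 instead of 0, so a single unseen value no longer forces the whole product to zero.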

Page 14:

Laplace estimator

Outlook = overcast, Temperature = cool, Humidity = high, Windy = true

Play = no: 0.13 x 0.25 x 0.71 x 0.57 x 0.36 = 0.0046
Play = yes: 0.42 x 0.33 x 0.36 x 0.36 x 0.64 = 0.0118

Probability of Play = no: 0.0046 / (0.0046 + 0.0118) = 28%

Probability of Play = yes: 0.0118 / (0.0046 + 0.0118) = 72%

Page 15:

Laplace estimator

Under these weather conditions:
   Outlook = sunny
   Temperature = cool
   Humidity = high
   Windy = true

NOT using Laplace estimator:
   Play = no: 79.5%
   Play = yes: 20.5%

Using Laplace estimator:
   Play = no: 72.0%
   Play = yes: 28.0%

The Laplace estimator has little effect as the sample size grows.
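
A small numeric illustration of that claim, under a hypothetical setup: the 3-in-5 proportion is kept fixed while the sample is scaled up, and the gap between the raw ratio and the add-one ratio is measured. The scaling factors and function names are assumptions made for the example.

```python
def raw_ratio(count, total):
    """Plain relative frequency."""
    return count / total


def laplace_ratio(count, total, num_values):
    """Add-one estimate: each of the `num_values` attribute values gets one extra count."""
    return (count + 1) / (total + num_values)


# Hypothetical scaling: same 3-in-5 proportion, larger and larger samples.
gaps = []
for scale in (1, 10, 100, 1000):
    count, total = 3 * scale, 5 * scale
    gap = abs(laplace_ratio(count, total, 3) - raw_ratio(count, total))
    gaps.append(gap)
    print(f"n = {total:>5}: gap = {gap:.4f}")
```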

Page 16:

Prediction rules

Repeat the previous calculation for all other combinations of weather conditions.

Calculate the rules for each pair.

Then throw out the rules with p < 0.5

Page 17:

Prediction rules

Calculate probabilities for all 36 combinations
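
The 36 combinations come from 3 x 3 x 2 x 2 attribute values. The sketch below enumerates them and keeps, for each, the class whose probability is at least 0.5. It is an illustration rather than the authors' procedure: the counts are assumed from the standard weather dataset, add-one smoothing is applied to attribute counts only (the class prior is left unsmoothed, as in the slides' figures), and all names are hypothetical.

```python
from itertools import product
from math import prod

# Assumed counts: COUNTS[attribute][value] = (count with Play = yes, count with Play = no)
COUNTS = {
    "Outlook": {"sunny": (2, 3), "overcast": (4, 0), "rainy": (3, 2)},
    "Temperature": {"hot": (2, 2), "mild": (4, 2), "cool": (3, 1)},
    "Humidity": {"high": (3, 4), "normal": (6, 1)},
    "Windy": {"false": (6, 2), "true": (3, 3)},
}
CLASS_COUNTS = {"yes": 9, "no": 5}


def posterior(instance):
    """Naive Bayes posterior with add-one smoothing of the attribute counts."""
    scores = {}
    for index, label in enumerate(("yes", "no")):
        prior = CLASS_COUNTS[label] / sum(CLASS_COUNTS.values())
        ratios = [
            (COUNTS[attr][value][index] + 1) / (CLASS_COUNTS[label] + len(COUNTS[attr]))
            for attr, value in zip(COUNTS, instance)
        ]
        scores[label] = prior * prod(ratios)
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


rules = {}
for instance in product(*COUNTS.values()):
    probs = posterior(instance)
    label = max(probs, key=probs.get)  # with two classes this is the one with p >= 0.5
    rules[instance] = (label, probs[label])

print(len(rules))
print(rules[("sunny", "cool", "high", "true")])
```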

Page 18:

Prediction rules

Rules predicting class for all combinations of attributes

Instance 6 is missing

Page 19:

Comparing the prediction with the original data

Page 20:

•  Waikato Environment for Knowledge Analysis

•  Java Software for data mining

•  Set of algorithms for machine learning and data mining

•  Developed at the University of Waikato, New Zealand

•  Open-source

•  Website: http://www.cs.waikato.ac.nz/ml/weka

Weka

Page 21:

§  We use datasets from the Technology Forge:

http://www.technologyforge.net/Datasets

Datasets we use

Page 22:

§  Attribute-Relation File Format – ARFF

§  Text file

   @relation TPONTPNom

   @attribute Outlook {sunny, overcast, rainy}
   @attribute Temp. {hot, mild, cool}
   @attribute Humidity {high, normal}
   @attribute Windy {'false', 'true'}
   @attribute Play {no, yes}

   @data
   sunny, hot, high, 'false', no
   sunny, hot, high, 'true', no
   overcast, hot, high, 'false', yes
   ...

ARFF file

Attributes could be:
•  Numerical
•  Nominal
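
To make the file layout concrete, here is a rough sketch of a reader for the small ARFF subset shown on this slide: a relation name, nominal attributes, and comma-separated data rows. It is not a full ARFF parser (no numeric, string, date, or sparse support, and attribute names must not contain spaces), and `parse_nominal_arff` is a hypothetical name; in practice Weka reads these files itself.

```python
def parse_nominal_arff(text):
    """Parse a minimal ARFF subset: @relation, nominal @attribute lines, @data rows."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("%"):  # skip blank lines and ARFF comments
            continue
        lowered = line.lower()
        if in_data:
            rows.append([cell.strip().strip("'") for cell in line.split(",")])
        elif lowered.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lowered.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            values = [v.strip().strip("'") for v in spec.strip("{}").split(",")]
            attributes.append((name, values))
        elif lowered.startswith("@data"):
            in_data = True
    return relation, attributes, rows


SAMPLE = """\
@relation TPONTPNom

@attribute Outlook {sunny, overcast, rainy}
@attribute Temp. {hot, mild, cool}
@attribute Humidity {high, normal}
@attribute Windy {'false', 'true'}
@attribute Play {no, yes}

@data
sunny, hot, high, 'false', no
sunny, hot, high, 'true', no
overcast, hot, high, 'false', yes
"""

relation, attributes, rows = parse_nominal_arff(SAMPLE)
print(relation, [name for name, _ in attributes], len(rows))
```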

Page 23:

Classification in Weka

ToPlayOtNotToPlay.arff dataset

Page 24:

Classification results

The Laplace estimator is automatically applied

Page 25:

Classification results

Instance 6 is marked as an incorrectly classified instance

Probability of each instance in the dataset

Page 26:

Precision, Recall, and F-Measure

True Positive Rate

False Positive Rate

F-measure = 2 * Precision * Recall / (Precision + Recall)

Precision = TP / (TP + FP)

Recall = TP / (TP + FN)

Page 27:

Confusion Matrix

TP = True Positive

FP = False Positive

TN = True Negative

FN = False Negative
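
The measures above follow directly from the four confusion-matrix cells. A minimal sketch with hypothetical counts (9 true positives, 1 false positive, 0 false negatives, 4 true negatives), chosen only to illustrate the arithmetic and not taken from the Weka output on the slides:

```python
def precision(tp, fp):
    """Share of predicted positives that are actually positive."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of actual positives that were found (also the true positive rate)."""
    return tp / (tp + fn)


def false_positive_rate(fp, tn):
    """Share of actual negatives that were wrongly predicted positive."""
    return fp / (fp + tn)


def f_measure(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Hypothetical confusion-matrix counts for one class.
tp, fp, fn, tn = 9, 1, 0, 4
p, r = precision(tp, fp), recall(tp, fn)
print(f"precision = {p:.3f}, recall = {r:.3f}, F-measure = {f_measure(p, r):.3f}")
print(f"FP rate = {false_positive_rate(fp, tn):.3f}")
```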

Page 28:

Example 2 – Eatable Mushrooms dataset

§  Eatable Mushrooms dataset based on “National Audubon Society Field Guide to North American Mushrooms”

§  Hypothetical samples with descriptions corresponding to 23 species of mushrooms

§  There are 8124 instances with 22 nominal attributes that describe mushroom characteristics, one of which is whether a mushroom is eatable or not

§  Our goal is to predict whether a mushroom is eatable or not

Page 29:

Weka Tutorials and Assignments @ The Technology Forge

§  Link: http://www.technologyforge.net/WekaTutorials/

Thank you!

Page 30:

(Anonymous) survey for your comments and suggestions

http://goo.gl/cqdp3I

Page 31:

ANY QUESTIONS?

NIKOLA MILIKIĆ [email protected]

UROŠ KRČADINAC [email protected]