CSE6242 / CX4242: Data & Visual Analytics
Visualization for Classification: ROC, AUC, Confusion Matrix
Duen Horng (Polo) Chau, Associate Professor; Associate Director, MS Analytics; Machine Learning Area Leader, College of Computing, Georgia Tech
http://poloclub.gatech.edu/cse6242
Partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John Stasko, Christos Faloutsos, Parishit Ram, Alex Gray

Jul 05, 2020


Page 2

Visualizing Classification Performance: Confusion matrix

https://en.wikipedia.org/wiki/Confusion_matrix

Page 3

http://research.microsoft.com/en-us/um/redmond/groups/cue/publications/CHI2009-EnsembleMatrix.pdf

Hard to spot trends and patterns

Much easier!

Page 4

Very important: Find out what “positive” means

              Predicted Cat   Predicted Dog
Actual Cat          5               3
Actual Dog          2               4
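
To see why the choice of "positive" matters, the sketch below derives the usual counts and rates from the cat/dog matrix above, once with cat as the positive class and once with dog. The function name `binary_metrics` and the dictionary layout are illustrative choices, not something the slides define.

```python
# Confusion matrix from the slide: keys are (actual class, predicted class).
confusion = {
    ("cat", "cat"): 5, ("cat", "dog"): 3,
    ("dog", "cat"): 2, ("dog", "dog"): 4,
}

def binary_metrics(confusion, positive, negative):
    """Derive TP/FN/FP/TN and common rates for a chosen positive class."""
    tp = confusion[(positive, positive)]  # actual positive, predicted positive
    fn = confusion[(positive, negative)]  # actual positive, predicted negative
    fp = confusion[(negative, positive)]  # actual negative, predicted positive (false alarm)
    tn = confusion[(negative, negative)]  # actual negative, predicted negative
    return {
        "tp": tp, "fn": fn, "fp": fp, "tn": tn,
        "tpr": tp / (tp + fn),            # true positive rate
        "fpr": fp / (fp + tn),            # false positive rate
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }

cat_positive = binary_metrics(confusion, positive="cat", negative="dog")
dog_positive = binary_metrics(confusion, positive="dog", negative="cat")
print(cat_positive)
print(dog_positive)
```

Accuracy is the same either way (9 of 14), but the true positive rate and false positive rate change with the choice of positive class, which is why the definition has to be settled first.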

Page 5

Very important: Find out what “positive” means

https://en.wikipedia.org/wiki/Confusion_matrix

“False Alarm” (a false positive) is easy to remember in security applications

Page 6

Visualizing Classification Performance using ROC curve (Receiver Operating Characteristic)

Page 7

Polonium’s ROC Curve
Positive class: malware
Negative class: benign

[Figure: ROC curve. Axes: True Positive Rate (% of bad correctly labeled) versus False Positive Rate, i.e. False Alarms (% of good labeled as bad). Annotations: “85% True Positive Rate, 1% False Alarms” and “Ideal”.]
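
A ROC curve like the one described above is traced by sweeping a decision threshold over the classifier's scores and recording the (false positive rate, true positive rate) pair at each step. The following is a minimal sketch of that sweep; the function name `roc_points` and the labels and scores are made up for illustration and are not Polonium's data.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by sweeping a threshold over the scores.

    labels: 1 for the positive class (e.g. malware), 0 for the negative class (benign).
    scores: higher means "more likely positive".
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Visit examples from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing is flagged
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Flag every example whose score ties with the current threshold.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

# Made-up labels and scores, for illustration only.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
print(curve)
```

Each point is one possible operating point; a statement such as "85% true positive rate at 1% false alarms" picks a single point on the curve.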

Page 8

Measuring Classification Performance using AUC (Area under the curve)

85% True Positive Rate 1% False Alarms

Ideal
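
As a rough illustration of what AUC measures, the sketch below integrates a piecewise-linear ROC curve with the trapezoidal rule. The function name `auc_trapezoid` is an illustrative choice, and `example_curve` is hypothetical: it only passes through the annotated operating point (1% false alarms, 85% true positive rate), and its other points are invented.

```python
def auc_trapezoid(points):
    """Area under a ROC curve given (FPR, TPR) points, using the trapezoidal rule."""
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical curve through the annotated point (0.01, 0.85); other points are made up.
example_curve = [(0.0, 0.0), (0.01, 0.85), (0.10, 0.95), (1.0, 1.0)]
ideal_curve = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]   # perfect separation
chance_curve = [(0.0, 0.0), (1.0, 1.0)]              # random guessing

print(auc_trapezoid(example_curve))
print(auc_trapezoid(ideal_curve))
print(auc_trapezoid(chance_curve))
```

The ideal curve has area 1.0 and the diagonal has area 0.5, so useful classifiers fall between the two.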

Page 9

If a machine learning algorithm achieves 0.9 AUC (out of 1.0), that’s a great algorithm, right?

Page 10

Be Careful with AUC!
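
The transcript does not record which pitfall this slide illustrates. One commonly cited reason for caution, offered here as an assumption, is that AUC summarizes the whole curve, so two classifiers can have the same AUC yet behave very differently in the low-false-alarm region that matters in practice. The curves and helper names below are invented for illustration.

```python
def auc(points):
    """Trapezoidal area under a piecewise-linear ROC curve of (FPR, TPR) points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

def tpr_at(points, fpr):
    """Linearly interpolate the TPR of a piecewise-linear ROC curve at a given FPR."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x1 > x0 and x0 <= fpr <= x1:
            return y0 + (y1 - y0) * (fpr - x0) / (x1 - x0)
    raise ValueError("fpr is outside the curve's range")

# Two made-up curves, each listed in order of increasing FPR.
curve_a = [(0.0, 0.0), (0.1, 0.8), (1.0, 1.0)]              # strong at low FPR
curve_b = [(0.0, 0.0), (0.1, 0.0), (0.2, 1.0), (1.0, 1.0)]  # useless at low FPR

print(auc(curve_a), auc(curve_b))                      # both are about 0.85
print(tpr_at(curve_a, 0.05), tpr_at(curve_b, 0.05))    # about 0.4 versus 0.0
```

With a 5% false-alarm budget, the first curve detects roughly 40% of positives and the second detects none, even though their AUC values match.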

Page 11

Weights in combined models

Bagging / Random forests
• Majority voting

Let people play with the weights?
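
To make the contrast concrete, here is a minimal sketch of majority voting and of the weighted variant that the question above hints at. The function name `weighted_vote` and the example weights are hypothetical; the slides do not specify a weighting scheme.

```python
from collections import defaultdict

def weighted_vote(predictions, weights=None):
    """Combine one predicted label per classifier into a single label.

    With weights=None every classifier counts equally (plain majority voting).
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # The label with the largest total weight wins; ties go to the label seen first.
    return max(totals, key=totals.get)

votes = ["cat", "dog", "dog"]                  # one prediction per classifier
print(weighted_vote(votes))                    # dog: two votes against one
print(weighted_vote(votes, [0.6, 0.2, 0.2]))   # cat: weight 0.6 against 0.4
```

Changing the weights can flip the combined prediction without retraining any individual classifier, which is what makes the weights a natural control to expose to a user.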

Page 12

EnsembleMatrix

http://research.microsoft.com/en-us/um/redmond/groups/cue/publications/CHI2009-EnsembleMatrix.pdf

Page 13

Improving performance

• Adjust the weights of the individual classifiers

• Data partition to separate problem areas
  o Adjust weights just for these individual parts

• Caveat: evaluation used one dataset

http://research.microsoft.com/en-us/um/redmond/groups/cue/publications/CHI2009-EnsembleMatrix.pdf
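
The first bullet can be sketched as a search over blending weights. EnsembleMatrix lets a person adjust the weights interactively; the code below swaps in a simple automated grid search as a stand-in, and the two classifiers, their probabilities, and the labels are all made up.

```python
def combined_accuracy(weight, probs_a, probs_b, labels):
    """Accuracy of the blend weight * A + (1 - weight) * B, thresholded at 0.5."""
    correct = 0
    for pa, pb, y in zip(probs_a, probs_b, labels):
        blended = weight * pa + (1.0 - weight) * pb
        predicted = 1 if blended >= 0.5 else 0
        correct += int(predicted == y)
    return correct / len(labels)

# Made-up positive-class probabilities from two hypothetical classifiers.
probs_a = [0.9, 0.85, 0.4, 0.2, 0.7, 0.3]
probs_b = [0.6, 0.3, 0.7, 0.4, 0.8, 0.1]
labels = [1, 1, 1, 0, 1, 0]

# Try weights 0.0, 0.1, ..., 1.0 and keep the first one with the best accuracy.
candidates = [i / 10 for i in range(11)]
best_weight = max(candidates,
                  key=lambda w: combined_accuracy(w, probs_a, probs_b, labels))

print(best_weight, combined_accuracy(best_weight, probs_a, probs_b, labels))
print(combined_accuracy(1.0, probs_a, probs_b, labels))  # classifier A alone
print(combined_accuracy(0.0, probs_a, probs_b, labels))  # classifier B alone
```

In this toy data each classifier alone misclassifies one example, while a blend corrects both. Weights tuned and scored on the same examples give an optimistic estimate, so a held-out set would be needed for a fair comparison.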