How effective is your classifier? Revisiting the role of metrics in machine learning SANMI KOYEJO CS @ ILLINOIS Joint work with Ran Li, Xiaoyan Wang, Gaurush Hiranandani, Shant Boodaghians, and Ruta Mehta
SANMI KOYEJO CS @ ILLINOIS… · recommender algorithms on top-n recommendation tasks." Recsys, 2010. Metric choice has a large impact on real- ... Fit a logistic regression model

Apr 19, 2020

Page 1

How effective is your classifier?
Revisiting the role of metrics in machine learning
SANMI KOYEJO
CS @ ILLINOIS

Joint work with Ran Li, Xiaoyan Wang, Gaurush Hiranandani, Shant Boodaghians, and Ruta Mehta

Page 2

Image Source: https://davepannell.com/public/2016/03/Email-marketing-vs-spam.jpg

Page 3

■ Users complain that most real emails are labelled spam

■ ~90% of all email is spam*

■ Suggests that accuracy is the wrong metric as it gives equal weight to all errors

■ Accuracy = 95%

■ $$$

Image Source: https://becominghuman.ai/deep-learning-made-easy-with-deep-cognition-403fbe445351
*Source: Symantec circa 2008; https://www.theatlas.com/charts/NJipnKmq

Page 4

Error analysis

■ Accuracy

                        Ground truth
                        Spam    Not Spam
Predicted   Spam        TP      FP
            Not Spam    FN      TN

■ To improve user calibration, try evaluating and/or optimizing weighted accuracy e.g.
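One way to make the weighted-accuracy idea concrete is the sketch below. The specific form used here (correct predictions divided by correct predictions plus weighted errors), the weights `w_fp` and `w_fn`, and the email counts are all assumptions for illustration, not the formula from the slide. The counts are chosen so that plain accuracy is 95% even though most real emails are labelled spam.

```python
def weighted_accuracy(tp, fp, fn, tn, w_fp=1.0, w_fn=1.0):
    """One possible weighted accuracy: correct / (correct + weighted errors).

    With w_fp = w_fn = 1 this reduces to plain accuracy.
    """
    correct = tp + tn
    weighted_errors = w_fp * fp + w_fn * fn
    return correct / (correct + weighted_errors)


# Hypothetical counts for 1000 emails: 920 spam, 80 real.
# 45 of the 80 real emails are wrongly labelled spam (FP).
tp, fp, fn, tn = 915, 45, 5, 35

plain = weighted_accuracy(tp, fp, fn, tn)
# Assume a blocked real email is 10x as costly as a missed spam.
costly_fp = weighted_accuracy(tp, fp, fn, tn, w_fp=10.0)

print(f"plain accuracy:    {plain:.3f}")
print(f"weighted accuracy: {costly_fp:.3f}")
```

The same classifier scores noticeably lower once false positives carry more weight, which is closer to what the complaining users experience.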

Page 5

The confusion matrix

Beyond Accuracy, more general metrics are nested functions

                        Ground truth
                        Y = 1   Y = 0
Predicted   h(x) = 1    TP      FP
            h(x) = 0    FN      TN

■ Metrics are used to compare classifiers, or can be optimized directly

■ The classifier performance metric can be approximated from data.
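A minimal sketch of how a metric can be approximated from data: estimate the confusion matrix entries as sample fractions, then apply the metric as a function of those entries. The labels and predictions below are made up for illustration, and accuracy and F1 are used only as two familiar examples.

```python
def confusion_matrix(y_true, y_pred):
    """Return (TP, FP, FN, TN) as fractions of the sample."""
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, h in pairs if y == 1 and h == 1) / n
    fp = sum(1 for y, h in pairs if y == 0 and h == 1) / n
    fn = sum(1 for y, h in pairs if y == 1 and h == 0) / n
    tn = sum(1 for y, h in pairs if y == 0 and h == 0) / n
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    # Linear in the confusion matrix entries.
    return tp + tn


def f1(tp, fp, fn, tn):
    # A ratio of linear functions of the confusion matrix entries.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0


# Hypothetical labels and classifier outputs.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]

cm = confusion_matrix(y_true, y_pred)
print("confusion (TP, FP, FN, TN):", cm)
print("accuracy:", accuracy(*cm))
print("F1:", f1(*cm))
```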

Page 6

Lots of real world examples

Page 7

Metrics in ranking and recommendation

“Results show that improvements in RMSE often do not translate into [top-N ranking] accuracy improvements. In particular, a naive non-personalized algorithm can outperform some common recommendation approaches and almost match the accuracy of sophisticated algorithms”

P. Cremonesi, Y. Koren, and R. Turrin. "Performance of recommender algorithms on top-n recommendation tasks." Recsys, 2010.

Page 8

Metric choice has a large impact on real-world machine learning performance.

1. Given a complex metric, how can we efficiently construct classifiers that (approximately) optimize it?

2. Given a new classification problem, which metric should you use to measure performance?

Page 9

One simple trick…

A RE-WEIGHTING STRATEGY

Page 10

Multiclass classification

Standard metric is Accuracy

Page 11

Standard Prediction Strategy

e.g. logistic regression, RF, DNN, …

Page 12

Proposed Postprocessing Strategy

e.g. logistic regression, RF, DNN, …

Narasimhan, H., et al. "Consistent multiclass algorithms for complex performance measures." ICML. 2015.
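The two strategies can be contrasted with a small sketch. The standard rule predicts the most probable class. The post-processed rule re-weights the estimated class probabilities before taking the argmax. The gain-matrix form used here (`gain[j][k]` is the reward for predicting class k when the truth is class j) is one common way to write such a rule and is an assumption, as are the probabilities and weights.

```python
def predict_standard(probs):
    """Predict the class with the highest estimated probability."""
    return max(range(len(probs)), key=lambda k: probs[k])


def predict_reweighted(probs, gain):
    """Predict the class with the highest expected gain.

    gain[j][k] is a hypothetical reward for predicting k when the truth is j.
    With the identity matrix this is the standard rule.
    """
    n = len(probs)
    scores = [sum(probs[j] * gain[j][k] for j in range(n)) for k in range(n)]
    return max(range(n), key=lambda k: scores[k])


# Estimated class probabilities from any model (logistic regression, RF, DNN, ...).
probs = [0.6, 0.3, 0.1]

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# Assume correctly catching the rare class 2 is worth 8x as much.
rare_class_gain = [[1, 0, 0], [0, 1, 0], [0, 0, 8]]

print(predict_standard(probs))                      # most probable class
print(predict_reweighted(probs, identity))          # same as the standard rule
print(predict_reweighted(probs, rare_class_gain))   # shifts toward class 2
```

Only the final decision rule changes; the underlying probability model is left untouched.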

Page 13

A small experiment

1. Generate random data from model

2. Fit a logistic regression model

3. Post-process predictions
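The three steps above can be sketched in plain Python. Everything specific here is an assumption for illustration: the data model (one Gaussian feature, rare positives), the gradient-descent settings, and the choice of balanced accuracy as the target metric. For balanced accuracy, re-weighting each class probability by inverse class frequency is equivalent to thresholding at the training prevalence.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sample(n, rng):
    """Step 1: draw (x, y) from a hypothetical logistic model with rare positives."""
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1 if rng.random() < sigmoid(2.0 * x - 3.0) else 0
        data.append((x, y))
    return data


def fit_logistic(data, lr=1.0, steps=500):
    """Step 2: fit p(y=1|x) = sigmoid(w*x + b) by full-batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in data:
            err = sigmoid(w * x + b) - y
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b


def balanced_accuracy(data, predict):
    pos = [x for x, y in data if y == 1]
    neg = [x for x, y in data if y == 0]
    tpr = sum(predict(x) for x in pos) / len(pos)
    tnr = sum(1 - predict(x) for x in neg) / len(neg)
    return 0.5 * (tpr + tnr)


rng = random.Random(0)
train, test = sample(2000, rng), sample(2000, rng)
w, b = fit_logistic(train)
prevalence = sum(y for _, y in train) / len(train)


def standard(x):
    # Predict the most probable class (threshold 0.5).
    return int(sigmoid(w * x + b) >= 0.5)


def reweighted(x):
    # Step 3: weight each class probability by inverse class frequency.
    p = sigmoid(w * x + b)
    return int(p / prevalence >= (1.0 - p) / (1.0 - prevalence))


bal_standard = balanced_accuracy(test, standard)
bal_reweighted = balanced_accuracy(test, reweighted)
print(f"balanced accuracy, standard:   {bal_standard:.3f}")
print(f"balanced accuracy, reweighted: {bal_reweighted:.3f}")
```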

Page 14

Simple re-weighting can have a huge effect!

Page 15

Same strategy works for more complex metrics

any calibrated classifier

Page 16

Applies to more general settings

NIPS 2014, ICML 2016/2017/2018 ICML 2016

NIPS 2015 In prep.

Page 17

An application to recommender systems

User assigns a rating to each item.

Solve this as a simultaneous (over items) multiclass classification problem, i.e. multioutput classification

Page 18

Postprocessed OrdRec

Koren, Yehuda, and Joe Sill. "OrdRec: an ordinal model for predicting personalized item rating distributions." Recsys, 2011.

Page 19

When & Why does re-weighting work?

THE GEOMETRY OF CONFUSION

Page 20

■ Set of feasible confusion matrices is a bounded convex set

■ Optimization properties will depend on how the gradient field of the metric interacts with the feasible set

■ Any monotonic metric will be optimized at the boundary

Page 21

■ All points on the boundary are determined by the support function

■ This characterization is exhaustive i.e. characterizes ALL metrics that are consistently optimizable via linear post-processing
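A small numerical check of the boundary idea, under stated assumptions: a toy population of seven equally likely points with made-up values of P(Y=1|x), and a linear metric a·TP + b·TN. Maximizing a linear function over the feasible set (the support function in direction (a, b)) should be achieved by a re-weighted threshold rule, which the sketch confirms by brute force over all deterministic classifiers.

```python
from itertools import product

# Hypothetical P(Y=1|x) at seven equally likely points.
eta = [0.05, 0.2, 0.35, 0.5, 0.65, 0.8, 0.95]


def confusion(h):
    """(TP, TN) of a classifier h given as a 0/1 decision per point."""
    n = len(eta)
    tp = sum(e * hi for e, hi in zip(eta, h)) / n
    tn = sum((1 - e) * (1 - hi) for e, hi in zip(eta, h)) / n
    return tp, tn


def boundary_point(a, b):
    """Confusion of the re-weighted rule: predict 1 iff a*eta >= b*(1 - eta)."""
    h = [1 if a * e >= b * (1 - e) else 0 for e in eta]
    return confusion(h)


def brute_force_max(a, b):
    """Best value of a*TP + b*TN over all 2^7 deterministic classifiers."""
    best = float("-inf")
    for h in product([0, 1], repeat=len(eta)):
        tp, tn = confusion(h)
        best = max(best, a * tp + b * tn)
    return best


a, b = 1.0, 3.0  # assumed weights on TP and TN
tp, tn = boundary_point(a, b)
print("re-weighted rule:", a * tp + b * tn)
print("brute force max: ", brute_force_max(a, b))
```

Randomized classifiers only fill in the convex hull of these points, so they cannot beat the best deterministic one on a linear metric.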

Page 22

This classification strategy is consistent

Page 23

Binary classification with general metrics

Logistic regression w/ MLE

Hölder densities w/ kernel approx.

Threshold search

Plug-in classifier

Yan, K., Zhong, Ravikumar (2018)
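A minimal sketch of a plug-in classifier with threshold search, assuming the probability estimates are already available from some model. The estimates, labels, and the choice of F1 as the target metric are made up for illustration; the search simply tries each observed probability as a threshold and keeps the best one.

```python
def f1_score(y_true, y_pred):
    tp = sum(1 for y, h in zip(y_true, y_pred) if y == 1 and h == 1)
    fp = sum(1 for y, h in zip(y_true, y_pred) if y == 0 and h == 1)
    fn = sum(1 for y, h in zip(y_true, y_pred) if y == 1 and h == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def threshold_predictions(probs, t):
    """Plug-in rule: predict 1 when the estimated probability is at least t."""
    return [int(p >= t) for p in probs]


def best_threshold(probs, y_true, metric):
    """Try every observed probability as a threshold; keep the best metric value."""
    best_t, best_val = None, float("-inf")
    for t in sorted(set(probs)):
        val = metric(y_true, threshold_predictions(probs, t))
        if val > best_val:
            best_t, best_val = t, val
    return best_t, best_val


# Hypothetical estimated probabilities and validation labels.
probs = [0.05, 0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.9]
y_true = [0, 0, 0, 1, 0, 1, 0, 1, 1]

default_f1 = f1_score(y_true, threshold_predictions(probs, 0.5))
best_t, best_f1 = best_threshold(probs, y_true, f1_score)
print(f"F1 at threshold 0.5: {default_f1:.3f}")
print(f"best threshold {best_t} gives F1 {best_f1:.3f}")
```

In practice the threshold would be chosen on held-out data rather than on the data used to fit the probability model.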

Page 24

Which metric should you use?

THE BINARY CLASSIFICATION CASE

Page 25

Recall: Lots of real world examples

Page 26

Limited formal guidance

Academia: Use the standard metric in your application area
◦ Accuracy
◦ Top-K accuracy
◦ F1 measure

Industry: Hire a consultant or economist
◦ User survey
◦ A/B tests

Image Sources: http://all-free-download.com, https://financesonline.com

Page 27

Our Approach

Query an “expert” to determine the real-world value of a classifier, i.e. the ideal evaluation metric

Pairwise queries

Experts give inaccurate results for value queries

More accurate results for comparison queries

Page 28

Speed Matters!

THE “ORACLE” CARES ABOUT WORST CASE QUERY COMPLEXITY

Page 29

Exploiting the geometry…

■ Only need to query classifiers on the boundary, since we already know the optimal classifier is within this subset

■ Boundary is one-dimensional, parameterized by “angle”

Page 30

Using binary search

■ Under weak conditions, the metric is unimodal with respect to the boundary

■ Thus, a simple binary search can find the optimal confusion matrix

■ Simultaneously recovers the gradient of the optimal metric
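The search over the boundary can be sketched as follows. This is a ternary-style interval search driven by pairwise comparisons, not necessarily the exact procedure from the talk. The assumptions are labeled in the code: P(Y=1|x) is taken to be Uniform[0, 1] so that the boundary confusion matrices have a closed form, and the "expert" is a hypothetical oracle whose hidden metric is linear, cos(θ*)·TP + sin(θ*)·TN.

```python
import math


def boundary_confusion(theta):
    """(TP, TN) of the boundary classifier at angle theta.

    Assumes P(Y=1|x) ~ Uniform[0, 1]; the classifier predicts 1 when
    cos(theta) * eta >= sin(theta) * (1 - eta), i.e. when eta >= t.
    """
    t = math.sin(theta) / (math.cos(theta) + math.sin(theta))
    tp = (1.0 - t * t) / 2.0   # integral of eta over [t, 1]
    tn = t - t * t / 2.0       # integral of (1 - eta) over [0, t]
    return tp, tn


def make_oracle(theta_star):
    """Hypothetical expert who only answers pairwise comparison queries."""
    def value(c):
        return math.cos(theta_star) * c[0] + math.sin(theta_star) * c[1]

    def prefers_first(c1, c2):
        return value(c1) >= value(c2)

    return prefers_first


def elicit(prefers_first, iters=60):
    """Shrink [0, pi/2] using one comparison query per iteration."""
    lo, hi = 0.0, math.pi / 2
    queries = 0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        queries += 1
        if prefers_first(boundary_confusion(m1), boundary_confusion(m2)):
            hi = m2   # by unimodality the optimum is not to the right of m2
        else:
            lo = m1   # by unimodality the optimum is not to the left of m1
    return (lo + hi) / 2, queries


theta_star = 0.9  # hidden from the search; used only inside the oracle
theta_hat, queries = elicit(make_oracle(theta_star))
print(f"recovered angle {theta_hat:.6f} after {queries} queries")
print("recovered weights:", (math.cos(theta_hat), math.sin(theta_hat)))
```

The recovered angle gives both the preferred boundary classifier and, through (cos θ, sin θ), the direction of the expert's linear metric.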

Page 31

Guaranteed recovery with finite queries

For the linear case, when the algorithm terminates, we recover

Guaranteed to be accurate after steps

With no additional assumptions, this matches the lower bound

Stable to system noise e.g. noisy responses from the “expert”

Page 32

Conclusion

Page 33

Metric choice has a large impact on real-world machine learning performance.

1. Re-weighted post-processing is efficient for optimizing complex metrics.

2. Can reduce metric elicitation for binary classifiers to binary search with bounded query complexity.

Page 34

Measurement is at the core of empirical research

Extensions to other machine learning problems e.g. ranking, regression, …

Faster elicitation using alternative query mechanisms

Noise tolerance, robust elicitation

Page 35

Thank you

QUESTIONS?
