Performance-Aligned Learning Algorithms using Distributionally Robustness Principle. Rizal Fathony, Post-Doctoral Fellow @ Carnegie Mellon University. Joint work with: Anqi Liu, Kaiser Asif, Mohammad Bashiri, Wei Xing, Sima Behpour, Xinhua Zhang, Brian Ziebart, Zico Kolter.

Performance-Aligned Learning Algorithms using Distributionally ... · Rizal Fathony, Kaiser Asif, Anqi Liu, Mohammad Bashiri, Wei Xing, Sima Behpour, Xinhua Zhang, Brian D. Ziebart.

Mar 07, 2021

Page 1

Performance-Aligned Learning Algorithms using Distributionally Robustness Principle

Rizal Fathony
Post-Doctoral Fellow @ Carnegie Mellon University

Joint work with: Anqi Liu, Kaiser Asif, Mohammad Bashiri, Wei Xing, Sima Behpour, Xinhua Zhang, Brian Ziebart, Zico Kolter.

Page 2

Supervised Learning | Classification

Data distribution: $P(\mathbf{x}, y)$

Training: $(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_n, y_n)$

Testing: $\mathbf{x}_{n+1} \to \hat{y}_{n+1}$, $\mathbf{x}_{n+2} \to \hat{y}_{n+2}$

Loss / Performance Metrics: $\text{loss}(\hat{y}, y)$ / $\text{metric}(\hat{y}, y)$

Examples (depend on the task)

• Zero-one loss / accuracy metric
• Absolute loss (for ordinal regression)
• F1-score
• Precision@k
• Hamming loss (sum of 0-1 loss)

Page 3

Binary/Multiclass Classification

Example: Digit Recognition (figure: example digit images with labels 1, 2, 3)

Performance Metric: Accuracy

$$\text{accuracy}(\hat{\mathbf{y}}, \mathbf{y}) = \frac{1}{n} \sum_{i} I(\hat{y}_i = y_i)$$

Loss Metric: Zero-One Loss

$$\text{loss}(\hat{\mathbf{y}}, \mathbf{y}) = \frac{1}{n} \sum_{i} I(\hat{y}_i \neq y_i)$$
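The two metrics on this slide are complements of each other. A minimal Python sketch, assuming labels are given as plain lists; the function names and the example labels are made up for illustration and are not from the talk:

```python
def accuracy(y_pred, y_true):
    """Fraction of positions where the prediction equals the label."""
    if len(y_pred) != len(y_true) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == t for p, t in zip(y_pred, y_true)) / len(y_true)


def zero_one_loss(y_pred, y_true):
    """Fraction of positions where the prediction differs from the label."""
    if len(y_pred) != len(y_true) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p != t for p, t in zip(y_pred, y_true)) / len(y_true)


# Hypothetical digit labels, for illustration only.
y_true = [1, 2, 3, 3]
y_pred = [1, 2, 3, 2]

acc = accuracy(y_pred, y_true)
loss = zero_one_loss(y_pred, y_true)
print(acc, loss)
```

Three of the four predictions match, so the two values sum to one, as the formulas imply.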

Page 4

Ordinal Regression/Classification

Example: Movie Rating Prediction (figure: example ratings 1, 2, 5)

Predicted vs Actual Label: Distance Loss

Loss Metric: Absolute Loss

$$\text{absloss}(\hat{\mathbf{y}}, \mathbf{y}) = \frac{1}{n} \sum_{i} |\hat{y}_i - y_i|$$
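To see why a distance loss suits ordinal labels, the sketch below compares absolute loss with zero-one loss on made-up ratings. Function names and data are illustrative assumptions, not from the talk:

```python
def absolute_loss(y_pred, y_true):
    """Mean absolute distance between predicted and actual ordinal labels."""
    return sum(abs(p - t) for p, t in zip(y_pred, y_true)) / len(y_true)


def zero_one_loss(y_pred, y_true):
    """Fraction of mismatches, ignoring how far off each prediction is."""
    return sum(p != t for p, t in zip(y_pred, y_true)) / len(y_true)


# Hypothetical movie ratings on a 1-5 scale.
y_true = [5, 5]
near_miss = [4, 4]  # off by one star
far_miss = [1, 1]   # off by four stars

print(zero_one_loss(near_miss, y_true), zero_one_loss(far_miss, y_true))
print(absolute_loss(near_miss, y_true), absolute_loss(far_miss, y_true))
```

Zero-one loss scores both predictors identically, while absolute loss penalizes the far miss four times as much.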

Page 5

Classification with Imbalanced Datasets

Performance Metric: F1 Score

$$\text{F1score}(\hat{\mathbf{y}}, \mathbf{y}) = \frac{2\,\text{TP}}{\text{AP} + \text{PP}}$$

[Figure: imbalanced dataset with many negative (-) and few positive (+) examples; Confusion Matrix]
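A small sketch of why accuracy can mislead on imbalanced data. It reads TP as true positives, AP as actual positives, and PP as predicted positives. The value returned when AP + PP = 0 is an assumed convention, and the data are made up:

```python
def confusion_counts(y_pred, y_true):
    """Return (TP, AP, PP) for binary labels in {0, 1}."""
    tp = sum(1 for p, t in zip(y_pred, y_true) if p == 1 and t == 1)
    ap = sum(1 for t in y_true if t == 1)  # actual positives
    pp = sum(1 for p in y_pred if p == 1)  # predicted positives
    return tp, ap, pp


def f1_score(y_pred, y_true):
    """F1 = 2 TP / (AP + PP)."""
    tp, ap, pp = confusion_counts(y_pred, y_true)
    if ap + pp == 0:
        return 1.0  # assumption: no positives anywhere counts as a perfect score
    return 2.0 * tp / (ap + pp)


def accuracy(y_pred, y_true):
    return sum(p == t for p, t in zip(y_pred, y_true)) / len(y_true)


# Hypothetical imbalanced dataset: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
always_negative = [0] * 100

print(accuracy(always_negative, y_true))  # high, despite finding no positives
print(f1_score(always_negative, y_true))  # zero
```

A predictor that never outputs a positive label still reaches 95% accuracy here, while its F1 score is zero.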

Page 6

Learning Tasks & Loss/Performance Metric

Machine Learning Tasks: Popular Loss/Performance Metrics

• Imbalanced datasets: F1-score, Area under ROC Curve (AUC), Precision vs Recall
• Medical classification tasks: Specificity, Sensitivity, Bookmaker Informedness
• Information retrieval tasks: Precision@k, Mean Average Precision (MAP), Discounted cumulative gain (DCG)
• Weighted classification tasks: Cost-sensitive loss metric
• Rating tasks: Cohen's kappa score, Fleiss' kappa score
• Computational biology tasks: Precision-Recall curve, Matthews correlation coefficient (MCC)

Page 7

Learning Framework

How to design a learning algorithm?

Page 8

Standard Approach for Learning Algorithms

Empirical Risk Minimization (ERM) [Vapnik, 1992]

• Assume a family of parametric hypothesis functions 𝑓 (e.g. linear discriminator)

• Find the hypothesis 𝑓∗ that minimizes the empirical risk:

$$f^* = \arg\min_{f} \frac{1}{n} \sum_{i} \text{loss}(f(\mathbf{x}_i), y_i)$$

Intractable optimization!

Most loss/performance metrics (e.g. accuracy, F1-score) are discrete and non-continuous.
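The sketch below illustrates the difficulty on a toy problem: the empirical zero-one risk of a one-dimensional threshold rule takes only a few distinct values and is flat almost everywhere, so it offers no gradient to follow. The threshold rule, data, and names are illustrative assumptions, not from the talk:

```python
def predict(threshold, x):
    """Hypothetical 1-D rule: predict class 1 when x exceeds the threshold."""
    return 1 if x > threshold else 0


def empirical_zero_one_risk(threshold, xs, ys):
    """Average zero-one loss of the threshold rule on the sample."""
    errors = sum(predict(threshold, x) != y for x, y in zip(xs, ys))
    return errors / len(xs)


# Made-up training sample.
xs = [0.1, 0.4, 0.6, 0.9]
ys = [0, 0, 1, 1]

# Sweep the parameter over a fine grid and collect the risk values.
risks = [empirical_zero_one_risk(i / 100, xs, ys) for i in range(101)]
distinct = sorted(set(risks))
print(distinct)

# A tiny change of the parameter leaves the risk unchanged (zero slope).
flat = empirical_zero_one_risk(0.45, xs, ys) == empirical_zero_one_risk(0.45 + 1e-6, xs, ys)
print(flat)
```

Across 101 parameter values the risk is a step function with only three levels, which is why gradient-based minimization of the raw metric does not work.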

Page 9

Surrogate Losses

Evaluation Metrics: Accuracy (Zero-One Loss)

Example: Binary Classification

ERM prescribes the use of a convex surrogate loss to avoid intractability:

• Support Vector Machine (SVM)
• Logistic Regression (LR)
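For reference, the surrogates behind these two methods are the hinge loss (SVM) and the logistic loss (LR). The sketch below writes each as a function of the margin y·f(x) with y in {-1, +1}. The base-2 logarithm is a scaling chosen here so that the logistic loss equals 1 at margin 0; it is a choice made for this sketch, not taken from the slide:

```python
import math


def zero_one(margin):
    """Zero-one loss: an error whenever the margin is not positive."""
    return 1.0 if margin <= 0 else 0.0


def hinge(margin):
    """Hinge loss, the convex surrogate used by SVMs."""
    return max(0.0, 1.0 - margin)


def logistic(margin):
    """Logistic loss (base 2), the convex surrogate used by logistic regression."""
    return math.log2(1.0 + math.exp(-margin))


margins = [m / 10 for m in range(-30, 31)]
for m in (-2.0, 0.0, 0.5, 2.0):
    print(m, zero_one(m), hinge(m), round(logistic(m), 4))
```

Both surrogates are convex and sit on or above the zero-one loss at every margin, which is what makes them tractable stand-ins.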

Page 10

Adversarial Prediction

A distributionally robust learning framework

Page 11

Adversarial Prediction (Asif et al., 2015; Fathony et al., 2018a)

A Distributionally Robust Approach

ERM: e.g. Logistic Regression

Adversarial Prediction:

• Two players: Predictor and Adversary
• Original discrete loss metric
• Uncertainty set: moment matching on features
• Operates on the conditional distribution 𝑃(𝑦|𝐱) rather than 𝑃(𝐱, 𝑦)
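The slide's equation did not survive in the transcript. The block below is a sketch of the adversarial prediction objective as I understand it from the cited papers (Asif et al., 2015; Fathony et al., 2018a). The symbols are notation assumed here: $\hat{P}$ for the predictor, $\check{P}$ for the adversary, $\tilde{P}$ for the empirical distribution, $\phi$ for the feature function, $\Xi$ for the uncertainty set.

```latex
\min_{\hat{P}(\hat{y}|\mathbf{x})} \;
\max_{\check{P}(\check{y}|\mathbf{x}) \in \Xi} \;
\mathbb{E}_{\mathbf{x} \sim \tilde{P};\; \hat{y}|\mathbf{x} \sim \hat{P};\; \check{y}|\mathbf{x} \sim \check{P}}
\left[ \text{loss}(\hat{y}, \check{y}) \right]

\Xi = \left\{ \check{P} \;:\;
\mathbb{E}_{\mathbf{x} \sim \tilde{P};\; \check{y}|\mathbf{x} \sim \check{P}}
\left[ \phi(\mathbf{x}, \check{y}) \right]
=
\mathbb{E}_{(\mathbf{x}, y) \sim \tilde{P}}
\left[ \phi(\mathbf{x}, y) \right]
\right\}
```

Read this way, the predictor minimizes the original discrete loss against a worst-case conditional label distribution, and the adversary is restricted to distributions whose expected features match those of the training data.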

Page 12

Adversarial Prediction: Dual Formulation

Primal → Dual: Lagrange duality, minimax duality

Dual:

• 𝜃: Lagrange multiplier for the constraints
• Retains the discrete loss metric
• Convex w.r.t. 𝜃
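The dual itself is missing from the transcript. Below is a sketch of the form I recall from Fathony et al. (2016), to be checked against that paper. The notation is assumed here: $\hat{P}$ is the predictor, $\check{P}$ the adversary, $\tilde{P}$ the empirical distribution, $\phi$ the feature function, and $\theta$ the Lagrange multiplier for the moment-matching constraints.

```latex
\min_{\theta} \;
\mathbb{E}_{(\mathbf{x}, y) \sim \tilde{P}}
\left[
  \max_{\check{P}(\check{y}|\mathbf{x})} \;
  \min_{\hat{P}(\hat{y}|\mathbf{x})} \;
  \mathbb{E}_{\hat{y}|\mathbf{x} \sim \hat{P};\; \check{y}|\mathbf{x} \sim \check{P}}
  \left[
    \text{loss}(\hat{y}, \check{y})
    + \theta^{\top} \left( \phi(\mathbf{x}, \check{y}) - \phi(\mathbf{x}, y) \right)
  \right]
\right]
```

Under this form, convexity in $\theta$ follows because, for a fixed adversary distribution, the bracketed term is affine in $\theta$, and a pointwise maximum of affine functions is convex.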

Page 13

Decomposable Metrics (Asif et al., 2015; Fathony et al., 2016, 2017, 2018a)

Decomposable metrics:

• Simple loss metrics: analytical solution (e.g. Zero-One, Absolute Loss), obtained by analyzing the equilibrium solution of a zero-sum game.

• More complex losses: reformulation as a Linear Program, where 𝐋 is the loss in matrix form (e.g. for a 4-class zero-one loss). Number of variables = 𝑘 + 1, where 𝑘 = number of classes.

Examples: Multiclass, Ordinal, Taxonomy-based, and Cost-sensitive Classification
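As an illustration of the analytical route, the sketch below evaluates the closed form of the adversarial zero-one loss as I recall it from Fathony et al. (2016): the maximum over non-empty class subsets S of (Σ_{j∈S} ψ_j + |S| − 1) / |S|, where ψ_j is the potential difference between class j and the true class. The formula is an assumption to verify against the paper. The helper names are hypothetical, and the brute-force version exists only to cross-check the sorted shortcut:

```python
import itertools


def zero_one_loss_matrix(k):
    """Loss matrix L for k-class zero-one loss: 0 on the diagonal, 1 elsewhere."""
    return [[0 if i == j else 1 for j in range(k)] for i in range(k)]


def adversarial_zero_one(psi):
    """Closed form via sorting: the best subset of each size holds the largest psi values."""
    ordered = sorted(psi, reverse=True)
    best = float("-inf")
    running = 0.0
    for size, value in enumerate(ordered, start=1):
        running += value
        best = max(best, (running + size - 1) / size)
    return best


def adversarial_zero_one_bruteforce(psi):
    """Same quantity by enumerating every non-empty subset (exponential cost)."""
    k = len(psi)
    return max(
        (sum(psi[j] for j in subset) + size - 1) / size
        for size in range(1, k + 1)
        for subset in itertools.combinations(range(k), size)
    )


# Hypothetical potential differences for 4 classes (true class has psi = 0).
psi = [0.0, 0.5, -1.0, -2.0]
value = adversarial_zero_one(psi)
print(zero_one_loss_matrix(4))
print(value)
```

Sorting brings the cost down to O(k log k), compared with the 2^k − 1 subsets enumerated by the brute-force check.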

Page 14

Non-Decomposable Metrics

Page 15

Non-Decomposable Metric

Example: Binary Classification with F1-score metric

Dual: size $2^n$ (intractable!)

Dual | Marginal Formulation: size $n^2$
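The sketch below illustrates why marginals can stand in for the full joint distribution when the metric is F1. It uses marginals of the form P(y_i = 1, Σy = k), which is my reading of the marginalization idea and not the talk's exact parameterization. The F1 = 1 convention for the all-negative case and all names are assumptions:

```python
import itertools
import random


def f1(pred, true):
    tp = sum(p * t for p, t in zip(pred, true))
    pp, ap = sum(pred), sum(true)
    if pp + ap == 0:
        return 1.0  # assumed convention when nothing is positive
    return 2.0 * tp / (ap + pp)


def random_joint(n, rng):
    """A random distribution over all 2^n binary label vectors."""
    configs = list(itertools.product([0, 1], repeat=n))
    weights = [rng.random() for _ in configs]
    total = sum(weights)
    return {c: w / total for c, w in zip(configs, weights)}


def marginals(joint, n):
    """m[i][k] = P(y_i = 1, sum(y) = k); zero = P(sum(y) = 0)."""
    m = [[0.0] * (n + 1) for _ in range(n)]
    zero = 0.0
    for config, prob in joint.items():
        k = sum(config)
        if k == 0:
            zero += prob
        for i in range(n):
            if config[i]:
                m[i][k] += prob
    return m, zero


def expected_f1_full(P, Q):
    """Expectation over both joints: touches 2^n * 2^n pairs."""
    return sum(p * q * f1(a, b) for a, p in P.items() for b, q in Q.items())


def expected_f1_marginal(P, Q, n):
    """Same expectation using only the n-by-n marginal tables."""
    mp, zp = marginals(P, n)
    mq, zq = marginals(Q, n)
    total = zp * zq  # both vectors all-negative: F1 = 1 by the assumed convention
    for k in range(1, n + 1):
        for l in range(1, n + 1):
            dot = sum(mp[i][k] * mq[i][l] for i in range(n))
            total += 2.0 * dot / (k + l)
    return total


n = 4
rng = random.Random(0)
P, Q = random_joint(n, rng), random_joint(n, rng)
full = expected_f1_full(P, Q)
marg = expected_f1_marginal(P, Q, n)
print(full, marg)
```

Both routes give the same expected F1, but the marginal route needs only on the order of n² numbers per player, against 2ⁿ probabilities for the full joint.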

Page 16

Generic Non-Decomposable Performance Metrics
Fathony & Kolter (in submission)

More complex performance metric: 𝑓 is a linear function over TP and TN

Covers most popular metrics, e.g. Precision, Recall, F𝛽-score, Balanced Accuracy, Specificity, Sensitivity, Informedness, Kappa score, etc.

Dual | Marginal Formulation: size $2n^2$

Page 17

Integration with Deep Learning Pipeline
Fathony & Kolter (in submission)

Programming Interface: enables programmers to easily incorporate custom performance metrics into their deep learning pipelines

Code panels (the code is written in Julia):

• Learning using binary cross entropy
• Learning using the AP formulation for the F2 metric
• F𝛽 score definition
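The Julia code panels are not part of the transcript. As a stand-in, here is a Python sketch of how an F𝛽 score can be defined from confusion-matrix counts. This is not the talk's programming interface: the `ConfusionCounts` type, the function name, and the value returned for an empty denominator are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Hypothetical container for the counts an F-beta definition needs."""
    tp: int  # true positives
    ap: int  # actual positives
    pp: int  # predicted positives


def f_beta(c, beta):
    """F_beta = (1 + beta^2) TP / (beta^2 AP + PP)."""
    b2 = beta * beta
    denom = b2 * c.ap + c.pp
    if denom == 0:
        return 1.0  # assumption: no positives anywhere counts as a perfect score
    return (1.0 + b2) * c.tp / denom


# Made-up counts: 8 true positives, 10 actual positives, 16 predicted positives.
counts = ConfusionCounts(tp=8, ap=10, pp=16)
print(f_beta(counts, 1.0))  # F1
print(f_beta(counts, 2.0))  # F2 weights recall more heavily
```

With 𝛽 = 1 the expression reduces to 2 TP / (AP + PP), the F1 score.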

Page 18

Integration with Deep Learning Pipeline
Fathony & Kolter (in submission)

Code examples for other performance metrics (the code is written in Julia):

• Cohen's kappa score metric
• Geometric mean of precision and recall
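The Julia examples themselves are missing from the transcript. The Python sketch below computes both metrics from binary confusion-matrix cells using their textbook definitions. Function names, the zero-denominator conventions, and the counts are illustrative assumptions and do not reproduce the talk's interface:

```python
import math


def cohens_kappa(tp, fp, fn, tn):
    """Cohen's kappa for binary labels: (p_o - p_e) / (1 - p_e)."""
    n = tp + fp + fn + tn
    ap, pp = tp + fn, tp + fp        # actual / predicted positives
    an, pn = tn + fp, tn + fn        # actual / predicted negatives
    p_observed = (tp + tn) / n
    p_chance = (ap * pp + an * pn) / (n * n)
    if p_chance == 1.0:
        return 1.0  # assumption: degenerate case treated as perfect agreement
    return (p_observed - p_chance) / (1.0 - p_chance)


def gmean_precision_recall(tp, fp, fn, tn):
    """sqrt(precision * recall) = TP / sqrt(PP * AP)."""
    ap, pp = tp + fn, tp + fp
    if ap == 0 or pp == 0:
        return 0.0  # assumption: undefined precision or recall scored as zero
    return tp / math.sqrt(pp * ap)


# Made-up confusion matrix over 100 examples.
cells = dict(tp=8, fp=2, fn=4, tn=86)
kappa = cohens_kappa(**cells)
gmean = gmean_precision_recall(**cells)
print(kappa, gmean)
```

Both metrics depend on the whole confusion matrix and not on per-example losses, which is what makes them non-decomposable.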

Page 19

Conclusion

Page 20

Conclusion

Adversarial Prediction Framework

• A distributionally robust learning framework with uncertainty set defined over the conditional distributions
• Aligns with the loss/performance metric by incorporating the metric into its learning objective
• Computationally efficient via marginalization technique
• Performs well in practice
• Easy to integrate with deep learning pipelines

Page 21

References

• Adversarial Cost-Sensitive Classification. Kaiser Asif, Wei Xing, Sima Behpour, and Brian D. Ziebart. Conference on Uncertainty in Artificial Intelligence (UAI), 2015.

• Adversarial Multiclass Classification: A Risk Minimization Perspective. Rizal Fathony, Anqi Liu, Kaiser Asif, and Brian D. Ziebart. Advances in Neural Information Processing Systems 29 (NeurIPS), 2016.

• Adversarial Surrogate Losses for Ordinal Regression. Rizal Fathony, Mohammad Bashiri, and Brian D. Ziebart. Advances in Neural Information Processing Systems 30 (NeurIPS), 2017.

• Consistent Robust Adversarial Prediction for General Multiclass Classification. Rizal Fathony, Kaiser Asif, Anqi Liu, Mohammad Bashiri, Wei Xing, Sima Behpour, Xinhua Zhang, and Brian D. Ziebart. ArXiv preprint, 2018.

• AP-Perf: Incorporating Generic Performance Metrics in Differentiable Learning. Rizal Fathony and Zico Kolter. In submission, 2019.

Page 22

Thank You