Manifold Model-agnostic visual debugging tool for ML Lezhi Li, Yunfeng Bai, Yang Wang OpML’19 May 20, 2019

May 22, 2020

Page 1

Manifold
Model-agnostic visual debugging tool for ML
Lezhi Li, Yunfeng Bai, Yang Wang
OpML’19, May 20, 2019

Page 2

Manifold: model-agnostic visual debugging tool for ML

Page 3

Agenda

01 Motivation
02 Workflow
03 Integration

Manifold: https://eng.uber.com/manifold
Michelangelo: https://eng.uber.com/scaling-michelangelo

Page 4

Importance of Model Debugging

20%: building initial models

80%: improving model performance

Page 5

Inadequacy of performance metrics

[Figure: ROC curve, TPR against FPR, both axes from 0.00 to 1.00]

AUC: 0.81
Great! But what’s next?

Not actionable...

Page 6

Model Interpretability?

Interpretable Machine Learning

- Internal Structure
- Intermediate State
- Mimic Learning
- Interactive Hyperparameter Tuning
- Model Performance Visualization
- …

Focus on single (family of) model(s)
Too much love for deep models :/

SHAP [Lundberg 2017], LIME [Ribeiro 2016]

DQNViz [Wang 2019], RetinaVis [Kwon 2019]

Seq2Seq-Vis [Strobelt 2019], GAN Lab [Kahng 2019]

Page 7

Model-agnostic?

Machine Learning at Uber

- Predicting Supply & Demand
- One-click Chat
- Restaurant Recommendation
- Sensor Intelligence and Location
- Autonomous Vehicles
- Support Ticket Routing
- Incentive Optimizing
- Fraud Detection
- Financial Planning
- …

Interpretable Machine Learning

- Internal Structure
- Intermediate State
- Mimic Learning
- Interactive Hyperparameter Tuning
- Model Performance Visualization
- …

Model-specific interpretability will not scale from the operational perspective

Page 8

ML Revisited

y = f*(x) + ε

Page 9

Model space => Data space

What went wrong with the model? => On which data subset did the model(s) make mistakes?

Why did the model make such mistakes? => Which features contributed to the mistakes?

Page 10

Manifold workflow

1. Compare
Given a dataset with the output from one or more ML models, Manifold compares and highlights performance differences across models or data subsets.

2. Slice
Users can select data subsets of interest based on model performance for further inspection.

3. Attribute
Manifold then highlights feature distribution differences between the selected data subsets, helping users find the reasons behind the performance outcomes.

[Figure: City A compared with other cities; annotations: lower delivery radius, late lunch and dinner, faster delivery]
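The three workflow steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not Manifold's implementation: the rows are made-up toy values, the names `model_a`, `model_b`, `radius_km` and `hour` are hypothetical, and a fixed error threshold stands in for whatever subset the user selects in the tool.

```python
from statistics import mean

# Toy scored dataset (made-up values): features, ground truth y,
# and predictions from two hypothetical models.
rows = [
    {"radius_km": 2.0, "hour": 12.0, "y": 20.0, "model_a": 21.0, "model_b": 20.0},
    {"radius_km": 3.0, "hour": 13.0, "y": 25.0, "model_a": 24.0, "model_b": 26.0},
    {"radius_km": 2.0, "hour": 19.0, "y": 22.0, "model_a": 22.0, "model_b": 23.0},
    {"radius_km": 8.0, "hour": 15.0, "y": 40.0, "model_a": 30.0, "model_b": 38.0},
    {"radius_km": 9.0, "hour": 21.0, "y": 45.0, "model_a": 36.0, "model_b": 44.0},
    {"radius_km": 7.0, "hour": 22.0, "y": 38.0, "model_a": 30.0, "model_b": 36.0},
]
MODELS = ["model_a", "model_b"]
FEATURES = ["radius_km", "hour"]


def compare(rows):
    """Step 1 (Compare): per-row absolute error for every model."""
    return [{m: abs(r[m] - r["y"]) for m in MODELS} for r in rows]


def slice_rows(rows, errors, model, threshold):
    """Step 2 (Slice): split rows by whether `model` errs above `threshold`."""
    high = [r for r, e in zip(rows, errors) if e[model] > threshold]
    low = [r for r, e in zip(rows, errors) if e[model] <= threshold]
    return high, low


def attribute(high, low):
    """Step 3 (Attribute): difference in feature means between the two slices."""
    return {f: mean(r[f] for r in high) - mean(r[f] for r in low) for f in FEATURES}


errors = compare(rows)
high, low = slice_rows(rows, errors, model="model_a", threshold=5.0)
diff = attribute(high, low)
```

In this toy data the high-error slice for `model_a` has a clearly larger mean `radius_km` than the low-error slice, which is the kind of feature-level hint the Attribute step is meant to surface.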

Page 11

[Figure: Manifold screenshot. Recoverable labels: Model 0 to Model 4; Cluster 0 to Cluster 2; raw; 0 1; 0 2; 0 1+2; 1 2; Encoder 0 to Encoder 3; clustering; segmentation; new]

Manifold workflow

1. Compare
Given a dataset with the output from one or more ML models, Manifold compares and highlights performance differences across models or data subsets.

2. Slice
Users can select data subsets of interest based on model performance for further inspection.

3. Attribute
Manifold then highlights feature distribution differences between the selected data subsets, helping users find the reasons behind the performance outcomes.

Page 12

Challenges

01 Interactivity √
02 Scalability ?

Operational
Computational

=> Michelangelo

Page 13

Integration with Michelangelo

Page 14

Integration with Michelangelo

Train models

Prepare scored dataset

Analyze with Manifold

Gained insights

Page 15

Integration with Michelangelo

Case Study: Identify useful features

- A team at Uber is evaluating the value of an extra set of features

- Adding the new features did not change the model’s overall performance

- Are the new features still worth adding?

- Yes, improvement on hard-to-predict data
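The reasoning in this case study, an unchanged overall metric hiding an improvement on a subset, can be reproduced with toy numbers. The per-row errors below are invented for illustration and are not Uber's data; defining "hard-to-predict" as rows where the baseline error exceeds a threshold is also an assumption.

```python
from statistics import mean

# Made-up per-row absolute errors for the same ten rows.
baseline_err = [1.0] * 8 + [9.0, 9.0]   # model without the new features
candidate_err = [1.5] * 8 + [7.0, 7.0]  # model with the new features


def summarize(baseline, candidate, hard_threshold):
    """Return (baseline, candidate) mean error overall and on hard rows."""
    hard = [i for i, e in enumerate(baseline) if e > hard_threshold]
    return {
        "overall": (mean(baseline), mean(candidate)),
        "hard": (
            mean(baseline[i] for i in hard),
            mean(candidate[i] for i in hard),
        ),
    }


summary = summarize(baseline_err, candidate_err, hard_threshold=5.0)
```

Here the overall mean error is identical for both models, while the mean error on the hard rows drops from 9.0 to 7.0, so an aggregate metric alone would miss the benefit.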

Page 16

Architecture

[Diagram]
Components: Michelangelo UI, Manifold, Michelangelo API services, Workflow execution, Dataset storage
Interactions: Fetch data for display, Start workflow, Report status, Start dataset preparation, Store scored datasets

Page 17

Manifold Workflow

● Expressing Manifold comparison dataset generation as a Michelangelo (MA) physical workflow

[Diagram: workflow steps as extracted: Start, Load models, Feature process, Get Data, Downsample*, Score Data (×3), Union Data, Upload Data for UX, Filter Data; stages labeled: Single task, Multiple tasks for parallelism, Single task]

*: consistent sampling
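The fan-out and union part of this workflow can be sketched as follows. This is a hedged illustration, not Michelangelo's workflow API: it assumes the "Multiple tasks for parallelism" stage runs one Score Data task per model, uses Python threads as a stand-in for workflow tasks, and all names (`score`, `build_comparison_dataset`, `model_0`, `model_1`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def score(model_name, model_fn, rows):
    """One Score Data task: apply a single model to every row."""
    return [
        {"row_id": r["row_id"], "model": model_name, "prediction": model_fn(r)}
        for r in rows
    ]


def build_comparison_dataset(models, rows):
    """Score the same rows with every model in parallel, then union the results."""
    # Multiple tasks for parallelism: one scoring task per model.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(score, name, fn, rows) for name, fn in models.items()]
        scored_parts = [f.result() for f in futures]
    # Single task (Union Data): concatenate the per-model outputs.
    return [record for part in scored_parts for record in part]


# Hypothetical stand-ins for loaded models and for already downsampled data.
models = {
    "model_0": lambda r: r["x"] * 2.0,
    "model_1": lambda r: r["x"] + 1.0,
}
rows = [{"row_id": i, "x": float(i)} for i in range(3)]
dataset = build_comparison_dataset(models, rows)
```

Because every model scores the same input rows, the unioned dataset lines up row by row across models, which is what a side-by-side comparison needs.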

Page 18

Thank You!

Questions?

Page 19

Consistent Sampling

How to consistently downsample rows across DataFrames?

● No native support for consistent sampling in Spark
● Filter based on hash value

[Diagram: three table states]
1. Column1 | Column2 | Column3 | Column4
2. Append hashed concatenated columns: Column1 | Column2 | Column3 | Column4 | Hashed Column
3. Filter by hashed column and remove it: Column1 | Column2 | Column3 | Column4
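The hash-based filter can be sketched in plain Python, with lists of dictionaries standing in for Spark DataFrames. This is a minimal sketch under assumptions, not the authors' Spark code: the choice of SHA-256, the `|` separator, the bucket count and the column names (`id`, `city`, `hashed`) are all illustrative.

```python
import hashlib

BUCKETS = 10_000  # resolution of the sampling fraction (assumed value)


def hash_bucket(row, key_columns):
    """Hash the concatenated key columns into a stable bucket in [0, BUCKETS)."""
    key = "|".join(str(row[c]) for c in key_columns)
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def consistent_sample(rows, key_columns, fraction):
    """Keep roughly `fraction` of the rows, chosen only by their key columns."""
    threshold = int(fraction * BUCKETS)
    sampled = []
    for row in rows:
        # Append the hashed column.
        with_hash = dict(row, hashed=hash_bucket(row, key_columns))
        # Filter by the hashed column, then remove it.
        if with_hash["hashed"] < threshold:
            del with_hash["hashed"]
            sampled.append(with_hash)
    return sampled


# Two frames that share key columns but carry different model outputs.
frame_a = [{"id": i, "city": "A", "score_model_0": i * 0.1} for i in range(1000)]
frame_b = [{"id": i, "city": "A", "score_model_1": i * 0.2} for i in range(1000)]
keys = ["id", "city"]
sample_a = consistent_sample(frame_a, keys, fraction=0.1)
sample_b = consistent_sample(frame_b, keys, fraction=0.1)
```

Since the keep-or-drop decision depends only on the key columns, both frames retain exactly the same row ids, which random sampling applied independently to each frame would not guarantee.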