Page 1
Complexity vs. Performance: Empirical Analysis of Machine Learning as a Service
Yuanshun Yao, Zhujun Xiao, Bolun Wang*, Bimal Viswanath, Haitao Zheng and Ben Y. Zhao
The University of Chicago   *University of California, Santa Barbara
Page 2
ML in Network Research
Congestion control protocols
• Sivaraman et al., SIGCOMM’14
• Winstein & Balakrishnan, SIGCOMM’13
Network link prediction
• Liu et al., IMC’16
• Zhao et al., IMC’12
User behavior analysis
• Wang et al., IMC’14
• Zannettou et al., IMC’17
…
Page 3
Running ML is Hard
dataset
model
Solution: Machine Learning as a Service
(ML-as-a-Service)
Page 4
ML-as-a-Service
training data
user input (model, parameters, etc.)
Page 5
Is my model good enough?
Why Study ML-as-a-Service?
Q: How well do they perform?
Q: How much does the amount of user control impact ML performance?
Page 6
ML-as-a-Service Platforms
Google Prediction
Amazon ML
Microsoft ML
PIO, ABM, BigML
Amount of user input: less -> more
Page 7
Control in ML
training data -> trained model
?
Page 8
Control in ML
training data -> trained model
Data Cleaning
• Invalid, duplicate, or missing data
?
Page 9
Control in ML
training data -> trained model
Data Cleaning
• Invalid, duplicate, or missing data
Feature Selection
• Mutual information, Pearson correlation, chi-square…
?
Page 10
Control in ML
training data -> trained model
Data Cleaning
• Invalid, duplicate, or missing data
Feature Selection
• Mutual information, Pearson correlation, chi-square…
Classifier Choice
• Logistic Regression, Decision Tree, kNN…
?
Page 11
Control in ML
training data -> trained model
Data Cleaning
• Invalid, duplicate, or missing data
Feature Selection
• Mutual information, Pearson correlation, chi-square…
Classifier Choice
• Logistic Regression, Decision Tree, kNN…
Parameter Tuning
• Logistic Regression: L1, L2, max_iter…
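One way to picture these four control dimensions is as a configuration space whose size grows with every dimension a platform exposes. The sketch below is a minimal illustration under made-up assumptions: the option lists in `CONTROL_SPACE` and the helper `reachable_configs` are hypothetical and not taken from any platform. Dimensions a platform does not expose are pinned to a single fixed value, standing in for a server-side default.

```python
from itertools import product

# Hypothetical option lists for the four control dimensions on the slide.
# Parameter options are treated as independent of the classifier for brevity,
# although L1/L2/max_iter really belong to logistic regression.
CONTROL_SPACE = {
    "data_cleaning": ["none", "drop_invalid", "drop_duplicates", "impute_missing"],
    "feature_selection": ["none", "mutual_info", "pearson", "chi_square"],
    "classifier": ["logistic_regression", "decision_tree", "knn"],
    "parameters": ["default", "l1_penalty", "l2_penalty", "max_iter_1000"],
}


def reachable_configs(exposed):
    """All configurations a user can reach when only the dimensions in
    `exposed` are user-controlled; every other dimension is pinned to its
    first option, standing in for a fixed server-side choice."""
    names = list(CONTROL_SPACE)
    choices = [CONTROL_SPACE[n] if n in exposed else CONTROL_SPACE[n][:1]
               for n in names]
    return [dict(zip(names, combo)) for combo in product(*choices)]


print(len(reachable_configs(set())))               # 1
print(len(reachable_configs({"parameters"})))      # 4
print(len(reachable_configs(set(CONTROL_SPACE))))  # 192
```

With these invented lists, exposing all four dimensions gives 4 × 4 × 3 × 4 = 192 configurations, while a platform exposing none leaves exactly one.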
Page 12
Control in ML-as-a-Service
[Figure: which of the four control dimensions (Data Cleaning, Feature Selection, Classifier Choice, Parameter Tuning) each platform exposes (✔) or hides (✖). Platforms are ordered from low to high user control/complexity: Google and ABM (no controls exposed), Amazon, PIO and BigML, Microsoft.]
Complexity vs. Performance?
Page 13
Performance Measurement
Page 14
Characterizing Performance
• Theoretical modeling is hard
  • Output of an ML model depends on the dataset
  • No access to implementation details
• Empirical, data-driven analysis
  • Simulate a real-world scenario from end to end
  • Need a large number of diverse datasets
• Focus on binary classification
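Because the study focuses on binary classification and the results slides report F-scores, it may help to recall how that score is computed. Assuming the balanced F1 variant, it is the harmonic mean of precision and recall, which simplifies to 2TP / (2TP + FP + FN). The helper name `f_score` and the zero-division convention are assumptions for illustration.

```python
def f_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall, i.e. 2TP / (2TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    # Assumed convention: 0.0 when neither labels nor predictions contain the positive class.
    return 2 * tp / denom if denom else 0.0


print(f_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))  # 2 TP, 1 FP, 1 FN -> 0.666...
```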
Page 15
Dataset
• 119 datasets
• From diverse application domains
• Sample size: 15-245K; number of features: 1-4K
• 79% of them are from the UCI ML Repository
Life Science 37%
Computer Applications 15%
Artificial Test 14%
Social Science 9%
Physical Science 8%
Financial & Business 6%
Other 11%
Page 16
Methodology
• Tune all available control dimensions
training data -> trained model
Feature Selection | Classifier Choice | Parameter Tuning
✖ | ✔ | ✔ (API)
Classifier Choice (API): Logistic Regression, KNN, SVM, …
Parameter Tuning (API): L1_reg, L2_reg, Max_iter, …
Page 17
Methodology
• Tune all available control dimensions
training data -> trained model
Feature Selection | Classifier Choice | Parameter Tuning
✖ | ✔ | ✔ (API)
testing data -> API
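The measurement loop on the methodology slides (train under each reachable setting, then score on held-out testing data) can be sketched as follows. Everything here is an illustrative assumption: `train_predict` stands in for whatever upload, training, and prediction calls a real platform offers, and `toy_platform` is a made-up 1-D classifier whose only user-tunable parameter is a threshold `offset`.

```python
from statistics import mean


def f1(y_true, y_pred):
    """Binary F-score (F1), class 1 positive: 2TP / (2TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def measure(train_predict, configs, train, test):
    """Score every reachable configuration on held-out testing data.

    `train_predict(config, train, test_x)` is a hypothetical stand-in for a
    platform's train-then-predict API calls.
    """
    test_x = [x for x, _ in test]
    test_y = [y for _, y in test]
    return [(config, f1(test_y, train_predict(config, train, test_x)))
            for config in configs]


def toy_platform(config, train, test_x):
    """Made-up 1-D classifier: threshold at the midpoint of the class means,
    shifted by a user-tunable 'offset' parameter."""
    threshold = (mean(x for x, y in train if y == 1)
                 + mean(x for x, y in train if y == 0)) / 2 + config["offset"]
    return [int(x > threshold) for x in test_x]


train = [(0.0, 0), (1.0, 0), (3.0, 1), (4.0, 1)]
test = [(0.5, 0), (1.5, 0), (2.5, 1), (3.5, 1)]
results = measure(toy_platform, [{"offset": o} for o in (-1, 0, 1)], train, test)
for config, score in results:
    print(config, round(score, 3))
# {'offset': -1} 0.8
# {'offset': 0} 1.0
# {'offset': 1} 0.667
best_config, best_score = max(results, key=lambda r: r[1])
```

Keeping every (configuration, score) pair, rather than only the best one, leaves room to summarize both the tuned-best performance and the spread across settings.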
Page 18
Trade-offs between Complexity and Performance
Page 19
Complexity vs. Performance
• Q: How does complexity correlate with performance?
• High complexity -> high performance
[Figure: optimized average F-score (y-axis, 0.5 to 1) for ABM, Google, Amazon, BigML, PIO, Microsoft, and Scikit, ordered by complexity from low to high]
Page 20
Complexity vs. Risk
• Q: How does risk correlate with complexity?
• High complexity -> high risk
[Figure: performance variance in F-score (y-axis, 0 to 0.5) for ABM, Google, Amazon, BigML, PIO, Microsoft, and Scikit, ordered by complexity from low to high]
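The two y-axes on these results slides each summarize many per-configuration F-scores. One plausible way to compute them, assumed here rather than taken from the study: "optimized" performance as the best F-score per dataset averaged over datasets, and risk as the variance of F-scores across configurations, again averaged over datasets. The score dictionaries are made-up numbers for illustration only.

```python
from statistics import mean, pvariance


def optimized_score(scores_by_dataset):
    """Mean over datasets of the best F-score any configuration reached."""
    return mean(max(scores) for scores in scores_by_dataset.values())


def performance_variance(scores_by_dataset):
    """Mean over datasets of the population variance of F-scores across configurations."""
    return mean(pvariance(scores) for scores in scores_by_dataset.values())


# Made-up per-configuration F-scores, for illustration only.
tunable = {"d1": [0.6, 0.8, 1.0], "d2": [0.7, 0.7, 0.9]}
automated = {"d1": [0.8], "d2": [0.7]}  # one fixed server-side configuration

print(optimized_score(tunable), performance_variance(tunable))
print(optimized_score(automated), performance_variance(automated))
```

Under this reading, a platform that always applies a single fixed configuration has zero variance by construction, which is one way to interpret the low risk of fully automated platforms.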
Page 21
Understanding Server-side Optimization
Page 22
Reverse-engineering Optimization
[Figure: two synthetic 2-D datasets, Circular and Linear, plotted as Feature #1 vs. Feature #2 with Class 0 and Class 1 points]
• Q: Does server-side optimization adapt to different datasets?
• Reverse-engineering using datasets
  • Create synthetic datasets
  • Use prediction results to infer classifier information
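A minimal sketch of this reverse-engineering idea, under stated assumptions: `make_dataset` generates circular and linear synthetic datasets similar to the ones plotted, and two hypothetical local learners (`train_nearest_centroid`, which has a linear boundary, and `train_one_nn`, which does not) stand in for a platform's hidden classifier, since real probing would go through each platform's own API. A learner that fits the linear dataset but not the circular one is inferred to use a linear decision boundary; the 0.9 accuracy threshold is arbitrary.

```python
import math
import random


def make_dataset(kind, n, rng):
    """Synthetic 2-D binary data with a 'circular' or 'linear' class boundary."""
    data = []
    for _ in range(n):
        x = (rng.uniform(-1.5, 1.5), rng.uniform(-1.5, 1.5))
        if kind == "circular":
            label = int(x[0] ** 2 + x[1] ** 2 < 1.0)  # class 1 inside the unit circle
        else:
            label = int(x[1] > x[0])  # class 1 above the diagonal
        data.append((x, label))
    return data


# Hypothetical local stand-ins for a platform's hidden classifier.
def train_nearest_centroid(train):
    """Linear boundary: perpendicular bisector of the two class means."""
    means = {}
    for c in (0, 1):
        pts = [x for x, y in train if y == c]
        means[c] = (sum(p[0] for p in pts) / len(pts),
                    sum(p[1] for p in pts) / len(pts))
    return lambda x: int(math.dist(x, means[1]) < math.dist(x, means[0]))


def train_one_nn(train):
    """Nonlinear boundary: copy the label of the closest training point."""
    return lambda x: min(train, key=lambda item: math.dist(x, item[0]))[1]


def accuracy(predict, test):
    return sum(predict(x) == y for x, y in test) / len(test)


def probe(train_fn, rng, n_train=800, n_test=300):
    """Train on each synthetic dataset, then score predictions on fresh points."""
    return {kind: accuracy(train_fn(make_dataset(kind, n_train, rng)),
                           make_dataset(kind, n_test, rng))
            for kind in ("circular", "linear")}


def infer_boundary(acc, threshold=0.9):
    """Fits linear but not circular data -> linear decision boundary."""
    if acc["linear"] < threshold:
        return "unknown"  # could not even fit the easy dataset
    return "nonlinear" if acc["circular"] >= threshold else "linear"


rng = random.Random(0)
for name, fn in [("nearest centroid", train_nearest_centroid), ("1-NN", train_one_nn)]:
    print(name, infer_boundary(probe(fn, rng)))
```

The linear dataset acts as a sanity check: a learner that fails even there says little about its boundary shape, so the sketch reports "unknown" rather than guessing.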
Page 23
Understanding Optimization: Google decision boundaries
[Figure: Google's decision boundaries on the two synthetic datasets, plotted as Feature #1 vs. Feature #2 with Class 0 and Class 1 points]
• Google switches between classifiers based on the dataset
• Use supervised learning to infer the classifier family used
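Inferring the classifier family with supervised learning could look roughly like the following. This is an illustrative sketch, not the study's feature set or model: each known family (two hypothetical stand-ins here) is trained several times on circular synthetic data, its predictions on a fixed probe grid are reduced to a two-number fingerprint (share of points predicted as class 1 in an inner disc and in an outer ring), and a 1-nearest-neighbor meta-classifier labels an unknown black box by its closest fingerprint.

```python
import math
import random


def circular_dataset(n, rng):
    """Synthetic 2-D data: class 1 inside the unit circle."""
    pts = [(rng.uniform(-1.5, 1.5), rng.uniform(-1.5, 1.5)) for _ in range(n)]
    return [(p, int(p[0] ** 2 + p[1] ** 2 < 1.0)) for p in pts]


def class_mean(train, c):
    pts = [x for x, y in train if y == c]
    return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))


# Hypothetical stand-ins for two classifier families.
def nearest_centroid(train):
    """Linear family: boundary is the bisector of the two class means."""
    m0, m1 = class_mean(train, 0), class_mean(train, 1)
    return lambda x: int(math.dist(x, m1) < math.dist(x, m0))


def one_nn(train):
    """Nonlinear family: copy the label of the closest training point."""
    return lambda x: min(train, key=lambda item: math.dist(x, item[0]))[1]


# Fixed probe grid (step 0.1), split into an inner disc and an outer ring.
GRID = [(i / 10, j / 10) for i in range(-15, 16) for j in range(-15, 16)]
INNER = [p for p in GRID if math.hypot(*p) < 0.5]
OUTER = [p for p in GRID if math.hypot(*p) > 1.2]


def fingerprint(predict):
    """Share of probe points predicted as class 1: (inner disc, outer ring)."""
    return (sum(map(predict, INNER)) / len(INNER),
            sum(map(predict, OUTER)) / len(OUTER))


def labeled_fingerprints(families, rng, runs=3, n=300):
    """Training set for the meta-classifier: (fingerprint, family) pairs."""
    return [(fingerprint(train_fn(circular_dataset(n, rng))), name)
            for name, train_fn in families.items() for _ in range(runs)]


def infer_family(predict, labeled):
    """1-nearest-neighbor meta-classifier over fingerprints."""
    fp = fingerprint(predict)
    return min(labeled, key=lambda item: math.dist(fp, item[0]))[1]


rng = random.Random(1)
labeled = labeled_fingerprints({"linear": nearest_centroid, "nonlinear": one_nn}, rng)
# Pretend this is a platform's opaque model, trained on uploaded circular data.
black_box = one_nn(circular_dataset(300, rng))
print(infer_family(black_box, labeled))
```

The two-number fingerprint is chosen to be insensitive to where a linear boundary happens to point, so exemplars of the same family stay close to each other; richer features would be needed to separate more than two families.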
Page 24
Takeaways
• ML-as-a-Service is an attractive tool to reduce workload
• But user control still has a large impact on performance
• Fully automated systems are less risky
Page 25
Thank you! Questions?