Using the Splunk Machine Learning Toolkit to Create Your Own Custom Models
Dr. Adam Oliner, Director of Engineering, Data Science, Splunk
Manish Sainani, Principal Product Manager, Splunk
Copyright © 2016 Splunk Inc.
Disclaimer
During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.
Who are we?
Dr. Adam Oliner
– Director of Engineering, Data Science & Machine Learning
– Splunker for 2 years
– Embarrassingly overeducated
Manish Sainani
– Principal Product Manager, Machine Learning
– Splunker for 2 years
– First ML hire at Splunk!
What are we doing here?
Overview of Machine Learning
The Assistants: Guided Machine Learning
– Prepare
– Fit
– Validate
– Deploy
Examples
– DIY Anomaly Detector
– Customer Applications
Overview of ML at Splunk
Core Platform Search, Packaged Premium Solutions, Custom ML
Platform for Operational Intelligence
Splunk Machine Learning Toolkit
Assistants: Guide model building, testing, & deploying for common objectives
Showcases: Interactive examples for typical IT, security, business, IoT use cases
Algorithms: 25+ standard algorithms available prepackaged with the toolkit
SPL ML Commands: New commands to fit, test and operationalize models
Python for Scientific Computing Library: 300+ open source algorithms available for use
Build custom analytics for any use case
Extends Splunk platform functions and provides a guided modeling environment
What’s New since our 0.9 Beta Release (last year’s .conf)?
• New name and abbreviation ;-)
• No event limits (removal of 50K limit on fitting models)
• Configurable resource caps via mlspl.conf
• Search head clustering support
• Distributed/streaming apply
• Scheduled fit
• New algorithms (next slide)
– Feature engineering and selection
– Stochastic gradient descent (e.g.)
– ARIMA
• Multi-algorithm support across Assistants
• Scatterplot matrix viz
• Alerting
• Tooltips
• In-app tours
• Cluster Numeric Events assistant
• Videos videos videos for each assistant across IT, Security, IoT and Business Analytics
• ML-SPL Cheat Sheet
Machine Learning
A process for generalizing from examples
Examples
– A, B, … → # (regression)
– A, B, … → a (classification)
– Xpast → Xfuture (forecasting)
– like with like (clustering)
– |Xpredicted – Xactual| >> 0 (anomaly detection)
Machine Learning Process
Collect Data
Explore/Visualize
Model
Evaluate
Clean/Transform
Publish/Deploy
Machine Learning Process with Splunk
Collect Data
Explore/Visualize
Model
Evaluate
Clean/Transform
Publish/Deploy
props.conf, transforms.conf, Datamodels, Add-ons from Splunkbase, etc.
Pivot, Table UI, SPL, ML Toolkit
Alerts, Dashboards, Reports
Domain Expertise (IT, Security, …)
Data Science Expertise
Splunk Expertise
Custom Machine Learning – Success Formula
Identify use cases
Drive decisions
Set business/ops priorities
SPL
Data prep
Statistics/math background
Algorithm selection
Model building
Splunk ML Toolkit facilitates and simplifies via examples & guidance
Operational success
Guided ML with the Assistants
Guides you through various analytics
– Prepare, fit, validate, and deploy
Automatically generates all the relevant SPL
The Assistants
1. Predict Numeric Fields
2. Predict Categorical Fields
3. Detect Numeric Outliers
4. Detect Categorical Outliers
5. Forecast Time Series
6. Cluster Numeric Events
Predict Numeric Fields
Algorithms
– Linear Regression
  – …including Lasso, Ridge, and ElasticNet
– Kernel Ridge
– DecisionTreeRegressor
– RandomForestRegressor
– SGDRegressor
Validation
– Four visualizations of prediction error
– R² and RMSE
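The two validation metrics named on this slide can be sketched in plain Python. This is an illustration of the standard definitions only, not the toolkit's own implementation; the function names and the sample values are made up for the example.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: the typical size of a prediction error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical field values and model predictions.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 9.5]

print("RMSE:", round(rmse(actual, predicted), 4))
print("R2:  ", round(r_squared(actual, predicted), 4))
```

An R² close to 1 together with an RMSE that is small relative to the field's range suggests the model tracks the field well on this data.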
Predict Categorical Fields
Algorithms
– Logistic Regression
– DecisionTreeClassifier
– RandomForestClassifier
– SGDClassifier
– SVM
– Naïve Bayes
  – BernoulliNB and GaussianNB
Validation
– Precision, recall, accuracy, F1
– Confusion matrix
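As a rough illustration of the validation metrics listed above, the sketch below builds a confusion matrix and derives precision, recall, accuracy, and F1 for one class treated as "positive". The labels, the helper names, and the sample data are assumptions for the example, not output from the toolkit.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count how often each (actual label, predicted label) pair occurs."""
    return Counter(zip(actual, predicted))

def binary_metrics(actual, predicted, positive):
    """Precision, recall, accuracy and F1, treating `positive` as the target class."""
    matrix = confusion_matrix(actual, predicted)
    tp = matrix[(positive, positive)]
    fp = sum(n for (a, p), n in matrix.items() if a != positive and p == positive)
    fn = sum(n for (a, p), n in matrix.items() if a == positive and p != positive)
    correct = sum(n for (a, p), n in matrix.items() if a == p)
    total = sum(matrix.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall,
            "accuracy": correct / total, "f1": f1}

# Hypothetical ground truth and classifier output.
actual = ["fail", "ok", "ok", "fail", "ok", "ok", "fail", "ok"]
predicted = ["fail", "ok", "fail", "fail", "ok", "ok", "ok", "ok"]

matrix = confusion_matrix(actual, predicted)
metrics = binary_metrics(actual, predicted, positive="fail")
print(dict(matrix))
print(metrics)
```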
Detect Numeric Outliers
Methods
– Standard deviation
– Median absolute deviation
– Interquartile range
Validation:
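The three methods named on this slide can be sketched with Python's standard library. This is a minimal illustration, not the assistant's implementation; the function names, the multipliers, and the sample data are assumptions chosen for the example.

```python
import statistics

def stdev_outliers(values, multiplier):
    """Flag values more than `multiplier` standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) > multiplier * sd]

def mad_outliers(values, multiplier):
    """Flag values more than `multiplier` median absolute deviations from the median."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    return [v for v in values if abs(v - median) > multiplier * mad]

def iqr_outliers(values, multiplier):
    """Flag values beyond `multiplier` interquartile ranges outside Q1..Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return [v for v in values
            if v < q1 - multiplier * spread or v > q3 + multiplier * spread]

# Hypothetical numeric field with one obvious spike.
values = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]

print(stdev_outliers(values, multiplier=2.0))
print(mad_outliers(values, multiplier=3.0))
print(iqr_outliers(values, multiplier=1.5))
```

The standard deviation is itself inflated by the spike, so the standard-deviation rule needs a smaller multiplier on this small sample than the median-based rule does. That sensitivity is one reason to compare methods before settling on a threshold.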
Cluster Numeric Events
Algorithms
– KMeans
– DBSCAN
– Birch
– Spectral Clustering
Validation
– Scatterplot Matrix viz
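To make "like with like" concrete, here is a small sketch of the k-means idea (Lloyd's algorithm) in plain Python. It is a teaching example under simplifying assumptions: the first k points seed the centroids, whereas production implementations use better initialization. The function name and the sample points are hypothetical.

```python
def kmeans(points, k, iterations=20):
    """Group points into k clusters by repeatedly moving centroids to cluster means."""
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    def nearest(p, centroids):
        return min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))

    centroids = [tuple(p) for p in points[:k]]  # naive, deterministic seeding
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        updated = []
        for index, members in enumerate(clusters):
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(centroids[index])  # keep an empty cluster's centroid
        if updated == centroids:
            break  # assignments stopped changing
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels

# Hypothetical events described by two numeric fields.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centroids, labels = kmeans(points, k=2)
print(labels)
```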
Splunk!
Leading platform for collecting, cleaning, and transforming data
Interactive Field Extractor
Datamodels
Hundreds of add-ons from Splunkbase
transforms.conf
props.conf
etc.
Feature Engineering
TFIDF (term-frequency × inverse document-frequency)
– Transform free-form text into numeric attributes
StandardScaler (i.e. normalization)
FieldSelector (i.e. choose k best features for regression/classification)
PCA and KernelPCA
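Two of the feature-engineering steps above can be sketched in a few lines of Python. The TF-IDF weighting shown is one common textbook variant (term share of the document times log of inverse document frequency); libraries often apply smoothing or normalization on top, so treat the exact numbers as illustrative. The function names and the sample log messages are made up for the example.

```python
import math
from collections import Counter

def tfidf(documents):
    """Turn free-form text into one {term: weight} mapping per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Number of documents each term appears in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

def standard_scale(values):
    """Shift and scale values to zero mean and unit (population) variance."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd for v in values]

# Hypothetical log messages and a numeric field.
vectors = tfidf(["error disk full", "error network timeout", "login ok"])
scaled = standard_scale([1.0, 2.0, 3.0])
print(vectors[0])
print(scaled)
```

Terms that appear in fewer documents ("disk") receive a higher weight than terms shared across documents ("error"), which is what makes the resulting numeric attributes useful for telling messages apart.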
Fit: What’s New
No event limits
Configurable resource caps (mlspl.conf)
Search head clustering support
Scheduled fit
New algorithms
Validate/Apply: What’s New
Configurable resource caps
Search head clustering support
Distributed/streaming apply
Scatterplot matrix viz
Let’s Build an Anomaly Detector!
We’ll use two Assistants
– Predict Numeric Fields
– Detect Numeric Outliers
Show automatically-generated intermediate SPL
You Built an Anomaly Detector!
You built a predictive model of AC Power
When the prediction error from this model is an outlier compared to past errors, you generate an alert
This predictive model automatically retrains itself on a schedule you control
You didn’t have to type any SPL
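The logic of this detector can be sketched outside Splunk in a few lines of Python. The sketch assumes the predictions already exist (in the demo they come from the Predict Numeric Fields model) and uses a mean-and-standard-deviation rule for "outlier compared to past errors"; the function names, the `warmup` and `threshold` parameters, and the readings are hypothetical.

```python
import statistics

def is_outlier(past_errors, new_error, threshold):
    """Is the new error far from the distribution of earlier errors?"""
    mean = statistics.fmean(past_errors)
    sd = statistics.pstdev(past_errors)
    if sd == 0:
        return new_error != mean
    return abs(new_error - mean) > threshold * sd

def find_alerts(actual, predicted, warmup=5, threshold=3.0):
    """Return the positions where the prediction error is anomalous."""
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    alerts = []
    for i in range(warmup, len(errors)):
        # Compare each error only against the errors that came before it.
        if is_outlier(errors[:i], errors[i], threshold):
            alerts.append(i)
    return alerts

# Hypothetical AC Power readings and the model's predictions for them.
actual = [100, 102, 98, 101, 99, 100, 103, 150, 101]
predicted = [101, 100, 99, 101.5, 100, 102, 102, 101, 100]

alerts = find_alerts(actual, predicted)
print(alerts)
```

Alerting on the error rather than on the raw value means a reading is flagged only when the model failed to anticipate it, so expected peaks do not trigger alerts.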
Machine Learning Customer Success
Network Optimization
Detect & Prevent Equipment Failure
Security/Fraud Prevention
Prioritize Website Issues and Predict Root Cause
Predict Gaming Outages
Fraud Prevention
Machine Learning Consulting Services
Analytics App built on ML Toolkit
Optimizing operations and business results
Prevent Cell Tower Failure
Optimize Repair Operations
Entertainment Company
Machine Learning Toolkit Customer Use Cases
Speeding website problem resolution by automatically ranking actions for support engineers
Reducing customer service disruption with early identification of difficult-to-detect network incidents
Minimizing cell tower degradation and downtime with improved issue detection sensitivity
Improving uptime and lowering costs by predicting/preventing cell tower failures and optimizing repair truck rolls
Predicting and averting potential gaming outage conditions with finer-grained detection
Ensuring mobile device security by detecting anomalies in ID authentication
Preventing fraud by identifying malicious accounts and suspicious activities
Entertainment Company
Detect Network Outliers
Reduced downtime + increased service availability = better customer satisfaction
ML Use Case
Monitor noise rise for 20,000+ cell towers to increase service and device availability, reduce MTTR
Technical overview
• A customized solution deployed in production based on outlier detection.
• Leverage previous month data and voting algorithms
“The ability to model complex systems and alert on deviations is where IT and security operations are headed… Splunk Machine Learning has given us a head start...”
Reliable website updates
Proactive website monitoring leads to reduced downtime
ML Use Case
• Very frequent code and config updates (1000+ daily) can cause site issues
• Find errors in server pools, then prioritize actions and predict root cause
Technical overview
• Custom outlier detection built using ML Toolkit Outlier assistant
• Built by Splunk Architect with no Data Science background
“Splunk ML helps us rapidly improve end-user experience by ranking issue severity which helps us determine root causes faster thus reducing MTTR and improving SLA”
What Now?
Get the Machine Learning Toolkit from Splunkbase
Go watch Machine Learning Videos on Splunk YouTube Channel: http://tiny.cc/splunkmlvideos
Go to Machine Learning talks:
– Advanced Machine Learning in SPL with the Machine Learning Toolkit by Jacob Leverich
– Extending SPL with Custom Search Commands and the Splunk SDK for Python by Jacob Leverich
Several Customers and Partner Talks
– Cisco, Scianta Analytics, Asian Telco, etc.
Early Adopter And Customer Advisory Program: [email protected]:[email protected]:[email protected]
http://tiny.cc/splunkmlapp