© Hortonworks Inc. 2011 – 2017. All Rights Reserved
Enterprise Data Science at Scale: Introducing Data Science Experience (DSX)
Future of Data – Princeton Meetup, 14-November-2017
Presenter
Tim Spann
IBM + Hortonworks = Unlocking Actionable Insights
• #1 Pure Open Source Hadoop Distribution
• 1000+ customers and 2100+ ecosystem partners
• Employs the original architects, developers, and operators of Hadoop from Yahoo!
• Best-in-class 24x7 customer support
• Leading professional services and training
• #1 Data Science Platform (Source: Gartner)
• OpenPOWER performance leadership
• Flexible, software-defined storage
• #1 SQL engine for complex analytical workloads
• Leader in on-premises and hybrid cloud solutions
Data Science in Action: The Team and The Process
• Data Scientists: responsible for “The Math”
• Data Engineers: responsible for “The Data”
• Business Analysts: responsible for “The Business”
• Corporate IT: responsible for “Technology”
Data Science Challenges
The Team
• Data Scientists: “I like my own tools.” “How can I productionize my model?”
• Data Engineers: “I need a central place for data.” “How can I efficiently transform data?”
• Business Analysts: “I need to visualize the shape of data.” “How can we fail fast and prototype quickly?”
• Corporate IT: “How do I govern and secure this?” “I can’t support all of these tools.”
The Process
• Productionizing with data
• So many tools and limited compute resources
• Data discovery
• Model deterioration and data evolution
The IBM + Hortonworks Data Science Experience
The Team
• Data Scientists: tools such as RStudio, Jupyter, Zeppelin, and H2O; model management
• Data Engineers: all data assets in one place; productionize models with REST endpoints
• Business Analysts: rich data visualization; a community for sharing knowledge and collaborating
• Corporate IT: run secure and governed data science; one experience that supports many tools
The Process
• Collaboration
• Community
Data Science Solution: Data Science Experience
Community
• Find tutorials and datasets
• Connect with data scientists
• Ask questions
• Read articles and papers
• Fork and share projects
Open Source
• Code in Scala, Python, R, or SQL
• Zeppelin and Jupyter notebooks
• RStudio IDE and Shiny
• Apache Spark
• Your favorite libraries
Scale & Enterprise Security
• Data science at scale
• Run Spark jobs on the HDP cluster
• Secure Hadoop support
• Apache Ranger and Apache Atlas support for data
• Support for ABAC (attribute-based access control)
Model Management
• Data-shaping pipeline UI
• Automated data preparation and modeling
• Advanced visualizations
• Model management and deployment
• Documented model APIs
DEMO
Use Case
• All industries are affected by churn.
• Being able to predict churn helps companies take action and keep customers longer.
• More labeled historical data generally produces a better model.
• Data is collected over time and labeled by whether each customer churned.
• Using a random forest, we will predict future churners.
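The deck does not include the demo's modeling code. As a hypothetical illustration of what "using a random forest" means for churn, the sketch below trains a tiny forest in plain Python: each tree is grown on a bootstrap sample of the labeled history, considers a random subset of features at each split, and the churn probability is the fraction of trees voting "churn". The feature names, the toy data, and the parameters (25 trees, depth 4) are assumptions for illustration; inside DSX one would more likely use a library implementation such as Spark MLlib's `RandomForestClassifier`.

```python
import random
from collections import Counter

# Trees are nested tuples (feature, threshold, left, right); leaves are labels (1 = churned, 0 = stayed).

def gini(labels):
    """Gini impurity of a non-empty list of 0/1 labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels, feature_ids):
    """Return (weighted_gini, feature, threshold) for the best split on the given features, or None."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(rows, labels, depth, max_depth, n_features, rng):
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    # Random forests consider only a random subset of features at each split.
    feature_ids = rng.sample(range(len(rows[0])), n_features)
    split = best_split(rows, labels, feature_ids)
    if split is None:
        return majority
    _, f, t = split
    go_left = [row[f] <= t for row in rows]
    left = build_tree([r for r, g in zip(rows, go_left) if g],
                      [y for y, g in zip(labels, go_left) if g],
                      depth + 1, max_depth, n_features, rng)
    right = build_tree([r for r, g in zip(rows, go_left) if not g],
                       [y for y, g in zip(labels, go_left) if not g],
                       depth + 1, max_depth, n_features, rng)
    return (f, t, left, right)

def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(rows, labels, n_trees=25, max_depth=4, seed=0):
    rng = random.Random(seed)
    n_features = max(1, int(len(rows[0]) ** 0.5))  # common sqrt(d) heuristic
    forest = []
    for _ in range(n_trees):
        # Bootstrap: draw len(rows) rows with replacement for each tree.
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                 0, max_depth, n_features, rng))
    return forest

def churn_probability(forest, row):
    """Fraction of trees that vote 'churn' for this customer."""
    return sum(predict_tree(tree, row) for tree in forest) / len(forest)

# Hypothetical labeled history: [tenure_months, support_calls_last_90_days] -> 1 if the customer churned.
history = [
    ([2, 7], 1), ([3, 6], 1), ([1, 9], 1), ([4, 8], 1), ([5, 5], 1), ([2, 4], 1),
    ([30, 0], 0), ([45, 1], 0), ([24, 2], 0), ([60, 0], 0), ([36, 1], 0), ([28, 3], 0),
]
forest = fit_forest([x for x, _ in history], [y for _, y in history])
print(churn_probability(forest, [1, 10]), churn_probability(forest, [72, 0]))
```

Averaging votes over bootstrapped, feature-subsampled trees is what distinguishes a random forest from a single decision tree; it reduces variance, which matters as the labeled history grows and changes.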
Customer Churn Architecture
Demo Scenario: Assessing Customer Churn Probability in Real Time
• Long-term data on customer churn behavior is stored
• New real-time data comes in
• Predict a customer's churn probability before they churn
• Alert the proper department or manager
• The business monitors the customer retention outlook and performance
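A minimal sketch of the scoring-and-alerting step, assuming events arrive as Python dicts with hypothetical `customer_id` and `features` keys, and that `score` is any function returning a churn probability (for example, a thin client wrapped around a deployed model's REST endpoint). The 0.7 threshold, the default "retention" department, and the `retention_outlook` metric are illustrative choices, not details taken from the demo.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List

# Hypothetical event shape: {"customer_id": str, "features": list of numbers, ...}
Event = Dict[str, Any]

@dataclass
class Alert:
    customer_id: str
    churn_probability: float
    department: str

def score_stream(
    events: Iterable[Event],
    score: Callable[[List[float]], float],
    threshold: float = 0.7,
    route: Callable[[Event], str] = lambda event: "retention",
) -> List[Alert]:
    """Score each incoming event and emit an alert when churn risk reaches the threshold."""
    alerts = []
    for event in events:
        p = score(event["features"])
        if p >= threshold:
            alerts.append(Alert(str(event["customer_id"]), p, route(event)))
    return alerts

def retention_outlook(probabilities: List[float], threshold: float = 0.7) -> float:
    """Share of scored customers below the alert threshold, a simple KPI for the business."""
    if not probabilities:
        return 1.0
    return sum(p < threshold for p in probabilities) / len(probabilities)
```

Keeping `score` and `route` as plain callables means the same loop works whether the model is a local object in a notebook or a remote endpoint, and the routing rule (by region, account size, and so on) can change without touching the scoring logic.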
Demo Scenario: Problems Solved
• Data scientists collaborate and learn new tools and frameworks
• Choice of tools, notebooks, and languages
• Run your favorite notebook on all data in the HDP cluster
• Deploy the model to production
• Leverage the production model to deliver insights to the business
• Monitor models and retrain them as new data comes in
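One simple way to decide when to retrain, sketched under assumptions: actual churn outcomes arrive some time after scoring and are fed back to a monitor that tracks prediction accuracy over a sliding window. The `ModelMonitor` class, the window size, the accuracy floor, and the minimum sample count are all hypothetical; a real deployment might track other signals, such as recall on churners or drift in the score distribution.

```python
from collections import deque

class ModelMonitor:
    """Track recent prediction outcomes and flag when the model should be retrained.

    Assumption: true churn labels for scored customers arrive later and are fed
    back through record(); all thresholds here are illustrative.
    """

    def __init__(self, window: int = 500, min_accuracy: float = 0.8, min_samples: int = 50):
        self.outcomes = deque(maxlen=window)  # True when the prediction matched the actual label
        self.min_accuracy = min_accuracy
        self.min_samples = min_samples

    def record(self, predicted_churn: bool, actually_churned: bool) -> None:
        self.outcomes.append(predicted_churn == actually_churned)

    def accuracy(self) -> float:
        if not self.outcomes:
            return 1.0
        return sum(self.outcomes) / len(self.outcomes)

    def should_retrain(self) -> bool:
        # Only judge once enough labeled outcomes have accumulated in the window.
        return len(self.outcomes) >= self.min_samples and self.accuracy() < self.min_accuracy
```

The bounded `deque` drops the oldest outcomes automatically, so the accuracy reflects recent behavior, which is what model deterioration and data evolution affect.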