Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to Fail your Data Lab Implementation"

Posted on 08-Jan-2017


Transcript

Three ways to Fail your Data Lab Implementation

Dataiku DSS

Data Labs

10 M€ in 2014
121 499 M€ in 2014
3 029 M€ in 2015
5 454 M€ in 2014
816 M€ in 2014
10 M€ in 2008

Marketing / Web
✓ Behavioral segmentation
✓ Churn prediction
✓ Sales forecast
✓ Dynamic pricing

Industry & Infrastructure
✓ Predictive maintenance
✓ Logistics optimization
✓ Smart cities

Bank & Insurance
✓ Fraud detection
✓ Risk anticipation
✓ Life-moment detection

Why a Data Lab?

• A single workflow: from a segmented workflow to a transversal one
• Several use cases: the ability to address many different data-centric topics within a single unit
• Multiple competences: a business-focused approach mixing many different competences
• End-to-end projects: combining data from different sources to handle several aspects of a single topic

Deployment of the predictions

Dataiku DSS for fraud prediction

Client service

Sensor data

Garage data

Administration

• 1 Project Owner (IT)
• 1 Project Manager (Business)
• 1 in-house data scientist
• 3 data scientists from 3 different firms
• 3 consultants from 3 different firms
• 1 architect (external)

Accepted file

INVESTIGATE!

Transactions are blocked depending on how far they deviate from the business rules and from behavioral patterns
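The fraud slide describes three outcomes (accepted, investigate, blocked) driven by business rules and by deviation from behavioral patterns. The following is a minimal sketch of that decision logic, assuming two hypothetical business rules (an amount ceiling and a blocked-country list) and using the z-score of a transaction amount against the customer's own history as the "gap" with behavioral patterns. The thresholds, field names and the z-score measure are illustrative assumptions, not the model used in the project described.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean, stdev

# Hypothetical business rules; the values are illustrative only.
MAX_AMOUNT = 10_000.0
BLOCKED_COUNTRIES = {"XX"}


@dataclass
class Transaction:
    amount: float
    country: str


def rule_violations(tx: Transaction) -> list[str]:
    """Return the names of the (hypothetical) business rules the transaction breaks."""
    violations = []
    if tx.amount > MAX_AMOUNT:
        violations.append("amount_above_limit")
    if tx.country in BLOCKED_COUNTRIES:
        violations.append("blocked_country")
    return violations


def behavioral_gap(tx: Transaction, history: list[float]) -> float:
    """Absolute z-score of the amount against the customer's past amounts."""
    if len(history) < 2:
        return 0.0  # too little history to define a behavioral pattern
    sd = stdev(history)
    if sd == 0:
        # Constant history: any different amount is an infinite deviation.
        return 0.0 if tx.amount == history[0] else float("inf")
    return abs(tx.amount - mean(history)) / sd


def decide(tx: Transaction, history: list[float],
           investigate_at: float = 2.0, block_at: float = 4.0) -> str:
    """Hard rules first, then grade the behavioral gap: accept / investigate / block."""
    if rule_violations(tx):
        return "block"
    gap = behavioral_gap(tx, history)
    if gap >= block_at:
        return "block"
    if gap >= investigate_at:
        return "investigate"
    return "accept"


history = [20.0, 25.0, 30.0, 22.0, 28.0]
print(decide(Transaction(26.0, "FR"), history))  # accept
print(decide(Transaction(35.0, "FR"), history))  # investigate
print(decide(Transaction(60.0, "FR"), history))  # block
```

In practice the behavioral gap would more likely come from a trained model's score than from a single z-score; the point of the sketch is the structure: hard rules first, then a graded deviation that separates "investigate" from "block".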

Welcome to Technoslavia!


Focus on the framework, not on the input

Data Acquisition & Understanding
✓ Read and import raw data
✓ Detect schemas and structure
✓ Analyze distributions
✓ Assess quality: outliers, missing values...
→ Report

Data Preparation
✓ Create derived and aggregated variables
→ Analytical dataset

Model Creation
✓ Feature selection
✓ Compare algorithms

Evaluation
✓ Performance metrics
✓ Robustness & generalization (cross-validation)
✓ Insights (e.g. variable importance)

Deployment
✓ Scoring engine
✓ Publish predictions
✓ Monitor performance
✓ API
→ Scored dataset

Iteration 1 → Iteration 2 → … → Iteration n
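The "Assess quality: outliers, missing values" check in the Data Acquisition & Understanding step can be illustrated with a small profiling function. This sketch assumes a single numeric column held in a Python list, with `None` or NaN marking missing values, and uses Tukey's 1.5 × IQR fences to flag outliers; that rule is one common convention and an assumption here, not necessarily what DSS applies.

```python
from __future__ import annotations

import math
from statistics import quantiles


def assess_column(values: list[float | None], k: float = 1.5) -> dict:
    """Count missing values and flag outliers outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    present = [v for v in values if v is not None and not math.isnan(v)]
    missing = len(values) - len(present)
    report = {
        "count": len(values),
        "missing": missing,
        "missing_rate": missing / len(values) if values else 0.0,
        "outliers": [],
    }
    if len(present) >= 4:  # with fewer points, quartiles are not meaningful
        q1, _, q3 = quantiles(present, n=4)  # default 'exclusive' method
        iqr = q3 - q1
        low, high = q1 - k * iqr, q3 + k * iqr
        report["outliers"] = [v for v in present if v < low or v > high]
    return report


column = [10.0, 12.0, 11.0, 13.0, None, 12.0, 11.0, 100.0, float("nan")]
print(assess_column(column))  # 2 missing values, 100.0 flagged as an outlier
```

Running this per column over an imported dataset gives the kind of summary that feeds the "→ Report" output of the understanding step.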

Business Understanding

Adapted from the CRISP-DM methodology

Dataset 1

Dataset 2

Dataset n
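In the CRISP-DM slide, "Compare algorithms" and "Robustness & generalization (cross validation)" work together: each candidate model is trained on k−1 folds and scored on the held-out fold, k times. The following is a dependency-free sketch, assuming a single numeric feature, mean squared error as the metric, and two illustrative candidates (a constant mean predictor and ordinary least squares); none of these choices come from the original deck.

```python
from __future__ import annotations

from statistics import mean
from typing import Callable

Predictor = Callable[[float], float]


def k_fold_indices(n: int, k: int) -> list[list[int]]:
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_mean(xs: list[float], ys: list[float]) -> Predictor:
    """Baseline: always predict the training mean of y."""
    m = mean(ys)
    return lambda x: m


def fit_linear(xs: list[float], ys: list[float]) -> Predictor:
    """Ordinary least squares for y = a + b * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def cross_val_mse(fit: Callable[[list[float], list[float]], Predictor],
                  xs: list[float], ys: list[float], k: int = 5) -> list[float]:
    """Train on k-1 folds and compute mean squared error on the held-out fold, k times."""
    scores = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        predict = fit(train_x, train_y)
        scores.append(mean((predict(xs[i]) - ys[i]) ** 2 for i in fold))
    return scores


xs = [float(x) for x in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
for name, fit in [("mean", fit_mean), ("linear", fit_linear)]:
    print(name, mean(cross_val_mse(fit, xs, ys)))
```

The folds here are contiguous, so if row order carries no meaning the rows should be shuffled first; for time-ordered data a forward-chaining split is the usual alternative.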

People and Governance

Polyglot vs. dictator?

Problems:
• Collaboration between technical and non-technical profiles inside a single project
• Necessary collaboration between business and tech teams to address transversal projects accurately

Focus:
• Promote diversity
• …within a workflow-centric environment

End to end, from prototyping into production

Do it your way…

…and scale!

Data Lab Organisation

Data Lab

Lab Environment

Multidisciplinary Team:

Direction / Project Management

Business Analysts

Data Miners / Data Scientists

Production Environment

Business needs

Internal Data sources

External data sources

Missions:

Prioritisation of business needs

Prototyping / Agile solution engineering

Support for Apps deployment

Business Applications

Marketing Campaign Automation

Web analytics reporting

Data-as-a-Service Platform

Design of “DATA PRODUCTS”

Integration of Data Products

Optimisation Engine

Real Time Scoring

Data Flow

Insights & Services

Processing chain

API Deployment
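The "Real Time Scoring" and "API Deployment" boxes amount to exposing a trained model behind a request/response interface. The following is a minimal sketch using only Python's standard library, assuming a hypothetical logistic model with made-up weights and a JSON body of the form `{"features": {...}}`; it is not Dataiku's API. Request parsing lives in `handle_payload` so it can be tested without a running server, and `serve()` (a hypothetical helper) starts listening on localhost only when called.

```python
from __future__ import annotations

import json
import math
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical logistic model: weights and feature names are made up for illustration.
WEIGHTS = {"amount": 0.002, "night_time": 1.5}
BIAS = -3.0


def score(features: dict[str, float]) -> float:
    """Numerically stable logistic score; missing features default to 0."""
    z = BIAS + sum(w * float(features.get(name, 0.0)) for name, w in WEIGHTS.items())
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def handle_payload(body: bytes) -> tuple[int, dict]:
    """Parse a JSON body {"features": {...}} and return (HTTP status, response)."""
    try:
        features = json.loads(body)["features"]
        if not isinstance(features, dict):
            raise TypeError("'features' must be a JSON object")
        return 200, {"score": score(features)}
    except (ValueError, KeyError, TypeError) as exc:
        return 400, {"error": str(exc)}


class ScoringHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        status, result = handle_payload(self.rfile.read(length))
        data = json.dumps(result).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port: int = 8080) -> None:
    """Block and serve scoring requests on localhost (not called on import)."""
    HTTPServer(("127.0.0.1", port), ScoringHandler).serve_forever()
```

After calling `serve()`, a request such as `curl -X POST http://127.0.0.1:8080 -d '{"features": {"amount": 1000, "night_time": 1}}'` returns a JSON object with a `score` field; malformed bodies get a 400 with an `error` message.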

Thank you!
