Dec 16, 2015
THE DATA LAKE DREAM
Edd Dumbill • @edd
[email protected] • svds.com/StrataNY2014
2 © 2014 SILICON VALLEY DATA SCIENCE LLC. ALL RIGHTS RESERVED.
WHAT IS A DATA LAKE?
A scalable, accessible repository of data (in its natural or processed state)
CONVENTIONAL DATA STRATEGY
“WHAT YOU DO TO DATA”
CLEAN • VALIDATE • CONTROL • PROTECT
MODERN DATA STRATEGY
“WHAT YOU DO WITH DATA”
TARGET VIP CUSTOMERS • ATTRACT NEW CUSTOMERS • AUTOMATE
TOWARDS THE “DATA LAKE” — Step 1
TOWARDS THE “DATA LAKE” — Step 2
TOWARDS THE “DATA LAKE” — Step 3
TOWARDS THE “DATA LAKE” — Step 4
UP vs. OUT — Enterprise Edition
Different use cases put different demands on the data infrastructure.
Increasing cost per unit of capability from scale-up architectures causes rationing of resources. Only the most valuable use cases are pursued.
[Chart: cost in US dollars vs. data resource usage, comparing scale-up cost and scale-out cost across use cases UC1 to UC5]
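The rationing argument on this slide can be made concrete with a toy model. This is a hypothetical illustration, not a model from the talk: it assumes scale-out cost grows linearly with resource units, scale-up cost grows with the square of them, five equally sized use cases (UC1 to UC5, ranked from most to least valuable), and an arbitrary budget of 40. The function names `scale_up_cost`, `scale_out_cost`, and `funded_use_cases` are invented for this sketch.

```python
# Hypothetical toy model: the cost curves, exponent, and budget are illustrative
# assumptions, not figures from the talk.

def scale_out_cost(units: float, unit_price: float = 1.0) -> float:
    """Scale-out (assumed): cost grows linearly as commodity nodes are added."""
    return unit_price * units


def scale_up_cost(units: float, unit_price: float = 1.0, exponent: float = 2.0) -> float:
    """Scale-up (assumed): cost per unit rises as one machine grows, modeled as units**exponent."""
    return unit_price * units ** exponent


def funded_use_cases(demands, budget, cost_fn):
    """Fund use cases in priority order, stopping at the first one that would exceed the budget."""
    funded = []
    total_units = 0.0
    for name, units in demands:
        if cost_fn(total_units + units) > budget:
            break  # rationing: lower-priority use cases are never pursued
        total_units += units
        funded.append(name)
    return funded


if __name__ == "__main__":
    # UC1..UC5, ordered from most to least valuable; each needs 2 units (hypothetical).
    demands = [(f"UC{i}", 2.0) for i in range(1, 6)]
    budget = 40.0
    print("scale-up: ", funded_use_cases(demands, budget, scale_up_cost))
    print("scale-out:", funded_use_cases(demands, budget, scale_out_cost))
```

Under these assumed numbers, the scale-up curve funds only UC1 to UC3 (6 units cost 36; 8 units would cost 64), while the linear scale-out curve funds all five (10 units cost 10). The squared exponent is arbitrary; any cost curve that rises faster than linearly tends to produce the same qualitative rationing effect.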
THE DATA VALUE CHAIN
DRAW VALUE FROM YOUR STRATEGIC DATA ASSETS
Discover → Ingest → Process → Persist → Integrate → Analyze → Expose
BUILD FOR EXPERIMENTS
• Make it cheap
• Failure as a feature
• Ask good questions
• Make it quick
• Both learning and adaptation
• Enable the feedback loop
• Don’t break things
• Make operations a platform for innovation
• APIs, platforms, simulation
THE EXPERIMENTAL ENTERPRISE
We need to both support investigative work and build a solid layer for production.
Data science allows us to observe our experiments and respond to the changing environment.
The foundation of the experimental enterprise focuses on making infrastructure readily accessible.
Edd Dumbill
@edd
@SVDataScience
Yes, we’re hiring!
Want these slides? Go to:
svds.com/StrataNY2014