MODERNISE YOUR EDW – DATA LAKE
CHARLES SEVIOR, CTO EMERGING TECHNOLOGY DIVISION
© Copyright 2016 EMC Corporation. All rights reserved.
Feb 08, 2017
ALL ORGANISATIONS ARE ON A JOURNEY TO…
1000X MORE DATA
REAL TIME OPERATION
ANALYTIC INSIGHTS
PERSONALISATION & ENHANCED SERVICES
THE JOURNEY TO DIGITAL BREAKS TRADITIONAL IT INFRASTRUCTURE
Gartner IT Budget Growth
Clickstream
Geolocation
Web Data
Internet of Things
Docs, emails
Server logs
TRADITIONAL DATA
NEW DATA SOURCES
CHALLENGES WITH ENTERPRISE DATA WAREHOUSES
1. Expensive storage – 70% of data in a typical EDW is unused
2. Expensive processing – On average, 55% of EDW CPU utilisation is low value ETL
3. Expensive licensing…
4. New data sources – Traditional systems are unable to capture and use new data sources, such as unstructured or semi-structured data
COST DRIVERS
OPERATIONS 50%
ANALYTICS 20%
ETL/ELT 30%
COLD DATA 70%
HOT DATA 30%
Cost Comparison
ENTERPRISE DATA WAREHOUSE: > $16 K per TB
Vs.
HADOOP WITH ENTERPRISE GRADE STORAGE SOLUTION (ETL/ELT OFFLOAD, ACTIVE ARCHIVE): < $1 K per TB
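The cost comparison can be turned into a rough estimate. The sketch below is illustrative only. It reads the slide's figures as more than $16K per TB for the EDW and less than $1K per TB for the Hadoop solution, and it uses those bounds as point values. The 1,000 TB estate size and the function name `storage_cost` are assumptions made for the example, not figures from the deck.

```python
def storage_cost(total_tb, cold_percent, edw_cost_per_tb, lake_cost_per_tb):
    """Return (all_in_edw, tiered) storage cost in dollars.

    all_in_edw: every TB stays in the EDW.
    tiered:     cold data moves to the lake, hot data stays in the EDW.
    """
    cold_tb = total_tb * cold_percent / 100
    hot_tb = total_tb - cold_tb
    all_in_edw = total_tb * edw_cost_per_tb
    tiered = hot_tb * edw_cost_per_tb + cold_tb * lake_cost_per_tb
    return all_in_edw, tiered


# Assumed inputs: a hypothetical 1,000 TB warehouse, the slide's 70% cold
# data share, and the slide's per-TB bounds used as point values.
all_in_edw, tiered = storage_cost(
    total_tb=1000,
    cold_percent=70,
    edw_cost_per_tb=16_000,
    lake_cost_per_tb=1_000,
)
savings = all_in_edw - tiered

print(f"All in EDW: ${all_in_edw:,.0f}")
print(f"Tiered:     ${tiered:,.0f}")
print(f"Difference: ${savings:,.0f}")
```

Because the slide gives the EDW cost as a lower bound and the Hadoop cost as an upper bound, the difference computed this way would be a floor under these assumptions rather than a precise figure.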
CHALLENGES WITH EXISTING EDW INFRASTRUCTURE
1. Throw data away
2. Waste capacity on low value workloads
3. Unable to leverage new data sources
ARCHIVE
ELT
DATA ARCHITECTURE OPTIMISATION WITH HADOOP
1. Don’t throw data away
2. Reclaim Enterprise Data Warehouse for high value BI
3. Leverage new data sources
ARCHIVE
ETL+ELT+BI
SAMPLE PROBLEM SCENARIO
ALBERT wants to:
• Optimise the existing data infrastructure spend
• Enable analytics on all data, structured and unstructured
• Lay the solid foundation of Self-Service BI
[Chart: EDW Cost by year (2013, 2014, 2015, 2016), labelled 6.5M]
• Albert has an existing 1 PB Enterprise Data Warehouse infrastructure. With rapid growth in data volume, he needs to add 500 TB of capacity to it.
• At an average cost of $13,000 per TB of EDW storage, adding 500 TB of capacity is estimated to cost $6.5 million.
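The expansion estimate in this scenario is simple arithmetic, reproduced below as a quick check. The function name `expansion_cost` is made up for the example, and the capacity growth figure assumes 1 PB is counted as 1,000 TB.

```python
def expansion_cost(added_tb, cost_per_tb):
    """Cost in dollars of adding `added_tb` terabytes at a flat per-TB price."""
    return added_tb * cost_per_tb


# Figures from the scenario: 500 TB added at an average of $13,000 per TB.
cost = expansion_cost(added_tb=500, cost_per_tb=13_000)

# Assumption: the existing 1 PB warehouse is counted as 1,000 TB.
existing_tb = 1000
growth_percent = 500 / existing_tb * 100

print(f"Expansion cost: ${cost:,}")
print(f"Capacity growth: {growth_percent:.0f}%")
```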
MODERNISE YOUR EDW
DATA LAKE SOLUTION FOR EDW MODERNISATION
[Architecture diagram; recoverable labels follow]
NEW SOURCES: Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs
EXISTING SOURCES: ERP, CRM
HORTONWORKS DATA PLATFORM
HADOOP CORE
DATA SERVICES
OPERATIONAL SERVICES
Data Management
Commodity Compute
Isilon
Business Analytics
Visualization & Dashboards
IT Applications
Numbered callouts (1 to 5):
ETL/ELT OFFLOAD
ACTIVE ARCHIVE
ENRICH WITH NEW DATA TYPES
MULTI-PROTOCOL ACCESS (NFS, SMB, HTTP, Swift)
ENTERPRISE-GRADE DATA MANAGEMENT
Legend: New Data Flow, Current Data Flow
ENTERPRISE EVOLUTION PROCESS
Typical Evolution Process (Every customer journey is different)
COST DRIVERS
REVENUE DRIVERS
Enterprise Data Warehouse is Processing Limited
Enterprise Data Warehouse is Capacity Limited
Need to add new data source types
HADOOP WITH ENTERPRISE GRADE STORAGE SOLUTION
ETL/ELT OFFLOAD
ACTIVE ARCHIVE
ENRICH WITH NEW DATA TYPES
DATA SILO CONSOLIDATION
Home Directories & File Shares
Surveillance
Next-Gen Application
Hadoop & Analytics
Transaction Logs
BLOBS
EDW
Content Shares
Marketing M&E
Social & Next-Gen
Archive & Backup Target
Data Monetization
Design, Test & Manufacture
Application Test
DATA LAKE
DATA LAKE
SCALE-OUT SINGLE REPOSITORY
IN-PLACE ANALYTICS
MULTI-PROTOCOL / WORKLOAD TIERS
ENTERPRISE FEATURES
MANAGE PBs
EMC INFOARCHIVE
An Enterprise Information Archiving Platform that unlocks data of all types, trapped in siloed applications, lowering IT costs, preserving compliance and putting application data to work.
Leave No Application Data Behind
VALUE PROPOSITIONS
REDUCE COSTS: Eliminate costly old legacy apps and systems while still retaining data and content for compliance purposes
OPTIMIZE: Make data hungry applications run more efficiently by archiving static information in a governed manner
ANALYZE: Enable better strategic decisions by leveraging all the formerly siloed information in your enterprise
CONTROL: Ensure compliance with regulatory and legal mandates by applying necessary retention and eDiscovery policies
INFOARCHIVE WITH A DATA LAKE
Big Data Analytics: Hadoop, with applications built using Hadoop & 3rd party tools, on Storage (Isilon). A solution for scalable big data analytics.
Compliant Preservation: InfoArchive on Storage (Isilon). A compliant solution for application decommissioning, active archiving & data reuse.
Data shared by InfoArchive to enable analytics
WHO IS USING INFOARCHIVE?
Key Customers
Financial Services, Health Sciences, Manufacturing, Other
DATA LAKE BENEFITS
1. Active Archive – Optimise Enterprise Data Warehouse storage by archiving cold data while still analysing it as needed
2. ETL Offload – Improve EDW performance by offloading ETL processing to Hadoop
3. Semi/Unstructured Data Analytics – Increase confidence in business decisions with new data sources
4. Multi-protocol Access – Enable applications to access/update Hadoop data using NFS, SMB, HTTP, Swift and other file/object based access methods
5. Data Management – Enterprise-grade data management at Hadoop economics
Unique to Isilon
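As a rough illustration of the active archive idea (benefit 1), the sketch below splits warehouse partitions into hot and cold by how recently they were queried. Everything in it is hypothetical: the `Partition` record, the partition names and sizes, and the 365-day threshold are assumptions for the example, not part of any EMC or Hadoop tooling.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Partition:
    name: str            # hypothetical partition identifier
    size_tb: float       # size in terabytes
    last_queried: date   # date of the most recent query against it


def split_hot_cold(partitions, today, cold_after_days=365):
    """Split partitions into (hot, cold) by last query date.

    A partition counts as cold when it has not been queried within
    `cold_after_days` of `today`. The threshold is an assumed policy.
    """
    cutoff = today - timedelta(days=cold_after_days)
    hot = [p for p in partitions if p.last_queried >= cutoff]
    cold = [p for p in partitions if p.last_queried < cutoff]
    return hot, cold


# Made-up sample data for illustration only.
partitions = [
    Partition("sales_2013", 40.0, date(2014, 2, 1)),
    Partition("sales_2015", 35.0, date(2016, 3, 15)),
    Partition("sales_2016", 25.0, date(2017, 1, 20)),
]

hot, cold = split_hot_cold(partitions, today=date(2017, 2, 8))
cold_tb = sum(p.size_tb for p in cold)

print("Archive candidates:", [p.name for p in cold])
print("Capacity to reclaim (TB):", cold_tb)
```

Partitions flagged as cold would be candidates for moving to the data lake, where they remain available for analysis.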
EMC CONSULTING SERVICES
Stages: Assess, Prove, Deploy
Business: Big Data Vision Workshop (Assess), Big Data Proof Of Value (Prove), Big Data Applied Analytics Implementation (Deploy)
Technology: Big Data Technology Advisory (Assess), Big Data Proof of Technology (Prove), Big Data Technology Implementation (Deploy)