
Create your Big Data vision and Hadoop-ify your data warehouse

Nov 17, 2014


Jeff Kelly

How to get your data warehouse and Hadoop to play nice together.
Transcript
Page 1: Create your Big Data vision and Hadoop-ify your data warehouse

1 © Copyright 2012 EMC Corporation. All rights reserved.

Create Your Big Data Vision And Hadoop-ify Your Data Warehouse

Jeff Kelly, Principal Research Contributor

The Wikibon Project

Bill Schmarzo, CTO EIMA Practice, EMC Professional Services

Page 2

Agenda
• Current Market Observations
• The Big Data Business Maturity Index and How to Identify Your Best Use Case
• Get Started With Hadoop and Other New Technologies
• What Should You Look For in a Vendor?
• Q&A

Page 3

Current Market Observations (Jeff Kelly)

Page 4

Big Data Market Size

2012: $11.4b
2013: $18.2b
2014: $28b
2015: $37.9b
2016: $43.7b
2017: $48b

• 59% growth year over year, 2011 to 2012
• Forecast 60%+ growth in 2013
• 31% CAGR forecast, 2012 through 2017

Source: Wikibon Big Data Vendor Revenue and Market Forecast, 2012-2017
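The growth rates on this slide come from the standard compound-growth formula, CAGR = (end / start)^(1 / years) − 1. The sketch below applies it to the rounded values listed above; the function names are illustrative and not taken from the Wikibon report. On these rounded endpoints it gives roughly 33%, a little above the quoted 31%, a gap that may reflect rounding or unrounded figures in the underlying forecast.

```python
# Market size in US$ billions, copied from the slide (2013-2017 are forecasts).
MARKET_SIZE = {2012: 11.4, 2013: 18.2, 2014: 28.0, 2015: 37.9, 2016: 43.7, 2017: 48.0}


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate: the constant yearly rate that turns start into end."""
    return (end / start) ** (1 / years) - 1


def yoy_growth(series: dict) -> dict:
    """Year-over-year growth for every year that has a preceding year in the series."""
    years = sorted(series)
    return {year: series[year] / series[prev] - 1 for prev, year in zip(years, years[1:])}


if __name__ == "__main__":
    print(f"CAGR 2012-2017: {cagr(MARKET_SIZE[2012], MARKET_SIZE[2017], 5):.1%}")
    for year, growth in yoy_growth(MARKET_SIZE).items():
        print(f"{year}: {growth:+.1%}")
```

The year-over-year view makes it easy to see how much of the five-year growth is front-loaded in the early forecast years.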

Page 5

Big Data Market Segmentation, 2012: Services Leading the Way

[Chart: 2012 Big Data revenue by segment: Professional Services, Compute, Storage, Networking, Database, Applications, Data Management, Cloud; total n = $11,400m]

• Professional Services: $3,784m (34%)
• Cloud and SaaS: $608m (5%)

Page 6

Big Data Growth Drivers

• Increased Awareness and Investments by Large Enterprises Beyond the Web
  – Retailers like Sears leverage Big Data for price optimization.
  – Financial services firms, including JPMC, Morgan Stanley and BoA, conduct fraud analysis, risk profiling and more.
  – Pharmaceutical makers, including Bristol Myers Squibb, use Big Data to support drug development.
• Continued Investment by Web Pioneers and Three Letter Agencies
  – Google alone spent $1b+ on infrastructure in Q4 2012.
  – Jay Parikh, VP of Engineering, Facebook: “Everything we do is a Big Data problem.”
  – CIA CTO Ira Hunt: Our mission is to “collect everything and hang on to it forever.”

Page 7

Big Data Growth Drivers, Cont.

• Increasingly Sophisticated Professional Services
  – Professional services firms are building on their experience assisting early adopters.
  – Some (but not all) are vendor- and product-agnostic.
  – They focus on identifying use cases, improving communication, and leveraging existing assets.
• Technology Maturation
  – The open source community and vendors are making Hadoop enterprise-ready and easier to use.
  – Better integration between Big Data and existing IT infrastructure.
  – Extending Big Data accessibility to business users via BI and data visualization tools.

[Graphic labels: Consulting; Training & Education; Integration]

Page 8

Big Data Growth Inhibitors

• Lack of Data Scientists and Big Data Practitioners
• Big Data Technology Still Complex, Difficult to Manage/Use
• Organizational Resistance to Data-Driven Decision Making
• Confusion Due to Vendor Marketing and “Big Data Washing”

[Graphic: “Big Data [Your Product Name Here]”]

Page 9

The Big Data Business Maturity Index and How to Identify Your Best Use Case (Bill Schmarzo)

Page 10

Big Data Business Model Maturation Index

Measures the degree to which the organization has integrated big data and advanced analytics into its business model. The stages, from least to most mature:

1) Business Monitoring: Monitor business performance to flag areas of interest.
2) Business Insights: Integrate insights & recommendations into existing business processes.
3) Business Optimization: Embed analytics to optimize select business processes.
4) Data Monetization: Leverage insights to identify new revenue opportunities.
5) Business Metamorphosis: Transform customer and product insights to move into new markets.

Page 11

How to Identify Your Best Use Case

The Big Data strategy document ensures a tight linkage between your organization’s business initiatives and your big data strategy. It captures:
• Big data business cases, ROI and analytic requirements
• Key Performance Indicators and leading metrics
• Business questions with metrics, dimensions, hierarchies
• Business decisions, decision flow/process and UEX requirements
• Analytic algorithms and modeling requirements
• Required data sources

Business Strategy: Provide Unique Starbucks Customer Experience

Business Initiatives:
• Increase number of “Gold Card” customers
• Increase “Gold Card” customer revenue & engagement (store visits, spend per visit, advocacy)

Outcomes & CSFs:
• Develop intimate knowledge of “Gold Card” customers’ life stage, behaviors and interests
• Act upon intimate knowledge of “Gold Card” customers to increase store revenues
• Expand customer data collection points
• Leverage “Gold Card” member transactions, feedback (surveys) and social data
• Integrate customer-specific insights back into operational, management and loyalty systems

Tasks:
• Collect customer engagement information through multiple channels (store, web, mobile)
• Profile and micro-segment customers to improve marketing and offer effectiveness
• Analyze social media data to identify and monitor brand advocates
• Monitor and adjust customer engagement effectiveness (visits, revenue, margin, advocacy)

Data sources: Mobile App, Social Media, Store Sales, Customer Loyalty

Page 12

Get Started With Hadoop and Other New Technologies (Bill Schmarzo)

Page 13

A Playbook For Modernizing Your Data Warehouse With New Big Data Technologies And Capabilities

#1) Enhance data warehouse with new unstructured data metrics

#2) Data virtualization to extend existing data warehouse environment

#3) MPP RDBMS to increase data platform scalability and agility

#4) In-database analytics to accelerate analytic development

#5) Hadoop to create the next-generation Operational Data Store

Page 14

#1) Enhance Data Warehouse With New Unstructured Data Metrics

Leverage HDFS to provide a single platform that supports both your traditional SQL-based BI environment and your growing unstructured data needs at scale.

[Diagram: Pivotal HD stack]
• Apache Hadoop: HDFS, HBase, MapReduce, Pig, Hive, Mahout, Sqoop, Flume; resource management & workflow with YARN and ZooKeeper
• Pivotal HD: Command Center (configure, deploy, monitor, manage), Hadoop Virtualization (HVE), DataLoader
• HAWQ – Advanced Database Services: ANSI SQL + analytics, query optimizer, dynamic pipelining, catalog services, Xtension Framework
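A common way to derive such metrics is a MapReduce job that scans raw text in HDFS and emits a small structured table the warehouse can join to its existing dimensions. The sketch below follows the Hadoop Streaming style, a mapper and a reducer exchanging tab-separated key/value pairs. The input format (a date, a tab, then free text), the keyword list, and all function names are hypothetical. Hadoop sorts mapper output by key before the reducer sees it, so `run_local()` simulates that step with `sorted()` for testing without a cluster.

```python
from itertools import groupby
from operator import itemgetter

KEYWORDS = {"outage", "refund", "slow"}  # hypothetical terms to track in raw text


def map_posts(lines):
    """Map step: for each raw 'date<TAB>text' line, emit ('date<TAB>keyword', 1) per keyword found."""
    for line in lines:
        date, _, text = line.rstrip("\n").partition("\t")
        words = {word.strip(".,!?").lower() for word in text.split()}
        for keyword in sorted(KEYWORDS & words):
            yield f"{date}\t{keyword}", 1


def reduce_counts(pairs):
    """Reduce step: sum the counts for each key; pairs must arrive sorted by key."""
    for key, group in groupby(pairs, key=itemgetter(0)):
        date, keyword = key.split("\t")
        yield date, keyword, sum(count for _, count in group)


def run_local(lines):
    """Simulate Hadoop's shuffle-and-sort locally so the job can be tested without a cluster."""
    return list(reduce_counts(sorted(map_posts(lines))))


if __name__ == "__main__":
    sample = ["2013-03-01\tSite outage again, want a refund!", "2013-03-02\tApp is slow"]
    for date, keyword, count in run_local(sample):
        print(f"{date},{keyword},{count}")  # CSV rows to bulk-load into a warehouse table
```

The output is deliberately small and tabular, so the raw text stays in HDFS while only the derived metric moves into the SQL-based BI environment.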

Page 15

#2) Extend Existing Data Warehouse Via Data Virtualization

Leverage data federation tools to speed data discovery and analysis via virtual, on-demand access to data sources outside your EDW.

[Diagram components: multiple data sources; ETL; cached streaming data; data federation tool (semantic master, data discovery / data mapping); unified data platform; CEP/workflow; real-time visualization; advanced analytics and modeling]
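Data federation tools are commercial products with their own query engines, but the core idea, one query over sources that stay where they are, can be illustrated with SQLite's `ATTACH DATABASE` statement through Python's standard `sqlite3` module. This is an analogy rather than a federation tool: both "sources" are in-memory SQLite databases, and the table names and rows are made up for the example.

```python
import sqlite3


def revenue_by_segment():
    """Join warehouse sales with an attached, external CRM source in a single query."""
    conn = sqlite3.connect(":memory:")  # stand-in for the EDW
    # Stand-in for a source outside the EDW: attached in place, not copied in by ETL.
    # Attach up front, before any writes open a transaction.
    conn.execute("ATTACH DATABASE ':memory:' AS crm")

    conn.execute("CREATE TABLE sales (customer_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 120.0), (2, 80.0), (1, 30.0)])
    conn.execute("CREATE TABLE crm.customers (customer_id INTEGER, segment TEXT)")
    conn.executemany("INSERT INTO crm.customers VALUES (?, ?)", [(1, "gold"), (2, "standard")])

    # One query spans both databases; no job materializes the joined data first.
    rows = conn.execute(
        """
        SELECT c.segment, SUM(s.amount)
        FROM sales AS s
        JOIN crm.customers AS c ON c.customer_id = s.customer_id
        GROUP BY c.segment
        ORDER BY c.segment
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(revenue_by_segment())
```

A real federation layer adds what this sketch omits: connectors to heterogeneous systems, a semantic layer that maps source schemas to business terms, and caching for sources that are slow to query.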

Page 16

#3) Massively Parallel Processing (MPP) Relational Databases

• Massively Parallel Processing (MPP), scale-out architectures provide cost-effective options for managing and analyzing massive data volumes.
• MPP data warehouses provide linear scalability on general-purpose, commodity systems (e.g., a fault-tolerant scale-out environment, automatic parallelization, I/O optimization).
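The scalability claim rests on two mechanics: rows are hash-distributed across segments on a distribution key, and each segment works only on its own slice before the results are gathered. The Python sketch below models that flow under simplifying assumptions. The segment count, the `crc32` hash, and the function names are hypothetical, and the threads only stand in for segment servers; a real MPP database runs each segment as a separate process, usually on separate hardware.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

NUM_SEGMENTS = 4  # hypothetical number of segment servers


def distribute(rows, segments=NUM_SEGMENTS):
    """Hash-distribute (key, amount) rows across segments on the key column."""
    buckets = [[] for _ in range(segments)]
    for key, amount in rows:
        # crc32 gives a hash that is stable across runs, unlike Python's salted hash() for str.
        buckets[crc32(str(key).encode()) % segments].append((key, amount))
    return buckets


def local_aggregate(bucket):
    """Work done on one segment: total the amounts per key within its own slice only."""
    totals = Counter()
    for key, amount in bucket:
        totals[key] += amount
    return totals


def parallel_sum(rows):
    """Scatter rows to segments, aggregate each slice concurrently, then gather the partials."""
    buckets = distribute(rows)
    with ThreadPoolExecutor(max_workers=len(buckets)) as pool:
        partials = list(pool.map(local_aggregate, buckets))
    # Adding partials together is correct in general; because rows were distributed
    # on the grouping key, each key also appears in exactly one partial here.
    result = Counter()
    for partial in partials:
        result.update(partial)
    return dict(result)
```

Choosing the distribution key well matters: a key with a few very common values piles rows onto one segment, and that segment then limits how fast the whole query can finish.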

Page 17

#4) In-Database Processing And Analytics

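Conventional: A data scientist moves 1 TB of data from a 5-processor database server to a separate analytical server over a 1 gigabit-per-second (Gbps) link, then runs the algorithm there. Using decimal units (1 TB = 1,000 GB, 8 bits per byte), moving the data takes:

$$t_{\text{move}} = \frac{1\ \text{TB} \times 8\ \text{bits/byte}}{1\ \text{Gbps} \times 60\ \text{s/min}} = \frac{8000\ \text{Gb}}{60\ \text{Gb/min}} \approx 133.3\ \text{minutes}$$

With 60 minutes of processing on the analytical server, the total is:

$$t_{\text{total}} = t_{\text{move}} + t_{\text{process}} = 133.3 + 60 = 193.3\ \text{minutes}$$

In-Database: The data scientist leaves the 1 TB of data in the 5-processor database server and runs the same algorithm directly in the database, which takes 12 minutes in total.

[Chart: total time in minutes, Conventional vs. In-Database]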
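The timing comparison reduces to a few lines of arithmetic. This sketch reproduces the conventional figures using decimal units (1 TB = 1,000 GB, 8 bits per byte); the function names are illustrative. The slide does not say how the 12-minute in-database time was derived. The sketch assumes it is the 60-minute processing time divided evenly across the five processors, which matches the figure but remains an assumption.

```python
def transfer_minutes(data_tb: float, link_gbps: float) -> float:
    """Minutes to move data over a link, in decimal units (1 TB = 1,000 GB, 8 bits per byte)."""
    gigabits = data_tb * 1000 * 8
    return gigabits / link_gbps / 60  # seconds -> minutes


def conventional_minutes(data_tb: float, link_gbps: float, processing_min: float) -> float:
    """Move the data to an analytical server first, then process it there."""
    return transfer_minutes(data_tb, link_gbps) + processing_min


def in_database_minutes(processing_min: float, processors: int) -> float:
    """ASSUMPTION: no data movement and perfect parallelism across the database's processors."""
    return processing_min / processors


if __name__ == "__main__":
    print(f"Conventional: {conventional_minutes(1, 1, 60):.1f} min")
    print(f"In-database:  {in_database_minutes(60, 5):.1f} min")
```

Varying the inputs shows which term dominates: at 10 Gbps the transfer drops to about 13 minutes, so the advantage of in-database processing then comes mostly from parallel execution rather than avoided data movement.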

Page 18

#5) Next Gen Operational Data Store/Data Prep With Hadoop

The Hadoop data store feeds the production BI and Enterprise Data Warehouse environment as well as a high-velocity Analytic Sandbox.

• Hadoop Data Store: ALL data is fed into it for data preparation and enrichment.
• EDW (via ETL) and BI Environment: production; predictable load; SLA-driven; standard tools.
• Analytic Sandbox: exploratory, ad hoc; unpredictable load; experimentation; best tool for the job.
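One way to read this architecture is as a single preparation pass with two consumers: the sandbox receives all of the enriched data, while the EDW receives only rows that pass validation, projected onto a fixed schema. The sketch below is a hypothetical illustration of that split in plain Python. The record fields, the store-to-region lookup, and the validation rules are invented for the example; in practice this step would run as a Hadoop job (for example in Hive or Pig) rather than in a single process.

```python
from dataclasses import dataclass, field


@dataclass
class PreparedData:
    edw_rows: list = field(default_factory=list)      # validated rows on a fixed schema
    sandbox_rows: list = field(default_factory=list)  # all enriched records, kept as-is


def prepare(raw_records, store_regions):
    """Hypothetical prep/enrichment pass over records landed in the Hadoop data store."""
    out = PreparedData()
    for record in raw_records:
        # Enrichment: add a region looked up from reference data.
        enriched = dict(record, region=store_regions.get(record.get("store_id"), "unknown"))
        out.sandbox_rows.append(enriched)  # exploratory users get everything
        amount = record.get("amount")
        valid = isinstance(amount, (int, float)) and amount >= 0 and enriched["region"] != "unknown"
        if valid:
            # Production BI gets only clean rows in a predictable shape.
            out.edw_rows.append((record["store_id"], enriched["region"], float(amount)))
    return out
```

Keeping the rejected rows in the sandbox, rather than discarding them, lets analysts study the very records that production rules filter out.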

Page 19

How To Get Started

Page 20

EMC Big Data Analytics Strategy And Implementation Services

• Vision Workshop: Identify big data analytics business use cases.
• Analytics Lab: Deploy an analytics sandbox to quantify the business case.
• Analytics Operationalization: Identify the current state, determine the required state and conduct a gap analysis to develop an analytics implementation roadmap.
• Repeat the process for the identified business cases.

Page 21

What Should You Look For in a Vendor? (Jeff Kelly)

Page 22

Advice for Selecting Big Data Vendors

• Balance short-term goals with long-term vision. The objectives are:
  – Quick, demonstrable ROI.
  – A sustainable Big Data practice.
• Don’t get hung up on “speeds and feeds” or feature-by-feature comparisons.
• Focus on substance, flexibility, commitment and experience.

Page 23

Selecting Big Data Vendors, Cont.

• Evaluate product portfolios based on their:
  – Ability to monetize existing and future data assets.
  – Ability to integrate with and complement existing data management technology.
  – Accessibility to power users and business users alike (depending on use case).
  – Ability to apply information governance and security best practices.
• Select service providers with track records of helping enterprises adopt a data-driven culture as well as technology.

Page 24

Questions and Answers

To type a question via WebEx, click the Q&A tab, then select “Ask: All Panelists” to ensure your questions reach us. Thank you!

Page 25

Learn More…

• See us at:
  – EMC World, May 5-9, www.emc.world.com
• Contact Jeff Kelly
  – Email: [email protected]
  – LinkedIn: http://www.linkedin.com/in/jeffreyfkelly/
  – Twitter: @jeffreyfkelly
  – Research: http://www.wikibon.org/bigdata
• Contact Bill Schmarzo
  – Email: [email protected]
  – LinkedIn: http://www.linkedin.com/in/schmarzo
  – Twitter: @schmarzo
  – Blog: http://infocus.emc.com/author/william_schmarzo/