De-Mystifying Big Data
Prasad Mavuduri
American Institute of Big Data Professionals
Jan 28, 2015
RIGHT FOCUS AND ON TARGET
Agenda
Analyze & Define
• Progression of Analytics
• The new phenomenon - Big Data
• Big Data Defined
Technology Discussion
• Big Data Technology – Hadoop
• Big Data – Big Savings – Hadoop
Use Cases
• What can we solve with Big Data – example
• What is next? Where are the opportunities?
Progression of Analytics
Structured – Known Data
Traditional – ETL, Data Marts, DW, RDBMS
Growth – Normal Incremental – Archive
Less Cross Functional Integration
More Tactical than Strategic
Sizes GBs to TBs
Data Architects vs. Functional
So Far…..
The new phenomenon - Big Data
Growing Pains ??!!!
Big Data ?!!!
Is it just data ?
The new phenomenon - Big Data
1. No to “fit-for-all” but yes to “fit-for-purpose”
2. Proliferation of data sources – variety of data
3. Proliferation of the volume of data
4. Demand for speed (velocity) of data
5. Demand for high value and accuracy (veracity) of information
6. Massive parallel processing
7. Commodity servers vs. specialized servers
DATA DRIVEN BUSINESS is THE SMART BUSINESS
Big Data Definition
• High volume of data, growing by more than 50% every year
• High-speed streaming data, machine-generated data, etc.
• Different data sources: in-the-enterprise data and external data around the enterprise
• Collected data that takes up huge storage (typically 100 TB or more), where an RDBMS is inefficient
Diagram labels: Volume, Velocity, Variety, Value, Veracity, Meaningful
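To put the growth rate of more than 50% per year quoted above into perspective, here is a minimal sketch of the compounding arithmetic. The 10 TB starting size and the function names are illustrative assumptions, not figures from the deck; the 100 TB threshold is the one mentioned above.

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years needed for data volume to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


def years_to_reach(start_tb: float, target_tb: float, annual_growth: float) -> int:
    """Whole years of compounding until start_tb grows to at least target_tb."""
    years = 0
    size = start_tb
    while size < target_tb:
        size *= 1 + annual_growth
        years += 1
    return years


if __name__ == "__main__":
    # 0.50 is the 50% yearly growth from the slide; 10 TB is a hypothetical start.
    print(round(doubling_time_years(0.50), 2))
    print(years_to_reach(10, 100, 0.50))
```

Under these assumptions the volume doubles roughly every 1.71 years, and a 10 TB data set passes 100 TB after 6 years of compounding.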
Big Data Definition
Big Data is the new art and science of collecting, storing, processing, distributing, and analyzing data that has any of the attributes of high volume, high velocity, or high variety, using Massive Parallel Processing (MPP) technology, in order to extract high value and greater accuracy (veracity).
IBM says Big Data means:
1. Volume (Terabytes -> Zettabytes)
2. Variety (Structured -> Semi-structured -> Unstructured)
3. Velocity (Batch -> Streaming Data)
Big Data Technologies – Typical Stack
Big Data Infrastructure
Data Manipulation & Management
Data Analysis & Mining
Predictive & Prescriptive Analysis
Process Automation & Decision Support Systems
Big Data Stack
Big Data Technologies – SMAQ
SMAQ Stack (Storage, MapReduce and Query)
Query: user-friendly analytics
1. Pig (simple query language)
2. Hive (similar to SQL)
3. Cascading (workflow)
4. Mahout (machine learning)
5. ZooKeeper (coordination service)
MapReduce: data distribution and management across nodes in batch mode
1. Hadoop MapReduce
2. Alternatives – BashReduce, Disco Project, Spark, GraphLab (Carnegie Mellon), Storm, HPCC (LexisNexis)
Storage: distributed, non-relational
1. HBase (columnar DB)
2. HDFS – Hadoop Distributed File System
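The MapReduce layer of the stack above can be illustrated with a word count, the usual introductory example. This is a minimal single-process sketch in plain Python, not the Hadoop API: the function names and the sample lines are assumptions made for illustration, and a real cluster would run the map and reduce steps in parallel across nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: collapse the list of values for each key into one count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    # Hypothetical input; on a cluster each line could live on a different node.
    sample = ["big data big savings", "big data defined"]
    print(word_count(sample))
```

Because each call to map_phase depends only on its own line, and each key is reduced independently, both steps can be spread over many commodity servers, which is the property the batch-mode layer relies on.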
Big Data – Big Savings – Economics
ROI on the Big Data approach (with Hadoop)
TCO for 1 TB of data:
• Traditional RDBMS: $37,000
• Hadoop: only $2,000
Source: American Institute for Analytics
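The two per-terabyte TCO figures above can be turned into a quick comparison. The sketch below is a back-of-the-envelope calculation only: it assumes cost scales linearly with data size, which the slide does not claim, and the function names are hypothetical.

```python
# Per-terabyte TCO figures quoted on the slide (USD).
RDBMS_TCO_PER_TB = 37_000
HADOOP_TCO_PER_TB = 2_000


def cost_ratio() -> float:
    """How many times larger the RDBMS figure is than the Hadoop figure."""
    return RDBMS_TCO_PER_TB / HADOOP_TCO_PER_TB


def savings(terabytes: float) -> float:
    """TCO difference for a data set, ASSUMING cost grows linearly with size."""
    return terabytes * (RDBMS_TCO_PER_TB - HADOOP_TCO_PER_TB)


if __name__ == "__main__":
    print(cost_ratio())
    print(savings(100))
```

With the quoted figures the ratio is 18.5, and under the linear assumption a 100 TB data set would show a difference of $3,500,000.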
Where is the market on Big Data
Infrastructure / Framework / Analytics software
Horizontal Solutions like EDW etc
Health Care
Retail Industry
Government / Public Sector
Education & Human Capital
Health Sciences / Genomics
Telecommunications / Services
Energy & Utilities
E-Commerce / Marketing
Media & Entertainment
[Chart: Big Data Market in $B, 2010 to 2015, vertical axis 0 to 16, with a "Current State" marker. Source: IDC 2011]
Integrated Big Data Implementation - Architecture
Coexistence of Big Data with existing EDW
Diagram components:
• Data sources: Web Logs, Images & Videos, Social Media, Documents, Structured Data
• Platforms: Big Data / Hadoop etc. and Existing EDW, with Connectors / Adapters
• Analytics: Reporting, OLAP, Modeling, Predictive, Prescriptive
Pure Big Data Implementation - Architecture
Pure Big Data
Diagram components:
• Data sources: Web Logs, Images & Videos, Social Media, Documents, Structured Data
• Platform: Big Data / Hadoop etc., with Connectors / Adapters
• Analytics: Reporting, OLAP, Modeling, Predictive, Prescriptive
HADOOP / Big Table can replace traditional EDWs!
Barriers
• Disruption to existing analytics?
• Roadmap / Methodology
• Certainty of costs
Big Data Landscape
Applied BIG Data
BIG Data Opportunities
Some Gaps & opportunities
• Real-time analysis (possibly using SAP HANA etc.)
• User interface (UI) frameworks
• App development for Big Data on the cloud (multi-tenancy)
• Security & Data Governance
• Cross-Application Integration
• Industry Standards
AIBDP – Contribution to Big Data
Our Big Data Strategy at a glance
Business Focus
• Identify data needs
• Identify business issues
• Lay out data dependencies between functions
• Resolve competing priorities
• Clearly lay out the levels of data and cross-functional requirements
Stakeholder Focus
• Identify the stakeholders
• Align best practices with the project
• Plan out the objectives, scope, and timelines
• Identify the KPIs, reports, dashboards, and predictive & prescriptive analysis to be delivered
Technology Focus
• Synergies in current technology
• Take stock of existing “technology assets” towards Big Data
• Assess your current capabilities and architecture
• Identify the resources and minimize “specialties” to exploit synergies with the existing resource pool
• Lay out a development methodology to streamline delivery
Process Focus
• Establish clear data flows
• Identify the Data Governance execution process – people, processes, mechanisms
• Design the process to be more business focused than IT focused
• Clearly establish measures to achieve – accuracy, repeatability, agility, and accountability (reconcilability)
Our Execution Approach – AGILE methodology
Agile Approach to reduce risks
• Close coordination between the customer and the developer
• Small incremental steps make testing easier and more manageable, and avoid surprises
• Early recovery from expectation mismatches
• Clarity on design understanding through regular communication with the user
• Early warning about risks through regular status reports
• Full Knowledge Transfer
Thank You !!
Please contact us for any enquiries at:
Prasad [email protected] 828 9909
Q & A