De-Mystifying Big Data Prasad Mavuduri American Institute of Big Data Professionals
"Demystifying Big Data by AIBDP.org

Jan 28, 2015

Transcript
Page 1: "Demystifying Big Data by AIBDP.org

De-Mystifying Big Data Prasad Mavuduri

American Institute of Big Data Professionals

Page 2: "Demystifying Big Data by AIBDP.org

RIGHT FOCUS AND ON TARGET

Agenda

Analyze & Define
• Progression of Analytics
• The new phenomenon - Big Data
• Big Data Defined

Technology Discussion
• Big Data Technology – Hadoop
• Big Data – Big Savings – Hadoop

Use Cases
• What can we solve with Big Data – example
• What is next? Where are the opportunities?

Page 3: "Demystifying Big Data by AIBDP.org

Progression of Analytics

Structured – Known Data

Traditional – ETL, Data Marts, DW, RDBMS

Growth – Normal Incremental – Archive

Less Cross Functional Integration

More Tactical than Strategic

Sizes GBs to TBs

Data Architects vs. Functional

So Far…..

Page 4: "Demystifying Big Data by AIBDP.org

The new phenomenon - Big Data

Growing Pains ??!!!

Big Data ?!!!

Is it just data ?

Page 5: "Demystifying Big Data by AIBDP.org

The new phenomenon - Big Data

1. No to “fit-for-all” but Yes to “fit-for-purpose”
2. Proliferation of data sources – variety of data
3. Proliferation of volume of data
4. The demand for the speed (velocity) of data
5. Demand for high value & accuracy (veracity) of info
6. Massive Parallel Processing
7. Commodity servers vs. specialized servers

DATA DRIVEN BUSINESS is THE SMART BUSINESS

Page 6: "Demystifying Big Data by AIBDP.org

Big Data Definition

• High volume of data, growing by more than 50% every year
• High-speed streaming, machine-generated data, etc.
• Different data sources: in-the-enterprise data and external data around the enterprise
• Collected data taking up huge amounts of storage (typically 100 TB or more), where an RDBMS is inefficient

[Diagram labels: Volume, Velocity, Variety, Value, VERACITY; Meaningful]
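As a rough illustration of the growth figure on this slide (more than 50% per year), the sketch below estimates how many years a data set takes to cross the 100 TB mark. The function name `years_to_reach`, the 10 TB starting size, and the assumption of steady compound growth are hypothetical and are not taken from the slides.

```python
def years_to_reach(start_tb, target_tb, annual_growth=0.5):
    """Count whole years of compound growth until start_tb reaches target_tb.

    annual_growth=0.5 mirrors the "more than 50% every year" figure;
    steady compounding is an assumption made for illustration only.
    """
    if start_tb <= 0 or annual_growth <= 0:
        raise ValueError("start_tb and annual_growth must be positive")
    years = 0
    size = float(start_tb)
    while size < target_tb:
        size *= 1 + annual_growth  # one more year of growth
        years += 1
    return years


if __name__ == "__main__":
    # Hypothetical starting point: 10 TB today, 100 TB threshold from the slide.
    print(years_to_reach(10, 100))  # prints 6
```

Under these assumptions, a 10 TB data set passes 100 TB in its sixth year, which suggests why the slide treats volume growth as a planning concern rather than a distant one.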

Page 7: "Demystifying Big Data by AIBDP.org

Big Data Definition

VERACITY

Big Data is the new art and science of collecting, storing, processing, distributing, and analyzing data, using Massive Parallel Processing (MPP) technology, where the data has any of the attributes high volume, high velocity, or high variety, in order to extract high value and greater accuracy (veracity).

IBM says BIG DATA means:
1. Volume (Terabytes -> Zettabytes)
2. Variety (Structured -> Semi-structured -> Unstructured)
3. Velocity (Batch -> Streaming Data)

Page 8: "Demystifying Big Data by AIBDP.org

Big Data Technologies – Typical Stack

Big Data Infrastructure

Data Manipulation & Management

Data Analysis & Mining

Predictive & Prescriptive Analysis

Process Automation & Decision Support Systems

Big Data Stack

Page 9: "Demystifying Big Data by AIBDP.org

Big Data Technologies – SMAQ

Query: User-friendly Analytics
1. Pig (simple query language)
2. Hive (similar to SQL)
3. Cascading (workflow)
4. Mahout (machine learning)
5. ZooKeeper (coordination service)

MapReduce: Data Distribution & Management across nodes in Batch Mode
1. Hadoop MapReduce
2. Alternatives – BashReduce, Disco Project, Spark, GraphLab (C&M), Storm, HPCC (LexisNexis)

Storage: Distributed Non-Relational
1. HBase (columnar DB)
2. HDFS – Hadoop Distributed File System

SMAQ Stack
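To make the MapReduce layer of the stack more concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps, using word counting as the example. It does not use Hadoop or any of the tools listed on the slide; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample records are hypothetical, and a real cluster would run the map and reduce steps in parallel across nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single total."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle, and reduce in sequence over in-memory records."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    sample = ["big data big savings", "big data defined"]  # hypothetical input
    print(word_count(sample))
```

Because each record is mapped independently and each key is reduced independently, both steps can be spread over many commodity servers, which is the property the batch-mode layer relies on.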

Page 10: "Demystifying Big Data by AIBDP.org

Big Data – Big Savings – Economics

ROI on Big Data Approach (with Hadoop)

TCO of 1 TB: $37,000 with a traditional RDBMS; only $2,000 with Hadoop !!!!

Source: American Institute for Analytics
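The per-terabyte figures above can be turned into a quick comparison. The sketch below is only arithmetic on the two quoted numbers; the function name `compare_tco` and the assumption that cost scales linearly with volume are hypothetical and are not claimed by the source.

```python
RDBMS_TCO_PER_TB = 37_000   # USD per TB, traditional RDBMS (figure quoted on the slide)
HADOOP_TCO_PER_TB = 2_000   # USD per TB, Hadoop (figure quoted on the slide)


def compare_tco(terabytes):
    """Compare total cost of ownership, assuming cost scales linearly with volume."""
    if terabytes < 0:
        raise ValueError("terabytes must be non-negative")
    rdbms = terabytes * RDBMS_TCO_PER_TB
    hadoop = terabytes * HADOOP_TCO_PER_TB
    return {
        "rdbms": rdbms,
        "hadoop": hadoop,
        "savings": rdbms - hadoop,
        # The ratio depends only on the per-TB figures, not on the volume.
        "ratio": RDBMS_TCO_PER_TB / HADOOP_TCO_PER_TB,
    }


if __name__ == "__main__":
    # 100 TB is the volume the deck uses as a typical Big Data threshold.
    print(compare_tco(100))
```

On the quoted figures the RDBMS cost is 18.5 times the Hadoop cost, so at 100 TB the difference would be $3.5 million if the linear-scaling assumption held.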

Page 11: "Demystifying Big Data by AIBDP.org

Where is the market on Big Data

Infrastructure / Framework / Analytics software

Horizontal Solutions like EDW etc

Health Care
Retail Industry
Government / Public Sector
Education & Human Capital
Health Sciences / Genomics
Telecommunications / Services
Energy & Utilities
E-Commerce / Marketing
Media & Entertainment

[Chart: Big Data Market in $B, 2010 to 2015, y-axis 0 to 16, "Current State" marked. Source: IDC 2011]

Page 12: "Demystifying Big Data by AIBDP.org

Integrated Big Data Implementation - Architecture

Coexistence of Big Data with existing EDW

[Diagram] Sources: Web Logs; Images & Videos; Social Media; Documents; Structured Data. Components: Big Data / Hadoop etc.; Connectors / Adapters; Existing EDW. Outputs: Prescriptive; Predictive; Reporting; OLAP; Modeling.

Page 13: "Demystifying Big Data by AIBDP.org

Pure Big Data Implementation - Architecture

Pure Big Data

[Diagram] Sources: Web Logs; Images & Videos; Social Media; Documents; Structured Data. Components: Big Data / Hadoop etc.; Connectors / Adapters. Outputs: Prescriptive; Predictive; Reporting; OLAP; Modeling.

Barriers
• Disruption to existing analytics ?!
• Roadmap / Methodology
• Certainty of costs

HADOOP / Big Table can replace traditional EDWs !!

Page 14: "Demystifying Big Data by AIBDP.org

Big Data Landscape

Page 15: "Demystifying Big Data by AIBDP.org

Big Data Landscape

Page 16: "Demystifying Big Data by AIBDP.org

Applied BIG Data

Page 17: "Demystifying Big Data by AIBDP.org

BIG Data Opportunities

Some Gaps & opportunities

• Real-time analysis (maybe using SAP HANA etc. !!)
• User interface (UI) frameworks
• App development for Big Data on Cloud (multi-tenancy)
• Security & Data Governance
• Cross-Application Integration
• Industry Standards

Page 18: "Demystifying Big Data by AIBDP.org

AIBDP – Contribution to Big Data

Page 19: "Demystifying Big Data by AIBDP.org

Our Big Data Strategy at a glance

Business Focus
• Identify data needs
• Identify business issues
• Lay out data dependencies between functions
• Resolve competing priorities
• Clearly lay out the levels of data and cross-functional requirements

Stakeholder Focus
• Identify the stakeholders
• Align best practices with the project
• Plan out the objectives, scope, and timelines
• Identify the KPIs, Reports, Dashboards, Predictive & Prescriptive Analysis to be delivered

Technology Focus
• Synergies in current technology
• Take stock of existing “technology assets” towards Big Data
• Assess your current capabilities and architecture
• Identify the resources and minimize “specialties” to exploit synergies with the existing resource pool
• Lay out a development methodology to streamline delivery

Process Focus
• Establish clear data flows
• Identify Data Governance execution process – People, Processes, Mechanisms
• Design the process to be more Business focused than IT
• Clearly establish measures to achieve – Accuracy, Repeatability, Agility, and Accountability (reconcilability)

Page 20: "Demystifying Big Data by AIBDP.org

Our Execution Approach – AGILE methodology

Agile Approach to reduce risks

• Close coordination between the customer and the developer

• Small incremental steps make testing easier and more manageable, and avoid surprises

• Early recovery from expectation mismatch

• Clarity on Design understanding and regular communication with user.

• Early warning about risks through regular status reports.

• Full Knowledge Transfer

Page 21: "Demystifying Big Data by AIBDP.org

Thank You !!

Please contact us for any enquiries at:

Prasad [email protected] 828 9909

Q & A