BIG DATA = BIG DECISIONS | Bob Zurek | SVP Products | Epsilon | www.epsilon.com

Big Data = Big Decisions

Jan 24, 2015


Technology

InnoTech

Presented on April 17th for InnoTech Dallas.
Transcript
Page 1: Big Data = Big Decisions

BIG DATA = BIG DECISIONS

Bob Zurek | SVP Products | Epsilon | www.epsilon.com

Page 2: Big Data = Big Decisions

BIG DATA APPROACHING

Page 3: Big Data = Big Decisions

Consider the following:
• New model for data
• Accessible over TCP/IP and a variety of languages
• Initially difficult to understand
• Capable of processing thousands of ops/sec
• Very different from the old model
• Threatening, as much was invested in the old model
• Changing course seems ridiculous

Source: Eben Hewitt

Page 4: Big Data = Big Decisions

What are we talking about?

Page 5: Big Data = Big Decisions

Source: IBM

IBM IMS

“IMS is IBM's premier transaction and hierarchical database management system, virtually unsurpassed in database and transaction processing availability and speed” – IBM 2013

“Mission-critical processing that requires unparalleled performance is best served by a hierarchical model. Analytics and business intelligence are best served by a relational model. Most Fortune 100 companies use both.”

Page 6: Big Data = Big Decisions

A New Model Is Invented

A Disruptive Model

A Threatening Model

A Competitive Model

Data evolution

Source: Eben Hewitt

Page 7: Big Data = Big Decisions

A HUGE industry success

The relational model & SQL

Page 8: Big Data = Big Decisions

So now what?

Page 9: Big Data = Big Decisions

We have a problem

Page 10: Big Data = Big Decisions

confusion

innovation

Sound familiar?

complexity

disruption

a new model

fierce competition

Page 11: Big Data = Big Decisions

Source: McKinsey

Big data – a growing torrent

$600 to buy a disk drive that can store all of the world’s music

5 billion mobile phones in use in 2010

30 billion pieces of content shared on Facebook every month

40% projected growth in global data generated per year vs. 5% growth in global IT spending

235 terabytes of data collected by the U.S. Library of Congress by April 2011

15 out of 17 sectors in the United States have more data stored per company than the U.S. Library of Congress

Page 12: Big Data = Big Decisions
Page 13: Big Data = Big Decisions

What is big data, exactly?

Industry buzz

Page 14: Big Data = Big Decisions

Big data confusion?

Source: IBM

What do business executives think “big data” is?

• A greater scope of information: 18%
• New kinds of data and analysis: 16%
• Real-time information: 15%
• Data influx from new technologies: 13%
• Non-traditional forms of media: 13%
• Large volumes of data: 10%
• The latest buzzword: 8%
• Social media data: 7%

Page 15: Big Data = Big Decisions

Source: McKinsey

Big data is…

Large pools of data that can be captured, communicated, aggregated, stored, and analyzed

Page 16: Big Data = Big Decisions

Source: TDWI

Another way of looking at it

Page 17: Big Data = Big Decisions

Is it time to look for an alternative?

Page 18: Big Data = Big Decisions

It’s not that simple,

is it?

Page 19: Big Data = Big Decisions

How have we solved it (historically)?

• Vertical scaling = throw hardware at it
• Optimize the application = SQL, indexes, access
• Employ caching layers = Memcached, Coherence
• Denormalization = reduce joins
• Sharding/Shared Nothing = split the data up
• Innovation = columnar
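
Of the approaches on this slide, sharding is the easiest to show in a few lines. The following is a minimal sketch, not taken from the presentation: it assumes simple hash-based routing, and the names `shard_for`, `put`, and `get` are hypothetical. Each in-memory dict stands in for a separate database server.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Hypothetical shards: in practice each would be its own database server
# with no shared disk or memory ("shared nothing").
shards = [dict() for _ in range(4)]

def put(key: str, value: dict) -> None:
    """Write the record to the one shard that owns this key."""
    shards[shard_for(key, len(shards))][key] = value

def get(key: str):
    """Read the record back from the shard that owns this key."""
    return shards[shard_for(key, len(shards))].get(key)

put("customer:1001", {"name": "Ada"})
put("customer:1002", {"name": "Grace"})
print(get("customer:1001"))
```

One design consequence worth noting: with plain modulo routing, changing the number of shards remaps most keys, which is part of why sharding a relational database by hand is operationally painful.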

Page 20: Big Data = Big Decisions

What’s driving change and innovation?

Page 21: Big Data = Big Decisions

102556397

Page 22: Big Data = Big Decisions
Page 23: Big Data = Big Decisions

Doug Cutting = Nutch

Google = GFS and GMR

A search engine project at Yahoo

Big data innovation incubated

Page 24: Big Data = Big Decisions

“Hadoop is an amazing technology stack. We now depend on it to run eBay.”

Bob Page, Vice President of Analytics, eBay

Source: http://www.wired.com/wiredenterprise/2011/10/how-yahoo-spawned-hadoop/

eBay erected a Hadoop cluster spanning 530 servers – now five times the size!

Page 25: Big Data = Big Decisions

It can get complex and confusing

“It replaced our need for ETL”

“It is great for batch processing in parallel”

“A beautiful platform for all of our problems”

Page 26: Big Data = Big Decisions

What it’s not good for

• High volume transactional data

• Structured data with low latency

“Note that Hadoop is not an Extract-Transform-Load (ETL) tool. It is a platform that supports running ETL processes in parallel. The data integration vendors do not compete with Hadoop; rather, Hadoop is another channel for use of their data transformation modules.”

Teradata/Cloudera Presentation

Page 27: Big Data = Big Decisions

What it’s really good for

• Index building

• Pattern recognition

• Sentiment analysis

• Machine generated data

• Log processing

• Web scale = Google, Twitter, YouTube
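
Log processing is a good illustration of why these workloads suit batch processing in parallel. The following is a minimal sketch in plain Python, not Hadoop's API: the log format, the sample lines, and the function names are assumptions made for illustration. It counts HTTP status codes using the map, shuffle, and reduce steps that a MapReduce job would distribute across a cluster.

```python
from collections import defaultdict

# Hypothetical log lines in the form "<timestamp> <status> <path>".
LOG_LINES = [
    "2013-04-17T10:00:01 200 /home",
    "2013-04-17T10:00:02 404 /missing",
    "2013-04-17T10:00:03 200 /home",
    "2013-04-17T10:00:04 500 /checkout",
]

def map_phase(line):
    """Emit one (status, 1) pair per log line."""
    _, status, _ = line.split(" ", 2)
    yield status, 1

def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Sum the counts collected for one status code."""
    return key, sum(values)

def run_job(lines):
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, vs) for k, vs in grouped.items())

status_counts = run_job(LOG_LINES)
print(status_counts)
```

Because each line is mapped independently and each key is reduced independently, both phases can be split across many machines; here everything runs in one process only to keep the sketch self-contained.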

Page 28: Big Data = Big Decisions

Use Cases

Online Travel Reservations

Mobile Data

E-Commerce

Energy Discovery

Energy Savings

Infrastructure Management

Image Processing

Fraud Detection

IT Security

HealthCare

Analyze machine generated data

Semantic analysis for relevance

Suggest ways customers save money

Spot fraud anomalies

Process mobile data

Large marketplaces

Sort and process seismic data

Detecting patterns in satellite imagery

Travel booking

Collecting device logs

Page 29: Big Data = Big Decisions

Source: Teradata/Cloudera

Page 30: Big Data = Big Decisions

Source: Teradata/Cloudera

Page 31: Big Data = Big Decisions

Many shades of grey and lots of great innovations

Page 32: Big Data = Big Decisions

Relational is still in play

Some innovations worth a look

Dynamically Scaling OLTP = “No Need To Shard”

Page 33: Big Data = Big Decisions

The NoSQL generation

• Document Storage Model
• Allows MTV to store hierarchical data
• Flexible schema to model structure/data by brand
• Needed to have ability to query nested content
• No need for a shared disk storage

• Released by NSA to open source
• Apache Accumulo
• Based on Google Bigtable
• Built on top of Hadoop
• Fine-grained access control
• Cell level security
• Server side programming

Page 34: Big Data = Big Decisions

Why NoSQL?

• Schemaless model = Easy to add fields
• Document oriented = JSON format (think objects)
• Built from the ground up to be distributed
• Auto sharding
• Distributed querying capabilities
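
To make the schemaless, document-oriented points concrete, here is a minimal sketch in plain Python rather than any particular NoSQL product's API. The documents, field names, and the helpers `get_path` and `find` are all hypothetical. It shows two JSON documents with different fields living in the same collection, and a query on a nested field.

```python
import json

# Hypothetical documents: the second carries fields the first does not,
# and no schema change was needed to store it.
RAW_DOCS = [
    '{"id": 1, "brand": "A", "show": {"title": "First", "genre": "music"}}',
    '{"id": 2, "brand": "B", "show": {"title": "Second", "genre": "reality",'
    ' "host": "Sam"}, "tags": ["new"]}',
]
docs = [json.loads(raw) for raw in RAW_DOCS]

def get_path(doc, path, default=None):
    """Follow a dotted path such as 'show.genre' through nested dicts."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current

def find(collection, path, value):
    """Return every document whose nested field equals the given value."""
    return [d for d in collection if get_path(d, path) == value]

reality_shows = find(docs, "show.genre", "reality")
print([d["id"] for d in reality_shows])
```

A document missing the requested field simply does not match, which is what lets documents of different shapes share one collection.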

Page 35: Big Data = Big Decisions

NoSQL Use Case

1. Click/Event into Hadoop

2. Data Analyzed via Map Reduce jobs; generates 100M profiles based on campaigns running

3. Selected profiles loaded into Couch

4. Ad targeting logic queries Couch with sub-second latency to optimize decisions and real-time ad placement

Source: Couchbase
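
The four steps above can be sketched end to end in a few lines. This is only an illustration under stated assumptions: the event data is invented, a Python dict stands in for the key-value store, and `build_profiles` and `choose_ad` are hypothetical names, not part of Hadoop or Couchbase.

```python
# Steps 1-2 stand-in: click events, aggregated offline into per-user profiles.
# Each event is a hypothetical (user_id, campaign) pair.
EVENTS = [
    ("u1", "sports"),
    ("u1", "sports"),
    ("u2", "travel"),
    ("u1", "travel"),
]

def build_profiles(events):
    """Batch step: count clicks per campaign for each user."""
    profiles = {}
    for user_id, campaign in events:
        counts = profiles.setdefault(user_id, {})
        counts[campaign] = counts.get(campaign, 0) + 1
    return profiles

# Step 3 stand-in: load the selected profiles into a key-value store.
# A dict keyed by user id plays the role of the document store here.
profile_store = dict(build_profiles(EVENTS))

def choose_ad(user_id, default="generic"):
    """Step 4 stand-in: a single key lookup at ad-serving time."""
    profile = profile_store.get(user_id)
    if not profile:
        return default
    # Pick the campaign this user clicked most often.
    return max(profile, key=profile.get)

print(choose_ad("u1"), choose_ad("u2"), choose_ad("u3"))
```

The split matters: the expensive aggregation runs as a batch job, while the serving path is reduced to one lookup by key, which is what makes sub-second decisions feasible.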

Page 36: Big Data = Big Decisions

Hadoop Augmentation

• Side-by-Side will be commonplace
• ETL solutions support Hadoop
• Relational Databases
  • Provide ETL interfaces to Hadoop
  • Execute map/reduce jobs inside DBMS
• NoSQL supports ETL

Page 37: Big Data = Big Decisions

Example Hybrid DBMS Systems

Oracle Endeca Server
• Hybrid Search/Analytic Database
• Supports structured, unstructured, semi-structured
• No schema required. Records stacked.
• Columnar

Page 38: Big Data = Big Decisions

Trends

• SQL On Hadoop: Hadapt, Cloudera Impala, EMC
• Unified Support of Structured, Unstructured, Semi-Structured
• Embedding Search
• Expanded ETL/ELT Support
• Big Data In Motion Takes Hold
• Added Data Mining and Analytic Functions In NoSQL
• Embedding R Language = gain in popularity
• Data Scientists instrumental in business success

Page 39: Big Data = Big Decisions
Page 40: Big Data = Big Decisions

Bob Zurek | [email protected]