Page 1: The power of hadoop in business

1©MapR Technologies - Confidential

The Power of Hadoop to Transform Business

My Background

University, startups
  – Aptex, MusicMatch, ID Analytics, Veoh
  – big data since before it was big

Open source
  – even before the internet
  – Apache Hadoop, Mahout, ZooKeeper, Drill
  – bought the beer at the first HUG (Hadoop User Group)

MapR
  – founding member of Apache Drill

MapR Technologies

Silicon Valley startup
  – Top investors
  – Top technical and management team
      • Google, Microsoft, EMC, NetApp, Oracle

Enterprise quality distribution for Hadoop

Many extensions to basic Hadoop function

Strong supporter of Apache Drill

Philosophy First

What is History?

The study of the past

(what came before now)

What is the future?

(it comes after now)

But the future also has a past!

Do you remember the future?

Some things turned out as expected

Many things are different!

Hadoop has a history

Hadoop also has a future

The Old Future of Hadoop

Map-reduce and HDFS
  – more and more, but not really different

Eco-system additions
  – Simpler programming (Hive and Pig)
  – Key-value store
  – Ad hoc query

Stands apart from other computing
  – Required by HDFS and other limitations

The New Future of Hadoop

Real-time processing
  – Combines real-time and long-time

Integration with traditional IT
  – No need to stand apart

Integration with new technologies
  – Solr, Node.js, Twisted all should interface directly

Fast and flexible computation
  – Drill logical plan language

Example #1: Search Abuse

History matrix

One row per user

One column per thing
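As a minimal sketch, assuming a made-up log of (user, item) events, the history matrix can be stored sparsely. Each user's row is simply the set of items that user has touched, and every missing entry is an implicit zero. The event data and the `history_matrix` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical interaction log: (user, item) pairs, e.g. purchases or clicks.
events = [
    ("alice", "apple"), ("alice", "banana"),
    ("bob", "banana"), ("bob", "cherry"),
    ("carol", "apple"), ("carol", "banana"), ("carol", "cherry"),
]

def history_matrix(events):
    """Build a sparse binary history matrix: one row per user, one column per item.

    Each row is stored as a set of item ids, so absent entries are implicitly 0
    and repeated interactions collapse to a single 1.
    """
    rows = defaultdict(set)
    for user, item in events:
        rows[user].add(item)
    return dict(rows)

H = history_matrix(events)
items = sorted({item for _, item in events})  # fixed column order for display

for user in sorted(H):
    # Dense view of each row, for illustration only.
    print(user, [1 if item in H[user] else 0 for item in items])
```

The sparse representation matters at scale. With millions of users and items, almost every entry is zero.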

Recommendation based on cooccurrence

Cooccurrence gives item-item mapping

One row and column per thing
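A minimal sketch of the cooccurrence step follows. It assumes binary user histories stored as item sets, and it counts, for every pair of distinct items, how many users have both. That is the off-diagonal part of AᵀA for a binary history matrix A. In the talk this step runs in Mahout. Production pipelines typically keep only statistically significant cooccurrences, for example with a log-likelihood ratio test, and this sketch omits that filtering. The data and function name are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical user histories (rows of a binary history matrix).
histories = {
    "alice": {"apple", "banana"},
    "bob": {"banana", "cherry"},
    "carol": {"apple", "banana", "cherry"},
}

def cooccurrence(histories):
    """Count, for each ordered pair of distinct items, the users who have both.

    The result is a sparse, symmetric item-by-item matrix (one row and one
    column per item), so each unordered pair is stored under both orderings.
    """
    counts = Counter()
    for items in histories.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts

C = cooccurrence(histories)
print(C[("apple", "banana")])  # → 2 (alice and carol)
```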

Cooccurrence matrix can also be implemented as a search index
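One way to read this idea: treat each item as a document whose "indicator" field lists the items that significantly cooccur with it, then use a user's recent history as the query. The toy inverted index below shows the idea in plain Python. The indicator lists and the function names are hypothetical, and ranking by raw overlap count is a simplification. In the deployed system, Solr holds the index and applies its own relevance scoring.

```python
from collections import defaultdict

# Hypothetical indicator lists: item -> items that (after filtering) cooccur with it.
indicators = {
    "apple": ["banana"],
    "banana": ["apple", "cherry"],
    "cherry": ["banana"],
    "durian": ["cherry"],
}

def build_index(indicators):
    """Invert item -> indicators into indicator term -> items (a toy inverted index)."""
    postings = defaultdict(set)
    for item, terms in indicators.items():
        for term in terms:
            postings[term].add(item)
    return postings

def recommend(postings, history, k=3):
    """Query the index with the user's history; score items by matching indicator terms.

    Items the user already has are excluded. Ties are broken alphabetically so
    the output is deterministic.
    """
    scores = defaultdict(int)
    for term in history:
        for item in postings.get(term, ()):
            if item not in history:
                scores[item] += 1
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]

index = build_index(indicators)
print(recommend(index, {"cherry"}))  # → ['banana', 'durian']
```

The appeal of this design is operational. The expensive cooccurrence analysis runs offline, and serving a recommendation becomes an ordinary search query.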

[Diagram: offline indexing. Components: complete history, cooccurrence (Mahout), item meta-data, Solr indexer, index shards]

[Diagram: online recommendation. Components: user history, web tier, Solr search, item meta-data, index shards]

Objective Results

At a very large credit card company

History is all transactions, all web interaction

Processing time cut from 20 hours per day to 3

Recommendation engine load time decreased from 8 hours to 3 minutes

Example #2: Web Technology

[Diagram: real-time data, fast analysis (Storm), analytic output, raw logs]

[Diagram, continued: large analysis (map-reduce), analytic output, raw logs]

[Diagram, continued: presentation tier (d3 + Node.js), browser query, analytic output, raw logs]

Objective Results

Real-time + long-time analysis is seamless

Web tier can be rooted directly on Hadoop cluster

No need to move data
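The slides do not say exactly how real-time and long-time results are combined. One common approach adds the two sets of per-key counts at query time. It assumes the map-reduce job covers all events up to some cutoff and the Storm topology counts only events after that cutoff, so the two never overlap. The page names, numbers, and the `combined_counts` helper are hypothetical.

```python
from collections import Counter

def combined_counts(batch_counts, realtime_counts):
    """Merge long-time (batch) totals with real-time increments.

    Assumes the batch output covers events up to a cutoff and the stream
    output covers only events after it, so a plain sum is the current total.
    """
    total = Counter(batch_counts)
    total.update(realtime_counts)  # Counter.update adds counts instead of replacing them
    return total

# Hypothetical page-view counts.
batch = {"/home": 10_000, "/pricing": 1_200}  # e.g. from a nightly map-reduce job
realtime = {"/pricing": 35, "/signup": 7}     # e.g. from Storm since the cutoff

print(combined_counts(batch, realtime)["/pricing"])  # → 1235
```

Because both outputs live on the same cluster, the presentation tier can do this merge directly without copying data elsewhere.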

Example #3: Apache Drill

Big Data Processing – Hadoop

                    | Batch processing
Query runtime       | Minutes to hours
Data volume         | TBs to PBs
Programming model   | MapReduce
Users               | Developers
Google project      | MapReduce
Open source project | Hadoop MapReduce

Big Data Processing – Hadoop and Storm

                    | Batch processing | Stream processing
Query runtime       | Minutes to hours | Never-ending
Data volume         | TBs to PBs       | Continuous stream
Programming model   | MapReduce        | DAG (pre-programmed)
Users               | Developers       | Developers
Google project      | MapReduce        |
Open source project | Hadoop MapReduce | Storm or Apache S4

Big Data Processing – The missing part

                    | Batch processing | Interactive analysis    | Stream processing
Query runtime       | Minutes to hours |                         | Never-ending
Data volume         | TBs to PBs       |                         | Continuous stream
Programming model   | MapReduce        |                         | DAG (pre-programmed)
Users               | Developers       |                         | Developers
Google project      | MapReduce        |                         |
Open source project | Hadoop MapReduce |                         | Storm and S4

Big Data Processing – The missing part

                    | Batch processing | Interactive analysis    | Stream processing
Query runtime       | Minutes to hours | Milliseconds to minutes | Never-ending
Data volume         | TBs to PBs       | GBs to PBs              | Continuous stream
Programming model   | MapReduce        | Queries (ad hoc)        | DAG (pre-programmed)
Users               | Developers       | Analysts and developers | Developers
Google project      | MapReduce        |                         |
Open source project | Hadoop MapReduce |                         | Storm and S4

Big Data Processing

                    | Batch processing | Interactive analysis    | Stream processing
Query runtime       | Minutes to hours | Milliseconds to minutes | Never-ending
Data volume         | TBs to PBs       | GBs to PBs              | Continuous stream
Programming model   | MapReduce        | Queries                 | DAG
Users               | Developers       | Analysts and developers | Developers
Google project      | MapReduce        | Dremel                  |
Open source project | Hadoop MapReduce |                         | Storm and S4

Big Data Processing

                    | Batch processing | Interactive analysis    | Stream processing
Query runtime       | Minutes to hours | Milliseconds to minutes | Never-ending
Data volume         | TBs to PBs       | GBs to PBs              | Continuous stream
Programming model   | MapReduce        | Queries                 | DAG
Users               | Developers       | Analysts and developers | Developers
Google project      | MapReduce        | Dremel                  |
Open source project | Hadoop MapReduce | Apache Drill            | Storm and S4

Design Principles

Flexible
  • Pluggable query languages
  • Extensible execution engine
  • Pluggable data formats
      • Column-based and row-based
      • Schema and schema-less
  • Pluggable data sources

Easy
  • Unzip and run
  • Zero configuration
  • Reverse DNS not needed
  • IP addresses can change
  • Clear and concise log messages

Dependable
  • No SPOF (single point of failure)
  • Instant recovery from crashes

Fast
  • C/C++ core with Java support
      • Google C++ style guide
  • Min latency and max throughput (limited only by hardware)

Simple Architecture

Standard Interfaces

Logical Plan Syntax:

query: [
  { op: "sequence",
    do: [
      { op: "scan", memo: "initial_scan", ref: "donuts",
        source: "local-logs", selection: {data: "activity"} },
      { op: "transform",
        transforms: [ { ref: "donuts.quantity", expr: "donuts.sales" } ] },
      { op: "filter", expr: "donuts.ppu < 1.00" },
      …
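To make the semantics of a scan, transform, filter sequence concrete, here is a toy Python interpreter that applies such operators in order. It is not Drill's implementation. Expressions are Python lambdas rather than Drill expression strings, the donut records are invented, the `donuts.` prefix is dropped from field names, and `memo` and `selection` are ignored.

```python
def run_sequence(ops, sources):
    """Apply a list of logical operators, in order, to an in-memory record set."""
    records = []
    for op in ops:
        kind = op["op"]
        if kind == "scan":
            # Hypothetical: a named source is just a list of dicts; copy so the
            # source data is not mutated by later operators.
            records = [dict(r) for r in sources[op["source"]]]
        elif kind == "transform":
            # Each transform writes a (possibly new) field computed from the record.
            for r in records:
                for t in op["transforms"]:
                    r[t["ref"]] = t["expr"](r)
        elif kind == "filter":
            # Keep only records for which the predicate holds.
            records = [r for r in records if op["expr"](r)]
        else:
            raise ValueError(f"unsupported op: {kind}")
    return records

# Invented donut records (field names without the "donuts." prefix).
sources = {
    "local-logs": [
        {"name": "glazed", "ppu": 0.55, "sales": 35},
        {"name": "filled", "ppu": 1.25, "sales": 12},
        {"name": "plain", "ppu": 0.45, "sales": 20},
    ]
}

plan = [
    {"op": "scan", "source": "local-logs"},
    {"op": "transform", "transforms": [{"ref": "quantity", "expr": lambda r: r["sales"]}]},
    {"op": "filter", "expr": lambda r: r["ppu"] < 1.00},
]

for row in run_sequence(plan, sources):
    print(row["name"], row["quantity"])
```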

Logical Streaming Example

{
  @id: <refnum>,
  op: "window-frame",
  input: <input>,
  keys: [ <name>,... ],
  ref: <name>,
  before: 2,
  after: here
}

Input records: 0 1 2 3 4
Output frames: [0] [0 1] [0 1 2] [1 2 3] [2 3 4]
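A window frame with `before: 2, after: here` emits, for each record, that record plus up to two predecessors. For the input 0 1 2 3 4 this gives the frames [0], [0, 1], [0, 1, 2], [1, 2, 3], [2, 3, 4]. The sketch below is a minimal Python version built on a bounded deque. The function name is hypothetical, and treating `keys` as partitioning the stream into independent frames is an assumption.

```python
from collections import defaultdict, deque

def window_frame(stream, before, key=None):
    """Yield, for each record, a frame of up to `before` preceding records
    plus the current one (the `after: here` case).

    If `key` is given, each key value keeps its own frame; this reading of the
    plan's `keys` field is an assumption.
    """
    frames = defaultdict(lambda: deque(maxlen=before + 1))
    for record in stream:
        frame = frames[key(record) if key is not None else None]
        frame.append(record)  # the deque drops the oldest record once full
        yield list(frame)

print(list(window_frame([0, 1, 2, 3, 4], before=2)))
# → [[0], [0, 1], [0, 1, 2], [1, 2, 3], [2, 3, 4]]
```

Because the frame is bounded, memory per key stays constant no matter how long the stream runs. That matters for a never-ending input.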

Logical Plan

Execution Plan

Representing a DAG

{
  @id: 19,
  op: "aggregate",
  input: 18,
  type: <simple|running|repeat>,
  keys: [<name>,...],
  aggregations: [
    {ref: <name>, expr: <aggexpr> },...
  ]
}
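Each operator names its input by `@id`, so a flat list of operators is enough to describe a DAG. An operator with several consumers simply has its `@id` referenced more than once. The sketch below evaluates such a plan recursively and memoizes results, so a shared operator runs only once. Several details are assumptions made for illustration:

- Only `scan` and a summing `aggregate` are modeled.
- The plan and the data are invented.
- A `field` key stands in for `<aggexpr>`.
- `simple` is taken to mean one row per key, and `running` a cumulative value on every row.
- `repeat` is not modeled.

```python
from collections import defaultdict

def aggregate(rows, op):
    """Sum each aggregation's field per key.

    Assumed semantics: 'simple' emits one row per key; 'running' emits every
    input row with the cumulative sum for its key so far.
    """
    keys, aggs = op["keys"], op["aggregations"]
    if op["type"] not in ("simple", "running"):
        raise ValueError(f"aggregate type not modeled here: {op['type']}")
    sums = defaultdict(lambda: defaultdict(float))
    out = []
    for r in rows:
        k = tuple(r[name] for name in keys)
        for a in aggs:
            sums[k][a["ref"]] += r[a["field"]]
        if op["type"] == "running":
            out.append({**r, **sums[k]})  # snapshot of the running totals
    if op["type"] == "simple":
        out = [dict(zip(keys, k), **totals) for k, totals in sums.items()]
    return out

def evaluate(plan, sources, target_id):
    """Evaluate operator `target_id`, resolving each `input` reference by @id."""
    by_id = {op["@id"]: op for op in plan}
    cache = {}  # each operator runs at most once, even if several consumers share it

    def run(op_id):
        if op_id not in cache:
            op = by_id[op_id]
            if op["op"] == "scan":
                cache[op_id] = list(sources[op["source"]])
            elif op["op"] == "aggregate":
                cache[op_id] = aggregate(run(op["input"]), op)
            else:
                raise ValueError(f"unsupported op: {op['op']}")
        return cache[op_id]

    return run(target_id)

sales = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 5},
    {"region": "east", "amount": 20},
]
plan = [
    {"@id": 18, "op": "scan", "source": "sales"},
    {"@id": 19, "op": "aggregate", "input": 18, "type": "simple",
     "keys": ["region"], "aggregations": [{"ref": "total", "field": "amount"}]},
]
print(evaluate(plan, {"sales": sales}, 19))
# → [{'region': 'east', 'total': 30.0}, {'region': 'west', 'total': 5.0}]
```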

Non-SQL queries

The future is not what we thought it would be

It is better!

Get Involved!

Tweet: #hcj13w #mapr

@ted_dunning

Get Involved!

Download these slides
  – http://www.mapr.com/company/events/hcj-01-21-2013

Join the Drill project
  – [email protected]
  – #apachedrill

Contact me:
  – [email protected]
  – [email protected]
  – @ted_dunning

Join MapR (in Japan!)
  – [email protected]