Page 1: Big data, why now?

1©MapR Technologies - Confidential

MapR: The Next Generation Big Data Platform

Page 2: Big data, why now?

Big is the next big thing

Big data and Hadoop are exploding

Companies are being funded

Books are being written

Applications are sprouting up everywhere

Page 3: Big data, why now?

Slow Motion Explosion

Page 4: Big data, why now?

Hadoop Explosion

Page 5: Big data, why now?

Why Now?

But Moore’s law has applied for a long time

Why is Hadoop exploding now?

Why not 10 years ago?

Why not 20?

6/1/2012

Page 7: Big data, why now?

Size Matters, but …

If it were just the availability of data, existing big companies would have adopted big data technology first

They didn’t

Page 9: Big data, why now?

Or Maybe Cost

If it were just net positive value, finance companies should have adopted first, because they have a higher opportunity value per byte

They didn’t

Page 11: Big data, why now?

Backwards adoption

Under almost any threshold argument, startups would not have adopted big data technology first

They did

Page 13: Big data, why now?

Everywhere at Once?

Something very strange is happening

– Big data is being applied at many different scales

– At many value scales

– By large companies and small

Why?

Page 14: Big data, why now?

The Conventional Answer

More data is being produced more quickly

Data sizes are bigger than even a very large computer can hold

Cost to create and store continues to decrease

Page 15: Big data, why now?

Analytics Scaling Laws

Analytics scaling is all about the 80-20 rule

– Big gains for little initial effort

– Rapidly diminishing returns

The key to net value is how costs scale

– Old school – exponential scaling

– Big data – linear scaling, low constant

Cost/performance has changed radically

– IF you can use many commodity boxes
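
The net-value argument above can be made concrete with a small numeric sketch. Everything in it is an assumption for illustration: the value curve, both cost curves, and every constant are hypothetical and do not come from the slides. It only shows how a diminishing-returns value curve minus a cost curve yields an optimum scale, and how that optimum shifts when cost grows linearly with a low constant instead of exponentially.

```python
import math

# All curves and constants below are hypothetical, chosen only for illustration.

def value(scale, tau=200.0):
    """Diminishing returns: big gains early, little gain later (80-20 shape)."""
    return 1.0 - math.exp(-scale / tau)

def exponential_cost(scale, a=0.01, sigma=300.0):
    """'Old school' cost that grows exponentially with scale."""
    return a * (math.exp(scale / sigma) - 1.0)

def linear_cost(scale, slope=0.00005):
    """'Big data' cost: linear in scale with a low constant."""
    return slope * scale

def net_value(scale, cost):
    """Value gained minus cost paid at a given scale."""
    return value(scale) - cost(scale)

def best_scale(cost, scales):
    """Return the scale on the grid that maximizes net value."""
    return max(scales, key=lambda s: net_value(s, cost))

scales = range(0, 2001, 10)
old_optimum = best_scale(exponential_cost, scales)
new_optimum = best_scale(linear_cost, scales)

print("exponential cost: best scale", old_optimum)
print("linear cost:      best scale", new_optimum)
```

Under these assumed curves, the exponential-cost optimum sits well short of the largest scale on the grid, while the linear-cost optimum lands at a larger scale with a higher net value.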

Page 16: Big data, why now?

We knew that

We should have known that

We didn’t know that!

You’re kidding, people do that?

Page 17: Big data, why now?

[Figure: value versus scale; axes run 0 to 1 (value) and 0 to 2,000 (scale)]

Anybody with eyes

Intern with a spreadsheet

In-house analytics

Industry-wide data consortium

NSA, non-proliferation

Page 18: Big data, why now?

[Figure: value versus scale]

Net value optimum has a sharp peak well before maximum effort

Page 19: Big data, why now?

But scaling laws are changing both slope and shape

Page 20: Big data, why now?

[Figure: value versus scale]

More than just a little

Page 21: Big data, why now?

[Figure: value versus scale]

They are changing a LOT!

Page 24: Big data, why now?

[Figure: value versus scale]

Page 25: Big data, why now?

[Figure: value versus scale]

Page 26: Big data, why now?

[Figure: value versus scale]

Initially, linear cost scaling actually makes things worse

A tipping point is reached and things change radically …
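
One way to see how a tipping point can arise is a toy model in which the value curve has a fast early component plus a slow, steady tail. The model, its constants, and the function names are assumptions for illustration, not taken from the slides, and it covers only the second statement above. While the linear cost slope stays above the slope of the tail, the best scale sits near the early peak; once the cost slope drops below it, the best scale jumps to the largest scale available.

```python
import math

MAX_SCALE = 2000
# Hypothetical: the slow tail adds 0.2 of value spread evenly over the full range.
TAIL_SLOPE = 0.2 / MAX_SCALE

def value(scale):
    """Hypothetical value curve: quick early gains plus a slow linear tail."""
    return 0.8 * (1.0 - math.exp(-scale / 50.0)) + TAIL_SLOPE * scale

def best_scale(cost_slope, step=10):
    """Grid search for the scale that maximizes value minus linear cost."""
    scales = range(0, MAX_SCALE + 1, step)
    return max(scales, key=lambda s: value(s) - cost_slope * s)

# Sweep the cost slope downward, crossing the tail slope of 0.0001.
for cost_slope in (0.0004, 0.0002, 0.00011, 0.00009):
    print(f"cost slope {cost_slope:.5f} -> best scale {best_scale(cost_slope)}")
```

In this model the best scale creeps upward as the cost slope falls, then moves abruptly to the maximum once the slope is below the tail slope, which is the kind of discontinuity a tipping point implies.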

Page 27: Big data, why now?

Pre-requisites for Tipping

To reach the tipping point,

Algorithms must scale out horizontally

– On commodity hardware

– That can and will fail

Data practice must change

– Denormalized is the new black

– Flexible data dictionaries are the rule

– Structured data becomes rare

Page 28: Big data, why now?

But there is more

Especially for large enterprises

Page 29: Big data, why now?

Physics of startup companies

Page 30: Big data, why now?

For startups

History is always small

The future is huge

Must adopt new technology to survive

Compatibility is not as important

– In fact, incompatibility is assumed

Page 31: Big data, why now?

Physics of large companies

Startup phase

Absolute growth still very large

Page 32: Big data, why now?

For large businesses

Present state is always large

Relative growth is much smaller

Absolute growth rate can be very large

Must adopt new technology to survive

– Cautiously!

– But must integrate technology with legacy

Compatibility is crucial

Page 33: Big data, why now?

The startup technology picture

Old computers and software

Current computers and software

Expected hardware and software growth

No compatibility requirement

Page 34: Big data, why now?

The large enterprise picture

Proof of concept Hadoop cluster

Long-term Hadoop cluster

Current hardware and software

?

Must work together

Page 36: Big data, why now?

So that is why, and why now

What can you do with it?

And how?

Page 37: Big data, why now?

Scale-free Computing

Map-reduce

– pure functions for practical batch parallel computation

– high level languages like Hive and Pig available

– MapR provides standard access systems via NFS and ODBC

BSP

– pure functions for synchronous iterative actor-based compute

– Apache Giraph provides practical implementation

Actors

– tuple passing with transformations

– Storm provides practical implementation
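
A minimal single-process sketch of the map-reduce idea, assuming a word-count job. The names map_fn, reduce_fn, and map_reduce are hypothetical and this is not Hadoop's API. The point is only that both user functions are pure, so a framework is free to run them on any machine, in any order, and to rerun them after a failure.

```python
from collections import defaultdict

def map_fn(line):
    """Pure mapper: one input line -> list of (word, 1) pairs."""
    return [(word.lower(), 1) for word in line.split()]

def reduce_fn(word, counts):
    """Pure reducer: all counts for one word -> (word, total)."""
    return word, sum(counts)

def map_reduce(records, mapper, reducer):
    """Toy driver: map every record, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, val in mapper(record):
            groups[key].append(val)
    return dict(reducer(key, vals) for key, vals in groups.items())

lines = ["big data is big", "data has mass"]
counts = map_reduce(lines, map_fn, reduce_fn)
print(counts)
```

Because neither function touches shared state, the grouping step in the middle is the only place where data has to cross between workers in a distributed version.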

Page 38: Big data, why now?

Future Proof Schemas

Denormalize data where possible to avoid seeks

– use embedded lists

– duplicate data

Flexible Schemas

– use standard system for data serialization

– must provide protocol migration without versioning

– Protobufs (Google), Avro (Apache), and Thrift (Apache) can all be used
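
The two ideas on this slide can be sketched with plain JSON as a stand-in for a real serialization system. The field names, the READER_FIELDS table, and the decode helper are hypothetical. Protobufs, Avro, and Thrift each have their own schema mechanisms for this, which are not shown here. The sketch keeps purchases as an embedded list inside each user record (denormalized) and lets one reader handle records from older and newer writers without a version number.

```python
import copy
import json

# Hypothetical reader-side field list: name -> default used when a record lacks the field.
READER_FIELDS = {"user": "", "country": "unknown", "purchases": []}

def decode(raw, fields=READER_FIELDS):
    """Parse one record, filling defaults for missing fields and ignoring unknown ones."""
    record = json.loads(raw)
    return {name: record.get(name, copy.deepcopy(default))
            for name, default in fields.items()}

# Older writer: no "country" field yet. Purchases are embedded, not kept in a separate table.
old_record = ('{"user": "ann", "purchases": '
              '[{"sku": "A1", "price": 9.5}, {"sku": "B2", "price": 20.0}]}')
# Newer writer: adds a "referrer" field that this reader does not know about.
new_record = '{"user": "bob", "country": "NZ", "purchases": [], "referrer": "ad-17"}'

for raw in (old_record, new_record):
    rec = decode(raw)
    total = sum(p["price"] for p in rec["purchases"])  # no join or second lookup needed
    print(rec["user"], rec["country"], total)
```

Reading everything about a user from one record is what avoids the extra seeks, at the price of duplicating data that a normalized design would store once.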

Page 39: Big data, why now?

Open Compute and Storage

Big data has mass and inertia

– once it lands, it should not move

Computation must move to the data

– map-reduce, Storm, Giraph … all OK

– conventional relational models … not OK

One model is not enough

– must allow access by multiple models of computation
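
As a rough illustration of moving computation to the data, the toy below computes a mean over values held on three nodes in two ways. The node names, the data, and the "items moved" count are hypothetical stand-ins for network traffic. Shipping a small function to each node means only one short summary per node has to travel, while gathering the raw data first moves every value.

```python
# Hypothetical cluster: node name -> values stored on that node.
partitions = {
    "node-a": [3, 5, 8],
    "node-b": [13, 21],
    "node-c": [34, 55, 89, 144],
}

def local_summary(values):
    """Runs where the data lives; returns a small (sum, count) pair."""
    return sum(values), len(values)

def mean_by_moving_compute(parts):
    """Ship the function to each node and move only the per-node summaries."""
    partials = [local_summary(values) for values in parts.values()]
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count, len(partials)  # items moved: one summary per node

def mean_by_moving_data(parts):
    """Gather every raw value in one place first, then compute."""
    gathered = [x for values in parts.values() for x in values]
    return sum(gathered) / len(gathered), len(gathered)  # items moved: every value

mean_a, moved_a = mean_by_moving_compute(partitions)
mean_b, moved_b = mean_by_moving_data(partitions)
print(mean_a, moved_a)
print(mean_b, moved_b)
```

Both routes give the same answer; the difference is that the first moves a number of items that depends on the node count, and the second moves a number that grows with the data itself.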

Page 41: Big data, why now?

Thank You