Roman Nikitchenko, 22.02.2015 SUBJECTIVE BIG DATA NO BOOK FOR YOU ANYMORE FRAMEWORKS
Page 1: Big data & frameworks: no book for you anymore

Roman Nikitchenko, 22.02.2015

SUBJECTIVE

BIG DATA FRAMEWORKS

NO BOOK FOR YOU ANYMORE

Page 2: Big data & frameworks: no book for you anymore

FRAMEWORKS: WHAT WE WANT

CHEAPER: no more reinventing the wheel

FASTER: shorter time to market, because part of the job is already done

BETTER: the quality of proven approaches

Page 3: Big data & frameworks: no book for you anymore

FRAMEWORKS: WHAT WE OFTEN GET

Page 4: Big data & frameworks: no book for you anymore

CAN CHIMPS DO BIG DATA?

A real book with this shocking title is available for pre-order. This is exactly what is happening in the Big Data industry right now.

Roses are red.

Violets are blue.

We do Hadoop

What about YOU?

Page 5: Big data & frameworks: no book for you anymore

BIG DATA IS ABOUT... SCALE

GET CHIMPS OUT OF THE DATACENTER

Page 6: Big data & frameworks: no book for you anymore

SO HOW TO DO FRAMEWORKING... WHEN YOU DO BIG DATA

Page 7: Big data & frameworks: no book for you anymore

YARN

We do Big Data with Hadoop

Page 8: Big data & frameworks: no book for you anymore

FRAMEWORK

Is an essential supporting structure of a building, vehicle, or object.

In computer programming, a software framework is an abstraction in which software providing generic functionality can be selectively changed by additional user-written code, thus providing application-specific software.

Page 9: Big data & frameworks: no book for you anymore

FRAMEWORKS DICTATE APPROACH

Frameworks exist to reduce the amount of work through reuse. The more you can reuse, the better. But complex frameworks are too massive to be flexible. They limit your solutions.

When doing Big Data, you usually build a unique solution.

Page 10: Big data & frameworks: no book for you anymore

SO DO I NEED UNIQUE FRAMEWORKS FOR EVERY BIG DATA PROJECT?

Page 11: Big data & frameworks: no book for you anymore

[Figure: a picture equation using "x MAX", "+" and "=" with three BIG DATA labels]

HADOOP as INFRASTRUCTURE

Page 12: Big data & frameworks: no book for you anymore

LOOKS LIKE THIS

Page 13: Big data & frameworks: no book for you anymore

3 SIMPLE HADOOP PRINCIPLES

INFRASTRUCTURE

OPEN SOURCE framework for big data. Both distributed storage and processing.

Provides RELIABILITY and fault tolerance by SOFTWARE design. Example: the file system uses a replication factor of 3 by default.

Horizontal scalability from a single computer up to thousands of nodes.
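
What "reliability by software" with a replication factor of 3 costs and buys can be sketched with simple arithmetic. This is only an illustration: the function names are hypothetical, and it ignores everything except the replication factor itself (block sizes, rack placement, non-HDFS disk usage).

```python
def usable_capacity_tb(raw_tb: float, replication: int = 3) -> float:
    """Capacity left for data when every block is stored `replication` times."""
    if replication < 1:
        raise ValueError("replication factor must be at least 1")
    return raw_tb / replication


def tolerated_replica_losses(replication: int = 3) -> int:
    """How many copies of a block can be lost while one copy still survives."""
    if replication < 1:
        raise ValueError("replication factor must be at least 1")
    return replication - 1


if __name__ == "__main__":
    raw = 300.0  # hypothetical raw disk capacity of the cluster, in TB
    print("usable TB:", usable_capacity_tb(raw))
    print("replica losses tolerated per block:", tolerated_replica_losses())
```

So the default trades two thirds of raw disk space for the ability to lose two copies of any block on cheap, replaceable nodes.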

Page 14: Big data & frameworks: no book for you anymore

HADOOP INFRASTRUCTURE AS A FRAMEWORK

● Is formed from a large number of unified nodes.

● Nodes are replaceable.

● Simple hardware without sophisticated I/O.

● Reliability by software.

● Horizontal scalability.

Page 15: Big data & frameworks: no book for you anymore

FRAMEWORKS INFRASTRUCTURE

APPROACH COMPLEXITY

LIMITATIONS OVERHEAD

Page 16: Big data & frameworks: no book for you anymore

How everyone (who usually sells something) depicts Hadoop complexity

GREAT BIG INFRASTRUCTURE AROUND

SMALL CUTE CORE

YOUR APPLICATION

SAFE and FRIENDLY

Page 17: Big data & frameworks: no book for you anymore

How it looks from the real user's point of view

Feeling of something wrong

CORE HADOOP

FEAR OF COMPLETELY UNKNOWN INFRASTRUCTURE

SOMETHING YOU UNDERSTAND

YOUR APPLICATION

Page 18: Big data & frameworks: no book for you anymore

But... imagine we have BIG DATA bricks. What should they look like?

Page 19: Big data & frameworks: no book for you anymore

WHAT BRICKS SHOULD WE TAKE TO BUILD BIG DATA SOLUTION?

● We should build unique solutions using the same approaches.

● So the bricks must be flexible.

Page 20: Big data & frameworks: no book for you anymore

WHAT BRICKS SHOULD WE TAKE TO BUILD BIG DATA SOLUTION?

● We should build a robust solution with high reliability.

● The bricks must be simple and replaceable.

Page 21: Big data & frameworks: no book for you anymore

WHAT BRICKS SHOULD WE TAKE TO BUILD BIG DATA SOLUTION?

● We should be able to change our solution over time.

● The bricks must be small.

Page 22: Big data & frameworks: no book for you anymore

WHAT BRICKS SHOULD WE TAKE TO BUILD BIG DATA SOLUTION?

● As flexible as possible.

● Focused on a specific aspect, without requiring a large infrastructure.

● Simple and interchangeable.

Page 23: Big data & frameworks: no book for you anymore

HADOOP 2.x CORE AS A FRAMEWORK: BASIC BLOCKS

● ZooKeeper as coordination service.

● HDFS as file system layer.

● YARN as resource management.

● MapReduce as basic distributed processing option.

Page 24: Big data & frameworks: no book for you anymore

HADOOP HAS LAYERS

RESOURCE MANAGEMENT

DISTRIBUTED PROCESSING

FILE SYSTEM

COORDINATION

HADOOP 2.x CORE
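
The layered core can be pictured as a small stack of role-to-component pairs, where one brick is swapped without touching the others. The sketch below is only a toy model of that idea: `Layer`, `HADOOP_2_CORE` and `swap_component` are hypothetical names, and putting Apache Spark at the processing layer is just an example of a replacement brick.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    role: str
    component: str


# The four core layers and their components as named on the slides.
HADOOP_2_CORE = (
    Layer("coordination", "ZooKeeper"),
    Layer("file system", "HDFS"),
    Layer("resource management", "YARN"),
    Layer("distributed processing", "MapReduce"),
)


def swap_component(stack, role, new_component):
    """Return a new stack with the component at `role` replaced; other layers stay as is."""
    if role not in {layer.role for layer in stack}:
        raise KeyError(f"no such layer: {role}")
    return tuple(
        Layer(layer.role, new_component) if layer.role == role else layer
        for layer in stack
    )


if __name__ == "__main__":
    for layer in swap_component(HADOOP_2_CORE, "distributed processing", "Apache Spark"):
        print(f"{layer.role}: {layer.component}")
```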

Page 25: Big data & frameworks: no book for you anymore

PACKAGING ... RUBIK'S CUBE STYLE

● Hadoop packaging is a non-trivial task.

● It gets more complex when you add Apache Spark, Solr or the HBase indexer.

Page 26: Big data & frameworks: no book for you anymore

Hadoop: don't do it yourself

REUSE AS IS

● The BASIC infrastructure is reusable enough to build on as is, at least until you know it well.

● Do you have the manpower to re-implement it? You'd better contribute in that case.

Page 27: Big data & frameworks: no book for you anymore

WHERE TO GO FROM HERE?

Page 28: Big data & frameworks: no book for you anymore

HERE PEOPLE START TO ADD EVERY FRAMEWORK THEY KNOW ABOUT...

Page 29: Big data & frameworks: no book for you anymore

YARN

AT LEAST WE DO IT ONE BY ONE

Page 30: Big data & frameworks: no book for you anymore

WHAT DO WE USUALLY EXPECT FROM A NEW FRAMEWORK?

FASTER: frameworks provide a higher layer of abstraction, so coding goes faster

CHEAPER: some part of the work is already done

BETTER: top framework contributors are usually top engineers

Page 31: Big data & frameworks: no book for you anymore

OOOPS...

FASTER: frameworks provide a higher layer of abstraction, so coding goes faster

CHEAPER: some part of the work is already done

BETTER: top framework contributors are usually top engineers

Additional cost of maintaining the new framework

Additional time to learn the new approach

Lots of defects due to lack of experience with the new framework

Page 32: Big data & frameworks: no book for you anymore

FASTER: frameworks provide a higher layer of abstraction, so coding goes faster

CHEAPER: some part of the work is already done

BETTER: top framework contributors are usually top engineers

Additional cost of maintaining the new framework

Additional time to learn the new approach

Lots of defects due to lack of experience with the new framework

NONEXISTENT

ONLY TWO?

Page 33: Big data & frameworks: no book for you anymore

JUST A FEW EXAMPLES

● Spring Batch: the main thread that started the Spring context did not check the task completion status.

● Apache Spark: persistence to disk was limited to 2 GB due to the ByteBuffer int size limitation.

● Apache HBase still has no effective guard against client RPC timeouts.

● What about binary data like hashes? No effective out-of-the-box support so far.

ONLY REAL EXPERIENCE

NEW FRAMEWORKS ARE ALWAYS A HEADACHE
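
Where a 2 GB ceiling comes from when a buffer is sized by a signed 32-bit Java int can be shown with plain arithmetic. This sketch only illustrates the limit and is not Spark's code; `JAVA_INT_MAX`, `fits_in_one_buffer` and `buffers_needed` are hypothetical names.

```python
# Largest value of a signed 32-bit Java int, and so the largest size in bytes
# that a buffer sized by an int can have.
JAVA_INT_MAX = 2**31 - 1


def fits_in_one_buffer(size_bytes: int) -> bool:
    """True if a chunk of this size can be held by a single int-sized buffer."""
    return 0 <= size_bytes <= JAVA_INT_MAX


def buffers_needed(size_bytes: int) -> int:
    """Minimum number of int-sized buffers needed to hold the chunk (ceiling division)."""
    if size_bytes < 0:
        raise ValueError("size must be non-negative")
    return -(-size_bytes // JAVA_INT_MAX)


if __name__ == "__main__":
    three_gib = 3 * 2**30
    print("fits in one buffer:", fits_in_one_buffer(three_gib))
    print("buffers needed:", buffers_needed(three_gib))
```

Anything one byte over the limit needs either chunking or a different data structure, which is exactly the kind of detail you only meet through real experience.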

Page 34: Big data & frameworks: no book for you anymore

%^#@#^&@#&#%@ !!!

Page 35: Big data & frameworks: no book for you anymore

JUST LONGER PERSPECTIVE?

When you use the same approach for a long time, you apply it more and more effectively.

Page 36: Big data & frameworks: no book for you anymore

BUT JUST FEEL THE SPEED DIFFERENCE

JAVA MESSAGE SERVICE

1.0.2b (June 25, 2001)

1.1 (April 12, 2002)

2.0 (May 21, 2013)

APACHE SPARK

0.9.0 (Feb 2, 2014)

1.0 (May 30, 2014)

1.1 (Sep 11, 2014)

1.2 (Dec 18, 2014)
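
The speed difference can be put into numbers straight from the release dates on this slide. A small sketch, with the dates copied as listed and hypothetical helper names (`span_days`, `average_gap_days`):

```python
from datetime import date

# Release dates exactly as listed on the slide; nothing else is assumed.
JMS_RELEASES = {
    "1.0.2b": date(2001, 6, 25),
    "1.1": date(2002, 4, 12),
    "2.0": date(2013, 5, 21),
}
SPARK_RELEASES = {
    "0.9.0": date(2014, 2, 2),
    "1.0": date(2014, 5, 30),
    "1.1": date(2014, 9, 11),
    "1.2": date(2014, 12, 18),
}


def span_days(releases):
    """Days between the first and the last listed release."""
    dates = sorted(releases.values())
    return (dates[-1] - dates[0]).days


def average_gap_days(releases):
    """Average number of days between consecutive listed releases."""
    return span_days(releases) / (len(releases) - 1)


if __name__ == "__main__":
    for name, releases in (("JMS", JMS_RELEASES), ("Spark", SPARK_RELEASES)):
        print(
            f"{name}: {len(releases)} releases over {span_days(releases)} days, "
            f"average gap {average_gap_days(releases):.0f} days"
        )
```

With these dates, the three JMS releases span almost twelve years, while the four Spark releases fit into less than one.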

Page 37: Big data & frameworks: no book for you anymore

FULL DATA PROCESSING PLATFORM SUPPORTING YARN

Page 38: Big data & frameworks: no book for you anymore

SO BIG DATA TECHNOLOGY BOOKS ARE ALWAYS OUTDATED

These are great books, but by the time they are printed they are already old. Read the original e-books, which receive updates.

Page 39: Big data & frameworks: no book for you anymore

DO NOT HIDE YOUR EXPERIENCE

Page 40: Big data & frameworks: no book for you anymore

FRAMEWORKS IN BIG DATA HAMSTERS vs HIPSTERS

We hate frameworks! Only hardcore, only JDK!

Give me a framework for every step!

Page 41: Big data & frameworks: no book for you anymore

FRAMEWORKS IN BIG DATA HAMSTERS vs HIPSTERS

Significant overhead even compared to MapReduce access.

The simplest way to access your HBase data for analytics.

Apache HBase is the top OLTP solution for Hadoop. Hive can provide an SQL connector to it.

Use HBase direct RPC for OLTP, MapReduce or Spark when you need performance, and Hive when you need faster implementation.

Crazy idea: Hive running over HBase table snapshots.

Page 42: Big data & frameworks: no book for you anymore

OUR BIG DATA FRAMEWORKS CRITERIA

FAST FEATURE DEVELOPMENT

ACTIVE COMMUNITY

STABLE REUSABLE ARCHITECTURE

Page 43: Big data & frameworks: no book for you anymore

ETL: FRAMEWORKS COST

● We do object transformations when we do ETL from SQL to NoSQL objects.

● Practically any ORM framework eats at least 10% of CPU resources.

● Is that a small or a big amount? It depends on who pays...

[Diagram: an SQL server JOIN over Table1 to Table4 feeds several ETL streams into BIG DATA shards]

Page 44: Big data & frameworks: no book for you anymore

10% overhead...

● Single desktop application: computers usually have unused CPU power. A 10% overhead is barely noticeable to the user, so the user accepts it.

● The user pays for electricity and hardware.

Page 45: Big data & frameworks: no book for you anymore

10% overhead...

● Lots of mobile clients. They can tolerate a 10% performance degradation. The application still works.

● All users pay for your 10% performance overhead.

Page 46: Big data & frameworks: no book for you anymore

10% overhead...

● Single server solution. OK, usually you have 10% spare.

● So you pay for the overhead, but you don't notice it until the capacity is needed. You still have the same one server.

Page 47: Big data & frameworks: no book for you anymore

10% overhead...

● A 10% overhead on 1000 servers with a properly distributed job means up to 100 additional servers are needed.

● These are your direct maintenance costs.

IN CLUSTERS YOU DIRECTLY PAY FOR OVERHEAD WITH ADDITIONAL CLUSTER NODES.
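
The cluster arithmetic can be sketched under two readings of "10% overhead". Both functions are hypothetical illustrations that assume a perfectly distributed job. The first reading (overhead as extra work on top of the baseline) reproduces the figure of 100 servers above. The second (the framework eats 10% of every node's CPU) comes out slightly higher.

```python
def extra_servers_added_work(base_servers: int, overhead_pct: int) -> int:
    """Overhead adds overhead_pct percent of work on top of the baseline load."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-base_servers * overhead_pct // 100)


def extra_servers_lost_capacity(base_servers: int, overhead_pct: int) -> int:
    """Overhead consumes overhead_pct percent of each node's CPU capacity."""
    if not 0 <= overhead_pct < 100:
        raise ValueError("overhead_pct must be in [0, 100)")
    total_needed = -(-base_servers * 100 // (100 - overhead_pct))
    return total_needed - base_servers


if __name__ == "__main__":
    print("added-work reading:", extra_servers_added_work(1000, 10))
    print("lost-capacity reading:", extra_servers_lost_capacity(1000, 10))
```

Either way, the overhead that a desktop user never notices turns into a three-digit number of extra nodes.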

Page 48: Big data & frameworks: no book for you anymore

WHAT FRAMEWORK IS REALLY GOOD FOR YOU?

● If you know the amount (and cost) of work needed to replace a framework, that framework is really good for you.

Page 49: Big data & frameworks: no book for you anymore

MAKING YOUR OWN FRAMEWORK

● The most common reason to build your own framework is … growing complexity and support cost.

● Developing a new framework and migrating to it can be cheaper than supporting existing solutions.

● You don't want to depend on an existing framework's development.

Page 50: Big data & frameworks: no book for you anymore

MAKING FRAMEWORK LAZY STYLE

● First build multiple solutions, then integrate them into a single approach.

● GOOD: you only integrate what is already used, so less work goes unused.

● BAD: you act reactively.

Page 51: Big data & frameworks: no book for you anymore

MAKING FRAMEWORK PROACTIVE STYLE

● You improve the framework before the actual need arises.

● GOOD: you are guided by the approach, not by need, so you usually get a clearer design.

● BAD: you are more likely to build things nobody needs.

Page 52: Big data & frameworks: no book for you anymore

OUTSIDE YOUR TEAM

● Great, you have additional workforce. But from now on you have external support tickets.

● Usually you can control your users, so major changes are still possible, but harder.

● Pay more attention to documentation and training for other teams. It pays off.

Page 53: Big data & frameworks: no book for you anymore

OUTSIDE YOUR COMPANY

● You receive additional workforce. People start contributing to your framework. Don't be too optimistic.

● Community support is good, but you need to support community applications.

● You are no longer flexible. You don't control the users of your framework.

Page 54: Big data & frameworks: no book for you anymore

LESSONS LEARNED: CORE

● Avoid inventing a unique approach for every Big Data solution. It is critical to have a good, relatively stable foundation.

● Your Big Data CORE architecture should be a layered infrastructure built from small, simple, unified, replaceable components (UNIX way).

● Be ready for packaging issues, but try to reuse as much as possible on the CORE layer.

Page 55: Big data & frameworks: no book for you anymore

LESSONS LEARNED: BEYOND THE CORE

● When selecting frameworks to extend your big data core, prefer solutions with a stable approach, flexible functionality and a healthy community. Revise your approaches, as the world changes fast.

● Prefer contributing to a good existing solution over starting your own.

● The more frequently you change something, the higher-level the tool you need for it. But in big data you directly pay for any performance overhead.

● If you have started your own framework, the more popular it becomes, the less freedom you have to modify it, so flexibility alone is a bad reason to start.

Page 56: Big data & frameworks: no book for you anymore

Questions and discussion