Ghislain Fourny Big Data for Engineers Spring 2019 9. Resource Management artjazz / 123RF Stock Photo

May 28, 2020

dariahiddleston
Transcript
Ghislain Fourny

Big Data for Engineers Spring 2019

9. Resource Management

artjazz / 123RF Stock Photo

2

Data Technology Stack

Storage

Encoding

Syntax

Data models

Validation

Processing

Indexing

Data stores

User interfaces

Querying

3

Where we are

Storage

Encoding

Syntax

Data models

Validation

Processing

Indexing

Data stores

User interfaces

Querying

4

Last week: MapReduce

Input data

Output data

Intermediate data (shuffled)

Map Map Map Map Map Map Map Map

Reduce Reduce Reduce Reduce Reduce Reduce Reduce Reduce
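
As a reminder of the data flow on this slide, the three stages can be sketched in a few lines of Python. This is a minimal single-process sketch, not Hadoop code: the word-count task and the names `map_phase`, `shuffle` and `reduce_phase` are assumptions chosen for illustration.

```python
from collections import defaultdict


def map_phase(line):
    # Map: emit one (word, 1) pair per word of the input line.
    return [(word, 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group the intermediate values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: aggregate all values that share a key.
    return key, sum(values)


def word_count(lines):
    intermediate = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


counts = word_count(["a b a", "b c"])
print(counts)
```

On a real cluster each map and reduce call runs as a separate task on some machine, which is why something has to decide where and when tasks run.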

5

Hadoop infrastructure (version 1)

Namenode + JobTracker

/dir/file

Datanode + TaskTracker

Datanode + TaskTracker

Datanode + TaskTracker

Datanode + TaskTracker

Datanode + TaskTracker

Datanode + TaskTracker

6

Responsibilities of the MapReduce JobTracker

Resource Management

Monitoring

Job lifecycle

Fault-tolerance

Scheduling

7

Issue 1: scalability

< 4,000 nodes

< 40,000 tasks

8

Issue 2: bottleneck

TaskTracker

JobTracker

TaskTracker TaskTracker TaskTracker TaskTracker TaskTracker

Bottleneck

9

Issue 3: Jack of all trades

Scheduling

Monitoring

10

Issue 4: Utilization (task slots)

Static (decide on Map/Reduce slots at configuration time)

Fixed-size

11

Issue 5: Not fungible

Map Reduce

Working at maximum capacity

Idle
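
Issues 4 and 5 can be illustrated with a small calculation. The sketch below assumes a hypothetical node with 6 map slots and 4 reduce slots and a workload that currently has only map tasks pending; the function names and the numbers are illustrative and not taken from Hadoop.

```python
def static_slot_utilization(map_slots, reduce_slots, pending_map, pending_reduce):
    # Static slots: a map task can only use a map slot,
    # a reduce task can only use a reduce slot.
    busy = min(map_slots, pending_map) + min(reduce_slots, pending_reduce)
    return busy / (map_slots + reduce_slots)


def fungible_utilization(total_slots, pending_map, pending_reduce):
    # Fungible resources: any pending task can use any free slot.
    busy = min(total_slots, pending_map + pending_reduce)
    return busy / total_slots


static = static_slot_utilization(6, 4, pending_map=20, pending_reduce=0)
pooled = fungible_utilization(10, pending_map=20, pending_reduce=0)
print(static, pooled)
```

With static slots the map side works at maximum capacity while the reduce slots sit idle, so 40% of the node is wasted in this example; with interchangeable resources the same node is fully used.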

12

YARN

kirtchanut / 123RF Stock Photo

13

YARN

Yet Another Resource Negotiator

14

YARN

Scheduling

Application management

Monitoring

Resource Manager

Application Master Application Master Application Master Application Master Application Master

15

Framework-specific application masters

MapReduce

DAG distributed processing

Message Passing Interface

Graph processing

16

Scales more

10,000 nodes

100,000 tasks

17

YARN architecture

NodeManager NodeManager NodeManager NodeManager NodeManager NodeManager

ResourceManager

18

YARN architecture

NodeManager NodeManager NodeManager NodeManager NodeManager NodeManager

ResourceManager

Container Container Container

19

Resources

Memory

CPU

Disk

Network

Work in progress

20

Container

X GB

Y TB

W cores, U GHz

Z MBps
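
A container can be thought of as a bundle of resources carved out of one node. The sketch below is a minimal model of that idea, assuming hypothetical `Resources` and `Node` classes; only memory and cores are modelled to keep it short, and none of the names come from the YARN API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    memory_gb: int
    cores: int

    def fits_in(self, other):
        # A request fits if every dimension fits.
        return self.memory_gb <= other.memory_gb and self.cores <= other.cores

    def minus(self, other):
        return Resources(self.memory_gb - other.memory_gb, self.cores - other.cores)


class Node:
    """A machine whose free resources are handed out as containers."""

    def __init__(self, capacity):
        self.free = capacity
        self.containers = []

    def allocate(self, request):
        # Refuse the container if the node cannot hold it.
        if not request.fits_in(self.free):
            return False
        self.free = self.free.minus(request)
        self.containers.append(request)
        return True


node = Node(Resources(memory_gb=16, cores=8))
first = node.allocate(Resources(memory_gb=4, cores=2))
second = node.allocate(Resources(memory_gb=16, cores=1))
print(first, second, node.free)
```

Unlike the fixed-size task slots of version 1, the size of each container is chosen per request, so a node can host a few large containers or many small ones.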

21

Container

22

YARN

ResourceManager

NodeManager NodeManager NodeManager NodeManager NodeManager

Container

Container Container

23

YARN

ResourceManager

NodeManager NodeManager NodeManager NodeManager NodeManager

Container

Container Container

Client

Job

24

YARN

ResourceManager

NodeManager NodeManager NodeManager NodeManager NodeManager

Container

Container Container

Client

Job

Schedules

25

YARN

ResourceManager

NodeManager NodeManager NodeManager NodeManager NodeManager

Container Container

Client

Job

Schedules

Application Master

26

YARN

ResourceManager

NodeManager NodeManager NodeManager NodeManager NodeManager

Container Container

Client

Job

Application Master
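
The sequence built up over the last slides (the client submits a job, the ResourceManager schedules a container for the Application Master, the Application Master then obtains further containers for its tasks) can be sketched as follows. This is a toy model under stated assumptions: the `ResourceManager` class, `submit_job`, the node names and the one-unit-per-container accounting are all hypothetical and far simpler than the real protocol.

```python
class ResourceManager:
    """Toy scheduler: tracks how many containers each node can still host."""

    def __init__(self, free_containers):
        self.free = dict(free_containers)  # node name -> free container count
        self.log = []

    def allocate(self, owner):
        # Grant a container on the first node that has room.
        for node, free in self.free.items():
            if free > 0:
                self.free[node] -= 1
                self.log.append((owner, node))
                return node
        return None  # cluster is full


def submit_job(rm, job, num_tasks):
    # Step 1: the first container hosts the job's Application Master.
    am_node = rm.allocate(f"{job}/application-master")
    if am_node is None:
        return None
    # Step 2: the Application Master asks for one container per task.
    task_nodes = [rm.allocate(f"{job}/task-{i}") for i in range(num_tasks)]
    return am_node, task_nodes


rm = ResourceManager({"node-1": 2, "node-2": 1})
am_node, task_nodes = submit_job(rm, "job-1", 3)
print(am_node, task_nodes)
```

The third task gets no container because the toy cluster is full; in this model it would simply have to wait until a container is released.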

28

Application Master communicates with containers

Application Master

Container

Container

Container

Container

Execute

Monitor

Restart
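
The execute, monitor and restart duties of an Application Master can be sketched as a retry loop. The code below is a simplified single-process illustration: `run_with_restarts`, the task functions and the limit of three attempts are assumptions made for the example, not YARN behavior.

```python
def run_with_restarts(tasks, max_attempts=3):
    """Execute each task, watch for failure, and restart it if needed."""
    results = {}
    attempts = {}
    for name, task in tasks.items():
        for attempt in range(1, max_attempts + 1):
            attempts[name] = attempt
            try:
                results[name] = task(attempt)  # execute
                break                          # monitor: task succeeded
            except RuntimeError:
                continue                       # restart with a new attempt
    return results, attempts


def stable(attempt):
    return "done"


def flaky(attempt):
    # Fails on its first attempt, as if its container had been lost.
    if attempt == 1:
        raise RuntimeError("container lost")
    return "done"


results, attempts = run_with_restarts({"map-0": stable, "map-1": flaky})
print(results, attempts)
```

Because this loop lives in the per-job Application Master, the central scheduler does not have to track individual tasks.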

29

Pure scheduler

Does not monitor tasks.

Does not restart upon failure.

ResourceManager

30

Scheduling strategies: pluggable scheduler

31

Scheduling strategies: pluggable scheduler

FIFO scheduler
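
A first-in, first-out policy can be sketched in a few lines. The version below is a simplification under assumptions: resources are a single number, requests are hypothetical `(application, amount)` pairs, and the function name `fifo_schedule` is made up for the example.

```python
from collections import deque


def fifo_schedule(capacity, requests):
    """Grant requests strictly in arrival order.

    Scheduling stops at the first request that does not fit,
    even if a later, smaller request would fit.
    """
    queue = deque(requests)
    granted = []
    while queue and queue[0][1] <= capacity:
        app, need = queue.popleft()
        capacity -= need
        granted.append(app)
    waiting = [app for app, _ in queue]
    return granted, waiting


granted, waiting = fifo_schedule(10, [("A", 6), ("B", 6), ("C", 2)])
print(granted, waiting)
```

Application C only needs 2 units and would fit next to A, but it waits behind B. This head-of-line blocking is the main drawback of a plain FIFO policy on a shared cluster.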

32

Scheduling strategies: pluggable scheduler

Capacity scheduler

Queue 1

Queue 2
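
One way to picture capacity scheduling with two queues is sketched below. It assumes each queue is guaranteed a percentage of the cluster and that capacity left unused by one queue may be lent to the other; the function name, the 50/50 split and the demand figures are hypothetical, and the real scheduler has many more settings.

```python
def capacity_allocate(total, guaranteed_percent, demand):
    # Step 1: every queue gets its demand, capped at its guaranteed capacity.
    alloc = {
        queue: min(demand[queue], total * pct // 100)
        for queue, pct in guaranteed_percent.items()
    }
    # Step 2: lend whatever is left to queues that still want more.
    spare = total - sum(alloc.values())
    for queue in alloc:
        extra = min(spare, demand[queue] - alloc[queue])
        alloc[queue] += extra
        spare -= extra
    return alloc


guarantees = {"Queue 1": 50, "Queue 2": 50}
busy = capacity_allocate(100, guarantees, {"Queue 1": 80, "Queue 2": 80})
lending = capacity_allocate(100, guarantees, {"Queue 1": 80, "Queue 2": 20})
print(busy, lending)
```

When both queues are busy each one is held to its guaranteed half; when Queue 2 is mostly idle, Queue 1 can temporarily grow beyond its guarantee.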

38

Steady Fair Share

Total: 500 GB

Math: 40% (200 GB)

Physics: 10% (50 GB)

CS: 50% (250 GB)
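
The figures on this slide follow directly from multiplying the total by each department's percentage. A minimal sketch of that arithmetic, with the hypothetical function name `steady_fair_share`:

```python
def steady_fair_share(total_gb, share_percent):
    # Each department's steady share is its fixed percentage of the total.
    return {dept: total_gb * pct // 100 for dept, pct in share_percent.items()}


shares = steady_fair_share(500, {"Math": 40, "Physics": 10, "CS": 50})
print(shares)
```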

39

Scheduling strategies: pluggable scheduler

Fair scheduler
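
A fair scheduler can also redistribute the share of a department that is currently idle. The sketch below assumes the 40/10/50 weights for Math, Physics and CS over 500 GB and splits the cluster among the active departments in proportion to their weights; the function name and the choice of which department is idle are assumptions for illustration.

```python
def instantaneous_fair_share(total_gb, weights, active):
    # Only active departments compete; each gets a weight-proportional part.
    active_weight = sum(weights[dept] for dept in active)
    return {dept: total_gb * weights[dept] / active_weight for dept in active}


weights = {"Math": 40, "Physics": 10, "CS": 50}
all_active = instantaneous_fair_share(500, weights, ["Math", "Physics", "CS"])
cs_idle = instantaneous_fair_share(500, weights, ["Math", "Physics"])
print(all_active, cs_idle)
```

With every department active the result equals the steady shares of 200, 50 and 250 GB; while CS is idle, Math and Physics keep their 4:1 ratio and share the whole cluster.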

47

Summary

Separation between scheduling and monitoring

Scalability

Availability

Multi-tenancy