Ghislain Fourny
Big Data for Engineers Spring 2019
9. Resource Management
Data Technology Stack
Storage
Encoding
Syntax
Data models
Validation
Processing
Indexing
Data stores
User interfaces
Querying
Last week: MapReduce
Input data
Map Map Map Map Map Map Map Map
Intermediate data (shuffled)
Reduce Reduce Reduce Reduce Reduce Reduce Reduce Reduce
Output data
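The MapReduce recap above (input data, map, shuffled intermediate data, reduce, output data) can be sketched on a single machine. This is a minimal illustration using word count as the example job; the function names map_phase, shuffle, reduce_phase and run_job are made up for this sketch and are not part of the Hadoop API.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split):
    """Map: emit one (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs):
    """Shuffle: group all intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into one output pair."""
    return key, sum(values)


def run_job(splits):
    """Run map on every split, shuffle the intermediate pairs, then reduce."""
    intermediate = chain.from_iterable(map_phase(s) for s in splits)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["big data", "big clusters", "data data"]))
```

On a real cluster, each call to map_phase and reduce_phase would run as a separate task on a different machine, which is why something has to decide where those tasks run.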
Hadoop infrastructure (version 1)
Namenode + JobTracker
/dir/file
Datanode + TaskTracker
Datanode + TaskTracker
Datanode + TaskTracker
Datanode + TaskTracker
Datanode + TaskTracker
Datanode + TaskTracker
Responsibilities of the MapReduce JobTracker
Resource Management
Monitoring
Job lifecycle
Fault-tolerance
Scheduling
Issue 2: bottleneck
JobTracker
TaskTracker TaskTracker TaskTracker TaskTracker TaskTracker TaskTracker
Bottleneck
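A back-of-the-envelope sketch of why a single JobTracker becomes a bottleneck: every TaskTracker reports to it periodically, so the message rate it must absorb grows linearly with the cluster size. The function name and the 3-second heartbeat interval are illustrative assumptions, not figures from these slides.

```python
def heartbeats_per_second(num_tasktrackers, interval_seconds):
    """Messages per second that one central JobTracker must absorb."""
    return num_tasktrackers / interval_seconds


if __name__ == "__main__":
    # The 3-second interval is an assumed value chosen for illustration.
    for nodes in (100, 1_000, 10_000):
        rate = heartbeats_per_second(nodes, 3.0)
        print(nodes, "TaskTrackers ->", rate, "heartbeats/s")
```

The same process also handles scheduling, monitoring, and fault tolerance for every job, so the heartbeat rate is only one part of its load.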
YARN
Scheduling
Application management
Monitoring
Resource Manager
Application Master Application Master Application Master Application Master Application Master
Framework-specific application masters
MapReduce
DAG distributed processing
Message Passing Interface
Graph processing
Scales more
10,000 nodes, 100,000 tasks
YARN architecture
NodeManager NodeManager NodeManager NodeManager NodeManager NodeManager
ResourceManager
Container Container Container
YARN
ResourceManager
NodeManager NodeManager NodeManager NodeManager NodeManager
Container Container Container
YARN
ResourceManager
NodeManager NodeManager NodeManager NodeManager NodeManager
Container Container Container
Client
Job
YARN
ResourceManager
NodeManager NodeManager NodeManager NodeManager NodeManager
Container Container Container
Client
Job
Schedules
YARN
ResourceManager
NodeManager NodeManager NodeManager NodeManager NodeManager
Container Container
Client
Job
Schedules
Application Master
YARN
ResourceManager
NodeManager NodeManager NodeManager NodeManager NodeManager
Container Container
Client
Job
Application Master
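The sequence in the YARN slides above (a client submits a job, the ResourceManager schedules a container for an Application Master, further containers host the job's tasks) can be modelled as a toy simulation. All class and method names, the free_slots capacity field, and the "most free slots first" placement policy are assumptions made for this sketch; they are not YARN's actual API or scheduler.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    name: str
    free_slots: int  # hypothetical capacity: containers this node can still host
    containers: list = field(default_factory=list)

    def launch(self, role):
        """Start one container on this node and record what runs in it."""
        self.free_slots -= 1
        self.containers.append(role)
        return (self.name, role)


class ResourceManager:
    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, role):
        """Schedule one container on the node with the most free slots (toy policy)."""
        node = max(self.nodes, key=lambda n: n.free_slots)
        if node.free_slots <= 0:
            raise RuntimeError("no free container slot in the cluster")
        return node.launch(role)

    def submit(self, job_name):
        """Client entry point: the job's first container hosts its Application Master."""
        placement = self.allocate(f"{job_name}/ApplicationMaster")
        return ApplicationMaster(job_name, self), placement


class ApplicationMaster:
    def __init__(self, job_name, resource_manager):
        self.job_name = job_name
        self.resource_manager = resource_manager

    def request_containers(self, count):
        """Ask the ResourceManager for containers to run this job's tasks."""
        return [self.resource_manager.allocate(f"{self.job_name}/task-{i}")
                for i in range(count)]


if __name__ == "__main__":
    rm = ResourceManager([NodeManager(f"nm{i}", free_slots=2) for i in range(1, 4)])
    master, am_placement = rm.submit("wordcount")
    print(am_placement)
    print(master.request_containers(2))
```

In this model the ResourceManager only decides where containers go; which tasks run in them is left to the per-job Application Master.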
Application Master communicates with containers
Application Master
Container
Container
Container
Container
Execute
Monitor
Restart
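The execute, monitor, restart cycle that the Application Master runs against its containers can be sketched as a retry loop. This is a simplified single-process model: the helper names run_with_restarts and flaky and the limit of three attempts are assumptions for illustration, not Hadoop defaults.

```python
def run_with_restarts(tasks, max_attempts=3):
    """Run each task, restarting it on failure; return name -> (result, attempts)."""
    results = {}
    for name, task in tasks.items():
        for attempt in range(1, max_attempts + 1):
            try:
                results[name] = (task(), attempt)  # execute
                break
            except Exception as error:  # monitor: the task reported a failure
                if attempt == max_attempts:
                    raise RuntimeError(f"{name} failed {attempt} times") from error
                # otherwise fall through to the next iteration: restart
    return results


def flaky(fail_times, value):
    """Build a task that fails fail_times times before returning value."""
    state = {"calls": 0}

    def task():
        state["calls"] += 1
        if state["calls"] <= fail_times:
            raise RuntimeError("container lost")
        return value

    return task


if __name__ == "__main__":
    print(run_with_restarts({"map-0": flaky(0, "ok"), "map-1": flaky(2, "ok")}))
```

A task that keeps failing after the last attempt aborts the whole run here; a real framework may apply a different policy.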