Hadoop YARN Services
Steve Loughran, Hortonworks
stevel at hortonworks.com
@steveloughran
ApacheCon EU, November 2014
Apache Hadoop + YARN:
An OS for data
An OS can do more than SQL
statements
An OS can do more than run
admin-installed apps
An OS lets you run whatever
you want!
An OS Offers
• Persistent Storage
• Execution of code
• jobs & services
• scheduling
• Communications
• Security
YARN Services:
Long-lived applications within a Hadoop cluster
[Diagram: a cluster of servers, each running HDFS and a YARN Node Manager, plus one server running HDFS and the YARN Resource Manager, “The RM”]
• Servers run YARN Node Managers (NM)
• NMs heartbeat to the Resource Manager (RM)
• RM schedules work over cluster
• RM allocates containers to apps
• NMs start containers
• NMs report container health
Background: YARN
Client creates App Master
[Diagram: the same cluster of HDFS + YARN Node Manager servers and the YARN Resource Manager, “The RM”, now with a Client and an Application Master]
“AM” requests containers
[Diagram: the cluster of HDFS + YARN Node Manager servers and the YARN Resource Manager, with the Application Master and three Containers]
Short lived applications
• failure: clean restart
• logs: collect at end
• placement: by data
• security: Kerberos delegation tokens
• discovery: launcher app can track
Long-lived services
• failure: stay available
• logs: ongoing collection
• placement: availability, performance
• security: ??
• discovery: ???
YARN-896: Support for YARN services
Log aggregation
Service registration & discovery
Windowed failure tracking
Anti-affinity placement
Gang scheduling
Applications to continue over AM restart
Container resource flexing
Container reuse
Kerberos token renewal
Container signalling
Net & Disk resources
Labelled nodes & queues
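Windowed failure tracking, from the list above, means that only recent failures count against a long-lived service, so a handful of failures spread over months does not kill it. A minimal sketch of the idea follows; the class name, the threshold rule and the millisecond timestamps are illustrative assumptions, not YARN's implementation, and timestamps are assumed to be non-decreasing.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sliding-window failure counter: old failures age out. */
class WindowedFailureTracker {
    private final long windowMillis;
    private final int threshold;
    private final Deque<Long> failureTimes = new ArrayDeque<>();

    WindowedFailureTracker(long windowMillis, int threshold) {
        this.windowMillis = windowMillis;
        this.threshold = threshold;
    }

    /** Record a failure that happened at the given time (milliseconds). */
    void recordFailure(long nowMillis) {
        failureTimes.addLast(nowMillis);
        expire(nowMillis);
    }

    /** Number of failures still inside the window ending at nowMillis. */
    int recentFailures(long nowMillis) {
        expire(nowMillis);
        return failureTimes.size();
    }

    /** True once the number of recent failures exceeds the threshold. */
    boolean shouldGiveUp(long nowMillis) {
        return recentFailures(nowMillis) > threshold;
    }

    /** Drop failures that are at least one full window old. */
    private void expire(long nowMillis) {
        while (!failureTimes.isEmpty()
                && failureTimes.peekFirst() <= nowMillis - windowMillis) {
            failureTimes.removeFirst();
        }
    }

    public static void main(String[] args) {
        WindowedFailureTracker tracker = new WindowedFailureTracker(60_000, 2);
        tracker.recordFailure(0);
        tracker.recordFailure(10_000);
        tracker.recordFailure(20_000);
        System.out.println(tracker.shouldGiveUp(20_000));  // three failures inside the window
        System.out.println(tracker.shouldGiveUp(100_000)); // all three have aged out
    }
}
```

A short-lived job can simply count failures; a service that runs for months needs the window so that the count can recover.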
YARN-896
REST
Log aggregation
Service registration & discovery
Windowed failure tracking
Anti-affinity placement
Gang scheduling
Applications to continue over AM restart
Container resource flexing
Container reuse
Kerberos token renewal
Container signalling
Net & Disk resources
Labelled nodes & queues
Hadoop 2.6
(Docker)
REST
YARN-913 Service Registry
$ slider resolve --path ~/services/org-apache-slider/storm1
{
  "type" : "JSONServiceRecord",
  "external" : [ {
    "api" : "http://",
    "addressType" : "uri",
    "protocolType" : "webui",
    "addresses" : [ {
      "uri" : "http://nn.example.com:46132"
    } ]
  }, {
    "api" : "classpath:org.apache.slider.publisher.configurations",
    "addressType" : "uri",
    "protocolType" : "REST",
    "addresses" : [ {
      "uri" : "http://nn.example.com:46132/ws/v1/slider/publisher/slider"
    } ]
  } ]
}
Internal and external
"internal" : [ {
  "api" : "classpath:org.apache.slider.agents.secure",
  "addressType" : "uri",
  "protocolType" : "REST",
  "addresses" : [ {
    "uri" : "https://nn.example.com:47749/ws/v1/slider/agents"
  } ]
} ]
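A client that has resolved a service record still has to pick the endpoint it wants out of the "external" or "internal" list. The sketch below models that lookup with plain Java objects; the `Endpoint` class and `firstUri` helper are hypothetical names for illustration and are not the registry client API. The sample data is copied from the record shown above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

class RegistryLookupSketch {
    /** One entry of the "external" or "internal" list in a service record. */
    static final class Endpoint {
        final String api;
        final String protocolType;
        final List<String> uris;

        Endpoint(String api, String protocolType, List<String> uris) {
            this.api = api;
            this.protocolType = protocolType;
            this.uris = uris;
        }
    }

    /** Return the first URI of the first endpoint whose "api" field matches. */
    static Optional<String> firstUri(List<Endpoint> endpoints, String api) {
        for (Endpoint e : endpoints) {
            if (e.api.equals(api) && !e.uris.isEmpty()) {
                return Optional.of(e.uris.get(0));
            }
        }
        return Optional.empty();
    }

    /** The two external endpoints from the example record. */
    static List<Endpoint> exampleExternal() {
        return Arrays.asList(
            new Endpoint("http://", "webui",
                Collections.singletonList("http://nn.example.com:46132")),
            new Endpoint("classpath:org.apache.slider.publisher.configurations", "REST",
                Collections.singletonList(
                    "http://nn.example.com:46132/ws/v1/slider/publisher/slider")));
    }

    public static void main(String[] args) {
        String api = "classpath:org.apache.slider.publisher.configurations";
        // Prints the publisher URI, or a marker if the API is not registered.
        System.out.println(firstUri(exampleExternal(), api).orElse("(not registered)"));
    }
}
```

Looking endpoints up by their "api" name rather than by position lets a service add or reorder endpoints without breaking clients.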
Failures
[Diagram: the cluster of HDFS + YARN Node Manager servers and the YARN Resource Manager, with the Application Master and three Containers]
Failures
[Diagram: two HDFS + YARN Node Manager servers and the YARN Resource Manager remain, with two Containers; the Application Master is no longer shown]
Failures
[Diagram: two HDFS + YARN Node Manager servers and the YARN Resource Manager, with an Application Master and two Containers]
container 1
container 2
lost: container 3
Easy: enabling
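// Client: keep containers running across AM restarts,
// and allow up to 8 AM attempts
amLauncher.setKeepContainersOverRestarts(true);
amLauncher.setMaxAppAttempts(8);
// Server: the restarted AM retrieves the containers that survived
List<Container> liveContainers =
    amRegistrationData.getContainersFromPreviousAttempts();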
Harder: rebuilding state
Node Map
Placement History
Specification
Container Queues
Component Map
Event History
Persisted Rebuilt Transient
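One piece of rebuilding state after an AM restart is working out what is still missing: compare the desired number of instances per component with the containers that survived. The sketch below shows only that step. It assumes the new AM attempt has access to the desired counts and can map each surviving container back to a component name; both the class and the method are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RebuildStateSketch {
    /**
     * desired: component name -> number of instances wanted.
     * liveComponents: the component name of each surviving container.
     * Returns component name -> number of containers still to request.
     */
    static Map<String, Integer> outstanding(Map<String, Integer> desired,
                                            List<String> liveComponents) {
        // Count surviving containers per component.
        Map<String, Integer> live = new LinkedHashMap<>();
        for (String component : liveComponents) {
            live.merge(component, 1, Integer::sum);
        }
        // Anything desired but not live must be requested again.
        Map<String, Integer> toRequest = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : desired.entrySet()) {
            int missing = e.getValue() - live.getOrDefault(e.getKey(), 0);
            if (missing > 0) {
                toRequest.put(e.getKey(), missing);
            }
        }
        return toRequest;
    }

    public static void main(String[] args) {
        Map<String, Integer> desired = new LinkedHashMap<>();
        desired.put("master", 1);
        desired.put("worker", 3);
        // One master and two workers survived; one worker container was lost.
        List<String> survivors = Arrays.asList("master", "worker", "worker");
        System.out.println(outstanding(desired, survivors)); // {worker=1}
    }
}
```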
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
Log Aggregation
$ yarn rmadmin...
-addToClusterNodeLabels [label1,label2,label3]
-removeFromClusterNodeLabels [label1,label2,label3]
-replaceLabelsOnNode [node1:port,label1,label2]
-directlyAccessNodeLabelStore
Labels
Labels offer
• Separation of workloads
• Separation of service roles
• Separation of production & dev code
• Allocation to specific hardware classes
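The effect of labels on placement can be pictured as a filter over the cluster's nodes: only nodes carrying the requested label are candidates. The sketch below uses a deliberately simplified matching rule (a node is eligible if its label set contains the requested label); it is an assumption for illustration and not the YARN scheduler's logic. The node and label names are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class NodeLabelSketch {
    /** Return the nodes, in name order, whose label set contains the label. */
    static List<String> nodesWithLabel(Map<String, Set<String>> nodeLabels, String label) {
        List<String> eligible = new ArrayList<>();
        // TreeMap copy gives a deterministic, sorted iteration order.
        for (Map.Entry<String, Set<String>> e : new TreeMap<>(nodeLabels).entrySet()) {
            if (e.getValue().contains(label)) {
                eligible.add(e.getKey());
            }
        }
        return eligible;
    }

    /** A made-up three-node cluster with production, dev and gpu labels. */
    static Map<String, Set<String>> exampleCluster() {
        Map<String, Set<String>> nodes = new TreeMap<>();
        nodes.put("node1", new HashSet<>(Arrays.asList("production", "gpu")));
        nodes.put("node2", new HashSet<>(Arrays.asList("dev")));
        nodes.put("node3", new HashSet<>(Arrays.asList("production")));
        return nodes;
    }

    public static void main(String[] args) {
        System.out.println(nodesWithLabel(exampleCluster(), "production")); // [node1, node3]
        System.out.println(nodesWithLabel(exampleCluster(), "gpu"));        // [node1]
    }
}
```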
Security
• Token expiry a core Kerberos feature
• Token expiry inimical to service longevity
• Specifically: token delegation
Security
YARN:
AM/RM token renewal
NM HDFS access for AM container relaunch
You: embrace keytabs, test lots
…so you can now
• Write long lived apps
• with failure resilience
• centralised log viewing
• labelled/isolated placement
• in secure clusters
Why not just use Mesos?
Hadoop is everywhere!
Log aggregation
Service registration & discovery
Windowed failure tracking
Anti-affinity placement
Gang scheduling
Applications to continue over AM restart
Container resource flexing
Container reuse
Kerberos token renewal
Container signalling
Net & Disk resources
Labelled nodes & queues
Hadoop 2.7+
REST
Docker
Questions?
http://hadoop.apache.org