Copyright ©2015 Treasure Data. All Rights Reserved.

HDP2 and YARN operations point

Ryu Kobayashi

Treasure Data Tech Talk 11 and 12 Mar 2015

Who am I?

• Ryu Kobayashi

• @ryu_kobayashi

• https://github.com/ryukobayashi

• Treasure Data, Inc.

• Software Engineer

• Background

• Hadoop, Cassandra, Machine Learning, ...

• I developed the Huahin (Hadoop) Framework.

http://huahinframework.org/

What is YARN?

YARN (Yet Another Resource Negotiator) Architecture

• MRv1

• JobTracker

• TaskTracker

• YARN

• ResourceManager

• NodeManager

• ApplicationMaster

• Job History Server (if it is not installed, we cannot see the job history logs)

• YARN Timeline Server (if it is not installed, we cannot see the YARN history logs)

YARN Timeline Server

• It includes container information

Note!!!

Use Hadoop 2.4.0 or later!!!

• Versions that must not be used

• Apache Hadoop 2.2.0

• Apache Hadoop 2.3.0

• HDP 2.0 (based on 2.2.0)
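
The version rule above can be sketched as a small check. This is only an illustration, assuming plain `major.minor.patch` Apache Hadoop version strings; `parse_version` and `is_usable` are hypothetical helper names, not part of Hadoop, and a distribution version such as HDP 2.0 would first have to be mapped to its base Hadoop version.

```python
def parse_version(version):
    """Turn a string such as '2.4.0' into a tuple of ints, e.g. (2, 4, 0)."""
    return tuple(int(part) for part in version.split("."))


# Per the slides: use 2.4.0 or later; 2.2.0 and 2.3.0 must not be used.
MINIMUM_VERSION = (2, 4, 0)


def is_usable(version):
    """Return True if the Apache Hadoop version meets the 2.4.0 minimum."""
    return parse_version(version) >= MINIMUM_VERSION


for v in ("2.2.0", "2.3.0", "2.4.0", "2.6.0"):
    print(v, "ok" if is_usable(v) else "do not use")
```

Comparing integer tuples rather than strings keeps a version such as 2.10.0 ordered after 2.9.0.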

• Current versions

• Apache Hadoop 2.6.0

• CDH 5.3.2 (based on 2.5.0, plus patches)

• HDP 2.2 (based on 2.6.0, plus patches)

• Why should these versions not be used?

• Capacity Scheduler

• There is a bug

• Fair Scheduler

• There is a bug

• What kind of bug?

• Each scheduler can end up in a deadlock

• In fact, there are also bugs in 2.4.0 and 2.6.0…

• It is better to use the newest version.

• Note: 2.7.0 and later are a different matter

Backport Patch

• I backported some patches

• https://github.com/ryukobayashi/patches

• Includes the deadlock patch

• Format of the counter

• Application kill in Web UI

Format of the counter

Application kill in Web UI

• Job kill in Web UI

• mapreduce.jobtracker.webinterface.trusted (default false)

• Application kill in Web UI

• yarn.resourcemanager.webapp.ui-actions.enabled (default true)
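
A minimal sketch of how these two switches resolve to an effective value: an explicit `<property>` entry wins, otherwise the default from the slide applies. The helper `effective_flag` and the sample file content are hypothetical, and real Hadoop configuration loading (multiple resource files, final properties) is more involved than this.

```python
import xml.etree.ElementTree as ET

# Defaults as stated on the slide.
DEFAULTS = {
    "mapreduce.jobtracker.webinterface.trusted": False,
    "yarn.resourcemanager.webapp.ui-actions.enabled": True,
}


def effective_flag(site_xml, name):
    """Return the boolean value of `name`, falling back to the slide's default."""
    root = ET.fromstring(site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == name:
            return prop.findtext("value", "").strip().lower() == "true"
    return DEFAULTS[name]


# Hypothetical yarn-site.xml content that turns the kill action off.
SAMPLE = """
<configuration>
  <property>
    <name>yarn.resourcemanager.webapp.ui-actions.enabled</name>
    <value>false</value>
  </property>
</configuration>
"""

print(effective_flag(SAMPLE, "yarn.resourcemanager.webapp.ui-actions.enabled"))
print(effective_flag(SAMPLE, "mapreduce.jobtracker.webinterface.trusted"))
```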

Backport Patch

• What we want next…

• A patch to kill a job's task attempt from the Web UI (in development)

• Currently this is possible only from the command line

$ mapred job -kill-task attempt_*
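
Until a Web UI action exists, the command above can be wrapped in a script. The following is a sketch only: the helper names are hypothetical, the ID pattern is an assumption about the usual `attempt_<timestamp>_<job>_<m|r>_<task>_<attempt>` layout, and the sample ID is made up.

```python
import re
import subprocess

# Assumed layout of a task attempt ID (see the lead-in above).
ATTEMPT_ID = re.compile(r"^attempt_\d+_\d+_[mr]_\d+_\d+$")


def kill_task_command(attempt_id):
    """Build the argv for `mapred job -kill-task <attempt_id>`."""
    if not ATTEMPT_ID.match(attempt_id):
        raise ValueError("not a task attempt ID: %r" % attempt_id)
    return ["mapred", "job", "-kill-task", attempt_id]


def kill_task(attempt_id):
    """Run the command; needs the `mapred` CLI on PATH and a reachable cluster."""
    return subprocess.run(kill_task_command(attempt_id), check=True)


# Made-up attempt ID, used only to show the resulting command line.
print(" ".join(kill_task_command("attempt_1426000000000_0001_m_000000_0")))
```

Validating the ID before building the argument list avoids passing an unexpanded pattern such as `attempt_*` to the CLI.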

Matter of resources

• total container = 4

• concurrent application = 2

Cluster

• Application: App Master, Container

• Application: App Master, Container

Matter of resources

• total container = 4

• concurrent application = 4

Cluster

• Application: App Master

• Application: App Master

• Application: App Master

• Application: App Master

Livelock! All 4 containers are held by App Masters, so no application can get a container for its tasks.

Cluster

• Application: App Master

• Application: App Master

• Application: App Master, Container

• Application: App Master (Kill)

Matter of resources

• total container = 4

• concurrent application = 4

• Fix: limit the number of concurrently running applications

• Set maxRunningApps on the root queue
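
As a hedged illustration of the setting named above, the snippet below generates a Fair Scheduler allocation file fragment that caps running applications on the root queue. The function name `root_queue_allocation` is hypothetical, and the value 2 is only the example from these slides; a real fair-scheduler.xml usually contains more queues and settings.

```python
import xml.etree.ElementTree as ET


def root_queue_allocation(max_running_apps):
    """Return allocation-file XML limiting running apps on the root queue."""
    allocations = ET.Element("allocations")
    queue = ET.SubElement(allocations, "queue", {"name": "root"})
    ET.SubElement(queue, "maxRunningApps").text = str(max_running_apps)
    return ET.tostring(allocations, encoding="unicode")


print(root_queue_allocation(2))
```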

Matter of resources

• total container = 4

• concurrent application = 4

• root maxRunningApps = 2

Cluster

• Application: App Master, Container

• Application: App Master (Pending)

• Application: App Master, Container

• Application: App Master (Pending)
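
The three situations from these slides can be reproduced with a toy model. It assumes, as a simplification, that every running application takes exactly one container for its App Master and needs at least one more container to make progress; `schedule` is a hypothetical function and not how the YARN schedulers are implemented.

```python
def schedule(total_containers, submitted_apps, max_running_apps=None):
    """Toy model: each running application holds one container for its
    App Master; whatever is left over is available for task containers."""
    limit = submitted_apps if max_running_apps is None else max_running_apps
    running = min(submitted_apps, limit, total_containers)
    pending = submitted_apps - running
    task_containers = total_containers - running
    # An application makes progress only if it can hold a task container.
    progressing = min(running, task_containers)
    return {
        "running": running,
        "pending": pending,
        "task_containers": task_containers,
        "progressing": progressing,
    }


print(schedule(4, 2))                      # 2 apps, each with a task container
print(schedule(4, 4))                      # App Masters fill the cluster
print(schedule(4, 4, max_running_apps=2))  # 2 apps run, 2 wait
```

In the second call no task container is left and no application progresses, which is the livelock; with the limit of 2, two applications run to completion while the other two stay pending.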

YARN Resource Management

yarn-site.xml

• yarn.nodemanager.resource.memory-mb

• (yarn.nodemanager.vmem-pmem-ratio)

• (yarn.scheduler.minimum-allocation-mb)

mapred-site.xml

• yarn.app.mapreduce.am.resource.mb

• mapreduce.map.memory.mb

• mapreduce.reduce.memory.mb

fair-scheduler.xml

• maxResources, minResources

etc…
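
A rough sketch of how the memory settings above interact. It assumes the scheduler rounds each request up to a multiple of yarn.scheduler.minimum-allocation-mb, which depends on the scheduler and version in use (the Fair Scheduler has its own increment setting). The numbers are made-up examples and the function names are hypothetical.

```python
import math


def normalize(request_mb, minimum_allocation_mb):
    """Round a request up to a multiple of the minimum allocation."""
    return math.ceil(request_mb / minimum_allocation_mb) * minimum_allocation_mb


def containers_per_node(node_memory_mb, request_mb, minimum_allocation_mb):
    """How many containers of the normalized size fit on one NodeManager."""
    return node_memory_mb // normalize(request_mb, minimum_allocation_mb)


# Made-up values, chosen only for illustration.
NODE_MEMORY_MB = 8192         # yarn.nodemanager.resource.memory-mb
MINIMUM_ALLOCATION_MB = 1024  # yarn.scheduler.minimum-allocation-mb
MAP_MEMORY_MB = 1536          # mapreduce.map.memory.mb

print(normalize(MAP_MEMORY_MB, MINIMUM_ALLOCATION_MB))
print(containers_per_node(NODE_MEMORY_MB, MAP_MEMORY_MB, MINIMUM_ALLOCATION_MB))
```

Under these assumptions a 1536 MB map request occupies 2048 MB, so an 8192 MB node holds 4 such containers rather than 5.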

YARN Resource Management

e.g.

• Use the hdp-configuration-utils.py script: http://goo.gl/L2hxyq

• Use Ambari: http://ambari.apache.org/

• See Cloudera's documentation: http://goo.gl/EBreca

Thanks!!!
