(Big Data)² How YARN Timeline Service v.2 Unlocks 360-Degree Platform Insights at Scale
Sangjin Lee @sjlee (Twitter)
Joep Rottinghuis @joep (Twitter)
Jan 06, 2017
Outline
• Why v.2?
• Highlights
• Developing for Timeline Service v.2
• Setting up Timeline Service v.2
• Milestones
• Demo
Why v.2?
• YARN Timeline Service v.1.x
  • Gained good adoption: Tez, Hive, Pig, etc.
  • Keeps improving with the v.1.5 APIs and storage implementation
  • Still faces some fundamental challenges...
Why v.2?
• Scalability and reliability challenges
  • Single instance of the Timeline Server
  • Storage (single local LevelDB instance)
• Usability
  • Flows as a first-class concept
  • Metrics and configuration as first-class citizens
  • Metrics aggregation up the entity hierarchy
Highlights (v.1 → v.2)
• Single writer/reader Timeline Server → Distributed writer/collector architecture
• Single local LevelDB storage → Scalable storage (HBase)
• v.1 entity model → New v.2 entity model
• No aggregation → Metrics aggregation
• REST API → Richer query REST API
Architecture
• Separation of writers (“collectors”) and readers
• Distributed collectors: one collector for each app (see the write sketch below)
• Dedicated RM collector for RM-generated data
• Collector discovery via the RM
• Pluggable storage, with HBase as the default storage
Distributed collectors & readers (architecture diagram)
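As a concrete illustration of the write path, here is a minimal sketch using the Hadoop 3.x TimelineV2Client. The entity type, metric name, and standalone main() are illustrative assumptions; in a real application master the RM hands back the per-app collector address in allocate responses, which is not shown here.

```java
// Minimal write-path sketch against the Hadoop 3.x TimelineV2Client API.
// Entity type and metric name are illustrative assumptions.
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.timelineservice.TimelineEntity;
import org.apache.hadoop.yarn.api.records.timelineservice.TimelineMetric;
import org.apache.hadoop.yarn.client.api.TimelineV2Client;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TimelineWriteSketch {
  public static void main(String[] args) throws Exception {
    // One collector (and thus one client) per application.
    ApplicationId appId = ApplicationId.newInstance(System.currentTimeMillis(), 1);
    TimelineV2Client client = TimelineV2Client.createTimelineClient(appId);
    client.init(new YarnConfiguration());
    client.start();

    TimelineEntity entity = new TimelineEntity();
    entity.setType("MY_APP_ENTITY"); // illustrative entity type
    entity.setId("entity-1");
    entity.setCreatedTime(System.currentTimeMillis());

    TimelineMetric metric = new TimelineMetric();
    metric.setId("MEMORY_MB");       // illustrative metric name
    metric.addValue(System.currentTimeMillis(), 2048L);
    entity.addMetric(metric);

    // Non-blocking write to the app's collector (discovered via the RM).
    client.putEntitiesAsync(entity);
    client.stop();
  }
}
```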
What is a flow?
• A flow is a group of YARN applications that are launched as parts of a logical app
  • e.g. Oozie, Scalding, Pig, etc.
• Example flow:
  • name: “frequent_visitor_stat”
  • run id: 1466097809000
  • version: “b9b9068”
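A hedged sketch of how a launcher could attach this flow context to an application, via YARN application tags and the Hadoop 3.x TimelineUtils helpers; the values are the example above, and the surrounding submission code is assumed.

```java
// Hedged sketch: attaching flow context to a YARN app submission via
// application tags. The submission context is assumed to come from the
// usual YarnClient#createApplication() flow.
import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.util.timeline.TimelineUtils;

public final class FlowTagSketch {
  static void tagWithFlowContext(ApplicationSubmissionContext appContext) {
    Set<String> tags = new HashSet<>();
    tags.add(TimelineUtils.generateFlowNameTag("frequent_visitor_stat"));
    tags.add(TimelineUtils.generateFlowRunIdTag(1466097809000L));
    tags.add(TimelineUtils.generateFlowVersionTag("b9b9068"));
    appContext.setApplicationTags(tags);
  }
}
```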
Configuration and metrics
• Now explicit top-level attributes of entities
• Fine-grained updates and queries made possible
  • “update metric A to value x”
  • “query entities where config A = B”
HBase Storage
• Scalable backend
• Row key structure
  • efficient range scans
  • KeyPrefixRegionSplitPolicy
• Filter pushdown
• Coprocessors for flow aggregation (“readless” aggregation)
• Cell tags for metadata (application id, aggregation operation)
• Cell timestamps generated during put
  • left-shifted, with the app id added, to avoid overwrites
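A minimal sketch of the timestamp trick; the multiplier and the way app-id bits are mixed in are illustrative assumptions, not the actual Timeline Service constants.

```java
// Minimal sketch of the cell-timestamp trick: left-shift (multiply) the
// wall clock and mix in app-specific bits so two apps writing the same
// metric in the same millisecond land in distinct cells rather than
// overwriting each other. Constants here are illustrative.
public final class CellTimestampSketch {
  private static final long MULTIPLIER = 1_000_000L; // assumed value

  static long cellTimestamp(long wallClockMs, long appIdSequence) {
    return wallClockMs * MULTIPLIER + (appIdSequence % MULTIPLIER);
  }

  public static void main(String[] args) {
    // Same millisecond, two apps -> two distinct cell timestamps.
    System.out.println(cellTimestamp(1466097809000L, 41L));
    System.out.println(cellTimestamp(1466097809000L, 42L));
  }
}
```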
Tables in HBase
• flow run
• application
• entity
• flow activity
• app to flow
table: flow run
Row key: clusterId!userName!flowName!inverted(flowRunId)
• most recent flow run stored first
• coprocessor enabled
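A minimal sketch of the inverted(flowRunId) trick; the string-joined key is illustrative only, since the real schema encodes each key segment as fixed-width bytes.

```java
// Minimal sketch of inverted(flowRunId): storing Long.MAX_VALUE - runId
// makes newer runs sort lower, so a forward scan returns the most recent
// flow run first.
public final class FlowRunRowKeySketch {
  static long invert(long flowRunId) {
    return Long.MAX_VALUE - flowRunId;
  }

  public static void main(String[] args) {
    // Illustrative cluster, user, and flow names.
    String rowKey = String.join("!",
        "cluster1", "jdoe", "frequent_visitor_stat",
        Long.toString(invert(1466097809000L)));
    System.out.println(rowKey);
  }
}
```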
table: application
Row key: clusterId!userName!flowName!inverted(flowRunId)!appId
• applications within a flow run stored together
• most recent flow run stored first
table: entity
Row key: userName!clusterId!flowName!inverted(flowRunId)!appId!entityType!entityId
• entities within an application within a flow run stored together, per type
  • for example, all containers within a YARN application will be stored together
• pre-split table
• stores information per entity, such as info, relatesTo, relatedTo, events, metrics, config
table: flow activity
Row key: clusterId!inverted(TopOfTheDay)!userName!flowName
• shows the flows that ran on that day
• stores information per flow, such as the number of runs, the run ids, and versions
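A minimal sketch of the inverted(TopOfTheDay) day bucket, assuming truncation to UTC midnight; it mirrors the inverted(flowRunId) trick so the most recent day scans first.

```java
// Minimal sketch of the flow activity day bucket: truncate the event time
// to the top of the day (UTC midnight assumed), then invert it so a
// forward scan returns the most recent day first.
public final class DayBucketSketch {
  private static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

  static long invertedTopOfDay(long eventTimeMs) {
    long topOfDay = eventTimeMs - (eventTimeMs % MILLIS_PER_DAY);
    return Long.MAX_VALUE - topOfDay;
  }

  public static void main(String[] args) {
    System.out.println(invertedTopOfDay(1466097809000L));
  }
}
```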
table: appToFlow
Row key: clusterId!appId
• stores the mapping of appId to flowName and flowRunId
Metrics aggregation
• Application level
  • Rolls up sub-application metrics
  • Performed in real time, in memory, in the collectors (see the sketch after this list)
• Flow run level
  • Rolls up app-level metrics
  • Performed in HBase region servers via coprocessors
• Offline aggregation (TBD)
  • Rolls up to user, queue, and flow periodically, offline
  • Phoenix tables
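A hedged sketch of opting a metric into the real-time application-level roll-up, using the Hadoop 3.x TimelineMetricOperation API; the metric name and value are illustrative.

```java
// Hedged sketch: a metric opts into real-time application-level roll-up
// by declaring an aggregation operation; the per-app collector then SUMs
// the latest values reported by sub-application entities such as
// containers.
import org.apache.hadoop.yarn.api.records.timelineservice.TimelineMetric;
import org.apache.hadoop.yarn.api.records.timelineservice.TimelineMetricOperation;

public final class MetricRollupSketch {
  public static void main(String[] args) {
    TimelineMetric containerMemory = new TimelineMetric();
    containerMemory.setId("MEMORY_MB"); // illustrative metric name
    containerMemory.addValue(System.currentTimeMillis(), 1024L);
    containerMemory.setRealtimeAggregationOp(TimelineMetricOperation.SUM);
    System.out.println(containerMemory.getRealtimeAggregationOp());
  }
}
```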
FlowRun Aggregation via the HBase Coprocessor
(diagram: app metrics cells in HBase are rolled up into a flow run metric sum by the coprocessor)
Reader REST API: paths
• URLs under /ws/v2/timeline
• Canonical REST-style URLs: /ws/v2/timeline/clusters/cluster_name/users/user_name/flows/flow_name/runs/run_id
• Path elements may be omitted if they can be inferred
  • the flow context can be inferred from the app id
  • the default cluster is assumed if the cluster is omitted
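A minimal read sketch against the reader REST API using plain JDK HTTP; the host, port (8188, the historical timeline web default), and cluster/user/flow names are illustrative assumptions.

```java
// Minimal sketch: fetch a flow run from the reader REST API. Host, port,
// and path segment values are illustrative assumptions.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class TimelineReadSketch {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://timeline-reader.example.com:8188"
        + "/ws/v2/timeline/clusters/cluster1/users/jdoe"
        + "/flows/frequent_visitor_stat/runs/1466097809000");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept", "application/json");
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line); // JSON for the requested flow run
      }
    }
  }
}
```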
Setting up Timeline Service v.2
• Set up the HBase cluster (1.1.x)
  • Add the timeline service jar to HBase
  • Install the flow run coprocessor
  • Create the tables via the TimelineSchemaCreator utility
• Configure the YARN cluster
  • Enable Timeline Service v.2 (see the sketch below)
  • Add hbase-site.xml for the timeline collector and readers
• Start the timeline reader daemon
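A minimal sketch of the YARN-side switches, set programmatically here purely for illustration; a real cluster sets the equivalent properties in yarn-site.xml, with hbase-site.xml on the collector and reader classpaths.

```java
// Minimal sketch of enabling Timeline Service v.2 via YarnConfiguration
// constants; normally these live in yarn-site.xml.
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TimelineV2ConfigSketch {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    conf.setBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, true);
    conf.setFloat(YarnConfiguration.TIMELINE_SERVICE_VERSION, 2.0f);
    System.out.println(conf.get(YarnConfiguration.TIMELINE_SERVICE_VERSION));
  }
}
```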
Milestone 1 (“Alpha 1”)
• Merge discussion (YARN-2928) in progress as we speak!
✓ Complete end-to-end read/write flow
✓ Real-time application and flow aggregation
✓ New entity model
✓ HBase Storage
✓ Rich REST API
✓ Integration with Distributed Shell and MapReduce
✓ YARN generic events and system metrics
Milestones - Future
• Milestone 2 (“Alpha 2”)
  • Integration with the new YARN UI
  • Integration with more frameworks
• Beta
  • Freeze the API and storage schema
  • Security
  • Collectors as containers
  • Storage fault tolerance
  • Production-ready
  • Migration-ready
Contributors• Li Lu, Junping Du, Vinod Kumar Vavilapalli (Hortonworks)
• Varun Saxena, Naganarasimha G. R. (Huawei)
• Sangjin Lee, Vrushali Channapattan, Joep Rottinghuis (Twitter)
• Zhijie Shen (now at Facebook)
• The HBase and Phoenix community!
Thank you!