(Big Data)² How YARN Timeline Service v.2 Unlocks 360-Degree Platform Insights at Scale
Sangjin Lee @sjlee (Twitter)
Joep Rottinghuis @joep (Twitter)
Jan 06, 2017
Outline
• Why v.2?
• Highlights
• Developing for Timeline Service v.2
• Setting up Timeline Service v.2
• Milestones
• Demo
Why v.2?
• YARN Timeline Service v.1.x
  • Gained good adoption: Tez, Hive, Pig, etc.
  • Keeps improving with the v.1.5 APIs and storage implementation
  • Still faces some fundamental challenges...
Why v.2?
• Scalability and reliability challenges
  • Single instance of the Timeline Server
  • Storage (single local LevelDB instance)
• Usability
  • Flows as a first-class concept
  • Metrics and configuration as first-class citizens
  • Metrics aggregation up the entity hierarchy
Highlights (v.1 → v.2)
• Single writer/reader Timeline Server → Distributed writer/collector architecture
• Single local LevelDB storage → Scalable storage (HBase)
• v.1 entity model → New v.2 entity model
• No aggregation → Metrics aggregation
• REST API → Richer query REST API
Architecture
• Separation of writers (“collectors”) and readers
• Distributed collectors: one collector for each app (see the write sketch below)
• Dedicated RM collector for RM-generated data
• Collector discovery via the RM
• Pluggable storage, with HBase as the default storage
Distributed collectors & readers (architecture diagram)
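As a concrete illustration of the write path, here is a minimal sketch using the Hadoop 3.x TimelineV2Client. The entity type, metric name, and standalone main() are illustrative assumptions; in a real application master the RM hands back the per-app collector address in allocate responses, which is not shown here.

```java
// Minimal write-path sketch against the Hadoop 3.x TimelineV2Client API.
// Entity type and metric name are illustrative assumptions.
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.timelineservice.TimelineEntity;
import org.apache.hadoop.yarn.api.records.timelineservice.TimelineMetric;
import org.apache.hadoop.yarn.client.api.TimelineV2Client;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TimelineWriteSketch {
  public static void main(String[] args) throws Exception {
    // One collector (and thus one client) per application.
    ApplicationId appId = ApplicationId.newInstance(System.currentTimeMillis(), 1);
    TimelineV2Client client = TimelineV2Client.createTimelineClient(appId);
    client.init(new YarnConfiguration());
    client.start();

    TimelineEntity entity = new TimelineEntity();
    entity.setType("MY_APP_ENTITY"); // illustrative entity type
    entity.setId("entity-1");
    entity.setCreatedTime(System.currentTimeMillis());

    TimelineMetric metric = new TimelineMetric();
    metric.setId("MEMORY_MB");       // illustrative metric name
    metric.addValue(System.currentTimeMillis(), 2048L);
    entity.addMetric(metric);

    // Non-blocking write to the app's collector (discovered via the RM).
    client.putEntitiesAsync(entity);
    client.stop();
  }
}
```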
What is a flow?
• A flow is a group of YARN applications that are launched as parts of a logical app
  • e.g. Oozie, Scalding, Pig, etc.
• Example flow:
  • name: “frequent_visitor_stat”
  • run id: 1466097809000
  • version: “b9b9068”
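A hedged sketch of how a launcher could attach this flow context to an application, via YARN application tags and the Hadoop 3.x TimelineUtils helpers; the values are the example above, and the surrounding submission code is assumed.

```java
// Hedged sketch: attaching flow context to a YARN app submission via
// application tags. The submission context is assumed to come from the
// usual YarnClient#createApplication() flow.
import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.util.timeline.TimelineUtils;

public final class FlowTagSketch {
  static void tagWithFlowContext(ApplicationSubmissionContext appContext) {
    Set<String> tags = new HashSet<>();
    tags.add(TimelineUtils.generateFlowNameTag("frequent_visitor_stat"));
    tags.add(TimelineUtils.generateFlowRunIdTag(1466097809000L));
    tags.add(TimelineUtils.generateFlowVersionTag("b9b9068"));
    appContext.setApplicationTags(tags);
  }
}
```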
Configuration and metrics
• Now explicit top-level attributes of entities
• Fine-grained updates and queries made possible
  • “update metric A to value x”
  • “query entities where config A = B”
HBase Storage
• Scalable backend
• Row key structure
  • efficient range scans
  • KeyPrefixRegionSplitPolicy
• Filter pushdown
• Coprocessors for flow aggregation (“readless” aggregation)
• Cell tags for metadata (application id, aggregation operation)
• Cell timestamps generated during put
  • left-shifted, with the app id added, to avoid overwrites
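A minimal sketch of the timestamp trick; the multiplier and the way app-id bits are mixed in are illustrative assumptions, not the actual Timeline Service constants.

```java
// Minimal sketch of the cell-timestamp trick: left-shift (multiply) the
// wall clock and mix in app-specific bits so two apps writing the same
// metric in the same millisecond land in distinct cells rather than
// overwriting each other. Constants here are illustrative.
public final class CellTimestampSketch {
  private static final long MULTIPLIER = 1_000_000L; // assumed value

  static long cellTimestamp(long wallClockMs, long appIdSequence) {
    return wallClockMs * MULTIPLIER + (appIdSequence % MULTIPLIER);
  }

  public static void main(String[] args) {
    // Same millisecond, two apps -> two distinct cell timestamps.
    System.out.println(cellTimestamp(1466097809000L, 41L));
    System.out.println(cellTimestamp(1466097809000L, 42L));
  }
}
```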
Tables in HBase
• flow run
• application
• entity
• flow activity
• app to flow
table: flow run
Row key: clusterId!userName!flowName!inverted(flowRunId)
• most recent flow run stored first
• coprocessor enabled
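A minimal sketch of the inverted(flowRunId) trick; the string-joined key is illustrative only, since the real schema encodes each key segment as fixed-width bytes.

```java
// Minimal sketch of inverted(flowRunId): storing Long.MAX_VALUE - runId
// makes newer runs sort lower, so a forward scan returns the most recent
// flow run first.
public final class FlowRunRowKeySketch {
  static long invert(long flowRunId) {
    return Long.MAX_VALUE - flowRunId;
  }

  public static void main(String[] args) {
    // Illustrative cluster, user, and flow names.
    String rowKey = String.join("!",
        "cluster1", "jdoe", "frequent_visitor_stat",
        Long.toString(invert(1466097809000L)));
    System.out.println(rowKey);
  }
}
```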
table: application
Row key: clusterId!userName!flowName!inverted(flowRunId)!appId
• applications within a flow run stored together
• most recent flow run stored first
table: entity
Row key: userName!clusterId!flowName!inverted(flowRunId)!appId!entityType!entityId
• entities within an application within a flow run stored together, per type
  • for example, all containers within a YARN application will be stored together
• pre-split table
• stores information per entity, such as info, relatesTo, relatedTo, events, metrics, config
table: flow activity
Row key: clusterId!inverted(TopOfTheDay)!userName!flowName
• shows the flows that ran on that day
• stores information per flow, such as the number of runs, the run ids, and versions
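A minimal sketch of the inverted(TopOfTheDay) day bucket, assuming truncation to UTC midnight; it mirrors the inverted(flowRunId) trick so the most recent day scans first.

```java
// Minimal sketch of the flow activity day bucket: truncate the event time
// to the top of the day (UTC midnight assumed), then invert it so a
// forward scan returns the most recent day first.
public final class DayBucketSketch {
  private static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

  static long invertedTopOfDay(long eventTimeMs) {
    long topOfDay = eventTimeMs - (eventTimeMs % MILLIS_PER_DAY);
    return Long.MAX_VALUE - topOfDay;
  }

  public static void main(String[] args) {
    System.out.println(invertedTopOfDay(1466097809000L));
  }
}
```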
table: appToFlow
Row key: clusterId!appId
• stores the mapping of appId to flowName and flowRunId
Metrics aggregation
• Application level
  • Rolls up sub-application metrics
  • Performed in real time, in memory, in the collectors (see the sketch after this list)
• Flow run level
  • Rolls up app-level metrics
  • Performed in HBase region servers via coprocessors
• Offline aggregation (TBD)
  • Rolls up to user, queue, and flow periodically, offline
  • Phoenix tables
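A hedged sketch of opting a metric into the real-time application-level roll-up, using the Hadoop 3.x TimelineMetricOperation API; the metric name and value are illustrative.

```java
// Hedged sketch: a metric opts into real-time application-level roll-up
// by declaring an aggregation operation; the per-app collector then SUMs
// the latest values reported by sub-application entities such as
// containers.
import org.apache.hadoop.yarn.api.records.timelineservice.TimelineMetric;
import org.apache.hadoop.yarn.api.records.timelineservice.TimelineMetricOperation;

public final class MetricRollupSketch {
  public static void main(String[] args) {
    TimelineMetric containerMemory = new TimelineMetric();
    containerMemory.setId("MEMORY_MB"); // illustrative metric name
    containerMemory.addValue(System.currentTimeMillis(), 1024L);
    containerMemory.setRealtimeAggregationOp(TimelineMetricOperation.SUM);
    System.out.println(containerMemory.getRealtimeAggregationOp());
  }
}
```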
FlowRun Aggregation via the HBase Coprocessor
(diagram: app metrics cells in HBase are rolled up into a flow run metric sum by the coprocessor)
Reader REST API: paths
• URLs under /ws/v2/timeline
• Canonical REST-style URLs: /ws/v2/timeline/clusters/cluster_name/users/user_name/flows/flow_name/runs/run_id
• Path elements may be omitted if they can be inferred
  • the flow context can be inferred from the app id
  • the default cluster is assumed if the cluster is omitted
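A minimal read sketch against the reader REST API using plain JDK HTTP; the host, port (8188, the historical timeline web default), and cluster/user/flow names are illustrative assumptions.

```java
// Minimal sketch: fetch a flow run from the reader REST API. Host, port,
// and path segment values are illustrative assumptions.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class TimelineReadSketch {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://timeline-reader.example.com:8188"
        + "/ws/v2/timeline/clusters/cluster1/users/jdoe"
        + "/flows/frequent_visitor_stat/runs/1466097809000");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept", "application/json");
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line); // JSON for the requested flow run
      }
    }
  }
}
```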
Setting up Timeline Service v.2
• Set up the HBase cluster (1.1.x)
  • Add the timeline service jar to HBase
  • Install the flow run coprocessor
  • Create the tables via the TimelineSchemaCreator utility
• Configure the YARN cluster
  • Enable Timeline Service v.2 (see the sketch below)
  • Add hbase-site.xml for the timeline collector and readers
• Start the timeline reader daemon
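A minimal sketch of the YARN-side switches, set programmatically here purely for illustration; a real cluster sets the equivalent properties in yarn-site.xml, with hbase-site.xml on the collector and reader classpaths.

```java
// Minimal sketch of enabling Timeline Service v.2 via YarnConfiguration
// constants; normally these live in yarn-site.xml.
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TimelineV2ConfigSketch {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    conf.setBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, true);
    conf.setFloat(YarnConfiguration.TIMELINE_SERVICE_VERSION, 2.0f);
    System.out.println(conf.get(YarnConfiguration.TIMELINE_SERVICE_VERSION));
  }
}
```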
Milestone 1 (“Alpha 1”)
• Merge discussion (YARN-2928) in progress as we speak!
✓ Complete end-to-end read/write flow
✓ Real-time application and flow aggregation
✓ New entity model
✓ HBase Storage
✓ Rich REST API
✓ Integration with Distributed Shell and MapReduce
✓ YARN generic events and system metrics
Milestones - Future
• Milestone 2 (“Alpha 2”)
  • Integration with the new YARN UI
  • Integration with more frameworks
• Beta
  • Freeze the API and storage schema
  • Security
  • Collectors as containers
  • Storage fault tolerance
  • Production-ready
  • Migration-ready
Contributors• Li Lu, Junping Du, Vinod Kumar Vavilapalli (Hortonworks)
• Varun Saxena, Naganarasimha G. R. (Huawei)
• Sangjin Lee, Vrushali Channapattan, Joep Rottinghuis (Twitter)
• Zhijie Shen (now at Facebook)
• The HBase and Phoenix community!
Thank you!