YARN – The Data Operating System
Tom Benton
Dec 05, 2014
© Hortonworks Inc. 2011 – 2014. All Rights Reserved
In the beginning…
[Diagram: Traditional Hadoop, 2006: MapReduce (largely batch processing) on HDFS (Hadoop Distributed File System), nodes 1 to N]
Traditional Hadoop allowed early adopters to deal with data at scale via:
• Single-purpose clusters, specific data sets
• Primarily batch-oriented applications using MapReduce
However…
• No direct way to integrate interactive and real-time applications
• Limited enterprise capabilities: Operations, Security & Governance
…with increased adoption and breadth of use cases, a new approach was needed
[Diagram: Traditional Hadoop: MapReduce (largely batch processing) on HDFS (Hadoop Distributed File System), nodes 1 to N]
Timeline
• 2006: Traditional Hadoop
• Jan 2008: MAPREDUCE-279 outlines a new architecture for Hadoop that allows efficient use of resources across many types of apps
• 2011: Hortonworks founded; work accelerates on Hadoop’s next-gen architecture
Traditional Hadoop, challenges & limitations
[Diagram: MapReduce (largely batch processing) on HDFS (Hadoop Distributed File System), nodes 1 to N]
Sources: Existing Systems, Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured
Architectural Limitations
• Primarily a batch system using MapReduce
• Single-purpose clusters, specific data sets
Enterprise Challenges
• Limited enterprise capabilities: Operations, Security & Governance
• Created additional silos
Interoperability Challenges
• Difficult to natively integrate existing applications
Applications: Business Analytics, Custom Applications, Packaged Applications
Data System: RDBMS, EDW, MPP
YARN Has Fundamentally Changed Hadoop
YARN enables:
• More Workloads: from batch to interactive & real-time
• More Data: multiple data sets of varying types and structures
• More Value: hosting multiple business cases in a single Hadoop cluster
Timeline
• 2006: Traditional Hadoop: MapReduce (largely batch processing) on HDFS (Hadoop Distributed File System), nodes 1 to N
• 2008: MAPREDUCE-279
• 2011
• October 23, 2013: Enterprise Hadoop era begins with Hadoop 2 & YARN
[Diagram: YARN: Data Operating System running Batch, Interactive and Real-Time workloads on HDFS (Hadoop Distributed File System), nodes 1 to N]
Core of Enterprise Hadoop
Architected & led development of YARN to enable the Modern Data Architecture
Benefits Enabled by MDA and YARN
SOLUTION: A single set of data across the entire cluster with multiple access methods using “zones” for processing
[Diagram: processing zones on one cluster, nodes 1 to n: Batch (Pig), Interactive (Hive), Real-Time (HBase), Real-Time Streams (Storm), In-Memory (Spark)]
Single Cluster, Multiple Workloads
• Maximize compute resources to lower TCO
• No standalone, siloed clusters
• Simple management & operations
…all enabled by YARN
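The "single cluster, multiple workloads" argument can be illustrated with a toy capacity model. This is only a sketch under stated assumptions: the workload names, the demand figures and the two helper functions are hypothetical, and the model says nothing about how YARN actually schedules containers. It only shows why pooling capacity can serve more demand than fixed-size silos of the same total size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str    # hypothetical label, e.g. "batch"
    demand: int  # capacity units requested right now (made-up numbers)


def siloed_served(workloads, units_per_silo):
    """Each workload owns a fixed-size cluster; idle units cannot be lent out."""
    return sum(min(w.demand, units_per_silo) for w in workloads)


def shared_served(workloads, total_units):
    """One pool: free units can serve any workload, up to the pool size."""
    return min(sum(w.demand for w in workloads), total_units)


workloads = [
    Workload("batch", 50),
    Workload("interactive", 10),
    Workload("real-time", 15),
]

# Same total hardware in both cases: 3 silos of 30 units vs. one pool of 90.
siloed = siloed_served(workloads, units_per_silo=30)
shared = shared_served(workloads, total_units=90)
print(f"siloed clusters serve {siloed} units, shared cluster serves {shared} units")
```

In this made-up scenario the batch silo is saturated while the other two sit partly idle, so the shared pool serves more of the total demand with the same hardware.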
YARN and HDP Enable the Modern Data Architecture
YARN is the architectural center of Hadoop and HDP
• YARN enables a common data set across all applications
• Batch, interactive & real-time workloads
• Support for multi-tenant access & processing
HDP enables Apache Hadoop to become an enterprise-viable data platform with centralized services
• Security
• Governance
• Operations
• Productization
Enabled broad ecosystem adoption
[Diagram: HDP (Hortonworks Data Platform)]
• Governance & Integration: Data Workflow, Lifecycle & Governance (Falcon, Sqoop, Flume, NFS, WebHDFS)
• Batch, Interactive & Real-Time Data Access: Script (Pig), Search (Solr), SQL (Hive, HCatalog), NoSQL (HBase, Accumulo), Stream (Storm), In-Memory (Spark), Other ISVs; Tez (shown twice in the diagram)
• Data Management: YARN: Data Operating System on HDFS (Hadoop Distributed File System), nodes 1 to N
• Security: Authentication, Authorization, Accounting, Data Protection; Storage: HDFS; Resources: YARN; Access: Hive, …; Pipeline: Falcon; Cluster: Knox
• Operations: Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie)
Hortonworks drove this innovation of Hadoop through YARN
Thank You! Questions?
[Diagram: YARN: Data Operating System running Batch, Interactive and Real-Time workloads on HDFS (Hadoop Distributed File System), nodes 1 to N]