Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved YARN – The Data Operation System Tom Benton

YARN - Strata 2014

Dec 05, 2014


Hortonworks

Part of the core Hadoop project, YARN is the architectural center of Hadoop that allows multiple data processing engines such as interactive SQL, real-time streaming, data science and batch processing to handle data stored in a single platform, unlocking an entirely new approach to analytics. It is the foundation of the new generation of Hadoop and is enabling organizations everywhere to realize a Modern Data Architecture.
Transcript
Page 1: YARN - Strata 2014


YARN – The Data Operation System
Tom Benton

Page 2: YARN - Strata 2014


[Diagram: Traditional Hadoop, 2006. Cluster nodes 1 … N running HDFS (Hadoop Distributed File System) with MapReduce, largely batch processing.]

Traditional Hadoop
Traditional Hadoop allowed early adopters to deal with data at scale via:
•  Single purpose clusters, specific data sets
•  Primarily batch-oriented applications using MapReduce

However…
•  No direct way to integrate interactive and real-time applications
•  Limited enterprise capabilities: Operations, Security & Governance

In the beginning…

Page 3: YARN - Strata 2014


[Diagram: Traditional Hadoop. Cluster nodes 1 … N running HDFS (Hadoop Distributed File System) with MapReduce, largely batch processing. Timeline markers: 2006, Jan 2008.]

MAPREDUCE-279: outlines a new architecture for Hadoop that allows efficient use of resources across many types of apps

…with increased adoption and breadth of use cases, a new approach was needed

2011: Hortonworks founded. Work accelerates on Hadoop’s next-gen architecture

Page 4: YARN - Strata 2014


Traditional Hadoop, challenges & limitations

[Diagram: Traditional Hadoop cluster, nodes 1 … N, running HDFS (Hadoop Distributed File System) with MapReduce, largely batch processing.]

SOURCES
•  Existing Systems
•  Clickstream
•  Web & Social
•  Geolocation
•  Sensor & Machine
•  Server Logs
•  Unstructured

Architectural Limitations
•  Primarily a batch system using MapReduce
•  Single purpose clusters, specific data sets

Enterprise Challenges
•  Limited enterprise capabilities: Operations, Security & Governance
•  Created additional silos

Interoperability Challenges
•  Difficult to natively integrate existing applications

APPLICATIONS
•  Business Analytics
•  Custom Applications
•  Packaged Applications

DATA SYSTEM
•  RDBMS
•  EDW
•  MPP

Page 5: YARN - Strata 2014


YARN Has Fundamentally Changed Hadoop

YARN enables:
•  More Workloads: from batch to interactive & real-time
•  More Data: multiple data sets of varying types and structures
•  More Value: hosting multiple business cases in a single Hadoop cluster

Enterprise Hadoop Enables…
•  More Workloads: from batch to interactive & real-time
•  More Data: multiple data sets of varying types and structures
•  More Value: hosting multiple business cases in a single Hadoop cluster

Page 6: YARN - Strata 2014


[Timeline]
•  2006: Traditional Hadoop. Cluster nodes 1 … N running HDFS (Hadoop Distributed File System) with MapReduce, largely batch processing
•  2008: MAPREDUCE-279
•  2011: Hortonworks founded
•  October 23, 2013: Hadoop 2 & YARN. Enterprise Hadoop Era Begins

[Diagram: YARN: Data Operating System. Cluster nodes 1 … N running HDFS (Hadoop Distributed File System), with YARN supporting Batch, Interactive and Real-Time workloads. Core of Enterprise Hadoop.]

Architected & led development of YARN to enable the Modern Data Architecture

Page 7: YARN - Strata 2014


Benefits Enabled by MDA and YARN
SOLUTION: A single set of data across the entire cluster with multiple access methods using “zones” for processing

[Diagram: cluster nodes 1 … n divided into processing zones: Batch (Pig), Interactive (Hive), Real Time (HBase), Real Time Streams (Storm), In Memory (Spark)]

Single Cluster, Multiple Workloads
•  Maximize compute resources to lower TCO
•  No standalone, siloed clusters
•  Simple management & operations

…all enabled by YARN
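The "zones" idea on this slide can be illustrated with a toy model. The sketch below is an assumption-laden illustration, not YARN's actual scheduling logic (the real Capacity Scheduler and Fair Scheduler are far more elaborate). The names `Zone`, `allocate` and `siloed`, and all the numbers, are hypothetical. It compares separate siloed clusters with one shared cluster in which idle capacity is lent to zones that still have unmet demand.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    """Hypothetical processing zone: a named share of the cluster."""
    name: str
    guaranteed_percent: int  # share of cluster capacity reserved for this zone
    demand: int              # containers the workload currently wants


def siloed(total_containers, zones):
    """Standalone clusters: each zone can use only its own fixed slice."""
    return {
        z.name: min(z.demand, total_containers * z.guaranteed_percent // 100)
        for z in zones
    }


def allocate(total_containers, zones):
    """Shared cluster: grant up to each guarantee, then lend idle capacity."""
    grants = siloed(total_containers, zones)
    spare = total_containers - sum(grants.values())
    for z in zones:
        # Hand spare containers to zones with unmet demand, in list order.
        extra = min(z.demand - grants[z.name], spare)
        grants[z.name] += extra
        spare -= extra
    return grants


# Illustrative numbers only (assumed, not taken from the slides).
zones = [
    Zone("batch", 40, 70),
    Zone("interactive", 30, 10),
    Zone("streaming", 30, 20),
]
silo = siloed(100, zones)
shared = allocate(100, zones)
print("siloed:", silo, "used:", sum(silo.values()))
print("shared:", shared, "used:", sum(shared.values()))
```

In this made-up scenario the siloed layout leaves containers idle while the batch workload waits, whereas the shared layout puts all of them to use. That is the intuition behind "maximize compute resources to lower TCO".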

Page 8: YARN - Strata 2014


YARN and HDP Enable the Modern Data Architecture

[Diagram: HDP (Hortonworks Data Platform)]

GOVERNANCE & INTEGRATION
•  Data Workflow, Lifecycle & Governance: Falcon, Sqoop, Flume, NFS, WebHDFS

BATCH, INTERACTIVE & REAL-TIME DATA ACCESS
•  Script: Pig
•  SQL: Hive, HCatalog
•  NoSQL: HBase, Accumulo
•  Stream: Storm
•  Search: Solr
•  In-Memory: Spark
•  Other ISVs

DATA MANAGEMENT
•  YARN: Data Operating System
•  HDFS (Hadoop Distributed File System) across cluster nodes 1 … N

SECURITY
•  Authentication, Authorization, Accounting, Data Protection
•  Storage: HDFS
•  Resources: YARN
•  Access: Hive, …
•  Pipeline: Falcon
•  Cluster: Knox

OPERATIONS
•  Provision, Manage & Monitor: Ambari, Zookeeper
•  Scheduling: Oozie

YARN is the architectural center of Hadoop and HDP
•  YARN enables a common data set across all applications
•  Batch, interactive & real-time workloads
•  Support multi-tenant access & processing

HDP enables Apache Hadoop to become an Enterprise Viable Data Platform with centralized services
•  Security
•  Governance
•  Operations
•  Productization

Enabled broad ecosystem adoption

[Diagram label: Tez]

Hortonworks drove this innovation of Hadoop through YARN

Page 9: YARN - Strata 2014


Page 10: YARN - Strata 2014


Thank You! Questions?

[Diagram: YARN: Data Operating System. Cluster nodes 1 … N running HDFS (Hadoop Distributed File System) with Interactive, Real-Time and Batch workloads.]