Unleash your cluster with YARN Ferran Galí i Reniu @ferrangali
Page 1: Unleash your cluster with YARN

Unleash your cluster with YARN

Ferran Galí i Reniu @ferrangali

Page 2: Unleash your cluster with YARN

About me

@ferrangali

Page 3: Unleash your cluster with YARN

Data

Page 4: Unleash your cluster with YARN

Sensors

Page 5: Unleash your cluster with YARN

Smartphones

Page 6: Unleash your cluster with YARN

User behavior

Page 7: Unleash your cluster with YARN

Social Networks

Page 8: Unleash your cluster with YARN

Text

Page 9: Unleash your cluster with YARN

Numbers

Page 10: Unleash your cluster with YARN

Images

Page 11: Unleash your cluster with YARN

Videos

Page 13: Unleash your cluster with YARN

Big Data

Page 14: Unleash your cluster with YARN

100 MB/s

2 TB ≈ 5.6 hours

The Big Data problem

Page 15: Unleash your cluster with YARN

100 MB/s

2 TB = 30 min

The Big Data problem
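
A rough back-of-the-envelope check of the read times on these two slides. This is only a sketch: it assumes decimal units (1 TB = 1,000,000 MB) and assumes the 30-minute figure comes from spreading the data over several disks that are read in parallel, which the slides do not state explicitly. The names `read_seconds` and `disks_needed` are made up for the illustration.

```python
import math

DATA_MB = 2 * 1_000_000   # 2 TB expressed in decimal megabytes (assumption)
DISK_MB_PER_S = 100       # sequential read speed taken from the slide

def read_seconds(data_mb, disks=1):
    """Seconds to scan the data when it is split evenly over `disks` drives."""
    return data_mb / (DISK_MB_PER_S * disks)

single = read_seconds(DATA_MB)
print(f"1 disk: {single / 3600:.1f} hours")

# How many disks, read in parallel, bring the scan under 30 minutes?
target_s = 30 * 60
disks_needed = math.ceil(single / target_s)
print(f"disks needed for a 30 min scan: {disks_needed}")
```

The point of the sketch is the scaling: the scan time drops roughly linearly with the number of disks read at once, which is the motivation for distributing storage across many nodes.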

Page 17: Unleash your cluster with YARN
Page 18: Unleash your cluster with YARN
Page 19: Unleash your cluster with YARN

HDFS

Node Node Node Node Node Node Node Node

Hardware

Page 20: Unleash your cluster with YARN

HDFS

Node Node Node Node Node Node Node Node

HDFS - Hadoop Distributed File System

Hardware

Storage

Page 22: Unleash your cluster with YARN

$> hadoop fs -ls

HDFS

Page 23: Unleash your cluster with YARN

$> hadoop fs -ls
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2015-06-11 11:27 dir
-rw-r--r-- 1 hadoop supergroup 2198927 2015-06-10 17:22 file1.txt

$>

HDFS

Page 24: Unleash your cluster with YARN

$> hadoop fs -ls
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2015-06-11 11:27 dir
-rw-r--r-- 1 hadoop supergroup 2198927 2015-06-10 17:22 file1.txt

$> hadoop fs -ls dir

HDFS

Page 25: Unleash your cluster with YARN

$> hadoop fs -ls
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2015-06-11 11:27 dir
-rw-r--r-- 1 hadoop supergroup 2198927 2015-06-10 17:22 file1.txt

$> hadoop fs -ls dir
Found 2 items
-rw-r--r-- 1 hadoop supergroup 2198927 2015-06-10 17:22 dir/file2.txt
-rw-r--r-- 1 hadoop supergroup 2198927 2015-06-10 17:22 dir/file3.txt

$>

HDFS

Page 26: Unleash your cluster with YARN

$> hadoop fs -ls
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2015-06-11 11:27 dir
-rw-r--r-- 1 hadoop supergroup 2198927 2015-06-10 17:22 file1.txt

$> hadoop fs -ls dir
Found 2 items
-rw-r--r-- 1 hadoop supergroup 2198927 2015-06-10 17:22 dir/file2.txt
-rw-r--r-- 1 hadoop supergroup 2198927 2015-06-10 17:22 dir/file3.txt

$> hadoop fs -cat dir/file3.txt

HDFS

Page 27: Unleash your cluster with YARN

$> hadoop fs -ls
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2015-06-11 11:27 dir
-rw-r--r-- 1 hadoop supergroup 2198927 2015-06-10 17:22 file1.txt

$> hadoop fs -ls dir
Found 2 items
-rw-r--r-- 1 hadoop supergroup 2198927 2015-06-10 17:22 dir/file2.txt
-rw-r--r-- 1 hadoop supergroup 2198927 2015-06-10 17:22 dir/file3.txt

$> hadoop fs -cat dir/file3.txt
line1
line2
line3
line4
line5

HDFS

Page 28: Unleash your cluster with YARN

MapReduce

Node Node Node Node Node Node Node Node

HDFS - Hadoop Distributed File System

Hardware

Storage

Page 29: Unleash your cluster with YARN

MapReduce

Node Node Node Node Node Node Node Node

HDFS - Hadoop Distributed File System

Hardware

Storage

Processing

Page 30: Unleash your cluster with YARN

MapReduce Job

Data Pipeline

Application

Page 31: Unleash your cluster with YARN

MapReduce Job

MapReduce Job

MapReduce Job

Data Pipeline

Application

Page 32: Unleash your cluster with YARN

MapReduce Job

Page 33: Unleash your cluster with YARN

Map

Map

Map

Map

MapReduce Job

Split

Split

Split

Split

Page 34: Unleash your cluster with YARN

Map

Map

Map

Map

MapReduce Job

Split

Split

Split

Split

map() {
  // Your code here
}

Page 35: Unleash your cluster with YARN

Map

Map

Map

Map

Reduce

Reduce

Reduce

MapReduce Job

Split

Split

Split

Split

map() {
  // Your code here
}

Page 36: Unleash your cluster with YARN

Map

Map

Map

Map

Reduce

Reduce

Reduce

MapReduce Job

Split

Split

Split

Split

map() {
  // Your code here
}

reduce() {
  // Your code here
}

Page 37: Unleash your cluster with YARN

Map

Map

Map

Map

Reduce

Reduce

Reduce

MapReduce Job

Split

Write

Write

Write

Split

Split

Split

map() {
  // Your code here
}

reduce() {
  // Your code here
}
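
To make the Split / Map / Reduce / Write picture concrete, here is a minimal single-process sketch in Python. It is not Hadoop's API: `map_fn`, `reduce_fn`, `partition` and `run_job` are hypothetical names, the word-count logic is just a stand-in for "your code here", and the byte-sum partitioner is a toy replacement for whatever partitioning a real job would use.

```python
from collections import defaultdict

def map_fn(line):
    """User map code: emit (word, 1) for every word in one input line."""
    for word in line.split():
        yield word, 1

def reduce_fn(key, values):
    """User reduce code: sum all counts collected for one key."""
    return key, sum(values)

def partition(key, num_reducers):
    # Toy deterministic partitioner (assumption): byte sum modulo reducer count.
    return sum(key.encode("utf-8")) % num_reducers

def run_job(splits, num_reducers=3):
    # Map phase: one map task per split; each emitted pair is routed
    # to the bucket of the reducer responsible for its key (the shuffle).
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:
        for line in split:
            for key, value in map_fn(line):
                buckets[partition(key, num_reducers)][key].append(value)
    # Reduce + write phase: every reducer produces its own sorted output list.
    return [[reduce_fn(k, vs) for k, vs in sorted(b.items())] for b in buckets]

# Four splits feed four map tasks; three reducers write three outputs.
splits = [["a b a"], ["b c"], ["a"], ["c c"]]
outputs = run_job(splits)
totals = dict(pair for part in outputs for pair in part)
print(totals)
```

Only `map_fn` and `reduce_fn` correspond to code the job author writes; everything in `run_job` stands for work the framework does on the cluster.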

Page 38: Unleash your cluster with YARN

Node Node Node Node Node Node Node Node

HDFS - Hadoop Distributed File System

Hardware

Storage

Processing Job Job

Data Pipeline

Application

Page 43: Unleash your cluster with YARN

Node: JobTracker

Node: TaskTracker

MapReduce 1.0 Architecture

Node: TaskTracker

Node: TaskTracker

Node: TaskTracker

Page 44: Unleash your cluster with YARN

MapReduce 1.0 Architecture

Node: TaskTracker

Map Map Map Map

Map Map Map Reduce

Reduce Reduce Reduce Reduce
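
A small sketch of what fixed slots imply, assuming the slide shows one TaskTracker with 7 map slots and 5 reduce slots (that reading of the picture is an assumption). The `TaskTracker` class and its `assign` method are hypothetical and only model the counting, not the real MapReduce 1.0 daemons.

```python
from dataclasses import dataclass

@dataclass
class TaskTracker:
    map_slots: int      # slots that can only run map tasks
    reduce_slots: int   # slots that can only run reduce tasks

    def assign(self, pending_maps, pending_reduces):
        """Return (running maps, running reduces, idle slots)."""
        running_maps = min(pending_maps, self.map_slots)
        running_reduces = min(pending_reduces, self.reduce_slots)
        idle = (self.map_slots - running_maps) + (self.reduce_slots - running_reduces)
        return running_maps, running_reduces, idle

tt = TaskTracker(map_slots=7, reduce_slots=5)
# Early in a job there are many map tasks and no reduce tasks yet.
result = tt.assign(pending_maps=12, pending_reduces=0)
print(result)
```

In this toy run five map tasks keep waiting while five reduce slots sit idle, because a slot of one type cannot be lent to a task of the other type.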

Page 45: Unleash your cluster with YARN

Node: JobTracker

Node: TaskTracker

MapReduce 1.0 Architecture

Node: TaskTracker

Node: TaskTracker

Node: TaskTracker

Application

Page 46: Unleash your cluster with YARN

Node: JobTracker

Node: TaskTracker

MapReduce 1.0 Architecture

Node: TaskTracker

Node: TaskTracker

Node: TaskTracker

Application

Map

Page 47: Unleash your cluster with YARN

Node: JobTracker

Node: TaskTracker

MapReduce 1.0 Architecture

Node: TaskTracker

Node: TaskTracker

Node: TaskTracker

Application

Reduce

Page 48: Unleash your cluster with YARN

Limitations

Page 51: Unleash your cluster with YARN

Limitations

Map

Reduce

Map

Map

Reduce

Map

Map

Reduce

Map

Page 52: Unleash your cluster with YARN

MapReduce Job

Limitations

Iterative

Page 53: Unleash your cluster with YARN

MapReduce Job

Limitations

Iterative

Graph Algorithms

Page 55: Unleash your cluster with YARN

YARN

Page 56: Unleash your cluster with YARN

Node Node Node Node Node Node Node Node

HDFS - Hadoop Distributed File System

YARN

Hardware

Storage

Resource Manager

YARN - Yet Another Resource Negotiator

Processing

Page 57: Unleash your cluster with YARN

Cores

Page 58: Unleash your cluster with YARN

Memory

Page 59: Unleash your cluster with YARN

NodeManager

ResourceManager

YARN Architecture

NodeManager

x8 x8

x8 x8

Page 60: Unleash your cluster with YARN

NodeManager

ResourceManager

YARN Architecture

NodeManager

Application: x6, 1 core, 1024 MB

x8 x8

x8 x8

Page 61: Unleash your cluster with YARN

NodeManager

ResourceManager

YARN Architecture

ApplicationMaster

NodeManager

Application: x6, 1 core, 1024 MB

x8 x8

x8 x8

Page 62: Unleash your cluster with YARN

NodeManager

ResourceManager

YARN Architecture

ApplicationMaster

Container

Container

Container

NodeManager

Container

Container

Application: x6, 1 core, 1024 MB

Container

x8 x8

x8 x8

Page 63: Unleash your cluster with YARN

NodeManager

ResourceManager

YARN Architecture

ApplicationMaster

Map

Map

Map

NodeManager

Map

Map

Application: x6, 1 core, 1024 MB

Map

x8 x8

x8 x8

Page 64: Unleash your cluster with YARN

NodeManager

ResourceManager

YARN Architecture

ApplicationMaster

Reduce

Reduce

Reduce

NodeManager

Reduce

Reduce

Application: x6, 1 core, 1024 MB

Reduce

x8 x8

x8 x8

Page 65: Unleash your cluster with YARN

NodeManager

ResourceManager

YARN Architecture

ApplicationMaster

Container

Container

Container

NodeManager

Container

Container

Container

Application: x6, 1 core, 1024 MB

Application 2: x4, 2 cores, 2048 MB

x8 x8

x8 x8

Page 66: Unleash your cluster with YARN

NodeManager

ResourceManager

YARN Architecture

ApplicationMaster

Container

Container

Container

NodeManager

Container

Container

ApplicationMaster

Container

Application: x6, 1 core, 1024 MB

Application 2: x4, 2 cores, 2048 MB

x8 x8

x8 x8

Page 67: Unleash your cluster with YARN

NodeManager

Container

ResourceManager

YARN Architecture

Container

ApplicationMaster

Container

Container

Container

NodeManager

Container

Container

Container

ApplicationMaster

Container

Container

Application: x6, 1 core, 1024 MB

Application 2: x4, 2 cores, 2048 MB

x8 x8

x8 x8
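
The allocation shown on these slides can be imitated with a toy model. Everything here is a sketch under assumptions: each NodeManager is taken to offer 8 cores and 8192 MB (one reading of the "x8 x8" icons), each ApplicationMaster is assumed to occupy a 1 core / 1024 MB container, and the "most free memory first" placement policy is invented for the example. The classes are hypothetical and are not YARN's API.

```python
from dataclasses import dataclass, field

@dataclass
class NodeManager:
    name: str
    cores: int          # free cores on this node
    memory_mb: int      # free memory on this node
    containers: list = field(default_factory=list)

    def try_launch(self, label, cores, memory_mb):
        """Start a container if the node still has enough cores and memory."""
        if cores <= self.cores and memory_mb <= self.memory_mb:
            self.cores -= cores
            self.memory_mb -= memory_mb
            self.containers.append(label)
            return True
        return False

class ResourceManager:
    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, label, cores, memory_mb):
        # Toy policy (assumption): try the node with the most free memory first.
        for node in sorted(self.nodes, key=lambda n: -n.memory_mb):
            if node.try_launch(label, cores, memory_mb):
                return node.name
        return None  # no node can host it: the request would stay pending

def submit(rm, app, count, cores, memory_mb):
    """Place one ApplicationMaster, then `count` identical task containers."""
    placed = [rm.allocate(f"{app}-AM", 1, 1024)]
    placed += [rm.allocate(f"{app}-c{i}", cores, memory_mb) for i in range(count)]
    return placed

nodes = [NodeManager("node1", 8, 8192), NodeManager("node2", 8, 8192)]
rm = ResourceManager(nodes)
p1 = submit(rm, "app1", count=6, cores=1, memory_mb=1024)   # Application
p2 = submit(rm, "app2", count=4, cores=2, memory_mb=2048)   # Application 2
for n in nodes:
    print(n.name, n.containers, "free:", n.cores, "cores,", n.memory_mb, "MB")
```

Under these assumptions both applications fit exactly and each node ends up hosting six containers, one of them an ApplicationMaster. Unlike fixed map and reduce slots, a container is just an amount of cores and memory, so the same capacity can serve any kind of task.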

Page 68: Unleash your cluster with YARN

NodeManager

Container

ResourceManager

YARN Architecture

Container

ApplicationMaster

Container

Container

Container

NodeManager

Container

Container

Container

ApplicationMaster

Container

Container

Application: x6, 1 core, 1024 MB (etl)

Application 2: x4, 2 cores, 2048 MB (query)

x8 x8

x8 x8

scheduler

etl: weight 1

query: weight 2
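
One way to read the weights, sketched below, is as proportional shares of the cluster when both queues have work waiting, as in weight-based fair scheduling. The slide does not say which scheduler is configured, so that reading is an assumption, as are the 16-core cluster size (2 nodes x 8 cores) and the helper name `fair_shares`.

```python
def fair_shares(total, weights):
    """Split `total` resources between queues in proportion to their weights."""
    total_weight = sum(weights.values())
    return {queue: total * w / total_weight for queue, w in weights.items()}

weights = {"etl": 1, "query": 2}
cores = fair_shares(16, weights)   # 16 cores is an assumed cluster size
for queue, share in cores.items():
    print(f"{queue}: {share:.1f} cores")
```

With these weights the query queue is entitled to twice what the etl queue gets whenever the two compete. A real scheduler would typically also let one queue borrow capacity the other is not using.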

Page 69: Unleash your cluster with YARN

Node Node Node Node Node Node Node Node

HDFS - Hadoop Distributed File System

YARN

Hardware

Storage

Resource Manager

New Paradigms

Processing

Page 70: Unleash your cluster with YARN

Node Node Node Node Node Node Node Node

HDFS - Hadoop Distributed File System

YARN

Hardware

Storage

Resource Manager

New Paradigms

Processing

Application

Batch

Page 71: Unleash your cluster with YARN

Node Node Node Node Node Node Node Node

HDFS - Hadoop Distributed File System

YARN

Hardware

Storage

Resource Manager

New Paradigms

Processing

Application

In Memory / Streaming

Page 72: Unleash your cluster with YARN

Node Node Node Node Node Node Node Node

HDFS - Hadoop Distributed File System

YARN

Hardware

Storage

Resource Manager

New Paradigms

Processing

Application

Interactive SQL

Page 73: Unleash your cluster with YARN

Node Node Node Node Node Node Node Node

HDFS - Hadoop Distributed File System

YARN

Hardware

Storage

Resource Manager

New Paradigms

Processing ...

Application

Page 74: Unleash your cluster with YARN

Improved Data Pipelines

Map

Reduce

Map

Map

Reduce

Map

Map

Reduce

Map

Page 75: Unleash your cluster with YARN

Improved Data Pipelines

Map

Reduce

Map

Map

Reduce

Map

Reduce

Page 76: Unleash your cluster with YARN

MapReduce Job

MapReduce Job

MapReduce Job

Improved Data Pipelines

Application

Page 77: Unleash your cluster with YARN

Improved Data Pipelines

MapReduce Job

Spark Job

MapReduce Job

Application

Page 78: Unleash your cluster with YARN

Improved Data Pipelines

MapReduce Job

Spark Job

Application

Page 79: Unleash your cluster with YARN
Page 80: Unleash your cluster with YARN

Trovit

+70 MapReduce Jobs adding business value

Multi-tenant cluster executing +7000 jobs per day

Page 81: Unleash your cluster with YARN

Trovit

+70 MapReduce Jobs adding business value

Multi-tenant cluster executing +7000 jobs per day

Search engine

Page 82: Unleash your cluster with YARN

Trovit

+70 MapReduce Jobs adding business value

Multi-tenant cluster executing +7000 jobs per day

Business Intelligence

Search engine

Page 83: Unleash your cluster with YARN

Trovit

+70 MapReduce Jobs adding business value

Multi-tenant cluster executing +7000 jobs per day

Business Intelligence

Search engine

Mailing

Page 84: Unleash your cluster with YARN

Trovit

+70 MapReduce Jobs adding business value

Multi-tenant cluster executing +7000 jobs per day

Business Intelligence

Search engine

Mailing Push Notifications

Page 85: Unleash your cluster with YARN

Trovit

+70 MapReduce Jobs adding business value

Multi-tenant cluster executing +7000 jobs per day

Business Intelligence

Search engine

Mailing Push Notifications

Online Media Buying

Page 86: Unleash your cluster with YARN

Challenges

Trovit

Maintain

+70 MapReduce Jobs adding business value

Multi-tenant cluster executing +7000 jobs per day

Page 87: Unleash your cluster with YARN

Challenges

Trovit

Maintain Try new paradigms

+70 MapReduce Jobs adding business value

Multi-tenant cluster executing +7000 jobs per day

Page 88: Unleash your cluster with YARN

Challenges

Trovit

Maintain Try new paradigms

Fine tune

+70 MapReduce Jobs adding business value

Multi-tenant cluster executing +7000 jobs per day

Page 89: Unleash your cluster with YARN

Trovit

Data Analysis with SQL on Hadoop

+70 MapReduce Jobs adding business value

Multi-tenant cluster executing +7000 jobs per day

Page 90: Unleash your cluster with YARN

Trovit

Data Analysis with SQL on Hadoop

+70 MapReduce Jobs adding business value

Multi-tenant cluster executing +7000 jobs per day

Hive: SQL, M/R

Page 91: Unleash your cluster with YARN

Trovit

Data Analysis with SQL on Hadoop

+70 MapReduce Jobs adding business value

Multi-tenant cluster executing +7000 jobs per day

Hive: SQL, M/R

Sqoop on MySQL

Page 92: Unleash your cluster with YARN

Challenges

Trovit

Data Analysis with SQL on Hadoop

+70 MapReduce Jobs adding business value

Multi-tenant cluster executing +7000 jobs per day

Impala: Interactive

Page 93: Unleash your cluster with YARN

Challenges

Trovit

Data Analysis with SQL on Hadoop

+70 MapReduce Jobs adding business value

Multi-tenant cluster executing +7000 jobs per day

Impala: Interactive

Machine Learning

Page 94: Unleash your cluster with YARN

Trovit

Data Analysis with SQL on Hadoop

Near Real Time on a Storm cluster

+70 MapReduce Jobs adding business value

Multi-tenant cluster executing +7000 jobs per day

Separated Cluster

Page 95: Unleash your cluster with YARN

Challenges

Trovit

Data Analysis with SQL on Hadoop

Near Real Time on a Storm cluster

+70 MapReduce Jobs adding business value

Multi-tenant cluster executing +7000 jobs per day

Storm on YARN

Page 96: Unleash your cluster with YARN

Questions?

Page 97: Unleash your cluster with YARN

Thank You

Ferran Galí i Reniu

@ferrangali

Icons made by Freepik from Flaticon is licensed by CC BY 3.0