Build Enterprise Grade Applications in YARN with Poorna Chandra [email protected] Big Data App Meetup July 27, 2016

Building Enterprise Grade Applications in Yarn with Apache Twill

Jan 07, 2017

Cask Data, Inc
Transcript
Page 1: Building Enterprise Grade Applications in Yarn with Apache Twill

Build Enterprise Grade Applications in YARN with Apache Twill

Poorna Chandra [email protected]

Big Data App Meetup, July 27, 2016

Page 2

Agenda
● Hadoop YARN
● Challenges in building enterprise applications
● Apache Twill
● Architecture
● Features
● Real World Enterprise Use Case - CDAP
● Roadmap
● Q & A

Page 3

First: The NEWS

to the Apache Twill Community!!!

Apache Twill is now a Top-Level Project of the ASF

Announcement: https://s.apache.org/Rzsf

Page 4

Apache Hadoop® YARN
● MapReduce NextGen, aka MRv2
● Separates resource management from job scheduling/monitoring
● New ResourceManager manages the global assignment of compute resources to applications
● Introduces the concept of an ApplicationMaster per application, which communicates with the ResourceManager for compute resource management
● Enables more than MR jobs on the cluster, such as Apache Spark

Page 5

How a YARN Application Works

Page 6

YARN is powerful, but...
● Every application needs to write boilerplate code
  ○ Negotiate resources from RM
  ○ Talk to NM to run jobs
  ○ Monitor running jobs
● Every application needs to handle
  ○ High availability
  ○ Long running applications
    ■ Security aspects - delegation token expiry
  ○ Easy scalability

Page 7

Simplification with Apache Twill
● Provides an abstraction over YARN that reduces the complexity of developing large-scale distributed applications
● Adds simplicity to the power of YARN
  ○ Java thread-like programming model
● Reduces boilerplate code
● Covers common needs of distributed enterprise-grade application development
  ○ Lifecycle management
  ○ High availability
  ○ Scalability
  ○ Service discovery

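Page 8

Hello World in Twill: Define a TwillRunnable

import org.apache.twill.api.AbstractTwillRunnable;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class HelloWorldRunnable extends AbstractTwillRunnable {

  private static final Logger LOG = LoggerFactory.getLogger(HelloWorldRunnable.class);

  // Runs inside a YARN container once the application is launched.
  @Override
  public void run() {
    LOG.info("Hello World. My first distributed application.");
  }
}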

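Page 9

Hello World in Twill: Launch it!

import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.twill.api.TwillController;
import org.apache.twill.api.TwillRunnerService;
import org.apache.twill.yarn.YarnTwillRunnerService;

public class HelloWorld {

  public static void main(String[] args) throws Exception {
    // Connect to YARN; the ZooKeeper quorum at localhost:2181 is used for coordination.
    TwillRunnerService twillRunner =
        new YarnTwillRunnerService(new YarnConfiguration(), "localhost:2181");
    twillRunner.start();

    // prepare() returns a TwillPreparer; calling start() on it launches the
    // application and returns the controller for the running application.
    TwillController controller = twillRunner.prepare(new HelloWorldRunnable()).start();

    // Block until the application terminates.
    controller.awaitTerminated();
    //...
  }
}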

Page 10

Major Features
● Service Discovery
● Placement Policy
● Elastic Scaling
● Command Messages
● State Recovery

Page 11

Service Discovery

Page 12

Placement Policy
● Placement policy can be used to address
  ○ Performance
  ○ Availability
  ○ Resource conflict
● Exposes container placement policy from YARN
● Will allow Twill to allocate containers on specific racks and hosts based on the DISTRIBUTED deployment mode

Page 13

Elastic Scaling
● Ability to add or reduce the number of YARN containers that run the application
● Scale your application based on load
● No need to restart the application
● The Twill API TwillController.changeInstances is used to accomplish this task
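The "scale based on load" bullet can be made concrete with a small sizing rule. The sketch below is only an illustration: the class name, the backlog metric, and the per-instance capacity are assumptions and are not part of Twill. Its result is the kind of value an application might pass as the new instance count to TwillController.changeInstances.

```java
/** Hypothetical helper (not part of Twill) that maps a load measurement to an instance count. */
final class InstanceCountPolicy {

  private InstanceCountPolicy() { }

  /**
   * Returns the number of instances needed to work through {@code backlog} items when a
   * single instance handles {@code perInstanceCapacity} items, clamped to [min, max].
   */
  static int desiredInstances(long backlog, long perInstanceCapacity, int min, int max) {
    if (perInstanceCapacity <= 0 || min < 1 || max < min) {
      throw new IllegalArgumentException("invalid capacity or bounds");
    }
    // A negative backlog is treated as zero.
    long pending = Math.max(backlog, 0);
    // Ceiling division without floating point and without overflow.
    long needed = pending / perInstanceCapacity + (pending % perInstanceCapacity == 0 ? 0 : 1);
    return (int) Math.max(min, Math.min(max, needed));
  }
}
```

The clamp keeps a sudden load spike from requesting more containers than the application is prepared to run, and the lower bound keeps at least one instance alive when the backlog is empty.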

Page 14

Command Messages

Page 15

State Recovery

Page 16

Real World Enterprise Usages - CDAP
● Cask Data Application Platform (CDAP) - http://cdap.io
  ○ Open source application and integration framework for big data
  ○ Simplifies and enhances data application development and management
    ■ APIs for simplification, portability and standardization
      ● Works across a wide range of Hadoop versions and all common distros
    ■ Built-in system services, such as metrics and log aggregation, dataset management, and a distributed transaction service, for common big data application needs
  ○ Extensions to enhance user experience
    ■ Hydrator - Interactive data pipeline construction
    ■ Tracker - Metadata discovery and data lineage

Page 17

Apache Twill in CDAP
● CDAP runs different types of processes on YARN
  ○ Long running daemons
  ○ REST services
  ○ Real-time transactional streaming framework
  ○ Workflow execution
● CDAP only interacts with Twill
  ○ Greatly simplifies the CDAP code base
  ○ Just a matter of minutes to add support for a new type of work to run on YARN
● Twill support of common needs
  ○ Service discovery
  ○ Leader election
  ○ Elastic scaling
  ○ Security

Page 18

CDAP Architecture

Page 19

Service Discovery
● CDAP exposes all functionality through REST
● Almost all CDAP HTTP services run in YARN
  ○ No fixed host and port
  ○ Bind to ephemeral port
  ○ Announce the host and port through Twill
    ■ Unique service name for a given service type
● Router inspects the request URI to derive a service name
  ○ Uses Twill discovery service client to locate actual host and port
  ○ Proxies the request and response
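The announce, discover, and route flow on this slide can be sketched with plain JDK classes. Everything below is a stand-in written for illustration: DiscoveryRegistry and Router are hypothetical names, the registry is an in-memory map rather than Twill's discovery service, and the rule "first path segment is the service name" is an assumption, since the slide does not say how CDAP maps URIs to names.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

/** In-memory stand-in for a discovery service; hypothetical, not the Twill API. */
final class DiscoveryRegistry {
  private final Map<String, List<InetSocketAddress>> endpoints = new ConcurrentHashMap<>();

  /** Announces an endpoint under a service name; the returned handle withdraws it again. */
  Runnable announce(String serviceName, String host, int port) {
    InetSocketAddress address = InetSocketAddress.createUnresolved(host, port);
    endpoints.computeIfAbsent(serviceName, k -> new CopyOnWriteArrayList<>()).add(address);
    return () -> {
      List<InetSocketAddress> announced = endpoints.get(serviceName);
      if (announced != null) {
        announced.remove(address);
      }
    };
  }

  /** Returns a snapshot of the endpoints currently announced for a service name. */
  List<InetSocketAddress> discover(String serviceName) {
    List<InetSocketAddress> announced = endpoints.get(serviceName);
    return announced == null ? Collections.emptyList() : new ArrayList<>(announced);
  }
}

/** Hypothetical router: derives a service name from the request path and picks an endpoint. */
final class Router {
  private final DiscoveryRegistry registry;
  private final AtomicInteger next = new AtomicInteger();

  Router(DiscoveryRegistry registry) {
    this.registry = registry;
  }

  /** Assumed rule: the first path segment is the service name ("/metrics/query" gives "metrics"). */
  static String serviceNameFor(String path) {
    String trimmed = path.startsWith("/") ? path.substring(1) : path;
    int slash = trimmed.indexOf('/');
    return slash < 0 ? trimmed : trimmed.substring(0, slash);
  }

  /** Picks an endpoint round-robin, or returns empty when nothing is announced. */
  Optional<InetSocketAddress> route(String path) {
    List<InetSocketAddress> candidates = registry.discover(serviceNameFor(path));
    if (candidates.isEmpty()) {
      return Optional.empty();
    }
    int index = Math.floorMod(next.getAndIncrement(), candidates.size());
    return Optional.of(candidates.get(index));
  }
}
```

Because services bind to ephemeral ports, the router never caches an address for long; it asks the registry on each request, so a restarted container with a new port is picked up as soon as it announces itself.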

Page 20

Long Running Applications
● All CDAP services on YARN are long running
  ○ Transaction server, metrics and log processing, real-time data ingestion, …
● Many user applications are long running too
  ○ Real-time streaming, HTTP service, application daemon
● Secure cluster, specifically Kerberos enabled cluster
  ○ All Hadoop services use delegation tokens
    ■ NN, RM, HBase Master, Hive, KMS, ...
  ○ YARN containers don’t have the keytab, hence can’t update the token

Page 21

Long Running Applications in Twill
● Twill provides support for updating delegation tokens
  ○ TwillRunner.scheduleSecureStoreUpdate
● Update delegation tokens from the launcher process (kinit process)
  ○ Acquires new delegation tokens periodically
  ○ Serializes tokens to HDFS
  ○ Notifies all running applications about the update
    ■ Through command message
  ○ Each runnable refreshes delegation tokens by reading from HDFS
    ■ Requires a non-expired HDFS delegation token
● New launcher process will discover all Twill apps from ZK
  ○ Can run HA launcher processes using leader election support from Twill
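The refresh cycle described on this slide (acquire, serialize, notify, reload) can be modeled in a few lines. This is a simplified simulation under stated assumptions, not Twill's implementation: SharedTokenStore stands in for HDFS, the direct method call stands in for the command message, and the token string is a placeholder for real delegation tokens. All class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

/** Stand-in for the shared location (HDFS on the slide) where serialized tokens are written. */
final class SharedTokenStore {
  private final AtomicReference<String> serialized = new AtomicReference<>("");

  void write(String tokens) {
    serialized.set(tokens);
  }

  String read() {
    return serialized.get();
  }
}

/** Stand-in for a running runnable; it has no keytab and can only reload what was published. */
final class RunnableInstance {
  private final SharedTokenStore store;
  private volatile String currentTokens = "";

  RunnableInstance(SharedTokenStore store) {
    this.store = store;
  }

  /** Called when the "tokens updated" notification arrives; reloads from the shared store. */
  void onTokensUpdated() {
    currentTokens = store.read();
  }

  String currentTokens() {
    return currentTokens;
  }
}

/** Stand-in for the launcher process, the only party able to obtain new tokens. */
final class Launcher {
  private final SharedTokenStore store;
  private final List<RunnableInstance> running = new ArrayList<>();
  private int generation;

  Launcher(SharedTokenStore store) {
    this.store = store;
  }

  void register(RunnableInstance instance) {
    running.add(instance);
  }

  /** One refresh cycle; a scheduler would invoke this periodically, before tokens expire. */
  void refreshOnce() {
    generation++;
    // Placeholder for acquiring real delegation tokens with the launcher's Kerberos login.
    String fresh = "token-generation-" + generation;
    store.write(fresh);
    // Notify every running instance so it reloads the tokens.
    for (RunnableInstance instance : running) {
      instance.onTokensUpdated();
    }
  }
}
```

The ordering matters: the tokens are written before anyone is notified, so a runnable that reacts immediately always reads the new generation. In the real flow the read itself needs a still-valid HDFS token, which is why the refresh interval has to be shorter than the token lifetime.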

Page 22

Scalability
● Many components in CDAP are linearly scalable, such as
  ○ Streaming data ingestion (through REST endpoint)
  ○ Log processing
    ■ Reads from Kafka, writes to HDFS
  ○ Metrics processing
    ■ Reads from Kafka, writes to timeseries table
  ○ User real-time streaming DAG
  ○ User HTTP service
● Twill supports adding/reducing YARN containers for a given TwillRunnable
  ○ No need to restart application
  ○ Guarantees a unique instance ID is assigned
    ■ Application can use it for partitioning
● Dynamic scaling using service discovery
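One common way to use a unique instance ID for partitioning is modulo assignment. The sketch below is illustrative only: InstancePartitioner is a hypothetical helper, and it assumes instance IDs run from 0 to instanceCount - 1, which matches the kind of values Twill exposes through TwillContext.getInstanceId() and TwillContext.getInstanceCount().

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Twill): partition ownership from instance ID and count. */
final class InstancePartitioner {

  private InstancePartitioner() { }

  /**
   * Returns every partition p in [0, totalPartitions) with p % instanceCount == instanceId,
   * so each partition is owned by exactly one instance.
   */
  static List<Integer> ownedPartitions(int instanceId, int instanceCount, int totalPartitions) {
    if (instanceCount <= 0 || instanceId < 0 || instanceId >= instanceCount) {
      throw new IllegalArgumentException("instanceId must be in [0, instanceCount)");
    }
    List<Integer> owned = new ArrayList<>();
    for (int partition = instanceId; partition < totalPartitions; partition += instanceCount) {
      owned.add(partition);
    }
    return owned;
  }
}
```

With 8 partitions and 3 instances, instance 1 owns partitions 1, 4, and 7. When the instance count changes, every instance has to recompute its ownership, because the modulo result shifts for most partitions.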

Page 23

High Availability
● In a production environment, it is important to have high availability
● Twill provides a couple of means to achieve that
  ○ Running multiple instances of the same TwillRunnable
  ○ Using dynamic service discovery to route requests
  ○ Automatic restart of a TwillRunnable container if it gets killed or exits abnormally
    ■ Killed container will be removed from the service discovery
    ■ Restarted container will be added to the service discovery
  ○ Built-in leader election support for active-passive redundancy
    ■ The Tephra service uses that, as it requires only one active server
  ○ Placement policy to make sure that instances run on different hosts
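Active-passive redundancy through leader election can be illustrated with a tiny in-memory model. This is a sketch under an assumed rule, namely that the participant with the lowest sequence number is the leader, as in the usual ZooKeeper election recipe. The slides do not describe how Twill elects a leader, so LeaderElectionSketch is a hypothetical class and not Twill's implementation.

```java
import java.util.Optional;
import java.util.TreeMap;

/** In-memory sketch of lowest-sequence-number leader election; not Twill's implementation. */
final class LeaderElectionSketch {
  private final TreeMap<Long, String> participants = new TreeMap<>();
  private long nextSequence;

  /** Registers a participant and returns its sequence number. */
  synchronized long join(String name) {
    long sequence = nextSequence++;
    participants.put(sequence, name);
    return sequence;
  }

  /** Removes a participant, for example because its container was killed. */
  synchronized void leave(long sequence) {
    participants.remove(sequence);
  }

  /** The participant holding the lowest sequence number is the single active leader. */
  synchronized Optional<String> leader() {
    return participants.isEmpty()
        ? Optional.empty()
        : Optional.of(participants.firstEntry().getValue());
  }
}
```

Only the leader serves requests; the remaining participants stay passive until the leader leaves, at which point the next-lowest sequence number takes over without any restart of the standby.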

Page 24

Apache Twill in Enterprise
● CDAP, which uses Twill, is being used by large enterprises in production
● Apache Twill runs on different cluster types
  ○ AWS, Azure, bare metal, VMs
● Compatible with a wide range of Hadoop versions
  ○ Vanilla Hadoop 2.0 - 2.7
  ○ HDP 2.1 - 2.3
  ○ CDH 5
  ○ MapR 4.1 - 5.1

Page 25

Roadmap
● Generalize to run on more frameworks
  ○ Apache Mesos, Kubernetes
● Smarter container management
  ○ Run simple runnable in AM
  ○ Multiple runnables in one container
● Fine-grained control of container lifecycle
  ○ When to start, stop and restart on failure
● Smaller footprint
  ○ Optional Kafka, optional ZooKeeper

Page 26

Thank you!
● Apache Twill is Open Source
  ○ http://twill.apache.org
  ○ [email protected]
  ○ @ApacheTwill
● Contributions are welcome!