Managing Enterprise Hadoop Clusters with Apache Ambari

Posted on 13-Apr-2017

1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Jayush Luniya @ Hortonworks (Apache Ambari PMC)

May 2016

Agenda

• Ambari Overview
• Ambari Features
• Demo
• Q&A

What’s Apache Ambari?

A 100% open-source platform for simplifying Hadoop cluster management and use.

Highly extensible.

It’s a wild zoo out there! Gotta manage this efficiently.

Apache Ambari Themes

Operate Hadoop at Scale
• Deliver the core operational capabilities to provision, manage and monitor Hadoop clusters at scale.

Integrate with the Enterprise
• Robust API for integration with existing enterprise systems, such as Microsoft SCOM and Teradata Viewpoint.

Extend for the Ecosystem
• Provide an extensible platform for customers, partners and the community (Stacks, Views).

Apache Ambari

Open Source Activity

Inception: AMBARI-1 (Sept. 2011)

Fast forward 5 years to today…

• Latest JIRA: AMBARI-16131
• 150+ contributors
• 60+ committers
• 16131 JIRAs filed
• 14254 JIRAs fixed

At 1.5 days per JIRA, that is ~90 person-years!

Used by hundreds of companies

Ambari – 3rd Biggest Project* @ Apache

* Based on total JIRAs filed on a project basis as of April 26, 2016

#2: Hadoop, at ~32k JIRAs when counted across the multiple JIRA projects it is split into

Timeline

• Ambari 1.5.* (Apr 2014): 1218 JIRAs
• Ambari 1.6.* (May 2014): 908 JIRAs
• Ambari 1.7.* (Dec 2014): 1620 JIRAs
• Ambari 2.0.* (April 2015): 1804 JIRAs
• Ambari 2.1.* (July 2015): 2674 JIRAs
• Ambari 2.2.* (Dec 2015): 1542 JIRAs

Resolution of 9k+ JIRAs. Current GA version: 2.2.2.

Capabilities delivered along the way: Ambari Stacks, Ambari Blueprints, Ambari Views, Alerts Framework, Metrics System, Rolling Upgrade, Kerberos Automation, Enhanced Dashboards, Smart Configs, Express Upgrade, AMS Grafana.

Extensibility Features

Stacks
• To add new Services (ISV or otherwise) beyond the HDP stack
• To customize a Stack for customer-specific environments

Blueprints
• To use Ambari for automating cluster installations
• To share best practices on layout and cluster configuration

Views
• To extend and customize the Ambari Web UI
• To add new capabilities and customize existing capabilities

Anatomy of Ambari Extension Points

Ambari Stacks

Stack Terminology

• STACK: Defines a set of Services, where to obtain the software packages and how to manage their lifecycle. Examples: HDP-2.3, HDP-2.2
• SERVICE: Defines the Components that make up the service. Examples: HDFS, NAGIOS, YARN
• COMPONENT: The building blocks of a Service, which adhere to a certain lifecycle. Examples: NAMENODE, DATANODE, OOZIE_SERVER
• CATEGORY: The category of a Component. Examples: MASTER, SLAVE, CLIENT
• REPO: Repository metadata describing where the artifacts reside. Example: http://public-repo-1.hortonworks.com/HDP/centos6/2.x/GA/2.3.0.0

Ambari Stack
• Stacks define Services + Repo: what is a stack, and where to get the bits
• Each service has a definition: what components are part of the Service
• Each service has defined lifecycle commands: start, stop, status, install, configure
• Lifecycle is controlled via command scripts
• Ability to define “custom” commands

[Diagram: Ambari Server; Stack with Service Definitions (XML) and Command Scripts (Python); Ambari Agents; Repos]
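In Ambari, each command script is a Python class that extends the `Script` class from Ambari's `resource_management` library, with one method per lifecycle command, and the agent invokes the script with the command name. That library only ships with Ambari, so the sketch below uses a minimal stand-in base class (hypothetical, not Ambari's code) to show just the dispatch idea. `MyMaster` and its `rebalance` custom command are illustrative names.

```python
class Script:
    """Stand-in for a command-script base class (hypothetical; not Ambari's code)."""

    def execute(self, command_name, env=None):
        # Map the command name to a method of the same name, e.g. "START" -> start()
        method = getattr(self, command_name.lower(), None)
        if method is None:
            raise ValueError(f"command not implemented: {command_name}")
        return method(env)


class MyMaster(Script):
    """Hypothetical master component: standard lifecycle plus one custom command."""

    def __init__(self):
        self.log = []
        self.running = False

    def install(self, env):
        self.log.append("install packages")

    def configure(self, env):
        self.log.append("write config files")

    def start(self, env):
        self.configure(env)  # this sketch re-applies configuration before every start
        self.running = True
        self.log.append("start daemon")

    def stop(self, env):
        self.running = False
        self.log.append("stop daemon")

    def status(self, env):
        return "RUNNING" if self.running else "STOPPED"

    def rebalance(self, env):
        # A "custom" command is dispatched exactly like the built-in lifecycle commands
        self.log.append("rebalance")


master = MyMaster()
for command in ["INSTALL", "START", "REBALANCE"]:
    master.execute(command)
print(master.log)
print(master.execute("STATUS"))  # RUNNING
```

Dispatching by name is what makes “custom” commands cheap: adding one means adding a method, with no change to the dispatcher.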

Stacks Support Inheritance

• HDP 2.0 Stack: defines a set of Service definitions, with default service configurations and command scripts
• HDP 2.1 Stack (inherits from HDP 2.0): overrides any Service definitions, commands and configurations, and adds new Services specific to this Stack
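The override rule can be sketched with plain dictionaries: resolve the parent stack first, then let the child replace inherited services and add its own. The `extends` key and the service entries below are hypothetical placeholders, and whole service entries are replaced here for simplicity; this is not Ambari's stack-merging code.

```python
def resolve_stack(stacks, name):
    """Return the effective service definitions for `name`, walking up its parent chain."""
    stack = stacks[name]
    parent = stack.get("extends")
    # Start from the parent's resolved view (empty for a root stack)...
    effective = dict(resolve_stack(stacks, parent)) if parent else {}
    # ...then let this stack override inherited services and add new ones.
    effective.update(stack["services"])
    return effective


# Hypothetical stack data: strings stand in for full service definitions
STACKS = {
    "HDP-2.0": {"services": {"HDFS": "HDFS as defined in 2.0",
                             "YARN": "YARN as defined in 2.0"}},
    "HDP-2.1": {"extends": "HDP-2.0",
                "services": {"YARN": "YARN overridden in 2.1",
                             "TEZ": "TEZ added in 2.1"}},
}

for service, definition in sorted(resolve_stack(STACKS, "HDP-2.1").items()):
    print(service, "->", definition)
```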

Blueprints

Automated Cluster Deployment

• Deploy clusters of any scale with ease
• Two REST API calls are all it takes to provision a cluster
• Who uses it? HDInsight (Microsoft Azure), Hortonworks QA

Example: Create a 100-node Cluster

Blueprint:

    {
      "configurations" : [
        { "hdfs-site" : { "dfs.datanode.data.dir" : "/hadoop/1,/hadoop/2,/hadoop/3" } }
      ],
      "host_groups" : [
        { "name" : "master-host",
          "components" : [ { "name" : "NAMENODE" }, { "name" : "RESOURCEMANAGER" }, … ],
          "cardinality" : "1" },
        { "name" : "worker-host",
          "components" : [ { "name" : "DATANODE" }, { "name" : "NODEMANAGER" }, … ],
          "cardinality" : "1+" }
      ],
      "Blueprints" : { "stack_name" : "HDP", "stack_version" : "2.0" }
    }

Cluster creation template (1 master + 99 workers):

    {
      "blueprint" : "my-blueprint",
      "host_groups" : [
        { "name" : "master-host",
          "hosts" : [ { "fqdn" : "master001.ambari.apache.org" } ] },
        { "name" : "worker-host",
          "hosts" : [ { "fqdn" : "worker001.ambari.apache.org" },
                      { "fqdn" : "worker002.ambari.apache.org" },
                      …
                      { "fqdn" : "worker099.ambari.apache.org" } ] }
      ]
    }

1. POST /api/v1/blueprints/my-blueprint with the blueprint as the request body
2. POST /api/v1/clusters/my-cluster with the cluster creation template as the request body
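Writing out 99 worker entries by hand is where scripting helps. A sketch that generates the cluster creation template above and prepares the second POST with Python's standard library; the server address and the `admin`/`admin` credentials are hypothetical, and the request is built but not sent so the example runs offline. The `X-Requested-By` header is included because Ambari's REST API rejects modifying requests without it by default.

```python
import base64
import json
import urllib.request

AMBARI = "http://ambari.example.com:8080"  # hypothetical Ambari Server address


def cluster_template(blueprint, workers):
    """Cluster creation template: map concrete hosts onto the blueprint's host groups."""
    return {
        "blueprint": blueprint,
        "host_groups": [
            {"name": "master-host", "hosts": [{"fqdn": "master001.ambari.apache.org"}]},
            {
                "name": "worker-host",
                "hosts": [{"fqdn": f"worker{i:03d}.ambari.apache.org"} for i in range(1, workers + 1)],
            },
        ],
    }


def ambari_post(path, body, user="admin", password="admin"):
    """Prepare (but do not send) an authenticated JSON POST to the Ambari REST API."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        AMBARI + path,
        data=json.dumps(body).encode(),
        method="POST",
        headers={
            "Authorization": "Basic " + token,
            "X-Requested-By": "ambari",  # required for modifying requests by default
            "Content-Type": "application/json",
        },
    )


template = cluster_template("my-blueprint", workers=99)
request = ambari_post("/api/v1/clusters/my-cluster", template)
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would submit it; left out so the sketch runs offline.
```

The blueprint itself would go first through the same helper, as `ambari_post("/api/v1/blueprints/my-blueprint", blueprint)`.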

Cluster Replication

• Export a blueprint from an existing cluster
• Import the blueprint to replicate the cluster

    GET /api/v1/clusters/my-cluster?format=blueprint

    {
      "configurations" : [
        { "cluster-env" : { "user_group" : "hadoop" },
          "hdfs-site" : { "dfs.datanode.data.dir" : "/hadoop/1,/hadoop/2,/hadoop/3" } }
      ],
      "host_groups" : [
        { "name" : "master-host",
          "components" : [ { "name" : "NAMENODE" }, { "name" : "RESOURCEMANAGER" }, … ],
          "cardinality" : "1" }
      ],
      "Blueprints" : { "stack_name" : "HDP", "stack_version" : "2.0" }
    }
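Replication is the export call followed by registering the exported document under a new blueprint name. A sketch that prepares both requests with the standard library; the server address, credentials, the `my-cluster-copy` name and the abridged `exported` sample are all hypothetical, and nothing is sent over the network.

```python
import base64
import json
import urllib.request

AMBARI = "http://ambari.example.com:8080"  # hypothetical Ambari Server address


def ambari_request(path, body=None, method="GET", user="admin", password="admin"):
    """Prepare (but do not send) an Ambari REST request with basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {"Authorization": "Basic " + token, "X-Requested-By": "ambari"}
    data = None
    if body is not None:
        data = json.dumps(body).encode()
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(AMBARI + path, data=data, method=method, headers=headers)


def export_request(cluster):
    # Ask Ambari to render the running cluster as a blueprint
    return ambari_request(f"/api/v1/clusters/{cluster}?format=blueprint")


def import_request(exported_blueprint, new_name):
    # Register the exported document as a new blueprint
    return ambari_request(f"/api/v1/blueprints/{new_name}", body=exported_blueprint, method="POST")


# Hypothetical, abridged response body from the export call
exported = {
    "configurations": [{"cluster-env": {"user_group": "hadoop"}}],
    "host_groups": [{"name": "master-host", "components": [{"name": "NAMENODE"}], "cardinality": "1"}],
    "Blueprints": {"stack_name": "HDP", "stack_version": "2.0"},
}
print(export_request("my-cluster").full_url)
print(import_request(exported, "my-cluster-copy").full_url)
```

A cluster creation template naming `my-cluster-copy` then provisions the replica, exactly as in the 100-node example.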

Blueprint Features

• Ambari 2.0: High availability (HA) cluster deployments; adding hosts using blueprints (AMBARI-8458)
• Ambari 2.1: Advanced cluster creation options (AMBARI-10750)
• Ambari 2.2: Kerberized cluster deployments (AMBARI-13431); stack advisor recommendations (AMBARI-13487)

Stack Upgrades

Stack Upgrades
• Rolling vs Express Upgrade modes
• Side-by-side bits and configs

Bits:
/usr/hdp/2.2.0.0-2041
/usr/hdp/2.2.4.2-2
/usr/hdp/2.3.0.0-3000

Configs:
/etc/hive/conf/ (initial)
/etc/hive/conf/v0 (HDP 2.2.4.2)
/etc/hive/conf/v1 (HDP 2.3)

2.2.0.0 to 2.2.4.2 is a minor jump; 2.2.4.2 to 2.3.0.0 is a major jump.
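Side-by-side installs work because every build gets its own directory, and the minor/major distinction above is consistent with comparing the first two version components. A sketch under that inferred rule; the helper names are hypothetical and this is not how Ambari itself classifies upgrades.

```python
def parse_version(build):
    """Split an HDP build string like '2.2.4.2-2' into a numerically comparable tuple."""
    version, _, build_number = build.partition("-")
    parts = tuple(int(p) for p in version.split("."))
    return parts + ((int(build_number),) if build_number else ())


def jump_kind(source, target):
    """Same major.minor line => 'minor' jump, otherwise 'major' (rule inferred from the slide)."""
    src, dst = parse_version(source), parse_version(target)
    if dst <= src:
        raise ValueError("target must be newer than source")
    return "minor" if src[:2] == dst[:2] else "major"


def bits_dir(build):
    # Each build lives in its own directory, so old and new bits coexist
    return f"/usr/hdp/{build}"


installed = ["2.3.0.0-3000", "2.2.0.0-2041", "2.2.4.2-2"]
for build in sorted(installed, key=parse_version):
    print(bits_dir(build))
print(jump_kind("2.2.0.0-2041", "2.2.4.2-2"))    # minor
print(jump_kind("2.2.4.2-2", "2.3.0.0-3000"))    # major
```

Parsing into integer tuples matters: compared as plain strings, a component like "10" would sort before "9".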

Express vs Rolling Upgrade

Rolling Upgrade
• Services are up the entire time
• Upgrade one component at a time
• Robust and fault-tolerant
• Service checks performed frequently during the upgrade

Express Upgrade
• All services are brought down, upgraded and restarted
• Faster upgrade mode
• Planned service downtime
• Service checks performed less frequently during the upgrade
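The difference between the two modes is largely one of ordering. A toy planner that makes it concrete, assuming a hypothetical list of (component, hosts) pairs already in upgrade order; real upgrade order comes from the stack's upgrade definitions and involves many more steps.

```python
def rolling_plan(components):
    """Rolling: restart one component instance at a time on the new version; nothing is stopped wholesale."""
    steps = []
    for component, hosts in components:
        for host in hosts:
            steps.append(f"restart {component} on {host} with new version")
        steps.append(f"service check for {component}")  # frequent checks between components
    return steps


def express_plan(components):
    """Express: stop everything, then start everything on the new version (planned downtime)."""
    # Stop in reverse order so components that depend on others go down first
    steps = [f"stop {component} on all hosts" for component, _ in reversed(components)]
    for component, hosts in components:
        for host in hosts:
            steps.append(f"start {component} on {host} with new version")
    steps.append("service checks for all services")  # checks only at the end
    return steps


# Hypothetical cluster: (component, hosts) in upgrade order
cluster = [("ZOOKEEPER_SERVER", ["h1", "h2", "h3"]), ("NAMENODE", ["h1"]), ("DATANODE", ["h2", "h3"])]
for step in rolling_plan(cluster):
    print("rolling:", step)
for step in express_plan(cluster):
    print("express:", step)
```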

Stack Upgrade – Install Version

• Install new version in parallel on all agents
• No downtime

Stack Upgrade – Orchestration

Not necessarily “one-click” but fully guided

Stack Upgrade – Upgrade Catalog

• Upgrades are driven by upgrade catalogs defined in stack definitions
• Defines upgrade groups and upgrade order
• Provides ability to modify configurations: set, move, delete, transform
• Upgrade steps can be marked as skippable and retryable
• Supports executing custom scripts during upgrade
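A sketch of what the four configuration operations do, modelled on plain dictionaries of `{config_type: {key: value}}`. The operation format and the `example.*` keys are hypothetical, not the upgrade catalog syntax; the point is only the semantics of set, move, delete and transform.

```python
import copy


def apply_config_changes(configs, changes):
    """Apply set/move/delete/transform operations to a {config_type: {key: value}} map."""
    result = copy.deepcopy(configs)  # leave the caller's configs untouched
    for change in changes:
        props = result.setdefault(change["type"], {})
        key, op = change["key"], change["op"]
        if op == "set":
            props[key] = change["value"]
        elif op == "delete":
            props.pop(key, None)
        elif op == "move" and key in props:
            props[change["to_key"]] = props.pop(key)
        elif op == "transform" and key in props:
            props[key] = change["fn"](props[key])
        elif op not in ("move", "transform"):
            raise ValueError(f"unknown operation: {op}")
        # move/transform of a missing key is a no-op
    return result


# Hypothetical configuration and change list
before = {"hive-site": {"example.old.key": "8", "example.flag": "TRUE", "example.obsolete": "x"}}
changes = [
    {"type": "hive-site", "op": "move", "key": "example.old.key", "to_key": "example.new.key"},
    {"type": "hive-site", "op": "transform", "key": "example.flag", "fn": str.lower},
    {"type": "hive-site", "op": "delete", "key": "example.obsolete"},
    {"type": "hive-site", "op": "set", "key": "example.added", "value": "true"},
]
after = apply_config_changes(before, changes)
print(after)
```

Returning a new map instead of editing in place keeps the pre-upgrade values available, which is what a downgrade needs.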

Stack Upgrade – Upgrade Catalog

Stack Downgrade

• A downgrade can be triggered at any stage of the stack upgrade
• A downgrade is no longer possible once the stack upgrade has been finalized

Smart Configurations

Hadoop Configuration Challenges

• Too many configurations: which ones are important?
• Too easy to mess up: what are valid/reasonable values? What are the units? OK, what about dependencies?
• Gets harder with combinations of services, host assignments, enabled features, CPU/RAM/disks, etc.: any recommendations? What am I doing wrong?

Smart Configurations

Ambari Smart Configs UI

Customizable layout
• Tabs
• Sections
• Sub-sections
• Simple grid layout
(The Advanced tab contains the remaining configurations.)

New widgets
• Sliders: recommended, minimum, maximum, increment step
• Combos: enumerated values
• Toggles: binary options
• Spinners: split a value into multiple controls, e.g. time in milliseconds split into days, hours, minutes
• Lists: enumerated values, single select, multi select

Implemented for: HDFS, YARN, MapReduce, Hive, HBase
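The slider and spinner widgets boil down to small pieces of arithmetic: a slider clamps a value to its minimum and maximum and snaps it to the increment step, and a time spinner splits milliseconds into days, hours and minutes. A sketch of both; the function names are hypothetical, and the real widgets live in the Ambari Web UI rather than in Python.

```python
def snap_slider(value, minimum, maximum, step):
    """Clamp a value into [minimum, maximum] and snap it to the slider's increment step."""
    clamped = max(minimum, min(maximum, value))
    steps = round((clamped - minimum) / step)  # round() breaks exact ties toward an even step count
    return min(maximum, minimum + steps * step)


def split_millis(ms):
    """Split a duration in milliseconds into (days, hours, minutes, leftover_ms)."""
    minutes_total, leftover_ms = divmod(ms, 60_000)
    hours_total, minutes = divmod(minutes_total, 60)
    days, hours = divmod(hours_total, 24)
    return days, hours, minutes, leftover_ms


print(snap_slider(700, 0, 5120, 256))   # 768
print(snap_slider(6000, 0, 5120, 256))  # 5120
print(split_millis(90_061_000))         # (1, 1, 1, 1000)
```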

Stack Driven Layouts

• Stack has a theme.json file
• Layout: tabs, sections, sub-sections
• Placement: placement of configs in sub-sections
• Widgets: widget type, optional units
  – Bytes (B, KB, MB, GB, TB, PB)
  – Time (Millis, Seconds, Minutes, Hours, Days, Months, Years)

    {
      "name": "default",
      "description": "Default theme for HBASE service",
      "configuration": {
        "layouts": [
          { "name": "default",
            "tabs": [
              { "name": "settings", "display-name": "Settings",
                "layout": { "tab-columns": "3", "tab-rows": "3", "sections": [ ... ] } }
            ] }
        ],
        "placement": { "configuration-layout": "default", "configs": [...] },
        "widgets": [
          { "config": "hbase-env/hbase_master_heapsize",
            "widget": { "type": "slider", "units": [ { "unit-name": "GB" } ] } },
          ...
        ]
      }
    }
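Because layouts and widgets are plain JSON in the stack, tooling can read them directly. A small sketch that indexes the `widgets` section of a theme like the HBase one above; the second widget entry and its property `hbase.example.toggle` are hypothetical, included only to show a widget without units.

```python
import json

# Abridged theme; the toggle entry is hypothetical
THEME = """
{ "configuration": {
    "widgets": [
      { "config": "hbase-env/hbase_master_heapsize",
        "widget": { "type": "slider", "units": [ { "unit-name": "GB" } ] } },
      { "config": "hbase-site/hbase.example.toggle",
        "widget": { "type": "toggle" } }
    ] } }
"""


def widget_index(theme):
    """Map 'config-type/property' to (widget type, display unit or None)."""
    index = {}
    for entry in theme["configuration"]["widgets"]:
        widget = entry["widget"]
        units = widget.get("units") or [{}]  # units are optional
        index[entry["config"]] = (widget["type"], units[0].get("unit-name"))
    return index


for config, (kind, unit) in widget_index(json.loads(THEME)).items():
    config_type, prop = config.split("/", 1)
    print(f"{config_type:10} {prop:25} {kind:7} {unit or '-'}")
```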

Config Metadata and Dependencies

Extended metadata
• Defined in property_value_attributes
• Holds non-UI metadata about value range, increment, unit, etc.

Dependencies
• Models the bi-directional relationship between configs
• Depends On (property_depends_on): answers “which configs do I depend on?”
• Depended By (dependencies): answers “which configs are dependent on me?”
• Ambari automatically updates dependencies

    {
      "StackConfigurations": {
        "final": "false",
        "property_depends_on": [
          { "type": "yarn-site", "name": "yarn.nodemanager.resource.memory-mb" }
        ],
        "property_description": "The minimum allocation for every",
        "property_display_name": "Minimum Container Size (Memory)",
        "property_name": "yarn.scheduler.minimum-allocation-mb",
        "property_type": [],
        "property_value": "512",
        "property_value_attributes": {
          "type": "int", "maximum": "5120", "minimum": "0", "unit": "MB", "increment_step": "256"
        },
        "type": "yarn-site.xml"
      },
      "dependencies": [
        { "StackConfigurationDependency": { "dependency_name": "hive.tez.container.size", "property_name": "yarn.scheduler.minimum-allocation-mb" } },
        { "StackConfigurationDependency": { "dependency_name": "mapreduce.map.memory.mb", "property_name": "yarn.scheduler.minimum-allocation-mb" } },
        { "StackConfigurationDependency": { "dependency_name": "mapreduce.reduce.memory.mb", "property_name": "yarn.scheduler.minimum-allocation-mb" } }
        …
      ]
    }
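Both directions are the same graph read two ways: `property_depends_on` edges point from a config to its inputs, and inverting them gives the “depended by” view needed to find everything to recompute when a value changes. A sketch using the yarn.scheduler.minimum-allocation-mb edges from the example above; the helper names are hypothetical, and `validate` only mirrors the minimum, maximum and increment_step attributes shown, not Ambari's full validation.

```python
from collections import deque

# depends_on[x] lists the properties x is derived from ("which configs do I depend on?")
DEPENDS_ON = {
    "yarn.scheduler.minimum-allocation-mb": ["yarn.nodemanager.resource.memory-mb"],
    "hive.tez.container.size": ["yarn.scheduler.minimum-allocation-mb"],
    "mapreduce.map.memory.mb": ["yarn.scheduler.minimum-allocation-mb"],
    "mapreduce.reduce.memory.mb": ["yarn.scheduler.minimum-allocation-mb"],
}
ATTRS = {"type": "int", "maximum": "5120", "minimum": "0", "unit": "MB", "increment_step": "256"}


def depended_by(depends_on):
    """Invert the edges: for each property, which properties depend on it."""
    reverse = {}
    for prop, sources in depends_on.items():
        for source in sources:
            reverse.setdefault(source, []).append(prop)
    return reverse


def affected_by_change(changed, depends_on):
    """Breadth-first walk over 'depended by' edges: every config to recompute after `changed`."""
    reverse = depended_by(depends_on)
    seen, order, queue = {changed}, [], deque([changed])
    while queue:
        for dependent in reverse.get(queue.popleft(), []):
            if dependent not in seen:
                seen.add(dependent)
                order.append(dependent)
                queue.append(dependent)
    return order


def validate(value, attrs):
    """Check a value against min/max/increment_step attributes (strings, as in the stack JSON)."""
    low, high, step = int(attrs["minimum"]), int(attrs["maximum"]), int(attrs["increment_step"])
    return low <= value <= high and (value - low) % step == 0


print(affected_by_change("yarn.nodemanager.resource.memory-mb", DEPENDS_ON))
print(validate(512, ATTRS), validate(500, ATTRS))  # True False
```

The `seen` set keeps the walk finite even if dependencies ever form a cycle.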

Metrics

Ambari Metrics Service (AMS) - Goals

• Ability to collect metrics from Hadoop and other Stack services
• Ability to collect system-level metrics
• Ability to retain metrics at a high precision for a configurable time period
• Ability to automatically purge metrics after the retention period
• Provide an integration point for metrics collection and retention by external systems
• Trigger alerts based on metrics in Ambari

Ambari Metrics System - Architecture

AMS Grafana

Ambari 2.2.2
• Powerful dashboard builder integrated with AMS
• Pre-built Grafana dashboards for host-level and service-level metrics
• Users can build and save custom dashboards

AMS Grafana

Alerts

Alert – Types

• PORT: Watches a port based on a configuration property such as the URI. Status: OK, WARN, CRIT. Configurable thresholds: yes (seconds)
• WEB: Watches an HTTP or HTTPS endpoint and determines connectivity and HTTP status code. Status: OK, WARN, CRIT. Configurable thresholds: no
• AGGREGATE: Aggregate of status for another alert definition. Status: OK, WARN, CRIT. Configurable thresholds: yes (percentage)
• METRIC: Watches a metric or series of metrics in JMX and compares a mathematical result against a threshold. Status: OK, WARN, CRIT. Configurable thresholds: yes (variable)
• SCRIPT: Uses a custom script to handle checking. Status: OK or CRIT. Configurable thresholds: no
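PORT and AGGREGATE thresholds differ only in what they measure: seconds for a connection, or the percentage of another definition's instances that are not OK. A sketch of both evaluations; the function names and default threshold values are illustrative assumptions, not Ambari's shipped definitions.

```python
def port_status(response_seconds, warning=1.5, critical=5.0, reachable=True):
    """PORT-style check: thresholds are seconds of connection time (defaults are illustrative)."""
    if not reachable or response_seconds >= critical:
        return "CRIT"
    return "WARN" if response_seconds >= warning else "OK"


def aggregate_status(child_states, warning_pct=10.0, critical_pct=30.0):
    """AGGREGATE-style check: percentage of another definition's instances that are not OK."""
    if not child_states:
        return "OK"
    bad = sum(1 for state in child_states if state != "OK")
    pct = 100.0 * bad / len(child_states)
    if pct >= critical_pct:
        return "CRIT"
    return "WARN" if pct >= warning_pct else "OK"


print(port_status(0.2), port_status(2.0), port_status(6.0))  # OK WARN CRIT
print(aggregate_status(["OK"] * 9 + ["CRIT"]))                # WARN (10% not OK)
```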

UI – Current Alerts

Configured by default; managed via the web client

UI – Host Alerts

• Automatically refreshes
• Query alert history

UI – Customization & Instances

Status text, thresholds, and interval

Views

Ambari Views

View Framework
• Provides various applications accessible from the Ambari Web UI: interact with the cluster via a browser from a single place, for all users (cluster operators, data analysts, developers, etc.)

Easy to develop
• No need to understand Ambari core code: view development is just like creating any other web application

Easy to deploy
• Packaged as a single jar file
• Auto create / auto configure

CS Queue Manager for Cluster Operators

Capacity Scheduler Queue Manager

HDFS File Browser for General Users

HDFS File Browser

Job Analysis for Developers

• Troubleshoot Tez jobs
• Troubleshoot / improve Hive queries

Query Editors for Data Analysts

• Create, edit, execute, and analyze Hive queries
• Create, edit, and execute Pig scripts

Ambari Server in Views-Only mode

[Diagram: on the left, an Ambari Server manages a cluster (Management) and users use Views; on the right, an Ambari Server in “Views-only” mode (aka “Stand-alone” mode) lets users use Views against a cluster not managed by Ambari]

• Use Views on existing clusters not managed by Ambari
• Can use Views against multiple clusters

Kerberos Automation

Kerberos Automation

Ambari 2.0
• Ambari manages Kerberos principals and keytabs
• Works with an existing MIT KDC or Active Directory
• Once Kerberized, seamlessly handles:
  – Adding new hosts
  – Adding new components to existing hosts
  – Adding new services
  – Moving components to different hosts
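When a host joins a Kerberized cluster, its components need their own service principals and keytabs. A sketch of that bookkeeping, assuming hypothetical principal templates in Hadoop's `_HOST` style and a made-up realm; Ambari's own Kerberos descriptors and its KDC calls are not modelled here.

```python
def principals_for_host(host_fqdn, components, templates, realm):
    """Expand per-component principal templates such as 'dn/_HOST' for one host."""
    # Hadoop convention: _HOST stands for the host's fully qualified name, lower-cased
    host = host_fqdn.lower()
    return [
        f"{template.replace('_HOST', host)}@{realm}"
        for component in components
        for template in templates.get(component, [])
    ]


def missing_principals(existing, needed):
    """Principals still to be created in the KDC, with keytabs to distribute."""
    return sorted(set(needed) - set(existing))


# Hypothetical principal templates, realm and hosts
TEMPLATES = {"DATANODE": ["dn/_HOST"], "NODEMANAGER": ["nm/_HOST"]}
REALM = "EXAMPLE.COM"
workers = ["DATANODE", "NODEMANAGER"]

existing = principals_for_host("worker001.example.com", workers, TEMPLATES, REALM)
needed = existing + principals_for_host("Worker100.example.com", workers, TEMPLATES, REALM)
print(missing_principals(existing, needed))
```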

Thank You!

• Try Ambari: follow the Ambari Quick Start Guide https://cwiki.apache.org/confluence/display/AMBARI/Quick+Start+Guide
• Learn more: visit the project website http://ambari.apache.org/
• Get involved:
  – User mailing list: user-subscribe@ambari.apache.org
  – Developer mailing list: dev-subscribe@ambari.apache.org
  – Use JIRA to file bugs and improvement requests: https://issues.apache.org/jira/browse/AMBARI/

Jayush Luniya @ Hortonworks (Apache Ambari PMC)

Future Roadmap

• AMS Grafana Integration
• Ambari Management Packs
• Ambari Logsearch
• Patch Upgrades
• Multi Service Versions
• Multi Service Instances

Q&A

Stats

• Largest production clusters managed by Ambari: ~1600 nodes and ~800 nodes
• Largest test cluster for Ambari scale testing: ~400 nodes
• Largest test cluster where a rolling upgrade was performed: ~400 nodes, ~40 hours
