
How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime

Jul 08, 2015


Ian Lumb

Outline:

- The Apache Project's 4-step upgrade process for its Hadoop distro

- Upgrade processes for the Hadoop stack involving Apache Ambari and other management tools

- Bright roles for Hadoop service definition, assignment and composition

- The 1-step, 0-downtime Bright upgrade process for Hadoop distros and the analytics stack
Page 1: How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime

http://poll.fm/50lt0

Page 2: How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime

How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime

Ian Lumb

Bright Evangelist

Originally developed for a Bright Computing webinar delivered November 5, 2014.

Page 3: How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime
Page 4: How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime

Key Takeaways

The Apache Project

• 4-step upgrade process for its Hadoop distro

Upgrade processes for the Hadoop stack

• Apache Ambari

• Other management tools

Bright roles for Hadoop

• Service definition, assignment and composition

The 1-step, 0-downtime Bright upgrade process

• Hadoop distros and the analytics stack

Page 5: How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime

Why Upgrade Hadoop?

Gain access to new capabilities

• Enhancements – new features and functionality

• Improvements – maintenance (e.g., security)

Transitioning from pilot to production

Maintain compatibility

• Between sites within an organization

• Between project participants

Other reasons?

Page 6: How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime

4-Step Rolling Upgrade Process: Overview

1. Prepare the rolling upgrade

• Snapshot HDFS metadata

2. Upgrade active and standby NameNode services

3. Upgrade DataNodes

4. Finalize the rolling upgrade
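The four steps above can be sketched as an ordered, dry-run command plan. This is a minimal sketch, not Bright's script: it assumes an HA pair of NameNodes, uses the `hdfs dfsadmin` and `hdfs haadmin` subcommands from the Apache HDFS rolling-upgrade procedure, and treats the function name, the service IDs `nn1`/`nn2`, and the DataNode IPC port 50020 (the Hadoop 2.x default) as illustrative assumptions. Installing the new software on each host is left as a comment line in the plan.

```python
from typing import List


def rolling_upgrade_plan(standby_nn: str, active_nn: str,
                         datanodes: List[str], ipc_port: int = 50020) -> List[str]:
    """Return the ordered commands for an HA rolling upgrade (nothing is executed)."""
    plan = [
        # Step 1: prepare; this creates the rollback image (the HDFS metadata snapshot)
        "hdfs dfsadmin -rollingUpgrade prepare",
        # Poll until the rollback image is reported as ready
        "hdfs dfsadmin -rollingUpgrade query",
    ]
    # Step 2: upgrade the standby first, fail over to it, then upgrade the old active
    plan += [
        f"# upgrade {standby_nn}, then start it with: hdfs namenode -rollingUpgrade started",
        f"hdfs haadmin -failover {active_nn} {standby_nn}",
        f"# upgrade {active_nn}, then start it with: hdfs namenode -rollingUpgrade started",
    ]
    # Step 3: shut down each DataNode for upgrade and confirm it has stopped
    for dn in datanodes:
        plan.append(f"hdfs dfsadmin -shutdownDatanode {dn}:{ipc_port} upgrade")
        plan.append(f"hdfs dfsadmin -getDatanodeInfo {dn}:{ipc_port}")
    # Step 4: finalize once the upgraded cluster has been verified
    plan.append("hdfs dfsadmin -rollingUpgrade finalize")
    return plan


if __name__ == "__main__":
    for line in rolling_upgrade_plan("nn2", "nn1", ["node001", "node002"]):
        print(line)
```

Printing the plan instead of running it keeps the sketch safe to try on a machine without Hadoop installed.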

Page 7: How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime

4-Step Rolling Upgrade Process: Considerations

High Availability (HA)?

• If “No”, then downtime!

Federated clusters?

• Repeat for each namespace

Out of scope

• JournalNodes

• ZooKeeper nodes

• Analytics stack
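The considerations above amount to a small decision table, sketched here. The function and key names are hypothetical; the logic only restates the slide: no HA means downtime, a federated cluster needs one pass per namespace, and three component groups fall outside the 4-step process.

```python
from typing import Dict, List


def upgrade_considerations(ha_enabled: bool, namespaces: List[str]) -> Dict[str, object]:
    """Summarize what the 4-step process implies for a given cluster layout."""
    return {
        # Without High Availability the NameNode restart interrupts service
        "downtime_expected": not ha_enabled,
        # Federated clusters repeat the procedure once per namespace
        "passes": [f"rolling upgrade for namespace {ns}" for ns in namespaces],
        # Not covered by the 4-step process itself
        "out_of_scope": ["JournalNodes", "ZooKeeper nodes", "analytics stack"],
    }


if __name__ == "__main__":
    print(upgrade_considerations(ha_enabled=False, namespaces=["ns1", "ns2"]))
```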

Page 9: How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime

What Makes Hadoop Upgrades Challenging?

HDFS is the underlying platform

• YARN and analytics apps depend upon HDFS

Complexity

• Interdependencies

HDFS services plus the rest of the Hadoop stack

Highly distributed

Scale

Page 10: How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime

Bright Cluster Manager and Hadoop Upgrades

Bright roles

• Facilitate service definition, assignment and composition

Almost any service can be made highly available

– Run redundant copies on different nodes

Bright CMSH

• Cluster-Management SHell

Page 11: How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime

Bright Concepts - Role

Device: Entity in cluster management infrastructure which represents a physical device in the cluster.

Category: A group of nodes sharing the same configuration. A node must always be a member of exactly 1 category.

Node group: A group of nodes, not necessarily sharing the same configuration. A node can be a member of 0 or more node groups.

Role: Task that can be assigned to a node.

For example, a node can be assigned the Provisioning role, which makes it a provisioning node.
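The definitions above can be modelled in a few lines. This is an illustrative data model only, not Bright's API: the class `Node` and its methods are hypothetical names chosen to mirror the rules stated on the slide (exactly one category, zero or more node groups, roles assigned as tasks).

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Node:
    name: str
    category: str                                    # exactly one category per node
    groups: Set[str] = field(default_factory=set)    # zero or more node groups
    roles: Set[str] = field(default_factory=set)     # tasks assigned to this node

    def __post_init__(self) -> None:
        # Enforce the "must always be a member of exactly 1 category" rule
        if not self.category:
            raise ValueError("a node must belong to exactly one category")

    def assign_role(self, role: str) -> None:
        self.roles.add(role)

    def is_provisioning_node(self) -> bool:
        # A node with the Provisioning role is a provisioning node
        return "Provisioning" in self.roles


if __name__ == "__main__":
    node = Node("node001", category="default")
    node.assign_role("Provisioning")
    print(node.name, node.is_provisioning_node())
```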

Page 12: How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime

Hadoop-Related Roles in Bright Cluster Manager

Page 13: How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime

Bright Cluster Management Interfaces

Three ways to manage cluster:

CMSH

• Command-line interface to cluster

• Usually runs on head node, but can also be used remotely

• Can be used interactively and from scripts

• Powerful tool but takes some time to get familiar with …

CMGUI

• Desktop GUI application (supported: Windows, Linux, OS X)

(installable packages in /cm/shared/apps/cmgui/dist)

• Can also be run on head node through SSH with X-forwarding

• Intuitive and easy to use

SOAP / JSON API

• Python and C++ interfaces available which hide SOAP / JSON

Page 14: How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime

Bright Cluster Management Shell (CMSH)

Features:

Modular interface

Command completion using tab key

Command line history

Output redirection to file or shell command

Scriptable in batch mode

Support for looping over objects

Example:

[demo]% device

[demo->device]% status

demo ................ [ UP ]

node001 ............. [ UP ]

node002 ............. [ UP ]
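Because CMSH output can be redirected and scripted, status text like the example above is easy to post-process. The sketch below parses that text into a dictionary; the function name and regular expression are assumptions based only on the three lines shown here, and how the text is captured from CMSH is deliberately left out.

```python
import re
from typing import Dict

# Matches lines such as "node001 ............. [ UP ]"
STATUS_LINE = re.compile(r"^(\S+)\s+\.+\s+\[\s*(\w+)\s*\]")


def parse_status(output: str) -> Dict[str, str]:
    """Map each device name to the state shown between the brackets."""
    states: Dict[str, str] = {}
    for line in output.splitlines():
        match = STATUS_LINE.match(line.strip())
        if match:
            states[match.group(1)] = match.group(2)
    return states


if __name__ == "__main__":
    sample = """demo ................ [ UP ]
node001 ............. [ UP ]
node002 ............. [ UP ]"""
    print(parse_status(sample))
```

A script could use the resulting dictionary to refuse to start an upgrade while any device is not reported as UP.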

Page 15: How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime

Bright Hadoop Upgrades

Single script captures the Apache Project’s 4 steps

Enhancements

• Automated deployment of updated software

Ensures configured instances of Hadoop are updated

• DataNodes can be upgraded simultaneously

Distributed provisioning (large-cluster option)

• JournalNodes are upgraded without downtime

• Automated testing of the upgrade prior to commitment

Validation of the Hadoop setup

– Teragen, terasort and teravalidate are executed
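The validation step named above can be sketched as three chained commands from the standard Hadoop MapReduce examples jar. This is not Bright's script: the function name, the jar path, the row count and the HDFS directories are placeholders to be supplied by the reader, and the commands are only built, not executed.

```python
from typing import List


def terasuite_commands(examples_jar: str, rows: int, base_dir: str) -> List[List[str]]:
    """Build the teragen -> terasort -> teravalidate command sequence."""
    generated = f"{base_dir}/teragen"
    sorted_dir = f"{base_dir}/terasort"
    report = f"{base_dir}/teravalidate"
    return [
        # Generate `rows` rows of 100 bytes each
        ["hadoop", "jar", examples_jar, "teragen", str(rows), generated],
        # Sort the generated data
        ["hadoop", "jar", examples_jar, "terasort", generated, sorted_dir],
        # Check that the sorted output really is in order
        ["hadoop", "jar", examples_jar, "teravalidate", sorted_dir, report],
    ]


if __name__ == "__main__":
    # The jar path below is a placeholder, not a real installation path
    for cmd in terasuite_commands("hadoop-mapreduce-examples.jar", 1000, "/tmp/upgrade-check"):
        print(" ".join(cmd))
```

Each command list can be passed to `subprocess.run(cmd, check=True)` on a host with a Hadoop client, so that a failed stage stops the check before the upgrade is committed.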

DEMO …

Page 16: How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime

Cascading Upgrade

Page 17: How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime

Bright Support for Apache Hadoop

FULLY INTEGRATED: Bright Cluster Manager bundles, installs and manages the ‘product’ completely. Nothing else is needed.

INTEGRATED: Bright Cluster Manager installs and manages some aspects of the ‘product’, but something else is needed for COMPLETE support.

COMPATIBLE: Bright Cluster Manager doesn’t install or manage the ‘product’, but it can be installed on a cluster that is itself Bright-managed.

INCOMPATIBLE: Bright Cluster Manager doesn’t work with the ‘product’ at all.

Page 18: How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime

Hadoop Support

FULLY INTEGRATED

• Apache Hadoop, CDH & HDP

HDFS and its services

– HBase, NameNode, DataNode & JournalNode

• ZooKeeper

INTEGRATED

• YARN

• Pig, Hive, Accumulo & Spark

COMPATIBLE

• E.g., Giraph

INCOMPATIBLE

Note: HA YARN available soon.
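The support matrix above can be written down as a lookup table, which is handy when planning which components an upgrade will cover. The enum and dictionary below simply restate the slide as of this talk; the names `Support`, `SUPPORT_LEVELS` and `managed_completely` are hypothetical.

```python
from enum import Enum


class Support(Enum):
    FULLY_INTEGRATED = 3   # bundled, installed and managed completely
    INTEGRATED = 2         # partly installed and managed
    COMPATIBLE = 1         # can be installed on a Bright-managed cluster
    INCOMPATIBLE = 0       # does not work with Bright at all


# Restates the slide; not an authoritative or current product matrix
SUPPORT_LEVELS = {
    "Apache Hadoop": Support.FULLY_INTEGRATED,
    "CDH": Support.FULLY_INTEGRATED,
    "HDP": Support.FULLY_INTEGRATED,
    "ZooKeeper": Support.FULLY_INTEGRATED,
    "YARN": Support.INTEGRATED,
    "Pig": Support.INTEGRATED,
    "Hive": Support.INTEGRATED,
    "Accumulo": Support.INTEGRATED,
    "Spark": Support.INTEGRATED,
    "Giraph": Support.COMPATIBLE,
}


def managed_completely(product: str) -> bool:
    """True when nothing besides Bright is needed for the product."""
    return SUPPORT_LEVELS[product] is Support.FULLY_INTEGRATED


if __name__ == "__main__":
    for name, level in SUPPORT_LEVELS.items():
        print(f"{name}: {level.name}")
```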

Page 19: How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime

Compatible Support Example: Giraph

Page 20: How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime

Bright Maintenance of Hadoop

Innovation characterizes the entire history and evolution of Big Data Analytics via Hadoop

• BUT … introduces challenges and opportunities …

Bright Computing’s approach leverages

• People

Proactively maintaining business and technical relationships

• Process

‘Hands-on engineering’ begins with each release

– Preliminary to fully enterprise-ready implementations

• Product

Bright Cluster Manager released once per year

– Compatible updates flow continuously via YUM …

Page 21: How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime

Further Discussion

Upgrade scenarios

Migrating distros

Hadoop stack

Page 22: How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime
Page 23: How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime

Key Takeaways

The Apache Project

• 4-step upgrade process for its Hadoop distro

Upgrade processes for the Hadoop stack

• Apache Ambari

• Other management tools

Bright roles for Hadoop

• Service definition, assignment and composition

The 1-step, 0-downtime Bright upgrade process

• Hadoop distros and the analytics stack

Page 24: How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime

Q & A

Ian Lumb, [email protected]

Page 25: How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime

Additional Slides

Page 26: How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime

Customer Needs

Quickly build and deploy a Hadoop cluster

• Needed yesterday for an important project?

Build a PoC cluster to test drive Hadoop

• Unsure about taking the Hadoop plunge?

Build a hybrid HPC/Hadoop cluster

• HPC and Hadoop required


Page 27: How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime

Product Details

Feature                                             Benefit
Installs on bare metal                              Pallet to production in less time
Simple deployment process                           Running right – first time, every time
Comprehensive monitoring and health checking        Know how your cluster is running
Deploys multiple distributions                      Make the choice that best fits your needs
Operate multiple Hadoop instances simultaneously    Accommodate multiple choices at the same time
Integrated HDFS management operations               Easily allocate storage resources to users

Page 28: How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime

Product Differentiation

• Installs on bare metal through to the Hadoop distro

• Works with almost any Hadoop distro

• Single-pane-of-glass management interface

Addresses the physical cluster and Hadoop

• Fully manages Hadoop services

• HDFS, YARN, etc.

• Customized monitoring and health checks

• Multiple instances of Hadoop

Architected specifically for Hadoop

• Simultaneous, independent instances on dedicated hardware

• Time-sliced instances on shared hardware

HPC and Hadoop together