Top Banner
HDFS Federation Page 1 Sanjay Radia Founder and Architect @ Hortonworks
22

HDFS Futures: NameNode Federation for Improved Efficiency and Scalability

Jul 01, 2015

Download

Technology

Hortonworks

Scalability of the NameNode has been a key issue for HDFS clusters. Because the entire file system metadata is stored in memory on a single NameNode, and all metadata operations are processed on this single system, the NameNode both limits the growth in size of the cluster and makes the NameService a bottleneck for the MapReduce framework as demand increases. HDFS Federation horizontally scales the NameService using multiple federated NameNodes/namespaces. The federated NameNodes share the DataNodes in the cluster as a common storage layer. HDFS Federation also adds client-side namespaces to provide a unified view of the file system. In this talk, Hortonworks co-founder and key architect, Sanjay Raidia, will discuss the benefits, features and best practices for implementing HDFS Federation.
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: HDFS Futures: NameNode Federation for Improved Efficiency and Scalability

HDFS Federation

Page 1

Sanjay Radia Founder and Architect @ Hortonworks

Page 2: HDFS Futures: NameNode Federation for Improved Efficiency and Scalability

About Me

•  Apache Hadoop Committer and Member of Hadoop PMC

•  Architect of core-Hadoop @ Yahoo -  Focusing on HDFS, MapReduce scheduler,

Compatibility, etc.

•  PhD in Computer Science from the University of Waterloo, Canada

Page 2

Page 3: HDFS Futures: NameNode Federation for Improved Efficiency and Scalability

Agenda

•  HDFS Background

•  Current Limitations

•  Federation Architecture

•  Federation Details

•  Next Steps

•  Q&A

Page 3

Page 4: HDFS Futures: NameNode Federation for Improved Efficiency and Scalability

Two main layers •  Namespace

-  Consists of dirs, files and blocks

-  Supports create, delete, modify and list files or dirs operations

•  Block Storage -  Block Management

•  Datanode cluster membership

•  Supports create/delete/modify/get block location operations

•  Manages replication and replica placement -  Storage - provides read and write access to

blocks

HDFS Architecture B

lock

Sto

rage

N

ames

pace

Namenode

Block Management

NS

Storage

Datanode Datanode …  

Page 4

Page 5: HDFS Futures: NameNode Federation for Improved Efficiency and Scalability

Implemented as •  Single Namespace Volume

-  Namespace Volume = Namespace + Blocks

•  Single namenode with a namespace -  Entire namespace is in memory

-  Provides Block Management

•  Datanodes store block replicas -  Block files stored on local file system

HDFS Architecture B

lock

Sto

rage

N

ames

pace

Namenode

Block Management

NS

Storage

Datanode Datanode …  

Page 5

Page 6: HDFS Futures: NameNode Federation for Improved Efficiency and Scalability

Limitation - Isolation

Poor Isolation •  All the tenants share a single namespace

-  Separate volume for tenants is desirable

•  Lacks separate namespace for different application categories or application requirements -  Experimental apps can affect production apps -  Example - HBase could use its own namespace

Page 6

Page 7: HDFS Futures: NameNode Federation for Improved Efficiency and Scalability

Limitation - Scalability

Scalability •  Storage scales horizontally - namespace doesn’t •  Limited number of files, dirs and blocks

-  250 million files and blocks at 64GB Namenode heap size

•  Still a very large cluster

•  Facebook clusters are sized at ~70 PB storage

Performance •  File system operations throughput limited by a single node

-  120K read ops/sec and 6000 write ops/sec

•  Support 4K clusters easily

•  Easily scalable to 20K write ops/sec by code improvements

Page 7

Page 8: HDFS Futures: NameNode Federation for Improved Efficiency and Scalability

Namespace and Block Management are distinct layers •  Tightly coupled due to co-location

•  Separating the layers makes it easier to evolve each layer •  Separating services

-  Scaling block management independent of namespace is simpler

-  Simplifies Namespace and scaling it,

Block Storage could be a generic service •  Namespace is one of the applications to use the service

•  Other services can be built directly on Block Storage -  HBase

-  MR Tmp

-  Foreign namespaces

Limitation – Tight Coupling

Page 8

Page 9: HDFS Futures: NameNode Federation for Improved Efficiency and Scalability

Page 9

Stated Problem

Isolation is a problem

for even small clusters!

Page 10: HDFS Futures: NameNode Federation for Improved Efficiency and Scalability

HDFS Federation

•  Multiple independent Namenodes and Namespace Volumes in a cluster

-  Namespace Volume = Namespace + Block Pool

•  Block Storage as generic storage service -  Set of blocks for a Namespace Volume is called a Block Pool -  DNs store blocks for all the Namespace Volumes – no partitioning

Datanode  1   Datanode  2 Datanode  m ... ... ...

                   NS1 Foreign  NS  n

... ...                    NS  k

                       Block    Pools

Pool    n Pool    k Pool    1

NN-1 NN-k NN-n

Common  Storage

Blo

ck S

tora

ge

Nam

espa

ce

Page 10

Page 11: HDFS Futures: NameNode Federation for Improved Efficiency and Scalability

Key Ideas & Benefits

•  Distributed Namespace: Partitioned across namenodes -  Simple and Robust due to independent

masters •  Each master serves a namespace volume •  Preserves namenode stability – little namenode

code change -  Scalability – 6K nodes, 100K tasks, 200PB and

1 billion files

HBase

Storage Service

HDFS Namespace

Alternate NN Implementation

MR tmp

Page 11

Page 12: HDFS Futures: NameNode Federation for Improved Efficiency and Scalability

Key Ideas & Benefits

•  Block Pools enable generic storage service -  Enables Namespace Volumes to be

independent of each other

-  Fuels innovation and Rapid development •  New implementations of file systems and

Applications on top of block storage possible •  New block pool categories – tmp storage,

distributed cache, small object storage

•  In future, move Block Management out of namenode to separate set of nodes -  Simplifies namespace/application

implementation

•  Distributed namenode becomes significantly simpler

HBase

Storage Service

HDFS Namespace

Alternate NN Implementation

MR tmp

Page 12

Page 13: HDFS Futures: NameNode Federation for Improved Efficiency and Scalability

•  Simple design -  Little change to the Namenode, most changes in Datanode, Config and Tools

-  Core development in 4 months

-  Namespace and Block Management remain in Namenode

•  Block Management could be moved out of namenode in the future

•  Little impact on existing deployments

-  Single namenode configuration runs as is

•  Datanodes provide storage services for all the namenodes

-  Register with all the namenodes

-  Send periodic heartbeats and block reports to all the namenodes

-  Send block received/deleted for a block pool to corresponding namenode

HDFS Federation Details

Page 13

Page 14: HDFS Futures: NameNode Federation for Improved Efficiency and Scalability

HDFS Federation Details

•  Cluster Web UI for better manageability -  Provides cluster summary

-  Includes namenode list and summary of namenode status

-  Decommissioning status

•  Tools -  Decommissioning works with multiple namespace

-  Balancer works with multiple namespaces

•  Both Datanode storage or Block Pool storage can be balanced

•  Namenode can be added/deleted in Federated cluster -  No need to restart the cluster

•  Single configuration for all the nodes in the cluster Page 14

Page 15: HDFS Futures: NameNode Federation for Improved Efficiency and Scalability

Managing Namespaces

•  Federation has multiple namespaces -  Don’t you need a single global namespace?

-  Some tenants want private namespace

-  Global? Key is to share the data and the names used to access the data

•  A single global namespace is one way share

home project

NS1 NS3 NS2

NS4

tmp

/ Client-side mount-table

data

Page 15

Page 16: HDFS Futures: NameNode Federation for Improved Efficiency and Scalability

Managing Namespaces

•  Client-side mount table is another way to share -  Shared mount-table => “global” shared

view

-  Personalized mount-table => per-application view

•  Share the data that matter by mounting it

•  Client-side implementation of mount tables -  No single point of failure

-  No hotspot for root and top level directories

home project

NS1 NS3 NS2

NS4

tmp

/ Client-side mount-table

data

Page 16

Page 17: HDFS Futures: NameNode Federation for Improved Efficiency and Scalability

Next Steps

•  Complete separation of namespace and block management layers -  Block storage as generic service

•  Partial namespace in memory for further scalability

•  Move partial namespace from one namenode to another -  Namespace operation - no data copy

Page 17

Page 18: HDFS Futures: NameNode Federation for Improved Efficiency and Scalability

•  Namenode as a container for namespaces -  Lots of small namespace volumes

•  Chosen per user/tenant/data feed

•  Mount tables for unified namespace

-  Can be managed by a central volume server

-  Move namespace from one container to another for balancing

•  Combined with partial namespace -  Choose number of namenodes to match

•  Sum of (Namespace working set)

•  Sum of (Namespace throughput)

Next Steps

Datanode Datanode …  

Namenodes

Page 18

Page 19: HDFS Futures: NameNode Federation for Improved Efficiency and Scalability

Thank You

More information

1.  HDFS-1052: HDFS Scalability with multiple namenodes 2.  Hadoop – 7426: user guide for how to use viewfs with federation 3.  An Introduction to HDFS Federation – https://hortonworks.com/an-introduction-to-hdfs-federation/

Page 19

Page 20: HDFS Futures: NameNode Federation for Improved Efficiency and Scalability

Other Resources

Page 20 © Hortonworks Inc. 2012

• Next webinar: Improve Hive and HBase Integration –  May 2, 2012 @ 10am PST –  Register now :http://hortonworks.com/webinars/

• Hadoop Summit – June 13-14 – San Jose, California – www.Hadoopsummit.org

• Hadoop Training and Certification – Developing Solutions Using Apache Hadoop – Administering Apache Hadoop – http://hortonworks.com/training/

Page 21: HDFS Futures: NameNode Federation for Improved Efficiency and Scalability

Backup slides

Page 21

Page 22: HDFS Futures: NameNode Federation for Improved Efficiency and Scalability

HDFS Federation Across Clusters

home

project data

tmp

/

Cluster 1

home

project data tmp

/

Cluster 2

Application mount-table in Cluster 1

Application mount-table in Cluster 2

Page 22