HDFS deep dive Zoltan C. Toth [email protected] @zoltanctoth

HDFS Deep Dive

Apr 11, 2017

Transcript
Page 1: HDFS Deep Dive

HDFS deep dive
Zoltan C. Toth

[email protected] @zoltanctoth

Page 2: HDFS Deep Dive

Hadoop File Systems

• Local file://

• HDFS hdfs://

• S3 s3://

• Azure wasb://

• FTP ftp://

• WebHDFS, SecureWebHDFS (swebhdfs://)

Page 3: HDFS Deep Dive

HDFS Design

• It is a distributed file system

• Should handle very large files

• Streaming Data Access: Write once, read many times

• Able to run on commodity hardware: fault tolerance is built in

Page 4: HDFS Deep Dive

Concepts

• Replication: 3 replicas by default, configurable

• Block-based, with big blocks: Unix: 4 KB, HDFS: 64-256 MB
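A minimal sketch of the two concepts above: a file is cut into fixed-size blocks, and each block is stored on 3 nodes. The constants and helper names are illustrative, not Hadoop API.

```python
# Illustrative sketch: block count and raw storage cost under replication.
import math

BLOCK_SIZE = 128 * 1024 * 1024   # 128 MB, a common HDFS block size
REPLICATION = 3                  # default replication factor

def block_count(file_size: int) -> int:
    """Number of HDFS blocks needed for a file of `file_size` bytes."""
    return max(1, math.ceil(file_size / BLOCK_SIZE))

def raw_storage(file_size: int) -> int:
    """Raw bytes consumed across the cluster. The last block is not
    padded, so only the actual file size is multiplied."""
    return file_size * REPLICATION

one_gb = 1024 ** 3
print(block_count(one_gb))   # 8 blocks of 128 MB
print(raw_storage(one_gb))   # 3 GB of raw disk across the cluster
```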

Page 5: HDFS Deep Dive

Blocks

• Makes data abstraction simpler

• Enables storing files larger than any single disk

• Makes replication easier

• Blocks are not filled up with empty space

• Mappers usually work on 1 block at a time
Big blocks -> fewer mappers -> might be slow
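The "one mapper per block" rule above means the block size directly controls parallelism. A hypothetical back-of-the-envelope helper (not Hadoop API):

```python
# Illustrative: how many map tasks a file produces at a given block size.
import math

def num_mappers(file_size: int, block_size: int) -> int:
    """One mapper per block, at least one mapper per file."""
    return max(1, math.ceil(file_size / block_size))

ten_gb = 10 * 1024 ** 3
print(num_mappers(ten_gb, 64 * 1024 ** 2))    # 160 mappers
print(num_mappers(ten_gb, 256 * 1024 ** 2))   # 40 mappers: fewer, longer tasks
```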

Page 6: HDFS Deep Dive

Permissions

• Permissions like in UNIX:

rwxrwxrwx

• x is ignored for files, required for listing a directory

• Hadoop Security can be enabled
Off by default; the HDFS user is simply the client’s username
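The rwxrwxrwx triplets above are the same 9 permission bits as in UNIX. A small sketch (the helper name is made up) of how an octal mode maps to that string:

```python
# Illustrative: render UNIX-style permission bits as rwxrwxrwx.
def mode_to_string(mode: int) -> str:
    """Convert a 9-bit octal mode to its owner/group/other rwx string."""
    bits = "rwxrwxrwx"
    return "".join(c if mode & (1 << (8 - i)) else "-"
                   for i, c in enumerate(bits))

print(mode_to_string(0o755))  # rwxr-xr-x  (typical directory: x = listable)
print(mode_to_string(0o644))  # rw-r--r--  (typical file: x is ignored)
```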

Page 7: HDFS Deep Dive

Directory structure

• /tmp

• /user/<username>

• /var/log

• … It’s up to you

Page 8: HDFS Deep Dive

Architecture

(Diagram: a client communicating with the NameNode and three DataNodes)

Page 9: HDFS Deep Dive

NameNode

• Responsible for managing the filesystem:
- filesystem tree
- permissions
- file-block mappings
- block-DataNode mappings
- DataNode locations
- arbitrary metadata

• Namespace image and edit log (fsimage, edits); stores everything in-memory
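A toy picture of the metadata listed above (all names are illustrative): one map from files to block IDs, another from blocks to the DataNodes holding them. The tree and file-block mapping are persisted in fsimage/edits; block locations are rebuilt from DataNode reports at startup.

```python
# Illustrative model of the NameNode's in-memory metadata.
file_to_blocks = {
    "/user/alice/data.csv": ["blk_1001", "blk_1002"],
}
block_to_datanodes = {
    "blk_1001": ["dn1", "dn2", "dn3"],
    "blk_1002": ["dn2", "dn3", "dn4"],
}

def locate(path: str) -> list:
    """For each block of `path`, the DataNodes that can serve it."""
    return [block_to_datanodes[b] for b in file_to_blocks[path]]

print(locate("/user/alice/data.csv"))
```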

Page 10: HDFS Deep Dive

NN Data protection

• No overwrite by default

• Safe Mode: read-only mode, entered e.g. in case of a failure or low free space

Page 11: HDFS Deep Dive

DataNode

• Stores and retrieves blocks

• Reports health to NN

• At startup: reports its block data to the NN

Page 12: HDFS Deep Dive

DN Block Caching

• Stores frequently accessed blocks in memory

• Configurable
E.g. frequently accessed small files can be replicated to all nodes and cached

Page 13: HDFS Deep Dive

Replica Management

• Network distance metric:

1. same node
2. same rack
3. same data center
4. different data center
(the network topology has to be configured)

• NN keeps track of under-replicated blocks

• Cluster can get unbalanced: fix with hadoop balancer
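The under-replication bookkeeping above can be sketched like this (structures are illustrative): compare each block's live replica count with the target factor and record the deficit.

```python
# Illustrative: finding under-replicated blocks.
TARGET = 3  # desired replication factor

block_replicas = {
    "blk_1": ["dn1", "dn2", "dn3"],
    "blk_2": ["dn1"],            # e.g. two DataNodes died
}

# Map each deficient block to the number of extra copies it needs.
under_replicated = {b: TARGET - len(dns)
                    for b, dns in block_replicas.items()
                    if len(dns) < TARGET}
print(under_replicated)  # {'blk_2': 2}
```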

Page 14: HDFS Deep Dive

Reading a file

Uses a proximity-prioritized list of DataNodes
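A sketch of what "proximity-prioritized" means, using the 4-level distance metric from the previous slide. The "/datacenter/rack/node" topology strings are illustrative.

```python
# Illustrative: sort replica locations by network distance from the client.
def distance(a: str, b: str) -> int:
    """0 = same node, 1 = same rack, 2 = same DC, 3 = different DC."""
    da, ra, na = a.strip("/").split("/")
    db, rb, nb = b.strip("/").split("/")
    if da != db:
        return 3
    if ra != rb:
        return 2
    if na != nb:
        return 1
    return 0

client = "/dc1/rack1/node1"
replicas = ["/dc2/rack9/node3", "/dc1/rack2/node7", "/dc1/rack1/node2"]
print(sorted(replicas, key=lambda r: distance(client, r)))
# nearest first: same rack, then same DC, then the other data center
```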

Page 15: HDFS Deep Dive

Writing a file

1. File exists?
2. Permissions granted?
3. OK: create the file record
4. Stream data to the DataNodes

(Pipeline uses a data queue and an ack queue)

Default placement strategy:
1. same node
2. a different rack
3. same rack as (2), but on a different node
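The three-step placement strategy above can be sketched as follows; the node/rack tuples are an assumption for illustration, not Hadoop's placement-policy API.

```python
# Illustrative: default replica placement for one block.
def place_replicas(writer_node, writer_rack, nodes):
    """nodes: list of (node, rack) pairs. Returns 3 chosen (node, rack)."""
    first = (writer_node, writer_rack)                  # 1. same node as writer
    second = next((n, r) for n, r in nodes
                  if r != writer_rack)                  # 2. a different rack
    third = next((n, r) for n, r in nodes
                 if r == second[1] and n != second[0])  # 3. same rack as 2nd,
    return [first, second, third]                       #    different node

cluster = [("n1", "rackA"), ("n2", "rackA"), ("n3", "rackB"), ("n4", "rackB")]
print(place_replicas("n1", "rackA", cluster))
# [('n1', 'rackA'), ('n3', 'rackB'), ('n4', 'rackB')]
```

This gives one fast local write plus two off-node copies that survive a whole-rack failure.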

Page 16: HDFS Deep Dive

Moving a file

A simple NameNode FS tree update

Page 17: HDFS Deep Dive

Interfaces• CLI

hdfs dfs -ls /

• Java
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create("hdfs://…"), conf);

• libhdfs
#include "hdfs.h"

• Fuse
Filesystem mount

Page 18: HDFS Deep Dive

Problems

Page 19: HDFS Deep Dive

HDFS weaknesses

• Not suitable for low-latency data access

• Slow if you have lots of small files
(per-file access overhead)

• No arbitrary file modification; it’s append-only

• NameNode is a single point of failure

• Everything in the NN’s memory: Scalability issues
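The last bullet can be made concrete with a rough estimate. The ~150 bytes per in-memory object is a commonly quoted rule of thumb, not an exact figure, and the helper is hypothetical.

```python
# Illustrative: why many small files hurt the NameNode's heap.
BYTES_PER_OBJECT = 150  # rough rule of thumb per file/block object

def nn_heap_estimate(files: int, blocks_per_file: int) -> int:
    """Approximate NameNode heap (bytes) for the given namespace size."""
    objects = files + files * blocks_per_file   # file objects + block objects
    return objects * BYTES_PER_OBJECT

# 100 million single-block files: roughly 28 GB of heap just for metadata
print(nn_heap_estimate(100_000_000, 1) / 1024 ** 3)
```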

Page 20: HDFS Deep Dive

Secondary NameNode

• Store the namespace image and edit log on NFS

• Secondary NameNode (not a NameNode!)
Stores the namespace image and periodically merges the edit log into it
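A toy model of that periodic merge (structures are illustrative): replay the edit log onto the last fsimage to produce a fresh image, after which the edit log can be truncated.

```python
# Illustrative: checkpointing = fsimage + replayed edits.
fsimage = {"/user": "dir"}
edits = [("create", "/user/alice", "dir"),
         ("create", "/user/alice/f.txt", "file"),
         ("delete", "/user/alice/f.txt", None)]

def checkpoint(image, edit_log):
    """Apply each logged operation to a copy of the image."""
    merged = dict(image)
    for op, path, kind in edit_log:
        if op == "create":
            merged[path] = kind
        elif op == "delete":
            merged.pop(path, None)
    return merged

print(checkpoint(fsimage, edits))  # {'/user': 'dir', '/user/alice': 'dir'}
```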

Page 21: HDFS Deep Dive

High Availability NN

• Active-standby configuration of 2 NameNodes

• Failover is usually managed by ZooKeeper

• HA shared storage for the edit log:
- NFS (disfavoured due to multiple writers)
- Quorum Journal Manager (QJM): 3 nodes store each edit (configurable)

Each edit must be written to the majority of the journal nodes
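The majority rule above is just an integer comparison; a minimal sketch (function name is illustrative):

```python
# Illustrative: an edit is durable once a quorum of JournalNodes ack it.
def committed(acks: int, journal_nodes: int = 3) -> bool:
    """True if `acks` is a strict majority of `journal_nodes`."""
    return acks > journal_nodes // 2

print(committed(2))      # True: 2 of 3 is a majority
print(committed(1))      # False: 1 of 3 is not
print(committed(3, 5))   # True: 3 of 5
```

With 3 JournalNodes the cluster tolerates the loss of any single one.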

Page 22: HDFS Deep Dive

HDFS federation

• Each NN manages one namespace volume, e.g.:
NN-1: /user
NN-2: /data

• Configuration must be done on the client side
NNs don’t know about each other
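The client-side configuration above amounts to a mount table mapping path prefixes to NameNodes (similar in spirit to ViewFs; the names here are illustrative):

```python
# Illustrative: resolving a path to the NameNode owning its namespace volume.
mount_table = {"/user": "NN-1", "/data": "NN-2"}

def resolve(path: str) -> str:
    """Pick the NameNode whose volume prefix covers `path`."""
    for prefix, nn in mount_table.items():
        if path == prefix or path.startswith(prefix + "/"):
            return nn
    raise KeyError(path)

print(resolve("/user/alice"))  # NN-1
print(resolve("/data/logs"))   # NN-2
```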

Page 23: HDFS Deep Dive

Thanks!
Zoltan Toth