
AHUG Presentation: Fun with Hadoop File Systems

Jun 15, 2015

The Presentation given by Brad Childs on April 25, 2013 at the Austin Hadoop Users Group (AHUG): Fun with Hadoop File Systems
Transcript
Page 1: AHUG Presentation: Fun with Hadoop File Systems

FUN WITH HADOOP FILE SYSTEMS

© Bradley Childs / [email protected]

Page 2

HISTORY

•  Distributed file systems have been around for a long time

•  DFS designs battle over the trade-offs described by the CAP theorem

•  Hadoop's DFS implementation is called HDFS

•  With Hadoop's wide adoption, users have been forced to use HDFS as their only option

•  HDFS has technical trade-offs and limitations

Page 3

HDFS ARCHITECTURE

[Diagram: clients talk to a single Name Node for metadata and to multiple Data Nodes, which both store data and run compute ("Store & Compute")]

Page 4

HDFS ISSUES

Handy:

•  Locking around metadata operations, permitted by the single Name Node

•  File locking, permitted by the single Name Node

Frustrating:

•  Difficult to get data in and out (ingest)

•  Name Node is a single point of failure

•  Name Node is a system bottleneck

Page 5

GLUSTER FILE SYSTEM

Gluster is an open-source, multi-purpose DFS.

Features:

•  Data striping

•  Global elastic hashing for file placement

•  Basic and geo-replication

•  Fully POSIX-compliant interface

•  Flexible architecture

•  Supports storage-resident apps: compute and data on the same machine

More info: www.gluster.org

Page 6

GLUSTER ARCHITECTURE

[Diagram: clients talk to a pool of trusted peers; data bricks are grouped into volumes, which both store data and run compute ("Store & Compute")]

Page 7

HCFS

HCFS: Hadoop Compatible File System

•  Implementing the o.a.h.fs.FileSystem interface is not enough for existing Hadoop jobs to run on a different file system

•  The HDFS architecture created semantics and assumptions

•  HCFS defines these semantics so any file system can replace HDFS without fear of incompatibility

•  An open, ongoing effort to define file system semantics decoupled from architecture

JIRA: issues.apache.org/jira/browse/HADOOP-9371
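In practice, plugging an HCFS implementation into Hadoop is a matter of configuration: Hadoop maps a URI scheme to a FileSystem class and can use that scheme as the default. The fragment below is a sketch of a core-site.xml for the glusterfs-hadoop plugin; the exact property names, class name, and URI form should be checked against the plugin's documentation for your version.

```xml
<!-- core-site.xml: pointing Hadoop at an HCFS implementation instead of HDFS.
     The glusterfs scheme, class, and URI below are illustrative of the
     glusterfs-hadoop plugin; verify them against the plugin's docs. -->
<configuration>
  <property>
    <name>fs.glusterfs.impl</name>
    <value>org.apache.hadoop.fs.glusterfs.GlusterFileSystem</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>glusterfs:///</value>
  </property>
</configuration>
```

With the default file system switched, existing jobs run unchanged — which is exactly why the HCFS effort to pin down the semantics they silently rely on matters.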

Page 8

COMMON FILESYSTEM ATTRIBUTES

•  Hierarchical structure of directories containing directories and files

•  Files contain between 0 and MAX_SIZE bytes of data

•  Directories contain 0 or more files or directories

•  Directories have no data, only child elements
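A minimal sketch of these attributes, checked against the local file system with the JDK's java.nio (not HDFS itself); directory and file names are illustrative:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Checks the common attributes from the slide against the local file system:
// directories nest, a 0-byte file is legal, and a directory is not file data.
public class FsAttributesDemo {

    static boolean check() throws IOException {
        Path root = Files.createTempDirectory("hcfs-demo");
        Path sub = Files.createDirectory(root.resolve("sub"));   // directories contain directories
        Path empty = Files.createFile(sub.resolve("empty.dat")); // files contain 0..MAX_SIZE bytes

        boolean ok = Files.isDirectory(sub)
                && Files.size(empty) == 0          // a 0-byte file is valid
                && !Files.isRegularFile(sub);      // a directory has no data of its own

        Files.delete(empty);
        Files.delete(sub);
        Files.delete(root);
        return ok;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(check());
    }
}
```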

Page 9

NETWORK ASSUMPTIONS

•  The final state of a file system after a network failure is undefined

•  The immediate consistency state of a file system after a network failure is undefined

•  If a network failure can be reported to the client, the failure MUST be an instance of IOException

Page 10

NETWORK FAILURE

•  Any operation with a file system MAY signal an error by throwing an instance of IOException

•  File system operations MUST NOT throw RuntimeException on the failure of remote operations, authentication, or other operational problems

•  Stream read operations MAY fail if the read channel has been idle for a file-system-specific period of time

•  Stream write operations MAY fail if the write channel has been idle for a file-system-specific period of time

•  Network failures MAY be raised in the stream close() operation
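The last bullet trips up many clients: buffered data is often flushed at close(), so close() itself can report a network failure. A small sketch, using only the JDK and a stand-in stream that simulates the failure (FlakyStream is hypothetical, not a Hadoop class):

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Demonstrates that clients must handle IOException from close(), not just
// from read()/write(). The failure here is simulated, standing in for a
// remote file system flushing buffered data at close() time.
public class CloseFailureDemo {

    // Hypothetical stand-in for a remote stream whose final flush fails.
    static class FlakyStream extends FilterOutputStream {
        FlakyStream(OutputStream out) { super(out); }
        @Override public void close() throws IOException {
            throw new IOException("simulated network failure during close()");
        }
    }

    // Returns the failure message a client observes; try-with-resources makes
    // the close() failure surface in the same catch block as write() failures.
    static String writeAndClose() {
        try (OutputStream out = new FlakyStream(OutputStream.nullOutputStream())) {
            out.write("payload".getBytes());
        } catch (IOException e) {   // MUST be IOException, never RuntimeException
            return e.getMessage();
        }
        return "no failure";
    }

    public static void main(String[] args) {
        System.out.println(writeAndClose());
    }
}
```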

Page 11

ATOMICITY

•  Rename of a file MUST be atomic

•  Rename of a directory SHOULD be atomic

•  Delete of a file MUST be atomic

•  Delete of an empty directory MUST be atomic

•  Recursive directory deletion MAY be atomic. Although HDFS offers atomic recursive directory deletion, none of the other file systems that Hadoop supports offers such a guarantee, including the local file systems

•  mkdir() SHOULD be atomic

•  mkdirs() MAY be atomic. [It is currently atomic on HDFS, but this is not the case for most other file systems, and cannot be guaranteed for future versions of HDFS]
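The MUST-be-atomic cases can be sketched against the local file system with the JDK's java.nio; ATOMIC_MOVE either fully succeeds or throws, which is the guarantee HCFS asks of rename. Paths here are illustrative.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch of the MUST-be-atomic operations: file rename, file delete, and
// delete of an empty directory, shown on the local file system.
public class AtomicityDemo {

    static String demo() throws IOException {
        Path dir = Files.createTempDirectory("hcfs-demo");
        Path src = Files.writeString(dir.resolve("a.txt"), "data");
        Path dst = dir.resolve("b.txt");

        // Rename of a file MUST be atomic: ATOMIC_MOVE leaves no
        // half-renamed intermediate state behind.
        Files.move(src, dst, StandardCopyOption.ATOMIC_MOVE);
        if (Files.exists(src) || !Files.exists(dst)) return "rename not atomic";

        Files.delete(dst);   // delete of a file MUST be atomic
        Files.delete(dir);   // delete of an empty directory MUST be atomic
        return "ok";
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```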

Page 12

CONCURRENCY

•  The data added to a file during a write or append MAY be visible while the write operation is in progress

•  If a client opens a file for a read() operation while another read() operation is in progress, the second operation MUST succeed. Both clients MUST have a consistent view of the same data

•  If a file is deleted while a read() operation is in progress, the read() operation MAY complete successfully. Implementations MAY cause read() operations to fail with an IOException instead

•  Multiple writers MAY open a file for writing. If this occurs, the outcome is undefined

•  Undefined: the action of delete() while a write or append operation is in progress
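The two-readers rule can be sketched with the JDK against the local file system: two independent read channels on the same closed file must both succeed and see the same bytes. File names are illustrative.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of the concurrency rule: a second read() opened while another is in
// progress MUST succeed, and both readers MUST see a consistent view.
public class ConcurrentReadDemo {

    static boolean bothReadersAgree() throws IOException {
        Path f = Files.createTempFile("hcfs-demo", ".txt");
        Files.writeString(f, "shared contents");

        String r1, r2;
        // Two independent read channels open on the same file at once.
        try (InputStream in1 = Files.newInputStream(f);
             InputStream in2 = Files.newInputStream(f)) {
            r1 = new String(in1.readAllBytes());
            r2 = new String(in2.readAllBytes());
        }
        Files.delete(f);
        return r1.equals(r2) && r1.equals("shared contents");
    }

    public static void main(String[] args) throws IOException {
        System.out.println(bothReadersAgree());
    }
}
```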

Page 13

CONSISTENCY

The consistency model of a Hadoop file system is one-copy-update semantics; generally that of a traditional POSIX file system.

•  Create: once the close() operation on an output stream writing a newly created file has completed, in-cluster operations querying the file metadata and contents MUST immediately see the file and its data

•  Update: once the close() operation on an output stream writing an updated file has completed, in-cluster operations querying the file metadata and contents MUST immediately see the new data

•  Delete: once a delete() operation on a file has completed, listStatus(), open(), rename() and append() operations MUST fail

•  When a file is deleted then overwritten, listStatus(), open(), rename() and append() operations MUST succeed: the file is visible

•  Rename: after a rename has completed, operations against the new path MUST succeed; operations against the old path MUST fail

•  The consistency semantics of out-of-cluster clients MUST be the same as for in-cluster clients: all clients calling read() on a closed file MUST see the same metadata and data until it is changed by a create(), append() or rename() operation
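The Create rule can be sketched with the JDK against the local file system, which gives the same one-copy-update behavior: the moment close() returns, a listing (the listStatus() analogue) and a fresh read must both observe the new file. Names are illustrative.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Sketch of create consistency: once close() on a newly created file
// completes, its metadata and contents are immediately visible.
public class CreateVisibilityDemo {

    static boolean visibleAfterClose() throws IOException {
        Path dir = Files.createTempDirectory("hcfs-demo");
        Path f = dir.resolve("new.txt");

        try (OutputStream out = Files.newOutputStream(f)) {
            out.write("fresh data".getBytes());
        }                                          // close() completes here

        boolean listed;                            // listStatus() analogue
        try (Stream<Path> children = Files.list(dir)) {
            listed = children.anyMatch(p -> p.equals(f));
        }
        boolean readable = Files.readString(f).equals("fresh data"); // open()+read

        Files.delete(f);
        Files.delete(dir);
        return listed && readable;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(visibleAfterClose());
    }
}
```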

Page 14

REFERENCES

Apache HCFS wiki: wiki.apache.org/hadoop/HCFS

Apache file system semantics JIRA: issues.apache.org/jira/browse/HADOOP-9371

Some of this text is taken from the working draft linked in the JIRA above; credit Steve Loughran et al.

The opinions expressed do not necessarily represent those of Red Hat, Inc. or any of its affiliates.
