11: Google Filesystem
Zubair Nabi
[email protected]
April 20, 2013
Outline
1 Introduction
2 Google Filesystem
3 Hadoop Distributed Filesystem
Filesystem
The purpose of a filesystem is to:
1 Organize and store data
2 Support sharing of data among users and applications
3 Ensure persistence of data after a reboot
Examples include FAT, NTFS, ext3, ext4, etc.
Distributed filesystem
Self-explanatory: the filesystem is distributed across many machines
The DFS provides a common abstraction over the dispersed files
Each DFS exposes an API that provides clients with the normal file operations, such as create, read, write, etc.
Maintains a namespace which maps logical names to physical names
  Simplifies replication and migration
Examples include the Network Filesystem (NFS), Andrew Filesystem (AFS), etc.
Outline
1 Introduction
2 Google Filesystem
3 Hadoop Distributed Filesystem
Introduction
Designed by Google to meet its massive storage needs
Shares many goals with previous distributed filesystems, such as performance, scalability, reliability, and availability
At the same time, the design is driven by key observations of Google's workloads and infrastructure, both current and future
Design Goals
1 Failure is the norm rather than the exception: GFS must constantly introspect and automatically recover from failure
2 The system stores a fair number of large files: Optimize for large files, on the order of GBs, but still support small files
3 Applications prefer to do large streaming reads of contiguous regions: Optimize for this case
Design Goals (2)
4 Most applications perform large, sequential writes that are mostly append operations: Support small writes but do not optimize for them
5 Many workloads are producer-consumer queues or many-way merging: Support concurrent reads and writes by hundreds of clients simultaneously
6 Applications process data in bulk at a high rate: Favour throughput over latency
Interface
The interface is similar to traditional filesystems, but there is no support for a standard POSIX-like API
Files are organized hierarchically into directories and identified by pathnames
Support for create, delete, open, close, read, and write operations
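To make the interface concrete, here is a minimal Java sketch of what a GFS-style client API could look like; the interface name and signatures are invented for illustration (the slide lists only the operation names, and the real client library is not public).

    // Hypothetical sketch of a GFS-style client API. Names and signatures are
    // assumptions for exposition only.
    public interface GfsClient {
        void create(String path);                        // create an empty file at a pathname
        void delete(String path);                        // remove the file
        long open(String path);                          // returns an opaque file handle
        void close(long handle);
        int read(long handle, long offset, byte[] buf);  // read into buf starting at a byte offset
        void write(long handle, long offset, byte[] data);
    }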
Architecture
Consists of a single master and multiple chunkservers
The system can be accessed by multiple clients
Both the master and chunkservers run as user-space server processes on commodity Linux machines
Files
Files are sliced into fixed-size chunks
Each chunk is identified by an immutable and globally unique 64-bit handle
Chunks are stored by chunkservers as local Linux files
Reads and writes to a chunk are specified by a handle and a byte range
Each chunk is replicated on multiple chunkservers
  3 by default
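A minimal Java sketch of how a read could be addressed under this scheme: a 64-bit chunk handle plus a byte range. The type and field names are assumptions, not the real GFS wire format.

    public record ChunkReadRequest(long chunkHandle, long offset, int length) {

        public ChunkReadRequest {
            // Basic sanity checks on the byte range.
            if (offset < 0 || length <= 0) {
                throw new IllegalArgumentException("offset must be >= 0 and length > 0");
            }
        }

        // Example: read 4 KB starting at byte offset 1 MB of chunk 0x1234.
        public static ChunkReadRequest example() {
            return new ChunkReadRequest(0x1234L, 1L << 20, 4096);
        }
    }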
Master
In charge of all filesystem metadata
  Namespace, access control information, mapping between files and chunks, and current locations of chunks
  Holds this information in memory and regularly syncs it with a log file
Also in charge of chunk leasing, garbage collection, and chunk migration
Periodically sends each chunkserver a heartbeat signal to check its state and send it instructions
Clients interact with it to access metadata, but all data-bearing communication goes directly to the relevant chunkservers
  As a result, the master does not become a performance bottleneck
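A rough Java sketch of the metadata the master holds in memory, as listed above; the class and map layout are assumptions for illustration (a real master would also persist mutations to the operation log).

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class MasterMetadata {
        // Namespace: full pathname -> ordered chunk handles making up the file.
        private final Map<String, List<Long>> fileToChunks = new HashMap<>();
        // Chunk handle -> chunkservers currently holding a replica of that chunk.
        private final Map<Long, List<String>> chunkLocations = new HashMap<>();

        public List<Long> chunksOf(String path) {
            return fileToChunks.getOrDefault(path, List.of());
        }

        public List<String> locationsOf(long chunkHandle) {
            return chunkLocations.getOrDefault(chunkHandle, List.of());
        }
    }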
Consistency Model: Master
All namespace mutations (such as file creation) are atomic, as they are exclusively handled by the master
Namespace locking guarantees atomicity and correctness
The operation log maintained by the master defines a global total order of these operations
Consistency Model: Data
The state of a file region after a mutation depends on:
  Mutation type: write or append
  Whether it succeeds or fails
  Whether there are other concurrent mutations
A file region is consistent if all clients see the same data, regardless of the replica
A region is defined after a mutation if it is still consistent and clients see the mutation in its entirety
Consistency Model: Data (2)
If there are no other concurrent writers, the region is defined and consistent
Concurrent and successful mutations leave the region undefined but consistent
  Mingled fragments from multiple mutations
A failed mutation makes the region both inconsistent and undefined
Mutation Operations
Each chunk has many replicas
The primary replica holds a lease from the master
It decides the order of all mutations for all replicas
Write Operation
The client obtains the locations of the replicas and the identity of the primary replica from the master
It then pushes the data to all replica nodes
The client issues an update request to the primary
The primary forwards the write request to all replicas
It waits for a reply from all replicas before returning to the client
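The flow above can be summarised in a short Java sketch; the interfaces and method names are invented for illustration and only mirror the steps listed, not the actual GFS protocol.

    import java.util.List;

    public class WriteFlowSketch {

        interface Master  { ReplicaSet lookup(long chunkHandle); }
        interface Replica { void pushData(byte[] data); boolean apply(long mutationId); }
        record ReplicaSet(Replica primary, List<Replica> secondaries) {}

        // Returns true only once the primary reports that every replica applied the write.
        static boolean write(Master master, long chunkHandle, byte[] data, long mutationId) {
            ReplicaSet rs = master.lookup(chunkHandle);   // 1. replica locations + primary from master
            rs.primary().pushData(data);                  // 2. push data to all replicas
            for (Replica r : rs.secondaries()) {
                r.pushData(data);
            }
            return rs.primary().apply(mutationId);        // 3-5. primary orders the mutation, forwards
                                                          //      it, and waits for all replicas to ack
        }
    }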
Record Append Operation
Performed atomically
Append location is chosen by GFS and communicated to the client
The primary forwards the write request to all replicas
It waits for a reply from all replicas before returning to the client
1 If the record fits in the current chunk, it is written and communicated to the client
2 If it does not, the chunk is padded and the client is told to try the next chunk
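A small Java sketch of the primary's fit-or-pad decision described above. The 64 MB chunk size is taken from the GFS paper, and the result types are assumptions for illustration.

    public class RecordAppendSketch {
        static final long CHUNK_SIZE = 64L * 1024 * 1024;  // 64 MB chunks (per the GFS paper)

        sealed interface AppendResult permits Appended, RetryNextChunk {}
        record Appended(long offset) implements AppendResult {}
        record RetryNextChunk() implements AppendResult {}

        // currentEnd: bytes already used in the chunk; recordLen: size of the incoming record.
        static AppendResult tryAppend(long currentEnd, int recordLen) {
            if (currentEnd + recordLen <= CHUNK_SIZE) {
                return new Appended(currentEnd);  // record fits: written at the offset GFS chose
            }
            // Otherwise the remainder of the chunk is padded and the client retries on the next chunk.
            return new RetryNextChunk();
        }
    }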
Application Safeguards
Use record append rather than write
Insert checksums in record headers to detect fragments
Insert sequence numbers to detect duplicates
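A Java sketch of such a self-describing record; the layout is an assumption, but it shows how a per-record checksum catches padding and fragments while a sequence number exposes duplicates.

    import java.util.zip.CRC32;

    public record SafeRecord(long sequenceNumber, long checksum, byte[] payload) {

        static SafeRecord of(long sequenceNumber, byte[] payload) {
            return new SafeRecord(sequenceNumber, crc(payload), payload);
        }

        boolean isIntact() {
            return checksum == crc(payload);  // fails on padding or mingled fragments
        }

        private static long crc(byte[] data) {
            CRC32 crc = new CRC32();
            crc.update(data);
            return crc.getValue();
        }
    }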
Chunk Placement
Put new chunks on chunkservers with below-average disk space usage
Limit the number of “recent” creations on a chunkserver, to ensure that it does not experience a traffic spike due to its fresh data
For reliability, spread replicas across racks
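A Java sketch of these placement heuristics; the fields, the recent-creation cap, and the rack-spreading strategy are assumptions for illustration, not the master's actual policy.

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    public class PlacementSketch {
        record ChunkServer(String host, String rack, double diskUsage, int recentCreations) {}

        static final int MAX_RECENT_CREATIONS = 10;  // assumed cap on "recent" creations

        static List<ChunkServer> pickReplicas(List<ChunkServer> servers, int replicas) {
            double avgUsage = servers.stream().mapToDouble(ChunkServer::diskUsage).average().orElse(1.0);
            List<ChunkServer> candidates = servers.stream()
                    .filter(s -> s.diskUsage() < avgUsage)                    // below-average disk usage
                    .filter(s -> s.recentCreations() < MAX_RECENT_CREATIONS)  // avoid creation hotspots
                    .sorted(Comparator.comparingDouble(ChunkServer::diskUsage))
                    .toList();

            List<ChunkServer> chosen = new ArrayList<>();
            Set<String> racksUsed = new HashSet<>();
            for (ChunkServer s : candidates) {  // first pass: at most one replica per rack
                if (chosen.size() == replicas) break;
                if (racksUsed.add(s.rack())) chosen.add(s);
            }
            for (ChunkServer s : candidates) {  // second pass: fill up if there are not enough racks
                if (chosen.size() == replicas) break;
                if (!chosen.contains(s)) chosen.add(s);
            }
            return chosen;
        }
    }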
Garbage Collection
Chunks become garbage when they are orphaned
A lazy reclamation strategy is used: chunks are not reclaimed at delete time
Each chunkserver reports the subset of chunks it currently holds to the master in the heartbeat message
The master pinpoints chunks which have been orphaned
The chunkserver finally reclaims that space
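A minimal Java sketch of the heartbeat handshake above, assuming the master keeps a set of live chunk handles: anything a chunkserver reports that the master no longer knows about is orphaned and can be reclaimed.

    import java.util.HashSet;
    import java.util.Set;

    public class GarbageCollectionSketch {

        // Chunk handles the master still has metadata for.
        private final Set<Long> liveChunks = new HashSet<>();

        // Called while processing a chunkserver heartbeat: returns the orphaned handles.
        public Set<Long> findOrphans(Set<Long> reportedByChunkserver) {
            Set<Long> orphans = new HashSet<>(reportedByChunkserver);
            orphans.removeAll(liveChunks);  // unknown to the master => garbage
            return orphans;
        }
    }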
Stale Replica Detection
Each chunk is assigned a version number
Each time a new lease is granted, the version number is incremented
Stale replicas will have outdated version numbers
They are simply garbage collected
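In Java, the version-number check above boils down to a comparison; this sketch (illustrative only) bumps the version on each lease grant and flags any replica reporting an older one.

    public class StaleReplicaSketch {
        private long masterVersion = 0;

        public long grantLease() {
            return ++masterVersion;  // granting a new lease increments the chunk version
        }

        public boolean isStale(long replicaVersion) {
            return replicaVersion < masterVersion;  // outdated version => stale replica
        }
    }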
Outline
1 Introduction
2 Google Filesystem
3 Hadoop Distributed Filesystem
Introduction
Open-source clone of GFS
Comes packaged with Hadoop
The master is called the NameNode and chunkservers are called DataNodes
Chunks are known as blocks
Exposes a Java API and a command-line interface
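As a taste of the Java API, the snippet below writes a small file and reads it back through org.apache.hadoop.fs.FileSystem; the path is illustrative and the cluster configuration is assumed to be on the classpath.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();         // reads core-site.xml / hdfs-site.xml
            FileSystem fs = FileSystem.get(conf);

            Path file = new Path("/user/demo/hello.txt");     // illustrative path
            try (FSDataOutputStream out = fs.create(file)) {  // create a new file
                out.writeUTF("Hello, HDFS");
            }

            try (FSDataInputStream in = fs.open(file)) {      // streaming read
                System.out.println(in.readUTF());
            }

            fs.delete(file, false);                           // non-recursive delete
            fs.close();
        }
    }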
Command-line API
Accessible through: bin/hdfs dfs -command args
Useful commands: cat, copyFromLocal, copyToLocal, cp, ls, mkdir, moveFromLocal, moveToLocal, mv, rm, etc. (see http://hadoop.apache.org/docs/r1.0.4/file_system_shell.html)
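For example, a typical session using a few of the commands above might look like this (the paths are illustrative):

    bin/hdfs dfs -mkdir /user/demo
    bin/hdfs dfs -copyFromLocal input.txt /user/demo/input.txt
    bin/hdfs dfs -ls /user/demo
    bin/hdfs dfs -cat /user/demo/input.txt
    bin/hdfs dfs -rm /user/demo/input.txt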
References
1 Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. 2003. The Google File System. In Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles (SOSP ’03). ACM, New York, NY, USA, 29-43.