High-performance, scalable, distributed
file system.
Batch-oriented, data-intensive apps.
Fault-tolerant.
Inexpensive commodity hardware.
Google uses GFS to organize huge
files and to give application
developers the research and
development resources they require.
GFS is unique to Google and isn't for sale.
Inexpensive commodity hardware.
Modest number of large files.
Large streaming reads, small random
reads. (map-reduce)
Mostly appends.
Consistent concurrent execution is
important.
High sustained throughput matters more than low latency.
Control (metadata) requests to master
server
Data requests to chunkservers
Caches metadata
No caching of data
Manages metadata
Manages chunk creation, replication,
placement
Performs checkpointing and logging of
changes to metadata
Garbage collection
Periodically communicates with chunkservers
(HeartBeat messages)
• Files are divided into fixed-size chunks
• Chunkservers store chunks on local disk as Linux
files
• Each chunk has a unique 64-bit chunk handle
• Replication for reliability
• Chunk size: 64 MB, much larger than typical file
system block sizes
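Because chunks are fixed-size, a client can work out which chunk holds a given byte without asking anyone. The sketch below illustrates that arithmetic; the function names `locate` and `chunks_touched` are hypothetical and not part of GFS.

```python
from typing import Tuple

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size stated above


def locate(byte_offset: int) -> Tuple[int, int]:
    """Return (chunk_index, offset_within_chunk) for a byte offset in a file."""
    if byte_offset < 0:
        raise ValueError("byte_offset must be non-negative")
    return divmod(byte_offset, CHUNK_SIZE)


def chunks_touched(byte_offset: int, length: int) -> range:
    """Chunk indices covered by a read of `length` bytes starting at `byte_offset`."""
    if length <= 0:
        return range(0)
    first, _ = locate(byte_offset)
    last, _ = locate(byte_offset + length - 1)
    return range(first, last + 1)
```

A read that straddles a chunk boundary touches two chunks, so it may involve two different sets of chunkservers.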
[Diagram: the GFS client application sends "Create /home/user/filename" to the GFS master, which updates the operation log and the metadata. GFS chunkservers are spread across rack 1 and rack 2.]
[Diagram: Create /home/user/filename, continued. The GFS master updates the operation log, updates metadata, and chooses locations for chunks:]
• across multiple racks
• across multiple networks
• machines with low contention
• machines with low disk use
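The placement criteria above can be sketched as a greedy selection. This is only an illustration under stated assumptions: the `ChunkServer` fields, the `active_ops` measure of contention, the default of three replicas, and the function `choose_locations` are all hypothetical, and the real master's policy is not given in these slides.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChunkServer:
    name: str
    rack: str
    disk_use: float  # fraction of disk in use, 0.0 to 1.0
    active_ops: int  # hypothetical stand-in for "contention"


def choose_locations(servers: List[ChunkServer], replicas: int = 3) -> List[ChunkServer]:
    # Rank least-loaded first: low contention, then low disk use.
    ranked = sorted(servers, key=lambda s: (s.active_ops, s.disk_use))
    chosen: List[ChunkServer] = []
    used_racks = set()
    used_names = set()
    # First pass: at most one server per rack, to spread replicas across racks.
    for s in ranked:
        if len(chosen) == replicas:
            break
        if s.rack not in used_racks:
            chosen.append(s)
            used_racks.add(s.rack)
            used_names.add(s.name)
    # Second pass: if there are fewer racks than replicas, fill from the rest.
    for s in ranked:
        if len(chosen) == replicas:
            break
        if s.name not in used_names:
            chosen.append(s)
            used_names.add(s.name)
    return chosen
```

Spreading across racks first means a single rack failure cannot take out every replica of a chunk.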
[Diagram: after updating the operation log and metadata and choosing locations for chunks, the GFS master returns the chunk handle and chunk locations to the GFS client.]
[Diagram: the GFS client issues a request carrying the chunk id and chunk offset.]
[Diagram: the GFS master returns the chunkserver locations, which the GFS client caches.]
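The effect of caching chunkserver locations on the client can be sketched as follows. `FakeMaster` and `Client` are hypothetical in-memory stand-ins, not GFS interfaces, and the lookup key (file name plus chunk index) is an assumption made for illustration.

```python
from typing import Dict, List, Tuple

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks

Key = Tuple[str, int]            # (file name, chunk index)
Entry = Tuple[int, List[str]]    # (chunk handle, chunkserver locations)


class FakeMaster:
    """Hypothetical stand-in for the master's metadata lookup."""

    def __init__(self, table: Dict[Key, Entry]) -> None:
        self.table = table
        self.lookups = 0  # counts how often the master is contacted

    def lookup(self, filename: str, chunk_index: int) -> Entry:
        self.lookups += 1
        return self.table[(filename, chunk_index)]


class Client:
    def __init__(self, master: FakeMaster) -> None:
        self.master = master
        self.cache: Dict[Key, Entry] = {}

    def locate(self, filename: str, byte_offset: int) -> Tuple[int, List[str], int]:
        chunk_index, chunk_offset = divmod(byte_offset, CHUNK_SIZE)
        key = (filename, chunk_index)
        if key not in self.cache:
            # Only a cache miss costs a round trip to the master.
            self.cache[key] = self.master.lookup(filename, chunk_index)
        handle, locations = self.cache[key]
        return handle, locations, chunk_offset
```

Repeated reads within the same chunk then go straight to the chunkservers, which keeps the master off the data path.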
[Diagram: the GFS client sends the data to a GFS chunkserver, and the data is passed along to the nearest replica.]
[Diagram: the GFS client sends the write operation; all concurrent writes are serialized.]
[Diagram: the serialized order of writes is passed on to the other GFS chunkservers.]
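Why a single serialized order matters can be shown with a small sketch. It assumes one designated chunkserver (called `Primary` here) assigns serial numbers and every replica applies writes in that order; both class names and the string payloads are hypothetical.

```python
from typing import List, Tuple


class Primary:
    """Hypothetical designated replica that picks one order for concurrent writes."""

    def __init__(self) -> None:
        self.next_serial = 0

    def serialize(self, writes: List[str]) -> List[Tuple[int, str]]:
        ordered = []
        for w in writes:
            ordered.append((self.next_serial, w))
            self.next_serial += 1
        return ordered


class Replica:
    def __init__(self) -> None:
        self.applied: List[str] = []

    def apply(self, ordered: List[Tuple[int, str]]) -> None:
        # Apply strictly by serial number, regardless of arrival order.
        for _serial, w in sorted(ordered):
            self.applied.append(w)
```

Even if two replicas receive the batch in different network orders, sorting by serial number leaves them with identical contents.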
[Diagram: the write is acknowledged with an ack and the chunk index.]
Extremely cheap hardware
› High failure rate
Highly concurrent reads and writes
Highly scalable
Supports undelete (for a configurable
time)
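One way to support undelete for a configurable time is to defer the real deletion: a deleted file is only marked with its deletion time, and garbage collection reclaims it once the grace period has passed. The sketch below assumes that approach; the `Namespace` class, its method names, and the explicit `now` parameter are hypothetical.

```python
from typing import Dict, Tuple


class Namespace:
    """Hypothetical file namespace with deferred deletion."""

    def __init__(self, grace_seconds: float) -> None:
        self.grace_seconds = grace_seconds          # configurable undelete window
        self.live: Dict[str, bytes] = {}            # path -> data
        self.deleted: Dict[str, Tuple[bytes, float]] = {}  # path -> (data, deletion time)

    def create(self, path: str, data: bytes) -> None:
        self.live[path] = data

    def delete(self, path: str, now: float) -> None:
        # Nothing is reclaimed yet; the file is only hidden and timestamped.
        self.deleted[path] = (self.live.pop(path), now)

    def undelete(self, path: str, now: float) -> bool:
        entry = self.deleted.get(path)
        if entry is None or now - entry[1] > self.grace_seconds:
            return False
        self.live[path] = entry[0]
        del self.deleted[path]
        return True

    def collect_garbage(self, now: float) -> int:
        """Reclaim files whose grace period has expired; return how many."""
        expired = [p for p, (_, t) in self.deleted.items() if now - t > self.grace_seconds]
        for p in expired:
            del self.deleted[p]
        return len(expired)
```

Passing `now` explicitly keeps the sketch deterministic; a real system would read a clock instead.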