Page 1: Google file system
Page 2: Google file system

High-performing, scalable, distributed file system.

Batch-oriented, data-intensive apps.

Fault-tolerant.

Inexpensive commodity hardware.

Page 3: Google file system

Google uses GFS to organize huge files and to give application developers the research and development resources they require.

GFS is unique to Google and isn't for sale.

Page 4: Google file system

Inexpensive commodity hardware.

Modest number of large files.

Large streaming reads, small random reads (MapReduce).

Mostly appends.

Well-defined semantics for concurrent appends to the same file are important.

High sustained throughput matters more than low latency.

Page 5: Google file system

Need a large, distributed, highly fault-tolerant file system.

Page 6: Google file system
Page 7: Google file system

GFS CLIENT

GFS MASTER SERVER

GFS CHUNKSERVER

Page 8: Google file system

Control (metadata) requests to master server

Data requests to chunkservers

Caches metadata

No caching of data
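
The client-side caching rule above (metadata is cached, file data is not) can be illustrated with a minimal sketch. The class name `ChunkLocationCache`, the `ttl_seconds` value, and the injectable `clock` are assumptions made for illustration; the slides do not specify how long entries are kept.

```python
import time


class ChunkLocationCache:
    """Client-side cache of chunk metadata only; file data is never cached."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self._ttl = ttl_seconds   # hypothetical expiry, not taken from the slides
        self._clock = clock       # injectable so the expiry can be tested
        self._entries = {}        # (filename, chunk index) -> (expires_at, handle, locations)

    def put(self, filename, chunk_index, handle, locations):
        expires_at = self._clock() + self._ttl
        self._entries[(filename, chunk_index)] = (expires_at, handle, list(locations))

    def get(self, filename, chunk_index):
        """Return (handle, locations), or None if missing or expired."""
        key = (filename, chunk_index)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, handle, locations = entry
        if self._clock() >= expires_at:
            del self._entries[key]   # stale entry: the client must ask the master again
            return None
        return handle, locations
```

A `None` result is the signal to send a fresh control request to the master; data requests still go to the chunkservers every time.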


Page 9: Google file system

Manages metadata

Manages chunk creation, replication, placement

Performs checkpointing and logging of changes to metadata

Garbage collection

Periodically communicates with chunkservers (HeartBeat messages)
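
The checkpointing and logging duty listed above can be sketched as follows. This is a toy in-memory model, not the master's actual implementation: the class `MasterMetadata`, the record layout `(op, filename, handle)`, and the two operation names are hypothetical.

```python
class MasterMetadata:
    """Toy model: metadata changes are logged, and a checkpoint bounds replay."""

    def __init__(self):
        self.files = {}        # filename -> list of chunk handles
        self.log = []          # records written since the last checkpoint
        self.checkpoint = {}   # snapshot of self.files at the last checkpoint

    def _apply(self, record):
        op, filename, handle = record
        if op == "create":
            self.files[filename] = []
        elif op == "add_chunk":
            self.files[filename].append(handle)

    def mutate(self, record):
        self.log.append(record)   # log the change first, then apply it
        self._apply(record)

    def take_checkpoint(self):
        self.checkpoint = {name: list(h) for name, h in self.files.items()}
        self.log.clear()          # older records are no longer needed for recovery

    def recover(self):
        """Rebuild state from the checkpoint plus the records logged after it."""
        self.files = {name: list(h) for name, h in self.checkpoint.items()}
        for record in self.log:
            self._apply(record)
```

Recovery only replays the records after the latest checkpoint, which keeps restart time short even when the history of changes is long.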


Page 10: Google file system

• Files are divided into fixed-size chunks

• Chunkservers store chunks on local disk as Linux files

• Unique 64-bit chunk handle

• Replication for reliability

• Chunk size: 64 MB, much larger than typical file system block sizes
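
Because chunks have a fixed size, translating a byte offset in a file into a chunk index is plain integer arithmetic. The helper names `locate` and `chunks_for_range` below are hypothetical; only the 64 MB chunk size comes from the slide.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, as stated on the slide


def locate(byte_offset):
    """Return (chunk index, offset within that chunk) for a file byte offset."""
    return divmod(byte_offset, CHUNK_SIZE)


def chunks_for_range(byte_offset, length):
    """Return the chunk indexes touched by `length` bytes starting at `byte_offset`.

    Assumes length > 0.
    """
    first = byte_offset // CHUNK_SIZE
    last = (byte_offset + length - 1) // CHUNK_SIZE
    return list(range(first, last + 1))
```

For example, byte 100 MB of a file falls in chunk index 1, at 36 MB into that chunk, and a read that straddles a 64 MB boundary touches two chunks.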


Page 11: Google file system

CREATE

READ

WRITE

Page 12: Google file system
Page 13: Google file system

[Diagram: Application, GFS Client, GFS Master, GFS Chunk Servers]

Create /home/user/filename

Page 14: Google file system

[Diagram: Application, GFS Client, GFS Master, GFS Chunk Servers (rack 1, rack 2)]

GFS Master

• Update operation log

• Update metadata

Create /home/user/filename

Page 15: Google file system

[Diagram: Application, GFS Client, GFS Master, GFS Chunk Servers (rack 1, rack 2)]

GFS Master

• Update operation log

• Update metadata

• Choose locations for chunks

    • across multiple racks

    • across multiple networks

    • machines with low contention

    • machines with low disk use

Create /home/user/filename

Page 16: Google file system

[Diagram: Application, GFS Client, GFS Master, GFS Chunk Servers (rack 1, rack 2)]

GFS Master

• Update operation log

• Update metadata

• Choose locations for chunks

Returns chunk handle, chunk locations
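
The "choose locations for chunks" step can be sketched as a simple heuristic that spreads replicas across racks and prefers machines with low disk use. This is an illustration only: the `ChunkServer` record, the `choose_replicas` function, and the default of three replicas are assumptions, and the real placement policy weighs more factors (network and contention among them).

```python
from dataclasses import dataclass


@dataclass
class ChunkServer:
    name: str
    rack: str
    disk_used: float   # fraction of disk in use, 0.0 to 1.0


def choose_replicas(servers, count=3):
    """Pick `count` servers, one per rack first, emptiest disks first."""
    by_usage = sorted(servers, key=lambda s: s.disk_used)
    chosen, racks_used = [], set()
    # First pass: take the emptiest server from each rack not yet used.
    for server in by_usage:
        if len(chosen) == count:
            break
        if server.rack not in racks_used:
            chosen.append(server)
            racks_used.add(server.rack)
    # Second pass: fill any remaining slots with the emptiest unused servers.
    for server in by_usage:
        if len(chosen) == count:
            break
        if server not in chosen:
            chosen.append(server)
    return chosen
```

With two racks and three replicas, the sketch places one replica on each rack before doubling up, so losing a whole rack still leaves a copy.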

Page 17: Google file system
Page 18: Google file system

[Diagram: Application, GFS Client, GFS Master, GFS Chunk Servers]

filename and chunk index

Page 19: Google file system

[Diagram: Application, GFS Client, GFS Master, GFS Chunk Servers]

chunk handle, server locations

Page 20: Google file system

[Diagram: Application, GFS Client, GFS Master, GFS Chunk Servers]

Chunk handle, byte range

Page 21: Google file system

[Diagram: Application, GFS Client, GFS Master, GFS Chunk Servers]

Data
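
The read sequence on the preceding slides (filename and chunk index to the master, chunk handle and server locations back, then chunk handle and byte range to a chunkserver, which returns the data) can be sketched with in-memory stand-ins. The classes `Master`, `ChunkServer`, and `Client` and all their methods are hypothetical, and the sketch assumes a read does not cross a chunk boundary.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks


class Master:
    def __init__(self):
        self.chunks = {}   # (filename, chunk index) -> (chunk handle, [server names])

    def lookup(self, filename, chunk_index):
        return self.chunks[(filename, chunk_index)]


class ChunkServer:
    def __init__(self):
        self.store = {}    # chunk handle -> chunk contents (bytes)

    def read(self, handle, start, length):
        return self.store[handle][start:start + length]


class Client:
    def __init__(self, master, servers):
        self.master = master
        self.servers = servers       # server name -> ChunkServer
        self.cache = {}              # cached metadata only, never file data
        self.master_requests = 0

    def read(self, filename, offset, length):
        chunk_index, start = divmod(offset, CHUNK_SIZE)
        key = (filename, chunk_index)
        if key not in self.cache:
            # Control request: filename and chunk index go to the master.
            self.cache[key] = self.master.lookup(filename, chunk_index)
            self.master_requests += 1
        handle, locations = self.cache[key]
        # Data request: chunk handle and byte range go to a chunkserver.
        return self.servers[locations[0]].read(handle, start, length)
```

The second read of the same chunk is served without contacting the master, which is what keeps the master off the data path.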

Page 22: Google file system
Page 23: Google file system

[Diagram: Application, GFS Client, GFS Master, three GFS Chunk Servers]

Chunk id, chunk offset

Page 24: Google file system

[Diagram: Application, GFS Client, GFS Master, three GFS Chunk Servers]

Chunkserver locations (caches this)

Page 25: Google file system

[Diagram: Application, GFS Client, GFS Master, three GFS Chunk Servers]

data

Pass along data to nearest replica

Page 26: Google file system

[Diagram: Application, GFS Client, GFS Master, three GFS Chunk Servers]

Serializes all concurrent writes

operation

Page 27: Google file system

[Diagram: Application, GFS Client, GFS Master, three GFS Chunk Servers]

serialized order of writes

Page 28: Google file system

[Diagram: Application, GFS Client, GFS Master, three GFS Chunk Servers]

ack, ack, ack

Page 29: Google file system

[Diagram: Application, GFS Client, GFS Master, three GFS Chunk Servers]

ack, chunk index
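
The write sequence on the preceding slides (data is pushed to the replicas, one chunkserver serializes all concurrent writes, the serialized order is forwarded, and acks flow back) can be sketched as follows. The `Replica` and `Primary` classes, the `data_id` key, and the serial-number scheme are assumptions for illustration; the data push is modeled as a direct call to each replica rather than a chain through the nearest replica.

```python
class Replica:
    def __init__(self):
        self.buffered = {}         # data id -> bytes pushed but not yet applied
        self.chunk = bytearray()   # contents of the single chunk in this toy model
        self.applied = []          # serial numbers, in the order they were applied

    def push(self, data_id, data):
        self.buffered[data_id] = data

    def apply(self, serial, data_id, offset):
        data = self.buffered.pop(data_id)
        end = offset + len(data)
        if len(self.chunk) < end:
            self.chunk.extend(b"\0" * (end - len(self.chunk)))   # grow with zero padding
        self.chunk[offset:end] = data
        self.applied.append(serial)
        return "ack"


class Primary(Replica):
    """The replica that picks one order for all concurrent writes."""

    def __init__(self, secondaries):
        super().__init__()
        self.secondaries = secondaries
        self.next_serial = 0

    def write(self, data_id, offset):
        serial = self.next_serial   # the serial number fixes the order
        self.next_serial += 1
        self.apply(serial, data_id, offset)
        # Forward the same serialized order to every other replica.
        acks = [s.apply(serial, data_id, offset) for s in self.secondaries]
        return all(a == "ack" for a in acks)
```

Because every replica applies writes in the order chosen by one server, overlapping writes leave all replicas with identical bytes.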

Page 30: Google file system
Page 31: Google file system

Extremely cheap hardware

› High failure rate

Highly concurrent reads and writes

Highly scalable

Supports undelete (for a configurable time)
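
Undelete for a configurable time can be sketched as deferred deletion: a deleted file is set aside with a timestamp and only reclaimed once the grace period has passed. The `Namespace` class, its method names, and the use of a separate `deleted` table are assumptions made to keep the example small.

```python
class Namespace:
    def __init__(self, grace_seconds):
        self.grace = grace_seconds   # configurable retention period
        self.files = {}              # filename -> metadata
        self.deleted = {}            # filename -> (deleted_at, metadata)

    def delete(self, name, now):
        """Set the file aside instead of reclaiming it immediately."""
        self.deleted[name] = (now, self.files.pop(name))

    def undelete(self, name):
        """Restore a file that has not been garbage collected yet."""
        _, meta = self.deleted.pop(name)
        self.files[name] = meta

    def collect_garbage(self, now):
        """Permanently drop files deleted at least `grace` seconds ago."""
        expired = [n for n, (t, _) in self.deleted.items() if now - t >= self.grace]
        for n in expired:
            del self.deleted[n]
        return expired
```

Until `collect_garbage` runs past the grace period, `undelete` simply moves the metadata back into the live namespace.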

Page 32: Google file system
Page 33: Google file system