Distributed File Systems
Brian Nielsen, [email protected]
Jan 14, 2022
Page 1: Distributed File Systems

Distributed File Systems

Brian Nielsen, [email protected]

Page 2: Distributed File Systems

Distributed file systems

• The most important intranet distributed application
  – Sharing of data (CSCW) and programs
  – Easy management and backup, economy
  – Fast, reliable file-server HW (e.g. RAID)
  – Infrastructure for print + naming
  – User mobility
  – Security
• High transparency requirements
• High performance requirements
• Today:
  – Basic distributed FS (emulate an ordinary FS for clients on different computers)
  – No replication

Page 3: Distributed File Systems

Files

• Unix style: sequence of bytes + meta-data

(figure: a file shown as a byte sequence "This is a file", with a filePointer (offset) into it)

• Attributes, e.g.:
  – File length
  – Creation timestamp
  – Read timestamp
  – Write timestamp
  – Attribute timestamp
  – Reference count
  – Owner
  – File type
  – Access control list

Page 4: Distributed File Systems

UNIX file system operations

filedes = open(name, mode)
    Opens an existing file with the given name.
filedes = creat(name, mode)
    Creates a new file with the given name.
    Both operations deliver a file descriptor referencing the open file. The mode is read, write or both.
status = close(filedes)
    Closes the open file filedes.
count = read(filedes, buffer, n)
    Transfers n bytes from the file referenced by filedes to buffer.
count = write(filedes, buffer, n)
    Transfers n bytes to the file referenced by filedes from buffer.
    Both operations deliver the number of bytes actually transferred and advance the read-write pointer.
pos = lseek(filedes, offset, whence)
    Moves the read-write pointer to offset (relative or absolute, depending on whence).
status = unlink(name)
    Removes the file name from the directory structure. If the file has no other names, it is deleted.
status = link(name1, name2)
    Adds a new name (name2) for a file (name1).
status = stat(name, buffer)
    Gets the file attributes for file name into buffer.
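The operations above map directly onto POSIX system calls. A minimal sketch using Python's `os` module, which wraps these calls, exercised on a throwaway temporary file:

```python
import os
import tempfile

# creat/open: create and open a file; the final argument is the permission bits.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
filedes = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)

# write: transfers n bytes and advances the read-write pointer.
count = os.write(filedes, b"This is a file")

# lseek: move the read-write pointer back to an absolute offset (whence = SEEK_SET).
pos = os.lseek(filedes, 0, os.SEEK_SET)

# read: transfers up to n bytes from the file into a buffer.
buffer = os.read(filedes, 4)
print(buffer)           # b'This'

# stat: fetch the file attributes (length, timestamps, owner, ...).
attrs = os.stat(path)
print(attrs.st_size)    # 14

os.close(filedes)       # close the descriptor
os.unlink(path)         # unlink: removes the name; with no other links, the file is deleted
```

Note how `read` and `write` share one read-write pointer per descriptor, exactly as the table describes.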

Page 5: Distributed File Systems

Semantics of File Sharing

Issue: how to allow concurrent access to a physically distributed file.

• UNIX semantics: Every operation on a file is instantly visible to all processes: a read operation returns the effect of the last write operation. Can only be implemented for remote access models in which there is only a single copy of the file.
• Session semantics: No changes are visible to other processes until the file is closed. The effects of read and write operations are seen only by the client that has opened (a local copy of) the file. When the file is closed, only one client's writes remain.
• Immutable files: No updates are possible; simplifies sharing and replication.
• Transaction semantics: All changes occur atomically. The file system supports transactions on a single file.

Four ways of dealing with shared files in a distributed system.

Page 6: Distributed File Systems

File System Models

(figure: remote access model vs. upload/download model)

Page 7: Distributed File Systems

The Sun Network File System (NFS)

• An implementation and a specification (RFC) of a software system for accessing remote files across LANs (or WANs)
• Sun, 1985
• RPC/XDR based protocol
• Goals
  – Access transparency
  – Heterogeneous, OS-independent
• Mounting and the actual remote file access are distinct services

Page 8: Distributed File Systems

NFS Protocol

• Provides a set of remote procedure calls for remote file operations:
  – searching for a file within a directory
  – reading a set of directory entries
  – manipulating links and directories
  – accessing file attributes
  – reading and writing files
• NFS servers are stateless; each request has to provide a full set of arguments
  (NFS v4 is becoming available: very different, stateful)
• The NFS protocol does not provide concurrency-control mechanisms

Page 9: Distributed File Systems

NFS architecture

(figure: a client computer and a server computer; application programs issue UNIX system calls into the UNIX kernel; on each machine a virtual file system routes local requests to the local UNIX (or other) file system and remote requests to the NFS client, which talks to the NFS server over the NFS protocol via RPC/XDR)

• The Virtual File System (VFS) provides a standard file system interface that hides the difference between accessing local and remote file systems.
• V-node = virtual file identifier (remote/local, ID)
  – ID = i-node number, if local
  – ID = fileHandle, if remote (file-system id, i-node, i-node generation)

Page 10: Distributed File Systems

NFS server operations (simplified) – 1

lookup(dirfh, name) → fh, attr
    Returns file handle and attributes for the file name in the directory dirfh.
create(dirfh, name, attr) → newfh, attr
    Creates a new file name in directory dirfh with attributes attr and returns the new file handle and attributes.
remove(dirfh, name) → status
    Removes file name from directory dirfh.
getattr(fh) → attr
    Returns file attributes of file fh. (Similar to the UNIX stat system call.)
setattr(fh, attr) → attr
    Sets the attributes (mode, user id, group id, size, access time and modify time of a file). Setting the size to 0 truncates the file.
read(fh, offset, count) → attr, data
    Returns up to count bytes of data from a file starting at offset. Also returns the latest attributes of the file.
write(fh, offset, count, data) → attr
    Writes count bytes of data to a file starting at offset. Returns the attributes of the file after the write has taken place.
rename(dirfh, name, todirfh, toname) → status
    Changes the name of file name in directory dirfh to toname in directory todirfh.
link(newdirfh, newname, dirfh, name) → status
    Creates an entry newname in the directory newdirfh which refers to file name in the directory dirfh.

Continues on next slide.

Page 11: Distributed File Systems

NFS server operations (simplified) – 2

symlink(newdirfh, newname, string) → status
    Creates an entry newname in the directory newdirfh of type symbolic link with the value string. The server does not interpret the string but makes a symbolic link file to hold it.
readlink(fh) → string
    Returns the string that is associated with the symbolic link file identified by fh.
mkdir(dirfh, name, attr) → newfh, attr
    Creates a new directory name with attributes attr and returns the new file handle and attributes.
rmdir(dirfh, name) → status
    Removes the empty directory name from the parent directory dirfh. Fails if the directory is not empty.
readdir(dirfh, cookie, count) → entries
    Returns up to count bytes of directory entries from the directory dirfh. Each entry contains a file name, a file handle, and an opaque pointer to the next directory entry, called a cookie. The cookie is used in subsequent readdir calls to start reading from the following entry. If the value of cookie is 0, reads from the first entry in the directory.
statfs(fh) → fsstats
    Returns file system information (such as block size, number of free blocks and so on) for the file system containing a file fh.

Page 12: Distributed File Systems

Simple Example: NFS RPCs for Reading a File

• Where are RPCs for close()?
• File pointer supplied at each R/W operation?

Page 13: Distributed File Systems

Fault Tolerance

• No open / close!
• File pointer supplied at each invocation
• Operations are idempotent
  – Repeated invocations leave the server in the same state
• Server is stateless!
  – Server crash: client can continue unaffected when the server recovers
  – Client crash: no state to be cleaned up at the server
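The stateless, idempotent design can be sketched as follows (a hypothetical in-memory server; the names are illustrative, not the real NFS wire format). Because every request carries the file handle and an absolute offset, repeating a request leaves the server in the same state:

```python
class StatelessServer:
    """Toy NFS-like server: no open-file table, no per-client state."""

    def __init__(self):
        self.files = {}  # file handle -> bytearray of file contents

    def read(self, fh, offset, count):
        # Each request names the file and offset explicitly; nothing is
        # remembered between calls, so repeating a read is harmless.
        return bytes(self.files[fh][offset:offset + count])

    def write(self, fh, offset, data):
        # Writing the same data at the same absolute offset twice yields
        # the same file contents: the operation is idempotent.
        buf = self.files.setdefault(fh, bytearray())
        buf[offset:offset + len(data)] = data

server = StatelessServer()
server.write(fh=1, offset=0, data=b"Hello")
server.write(fh=1, offset=0, data=b"Hello")   # duplicate request: same state
print(server.read(fh=1, offset=0, count=5))   # b'Hello'
```

A client (or the network) can safely retransmit a lost request: since no pointer state lives on the server, the retry cannot corrupt anything.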

Page 14: Distributed File Systems

Caching

• Store recently accessed disk blocks locally in main memory
• Needed for good performance
  – disk access time
  – network latency
  – bandwidth
• Exploit memory hierarchy
  – locality of reference
  – local access is fast(er)

(figure: memory hierarchy, L0 registers, L1 and L2 caches, L3 main memory, L4 local disks, L5 tape storage)

• Caching in a normal Unix FS
  – Read-ahead
  – Delayed-write (write dirty blocks every 30 s)

Page 15: Distributed File Systems

Caching in NFS

• Server-side caching
  – Read operations: easy.
  – Write operations:
    • Write-through, or
    • Delayed-write: flush on commit operation (+ file close)
• Client-side caching
  – Consistency problems when several clients hold copies of the same blocks

(figure: client 1 and client 2 each read "Hello" from the server into their caches; client 1 then writes "HelloWorld", leaving client 2 with a stale cached copy)

Page 16: Distributed File Systems

Client cache check in NFS

• Timestamp-based validation
• Client validation before use of cache contents
  – Tc is the time of the last validation of the cached block
  – Tm-server is the modification timestamp stored at the server
  – Tm-client is the modification timestamp stored at the client
  – T = current time
  – t is the freshness interval
• Cache entry may be used if: (T − Tc < t) or (Tm-client = Tm-server)
  – Tm is obtained through getattr polling before a cache entry is used
  – t is 3–30 s, adaptive (a compromise between consistency and efficiency)
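The validity condition can be written directly as a predicate. A sketch, with variable names following the slide and timestamps as plain numbers (seconds):

```python
def cache_entry_valid(T, Tc, t, tm_client, tm_server):
    """A cached block may be used without refetching if it was validated
    recently (T - Tc < t), or if a getattr poll shows the server's copy
    has not been modified since ours (tm_client == tm_server)."""
    return (T - Tc < t) or (tm_client == tm_server)

# Validated 2 s ago with a 3 s freshness interval: no server contact needed.
print(cache_entry_valid(T=100, Tc=98, t=3, tm_client=50, tm_server=60))   # True

# Validation is stale AND the server copy has changed: refetch required.
print(cache_entry_valid(T=100, Tc=90, t=3, tm_client=50, tm_server=60))   # False
```

The first disjunct is why a write by another client can stay invisible for up to t seconds: within the freshness interval the client never even polls the server.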

Page 17: Distributed File Systems

Inconsistency Time

(figure: client 1 writes and later commits (close/sync) to the server, while client 2 only polls once per freshness interval; the gap between client 1's commit and client 2's next poll is a window in which client 2 reads stale data)

• Optional block I/O daemon performs commit and read-ahead

Page 18: Distributed File Systems

NFS Goals

• Access transparency: yes
• Location transparency: yes (dependent on mounting)
• Failure transparency: partial
• Mobility transparency: yes (with update of mount tables)
• Replication transparency: no
• HW/SW heterogeneity: yes
• Consistency: approximation to one-copy semantics (3 s lag)
• Scalability: no

Page 19: Distributed File Systems

Performance

• Early experiences
  – getattr polling (many optimizations needed)
    • Piggy-backing on every operation
    • Apply attributes to all cached blocks
  – Write-through cache at server (no commit)
  – Few writes
• LADDIS benchmark
• Effective in LAN intranets

Page 20: Distributed File Systems

The Andrew File System (AFS)

• A distributed computing environment under development since 1983 at Carnegie-Mellon University
• AFS-1, AFS-2, AFS-3
• Available today, e.g. from www.openafs.org/
• Design objectives
  – Highly scalable: targeted to span over 5000 workstations
  – Secure: little discussed here (see the above paper)
• Whole-file serving
• Whole-file caching (on the client's disk)
• Shared vs. private files
• Clients more independent of the server than in NFS

Page 21: Distributed File Systems

Basic idea

• A user process issues an open operation on a shared file not in the local cache. The client requests a copy of the file.
• The copy is cached on the local file system, it is opened, and the user process can continue.
• Read and write operations are performed on the local copy.
• When the user process performs a close operation, and if the file has been modified, it is copied back to the server. The server installs the new version of the file and updates the last-modified timestamp for the file.
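A minimal sketch of this whole-file upload/download cycle (hypothetical names; the real Venus caches on local disk and tracks callback promises, both omitted here):

```python
class WholeFileClient:
    """Toy AFS-style client: fetch whole file on open, write back on close."""

    def __init__(self, server_files):
        self.server = server_files   # name -> bytes; stands in for the Vice server
        self.cache = {}              # name -> locally cached whole-file copy
        self.dirty = set()           # names modified since they were fetched

    def open(self, name):
        if name not in self.cache:   # fetch the whole file on first open
            self.cache[name] = self.server[name]
        return name                  # the "descriptor" is just the name here

    def read(self, name):
        return self.cache[name]      # all reads hit the local copy

    def write(self, name, data):
        self.cache[name] = data      # writes only touch the local copy
        self.dirty.add(name)

    def close(self, name):
        if name in self.dirty:       # copy back only if modified
            self.server[name] = self.cache[name]
            self.dirty.discard(name)

server = {"ls": b"binary v1"}
c = WholeFileClient(server)
fd = c.open("ls")
c.write(fd, b"binary v2")
print(server["ls"])   # b'binary v1'  (server unchanged until close)
c.close(fd)
print(server["ls"])   # b'binary v2'
```

Between open and close the server is not contacted at all, which is what makes AFS clients so independent of the server.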

Page 22: Distributed File Systems

Why AFS

• For infrequently updated files, the cached copies remain valid for long periods (e.g. system binaries)
• Large caches are possible
• The following observations (Unix workload):
  – Files are small (often less than 10 KB)
  – Reads are more common than writes
  – Sequential access is common
  – Most files are read and written by only one user
  – When a file is shared, it is often only one user who modifies it
  – Files are referenced in bursts

Page 23: Distributed File Systems

Distribution of processes in the Andrew File System

(figure: several workstations, each running user programs and a Venus client process on a UNIX kernel, connected over the network to servers, each running a Vice server process on a UNIX kernel)

Page 24: Distributed File Systems

The main components of the Vice service interface

Fetch(fid) → attr, data
    Returns the attributes (status) and, optionally, the contents of the file identified by fid and records a callback promise on it.
Store(fid, attr, data)
    Updates the attributes and (optionally) the contents of a specified file.
Create() → fid
    Creates a new file and records a callback promise on it.
Remove(fid)
    Deletes the specified file.
SetLock(fid, mode)
    Sets a lock on the specified file or directory. The mode of the lock may be shared or exclusive. Locks that are not removed expire after 30 minutes.
ReleaseLock(fid)
    Unlocks the specified file or directory.
RemoveCallback(fid)
    Informs the server that a Venus process has flushed a file from its cache.
BreakCallback(fid)
    This call is made by a Vice server to a Venus process. It cancels the callback promise on the relevant file.

Page 25: Distributed File Systems

Implementation of calls in AFS

open(FileName, mode)
    UNIX kernel: if FileName refers to a file in shared file space, pass the request to Venus.
    Venus (client): check the list of files in the local cache. If the file is not present, or there is no valid callback promise, send a request for the file to the Vice server that is the custodian of the volume containing the file.
    Vice (server): transfer a copy of the file and a callback promise to the workstation. Log the callback promise.
    Venus: place the copy of the file in the local file system, enter its local name in the local cache list and return the local name to UNIX.
    UNIX kernel: open the local file and return the file descriptor to the application.

read(FileDescriptor, Buffer, length)
    UNIX kernel: perform a normal UNIX read operation on the local copy.

write(FileDescriptor, Buffer, length)
    UNIX kernel: perform a normal UNIX write operation on the local copy.

close(FileDescriptor)
    UNIX kernel: close the local copy and notify Venus that the file has been closed.
    Venus: if the local copy has been changed, send a copy to the Vice server that is the custodian of the file.
    Vice: replace the file contents and send a callback to all other clients holding callback promises on the file.

Page 26: Distributed File Systems

Cache Consistency 1

• A "callback promise" is a token representing a promise made by the server that it will notify the client when the cached file is modified by other clients
• Stored in the client disk cache
• States: valid or cancelled
  – Moves from valid to cancelled state when a callback is received
  – Client access to a file with a cancelled callback promise => fetch a fresh copy from the server
  – Client access to a file with a valid callback promise => use the local copy
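The valid/cancelled state machine can be sketched as follows (illustrative names, not the real Venus data structures):

```python
VALID, CANCELLED = "valid", "cancelled"

class CachedFile:
    def __init__(self, data):
        self.data = data
        self.callback = VALID        # callback promise recorded at fetch time

    def break_callback(self):
        # The server's BreakCallback RPC arrives: another client modified
        # the file, so the promise moves from valid to cancelled.
        self.callback = CANCELLED

def open_cached(entry, fetch_fresh):
    """Use the local copy only while the callback promise is valid."""
    if entry.callback == CANCELLED:
        entry.data = fetch_fresh()   # refetch; the server records a new promise
        entry.callback = VALID
    return entry.data

entry = CachedFile(b"old contents")
print(open_cached(entry, lambda: b"new contents"))   # b'old contents'
entry.break_callback()                               # another client wrote the file
print(open_cached(entry, lambda: b"new contents"))   # b'new contents'
```

The payoff over NFS-style polling: as long as no callback arrives, every open is served locally with zero server traffic.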

Page 27: Distributed File Systems

Cache Consistency 2

• Client crash: missed callbacks!
  – State of callbacks uncertain
  – First use after restart: send a cache validation request to the server to check the timestamp
• Communication failures
  – No communication with the server for T minutes:
  – renew the callback (leasing principle)
• Server crash (stateful!)
  – List of clients with callback promises stored on disk
  – With atomic update

Page 28: Distributed File Systems

Update Semantics

• Unix
  – one-copy semantics
  – there is one copy of the file, and each write is destructive (i.e., "last write wins")
• NFS
  – one-copy semantics, except:
    • clients may have out-of-date cache entries for brief periods of time when files are shared
    • this can lead to invalid writes at the server
• AFS
  – one-copy semantics, except:
    • if a callback message is lost, a client will continue working with an out-of-date copy for at most T minutes
    • if two clients write to the same file concurrently => last to close wins (use locking if needed)
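"Last to close wins" follows directly from whole-file upload on close. A sketch with an in-memory server dict standing in for the Vice server and per-client whole-file copies:

```python
server = {"notes.txt": b"original"}

# Both clients open the file: each gets its own whole-file copy.
copy1 = server["notes.txt"]
copy2 = server["notes.txt"]

# Each client updates its private copy; neither sees the other's changes.
copy1 = b"client 1 version"
copy2 = b"client 2 version"

# Client 1 closes first, then client 2: each close stores the whole file,
# so client 2's store silently overwrites client 1's.
server["notes.txt"] = copy1
server["notes.txt"] = copy2

print(server["notes.txt"])   # b'client 2 version' -- last close wins
```

Note that client 1's update is lost entirely rather than merged, which is why the slide recommends the SetLock/ReleaseLock operations when concurrent writes matter.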

Page 29: Distributed File Systems

Failure Performance

• When an NFS server fails, everything fails
  – all accesses have apparent local semantics (except for "soft mounts")
  – when a server fails, it is as though the local disk has become unobtainable
  – since authentication files are often stored on NFS servers, this can bring down the entire system
• When an AFS server fails, life (partly) goes on
  – all locally cached files remain available
  – work is still possible, though there is a higher chance of conflict for shared files

Page 30: Distributed File Systems

END