An Introduction to NFS

Avishay Traeger

IBM Haifa Research Lab Internal Storage Course

November 2010

v1.2

Outline

The Basics NFSv2 NFSv3 NFSv4 NFSv4.1

Typical Use

[Figure: nfsserv (NFS server, backed by RAID storage) exports /home, which contains the directories avishay, bob, and carl, to three NFS clients: ws-avishay (10.0.2.56), ws-bob (10.0.2.103), and ws-carl (10.0.2.81).]

/etc/exports on nfsserv:

    /home 10.0.2.*(rw)

On each client:

    mount -t nfs nfsserv:/home /home

Some benefits of NFS:
1. All clients have the same view
2. Centralized storage management

NFS Evolution

NFS is a standardized protocol

Version   Year   RFC    # Pages   Status
NFSv2     1989   1094   27        Obsolete
NFSv3     1995   1813   126       Most popular
NFSv4     2003   3530   275       Available on several OSs, slowly but surely replacing NFSv3
NFSv4.1   2010   5661   617       Early adopters only

Design Goals

OS independence & interoperability

Simple crash recovery for clients and servers

Transparent access (client programs do not know files are remote)

Maintain local file system semantics

Reasonable performance

Remote Procedure Call (RPC)

NFS is defined as a set of RPCs – their arguments, results, and effects

RPCs are synchronous

The use of RPCs makes the protocol easier to understand

NFS Client/Server

Outline

The Basics NFSv2 NFSv3 NFSv4 NFSv4.1

Stateless Protocol

The server does not keep state for RPCs

Each RPC contains the necessary information to complete the call

This makes crash recovery easy
  Server crash: server does no crash recovery, clients resubmit requests
  Client crash: no crash recovery for client or server

This is nice in theory, but
  Adds complexity
  Not really stateless...
    File locking adds state, provided by separate protocol & daemon
    Server keeps an RPC reply cache to handle duplicate non-idempotent RPCs
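The reply cache mentioned above can be sketched roughly as follows. This is a toy model, not how any particular server implements it: the class and method names are made up, and keying the cache on (client, RPC transaction ID) is a simplifying assumption. Real caches are bounded and match on more fields.

```python
# Toy duplicate-request (reply) cache; all names here are illustrative.

class ToyReplyCacheServer:
    def __init__(self):
        self.files = {"a.txt"}   # names present in one directory
        self.reply_cache = {}    # (client, xid) -> cached reply

    def remove(self, client, xid, name):
        """REMOVE is non-idempotent: blindly re-executing it would fail."""
        key = (client, xid)
        if key in self.reply_cache:
            # Retransmission of a request that was already executed:
            # replay the original reply instead of executing it again.
            return self.reply_cache[key]
        if name in self.files:
            self.files.remove(name)
            reply = "OK"
        else:
            reply = "ENOENT"
        self.reply_cache[key] = reply
        return reply


server = ToyReplyCacheServer()
first = server.remove("ws-bob", 7, "a.txt")   # executed
retry = server.remove("ws-bob", 7, "a.txt")   # same xid: replayed from cache
fresh = server.remove("ws-bob", 8, "a.txt")   # new xid: executed, file is gone
```

Without the cache, the retransmitted request would return an error even though the original REMOVE succeeded.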

File Handles

The most common NFS procedure parameter is a structure called a file handle (fh, fhandle)

Provided by the server and used by the client to reference a file

The fhandle is opaque to the client

New fhandles returned by LOOKUP, CREATE, MKDIR, ...

The fhandle for the root of the file system is obtained by the client when it mounts the file system

Operations

NULL() returns ()
  Do-nothing procedure used for pinging the server

LOOKUP(dirfh, name) returns (fh, attr)
  Returns a new fh and attributes for the named file in the directory specified by dirfh

CREATE(dirfh, name, attr) returns (newfh, attr)
  Creates a new file name in the directory dirfh and returns the new fh and attributes.

REMOVE(dirfh, name) returns (status)
  Removes the file name from directory dirfh.
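Because LOOKUP takes a directory handle and a single name, a client resolves a multi-component path with one LOOKUP per component, starting from the root file handle. The sketch below illustrates that walk against a made-up in-memory table. The integer handles, the table contents, and the function names are all assumptions for illustration.

```python
# Toy in-memory "server": maps (directory fh, name) -> child fh.
# File handles are plain integers here; real handles are opaque to the client.
ROOT_FH = 1
TABLE = {
    (1, "home"): 2,
    (2, "avishay"): 3,
    (3, "notes.txt"): 4,
}

def lookup(dirfh, name):
    """Stand-in for the LOOKUP RPC; attributes are omitted for brevity."""
    try:
        return TABLE[(dirfh, name)]
    except KeyError:
        raise FileNotFoundError(name)

def resolve(path, rootfh=ROOT_FH):
    """Resolve a path with one LOOKUP per component, starting at the root fh."""
    fh = rootfh
    for component in path.strip("/").split("/"):
        if component:
            fh = lookup(fh, component)
    return fh
```

For example, `resolve("/home/avishay/notes.txt")` issues three lookups and returns the handle of the file.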

Operations

GETATTR(fh) returns (attr)
  Returns file attributes (similar to stat syscall)

SETATTR(fh, attr) returns (attr)
  Sets the mode, uid, gid, size, access time, and modify time of a file. Setting the size to zero truncates the file.

Operations

READ(fh, offset, count) returns (attr, data)
  Returns up to count bytes of data from a file starting offset bytes into the file. Returns the attributes of the file.

WRITE(fh, offset, count, data) returns (attr)
  Writes count bytes of data to a file beginning at offset bytes from the beginning of the file. Returns the new attributes of the file after the write.

Operations

RENAME(dirfh, name, tofh, toname) returns (status)

Renames name in directory dirfh, to toname in directory tofh.

LINK(dirfh, name, tofh, toname) returns (status)

Creates a hard link toname in directory tofh, that points to name in directory dirfh.

Operations

SYMLINK(dirfh, name, string) returns (status)
  Creates a symlink name in the directory dirfh with value string. The server does not interpret the string argument in any way, just saves it and makes an association to the new symlink file.

READLINK(fh) returns (string)
  Returns the string which is associated with the symlink file.

Operations

MKDIR(dirfh, name, attr) returns (fh, newattr)
  Creates a new directory name in the directory dirfh and returns the new fh and attributes.

RMDIR(dirfh, name) returns (status)
  Removes the empty directory name from the parent directory dirfh.

STATFS(fh) returns (fsstats)
  Returns file system information such as block size, number of free blocks, etc.

Operations

READDIR (dirfh, cookie, count) returns (entries)

Returns up to count bytes of directory entries from the directory dirfh.

Each entry contains a file name, file id, and an opaque pointer to the next directory entry called a cookie.

The cookie is used in subsequent readdir calls to start reading at a specific entry in the directory.

A readdir call with the cookie of zero returns entries starting with the first entry in the directory.
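The cookie mechanism can be illustrated with a small loop. In this sketch cookies are simply list positions and count is a number of entries. Both are simplifications, since real cookies are opaque and count is in bytes. The eof flag is part of the RFC 1094 reply although the slide does not show it. All names and data are invented for the example.

```python
# Toy directory; cookies are list positions here (real cookies are opaque).
ENTRIES = ["avishay", "bob", "carl", "dave", "eve"]

def readdir(dirfh, cookie, count):
    """Stand-in for READDIR: returns (entries, eof).

    Each entry is (name, fileid, cookie), where the cookie tells the
    next call where to resume.
    """
    chunk = ENTRIES[cookie:cookie + count]
    entries = [(name, 100 + cookie + i, cookie + i + 1)
               for i, name in enumerate(chunk)]
    eof = cookie + count >= len(ENTRIES)
    return entries, eof

def list_directory(dirfh, count=2):
    """Read a whole directory by feeding the last cookie back to READDIR."""
    names, cookie = [], 0          # cookie 0 means "start at the first entry"
    while True:
        entries, eof = readdir(dirfh, cookie, count)
        for name, _fileid, next_cookie in entries:
            names.append(name)
            cookie = next_cookie
        if eof:
            return names
```

Because the resume point travels in the request, the server does not need to remember how far each client has read.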

The MOUNT Protocol

The MOUNT protocol takes a directory pathname and returns an fhandle if the client has permissions to mount the file system

Separate protocol
  Easier to plug in new permission check methods
  Separates the OS-dependent aspects of the protocol
    Other OS implementations can change the MOUNT protocol without having to change the NFS protocol

The Linux File Handle

Remember that information contained in the fhandle is only meaningful on the server

If the local FS on the server reuses an inode number, an NFS client could mistakenly use an old file handle and access the new file. File systems include generation numbers in the inode to avoid this. The value is usually taken from a counter used across the file system.

Important file handle fields:
  Major/minor number of the exported device
  Inode number
  Generation number
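A minimal sketch of how a generation number catches a reused inode, assuming a toy file system with a single file-system-wide counter. The handle layout, class names, and error type are illustrative only and do not reflect the actual Linux structures.

```python
from collections import namedtuple

# Simplified handle holding only the fields listed above.
FileHandle = namedtuple("FileHandle", "device inode generation")

class StaleFileHandle(Exception):
    pass

class ToyFS:
    def __init__(self, device=(8, 1)):
        self.device = device    # (major, minor) of the exported device
        self.inodes = {}        # inode number -> (generation, data)
        self.next_gen = 1       # file-system-wide generation counter

    def create(self, inode, data):
        gen, self.next_gen = self.next_gen, self.next_gen + 1
        self.inodes[inode] = (gen, data)
        return FileHandle(self.device, inode, gen)

    def remove(self, inode):
        del self.inodes[inode]

    def read(self, fh):
        entry = self.inodes.get(fh.inode)
        if fh.device != self.device or entry is None or entry[0] != fh.generation:
            raise StaleFileHandle(fh)
        return entry[1]
```

If inode 42 is removed and later reused for a new file, a handle issued for the old file carries the old generation and is rejected instead of silently reading the new file.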

Security

NFSv2 uses UNIX-style permission checks

The client passes uid/gid info in RPCs, and the server performs permission checks as if the user was performing the operation locally

Problem – the mapping from uid/gid to user must be the same on the client and server
  Can be solved via Network Information Service (NIS)

Another problem – should root on the client have root access to files on the server?
  Server specifies policy
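One common server policy for the root question is to map a client's uid 0 to an unprivileged anonymous user before checking permissions (Linux calls this root_squash). The sketch below illustrates the idea with a simplified read-permission check. The function names and the anonymous id 65534 are assumptions for the example, and the permission logic ignores supplementary groups and ACLs.

```python
def effective_ids(uid, gid, root_squash=True, anon_uid=65534, anon_gid=65534):
    """Apply the server's policy to the uid/gid received in the RPC."""
    if root_squash and uid == 0:
        return anon_uid, anon_gid
    return uid, gid

def may_read(uid, gid, owner_uid, owner_gid, mode):
    """Simplified UNIX-style read check, as if the user were local."""
    if uid == 0:
        return True
    if uid == owner_uid:
        return bool(mode & 0o400)
    if gid == owner_gid:
        return bool(mode & 0o040)
    return bool(mode & 0o004)

def server_allows_read(uid, gid, owner_uid, owner_gid, mode, root_squash=True):
    euid, egid = effective_ids(uid, gid, root_squash)
    return may_read(euid, egid, owner_uid, owner_gid, mode)
```

With squashing enabled, root on a client cannot read a file with mode 0600 owned by uid 1000. With it disabled, the same request succeeds.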

Cache Consistency Problems

Clients use caching and write buffering to improve performance, but this causes issues

Problem: Update visibility; If client C1 buffers writes in its cache, client C2 will see the old version

NFSv2 solution: Close-to-open consistency – Clients flush on close(), so other clients will see the latest version on open()

Problem: Stale cache; If C1 has a file cached, it will see old data even if the file is updated by C2

NFSv2 solution: Send a GETATTR and check the file's modification time to see if it has been updated. Cache attributes for a few seconds to reduce the number of GETATTR calls.
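The stale-cache check can be sketched as follows. This is a toy model under stated assumptions: time is passed in explicitly so the behavior is deterministic, the 3-second attribute timeout is an arbitrary choice, and the class names are invented. It is not how any particular client is implemented.

```python
class ToyFileServer:
    """Holds one file and counts GETATTR calls."""
    def __init__(self):
        self.data, self.mtime, self.getattr_calls = b"v1", 1, 0

    def getattr(self):
        self.getattr_calls += 1
        return self.mtime

    def read(self):
        return self.data

    def write(self, data):
        self.data, self.mtime = data, self.mtime + 1


class CachingClient:
    def __init__(self, server, attr_timeout=3.0):
        self.server = server
        self.attr_timeout = attr_timeout   # seconds; the value is an assumption
        self.cached_data = None
        self.cached_mtime = None
        self.attr_checked_at = None

    def read(self, now):
        """Return file data, revalidating the cache when attributes are old."""
        attrs_fresh = (self.attr_checked_at is not None
                       and now - self.attr_checked_at < self.attr_timeout)
        if not attrs_fresh:
            mtime = self.server.getattr()
            self.attr_checked_at = now
            if mtime != self.cached_mtime:
                # File changed on the server: refetch the cached data.
                self.cached_data = self.server.read()
                self.cached_mtime = mtime
        return self.cached_data
```

The trade-off is visible in the model: within the attribute timeout the client saves a GETATTR but may still return old data.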

Strong Semantics for Write

Because the NFS server is stateless, when servicing an NFS request it must commit any modified data to stable storage before returning results

The implication for UNIX based servers is that requests which modify the file system must flush all modified data & metadata to disk before returning from the call

This can be a big performance bottleneck unless something is done to improve write performance (e.g., NetApp's WAFL file system)

Outline

The Basics NFSv2 NFSv3 NFSv4 NFSv4.1

Major Changes from NFSv2 to v3

Sizes and offsets are widened from 32 bits to 64 bits

A new COMMIT RPC allows for reliable asynchronous writes

A new ACCESS RPC improves support for ACLs and super-user

All operations now return attributes to reduce the number of subsequent GETATTR procedure calls

The 8KB data size limitation on the READ and WRITE procedures is relaxed

A new READDIRPLUS RPC returns both file handle and attributes to eliminate LOOKUP calls when scanning a directory

Asynchronous Writes

In NFSv3, the server can reply to WRITE RPCs immediately, without syncing to disk

When the client wants to ensure that data is on stable storage, it sends a COMMIT RPC

Asynchronous writes are optional, and negotiated at mount time

Asynchronous Writes: Crash Recovery

The client must keep all uncommitted data in case of a server crash

Replies for WRITE and COMMIT RPCs include a write verifier for server crash detection

Write verifier: 8-byte value that the server must change if it crashes (many use boot time)

The client must save verifiers returned by async WRITE RPCs, and compare them to the verifier returned by a later COMMIT RPC

If the verifiers don't match, the client must rewrite all uncommitted data
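The recovery rule above can be sketched with a toy server and client. The verifier here is a simple counter bumped on reboot, which is an assumption (the slide notes many servers use boot time). The class names and the in-memory "disk" are invented for illustration.

```python
class ToyV3Server:
    def __init__(self):
        self.verifier = 1        # stands in for e.g. the boot time
        self.unstable = {}       # offset -> data, held in volatile memory
        self.stable = {}         # offset -> data, on "disk"

    def write(self, offset, data):
        self.unstable[offset] = data      # reply before syncing to disk
        return self.verifier

    def commit(self):
        self.stable.update(self.unstable)
        self.unstable.clear()
        return self.verifier

    def crash_and_reboot(self):
        self.unstable.clear()             # uncommitted data is lost
        self.verifier += 1                # the verifier must change


class ToyV3Client:
    def __init__(self, server):
        self.server = server
        self.uncommitted = {}    # offset -> (data, verifier from WRITE)

    def write(self, offset, data):
        self.uncommitted[offset] = (data, self.server.write(offset, data))

    def commit(self):
        """COMMIT; on a verifier mismatch, rewrite everything and retry."""
        while self.uncommitted:
            verf = self.server.commit()
            if all(v == verf for _, v in self.uncommitted.values()):
                self.uncommitted.clear()  # safe to drop the local copies
            else:
                for offset, (data, _) in list(self.uncommitted.items()):
                    self.write(offset, data)
```

If the server reboots between the WRITEs and the COMMIT, the verifier returned by COMMIT differs from the saved ones, so the client resends its buffered data before committing again.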

Outline

The Basics NFSv2 NFSv3 NFSv4 NFSv4.1

Additional Goals for NFSv4

Improved access and good performance on the Internet
  Only TCP
  Easy to transit firewalls: uses one port (mount & lock protocols merged into NFS)
  COMPOUNDs, delegations, uid/gid issue resolved

Strong security with negotiation built in

Better cross-platform interoperability

Better extensibility
  New security types, new attributes, etc.

Big design change – NFSv4 is stateful

Security

For previous versions, only UNIX permissions were widely adopted

NFSv4 mandates the use of strong RPC security flavors that depend on cryptography

Security type negotiation is done securely and in-band

Users and groups are identified with strings, not numbers

Access control policies compatible with both UNIX and Windows

The problematic MOUNT protocol is removed

RPCSEC_GSS

A framework adopted by NFSv4 to provide authentication, integrity, and privacy at the RPC level

The following mechanisms must be implemented: Kerberos v5, LIPKEY, SPKM3

Security options are negotiated at mount time

The SECINFO operation allows a client to determine the security policy (usually on mount, but can be on a per-filehandle basis)

RPCSEC_GSS can be used with previous versions of NFS, but in NFSv4 support is mandatory

Identifying Users

In v2 and v3, users and groups were represented as integers

This required all clients and the server to agree on user and group assignments - not practical (especially over the Internet)

NFSv4 uses strings ‘user@domain’ and ‘group@domain’, where domain represents a registered DNS domain or a sub-domain

On Linux, idmapd translates NFSv4 IDs
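The translation step can be sketched as a pair of mappings between local numeric ids and user@domain strings. This is only an illustration of the idea, not idmapd's actual code: the function names, the user tables, and the fallback to an unprivileged "nobody" id (65534) for names that cannot be mapped are assumptions.

```python
NOBODY = 65534   # conventional "nobody" uid on many Linux systems (assumption)

def uid_to_name(uid, users, domain):
    """Local uid -> 'user@domain' string sent on the wire."""
    return f"{users[uid]}@{domain}"

def name_to_uid(name, users, domain):
    """'user@domain' string -> local uid, or NOBODY if it cannot be mapped."""
    user, sep, dom = name.rpartition("@")
    if not sep or dom != domain:
        return NOBODY
    for uid, local_name in users.items():
        if local_name == user:
            return uid
    return NOBODY
```

With `{1000: "avishay"}` on the client and `{2001: "avishay"}` on the server, the string `avishay@example.com` maps to the right account on both sides even though the numeric ids differ.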

COMPOUND Procedure

NFSv4 has 2 procedures: NULL and COMPOUND

The COMPOUND procedure can contain several operations (similar to previous NFS procedures)

Possible example: {LOOKUP, OPEN, READ}

Operations are evaluated in order, and each can have a return value

If an operation fails, the server stops evaluating the COMPOUND and returns
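The evaluation rule can be sketched in a few lines. Here each operation is modeled as a callable returning a (status, result) pair. That representation and the function name are assumptions made for the example, not the wire format.

```python
def run_compound(ops):
    """Evaluate operations in order; stop at the first one that fails.

    Each op is a (name, callable) pair and the callable returns
    (status, result). Returns the results of the operations that
    were actually evaluated, including the failing one.
    """
    results = []
    for name, op in ops:
        status, result = op()
        results.append((name, status, result))
        if status != "OK":
            break                  # later operations are not evaluated
    return results
```

For {LOOKUP, OPEN, READ}, a failure in OPEN means the reply carries results for LOOKUP and OPEN only, and READ never runs.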

Filehandles

Current filehandle: used by most operations

Saved filehandle: used as an additional operand

Example from Linux #1: WRITE request
  PUTFH(fh): set CURFH to the target file
  WRITE: write the data to the current file
  GETATTR: get attributes for the current file

Filehandles

Example from Linux #2: CREATE request
  PUTFH(dirfh): set CURFH to the directory
  SAVEFH: save CURFH (SAVEDFH=CURFH)
  CREATE: create the file (CURFH=NEWFH)
  GETFH: return CURFH to the client
  GETATTR: get the attributes of the new file
  RESTOREFH: (CURFH=SAVEDFH)
  GETATTR: get the attributes of the directory
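The way the current and saved filehandles move through the CREATE request above can be traced with a small interpreter. This is a sketch under simplifying assumptions: handles are plain strings, only the operations from the example are supported, and create_file is a hypothetical callback standing in for the server's file creation.

```python
class CompoundState:
    """Tracks the two filehandle 'registers' used within one COMPOUND."""
    def __init__(self):
        self.curfh = None
        self.savedfh = None

def evaluate(ops, create_file):
    """Run a list of (op, arg) pairs; create_file(dirfh, name) -> new fh."""
    st, trace = CompoundState(), []
    for op, arg in ops:
        if op == "PUTFH":
            st.curfh = arg
        elif op == "SAVEFH":
            st.savedfh = st.curfh
        elif op == "RESTOREFH":
            st.curfh = st.savedfh
        elif op == "CREATE":
            st.curfh = create_file(st.curfh, arg)
        elif op in ("GETFH", "GETATTR"):
            trace.append((op, st.curfh))   # both act on the current fh
        else:
            raise ValueError(op)
    return trace
```

Running the seven operations of the example shows GETFH and the first GETATTR acting on the new file, and the final GETATTR acting on the directory again after RESTOREFH.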

Some Differences in Operations

CREATE now creates directories and special files (regular files are created with OPEN)

LOOKUPP was introduced to look up the parent directory – no special meaning for ‘.’ and ‘..’ as in previous NFS versions (better cross-platform interoperability)

READDIRPLUS removed - READDIR now returns requested attributes

Filehandle Types

In previous NFS versions, the fhandle was valid for the lifetime of the file system object

Now these fhandles are called “persistent filehandles”

“Volatile filehandles” may become invalid, but the client is prepared to deal with these semantics

File System Migration/Replication

Migration
  The file system locations attribute provides a method for the client to probe the server about the location of a file system
  In the event of a file system migration, the client will receive an error when operating on the file system and it can then query as to the new location

Replication
  The client is able to query the server for the multiple available locations of a particular file system
  From this information, the client can use its own policies to access the appropriate file system location

Attribute Types

Mandatory: minimal set of file or file system attributes that must be provided by the server

type, filehandle expiration type, change indicator, size, fsid, lease duration, etc.

Recommended: represent different file system types and operating environments

case insensitive, hidden, max file size, max read size, max write size, UNIX mode bits, owner string, group string, modify/create/access time, etc.

Named: Similar to extended attributes, implemented as hidden directories

ACLs: implemented as recommended attribute

Pseudo Filesystems

In NFSv4, the server presents a single seamless view of all the exported file systems to a client

The client can use the fsid attribute to notice when it crosses from one exported file system into another

mount -t nfs4 servername:/ /mnt/dir

Client Caching

File, attribute, and directory caching is similar to previous versions: clients determine what to cache and for how long, and when to see if an update occurred

Close-to-open consistency
  Client checks if cached data is valid on OPEN
  Client writes modified data on CLOSE
  Sufficient for most applications and users

Leases

A lease is a time-bounded grant of control of the state of a file, from the server to the client (lock or delegation)

During a lease interval a server may not grant conflicting control to another client

The client may assume that a lock granted by the server will remain valid for a fixed (server-specified) interval and is subject to renewal by the client

The client is responsible for refreshing the lease
  If the lease interval expires without a refresh from the client, the server assumes the client has failed and may allow other clients to acquire the same lock

If the server fails, on reboot the server waits a duration equal to a lease interval for clients to reclaim the locks that they may still hold, before allowing any new lock requests
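The lease rules can be sketched with a toy lock server. Time is passed in explicitly to keep the example deterministic, the 90-second lease time is an arbitrary illustrative value, and the class and method names are invented. The post-reboot reclaim period described above is not modeled.

```python
class LeaseServer:
    def __init__(self, lease_time=90.0):
        self.lease_time = lease_time   # seconds; value chosen for illustration
        self.locks = {}                # resource -> holding client
        self.last_renewal = {}         # client -> time of last renewal

    def renew(self, client, now):
        """One renewal covers every lock the client holds on this server."""
        self.last_renewal[client] = now

    def _expired(self, client, now):
        last = self.last_renewal.get(client, float("-inf"))
        return now - last > self.lease_time

    def lock(self, client, resource, now):
        """Grant the lock unless another client holds it under a live lease."""
        holder = self.locks.get(resource)
        if holder not in (None, client) and not self._expired(holder, now):
            return False
        self.locks[resource] = client
        self.renew(client, now)
        return True
```

A second client is refused while the holder keeps renewing, and is granted the lock once the holder's lease has lapsed.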

File Locking

Support for byte-range file locking is part of the protocol

Lease-based model: lease state is stored on the server

Clients must either explicitly renew leases (RENEW), or implicitly renew them (usually READ)

A refresh of any lock by the client validates all locks held by the client to a particular server (reduces the number of refreshes)

Delegations

The server may grant a read or write delegation for a file to a client

Read delegation: client is assured that no other client will write to the file for the duration of the delegation

Write delegation: like read delegation, but other clients may not read or write

Delegations may be recalled using a callback
  A callback is a server → client RPC
  A client must support callbacks in order to get a delegation - tested with CB_NULL request

Delegations allow clients to service operations like OPEN, CLOSE, LOCK, READ, WRITE without immediate interaction with the server

Outline

The Basics NFSv2 NFSv3 NFSv4 NFSv4.1

Parallel NFS (pNFS)

Clients may now access storage devices directly and in parallel

Eliminates the classic NFS bottleneck of having only one server

The management protocol is NFSv4.1

The data protocol can be NFSv4.1, OSD, or FC

Other NFSv4.1 Highlights

Sessions
  Session layer on top of the transport layer
  Solves many issues with dropped connections, as well as client and server crashes

Delegation support for directories

References

http://pages.cs.wisc.edu/~cs537-1/notes/34_file-nfs.pdf

RFC1094 - NFS version 2
RFC1813 - NFS version 3
RFC1831 - RPC: Remote Procedure Call Protocol Specification Version 2
RFC1832 - XDR: External Data Representation Standard
RFC1964 - The Kerberos Version 5 GSS-API Mechanism
RFC2025 - The Simple Public-Key GSS-API Mechanism (SPKM)
RFC2054 - WebNFS Client Specification
RFC2055 - WebNFS Server Specification
RFC2203 - RPCSEC_GSS Protocol Specification
RFC2224 - NFS URL Scheme
RFC2581 - TCP Congestion Control
RFC2623 - NFS Version 2 and Version 3 Security Issues and the NFS Protocol's Use of RPCSEC_GSS and Kerberos V5
RFC2624 - NFS Version 4 Design Considerations
RFC2755 - Security Negotiation for WebNFS
RFC2743 - Generic Security Service Application Program Interface, Version 2, Update 1
RFC2847 - LIPKEY - A Low Infrastructure Public Key Mechanism Using SPKM
RFC3010 - NFS version 4 Protocol (Obsoleted by RFC3530)
RFC3530 - NFS version 4 Protocol
RFC5661 - NFS version 4 Minor Version 1 Protocol