P.J.Braam/CMU -- 1 Linux Virtual File System Peter J. Braam

Vfs

May 06, 2015


Waqas !!!!

Virtual file system by Waqas
Page 1: Vfs

P.J.Braam/CMU -- 1

Linux Virtual File System

Peter J. Braam


Aims

• Present the data structures in Linux VFS

• Provide information about flow of control

• Describe methods and invariants needed to implement a new file system

• Illustrate with some examples


File access

History

• BSD implemented a VFS for NFS; the aim was to dispatch to different file systems

• VMS had an elaborate file system

• NT/Win95 have VFS-type interfaces

• Newer systems integrate VM with the buffer cache.


Linux Filesystems

• Media based
  – ext2 - Linux native
  – ufs - BSD
  – fat - DOS FS
  – vfat - Win 95
  – hpfs - OS/2
  – minix - well….
  – isofs - CD-ROM
  – sysv - SysV Unix
  – hfs - Macintosh
  – affs - Amiga Fast FS
  – ntfs - NT's FS
  – adfs - Acorn StrongARM

• Network
  – nfs
  – Coda
  – AFS - Andrew FS
  – smbfs - LanManager
  – ncpfs - Novell

• Special ones
  – procfs - /proc
  – umsdos - Unix in DOS
  – userfs - redirector to user level


Linux Filesystems (ctd)

• Forthcoming:
  – devfs - device file system
  – DFS - DCE distributed FS

• Varia:
  – cfs - crypt filesystem
  – cfs - cache filesystem
  – ftpfs - ftp filesystem
  – mailfs - mail filesystem
  – pgfs - Postgres versioning file system

• Linux serves (unrelated to the VFS!):
  – NFS - user & kernel
  – Coda
  – AppleShare - netatalk/CAP
  – SMB - samba
  – NCP - Novell


Usefulness

"Linux is obsolete" - Andrew Tanenbaum


File access

Linux VFS

• Multiple interfaces build up the VFS:
  – files
  – dentries
  – inodes
  – superblock
  – quota

• The VFS can do all caching and provides utility functions to the FS

• The FS provides methods to the VFS; many are optional
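To make "many are optional" concrete, here is a minimal user-space sketch, assuming a hypothetical toy_inode with a one-entry method table: the VFS calls the file system's permission method when one is supplied and otherwise falls back to a generic check of its own. The names (toy_inode, toy_inode_ops, default_permission, vfs_permission, rofs_permission) and the owner-bits-only check are illustrative simplifications, not kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_inode;

/* Hypothetical method table: a file system fills in only what it needs. */
struct toy_inode_ops {
    int (*permission)(struct toy_inode *inode, int mask); /* optional */
};

struct toy_inode {
    unsigned int mode;                  /* permission bits, e.g. 0644 */
    const struct toy_inode_ops *i_op;
};

/* VFS fallback: check the owner bits only (4 = read, 2 = write, 1 = exec). */
static int default_permission(struct toy_inode *inode, int mask)
{
    unsigned int owner = (inode->mode >> 6) & 7u;
    return (owner & (unsigned int)mask) == (unsigned int)mask ? 0 : -EACCES;
}

/* VFS dispatch: call the FS method if present, else use the default. */
static int vfs_permission(struct toy_inode *inode, int mask)
{
    assert(inode != NULL);
    if (inode->i_op != NULL && inode->i_op->permission != NULL)
        return inode->i_op->permission(inode, mask);
    return default_permission(inode, mask);
}

/* A hypothetical read-only FS overrides the method and refuses writes. */
static int rofs_permission(struct toy_inode *inode, int mask)
{
    (void)inode;
    return (mask & 2) ? -EACCES : 0;
}

static const struct toy_inode_ops rofs_ops = { rofs_permission };
static const struct toy_inode_ops plain_ops = { NULL }; /* VFS defaults */
```

The design choice is the usual one for optional methods: a NULL pointer means "let the VFS do it", so simple file systems stay small.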


User level file access

• Typical user-level types and code:
  – pathnames: "/myfile"
  – file descriptors: fd = open("/myfile"…)
  – attributes in struct stat: stat("/myfile", &mybuf), chmod, chown...
  – offsets: write, read, lseek
  – directory handles: DIR *dh = opendir("/mydir")
  – directory entries: struct dirent *ent = readdir(dh)
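The user-level types above are plain POSIX. A small sketch that touches each of them: a pathname, a file descriptor and its offset, struct stat, and a DIR handle with readdir(). The helper names write_and_measure and count_entries are hypothetical, chosen for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Write `data` to `path`, then check the size via the offset and via stat().
 * Returns the size reported by stat(), or -1 on error. */
static long write_and_measure(const char *path, const char *data)
{
    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0644); /* pathname -> fd */
    if (fd < 0)
        return -1;
    ssize_t len = (ssize_t)strlen(data);
    if (write(fd, data, (size_t)len) != len) {  /* advances the file offset */
        close(fd);
        return -1;
    }
    off_t end = lseek(fd, 0, SEEK_CUR);         /* offset is now len */
    close(fd);

    struct stat st;                             /* attributes, by pathname */
    if (stat(path, &st) != 0)
        return -1;
    assert(st.st_size == end);
    return (long)st.st_size;
}

/* Count the entries of a directory through a DIR handle. */
static int count_entries(const char *dirpath)
{
    DIR *dh = opendir(dirpath);
    if (dh == NULL)
        return -1;
    int n = 0;
    while (readdir(dh) != NULL)
        n++;                                    /* includes "." and ".." */
    closedir(dh);
    return n;
}
```

For example, write_and_measure("/tmp/vfs_demo_file", "hello") returns 5; every call in it is a system call that ends up in the VFS.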


VFS

• Manages kernel-level file abstractions in one format for all file systems

• Receives system call requests from user level (e.g. write, open, stat, link)

• Interacts with a specific file system based on mount point traversal

• Receives requests from other parts of the kernel, mostly from memory management


File system level

• Individual file systems
  – are responsible for managing file and directory data
  – are responsible for managing metadata: timestamps, owners, protection etc.
  – translate data between
    • FS-specific data: e.g. disk data, NFS data, Coda/AFS data
    • VFS data: attributes etc. in a standard format
  – e.g. nfs_getattr(….) returns attributes in VFS format, acquiring them in NFS format to do so.
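A sketch of what "translate between FS data and VFS data" can mean, assuming a hypothetical wire format loosely in the spirit of NFS attributes (file type as a separate enum, times as seconds plus microseconds) and a simplified VFS attribute struct. wire_attr, vfs_attr and toy_nfs_getattr are invented for illustration; this is not the kernel's nfs_getattr.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdint.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical on-the-wire attributes. */
enum wire_ftype { WIRE_REG = 1, WIRE_DIR = 2 };
struct wire_attr {
    enum wire_ftype type;
    uint32_t perm;                  /* permission bits only, e.g. 0644 */
    uint32_t nlink, uid, gid;
    uint64_t size;
    uint32_t mtime_sec, mtime_usec;
};

/* Simplified VFS-side attributes in one standard format. */
struct vfs_attr {
    mode_t mode;                    /* type bits | permission bits */
    unsigned nlink, uid, gid;
    long long size;
    struct timespec mtime;
};

/* FS-specific getattr: translate wire format into VFS format. */
static int toy_nfs_getattr(const struct wire_attr *w, struct vfs_attr *out)
{
    assert(w != NULL && out != NULL);
    switch (w->type) {
    case WIRE_REG: out->mode = S_IFREG; break;
    case WIRE_DIR: out->mode = S_IFDIR; break;
    default: return -1;             /* a type this sketch does not handle */
    }
    out->mode |= (mode_t)(w->perm & 07777);
    out->nlink = w->nlink;
    out->uid = w->uid;
    out->gid = w->gid;
    out->size = (long long)w->size;
    out->mtime.tv_sec = (time_t)w->mtime_sec;
    out->mtime.tv_nsec = (long)w->mtime_usec * 1000L; /* usec -> nsec */
    return 0;
}
```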


Anatomy of stat system call

sys_stat(path, buf) {
    dentry = namei(path);                       /* Establish VFS data */
    if ( dentry == NULL ) return -ENOENT;
    inode = dentry->d_inode;
    rc = inode->i_op->i_permission(inode);      /* Call into inode layer of filesystem */
    if ( rc ) { dput(dentry); return -EPERM; }  /* drop the dentry reference on error too */
    rc = inode->i_op->i_getattr(inode, buf);    /* Call into inode layer of filesystem */
    dput(dentry);
    return rc;
}


Anatomy of fstatfs system call

sys_fstatfs(fd, buf) {                              /* for things like "df" */
    file = fget(fd);                                /* Translate fd to VFS data structure */
    if ( file == NULL ) return -EBADF;
    superb = file->f_dentry->d_inode->i_super;
    rc = superb->sb_op->sb_statfs(superb, buf);     /* Call into superblock layer of filesystem */
    fput(file);                                     /* release the reference taken by fget */
    return rc;
}


Data structures

• VFS data structures for:

– VFS handle to the file: inode (BSD: vnode)

– User-instantiated file handle: file (BSD: file)

– The whole filesystem: superblock (BSD: vfs)

– A name to inode translation: dentry
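The four objects point at one another. A heavily trimmed sketch of just those links, with hypothetical toy_ types whose field names echo the slides: several files may share one dentry (each open keeps its own offset), a dentry names an inode, and the inode knows its superblock. file_super walks the same chain that sys_fstatfs follows.

```c
#include <assert.h>
#include <stddef.h>

struct toy_super_block;

struct toy_inode {
    unsigned long i_ino;               /* identifies the file within its FS */
    struct toy_super_block *i_sb;      /* which file system it belongs to */
};

struct toy_dentry {
    const char *d_name;                /* one path component */
    struct toy_inode *d_inode;         /* NULL would mean a negative entry */
    struct toy_dentry *d_parent;       /* the root points to itself */
};

struct toy_file {
    struct toy_dentry *f_dentry;       /* what was opened */
    long f_pos;                        /* per-open offset */
};

struct toy_super_block {
    const char *s_type;                /* e.g. "ext2" */
    struct toy_dentry *s_root;         /* root of this FS */
};

/* Follow file -> dentry -> inode -> superblock. */
static struct toy_super_block *file_super(const struct toy_file *f)
{
    assert(f && f->f_dentry && f->f_dentry->d_inode);
    return f->f_dentry->d_inode->i_sb;
}
```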


Shorthand method notation

• super block methods: sss_methodname

• inode methods: iii_methodname

• dentry methods: ddd_methodname

• file methods: fff_methodname

• instead of inode->i_op->lookup we write iii_lookup


namei

struct dentry *namei(parent, name) {            /* VFS */
    if (dentry = d_lookup(parent, name))        /* hashes with ddd_hash(parent, name) (FS) */
        ddd_revalidate(dentry);                 /* FS */
    else
        iii_lookup(parent, name);               /* FS; ends up in iget */
}

struct inode *iget(ino, dev) {                  /* VFS */
    /* try cache else .. */
    sss_read_inode(…);                          /* FS */
}
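The iget half above ("try cache else read") can be sketched in user space, assuming a hypothetical inode hash keyed by (ino, dev) and a stub toy_read_inode standing in for sss_read_inode. The names, the hash function and the table size are illustrative only.

```c
#include <assert.h>
#include <stdlib.h>

#define IHASH_SIZE 64

struct toy_inode {
    unsigned long i_ino;
    int i_dev;
    int i_count;                       /* references held by users */
    struct toy_inode *i_hash_next;     /* chain in the inode hash table */
};

static struct toy_inode *inode_hashtable[IHASH_SIZE];
static int read_inode_calls;           /* counts trips to FS storage */

/* Stand-in for sss_read_inode: a real FS would fill in mode, size, ... */
static void toy_read_inode(struct toy_inode *inode)
{
    assert(inode->i_count == 1);       /* only freshly created inodes are read */
    read_inode_calls++;
}

static unsigned ihash(unsigned long ino, int dev)
{
    return (unsigned)((ino ^ (unsigned long)dev) % IHASH_SIZE);
}

/* iget: return a referenced inode, from the cache if possible. */
static struct toy_inode *toy_iget(unsigned long ino, int dev)
{
    unsigned h = ihash(ino, dev);
    for (struct toy_inode *p = inode_hashtable[h]; p; p = p->i_hash_next)
        if (p->i_ino == ino && p->i_dev == dev) {
            p->i_count++;              /* cache hit: just take a reference */
            return p;
        }
    struct toy_inode *inode = calloc(1, sizeof *inode);
    if (inode == NULL)
        return NULL;
    inode->i_ino = ino;
    inode->i_dev = dev;
    inode->i_count = 1;
    toy_read_inode(inode);             /* cache miss: ask the file system */
    inode->i_hash_next = inode_hashtable[h];
    inode_hashtable[h] = inode;
    return inode;
}
```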


Superblocks

• Handle metadata only (attributes etc.)

• Responsible for retrieving and storing metadata from the FS media or peers

• Superblock structures hold things like:
  – device, blocksize, dirty flags, list of dirty inodes
  – super operations
  – wait queue
  – pointer to the root inode of this FS


Super Operations (sss_)

• Ops on inodes:
  – read_inode
  – put_inode
  – write_inode
  – delete_inode
  – clear_inode
  – notify_change

• Superblock manipulations:
  – read_super (mount)
  – put_super (unmount)
  – write_super (unmount)
  – statfs (attributes)
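The superblock manipulations above follow a mount/unmount lifecycle. A sketch under stated assumptions: a hypothetical "toyfs" whose read_super fills in a toy_super_block at mount time, and an unmount helper that writes the superblock back if it is dirty before calling put_super. All names and fields here are illustrative.

```c
#include <assert.h>
#include <string.h>

struct toy_super_block;

/* Hypothetical subset of the sss_ superblock methods. */
struct toy_super_ops {
    void (*write_super)(struct toy_super_block *sb);  /* flush if dirty */
    void (*put_super)(struct toy_super_block *sb);    /* release at unmount */
    int (*statfs)(struct toy_super_block *sb, long *free_blocks);
};

struct toy_super_block {
    int s_dev;
    unsigned long s_blocksize;
    int s_dirt;                        /* superblock needs writing */
    int s_active;
    long s_free;                       /* FS-private: free block count */
    int s_writes;                      /* counts write_super calls (demo) */
    const struct toy_super_ops *s_op;
};

static void toyfs_write_super(struct toy_super_block *sb) { sb->s_writes++; sb->s_dirt = 0; }
static void toyfs_put_super(struct toy_super_block *sb) { sb->s_active = 0; }
static int toyfs_statfs(struct toy_super_block *sb, long *free_blocks)
{
    *free_blocks = sb->s_free;
    return 0;
}

static const struct toy_super_ops toyfs_sops = {
    toyfs_write_super, toyfs_put_super, toyfs_statfs
};

/* read_super: fill in the superblock at mount time. */
static struct toy_super_block *toyfs_read_super(struct toy_super_block *sb, int dev)
{
    memset(sb, 0, sizeof *sb);
    sb->s_dev = dev;
    sb->s_blocksize = 1024;
    sb->s_free = 100;                  /* a real FS reads this from the media */
    sb->s_op = &toyfs_sops;
    sb->s_active = 1;
    return sb;
}

/* Unmount: write the superblock back if dirty, then let the FS let go. */
static void toy_umount(struct toy_super_block *sb)
{
    assert(sb->s_active);
    if (sb->s_dirt && sb->s_op->write_super)
        sb->s_op->write_super(sb);
    sb->s_op->put_super(sb);
}
```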


Inodes

• Inodes are the VFS abstraction for the file

• Inodes have operations (iii_methods)

• The VFS maintains an inode cache, NOT the individual FSs (compare NT, BSD etc.)

• Inodes contain an FS-specific area where:
  – ext2 stores disk block numbers etc.
  – AFS would store the FID

• Extraordinary inode ops are good for dealing with stale NFS file handles etc.


What’s inside an inode - 1

• Caching: list_head i_hash, list_head i_list, list_head i_dentry, int i_count

• Identifies file: long i_ino, int i_dev

• Usual stuff: {m,a,c}time, {u,g}id, mode, size, n_link


What’s inside an inode -2

• Which FS: superblock i_sb, inode_ops i_op

• Waiting: wait objects, semaphore, lock

• For mmap, networking: vm_area_struct, pipe/socket info, page information

• FS-specific info (block no's, fids etc.): union { ext2fs_inode_info i_ext2; nfs_inode_info i_nfs; coda_inode_info i_coda; .. } u
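The union u lets one inode structure carry whichever private data its file system needs. A sketch with hypothetical private structs (ext2-like block numbers, an AFS-style FID); the explicit i_fstype tag is a simplification added here so the code can pick the right member, since the slides identify the FS through i_sb/i_op instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-FS private areas; layouts are illustrative. */
struct toy_ext2_info { uint32_t i_data[15]; };           /* block numbers */
struct toy_afs_info { uint32_t volume, vnode, uniq; };   /* AFS-style FID */

enum toy_fstype { FS_EXT2, FS_AFS };

struct toy_inode {
    unsigned long i_ino;
    enum toy_fstype i_fstype;          /* which member of u is valid */
    union {
        struct toy_ext2_info ext2_i;
        struct toy_afs_info afs_i;
    } u;
};

/* Return a number identifying where the data lives: the first disk block
 * for ext2, the vnode part of the FID for AFS. */
static uint32_t toy_data_handle(const struct toy_inode *inode)
{
    switch (inode->i_fstype) {
    case FS_EXT2: return inode->u.ext2_i.i_data[0];
    case FS_AFS:  return inode->u.afs_i.vnode;
    }
    assert(0 && "unknown file system type");
    return 0;
}
```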


Inode state

• An inode can be on one or two lists:
  – (hash & in_use) or (hash & dirty) or unused
  – an inode has a use count, i_count

• Transitions
  – unused → hash: iget calls sss_read_inode
  – dirty → in_use: sss_write_inode
  – hash → unused: calls sss_clear_inode, but if i_nlink = 0, iput calls sss_delete_inode when i_count falls to 0
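The transitions above can be modeled as a small state machine, under simplifying assumptions: a hypothetical toy_inode with a single state field instead of real list membership, and counting stubs in place of the sss_ methods. The last iput either deletes an unlinked inode or (after writing it if dirty) leaves it unused but cached; clear_inode runs later, when an unused inode is reclaimed.

```c
#include <assert.h>

enum toy_istate { I_IN_USE, I_DIRTY, I_UNUSED, I_GONE };

struct toy_inode {
    int i_count;                       /* users holding a reference */
    int i_nlink;                       /* directory entries naming the file */
    enum toy_istate state;
};

/* Hypothetical FS hooks; they only record that they ran. */
static int written, deleted, cleared;
static void fs_write_inode(struct toy_inode *i)  { written++; i->state = I_IN_USE; }
static void fs_delete_inode(struct toy_inode *i) { deleted++; (void)i; }
static void fs_clear_inode(struct toy_inode *i)  { cleared++; (void)i; }

/* iput: drop a reference; the last one decides what happens next. */
static void toy_iput(struct toy_inode *inode)
{
    assert(inode->i_count > 0);
    if (--inode->i_count > 0)
        return;                        /* other users remain */
    if (inode->i_nlink == 0) {
        fs_delete_inode(inode);        /* no names, no users: remove it */
        inode->state = I_GONE;
        return;
    }
    if (inode->state == I_DIRTY)
        fs_write_inode(inode);         /* dirty -> in_use */
    inode->state = I_UNUSED;           /* still hashed, reusable by iget */
}

/* Reclaiming an unused inode, e.g. when freeing inodes. */
static void toy_reclaim(struct toy_inode *inode)
{
    assert(inode->i_count == 0 && inode->state == I_UNUSED);
    fs_clear_inode(inode);             /* FS drops its private data */
    inode->state = I_GONE;
}
```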


Dirty inodes

[Diagram: the inode cache (Inode_hashtable; used, dirty and unused inodes) and FS storage]

1. iget: if i_count > 0, ++
2. iput: if i_count > 1, --
3. free_inodes
4. syncing inodes

Transitions shown: sss_read_inode (iget); mark_inode_dirty; sss_write_inode (sync one); sss_clear_inode (freeing inos) or sss_delete_inode (iput). Media fs only.
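The dirty-inode path (mark_inode_dirty, then syncing) can be sketched with a per-superblock list, assuming hypothetical toy_ types and a counting stub for sss_write_inode. Marking is idempotent, and a sync writes each dirty inode once and empties the list.

```c
#include <assert.h>
#include <stddef.h>

struct toy_inode {
    unsigned long i_ino;
    int i_dirty;
    struct toy_inode *i_dirty_next;    /* chain on the superblock's dirty list */
};

struct toy_super_block {
    struct toy_inode *s_dirty;         /* list of dirty inodes */
};

static int writes;                     /* counts writes to FS storage */

/* Stand-in for sss_write_inode. */
static void fs_write_inode(struct toy_inode *inode)
{
    writes++;                          /* a real media FS writes the disk inode */
    (void)inode;
}

/* mark_inode_dirty: queue the inode once. */
static void toy_mark_inode_dirty(struct toy_super_block *sb, struct toy_inode *inode)
{
    if (inode->i_dirty)
        return;                        /* already queued */
    inode->i_dirty = 1;
    inode->i_dirty_next = sb->s_dirty;
    sb->s_dirty = inode;
}

/* Syncing inodes: write every dirty inode and empty the list. */
static int toy_sync_inodes(struct toy_super_block *sb)
{
    int n = 0;
    while (sb->s_dirty != NULL) {
        struct toy_inode *inode = sb->s_dirty;
        sb->s_dirty = inode->i_dirty_next;
        inode->i_dirty_next = NULL;
        inode->i_dirty = 0;
        fs_write_inode(inode);
        n++;
    }
    assert(writes >= n);
    return n;
}
```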


Sales

Red Hat Software sold 240,000 copies of Red Hat Linux in 1997 and expects to reach 400,000 in 1998.

Estimates of installed servers (InfoWorld):
– Linux: 7 million
– OS/2: 5 million
– Macintosh: 1 million


Inode operations (iii_)

• lookup: return inode
  – calls iget

• creation/removal
  – create, link, unlink, symlink, mkdir, rmdir, mknod, rename

• symbolic links
  – readlink, follow_link

• pages
  – readpage, writepage, updatepage: read or write a page; generic for media FS
  – bmap: return the disk block number of a logical block

• special operations
  – revalidate: see dentry section
  – truncate
  – permission


Dentry world

• A dentry is a name-to-inode translation structure

• Cached aggressively by the VFS

• Eliminates lookups by the FS and its private caches
  – timing on Coda FS: ls -lR of 1000 files after priming the cache
    • Linux 2.0.32: 7.2 secs
    • Linux 2.1.92: 0.6 secs
  – disk FS: less benefit; NFS: even more

• Negative entries!

• namei is dramatically simplified
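Negative entries cache the answer "this name does not exist", so a repeated failing lookup never reaches the file system. A sketch with a hypothetical fixed-size name cache where ino == 0 marks a negative entry (the kernel instead uses a dentry whose d_inode is NULL); fs_lookup is a stub in which only "passwd" exists.

```c
#include <assert.h>
#include <string.h>

#define NCACHE 8

struct toy_dentry { char name[32]; unsigned long ino; int valid; };

static struct toy_dentry cache[NCACHE];
static int fs_lookups;                 /* calls into the slow FS lookup */

/* Stand-in for iii_lookup: only "passwd" exists, as inode 42. */
static unsigned long fs_lookup(const char *name)
{
    fs_lookups++;
    return strcmp(name, "passwd") == 0 ? 42 : 0;
}

static unsigned slot(const char *name)
{
    unsigned h = 0;
    while (*name)
        h = h * 31 + (unsigned char)*name++;
    return h % NCACHE;
}

/* Returns the inode number, or 0 for "no such file". */
static unsigned long cached_lookup(const char *name)
{
    assert(strlen(name) < sizeof cache[0].name);
    struct toy_dentry *d = &cache[slot(name)];
    if (d->valid && strcmp(d->name, name) == 0)
        return d->ino;                 /* hit, positive or negative */
    unsigned long ino = fs_lookup(name);
    strcpy(d->name, name);             /* replace whatever held the slot */
    d->ino = ino;                      /* 0 makes this a negative entry */
    d->valid = 1;
    return ino;
}
```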


Inside dentries

• name

• pointer to inode

• pointer to parent dentry

• list head of children

• chains for lots of lists

• use count
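The parent pointer is enough to rebuild a full path. A sketch, similar in spirit to (but not the same as) the kernel's d_path: walk d_parent up to the root, filling the buffer from the end. toy_dentry and toy_dentry_path are hypothetical; the root is the dentry whose parent is itself.

```c
#include <assert.h>
#include <string.h>

struct toy_dentry {
    const char *d_name;
    struct toy_dentry *d_parent;       /* the root points to itself */
    int d_count;                       /* use count (unused in this sketch) */
};

/* Returns a pointer into buf holding the path, or NULL if buf is too small. */
static char *toy_dentry_path(const struct toy_dentry *d, char *buf, size_t len)
{
    assert(len > 0);
    char *p = buf + len - 1;
    *p = '\0';
    if (d->d_parent == d) {            /* the root itself */
        if (p == buf)
            return NULL;
        *--p = '/';
        return p;
    }
    for (; d->d_parent != d; d = d->d_parent) {
        size_t n = strlen(d->d_name);
        if ((size_t)(p - buf) < n + 1) /* room for "/name"? */
            return NULL;
        p -= n;
        memcpy(p, d->d_name, n);
        *--p = '/';
    }
    return p;
}
```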


Dentry associated lists

[Diagram: dentry/inode relationship (d_inode pointers) and dentry tree relationship (d_parent pointers)]

• d_alias chains, on the inode's i_dentry list head: placed by d_instantiate, removed by dentry_iput

• d_child chains: placed by d_alloc, removed by d_prune, d_invalidate, d_put
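These chains are intrusive lists: the list node lives inside the dentry and container_of recovers the dentry from it. A self-contained sketch in the style of the kernel's list_head (re-implemented here, not included from the kernel), with hypothetical toy_inode/toy_dentry types and a toy_d_instantiate that puts a dentry on its inode's alias list.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add(struct list_head *n, struct list_head *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

static void list_del(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = n->prev = n;
}

struct toy_inode { struct list_head i_dentry; };  /* head of aliases */
struct toy_dentry {
    const char *d_name;
    struct toy_inode *d_inode;
    struct list_head d_alias;                     /* chain on i_dentry */
};

/* Attach a dentry to an inode's alias list. */
static void toy_d_instantiate(struct toy_dentry *d, struct toy_inode *inode)
{
    d->d_inode = inode;
    list_add(&d->d_alias, &inode->i_dentry);
}

/* Count the cached names (aliases) that refer to one inode. */
static int count_aliases(struct toy_inode *inode)
{
    int n = 0;
    for (struct list_head *p = inode->i_dentry.next; p != &inode->i_dentry; p = p->next) {
        struct toy_dentry *d = container_of(p, struct toy_dentry, d_alias);
        assert(d->d_inode == inode);
        n++;
    }
    return n;
}
```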


Dcache

[Diagram: dentry_hashtable (d_hash chains, list head at dhash(parent, name)) and unused dentries (d_lru chains); entries go in via namei → iii_lookup → d_add and come out via prune, d_invalidate, d_drop]

• namei tries the cache: d_lookup
  – ddd_compare

• Success: ddd_revalidate
  – d_invalidate if it fails
  – proceed if it succeeds

• Failure: iii_lookup
  – find inode: iget → sss_read_inode
  – finish: d_add
  – can give a negative entry in the dcache
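The success/failure flow above can be traced in a deliberately tiny sketch: a single cached entry, a revalidate that compares a version number (a hypothetical stand-in for a Coda/AFS callback break or NFS aging), a d_invalidate that drops a stale entry, and a stub in place of iii_lookup. Everything here is illustrative, not the kernel's dcache.

```c
#include <assert.h>
#include <string.h>

struct toy_dentry {
    char name[16];
    unsigned long ino;                 /* 0 = negative entry */
    int version;                       /* FS version seen at lookup time */
    int in_hash;
};

static int server_version = 1;         /* bumped when cached state goes stale */
static int fs_lookups;

/* ddd_revalidate stand-in: 1 = entry still good. */
static int toy_revalidate(const struct toy_dentry *d)
{
    return d->version == server_version;
}

/* iii_lookup stand-in: pretend the inode number is the name length. */
static unsigned long toy_iii_lookup(const char *name)
{
    fs_lookups++;
    return (unsigned long)strlen(name);
}

static unsigned long toy_namei(struct toy_dentry *d, const char *name)
{
    assert(strlen(name) < sizeof d->name);
    if (d->in_hash && strcmp(d->name, name) == 0) {  /* d_lookup hit */
        if (toy_revalidate(d))
            return d->ino;                           /* proceed */
        d->in_hash = 0;                              /* d_invalidate */
    }
    strcpy(d->name, name);                           /* miss: ask the FS */
    d->ino = toy_iii_lookup(name);
    d->version = server_version;
    d->in_hash = 1;                                  /* d_add */
    return d->ino;
}
```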


Dentry methods

• ddd_revalidate: can force new lookup

• ddd_hash: compute hash value of name

• ddd_compare: are names equal?

• ddd_delete, ddd_put, ddd_iput: FS cleanup opportunity


Dentry particulars:

• ddd_hash and ddd_compare have to deal with extraordinary cases for msdos/vfat:
  – case insensitivity
  – long and short filename pleasantries

• ddd_revalidate can force a new lookup if the inode is not in use:
  – used for NFS/SMBfs aging
  – used for Coda/AFS callbacks
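For a case-insensitive file system, ddd_hash and ddd_compare must agree: names that compare equal have to land in the same hash chain. A sketch of that pair using ASCII case folding only; ci_hash, ci_compare and ci_match are hypothetical, and real msdos/vfat code also handles code pages and long/short name rules.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* ddd_hash stand-in: hash a name with ASCII case folded. */
static unsigned long ci_hash(const char *name, size_t len)
{
    unsigned long h = 0;
    for (size_t i = 0; i < len; i++)
        h = h * 31 + (unsigned long)tolower((unsigned char)name[i]);
    return h;
}

/* ddd_compare stand-in: 0 when the names are equal ignoring case. */
static int ci_compare(const char *a, size_t alen, const char *b, size_t blen)
{
    if (alen != blen)
        return 1;
    for (size_t i = 0; i < alen; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 1;
    return 0;
}

/* The invariant the dcache relies on: equal names hash alike. */
static int ci_match(const char *a, size_t alen, const char *b, size_t blen)
{
    int eq = ci_compare(a, alen, b, blen) == 0;
    assert(!eq || ci_hash(a, alen) == ci_hash(b, blen));
    return eq;
}
```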


Style

"Dijkstra probably hates me" - Linus Torvalds


Memory mapping

• vm_area structure has
  – vm_operations
  – inode, addresses etc.

• vm_operations
  – map, unmap
  – swapin, swapout
  – nopage: read when a page isn't in VM

• mmap
  – calls on iii_readpage
  – keeps a use count on the inode until unmap
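From user space, the mapping described above is set up with POSIX mmap(). A sketch that maps a file read-only and counts one byte value; pages are brought in on first touch rather than by read(). Closing the descriptor right after mmap() is safe because the mapping keeps its own reference to the file until munmap(). count_byte_mapped is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Count occurrences of byte c in the file at path; -1 on error. */
static long count_byte_mapped(const char *path, char c)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {             /* mmap of length 0 is an error */
        close(fd);
        return 0;
    }
    size_t len = (size_t)st.st_size;
    const char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                         /* the mapping stays valid */
    if (p == MAP_FAILED)
        return -1;
    long n = 0;
    for (size_t i = 0; i < len; i++)   /* touching pages faults them in */
        if (p[i] == c)
            n++;
    munmap((void *)p, len);
    return n;
}
```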