FILE SYSTEM TOPICS Lei Xu
FILE SYSTEM TOPICS Lei Xu. Agenda Introduction VFS Optimizations Examples F&Q.

Mar 28, 2015

FILE SYSTEM TOPICS

Lei Xu

Agenda

Introduction VFS Optimizations Examples F&Q

Introduction

“A file system is a means to organize data expected to be retained after a program terminates by providing procedures to store, retrieve and update data, as well as manage the available space on the device(s) which contain it.” (from Wikipedia)
- Store data
- Organize data
- Access data
- Manage storage resources (e.g. hard drive)

Relationship to Architecture Course

Acknowledgement: slides from the 830 course

Relationship to Architecture Course

- The file system sits between memory and secondary storage (or remote servers)
- One of the most complex parts of an operating system
- Main R&D focuses:
  - Performance: throughput, latency, scalability
  - Reliability and availability
  - Management: snapshots, etc.

Acknowledgement: slides from the 830 course

Different types of file systems

- Local file systems
  - Store data on local hard drives, SSDs, floppy drives, optical disks, etc.
  - Examples: NTFS, EXT4, HFS+, ZFS
- Network/distributed file systems
  - Store data on remote file server(s)
  - Examples: NFS, CIFS/Samba, AFP, Hadoop DFS, Ceph
- Pseudo file systems
  - Examples: procfs, devfs, tmpfs
- “List of file systems”: http://en.wikipedia.org/wiki/List_of_file_systems

Agenda

Introduction VFS Optimizations Examples F&Q

Overall Architecture of Linux file system components

Acknowledgement: “Anatomy of the Linux file system”, IBM developerWorks.

Virtual File System (VFS)

- VFS is the essential concept in UNIX-like file systems
  - Specifies an interface between the kernel and a concrete file system
  - Introduced by Sun in 1985
- Passes system calls to the underlying file systems
  - E.g. pass sys_write() to Ext4 (i.e. ext4_write())
- Three major kinds of metadata in VFS
  - Metadata: the data about data (Wikipedia)
  - Super block, dentry and inode
  - OO design: each component defines a set of data members and the functions to access them
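
The dispatch idea can be sketched in a few lines of Python. This is only an illustration, not kernel code: the mount table, `FileOps`, `find_mount()` and the toy `tmpfs_write()` are made-up names, and `ext4_write()` is a stand-in that just reports how many bytes it was given.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class FileOps:
    """Table of operations a concrete file system registers with the VFS."""
    name: str
    write: Callable[[str, bytes], int]


def ext4_write(path: str, data: bytes) -> int:
    # Stand-in for the real Ext4 write path; it only reports the byte count.
    return len(data)


def tmpfs_write(path: str, data: bytes) -> int:
    # A second toy backend, to show that callers never change.
    return len(data)


# Hypothetical mount table: mount point -> operations of the mounted file system.
MOUNTS: Dict[str, FileOps] = {
    "/": FileOps("ext4", ext4_write),
    "/tmp": FileOps("tmpfs", tmpfs_write),
}


def find_mount(path: str) -> FileOps:
    """Pick the longest mount point that is a prefix of the path."""
    best = "/"
    for mount_point in MOUNTS:
        prefix = mount_point.rstrip("/") + "/"
        matches = path == mount_point or path.startswith(prefix)
        if matches and len(mount_point) > len(best):
            best = mount_point
    return MOUNTS[best]


def sys_write(path: str, data: bytes) -> str:
    """Generic entry point: dispatch to whichever file system owns the path."""
    ops = find_mount(path)
    written = ops.write(path, data)
    return f"{ops.name}:{written}"
```

The caller only ever uses `sys_write()`; which concrete function runs depends on where the path is mounted.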

Super block

- A segment of metadata that describes a file system
- Constructed when a file system is mounted
- Usually, a persistent copy of the super block is stored at the beginning of the storage device
- Describes:
  - File system type, size, status (e.g. dirty bit, read-only bit)
  - Block size, max file bytes, device size, ...
  - How to find other metadata and data
  - How to manipulate these data (i.e. sb_ops)
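
As a rough sketch of "read the super block at mount time", the Python below parses a tiny header from the start of a device image. The layout (`SB_FORMAT`), the magic value and the flag bits are invented for illustration and do not match any real file system.

```python
import struct

# Hypothetical on-disk layout (little-endian): magic, block size, block count, flags.
SB_FORMAT = "<4sIQI"
SB_MAGIC = b"TOYF"
FLAG_DIRTY = 0x1
FLAG_READ_ONLY = 0x2


def make_superblock(block_size: int, block_count: int, flags: int = 0) -> bytes:
    """Serialize the toy super block into its on-disk form."""
    return struct.pack(SB_FORMAT, SB_MAGIC, block_size, block_count, flags)


def mount(device: bytes) -> dict:
    """Parse the super block stored at the beginning of the device image."""
    size = struct.calcsize(SB_FORMAT)
    magic, block_size, block_count, flags = struct.unpack(SB_FORMAT, device[:size])
    if magic != SB_MAGIC:
        raise ValueError("unknown file system type")
    return {
        "block_size": block_size,
        "device_bytes": block_size * block_count,
        "dirty": bool(flags & FLAG_DIRTY),
        "read_only": bool(flags & FLAG_READ_ONLY),
    }
```

A set dirty bit at mount time would typically mean the file system was not unmounted cleanly and needs checking.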

Inode

- “Index node” in Unix-style file systems
- All information about one file (or directory), except its name
  - E.g. owner, access rights, mode, size, times, etc.
  - In UNIX-like systems, file names are stored in the directory file: its content is an “array” of file names
- Pointers to data
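
One way to see that the name is not part of the inode is to give a file a second name. The sketch below uses only standard `os` calls; the helper name `names_share_inode()` and the file names are made up, and it assumes the temporary directory lives on a file system that supports hard links.

```python
import os
import tempfile


def names_share_inode() -> tuple:
    """Create one file, give it a second name, and compare the inodes."""
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "report.txt")
        second = os.path.join(tmp, "alias.txt")
        with open(first, "w") as f:
            f.write("hello")
        os.link(first, second)  # a second directory entry for the same inode
        a, b = os.stat(first), os.stat(second)
        # Same inode number, link count of 2, and the size seen via the alias.
        return a.st_ino == b.st_ino, a.st_nlink, b.st_size
```

Both names resolve to the same inode, so owner, size and timestamps are shared; only the directory entries differ.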

Directory Entry (dentry)

- A dentry conceptually maps a file name to its corresponding inode
- Each file/directory has a dentry representing it
- File systems use dentries to look up a file in the hierarchical namespace
- Each dentry has a pointer to the dentry of its parent directory
- Each dentry of a directory has a list of the dentries of its sub-directories and files
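
The parent pointer and child list can be sketched as a small tree. This is a toy model: the `Dentry` class, the `lookup()` helper and the inode numbers are all assumptions made for illustration.

```python
from typing import Dict, Optional


class Dentry:
    """Toy dentry: a name, an inode number, a parent link and child links."""

    def __init__(self, name: str, inode: int, parent: Optional["Dentry"] = None):
        self.name = name
        self.inode = inode
        self.parent = parent
        self.children: Dict[str, "Dentry"] = {}
        if parent is not None:
            parent.children[name] = self

    def path(self) -> str:
        """Rebuild the full path by following parent pointers up to the root."""
        parts = []
        node = self
        while node.parent is not None:
            parts.append(node.name)
            node = node.parent
        return "/" + "/".join(reversed(parts))


def lookup(root: Dentry, path: str) -> Optional[Dentry]:
    """Walk the tree one path component at a time, starting at the root."""
    node = root
    for component in path.strip("/").split("/"):
        if not component:
            continue
        node = node.children.get(component)
        if node is None:
            return None
    return node


root = Dentry("/", inode=2)
users = Dentry("Users", 10, root)
john = Dentry("john", 11, users)
```

Here `lookup(root, "/Users/john")` walks two child links downward, while `john.path()` follows two parent pointers upward.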

Agenda

Introduction VFS Optimizations Examples F&Q

Optimizations

Most file system optimizations are designed around the characteristics of the memory hierarchy and storage devices. Recall:
- RAM: 50-100 ns
- Disks: 5-10 ms
- Roughly 5 orders of magnitude difference (5 ms / 100 ns = 50,000)
- Almost all widely used local file systems are designed for hard disk drives, which have their own unique characteristics

Hard Disk Drive (HDD)

- Stores data on one or more rotating disks coated with magnetic material
- Introduced by IBM in 1956
- Uses magnetic heads to read data

The very early HDD…..

Acknowledge to:

HDD (Cont’d)

- The essential structure of the HDD has not changed much…
  - It consists of several disks
  - Each disk is divided into tracks, each of which is then divided into sectors
- The single most significant factor: seek time

Why seek time matters

- To access data (a sector), the HDD head must first move to the right track (seek time), then wait for the disk to rotate until the sector is under the head (rotational time)
- Seek time: 3 ms on high-end server disks, 12 ms on desktop-level disks [1]
- Rotational time (average): 5.56 ms on a 5400 RPM HDD, 4.17 ms on a 7200 RPM HDD [1]
- As a result, sequential IO is much faster than random IO, because consecutive sectors need almost no additional seek or rotational time

[1] http://en.wikipedia.org/wiki/Disk-drive_performance_characteristics
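
The rotational figures above follow from simple arithmetic: on average the head waits half a revolution. A small sketch (the function names are made up, and the IOPS estimate ignores transfer time and any caching in the drive):

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average wait is half a revolution: 0.5 * (60,000 ms / RPM)."""
    return 0.5 * 60_000.0 / rpm


def random_iops(seek_ms: float, rpm: float) -> float:
    """Rough upper bound on random IOs per second, ignoring transfer time."""
    return 1000.0 / (seek_ms + avg_rotational_latency_ms(rpm))
```

With a 12 ms seek and a 7200 RPM disk, `random_iops(12, 7200)` comes out at roughly 60 random IOs per second, which is why avoiding seeks matters so much.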

General Optimizations

- Based on two principles:
  - RAM access is much faster than disk access
  - Sequential IO is much faster than random IO on disk
- So we design file systems that
  - Make heavy use of CPU/RAM to reduce IO to disks (various caches/write buffers)
  - Prefer sequential IO
  - Compute the disk layout so that related data is located sequentially on disk

Dcache

- Dentry cache (dcache)
- Directories are stored as files on disk
- For each file lookup, we want to obtain the inode from the given full file path
  - The OS looks up the dentries from the root through all parent directories in the path
  - E.g. to look up the file “/Users/john/Documents/course.pdf”, the OS needs to traverse the dentries that represent “/”, “Users”, “john”, “Documents”, and “course.pdf”
- To accelerate this:
  - We use a global hash table (dcache) to map “file path” -> dentry
  - A two-list solution: one list for active dentries, and one for “recently unused dentries” (LRU)
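
A minimal sketch of the caching idea, assuming a single LRU-ordered hash table instead of the two lists described above. `DentryCache` and `slow_lookup` are hypothetical names, and keying on the full path follows the slide; the Linux kernel, for instance, hashes on the parent dentry plus the component name instead.

```python
from collections import OrderedDict
from typing import Callable, Optional


class DentryCache:
    """Toy dcache: full path -> inode number, evicting the least recently used."""

    def __init__(self, capacity: int, slow_lookup: Callable[[str], Optional[int]]):
        self.capacity = capacity
        self.slow_lookup = slow_lookup  # stands in for walking directories on disk
        self.entries: "OrderedDict[str, int]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, path: str) -> Optional[int]:
        if path in self.entries:
            self.hits += 1
            self.entries.move_to_end(path)  # mark as most recently used
            return self.entries[path]
        self.misses += 1
        inode = self.slow_lookup(path)
        if inode is not None:
            self.entries[path] = inode
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # drop the LRU entry
        return inode
```

Repeated lookups of the same path are served from the hash table; only misses pay for the directory walk.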

Inode cache

- Similar to the dcache, the OS maintains a cache for inode objects
- Each inode object has a 1-to-1 relation to a dentry (ignoring hard links, where several names share one inode)
- If the dentry object is evicted, its inode is evicted too

Page Cache

- “…a ‘transparent’ buffer for disk-backed pages kept in RAM for fast access…” [Wikipedia]
- A write-back cache
- Main purpose: reducing the number of IOs to disk
- Accessed in units of pages (usually 4 KB)
- The page cache is per-file: a radix tree in the inode object
  - Prefetches pages to serve future reads
  - Absorbs writes to reduce the number of IOs
- Dirty (modified) pages are flushed to disk 1) every 30 s or 5 s, or 2) when the OS wants to reclaim RAM
- Flushing can also be forced by calling the “fsync()” system call
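
How a write-back cache absorbs writes can be sketched with a toy model. Everything here is an assumption for illustration: the `PageCache` class, a plain dict standing in for the disk, and a dict of pages instead of the kernel's radix tree. Periodic flushing and memory reclaim are left out; only the explicit flush is modeled.

```python
PAGE_SIZE = 4096


class PageCache:
    """Toy write-back cache for one file, indexed by page number."""

    def __init__(self, disk: dict):
        self.disk = disk          # page number -> bytes, stands in for the device
        self.pages = {}           # cached pages (the kernel uses a radix tree)
        self.dirty = set()        # pages modified since the last flush
        self.disk_reads = 0
        self.disk_writes = 0

    def _load(self, page_no: int) -> bytearray:
        """Return the cached page, reading it from disk on first use."""
        if page_no not in self.pages:
            self.disk_reads += 1
            data = self.disk.get(page_no, bytes(PAGE_SIZE))
            self.pages[page_no] = bytearray(data)
        return self.pages[page_no]

    def write(self, offset: int, data: bytes) -> None:
        """Modify cached pages only; nothing reaches the disk yet."""
        for i, byte in enumerate(data):
            page_no, page_off = divmod(offset + i, PAGE_SIZE)
            self._load(page_no)[page_off] = byte
            self.dirty.add(page_no)

    def read(self, offset: int, length: int) -> bytes:
        """Serve reads from cached pages, loading missing ones."""
        out = bytearray()
        for pos in range(offset, offset + length):
            page_no, page_off = divmod(pos, PAGE_SIZE)
            out.append(self._load(page_no)[page_off])
        return bytes(out)

    def fsync(self) -> None:
        """Write every dirty page back, one disk IO per page."""
        for page_no in sorted(self.dirty):
            self.disk[page_no] = bytes(self.pages[page_no])
            self.disk_writes += 1
        self.dirty.clear()
```

A hundred one-byte writes to the same page stay in RAM and cost a single disk write once `fsync()` is called.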

Agenda

Introduction VFS Optimizations Examples F&Q

Examples

- Several concrete file system designs
  - Ext4: classic UNIX-like file system concepts
  - NTFS: the advanced Windows file system
  - ZFS: “the last word in file systems”
  - NFS: a standard network file system
  - Google File System: a special distributed file system for special requirements

Ext4

- The latest version of the “extended file system” (Ext2/3/4)
- The standard Linux file system for a long time
- Inspired by UFS from BSD/Solaris
  - Groups files into block groups
  - Keeps file data near its inodes

Ack: http://bit.ly/tjipWY

NTFS

- “New Technology File System” (NTFS)
- The standard file system in the Windows world
- A Master File Table (MFT) contains all metadata
- A directory is also a file

ZFS

- ZFS: “the last word in file systems”
- The most advanced local file system in production
- 128-bit address space (2^128 bytes in theory)
  - Larger than the number of grains of sand on Earth…
- A lot of advanced features:
  - E.g. transactional commits, end-to-end data integrity, snapshots, volume management and much more…
- Designed to never lose data and to always stay consistent on disk
- Every OS community wants to clone or copy its features…
  - Btrfs on Linux, ReFS on Windows, ZFS on FreeBSD

NFS

- “Network File System” (NFS)
- A protocol developed by Sun in 1984
- A set of RPC calls
- IETF standard
- Supported by all major OSs
- Simple and efficient

Google File System (GFS)

- A large distributed file system specially designed for large-scale data processing (e.g. the MapReduce framework)
  - High throughput
  - High availability
- Specially designed: not compatible with the VFS/POSIX API
  - Requires clients to be linked against the GFS library
- Hadoop DFS clones the concepts of GFS

More File Systems

- Interesting file systems that are worth exploring
  - Btrfs (B-tree FS) from Oracle, expected to be the next standard Linux file system. Many concepts are shared with ZFS.
  - ReFS: the file system for Windows 8 (from Microsoft). Many concepts are shared with ZFS (too!).
  - WAFL (Write Anywhere File Layout) file system from NetApp.
  - FUSE (Filesystem in Userspace): a cross-platform library that allows developers to write file systems that run in user mode

Thanks

FAQ?