More on Disks and File Systems 1 CS502 Spring 2006 More on Disks and File Systems CS-502 Operating Systems Spring 2006

Dec 20, 2015


More on Disks and File Systems

CS-502 Operating Systems

Spring 2006


Review – Disks

• Implementer of File abstraction
• Storage of large amounts of data for very long times
  – Persistence, reliability
• Controlled like I/O devices, but integral part of information storage subsystem
• Rapidly increasing capacities, dropping prices
  – $0.5–$6.0 per gigabyte
• Slowly improving transfer rates, seek performance
  – Only a factor of 5-10 in three decades!


Review – Disks (continued)

• Organized into cylinders, tracks, sectors
• Random access
  – Any sector can be read & written independently of any other
  – Very high bandwidth for consecutive reads or writes
  – Seek time is (often) dominating factor in performance
• Bad blocks are a fact of life
  – Most detected during formatting
  – Some occur during operation
  – Controller or OS must step around them
• Seek optimization algorithms
  – Popular study topics, less popular in real systems
  – Long seek queues ⇒ system is out of balance


Review – File Systems

• Fundamental abstraction for persistent storage
  – Usually organized as linear array of bytes
  – Any sequence of bytes may be read or overwritten
• Extreme performance demands
  – Many small files vs. a few humongous files
• Fundamental ambiguity
  – Is file the “information” or the “container”?
  – OS sees the container; users focus on information
• Many attributes
  – Stored in file metadata associated with file


Review – File Systems (continued)

• Operations
  – Open, Close; Read, Write, Truncate; Seek, Tell
  – Create; Destroy
• Access methods
  – Sequential
  – Random
  – Indexed (not used very much any more)
• Structure imposed by applications
  – Databases, libraries, executable images


Review – Directories

• Special kind of file
  – Tool for users to organize files
  – Tool for system to find file containers
• Organization
  – Single level, two level, hierarchical
• Directory operations
  – Create, Destroy; Add entry, Remove entry
  – Find, List, Rename; Link, Unlink
• Links
  – Soft (symbolic) links in Unix, Windows
  – Hard links in Unix (reference counted in metadata)


Review – File System Implementation

• Contiguous (with optional extents)
  – Very efficient for large files (e.g., databases)
  – Prone to space fragmentation for many small files
  – Bad blocks must be concealed by OS or controller
• Linked
  – No space fragmentation; lots of seek fragmentation
  – Sequential access only
  – FAT (File Allocation Table): pseudo-random access
• Indexed
  – i-node (index block) points to every block of file
  – Fast random access
  – Scales easily from small to large
  – No space fragmentation; lots of seek fragmentation
• Defragmentation
  – Remapping linked, FAT, or indexed files to minimize seek time


Additional Topics

• Implementation of Directories

• CD-ROM devices and file systems

• RAID – Redundant Array of Inexpensive Disks

• Stable Storage

• Log Structured File Systems


Implementation of Directories

• A list of [name, information] pairs
  – Must be scalable from very few entries to very many
• Name:
  – User-friendly, variable length, any language
  – Fast access by name
• Information:
  – File metadata (itself)
  – Pointer to file metadata block (or i-node) on disk
  – Pointer to first & last blocks of file
  – Pointer to extent block(s)
  – …


Very Simple Directory

• Short, fixed length names
• Attribute & disk addresses contained in directory
• MS-DOS, etc.

name1 attributes

name2 attributes

name3 attributes

name4 attributes

… …


Simple Directory

• Short, fixed length names
• Attributes in separate blocks (e.g., i-nodes)
• Attribute pointers are disk addresses (or i-node numbers)
• Older Unix versions

name1 → i-node
name2 → i-node
name3 → i-node
name4 → i-node

(i-nodes: data structures containing attributes)


More Interesting Directory

• Variable length file names
  – Stored in heap at end
• Modern Unix, Windows
• Linear or logarithmic search for name
• Compaction needed after
  – Deletion, Rename

attributes

attributes

attributes

attributes

… …

name1 longer_name3 very_long_name4 name2 …


Very Large Directories

• Hash-table implementation

• Each hash chain like a small directory with variable-length names

• Must be sorted for listing
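
The three bullets above can be sketched in a few lines. This is a toy model, not any real file system's on-disk format: the class name `HashedDirectory`, the bucket count, and the use of Python's built-in `hash` are all assumptions made for illustration.

```python
class HashedDirectory:
    """Toy directory: a hash table whose chains are tiny variable-length-name directories."""

    def __init__(self, n_buckets=64):
        # Each bucket (hash chain) is a small list of (name, inode_number) pairs.
        self.buckets = [[] for _ in range(n_buckets)]

    def _chain(self, name):
        return self.buckets[hash(name) % len(self.buckets)]

    def add(self, name, inode_number):
        chain = self._chain(name)
        for i, (existing, _) in enumerate(chain):
            if existing == name:
                chain[i] = (name, inode_number)  # replace an existing entry
                return
        chain.append((name, inode_number))

    def lookup(self, name):
        # Only one short chain is searched, not the whole directory.
        for existing, inode_number in self._chain(name):
            if existing == name:
                return inode_number
        return None

    def remove(self, name):
        chain = self._chain(name)
        chain[:] = [entry for entry in chain if entry[0] != name]

    def listing(self):
        # Hash order is arbitrary, so names must be sorted for listing.
        return sorted(name for chain in self.buckets for name, _ in chain)


d = HashedDirectory(n_buckets=4)
for name, ino in [("notes.txt", 12), ("a.out", 7), ("very_long_name4", 31)]:
    d.add(name, ino)
print(d.lookup("a.out"), d.listing())
```

Lookup cost depends on chain length rather than directory size, which is the point of the hash table; the price is the explicit sort in `listing`.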


File System Implementation – Free Space Management

• Bitmap
  – Very compact on disk
  – Expensive to search
• Free list
  – Linked list of free blocks
  – Only head of list needs to be cached in memory
  – Larger than bitmap:
    • Consumes 1/n of free space
    • List grows and shrinks inversely with allocating or freeing blocks
  – Very fast to search and allocate
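
To make the trade-off concrete, here is a minimal in-memory sketch of both schemes. The class names are hypothetical, and the dictionary `next_free` only stands in for the link that a real free list would store inside each free block on disk.

```python
class BitmapAllocator:
    """One flag per block: compact, but allocation scans for a free bit."""

    def __init__(self, n_blocks):
        self.free = [True] * n_blocks

    def allocate(self):
        for block, is_free in enumerate(self.free):  # linear search
            if is_free:
                self.free[block] = False
                return block
        raise RuntimeError("no free blocks")

    def release(self, block):
        self.free[block] = True


class FreeListAllocator:
    """Linked list of free blocks: only the head is needed to allocate."""

    def __init__(self, n_blocks):
        self.next_free = {}   # stands in for links stored in the free blocks
        self.head = None
        for block in reversed(range(n_blocks)):
            self.release(block)

    def allocate(self):
        if self.head is None:
            raise RuntimeError("no free blocks")
        block = self.head
        self.head = self.next_free.pop(block)  # constant time, no search
        return block

    def release(self, block):
        self.next_free[block] = self.head      # push onto the front
        self.head = block


bitmap = BitmapAllocator(4)
free_list = FreeListAllocator(4)
print(bitmap.allocate(), free_list.allocate())
```

The free list shrinks as blocks are allocated and grows as they are freed, so its space cost comes out of space that is unused anyway.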


CD-ROMs

• See Tanenbaum, pp. 306-310
• Audio CD
  – Molded polycarbonate
  – 120 mm diameter with 15 mm hole
  – One single spiral track
    • Starts in center, spirals outward
    • 22,188 revolutions, approx 5.6 kilometers long
  – Constant linear velocity under read head
    • Audio playback: 120 cm/sec
    • Variable speed motor: 200–530 rpm
• ISO standard IS 10149, aka the Red Book


CD-ROM (continued)

• Problem for adapting to data usage
  – No bad block recovery capability!
• ISO standard for data: Yellow Book
  – Three levels of error-correcting schemes: Symbol, Frame, Sector
    • ~7200 bytes to record 2048 byte payload per sector
  – Mode 2: less error correction in exchange for more data rate
    • Audio and video data
  – Sectors linearly numbered from center to edge
• Read speed
  – 1x ~ 153,000 bytes/sec
  – 40x ~ 5.9 megabytes/sec
• ISO standard for multi-media: Green Book
  – Interleaved audio, video, data in same sector
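
The read-speed figures can be checked with a short calculation. The 75 sectors per second at 1x is an assumption brought in from the CD format (it is not stated on the slide), and "megabytes" is taken here to mean 2^20 bytes.

```python
SECTORS_PER_SECOND_1X = 75   # assumption: standard 1x CD sector rate
PAYLOAD_BYTES = 2048         # data payload per sector, from the slide

def cdrom_rate(speed_multiple):
    """Payload bytes per second delivered at the given speed multiple."""
    return speed_multiple * SECTORS_PER_SECOND_1X * PAYLOAD_BYTES

rate_1x = cdrom_rate(1)
rate_40x = cdrom_rate(40)
rate_40x_mib = rate_40x / 2**20

print(rate_1x, rate_40x, round(rate_40x_mib, 1))
```

Under these assumptions 1x comes out slightly above the slide's rounded 153,000 bytes/sec, and 40x rounds to the quoted 5.9 megabytes/sec.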


CD-ROM File System: ISO 9660 – High Sierra

• Write once; contiguous file allocation
• Variable length directories
• Variable length directory entries
  – Points to first sector of file
  – File size and metadata stored in directory entry
  – Variable length names
• Several extensions to standard for additional features


Break


Problem

• Question:
  – If mean time to failure of a disk drive is 100,000 hours,
  – and if your system has 100 identical disks,
  – what is mean time between drive replacement?

• Answer:
  – 1000 hours (i.e., 41.67 days ≈ 6 weeks)

• I.e.:
  – You lose 1% of your data every 6 weeks!

• But don’t worry – you can restore most of it from backup!
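
The arithmetic behind the answer, as a small sketch. It assumes the disks fail independently at a constant rate, so the failure rates simply add; the variable names are illustrative.

```python
disk_mttf_hours = 100_000   # per-disk mean time to failure, from the slide
n_disks = 100               # number of identical disks, from the slide

# With independent, constant-rate failures, n disks fail n times as often.
hours_between_replacements = disk_mttf_hours / n_disks
days_between_replacements = hours_between_replacements / 24
weeks_between_replacements = days_between_replacements / 7

print(hours_between_replacements,
      round(days_between_replacements, 2),
      round(weeks_between_replacements, 1))
```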


Can we do better?

• Yes, mirrored
  – Write every block twice, on two separate disks
  – Mean time between simultaneous failure of both disks is 57,000 years

• Can we do even better?
  – E.g., use fewer extra disks?
  – E.g., get more performance?
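
One way to arrive at a figure of this size is the usual textbook estimate MTTF² / (2 × MTTR): data is lost only if the second disk fails while the first is being repaired. The slide does not state a repair time; the 10 hours used below is an assumption, chosen because it reproduces the 57,000-year figure.

```python
disk_mttf_hours = 100_000   # from the earlier problem slide
repair_hours = 10           # assumption: not stated on the slide
HOURS_PER_YEAR = 24 * 365

# Either of the two disks may fail first (factor 2); loss requires the
# partner to fail within the repair window.
mean_hours_to_data_loss = disk_mttf_hours ** 2 / (2 * repair_hours)
mean_years_to_data_loss = mean_hours_to_data_loss / HOURS_PER_YEAR

print(mean_hours_to_data_loss, round(mean_years_to_data_loss))
```

The estimate also assumes independent failures, which real mirrored pairs (same batch, same power supply, same room) do not fully satisfy.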


RAID – Redundant Array of Inexpensive Disks

• Distribute a file system intelligently across multiple disks to
  – Maintain high reliability and availability
  – Enable fast recovery from failure
  – Increase performance


“Levels” of RAID

• Level 0 – non-redundant striping of blocks across disks

• Level 1 – simple mirroring

• Level 2 – striping of bytes or bits with ECC

• Level 3 – Level 2 with parity, not ECC

• Level 4 – Level 0 with parity block

• Level 5 – Level 4 with distributed parity blocks


RAID Level 0 – Simple Striping

• Each stripe is one or a group of contiguous blocks
• Block/group i is on disk (i mod n)
• Advantage
  – Read/write n blocks in parallel; n times bandwidth
• Disadvantage
  – No redundancy at all. System MTBF is 1/n disk MTBF!

Disk 0: stripe 0, stripe 4, stripe 8
Disk 1: stripe 1, stripe 5, stripe 9
Disk 2: stripe 2, stripe 6, stripe 10
Disk 3: stripe 3, stripe 7, stripe 11
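
The placement rule can be sketched directly. The function name `raid0_location` and the idea of returning a row number within the disk are illustrative additions; only the `i mod n` rule comes from the slide.

```python
def raid0_location(i, n_disks):
    """Return (disk, row) for stripe i: disk is i mod n, row counts stripes on that disk."""
    return i % n_disks, i // n_disks

# Rebuild the four-disk layout for stripes 0..11.
layout = {
    disk: [s for s in range(12) if raid0_location(s, 4)[0] == disk]
    for disk in range(4)
}
for disk, stripes in layout.items():
    print(f"Disk {disk}: {stripes}")
```

Because consecutive stripes land on different disks, a large sequential transfer keeps all n disks busy at once.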


RAID Level 1 – Striping and Mirroring

• Each stripe is written twice
  – Two separate, identical disks
• Block/group i is on disks (i mod 2n) & ((i + n) mod 2n)
• Advantages
  – Read/write n blocks in parallel; n times bandwidth
  – Redundancy: System MTBF = (Disk MTBF)² at twice the cost
  – Failed disk can be replaced by copying
• Disadvantage
  – A lot of extra disks for much more reliability than we need

Disk 0: stripe 0, stripe 4, stripe 8
Disk 1: stripe 1, stripe 5, stripe 9
Disk 2: stripe 2, stripe 6, stripe 10
Disk 3: stripe 3, stripe 7, stripe 11

Mirror disk 0: stripe 0, stripe 4, stripe 8
Mirror disk 1: stripe 1, stripe 5, stripe 9
Mirror disk 2: stripe 2, stripe 6, stripe 10
Mirror disk 3: stripe 3, stripe 7, stripe 11
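
A quick check of the placement formula, assuming disks are numbered 0 to 2n-1 with disk k+n acting as the mirror of disk k (that numbering is an assumption; the slide only gives the formula).

```python
def raid1_disks(i, n):
    """Return the two disks holding stripe i in a 2n-disk mirrored array."""
    return i % (2 * n), (i + n) % (2 * n)

n = 4
for i in (0, 1, 5, 11):
    print(i, raid1_disks(i, n))

# Whichever copy the formula names first, the pair is always a disk and its mirror.
pairs_ok = all(
    set(raid1_disks(i, n)) == {i % n, i % n + n} for i in range(100)
)
print(pairs_ok)
```

So the formula places every stripe on the pair {i mod n, (i mod n) + n}, which is what the layout above shows.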


RAID Levels 2 & 3

• Bit- or byte-level striping

• Requires synchronized disks
  – Highly impractical
• Requires fancy electronics
  – For ECC calculations
• Not used; academic interest only
• See Silberschatz, §12.7.3 (pp. 471-472)


Observation

• When a disk or stripe is read incorrectly, we know which one failed!

• Conclusion:
  – A simple parity disk can provide very high reliability
    • (unlike simple parity in memory)


RAID Level 4 – Parity Disk

• parity 0-3 = stripe 0 xor stripe 1 xor stripe 2 xor stripe 3
• n stripes plus parity are written/read in parallel
• If any disk/stripe fails, it can be reconstructed from others
  – E.g., stripe 1 = stripe 0 xor stripe 2 xor stripe 3 xor parity 0-3
• Advantages
  – n times read/write bandwidth
  – System MTBF = (Disk MTBF)² at 1/n additional cost
  – Failed disk can be reconstructed “on-the-fly” (hot swap)
  – Hot expansion: simply add n + 1 disks all initialized to zeros

Disk 0: stripe 0, stripe 4, stripe 8
Disk 1: stripe 1, stripe 5, stripe 9
Disk 2: stripe 2, stripe 6, stripe 10
Disk 3: stripe 3, stripe 7, stripe 11
Parity disk: parity 0-3, parity 4-7, parity 8-11
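
The parity and reconstruction rules can be tried out on byte strings. This is a sketch with made-up four-byte stripes; `xor_blocks` is an illustrative helper, not part of any RAID implementation.

```python
from functools import reduce

def xor_blocks(*blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

stripes = [b"disk", b"zero", b"data", b"here"]   # stripes 0..3 (sample data)
parity = xor_blocks(*stripes)                     # parity 0-3

# Suppose the disk holding stripe 1 fails: rebuild it from the survivors.
rebuilt_stripe1 = xor_blocks(stripes[0], stripes[2], stripes[3], parity)
print(rebuilt_stripe1)

# Hot expansion: a new disk full of zeros leaves the existing parity valid.
parity_with_new_disk = xor_blocks(*stripes, bytes(4))
print(parity_with_new_disk == parity)
```

Reconstruction works only because the controller knows which disk failed; XOR parity by itself cannot say which stripe is wrong.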


RAID Level 5 – Distributed Parity

• Parity calculation is same as RAID Level 4
• Advantages & Disadvantages
  – Same as RAID Level 4
  – Additional advantage: avoids beating up on parity disk
• Writing individual stripes (RAID 4 & 5)
  – Read existing stripe and existing parity
  – Recompute parity
  – Write new stripe and new parity

Disk 0: stripe 0, stripe 4, stripe 8, stripe 12
Disk 1: stripe 1, stripe 5, stripe 9, parity 12-15
Disk 2: stripe 2, stripe 6, parity 8-11, stripe 13
Disk 3: stripe 3, parity 4-7, stripe 10, stripe 14
Disk 4: parity 0-3, stripe 7, stripe 11, stripe 15
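
The three-step single-stripe write can be sketched as follows. The "recompute parity" step uses the standard identity new parity = old parity xor old data xor new data, so the other stripes in the row need not be read. The `parity_disk` rotation is read off the layout above and is only one of several rotations used in practice; all names are illustrative.

```python
def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def updated_parity(old_data, old_parity, new_data):
    """Parity after overwriting one stripe, using only that stripe and the parity."""
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)

def parity_disk(row, n_disks=5):
    """Disk holding the parity for a given row, matching the layout above."""
    return (n_disks - 1 - row) % n_disks

# Row 0: stripes 0..3 plus parity 0-3 (sample data).
row = [b"aaaa", b"bbbb", b"cccc", b"dddd"]
parity = row[0]
for stripe in row[1:]:
    parity = xor_bytes(parity, stripe)

# Overwrite stripe 2: read old stripe + old parity, recompute, write both.
new_stripe2 = b"XXXX"
new_parity = updated_parity(row[2], parity, new_stripe2)
row[2] = new_stripe2

# Cross-check against recomputing the parity from the whole row.
full_recompute = row[0]
for stripe in row[1:]:
    full_recompute = xor_bytes(full_recompute, stripe)
print(new_parity == full_recompute, [parity_disk(r) for r in range(4)])
```

A small write therefore costs two reads and two writes regardless of n, and in RAID 5 the parity write lands on a different disk for each row.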


New Topic

• Problem – how to protect against disk write operations that don’t complete
  – Power or CPU failure in the middle of a block
  – Related series of writes interrupted in middle

• Examples:
  – Database update of charge and credit
  – RAID 1, 4, 5 failure between redundant writes


Solution (part 1) – Stable Storage

• Write everything twice (separate disks)
  – Be sure 1st write does not invalidate previous 2nd copy
  – RAID 1 is okay; RAID 4/5 not okay!
  – Read blocks back to validate; then report completion
• Reading both copies
  – If 1st copy okay, use it – i.e., newest value
  – If 2nd copy different, update it with 1st copy
  – If 1st copy error; use 2nd copy – i.e., old value


Stable Storage (continued)

• Crash recovery
  – Scan disks, compare corresponding blocks
  – If one is bad, replace with good one
  – If both good but different, replace 2nd with 1st copy
• Result:
  – If 1st block is good, it contains latest value
  – If not, 2nd block still contains previous value
• An abstraction of an atomic disk write of a single block
  – Uninterruptible by power failure, etc.
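
The write, read, and recovery rules from the two stable-storage slides can be modelled in memory. This is a sketch under stated assumptions: two dictionaries stand in for the two disks, and a CRC-32 stored next to each block stands in for the disk's own error detection; the class and method names are hypothetical.

```python
import zlib

class StableStorage:
    def __init__(self):
        # Two independent "disks": block number -> (data, checksum).
        self.disks = [{}, {}]

    @staticmethod
    def _valid(entry):
        return entry is not None and zlib.crc32(entry[0]) == entry[1]

    def _write_copy(self, disk, block, data):
        self.disks[disk][block] = (data, zlib.crc32(data))
        # Read back to validate before moving on.
        if not self._valid(self.disks[disk].get(block)):
            raise OSError("write verification failed")

    def write(self, block, data):
        self._write_copy(0, block, data)  # 1st copy
        self._write_copy(1, block, data)  # 2nd copy, only after 1st is verified

    def read(self, block):
        first, second = (disk.get(block) for disk in self.disks)
        if self._valid(first):
            return first[0]               # newest value
        if self._valid(second):
            return second[0]              # old value
        raise OSError("both copies are bad")

    def recover(self):
        for block in set(self.disks[0]) | set(self.disks[1]):
            first, second = (disk.get(block) for disk in self.disks)
            if self._valid(first):
                if first != second:
                    self.disks[1][block] = first   # 2nd bad or stale
            elif self._valid(second):
                self.disks[0][block] = second      # 1st bad


storage = StableStorage()
storage.write(7, b"old value")
# Simulate a crash halfway through rewriting the 1st copy: checksum mismatch.
storage.disks[0][7] = (b"new va", zlib.crc32(b"new va") ^ 1)
value_after_crash = storage.read(7)
storage.recover()
print(value_after_crash, storage.disks[0][7] == storage.disks[1][7])
```

At every point either the first copy is good (new value) or the second still holds the previous value, which is what makes the single-block write look atomic.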


What about more complex disk operations?

• E.g., File create operation involves
  – Allocating free blocks
  – Constructing and writing i-node
    • Possibly multiple i-node blocks
  – Reading and updating directory
• What if system crashes with the sequence only partly completed?
• Answer: inconsistent data structures on disk


Solution (Part 2) – Log-Structured File System

• Make changes to cached copies in memory
• Collect together all changed blocks
• Write to log file
  – A circular buffer on disk
  – Fast, contiguous write
• Update log file pointer in stable storage
• Offline: Play back log file to actually update directories, i-nodes, free list, etc.
  – Update playback pointer in stable storage


Transaction Data Base Systems

• Similar techniques
  – Every transaction is recorded in log before recording on disk
  – Stable storage techniques for managing log pointers
  – Once log entry is confirmed, disk can be updated in place
  – After crash, replay log to redo disk operations
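
A minimal redo-log sketch of this idea. Everything here is an in-memory stand-in: the class `RedoLog` is hypothetical, and the two integer pointers represent the log pointer and playback pointer that the slides keep in stable storage.

```python
class RedoLog:
    def __init__(self):
        self.log = []        # append-only log records (sets of block updates)
        self.committed = 0   # log pointer: records below this are confirmed
        self.applied = 0     # playback pointer: records below this are on disk
        self.disk = {}       # the in-place data structures

    def commit(self, updates):
        # Record the whole transaction in the log before touching the disk.
        self.log.append(dict(updates))
        self.committed = len(self.log)   # confirmed only once the pointer moves

    def playback(self):
        # Redo every confirmed record not yet known to be applied. Redoing a
        # record that was partly applied before a crash is harmless, because
        # each record just sets blocks to their new values.
        while self.applied < self.committed:
            self.disk.update(self.log[self.applied])
            self.applied += 1


store = RedoLog()
store.commit({"inode 12": "new file", "directory /": "entry added"})
store.commit({"free list": "2 blocks taken"})

# Simulate a crash after only part of the first record reached the disk:
store.disk["inode 12"] = "new file"

store.playback()   # after restart, replay from the playback pointer
print(store.disk, store.applied)
```

Records are idempotent here by construction; a log of increments or other non-repeatable operations would need extra bookkeeping to be replayed safely.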


Unix LFS

• Tanenbaum, §6.3.8, pp. 428-430
• Everything is written to log
  – i-nodes point to updated blocks in log
  – i-node cache in memory updated whenever i-node is written
  – Cleaner daemon follows behind to compact log
• Advantages:
  – LFS is always consistent
  – LFS performance
    • Much better than Unix FS for small writes
    • At least as good for reads and large writes


Break