Chapter 11: File System Implementation (source: tawalbeh/nyit/csci620/slides/ch11.pdf)
Directory Structure
A collection of nodes containing information about all files in a file system (name, location, size).
A directory is a system file for maintaining the structure of the file system
[Figure: a directory structure on disk with entries pointing to files F1, F2, F3, F4, ..., Fn]
Both the directory structure and the files reside on disk (or a partition).
A disk can contain multiple file systems.
Directories can be organized in several different ways.
File system structures that reside on disk generally include:
A boot control block (boot block, boot sector)
contains information needed to boot the OS. If no OS, this block is empty
Usually at the beginning of each partition (disk)
A volume control block contains volume (partition) information such as the number of blocks, size of blocks, free-block count and free-block pointers.
(UFS: superblock. NTFS: master file table.)
A directory structure per file system is used to organize files.
A file control block for each file contains details about the file, including permissions, ownership, size, and location of data blocks. (UFS: inode. NTFS: stored within the master file table.)
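As a rough illustration of the structures above, here is a sketch in Python. The field names are invented for the example and do not match any real on-disk layout; they simply mirror the metadata the slides list for a volume control block and a file control block.

```python
from dataclasses import dataclass, field

@dataclass
class VolumeControlBlock:
    # Per-volume metadata (analogous to the UFS superblock
    # or the NTFS master file table).
    block_count: int                 # total number of blocks in the volume
    block_size: int                  # size of each block, in bytes
    free_block_count: int            # how many blocks are currently free
    free_blocks: list = field(default_factory=list)   # free-block pointers

@dataclass
class FileControlBlock:
    # Per-file metadata (analogous to a UFS inode).
    permissions: int                 # e.g. 0o644
    owner: int                       # owner's user id
    size: int                        # file size in bytes
    data_blocks: list = field(default_factory=list)   # block numbers of file data

vcb = VolumeControlBlock(block_count=1024, block_size=4096,
                         free_block_count=1022, free_blocks=[2, 3])
fcb = FileControlBlock(permissions=0o644, owner=1000, size=8192,
                       data_blocks=[17, 42])
```

A real file system packs these fields into fixed-size binary records at known disk offsets; the dataclasses just name the pieces of information involved.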
Directory Implementation
A linear list of file names with pointers to the data blocks:
simple to program
time-consuming to execute, since lookups require a linear search
To create a new file, the directory must be searched first to make sure that no existing file has the same name.
The same linear search is required when deleting a file.
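A minimal sketch of the linear-list scheme (names and block numbers are invented for the example). Both create and delete must scan the whole list, which is what makes this approach O(n) per operation:

```python
# Linear-list directory: each entry maps a file name to its first data block.
directory = []  # list of (name, first_block) tuples

def create(name, first_block):
    # The whole list must be searched first to reject duplicate names.
    for entry_name, _ in directory:
        if entry_name == name:
            raise FileExistsError(name)
    directory.append((name, first_block))

def delete(name):
    # Deletion also requires a linear search for the matching entry.
    for i, (entry_name, _) in enumerate(directory):
        if entry_name == name:
            del directory[i]
            return
    raise FileNotFoundError(name)

create("a.txt", 17)
create("b.txt", 42)
delete("a.txt")
```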
Hash Table – a linear list stores the directory entries, augmented with a hash data structure. (The hash function takes a filename as input and returns a pointer to that filename's entry in the list.)
decreases directory search time
collisions – situations where two file names hash to the same location
Another problem is the fixed size of hash tables. If the directory size must increase, a new hash function is needed (e.g., from mod 64 to mod 128) and all existing entries must be rehashed.
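The hash-table variant can be sketched as follows. The hash function and table size are toy choices for illustration; collisions are resolved here by chaining, and the fixed `TABLE_SIZE` is exactly what makes growing the directory awkward (a new modulus means rehashing everything):

```python
TABLE_SIZE = 64  # fixed size: growing the directory requires a new hash function

entries = []                              # the underlying linear list of entries
table = [[] for _ in range(TABLE_SIZE)]   # each bucket chains colliding names

def bucket(name):
    # Toy hash function: sum of character codes, mod the table size.
    return sum(name.encode()) % TABLE_SIZE

def add(name, first_block):
    entries.append((name, first_block))
    table[bucket(name)].append(len(entries) - 1)  # pointer into the list

def lookup(name):
    # Only one bucket is scanned instead of the whole list,
    # which is what decreases directory search time.
    for idx in table[bucket(name)]:
        if entries[idx][0] == name:
            return entries[idx][1]
    raise FileNotFoundError(name)

add("a.txt", 17)
add("b.txt", 42)
```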
Disk blocks must be allocated to files. The main problem is how to allocate space to files so that disk space is utilized effectively and files can be accessed quickly.
keeping track of which disk blocks go with which file
There are three major methods of allocating disk space to files:
Contiguous Allocation
Each file occupies a set of contiguous blocks on the disk. Provides efficient direct access.
Accessing block b+1 after block b requires no head movement, so the number of head seeks is minimal.
Simple addressing scheme: only starting location (block #) and length (number of blocks) are required
The directory entry for each file indicates the address of the starting block and the number of blocks required
Both sequential and direct access can be supported.
Finding room for a new file is a dynamic storage-allocation problem: first fit and best fit are the most commonly used strategies. External fragmentation can be a problem.
Another problem is determining how much space is needed for a file when it is created. Files cannot grow in place, so if the allocated space is no longer enough, either:
terminate the user program with an appropriate error message, or
find a larger hole and copy the file into it.
Pre-allocation of space is generally required if the final file size is known in advance. However, the file might reach its final size only over a long period (possibly years), which can cause significant internal fragmentation.
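The first-fit strategy above can be sketched with a toy allocator over a free-block bitmap (all sizes are invented). It returns exactly what the directory entry stores under contiguous allocation: a starting block and a length.

```python
# Toy first-fit contiguous allocator. True = block is free.
free = [True] * 32

def allocate_first_fit(length):
    # Scan for the first run of `length` consecutive free blocks.
    run = 0
    for b, is_free in enumerate(free):
        run = run + 1 if is_free else 0
        if run == length:
            start = b - length + 1
            for i in range(start, start + length):
                free[i] = False          # mark the hole as allocated
            return (start, length)       # what the directory entry records
    raise OSError("no hole large enough (external fragmentation)")

f1 = allocate_first_fit(5)   # takes the first hole: blocks 0-4
f2 = allocate_first_fit(3)   # next fit lands right after: blocks 5-7
```

Best fit differs only in that it scans all holes and picks the smallest one that is large enough, rather than the first one found.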
Recovery
Care must be taken to ensure that system failure does not result in data loss or data inconsistency.
Consistency checking – compares data in directory structure kept in memory with data blocks on disk, and tries to fix inconsistencies
Example: fsck on UNIX or chkdsk in MS-DOS
Performing a backup on an active file system might lead to inconsistency: if files and directories are being added, deleted, and modified during the dump, the resulting backup may be inconsistent.
Use system programs to back up data from disk to another storage device (floppy disk, magnetic tape, other magnetic disk, optical)
Full backup: should the entire file system be backed up or only part of it?
It is usually desirable to back up only specific directories and everything in them rather than the entire file system.
Incremental backup: back up only the files that have changed since the last backup.
Recover lost file or disk by restoring data from backup
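A minimal sketch of incremental-backup selection, using file modification times as the change criterion (real tools such as dump/restore also track deletions, directory metadata, and dump levels, none of which is handled here):

```python
import os
import shutil
import tempfile
import time

def incremental_backup(src_dir, dst_dir, last_backup_time):
    """Copy to dst_dir only the regular files in src_dir whose
    modification time is newer than the previous backup."""
    copied = []
    for name in sorted(os.listdir(src_dir)):
        path = os.path.join(src_dir, name)
        if os.path.isfile(path) and os.path.getmtime(path) > last_backup_time:
            shutil.copy2(path, os.path.join(dst_dir, name))  # preserves mtime
            copied.append(name)
    return copied

src = tempfile.mkdtemp()
dst = tempfile.mkdtemp()
with open(os.path.join(src, "a.txt"), "w") as f:
    f.write("data")

first = incremental_backup(src, dst, 0)               # everything is "new"
second = incremental_backup(src, dst, time.time() + 100)  # nothing changed since
```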
Log Structured File Systems
We have faster CPUs and faster, larger memories, but disks that are larger yet still SLOW. Writes are done in very small chunks, which is very inefficient due to high seek time.
Example: to create a file, the i-node for the directory, the directory block, the i-node for the file, and the file data itself must all be written.
The goal of LFS is to achieve the full bandwidth of the disk, even for a workload consisting largely of small random writes. Log-structured (or journaling) file systems record each small update to the file system as a transaction.
Journaling is used in NTFS and is optional in Solaris UNIX. All operations for a transaction are written to a log (the disk is viewed as a log).
A transaction is considered committed once all of its operations have been successfully written to the log; the system call can then return to the user. However, the file system itself may not yet be updated.
The transactions in the log are asynchronously written to the file system. When the file system has been successfully updated, the transactions are removed from the log.
After a system crash, all remaining committed transactions in the log are replayed; uncommitted transactions are rolled back.
Summary: all writes are initially buffered in memory, and periodically all the buffered writes are written to the disk in a single segment, at the end of the log
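The crash-recovery rule above can be sketched with a toy log. The record format is invented for the example: each transaction writes operation records and then a commit marker, and recovery replays only transactions whose commit marker made it to the log before the crash:

```python
# Simplified journal: (transaction id, operation, target) records.
# T2 has no commit record, modeling a crash mid-transaction.
log = [
    ("T1", "write", "inode 7"),
    ("T1", "write", "directory block 3"),
    ("T1", "commit", None),
    ("T2", "write", "inode 9"),
]

def recover(log):
    # Pass 1: find which transactions committed before the crash.
    committed = {txn for txn, op, _ in log if op == "commit"}
    # Pass 2: replay committed operations; discard (roll back) the rest.
    replayed, rolled_back = [], set()
    for txn, op, target in log:
        if op == "commit":
            continue
        if txn in committed:
            replayed.append(target)
        else:
            rolled_back.add(txn)
    return replayed, rolled_back

replayed, rolled_back = recover(log)
```

The key property is that a transaction's effects reach the file system either completely (it committed, so it is replayed) or not at all (it is rolled back), never partially.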
RAID – multiple disk drives provide reliability via redundancy. RAID distributes data across several physical disks that look to the operating system and the user like a single logical disk.
Idea: Use many disks in parallel to increase storage bandwidth, improve reliability
Files are striped across disks
Each stripe portion is read/written in parallel
Bandwidth increases with more disks, but the array becomes more failure-prone.
RAID is arranged into six different levels. RAID 0 distributes data across several disks in a way which gives improved speed and full capacity, but all data on all disks will be lost if any one disk fails.
RAID 1 (mirrored disks) uses two (possibly more) disks which each store the same data, so that data is not lost so long as one disk survives
RAID 5 combines three or more disks in a way that protects data against loss of any one disk
RAID Levels
RAID 0: striped set without parity.
Provides improved performance and additional storage but no fault tolerance. Any disk failure destroys the array, which becomes more likely with more disks in the array
RAID 1: mirrored set without parity. Provides fault tolerance against disk errors and single-disk failure.
RAID 2: redundancy through Hamming code.
RAID 3: striped with interleaved parity. A dedicated parity disk makes parity writes a bottleneck.
RAID 4: striped with block-level parity. A dedicated parity disk makes parity writes a bottleneck.
RAID 5: striped with distributed parity. Each disk holds some blocks containing the parity for the corresponding blocks on the other disks. Upon drive failure, any subsequent reads can be computed from the distributed parity, so the drive failure is masked from the end user.
RAID 6: striped set with dual parity. Each group of blocks has two parity blocks distributed across the disks, so the array can survive the failure of two disks.
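The single-parity reconstruction used by RAID 3/4/5 can be demonstrated in a few lines: the parity block is the XOR of the data blocks in a stripe, so any one missing block equals the XOR of all the surviving blocks plus parity. (The block contents here are invented; real arrays work on sectors, not 4-byte strings.)

```python
def xor_blocks(blocks):
    # XOR equal-sized blocks together, byte by byte.
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"   # data blocks of one stripe
parity = xor_blocks([d0, d1, d2])        # in RAID 5, stored on a rotating disk

# The disk holding d1 fails: rebuild its block from the survivors + parity.
rebuilt = xor_blocks([d0, d2, parity])
```

RAID 6 extends this idea with a second, independently computed parity block (e.g. a Reed-Solomon code) so that two simultaneous failures remain recoverable.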