File Systems (1)
Dec 26, 2015
Readings
Silberschatz et al.: 10.1, 10.2, 11.1-11.5
Files
A file is a named collection of related information recorded on secondary storage
• Logical unit of storage on a device, e.g., helloworld.c, resume.doc
• Can contain programs (source, binary) or data
• Files have attributes: name, type, location, size, protection, creation time, etc.
File Naming
Files are named
Even though files are just a sequence of bytes, programs can impose structure on them
Files with a certain standard structure imposed can be identified using an extension to their name
Application programs may look for specific file extensions to indicate the file's type
But as far as the operating system is concerned, a file is just a sequence of bytes
File Types – Name, Extension
File Attributes
Possible file attributes
Basic File Operations
Create
Write
Read
Delete
Others: reposition within the file (seek), append, rename
Basic File Operations
For write/read operations, the operating system needs to keep a file position pointer for each process
All these operations require that the directory structure first be searched for the target file
Directories
File systems use directories to keep track of files
Directory operations: file search, file creation, file deletion, directory listing, file renaming, file system traversal
Directories are just another type of file.
Logical to Physical View of Files
We just briefly described the logical (user) view of files and directories
Files are stored on secondary storage. The file system is the mapping from the
logical view to the files on secondary storage
File systems require their own algorithms and data structures to support the mapping
File System
There are many file systems
• Unix uses the UNIX file system (UFS)
• Windows uses FAT, FAT32 and NTFS (Windows NT File System)
• The Linux file system is known as the extended file system, with versions denoted by ext2 and ext3
• Google created its own file system to meet its needs
File System Structures
File System Overview
A file system requires several on-disk and in-memory structures for its implementation
The structures vary depending on the OS and the file system, but the same general principles appear in most of them
File Control Block
Information about a file may be maintained in a file control block (FCB):
• One FCB per file
• An FCB is associated with a unique identifier
• Consists of details about a file
A Typical File Control Block
Directory
A directory structure is associated with each file system
This is used to organize files
Directory Implementation
Linear list of file names with pointers to the data blocks
• Simple to program
• Time-consuming to execute
Hash table: linear list with a hash data structure
• Decreases directory search time
• Collisions: situations where two file names hash to the same location
• Fixed size
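The two directory implementations can be compared with a small sketch. The entry list, the bucket count and the function names are hypothetical, and collisions are resolved by chaining, which is only one of several possible choices.

```python
# Hypothetical directory entries: (file name, address of first data block).
ENTRIES = [("helloworld.c", 17), ("resume.doc", 42), ("notes.txt", 8)]

def linear_lookup(entries, name):
    """Scan the list front to back; cost grows with the number of entries."""
    for entry_name, block in entries:
        if entry_name == name:
            return block
    return None

def build_hash_directory(entries, num_buckets=4):
    """Fixed-size table; names that collide share a bucket (chaining)."""
    buckets = [[] for _ in range(num_buckets)]
    for name, block in entries:
        buckets[hash(name) % num_buckets].append((name, block))
    return buckets

def hash_lookup(buckets, name):
    """Hash to one bucket, then scan only the entries in that bucket."""
    return linear_lookup(buckets[hash(name) % len(buckets)], name)

directory = build_hash_directory(ENTRIES)
```

With only four buckets some names are likely to collide, which is why each bucket still needs a short linear scan.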
Memory Structures
Directory-structure cache: holds directory information about recently accessed directories
System-wide open-file table: contains a copy of the FCB of each open file
Per-process open-file table: contains a pointer to the appropriate entry in the system-wide open-file table
Buffers: hold file-system blocks while they are being read from disk or written to disk
File Creation
To create a new file, an application uses the file system
Creation requires the allocation of a new File Control Block (FCB)
The system reads the appropriate directory into memory and updates it with the new file name.
In-Memory File System Structures
The figure is not just for the read file operation but also for other operations e.g., open, write
File Opening
What happens when open() is called in an application program?
The OS searches the system-wide open-file table to see if the file is in use by another process
File Opening
If the file is in use by another process:
• A per-process open-file table entry is created pointing to the existing system-wide open-file table entry
If the file is not in use by another process:
• Search the directory structure for the given file name
• When found, the FCB is copied into the system-wide open-file table
• A per-process open-file table entry is created pointing to the new system-wide open-file table entry
File Opening
The open() call returns a pointer (in Unix, a file descriptor) to the appropriate entry in the per-process open-file table
All file operations are performed via this pointer
The entry in the per-process open-file table may hold additional information, e.g., read/write information such as the current position in the file
File Opening
When a process closes the file, the per-process table entry is removed
The open count of the system-wide entry is decremented
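The open and close steps can be sketched with two dictionaries standing in for the tables. The class name, the dictionary layout and the rule that the system-wide entry is dropped when its count reaches zero are assumptions made for illustration, not a description of any particular kernel.

```python
class OpenFileTables:
    """Toy model of the system-wide and per-process open-file tables."""

    def __init__(self, directory):
        self.directory = directory   # file name -> FCB (a plain dict here)
        self.system_wide = {}        # file name -> {"fcb": ..., "open_count": n}
        self.per_process = {}        # pid -> {fd: {"name": ..., "position": 0}}

    def open(self, pid, name):
        entry = self.system_wide.get(name)
        if entry is None:
            # Not open anywhere yet: search the directory and copy the FCB in.
            entry = {"fcb": dict(self.directory[name]), "open_count": 0}
            self.system_wide[name] = entry
        entry["open_count"] += 1
        table = self.per_process.setdefault(pid, {})
        fd = 0
        while fd in table:           # pick the lowest unused descriptor
            fd += 1
        table[fd] = {"name": name, "position": 0}
        return fd

    def close(self, pid, fd):
        name = self.per_process[pid].pop(fd)["name"]
        entry = self.system_wide[name]
        entry["open_count"] -= 1
        if entry["open_count"] == 0:
            # Assumption: the entry is removed once no process has the file open.
            del self.system_wide[name]
```

Two processes opening the same file share one system-wide entry, while each keeps its own position in its per-process entry.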
In some operating systems, the same structures are used for network programming, e.g., sockets in Unix
Now let us look at file-system layout
File System Layout
File systems usually are stored on disks
Most disks can be divided up into partitions
• Independent file systems (for different operating systems) on each partition
Sector 0 of the disk is the Master Boot Record (MBR)
The end of the MBR contains the partition table
• Gives the start and end of each partition
• One of the partitions is marked as active
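As an illustration, the partition table of a classic 512-byte MBR can be decoded as below. The sketch assumes the traditional PC layout (four 16-byte entries starting at byte 446, boot signature 0x55 0xAA in the last two bytes); the sample MBR is synthetic and the function names are hypothetical.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446     # the partition table sits at the end of the MBR
ENTRY_SIZE = 16
ACTIVE_FLAG = 0x80

def parse_partition_table(mbr):
    """Return the used partition entries of a classic 512-byte MBR."""
    if len(mbr) != SECTOR_SIZE or mbr[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    partitions = []
    for i in range(4):
        status, _chs_first, ptype, _chs_last, start, count = struct.unpack_from(
            "<B3sB3sII", mbr, TABLE_OFFSET + i * ENTRY_SIZE)
        if ptype == 0:               # type 0 marks an unused slot
            continue
        partitions.append({
            "index": i,
            "active": status == ACTIVE_FLAG,
            "first_sector": start,
            "last_sector": start + count - 1,
        })
    return partitions

def make_sample_mbr():
    """Build a synthetic MBR with two partitions, the second one active."""
    mbr = bytearray(SECTOR_SIZE)
    struct.pack_into("<B3sB3sII", mbr, TABLE_OFFSET,
                     0x00, b"\0\0\0", 0x83, b"\0\0\0", 2048, 1000)
    struct.pack_into("<B3sB3sII", mbr, TABLE_OFFSET + ENTRY_SIZE,
                     0x80, b"\0\0\0", 0x07, b"\0\0\0", 4096, 500)
    mbr[510:512] = b"\x55\xaa"
    return bytes(mbr)
```

Each entry stores a start sector and a sector count, so the end of a partition is derived rather than stored directly.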
File System Layout
When the computer is booted, the BIOS reads in and executes the code in the MBR
The first thing the MBR program does is locate the active partition
The first block of the active partition (the boot block) is read in and executed
The program in the boot block loads the OS contained in that partition
File System Layout
Unix File System
The Unix superblock records the location of the root directory
Allocation Methods
Allocation Methods
Many files are stored on the same disk
The main problem is how to allocate space so that
• Disk space is used effectively
• Files can be accessed quickly
Allocation Methods
Files are sequences of bytes
• Granularity of file I/O is bytes
Disks are arrays of sectors (512 bytes)
• Granularity of disk I/O is sectors
• File data must be stored in sectors
Allocation Methods
File systems may also define a block size
• A block usually consists of a number of sectors
• Contiguous sectors are allocated to a block
File systems view the disk as an array of blocks
• Must allocate blocks to files
• Must manage free space on disk
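The relationship between bytes, sectors and blocks comes down to integer division. The sketch below assumes two 512-byte sectors per block (1-KB blocks); that ratio and the function names are illustrative choices.

```python
SECTOR_SIZE = 512            # bytes per sector, as stated above
SECTORS_PER_BLOCK = 2        # assumption: 1-KB blocks
BLOCK_SIZE = SECTOR_SIZE * SECTORS_PER_BLOCK

def locate(byte_offset):
    """Map a byte offset in a file to (logical block, sector in block, byte in sector)."""
    block, within_block = divmod(byte_offset, BLOCK_SIZE)
    sector, within_sector = divmod(within_block, SECTOR_SIZE)
    return block, sector, within_sector

def blocks_needed(file_size):
    """Whole blocks required for a file; the last block may be partly empty."""
    return -(-file_size // BLOCK_SIZE)   # ceiling division
```

A file of 1025 bytes needs two blocks under these assumptions, which is where internal fragmentation comes from.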
Allocation Methods
Approaches to allocating blocks to a file
• Contiguous allocation
• Linked list allocation
• Linked list allocation using an index
• Indexed allocation
Contiguous Allocation
Start and length are stored in the FCB
Contiguous Allocation
Store each file on a contiguous set of disk blocks
Assume a disk of 1-KB blocks
• A 50-KB file is allocated 50 consecutive blocks
The file's FCB consists of
• Disk address of the first block
• Number of blocks in the file
The directory entry often includes the file name and the disk address of the first block
Contiguous Allocation
Sequential file access
• File system remembers the disk address of the last block referenced
• When necessary, the next block is read
Direct access
• For direct access to block i of a file that starts at block b, access block b+i
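Direct access under contiguous allocation is a single addition. The sketch below uses the 1-KB block size of the example above and counts block i from zero; the function names and the bounds check are illustrative.

```python
BLOCK_SIZE = 1024   # 1-KB blocks, as in the example above

def contiguous_block(start, length, i):
    """Disk block holding logical block i of a file occupying start..start+length-1."""
    if not 0 <= i < length:
        raise IndexError("block index outside the file")
    return start + i

def contiguous_byte(start, length, offset):
    """Disk block, and offset within it, for a byte offset in the file."""
    i, within = divmod(offset, BLOCK_SIZE)
    return contiguous_block(start, length, i), within
```

For a 50-block file starting at block 14, byte 2050 lies in disk block 16 at offset 2.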
Contiguous Allocation
Advantages:
• Easy to implement
• Read performance is excellent
Contiguous Allocation
(a) Contiguous allocation of disk space for 7 files. (b) State of the disk after files D and E have been removed.
Contiguous Allocation
Disadvantages
• Fragmentation
• Will need periodic compaction (time-consuming)
• Will need to manage free lists
If a new file is put at the end of the disk
• No problem
If a new file is put into a “hole”...
• Have to know the file’s maximum possible size ... at the time it is created!
Contiguous allocation
Good for CD-ROMs, DVDs
• All file sizes are known in advance
• Files are never deleted
Linked List Allocation
Linked List Allocation
With linked allocation, each file is a linked list of disk blocks
The disk blocks may be scattered anywhere on the disk
The FCB contains pointers to the first and last blocks of the file
Each block contains a pointer to the next block
Linked List Allocation
Advantage
• No external fragmentation (only internal fragmentation)
Disadvantage
• Random access is slow
• To get to block n, the operating system has to start at the beginning and read the n-1 blocks prior to it
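The cost of random access can be seen in a toy model where each disk block stores its data together with the number of the next block. The block numbers and contents are hypothetical, and logical blocks are counted from zero here, so reaching logical block n costs n reads of earlier blocks plus one read of the block itself.

```python
# Hypothetical disk: block number -> (data, next block number or None).
DISK = {
    9:  ("A0", 16),
    16: ("A1", 1),
    1:  ("A2", 10),
    10: ("A3", 25),
    25: ("A4", None),
}

def read_linked(disk, first_block, n):
    """Return (data of logical block n, number of disk reads it took)."""
    block, reads = first_block, 0
    for _ in range(n):
        _, block = disk[block]   # must read a block just to learn its successor
        reads += 1
        if block is None:
            raise IndexError("file has fewer blocks than requested")
    data, _ = disk[block]
    reads += 1
    return data, reads
```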
Linked List Allocation Using Index
Take the pointer word from each disk block and put it in a table in memory
The chains are terminated with a special marker (-1).
The table in main memory is called a FAT (File Allocation Table)
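A minimal sketch of following a chain through an in-memory FAT is shown below. The table size and the block numbers are made up for illustration, and unused slots are simply left as None.

```python
END_OF_CHAIN = -1   # the special marker that terminates a chain

# Hypothetical FAT: index = disk block, value = next block of the same file.
FAT = [None] * 16
FAT[4], FAT[7], FAT[2], FAT[10], FAT[12] = 7, 2, 10, 12, END_OF_CHAIN

def fat_chain(fat, first_block):
    """List every disk block of the file that starts at first_block."""
    blocks = []
    block = first_block
    while block != END_OF_CHAIN:
        blocks.append(block)
        block = fat[block]
    return blocks

def fat_block(fat, first_block, n):
    """Disk block holding logical block n (counted from zero), using only the table."""
    block = first_block
    for _ in range(n):
        block = fat[block]
        if block == END_OF_CHAIN:
            raise IndexError("file has fewer blocks than requested")
    return block
```

The chain is still walked link by link, but every step is a memory lookup rather than a disk read.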
Linked List Allocation Using Index
File-Allocation Table
Linked List Allocation using Index
Linked list allocation using a file allocation table in RAM
Fast Random Access
Linked List Allocation Using Index
Advantage
• The chain must still be followed to find a given offset within the file, but the chain is entirely in memory
• No disk references are needed to follow it
Disadvantage
• The entire table must be in memory all the time
What if you have a 20-GB disk and a 1-KB block size?
• The table needs 20 million entries, one for each of the 20 million disk blocks
• Each entry is a minimum of 3 bytes, which means 60 MB of memory
This technique was used in MS-DOS and Windows 98
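The table-size estimate can be reproduced in a few lines. Binary units (1 GB = 2^30 bytes, 1 KB = 2^10 bytes) are an assumption here; with decimal units the figures come out slightly different but of the same order.

```python
def fat_table_bytes(disk_bytes, block_bytes, entry_bytes):
    """Memory needed for a table with one entry per disk block."""
    entries = disk_bytes // block_bytes
    return entries, entries * entry_bytes

DISK_BYTES = 20 * 2**30    # 20 GB
BLOCK_BYTES = 2**10        # 1 KB
ENTRY_BYTES = 3            # minimum entry size quoted above

entries, table_bytes = fat_table_bytes(DISK_BYTES, BLOCK_BYTES, ENTRY_BYTES)
table_mib = table_bytes / 2**20   # table size in units of 2**20 bytes
```

Doubling the block size halves the number of entries, which is one reason larger blocks are attractive on large disks.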
Indexed Allocation
Linked allocation does not support efficient direct access
• Pointers to the blocks are scattered with the blocks themselves
Indexed allocation addresses this problem by bringing all pointers together into an index block
Indexed Allocation
Each file has its own index block
• An index block is an array of disk-block addresses
• The ith entry in the index block points to the ith block of the file
The directory structure contains the addresses of index blocks
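A lookup under indexed allocation can be sketched as two table accesses. The file name, block numbers and 1-KB block size are hypothetical, and entries are counted from zero.

```python
BLOCK_SIZE = 1024   # assumption: 1-KB blocks

# Hypothetical directory: file name -> address of the file's index block.
DIRECTORY = {"jeep": 19}
# Hypothetical disk contents: index block 19 is an array of disk-block addresses.
INDEX_BLOCKS = {19: [9, 16, 1, 10, 25]}

def indexed_block(directory, index_blocks, name, i):
    """Disk block holding logical block i of the named file."""
    index_block = index_blocks[directory[name]]
    if not 0 <= i < len(index_block):
        raise IndexError("block index outside the file")
    return index_block[i]

def indexed_byte(directory, index_blocks, name, offset):
    """Disk block, and offset within it, for a byte offset in the file."""
    i, within = divmod(offset, BLOCK_SIZE)
    return indexed_block(directory, index_blocks, name, i), within
```

Unlike linked allocation, reaching the ith block needs no traversal: one read of the index block gives the address directly.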
Example of Indexed Allocation
Index Allocation
Every file must have an index block
The index block should be as small as possible
• Too small: may not be able to hold enough pointers for a large file
• Too large: wastes space
Index Allocation Mechanisms
Linked list
• Link several index blocks
• An index block might contain:
– A small header giving the name
– The first 100 disk-block addresses
– A final address that is nil or a pointer to another index block
Index Allocation Mechanisms
Multilevel index
• Use a first-level index block to point to a set of second-level index blocks
• To access a block, the OS uses the first-level index to find a second-level index block
• The second-level index block is used to find the desired data block
Index Allocation Mechanisms
Combined
First index block
• Contains N pointers to direct blocks
– These hold addresses of disk blocks that contain file data
– Good for small files
• Contains 3 pointers to indirect blocks
– Single indirect block: index block with addresses of blocks with data
– Double indirect block: holds the addresses of blocks that in turn hold pointers to the actual data blocks
– Triple indirect block
Combined Scheme: UNIX (4K bytes per block)
In Unix, the FCB is called an i-node
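How far each level of the combined scheme reaches can be worked out with a short sketch. The 4-KB block size comes from the slide; the 4-byte block addresses and the 12 direct pointers are assumptions, since the slides only say "N pointers".

```python
BLOCK_SIZE = 4096                   # 4K bytes per block
ADDRESS_SIZE = 4                    # assumption: 4-byte disk-block addresses
N_DIRECT = 12                       # assumption: number of direct pointers
PTRS = BLOCK_SIZE // ADDRESS_SIZE   # addresses that fit in one index block

def level_for_block(i):
    """Which part of the first index block covers logical block i (from zero)."""
    if i < 0:
        raise IndexError("negative block index")
    if i < N_DIRECT:
        return "direct"
    i -= N_DIRECT
    if i < PTRS:
        return "single indirect"
    i -= PTRS
    if i < PTRS ** 2:
        return "double indirect"
    i -= PTRS ** 2
    if i < PTRS ** 3:
        return "triple indirect"
    raise IndexError("beyond the maximum file size")

def max_file_blocks():
    """Largest file, in blocks, that the scheme can address."""
    return N_DIRECT + PTRS + PTRS ** 2 + PTRS ** 3
```

Under these assumptions the first 48 KB of a file is reachable through direct pointers alone, which is why the scheme works well for small files.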
Index Allocation
What happens when a file needs more blocks?
• Reserve the last disk address for the address of a block containing more disk-block addresses
I-node
An I-node is Unix’s FCB
The UNIX I-node entries
Structure of an I-Node: Attributes and addresses of disk blocks
Indexed Allocation
Advantage
• Only the index block needs to be in memory when the corresponding file is open
Disadvantage
• Updating the structure is more complex
Entry Lookup
To open a file, the OS needs its pathname
The OS uses the pathname supplied by the user to locate the directory entry
How can we locate the root directory (the start of all paths)?
Entry Lookup
In Unix systems
• The superblock (among other things) has the location of the i-node that represents the root directory
Once the root directory is located, a search through the directory tree finds the desired directory entry
The directory entry provides the information needed to find the disk blocks for the requested file
Entry Lookup
This discussion focuses on Unix-related file systems
When a file is opened, the file system must take the file name supplied and locate its disk blocks
Let’s see how this is done for the path name /usr/ast/mbox
Looking up an entry
The steps in looking up /usr/ast/mbox
Entry Lookup
First the system locates the root directory
• The root directory holds an entry for each file and directory within it
• A directory entry consists of the file name and i-node number
The file system looks up the first component of the path, usr, to find the i-node for /usr, which is i-node 6
From this i-node the system locates the directory for /usr, which is in block 132
Entry Lookup
The system then searches for ast within the /usr directory, which is in block 132
• The entry gives the i-node for /usr/ast
From this i-node the system can find the directory itself and look up mbox
• The i-node for this file is then read into memory and kept there until the file is closed
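The lookup of /usr/ast/mbox can be traced with two small tables. I-node 6 and block 132 come from the steps above; the root i-node number and the numbers used for /usr/ast and mbox are hypothetical placeholders.

```python
ROOT_INODE = 1   # assumption: the superblock points at i-node 1 for "/"

# i-node number -> where that file's or directory's data lives.
INODES = {
    1:  {"type": "dir",  "block": 1},     # root directory (hypothetical block)
    6:  {"type": "dir",  "block": 132},   # /usr, as in the steps above
    26: {"type": "dir",  "block": 406},   # /usr/ast (hypothetical numbers)
    60: {"type": "file", "block": 500},   # /usr/ast/mbox (hypothetical numbers)
}
# Directory blocks: block number -> {file name: i-node number}.
DIR_BLOCKS = {
    1:   {"usr": 6},
    132: {"ast": 26},
    406: {"mbox": 60},
}

def lookup(path):
    """Walk the path one component at a time and return the final i-node number."""
    inode = ROOT_INODE
    for component in (c for c in path.split("/") if c):
        node = INODES[inode]
        if node["type"] != "dir":
            raise NotADirectoryError(component)
        entries = DIR_BLOCKS[node["block"]]
        if component not in entries:
            raise FileNotFoundError(path)
        inode = entries[component]
    return inode
```

Each path component costs one i-node read and one directory-block read, which is why long paths are slower to resolve.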
Free Space Management
Free Space Management
Limited disk space
• Need to reuse the space from deleted files
Bitmap
• One bit for each block on the disk
• If the block is free the bit is 1, else the bit is 0
• Makes it easy to find a contiguous group of free blocks
• Large disks make for large bitmaps that must be kept in memory
• Used in Apple machines
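A bitmap search for a contiguous run of free blocks can be sketched as follows. The bitmap is held as a Python list of 0/1 values for readability (a real implementation would pack bits into words), and the function names are illustrative.

```python
def find_free_run(bitmap, count):
    """Index of the first run of `count` consecutive free (1) bits, or -1."""
    run_start, run_len = 0, 0
    for i, bit in enumerate(bitmap):
        if bit == 1:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == count:
                return run_start
        else:
            run_len = 0
    return -1

def allocate(bitmap, count):
    """Mark the first suitable run as in use and return its first block."""
    start = find_free_run(bitmap, count)
    if start == -1:
        raise OSError("no contiguous free run of that size")
    for i in range(start, start + count):
        bitmap[i] = 0
    return start

def release(bitmap, start, count):
    """Mark blocks start..start+count-1 as free again."""
    for i in range(start, start + count):
        bitmap[i] = 1
```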
Free Space Management
Linked list
• Link together all the free disk blocks
• Keep a pointer to the first free block in a special location on the disk and cache it in memory
• Not efficient for traversal, since each block must be read; this requires substantial I/O
• However, traversal is not done that often
• Optimization: store the addresses of N free blocks in the first free block
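The basic (unoptimized) free list can be modeled with a dictionary standing in for the next-pointer stored inside each free block. The class and method names are hypothetical; the sketch only shows why allocation is cheap while counting the free blocks requires visiting every one of them.

```python
class FreeList:
    """Free blocks chained through a 'next' pointer stored in each free block."""

    def __init__(self, free_blocks):
        self.next_free = {}   # stands in for the pointer kept inside each free block
        self.head = None      # the cached pointer to the first free block
        for block in reversed(free_blocks):
            self.release(block)

    def allocate(self):
        """Hand out the first free block; only that one block is touched."""
        if self.head is None:
            raise OSError("disk full")
        block = self.head
        self.head = self.next_free.pop(block)
        return block

    def release(self, block):
        """Push a freed block onto the front of the list."""
        self.next_free[block] = self.head
        self.head = block

    def count_free(self):
        """Traverse the whole list: one (simulated) block read per free block."""
        n, block = 0, self.head
        while block is not None:
            n += 1
            block = self.next_free[block]
        return n
```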
Free Space Management
Indexing
• Treat free blocks as a file
• This allows for indexing
Use a combination
Summary
We have examined how files are implemented and how one particular implementation is used to support some of the file operations