File Systems (1)

Readings

Silberschatz et al.: 10.1, 10.2, 11.1-11.5

Files

Named collection of related information recorded on secondary storage

Logical unit of storage on a device, e.g., helloworld.c, resume.doc

Can contain programs (source, binary) or data

Files have attributes: name, type, location, size, protection, creation time, etc.

File Naming

Files are named

Even though files are just a sequence of bytes, programs can impose structure on them

Files with a certain standard structure imposed can be identified using an extension to their name

Application programs may look for specific file extensions to indicate the file’s type

But as far as the operating system is concerned, a file is just a sequence of bytes

File Types – Name, Extension

File Attributes

Possible file attributes

Basic File Operations

Create

Write

Read

Delete

Others
• Reposition within the file, append, rename

Basic File Operations

For write/read operations, the operating system needs to keep a file position pointer for each process

All these operations require that the directory structure first be searched for the target file
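The file position pointer can be observed from user space. The following is a minimal sketch using Python's `os` wrappers around the POSIX-style calls; the scratch file and its contents are made up for illustration.

```python
import os
import tempfile

# Create a scratch file so the example is self-contained.
fd, path = tempfile.mkstemp()
os.write(fd, b"hello world")

# Writing advanced the file position to the end of the data.
pos_after_write = os.lseek(fd, 0, os.SEEK_CUR)

# Reposition to the start, then read 5 bytes; the position advances by 5.
os.lseek(fd, 0, os.SEEK_SET)
first = os.read(fd, 5)
pos_after_read = os.lseek(fd, 0, os.SEEK_CUR)

# The next read continues from the current position.
rest = os.read(fd, 100)

os.close(fd)
os.remove(path)
```

Note that neither `os.read` nor `os.write` takes an offset argument: the operating system supplies the position it has been tracking for this open file.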

Directories

File systems use directories to keep track of files

Directory operations
• File search
• File creation
• File deletion
• Directory listing
• File renaming
• File system traversal

Directories are just another type of file.

Logical to Physical View of Files

We just briefly described the logical (user) view of files and directories

Files are stored on secondary storage.

The file system is the mapping from the logical view to the files on secondary storage

File systems require their own algorithms and data structures to support the mapping

File System

There are many file systems
• Unix uses the UNIX file system (UFS)
• Windows uses FAT, FAT32 and NTFS (Windows NT File System)
• The Linux file system is known as the extended file system, with versions denoted by ext2 and ext3

Google created its own file system to meet its needs.

File System Structures

File System Overview

File system requires several on-disk and in-memory structures for its implementation

The structures vary depending on the OS and the file system, but similar general principles apply

File Control Block

Information about a file may be maintained in a file control block (FCB):
• One FCB per file
• An FCB is associated with a unique identifier
• Consists of details about a file

A Typical File Control Block

Directory

A directory structure is associated with each file system

This is used to organize files

Directory Implementation

Linear list of file names with pointers to the data blocks
• Simple to program
• Time-consuming to execute

Hash table: linear list with a hash data structure
• Decreases directory search time
• Collisions: situations where two file names hash to the same location
• Fixed size
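The two directory implementations can be compared with a small in-memory sketch. The entry names, block numbers, toy hash function and table size below are all assumptions chosen for illustration; collisions are handled here by chaining, which is one common choice and not necessarily what a given file system does.

```python
# Each directory entry pairs a file name with the number of its first data block.
entries = [("notes.txt", 17), ("a.out", 42), ("resume.doc", 8)]


def linear_lookup(entries, name):
    """Scan the list entry by entry: simple to program, O(n) per search."""
    for entry_name, block in entries:
        if entry_name == name:
            return block
    return None


NUM_BUCKETS = 4  # the hash table has a fixed size


def bucket_of(name):
    """Toy hash function: sum of the name's bytes modulo the table size."""
    return sum(name.encode()) % NUM_BUCKETS


def build_hash_directory(entries):
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for name, block in entries:
        # Names that hash to the same bucket (a collision) share a chain.
        buckets[bucket_of(name)].append((name, block))
    return buckets


def hash_lookup(buckets, name):
    """Only the one bucket the name hashes to is searched."""
    for entry_name, block in buckets[bucket_of(name)]:
        if entry_name == name:
            return block
    return None


hashed = build_hash_directory(entries)
```

Both lookups return the same answer; the hash version inspects only one short chain instead of the whole list.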

Memory Structures

Directory-structure cache: Holds directory information about recently accessed directories

System-wide open file table: Contains a copy of the FCB of each open file

Per-process open-file table: Contains a pointer to the appropriate entry in the system-wide open-file table

Buffers for holding file-system blocks when they are being read from disk or written to disk

File Creation

To create a new file, an application uses the file system

Creation requires the allocation of a new File Control Block (FCB)

The system reads the appropriate directory into memory and updates it with the new file name.

In-Memory File System Structures

The figure is not just for the read file operation but also for other operations e.g., open, write

File Opening

What happens when open() is called in an application program?

The OS searches the system-wide open-file table to see if the file is in use by another process

File Opening

If the file is in use by another process?
• A per-process open-file table entry is created pointing to the existing system-wide open-file table entry

If the file is not in use by another process?
• Search the directory structure for the given file name
• When found, the FCB is copied into the system-wide open-file table
• A per-process open-file table entry is created pointing to the new system-wide open-file table entry

File Opening

The open() call returns a pointer (file descriptor) to the appropriate entry in the per-process open-file table

All file operations are performed via this pointer

The entry in the per-process open-file table may hold additional information, e.g., read and write information

File Opening

When a process closes the file, the per-process table entry is removed

The open count associated with the system-wide entry is decremented

In some operating systems, the same structures are used for network programming, e.g., sockets in Unix
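The open and close steps described on the File Opening slides can be sketched with two small tables. Everything named here (`SystemEntry`, `Process`, the `directory` dictionary and its FCB fields) is hypothetical, and removing the system-wide entry once its count reaches zero is an assumption about what happens after the decrement.

```python
from dataclasses import dataclass


@dataclass
class SystemEntry:
    fcb: dict            # in-memory copy of the file's FCB
    open_count: int = 0  # number of per-process entries pointing here


# Stand-in for the on-disk directory structure: file name -> FCB.
directory = {"mbox": {"size": 120, "first_block": 9}}

# System-wide open-file table: file name -> SystemEntry.
system_table = {}


class Process:
    def __init__(self):
        self.open_files = {}  # per-process open-file table: fd -> record
        self.next_fd = 0

    def open(self, name):
        entry = system_table.get(name)
        if entry is None:
            # Not open anywhere: search the directory and copy the FCB.
            entry = SystemEntry(fcb=dict(directory[name]))
            system_table[name] = entry
        entry.open_count += 1
        fd = self.next_fd
        self.next_fd += 1
        # The per-process entry points at the system-wide entry and
        # carries per-process state such as the file position.
        self.open_files[fd] = {"name": name, "entry": entry, "pos": 0}
        return fd

    def close(self, fd):
        record = self.open_files.pop(fd)  # remove the per-process entry
        record["entry"].open_count -= 1   # decrement the open count
        if record["entry"].open_count == 0:
            del system_table[record["name"]]  # assumption: drop unused entry
```

Two processes that open the same file end up with separate per-process records that share one system-wide entry, so each keeps its own position while the FCB is held in memory only once.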

Now let us look at a file-system layout

File System Layout

File systems usually are stored on disks

Most disks can be divided up into partitions
• Independent file systems (for different operating systems) on each partition

Sector 0 of the disk is the Master Boot Record (MBR)

The end of the MBR contains the partition table
• Gives the start and end of each partition
• One of the partitions is marked as active

File System Layout

At boot, the BIOS reads in and executes the code in the MBR

The first thing the MBR program determines is the active partition

The first block of the active partition (the boot block) is read in

The program in the boot block loads the OS contained in that partition
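As a rough illustration of the partition table, the sketch below parses a synthetic 512-byte MBR in the classic PC layout: four 16-byte entries starting at byte 446, a status byte of 0x80 marking the active partition, and the signature 0x55 0xAA in the last two bytes. The partition values are made up. Each entry stores a starting sector and a sector count, so the end of the partition is computed from those two fields.

```python
SECTOR_SIZE = 512
TABLE_OFFSET = 446  # the partition table follows the boot code
ENTRY_SIZE = 16
ACTIVE_FLAG = 0x80


def make_entry(active, ptype, first_sector, num_sectors):
    """Build one 16-byte partition entry (CHS fields left as zeros)."""
    status = ACTIVE_FLAG if active else 0x00
    return (bytes([status, 0, 0, 0, ptype, 0, 0, 0])
            + first_sector.to_bytes(4, "little")
            + num_sectors.to_bytes(4, "little"))


def parse_partition_table(mbr):
    if len(mbr) != SECTOR_SIZE or mbr[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    partitions = []
    for i in range(4):
        start = TABLE_OFFSET + i * ENTRY_SIZE
        raw = mbr[start:start + ENTRY_SIZE]
        ptype = raw[4]
        if ptype == 0:  # type 0 means the slot is unused
            continue
        first = int.from_bytes(raw[8:12], "little")
        count = int.from_bytes(raw[12:16], "little")
        partitions.append({
            "index": i,
            "active": raw[0] == ACTIVE_FLAG,
            "start": first,
            "end": first + count - 1,
        })
    return partitions


# Synthetic MBR with two partitions; the second is marked active.
mbr = bytearray(SECTOR_SIZE)
mbr[446:462] = make_entry(False, 0x83, 2048, 1000)
mbr[462:478] = make_entry(True, 0x07, 4096, 2000)
mbr[510:512] = b"\x55\xaa"

partitions = parse_partition_table(bytes(mbr))
active = [p for p in partitions if p["active"]]
```

A boot program would next read the first block of the partition found in `active`.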

File System Layout

Unix File System

The Unix superblock records the location of the root directory

Allocation Methods

Allocation Methods

Many files are stored on the same disk

The main problem is how to allocate space so that
• Disk space is used effectively
• Files can be accessed quickly

Allocation Methods

Files are sequences of bytes
• Granularity of file I/O is bytes

Disks are arrays of sectors (512 bytes)
• Granularity of disk I/O is sectors
• File data must be stored in sectors

Allocation Methods

File systems may also define a block size
• A block usually consists of a number of sectors
• Contiguous sectors are allocated to a block

File systems view the disk as an array of blocks
• Must allocate blocks to files
• Must manage free space on disk
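The relationship between bytes, sectors and blocks comes down to integer division. A minimal sketch follows, assuming 8 sectors per block (4-KB blocks); the 512-byte sector size is from the slide, while the block size and function names are illustrative.

```python
SECTOR_SIZE = 512        # bytes per sector
SECTORS_PER_BLOCK = 8    # assumption: gives 4-KB blocks
BLOCK_SIZE = SECTOR_SIZE * SECTORS_PER_BLOCK


def locate(offset):
    """Map a byte offset in a file to (block index, sector in block, byte in sector)."""
    block_index, within_block = divmod(offset, BLOCK_SIZE)
    sector_in_block, byte_in_sector = divmod(within_block, SECTOR_SIZE)
    return block_index, sector_in_block, byte_in_sector


def blocks_needed(file_size):
    """Number of whole blocks required to hold file_size bytes (rounded up)."""
    return -(-file_size // BLOCK_SIZE)
```

For example, byte 5000 of a file lies in the file's second block (index 1), and a 5000-byte file needs two blocks even though the second one is mostly empty.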

Allocation Methods

Approaches to allocating blocks to a file
• Contiguous Allocation
• Linked List Allocation
• Linked List Allocation using Index
• Index Allocation

Contiguous Allocation

Start and length are stored in the FCB


Store each file on a contiguous set of disk blocks

Assume a disk of 1-KB blocks
• A 50-KB file is allocated 50 consecutive blocks

The file's FCB consists of
• Disk address of the first block
• Number of blocks in the file

The directory entry often includes file name and the disk address of the first block


Sequential file access
• File system remembers the disk address of the last block referenced
• When necessary, the next block is read

Direct access
• For direct access to block i of a file that starts with block b, access block b+i
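Direct access under contiguous allocation is a single addition. In this sketch the class name `ContiguousFCB` and the starting block 14 are hypothetical; the 50-block length echoes the 50-KB example above.

```python
from dataclasses import dataclass


@dataclass
class ContiguousFCB:
    start: int   # disk address of the first block (b)
    length: int  # number of blocks in the file

    def block_address(self, i):
        """Disk address of block i of the file (0-based): b + i."""
        if not 0 <= i < self.length:
            raise IndexError("block index outside the file")
        return self.start + i


# A 50-block file that happens to start at disk block 14.
fcb = ContiguousFCB(start=14, length=50)
first = fcb.block_address(0)
last = fcb.block_address(49)
```

No disk reads are needed to compute the address, which is why both sequential and direct access are cheap with this method.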


Advantages:
• Easy to implement
• Read performance is excellent


(a) Contiguous allocation of disk space for 7 files

(b) State of the disk after files D and E have been removed


Disadvantages

Fragmentation
• Will need periodic compaction (time-consuming)

Will need to manage free lists

If new file is put at end of disk
• No problem

If new file is put into a “hole”...
• Have to know a file’s maximum possible size ... at the time it is created!

Contiguous allocation

Good for CD-ROMs, DVDs
• All file sizes are known in advance
• Files are never deleted

Linked List Allocation


With linked allocation, each file is a linked list of disk blocks

The disk blocks may be scattered anywhere on the disk

The FCB contains pointers to the first and last blocks of the file

Each block contains a pointer to the next block


Advantage
• No fragmentation (except internal fragmentation)

Disadvantage
• Random access is slow
• To get to block n, the operating system has to start at the beginning and read the n-1 blocks prior to it
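The cost of random access under linked allocation can be seen by counting simulated disk reads. The `disk` dictionary and its block numbers are made up; each block stores its data together with the number of the next block, and `None` marks the end of the chain.

```python
# Simulated disk: block number -> (data, number of the next block or None).
disk = {
    9: ("A", 16),
    16: ("B", 1),
    1: ("C", 10),
    10: ("D", 25),
    25: ("E", None),
}
FIRST_BLOCK = 9  # the FCB records where the chain starts


def read_block(first, n):
    """Return (data, disk_reads) for the n-th block of the file, counting from 1."""
    if n < 1:
        raise IndexError("block numbers start at 1")
    current = first
    reads = 0
    for _ in range(n - 1):
        # The pointer lives inside the block, so each hop costs a disk read.
        _, next_block = disk[current]
        reads += 1
        if next_block is None:
            raise IndexError("file has fewer blocks than requested")
        current = next_block
    data, _ = disk[current]
    reads += 1  # reading the target block itself
    return data, reads
```

Reaching the fourth block takes four reads in total: the three blocks before it, then the block itself.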

Linked List Allocation Using Index

Take the pointer word from each disk block and put it in a table in memory

The chains are terminated with a special marker (-1).

The table in main memory is called a FAT (File Allocation Table)


File-Allocation Table

Linked List Allocation using Index

Linked list allocation using a file allocation table in RAM

Fast Random Access










Advantage
• The chain must still be followed to find a given offset within the file, but the chain is entirely in memory
• No disk references are needed to follow it

Disadvantage
• The entire table must be in memory all the time
• What if you have a 20-GB disk and a 1-KB block size?
• The table needs 20 million entries, one for each of the 20 million disk blocks
• Each entry is a minimum of 3 bytes, which means 60 MB of memory

This technique was used in MS-DOS and Windows 98
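A small sketch of following a chain through an in-memory table, together with the table-size arithmetic from the slide. The table contents are invented; -1 is the end-of-chain marker mentioned earlier, and `None` stands for entries not used by this file. Decimal units (1 GB = 10^9 bytes) are assumed so that the numbers match the slide's round figures.

```python
END_OF_CHAIN = -1

# fat[b] holds the number of the block that follows block b in its file.
fat = [None] * 16
fat[4] = 7
fat[7] = 2
fat[2] = 10
fat[10] = 12
fat[12] = END_OF_CHAIN  # the file occupies blocks 4 -> 7 -> 2 -> 10 -> 12


def nth_block(fat, first, n):
    """Disk block holding block n (0-based) of the file starting at `first`."""
    block = first
    for _ in range(n):
        block = fat[block]  # a memory lookup, not a disk read
        if block == END_OF_CHAIN:
            raise IndexError("file has fewer blocks than requested")
    return block


def fat_size_bytes(disk_bytes, block_bytes, entry_bytes):
    """Memory needed for a table with one entry per disk block."""
    return (disk_bytes // block_bytes) * entry_bytes


entries = (20 * 10**9) // 10**3                   # 20 million entries
table_bytes = fat_size_bytes(20 * 10**9, 10**3, 3)  # 60 million bytes
```

The chain walk is the same loop as in plain linked allocation; the difference is that every step is an array access in RAM.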

Indexed Allocation

Linked allocation does not support efficient direct access
• Pointers to the blocks are scattered with the blocks themselves

Indexed allocation addresses this problem by bringing all pointers together into an index block

Indexed Allocation

Each file has its own index block

An index block is an array of disk-block addresses

The ith entry in the index block points to the ith block of the file

The directory structure contains the addresses of index blocks

Example of Indexed Allocation

Index Allocation

Every file must have an index block

Index block should be as small as possible
• Too small: may not be able to hold enough pointers for a large file
• Too large: wastes space

Index Allocation Mechanisms

Linked List

Link several index blocks

An index block might contain:
• A small header giving the name
• Set of the first 100 disk-block addresses
• The next address is nil or a pointer to another index block


Multilevel index

Use a first-level index block to point to a set of second-level index blocks

To access a block, the OS uses the first-level index to find a second-level index block

The second-level index block is used to find the desired data block


Combined

First index block
• Contains N direct pointers
– These hold the addresses of disk blocks that contain file data
– Good for small files
• Contains 3 pointers to indirect blocks
– Single indirect block: index block with the addresses of blocks that hold data
– Double indirect block: contains the addresses of blocks that in turn contain pointers to the actual data blocks
– Triple indirect block

Combined Scheme: UNIX (4K bytes per block)

FCB – in Unix this is called i-Node
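To get a feel for the combined scheme, the sketch below works out which kind of pointer covers a given file block and how large a file can grow. The 4-KB block size is from the slide title; the 4-byte pointer size and N = 12 direct pointers are assumptions for illustration only.

```python
BLOCK_SIZE = 4096    # bytes per block
POINTER_SIZE = 4     # assumption: bytes per disk-block address
NUM_DIRECT = 12      # assumption: the value of N
PTRS_PER_BLOCK = BLOCK_SIZE // POINTER_SIZE  # addresses held by one index block


def classify(block_index):
    """Return (pointer kind, index blocks read on the way) for a file block."""
    if block_index < NUM_DIRECT:
        return ("direct", 0)
    block_index -= NUM_DIRECT
    if block_index < PTRS_PER_BLOCK:
        return ("single indirect", 1)
    block_index -= PTRS_PER_BLOCK
    if block_index < PTRS_PER_BLOCK ** 2:
        return ("double indirect", 2)
    block_index -= PTRS_PER_BLOCK ** 2
    if block_index < PTRS_PER_BLOCK ** 3:
        return ("triple indirect", 3)
    raise ValueError("block index beyond the maximum file size")


# Total data blocks reachable through all four kinds of pointer.
max_blocks = (NUM_DIRECT + PTRS_PER_BLOCK
              + PTRS_PER_BLOCK ** 2 + PTRS_PER_BLOCK ** 3)
max_bytes = max_blocks * BLOCK_SIZE
```

Under these assumptions the first 48 KB of a file need no index-block reads at all, which is why the scheme suits small files, while the triple indirect pointer accounts for almost all of the maximum size.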



Index Allocation

What happens when a file needs more blocks?
• Reserve the last disk address for the address of a block containing more disk block addresses

I-node

An I-node is Unix’s FCB

The UNIX I-node entries

Structure of an I-Node: Attributes and addresses of disk blocks

Indexed Allocation

Advantage
• Only the index block needs to be in memory when the corresponding file is open

Disadvantage
• Updating the structure is more complex

Entry Lookup

To open a file, the OS needs its pathname

The OS uses the pathname supplied by the user to locate the directory entry

How can we locate the root directory (the start of all paths)?

Entry Lookup

In Unix systems
• The superblock (among other things) has the location of the i-node which represents the root directory

Once the root directory is located a search through the directory tree finds the desired directory entry

The directory entry provides the information needed to find the disk blocks for the requested file

Entry Lookup

This discussion focuses on Unix-related file systems

When a file is opened, the file system must take the file name supplied and locate its disk blocks

Let’s see how this is done for the path name /usr/ast/mbox

Looking up an entry

The steps in looking up /usr/ast/mbox

Entry Lookup

First the system locates the root directory

There is information for each file and directory within the root directory

A directory entry consists of the file name and i-node number

The file system looks up the first component of the path, usr, to find the i-node for /usr, which is i-node 6

From this i-node the system locates the directory for /usr/, which is in block 132


Entry Lookup

The system then searches for ast within the /usr directory, which is block 132

The entry gives the i-node for /usr/ast/

From this i-node the system can find the directory itself and look up mbox

The i-node for this file is then read into memory and kept there until the file is closed
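The lookup of /usr/ast/mbox can be traced with two dictionaries standing in for i-nodes and directory blocks. I-node 6 and block 132 are the values given above; every other number (the root i-node, i-nodes 26 and 60, blocks 1 and 406) is a hypothetical placeholder, and each i-node is simplified to a single block number.

```python
ROOT_INODE = 1  # assumption: in a real system the superblock locates this

# i-node number -> disk block holding that directory's contents.
inodes = {1: 1, 6: 132, 26: 406}

# Disk block -> directory entries (file name -> i-node number).
blocks = {
    1: {"usr": 6},
    132: {"ast": 26},
    406: {"mbox": 60},
}


def lookup(path):
    """Walk the path one component at a time; return (i-node, steps taken)."""
    inode = ROOT_INODE
    steps = []
    for name in path.strip("/").split("/"):
        directory = blocks[inodes[inode]]  # i-node -> block -> entries
        if name not in directory:
            raise FileNotFoundError(path)
        inode = directory[name]            # entry gives the next i-node
        steps.append((name, inode))
    return inode, steps


mbox_inode, steps = lookup("/usr/ast/mbox")
```

Each path component costs one i-node access and one directory-block search, so deep paths are more expensive to open than shallow ones.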

Free Space Management


Limited disk space
• Need to reuse the space from deleted files

Bitmap
• One bit for each block on the disk
• If the block is free the bit is 1, else the bit is 0
• Makes it easy to find a contiguous group of free blocks
• Large disks make for large bitmaps that must be kept in memory
• Used in Apple machines
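A bitmap makes both single-block and contiguous searches a simple scan. The sketch below uses a Python list of 0/1 values in place of packed bits, with 1 meaning free as on the slide; the function names and sample bitmap are illustrative.

```python
def first_free(bitmap):
    """Index of the first free block, or None if the disk is full."""
    for i, bit in enumerate(bitmap):
        if bit == 1:
            return i
    return None


def find_contiguous(bitmap, n):
    """Start of the first run of n free blocks, or None if there is none."""
    run_start, run_length = None, 0
    for i, bit in enumerate(bitmap):
        if bit == 1:
            if run_length == 0:
                run_start = i
            run_length += 1
            if run_length == n:
                return run_start
        else:
            run_length = 0
    return None


def allocate(bitmap, start, n):
    """Mark n blocks starting at `start` as in use (bit 0)."""
    for i in range(start, start + n):
        bitmap[i] = 0


# Blocks 2, 4-6 and 8-9 are free.
bitmap = [0, 0, 1, 0, 1, 1, 1, 0, 1, 1]
```

A real implementation would pack the bits into machine words and test a whole word at a time, but the search logic is the same.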


Linked List

Link together all the free disk blocks

Keep a pointer to the first free block in a special location on the disk and cache it in memory

Not efficient, since traversing the list means reading each block, which requires substantial I/O

However, traversal is not done that often

Optimization: store the addresses of N free blocks in the first free block


Indexing
• Treat free blocks as a file
• This allows for indexing

Use a combination

Summary

We have examined how files are implemented and how one particular implementation is used to support some of the file operations
