File management
Jan 08, 2018
Outline
• File Concept and Structure
• Directory Structures
• File Organizations
• Access Methods
• Protection
• Unix file system calls
File System
• A file system is a mechanism for storing programs and data and for giving users access to them.
• It has two parts: a collection of files and a directory structure.
• A file system has three main functions:
  – to provide facilities for file manipulation and long-term storage of files
  – to provide secondary storage management (covered in earlier lectures)
  – to provide support for system integrity
File Structure
• None: a sequence of words/bytes
• Simple record structure
  – Lines
  – Fixed length
  – Variable length
• Complex structures
  – Formatted document
  – Relocatable load file
• The last two can be simulated with the first method by inserting appropriate control characters.
• Who decides?
  – Operating system
  – Program
File Attributes
• Name: symbolic file name, the only information kept in human-readable form
• Type: needed for systems that support multiple types
• Location: pointer to a device and to the file's location on that device
• Size: current file size, maximum possible size
• Protection: controls who can read, write, execute
• Time, date and user identification: data for protection, security and usage monitoring
• Information about files is kept in the directory structure, which is maintained on disk.
File Operations
• A file is an abstract data type. It can be defined by its operations:
  – Create a file
  – Write a file
  – Read a file
  – Reposition within a file (file seek)
  – Delete a file
  – Truncate a file
  – Open(Fd): search the directory structure on disk for entry Fd, and move the content of the entry to memory.
  – Close(Fd): move the content of entry Fd in memory to the directory structure on disk.
File types - name.extension

File type     Possible extensions      Function
Executable    exe, com, bin            Machine language program
Object        obj, o                   Compiled machine language, not linked
Source code   c, CC, p, java, asm…     Source code in various languages
Batch         bat, sh                  Commands to the command interpreter
Text          txt, doc                 Textual data, documents
Print, view   ps, dvi, gif             ASCII or binary file
Archive       arc, zip, tar            Group of files, sometimes compressed
Library       lib, a                   Libraries of routines
Directory Structure
• The number of files on a system can be extensive.
  – The directory structure breaks file systems into partitions (each treated as a separate storage device).
  – It holds information about the files within each partition.
• Device directory: a collection of nodes containing information about all files on a partition.
• Both the directory structure and the files reside on disk.
• Backups of these two structures are kept on tapes.
Information in a Device Directory
• File name
• File type
• Address or location
• Current length
• Maximum length
• Date created, date last accessed (for archival), date last updated (for dump)
• Owner ID, protection information
• Also, on a per-file, per-process basis:
  – Current position (read/write position)
  – Usage count
Operations Performed on Directory
• Search for a file
• Create a file
• Delete a file
• List a directory
• Rename a file
• Traverse the file system
Logical Directory Organization -- Goals
• Efficiency: locating a file quickly
• Naming: convenient to users
  – Two users can have the same name for different files.
  – The same file can have several different names.
• Grouping
  – Logical grouping of files by properties (e.g. all Pascal programs, all games…)
Single Level Directory
• A single directory for all users
• Naming problem and grouping problem
  – As the number of files increases, it becomes difficult to remember unique names.
  – As the number of users increases, every user must still pick file names that are unique across all users.
Two Level Directory
• Introduced to remove the naming problem between users
• First level contains the list of user directories
• Second level contains user files
• Need to specify a path name
• Different users can have files with the same name.
• System files are kept in a separate directory or at level 1.
• Efficient searching
Tree Structured Directories
• Arbitrary depth of directories
• Leaf nodes are files, interior nodes are directories.
• Efficient searching
• Grouping capability
• Current directory (working directory)
  – cd /spell/mail/prog
  – type list
• MS-DOS uses a tree structured directory.
Tree Structured Directories
• Absolute or relative path names
  – Absolute paths start from the root.
  – Relative paths start from the current working directory pointer.
• Creating a new file is done in the current directory.
• Creating a new subdirectory is done in the current directory, e.g. mkdir <dir-name>
• Deleting a file, e.g. rm file-name
• Deleting a directory
  – Option 1: delete only if the directory is empty
  – Option 2: delete all files and subdirectories under the directory
Access Methods
• Sequential access
  – read next
  – write next
  – reset
  – no read after last write (rewrite)
• Direct access (n = relative block number)
  – read n
  – write n
  – position to n
      read next
      write next
  – rewrite n
Protection
• The file owner/creator should be able to control
  – what can be done
  – by whom
• Types of access
  – read
  – write
  – execute
  – append
  – delete
  – list
Access lists and groups
• Associate each file/directory with an access list.
  – Problem: the length of the access list
  – Solution: a condensed version of the list
• Modes of access: read, write, execute
• Three classes of users
  – owner access: the user who created the file
  – group access: a set of users who share the file and need similar access
  – public access: all other users
• In UNIX, three fields of 3 bits each are used. The fields are user, group, others (u, g, o); the bits are read, write, execute (r, w, x). E.g. chmod go+rw file, chmod 761 game
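To see how an octal mode such as 761 maps onto the three 3-bit fields, the following sketch renders the nine permission bits as text. The function name mode_to_string is made up for this illustration.

```c
/* Render the low nine permission bits of `mode` as "rwxrwxrwx" text.
 * `out` must have room for 10 bytes (9 characters plus the NUL). */
void mode_to_string(unsigned mode, char out[10])
{
    const char flags[] = "rwx";
    for (int i = 0; i < 9; i++) {
        unsigned bit = 1u << (8 - i);          /* leftmost bit is user-read */
        out[i] = (mode & bit) ? flags[i % 3] : '-';
    }
    out[9] = '\0';
}
```

With mode 0761 this yields rwxrw---x: the owner gets all three bits (7), the group gets read and write (6), and others get execute only (1).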
Standard I/O Functions
• The C standard library (libc.a) contains a collection of higher-level standard I/O functions.
• Examples of standard I/O functions:
  – Opening and closing files (fopen and fclose)
  – Reading and writing bytes (fread and fwrite)
  – Reading and writing text lines (fgets and fputs)
  – Formatted reading and writing (fscanf and fprintf)
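As a small illustration of the line-oriented functions, here is a sketch that copies a text file line by line with fopen, fgets, fputs and fclose. The function name copy_lines and the 256-byte buffer are choices made for this example only.

```c
#include <stdio.h>
#include <string.h>

/* Copy text file `src` to `dst` line by line.
 * Returns the number of newline-terminated lines copied, or -1 on error. */
long copy_lines(const char *src, const char *dst)
{
    FILE *in = fopen(src, "r");
    if (in == NULL)
        return -1;
    FILE *out = fopen(dst, "w");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    char line[256];
    long lines = 0;
    int ok = 1;
    while (fgets(line, sizeof line, in) != NULL) {
        if (fputs(line, out) == EOF) {
            ok = 0;
            break;
        }
        size_t len = strlen(line);
        if (len > 0 && line[len - 1] == '\n')
            lines++;               /* count only completed lines */
    }
    if (ferror(in))
        ok = 0;
    fclose(in);
    if (fclose(out) == EOF)        /* fclose flushes the buffer and may fail */
        ok = 0;
    return ok ? lines : -1;
}
```

Lines longer than the buffer are copied in several pieces but counted once, because only the piece ending in a newline increments the counter.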
Standard I/O Streams
• Standard I/O models open files as streams.
  – A stream is an abstraction for a file descriptor and a buffer in memory.
• C programs begin life with three open streams (defined in stdio.h):
  – stdin (standard input)
  – stdout (standard output)
  – stderr (standard error)

#include <stdio.h>
extern FILE *stdin;  /* standard input (descriptor 0) */
extern FILE *stdout; /* standard output (descriptor 1) */
extern FILE *stderr; /* standard error (descriptor 2) */

int main() {
    fprintf(stdout, "Hello, world\n");
}
Example: File creation
• The following example shows a file creation with rwxr-xr-x permissions; note also the handling of error conditions in the standard way.
• Note that creat() erases the content of the file whose creation is requested, if it already exists, and in this case the permissions are left unchanged.

#include <sys/stat.h>
#include <sys/types.h>
#include <fcntl.h>
#include <stdio.h>   /* perror */
#include <stdlib.h>  /* exit */

int fildes;
….
fildes = creat("myfile", 0755);
if (fildes == -1) {
    perror("myfile");
    exit(1);
} else {
    ….
Opening Files
• Opening a file informs the kernel that you are getting ready to access that file.
• open() returns a small identifying integer, the file descriptor.
  – fd == -1 indicates that an error occurred.
• Each process created by a Unix system begins life with three open files associated with a terminal:
  – 0: standard input
  – 1: standard output
  – 2: standard error

int fd; /* file descriptor */

if ((fd = open("/etc/hosts", O_RDONLY)) < 0) {
    perror("An error has occurred");
    exit(1);
}
Moving inside a file
• Input and output are normally sequential: each read or write takes place at a position in the file right after the previous one. When necessary, however, a file can be read or written in any arbitrary order. The system call lseek provides a way to move around in a file without reading or writing any data:

    long lseek(int fd, long offset, int origin);

  sets the current position in the file whose descriptor is fd to offset, which is taken relative to the location specified by origin. Subsequent reading or writing will begin at that position. origin can be 0, 1, or 2 (SEEK_SET, SEEK_CUR, or SEEK_END) to specify that offset is to be measured from the beginning, from the current position, or from the end of the file respectively. POSIX declares the offset and the return value as off_t rather than long.
• On success, lseek() returns the resulting pointer location, as measured in bytes from the beginning of the file, hence to enquire about the current position you can just call it as lseek(fildes, 0, SEEK_CUR).
Example

#include "syscalls.h"

/* get: read n bytes from position pos */
int get(int fd, long pos, char *buf, int n)
{
    if (lseek(fd, pos, 0) >= 0) /* get to pos */
        return read(fd, buf, n);
    else
        return -1;
}
Example
• You are allowed to set the file pointer at a position way beyond the current end of the file. For example,

    ...
    fildes = open("myfile", O_WRONLY|O_TRUNC|O_CREAT, 0644);
    if (fildes == -1) {
        perror("myfile");
        exit(1);
    }
    lseek(fildes, 100000, SEEK_END);
    ...

  opens a file for writing, discarding its content if it already existed, then sets about to write after a "hole" 100,000 bytes long. This doesn't cause UNIX to waste 100,000 bytes of disk space: holes in files created by seeks like the above consume very little storage. If you later attempt to read from within a hole, the system will supply ASCII NUL characters (i.e. zeros), but these characters will not be physically on the disk.
Writing Files
• Writing a file copies bytes from memory to the current file position, and then updates the current file position.
• write() returns the number of bytes written from buf to file fd.
  – nbytes < 0 indicates that an error occurred.
  – As with reads, short counts are possible and are not errors!
• The code below transfers up to 512 bytes from address buf to file fd.

char buf[512];
int fd;     /* file descriptor */
int nbytes; /* number of bytes written */

/* Open the file fd ... */
/* Then write up to 512 bytes from buf to file fd */
if ((nbytes = write(fd, buf, sizeof(buf))) < 0) {
    perror("Error occurred");
    exit(1);
}
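Because short counts are not errors, code that must write a whole buffer usually loops until everything has been transferred. A minimal sketch of such a loop follows; the name write_all is an assumption of this example, not a standard function.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all `n` bytes of `buf` to `fd`, continuing after short counts.
 * Returns 0 on success, -1 on error (errno is set by write). */
int write_all(int fd, const void *buf, size_t n)
{
    const char *p = buf;
    while (n > 0) {
        ssize_t written = write(fd, p, n);
        if (written < 0) {
            if (errno == EINTR)
                continue;          /* interrupted before any byte: retry */
            return -1;
        }
        p += written;              /* skip what was already written */
        n -= (size_t)written;
    }
    return 0;
}
```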
Reading
• UNIX only provides system calls for reading or writing blocks of data. To read a block of data from the open file with descriptor fd, a code sequence like the following can be used.
• As the following example shows, the read() system call takes three arguments:
  – the descriptor of the file to read from,
  – the buffer where the read bytes should be stored,
  – and an unsigned number of bytes to read.
• It returns the number of bytes actually read, or -1 if an error occurred.

char buf[512];
int fd;     /* file descriptor */
int nbytes; /* number of bytes read */

/* Open file fd ... */
/* Then read up to 512 bytes from file fd */
if ((nbytes = read(fd, buf, sizeof(buf))) < 0) {
    perror("error reading");
    exit(1);
}
Reading Files
• Reading a file copies bytes from the current file position to memory, and then updates the file position.
• read() returns the number of bytes read from file fd into buf.
  – nbytes < 0 indicates that an error occurred.

char buf[512];
int fd;     /* file descriptor */
int nbytes; /* number of bytes read */

/* Open file fd ... */
/* Then read up to 512 bytes from file fd */
if ((nbytes = read(fd, buf, sizeof(buf))) < 0) {
    perror("error reading");
    exit(1);
}
Closing Files
• Closing a file informs the kernel that you are finished accessing that file.
• Always check return codes, even for seemingly benign functions such as close().

int fd;     /* file descriptor */
int retval; /* return value */

if ((retval = close(fd)) < 0) {
    perror("Error closing");
    exit(1);
}
File-System Structure
• File structure
  – Logical storage unit with a collection of related information
• The file system resides on secondary storage (disks).
  – To improve I/O efficiency, I/O transfers between memory and disk are performed in blocks.
  – Read/write/modify/access each block on disk.
• The file system is organized into layers.
• File control block: a storage structure consisting of information about a file.
File System Mounting
• A file system must be mounted before it can be available to processes on the system.
  – The OS is given the name of the device and the mount point (the location within the file structure at which the file system is attached).
  – The OS verifies that the device contains a valid file system.
  – The OS notes in its directory structure that a file system is mounted at the specified mount point.
Hierarchical Model of the File and I/O Subsystems
• The average user needs to be concerned only with logical files and devices. The user should not need to know machine-level details.
• Unified view of file system and I/O
• Form a hierarchical organization of file system and I/O where
  – file system functions are closer to the user
  – I/O details are closer to the hardware
Functional Levels
• Directory retrieval
  – Maps from symbolic file names to the precise location of the file, its descriptor, or a table containing this information. The directory is searched for the entry of the referenced file.
• Basic file system
  – Activates and deactivates files through opening and closing routines (more later)
  – Verifies the access rights of the user, if necessary
  – Retrieves the descriptor when a file is opened
• Physical organization methods
  – Translation from the original logical file address into physical secondary storage requests
  – Allocation of secondary storage and main storage buffers
Functional Levels
• Device I/O techniques
  – Requested operations and physical records are converted into appropriate sequences of I/O instructions, channel commands, and controller orders.
• I/O scheduling and control
  – Actual queuing, scheduling, initiating, and controlling of all I/O requests
  – Direct communication with I/O hardware
  – Basic I/O servicing and status reporting
Allocation of Disk Space
• Low-level access methods depend upon the disk allocation scheme used to store file data:
  – Contiguous allocation
  – Linked list allocation
  – Block allocation
Contiguous Allocation
• Each file occupies a set of contiguous blocks on the disk.
  – Simple: only the starting location (block #) and length (number of blocks) are required.
  – Suits sequential or direct access.
  – Fast (very little head movement) and easy to recover in the event of a system crash.
• Problems
  – Wasteful of space (dynamic storage-allocation problem). Use first fit or best fit. Leads to external fragmentation on disk.
  – Files cannot grow: expanding a file requires copying.
  – Users tend to overestimate space, which causes internal fragmentation.
• Mapping from logical to physical: <Q,R>, the quotient and remainder of dividing the logical address by the block size
  – Block to be accessed = Q + starting address
  – Displacement into block = R
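The <Q,R> mapping for contiguous allocation is a single division. The sketch below assumes Q and R are the quotient and remainder of the logical address divided by the block size; the struct and function names are invented for this illustration.

```c
struct disk_addr {
    unsigned long block;    /* physical block number */
    unsigned long offset;   /* displacement within that block */
};

/* Contiguous allocation: logical address -> physical block and offset. */
struct disk_addr contiguous_map(unsigned long start_block,
                                unsigned long logical_addr,
                                unsigned long block_size)
{
    unsigned long q = logical_addr / block_size;  /* Q: block index in file */
    unsigned long r = logical_addr % block_size;  /* R: displacement in block */
    struct disk_addr a = { start_block + q, r };
    return a;
}
```

For a file starting at block 14 with 512-byte blocks, logical address 1300 gives Q = 2 and R = 276, so the access goes to block 16 at displacement 276.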
Linked Allocation
• Each file is a linked list of disk blocks.
  – Blocks may be scattered anywhere on the disk.
  – Each node in the list can be a fixed-size physical block or a contiguous collection of blocks.
  – Allocate as needed and then link together via pointers.
• Disk space is used to store the pointers: if a disk block is 512 bytes and a pointer (disk address) requires 4 bytes, the user sees 508 bytes of data.
• Pointers in the list are not accessible to the user.
• Block = [ pointer | data ]
Linked Allocation - Advantages
• Simple: need only the starting address.
• Free-space management system: space efficient.
• Can grow in the middle and at the ends. No estimation of size necessary.
• Suited for sequential access but not random access. Why? (Class discussion.)
• No external fragmentation
• No need to declare the size of a file
• No need to compact disk space
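One way to approach the question about random access is to count pointer hops. In the sketch below the disk is modelled as an in-memory array next[], where next[b] is the block that follows block b; the array model, the END_OF_FILE marker and the function name are all assumptions made for illustration.

```c
#define END_OF_FILE (-1)   /* hypothetical marker for "no next block" */

/* Find the disk block that holds logical block `n` of a file whose chain
 * starts at `first_block`. Returns END_OF_FILE if the chain is shorter.
 * *hops receives the number of pointers followed on the way. */
int linked_lookup(const int *next, int first_block, int n, int *hops)
{
    int block = first_block;
    *hops = 0;
    for (int i = 0; i < n && block != END_OF_FILE; i++) {
        block = next[block];   /* on a real disk: one block read per hop */
        (*hops)++;
    }
    return block;
}
```

Reaching logical block n costs n hops, and on disk each hop means reading a block just to find the next pointer, so direct access to the middle of a large file is slow, while sequential access pays one hop per block.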
Linked Allocation (cont.) - Disadvantages
• Slow: defies the principle of locality.
  – Need to read through the linked list nodes sequentially to find the record of interest.
• Not very reliable
  – System crashes can scramble files being updated.
• Effective only for sequentially accessed files
• Wasted space to keep pointers (2 words out of 512, i.e. 0.39% wastage)
• Reliability: a bug might overwrite or lose a pointer.
  – Might be solved by doubly linked lists (more waste of space)
Indexed Allocation
• Brings all pointers together into one block called the index block.
• Logical view: [Figure: index table]
Indexed Allocation
• Index block for each file, holding disk-block addresses
  – The ith entry in the index block points to the ith block of the file.
  – Supports direct access without suffering from external fragmentation
  – Pointer overhead is generally higher than that for linked allocation.
  – More space is wasted for small files.
  – Size of the index block
Indexed Allocation (cont.)
• Need an index table.
• Supports sequential, direct and indexed access.
• Dynamic access without external fragmentation, but with the overhead of an index block.
• Mapping from logical to physical in a file of maximum size 256K words and block size of 512 words: we need only 1 block for the index table (256K / 512 = 512 entries, which fit in one block at one word per entry).
• Mapping: <Q,R>
  – Q = displacement into index table
  – R = displacement into block
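With a single index block, the <Q,R> mapping is one division followed by one table lookup. The sketch below uses the slide's block size of 512 words and assumes Q and R are the quotient and remainder of the logical address divided by that size; the function name and the in-memory index_table array are illustrative.

```c
#define BLOCK_WORDS 512UL   /* block size from the slide, in words */

/* Indexed allocation with one index block.
 * Returns the physical block for logical word address `la` and stores the
 * displacement within that block in *offset. Returns -1 if `la` lies
 * beyond the `entries` blocks recorded in the index table. */
long indexed_map(const long *index_table, unsigned long entries,
                 unsigned long la, unsigned long *offset)
{
    unsigned long q = la / BLOCK_WORDS;   /* Q: displacement into index table */
    unsigned long r = la % BLOCK_WORDS;   /* R: displacement into block */
    if (q >= entries)
        return -1;
    *offset = r;
    return index_table[q];
}
```

Unlike the linked scheme, the cost does not grow with Q: any block of the file is reachable after reading the index block once.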
Indexed File - Linked Scheme
[Figure: index blocks pointing to file blocks, with index blocks chained by links]
Indexed Allocation - Multilevel index
[Figure: index block pointing to 2nd-level index blocks, which link to the file blocks]
Directory Implementation
• Linear list of file names with pointers to the data blocks
  – Simple to program
  – Time-consuming to execute: linear search to find an entry
  – A sorted list helps: it allows binary search and decreases search time.
• Hash table: linear list with a hash data structure
  – Decreases directory search time
  – Collisions: situations where two file names hash to the same location
  – Each hash entry can be a linked list: resolve collisions by adding the new entry to the linked list.
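The hash-table directory with chained collisions can be sketched in memory as follows. Everything here (the struct layout, the 64-bucket table, the 31-character name limit, the hash function) is an assumption chosen for illustration, not the layout of any particular file system.

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 64           /* hypothetical table size */
#define NAME_MAX_LEN 31       /* hypothetical file-name limit */

struct dir_entry {
    char name[NAME_MAX_LEN + 1];
    long first_block;         /* pointer to the file's data blocks */
    struct dir_entry *next;   /* chain of entries that hashed here */
};

struct directory {
    struct dir_entry *bucket[NBUCKETS];
};

/* Simple multiplicative string hash; any reasonable hash would do. */
static unsigned hash_name(const char *name)
{
    unsigned h = 0;
    while (*name)
        h = h * 31u + (unsigned char)*name++;
    return h % NBUCKETS;
}

/* Add an entry at the head of its bucket's chain. Returns 0 on success,
 * -1 if the name is too long or memory is exhausted. Duplicate names
 * are not checked in this sketch. */
int dir_add(struct directory *d, const char *name, long first_block)
{
    if (strlen(name) > NAME_MAX_LEN)
        return -1;
    struct dir_entry *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    strcpy(e->name, name);
    e->first_block = first_block;
    unsigned h = hash_name(name);
    e->next = d->bucket[h];   /* a collision simply lengthens the chain */
    d->bucket[h] = e;
    return 0;
}

/* Return the entry for `name`, or NULL if it is not in the directory. */
struct dir_entry *dir_lookup(const struct directory *d, const char *name)
{
    for (struct dir_entry *e = d->bucket[hash_name(name)]; e; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e;
    return NULL;
}
```

A lookup hashes the name once and then walks only one chain, so search time depends on the chain length rather than on the total number of files.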
Efficiency and Performance
• Efficiency depends on:
  – disk allocation and directory algorithms
  – types of data kept in the file's directory entry
  – dynamic allocation of kernel structures
• Performance is improved by:
  – On-board cache for disk controllers
  – Disk cache: a separate section of main memory for frequently used blocks
      Block replacement mechanisms: LRU
      Free-behind: removes a block from the buffer as soon as the next block is requested.
      Read-ahead: the requested block and several subsequent blocks are read and cached.
  – Improve PC performance by dedicating a section of memory as a virtual disk or RAM disk.