CCT101: Chapter 9 Files
Jun 02, 2015
OBJECTIVES
• Describe the types of data processing files
• Describe the types of file organization
• Data validation
FILE, RECORD & FIELD
- Field
• Data item
• e.g. student name
- Record
• A group of related data items or fields
• e.g. student record
- File
• A collection of related records
• e.g. student file
ENTITY SET, ENTITY & ATTRIBUTES
- Attributes
• Describe the properties of the entity (i.e. fields)
- Entity
• The thing about which we store facts (i.e. record)
- Entity set
• A collection of logically related entities (i.e. file)
Logical File & Physical Files
1. Physical file :
– Refers to how the data is stored, i.e. the actual arrangement of data on the storage device
2. Logical file :
– What a file contains & how the data should be processed
Key Field
• A field within the record which is used for locating & processing the record
• e.g. student number
FILE LENGTH
• Fixed-length records
– Each record has the same length
– Advantage: Easier to design
– Disadvantage: Wasted storage space
• Variable-length records
– Records do not all have the same length
– Advantage: Saves storage space
– Disadvantage: More difficult to design
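The trade-off between the two record types can be illustrated with a minimal Python sketch. The layouts below (a 4-byte student number, a 20-byte name field, a 1-byte length prefix) are assumptions chosen for illustration, not something the slides specify.

```python
import struct

# Hypothetical fixed-length layout: 4-byte student number + 20-byte name.
FIXED = struct.Struct("<I20s")

def pack_fixed(number, name):
    # Short names are padded with NUL bytes (the wasted space);
    # names longer than 20 bytes are truncated by struct.
    return FIXED.pack(number, name.encode("ascii"))

def pack_variable(number, name):
    # Hypothetical variable-length layout: number, 1-byte name length, name.
    data = name.encode("ascii")
    return struct.pack("<IB", number, len(data)) + data

def read_fixed(buf, n):
    # The position of record n can be computed directly.
    number, raw = FIXED.unpack_from(buf, n * FIXED.size)
    return number, raw.rstrip(b"\0").decode("ascii")

def read_variable(buf, n):
    # Every earlier record must be skipped because lengths differ.
    pos = 0
    for _ in range(n):
        (length,) = struct.unpack_from("<B", buf, pos + 4)
        pos += 5 + length
    number, length = struct.unpack_from("<IB", buf, pos)
    return number, buf[pos + 5:pos + 5 + length].decode("ascii")
```

With two students, "Ann" and "Bartholomew", the fixed layout takes 48 bytes and the variable layout 24, but only the fixed layout lets a program jump straight to the second record.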
INFORMATION RETRIEVAL
1. Writing :
– The act of transferring a record from main memory to secondary storage.
2. Insertion :
– Adding a new record to an existing file.
3. Deleting :
– Removing a record from a file.
4. Updating :
– Making changes to the contents of a record to show the new status of information.
5. Sorting :
– Rearranging the records in a file for the purpose of producing ordered reports.
6. Merging :
– Combination of 2 or more files to produce a single output file.
7. Matching :
– Where 2 or more input files are compared record against record to ensure there is a complete set of records for each key. Mismatched records are highlighted for action.
8. Searching :
– Looking for a record with a certain key value.
9. Appending :
– Adding a record at the last available space of an existing file.
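Merging and matching can be sketched in a few lines of Python. This is a simplified illustration: records are assumed to be tuples whose first element is the key, both files are assumed to be already sorted by key, and the function names are hypothetical.

```python
import heapq

def merge(file_a, file_b):
    # Combine two key-sorted files into one key-sorted output file.
    return list(heapq.merge(file_a, file_b, key=lambda rec: rec[0]))

def match(file_a, file_b):
    # Report the keys that appear in only one of the two files,
    # i.e. the mismatched records to be highlighted for action.
    keys_a = {rec[0] for rec in file_a}
    keys_b = {rec[0] for rec in file_b}
    return sorted(keys_a ^ keys_b)
```

The matching step here uses key sets for brevity; a tape-based system would instead read both sorted files in step, record against record.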
ACTIVITY RATIO (HIT RATE)
• Hit rate :
– The number of records that are changed as a result of updating, compared to the total number of records in the file.
– Hit rate = number of records affected / total records on file
• Volatility :
– A measure of the number of additions and deletions in a file.
• File growth :
– Number of record additions minus number of record deletions
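As a small worked example, the measures above might be computed as follows. The function names are hypothetical, and expressing volatility as a ratio of the file size is an assumption; the slide only says it measures additions and deletions.

```python
def hit_rate(records_affected, total_records):
    # Hit rate = number of records affected / total records on file.
    return records_affected / total_records

def file_growth(additions, deletions):
    # File growth = record additions minus record deletions.
    return additions - deletions

def volatility(additions, deletions, total_records):
    # Assumption: additions plus deletions, relative to the file size.
    return (additions + deletions) / total_records
```

For a file of 1,000 records in which an update run changes 200, the hit rate is 0.2. With 50 additions and 20 deletions, the file grows by 30 records.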
TYPES OF DP FILES
1. Master file
– Permanent or semi-permanent data
– Used for reference and updating
– Shows the current status of data
– Never empty except at its time of creation
– E.g. stock master file
2. Transaction file
– Contains source or transaction data
– Used for updating master file
– E.g. sales transaction file
3. Work file
– Temporary file
– Used for storing intermediate data for further processing
– E.g. file used by sort utility
4. Transition file
– Temporary file for specific use
– E.g. meter readings, customer’s detail for printout
5. Security & backup file
– Extra copy of file against damage/loss
6. Audit file
– Enables auditor to check correct functioning of computer based procedures
– Keeps a copy of all transactions
FILE ORGANISATIONS
• 4 Types
1. Serial
2. Sequential
3. Indexed-sequential
4. Random
SERIAL ORGANISATION
• Simplest organisation; records are not in any order
• Each record is placed in the next available space
• Suitable for
– Unsorted transaction files
– Print files
– Dump files
– Temporary data files
• Records are accessed in the order in which they were placed
• Advantages :
– File design is simple
– Efficient for high activity files
– Effective use of low cost file media suitable for batch processing
• Disadvantage :
– The file has to be processed from beginning to end
SEQUENTIAL ORGANISATION
• Records are stored in a predefined order
• A designated field within the record is selected as the basis for ordering the records
• This key is also known as the record key or simply the key
• Suitable for master files
• Not suitable for fast-response online enquiry systems
• E.g. payroll transaction file
• Advantages :
– File design is simple
– Efficient for high activity files
– Effective use of low cost file media suitable for batched transactions
• Disadvantage :
– Entire file must be processed even if activity is low
– Transactions must be sorted before processing
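The reason transactions must be sorted becomes clear from a sketch of a sequential update run. The Python below is a minimal illustration, assuming master records are (key, balance) tuples and transactions are (key, amount) tuples, both in key order; the function name and record layout are hypothetical.

```python
def sequential_update(master, transactions):
    # One pass over the whole master file, reading both files in key order.
    new_master, unmatched = [], []
    t = 0
    for key, balance in master:
        # Transactions with a lower key have no master record.
        while t < len(transactions) and transactions[t][0] < key:
            unmatched.append(transactions[t])
            t += 1
        # Apply every transaction that matches this master record.
        while t < len(transactions) and transactions[t][0] == key:
            balance += transactions[t][1]
            t += 1
        new_master.append((key, balance))
    # Anything left over is beyond the last master key.
    unmatched.extend(transactions[t:])
    return new_master, unmatched
```

Every master record is read and rewritten even when only a few are changed, which is why this organisation suits high activity files and is wasteful when activity is low.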
INDEXED SEQUENTIAL ORGANISATION
• Records are stored in physical sequence by primary key
• An index is built separately from the data records
• Can be accessed randomly and sequentially
• 3 main parts
– Prime (home) area
– Overflow area
– Index area
• When there is insufficient space in the home area (prime area), the overflow area is used
• Overflow areas are created at cylinder & track level
• Access is controlled by means of pointers
• File reorganisation has to be carried out
• Overflow records are recovered & indexes rebuilt
• Supports three types of processing :
1. Sequential processing
2. Selective sequential processing
3. Random access (direct access/dynamic access): the block is searched record by record until the record is found
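A minimal Python sketch of an indexed-sequential lookup follows. It assumes an index holding the highest key of each block in the prime area and leaves out the overflow area entirely; the sample data and function names are hypothetical.

```python
from bisect import bisect_left

# Prime area: blocks of records held in key order (sample data).
blocks = [
    [(101, "Ali"), (105, "Ben")],
    [(110, "Cho"), (118, "Dee")],
    [(120, "Eve"), (131, "Fay")],
]
# Index area: the highest key held in each block.
index = [block[-1][0] for block in blocks]

def find(key):
    # Random access: use the index to pick the block, then search
    # that block record by record until the record is found.
    b = bisect_left(index, key)
    if b == len(blocks):
        return None
    for rec_key, data in blocks[b]:
        if rec_key == key:
            return data
    return None

def read_all():
    # Sequential processing: read the prime area in key order.
    return [rec for block in blocks for rec in block]
```

The same file therefore serves both access modes, at the cost of an extra index lookup on every random access.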
RANDOM ORGANISATION
• Predictable relationship between record key & record’s location on disc
• Records are not in sequence physically; they are scattered at random
• Direct addressing
• Key used as physical address of record
• Device dependent
INDEXED-SEQUENTIAL ORGANISATION
• Advantages :
– Transactions may be sorted or unsorted
– Only the affected master records are processed during updating
– Response time is reasonably fast
– Facilitates file enquiry
– Can be processed sequentially and randomly
• Disadvantage :
– Each master file access requires index file access
– Requires direct access storage devices (still costly)
– Storage space required for indexes
RANDOM ORGANIZATION
• Key transformation techniques used
1. Division remainder method
– Divide the key value by an appropriate number
– The remainder of the division is used as the address of the record
– The number used to divide is a prime number
2. Mid-square hashing
– The key is squared and specified digits are extracted from the middle of the result to yield the address of the record
3. Hashing by folding
– Key is divided into 2 or more parts which are then added together
– The sum is truncated to bring the result into the required range of addresses
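The key transformation techniques can be sketched as short Python functions. The divisor 97, the two-digit extraction, the two-digit parts, and the address range of 100 are illustrative assumptions; the slides do not give specific values.

```python
def division_remainder(key, divisor=97):
    # The remainder after dividing by a prime number is the address.
    return key % divisor

def mid_square(key, digits=2):
    # Square the key and extract digits from the middle of the result.
    squared = str(key * key)
    start = (len(squared) - digits) // 2
    return int(squared[start:start + digits])

def folding(key, part_size=2, addresses=100):
    # Split the key into parts, add them, then drop the high-order
    # digits so the result falls inside the address range.
    text = str(key)
    parts = [int(text[i:i + part_size]) for i in range(0, len(text), part_size)]
    return sum(parts) % addresses
```

For example, key 1234 gives 1234 mod 97 = 70 by division remainder and, since 1234 squared is 1522756, the middle digits 22 by mid-square. Key 123456 folds to 12 + 34 + 56 = 102, truncated to 2.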
• Advantages :
– As indexes are not required, space and searching time are saved
– Insertion and deletion of records can take place
• Disadvantage :
– Variable-length records are difficult to handle
– Gaps in keys can cause wasted space
– Synonyms (different keys that transform to the same address) can occur
– Allocation of efficient overflow areas is difficult
DATA VERIFICATION
• Double punching method
• Sight verification
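The double punching method amounts to keying the same data twice and comparing the two versions. A minimal Python sketch, assuming each keying is a list with the same number of fields (the function name is hypothetical):

```python
def double_entry_mismatches(first_keying, second_keying):
    # Return the positions of the fields where the two keyings disagree.
    return [
        i
        for i, (a, b) in enumerate(zip(first_keying, second_keying))
        if a != b
    ]
```

An empty result means the two keyings agree; any position returned points to a field that should be checked against the source document.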
DATA VALIDATION
• Presence
• Size
• Range
• Character check
• Format
• Reasonableness
• Check digits
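Several of these checks can be sketched as small Python functions. The function names are hypothetical, and the check digit routine assumes a modulus-11 weighting of the kind used by ISBN-10; the slides do not say which check digit scheme is intended.

```python
import re

def check_presence(value):
    # The field must not be empty or blank.
    return value.strip() != ""

def check_size(value, max_len):
    # The field must not exceed the permitted number of characters.
    return len(value) <= max_len

def check_range(value, low, high):
    # The value must lie between the permitted limits.
    return low <= value <= high

def check_character(value):
    # Example character check: letters and spaces only.
    return value.replace(" ", "").isalpha()

def check_format(value, pattern):
    # The whole field must follow the expected picture.
    return re.fullmatch(pattern, value) is not None

def check_digit_ok(code):
    # Assumed scheme: modulus 11, weights n..1 from left to right,
    # with "X" standing for 10 in the final position.
    if not re.fullmatch(r"\d+X?", code):
        return False
    digits = [10 if ch == "X" else int(ch) for ch in code]
    weights = range(len(digits), 0, -1)
    return sum(w * d for w, d in zip(weights, digits)) % 11 == 0
```

For instance, `check_format("S1234", r"S\d{4}")` accepts a student number made of "S" followed by four digits, and `check_digit_ok("0306406152")` accepts a code whose weighted sum is divisible by 11.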
ERROR RECOVERY
• Adequate program checkpoint/restart facilities
• File dumps
• Generations of backup files