1
File Structure
File as a stream of characters
- No structure
- Consider students registered in a course:
  320587Joe SmithSC953184923Kathy LeeEN923249793Albert ChanSC943
File as a structured collection of related data
- A set of related data forms a record; a file consists of records
- Information about each student forms a record:
  320587Joe SmithSC953
  184923Kathy LeeEN923
  249793Albert ChanSC943
- What is the meaning of each piece of information about each student?
Variable-length coding (Huffman): the most frequently used letters get the shortest codes
Irreversible compression: converting from GIF to JPEG saves 20 ~ 90%
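The Huffman idea above (frequent letters get short codes) can be illustrated with a minimal sketch. The function name `huffman_codes` and the sample text are assumptions made for this example; they are not from the slides.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Return a dict mapping each symbol in text to its Huffman bit string."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit.
        return {sym: "0" for sym in freq}
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far}).
    # The unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; prefix their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

text = "aaaaaaaabbbbccd"          # hypothetical sample: a is most frequent
codes = huffman_codes(text)
encoded_bits = sum(len(codes[ch]) for ch in text)
```

With this sample, the most frequent letter receives a one-bit code and the rarest letters receive three-bit codes, so the 15 characters need 25 bits rather than 120 bits at 8 bits per character.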
14
Reclaiming Space in Files
File updates: record addition, record deletion, record modification
Requirements
- how to recognize deleted records: a tombstone (*)
- how to utilize the space left by deleted records
Storage compaction
- reconstruct the file to reclaim the space occupied by all deleted records
- how often?
Available list
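The tombstone and storage compaction ideas from this slide can be sketched as follows. The 8-byte record size, the space padding, and the helper names `pack` and `compact` are assumptions for illustration only.

```python
RECORD_SIZE = 8          # assumed fixed record length in bytes
TOMBSTONE = b"*"         # a deleted record starts with this byte

def pack(name: str) -> bytes:
    """Pad a name with spaces to exactly RECORD_SIZE bytes."""
    return name.encode("ascii").ljust(RECORD_SIZE)

def compact(data: bytes) -> bytes:
    """Return the file contents with all tombstoned records squeezed out."""
    live = []
    for offset in range(0, len(data), RECORD_SIZE):
        record = data[offset:offset + RECORD_SIZE]
        if not record.startswith(TOMBSTONE):
            live.append(record)
    return b"".join(live)

# "Peter" has been deleted by overwriting its first byte with the tombstone.
file_bytes = b"".join(pack(n) for n in ["Adam", "Barb", "*eter", "Susan"])
compacted = compact(file_bytes)
```

Compaction rewrites the whole file, which is why the slide asks how often it should run; the available list is the alternative that reuses space without rewriting.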
15
Available List
Consider fixed-length records
The available list is a linked list of deleted records
Implemented as a stack
Uses the relative record number (RRN) as the physical address

List head   RRN: 0     1     2      3      4       5    6      7
-1               Adam  Barb  Peter  Susan  Brenda  Sue  Tim    Jack   (initial file)
3                Adam  Barb  Peter  -1     Brenda  Sue  Tim    Jack   (Susan deleted)
6                Adam  Barb  Peter  -1     Brenda  Sue  3      Jack   (Tim deleted)
3                Adam  Barb  Peter  -1     Brenda  Sue  Tamer  Jack   (Tamer added)
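The stack behaviour of the available list can be sketched in a few lines. The class `FixedLengthFile` is a hypothetical in-memory stand-in for the file; the names follow the slide's example.

```python
class FixedLengthFile:
    """In-memory stand-in for a file of fixed-length records."""

    def __init__(self, records):
        self.slots = list(records)   # slot index is the RRN
        self.head = -1               # -1 means the available list is empty

    def delete(self, rrn):
        # Push: the freed slot stores the old head, then becomes the new head.
        self.slots[rrn] = self.head
        self.head = rrn

    def add(self, record):
        if self.head == -1:
            # No deleted slot to reuse: append at the end of the file.
            self.slots.append(record)
            return len(self.slots) - 1
        # Pop: reuse the most recently deleted slot.
        rrn = self.head
        self.head = self.slots[rrn]
        self.slots[rrn] = record
        return rrn

f = FixedLengthFile(["Adam", "Barb", "Peter", "Susan",
                     "Brenda", "Sue", "Tim", "Jack"])
f.delete(3)            # Susan: head becomes 3, slot 3 holds -1
f.delete(6)            # Tim: head becomes 6, slot 6 holds 3
rrn = f.add("Tamer")   # reuses slot 6, head goes back to 3
```

Because the list is a stack, both deletion and insertion touch only the list head and one slot, regardless of file size.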
16
Variable-Length Records Case
Problems
- RRNs cannot be used
- Fitting
- Fragmentation
  - internal fragmentation: occurs if variable-length records are stored in fixed-size slots with padding
  - external fragmentation: after a slot is split, the leftover may be too small to hold any record
Solutions
- An available list with byte offsets
- Placement strategies
- Storage compaction
- Coalescing holes: combining adjacent slots to form a bigger one
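Coalescing holes can be sketched as merging free slots whose byte ranges touch. The `(offset, size)` representation and the sample values are assumptions for this example.

```python
def coalesce(holes):
    """Merge adjacent (offset, size) holes into larger ones."""
    merged = []
    for offset, size in sorted(holes):
        if merged and merged[-1][0] + merged[-1][1] == offset:
            # This hole starts exactly where the previous one ends.
            prev_offset, prev_size = merged[-1]
            merged[-1] = (prev_offset, prev_size + size)
        else:
            merged.append((offset, size))
    return merged

holes = [(100, 20), (40, 10), (50, 30), (200, 5)]   # hypothetical free slots
result = coalesce(holes)
```

Here the holes at offsets 40 and 50 are adjacent and become a single 40-byte hole; the other two stay separate.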
17
Placement Strategies
First fit
- the list is unsorted; a newly deleted record is put at the front
- insertion uses the first slot on the list that fits
Best fit
- the list is sorted in ascending order of size
- insertion uses the first slot on the list that fits
- drawback: too much fragmentation
Worst fit
- the list is sorted in descending order of size
- insertion always uses the first slot, if possible
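The three strategies differ only in how the available list is ordered before it is scanned. A minimal sketch, assuming holes are `(offset, size)` pairs and that the functions return the chosen hole (or `None`) without splitting it:

```python
def first_fit(avail, size):
    """avail is in list order (newest deletion first); take the first hole that fits."""
    for hole in avail:
        if hole[1] >= size:
            return hole
    return None

def best_fit(avail, size):
    """Scan holes in ascending size order, so the first fit is the tightest fit."""
    return first_fit(sorted(avail, key=lambda h: h[1]), size)

def worst_fit(avail, size):
    """Only the largest hole needs checking; if it does not fit, nothing does."""
    ordered = sorted(avail, key=lambda h: h[1], reverse=True)
    return ordered[0] if ordered and ordered[0][1] >= size else None

# Hypothetical available list, newest deletion first.
avail = [(300, 50), (0, 120), (500, 35), (900, 80)]
```

For a 30-byte record, first fit picks the 50-byte hole at the front of the list, best fit picks the 35-byte hole (leaving a 5-byte fragment), and worst fit picks the 120-byte hole (leaving a 90-byte remainder that can still hold other records).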
18
Search Problem
Find a record with a given key value
Sequential search: O(n)
Binary search: O(log n)
- the file must be sorted
- how to maintain the sorted order? (deletion, insertion)
- variable-length records
Sorting
- RAM sort: read the whole file into RAM, sort it, and then write it back to disk
- Keysort: read the keys into RAM, sort the keys in RAM, and then rearrange the records according to the sorted keys
Index
19
Keysorting
Before sorting
Records:
320587 Joe Smith SC 95 3
184923 Kathy Lee EN 92 3
249793 Albert Chan SC 94 3
Key and RRN:
320587 1
184923 2
249793 3

After sorting
Records (not yet moved):
320587 Joe Smith SC 95 3
184923 Kathy Lee EN 92 3
249793 Albert Chan SC 94 3
Key and RRN:
184923 2
249793 3
320587 1

Problem: now the physical file has to be rearranged
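The keysort steps can be sketched with the slide's three records. Holding the records in a Python list is an assumption standing in for reading them from disk by RRN.

```python
records = [
    "320587 Joe Smith SC 95 3",
    "184923 Kathy Lee EN 92 3",
    "249793 Albert Chan SC 94 3",
]

# Step 1: read only (key, RRN) pairs into memory; RRNs start at 1 as on the slide.
keynodes = [(rec.split()[0], rrn) for rrn, rec in enumerate(records, start=1)]

# Step 2: sort the small key array in RAM.
keynodes.sort()

# Step 3: rearrange the records by fetching each one through its RRN.
sorted_records = [records[rrn - 1] for _, rrn in keynodes]
```

Step 3 is the expensive part on disk, since each fetch is a seek to a different RRN. Keeping the sorted `keynodes` and skipping step 3 is what turns the key array into an index.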
20
Indexing
A tool used to find things: book index, student record indexes
A function from keys to addresses
A record consisting of two fields
- key: the field on which the index is searched
- reference: the location of the data record associated with the key
Advantages
- the smaller size of the index file makes a RAM index possible
- binary search on files of variable-length records
- rearrange keys without moving records
- multiple indexes: primary and secondary
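A simple index over variable-length records can be sketched as a sorted list of (key, byte offset) entries searched with binary search. The `|`-separated, newline-terminated record layout and the function names are assumptions for this example.

```python
import bisect

# Variable-length records in an assumed layout: fields split by "|", one record per line.
data = b"320587|Joe Smith|SC\n184923|Kathy Lee|EN\n249793|Albert Chan|SC\n"

def build_index(data):
    """Return a sorted list of (key, byte_offset) entries."""
    index, offset = [], 0
    for line in data.splitlines(keepends=True):
        key = line.split(b"|", 1)[0].decode("ascii")
        index.append((key, offset))
        offset += len(line)
    index.sort()
    return index

def lookup(index, data, key):
    """Binary-search the index, then read the record at the referenced offset."""
    keys = [k for k, _ in index]   # rebuilt per call for brevity; a real index would keep this
    i = bisect.bisect_left(keys, key)
    if i == len(keys) or keys[i] != key:
        return None
    offset = index[i][1]
    end = data.index(b"\n", offset)
    return data[offset:end].decode("ascii")

index = build_index(data)
record = lookup(index, data, "249793")
```

The data file stays in its original order; only the small index is sorted, which is what makes binary search possible even though the records have different lengths.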
21
Operations With an Indexed File
Create the original index and data files
Load the index file into RAM before using it
Rewrite the index file after using it
- file header
Update
- insertion
- deletion
- update
22
Secondary Index
Provides multiple views of records
Example: consider a collection of music CDs
Primary index (CD # -> physical location)
  ABG379 -> ...
Composer index (composer -> CD #)
  Beethoven -> ABG379
Title index (title -> CD #)
  Symphony -> ABG379
23
Primary vs Secondary Keys
Uniqueness
- a primary key is a unique identification of a record
- a secondary key may be associated with many records
Binding: the association of a key and an address
We may retrieve records using combinations of secondary keys
FIND all records WHERE Composer = "Beethoven" AND Title = "Symphony 9"
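The combined query can be sketched as a set intersection over two secondary indexes, followed by a lookup in the primary index. Only the ABG379 and Beethoven entries come from the slides; the other CD numbers, titles, and byte offsets are hypothetical values for illustration.

```python
# Primary index: CD # -> byte offset (offsets are hypothetical).
primary_index = {"ABG379": 0, "XYZ111": 152, "LON231": 338}

# Secondary indexes: secondary key -> list of primary keys (CD #).
composer_index = {"Beethoven": ["ABG379", "LON231"], "Dvorak": ["XYZ111"]}
title_index = {"Symphony 9": ["ABG379", "XYZ111"], "Violin Concerto": ["LON231"]}

def find_and(composer, title):
    """CD numbers that match both secondary keys (the AND of the two lists)."""
    matches = set(composer_index.get(composer, ())) & set(title_index.get(title, ()))
    return sorted(matches)

def locate(cd_numbers):
    """Resolve primary keys to physical locations through the primary index."""
    return [primary_index[cd] for cd in cd_numbers]

hits = find_and("Beethoven", "Symphony 9")
offsets = locate(hits)
```

Each secondary key maps to several records, so the query intersects the two candidate lists before touching the data file at all.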
24
Binding
Association between a key and a physical address
Tight binding
- bind early: the binding takes place when the file is constructed
- advantage: high performance
- disadvantage: updates
Lazy (late) binding
- bind later: the binding takes place when the keys are actually used
- advantage: easy updates
- safer: consistency
Primary index: tight binding; secondary index: late binding
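The update cost behind this trade-off can be sketched by relocating one record. All CD numbers other than ABG379, all offsets, and the function `move_record` are hypothetical.

```python
# Tight binding: the secondary index stores physical addresses directly.
tight_composer_index = {"Beethoven": [0, 338]}

# Late binding: the secondary index stores primary keys; only the
# primary index knows the physical addresses.
primary_index = {"ABG379": 0, "LON231": 338}
late_composer_index = {"Beethoven": ["ABG379", "LON231"]}

def move_record(cd, new_offset):
    """Relocate a record and bring every index back in step."""
    old = primary_index[cd]
    primary_index[cd] = new_offset   # the only change late binding needs
    # Tight binding: every secondary entry holding the old address must be patched.
    for offsets in tight_composer_index.values():
        for i, off in enumerate(offsets):
            if off == old:
                offsets[i] = new_offset

move_record("LON231", 900)
resolved = [primary_index[cd] for cd in late_composer_index["Beethoven"]]
```

The late-bound index was never touched, yet it resolves to the new address, at the price of one extra lookup per retrieval. The tight-bound index is faster to read but had to be scanned and patched.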