IELM 230: File Storage and Indexes Agenda: - Physical storage of data in Relational DB’s - Indexes and other means to speed Data access - Defining indexes.
Post on 19-Dec-2015
IELM 230: File Storage and Indexes
Agenda:
- Physical storage of data in Relational DB’s
- Indexes and other means to speed Data access
- Defining indexes in SQL
Physical Data Storage
- All data in a DB is stored on hard disks (HD)
[Figure: hard disk components: platter, R/W head, arm, arm motion, stepper motor, HD motor (7200 rpm)]
- All data in a file is a series of bits (0, 1)
- Each bit is stored (0 magnetised, 1 demagnetised) at points along tracks (concentric circles)
Physical Data Storage..
[Figure: typical IDE HD controller with a 40-pin socket to the motherboard. Data connections include 16 pins to carry data and 3 pins to specify the sector for I/O]
HD storage details
[Figure: schematic of data storage on a 1024-track disk (Track 0 to Track 1023), showing track n, a sector, a cluster of four sectors, and the R/W head]
SECTOR: Smallest unit of data exchange (typical size: 512 bytes) [why?]
CLUSTER: group of four sectors
R/W heads move together
Four R/W heads can read at the same time
CYLINDER: tracks on different platters that can be read simultaneously
Delays in HD-CPU data communication
Block: the amount of data in one sector
[Figure: CPU/RAM sends an R/W request (sector address) to the HD controller and receives 1 sector of data (1 block)]
1. [request time] CPU/RAM sends the Block address to the HD controller
2. [seek time] Use stepper motor to locate R/W head above correct track
3. [read time] Read 1 Block of data from HD and store in HD buffer
4. [transfer time] Send Block of data on data bus to RAM
Typical DB concerns
Fast data access even when
- many users are simultaneously accessing a DB
- data is in a table with millions of rows
Typical operations
- Search for a particular row of data in a table (SELECT…)
- Create a row in a table (INSERT…)
- Modify some data in a row of a table (UPDATE…)
- Delete a row of data from the table (DELETE…)
How to store tables on the HD
1. Each table is stored as an independent file
2. The attributes in a table are often accessed together [Why ?]
Need to store the attribute values in each record contiguously, and
attributes MUST be stored in the same sequence for each record
3. We can choose the sequence in which different records are stored [Why ?]
Storage format for records
Employee( Lname, Fname, ID, DeptNo)
Record 1: Li Richard 99998888 D8
Record 2: Patten Christopher 44444444 D0
Fields within a record are delimited by a field separator; records are delimited by a record separator.
Block n:   Record 1 | Record 2 | Record 3 | Record 4
Block n+1: Record 5 | Record 6 | Record 7 | wasted disk space…
Approximate time for different operations
1. [request time] < 10^-6 sec: CPU/RAM sends Block address to HD controller
2. [seek time] ~3x10^-3 sec: use stepper motor to locate R/W head on track
3. [read time] ~2-4x10^-3 sec (including mean latency): read 1 Block of data from HD and store in HD buffer
4. [transfer time] ~3x10^-3 sec: send 1 Block of data on data bus to RAM
Specifications: Seagate Cheetah 15K.7 600GB hard drive
- Capacity: 600 GB
- Buffer (cache) size: 16 MB
- Bytes per sector: 512
- Disk drive configuration: 4 disks, 8 heads
- Spindle speed: 15000 RPM
- Average read seek time: 3.4 ms
- Average rotational latency: 2.0 ms
- SCSI transfer rate: 600 MB/s
CPU: search for a record in one block of data stored in the RAM: ~10^-6 sec
Time analysis of operation on DB
Total time for an operation (e.g. search for a record in a DB):
For each Block:
1. TRANSFER block of data to RAM (a few 10^-3 secs)
2. Search for data in BLOCK: [transfer data from RAM to CPU] + [examine data] + [report output] (a few 10^-6 secs)
Since TRANSFER time dominates, we will ignore CPU time for all further analysis.
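The comparison above can be checked with a few lines of arithmetic. This is only a rough sketch using the approximate figures quoted in these slides; the constant names are made up for illustration, and the 3 ms read time is an assumed midpoint of the quoted 2-4 ms range.

```python
# Approximate timings quoted in the slides, in seconds.
REQUEST_TIME = 1e-6      # upper bound on the request time
SEEK_TIME = 3e-3
READ_TIME = 3e-3         # assumption: midpoint of the quoted 2-4 ms range
TRANSFER_TIME = 3e-3
CPU_SEARCH_TIME = 1e-6   # searching one block that is already in RAM


def block_io_time() -> float:
    """Time t needed to bring one block from the HD into RAM."""
    return REQUEST_TIME + SEEK_TIME + READ_TIME + TRANSFER_TIME


t = block_io_time()
ratio = t / CPU_SEARCH_TIME
print(f"t = {t * 1e3:.3f} ms per block, about {ratio:.0f} times the CPU search time")
```

With these numbers the block transfer is thousands of times slower than the in-memory search, which is why the later analysis counts only block transfers, each costing t.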
Heap Files
HEAP file:
- All records of the table stored in the order of creation
- Stored in one large file
- Stored on contiguous blocks on HD
Operation: Insert a new record
Method:
1. Get file data (location of 1st Block, size of file)
2. Transfer last Block from HD → RAM [t sec]
3. If (enough space): add record to Block; write the updated Block to HD [t sec]
   Else: increment file size by 1 Block; add record to new Block; write the new Block to HD [t sec]
Worst case time = 2t sec (very fast)
Heap file operations..
Operation: Search for a record
Method: Linear search
(1) Transfer 1st Block → RAM
(2) (CPU) Search for record in this block
(3) If no match is found
(3.1) Copy the next block into RAM
(3.2) Go to (2)
Performance:
Let: Size of file: B blocks
Worst case = (the data is in the last Block, or not in Table)
Worst case time = Bt (very slow)
Average case time: Bt/2 (very slow)
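The heap file insert and linear search can be simulated by counting block transfers. The following is a minimal in-memory sketch, not a real storage engine: the class name `HeapFile`, the fixed records-per-block `capacity`, and the `transfers` counter are all assumptions made for illustration.

```python
class HeapFile:
    """Toy heap file: a list of blocks, each holding up to `capacity` records."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = []
        self.transfers = 0  # number of block transfers; each one costs t

    def insert(self, record):
        if self.blocks:
            self.transfers += 1  # read the last block into RAM
            if len(self.blocks[-1]) < self.capacity:
                self.blocks[-1].append(record)
                self.transfers += 1  # write the updated block back
                return
        self.blocks.append([record])  # grow the file by one block
        self.transfers += 1  # write the new block

    def search(self, field, value):
        """Linear search: transfer blocks one by one until a match is found."""
        for block in self.blocks:
            self.transfers += 1
            for record in block:
                if record[field] == value:
                    return record
        return None


heap = HeapFile(capacity=3)
rows = [("Abbot", 1001), ("Akers", 1002), ("Anders", 1008),
        ("Alex", 1024), ("Wong", 1055), ("Atkins", 1086),
        ("Arnold", 1197), ("Nathan", 1208), ("Jacobs", 1239)]
for row in rows:
    heap.insert(row)

heap.transfers = 0
found = heap.search(1, 1208)      # record sits in the last of B = 3 blocks
transfers_hit = heap.transfers

heap.transfers = 0
missing = heap.search(1, 9999)    # not in the table: every block is read
transfers_miss = heap.transfers
```

Both searches here touch all B = 3 blocks, matching the worst case Bt, while each insert costs at most two transfers.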
Heap file operations..
Operation: Update a record
Method: Linear search
(1) Search for the record to update (Linear search)
(2) If found: Modify record; Write the updated Block to HD
Performance:
Let: Size of file: B blocks
Worst case = Step (1): Bt; Step (2): t
Worst case time = Bt+t (very slow)
Average case time: Bt/2 + t (very slow), since the search takes Bt/2 on average and the write always takes t
Heap file operations..
Operation: Delete a record
Method: Same as for Update
Performance: Same as for Update
Problem:
Extra space (‘Hole’) is left in the Block with the deleted record
Typical solutions:
(a) Periodic consolidation of Blocks
(b) Use of 1-bit ‘RECORD_DELETED’ markers
Sorted Files
Main idea:
Sort the records in the file
Based on one attribute value (ordering attribute/field).
Table sorted by SSN (columns: Lname, SSN, Job, Salary; only Lname and SSN are shown):
Block 1: Abbot 1001 | Akers 1002 | Anders 1008
Block 2: Alex 1024 | Wong 1055 | Atkins 1086
Block 3: Arnold 1197 | Nathan 1208 | Jacobs 1239
Block 4: Anderson 1310 | Adams 1321 | Aaron 1412
Block 5: Allen 1413 | Zimmer 1514 | Ali 1615
…
Block n: Acosta 2085
Sorted file operations
Operation: Search for a record, given value of ordering field
Method: Binary search
Let file size = b Blocks.
1. Look in the block number b/2
   If (searched record is in this block), DONE;
   If (searched value) > (last ordering field value in this block)
      Binary search in blocks between (b/2 + 1), b;
   Else
      Binary search in blocks between 1, (b/2 - 1).
Performance: Worst case: t(1 + log2 b)
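The block-level binary search can be sketched as follows. This is an illustrative in-memory version: each block is a Python list of (SSN, Lname) pairs taken from the example table, and the function name `binary_search_blocks` and the `reads` counter are assumptions, not part of the slides.

```python
def binary_search_blocks(blocks, key):
    """Return (record_or_None, blocks_read) for a file sorted on the key field.

    `blocks` is a list of non-empty blocks; each block is a list of
    (key, payload) pairs, and keys increase across the whole file.
    """
    lo, hi = 0, len(blocks) - 1
    reads = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]          # one block transfer, cost t
        reads += 1
        if key < block[0][0]:        # smaller than the first key in the block
            hi = mid - 1
        elif key > block[-1][0]:     # larger than the last key in the block
            lo = mid + 1
        else:                        # the key can only be in this block
            for k, payload in block:
                if k == key:
                    return (k, payload), reads
            return None, reads
    return None, reads


blocks = [
    [(1001, "Abbot"), (1002, "Akers"), (1008, "Anders")],
    [(1024, "Alex"), (1055, "Wong"), (1086, "Atkins")],
    [(1197, "Arnold"), (1208, "Nathan"), (1239, "Jacobs")],
    [(1310, "Anderson"), (1321, "Adams"), (1412, "Aaron")],
    [(1413, "Allen"), (1514, "Zimmer"), (1615, "Ali")],
    [(2085, "Acosta")],
]
hit_middle = binary_search_blocks(blocks, 1208)
hit_last = binary_search_blocks(blocks, 2085)
miss = binary_search_blocks(blocks, 1100)
```

For this six-block file no search reads more than three blocks, which is within the bound t(1 + log2 b).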
Sorted file operations..
Heap file vs. Sorted File, Search time comparison
ASSUME: file size = 8192 blocks.
Heap file: Worst case time = 8192t
Sorted file: Worst case time = t(1 + log2 8192) = t(1 + 13) = 14t
Searching in the sorted file is 8192/14 ≈ 585 times faster
Sorted file operations…
Operations: Delete a record/update a value in a record
Method: Binary search for record; Modify and Write block
Performance: The worst case time = t(1 + log2 b) [worst case search time] + t [write] (fast)
NOTE
1. Still need to perform occasional ‘file compacting’ after deletions
2. What if we want to modify the ordering attribute value?
Sorted file operations….
Operations: Insert a new record; update the ordering attribute value of a record
Method 1: Insert the record in the correct position by ordering field.
1. Search for the correct block to insert the record
2. If (Block is full)
   2.1. Remove last record in Block
   2.2. Insert new record and rewrite block
   2.3. Insert the record removed in step 2.1 into the next Block…
Performance: Search for the insertion point ≈ t(1 + log2 b), plus read and write of each following block = 2bt
Very inefficient
Sorted file operations….
Operations: Insert a new record; update the ordering attribute value of a record
Method 2: Overflow files
Use two files to store a Table:
Main file: contains most of the records, SORTED
Overflow file: recently inserted records stored in this, HEAP
At periodic intervals, Overflow file records are merged into the Main file.
Performance: Insertion time: 2t (constant time) (very fast) + occasional time to consolidate Overflow and Main files
Sorted file operations…..
Operations: Search for a record in Table in Sorted+Overflow files
Method:
1. Binary search in Main file
2. Linear search in Overflow file
Performance: [exercise]
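The Main file plus Overflow file arrangement can be sketched at the record level as below. This is a simplification under stated assumptions: it keeps bare keys in Python lists instead of disk blocks, and the class `SortedWithOverflow` and its method names are hypothetical.

```python
import bisect


class SortedWithOverflow:
    """Toy table: a sorted main list plus an unsorted overflow list."""

    def __init__(self, sorted_keys):
        self.main = list(sorted_keys)   # stands in for the SORTED Main file
        self.overflow = []              # stands in for the HEAP Overflow file

    def insert(self, key):
        self.overflow.append(key)       # constant-time append, no re-sorting

    def search(self, key):
        i = bisect.bisect_left(self.main, key)   # binary search in Main file
        if i < len(self.main) and self.main[i] == key:
            return "main"
        if key in self.overflow:                 # linear search in Overflow file
            return "overflow"
        return None

    def consolidate(self):
        """Periodic merge of Overflow records into the Main file."""
        self.main = sorted(self.main + self.overflow)
        self.overflow = []


table = SortedWithOverflow([1001, 1024, 1197])
table.insert(1100)
where_before = table.search(1100)
where_old = table.search(1024)
table.consolidate()
where_after = table.search(1100)
not_found = table.search(9999)
```

A newly inserted key is found in the overflow list until the next consolidation, after which the binary search alone finds it.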
Faster search: Hashing
Main idea: divide data into a series of organized “buckets”
Setting up a hash table:
1. Estimate maximum size of Table (e.g. 10,000 Blocks)
2. Specify maximum search time for a record (e.g. 10t)
3. Determine bucket size (here, 10 Blocks)
4. Determine a hashing attribute
5. Determine a hashing function, h( )
h( hash_attribute_value) = Bucket_number
6. Reserve max_size contiguous Blocks on HD
Using a hash file
Insert a record:
Let Bucket size = b blocks;
1. Compute the Bucket address: Addr = h(hash key value)
2. Get Block at address Addr to RAM
   2.1. If enough space, insert and rewrite Block to HD
   2.2. Else set (Addr = Addr + 1); go to Step 2.
NOTE:
1. Selection of h( ) is critical: h(hash_key_values) must be uniformly distributed over Buckets 1, …, n
2. What happens if a Bucket is full ?
Performance: Constant time for Search, Insert, Delete, Update
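The insert procedure above (hash to a bucket, then step to the next block while the current one is full) can be sketched like this. Several details are assumptions for illustration only: the modulo hash function, the wrap-around at the end of the file, the tiny sizes, and the rule that a search may stop at the first non-full block, which holds only if records are never deleted.

```python
class HashFile:
    """Toy hash file: n_buckets buckets of blocks_per_bucket contiguous blocks."""

    def __init__(self, n_buckets, blocks_per_bucket, records_per_block):
        self.n_buckets = n_buckets
        self.blocks_per_bucket = blocks_per_bucket
        self.records_per_block = records_per_block
        # Reserve all blocks up front, as in step 6 of the setup.
        self.blocks = [[] for _ in range(n_buckets * blocks_per_bucket)]

    def h(self, key):
        return key % self.n_buckets          # assumed hash function

    def insert(self, key, payload):
        start = addr = self.h(key) * self.blocks_per_bucket
        while len(self.blocks[addr]) >= self.records_per_block:
            addr = (addr + 1) % len(self.blocks)   # Addr = Addr + 1 (wraps)
            if addr == start:
                raise RuntimeError("hash file is full")
        self.blocks[addr].append((key, payload))
        return addr

    def search(self, key):
        """Return (payload_or_None, blocks_read)."""
        addr = self.h(key) * self.blocks_per_bucket
        reads = 0
        for _ in range(len(self.blocks)):
            reads += 1
            block = self.blocks[addr]
            for k, payload in block:
                if k == key:
                    return payload, reads
            if len(block) < self.records_per_block:
                return None, reads           # probing would have stopped here
            addr = (addr + 1) % len(self.blocks)
        return None, reads


hf = HashFile(n_buckets=4, blocks_per_bucket=2, records_per_block=2)
addr_a = hf.insert(1208, "Nathan")    # 1208 % 4 == 0, first block of bucket 0
addr_b = hf.insert(1024, "Alex")      # same block, which is now full
addr_c = hf.insert(1412, "Aaron")     # spills into the next block
addr_d = hf.insert(1310, "Anderson")  # 1310 % 4 == 2, bucket 2
found_spilled = hf.search(1412)
found_direct = hf.search(1208)
absent = hf.search(1000)
```

As long as buckets rarely fill up, a search touches only one or two blocks regardless of table size, which is the constant-time behaviour claimed above.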
Indexes
- Hash files sacrifice extra disk space [Why?] for operation speed
- Another way to use extra space for faster operations: Index files
A primary index file is an index that is constructed using the sorting attribute of the main file.
- default sorting attribute: primary key
Primary Index
Example:
Primary Index File (primary index key attribute anchor value, Block address):
SSN  | Block No
1001 | 1
1024 | 2
1197 | 3
1310 | 4
1413 | 5
…    | …
2085 | n
Main File (columns: Lname, SSN, Job, Salary; only Lname and SSN are shown):
Block 1: Abbot 1001 | Akers 1002 | Anders 1008
Block 2: Alex 1024 | Wong 1055 | Atkins 1086
Block 3: Arnold 1197 | Nathan 1208 | Jacobs 1239
Block 4: Anderson 1310 | Adams 1321 | Aaron 1412
Block 5: Allen 1413 | Zimmer 1514 | Ali 1615
…
Block n: Acosta 2085
Primary Index..
Operation: Search for a record in the main file
Procedure:
1. Binary search for Block address of record in primary index file
2. Fetch Block of Main file with searched record to RAM
   2.1. Search this block for the data
Performance:
Let size of Primary Index file = P blocks
Worst case time to locate Block address ≈ t(1 + log2 P)
Time to fetch located block from main file = t
Total worst case time ≈ t(1 + log2 P) + t = t(2 + log2 P) (very fast)
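The two-step lookup can be sketched as below. This is a simplified illustration: the index is held as one in-memory list of (anchor SSN, block number) pairs rather than P index blocks, only the five numbered blocks of the example are included, and the names `index`, `main_file` and `primary_index_lookup` are assumptions.

```python
import bisect

# (anchor SSN, block number): one entry per block of the main file.
index = [(1001, 1), (1024, 2), (1197, 3), (1310, 4), (1413, 5)]

# Main file blocks, keyed by block number; records are (SSN, Lname).
main_file = {
    1: [(1001, "Abbot"), (1002, "Akers"), (1008, "Anders")],
    2: [(1024, "Alex"), (1055, "Wong"), (1086, "Atkins")],
    3: [(1197, "Arnold"), (1208, "Nathan"), (1239, "Jacobs")],
    4: [(1310, "Anderson"), (1321, "Adams"), (1412, "Aaron")],
    5: [(1413, "Allen"), (1514, "Zimmer"), (1615, "Ali")],
}


def primary_index_lookup(ssn):
    """Return (block_no, Lname) for the given SSN, or None if absent."""
    anchors = [anchor for anchor, _ in index]
    # Step 1: binary search for the last anchor value <= ssn.
    pos = bisect.bisect_right(anchors, ssn) - 1
    if pos < 0:
        return None                      # smaller than every anchor
    block_no = index[pos][1]
    # Step 2: fetch that single block of the main file and scan it.
    for rec_ssn, lname in main_file[block_no]:
        if rec_ssn == ssn:
            return block_no, lname
    return None


result_1208 = primary_index_lookup(1208)
result_missing = primary_index_lookup(1100)
result_too_small = primary_index_lookup(1000)
```

Only one block of the main file is ever fetched; the rest of the work happens in the much smaller index.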
Primary Index…
Example: search for record of SSN= ‘1208’
[Figure: the Primary Index File (Block 1 to Block P) and the Main File (Block 1 to Block n) from the previous example]
1. Binary search in the P blocks of the index file: SSN = ‘1208’ is in Block 3 of the Main file (anchor values 1197 ≤ 1208 < 1310)
2. Fetch Block 3 of the Main file
3. Find the data for SSN = ‘1208’
Primary Index….
Operation: Insert a record into main file
Problem: - Main file must be sorted by the sorting attribute
- Inserting into the correct position is too expensive
Solution: - Newly inserted records are stored in an Overflow file
NOTE: Overflow file may be a Hash file (fast), or Heap file
Performance analysis: Constant time (add record to last Block in Overflow file)
Secondary Indexes
A secondary index file is an index constructed on any non-sorting attribute of the Main table.
The Secondary Index is a two-column file storing the block address of every secondary index attribute value of the table.
Secondary Indexes..
Example: Secondary Index on Lname
Secondary Index File (secondary index key attribute value, Block address):
Lname  | Block No
Aaron  | 4
Abbot  | 1
Acosta | n
Adams  | 4
Akers  | 1
…      | …
Ali    | 5
Allen  | 5
…      | …
Wong   | 2
Zimmer | 5
…      | …
Main File: the same table as before (columns: Lname, SSN, Job, Salary), sorted by SSN in Blocks 1 to n
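A secondary index lookup can be sketched in the same style. This is an illustrative in-memory version that holds one (Lname, block number) entry per record, sorted by Lname; it includes only the entries from the example whose block numbers are given as numbers, and the function name `blocks_for_lname` is hypothetical.

```python
import bisect

# One entry per record of the main file, sorted by the indexed attribute.
secondary_index = sorted([
    ("Aaron", 4), ("Abbot", 1), ("Adams", 4), ("Akers", 1),
    ("Ali", 5), ("Allen", 5), ("Wong", 2), ("Zimmer", 5),
])


def blocks_for_lname(lname):
    """Return the main-file block numbers that hold records with this Lname."""
    names = [name for name, _ in secondary_index]
    lo = bisect.bisect_left(names, lname)    # first matching entry
    hi = bisect.bisect_right(names, lname)   # one past the last matching entry
    return [block_no for _, block_no in secondary_index[lo:hi]]


wong_blocks = blocks_for_lname("Wong")
unknown_blocks = blocks_for_lname("Smith")
```

Because the main file is not sorted by Lname, the index needs an entry for every record rather than one anchor per block, and a name shared by several records can map to several blocks.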
Secondary Indexes…
Operations and time analysis: Similar to Primary Index
Each table can have only one primary index
You can define more than one secondary index file
Why would we create more than one index for the same table?
Creating, Deleting Indexes in SQL
Example 1: Create an index file for Lname attribute of EMPLOYEE.
CREATE INDEX myLnameIndex ON EMPLOYEE(Lname);
Example 2: You can also create an index on a combination of attributes.
CREATE INDEX myNamesIndex ON EMPLOYEE(Lname, Fname);
Example 3: Delete the index created in Example 2.
DROP INDEX myNamesIndex;
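The three statements can be tried out with Python's built-in `sqlite3` module. This is a small sketch using an in-memory SQLite database; the column types and the helper `index_names` are assumptions, and the exact `DROP INDEX` syntax differs between database systems (MySQL, for example, also requires `ON table_name`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (Lname TEXT, Fname TEXT, ID TEXT PRIMARY KEY, DeptNo TEXT)"
)
conn.execute("INSERT INTO EMPLOYEE VALUES ('Li', 'Richard', '99998888', 'D8')")
conn.execute("INSERT INTO EMPLOYEE VALUES ('Patten', 'Christopher', '44444444', 'D0')")

# Example 1 and Example 2 from the slides.
conn.execute("CREATE INDEX myLnameIndex ON EMPLOYEE(Lname)")
conn.execute("CREATE INDEX myNamesIndex ON EMPLOYEE(Lname, Fname)")


def index_names():
    """List the user-defined indexes recorded in SQLite's catalog table."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND name LIKE 'my%'"
    )
    return sorted(row[0] for row in rows)


before_drop = index_names()

# Example 3 from the slides.
conn.execute("DROP INDEX myNamesIndex")
after_drop = index_names()

# Queries are written the same way with or without an index.
li_rows = conn.execute("SELECT Fname FROM EMPLOYEE WHERE Lname = 'Li'").fetchall()
```

The SELECT statement does not mention the index at all; the database decides on its own whether to use it.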