Source: orion.towson.edu/~karne/teaching/c657sl/Ch17Notes.pdf
Chapter 17
Indexing Structures for Files and Physical Database Design
We assume that a file already exists with some primary organization:
unordered, ordered, or hashed. An index provides alternate ways to
access the records without affecting the existing placement of records
on the disk.
Each indexing approach has a particular data structure to speed up
the search. A variety of indexing techniques are studied here.
Types of Single-Level Ordered Indexes
- The concept of an index is similar to the index of terms in a book
- An index access structure is usually defined on a single field of a
file, called the indexing field
- The index stores each value of the field along with pointers to all
disk blocks that contain records with that field value
- The values in the index are ordered so that a binary search can be
done
- Both the index and data files are ordered, but the index file is
smaller
Several types of ordered indexes:
- Primary index: specified on the ordering key field of an ordered file
- Clustering index: used when the ordering field is not a key field;
the data file is then called a clustered file
- A file can have at most one physical ordering field, so it can have
one primary index or one clustering index, but not both
- Secondary index: can be specified on any non-ordering field of a
file; a data file can have several secondary indexes in addition to
its primary access method
Primary Indexes
- A primary index is an ordered file whose entries have 2 fields: a PK
field and a data pointer (Ptr). The PK field has the same type as the
primary key of the data file. Ptr is a pointer to a disk block; the PK
value in an entry is the key of the first record in that block
- Each block in the data file has one entry in the index file
- Index entry i consists of the two fields <K(i), P(i)>; P(i) is the
pointer to the block in the data file
- In general the two fields are <K(i), X>, where:
o X may be the physical address of a block (or page)
o X may be the record address, made up of a block address and a
record id (or offset) within the block
o X may be a logical address of the block or of the record within
the file, that is, a relative number that is mapped to a
physical address
Fig. 17-1
- The first record in each block of the data file is called the block
anchor or anchor record
- Indexes can be dense or sparse
- A dense index has an entry for every search key value; a sparse index
has index entries for only some of the search values
- To retrieve a record given the value K of its PK field, we do a binary
search on the index file to find the appropriate entry i (the last
entry with K(i) <= K), and then retrieve the data file block whose
address is P(i).
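The lookup just described can be sketched in Python. This is only an illustration under stated assumptions: the data file is modelled as an in-memory list of blocks, and the names data_blocks, index_keys, index_ptrs and lookup are hypothetical, not part of the notes.

```python
from bisect import bisect_right

# Hypothetical data file: a list of blocks, each block a list of
# (primary_key, payload) records sorted by primary key.
# Three records per block is an illustrative choice only.
data_blocks = [
    [(2, "a"), (5, "b"), (8, "c")],
    [(11, "d"), (14, "e"), (17, "f")],
    [(20, "g"), (23, "h"), (26, "i")],
]

# Primary index: one entry <K(i), P(i)> per data block, where K(i) is the
# key of the block's anchor record and P(i) is the block number.
index_keys = [block[0][0] for block in data_blocks]
index_ptrs = list(range(len(data_blocks)))


def lookup(key):
    """Return the payload stored under key, or None if it is absent."""
    # Binary search for the last index entry i with K(i) <= key.
    i = bisect_right(index_keys, key) - 1
    if i < 0:
        return None  # key is smaller than every anchor key
    block = data_blocks[index_ptrs[i]]  # the single data block access
    for k, payload in block:
        if k == key:
            return payload
    return None
```

For instance, lookup(14) searches the three anchor keys, selects the entry for the second block, and scans only that block.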
Example 1.
Ordered file with r = 300000 records
Disk block size B = 4096 bytes
File records are fixed size and unspanned
Record length R = 100 bytes
Blocking factor bfr = floor(B/R) = floor(4096/100) = 40 records per
block
The number of blocks needed for the file b = ceil(r/bfr)
= ceil(300000/40) = 7500 blocks
A binary search on the data file needs ceil(log2(7500)) = 13 block
accesses
Let the ordering key field be 9 bytes and the block pointer 6 bytes, for
a total of 15 bytes in each entry of the index file
bfr for the index file = floor(4096/15) = 273 entries per block
Total number of index entries = total number of data blocks = 7500
The number of index blocks = ceil(7500/273) = 28
A binary search on the index file needs ceil(log2(28)) = 5 block
accesses
To search for a record, we need one additional access to read the data
block, so we need 5 + 1 = 6 block accesses using the index
By contrast, we need 13 block accesses without the index file
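The arithmetic of Example 1 can be checked with a short Python sketch. The function name primary_index_cost and its parameter names are made up for illustration; the formulas are the ones used in the example (floor for blocking factors, ceiling for block counts and binary search steps).

```python
from math import ceil, log2


def ceil_div(a, b):
    """Integer ceiling of a / b, avoiding floating point."""
    return -(-a // b)


def primary_index_cost(r, B, R, key_size, ptr_size):
    """Block-access estimates for an ordered file with a primary index."""
    bfr = B // R                          # records per data block (floor)
    b = ceil_div(r, bfr)                  # data blocks
    no_index = ceil(log2(b))              # binary search on the data file
    index_bfr = B // (key_size + ptr_size)  # index entries per block
    index_blocks = ceil_div(b, index_bfr)   # one index entry per data block
    with_index = ceil(log2(index_blocks)) + 1  # +1 to read the data block
    return {
        "bfr": bfr,
        "data_blocks": b,
        "accesses_without_index": no_index,
        "index_bfr": index_bfr,
        "index_blocks": index_blocks,
        "accesses_with_index": with_index,
    }


cost = primary_index_cost(r=300000, B=4096, R=100, key_size=9, ptr_size=6)
print(cost)
```

With the values of Example 1 this reproduces 7500 data blocks, 28 index blocks, and 6 versus 13 block accesses.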
Problems with Primary Index
- Insertion
o Inserting in correct position (make space, change index
entries)
o Move records to make space for new records
o Moving records changes the anchor records of some blocks
o The problem can be reduced by using a linked list of overflow
records
- Deletion
o Handled by using deletion markers
Clustering Indexes
- Defined on a file ordered on a non-key field (its values are not
distinct across records), called the clustering field
- A data file ordered on a non-key field is called a clustered file
- Speeds up retrieval of all records that have the same value for the
clustering field
- Includes one index entry for each distinct value of the field; the
index entry points to the first data block that contains records with
that field value
- Another example of a non-dense (or sparse) index
- Insertion and deletion still cause problems; one remedy is to reserve
one or more blocks for each value of the clustering field
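A minimal Python sketch of the clustering index described above, assuming the clustered file is an in-memory list of blocks and each record is represented only by its clustering field value. The names build_clustering_index and blocks_for are hypothetical, and the dictionary lookup stands in for the binary search a real index file would use.

```python
def build_clustering_index(blocks):
    """One (value, block_no) entry per distinct clustering value,
    pointing to the first block that contains that value."""
    index = []
    for block_no, block in enumerate(blocks):
        for value in block:
            # The file is ordered, so a value is new exactly when it
            # differs from the last value already indexed.
            if not index or index[-1][0] != value:
                index.append((value, block_no))
    return index


def blocks_for(value, index, blocks):
    """Block numbers that hold records with the given clustering value."""
    start = dict(index).get(value)  # stand-in for a binary search
    if start is None:
        return []
    result = []
    # Records with equal values are contiguous, so read forward from the
    # first block until a block no longer contains the value.
    for block_no in range(start, len(blocks)):
        if value not in blocks[block_no]:
            break
        result.append(block_no)
    return result


# Illustrative clustered file: three blocks of three records each.
blocks = [[1, 1, 2], [2, 2, 3], [3, 4, 4]]
index = build_clustering_index(blocks)
```

Here the index has four entries for nine records, which is why it counts as a sparse index.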
Example 2:
r = 300000 records
B = 4096 bytes
It is ordered by zip codes; there are 1000 zip codes in the file
Average 300 records per zip code (assume even distribution)
The index has 1000 entries: 5 bytes for the zip code and 6 bytes for the
block number, 11 bytes total in each entry
bfr = floor(4096/11) = 372 index entries per block
The number of index blocks = ceil(1000/372) = 3
A binary search on the index file would require ceil(log2(3)) = 2 block
accesses
The index is small enough to be loaded in main memory: 1000 * 11 = 11000
bytes.
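The figures of Example 2 can be reproduced with a few lines of Python. The variable names are illustrative only; the numbers are those given in the example.

```python
from math import ceil, log2

B = 4096                 # block size in bytes
distinct_values = 1000   # zip codes, one index entry each
entry_size = 5 + 6       # zip code + block number, in bytes

index_bfr = B // entry_size                      # index entries per block
index_blocks = -(-distinct_values // index_bfr)  # integer ceiling
index_accesses = ceil(log2(index_blocks))        # binary search on index
index_bytes = distinct_values * entry_size       # size if held in memory

print(index_bfr, index_blocks, index_accesses, index_bytes)
```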
Secondary Indexes
- The data file records could be unordered, ordered, or hashed
- A secondary index provides a secondary means of accessing a file for
which some primary access already exists
- The secondary index may be on a field which is a candidate key and
has a unique value in every record, or a non-key with duplicate
values
- The index is an ordered file with two fields:
o The first field is of the same data type as some non-ordering
field of the data file that is an indexing field
o The second field is either a block or record pointer
o There can be many secondary indexes (and hence, indexing
fields) for the same file; each represents an additional means
of accessing that file based on some specific field
- When the indexing field is a key, the index includes one entry for
each record in the data file; hence it is a dense index (records of the
data file are not physically ordered by the secondary key)
- A secondary index needs more storage space and longer search time
than a primary index, because of its larger number of entries
- Search time is still much better than with no index, as there is no
need to do a linear search on the records of the data file (records
are accessed directly)
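A small Python sketch of a dense secondary index on a key field, assuming an unordered data file held as an in-memory list and record pointers represented by list positions. The names records, secondary_index and find_by_name are hypothetical.

```python
from bisect import bisect_left

# Hypothetical unordered data file: (employee_id, name) records in
# arrival order. The name field is assumed unique for this sketch.
records = [
    (17, "Smith"),
    (3, "Adams"),
    (42, "Wong"),
    (8, "Jones"),
]

# Dense secondary index on name: one (value, record position) entry per
# record, kept sorted by value so it can be binary searched.
secondary_index = sorted((name, pos) for pos, (_, name) in enumerate(records))
index_values = [value for value, _ in secondary_index]


def find_by_name(name):
    """Return the record with the given name, or None if it is absent."""
    i = bisect_left(index_values, name)
    if i < len(index_values) and index_values[i] == name:
        _, pos = secondary_index[i]
        return records[pos]  # direct access, no linear scan of the file
    return None
```

The data file itself is never reordered; only the index is sorted, which is why every record needs its own entry.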
Example 3:
r = 300000 records
Record size R = 100 bytes
Block size B = 4096 bytes
Number of records per block bfr = floor(4096/100) = 40
Number of blocks for the data file b = ceil(300000/40) = 7500
Suppose we want to search for a record with a specific value of the
secondary key, a non-ordering key field with a 9-byte value.
Without a secondary index, a linear search on the file would