Advanced Topics in DBMS
Ch-2: Tree Structured Indexing
By
Syed Khutubuddin Ahmed
Assistant Professor
Dept. of MCA
REVA Institute of Technology & Management
Syed Khutubuddin, Assistant Prof,
REVA ITM
REMEMBER
Two types of Index Data Structures:
1) Hash based Indexing
2) Tree Based Indexing
Index Data Structure
• The data entries are arranged in sorted order by search key value.
• A hierarchical search data structure is maintained on top of them that directs searches to the correct page of data entries.
What is Tree-Based Indexing:
Tree Structured index
• Index storage techniques use three alternatives for the data entries k*:
– Data record with key value k
– <k, rid of data record with search key value k>
– <k, list of rids of data records with search key k>
• REMEMBER
• Tree-structured indexing techniques support both range searches and equality searches.
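To make the three alternatives concrete, here is a minimal Python sketch that builds each kind of data entry for a small set of student records indexed on gpa. The `Record` type, its `rid` field (an integer standing in for a real page/slot record id) and the function names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    rid: int    # hypothetical record id (stands in for a page/slot address)
    name: str
    gpa: float  # the search key k in this sketch

def alt1(records):
    # Alternative 1: the data entry k* is the data record itself.
    return sorted(records, key=lambda r: r.gpa)

def alt2(records):
    # Alternative 2: one <k, rid> pair per data record.
    return sorted((r.gpa, r.rid) for r in records)

def alt3(records):
    # Alternative 3: one <k, list of rids> entry per distinct key value.
    groups = defaultdict(list)
    for r in records:
        groups[r.gpa].append(r.rid)
    return sorted((k, sorted(rids)) for k, rids in groups.items())

students = [Record(1, "Ann", 3.5), Record(2, "Bob", 2.9), Record(3, "Cid", 3.5)]
print(alt2(students))  # [(2.9, 2), (3.5, 1), (3.5, 3)]
print(alt3(students))  # [(2.9, [2]), (3.5, [1, 3])]
```

Alternative 3 is more compact than Alternative 2 when many records share a key value, at the cost of variable-length data entries.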
Tree Structured indexing
• Two techniques are available in tree-structured indexing:
1. ISAM (Indexed Sequential Access Method)
2. B+ trees
Both support efficient range searches.
Tree Structured indexing
ISAM
– It is a static index structure that is effective when the file is not frequently updated.
– This method is not suitable for a file that grows and shrinks a lot.
B+ Trees
– A dynamic structure that adjusts gracefully to changes in the file.
– The most widely used index structure, because it adjusts well to changes.
– Supports both equality search and range search.
Example: Range Searches
• "Find all students with gpa > 3.0"
– Solution: if the data is in a sorted file, do a binary search to find the first such student, then scan to find the others.
– Remember: the cost of binary search can still be quite high for a large file, because the search works directly on the original data file.
• Simple idea: create an 'index' file.
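The sorted-file solution above can be sketched in Python: a binary search finds the position of the first student with gpa > 3.0, and a sequential scan from there collects the rest. The in-memory list of `(gpa, name)` tuples is a hypothetical stand-in for a sorted data file; on disk, each probe of the binary search would be a page read.

```python
# Hypothetical sorted "file" of (gpa, name) records, ordered by gpa.
students = [(2.4, "Dev"), (2.9, "Bob"), (3.0, "Eve"), (3.2, "Ann"), (3.7, "Cid")]

def first_above(sorted_file, threshold):
    """Binary search: index of the first record with gpa > threshold."""
    lo, hi = 0, len(sorted_file)
    while lo < hi:
        mid = (lo + hi) // 2
        if sorted_file[mid][0] > threshold:
            hi = mid          # a match; an earlier one may exist
        else:
            lo = mid + 1      # gpa <= threshold, look to the right
    return lo

def find_gpa_above(sorted_file, threshold):
    start = first_above(sorted_file, threshold)
    # Sequential scan: in a sorted file every later record also qualifies.
    return [name for _, name in sorted_file[start:]]

print(find_gpa_above(students, 3.0))  # ['Ann', 'Cid']
```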
Motivation for tree indexes
• The index file may still be quite large, but we can apply the idea repeatedly!
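• Leaf pages contain data entries.
• Index entry layout in a non-leaf page: P0, K1, P1, K2, P2, ..., Km, Pm (each Ki is a search key value, each Pi a page pointer).
[Figure: ISAM tree with non-leaf pages above the leaf level; the leaf level consists of primary pages, with an overflow page chained to a primary page.]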
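A non-leaf page laid out as P0 K1 P1 K2 P2 ... Km Pm holds m keys and m + 1 pointers. The Python sketch below picks the pointer to follow for a search key k, assuming the usual convention that P0 leads to keys below K1, Pi to keys in [Ki, Ki+1), and Pm to keys at or above Km. The function name and the page ids are hypothetical.

```python
import bisect

def choose_child(keys, pointers, k):
    """Pick the pointer to follow in an index page P0 K1 P1 ... Km Pm.

    keys:     [K1, ..., Km] in ascending order
    pointers: [P0, ..., Pm], so there is one more pointer than keys
    Assumed convention: Pi leads to search keys in [Ki, Ki+1).
    """
    assert len(pointers) == len(keys) + 1
    i = bisect.bisect_right(keys, k)  # number of keys <= k
    return pointers[i]

page_keys = [20, 33]
page_ptrs = ["leaf A", "leaf B", "leaf C"]  # hypothetical page ids
print(choose_child(page_keys, page_ptrs, 15))  # leaf A
print(choose_child(page_keys, page_ptrs, 27))  # leaf B
print(choose_child(page_keys, page_ptrs, 33))  # leaf C
```

Using `bisect_right` sends a key equal to Ki down pointer Pi, which matches the [Ki, Ki+1) convention assumed here.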
ISAM
Comments on ISAM file creation:
– Leaf (data) pages are allocated sequentially, sorted by search key;
– then the index pages are allocated;
– then space is allocated for overflow pages.
• Index entries: <search key value, page id>; they direct the search for data entries, which are in the leaf pages.
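The creation order above can be sketched in Python: sort the data entries, cut them into leaf pages, then build index levels bottom-up from <smallest key, page id> pairs until a single root page remains. This is a simplified in-memory model; `leaf_cap`, `fanout` and the page representation are hypothetical, and unlike the slide, the sketch does not reserve overflow space up front (overflow pages appear only on insert).

```python
def build_isam(entries, leaf_cap, fanout):
    """Sketch: build a static ISAM-style tree from a list of search keys.

    entries:  search key values of the data entries
    leaf_cap: data entries per leaf page
    fanout:   child pointers per index page (so at most fanout - 1 keys)
    Returns (levels, leaves); levels[0] holds the root page. Page ids in an
    index page are positions within the level directly below it.
    """
    assert leaf_cap >= 1 and fanout >= 2
    data = sorted(entries)
    if not data:
        return [], []
    # Step 1: leaf (data) pages allocated sequentially, sorted by search key.
    leaves = [data[i:i + leaf_cap] for i in range(0, len(data), leaf_cap)]
    # Summarize each page of the level below as <smallest key, page id>.
    below = [(page[0], pid) for pid, page in enumerate(leaves)]
    levels = []
    # Step 2: index pages allocated bottom-up until one root page remains.
    while True:
        groups = [below[i:i + fanout] for i in range(0, len(below), fanout)]
        # An index page is P0 K1 P1 ... Km Pm: the first child needs no key.
        pages = [([k for k, _ in g[1:]], [pid for _, pid in g]) for g in groups]
        levels.insert(0, pages)
        if len(pages) == 1:
            break
        below = [(g[0][0], pid) for pid, g in enumerate(groups)]
    # Step 3 (overflow space) is left out: overflow pages are added on insert.
    return levels, leaves

levels, leaves = build_isam([97, 10, 63, 15, 55, 20, 51, 27, 46, 33, 40, 37], 2, 3)
print(leaves)     # [[10, 15], [20, 27], [33, 37], [40, 46], [51, 55], [63, 97]]
print(levels[0])  # [([40], [0, 1])]
```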
[Figure: ISAM structure: non-leaf pages above the leaf level; the leaf level consists of primary pages with chained overflow pages.]
ISAM
• The data entries of the ISAM index are in the leaf pages of the tree,
• plus additional overflow pages chained to some leaf pages.
The ISAM structure is static, except for the overflow pages, which are expected to be few.
ISAM (Indexed Sequential Access Method)
Each tree node is a disk page.
When the file is created, all leaf pages are allocated sequentially and sorted on the search key value.
The non-leaf level pages are then allocated.
If there are several inserts to the file and a leaf page has no space left, additional pages are needed because the index structure is static; these pages are called overflow pages.
• The basic operations of insertion, deletion, and search are all quite straightforward.
• For an equality selection search, start at the root node and decide which subtree to search by comparing the search key value with the key values in the node.
• For a range query, the starting point in the data (or leaf) level is determined similarly, and data pages are then retrieved sequentially.
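Both searches can be sketched in Python on a small, hypothetical two-level ISAM tree whose leaf pages are stored sequentially (so "the next leaf" is the next list position) and where leaf 1 has one overflow page. An equality search descends from the root one page per level; a range search locates its first leaf the same way and then reads leaf pages sequentially. To keep the sketch short, a second descent finds the last leaf to read; a real scan would instead stop at the first key above the upper bound.

```python
import bisect

# Hypothetical ISAM tree: index pages as dicts, leaves as page numbers.
leaves = [[10, 15], [20, 27], [33, 37], [40, 46], [51, 55], [63, 97]]
overflow = {1: [23]}  # leaf 1 has one overflow page holding 23*
root = {"keys": [40], "children": [
    {"keys": [20, 33], "children": [0, 1, 2]},
    {"keys": [51, 63], "children": [3, 4, 5]},
]}

def find_leaf(node, k):
    # Start at the root and follow one pointer per level (P0 K1 P1 ... Km Pm).
    while isinstance(node, dict):
        node = node["children"][bisect.bisect_right(node["keys"], k)]
    return node  # leaf page number

def entries_of(leaf_no):
    # Primary page followed by its overflow chain.
    return leaves[leaf_no] + overflow.get(leaf_no, [])

def equality_search(k):
    return k in entries_of(find_leaf(root, k))

def range_search(low, high):
    first, last = find_leaf(root, low), find_leaf(root, high)
    result = []
    # Leaf pages are retrieved sequentially from the starting leaf.
    for leaf_no in range(first, last + 1):
        result.extend(x for x in entries_of(leaf_no) if low <= x <= high)
    return sorted(result)

print(equality_search(23))    # True
print(range_search(22, 45))   # [23, 27, 33, 37, 40]
```

Note that 23* is found even though it lives on an overflow page, because the search scans the whole chain of the leaf it lands on.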
• For inserts and deletes, search for the appropriate leaf page and then insert or delete the entry, adding overflow pages if necessary.
• In the examples that follow, assume that each leaf page can contain two entries.
• Inserting 23* is done by adding an overflow page and putting 23* in it, because the leaf page 23* belongs to is already full.
• Chains of overflow pages can easily develop.
• For instance, inserting 48*, 41*, and 42* leads to an overflow chain of two pages.
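The insert behaviour described above can be sketched in Python, modelling one leaf as a chain of pages: the primary page first, then its overflow pages. Every page is assumed to hold two entries, as in the slide. The leaf contents (20*, 27* and 40*, 46*) are hypothetical full leaves chosen so that the inserts of 23* and of 48*, 41*, 42* land on them.

```python
PAGE_CAPACITY = 2  # entries per page (primary or overflow), as assumed above

def insert(chain, k):
    """Insert data entry k into one leaf: chain[0] is the primary page,
    chain[1:] are its overflow pages in chain order. Index pages are never
    touched, since the ISAM index structure is static."""
    for page in chain:
        if len(page) < PAGE_CAPACITY:
            page.append(k)
            if page is chain[0]:
                page.sort()  # keep the primary page sorted
            return
    chain.append([k])  # every page is full: chain a new overflow page

leaf_20 = [[20, 27]]  # hypothetical full leaf receiving 23*
insert(leaf_20, 23)
print(leaf_20)  # [[20, 27], [23]]

leaf_40 = [[40, 46]]  # hypothetical full leaf receiving 48*, 41*, 42*
for k in (48, 41, 42):
    insert(leaf_40, k)
print(leaf_40)  # [[40, 46], [48, 41], [42]]
```

The second leaf ends up with an overflow chain of two pages, matching the slide; searches that reach this leaf now have to read three pages instead of one.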
• The deletion of an entry k* is handled by simply removing the entry.
• If this entry is on an overflow page and the overflow page becomes empty, the page can be removed.
• If the entry is on a primary page and deletion makes the primary page empty, the simplest approach is to simply leave the empty primary page as it is; it serves as a placeholder for future insertions.
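The three deletion rules above can be sketched with the same page-chain model of a leaf (primary page first, then overflow pages). The starting leaf contents are hypothetical. An overflow page that becomes empty is unlinked from the chain, while an empty primary page stays in place as a placeholder.

```python
def delete(chain, k):
    """Delete data entry k from one leaf (chain[0] primary, chain[1:] overflow).
    Returns True if k was found and removed."""
    for i, page in enumerate(chain):
        if k in page:
            page.remove(k)  # simply remove the entry
            if not page and i > 0:
                del chain[i]  # an empty overflow page is removed
            # an empty primary page (i == 0) is kept as a placeholder
            return True
    return False

leaf = [[40, 46], [48, 41], [42]]  # hypothetical leaf with two overflow pages
delete(leaf, 42)  # last overflow page becomes empty and is removed
delete(leaf, 40)
delete(leaf, 46)  # primary page becomes empty but stays
print(leaf)  # [[], [48, 41]]
```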
ISAM
• Once the ISAM file is created, inserts and deletes affect only the contents of leaf pages.
• They do not affect the non-leaf pages, which are fixed.
• In B+ trees, by contrast, the non-leaf pages are not fixed. Fixed non-leaf pages are an advantage of ISAM over B+ trees: since the index pages are never modified, they need not be locked during concurrent access. At the same time, ISAM has the disadvantage of being static.
Overflow pages, Locking Considerations
• A static structure such as the ISAM index suffers from the problem that long overflow chains can develop as the file grows, leading to poor performance.
• This problem motivated the development of more flexible, dynamic structures that adjust gracefully to inserts and deletes.