B-Trees
Why B-Trees?
• The trees studied so far are designed for storing data in memory.
• B-Trees are better suited for storing data in memory AND on secondary storage.
• They are also better suited to keeping data balanced than some other tree ADTs.
The Problem With Unbalanced Trees
[Figure: an unbalanced binary tree holding the keys 1, 2, 3, 4, 5]
The levels are sparsely filled, resulting in deep paths. This defeats the purpose of binary trees.
Possible Solutions To Unbalanced Trees
• Periodically balance the tree.
• Don’t let a tree get too unbalanced when inserting or deleting.
  • AVL Trees: sometimes called HB[1] trees. Invented by Adel’son-Vel’skii and Landis in the early 1960s.
  • B-Trees: proposed by R. Bayer and E. M. McCreight.
What Is A B-Tree?
• It is a type of “multiway” tree.
• It is NOT a binary search tree, nor is it a binary tree.
• It provides a fast way to index into a multi-level set of nodes.
• Each node in the B-Tree contains a sorted array of key values.
Motivation For Multiway Tree
• Secondary storage (e.g., disks) is typically divided into equal-sized blocks (e.g., 512, 1024, …, 4096, … bytes).
• The basic I/O operation transfers whole blocks, rather than single bytes, between secondary storage and memory.
• The goal is to devise a multiway search tree that minimizes file accesses by getting as much as possible out of each block read.
• Each access to secondary storage takes about as long as executing roughly 250K instructions, depending on the speed of the CPU.
Multiway Search Tree (order m)
• A generalization of binary search trees.
• Each node has at most m children.
• If k <= m is the number of children, then the node has exactly k-1 keys.
• The tree is ordered.
Multiway Search Tree (cont.)
[Figure: nodes in a multiway tree. A node holds the keys k1 k2 k3 k4 k5; the subtrees shown hold keys < k1, k2 < keys < k3, and k5 < keys.]
Definition Of A B-Tree
• A B-Tree of order m is an m-way tree such that:
  • All leaves are on the same level.
  • All internal nodes except the root node are constrained to have at most m non-empty children and at least m/2 non-empty children.
  • The root node has at most m non-empty children.
  • A leaf node must contain at least ((m/2) - 1) keys.
Three Important Properties Of B-Trees
• All nodes in the B-Tree are at least half-full (the root node is an exception at times).
• The B-Tree is always balanced. That is, an identical number of nodes must be read into memory in order to locate all keys at any given level in the tree.
• A well-organized B-Tree will have just a small number of levels relative to the number of nodes.
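To make "a small number of levels" concrete, here is a rough sketch that counts how many levels are needed in the best case, when every node of an order-m tree is completely full (m children, m-1 keys). The function name `min_levels` and the order 256 used in the example are illustrative assumptions, not values from the slides.

```python
def min_levels(num_keys: int, order: int) -> int:
    """Fewest levels that can hold num_keys keys if every node is full.

    A full node of the given order has (order - 1) keys and `order` children.
    """
    levels = 0
    capacity = 0          # keys that fit in the levels counted so far
    nodes_on_level = 1    # level 0 holds only the root
    while capacity < num_keys:
        capacity += nodes_on_level * (order - 1)
        nodes_on_level *= order
        levels += 1
    return levels


# One million keys: a perfectly balanced binary tree vs. a 256-way tree.
print(min_levels(1_000_000, 2))    # 20
print(min_levels(1_000_000, 256))  # 3
```

Since each level visited during a search costs one block read, the wider tree needs far fewer accesses to secondary storage for the same number of keys.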
Where Are B-Trees Used?
• B-Trees are commonly found in databases and file systems.
• B-Trees allow logarithmic-time insertions and deletions.
• They generally grow from the bottom upwards as elements are inserted, whereas most binary trees grow downward.
The Six Rules Governing B-Trees
• R1: A B-Tree may be empty. If it is not, each node must hold at least some specified MINIMUM number of entries (the root may hold fewer).
• R2: The MAXIMUM number of entries in a node is twice the MINIMUM.
The Six Rules Governing B-Trees (cont)
• R3: The entries of each B-Tree node are stored in a partially filled array, sorted from the smallest entry (at index 0) to the largest entry (at the final position of the array).
[Figure: a B-Tree node drawn as an array of keys at indices 0 to n-1, with one key marked *]
The data in such an array can be stored in a block on a disk.
* B-Trees can support duplicate keys.
The Six Rules Governing B-Trees (cont)
• R4: The number of subtrees below a non-leaf node is always one more than the number of entries in the node.
[Figure: a non-leaf node with 4 entries (45, 55, 67, 82 at indices 0 to 3) and 5 subtrees]
subtree 0: keys < 45
subtree 1: keys > 45 and < 55
subtree 2: keys > 55 and < 67
subtree 3: keys > 67 and < 82
subtree 4: keys > 82
The Six Rules Governing B-Trees (cont)
• R5: For any non-leaf node:
  • An entry at index i is greater than all the entries in subtree i of the node, and
  • An entry at index i is less than all the entries in subtree i+1 of the node.
• R6: Every leaf node in a B-Tree has the same depth (i.e., all leaves are at the same level).
Example B-Tree
MIN = 1, MAX = 2
Level 0 (root): [30 80]
Level 1:        [20]  [50 60]  [90]
Level 2:        [10] [25]  [35 40] [55] [72]  [82 85] [95]
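The six rules can be checked mechanically. The sketch below is one possible checker, not code from the slides: the `Node` class and the `check_rules` function are hypothetical names, and the comparisons are non-strict because the slides note that duplicate keys may be supported. The tree built at the bottom is the example tree (MIN = 1, MAX = 2, root 30 80).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    keys: List[int]
    children: List["Node"] = field(default_factory=list)


def check_rules(root: Optional[Node], minimum: int) -> bool:
    """Return True if the tree satisfies rules R1 to R6 for the given MINIMUM."""
    if root is None:
        return True  # R1: a B-Tree may be empty
    leaf_depths = set()

    def visit(node, depth, low, high, is_root):
        n = len(node.keys)
        if n > 2 * minimum:                        # R2: MAXIMUM = 2 * MINIMUM
            return False
        if n < (1 if is_root else minimum):        # R1: root may hold fewer
            return False
        if any(a > b for a, b in zip(node.keys, node.keys[1:])):
            return False                           # R3: sorted array
        for k in node.keys:                        # R5: within parent's bounds
            if (low is not None and k < low) or (high is not None and k > high):
                return False
        if not node.children:
            leaf_depths.add(depth)
            return True
        if len(node.children) != n + 1:            # R4: one more subtree than entries
            return False
        bounds = [low] + node.keys + [high]
        return all(visit(child, depth + 1, bounds[i], bounds[i + 1], False)
                   for i, child in enumerate(node.children))

    # R6: every leaf must have been found at the same depth
    return visit(root, 0, None, None, True) and len(leaf_depths) == 1


example = Node([30, 80], [
    Node([20], [Node([10]), Node([25])]),
    Node([50, 60], [Node([35, 40]), Node([55]), Node([72])]),
    Node([90], [Node([82, 85]), Node([95])]),
])
print(check_rules(example, minimum=1))  # True
```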
Searching For A Target In B-Trees
• Start at the root node and search for the target in the array at that node. If it is found, then done; return success.
• If the target is not in the root and there are no children, then also done, but return failure.
• If the target is not in the root node and there are children, then the target, if it exists, can be in only one subtree.
• Search key_array from left to right, up to data_count entries, and descend into the first subtree i for which target < key_array[i]. If the target is larger than every key, descend into the last subtree.
• Repeat the process at the root of that subtree.
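The search steps above can be sketched as follows. This is a minimal illustration, assuming a node that stores its keys in `key_array` and its children in `subset` (names borrowed from the slides; the `BTreeNode` class itself is a hypothetical stand-in). In Python, `data_count` is simply the length of `key_array`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BTreeNode:
    key_array: List[int] = field(default_factory=list)
    subset: List["BTreeNode"] = field(default_factory=list)  # empty for a leaf


def search(node: BTreeNode, target: int) -> bool:
    """Return True if target is stored in the tree rooted at node."""
    while True:
        data_count = len(node.key_array)
        # Scan left to right for the first key that is not smaller than target.
        i = 0
        while i < data_count and target > node.key_array[i]:
            i += 1
        if i < data_count and node.key_array[i] == target:
            return True          # found in this node
        if not node.subset:
            return False         # leaf reached without finding the target
        node = node.subset[i]    # the only subtree that can hold the target


root = BTreeNode([30, 80], [
    BTreeNode([20], [BTreeNode([10]), BTreeNode([25])]),
    BTreeNode([50, 60], [BTreeNode([35, 40]), BTreeNode([55]), BTreeNode([72])]),
    BTreeNode([90], [BTreeNode([82, 85]), BTreeNode([95])]),
])
print(search(root, 55))  # True
print(search(root, 56))  # False
```

Each loop iteration corresponds to reading one node, so the number of block reads is at most the number of levels in the tree.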
Inserting Into A B-Tree
[Flowchart]
Add the new key to the appropriate leaf node.
Overflow?
  Yes: Split the node into two nodes on the same level, and promote the median key.
  No: The insertion is complete.
Example
MIN = 1, MAX = 2
Before:
  root:   [6, 17]
  leaves: [4] [12] [19, 22]
Insert 18:
  root:   [6 | 17]
  leaves: [4] [12] [18 | 19 | 22]   <- excess entry (problem child)
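Contd.
MIN = 1, MAX = 2
Split the problem child, and promote the middle key to the parent node. The parent still has an excess entry.
  root:   [6, 17, 19]
  leaves: [4] [12] [18] [22]
Fix the excess by repeating the process. Split the node and promote the middle key to a new root node.
  root:    [17]
  level 1: [6] [19]
  leaves:  [4] [12] [18] [22]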
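The insert-then-split procedure can be sketched in a few lines. This is one possible arrangement rather than the slides' own code: the helper names `loose_insert`, `fix_excess`, and `insert` are assumptions, as is the `BTreeNode` class. A node is allowed to hold MAX + 1 keys temporarily, and its parent then repairs it by splitting it and taking the median key.

```python
from dataclasses import dataclass, field
from typing import List

MAXIMUM = 2  # as in the example: MIN = 1, MAX = 2


@dataclass
class BTreeNode:
    key_array: List[int] = field(default_factory=list)
    subset: List["BTreeNode"] = field(default_factory=list)  # empty for a leaf


def fix_excess(parent: BTreeNode, i: int) -> None:
    """Split the overfull child parent.subset[i] and promote its median key."""
    child = parent.subset[i]
    mid = len(child.key_array) // 2
    median = child.key_array[mid]
    right = BTreeNode(child.key_array[mid + 1:], child.subset[mid + 1:])
    child.key_array = child.key_array[:mid]
    child.subset = child.subset[:mid + 1]
    parent.key_array.insert(i, median)
    parent.subset.insert(i + 1, right)


def loose_insert(node: BTreeNode, key: int) -> None:
    """Insert key below node; node itself may end up with MAXIMUM + 1 keys."""
    i = 0
    while i < len(node.key_array) and key > node.key_array[i]:
        i += 1
    if not node.subset:                  # leaf: add the key here
        node.key_array.insert(i, key)
        return
    loose_insert(node.subset[i], key)
    if len(node.subset[i].key_array) > MAXIMUM:
        fix_excess(node, i)              # child overflowed: split it


def insert(root: BTreeNode, key: int) -> BTreeNode:
    """Insert key and return the (possibly new) root."""
    loose_insert(root, key)
    if len(root.key_array) > MAXIMUM:    # root overflowed: the tree grows upward
        new_root = BTreeNode([], [root])
        fix_excess(new_root, 0)
        return new_root
    return root


root = BTreeNode([6, 17], [BTreeNode([4]), BTreeNode([12]), BTreeNode([19, 22])])
root = insert(root, 18)
print(root.key_array)                                              # [17]
print([c.key_array for c in root.subset])                          # [[6], [19]]
print([leaf.key_array for c in root.subset for leaf in c.subset])  # [[4], [12], [18], [22]]
```

The only place the tree gains a level is in `insert`, when the root itself is split, which is why B-Trees grow from the bottom upward.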
Insert In Class Exercise
MIN = 1, MAX = 2
  root:   [6, 17]
  leaves: [4] [12] [19, 22]
• Insert 5, then insert 7 and 15.
Deleting From A B-Tree
Deletion (cont.)
• Case 1: The key is in a leaf that has more than the minimum number of keys. If subset[i] has extra entries, then just delete the key.
• Delete 21
MIN = 2, MAX = 4
Before: root [6, 17]; leaves [2, 4] [10, 13] [19, 21, 22]
After:  root [6, 17]; leaves [2, 4] [10, 13] [19, 22]
Deletion
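• Case 2: The key is in a leaf that has just the minimum number of keys. If subset[i-1] has extra entries, then transfer an entry to subset[i]: the separating key in the parent moves down into subset[i], and the largest key of subset[i-1] moves up to replace it.
• Delete 22
MIN = 2, MAX = 4
Before: root [6, 17]; leaves [2, 4] [10, 12, 15] [19, 22]
After:  root [6, 15]; leaves [2, 4] [10, 12] [17, 19]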
Deletion (cont.)
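• Case 3: If subset[i+1] has extra entries, then transfer an entry to subset[i] (similar to Case 2): the separating key in the parent moves down into subset[i], and the smallest key of subset[i+1] moves up to replace it.
• Delete 13
MIN = 2, MAX = 4
Before: root [6, 17]; leaves [2, 4] [10, 13] [19, 21, 22]
After:  root [6, 19]; leaves [2, 4] [10, 17] [21, 22]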
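Cases 1 to 3 can be sketched together for the situation shown on these slides, where the affected nodes are leaves under a common parent. The function names below (`transfer_from_left`, `transfer_from_right`, `delete_from_leaf`) and the `BTreeNode` class are hypothetical; only `subset` and the MIN value come from the slides.

```python
from dataclasses import dataclass, field
from typing import List

MINIMUM = 2  # as in the deletion examples: MIN = 2, MAX = 4


@dataclass
class BTreeNode:
    key_array: List[int] = field(default_factory=list)
    subset: List["BTreeNode"] = field(default_factory=list)  # empty for a leaf


def transfer_from_left(parent: BTreeNode, i: int) -> None:
    """Case 2: move one entry from leaf subset[i-1] to leaf subset[i] via the parent."""
    left, short = parent.subset[i - 1], parent.subset[i]
    short.key_array.insert(0, parent.key_array[i - 1])  # separator moves down
    parent.key_array[i - 1] = left.key_array.pop()      # largest left key moves up


def transfer_from_right(parent: BTreeNode, i: int) -> None:
    """Case 3: move one entry from leaf subset[i+1] to leaf subset[i] via the parent."""
    right, short = parent.subset[i + 1], parent.subset[i]
    short.key_array.append(parent.key_array[i])         # separator moves down
    parent.key_array[i] = right.key_array.pop(0)        # smallest right key moves up


def delete_from_leaf(parent: BTreeNode, i: int, key: int) -> None:
    """Delete key from the leaf parent.subset[i], then repair a shortage if needed."""
    leaf = parent.subset[i]
    leaf.key_array.remove(key)
    if len(leaf.key_array) >= MINIMUM:
        return                                          # Case 1: nothing to repair
    if i > 0 and len(parent.subset[i - 1].key_array) > MINIMUM:
        transfer_from_left(parent, i)                   # Case 2
    elif i + 1 < len(parent.subset) and len(parent.subset[i + 1].key_array) > MINIMUM:
        transfer_from_right(parent, i)                  # Case 3
    else:
        raise NotImplementedError("no sibling has a spare entry; nodes must be combined")


# The Case 2 example: delete 22.
parent = BTreeNode([6, 17], [BTreeNode([2, 4]), BTreeNode([10, 12, 15]), BTreeNode([19, 22])])
delete_from_leaf(parent, 2, 22)
print(parent.key_array, [leaf.key_array for leaf in parent.subset])
# [6, 15] [[2, 4], [10, 12], [17, 19]]
```

Routing the transfer through the parent is what keeps rule R5 intact: moving 15 directly into the right leaf would leave the separator 17 smaller than a key to its left.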
Deletion (cont.)
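• Case 4: The key is in a leaf, and the leaf and its siblings have just the minimum number of keys. Combine subset[i] with subset[i-1], pulling the separating key down from the parent into the combined node.
• Delete 22
Before: root [6, 17]; leaves [2, 4] [10, 12] [19, 22]
After:  root [6]; leaves [2, 4] [10, 12, 17, 19]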
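The combining step of Case 4 might look like the sketch below. The function name `combine_with_left` and the `BTreeNode` class are assumptions made for illustration; the tree is the one from the Case 4 slide, after 22 has already been removed from its leaf.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BTreeNode:
    key_array: List[int] = field(default_factory=list)
    subset: List["BTreeNode"] = field(default_factory=list)  # empty for a leaf


def combine_with_left(parent: BTreeNode, i: int) -> None:
    """Case 4: merge the short node subset[i] into its left sibling subset[i-1]."""
    left, short = parent.subset[i - 1], parent.subset[i]
    left.key_array.append(parent.key_array.pop(i - 1))  # separator moves down
    left.key_array.extend(short.key_array)              # then the short node's keys
    left.subset.extend(short.subset)                    # and its children, if any
    del parent.subset[i]                                # parent loses one subtree


# Leaf [19, 22] with 22 already deleted, leaving it one key short.
parent = BTreeNode([6, 17], [BTreeNode([2, 4]), BTreeNode([10, 12]), BTreeNode([19])])
combine_with_left(parent, 2)
print(parent.key_array, [leaf.key_array for leaf in parent.subset])
# [6] [[2, 4], [10, 12, 17, 19]]
```

Because the parent gives up one key, it may itself fall below the minimum, in which case the same repair has to be applied one level higher.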
Deletion (cont.)
• Case 5: The key is in an internal node. Locate the child node that holds the successor of the key. If that node has more than the minimum number of entries, replace the key to be deleted with the successor and delete the successor from the leaf.
• Delete 95
Before:
  root:    [62]
  level 1: [25, 45]  [85, 95]
  leaves:  [15, 20] [30, 40] [50, 54]  [75, 80] [90, 92] [97, 100, 150]
Contd. Case 5
After deleting 95:
  root:    [62]
  level 1: [25, 45]  [85, 97]
  leaves:  [15, 20] [30, 40] [50, 54]  [75, 80] [90, 92] [100, 150]
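A sketch of Case 5 follows. It finds the successor by stepping into the subtree to the right of the key and then following the leftmost children down to a leaf. The name `delete_internal`, the `BTreeNode` class, and the value MINIMUM = 2 are assumptions; the slide does not state MIN for this tree.

```python
from dataclasses import dataclass, field
from typing import List

MINIMUM = 2  # assumed; the slide does not give MIN for this example


@dataclass
class BTreeNode:
    key_array: List[int] = field(default_factory=list)
    subset: List["BTreeNode"] = field(default_factory=list)  # empty for a leaf


def delete_internal(node: BTreeNode, key: int) -> None:
    """Case 5: replace key in internal node `node` with its successor from a leaf."""
    i = node.key_array.index(key)
    leaf = node.subset[i + 1]          # subtree holding all keys greater than key
    while leaf.subset:
        leaf = leaf.subset[0]          # leftmost path leads to the successor
    if len(leaf.key_array) <= MINIMUM:
        raise NotImplementedError("successor leaf has no spare entry; it needs repair")
    node.key_array[i] = leaf.key_array.pop(0)  # successor replaces the deleted key


right = BTreeNode([85, 95], [BTreeNode([75, 80]), BTreeNode([90, 92]),
                             BTreeNode([97, 100, 150])])
left = BTreeNode([25, 45], [BTreeNode([15, 20]), BTreeNode([30, 40]), BTreeNode([50, 54])])
root = BTreeNode([62], [left, right])

delete_internal(right, 95)
print(right.key_array, right.subset[2].key_array)  # [85, 97] [100, 150]
```

When the successor's leaf has only the minimum number of entries, removing the successor creates a shortage that has to be repaired with the transfer or combine steps from the earlier cases.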