B-Trees
CMPE126 Data Structures
Spring 2006 Copyright (c) All rights reserved Leonard Wesley
Dec 27, 2015
Why B-Trees?
Trees studied so far are for storing data in memory
B-Trees are better suited for storing data in memory AND on secondary storage.
Better suited for balancing data than some other tree ADTs.
Can store multiple keys with the same value, unlike some other trees, such as AVL trees.
The Problem With Unbalanced Trees
[Figure: an unbalanced binary tree containing the keys 1, 2, 3, 4, 5]
The levels are sparsely filled, resulting in deep paths. This defeats the purpose of binary trees.
Possible Solutions To Unbalanced Trees
Periodically balance the tree.
Don’t let a tree get too unbalanced when inserting or deleting:
AVL Trees: Sometimes called HB[1] trees. Invented by Adel’son-Vel’skii and Landis in the early 1960s. (An in-memory solution, not ideally suited to secondary storage.)
B-Trees: Proposed by R. Bayer & E.M. McCreight (see pg. 542 Main & Savitch for ref.)
What Is A B-Tree?
It is a type of “multiway” tree.
It is NOT a binary search tree, nor is it a binary tree.
It provides a fast way to index into a multi-level set of nodes.
Each node in the B-Tree contains a sorted array of key values.
Motivation For Multiway Tree
Secondary storage (e.g., disks) is typically divided into equal-sized blocks (e.g., 512, 1024, …, 4096, … bytes).
The basic I/O operation transfers whole blocks, rather than single bytes, between secondary storage and memory.
Goal is to devise a multiway search tree that minimizes the number of accesses to secondary storage by getting as many useful keys as possible out of each block read.
Each access to secondary storage takes roughly as long as executing 250K instructions, depending on the speed of the CPU.
ISAM
ISAM = Indexed Sequential Access Method.
ISAM: The Idea
[Figure: a disk, showing a platter, a track, and a block of 512, 1024, … bytes]
ISAM: Index & Keys
[Figure: a block on a track, labeled with its block #, its block key, and its data]
• All data in the block will have keys ≤ the block key, or have keys ≥ the block key. Pick one inequality and stick with it.
ISAM: Block Index
Block index (this index could be stored in memory):
Block #   Key
0         G
1         K
2         N
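A lookup against this block index can be sketched as follows. This is only an illustration under stated assumptions: the names `BlockIndexEntry` and `find_block` are hypothetical, keys are single characters as in the table, and the "≤ block key" convention is the one chosen.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One row of the in-memory block index. Under the assumed "<=" convention,
// every key stored in block `block_number` is <= `block_key`.
struct BlockIndexEntry {
    std::size_t block_number;
    char block_key;
};

// Returns the position in `index` of the first row whose block key is
// >= target, which is the only block that could hold target.
// Returns index.size() when target is larger than every block key.
// Assumes `index` is sorted by block_key.
std::size_t find_block(const std::vector<BlockIndexEntry>& index, char target) {
    auto it = std::lower_bound(
        index.begin(), index.end(), target,
        [](const BlockIndexEntry& entry, char key) { return entry.block_key < key; });
    return static_cast<std::size_t>(it - index.begin());
}
```

With the index above (G, K, N), a search for H selects block 1, so only that one block has to be read from disk.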
ISAM: Disk Index
[Figure: a row of disks, Disk 0 through Disk n]
Disk index (this index could also be stored in memory):
Disk #   Key
0        G
1        V
2        X
ISAM: Insertion/Deletion
Insertion:
Might involve moving data across blocks.
Can leave extra space when inserting into a block.
Deletion:
Might involve contracting data across blocks.
Need not contract every time, i.e., leave some space for possible future expansion.
Multiway Search Tree (order m)
A generalization of binary search trees.
Each node has at most m children.
If k <= m is the number of children, then the node has exactly k-1 keys.
The tree is ordered.
Multiway Search Tree (cont.)
[Figure: nodes in a multiway tree. One node holds the keys k1 k2 k3 k4 k5; its subtrees hold, from left to right, keys < k1, …, k2 < keys < k3, …, k5 < keys]
Definition Of A B-Tree
A B-Tree of order m is an m-way tree such that:
All leaves are on the same level.
All internal nodes except the root node are constrained to have at most m non-empty children and at least ⌈m/2⌉ non-empty children.
The root node has at most m non-empty children.
Three Important Properties Of B-Trees
All nodes in the B-Tree are at least half-full (the root node is sometimes an exception).
The B-Tree is always balanced. That is, the same number of nodes must be read into memory to locate any key at a given level of the tree.
A well-organized B-Tree will have just a small number of levels relative to the number of nodes.
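A rough calculation illustrates why so few levels are needed. The sketch below assumes the best case, in which every node of an order-m tree is completely full (m children and m-1 keys per node); the function name `max_keys` is hypothetical.

```cpp
#include <cstdint>

// Largest number of keys an order-`order` multiway tree can hold in
// `levels` levels (root included), assuming every node is full:
// each node has `order` children and `order - 1` keys.
std::uint64_t max_keys(std::uint64_t order, unsigned levels) {
    std::uint64_t nodes_on_level = 1;  // the root
    std::uint64_t total_nodes = 0;
    for (unsigned level = 0; level < levels; ++level) {
        total_nodes += nodes_on_level;
        nodes_on_level *= order;  // every node on this level has `order` children
    }
    return total_nodes * (order - 1);
}
```

Under this assumption, `max_keys(2, 3)` is 7 (a full binary tree of three levels), while `max_keys(201, 3)` is 8,120,600, so three block reads can reach any of about eight million keys.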
Where Are B-Trees Used?
B-Trees are commonly found in databases and file systems.
B-Trees allow logarithmic time insertions and deletions.
They generally grow from the bottom upwards as elements are inserted, whereas most binary trees grow downward.
The Six Rules Governing B-Trees
R1: A B-Tree may be empty. If it is not, every node other than the root holds at least some specified MINIMUM number of entries; the root may hold fewer.
R2: The MAXIMUM number of entries in a node is twice the MINIMUM.
The Six Rules Governing B-Trees (cont)
R3: The entries of each B-Tree node are stored in a partially filled array, sorted from the smallest entry (at index 0) to the largest entry (at the final position of the array).
[Figure: a B-Tree node drawn as an array with indices 0 to n-1, holding the entries h, k, k*, n, …]
The data in such an array can be stored in a block on a disk.
* B-Trees can support duplicate keys.
The Six Rules Governing B-Trees (cont)
R4: The number of subtrees below a non-leaf node is always one more than the number of entries in the node.
Example: a non-leaf node with 4 entries (indices 0 to 3) has 5 subtrees.
Entries: 45 55 67 82
subtree 0: keys < 45
subtree 1: keys > 45 & < 55
subtree 2: keys > 55 & < 67
subtree 3: keys > 67 & < 82
subtree 4: keys > 82
The Six Rules Governing B-Trees (cont)
R5: For any non-leaf node:
An entry at index i is greater than all the entries in subtree i of the node, and
An entry at index i is less than all the entries in subtree i+1 of the node.
R6: Every leaf node in a B-Tree has the same depth (i.e., all leaves are at the same level).
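The rules can be made concrete with a small checker. This is a sketch under assumptions, not the textbook's code: the `Node` layout uses `std::vector` instead of partially filled arrays, the function name `leaf_depth_if_valid` is hypothetical, and equal neighbouring entries inside one node are tolerated because the slides say duplicates may be supported.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node {
    std::vector<int> data;     // entries, smallest first (R3)
    std::vector<Node> subset;  // subtrees; empty for a leaf
};

// Returns the depth of the leaves below `node` (0 for a leaf) when the
// subtree obeys R1 to R6, or -1 when some rule is violated. `lo` and `hi`
// point at the ancestor entries that bound this subtree (used for R5).
int leaf_depth_if_valid(const Node& node, std::size_t minimum, bool is_root = true,
                        const int* lo = nullptr, const int* hi = nullptr) {
    assert(minimum >= 1);
    const std::size_t count = node.data.size();
    if (count > 2 * minimum) return -1;                              // R2
    if (!is_root && count < minimum) return -1;                      // R1
    if (is_root && count == 0 && !node.subset.empty()) return -1;    // R1
    for (std::size_t i = 0; i < count; ++i) {
        if (i > 0 && node.data[i - 1] > node.data[i]) return -1;     // R3
        if (lo && node.data[i] <= *lo) return -1;                    // R5
        if (hi && node.data[i] >= *hi) return -1;                    // R5
    }
    if (node.subset.empty()) return 0;                               // a leaf
    if (node.subset.size() != count + 1) return -1;                  // R4
    int depth = -1;
    for (std::size_t i = 0; i <= count; ++i) {
        const int* child_lo = (i == 0) ? lo : &node.data[i - 1];
        const int* child_hi = (i == count) ? hi : &node.data[i];
        const int d = leaf_depth_if_valid(node.subset[i], minimum, false,
                                          child_lo, child_hi);
        if (d < 0) return -1;
        if (depth >= 0 && d != depth) return -1;                     // R6
        depth = d;
    }
    return depth + 1;
}
```

A node with one entry and three subtrees, for example, is rejected by the R4 check.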
Example B-Tree
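MIN = 1, MAX = 2
Root: 30, 80
Children of root: 20 | 50, 60 | 90
Leaves under 20: 10 | 25
Leaves under 50, 60: 35, 40 | 55 | 72
Leaves under 90: 82, 85 | 95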
Searching For A Target In B-Trees
Start with the root node and search for the target in the array at that node. If found, then done: return success.
If the target is not in the root and there are no children, then also done, but return failure.
If the target is not in the root node and there are children, then the target, if it exists, can only be in one subtree.
Search key_array from left to right, up to data_count, for the first index i at which target < key_array[i], and descend into subtree i. If there is no such index, descend into the last subtree (i = data_count).
Repeat the process with the root of that subtree as the new root node.
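The search steps above can be sketched as a short loop. This is an illustration under assumptions: the `Node` layout uses `std::vector` (with `data` and `subset` named after the members used in the later pseudocode), and the function name `btree_contains` is hypothetical.

```cpp
#include <cstddef>
#include <vector>

struct Node {
    std::vector<int> data;     // sorted entries of this node
    std::vector<Node> subset;  // subtrees; empty for a leaf
};

// Returns true if `target` is stored anywhere in the tree rooted at `root`.
bool btree_contains(const Node& root, int target) {
    const Node* node = &root;
    while (true) {
        // First index whose entry is not less than the target
        // (data.size() if every entry is smaller).
        std::size_t i = 0;
        while (i < node->data.size() && node->data[i] < target) ++i;
        if (i < node->data.size() && node->data[i] == target) return true;
        if (node->subset.empty()) return false;  // leaf reached: not present
        node = &node->subset[i];                 // the only subtree that can hold it
    }
}
```

Each pass through the loop corresponds to reading one node, so the number of reads is bounded by the number of levels in the tree.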
Inserting Into A B-Tree
1. Add the new key to the appropriate leaf node.
2. Overflow?
No: done.
Yes: split the node into two nodes on the same level, and promote the median key to the parent.
Loose Insertion (pg. 551 Main & Savitch, one of several ways)
MIN = 1, MAX = 2
Before:
Root: 6, 17
Children: 4 | 12 | 19, 22
Insert 18:
Root: 6, 17
Children: 4 | 12 | 18, 19, 22
The node 18, 19, 22 has an excess entry (problem child).
Fixing A Loose Insertion
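MIN = 1, MAX = 2
Split the problem child, and promote the middle key to the parent node. Still have an excess:
Root: 6, 17, 19
Children: 4 | 12 | 18 | 22
Fix the excess by repeating the process. Split the node and promote the middle key to a new root node:
Root: 17
Children: 6 | 19
Leaves under 6: 4 | 12
Leaves under 19: 18 | 22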
Pseudo Code For Loose Insert
1. Make a local variable, i, equal to the first index such that data[i] is not less than the new entry to insert. If there is no such index, then set i equal to data_count, indicating that all of the entries are less than the new entry.
2. If (we found the new entry at data[i])
a) Return false with no further work (since the new entry is already in the tree).
else if (the root has no children)
b) Add the new entry to the root at data[i]. The original entries at data[i] and afterwards must be shifted right to make room for the new entry. Return true to indicate that we added the entry.
else
c) Save the value from this recursive call: subset[i]->loose_insert(entry); Then check whether the root of subset[i] now has an excess entry; if so, then fix that problem. Return the saved value from the recursive call.
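The pseudocode can be turned into a compact sketch. The assumptions are significant: this version uses free functions and `std::vector` rather than the member functions and partially filled arrays of Main & Savitch, and the names `fix_excess` and `btree_insert` are hypothetical helpers. MINIMUM is fixed at 1 to match the slide example.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Node {
    std::vector<int> data;     // sorted entries
    std::vector<Node> subset;  // children; empty for a leaf
};

const std::size_t MINIMUM = 1;            // MIN = 1, MAX = 2 as in the example
const std::size_t MAXIMUM = 2 * MINIMUM;

// Splits parent.subset[i], which holds MAXIMUM + 1 entries, into two nodes of
// MINIMUM entries each and promotes the middle entry into `parent`.
void fix_excess(Node& parent, std::size_t i) {
    Node& child = parent.subset[i];
    assert(child.data.size() == MAXIMUM + 1);
    Node right;
    right.data.assign(child.data.begin() + MINIMUM + 1, child.data.end());
    if (!child.subset.empty()) {
        // The right half of the entries takes the right half of the children.
        for (std::size_t j = MINIMUM + 1; j < child.subset.size(); ++j)
            right.subset.push_back(std::move(child.subset[j]));
        child.subset.erase(child.subset.begin() + MINIMUM + 1, child.subset.end());
    }
    const int middle = child.data[MINIMUM];
    child.data.resize(MINIMUM);
    parent.data.insert(parent.data.begin() + i, middle);
    // Done last: inserting into parent.subset invalidates `child`.
    parent.subset.insert(parent.subset.begin() + i + 1, std::move(right));
}

// Steps 1 and 2 of the pseudocode; may leave `root` with one excess entry.
bool loose_insert(Node& root, int entry) {
    std::size_t i = 0;
    while (i < root.data.size() && root.data[i] < entry) ++i;
    if (i < root.data.size() && root.data[i] == entry) return false;  // 2a
    if (root.subset.empty()) {                                        // 2b
        root.data.insert(root.data.begin() + i, entry);
        return true;
    }
    const bool added = loose_insert(root.subset[i], entry);           // 2c
    if (root.subset[i].data.size() > MAXIMUM) fix_excess(root, i);
    return added;
}

// Full insert: a loose insert, then grow a new root if the old one overflowed.
bool btree_insert(Node& root, int entry) {
    if (!loose_insert(root, entry)) return false;
    if (root.data.size() > MAXIMUM) {
        Node new_root;
        new_root.subset.push_back(std::move(root));
        fix_excess(new_root, 0);
        root = std::move(new_root);
    }
    return true;
}
```

Inserting 18 into the example tree (root 6, 17 with children 4 | 12 | 19, 22) first overfills the leaf, then the root, and ends with 17 alone in a new root, which is how the tree grows from the bottom upward.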
Insert In Class Exercise
MIN = 1, MAX = 2
Root: 6, 17
Children: 4 | 12 | 19, 22
Insert 5, then insert 7.
Deleting From A B-Tree Example #1
Min = 1, Max = 2
Before:
Root: 6, 17
Children: 4 | 12 | 19, 22
Delete 17:
Root: 6
Children: 4 | 12 | 19, 22
Violates B-Tree Rule 4 (# subtrees = # keys + 1).
Solution To Example #1
Min = 1, Max = 2
Root: 6, 19
Children: 4 | 12 | 22
Deleting From A B-Tree Example #2
Min = 2, Max = 4
Before:
Root: 6, 17
Children: 2, 4 | 10, 12 | 19, 22
Delete 22:
Root: 6, 17
Children: 2, 4 | 10, 12 | 19
Violates the B-Tree property that # keys must not be less than MIN.
Solution #1 For Example #2
Min = 2, Max = 4
Case 3 solution: combine subset[i] with subset[i-1] if excess entries in siblings are not available (pg. 561 Main & Savitch).
Before:
Root: 6, 17
Children: 2, 4 | 10, 12 | 19
After:
Root: 6
Children: 2, 4 | 10, 12, 17, 19
Solution #2 To Fix A Shortage
Case 1: Transfer an extra entry from subset[i-1] to subset[i] (pg. 560 Main & Savitch)
Min = 2, Max = 4
Before:
Root: 6, 17
Children: 2, 4 | 10, 12, 15 | 19
After:
Root: 6, 15
Children: 2, 4 | 10, 12 | 17, 19
Solution #3 To Fix A Shortage
Case 2: Transfer an extra entry from subset[i+1] to subset[i] (pg. 561 Main & Savitch)
Before:
Root: 6, 17
Children: 2, 4 | 10 | 19, 21, 22
After:
Root: 6, 19
Children: 2, 4 | 10, 17 | 21, 22
Deleting From A B-Tree (Loose Erase)
1. Make a local variable, i, equal to the first index such that data[i] is not less than the target to delete. If there is no such index, then set i equal to data_count, indicating that all of the entries are less than the target.
2. Deal with one of the following four possibilities:
a. Root has no children, and we did not find the target (i.e., nothing to do).
b. Root has no children, and we found the target. Just remove the target.
c. Root has children, did not find target in root. Make a recursive call to search subset[i].
d. Root has children, found target in root. Remove the largest entry from subset[i] and copy it into data[i].
Elaborate on 2c and 2d on following slides …
Delete From B-Tree: Elaborate 2c
Target not found in root node, but target might be in subset[i]. Make the recursive call subset[i]->loose_erase(target)
This will remove the target from subset[i] if it is in subset[i]. If so, then subset[i] might have < MIN entries. If so, then it needs to be fixed: call fix_shortage(i) on the current node (the parent of subset[i]).
Will discuss later
Delete From B-Tree: Elaborate 2d
Target is found in the root node, but it cannot simply be removed because there are children.
Go to subset[i] and remove the largest item in that subtree. Copy this largest item into data[i] (which contains the target). In effect this removes the target. However, removing the largest item can cause a shortage in subset[i]. If so, call fix_shortage(i) on the current node (the parent of subset[i]).
Will discuss NOW!!
Fix Shortage
Case 1: If subset[i-1] has extra entries, then transfer an entry to subset[i] (pg. 560 Main & Savitch)
Transfer data[i-1] (i.e., 17) down to the front of subset[i]->data. Shift over as necessary & update data_count.
Transfer the final item of subset[i-1] (i.e., 15) up to replace data[i-1], and update data_count.
If subset[i-1] has children, transfer the final child of subset[i-1] over to the front of subset[i], and update the child counts of both nodes.
Before:
Root: 6, 17
Children: 2, 4 | 10, 12, 15 | 19
After:
Root: 6, 15
Children: 2, 4 | 10, 12 | 17, 19
Fix Shortage (cont.)
Case 2: If subset[i+1] has extra entries, then transfer an entry to subset[i] (pg. 561 Main & Savitch). Similar to Case 1.
Before:
Root: 6, 17
Children: 2, 4 | 10 | 19, 21, 22
After:
Root: 6, 19
Children: 2, 4 | 10, 17 | 21, 22
Fix Shortage (cont.)
Case 3: Combine subset[i] with subset[i-1] (pg. 561 Main & Savitch)
Applies if subset[i-1] is present (i.e., i > 0) but subset[i-1] has only the minimum # of items/keys (i.e., no excess keys/items).
Transfer data[i-1] down to the end of subset[i-1]->data … (see a, pg. 562)
Transfer all of the items and children from subset[i] to the end of subset[i-1] … (see b, pg. 562)
Delete the node subset[i] and shift subset[i+1], subset[i+2], and so on left … (see c, pg. 562)
Before (22 has just been deleted):
Root: 6, 17
Children: 2, 4 | 10, 12 | 19
After:
Root: 6
Children: 2, 4 | 10, 12, 17, 19
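Loose erase and the fix-shortage cases can be combined into one sketch. Again this rests on assumptions rather than the textbook's code: free functions over a `std::vector`-based `Node`, a hypothetical `btree_erase` wrapper, a `remove_biggest` helper that returns the removed entry, and a final branch that also handles the mirror image of Case 3 (merging with subset[i+1] when i = 0), which the slides do not show. MINIMUM is fixed at 2 to match the examples.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Node {
    std::vector<int> data;     // sorted entries
    std::vector<Node> subset;  // children; empty for a leaf
};

const std::size_t MINIMUM = 2;  // Min = 2, Max = 4, as in the slide examples

// Repairs node.subset[i], which is one entry short of MINIMUM.
void fix_shortage(Node& node, std::size_t i) {
    std::vector<Node>& kids = node.subset;
    assert(i < kids.size() && kids.size() >= 2);
    if (i > 0 && kids[i - 1].data.size() > MINIMUM) {
        // Case 1: data[i-1] moves down, last entry of subset[i-1] moves up.
        Node& left = kids[i - 1];
        kids[i].data.insert(kids[i].data.begin(), node.data[i - 1]);
        node.data[i - 1] = left.data.back();
        left.data.pop_back();
        if (!left.subset.empty()) {
            kids[i].subset.insert(kids[i].subset.begin(), std::move(left.subset.back()));
            left.subset.pop_back();
        }
    } else if (i + 1 < kids.size() && kids[i + 1].data.size() > MINIMUM) {
        // Case 2: data[i] moves down, first entry of subset[i+1] moves up.
        Node& right = kids[i + 1];
        kids[i].data.push_back(node.data[i]);
        node.data[i] = right.data.front();
        right.data.erase(right.data.begin());
        if (!right.subset.empty()) {
            kids[i].subset.push_back(std::move(right.subset.front()));
            right.subset.erase(right.subset.begin());
        }
    } else {
        // Case 3 (or its mirror when i == 0): merge subset[j+1] into
        // subset[j], with data[j] moving down between the two halves.
        const std::size_t j = (i > 0) ? i - 1 : i;
        Node& left = kids[j];
        Node& right = kids[j + 1];
        left.data.push_back(node.data[j]);
        left.data.insert(left.data.end(), right.data.begin(), right.data.end());
        for (Node& child : right.subset) left.subset.push_back(std::move(child));
        node.data.erase(node.data.begin() + j);
        kids.erase(kids.begin() + j + 1);
    }
}

// Removes and returns the largest entry in the subtree rooted at `node`.
int remove_biggest(Node& node) {
    if (node.subset.empty()) {
        const int biggest = node.data.back();
        node.data.pop_back();
        return biggest;
    }
    const std::size_t last = node.subset.size() - 1;
    const int biggest = remove_biggest(node.subset[last]);
    if (node.subset[last].data.size() < MINIMUM) fix_shortage(node, last);
    return biggest;
}

// Possibilities 2a to 2d; may leave `node` itself one entry short.
bool loose_erase(Node& node, int target) {
    std::size_t i = 0;
    while (i < node.data.size() && node.data[i] < target) ++i;
    const bool found = i < node.data.size() && node.data[i] == target;
    if (node.subset.empty()) {
        if (!found) return false;                                  // 2a
        node.data.erase(node.data.begin() + i);                    // 2b
        return true;
    }
    bool removed = true;
    if (found) node.data[i] = remove_biggest(node.subset[i]);      // 2d
    else removed = loose_erase(node.subset[i], target);            // 2c
    if (node.subset[i].data.size() < MINIMUM) fix_shortage(node, i);
    return removed;
}

// Full erase: a loose erase, then shrink the tree if the root became empty.
bool btree_erase(Node& root, int target) {
    if (!loose_erase(root, target)) return false;
    if (root.data.empty() && root.subset.size() == 1) {
        Node only_child = std::move(root.subset[0]);
        root = std::move(only_child);
    }
    return true;
}
```

Run on the slide examples, erasing 22 from root 6, 17 with children 2, 4 | 10, 12 | 19, 22 takes the Case 3 branch and leaves root 6 with children 2, 4 | 10, 12, 17, 19.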