1
B trees
• Nodes may have more than 2 children
• Each internal node has between k and 2k children and between k-1 and 2k-1 keys
• A leaf has between k-1 and 2k-1 keys
• The root has at least 2 children
• All leaves are at the same distance from the root
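The rules above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical `Node` class with a `keys` list and a `children` list (empty for a leaf); the names `Node`, `leaf_depths`, and `is_valid` are illustrative and do not come from the slides.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    keys: List[int]
    children: List["Node"] = field(default_factory=list)  # empty for a leaf

def leaf_depths(node, depth=0):
    """Set of depths at which leaves occur below `node`."""
    if not node.children:
        return {depth}
    depths = set()
    for child in node.children:
        depths |= leaf_depths(child, depth + 1)
    return depths

def is_valid(node, k, is_root=True):
    """Check the degree and key-count rules for the subtree rooted at `node`."""
    if node.children:
        # Internal node: k..2k children (the root only needs 2), one key fewer than children.
        min_children = 2 if is_root else k
        if not min_children <= len(node.children) <= 2 * k:
            return False
        if len(node.keys) != len(node.children) - 1:
            return False
    else:
        # Leaf: k-1..2k-1 keys (assumption: a root that is a leaf may hold fewer).
        min_keys = 0 if is_root else k - 1
        if not min_keys <= len(node.keys) <= 2 * k - 1:
            return False
    # All leaves must be at the same distance from the root.
    if is_root and len(leaf_depths(node)) != 1:
        return False
    return all(is_valid(child, k, is_root=False) for child in node.children)
```

Calling `is_valid(root, 2)` checks a tree against the k = 2 rules.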
2
2-4 tree and General k
• k = 2
• Each node has 2, 3, or 4 children
• What is better: k = 2 or k >> 2?
• Depth? Large k is better
• But what about degree? Small k is better
• Overall: $k \log_k n$
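To make the trade-off concrete, here is a small sketch that evaluates both quantities for a few values of k. It assumes each node on the search path is scanned linearly, so the cost is roughly k times the depth; the function names are illustrative.

```python
import math

def height(n, k):
    """Approximate depth of a tree with n items and about k children per node."""
    return math.log(n, k)

def linear_search_cost(n, k):
    """Approximate key comparisons when every node on the path is scanned linearly."""
    return k * height(n, k)

n = 2 ** 30
for k in (2, 16, 256):
    # Depth shrinks as k grows, while the per-node scanning cost grows.
    print(k, round(height(n, k), 1), round(linear_search_cost(n, k)))
```

Larger k gives a shallower tree but more work per node, which is why the choice of k depends on what a single node access costs.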
3
A 4-node
10 30 35
key < 10
10 ≤ key < 30
30 ≤ key < 35
35 ≤ key
4
B vs. B+
• In a B tree items are in every node
• In B+ tree items are at the leaves; internal nodes have keys to direct the search
• The leaves are (possibly) also maintained in a linked list to allow fast sequential access
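The benefit of linking the leaves can be sketched as follows: once the first relevant leaf is found, a range query only follows `next` pointers. The `Leaf` class and `range_scan` function are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Leaf:
    items: List[int]                 # sorted items stored in this leaf
    next: Optional["Leaf"] = None    # link to the leaf on the right

def range_scan(start_leaf, lo, hi):
    """Collect all items x with lo <= x <= hi, walking the leaf list from start_leaf."""
    out = []
    leaf = start_leaf
    while leaf is not None:
        for x in leaf.items:
            if x > hi:
                return out           # items are sorted, so nothing further can match
            if x >= lo:
                out.append(x)
        leaf = leaf.next
    return out
```

In a full implementation the starting leaf would be located by a normal search from the root; here it is passed in directly.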
5
A 2-4+ tree
[Figure: an example 2-4+ tree, with items at the leaves and routing keys in the internal nodes.]
6
The height
• The root has at least 2 children
• At level 2 we have at least $2k$ nodes
• At level 3 we have at least $2k^2$ nodes
• At level h we have at least $2k^{h-1}$ nodes
$n \ge 2k^{h-1} \;\Rightarrow\; h \le 1 + \log_k \frac{n}{2}$
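The level counts translate directly into a height bound. The sketch below assumes n counts the nodes on the deepest level, and uses integer arithmetic to find the largest h for which $2k^{h-1} \le n$ still holds; the function names are illustrative.

```python
def min_nodes_at_level(k, h):
    """Fewest nodes possible at level h (the root's children are level 1)."""
    return 2 * k ** (h - 1)

def max_height(n, k):
    """Largest h such that min_nodes_at_level(k, h) <= n, assuming n >= 2."""
    h = 1
    while min_nodes_at_level(k, h + 1) <= n:
        h += 1
    return h

n = 2 ** 30
for k in (2, 64):
    print(k, max_height(n, k))
```

With the same n, a larger k leaves room for far fewer levels.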
7
Red-Black Trees
• $n = 2^{30} \approx 10^9$.
• 30 <= height <= 60.
• When the red-black tree resides on a disk, up to 60 disk accesses are made for a search.
• A disk access takes about 5 milliseconds ($5 \times 10^{-3}$ sec)
• A memory access takes about 100 nanoseconds ($10^{-7}$ sec)
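The arithmetic behind these figures can be spelled out in a few lines. This is only a back-of-the-envelope sketch using the access times quoted above; the constant and function names are illustrative.

```python
DISK_ACCESS_SEC = 5e-3      # about 5 milliseconds per disk access
MEMORY_ACCESS_SEC = 100e-9  # about 100 nanoseconds per memory access

def search_seconds(accesses, seconds_per_access):
    """Total time for a search that touches `accesses` nodes."""
    return accesses * seconds_per_access

on_disk = search_seconds(60, DISK_ACCESS_SEC)
in_memory = search_seconds(60, MEMORY_ACCESS_SEC)
print(f"disk: {on_disk:.3f} s, memory: {in_memory * 1e6:.1f} microseconds")
```

The gap of several orders of magnitude is the reason to reduce the number of node accesses when the tree lives on disk.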
8
B-trees
• B-trees are used when the tree resides in secondary storage.
• k is picked according to the size of a disk block
• Since the height is smaller, a search does fewer I/Os, and each single access brings in a whole node of keys
9
B-Trees
• Large degree B-trees are used to represent very large dictionaries that reside on disk.
• Smaller degree B-trees are used for internal-memory dictionaries to overcome cache-miss penalties.
10
Node’s structure
• $a_i$ is a pointer to a subtree.
• $p_i$ is a key
Node layout: $j \mid a_0 \; p_1 \; a_1 \; p_2 \; a_2 \; \dots \; p_j \; a_j$
• Can search each node linearly: total time ≈ $k \cdot h \approx k \log_k n$
• Can maintain a little red-black tree or a sorted array in each node, so search takes ≈ $h \log_2 k \approx \log_2 n$
$k-1 \le j \le 2k-1$
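The two ways of searching inside a node can be sketched as below. This assumes the B+ convention described earlier (items live in the leaves, and a key equal to a routing key is found in the subtree to its right) and uses a sorted array with binary search in place of the little red-black tree. `Node`, `linear_child`, `binary_child`, and `contains` are hypothetical names.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    keys: List[int]                                       # p_1 .. p_j, sorted
    children: List["Node"] = field(default_factory=list)  # a_0 .. a_j, empty for a leaf

def linear_child(keys, key):
    """Scan the keys left to right: about k steps per node."""
    i = 0
    while i < len(keys) and key >= keys[i]:
        i += 1
    return i

def binary_child(keys, key):
    """Binary search over the sorted keys: about log2(k) steps per node."""
    return bisect_right(keys, key)

def contains(root, key, pick_child=binary_child):
    """Follow one root-to-leaf path, then look for the key in the leaf."""
    node = root
    while node.children:
        node = node.children[pick_child(node.keys, key)]
    return key in node.keys
```

Both helpers return the same child index; only the number of comparisons per node differs.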
11
Insert
[Figures, slides 11 to 27: a sequence of insertions into a 2-4+ tree.]
Starting tree: root (14); internal nodes (5 9) and (15 30); leaves (1 3) (5) (9) (14) (16 17) (30 40 50).
• Insert(2,T): leaf (1 3) becomes (1 2 3).
• Insert(4,T): the leaf becomes (1 2 3 4) and overflows. Split: it becomes (1 2) and (3 4), and key 3 goes to the parent, which becomes (3 5 9).
• Insert(6,T): leaf (5) becomes (5 6).
• Insert(7,T): the leaf becomes (5 6 7).
• Insert(8,T): the leaf becomes (5 6 7 8) and overflows. Split: it becomes (5 6) and (7 8), and key 7 goes to the parent, which becomes (3 5 7 9) and overflows in turn. Split: the parent becomes (3) and (7 9), and key 5 goes to the root, which becomes (5 14).
Final tree: root (5 14); internal nodes (3), (7 9), (15 30); leaves (1 2) (3 4) (5 6) (7 8) (9) (14) (16 17) (30 40 50).
28
Insert -- definition
Add the new key in its position, say in a node v.
(*) If v now has 4 keys, split v into a node u with 2 keys, a node w with 1 key, and a key k that moves up to the parent (or, if v is a leaf, into two nodes with 2 keys each and a key). Otherwise stop.
If v was the root, create a new root r that is the parent of u and w, and stop.
Otherwise replace v by u and w as children of p(v), with k added to p(v) between them.
Repeat (*) for v := p(v).
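The procedure can be sketched recursively: each call reports to its parent whether the node below was split. This is a minimal sketch for a B+ style tree with parameter k (k = 2 gives the 2-4+ tree), assuming no duplicate items are inserted; `Node`, `insert`, and `_insert` are hypothetical names, not the slides' code.

```python
from bisect import bisect_right, insort

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                # items in a leaf, routing keys otherwise
        self.children = children or []  # empty list marks a leaf

def insert(root, key, k=2):
    """Insert key and return the (possibly new) root."""
    result = _insert(root, key, k)
    if result is None:
        return root
    separator, right = result
    return Node([separator], [root, right])   # the old root was split: grow a new root

def _insert(node, key, k):
    """Insert below node; return (separator, new right sibling) if node was split."""
    if not node.children:
        insort(node.keys, key)                 # add the key in its position
        if len(node.keys) < 2 * k:
            return None
        right = Node(node.keys[k:])            # leaf split: two leaves with k keys each
        node.keys = node.keys[:k]
        return right.keys[0], right            # a copy of the first right key moves up
    i = bisect_right(node.keys, key)
    result = _insert(node.children[i], key, k)
    if result is None:
        return None
    separator, right = result
    node.keys.insert(i, separator)             # replace the child by its two halves
    node.children.insert(i + 1, right)
    if len(node.keys) < 2 * k:
        return None
    right = Node(node.keys[k:], node.children[k:])   # internal split: k-1 keys | middle | k keys
    middle = node.keys[k - 1]
    node.keys, node.children = node.keys[:k - 1], node.children[:k]
    return middle, right
```

The recursion plays the role of "repeat (*) for v := p(v)": a split is handed back to the caller, which is the parent.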
29
Split
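Overfull node, 2k keys: $a_0 \; p_1 \; a_1 \; p_2 \; a_2 \; \dots \; p_{2k} \; a_{2k}$
Left part, k-1 keys: $a_0 \; p_1 \; a_1 \; p_2 \; a_2 \; \dots \; p_{k-1} \; a_{k-1}$
Right part, k keys: $a_k \; p_{k+1} \; a_{k+1} \; \dots \; p_{2k} \; a_{2k}$
• $p_k$ is inserted in the parent.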
30
Split
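Overfull node, 2k keys: $a_0 \; p_1 \; a_1 \; p_2 \; a_2 \; \dots \; p_{2k} \; a_{2k}$
Left part, k-1 keys: $a_0 \; p_1 \; a_1 \; p_2 \; a_2 \; \dots \; p_{k-1} \; a_{k-1}$
Right part, k keys: $a_k \; p_{k+1} \; a_{k+1} \; \dots \; p_{2k} \; a_{2k}$
• $p_k$ is inserted in the parent.
• Takes O(k) time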
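Expressed on plain lists, the split is three slices. The sketch below assumes `keys` holds $p_1, \dots, p_{2k}$ and `children` holds $a_0, \dots, a_{2k}$; the function name `split` and the tuple layout are illustrative.

```python
def split(keys, children, k):
    """Split an overfull internal node into (left, middle key, right)."""
    assert len(keys) == 2 * k and len(children) == 2 * k + 1
    left = (keys[:k - 1], children[:k])   # p_1 .. p_{k-1} with a_0 .. a_{k-1}
    middle = keys[k - 1]                  # p_k, to be inserted in the parent
    right = (keys[k:], children[k:])      # p_{k+1} .. p_{2k} with a_k .. a_{2k}
    return left, middle, right

left, middle, right = split([3, 5, 7, 9], ["a0", "a1", "a2", "a3", "a4"], k=2)
print(left, middle, right)
```

Each slice copies at most 2k + 1 entries, which is where the O(k) bound comes from.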
32
Insert (summary)
• $O(\log n)$ time to find the position, and at most $O(\log_k n)$ splits; each split takes $O(k)$ time
• Can show that the amortized number of splits is $O(1)$ per insert
33
Delete
[Figures, slides 33 to 44: a sequence of deletions, continuing from the tree obtained after Insert(8,T).]
Starting tree: root (5 14); internal nodes (3), (7 9), (15 30); leaves (1 2) (3 4) (5 6) (7 8) (9) (14) (16 17) (30 40 50).
• delete(14,T): leaf (14) is removed. Its parent (15 30) keeps two children and becomes (30).
• delete(17,T): leaf (16 17) becomes (16).
• delete(16,T): leaf (16) is removed, so its parent (30) is left with one child. Borrow: the sibling (7 9) has three children and gives up its rightmost child, leaf (9). The sibling becomes (7), the node (30) now has children (9) and (30 40 50), and the root becomes (5 9).
• delete(9,T): leaf (9) is removed, so the node (30) is again left with one child. Its sibling (7) has only two children, so borrowing is not possible. Fusion: the two nodes are fused into (7 30) with children (5 6) (7 8) (30 40 50), and the root becomes (5).
Final tree: root (5); internal nodes (3) and (7 30); leaves (1 2) (3 4) (5 6) (7 8) (30 40 50).
45
Delete -- definition
Remove the key.
If it was the only key in the node, remove the node and let v be the parent that loses a child; otherwise return.
(*) If v still has at least 2 children, return. If v has one child and v is the root, discard v; its child becomes the new root.
Otherwise (v is not the root), if v has a sibling w of degree 3 or 4, borrow a child from w to v and terminate.
Otherwise, fuse v with its sibling into a node of degree 3 and repeat (*) with the parent of v.
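The borrow and fusion cases can be sketched as follows for the 2-4+ tree (k = 2), where a leaf is removed only once it is empty. All names are hypothetical. One assumption to note: separators are rotated through the parent in the standard way, so the routing keys this sketch leaves behind can differ from those drawn on the slides while still directing searches correctly.

```python
from bisect import bisect_right

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                # items in a leaf, routing keys otherwise
        self.children = children or []  # empty list marks a leaf

def delete(root, key):
    """Delete key from a 2-4+ tree and return the (possibly new) root."""
    _delete(root, key)
    if len(root.children) == 1:         # a root with one child is discarded
        return root.children[0]
    return root

def _delete(node, key):
    if not node.children:               # leaf: remove the key if present
        if key in node.keys:
            node.keys.remove(key)
        return
    i = bisect_right(node.keys, key)
    child = node.children[i]
    _delete(child, key)
    if not child.children:
        if not child.keys:              # the leaf lost its only key: remove the leaf
            del node.children[i]
            del node.keys[max(i - 1, 0)]
    elif len(child.children) == 1:      # internal child was left with one child
        _rebalance(node, i)

def _rebalance(parent, i):
    v = parent.children[i]
    left = parent.children[i - 1] if i > 0 else None
    right = parent.children[i + 1] if i + 1 < len(parent.children) else None
    if left is not None and len(left.children) >= 3:      # borrow from the left sibling
        v.children.insert(0, left.children.pop())
        v.keys.insert(0, parent.keys[i - 1])
        parent.keys[i - 1] = left.keys.pop()
    elif right is not None and len(right.children) >= 3:  # borrow from the right sibling
        v.children.append(right.children.pop(0))
        v.keys.append(parent.keys[i])
        parent.keys[i] = right.keys.pop(0)
    elif left is not None:                                # fuse v into the left sibling
        left.children += v.children
        left.keys.append(parent.keys.pop(i - 1))
        del parent.children[i]
    else:                                                 # fuse v into the right sibling
        right.children[:0] = v.children
        right.keys.insert(0, parent.keys.pop(i))
        del parent.children[i]
```

A fusion removes a child from the parent, so the parent may itself end up with one child; that case is picked up one level higher when the recursion returns, which corresponds to repeating (*) with the parent of v.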