
Spring 2006. Copyright (c) All rights reserved, Leonard Wesley. B-Trees. CMPE126 Data Structures.

Dec 27, 2015


B-Trees

CMPE126 Data Structures


Why B-Trees?

Trees studied so far are for storing data in memory

B-Trees are better suited for storing data in memory AND on secondary storage.

Better suited for balancing data than some other tree ADTs.

Can store multiple keys with the same value, unlike some other trees, such as AVL trees.


The Problem With Unbalanced Trees

[Figure: an unbalanced binary tree with nodes 1, 2, 3, 4, 5]

The levels are sparsely filled, resulting in deep paths. This defeats the purpose of binary trees.


Possible Solutions To Unbalanced Trees

Periodically balance the tree.

Don’t let a tree get too unbalanced when inserting or deleting.

AVL Trees: Sometimes called HB[1] trees. Invented by Adel’son-Vel’skii and Landis in the early 1960s. (An in-memory solution … not ideally suited to secondary storage.)

B-Trees: Proposed by R. Bayer & E.M. McCreight (see pg. 542 Main & Savitch for ref.)


What Is A B-Tree?

It is a type of “multiway” tree.

It is NOT a binary search tree, nor is it a binary tree.

It provides a fast way to index into a multi-level set of nodes.

Each node in the B-Tree contains a sorted array of key values.


Motivation For Multiway Tree

Secondary storage (e.g., disks) is typically divided into equal-sized blocks (e.g., 512, 1024, …, 4096, …)

The basic I/O operation reads and writes blocks rather than single bytes at a time between secondary storage and memory.

Goal is to devise a multiway search tree that will minimize file access by exploiting disk reads.

Each access to secondary storage takes roughly as long as executing 250K instructions, depending on the speed of the CPU.
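To make the motivation concrete, the sketch below estimates how many children fit in one disk block and how many levels a tree then needs. The 4096-byte block, 8-byte keys and 8-byte child references are illustrative assumptions, not values from the slides, and max_children and min_levels are hypothetical helper names. min_levels assumes every node is completely full, so it is a best-case figure.

```cpp
#include <cstdint>

// Largest m such that a node with m child references and m - 1 keys
// fits in one block: (m - 1) * key_bytes + m * ptr_bytes <= block_bytes.
// All three sizes are assumed values chosen by the caller.
std::uint64_t max_children(std::uint64_t block_bytes,
                           std::uint64_t key_bytes,
                           std::uint64_t ptr_bytes) {
    return (block_bytes + key_bytes) / (key_bytes + ptr_bytes);
}

// Fewest levels an m-way tree needs to hold n_keys keys,
// assuming every node is full (m - 1 keys, m children).
unsigned min_levels(std::uint64_t n_keys, std::uint64_t m) {
    unsigned levels = 0;
    std::uint64_t capacity = 0;        // keys held by the levels counted so far
    std::uint64_t nodes_on_level = 1;  // the root level has one node
    while (capacity < n_keys) {
        capacity += nodes_on_level * (m - 1);
        nodes_on_level *= m;
        ++levels;
    }
    return levels;
}
```

Under these assumptions max_children(4096, 8, 8) is 256, and one million keys need min_levels(1000000, 256) = 3 levels, compared with 20 levels for a completely full binary tree (m = 2). Each level is one block read, which is why a wide node pays off.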


ISAM

ISAM = Indexed Sequential Access Method.


ISAM: The Idea

[Figure: a disk with its platter, a track on the platter, and a block (512, 1024, … bytes) on the track]


ISAM: Index & Keys

[Figure: a block on a track, labeled with its block # and block key, followed by its data]

• All data in the block will have keys ≤ the block key, or have keys ≥ the block key. Pick one inequality and stick with it.


ISAM: Block Index

Block Index (this index could be stored in memory)

Block #   Key
0         G
1         K
2         N
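A lookup through such a block index might be sketched as follows. The sketch assumes the "keys ≤ block key" convention and single-character keys as in the G, K, N example; IndexEntry and find_block are hypothetical names, not part of any ISAM API.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <vector>

// One row of the in-memory block index.
struct IndexEntry {
    std::size_t block_number;
    char block_key;  // assumed convention: every key in the block is <= block_key
};

// Returns the number of the only block that could hold `target`,
// or std::nullopt when target is larger than every block key.
// The index must be sorted by block_key.
std::optional<std::size_t> find_block(const std::vector<IndexEntry>& index,
                                      char target) {
    // First entry whose block_key is not less than target.
    auto it = std::lower_bound(
        index.begin(), index.end(), target,
        [](const IndexEntry& entry, char key) { return entry.block_key < key; });
    if (it == index.end()) return std::nullopt;
    return it->block_number;
}
```

With the index G, K, N, a search for 'H' is directed to block 1, so only that one block has to be read from disk.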


ISAM: Disk Index

Disk Index (this index could also be stored in memory)

Disk #   Key
0        G
1        V
2        X

[Figure: a row of disks, Disk 0 • • • Disk n]


ISAM: Insertion/Deletion

Insertion:
  Might involve moving data across blocks
  Can leave extra space when inserting into a block

Deletion:
  Might involve contracting data across blocks
  Need not contract every time, i.e., leave some space for possible future expansion


Multiway Search Tree (order m)

A generalization of binary search trees.

Each node has at most m children.

If k <= m is the number of children, then the node has exactly k-1 keys.

The tree is ordered.


Multiway Search Tree (cont.)

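[Figure: a node in a multiway tree holding keys k1 k2 k3 k4 k5. Subtrees hang between the keys: keys < k1 on the far left, k2 < keys < k3 between k2 and k3, and k5 < keys on the far right.]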


Definition Of A B-Tree

A B-Tree of order m is an m-way tree such that

All leaves are on the same level

All internal nodes except the root node are constrained to have at most m non-empty children and at least m/2 (rounded up) non-empty children.

The root node has at most m non-empty children


Three Important Properties Of B-Trees

All nodes in the B-Tree are at least half-full (root node is an exception at times)

The B-tree is always balanced. That is, an identical number of nodes must be read into memory in order to locate all keys at any given level in the tree.

A well organized B-Tree will have just a small number of levels relative to the number of nodes.


Where Are B-Trees Used?

B-Trees are commonly found in database and file systems.

B-Trees allow logarithmic time insertions and deletions.

They generally grow from the bottom upwards as elements are inserted, whereas most binary trees grow downward.


The Six Rules Governing B-Trees

R1: A B-Tree might be empty. If it is not, then each node holds at least some specified MINIMUM number of entries (the root node may hold fewer).

R2: The MAXIMUM number of entries is twice the MINIMUM.


The Six Rules Governing B-Trees (cont)

R3: The entries of each B-Tree node are stored in a partially filled array, sorted from the smallest entry (at index 0) to the largest entry (at the final position of the array).

[Figure: a B-Tree node drawn as an array with indices 0 to n-1, holding the sorted entries h, k, k*, n, . . .]

The data in such an array can be stored in a block on a disk.

* B-Trees can support duplicate keys


The Six Rules Governing B-Trees (cont)

R4: The number of subtrees below a non-leaf node is always one more than the number of entries in the node.

4 entries in a non-leaf node:

  index:    0    1    2    3
  entry:   45   55   67   82

5 subtrees:

  subtree 0: Keys < 45
  subtree 1: Keys > 45 & < 55
  subtree 2: Keys > 55 & < 67
  subtree 3: Keys > 67 & < 82
  subtree 4: Keys > 82


The Six Rules Governing B-Trees (cont)

R5: For any non-leaf node:
  An entry at index i is greater than all the entries in subtree i of the node, and
  An entry at index i is less than all the entries in subtree i+1 of the node.

R6: Every leaf node in a B-Tree has the same depth (i.e., all leaves are at the same level)
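The six rules can be turned into a checker, sketched below. This is an illustration rather than the textbook's class: Node, check and is_valid_btree are hypothetical names, children are stored by value in a std::vector instead of through pointers, keys are assumed to be distinct ints, and MINIMUM is fixed at 1.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t MINIMUM = 1;            // R1 (assumed value)
constexpr std::size_t MAXIMUM = 2 * MINIMUM;  // R2

struct Node {
    std::vector<int> data;     // R3: entries, smallest first
    std::vector<Node> subset;  // children; empty for a leaf
};

// Checks the subtree rooted at n. Every entry must lie strictly between
// *lo and *hi (a null pointer means "no bound"). leaf_depth remembers the
// depth of the first leaf seen; -1 means no leaf has been seen yet.
bool check(const Node& n, bool is_root, const int* lo, const int* hi,
           int depth, int& leaf_depth) {
    if (n.data.size() > MAXIMUM) return false;                        // R2
    if (!is_root && n.data.size() < MINIMUM) return false;            // R1
    if (is_root && n.data.empty() && !n.subset.empty()) return false;
    for (std::size_t i = 0; i < n.data.size(); ++i) {
        if (i > 0 && n.data[i - 1] >= n.data[i]) return false;        // R3
        if (lo && n.data[i] <= *lo) return false;                     // R5
        if (hi && n.data[i] >= *hi) return false;                     // R5
    }
    if (n.subset.empty()) {                                           // R6
        if (leaf_depth < 0) leaf_depth = depth;
        return leaf_depth == depth;
    }
    if (n.subset.size() != n.data.size() + 1) return false;           // R4
    for (std::size_t i = 0; i < n.subset.size(); ++i) {
        const int* child_lo = (i == 0) ? lo : &n.data[i - 1];
        const int* child_hi = (i == n.data.size()) ? hi : &n.data[i];
        if (!check(n.subset[i], false, child_lo, child_hi, depth + 1, leaf_depth))
            return false;
    }
    return true;
}

bool is_valid_btree(const Node& root) {
    int leaf_depth = -1;
    return check(root, true, nullptr, nullptr, 0, leaf_depth);
}
```

Such a checker is mainly useful in tests: run it after every insertion or deletion to confirm that the repair steps restored all six rules.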


Example B-Tree

MIN = 1, MAX = 2

                         [30 80]

        [20]             [50 60]               [90]

    [10]  [25]     [35 40]  [55]  [72]     [82 85]  [95]


Searching For A Target In B-Trees

Start with root node and search for target in the array at that node. If found, then done and return success.

If the target is not in the root and there are no children, then also done, but return failure.

If the target is not in the root node and there are children, then the target, if it exists, can only be in one subtree.

Search key_array from left to right, up to data_count, and traverse subtree i for the first i for which target < key_array[i]. If there is no such i, traverse the last subtree.

Repeat the process with the root of that subtree as the new root node.
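The search steps above can be sketched as a loop. This is an illustrative version, not the textbook's code: Node and contains are hypothetical names, data plays the role of key_array (with data.size() as data_count), children are stored by value, and keys are assumed to be distinct ints.

```cpp
#include <cstddef>
#include <vector>

struct Node {
    std::vector<int> data;     // sorted keys of this node
    std::vector<Node> subset;  // children; empty for a leaf
};

bool contains(const Node& root, int target) {
    const Node* node = &root;
    while (true) {
        // First index i whose key is not less than the target.
        std::size_t i = 0;
        while (i < node->data.size() && node->data[i] < target) ++i;
        if (i < node->data.size() && node->data[i] == target) return true;
        if (node->subset.empty()) return false;  // no children: not in the tree
        node = &node->subset[i];                 // only subtree that can hold it
    }
}
```

On the Example B-Tree (root 30 80), a search for 55 visits three nodes: the root, the node 50 60, and the leaf 55. A search for 70 also visits three nodes before failing in the leaf 72.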


Inserting Into A B-Tree

1. Add the new key to the appropriate leaf node.

2. Overflow?
  No: done.
  Yes: split the node into two nodes on the same level, and promote the median key (then repeat the overflow check on the parent).


Loose Insertion (pg. 551 Main & Savitch, one of several ways)

MIN = 1, MAX = 2

Before:

          [6 17]
    [4]   [12]   [19 22]

Insert 18:

          [6 17]
    [4]   [12]   [18 19 22]   <- Excess entry (problem child)


Fixing A Loose Insertion

MIN = 1, MAX = 2

Split problem child, and promote middle key to parent node. Still have excess.

            [6, 17, 19]
    [4]   [12]   [18]   [22]

Fix excess by repeating the process. Split node and promote middle key to new root node.

                 [17]
         [6]             [19]
     [4]   [12]      [18]   [22]


Pseudo Code For Loose Insert

1. Make a local variable, i, equal to the first index such that data[i] is not less than the new entry to insert. If there is no such index, then set i equal to data_count, indicating that all of the entries are less than the new entry.

2. If (we found the new entry at data[i])
     a) Return false with no further work (since the new entry is already in the tree)
   else if (the root has no children)
     b) Add the new entry to the root at data[i]. The original entries at data[i] and afterwards must be shifted right to make room for the new entry. Return true to indicate that we added the entry.
   else
     c) Save the value from this recursive call: subset[i]->loose_insert(entry); Then check whether the root of subset[i] now has an excess entry; if so, then fix that problem. Return the saved value from the recursive call.
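A compact C++ sketch of these steps follows. It is not the textbook's implementation: Node, fix_excess and insert are hypothetical names, children are stored by value in a std::vector (so the recursive call is written loose_insert(node.subset[i], entry) rather than subset[i]->loose_insert(entry)), keys are ints, and MAXIMUM is fixed at 2 to match the slide example.

```cpp
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

constexpr std::size_t MAXIMUM = 2;  // assumed, as in the slide example

struct Node {
    std::vector<int> data;     // sorted entries
    std::vector<Node> subset;  // children; empty for a leaf
};

// parent.subset[i] holds MAXIMUM + 1 entries: split it in two and
// promote its middle entry into parent.
void fix_excess(Node& parent, std::size_t i) {
    Node& child = parent.subset[i];
    const std::size_t mid = child.data.size() / 2;
    const int median = child.data[mid];
    Node right;
    right.data.assign(child.data.begin() + mid + 1, child.data.end());
    child.data.erase(child.data.begin() + mid, child.data.end());
    if (!child.subset.empty()) {  // hand the right half of the children over
        right.subset.assign(std::make_move_iterator(child.subset.begin() + mid + 1),
                            std::make_move_iterator(child.subset.end()));
        child.subset.erase(child.subset.begin() + mid + 1, child.subset.end());
    }
    parent.data.insert(parent.data.begin() + i, median);
    parent.subset.insert(parent.subset.begin() + i + 1, std::move(right));
}

bool loose_insert(Node& node, int entry) {
    std::size_t i = 0;                                          // step 1
    while (i < node.data.size() && node.data[i] < entry) ++i;
    if (i < node.data.size() && node.data[i] == entry) return false;  // 2a
    if (node.subset.empty()) {                                  // 2b
        node.data.insert(node.data.begin() + i, entry);
        return true;
    }
    const bool added = loose_insert(node.subset[i], entry);     // 2c
    if (node.subset[i].data.size() > MAXIMUM) fix_excess(node, i);
    return added;
}

// Loose insert, then repair an excess in the root by growing a new root.
bool insert(Node& root, int entry) {
    if (!loose_insert(root, entry)) return false;
    if (root.data.size() > MAXIMUM) {
        Node old_root = std::move(root);
        root = Node{};
        root.subset.push_back(std::move(old_root));
        fix_excess(root, 0);
    }
    return true;
}
```

Inserting 18 into the tree with root 6 17 and leaves 4, 12, 19 22 first overfills the leaf, then the root, and ends with the new root 17 over the nodes 6 and 19, as in the Fixing A Loose Insertion slide. The tree gains a level only at the top, which is why B-Trees grow upward.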


Insert In Class Exercise

MIN = 1, MAX = 2

          [6 17]
    [4]   [12]   [19 22]

Insert 5, then insert 7.


Deleting From A B-Tree


Deleting From A B-Tree Example #1

Min = 1, Max = 2

          [6, 17]
    [4]   [12]   [19, 22]

Delete 17:

            [6]
    [4]   [12]   [19, 22]

Violates B-Tree Rule 4: # subtrees = # keys + 1


Solution To Example #1

Min = 1, Max = 2

          [6, 19]
    [4]   [12]   [22]


Deleting From A B-Tree Example #2

Min = 2, Max = 4

               [6, 17]
    [2, 4]   [10, 12]   [19, 22]

Delete 22:

               [6, 17]
    [2, 4]   [10, 12]   [19]

Violates the B-Tree property that # keys must not be less than MIN


Solution #1 For Example #2

Case 3 Solution: combine subset[i] with subset[i-1] if excess entries in siblings are not available (pg. 561 Main & Savitch)

Min = 2, Max = 4

Before:

               [6, 17]
    [2, 4]   [10, 12]   [19]

After:

              [6]
    [2, 4]   [10, 12, 17, 19]


Solution #2 To Fix A Shortage

Case 1: Transfer an extra entry from subset[i-1] to subset[i] (pg 560 Main & Savitch)

Min = 2, Max = 4

Before:

                 [6, 17]
    [2, 4]   [10, 12, 15]   [19]

After:

               [6, 15]
    [2, 4]   [10, 12]   [17, 19]


Solution #3 To Fix A Shortage

Case 2: Transfer an extra entry from subset[i+1] (pg 561 Main & Savitch)

Before:

              [6, 17]
    [2, 4]   [10]   [19, 21, 22]

After:

               [6, 19]
    [2, 4]   [10, 17]   [21, 22]


Deleting From A B-Tree (Loose Erase)

1. Make a local variable, i, equal to the first index such that data[i] is not less than the target to delete. If there is no such index, then set i equal to data_count, indicating that all of the entries are less than the target.

2. Deal with one of the following four possibilities:
  a. Root has no children, and we did not find the target (i.e., nothing to do).
  b. Root has no children, and we found the target. Just remove target.
  c. Root has children, did not find target in root. Make recursive call to search subset[i].
  d. Root has children, found target in root. Remove largest from subset[i], insert into data[i].

Elaborate on 2c and 2d on following slides …


Delete From B-Tree: Elaborate 2c

Target not found in root node, but target might be in subset[i]. Make recursive call

subset[i]->loose_erase(target)

This will remove the target from subset[i] if it is in subset[i]. If so, then subset[i] might have < MIN entries. If so, then it needs to be fixed by calling, on the current node (the parent of subset[i]),

fix_shortage(i);

Will discuss later


Delete From B-Tree: Elaborate 2d

Target is found in root node, but cannot be removed directly because there are children.

subset[i]->loose_erase(target)

Go to subset[i] and remove the largest item in the subset. Create a copy of this largest item and insert it in data[i] (which contains the target). In effect this removes the target. However, removing the largest can cause a shortage. If so, call, on the current node (the parent of subset[i]),

fix_shortage(i);

Will discuss NOW!!


Fix Shortage

Case 1: If subset[i-1] has extra entries, then transfer an entry to subset[i] (pg 560 Main & Savitch)
  Transfer data[i-1] (i.e., 17) down to the front of subset[i]->data. Shift over as necessary & update data_count.
  Transfer the final item of subset[i-1] (i.e., 15) up to replace data[i-1], and update data_count.
  If subset[i-1] has children, transfer the final child of subset[i-1] over to the front of subset[i] … update data_count.

Before:

                 [6, 17]
    [2, 4]   [10, 12, 15]   [19]

After:

               [6, 15]
    [2, 4]   [10, 12]   [17, 19]


Fix Shortage (cont.)

Case 2: If subset[i+1] has extra entries, then transfer an entry to subset[i] (pg 561 Main & Savitch). Similar to Case 1.

Before:

              [6, 17]
    [2, 4]   [10]   [19, 21, 22]

After:

               [6, 19]
    [2, 4]   [10, 17]   [21, 22]


Fix Shortage (cont.)

Case 3: Combine subset[i] with subset[i-1] (pg 561 Main & Savitch)
  Applies if subset[i-1] is present (i.e., i > 0) but subset[i-1] only has the minimum # items/keys (i.e., no excess keys/items).
  Transfer data[i-1] down to the end of subset[i-1]->data … (see a pg 562)
  Transfer all of the items and children from subset[i] to the end of subset[i-1] … (see b pg 562)
  Delete the node subset[i] and shift subset[i+1], subset[i+2], and so on left … (see c pg 562)

Before (after deleting 22):

               [6, 17]
    [2, 4]   [10, 12]   [19]

After:

              [6]
    [2, 4]   [10, 12, 17, 19]
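The loose-erase steps and the three fix-shortage cases can be sketched together as below. This is an illustration under stated assumptions, not the textbook's code: Node, remove_biggest and erase are hypothetical names, children are stored by value in a std::vector, keys are distinct ints, and MINIMUM is fixed at 2 as in the examples. When i is 0 there is no subset[i-1], so the sketch merges with subset[i+1] instead; that mirror case is an addition not listed on the slides.

```cpp
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

constexpr std::size_t MINIMUM = 2;  // assumed, as in the slide examples

struct Node {
    std::vector<int> data;     // sorted entries
    std::vector<Node> subset;  // children; empty for a leaf
};

// Repairs parent.subset[i], which holds MINIMUM - 1 entries.
void fix_shortage(Node& parent, std::size_t i) {
    auto& kids = parent.subset;
    if (i > 0 && kids[i - 1].data.size() > MINIMUM) {  // Case 1: borrow from left
        Node& left = kids[i - 1];
        Node& cur = kids[i];
        cur.data.insert(cur.data.begin(), parent.data[i - 1]);
        parent.data[i - 1] = left.data.back();
        left.data.pop_back();
        if (!left.subset.empty()) {
            cur.subset.insert(cur.subset.begin(), std::move(left.subset.back()));
            left.subset.pop_back();
        }
    } else if (i + 1 < kids.size() && kids[i + 1].data.size() > MINIMUM) {
        Node& right = kids[i + 1];                     // Case 2: borrow from right
        Node& cur = kids[i];
        cur.data.push_back(parent.data[i]);
        parent.data[i] = right.data.front();
        right.data.erase(right.data.begin());
        if (!right.subset.empty()) {
            cur.subset.push_back(std::move(right.subset.front()));
            right.subset.erase(right.subset.begin());
        }
    } else {                                           // Case 3: combine two siblings
        const std::size_t l = (i > 0) ? i - 1 : i;     // left node of the pair
        Node& left = kids[l];
        Node& right = kids[l + 1];
        left.data.push_back(parent.data[l]);
        left.data.insert(left.data.end(), right.data.begin(), right.data.end());
        left.subset.insert(left.subset.end(),
                           std::make_move_iterator(right.subset.begin()),
                           std::make_move_iterator(right.subset.end()));
        parent.data.erase(parent.data.begin() + l);
        kids.erase(kids.begin() + l + 1);
    }
}

// Removes and returns the largest entry in the subtree rooted at node.
int remove_biggest(Node& node) {
    if (node.subset.empty()) {
        const int biggest = node.data.back();
        node.data.pop_back();
        return biggest;
    }
    const std::size_t last = node.subset.size() - 1;
    const int biggest = remove_biggest(node.subset[last]);
    if (node.subset[last].data.size() < MINIMUM) fix_shortage(node, last);
    return biggest;
}

bool loose_erase(Node& node, int target) {
    std::size_t i = 0;                                          // step 1
    while (i < node.data.size() && node.data[i] < target) ++i;
    const bool found = i < node.data.size() && node.data[i] == target;
    if (node.subset.empty()) {
        if (!found) return false;                               // 2a
        node.data.erase(node.data.begin() + i);                 // 2b
        return true;
    }
    bool removed = true;
    if (found) node.data[i] = remove_biggest(node.subset[i]);   // 2d
    else removed = loose_erase(node.subset[i], target);         // 2c
    if (node.subset[i].data.size() < MINIMUM) fix_shortage(node, i);
    return removed;
}

// Loose erase, then shrink the tree if the root was left with no entries.
bool erase(Node& root, int target) {
    if (!loose_erase(root, target)) return false;
    if (root.data.empty() && root.subset.size() == 1) {
        Node only_child = std::move(root.subset[0]);
        root = std::move(only_child);
    }
    return true;
}
```

Erasing 22 from the tree with root 6, 17 and leaves 2, 4 / 10, 12 / 19, 22 takes the Case 3 branch and leaves root 6 over 2, 4 and 10, 12, 17, 19. With a middle leaf of 10, 12, 15 the same deletion takes the Case 1 branch instead and leaves root 6, 15.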


In Class Delete Example #2

Go through the Loose Erase section in Main & Savitch, pg. 558.