Copyright © 2003-2006 Curt Hill B Trees Disk based tree index

Dec 31, 2015


Charles Pitts

B Trees: Disk based tree index


Introduction

• A BTree is a multiway tree that usually resides on disk
  – Most trees studied in CS are binary and reside in memory

• The basis of many ISAM or VSAM implementations, as well as database clustered and unclustered indices

• The B is commonly said to stand for Bayer, who worked out the original scheme
  – It does not stand for binary!


History
• Bayer and McCreight published in 1972
• IBM made the BTree the basis of its Virtual Storage Access Method (VSAM) shortly thereafter
  – Later enlarged to replace the Indexed Sequential Access Method (ISAM)
• Since then the BTree has been the ISAM of choice


Older versions of ISAM
• ISAM files existed before BTrees
  – Allowed both random and sequential access
• Had some problems
  – Could not grow gracefully
    • Index tree was fixed
  – Had overflow areas
  – When performance degraded, the file had to be rebuilt


Older ISAM Problems

[Figure: ISAM file with an empty block and an overflow block]


Building ISAM Files
• Build the entire sequential file
• Create each page with some free space
• Build the index over the sequential file
• The index, once built, is never modified
  – Deleting a key removes it from the leaf, but it cannot be removed from the index
  – Too many inserts cause use of an overflow area
• Must be rebuilt from scratch
  – Unavailable during the rebuild
  – The build process could be lengthy


Characteristics
• Always balanced
• All leaves are at the same level
• Do not waste space: all nodes except the root are at least half full
• Insertion and deletion do not cause rewriting of large parts of the tree
• Concurrency is well supported with minimal locking of pages


Variations
• The standard BTree will be discussed first
  – Seldom actually used
• The B+Tree and the B*Tree are the common variants
  – The modifications for these will also be discussed


Terminology
• The three terms that need to be understood are pointers, records and pages
• A pointer is a page identifier
  – The address of a page in any form
  – Usually compact
• The record is the item to be stored
  – Always includes key and data
  – The key may have any form
• The page is the block or page of the file system
  – Must be larger than the record


BTree Node
• BTrees are a set of nodes of three types
• The root node is the beginning of the tree
  – The rest of the tree consists of its descendants
• A leaf node has no descendants
• Interior nodes have both ancestors and descendants


A tree

• Root: CD KL PB TS
• Leaves, left to right: AQ BJ CA | EK GK | KQ MA MZ NK | PF PP QA ST | TZ WW


Nodes
• Always one more pointer than records
• Always has between N and 2N records
  – Except the root
• N is chosen based upon
  – Page size
  – How many records and pointers fit
• The previous tree is a 4-5 tree
  – At most 4 records
  – At most 5 pointers
  – N = 2


Examples and Reality
• This presentation will show trees with small Ns: 1 or 2
  – These diagram nicely in PowerPoint
• Real trees have large Ns: 50 – 100
• The N determines fan-out
  – High fan-out is good
  – If fan-out is 2, then 50% of the tree is eliminated from a search at each level
  – If fan-out is 100, then 99% of the tree is eliminated from a search at each level
  – High fan-out makes a flat tree


Example numbers
• Suppose that a BTree has an average fan-out of 50
• Suppose that the BTree has 1 million entries
• 1 disk access gets the root
• 3 more disk accesses obtain the leaf
• A sequential search requires an average of 10,000 disk accesses
• Even a binary search requires 20 disk accesses
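These figures can be checked with a little arithmetic. The sketch below is a rough estimate, not a model of any real system: it assumes, as the 10,000 figure implies, that each page of the sequential file also holds about 50 entries, and it takes the tree height to be the smallest number of levels whose capacity (fan-out raised to that power) reaches the entry count. The helper name `levels_needed` is hypothetical.

```python
import math


def levels_needed(entries: int, fan_out: int) -> int:
    """Smallest number of levels h with fan_out ** h >= entries (rough tree height)."""
    levels, capacity = 0, 1
    while capacity < entries:
        capacity *= fan_out
        levels += 1
    return levels


ENTRIES = 1_000_000
FAN_OUT = 50  # average fan-out from the example

btree_reads = levels_needed(ENTRIES, FAN_OUT)  # the root plus 3 more levels
pages = ENTRIES // FAN_OUT                      # assumption: 50 entries per file page
sequential_reads = pages // 2                   # on average half the file is read
binary_reads = math.ceil(math.log2(ENTRIES))    # each probe halves the remaining range

print(btree_reads, sequential_reads, binary_reads)  # 4 10000 20
```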


Disk and Memory
• Although the BTree will require fewer disk accesses, it will require more comparisons
• In the previous example the BTree will do 75 comparisons, while the binary search does 20
• The important delay is disk access speed
• In the time of one disk page retrieval, thousands or millions of comparisons could be done


Searching a BTree
1. Start at the root
2. Is this key in this node?
3. Yes – stop, you are done
4. No – is this a leaf?
5. Yes – this key does not exist; stop
6. No – find the pointer that is between the two surrounding values
7. Fetch this node and go to step 2
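The steps above can be sketched in Python. This is a minimal in-memory model rather than a disk implementation: the `Node` class and `search` function are hypothetical names, a Python list stands in for each page, and every node visited is counted as one disk read. The standard `bisect_left` finds the pointer that lies between the two surrounding values.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list[str]                                          # sorted keys held in this page
    children: list["Node"] = field(default_factory=list)     # empty for a leaf


def search(root: Node, key: str) -> tuple[bool, int]:
    """Return (found, number of nodes read), following steps 1-7."""
    node, reads = root, 1                                    # 1. start at the root
    while True:
        i = bisect_left(node.keys, key)                      # index of the first key >= search key
        if i < len(node.keys) and node.keys[i] == key:       # 2-3. key is in this node
            return True, reads
        if not node.children:                                # 4-5. a leaf: key does not exist
            return False, reads
        node = node.children[i]                              # 6-7. pointer between the two values
        reads += 1


# The example tree (a 4-5 tree, N = 2)
tree = Node(
    ["CD", "KL", "PB", "TS"],
    [
        Node(["AQ", "BJ", "CA"]),
        Node(["EK", "GK"]),
        Node(["KQ", "MA", "MZ", "NK"]),
        Node(["PF", "PP", "QA", "ST"]),
        Node(["TZ", "WW"]),
    ],
)

print(search(tree, "MZ"))  # (True, 2)
```

Searching each node with a binary search rather than a sequential scan is a choice made here for brevity; either works once the page is in memory.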


Look for MZ
• Search the tree above for the key MZ


Search root
• Compare MZ with the root keys CD KL PB TS


Not in root
• MZ falls between KL and PB


Access next node
• Fetch the node addressed by the pointer between KL and PB


Search next node
• Compare MZ with the keys KQ MA MZ NK


Found
• MZ is in this node
• Since this is a leaf, the search must terminate here in any case


Insertion and Deletion
• The means by which new data is inserted and old data is deleted is crucial to maintaining a BTree
• These techniques were developed by Bayer, and their effectiveness caused this form of tree to catch on
• The tree is never reorganized
  – Reorganization was a disadvantage of older ISAMs
  – Insertion and deletion do all the work


Insertion

1. Find the leaf that should contain the inserted value
2. Insert the record
3. Does the node have 2N or fewer records?
4. Yes – stop
5. No – split the node
   1. Make two nodes of N records: the first N records and the last N records
   2. Promote the middle item into the ancestor; if the root splits, the middle item becomes a new root
   3. Go back to step 2, inserting the promoted item into the ancestor
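In the same kind of in-memory model, the insertion steps can be sketched recursively. The sketch assumes integer keys and at most 2N keys per node; `Node`, `insert` and `_insert` are hypothetical names. A split hands the promoted middle item back to the caller, which inserts it into the ancestor (step 5.3), and a split of the root creates a new root.

```python
from bisect import bisect_left, insort
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    keys: list[int] = field(default_factory=list)
    children: list["Node"] = field(default_factory=list)  # empty for a leaf


def _insert(node: Node, key: int, n: int) -> Optional[tuple[int, Node]]:
    """Insert key below node; return (promoted item, new right node) if node split."""
    if not node.children:
        insort(node.keys, key)                           # steps 1-2: insert into the leaf
    else:
        i = bisect_left(node.keys, key)
        split = _insert(node.children[i], key, n)
        if split is not None:                            # the child split: take the promoted item
            middle, right = split
            node.keys.insert(i, middle)
            node.children.insert(i + 1, right)
    if len(node.keys) <= 2 * n:                          # steps 3-4: 2N or fewer, stop
        return None
    # Step 5: 2N + 1 keys. Keep the first N, promote the middle, move the last N.
    middle = node.keys[n]
    right = Node(node.keys[n + 1:], node.children[n + 1:])
    node.keys = node.keys[:n]
    node.children = node.children[:n + 1]
    return middle, right


def insert(root: Node, key: int, n: int) -> Node:
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key, n)
    if split is None:
        return root
    middle, right = split
    return Node([middle], [root, right])                 # root split: the tree grows a level


root = Node()
for k in [20, 40, 30, 15, 35, 7, 26, 18]:                # the example sequence, N = 1
    root = insert(root, k, n=1)
```

After the loop the root holds 20, its children hold 15 and 30, and the leaves are 7, 18, 26 and 35 40, the same tree the worked example reaches after inserting 18.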


Example Insertion
• Suppose a 2-3 tree
  – N = 1
  – Key is a simple integer
• Consider the following insertions:
  – 20, 40, 30, 15, 35, 7, 26, 18


Example Insertions – First Five

• Start with root: 20 40
• Insert 30: 20 30 40
  – Node is over full – split
  – Middle item is promoted
  – Rest is divided into first and last nodes
  – Result: root 30; leaves 20 | 40
• Inserting 15 and 35 is painless
  – Result: root 30; leaves 15 20 | 35 40


Example Insertion – Next one

• Start with: root 30; leaves 15 20 | 35 40
• Insert 7: root 30; leaves 7 15 20 | 35 40
  – Node is over full – split
  – Middle item is promoted
  – Rest is divided into first and last nodes
  – Result: root 15 30; leaves 7 | 20 | 35 40


Example Insertion – 26 and 18

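• Inserting 26 is painless
  – Result: root 15 30; leaves 7 | 20 26 | 35 40
• Inserting 18 splits the node, which splits the root
  – Creates a new root
  – Tree is one level deeper
  – Result: root 20; interior nodes 15 and 30; leaves 7 | 18 under 15 and 26 | 35 40 under 30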


Insertion Comments
• Only the path between the root and the node containing the inserted record may be modified
• This did not appear as significant in the example as it actually is when N is large
• Usually most of the tree is undisturbed
• The root of the BTree that is currently being accessed is usually in memory


Deletion
1. Find the leaf that contains the value
2. Delete the record
3. Does the node have N or more records?
4. Yes – stop
5. No – merge the nodes
   1. Remove the separating item from the ancestor
   2. Pull it into the node that is short, together with the contents of a neighboring node
   3. This may reduce the height of the tree by one level
   4. The ancestor has now lost an item: go back to step 3 for the ancestor
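A sketch of deletion in the same style follows. It assumes the key being deleted is stored in a leaf, the only case the steps above describe; deleting a key held in an interior node would need extra handling, which this sketch refuses rather than attempts. Following the advice to borrow before merging (discussed below), it rotates an item from a neighbor with more than N items when one exists and merges only otherwise. All names are hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list[int] = field(default_factory=list)
    children: list["Node"] = field(default_factory=list)  # empty for a leaf


def _fix_short_child(parent: Node, i: int, n: int) -> None:
    """Child i has N - 1 items: borrow from a neighbor, else merge with one."""
    child = parent.children[i]
    left = parent.children[i - 1] if i > 0 else None
    right = parent.children[i + 1] if i + 1 < len(parent.children) else None
    if left is not None and len(left.keys) > n:           # borrow through the ancestor
        child.keys.insert(0, parent.keys[i - 1])
        parent.keys[i - 1] = left.keys.pop()
        if left.children:
            child.children.insert(0, left.children.pop())
    elif right is not None and len(right.keys) > n:
        child.keys.append(parent.keys[i])
        parent.keys[i] = right.keys.pop(0)
        if right.children:
            child.children.append(right.children.pop(0))
    else:                                                 # merge: (N - 1) + 1 + N = 2N items
        j = i - 1 if left is not None else i              # left member of the merged pair
        a, b = parent.children[j], parent.children[j + 1]
        a.keys = a.keys + [parent.keys.pop(j)] + b.keys   # pull the ancestor's item down
        a.children = a.children + b.children
        del parent.children[j + 1]


def _delete(node: Node, key: int, n: int) -> None:
    i = bisect_left(node.keys, key)
    found = i < len(node.keys) and node.keys[i] == key
    if not node.children:                                 # steps 1-2: delete from the leaf
        if not found:
            raise KeyError(key)
        node.keys.pop(i)
        return
    if found:
        raise NotImplementedError("this sketch only deletes keys stored in leaves")
    _delete(node.children[i], key, n)
    if len(node.children[i].keys) < n:                    # step 3: fewer than N items
        _fix_short_child(node, i, n)


def delete(root: Node, key: int, n: int) -> Node:
    """Delete key and return the (possibly new) root."""
    _delete(root, key, n)
    if not root.keys and root.children:                   # root emptied: one level fewer
        return root.children[0]
    return root


root = Node([30], [Node([15, 20]), Node([35, 40])])       # the example tree, N = 1
for k in (20, 40, 35):
    root = delete(root, k, n=1)
print(root.keys)  # [15, 30]
```

Deleting 20 and 40 only shrinks leaves; deleting 35 then merges 15, 30 and the empty leaf into a single node, and the empty root is dropped.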


Deletion Example

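• Start with: root 30; leaves 15 20 | 35 40
• Deleting 20 and 40 is painless
  – Result: root 30; leaves 15 | 35
• Deleting any one of the remaining keys merges the three nodes into one
  – For example, deleting 35 leaves the single node 15 30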


Splitting/Merging
• Relatively expensive
  – Wish to avoid if we can
• Before splitting / merging
  – Look for a neighbor to carry or borrow an item
  – Do not split unless both neighbors are full
  – Do not merge unless both neighbors are at minimum


Carrying Example

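• Inserting 18 without a split
  – Start with: root 15 30; leaves 7 | 20 26 | 35 40
  – 18 belongs in the leaf 20 26, which would become over full
  – Rotate 18 into the root and 15 down into the left neighbor
  – Result: root 18 30; leaves 7 15 | 20 26 | 35 40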
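The carrying rotation can be sketched as a single function. `carry_left` is a hypothetical helper that assumes the over-full node is child `i` of `parent` and looks only at its left neighbor; a fuller version would also try the right neighbor before giving up and splitting.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list[int] = field(default_factory=list)
    children: list["Node"] = field(default_factory=list)  # empty for a leaf


def carry_left(parent: Node, i: int, n: int) -> bool:
    """Relieve over-full child i by rotating one item into its left neighbor.

    Returns False, changing nothing, when there is no left neighbor or it is
    already full; the caller would then split instead.
    """
    if i == 0 or len(parent.children[i - 1].keys) >= 2 * n:
        return False
    child, left = parent.children[i], parent.children[i - 1]
    left.keys.append(parent.keys[i - 1])        # the separator moves down to the neighbor
    parent.keys[i - 1] = child.keys.pop(0)      # the smallest item moves up to the ancestor
    if child.children:                          # interior nodes carry a subtree along
        left.children.append(child.children.pop(0))
    return True


# 18 has just been placed in the leaf 20 26, which is now over full (N = 1)
root = Node([15, 30], [Node([7]), Node([18, 20, 26]), Node([35, 40])])
carry_left(root, 1, n=1)
print(root.keys, [c.keys for c in root.children])
# [18, 30] [[7, 15], [20, 26], [35, 40]]
```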


Borrowing Example

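• Deleting 18
  – Start with: root 20; interior nodes 15 and 30; leaves 7 9 | 18 | 22 26 | 35 40
  – Removing 18 leaves its node empty
  – Rotate 9 and 15: 9 moves up into the ancestor, 15 moves down into the empty node
  – Result: root 20; interior nodes 9 and 30; leaves 7 | 15 | 22 26 | 35 40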


Utilization
• Each node of a BTree except the root is between ½ full and full
  – Expected utilization is ¾ full when the space occupied by the root is negligible
• Carrying records to adjacent nodes tends to increase this
• Borrowing tends to decrease it
• Stable trees may also be compacted
  – Look for adjacent nodes that could be merged


BTree Variants: B*Tree
• B*Tree
  – A BTree with only keys in the interior and root nodes
  – Data is all in leaves
  – Since the key is generally much smaller than the data, this greatly increases fan-out
  – Two different kinds of pages
    • Leaves and root/interior nodes


B*Tree

• Root: 20
• Interior nodes: 15 | 30
• Leaves, left to right: 7 15 | 18 20 | 26 30 | 35 40

Interior node values are duplicated in leaves. A leaf will usually hold fewer items than interior nodes.


BTree Variants: B+Tree
• A B*Tree
  – Keys only in ancestral nodes
  – Data only in leaves
• Connect the leaves into a linked list
• Foundation of most ISAMs
• Follow the leaves for sequential access
  – Not slower than a normal heap file
• Search by key for random access
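Sequential access along the leaf list can be sketched as follows. The `Leaf` class, `scan` and `chain` are hypothetical names. The sketch assumes the caller has already used the key search to locate the leaf where a range begins; for simplicity it starts at the first leaf of the B+Tree example that follows and walks the list pointers.

```python
from collections.abc import Iterator
from dataclasses import dataclass
from typing import Optional


@dataclass
class Leaf:
    keys: list[int]
    next: Optional["Leaf"] = None  # the list pointer to the next leaf


def scan(start: Leaf, low: int, high: int) -> Iterator[int]:
    """Yield every key in [low, high], following list pointers from `start`."""
    leaf: Optional[Leaf] = start
    while leaf is not None:
        for key in leaf.keys:
            if key > high:
                return                      # keys are sorted, so nothing later qualifies
            if key >= low:
                yield key
        leaf = leaf.next


def chain(groups: list[list[int]]) -> Leaf:
    """Link already-sorted groups of keys into leaves and return the first leaf."""
    first: Optional[Leaf] = None
    for keys in reversed(groups):
        first = Leaf(keys, first)
    if first is None:
        raise ValueError("need at least one leaf")
    return first


leaves = chain([[0, 2], [3, 5], [7, 10], [12, 16], [17, 18], [19, 20],
                [22, 27], [29, 32], [35, 40], [42, 55], [62, 70]])
print(list(scan(leaves, 16, 27)))  # [16, 17, 18, 19, 20, 22, 27]
```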


Fanout of B+Tree
• Suppose that:
  – Key is 10 bytes
  – Data is 79 bytes
  – Pointer is 4 bytes
  – Page size is 512 bytes
• N = 3 is the largest node for a BTree
  – 3-6 items per node
• N = 18 is the largest node for a B*Tree
  – 18-36 keys per interior/root node (36 × 10 + 37 × 4 = 508 bytes)
  – N = 3 for leaf nodes
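The node limits follow from requiring 2N entries and 2N + 1 pointers to fit in one page, the same rule the key compression figures below rely on. The helpers `max_n` and `max_leaf_n` are hypothetical. The slide's N = 3 for a BTree and for the leaves works out only if each 79-byte record already includes its key, so that is assumed here; if the 10-byte key is stored in addition, both limits drop to N = 2. Under the same rule a keys-only interior node reaches N = 18.

```python
def max_n(page: int, entry: int, pointer: int) -> int:
    """Largest N such that 2N entries and 2N + 1 pointers fit in one page.

    2N * entry + (2N + 1) * pointer <= page  <=>  N <= (page - pointer) / (2 * (entry + pointer))
    """
    return (page - pointer) // (2 * (entry + pointer))


def max_leaf_n(page: int, record: int, pointer: int) -> int:
    """Largest N for a B+Tree leaf: 2N records plus one list pointer."""
    return (page - pointer) // (2 * record)


PAGE, KEY, RECORD, POINTER = 512, 10, 79, 4   # RECORD assumed to include the key

print(max_n(PAGE, RECORD, POINTER))           # 3: BTree, whole records in every node
print(max_n(PAGE, KEY, POINTER))              # 18: B*Tree interior/root, keys only
print(max_leaf_n(PAGE, RECORD, POINTER))      # 3: leaf records plus the list pointer
print(max_n(PAGE, KEY + RECORD, POINTER))     # 2: if the key is stored in addition
```

The same helper gives N = 11 for the 19-character key and N = 31 for a 4-byte key discussed under key compression.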


B+Tree

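4-5 tree with 2-record leaves
• Root: 10 20
• Interior nodes: 2 5 | 16 18 | 27 32 40 55
• Leaves, left to right: 0 2 | 3 5 | 7 10 | 12 16 | 17 18 | 19 20 | 22 27 | 29 32 | 35 40 | 42 55 | 62 70
• Tree pointers lead from each node to its children; list pointers link each leaf to the next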


Key Lengths
• Many keys are relatively long
  – An integer is typically 4 bytes
  – Character string keys may be much longer
    • Names
    • Product codes
  – These are very sparse keys
  – They also take up too much space in the tree


Key Compression
• The whole key is not always needed in the tree
• Abbreviate the key to shorten it
• Lose the ability to determine whether the key is present without going to the leaves
• Gain greater fan-out and a flatter tree
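One common way to abbreviate keys, sometimes called suffix truncation, stores in the interior node only the shortest prefix that still separates two neighboring leaves. The slides do not say which abbreviation scheme they have in mind, so this illustrates the idea rather than defining the method. `shortest_separator` is a hypothetical helper, and it assumes the convention that keys less than the separator go left.

```python
def shortest_separator(left_max: str, right_min: str) -> str:
    """Shortest prefix s of right_min with left_max < s <= right_min.

    Keys smaller than s belong to the left node; keys >= s to the right node.
    """
    if not left_max < right_min:
        raise ValueError("left_max must sort before right_min")
    for length in range(1, len(right_min) + 1):
        candidate = right_min[:length]
        if candidate > left_max:
            return candidate
    return right_min  # not reached: the full right_min always qualifies


print(shortest_separator("Kline", "Lee"))    # L
print(shortest_separator("Mule", "Murphy"))  # Mur
```

The separator "L" is not itself a stored key, which is exactly why the tree can no longer answer whether a key is present without reaching a leaf.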


Key Compression Savings

• The fan-out may be increased by reducing the size of the key
• Suppose a 19 character key and a page id of 4 bytes in a 512 byte page
• This makes for N = 11
  – A 22-23 tree
• Reduce the key to 4 bytes
  – N = 31, a 62-63 tree


Creating a B+Tree
• Two ways:
  – Insert as has been shown
  – Bulk load
• The normal insertion scheme works well for regular insertions
• Bulk load works best for a large number of insertions to create a new B+Tree


Bulk Load
• Sort the data
• Create the leaves in order and build the index over them
• The index uses the normal split mechanism


Bulk Loading

• Add 2, 3, 5, 7
  – Leaves: 2 3 | 5 7; index: 3
• Add 10, 11, 15, 17
  – Leaves: 2 3 | 5 7 | 10 11 | 15 17; index: 3 7 11
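The bulk load can also be sketched bottom up: sort, cut the sorted data into leaves, then repeatedly group nodes under new index nodes until one root remains. This is a swapped-in variant: the slides build the index with the normal split mechanism, whereas this sketch builds each index level in one pass; for the small example above both give the index 3 7 11. `bulk_load`, `leaf_size` and `fan_out` are hypothetical names, and separators are the largest key of each child except the last, as in the earlier B+Tree example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list[int]
    children: list["Node"] = field(default_factory=list)  # empty for a leaf


def _even_groups(items: list, max_size: int) -> list[list]:
    """Split items into the fewest groups of at most max_size, sizes as even as possible."""
    count = -(-len(items) // max_size)                  # ceiling division
    size, extra = divmod(len(items), count)
    groups, start = [], 0
    for g in range(count):
        end = start + size + (1 if g < extra else 0)
        groups.append(items[start:end])
        start = end
    return groups


def _largest_key(node: Node) -> int:
    while node.children:
        node = node.children[-1]
    return node.keys[-1]


def bulk_load(records: list[int], leaf_size: int, fan_out: int) -> Node:
    """Build a B+Tree bottom up; fan_out is the most children an index node may have."""
    if leaf_size < 1 or fan_out < 3:
        raise ValueError("need leaf_size >= 1 and fan_out >= 3")
    data = sorted(records)                              # 1. sort the data
    if not data:
        return Node([])
    level = [Node(keys) for keys in _even_groups(data, leaf_size)]  # 2. leaves in order
    while len(level) > 1:                               # 3. index levels over them
        level = [
            Node([_largest_key(c) for c in group[:-1]], group)
            for group in _even_groups(level, fan_out)
        ]
    return level[0]


root = bulk_load([17, 3, 11, 2, 15, 7, 10, 5], leaf_size=2, fan_out=5)
print(root.keys, [leaf.keys for leaf in root.children])
# [3, 7, 11] [[2, 3], [5, 7], [10, 11], [15, 17]]
```

Spreading items evenly across groups is a design choice that avoids a nearly empty last leaf or index node.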


Clustered
• All the examples so far are clustered
• The lowest interior node has only one pointer to a block
  – Between two keys is a page of several entries
  – One key addresses many data items
• B+Trees may also handle an unclustered index


Unclustered BTree Index
• Essentially the same as a regular B+Tree, with several exceptions
• Uses another tree's leaves
• No data at all
• In the last interior level each key points at a separate leaf page
• The leaves are in a completely different order than the interior nodes


Example Indices

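• Name index
  – Root: More
  – Interior nodes: Cal Kline | Robb Tee
  – Leaves: Abel Bart Cal | Dan Kline | Lee Mic More | Mule Robb | Sand Tax Tee | Tone Tu Zone
• Number index
  – Root: 301 412 600
  – Leaves: 201 285 295 301 | 303 307 352 412 | 439 450 472 513 600 | 601 672 720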


Keys
• Primary keys get a clustered index
• The unique attribute usually forces an unclustered index
• An index may be attached to a set of fields that are not unique


Nodes in Memory
• A normal BTree has a list of keys and pointers within a node
  – Thus no other organizing information is stored in the disk node
• Depending on its size, the list may be searched using a sequential or binary search
• Once the node is in memory, a sequential search may not be the best choice
• Often we wish to store the top levels of a BTree in memory for speed


Red-Black Trees
• A binary tree superimposed upon a BTree structure
• Each pointer must show whether the target is in the same BTree node or a different one
• The red arrows are within a node; the black arrows span nodes
  – Hence the name red-black tree


Consider this tree

• Root: 10 20
• Interior nodes: 3 6 | 14 | 30
• Leaves under 3 6: 1 2 | 4 5 | 7 9
• Leaves under 14: 11 12 | 15
• Leaves under 30: 22 26 | 35 40


Superimposed Red Black Tree

[Figure: the same tree with a red-black binary tree superimposed on it]


Red-Black Trees Again
• However, the tree may be searched like any other binary search tree
• Insertions and deletions are somewhat more complicated because they must conform to both patterns:
  – Binary search tree
  – BTree
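The superimposed binary tree can be sketched by rewriting each BTree node of 1 to 3 keys as a small binary tree whose internal links are red. The sketch assumes that node size limit (it fits the 2-3 tree in the example), makes the larger key of a two-key node the black top with a red left child (one of two possible choices), and uses hypothetical names throughout. The result is then searched exactly like any other binary search tree.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BNode:                                   # a BTree node holding 1 to 3 keys
    keys: list[int]
    children: list["BNode"] = field(default_factory=list)


@dataclass
class RBNode:                                  # a node of the superimposed binary tree
    key: int
    red: bool                                  # True: the link from its parent stays in one BTree node
    left: Optional["RBNode"] = None
    right: Optional["RBNode"] = None


def to_red_black(node: Optional[BNode]) -> Optional[RBNode]:
    if node is None:
        return None
    kids = node.children or [None] * (len(node.keys) + 1)
    sub = [to_red_black(c) for c in kids]      # each child BTree node gets a black top
    k = node.keys
    if len(k) == 1:
        return RBNode(k[0], False, sub[0], sub[1])
    if len(k) == 2:                            # the red link stays inside the BTree node
        return RBNode(k[1], False, RBNode(k[0], True, sub[0], sub[1]), sub[2])
    if len(k) == 3:
        return RBNode(k[1], False, RBNode(k[0], True, sub[0], sub[1]),
                      RBNode(k[2], True, sub[2], sub[3]))
    raise ValueError("only nodes of 1 to 3 keys map onto a red-black tree")


def bst_search(node: Optional[RBNode], key: int) -> bool:
    """Ordinary binary search tree lookup; colors are ignored."""
    while node is not None:
        if key == node.key:
            return True
        node = node.left if key < node.key else node.right
    return False


def inorder(node: Optional[RBNode]) -> list[int]:
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)


# The example 2-3 tree: root 10 20; interior nodes 3 6 | 14 | 30
btree = BNode([10, 20], [
    BNode([3, 6], [BNode([1, 2]), BNode([4, 5]), BNode([7, 9])]),
    BNode([14], [BNode([11, 12]), BNode([15])]),
    BNode([30], [BNode([22, 26]), BNode([35, 40])]),
])
rb = to_red_black(btree)
print(bst_search(rb, 26), bst_search(rb, 8))   # True False
```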


Why are BTrees Popular?
• Self organizing
• Any type of key
• Usually flat trees
  – Small number of accesses
• Average 75% utilization
• Can be created easily