
B-Trees with Functional and Imperative Implementation

Liu Xinyu ∗

September 6, 2010

Abstract

The B-tree is introduced in the book “Introduction to Algorithms”[1] as one of the advanced data structures. It is important to modern file systems, some of which are implemented based on the B+ tree, which is extended from the B-tree. It is also widely used in many database systems. This post provides some implementations of B-trees, both in the imperative way described in [1] and in a functional way with a kind of modify-and-fix approach. Multiple programming languages are used, including C++, Haskell, Python and Scheme/Lisp.

There may be mistakes in the post; please feel free to point them out.

This post is generated by LaTeX2ε and provided under the GNU FDL (GNU Free Documentation License). Please refer to http://www.gnu.org/copyleft/fdl.html for details.

Keywords: B-Trees

1 Introduction

In the book “Introduction to Algorithms”, the B-tree is introduced with the problem of how to access a large block of data on magnetic disks or secondary storage devices[1]. The B-tree is commonly used in databases and file systems.

It is also helpful to understand the B-tree as a generalization of the balanced binary search tree[2].

∗ Liu Xinyu. Email: [email protected]


Referring to Figure 1, it is easy to find the differences and similarities between a B-tree and a binary search tree.

Figure 1: An example of B-Tree (keys by level, from the root down: M; C G P T W; A B D E F H I J K N O Q R S U V X Y Z)

Let’s review the definition of the binary search tree [3]. A binary search tree is

• either an empty node;

• or a node that contains 3 parts: a value, a left child which is a binary search tree, and a right child which is also a binary search tree.

And it satisfies the following constraints.

• all the values in the left child tree are less than the value of this node;

• the value of this node is less than any value in its right child tree.

The constraint can be represented as follows. Any node n satisfies the equation below.

\[ \forall x \in LEFT(n), \forall y \in RIGHT(n) \Rightarrow VALUE(x) < VALUE(n) < VALUE(y) \tag{1} \]

If we extend this definition to allow multiple keys and children, we get the definition below.

A B-tree is

• either an empty node;

• or a node that contains n keys and n+1 children, where each child is also a B-tree. We denote these keys and children as key1, key2, ..., keyn and c1, c2, ..., cn, cn+1.

Figure 2 illustrates a B-tree node.

The keys and children in a node satisfy the following order constraints.

• Keys are stored in non-decreasing order, that is key1 ≤ key2 ≤ ... ≤ keyn;

• for each keyi, all values stored in child ci are no bigger than keyi, while all values stored in child ci+1 are no less than keyi.


C[1] K[1] C[2] K[2] ... C[n] K[n] C[n+1]

Figure 2: A B-Tree node

The constraints can also be represented as in equation (2).

\[ \forall x_i \in c_i,\ i = 1, ..., n+1 \Rightarrow x_1 \le key_1 \le x_2 \le key_2 \le ... \le x_n \le key_n \le x_{n+1} \tag{2} \]

Finally, if we add some constraints to keep the tree balanced, we get the complete definition of the B-tree.

• All leaves have the same depth;

• We define an integer, t, as the minimum degree of a B-tree;

– each node can have at most 2t − 1 keys;

– each node must have at least t − 1 keys, except the root;
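The definition above can be turned into a small checker. The following is only a sketch under stated assumptions: the `Node` class, `check_btree` and `all_keys` are hypothetical names introduced for illustration (they are not part of the article's source code), and a leaf is assumed to be a node with an empty child list.

```python
class Node:
    """Hypothetical minimal node: sorted keys plus a list of child nodes."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []


def all_keys(node):
    """Yield every key stored in the subtree rooted at `node`, in order."""
    for i, k in enumerate(node.keys):
        if node.children:
            for x in all_keys(node.children[i]):
                yield x
        yield k
    if node.children:
        for x in all_keys(node.children[-1]):
            yield x


def check_btree(node, t, is_root=True):
    """Return the depth of the leaves below `node` if it is a valid B-tree
    of minimum degree t; raise ValueError otherwise."""
    n = len(node.keys)
    if n > 2 * t - 1:
        raise ValueError("more than 2t-1 keys")
    if not is_root and n < t - 1:
        raise ValueError("fewer than t-1 keys in a non-root node")
    if any(a > b for a, b in zip(node.keys, node.keys[1:])):
        raise ValueError("keys are not in non-decreasing order")
    if not node.children:              # leaf node
        return 0
    if len(node.children) != n + 1:
        raise ValueError("a branch node with n keys needs n+1 children")
    depths = set()
    for i, child in enumerate(node.children):
        # Validate the child's own structure first, then its key range.
        depths.add(check_btree(child, t, is_root=False))
        lo = node.keys[i - 1] if i > 0 else None
        hi = node.keys[i] if i < n else None
        for k in all_keys(child):
            if (lo is not None and k < lo) or (hi is not None and k > hi):
                raise ValueError("child key outside the range set by its parent")
    if len(depths) != 1:
        raise ValueError("leaves are at different depths")
    return depths.pop() + 1
```

For example, `check_btree(Node(["C"], [Node(["A", "B"]), Node(["D", "E"])]), 2)` accepts a two-level 2-3-4 tree, while a tree whose leaves sit at different depths is rejected.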

In this post, I’ll first introduce how to generate B-trees with the insertion algorithm. Two different methods will be explained. One method is discussed in [1]; the other is a kind of modify-and-fix approach, which is quite similar to the algorithm Okasaki used for red-black trees[4]. This method is also discussed on Wikipedia[2]. After that, how to delete an element from a B-tree is explained. In the last part, an algorithm for searching in a B-tree is also provided.

This article provides example implementations in C, C++, Haskell, Python, and Scheme/Lisp.

All source code can be downloaded as described in appendix 7; please refer to the appendix for detailed information about how to build and run it.

2 Definition

Similar to the binary search tree, the B-tree can be defined recursively. Because there are multiple keys and children, a collection container can be used to store them.

Definition of B-tree in C++

ISO C++ supports using a constant integer as a template parameter. This feature can be used to define B-trees with different minimum degrees as different types.

    #include <vector>

    // t: minimum degree of B-tree
    template<class K, int t>
    struct BTree{
      typedef K key_type;
      typedef std::vector<K> Keys;
      typedef std::vector<BTree*> Children;

      BTree(){}

      ~BTree(){
        for(typename Children::iterator it=children.begin();
            it!=children.end(); ++it)
          delete (*it);
      }

      bool full(){ return keys.size() == 2*t-1; }

      bool leaf(){
        return children.empty();
      }

      Keys keys;
      Children children;
    };

In order to support random access to keys and children, the inner data structure uses STL vectors. The node recursively releases all its children when it is destroyed, and two simple auxiliary member functions, “full” and “leaf”, are provided to test whether a node is full or is a leaf node.

Definition of B-tree in Python

If the minimum degree is 2, the B-tree is commonly called a 2-3-4 tree. For illustration purposes, I set the 2-3-4 tree as the default.

    TREE_2_3_4 = 2 #by default, create 2-3-4 tree

    class BTreeNode:
        def __init__(self, t=TREE_2_3_4, leaf=True):
            self.leaf = leaf
            self.t = t
            self.keys = []
            #self.data = ...
            self.children = []

It is perfectly fine for a B-tree to store not only keys but also satellite data. However, satellite data is omitted in this post.

There are also some auxiliary member functions defined:

    class BTreeNode:
        #...
        def is_full(self):
            return len(self.keys) == 2*self.t-1

This member function is used to test if a node is full.

Definition of B-tree in Haskell

In Haskell, record syntax is used to define BTree, so that keys and children can be accessed easily later on. Some auxiliary functions are also provided. Note that “full” here tests whether a node already holds more than 2t − 1 keys, which suits the functional insert-then-fix method presented later.

    data BTree a = Node{ keys :: [a]
                       , children :: [BTree a]
                       , degree :: Int} deriving (Eq, Show)

    -- Auxiliary functions
    empty deg = Node [] [] deg

    full :: BTree a -> Bool
    full tr = (length $ keys tr) > 2*(degree tr)-1

Definition of B-tree in Scheme/Lisp

In Scheme/Lisp, because a list can contain both children and keys at the same time, we can organize a B-tree with children and keys interspersed in a list. For instance, the list below represents a B-tree whose root has one key “C” and two children: the left child is a leaf node with keys “A” and “B”, while the right child is also a leaf, with keys “D” and “E”.

    (("A" "B") "C" ("D" "E"))

However, this definition doesn’t hold the information of the minimum degree t. The solution is to pass t as a parameter to all operations.

Some auxiliary functions are provided so that we can access and test a B-tree easily.

    (define (keys tr)
      (if (null? tr)
          '()
          (if (list? (car tr))
              (keys (cdr tr))
              (cons (car tr) (keys (cdr tr))))))

    (define (children tr)
      (if (null? tr)
          '()
          (if (list? (car tr))
              (cons (car tr) (children (cdr tr)))
              (children (cdr tr)))))

    (define (leaf? tr)
      (or (null? tr)
          (not (list? (car tr)))))

Here we assume that a key is a simple value, such as a number or a string, but not a list. If we find that an element is a list, it represents a child B-tree. All the above functions are defined based on this assumption.
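The same interspersed representation can be sketched in Python with nested lists. This is only an illustration, assuming (as in the Scheme code) that keys are never lists themselves; the function names mirror the Scheme ones but are otherwise hypothetical.

```python
def keys(tr):
    """Keys of a node: every element that is not itself a list."""
    return [x for x in tr if not isinstance(x, list)]


def children(tr):
    """Children of a node: every element that is a list (a sub-B-tree)."""
    return [x for x in tr if isinstance(x, list)]


def is_leaf(tr):
    """A node is a leaf if it is empty or its first element is a key."""
    return not tr or not isinstance(tr[0], list)


# The example tree from the text: root key "C" with two leaf children.
tree = [["A", "B"], "C", ["D", "E"]]
```

With this `tree`, `keys(tree)` gives `["C"]`, `children(tree)` gives the two leaves, and `is_leaf(tree)` is false.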

3 Insertion

Insertion is the basic operation on a B-tree; a B-tree can be created by inserting keys repeatedly. The essential idea of insertion is similar to that of the binary search tree. If the key to be inserted is x, we examine the keys in a node to find a position where all the keys on the left are less than x, while all the keys on the right are greater than x. After that we can recursively insert x into the child node at this position.

However, this basic idea needs to be fine-tuned. The first question is what the termination criterion of the recursion is. This can easily be solved by defining the rule that, if the node to be inserted into is a leaf node, we needn’t insert recursively, because a leaf node has no children at all. We can just put x between the left-hand keys and the right-hand keys, which increases the number of keys of the leaf node by one.

The second question is how to keep the balance properties of a B-tree when inserting. If a leaf already has 2t − 1 keys, inserting x into it will break the rule that ‘each node can have at most 2t − 1 keys’. The sections below show 2 major methods to solve this problem.

3.1 Splitting

Regarding the problem of inserting a key into a node that already has 2t − 1 keys, one solution is to split the node before insertion.

In this case, we can divide the node into 3 parts as shown in Figure 3. The left part contains the first t − 1 keys and t children, while the right part contains the last t − 1 keys and t children. Both the left part and the right part are valid B-tree nodes. The middle part is just the t-th key. It is pushed up to the parent node (if the node is already the root, then the t-th key, with the 2 parts as its children, becomes the new root).
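As a quick illustration of this three-way division, here is a minimal Python sketch working on plain lists. The name `split_full` and the tuple layout are assumptions made only for this example; they are not taken from the article's source code.

```python
def split_full(keys, children, t):
    """Divide a full node (2t-1 keys) into (left, middle key, right).

    `left` and `right` are (keys, children) pairs; for a leaf the
    `children` list is empty and stays empty on both sides.
    """
    if len(keys) != 2 * t - 1:
        raise ValueError("only a node with exactly 2t-1 keys is split")
    left = (keys[:t - 1], children[:t])   # first t-1 keys, first t children
    right = (keys[t:], children[t:])      # last t-1 keys, last t children
    return left, keys[t - 1], right       # keys[t-1] is the t-th key
```

For t = 3, splitting the keys 1..5 with six children yields the keys 1, 2 on the left, the key 3 to push up, and the keys 4, 5 on the right, each side keeping three children.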

3.1.1 Imperative splitting

If we skip the disk access part explained in [1], the imperative splitting algorithm can be shown as below.

    procedure B-TREE-SPLIT-CHILD(node, i)
        x ← CHILDREN(node)[i]
        y ← CREATE-NODE()
        INSERT(KEYS(node), i, KEYS(x)[t])
        INSERT(CHILDREN(node), i + 1, y)
        KEYS(y) ← KEYS(x)[t + 1...2t − 1]
        KEYS(x) ← KEYS(x)[1...t − 1]
        if x is not leaf then
            CHILDREN(y) ← CHILDREN(x)[t + 1...2t]
            CHILDREN(x) ← CHILDREN(x)[1...t]
        end if
    end procedure

Figure 3: Split node. (a) Before the split, the node holds the keys K[1] ... K[2t-1] and the children C[1] ... C[2t]. (b) After the split, the left node holds K[1] ... K[t-1] with children C[1] ... C[t], the right node holds K[t+1] ... K[2t-1] with children C[t+1] ... C[2t], and K[t] is pushed up.

This algorithm takes 2 parameters: one is a B-tree node, the other is the index that indicates which child of this node will be split.

Split implemented in C++

The algorithm can be implemented in C++ as a member function of the B-tree node.

    template<class K, int t>
    struct BTree{
      //...

      void split_child(int i){
        BTree<K, t>* x = children[i];
        BTree<K, t>* y = new BTree<K, t>();
        keys.insert(keys.begin()+i, x->keys[t-1]);
        children.insert(children.begin()+i+1, y);
        y->keys = Keys(x->keys.begin()+t, x->keys.end());
        x->keys = Keys(x->keys.begin(), x->keys.begin()+t-1);
        if(!x->leaf()){
          y->children = Children(x->children.begin()+t, x->children.end());
          x->children = Children(x->children.begin(), x->children.begin()+t);
        }
      }

      //...
    };

Split implemented in Python

We can define the splitting operation as a member method of the B-tree node as follows.

    class BTreeNode:
        #...
        def split_child(self, i):
            t = self.t
            x = self.children[i]
            y = BTreeNode(t, x.leaf)
            self.keys.insert(i, x.keys[t-1])
            self.children.insert(i+1, y)
            y.keys = x.keys[t:]
            x.keys = x.keys[:t-1]
            if not y.leaf:
                y.children = x.children[t:]
                x.children = x.children[:t]

3.1.2 Functional splitting

For the functional algorithm, splitting returns a tuple that contains the left part and the right part as B-trees, along with the key between them.

    function B-TREE-SPLIT(node)
        ks ← KEYS(node)[1...t − 1]
        ks′ ← KEYS(node)[t + 1...2t − 1]
        if node is not leaf then
            cs ← CHILDREN(node)[1...t]
            cs′ ← CHILDREN(node)[t + 1...2t]
        end if
        return (CREATE-B-TREE(ks, cs), KEYS(node)[t], CREATE-B-TREE(ks′, cs′))
    end function

Split implemented in Haskell

The Haskell Prelude provides the take/drop functions to get part of a list. These functions just return an empty list if the list passed in is empty, so there is no need to test whether the node is a leaf.

    split :: BTree a -> (BTree a, a, BTree a)
    split (Node ks cs t) = (c1, k, c2) where
        c1 = Node (take (t-1) ks) (take t cs) t
        c2 = Node (drop t ks) (drop t cs) t
        k  = head (drop (t-1) ks)

Split implemented in Scheme/Lisp

As mentioned previously, the minimum degree t is passed as a parameter. The splitting is performed according to t.

    (define (split tr t)
      (if (leaf? tr)
          (list (list-head tr (- t 1))
                (list-ref tr (- t 1))
                (list-tail tr t))
          (list (list-head tr (- (* t 2) 1))
                (list-ref tr (- (* t 2) 1))
                (list-tail tr (* t 2)))))

When splitting a leaf node, because there are no children at all, the program simply takes the first t − 1 keys and the last t − 1 keys to form two children, and leaves the t-th key as the only key of the new node. It returns these 3 parts in a list. When splitting a branch node, children must also be taken into account, because keys and children are interspersed in the list; that’s why the first 2t − 1 and the last 2t − 1 elements are taken.

3.2 Split before insert method

Note that the split solution pushes a key up to the parent node. It is possible that the parent node is itself full, if it already has 2t − 1 keys.

Regarding this issue, [1] provides a solution that checks every node along the path from the root down to the leaf; if a node on this path is full, the split is applied. Since the parent of this node has already been examined (except for the root node), the parent is guaranteed to have fewer than 2t − 1 keys, so pushing up one key won’t overflow the parent. This approach needs only a single pass down the tree, without any back-tracking.

The main insert algorithm first checks whether the root node needs to be split. If so, it creates a new node, sets the root as its only child, performs the split, and sets the new node as the root. After that, the algorithm inserts the key into the non-full node.

    function B-TREE-INSERT(T, k)
        r ← T
        if r is full then
            s ← CREATE-NODE()
            APPEND(CHILDREN(s), r)
            B-TREE-SPLIT-CHILD(s, 1)
            r ← s
        end if
        B-TREE-INSERT-NONFULL(r, k)
        return r
    end function

The algorithm B-TREE-INSERT-NONFULL asserts that the node passed in is not full. If it is a leaf node, the new key is just inserted at the proper position based on its order. If it is a branch node, the algorithm finds the proper child node into which the new key will be inserted. If this child node is full, the splitting is performed first.

    procedure B-TREE-INSERT-NONFULL(T, k)
        if T is leaf then
            i ← 1
            while i ≤ LENGTH(KEYS(T)) and k > KEYS(T)[i] do
                i ← i + 1
            end while
            INSERT(KEYS(T), i, k)
        else
            i ← LENGTH(KEYS(T))
            while i ≥ 1 and k < KEYS(T)[i] do
                i ← i − 1
            end while
            i ← i + 1
            if CHILDREN(T)[i] is full then
                B-TREE-SPLIT-CHILD(T, i)
                if k > KEYS(T)[i] then
                    i ← i + 1
                end if
            end if
            B-TREE-INSERT-NONFULL(CHILDREN(T)[i], k)
        end if
    end procedure

Note that this algorithm is actually recursive. Because a B-tree typically has a large minimum degree t, chosen relative to the magnetic disk structure, and because it is a balanced tree, even a small depth can support a huge amount of data (with t = 10, a B-tree of height 10 already holds more than 10 billion keys), so the recursion stays shallow. Of course, it is also easy to eliminate the recursive call to improve the algorithm.
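The capacity figure can be checked with a short calculation. The sketch below assumes the height h counts the edges from the root to a leaf (the root is at depth 0); `min_keys` and `max_keys` are hypothetical helper names. The minimum case has one key in the root and t − 1 keys in every other node, the maximum case has 2t − 1 keys in every node.

```python
def min_keys(t, h):
    """Fewest keys in a B-tree of minimum degree t and height h.

    The root holds 1 key and has 2 children; every other node holds
    t-1 keys and has t children, which sums to 2*t**h - 1.
    """
    return 2 * t ** h - 1


def max_keys(t, h):
    """Most keys in a B-tree of minimum degree t and height h.

    Every node holds 2t-1 keys and has 2t children, which sums to
    (2t)**(h+1) - 1.
    """
    return (2 * t) ** (h + 1) - 1
```

For t = 10 and h = 10, `min_keys` gives 2·10^10 − 1, so such a tree cannot hold fewer than roughly twenty billion keys, while the recursion depth of the insertion is only about ten calls.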

In the language-specific implementations below, I’ll eliminate the recursion in the C++ program, and show the recursive version in the Python program.

Insert implemented in C++

The main insert program in C++ examines whether the root is full and performs splitting accordingly. Then it calls insert_nonfull to do the further processing.

    template<class K, int t>
    BTree<K, t>* insert(BTree<K, t>* tr, K key){
      BTree<K, t>* root(tr);
      if(root->full()){
        BTree<K, t>* s = new BTree<K, t>();
        s->children.push_back(root);
        s->split_child(0);
        root = s;
      }
      return insert_nonfull(root, key);
    }

The recursion is eliminated in the insert_nonfull function. If the current node is a leaf, it calls ordered_insert to insert the key at the correct position. If it is a branch node, the program finds the proper child tree and sets it as the current node for the next loop iteration. Splitting is performed if the child tree is full.

    template<class K, int t>
    BTree<K, t>* insert_nonfull(BTree<K, t>* tr, K key){
      typedef typename BTree<K, t>::Keys Keys;
      typedef typename BTree<K, t>::Children Children;

      BTree<K, t>* root(tr);
      while(!tr->leaf()){
        unsigned int i=0;
        while(i < tr->keys.size() && tr->keys[i] < key)
          ++i;
        if(tr->children[i]->full()){
          tr->split_child(i);
          if(key > tr->keys[i])
            ++i;
        }
        tr = tr->children[i];
      }
      ordered_insert(tr->keys, key);
      return root;
    }

Where ordered_insert is defined as follows.

    template<class Coll>
    void ordered_insert(Coll& coll, typename Coll::value_type x){
      typename Coll::iterator it = coll.begin();
      while(it != coll.end() && *it < x)
        ++it;
      coll.insert(it, x);
    }

For convenience, I defined auxiliary functions to convert a list of keys into a B-tree.

    #include <numeric>    // std::accumulate
    #include <functional> // std::ptr_fun

    template<class T>
    T* insert_key(T* t, typename T::key_type x){
      return insert(t, x);
    }

    template<class Iterator, class T>
    T* list_to_btree(Iterator first, Iterator last, T* t){
      // std::ptr_fun was removed in C++17; there, pass insert_key<T> directly.
      return std::accumulate(first, last, t,
                             std::ptr_fun(insert_key<T>));
    }

In order to print the result as a human-readable string, a recursive conversion function is provided.

    #include <sstream>
    #include <string>

    template<class T>
    std::string btree_to_str(T* tr){
      typename T::Keys::iterator k;
      typename T::Children::iterator c;

      std::ostringstream s;
      s<<"(";
      if(tr->leaf()){
        k=tr->keys.begin();
        if(k!=tr->keys.end())  // guard against an empty tree
          s<<*k++;
        for(; k!=tr->keys.end(); ++k)
          s<<", "<<*k;
      }
      else{
        for(k=tr->keys.begin(), c=tr->children.begin();
            k!=tr->keys.end(); ++k, ++c)
          s<<btree_to_str(*c)<<", "<<*k<<", ";
        s<<btree_to_str(*c);
      }
      s<<")";
      return s.str();
    }

With all the above programs defined, some simple test cases can be fed in to verify them.

    const char* ss[] = {"G", "M", "P", "X", "A", "C", "D", "E", "J", "K",
                        "N", "O", "R", "S", "T", "U", "V", "Y", "Z"};
    BTree<std::string, 2>* tr234=list_to_btree(ss, ss+sizeof(ss)/sizeof(char*),
                                               new BTree<std::string, 2>);
    std::cout<<"2-3-4 tree of ";
    std::copy(ss, ss+sizeof(ss)/sizeof(char*),
              std::ostream_iterator<std::string>(std::cout, ", "));
    std::cout<<"\n"<<btree_to_str(tr234)<<"\n";
    delete tr234;

    BTree<std::string, 3>* tr = list_to_btree(ss, ss+sizeof(ss)/sizeof(char*),
                                              new BTree<std::string, 3>);
    std::cout<<"B-tree with t=3 of ";
    std::copy(ss, ss+sizeof(ss)/sizeof(char*),
              std::ostream_iterator<std::string>(std::cout, ", "));
    std::cout<<"\n"<<btree_to_str(tr)<<"\n";
    delete tr;

Running these lines generates the following result:

    2-3-4 tree of G, M, P, X, A, C, D, E, J, K, N, O, R, S, T, U, V, Y, Z,
    (((A), C, (D)), E, ((G, J, K), M, (N, O)), P, ((R), S, (T), U, (V), X, (Y, Z)))
    B-tree with t=3 of G, M, P, X, A, C, D, E, J, K, N, O, R, S, T, U, V, Y, Z,
    ((A, C), D, (E, G, J, K), M, (N, O), P, (R, S), T, (U, V, X, Y, Z))

Figure 4 shows the result.

Figure 4: Insert result. (a) Insert result of a 2-3-4 tree. (b) Insert result of a B-tree with minimum degree of 3.

Insert implemented in Python

Implementing the above insertion algorithm in Python is straightforward; we only change the index to start from 0 instead of 1.

    def B_tree_insert(tr, key): # + data parameter
        root = tr
        if root.is_full():
            s = BTreeNode(root.t, False)
            s.children.insert(0, root)
            s.split_child(0)
            root = s
        B_tree_insert_nonfull(root, key)
        return root

And the insertion into a non-full node is implemented as follows.

    def B_tree_insert_nonfull(tr, key):
        if tr.leaf:
            ordered_insert(tr.keys, key)
            #disk_write(tr)
        else:
            i = len(tr.keys)
            while i>0 and key < tr.keys[i-1]:
                i = i-1
            #disk_read(tr.children[i])
            if tr.children[i].is_full():
                tr.split_child(i)
                if key>tr.keys[i]:
                    i = i+1
            B_tree_insert_nonfull(tr.children[i], key)

Here the function “ordered_insert” inserts an element into an ordered list. Since the standard Python list does not maintain any ordering by itself, the function is written as below.

    def ordered_insert(lst, x):
        i = len(lst)
        lst.append(x)
        while i>0 and lst[i]<lst[i-1]:
            (lst[i-1], lst[i]) = (lst[i], lst[i-1])
            i=i-1

For an array-based collection, appending at the tail is much cheaper than inserting at another position, because the latter takes O(n) time if the length of the collection is n. This program first appends the new element at the end of the existing collection, then iterates from the last element towards the first one and checks whether the two neighbouring elements are in order. If not, these two elements are swapped. Note that this swapping pass itself still takes O(n) time in the worst case.
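As an alternative, the Python standard library offers `bisect.insort`, which finds the position by binary search and then inserts there. The following is only a sketch of that variant, not the version used in this post; the overall cost is still O(n) because the list has to shift elements, but the number of comparisons drops to O(log n).

```python
import bisect


def ordered_insert(lst, x):
    """Insert x into the already sorted list lst, keeping it sorted.

    bisect.insort places x after any existing equal elements, which is
    also where the append-and-swap version leaves it.
    """
    bisect.insort(lst, x)


keys = ["A", "C", "G"]
ordered_insert(keys, "D")   # keys becomes A, C, D, G
```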

To easily create a B-tree from a list of keys, we can write a simple helper function.

    def list_to_B_tree(l, t=TREE_2_3_4):
        tr = BTreeNode(t)
        for x in l:
            tr = B_tree_insert(tr, x)
        return tr

By default, this function creates a 2-3-4 tree; the user can specify the minimum degree as the second argument. The first argument is a list of keys. The function repeatedly inserts every key into the B-tree, starting from an empty tree.

In order to print the B-tree out for verification, an auxiliary printing function is provided.

    def B_tree_to_str(tr):
        res = "("
        if tr.leaf:
            res += ", ".join(tr.keys)
        else:
            for i in range(len(tr.keys)):
                res += B_tree_to_str(tr.children[i]) + ", " + tr.keys[i] + ", "
            res += B_tree_to_str(tr.children[len(tr.keys)])
        res += ")"
        return res

After that, some smoke test cases can be used to verify the insertion program.

    class BTreeTest:
        def run(self):
            self.test_insert()

        def test_insert(self):
            lst = ["G", "M", "P", "X", "A", "C", "D", "E", "J", "K",
                   "N", "O", "R", "S", "T", "U", "V", "Y", "Z"]
            tr = list_to_B_tree(lst)
            print(B_tree_to_str(tr))
            print(B_tree_to_str(list_to_B_tree(lst, 3)))

Running the test cases prints two different B-trees. They are identical to the outputs of the C++ program.

    (((A), C, (D)), E, ((G, J, K), M, (N, O)), P, ((R), S, (T), U, (V), X, (Y, Z)))
    ((A, C), D, (E, G, J, K), M, (N, O), P, (R, S), T, (U, V, X, Y, Z))

3.3 Insert then fix method

Another approach to implementing the B-tree insertion algorithm is to just find the position for the new key and insert it. Since such an insertion may violate the B-tree properties, we then apply a fixing procedure. If a leaf contains too many keys, we split it into 2 leaves and push a key up to the parent branch node. Of course, this operation may cause the parent node to violate the B-tree properties, so the algorithm needs to traverse from the leaf to the root to perform the fixing.

By using a recursive implementation, this fixing method can also be realized from top to bottom.

    function B-TREE-INSERT'(T, k)
        return FIX-ROOT(RECURSIVE-INSERT(T, k))
    end function

Where FIX-ROOT examines whether the root node contains too many keys, and splits it if necessary. The split result is used to build a new root.

    function FIX-ROOT(T)
        if FULL?(T) then
            (T′, K, T″) ← B-TREE-SPLIT(T)
            T ← CREATE-B-TREE([K], [T′, T″])
        end if
        return T
    end function

And the inner function RECURSIVE-INSERT(T, k) first checks whether T is a leaf node or a branch node. It inserts directly into a leaf and recursively into a branch.

    function RECURSIVE-INSERT(T, k)
        if LEAF?(T) then
            INSERT(KEYS(T), k)
            return T
        else
            initialize empty arrays k′, k″, c′, c″
            i ← 1
            while i ≤ LENGTH(KEYS(T)) and KEYS(T)[i] < k do
                APPEND(k′, KEYS(T)[i])
                APPEND(c′, CHILDREN(T)[i])
                i ← i + 1
            end while
            k″ ← KEYS(T)[i...LENGTH(KEYS(T))]
            c″ ← CHILDREN(T)[i + 1...LENGTH(CHILDREN(T))]
            c ← CHILDREN(T)[i]
            left ← (k′, c′)
            right ← (k″, c″)
            return MAKE-B-TREE(left, RECURSIVE-INSERT(c, k), right)
        end if
    end function

Figure 5 shows the branch case. The algorithm first locates the position: for a certain key ki, if the new key k to be inserted satisfies ki−1 < k < ki, then we need to recursively insert k into child ci.

This position divides the node into 3 parts: the left part, the child ci, and the right part.

The procedure MAKE-B-TREE takes 3 arguments, which correspond to the left part, the result of inserting k into ci, and the right part. It tries to merge these 3 parts into a new B-tree branch node.

However, inserting a key into a child may make this child violate the B-tree property, if it exceeds the limit on the number of keys a node can have. MAKE-B-TREE detects such a situation and fixes the problem by splitting.

Figure 5: Insert a key to a branch node. (a) Locate the child to insert into: for K[i-1] < k < K[i], the key k is inserted into child C[i]. (b) Recursive insert: the node is divided into the left part (K[1] ... K[i-1] with C[1] ... C[i-1]), the child C[i] that receives k, and the right part (K[i] ... K[n] with C[i+1] ... C[n+1]).

    function MAKE-B-TREE(L, C, R)
        if FULL?(C) then
            return FIX-FULL(L, C, R)
        else
            T ← CREATE-NEW-NODE()
            KEYS(T) ← KEYS(L) + KEYS(R)
            CHILDREN(T) ← CHILDREN(L) + [C] + CHILDREN(R)
            return T
        end if
    end function

Where FIX-FULL just calls the splitting process.

    function FIX-FULL(L, C, R)
        (C′, K, C″) ← B-TREE-SPLIT(C)
        T ← CREATE-NEW-NODE()
        KEYS(T) ← KEYS(L) + [K] + KEYS(R)
        CHILDREN(T) ← CHILDREN(L) + [C′, C″] + CHILDREN(R)
        return T
    end function

Note that splitting may push one extra key up to the parent node. However, even if the push-up causes a violation of the B-tree property in the parent, it will be fixed recursively.
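To make the insert-then-fix idea concrete, here is a compact Python sketch. It is an illustration under stated assumptions rather than the implementation used in this post: a node is represented as a hypothetical `(keys, children)` tuple of lists, a leaf has an empty child list, and “full” means that a node already holds more than 2t − 1 keys.

```python
def is_full(node, t):
    """In this scheme a node is 'full' once it exceeds 2t-1 keys."""
    return len(node[0]) > 2 * t - 1


def split(node, t):
    """Split an overfull node into (left, middle key, right)."""
    ks, cs = node
    return (ks[:t - 1], cs[:t]), ks[t - 1], (ks[t:], cs[t:])


def make(left, c, right, t):
    """Merge left part, (possibly overfull) child c, and right part."""
    (ks1, cs1), (ks2, cs2) = left, right
    if is_full(c, t):
        c1, k, c2 = split(c, t)            # fix: push the middle key up
        return (ks1 + [k] + ks2, cs1 + [c1, c2] + cs2)
    return (ks1 + ks2, cs1 + [c] + cs2)


def ins(node, x, t):
    """Insert x without caring whether this node overflows."""
    ks, cs = node
    i = 0
    while i < len(ks) and ks[i] < x:
        i += 1
    if not cs:                             # leaf: plain ordered insert
        return (ks[:i] + [x] + ks[i:], [])
    left, right = (ks[:i], cs[:i]), (ks[i:], cs[i + 1:])
    return make(left, ins(cs[i], x, t), right, t)


def fix_root(node, t):
    """Split an overfull root; the tree grows by one level."""
    if is_full(node, t):
        c1, k, c2 = split(node, t)
        return ([k], [c1, c2])
    return node


def insert(node, x, t):
    return fix_root(ins(node, x, t), t)


def in_order(node):
    """All keys of the tree in ascending order."""
    ks, cs = node
    if not cs:
        return list(ks)
    out = []
    for i, k in enumerate(ks):
        out += in_order(cs[i]) + [k]
    return out + in_order(cs[-1])


def build(seq, t):
    """Insert every element of seq into an initially empty tree."""
    tree = ([], [])
    for x in seq:
        tree = insert(tree, x, t)
    return tree
```

No node is modified in place; every call returns a new tuple, which mirrors the functional style. With the key sequence G, M, P, X, A, C, D, E, J, K, N, O, R, S, T, U, V, Y, Z and t = 2, `build` ends with the keys E and O in the root.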


Insert implemented in Haskell

Realizing the above recursive algorithm in Haskell gives this insert-fixing program.

The main program is provided as follows.

    insert :: (Ord a)=> BTree a -> a -> BTree a
    insert tr x = fixRoot $ ins tr x

It just calls an auxiliary function ‘ins’, then examines and fixes the root node if it contains too many keys.

    import qualified Data.List as L

    -- ...

    ins :: (Ord a) => BTree a -> a -> BTree a
    ins (Node ks [] t) x = Node (L.insert x ks) [] t
    ins (Node ks cs t) x = make (ks', cs') (ins c x) (ks'', cs'')
        where
          (ks', ks'') = L.partition (<x) ks
          (cs', (c:cs'')) = L.splitAt (length ks') cs

The ‘ins’ function uses pattern matching to handle the two different cases. If the node to be inserted into is a leaf, it calls the insert function defined in the Haskell standard library, which inserts the new key x at the proper position to keep the keys in order.

If the node to be inserted into is a branch node, the program recursively inserts the key into the child whose range of keys covers x. After that, it calls the ‘make’ function to combine the results into a new node. The examining and fixing are also performed by the ‘make’ function.

The function ‘fixRoot’ first checks whether the root node contains too many keys; if it exceeds the limit, splitting is applied. The split result is used to make a new node, so the total height of the tree increases.

    fixRoot :: BTree a -> BTree a
    fixRoot (Node [] [tr] _) = tr -- shrink height
    fixRoot tr = if full tr then Node [k] [c1, c2] (degree tr)
                 else tr
        where
          (c1, k, c2) = split tr

The following is the implementation of the ‘make’ function.

    make :: ([a], [BTree a]) -> BTree a -> ([a], [BTree a]) -> BTree a
    make (ks', cs') c (ks'', cs'')
        | full c = fixFull (ks', cs') c (ks'', cs'')
        | otherwise = Node (ks'++ks'') (cs'++[c]++cs'') (degree c)

While ‘fixFull’ is given as below.

    fixFull :: ([a], [BTree a]) -> BTree a -> ([a], [BTree a]) -> BTree a
    fixFull (ks', cs') c (ks'', cs'') = Node (ks'++[k]++ks'')
                                             (cs'++[c1,c2]++cs'') (degree c)
        where
          (c1, k, c2) = split c

In order to print the B-tree content out, an auxiliary function ‘toString’ is provided to convert a B-tree to a string.

    toString :: (Show a)=>BTree a -> String
    toString (Node ks [] _) = "("++(L.intercalate ", " (map show ks))++")"
    toString tr = "("++(toStr (keys tr) (children tr))++")" where
        toStr (k:ks) (c:cs) = (toString c)++", "++(show k)++", "++(toStr ks cs)
        toStr [] [c] = toString c

With all the above definitions, the insertion program can be verified with some simple test cases.

    listToBTree :: (Ord a)=>[a]->Int->BTree a
    listToBTree lst t = foldl insert (empty t) lst

    testInsert = do
      putStrLn $ toString $ listToBTree "GMPXACDEJKNORSTUVYZ" 3
      putStrLn $ toString $ listToBTree "GMPXACDEJKNORSTUVYZ" 2

Running ‘testInsert’ generates the following result.

    (('A', 'C', 'D', 'E'), 'G', ('J', 'K'), 'M', ('N', 'O'), 'P', ('R', 'S'), 'T', ('U', 'V', 'X', 'Y', 'Z'))
    ((('A'), 'C', ('D')), 'E', (('G', 'J', 'K'), 'M', ('N')), 'O', (('P'), 'R', ('S'), 'T', ('U'), 'V', ('X', 'Y', 'Z')))

Comparing the results output by the C++ or Python program with this one, as shown in figure 6, we can find that they differ in some places. However, the B-tree built by the Haskell program is still valid, because all B-tree properties are satisfied. The main reason for the difference is the change of approach: splitting before insertion splits a node as soon as it holds 2t − 1 keys, while the insert-then-fix method splits a node only after it has exceeded that limit.

Insert implemented in Scheme/Lisp

The main function for insertion in Scheme/Lisp is given below.

(define (btree-insert tr x t)
  (define (ins tr x)
    (if (leaf? tr)
        (ordered-insert tr x) ;; leaf
        (let* ((res (partition-by tr x))
               (left (car res))
               (c (cadr res))
               (right (caddr res)))
          (make-btree left (ins c x) right t))))
  (fix-root (ins tr x) t))

Figure 6: insert and fixing results.
a. Insert result of a 2-3-4 tree (insert-fixing method): root (E, O); second level (C), (M), (R, T, V); leaves (A), (D), (G, J, K), (N), (P), (S), (U), (X, Y, Z).
b. Insert result of a B-tree with minimum degree of 3 (insert-fixing method): root (G, M, P, T); leaves (A, C, D, E), (J, K), (N, O), (R, S), (U, V, X, Y, Z).

The program simply calls an internal function and then fixes the root of the result. The internal ‘ins’ function examines whether the current node is a leaf. A leaf contains only keys, so we can locate the position and insert the new key there. Otherwise, we partition the node into 3 parts: the left part, the child on which the recursive insertion will be performed, and the right part. The program does the recursive insertion and then combines these three parts into a new node. Fixing happens during the combination.

Function ‘ordered-insert’ traverses an ordered list and inserts the new key at the proper position, as below.

(define (ordered-insert lst x)
  (define (insert-by less-p lst x)
    (if (null? lst)
        (list x)
        (if (less-p x (car lst))
            (cons x lst)
            (cons (car lst) (insert-by less-p (cdr lst) x)))))
  (if (string? x)
      (insert-by string<? lst x)
      (insert-by < lst x)))

In order to deal with B-trees whose keys are either strings or numbers, we abstract the less-than function as a parameter and pass it to an internal function.
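The same idea can be sketched in Python, used here only for illustration. The names `insert_by` and `ordered_insert` mirror the Scheme ones, and the default comparison is an assumption of this sketch. Because the comparison is a parameter, one traversal serves both string and numeric keys.

```python
def insert_by(less, lst, x):
    """Return a new list with x placed before the first element it is less than."""
    if not lst:
        return [x]
    if less(x, lst[0]):
        return [x] + lst
    return [lst[0]] + insert_by(less, lst[1:], x)


def ordered_insert(lst, x, less=lambda a, b: a < b):
    """Insert x into the ordered list lst; `less` is the abstracted comparison."""
    return insert_by(less, lst, x)
```

A caller that needs a different ordering, for example a case-insensitive one, only has to pass another `less` function.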

Function ‘partition-by’ uses a similar approach.


(define (partition-by tr x)
  (define (part-by pred tr x)
    (if (= (length tr) 1)
        (list '() (car tr) '())
        (if (pred (cadr tr) x)
            (let* ((res (part-by pred (cddr tr) x))
                   (left (car res))
                   (c (cadr res))
                   (right (caddr res)))
              (list (cons-pair (car tr) (cadr tr) left) c right))
            (list '() (car tr) (cdr tr)))))
  (if (string? x)
      (part-by string<? tr x)
      (part-by < tr x)))

Here ‘cons-pair’ is a helper function which puts a child and a key in front of a B-tree.

(define (cons-pair c k lst)
  (cons c (cons k lst)))
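To make the partitioning concrete, here is a small Python sketch of the same recursion. It assumes, as the Scheme code does, that a branch node is a flat list of the form [c1, k1, c2, k2, ..., cn]; the function name `partition_by` and the use of the built-in `<` are choices made for this illustration only.

```python
def partition_by(node, x):
    """Split the flat list [c1, k1, c2, ..., cn] into (left, child, right).

    `child` is the sub-tree that x should be inserted into; `left` and
    `right` are the remaining pieces, still in flat-list form.
    """
    if len(node) == 1:                 # only one child left: it must take x
        return [], node[0], []
    if node[1] < x:                    # first key is smaller: keep searching right
        left, child, right = partition_by(node[2:], x)
        return [node[0], node[1]] + left, child, right
    return [], node[0], node[1:]       # x belongs to the first child
```

For any key x, concatenating `left`, `[child]` and `right` gives back the original node, which is what lets the insertion rebuild the node after the recursive call.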

In order to fix the root of a B-tree when it contains too many keys, a ‘fix-root’ function is provided.

(define (full? tr t) ;; t: minimum degree
  (> (length (keys tr))
     (- (* 2 t) 1)))

(define (fix-root tr t)
  (cond ((full? tr t) (split tr t))
        (else tr)))

When we turn the result of the recursive insertion into a new node, we need to do fixing if the resulting node contains too many keys.

(define (make-btree l c r t)
  (cond ((full? c t) (fix-full l c r t))
        (else (append l (cons c r)))))

(define (fix-full l c r t)
  (append l (split c t) r))

With all the above facilities, we can test the program for verification. In order to build the B-tree easily from a list of keys, some simple helper functions are given.

(define (list->btree lst t)
  (fold-left (lambda (tr x) (btree-insert tr x t)) '() lst))

(define (str->slist s)
  (if (string-null? s)
      '()
      (cons (string-head s 1) (str->slist (string-tail s 1)))))

A simple test case similar to the Haskell one is fed to our program.

(define (test-insert)
  (list->btree (str->slist "GMPXACDEJKNORSTUVYZBFHIQW") 3))

Evaluating the ‘test-insert’ function yields a B-tree.

((("A" "B") "C" ("D" "E" "F") "G" ("H" "I" "J" "K")) "M"
 (("N" "O") "P" ("Q" "R" "S") "T" ("U" "V") "W" ("X" "Y" "Z")))

It is the same as the result output by the Haskell program.

4 Deletion

Deletion is another basic operation of B-trees. Deleting a key from a B-tree may violate the B-tree balance properties, because a node can’t contain too few keys (no fewer than t − 1 keys, where t is the minimum degree).
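As a quick illustration of these bounds, the following Python sketch checks whether a node with a given number of keys is legal for minimum degree t. The function name and the `is_root` flag are assumptions of this sketch; the relaxed lower bound for the root (one key in a non-empty tree) follows the B-tree definition in [1].

```python
def key_count_ok(num_keys, t, is_root=False):
    """True if a node holding num_keys keys respects the B-tree bounds."""
    lower = 1 if is_root else t - 1    # the root of a non-empty tree may hold a single key
    upper = 2 * t - 1                  # no node may hold more than 2t - 1 keys
    return lower <= num_keys <= upper
```

With t = 3, a non-root node holding 2 keys is legal, but removing one more key from it leaves 1 key, which is exactly the situation the deletion algorithms below have to prevent or repair.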

Similar to the approaches for insertion, we can either do some preparation so that the node from which the key will be deleted contains enough keys, or do some fixing after the deletion if the node has too few keys.

4.1 Merge before delete method

In the textbook[1], the delete algorithm is given as a description in prose, and the pseudo code is left as an exercise. The description is a good reference when writing the pseudo code.

4.1.1 Merge before delete algorithm implemented imperatively

The first case is trivial: if the key k to be deleted can be located in node x and x is a leaf node, we can directly remove k from x.

Note that this is a terminal case. For most B-trees, whose root is not a leaf node, the program will first examine non-leaf nodes.

The second case states that the key k can be located in node x, but x isn’t a leaf node. In this case, there are 3 sub-cases.

• If the child node y that precedes k contains enough keys (at least t), we replace k in node x with k′, which is the predecessor of k in child y, and recursively remove k′ from y.

When y is a leaf, the predecessor of k is simply the last key of y; otherwise it is the largest key in the sub-tree rooted at y.

• If y doesn’t contain enough keys, while the child node z that follows k contains at least t keys, we replace k in node x with k′′, which is the successor of k in child z, and recursively remove k′′ from z.

When z is a leaf, the successor of k is simply the first key of z; otherwise it is the smallest key in the sub-tree rooted at z.

• Otherwise, if neither y nor z contains enough keys, we can merge y, k and z into one new node, so that this new node contains 2t − 1 keys. After that, we can recursively do the removing.

Note that after the merge, if the current node doesn’t contain any keys, which means k was the only key in x, and y and z were the only two children of x, we need to shrink the tree height by one.
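The predecessor and successor used in cases 2a and 2b can be located by walking down one edge of the sub-tree. The Python sketch below assumes a node is a plain dictionary with a `keys` list and a `children` list (empty for a leaf); this representation and the names `max_key` and `min_key` are hypothetical and chosen only for this illustration.

```python
def max_key(node):
    """Largest key in the sub-tree rooted at node (the predecessor in case 2a)."""
    while node["children"]:            # keep following the right-most child
        node = node["children"][-1]
    return node["keys"][-1]


def min_key(node):
    """Smallest key in the sub-tree rooted at node (the successor in case 2b)."""
    while node["children"]:            # keep following the left-most child
        node = node["children"][0]
    return node["keys"][0]
```

When the child is a leaf the loops do not run at all, and the functions reduce to taking the last or the first key of the child.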

Case 2 is illustrated in figures 7, 8, and 9.

Figure 7: case 2a. Replace and delete from predecessor.

Figure 8: case 2b. Replace and delete from successor.

Figure 9: case 2c. Merge and delete.

Note that although we delete keys recursively in case 2, the recursion can be turned into a purely imperative form. We’ll show such a program in the C++ implementation.

The last case states that if k can’t be located in node x, the algorithm needs to find a child node c_i of x such that the sub-tree c_i may contain k. Before the deletion is recursively applied in c_i, we need to be sure that there are at least t keys in c_i. If there are not enough keys, we need to do the following adjustment.

• We check the two siblings of c_i, which are c_{i−1} and c_{i+1}. If either one of them contains enough keys (at least t keys), we move one key from x down to c_i, and move one key from the sibling up to x. We also need to move the corresponding child from the sibling to c_i.

This operation gives c_i enough keys for deletion. We can next try to delete k from c_i recursively.

• In case neither of the two siblings contains enough keys, we merge c_i, a key from x, and either one of the siblings into a new node, and do the deletion on this new node.

Case 3 is illustrated in figures 10 and 11.
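The borrowing step of case 3a can be tried on concrete data with a short Python sketch. It assumes nodes are plain dictionaries with `keys` and `children` lists, which is a hypothetical representation for this illustration, and it only covers borrowing from the left sibling; the right-sibling case is symmetric.

```python
def borrow_from_left(parent, i):
    """Move one key from the left sibling of child i up to the parent,
    and the separating key of the parent down into child i."""
    child = parent["children"][i]
    left = parent["children"][i - 1]
    child["keys"].insert(0, parent["keys"][i - 1])   # separator goes down
    parent["keys"][i - 1] = left["keys"].pop()       # last key of sibling goes up
    if left["children"]:                             # branch nodes also move a child
        child["children"].insert(0, left["children"].pop())
```

For a parent with keys (T, X) and leaves (Q, R, S), (U, V), (Y, Z), borrowing for the middle child moves S up and T down, giving keys (S, X) and leaves (Q, R), (T, U, V), (Y, Z).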

Figure 10: case 3a. Borrow from left sibling.

Figure 11: case 3b. Merge and delete.

By implementing the above 3 cases in pseudo code, the B-tree delete algorithm can be given as follows.

First there are some auxiliary functions to do some simple tests and operations on a B-tree.

function CAN-DEL(T)
    return number of keys of T ≥ t
end function

Function CAN-DEL tests whether a B-tree node contains enough keys (no fewer than t keys).

procedure MERGE-CHILDREN(T, i)    ▷ Merge children i and i + 1
    x ← CHILDREN(T)[i]
    y ← CHILDREN(T)[i + 1]
    APPEND(KEYS(x), KEYS(T)[i])
    CONCAT(KEYS(x), KEYS(y))
    CONCAT(CHILDREN(x), CHILDREN(y))
    REMOVE(KEYS(T), i)
    REMOVE(CHILDREN(T), i + 1)
end procedure

Procedure MERGE-CHILDREN merges the i-th child, the i-th key, and the (i + 1)-th child of node T into a new child, and removes the i-th key and the (i + 1)-th child after merging.

With these helper functions, the main algorithm of B-tree deletion is described below.

function B-TREE-DELETE(T, k)
    i ← 1
    while i ≤ LENGTH(KEYS(T)) do
        if k = KEYS(T)[i] then
            if T is leaf then    ▷ case 1
                REMOVE(KEYS(T), k)
            else    ▷ case 2
                if CAN-DEL(CHILDREN(T)[i]) then    ▷ case 2a
                    KEYS(T)[i] ← LAST-KEY(CHILDREN(T)[i])
                    B-TREE-DELETE(CHILDREN(T)[i], KEYS(T)[i])
                else if CAN-DEL(CHILDREN(T)[i + 1]) then    ▷ case 2b
                    KEYS(T)[i] ← FIRST-KEY(CHILDREN(T)[i + 1])
                    B-TREE-DELETE(CHILDREN(T)[i + 1], KEYS(T)[i])
                else    ▷ case 2c
                    MERGE-CHILDREN(T, i)
                    B-TREE-DELETE(CHILDREN(T)[i], k)
                    if KEYS(T) = NIL then
                        T ← CHILDREN(T)[i]    ▷ Shrinks height
                    end if
                end if
            end if
            return T
        else if k < KEYS(T)[i] then
            BREAK
        else
            i ← i + 1
        end if
    end while

    if T is leaf then
        return T    ▷ k doesn’t exist in T at all
    end if
    if not CAN-DEL(CHILDREN(T)[i]) then    ▷ case 3
        if i > 1 and CAN-DEL(CHILDREN(T)[i − 1]) then    ▷ case 3a: left sibling
            INSERT(KEYS(CHILDREN(T)[i]), KEYS(T)[i − 1])
            KEYS(T)[i − 1] ← POP-BACK(KEYS(CHILDREN(T)[i − 1]))
            if CHILDREN(T)[i] isn’t leaf then
                c ← POP-BACK(CHILDREN(CHILDREN(T)[i − 1]))
                INSERT(CHILDREN(CHILDREN(T)[i]), c)
            end if
        else if i < LENGTH(CHILDREN(T)) and CAN-DEL(CHILDREN(T)[i + 1]) then    ▷ case 3a: right sibling
            APPEND(KEYS(CHILDREN(T)[i]), KEYS(T)[i])
            KEYS(T)[i] ← POP-FRONT(KEYS(CHILDREN(T)[i + 1]))
            if CHILDREN(T)[i] isn’t leaf then
                c ← POP-FRONT(CHILDREN(CHILDREN(T)[i + 1]))
                APPEND(CHILDREN(CHILDREN(T)[i]), c)
            end if
        else    ▷ case 3b
            if i > 1 then
                MERGE-CHILDREN(T, i − 1)
                i ← i − 1    ▷ the merged child is now at position i − 1
            else
                MERGE-CHILDREN(T, i)
            end if
        end if
    end if
    B-TREE-DELETE(CHILDREN(T)[i], k)    ▷ recursive delete
    if KEYS(T) = NIL then    ▷ Shrinks height
        T ← CHILDREN(T)[1]
    end if
    return T
end function

4.1.2 Merge before deletion algorithm implemented in C++

The C++ implementation given here doesn’t simply translate the above pseudo code into C++. The recursion can be eliminated in a purely imperative program.

In order to simplify some B-tree node operations, some auxiliary member functions are added to the B-tree node class definition.

template<class K, int t>
struct BTree{
  //...
  // merge children[i], keys[i], and children[i+1] to one node
  void merge_children(int i){
    BTree<K, t>* x = children[i];
    BTree<K, t>* y = children[i+1];
    x->keys.push_back(keys[i]);
    concat(x->keys, y->keys);
    concat(x->children, y->children);
    keys.erase(keys.begin()+i);
    children.erase(children.begin()+i+1);
    y->children.clear();
    delete y;
  }

  key_type replace_key(int i, key_type key){
    keys[i]=key;
    return key;
  }

  bool can_remove(){ return keys.size() >=t; }
  //...

Function ‘replace_key’ updates the i-th key of a node with a new value. Typically, this new value is pulled from a child node as described in the deletion algorithm. It returns the new value.

Function ‘can_remove’ tests whether a node contains enough keys for further deletion.

Function ‘merge_children’ merges the i-th child, the i-th key, and the (i + 1)-th child into one node. This operation is the reverse of splitting. It can double the number of keys of a node, so that such adjustment ensures a node has enough keys for further deleting.

Note that, unlike the other languages equipped with GC, in the C++ program the memory must be released after merging.

This function uses the ‘concat’ function to concatenate two collections. It is defined as follows.

template<class Coll>
void concat(Coll& x, Coll& y){
  std::copy(y.begin(), y.end(),
            std::insert_iterator<Coll>(x, x.end()));
}

With these helper functions, the main program of B-tree deleting is given below.

template<class T>
T* del(T* tr, typename T::key_type key){
  T* root(tr);
  while(!tr->leaf()){
    unsigned int i = 0;
    bool located(false);
    while(i < tr->keys.size()){
      if(key == tr->keys[i]){
        located = true;
        if(tr->children[i]->can_remove()){ // case 2a
          // the last key of the child is the predecessor when the child is a leaf
          key = tr->replace_key(i, tr->children[i]->keys.back());
          tr->children[i]->keys.pop_back();
          tr = tr->children[i];
        }
        else if(tr->children[i+1]->can_remove()){ // case 2b
          // the first key of the child is the successor when the child is a leaf
          key = tr->replace_key(i, tr->children[i+1]->keys.front());
          tr->children[i+1]->keys.erase(tr->children[i+1]->keys.begin());
          tr = tr->children[i+1];
        }
        else{ // case 2c
          tr->merge_children(i);
          if(tr->keys.empty()){ // shrinks height
            T* temp = tr->children[0];
            tr->children.clear();
            if(root == tr)
              root = temp; // keep root valid when the old root is freed
            delete tr;
            tr = temp;
          }
        }
        break;
      }
      else if(key > tr->keys[i])
        i++;
      else
        break;
    }
    if(located)
      continue;
    if(!tr->children[i]->can_remove()){ // case 3
      if(i>0 && tr->children[i-1]->can_remove()){
        // case 3a: left sibling
        tr->children[i]->keys.insert(tr->children[i]->keys.begin(),
                                     tr->keys[i-1]);
        tr->keys[i-1] = tr->children[i-1]->keys.back();
        tr->children[i-1]->keys.pop_back();
        if(!tr->children[i]->leaf()){
          tr->children[i]->children.insert(tr->children[i]->children.begin(),
                                           tr->children[i-1]->children.back());
          tr->children[i-1]->children.pop_back();
        }
      }
      else if(i+1<tr->children.size() && tr->children[i+1]->can_remove()){
        // case 3a: right sibling
        tr->children[i]->keys.push_back(tr->keys[i]);
        tr->keys[i] = tr->children[i+1]->keys.front();
        tr->children[i+1]->keys.erase(tr->children[i+1]->keys.begin());
        if(!tr->children[i]->leaf()){
          tr->children[i]->children.push_back(tr->children[i+1]->children.front());
          tr->children[i+1]->children.erase(tr->children[i+1]->children.begin());
        }
      }
      else{ // case 3b
        if(i>0){
          tr->merge_children(i-1);
          --i; // the merged node now sits at index i-1
        }
        else
          tr->merge_children(i);
      }
    }
    tr = tr->children[i];
  }
  tr->keys.erase(std::remove(tr->keys.begin(), tr->keys.end(), key),
                 tr->keys.end());
  if(root->keys.empty() && !root->leaf()){ // shrinks height
    T* temp = root->children[0];
    root->children.clear();
    delete root;
    root = temp;
  }
  return root;
}

Please note how the recursion is eliminated. The main loop terminates only when the node being examined is a leaf. Otherwise, the program goes through the B-tree along the path which may contain the key to be deleted, and does the proper adjustment, including borrowing keys from other nodes or merging, so that the candidate nodes along this path all have enough keys to perform deleting.

In order to verify this program, a quick and simple parsing function which can turn a B-tree description string into a B-tree is provided. Error handling of parsing is omitted for illustration purposes.
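The description format is simple enough to sketch in a few lines of Python before reading the C++ version: a node is a parenthesized, comma-separated sequence in which a nested parenthesis is a child and anything else is a key. The dictionary representation and the `(node, next_position)` return value are assumptions of this sketch, not part of the C++ program.

```python
def parse(s, pos=0):
    """Parse a node starting at s[pos] == '('; return (node, next position)."""
    node = {"keys": [], "children": []}
    pos += 1                                   # skip '('
    while pos < len(s):
        ch = s[pos]
        if ch == "(":                          # a nested node is a child
            child, pos = parse(s, pos)
            node["children"].append(child)
        elif ch in ", ":                       # skip delimiters
            pos += 1
        elif ch == ")":                        # end of this node
            return node, pos + 1
        else:                                  # a key runs up to ',' or ')'
            start = pos
            while pos < len(s) and s[pos] not in ",)":
                pos += 1
            node["keys"].append(s[start:pos].strip())
    raise ValueError("unbalanced parentheses")
```

Unlike the C++ function, this sketch raises an error on unbalanced input instead of returning a null pointer.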

template<class T>
T* parse(std::string::iterator& first, std::string::iterator last){
  T* tr = new T;
  ++first; // '('
  while(first != last){
    if(*first=='('){ // child
      tr->children.push_back(parse<T>(first, last));
    }
    else if(*first == ',' || *first == ' ')
      ++first; // skip delimiter
    else if(*first == ')'){
      ++first;
      return tr;
    }
    else{ // key
      typename T::key_type key;
      while(*first != ',' && *first != ')')
        key+=*first++;
      tr->keys.push_back(key);
    }
  }
  // should never run here
  return 0;
}

template<class T>
T* str_to_btree(std::string s){
  std::string::iterator first(s.begin());
  return parse<T>(first, s.end());
}

After that, the testing can be performed as below.

// test_del is defined first so that test_delete can call it
template<class T>
T* test_del(T* tr, typename T::key_type key){
  std::cout<<"delete "<<key<<"==>\n";
  tr = del(tr, key);
  std::cout<<btree_to_str(tr)<<"\n";
  return tr;
}

void test_delete(){
  std::cout<<"test delete...\n";
  const char* s="(((A, B), C, (D, E, F), G, (J, K, L), M, (N, O)), "
                "P, ((Q, R, S), T, (U, V), X, (Y, Z)))";
  typedef BTree<std::string, 3> BTr;
  BTr* tr = str_to_btree<BTr>(s);
  std::cout<<"before delete:\n"<<btree_to_str(tr)<<"\n";
  const char* ks[] = {"F", "M", "G", "D", "B", "U"};
  for(unsigned int i=0; i<sizeof(ks)/sizeof(char*); ++i)
    tr=test_del(tr, ks[i]);
  delete tr;
}

Running ‘test_delete’ generates the result below.

test delete...
before delete:
(((A, B), C, (D, E, F), G, (J, K, L), M, (N, O)), P, ((Q, R, S), T, (U, V), X, (Y, Z)))
delete F==>
(((A, B), C, (D, E), G, (J, K, L), M, (N, O)), P, ((Q, R, S), T, (U, V), X, (Y, Z)))
delete M==>
(((A, B), C, (D, E), G, (J, K), L, (N, O)), P, ((Q, R, S), T, (U, V), X, (Y, Z)))
delete G==>
(((A, B), C, (D, E, J, K), L, (N, O)), P, ((Q, R, S), T, (U, V), X, (Y, Z)))
delete D==>
((A, B), C, (E, J, K), L, (N, O), P, (Q, R, S), T, (U, V), X, (Y, Z))
delete B==>
((A, C), E, (J, K), L, (N, O), P, (Q, R, S), T, (U, V), X, (Y, Z))
delete U==>
((A, C), E, (J, K), L, (N, O), P, (Q, R), S, (T, V), X, (Y, Z))

Figures 12, 13, and 14 show this deleting test process step by step. The modified nodes are shaded. The first 5 steps are the same as the example shown in figure 18.8 of the textbook[1].

Figure 12: Result of B-tree deleting program (1).
a. A B-tree before performing deleting: (((A, B), C, (D, E, F), G, (J, K, L), M, (N, O)), P, ((Q, R, S), T, (U, V), X, (Y, Z)));
b. After deleting key ’F’, case 1: (((A, B), C, (D, E), G, (J, K, L), M, (N, O)), P, ((Q, R, S), T, (U, V), X, (Y, Z))).

4.1.3 Merge before deletion algorithm implemented in Python

In the Python implementation, detailed memory management is handled by GC. Similar to the C++ program, some auxiliary member functions are added to the B-tree node definition.

class BTreeNode:
    #...
    def merge_children(self, i):
        #merge children[i] and children[i+1] by pushing keys[i] down
        self.children[i].keys += [self.keys[i]]+self.children[i+1].keys
        self.children[i].children += self.children[i+1].children
        self.keys.pop(i)
        self.children.pop(i+1)

    def replace_key(self, i, key):
        self.keys[i] = key
        return key

    def can_remove(self):
        return len(self.keys) >= self.t

Figure 13: Result of B-tree deleting program (2).
c. After deleting key ’M’, case 2a: (((A, B), C, (D, E), G, (J, K), L, (N, O)), P, ((Q, R, S), T, (U, V), X, (Y, Z)));
d. After deleting key ’G’, case 2c: (((A, B), C, (D, E, J, K), L, (N, O)), P, ((Q, R, S), T, (U, V), X, (Y, Z))).

Figure 14: Result of B-tree deleting program (3).
e. After deleting key ’D’, case 3b, and the height is shrunk: ((A, B), C, (E, J, K), L, (N, O), P, (Q, R, S), T, (U, V), X, (Y, Z));
f. After deleting key ’B’, case 3a, borrow from right sibling: ((A, C), E, (J, K), L, (N, O), P, (Q, R, S), T, (U, V), X, (Y, Z));
g. After deleting key ’U’, case 3a, borrow from left sibling: ((A, C), E, (J, K), L, (N, O), P, (Q, R), S, (T, V), X, (Y, Z)).

The member function names are the same as in the C++ program, so their meaning can be found in the previous sub-section.

In contrast to the C++ program, a recursive approach similar to the pseudo code is used in this Python program.

def B_tree_delete(tr, key):
    i = len(tr.keys)
    while i>0:
        if key == tr.keys[i-1]:
            if tr.leaf:  # case 1 in CLRS
                tr.keys.remove(key)
                #disk_write(tr)
            else: # case 2 in CLRS
                if tr.children[i-1].can_remove(): # case 2a
                    # the last key of the child is the predecessor when the child is a leaf
                    key = tr.replace_key(i-1, tr.children[i-1].keys[-1])
                    B_tree_delete(tr.children[i-1], key)
                elif tr.children[i].can_remove(): # case 2b
                    # the first key of the child is the successor when the child is a leaf
                    key = tr.replace_key(i-1, tr.children[i].keys[0])
                    B_tree_delete(tr.children[i], key)
                else: # case 2c
                    tr.merge_children(i-1)
                    B_tree_delete(tr.children[i-1], key)
                    if tr.keys==[]: # tree shrinks in height
                        tr = tr.children[i-1]
            return tr
        elif key > tr.keys[i-1]:
            break
        else:
            i = i-1
    # case 3
    if tr.leaf:
        return tr #key doesn't exist at all
    if not tr.children[i].can_remove():
        if i>0 and tr.children[i-1].can_remove(): #left sibling
            tr.children[i].keys.insert(0, tr.keys[i-1])
            tr.keys[i-1] = tr.children[i-1].keys.pop()
            if not tr.children[i].leaf:
                tr.children[i].children.insert(0, tr.children[i-1].children.pop())
        elif i+1<len(tr.children) and tr.children[i+1].can_remove(): #right sibling
            tr.children[i].keys.append(tr.keys[i])
            tr.keys[i]=tr.children[i+1].keys.pop(0)
            if not tr.children[i].leaf:
                tr.children[i].children.append(tr.children[i+1].children.pop(0))
        else: # case 3b
            if i>0:
                tr.merge_children(i-1)
                i = i-1 # the merged node now sits at index i-1
            else:
                tr.merge_children(i)
    B_tree_delete(tr.children[i], key)
    if tr.keys==[]: # tree shrinks in height
        tr = tr.children[0]
    return tr

In order to verify the deletion program, similar test cases are fed to the function.

def test_delete():
    print "test delete"
    t = 3
    tr = BTreeNode(t, False)
    tr.keys=["P"]
    tr.children=[BTreeNode(t, False), BTreeNode(t, False)]
    tr.children[0].keys=["C", "G", "M"]
    tr.children[0].children=[BTreeNode(t), BTreeNode(t), BTreeNode(t), BTreeNode(t)]
    tr.children[0].children[0].keys=["A", "B"]
    tr.children[0].children[1].keys=["D", "E", "F"]
    tr.children[0].children[2].keys=["J", "K", "L"]
    tr.children[0].children[3].keys=["N", "O"]
    tr.children[1].keys=["T", "X"]
    tr.children[1].children=[BTreeNode(t), BTreeNode(t), BTreeNode(t)]
    tr.children[1].children[0].keys=["Q", "R", "S"]
    tr.children[1].children[1].keys=["U", "V"]
    tr.children[1].children[2].keys=["Y", "Z"]
    print B_tree_to_str(tr)
    lst = ["F", "M", "G", "D", "B", "U"]
    reduce(test_del, lst, tr)

def test_del(tr, key):
    print "delete", key
    tr = B_tree_delete(tr, key)
    print B_tree_to_str(tr)
    return tr


In this test case, the B-tree is constructed manually. It is identical to the B-tree built in the C++ deleting test case. Running the test function generates the following result.

test delete
(((A, B), C, (D, E, F), G, (J, K, L), M, (N, O)), P, ((Q, R, S), T, (U, V), X, (Y, Z)))
delete F
(((A, B), C, (D, E), G, (J, K, L), M, (N, O)), P, ((Q, R, S), T, (U, V), X, (Y, Z)))
delete M
(((A, B), C, (D, E), G, (J, K), L, (N, O)), P, ((Q, R, S), T, (U, V), X, (Y, Z)))
delete G
(((A, B), C, (D, E, J, K), L, (N, O)), P, ((Q, R, S), T, (U, V), X, (Y, Z)))
delete D
((A, B), C, (E, J, K), L, (N, O), P, (Q, R, S), T, (U, V), X, (Y, Z))
delete B
((A, C), E, (J, K), L, (N, O), P, (Q, R, S), T, (U, V), X, (Y, Z))
delete U
((A, C), E, (J, K), L, (N, O), P, (Q, R), S, (T, V), X, (Y, Z))

This result is the same as the one output by the C++ program.

4.2 Delete and fix method

From the previous sub-sections, we see how complex the deletion algorithm is. There are several cases, and in each case there are sub-cases to deal with.

Another approach to designing the deleting algorithm is a kind of delete-then-fix way. It is similar to the insert-then-fix strategy.

When we need to delete a key from a B-tree, we first try to locate the node which contains this key. This is a traversal from the root node towards the leaves. We start from the root node. If the key doesn’t exist in the node, we traverse deeper and deeper until we reach a node that contains it.

If this node is a leaf node, we can remove the key directly, and then examine whether the deletion leaves the node with too few keys to maintain the B-tree balance properties.

If it is a branch node, removing the key will break the node into two parts, which we need to merge together. The merging is a recursive process, as shown in figure 15.

When merging, if the two nodes are not leaves, we merge the keys together, and recursively merge the last child of the left part and the first child of the right part into one new child node. Otherwise, if they are leaves, we merely put all the keys together.
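The recursive merging on its own, without any fixing, can be sketched in Python as follows. The dictionary representation of a node (`keys` and `children` lists, with an empty `children` list for a leaf) is a hypothetical choice for this illustration; the fixing of nodes that end up with too many or too few keys is deliberately left out here.

```python
def merge(left, right):
    """Merge the two parts left behind after removing a key from a branch node."""
    if not left["children"] and not right["children"]:
        # two leaves: just put all keys together
        return {"keys": left["keys"] + right["keys"], "children": []}
    # two branches: merge the last child of left with the first child of right
    middle = merge(left["children"][-1], right["children"][0])
    return {"keys": left["keys"] + right["keys"],
            "children": left["children"][:-1] + [middle] + right["children"][1:]}
```

For example, removing G from a node with keys (C, G, M) and leaves (A, B), (D, E, F), (J, K, L), (N, O) leaves a left part with key C and a right part with key M; merging them gives keys (C, M) and leaves (A, B), (D, E, F, J, K, L), (N, O). The middle leaf now holds 6 keys, which is why a fixing step is needed along the way.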

So far, the deleting is done in a straightforward way. However, deleting decreases the number of keys of a node, and it may violate the B-tree balance properties. The solution is to perform fixing along the path we traversed from the root.

When we do recursive deletion, the branch node is broken into 3 parts. The left part contains all keys less than k, say k_1, k_2, ..., k_{i−1}, and children c_1, c_2, ..., c_{i−1}; the right part contains all keys greater than k, say k_i, k_{i+1}, ..., k_n, and children c_{i+1}, c_{i+2}, ..., c_{n+1}; the child c_i, on which the recursive deleting is applied, becomes c′_i. We need to make these 3 parts into a new node, as shown in figure 16.

Figure 15: Delete a key from a branch node. Removing k_i breaks the node into 2 parts, left part and right part. Merging these 2 parts is a recursive process. When the two parts are leaves, the merging terminates.

Figure 16: Denote c′_i as the result of recursively deleting key k from child c_i. We should do fixing when making the left part, c′_i and the right part together into a new node.

At this point, we can examine whether c′_i contains enough keys. If the number of keys is too small (less than t − 1, rather than t as in the merge-before-delete approach), we can borrow a key-child pair from either the left part or the right part, and do the inverse operation of splitting. Figure 17 shows an example of borrowing from the left part.

In case both the left part and the right part are empty, we can simply push c′_i up.
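The borrow-and-un-split step can be illustrated with a short Python sketch. Nodes are assumed to be dictionaries with `keys` and `children` lists, and the left part is passed as two separate lists; these conventions and the name `borrow_from_left_part` are hypothetical and used only here.

```python
def unsplit(c1, k, c2):
    """Inverse of splitting: glue c1, the key k and c2 into one node."""
    return {"keys": c1["keys"] + [k] + c2["keys"],
            "children": c1["children"] + c2["children"]}


def borrow_from_left_part(left_keys, left_children, c):
    """Take the last key-child pair of the left part and un-split it with c.

    Returns the shortened left part and the enlarged child.
    """
    enlarged = unsplit(left_children[-1], left_keys[-1], c)
    return left_keys[:-1], left_children[:-1], enlarged
```

With t = 3, a child holding the single key N is too low. If the left part has keys (C, L) and children (A, B), (E, J, K), borrowing gives a left part with key C and child (A, B), and an enlarged child (E, J, K, L, N). The enlarged child may itself be too full in other situations, so the result still has to go through the same full/low check.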

4.2.1 Delete and fix algorithm implemented functionally

By summarizing all the above analysis, we can draft the delete and fix algorithm.

function B-TREE-DELETE’(T, k)
    return FIX-ROOT(DEL(T, k))
end function

function DEL(T, k)
    if CHILDREN(T) = NIL then    ▷ leaf node
        DELETE(KEYS(T), k)
        return T
    else    ▷ branch node
        n ← LENGTH(KEYS(T))
        i ← LOWER-BOUND(KEYS(T), k)
        if KEYS(T)[i] = k then
            kl ← KEYS(T)[1, ..., i − 1]
            kr ← KEYS(T)[i + 1, ..., n]
            cl ← CHILDREN(T)[1, ..., i]
            cr ← CHILDREN(T)[i + 1, ..., n + 1]
            return MERGE(CREATE-B-TREE(kl, cl), CREATE-B-TREE(kr, cr))
        else
            kl ← KEYS(T)[1, ..., i − 1]
            kr ← KEYS(T)[i, ..., n]
            c ← DEL(CHILDREN(T)[i], k)    ▷ recursive delete
            cl ← CHILDREN(T)[1, ..., i − 1]
            cr ← CHILDREN(T)[i + 1, ..., n + 1]
            return MAKE-B-TREE((kl, cl), c, (kr, cr))
        end if
    end if
end function

Figure 17: Borrow a key-child pair from left part and un-split to a new child.

The main delete function calls an internal DEL function to perform the work. After that, it applies FIX-ROOT to check whether the tree height needs to shrink. So the FIX-ROOT function we defined in the insertion section should be updated as follows.

function FIX-ROOT(T)
    if KEYS(T) = NIL then    ▷ Single child, shrink the height
        T ← CHILDREN(T)[1]
    else if FULL?(T) then
        T ← B-TREE-SPLIT(T)
    end if
    return T
end function

For the recursive merging, the algorithm is given below. The left part and right part are passed as arguments. If they are leaves, we just put all keys together. Otherwise, we recursively merge the last child of the left part and the first child of the right part into a new child, and make a new node from this merged child and the two remaining parts.

function MERGE(L, R)
    if L, R are leaves then
        T ← CREATE-NEW-NODE()
        KEYS(T) ← KEYS(L) + KEYS(R)
        return T
    else
        m ← LENGTH(KEYS(L))
        n ← LENGTH(KEYS(R))
        kl ← KEYS(L)
        kr ← KEYS(R)
        cl ← CHILDREN(L)[1, ..., m]    ▷ all children of L but the last
        cr ← CHILDREN(R)[2, ..., n + 1]    ▷ all children of R but the first
        c ← MERGE(CHILDREN(L)[m + 1], CHILDREN(R)[1])
        return MAKE-B-TREE((kl, cl), c, (kr, cr))
    end if
end function

In order to make the three parts, the left L, the right R and the child c′_i, into a node, we need to examine whether c′_i contains enough keys. Together with the check during insertion that it doesn’t contain too many keys, the algorithm is updated as follows.

function MAKE-B-TREE(L, C, R)
    if FULL?(C) then
        return FIX-FULL(L, C, R)
    else if LOW?(C) then
        return FIX-LOW(L, C, R)
    else
        T ← CREATE-NEW-NODE()
        KEYS(T) ← KEYS(L) + KEYS(R)
        CHILDREN(T) ← CHILDREN(L) + [C] + CHILDREN(R)
        return T
    end if
end function

FIX-LOW is defined as follows. In case the left part isn’t empty, it borrows a key-child pair from the left, and does un-splitting to give the child enough keys, then recursively calls MAKE-B-TREE. If the left part is empty, it tries to borrow a key-child pair from the right part. If both sides are empty, it returns the child node as the result, so that the height shrinks.

function FIX-LOW(L, C, R)
    kl, cl ← L
    kr, cr ← R
    m ← LENGTH(kl)
    n ← LENGTH(kr)
    if kl ≠ NIL then
        k′l ← kl[1, ..., m − 1]
        c′l ← cl[1, ..., m − 1]
        C′ ← UN-SPLIT(cl[m], kl[m], C)
        return MAKE-B-TREE((k′l, c′l), C′, R)
    else if kr ≠ NIL then
        k′r ← kr[2, ..., n]
        c′r ← cr[2, ..., n]
        C′ ← UN-SPLIT(C, kr[1], cr[1])
        return MAKE-B-TREE(L, C′, (k′r, c′r))
    else
        return C
    end if
end function

Function UN-SPLIT is defined as the inverse operation of splitting.

function UN-SPLIT(L, k, R)
    T ← CREATE-B-TREE-NODE()
    KEYS(T) ← KEYS(L) + [k] + KEYS(R)
    CHILDREN(T) ← CHILDREN(L) + CHILDREN(R)
    return T
end function

4.2.2 Delete and fix algorithm implemented in Haskell

Based on the analysis of the delete-then-fixing approach, a Haskell program can be provided accordingly.

The core deleting function is simple. It just calls an internal removing function, then examines the root node to see if the height of the tree can be shrunk.

import qualified Data.List as L

delete :: (Ord a) => BTree a -> a -> BTree a
delete tr x = fixRoot $ del tr x

del :: (Ord a) => BTree a -> a -> BTree a
del (Node ks [] t) x = Node (L.delete x ks) [] t
del (Node ks cs t) x =
    case L.elemIndex x ks of
      Just i -> merge (Node (take i ks) (take (i+1) cs) t)
                      (Node (drop (i+1) ks) (drop (i+1) cs) t)
      Nothing -> make (ks', cs') (del c x) (ks'', cs'')
    where
      (ks', ks'') = L.partition (<x) ks
      (cs', (c:cs'')) = L.splitAt (length ks') cs

Let’s focus on the ‘del’ function, if try to delete a key from a leaf node, itjust calls delete function defined in Data.List library. If the key doesn’t existat all, the pre-defined delete function will simply return the list without anymodification. For the case of deleting a key from a branch node, it will firstexamine if the key can be located in this node, and apply recursive merge afterremove this key. Otherwise, it will locate the proper child and do recursivedelete-then-fixing on this child.

Note that the ‘partition’ and ‘splitAt’ functions defined in Data.List help to split the key list and the children list at the position where all elements on the left are less than the key, while those on the right are greater than the key.
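The effect of combining ‘partition’ and ‘splitAt’ can be sketched in Python 3 as follows. The function name `split_around` and the use of plain strings as placeholder children are assumptions for illustration only; `bisect_left` from the standard library plays the role of counting the keys less than x.

```python
from bisect import bisect_left

def split_around(keys, children, x):
    """Split a branch node around x, assuming keys is sorted and x is not in keys.

    Returns ((ks_left, cs_left), c, (ks_right, cs_right)), where c is the only
    child that may contain x.
    """
    n = bisect_left(keys, x)          # number of keys less than x
    return (keys[:n], children[:n]), children[n], (keys[n:], children[n + 1:])

# Placeholder strings stand for the child subtrees.
left, c, right = split_around(["C", "G"], ["AB", "DEF", "HIJK"], "E")
```

Here `c` is the placeholder "DEF", the only child in which E could be found, while `left` and `right` keep the remaining keys and children in their original order.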

The recursive merge program has two patterns: merging two leaves and merging two branches. It is given as the following.

    merge :: BTree a -> BTree a -> BTree a
    merge (Node ks [] t) (Node ks' [] _) = Node (ks++ks') [] t
    merge (Node ks cs t) (Node ks' cs' _) = make (ks, init cs)
                                                 (merge (last cs) (head cs'))
                                                 (ks', tail cs')

Here the ‘init’, ‘last’ and ‘tail’ functions, which manipulate lists, are defined in the Haskell Prelude.
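A minimal Python 3 sketch of the same recursion is given below. The `Node` class is a hypothetical stand-in for the Haskell data type, and the result is assembled directly instead of going through ‘make’, so no fixing of low or full nodes is performed in this sketch.

```python
class Node:
    """Hypothetical B-tree node: a key list and a child list (empty for a leaf)."""
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children or [])

def merge(a, b):
    """Merge two subtrees of equal height; every key in a is less than every key in b."""
    if not a.children:                             # two leaves: concatenate the keys
        return Node(a.keys + b.keys)
    mid = merge(a.children[-1], b.children[0])     # merge the two facing children
    return Node(a.keys + b.keys, a.children[:-1] + [mid] + b.children[1:])

# The two halves of a node (C G P) after key G has been removed.
a = Node("C", [Node("AB"), Node("DF")])
b = Node("P", [Node("HI"), Node("QR")])
m = merge(a, b)
```

The merged node `m` has the keys C and P, and the two children that used to surround G are combined into the single leaf (D F H I).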

The fixing part of delete-then-fixing is defined inside the ‘make’ function.

    make :: ([a], [BTree a]) -> BTree a -> ([a], [BTree a]) -> BTree a
    make (ks', cs') c (ks'', cs'')
        | full c = fixFull (ks', cs') c (ks'', cs'')
        | low c = fixLow (ks', cs') c (ks'', cs'')
        | otherwise = Node (ks'++ks'') (cs'++[c]++cs'') (degree c)

Here the function ‘low’ tests whether a node contains too few keys.

    low :: BTree a -> Bool
    low tr = (length $ keys tr) < (degree tr) - 1

The real fixing is implemented by trying to borrow keys from either the left sibling or the right sibling, as the following.

    fixLow :: ([a], [BTree a]) -> BTree a -> ([a], [BTree a]) -> BTree a
    fixLow (ks'@(_:_), cs') c (ks'', cs'') = make (init ks', init cs')
                                                  (unsplit (last cs') (last ks') c)
                                                  (ks'', cs'')
    fixLow (ks', cs') c (ks''@(_:_), cs'') = make (ks', cs')
                                                  (unsplit c (head ks'') (head cs''))
                                                  (tail ks'', tail cs'')
    fixLow _ c _ = c

Note that a pattern like ‘x@(_:_)’ ensures that ‘x’ is not empty. The function ‘unsplit’ used here performs the inverse of the splitting operation, as below.

    unsplit :: BTree a -> a -> BTree a -> BTree a
    unsplit c1 k c2 = Node ((keys c1)++[k]++(keys c2))
                           ((children c1)++(children c2)) (degree c1)

In order to verify the Haskell program, we can provide some simple test cases.

    import Control.Monad (foldM, mapM_)

    testDelete = foldM delShow (listToBTree "GMPXACDEJKNORSTUVYZBFHIQW" 3) "EGAMU"
        where
          delShow tr x = do
            let tr' = delete tr x
            putStrLn $ "delete " ++ (show x)
            putStrLn $ toString tr'
            return tr'

Here the functions ‘listToBTree’ and ‘toString’ are the ones defined in the previous section, where the insertion algorithm is explained.

Running this function generates the following result.

    delete 'E'
    ((('A', 'B'), 'C', ('D', 'F'), 'G', ('H', 'I', 'J', 'K')), 'M',
    (('N', 'O'), 'P', ('Q', 'R', 'S'), 'T', ('U', 'V'), 'W', ('X', 'Y', 'Z')))
    delete 'G'
    ((('A', 'B'), 'C', ('D', 'F'), 'H', ('I', 'J', 'K')), 'M',
    (('N', 'O'), 'P', ('Q', 'R', 'S'), 'T', ('U', 'V'), 'W', ('X', 'Y', 'Z')))
    delete 'A'
    (('B', 'C', 'D', 'F'), 'H', ('I', 'J', 'K'), 'M', ('N', 'O'),
    'P', ('Q', 'R', 'S'), 'T', ('U', 'V'), 'W', ('X', 'Y', 'Z'))
    delete 'M'
    (('B', 'C', 'D', 'F'), 'H', ('I', 'J', 'K', 'N', 'O'), 'P',
    ('Q', 'R', 'S'), 'T', ('U', 'V'), 'W', ('X', 'Y', 'Z'))
    delete 'U'
    (('B', 'C', 'D', 'F'), 'H', ('I', 'J', 'K', 'N', 'O'), 'P',
    ('Q', 'R', 'S', 'T', 'V'), 'W', ('X', 'Y', 'Z'))

If we delete the same keys from the same B-tree as in the merge-and-fixing approach, we find that the delete-then-fixing method gives a different result. Although the results are not the same, both satisfy the B-tree properties, so both are correct.

[Figure 18: Result of delete-then-fixing (1). a. A B-tree before performing deleting: (((A, B), C, (D, E, F), G, (H, I, J, K)), M, ((N, O), P, (Q, R, S), T, (U, V), W, (X, Y, Z))); b. After delete key ’E’.]

4.2.3 Delete and fix algorithm implemented in Scheme/Lisp

In order to implement the delete program in Scheme/Lisp, we provide an extra function to test whether a node contains too few keys after deletion.

    (define (low? tr t) ;; t: minimum degree
      (< (length (keys tr))
         (- t 1)))

And some general purpose list manipulation functions are defined.

    (define (rest lst k)
      (list-tail lst (- (length lst) k)))

    (define (except-rest lst k)
      (list-head lst (- (length lst) k)))

    (define (first lst)
      (if (null? lst) '() (car lst)))

    (define (last lst)
      (if (null? lst) '() (car (last-pair lst))))

    (define (inits lst)
      (if (null? lst) '() (except-last-pair lst)))

[Figure 19: Result of delete-then-fixing (2). c. After delete key ’G’; d. After delete key ’A’.]

[Figure 20: Result of delete-then-fixing (3). e. After delete key ’M’; f. After delete key ’U’.]

Function ‘rest’ extracts the last k elements from a list, while ‘except-rest’ extracts all except the last k elements. ‘first’ can be treated as a safe ‘car’: it returns the empty list instead of throwing an exception when the list is empty. Function ‘last’ returns the last element of a list, or the empty list if the list is empty. Function ‘inits’ returns all elements excluding the last one.
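For readers more familiar with Python, the same helpers can be sketched with list slicing (Python 3). The function names are transliterated from the Scheme ones for this example, and the empty Python list stands in for the Scheme empty list.

```python
def rest(lst, k):
    """The last k elements of lst."""
    return lst[len(lst) - k:]

def except_rest(lst, k):
    """Everything except the last k elements of lst."""
    return lst[:len(lst) - k]

def first(lst):
    """A safe 'car': the first element, or the empty list if lst is empty."""
    return lst[0] if lst else []

def last(lst):
    """The last element, or the empty list if lst is empty."""
    return lst[-1] if lst else []

def inits(lst):
    """All elements except the last one."""
    return lst[:-1]

node = [["A", "B"], "C", ["D"]]   # a child, a key, a child
```

The slices are written with `len(lst) - k` rather than `-k` so that k = 0 gives an empty result for `rest` and the whole list for `except_rest`.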

An inverse operation of splitting is also provided.

    (define (un-split lst)
      (let ((c1 (car lst))
            (k (cadr lst))
            (c2 (caddr lst)))
        (append c1 (list k) c2)))

The main function of deletion is defined as the following.

    (define (btree-delete tr x t)
      (define (del tr x)
        (if (leaf? tr)
            (delete x tr)
            (let* ((res (partition-by tr x))
                   (left (car res))
                   (c (cadr res))
                   (right (caddr res)))
              (if (equal? (first right) x)
                  (merge-btree (append left (list c)) (cdr right) t)
                  (make-btree left (del c x) right t)))))
      (fix-root (del tr x) t))

It is implemented in a similar way to insertion: it calls an internally defined ‘del’ function, then applies the fixing process to the result. In the internal deletion function, if the B-tree is a leaf node, the list deleting function from the standard library is applied. If it is a branch node, we call the ‘partition-by’ function defined previously. This function divides the node into 3 parts: all children and keys less than x as the left part, then a child node, and all keys not less than (greater than or equal to) x together with the remaining children as the right part.
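Since ‘partition-by’ is defined in an earlier section, the following Python 3 sketch only illustrates the three-way division described above. It assumes a branch node stored as one list that interleaves children and keys, [c0, k1, c1, k2, c2, ...]; the function name and the 0-based indexing are choices made for this example.

```python
def partition_by(node, x):
    """Divide a branch node [c0, k1, c1, ...] into (left, c, right) around x."""
    i = 1                                   # keys sit at the odd positions
    while i < len(node) and node[i] < x:
        i += 2
    return node[:i - 1], node[i - 1], node[i:]

node = [["A", "B"], "C", ["D", "F"], "G", ["H", "I"]]
left, c, right = partition_by(node, "E")   # E is not a key of this node
```

For E the right part starts with the key G, so the deletion continues in `c`. Calling `partition_by(node, "G")` gives a right part whose first element equals G, which is the case where the key is located in the node itself.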

If the first key in the right part is equal to x, it means x is located in this node. We remove x from the right part and then call ‘merge-btree’ to merge left+c and right-x into one new node.

    (define (merge-btree tr1 tr2 t)
      (if (leaf? tr1)
          (append tr1 tr2)
          (make-btree (inits tr1)
                      (merge-btree (last tr1) (car tr2) t)
                      (cdr tr2)
                      t)))

Otherwise, x may be located in c, so we need to recursively try to delete x from c.

Function ‘fix-root’ is updated to handle the cases for deletion, as below.

    (define (fix-root tr t)
      (cond ((null? tr) '()) ;; empty tree
            ((full? tr t) (split tr t))
            ((null? (keys tr)) (car tr)) ;; shrink height
            (else tr)))

In ‘make-btree’, we added one case to handle a node that contains too few keys after deletion.

    (define (make-btree l c r t)
      (cond ((full? c t) (fix-full l c r t))
            ((low? c t) (fix-low l c r t))
            (else (append l (cons c r)))))

Here ‘fix-low’ is defined to try to borrow a key and a child from either the left sibling or the right sibling.

    (define (fix-low l c r t)
      (cond ((not (null? (keys l)))
             (make-btree (except-rest l 2)
                         (un-split (append (rest l 2) (list c)))
                         r t))
            ((not (null? (keys r)))
             (make-btree l
                         (un-split (cons c (list-head r 2)))
                         (list-tail r 2) t))
            (else c)))

In order to verify the deleting program, a simple test is fed to the function defined above.

    (define (test-delete)
      (define (del-and-show tr x)
        (let ((r (btree-delete tr x 3)))
          (begin (display r) (display "\n") r)))
      (fold-left del-and-show
                 (list->btree (str->slist "GMPXACDEJKNORSTUVYZBFHIQW") 3)
                 (str->slist "EGAMU")))

Running the test generates the following result.

    (((A B) C (D F) G (H I J K)) M ((N O) P (Q R S) T (U V) W (X Y Z)))
    (((A B) C (D F) H (I J K)) M ((N O) P (Q R S) T (U V) W (X Y Z)))
    ((B C D F) H (I J K) M (N O) P (Q R S) T (U V) W (X Y Z))
    ((B C D F) H (I J K N O) P (Q R S) T (U V) W (X Y Z))
    ((B C D F) H (I J K N O) P (Q R S T V) W (X Y Z))

Comparing this with the output of the Haskell program in the previous section, we find that they are the same.

5 Searching

Although searching in a B-tree can be considered a generalized form of the tree search extended from the binary search tree, it is worth mentioning that in the disk access case, instead of just returning the satellite data corresponding to the key, it is more meaningful to return the whole node which contains the key.

5.1 Imperative search algorithm

When searching in a binary tree, there are only 2 directions, left and right, in which to continue the search. In a B-tree, however, we need to extend the search directions to cover the number of children in a node.

    function B-TREE-SEARCH(T, k)
        loop
            i ← 1
            while i ≤ LENGTH(KEYS(T)) and k > KEYS(T)[i] do
                i ← i + 1
            end while
            if i ≤ LENGTH(KEYS(T)) and k = KEYS(T)[i] then
                return (T, i)
            end if
            if T is leaf then
                return NIL    ▷ k doesn't exist at all
            else
                T ← CHILDREN(T)[i]
            end if
        end loop
    end function

When searching, the program examines each key of the current node, starting from the root and traversing from the smallest key towards the biggest one. If it finds a matching key, it returns the current node as well as the index of this key. Otherwise, if the key satisfies k_i < k < k_{i+1}, the program updates the current node to be the child node c_{i+1} and examines it next. If it fails to find the key in a leaf node, an empty value is returned to indicate the failure.

Note that in “Introduction to Algorithms” this program is described with recursion. Here the recursion is eliminated.
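To make the loop concrete, here is a small Python 3 sketch of the iterative search. It is not one of the programs shipped with this article: the `Node` class is a hypothetical minimal node type, and indices are 0-based instead of the 1-based indices of the pseudo code.

```python
class Node:
    """Hypothetical B-tree node: a key list and a child list (empty for a leaf)."""
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children or [])

def btree_search(node, k):
    """Return (node, index) for key k, or None when k is absent."""
    while True:
        i = 0
        while i < len(node.keys) and k > node.keys[i]:
            i += 1                       # skip the keys smaller than k
        if i < len(node.keys) and k == node.keys[i]:
            return node, i
        if not node.children:            # reached a leaf: k doesn't exist at all
            return None
        node = node.children[i]          # descend; a disk read would happen here

root = Node("DM", [Node("AC"), Node("EGJK"), Node("NOP")])
hit = btree_search(root, "J")
miss = btree_search(root, "W")
```

In this example `hit` is the pair of the leaf (E G J K) and index 2, while `miss` is `None` because W is not in the tree.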


search program in C++

In the C++ implementation, we can use the pair provided in the STL as the return type.

    template<class T>
    std::pair<T*, unsigned int> search(T* t, typename T::key_type k){
      for(;;){
        unsigned int i(0);
        for(; i < t->keys.size() && k > t->keys[i]; ++i);
        if(i < t->keys.size() && k == t->keys[i])
          return std::make_pair(t, i);
        if(t->leaf())
          break;
        t = t->children[i];
      }
      return std::make_pair((T*)0, 0); //not found
    }

And the test cases are given as below.

    // helper, defined first so that test_search() below can call it
    template<class T>
    void test_search(T* t, typename T::key_type k){
      std::pair<T*, unsigned int> res = search(t, k);
      if(res.first)
        std::cout<<"found "<<res.first->keys[res.second]<<"\n";
      else
        std::cout<<"not found "<<k<<"\n";
    }

    void test_search(){
      std::cout<<"test search...\n";
      const char* ss[] = {"G", "M", "P", "X", "A", "C", "D", "E", "J", "K", \
                          "N", "O", "R", "S", "T", "U", "V", "Y", "Z"};
      BTree<std::string, 3>* tr = list_to_btree(ss, ss+sizeof(ss)/sizeof(char*),
                                                new BTree<std::string, 3>);
      std::cout<<"\n"<<btree_to_str(tr)<<"\n";
      for(unsigned int i=0; i<sizeof(ss)/sizeof(char*); ++i)
        test_search(tr, ss[i]);
      test_search(tr, "W");
      delete tr;
    }

Running the ‘test_search’ function generates the following result.

    test search...
    ((A, C), D, (E, G, J, K), M, (N, O), P, (R, S), T, (U, V, X, Y, Z))
    found G
    found M
    ...
    found Z
    not found W

Here the program can find all keys we inserted.

search program in Python

Changing the above algorithm a bit in Python gives a program corresponding to the pseudo code in the “Introduction to Algorithms” textbook.

    def B_tree_search(tr, key):
        for i in range(len(tr.keys)):
            if key <= tr.keys[i]:
                break
        if key == tr.keys[i]:
            return (tr, i)
        if tr.leaf:
            return None
        else:
            if key > tr.keys[-1]:
                i = i + 1
            #disk_read
            return B_tree_search(tr.children[i], key)

There is a minor modification from the original pseudo code. We use a for-loop to iterate over the keys; the boundary check is done by comparing the key with the last key in the node and adjusting the index if necessary.
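The boundary adjustment can be checked in isolation. The Python 3 sketch below extracts the index computation into a hypothetical helper `index_by_loop` and compares it with `bisect_left` from the standard library, which is one possible alternative and not what the program above uses. Both assume a non-empty, sorted key list.

```python
from bisect import bisect_left

def index_by_loop(keys, key):
    """Index computed as in the program above: for-loop plus boundary adjustment."""
    for i in range(len(keys)):
        if key <= keys[i]:
            break
    if key > keys[-1]:        # key is beyond the last key: use the last child
        i = i + 1
    return i

keys = ["D", "M", "T"]
indices = [index_by_loop(keys, k) for k in "ADHMPTZ"]
```

For these keys the loop and `bisect_left(keys, k)` agree on every probe, including the probe Z, where the index has to move past the last key to select the rightmost child.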

Let’s feed some simple test cases to this program.

    def test_search():
        lst = ["G", "M", "P", "X", "A", "C", "D", "E", "J", "K", \
               "N", "O", "R", "S", "T", "U", "V", "Y", "Z"]
        tr = list_to_B_tree(lst, 3)
        print "test search\n", B_tree_to_str(tr)
        for i in lst:
            __test_search(tr, i)
        __test_search(tr, "W")

    def __test_search(tr, k):
        res = B_tree_search(tr, k)
        if res is None:
            print k, "not found"
        else:
            (node, i) = res
            print "found", node.keys[i]

Running the function ‘test_search’ generates the following result.

    found G
    found M
    ...
    found Z
    W not found

5.2 Functional search algorithm

The imperative algorithm can be turned into a functional one by performing a recursive search on a child in case the key can’t be located in the current node.

    function B-TREE-SEARCH(T, k)
        i ← FIND-FIRST(λx · x ≥ k, KEYS(T))
        if i exists and k = KEYS(T)[i] then
            return (T, i)
        end if
        if T is leaf then
            return NIL    ▷ k doesn't exist at all
        else
            return B-TREE-SEARCH(CHILDREN(T)[i], k)
        end if
    end function

Search program in Haskell

In the Haskell program, we first filter out all keys less than the key to be searched, then check the first element of the remaining keys. If it matches, we return the current node along with the index as a tuple, where the index starts from 0. If it doesn’t match, we perform the recursive search until we reach a leaf node.

    search :: (Ord a) => BTree a -> a -> Maybe (BTree a, Int)
    search tr@(Node ks cs _) k
        | matchFirst k $ drop len ks = Just (tr, len)
        | otherwise = if null cs then Nothing
                      else search (cs !! len) k
        where
          matchFirst x (y:_) = x == y
          matchFirst x _ = False
          len = length $ filter (<k) ks

The verification test cases are provided as the following.

    testSearch = mapM_ (showSearch (listToBTree lst 3)) $ lst ++ "L"
        where
          showSearch tr x = do
            case search tr x of
              Just (_, i) -> putStrLn $ "found" ++ (show x)
              Nothing -> putStrLn $ "not found" ++ (show x)
          lst = "GMPXACDEJKNORSTUVYZBFHIQW"


Here we construct a B-tree from a string of characters, then check that each element of this string can be located. Finally, a non-existent element ‘L’ is fed to verify the failure case.

Running this test function generates the following results.

    found'G'
    found'M'
    ...
    found'W'
    not found'L'

Search program in Scheme/Lisp

Because children and keys are interspersed in one list in the Scheme/Lisp B-tree definition, the search function just moves ahead one step at a time to locate the key in a node.

    (define (btree-search tr x)
      ;; find the smallest index where keys[i] >= x
      (define (find-index tr x)
        (let ((pred (if (string? x) string>=? >=)))
          (if (null? tr)
              0
              (if (and (not (list? (car tr))) (pred (car tr) x))
                  0
                  (+ 1 (find-index (cdr tr) x))))))
      (let ((i (find-index tr x)))
        (if (and (< i (length tr)) (equal? x (list-ref tr i)))
            (cons tr i)
            (if (leaf? tr) #f (btree-search (list-ref tr (- i 1)) x)))))

The program defines an inner function to find the index of the first element which is greater than or equal to the key we are searching for.

If the key pointed to by this index matches, we are done. Otherwise, if the current node is a leaf, the program returns a false result; if not, the child just before this index may contain the key, and the search continues there.
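The same idea can be sketched in Python 3 on nested lists. This is an illustration under assumptions rather than a translation of the author's code: a branch node is taken to be a list of the form [c0, k1, c1, ...] whose children are themselves lists, a leaf is a flat list of keys, and `None` plays the role of the false result.

```python
def is_leaf(tr):
    """A leaf holds only keys; a branch starts with a child, which is a list."""
    return not tr or not isinstance(tr[0], list)

def find_index(tr, x):
    """Index of the first key that is >= x, or len(tr) if there is none."""
    for i, e in enumerate(tr):
        if not isinstance(e, list) and e >= x:
            return i
    return len(tr)

def btree_search(tr, x):
    """Return (node, index) of key x, or None when x is absent."""
    i = find_index(tr, x)
    if i < len(tr) and tr[i] == x:
        return tr, i
    if is_leaf(tr):
        return None
    return btree_search(tr[i - 1], x)   # the child just before position i

tree = [["A", "B"], "C", ["D", "F"], "H", ["I", "J", "K"]]
```

With this tree, searching for H stops in the root at position 3, searching for K descends into the last child, and searching for E ends in the leaf (D F) without a match.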

We can run the testing function below to verify this searching program.

    (define (test-search)
      (define (search-and-show tr x)
        (if (btree-search tr x)
            (display (list "found " x))
            (display (list "not found " x))))
      (let* ((lst (str->slist "GMPXACDEJKNORSTUVYZBFHIQW"))
             (tr (list->btree lst 3)))
        (map (lambda (x) (search-and-show tr x)) (cons "L" lst))))

A non-existent key “L” is fed first, and then all elements which were used to form the B-tree are looked up for verification.

    (not found L) (found G) (found M) ... (found W)

6 Notes and short summary

In this post, we explained the B-tree data structure as a kind of extension of the binary search tree. The background knowledge of magnetic disk access is skipped; readers can refer to [1] for details. For the three main operations, insertion, deletion and searching, both imperative and functional algorithms are illustrated. The complexity isn’t discussed here. However, since B-trees are defined to maintain the balance properties, all operations mentioned here perform in O(lg N) time for a fixed minimum degree, where N is the number of keys in a B-tree.
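The logarithmic bound comes from the height of the tree. As a hedged illustration, assume the definition in [1]: t ≥ 2 is the minimum degree, every node except the root has at least t − 1 keys, and a tree consisting of the root alone has height 0. A B-tree of height h then holds at least 2t^h − 1 keys. The Python 3 sketch below (the function name `max_height` is chosen for this example) computes the largest height that N keys allow.

```python
def max_height(n, t):
    """Largest h with 2 * t**h - 1 <= n, i.e. h <= log_t((n + 1) / 2)."""
    h = 0
    while 2 * t ** (h + 1) - 1 <= n:
        h += 1
    return h

h25 = max_height(25, 3)   # the 25-key example tree with minimum degree 3
```

For the 25-key example used in the tests the bound is 2, which matches the three levels of the tree before deletion. Each operation follows roughly one path from the root to a leaf, so it touches a number of nodes proportional to this height.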

7 Appendix

All programs provided along with this article are free for downloading.

7.1 Prerequisite software

GNU Make is used to easily build some of the programs. For the C++ and ANSI C programs, GNU GCC and G++ 3.4.4 are used. For the Haskell programs, GHC 6.10.4 is used for building. For the Python programs, Python 2.5 is used for testing. For the Scheme/Lisp programs, MIT Scheme 14.9 is used.

All source files are put in one folder. Invoking ’make’ or ’make all’ builds the C++ and Haskell programs.

Running ’make Haskell’ builds the Haskell program separately. The executable file is “htest” (with .exe on Windows-like OSes). It is also possible to run the program in GHCi.

7.2 Tools

Besides these, I use graphviz to draw most of the figures in this post. In order to translate the B-tree output to dot script, a Haskell tool is provided. It can be used like this.

    bt2dot filename.dot "string"

Here filename.dot is the output file for the dot script. The tool parses the string which describes the B-tree content and translates it into dot script.

The source code of this tool is BTr2dot.hs; it can also be downloaded with this article.

Download location: http://sites.google.com/site/algoxy/btree/btree.zip


References

[1] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. “Introduction to Algorithms, Second Edition”. The MIT Press, 2001. ISBN: 0262032937.

[2] B-tree, Wikipedia. http://en.wikipedia.org/wiki/B-tree

[3] Liu Xinyu. “Comparison of imperative and functional implementation of binary search tree”. http://sites.google.com/site/algoxy/bstree

[4] Chris Okasaki. “FUNCTIONAL PEARLS Red-Black Trees in a Functional Setting”. J. Functional Programming. 1998
