Chapter 15
Augmenting Data Structures
Introduction
• “Textbook” data structures are sufficient for many tasks, but not all
• We rarely need to create entirely new data structures
• Often it is sufficient to “augment” an existing data structure with additional information and operations
• This is not always straightforward - we must be able to maintain the added information through the structure's existing operations
§15.1 Dynamic Order Statistics
• Recall from Chapter 10 that we can retrieve the ith order statistic from an unordered set of n elements in O(n) time
• Red-black trees can be augmented to allow for fast retrieval of order statistics
• We shall also allow for quick determination of the rank of an element
Order Statistic Trees
• Standard red-black tree with an additional size field (the bottom number in each node)
• x->size contains the # of nodes in the subtree rooted at x, including x itself
• If nil->size = 0, then:
x->size = x->left->size + x->right->size + 1
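The size invariant above can be sketched as a plain struct. The field and helper names here are assumptions for illustration; a null child stands in for the nil sentinel, contributing size 0.

```cpp
#include <cassert>

// Sketch of an order-statistic tree node with the augmented size field.
struct node {
    int   key;
    int   size;    // # of nodes in the subtree rooted here, including this one
    node *left;
    node *right;
};

// A null child plays the role of nil, with nil->size = 0.
int subtree_size(const node *x) { return x ? x->size : 0; }

// Recompute x->size from its children, exactly as on the slide:
// x->size = x->left->size + x->right->size + 1
void update_size(node *x) {
    x->size = subtree_size(x->left) + subtree_size(x->right) + 1;
}
```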
[Figure: order-statistic tree of 20 nodes; each node shows its key and, below it, its size. The root is 26 with size 20; e.g., node 17 has size 12 and node 41 has size 7.]
Retrieving Elements of a Given Rank
node *ostree::Select(node *x, int i)
{
    int r = x->left->size + 1;
    if ( i == r )
        return x;
    else if ( i < r )
        return Select(x->left, i);
    else
        return Select(x->right, i-r);
}
• x->left->size contains the number of nodes that come before x in an inorder tree walk
– x's rank within its own subtree is therefore x->left->size + 1
• Recursive selection is similar to the algorithms we saw in Chapter 10
– If i == r, x is the sought order statistic; return it
– If i < r, the sought element precedes x; recurse left
– If i > r, recurse right, looking for the (i-r)th order statistic in the right subtree
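The selection logic can be run on a hand-built tree (sizes filled in manually; no red-black machinery, which Select does not depend on). This is a minimal sketch, not the slides' full ostree class:

```cpp
#include <cassert>

struct node {
    int key, size;
    node *left, *right;
};

// ostree::Select as a free function; null children act as nil with size 0.
node *Select(node *x, int i) {
    int r = (x->left ? x->left->size : 0) + 1;  // x's rank in its own subtree
    if (i == r) return x;
    if (i < r)  return Select(x->left, i);
    return Select(x->right, i - r);             // (i-r)th statistic of right subtree
}
```

For example, on the three-node tree with keys {2, 5, 8}, Select(root, 2) returns the node with key 5.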
What Is The 16th Order Statistic?
[Figure: the same order-statistic tree as above (root 26, size 20).]
Analysis of ostree::Select
• Each level of recursion descends one level of the OS tree
– Therefore, ostree::Select is at worst O(h), where h is the height of the tree
– Since the height of a red-black tree is known to be O(lg n), ostree::Select has running time O(lg n)
Determining the Rank of an Element
int ostree::Rank(node *x)
{
    int r = x->left->size + 1;
    node *y = x;
    while ( y != root )
    {
        if ( y == y->parent->right )
            r += y->parent->left->size + 1;
        y = y->parent;
    }
    return r;
}
• The rank of a node x = the # of nodes that precede it in an inorder walk, + 1 for itself
• r is maintained as the rank of x in the subtree rooted at y, which denotes our current position in the tree
– To start, r is the rank of x within its own subtree
• Each loop iteration ascends the tree and updates x's rank within that larger subtree
– If y is a left child, the rank is unchanged
– If y is a right child, the rank increases by the size of y's parent's left subtree, plus 1 for the parent node itself
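The same climb can be sketched with explicit parent pointers. As before, this is a minimal stand-alone version of the slides' ostree::Rank, with null children acting as nil:

```cpp
#include <cassert>

struct node {
    int key, size;
    node *left, *right, *parent;
};

int Rank(node *root, node *x) {
    int r = (x->left ? x->left->size : 0) + 1;  // rank within x's own subtree
    for (node *y = x; y != root; y = y->parent)
        if (y == y->parent->right)              // stepping up from a right child
            r += (y->parent->left ? y->parent->left->size : 0) + 1;
    return r;
}
```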
What Is The Rank of 28?
[Figure: the same order-statistic tree as above (root 26, size 20).]
Analysis of ostree::Rank
• Each loop iteration ascends one level of the OS tree
– Therefore, ostree::Rank is at worst O(h), where h is the height of the tree
– Since the height of the tree is known to be O(lg n), ostree::Rank has running time O(lg n)
Maintaining Subtree Sizes
• ostree::Select & ostree::Rank are only useful if we can efficiently maintain the size field
• To be truly efficient, these fields must be maintained through the basic maintenance operations of the tree
Maintaining Subtree Sizes
• Insertion
– Recall the two phases: bst::Insert, then performing rotations
• bst::Insert
– As we descend the tree to perform insertion, increment the size field of every traversed node
• Rotation
– Only the size fields of the two rotated nodes are affected
– The new parent node simply assumes the size of the old parent node
– The demoted node must then recalculate its size as the sum of its children's sizes, plus one
Maintaining Subtree Sizes
[Figure: RightRotate(y) / LeftRotate(x) on nodes x (key 42) and y (key 93), showing that only the two rotated nodes' size fields change (e.g., 93: 19 → 12, 42: 11 → 19).]
Size Maintenance Through Rotation:
y->size = x->size;
x->size = x->left->size + x->right->size + 1;
What is the total added cost to rotation?
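The O(1) patch-up above can be sketched inside a rotation. This version is an assumption for illustration: no parent pointers, so the new subtree root is returned instead of being linked in place.

```cpp
#include <cassert>

struct node { int key, size; node *left, *right; };

int sz(const node *x) { return x ? x->size : 0; }

// Left rotation with size maintenance: the promoted node y takes the old
// root's size (it roots the same set of nodes), and the demoted node x
// recomputes its own size from its possibly changed children.
node *LeftRotate(node *x) {
    node *y = x->right;
    x->right = y->left;                    // y's left subtree moves under x
    y->left  = x;
    y->size  = x->size;                    // y now roots the whole subtree
    x->size  = sz(x->left) + sz(x->right) + 1;
    return y;
}
```

The added cost is two assignments, so rotation remains O(1).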
Maintaining Subtree Sizes
• Deletion
– Also has two phases: one to splice out the node, the other to restore the tree's properties with at most three rotations
– We already know the added cost of rotation
– When we splice out a node, we can traverse up the tree and decrement the size of every node along its path
• This requires O(lg n) additional time
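The size fix-up for deletion is a one-line loop; a sketch, assuming the spliced-out node's parent is passed in:

```cpp
#include <cassert>

struct node { int key, size; node *parent; };

// After splicing out a node, every ancestor's subtree has lost one node.
// The walk to the root takes O(lg n) steps in a red-black tree.
void decrement_path(node *p) {
    for (node *y = p; y != nullptr; y = y->parent)
        y->size -= 1;
}
```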
Maintaining Subtree Sizes
• Analysis:
– Insertion is changed by at most O(1) per node visited
– Deletion is changed by at most O(lg n)
– Thus, the total asymptotic running time of insertion and deletion is unchanged at O(lg n)
§15.2 How To Augment A Data Structure
• Four steps:
– Choosing an underlying data structure;
– Determining additional information to be maintained in the underlying data structure;
– Verifying that the additional information can be maintained by the basic modifying operations on the underlying data structure; and
– Developing new operations
• Note: this isn't a “formula”, but a good starting point
Augmenting Red-Black Trees for Order Statistics
• Step 1:
– We chose red-black trees as the underlying data structure due to their efficient support of other dynamic-set operations
• Step 2:
– We augmented nodes with the size field, to allow the desired operations to be more efficient
• Step 3:
– We ensured that insert and delete can maintain the new field and still operate in O(lg n) time
• Step 4:
– We developed ostree::Select and ostree::Rank
Why Augment Red-Black Trees?
• From Theorem 15.1:
– If the new field can be computed and maintained using only the information in nodes x, x->left, and x->right, then we can maintain the values of the new field in all nodes during insertion and deletion without asymptotically affecting the O(lg n) performance of these operations
§15.3 Interval Trees
• An interval is a pair of real numbers used to specify a range of values
– A closed interval [t1, t2] specifies a range that includes the endpoints
– An open interval (t1, t2) specifies a range that excludes the endpoints
– A half-open interval [t1, t2) or (t1, t2] excludes one of the endpoints
• E.g., consider a log that stores events sorted by time
– We may want to query the log to find out what happened during a given time interval
Intervals
• Assume intervals are represented as structs with two fields: lo and hi
• Consider two intervals x and y
– Any two intervals must satisfy the interval trichotomy; exactly one of:
• x and y overlap (x.lo <= y.hi && y.lo <= x.hi)
• x lies completely to the left of y (x.hi < y.lo)
• x lies completely to the right of y (y.hi < x.lo)
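The overlap case of the trichotomy is a two-comparison test; a minimal sketch for closed intervals (so a shared endpoint counts as overlap):

```cpp
#include <cassert>

struct interval { double lo, hi; };

// x and y overlap iff each starts no later than the other ends.
bool Overlap(interval x, interval y) {
    return x.lo <= y.hi && y.lo <= x.hi;
}
```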
Interval Trees
• An interval tree is a red-black tree that maintains a dynamic set of nodes
– Each node contains an interval
• It supports these operations:
– Insertion - adds an element to the tree
– Deletion - removes an element from the tree
– Search - searches for an interval that overlaps the requested interval
Interval Trees
[Figure: interval tree; each node shows its interval and, below it, its max value. The root is [16,21] with max 30.]
An interval tree sorted by the low endpoint of each interval
Interval Trees
• The interval tree stores intervals and is sorted by the low endpoint
• Each node contains an additional field, max, which is the maximum value of any interval endpoint stored in its subtree
– Maintained through insertion/deletion with this O(1) statement:
x->max = max(x->interval->hi, x->left->max, x->right->max)
– What about through rotations? The same operation applies, updating the demoted node first and then the new parent
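The O(1) max update can be sketched as code. Instead of a nil sentinel whose max is negative infinity, null children are simply skipped here:

```cpp
#include <cassert>

// Interval stored inline for brevity (lo, hi), plus the augmented max field.
struct itnode { double lo, hi, max; itnode *left, *right; };

// x->max = max(x's own hi, left subtree's max, right subtree's max)
void update_max(itnode *x) {
    x->max = x->hi;
    if (x->left  && x->left->max  > x->max) x->max = x->left->max;
    if (x->right && x->right->max > x->max) x->max = x->right->max;
}
```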
Interval Trees: New Operations
• The only new operation is the Search operation:
node *intervalTree::Search(interval *i)
{
    node *x = root;
    while ( x != NULL && !Overlap(x->interval, i) )
    {
        if ( x->left != NULL &&
             x->left->max >= i->lo )
            x = x->left;
        else
            x = x->right;
    }
    return x;
}
Interval Tree Search
[Figure: the same interval tree as above (root [16,21], max 30).]
• Search this tree for the interval [22,25]
• Now search for [11,14]
• What is the asymptotic growth of Search?
• Why does it work?
Interval Tree Search
• Interval tree search algorithm finds the first overlapping interval
• How could we find all overlapping intervals?
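One possible answer (an assumption, not a method given in the slides): recurse into both subtrees, pruning any subtree whose max cannot reach the query's lo, and skipping the right subtree when its lo endpoints already exceed the query's hi.

```cpp
#include <cassert>
#include <vector>

struct itnode { double lo, hi, max; itnode *left, *right; };

// Report every node whose interval overlaps [lo, hi], in sorted-lo order.
void FindAll(itnode *x, double lo, double hi, std::vector<itnode*> &out) {
    if (x == nullptr || x->max < lo) return;   // nothing here can reach [lo, hi]
    FindAll(x->left, lo, hi, out);
    if (x->lo <= hi && lo <= x->hi) out.push_back(x);
    if (x->lo <= hi)                           // right-subtree lo's only grow,
        FindAll(x->right, lo, hi, out);        // so x->lo > hi prunes them all
}
```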
Why Interval Tree Search Works
• Recall this part of the interval tree search algorithm:
if ( x->left != NULL && x->left->max >= i->lo )
    x = x->left;
else
    x = x->right;
• If the else is executed, then either the left branch is NULL, or the lo endpoint of the interval we're searching for is to the right of the highest hi endpoint in the left subtree - so if an overlapping interval exists, it must be in the right subtree
• If the first branch is executed, then the max value in the left subtree is at least the lo value of the interval we're searching for - so there may be an overlapping interval in the left subtree
Why Interval Tree Search Works
• But if we go left, why is it safe to ignore the right subtree?
– Suppose we went left and found no overlapping interval there
– The tree is sorted by the lo endpoint, so all nodes in the right subtree have lo endpoints >= all lo endpoints in the left subtree
– Since i->lo <= x->left->max, some interval i' in the left subtree has hi endpoint = x->left->max >= i->lo
– Because i' does not overlap i, we must have i->hi < i'->lo (otherwise they would overlap)
– Every interval in the right subtree has lo endpoint >= i'->lo > i->hi, so nothing in the right subtree can overlap i either
Assignment
• Page 286: 15.1-3, 15.1-5
• Page 295: 15.3-2, 15.3-3, 15.3-5, 15.3-6