Chapter 15
Augmenting Data Structures
Introduction
• “Textbook” data structures are sufficient for many tasks, but not all
• We rarely need to create entirely new data structures
• Often it is sufficient to “augment” an existing data structure with additional information and operations
• This is not always straightforward - we must be able to maintain the added information through the structure's existing operations
§15.1 Dynamic Order Statistics
• Recall from Chapter 10 that we can retrieve the ith order statistic from an unordered set of n elements in O(n) time
• Red-black trees can be augmented to allow for fast retrieval of order statistics
• We shall also allow for quick determination of the rank of an element
Order Statistic Trees
• Standard red-black tree with an additional size field (the bottom number in each node)
• x->size contains the # of nodes in the subtree rooted at x, including x itself
• If nil->size = 0, then:
x->size = x->left->size + x->right->size + 1
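The size invariant above can be sketched as a plain struct. The field and helper names here are assumptions for illustration; a null child stands in for the nil sentinel, contributing size 0.

```cpp
#include <cassert>

// Sketch of an order-statistic tree node with the augmented size field.
struct node {
    int   key;
    int   size;    // # of nodes in the subtree rooted here, including this one
    node *left;
    node *right;
};

// A null child plays the role of nil, with nil->size = 0.
int subtree_size(const node *x) { return x ? x->size : 0; }

// Recompute x->size from its children, exactly as on the slide:
// x->size = x->left->size + x->right->size + 1
void update_size(node *x) {
    x->size = subtree_size(x->left) + subtree_size(x->right) + 1;
}
```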
[Figure: order-statistic tree of 20 nodes; each node shows its key and, below it, its size. The root is 26 with size 20; e.g., node 17 has size 12 and node 41 has size 7.]
Retrieving Elements of a Given Rank
node *ostree::Select(node *x, int i)
{
    int r = x->left->size + 1;
    if ( i == r )
        return x;
    else if ( i < r )
        return Select(x->left, i);
    else
        return Select(x->right, i-r);
}
• x->left->size contains the number of nodes that come before x in an inorder tree walk
– x's rank within its own subtree is therefore x->left->size + 1
• Recursive selection is similar to the algorithms we saw in Chapter 10
– If i == r, x is the sought order statistic; return it
– If i < r, the sought element precedes x; recurse left
– If i > r, recurse right, looking for the (i-r)th order statistic in the right subtree
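The selection logic can be run on a hand-built tree (sizes filled in manually; no red-black machinery, which Select does not depend on). This is a minimal sketch, not the slides' full ostree class:

```cpp
#include <cassert>

struct node {
    int key, size;
    node *left, *right;
};

// ostree::Select as a free function; null children act as nil with size 0.
node *Select(node *x, int i) {
    int r = (x->left ? x->left->size : 0) + 1;  // x's rank in its own subtree
    if (i == r) return x;
    if (i < r)  return Select(x->left, i);
    return Select(x->right, i - r);             // (i-r)th statistic of right subtree
}
```

For example, on the three-node tree with keys {2, 5, 8}, Select(root, 2) returns the node with key 5.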
What Is The 16th Order Statistic?
[Figure: the same order-statistic tree as above (root 26, size 20).]
Analysis of ostree::Select
• Each level of recursion descends one level of the OS tree
– Therefore, ostree::Select is at worst O(h), where h is the height of the tree
– Since the height of a red-black tree is known to be O(lg n), ostree::Select has running time O(lg n)
Determining the Rank of an Element
int ostree::Rank(node *x)
{
    int r = x->left->size + 1;
    node *y = x;
    while ( y != root )
    {
        if ( y == y->parent->right )
            r += y->parent->left->size + 1;
        y = y->parent;
    }
    return r;
}
• The rank of a node x = the # of nodes that precede it in an inorder walk, + 1 for itself
• r is maintained as the rank of x in the subtree rooted at y, which denotes our current position in the tree
– To start, r is the rank of x within its own subtree
• Each loop iteration ascends the tree and updates x's rank within that larger subtree
– If y is a left child, the rank is unchanged
– If y is a right child, the rank increases by the size of y's parent's left subtree, plus 1 for the parent node itself
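The same climb can be sketched with explicit parent pointers. As before, this is a minimal stand-alone version of the slides' ostree::Rank, with null children acting as nil:

```cpp
#include <cassert>

struct node {
    int key, size;
    node *left, *right, *parent;
};

int Rank(node *root, node *x) {
    int r = (x->left ? x->left->size : 0) + 1;  // rank within x's own subtree
    for (node *y = x; y != root; y = y->parent)
        if (y == y->parent->right)              // stepping up from a right child
            r += (y->parent->left ? y->parent->left->size : 0) + 1;
    return r;
}
```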
What Is The Rank of 28?
[Figure: the same order-statistic tree as above (root 26, size 20).]
Analysis of ostree::Rank
• Each loop iteration ascends one level of the OS tree
– Therefore, ostree::Rank is at worst O(h), where h is the height of the tree
– Since the height of the tree is known to be O(lg n), ostree::Rank has running time O(lg n)
Maintaining Subtree Sizes
• ostree::Select & ostree::Rank are only useful if we can efficiently maintain the size field
• To be truly efficient, these fields must be maintained through the basic maintenance operations of the tree
Maintaining Subtree Sizes
• Insertion
– Recall the two phases: bst::Insert, then performing rotations
• bst::Insert
– As we descend the tree to perform insertion, increment the size field of every traversed node
• Rotation
– Only the size fields of the two rotated nodes are affected
– The new parent node simply assumes the size of the old parent node
– The demoted node must then recalculate its size as the sum of its children's sizes, plus one
Maintaining Subtree Sizes
[Figure: RightRotate(y) / LeftRotate(x) on nodes x (key 42) and y (key 93), showing that only the two rotated nodes' size fields change (e.g., 93: 19 → 12, 42: 11 → 19).]
Size Maintenance Through Rotation:
y->size = x->size;
x->size = x->left->size + x->right->size + 1;
What is the total added cost to rotation?
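The O(1) patch-up above can be sketched inside a rotation. This version is an assumption for illustration: no parent pointers, so the new subtree root is returned instead of being linked in place.

```cpp
#include <cassert>

struct node { int key, size; node *left, *right; };

int sz(const node *x) { return x ? x->size : 0; }

// Left rotation with size maintenance: the promoted node y takes the old
// root's size (it roots the same set of nodes), and the demoted node x
// recomputes its own size from its possibly changed children.
node *LeftRotate(node *x) {
    node *y = x->right;
    x->right = y->left;                    // y's left subtree moves under x
    y->left  = x;
    y->size  = x->size;                    // y now roots the whole subtree
    x->size  = sz(x->left) + sz(x->right) + 1;
    return y;
}
```

The added cost is two assignments, so rotation remains O(1).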
Maintaining Subtree Sizes
• Deletion
– Also has two phases: one to splice out the node, the other to restore the tree's properties with at most three rotations
– We already know the added cost of rotation
– When we splice out a node, we can traverse up the tree and decrement the size of every node along its path
• This requires O(lg n) additional time
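The size fix-up for deletion is a one-line loop; a sketch, assuming the spliced-out node's parent is passed in:

```cpp
#include <cassert>

struct node { int key, size; node *parent; };

// After splicing out a node, every ancestor's subtree has lost one node.
// The walk to the root takes O(lg n) steps in a red-black tree.
void decrement_path(node *p) {
    for (node *y = p; y != nullptr; y = y->parent)
        y->size -= 1;
}
```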
Maintaining Subtree Sizes
• Analysis:
– Insertion is changed by at most O(1) per node visited
– Deletion is changed by at most O(lg n)
– Thus, the total asymptotic running time of insertion and deletion is unchanged at O(lg n)
§15.2 How To Augment A Data Structure
• Four steps:
– Choosing an underlying data structure;
– Determining additional information to be maintained in the underlying data structure;
– Verifying that the additional information can be maintained by the basic modifying operations on the underlying data structure; and
– Developing new operations
• Note: this isn't a “formula”, but a good starting point
Augmenting Red-Black Trees for Order Statistics
• Step 1:
– We chose red-black trees as the underlying data structure due to their efficient support of other dynamic-set operations
• Step 2:
– We augmented nodes with the size field, to allow the desired operations to be more efficient
• Step 3:
– We ensured that insert and delete can maintain the new field and still operate in O(lg n) time
• Step 4:
– We developed ostree::Select and ostree::Rank
Why Augment Red-Black Trees?
• From Theorem 15.1:
– If the new field can be computed and maintained using only the information in nodes x, x->left, and x->right, then we can maintain the values of the new field in all nodes during insertion and deletion without asymptotically affecting the O(lg n) performance of these operations
§15.3 Interval Trees
• An interval is a pair of real numbers used to specify a range of values
– A closed interval [t1, t2] specifies a range that includes the endpoints
– An open interval (t1, t2) specifies a range that excludes the endpoints
– A half-open interval [t1, t2) or (t1, t2] excludes one of the endpoints
• E.g., consider a log that stores events sorted by time
– We may want to query the log to find out what happened during a given time interval
Intervals
• Assume intervals are represented as structs with two fields: lo and hi
• Consider two intervals x and y
– Any two intervals must satisfy the interval trichotomy; exactly one of:
• x and y overlap (x.lo <= y.hi && y.lo <= x.hi)
• x lies completely to the left of y (x.hi < y.lo)
• x lies completely to the right of y (y.hi < x.lo)
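The overlap case of the trichotomy is a two-comparison test; a minimal sketch for closed intervals (so a shared endpoint counts as overlap):

```cpp
#include <cassert>

struct interval { double lo, hi; };

// x and y overlap iff each starts no later than the other ends.
bool Overlap(interval x, interval y) {
    return x.lo <= y.hi && y.lo <= x.hi;
}
```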
Interval Trees
• An interval tree is a red-black tree that maintains a dynamic set of nodes
– Each node contains an interval
• It supports these operations:
– Insertion - adds an element to the tree
– Deletion - removes an element from the tree
– Search - searches for an interval that overlaps the requested interval
Interval Trees
[Figure: interval tree; each node shows its interval and, below it, its max value. The root is [16,21] with max 30.]
An interval tree sorted by the low endpoint of each interval
Interval Trees
• The interval tree stores intervals and is sorted by the low endpoint
• Each node contains an additional field, max, which is the maximum value of any interval endpoint stored in its subtree
– Maintained through insertion/deletion with this O(1) statement:
x->max = max(x->interval->hi, x->left->max, x->right->max)
– What about through rotations? The same operation applies, updating the demoted node first and then the new parent
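The O(1) max update can be sketched as code. Instead of a nil sentinel whose max is negative infinity, null children are simply skipped here:

```cpp
#include <cassert>

// Interval stored inline for brevity (lo, hi), plus the augmented max field.
struct itnode { double lo, hi, max; itnode *left, *right; };

// x->max = max(x's own hi, left subtree's max, right subtree's max)
void update_max(itnode *x) {
    x->max = x->hi;
    if (x->left  && x->left->max  > x->max) x->max = x->left->max;
    if (x->right && x->right->max > x->max) x->max = x->right->max;
}
```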
Interval Trees: New Operations
• The only new operation is the Search operation:
node *intervalTree::Search(interval *i)
{
    node *x = root;
    while ( x != NULL && !Overlap(x->interval, i) )
    {
        if ( x->left != NULL &&
             x->left->max >= i->lo )
            x = x->left;
        else
            x = x->right;
    }
    return x;
}
Interval Tree Search
[Figure: the same interval tree as above (root [16,21], max 30).]
• Search this tree for the interval [22,25]
• Now search for [11,14]
• What is the asymptotic growth of Search?
• Why does it work?
Interval Tree Search
• Interval tree search algorithm finds the first overlapping interval
• How could we find all overlapping intervals?
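One possible answer (an assumption, not a method given in the slides): recurse into both subtrees, pruning any subtree whose max cannot reach the query's lo, and skipping the right subtree when its lo endpoints already exceed the query's hi.

```cpp
#include <cassert>
#include <vector>

struct itnode { double lo, hi, max; itnode *left, *right; };

// Report every node whose interval overlaps [lo, hi], in sorted-lo order.
void FindAll(itnode *x, double lo, double hi, std::vector<itnode*> &out) {
    if (x == nullptr || x->max < lo) return;   // nothing here can reach [lo, hi]
    FindAll(x->left, lo, hi, out);
    if (x->lo <= hi && lo <= x->hi) out.push_back(x);
    if (x->lo <= hi)                           // right-subtree lo's only grow,
        FindAll(x->right, lo, hi, out);        // so x->lo > hi prunes them all
}
```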
Why Interval Tree Search Works
• Recall this part of the interval tree search algorithm:
if ( x->left != NULL && x->left->max >= i->lo )
    x = x->left;
else
    x = x->right;
• If the else is executed, then either the left branch is NULL, or the lo endpoint of the interval we're searching for is to the right of the highest hi endpoint in the left subtree - so if an overlapping interval exists, it must be in the right subtree
• If the first branch is executed, then the max value in the left subtree is at least the lo value of the interval we're searching for - so there may be an overlapping interval in the left subtree
Why Interval Tree Search Works
• But if we go left, why is it safe to ignore the right subtree?
– Suppose we went left and found no overlapping interval there
– The tree is sorted by the lo endpoint, so all nodes in the right subtree have lo endpoints >= all lo endpoints in the left subtree
– Since i->lo <= x->left->max, some interval i' in the left subtree has hi endpoint = x->left->max >= i->lo
– Because i' does not overlap i, we must have i->hi < i'->lo (otherwise they would overlap)
– Every interval in the right subtree has lo endpoint >= i'->lo > i->hi, so nothing in the right subtree can overlap i either
Assignment
• Page 286: 15.1-3, 15.1-5
• Page 295: 15.3-2, 15.3-3, 15.3-5, 15.3-6