Supporting Frequent Updates in R-Trees: A Bottom-Up Approach
Christian S. Jensen, Aalborg University, Denmark
Mong Li Lee, Wynne Hsu, Bin Cui, Keng Lik Teo, National University of Singapore, Singapore
VLDB 2003
Presented by Simonas Šaltenis, September 20, 2006
Motivation
• New data management applications monitor continuous processes.
  • Tracking 2D moving objects
• Updates are frequent.
• Updates are likely to exhibit locality.
• Existing R-tree updates work in a top-down manner, performing two index traversals.
• The delete operation is particularly expensive.
  • It traverses several partial or full paths from the root to the leaf level
• Key idea: do localized updates that consider fewer placements of updated values.
Outline
• Motivation
• Background – the R-tree
• Generalized bottom-up update
  • Data structure
  • Algorithms
  • Optimizations, tuning parameters
• Related work
• Performance study
• Strong and weak points
• Conclusion
[Figure sequence: an example R-tree built over 2D points p1 to p15. The leaf level holds the groups (p1 p2), (p3 p4), (p5 p6 p7), (p8 p9 p10), (p11 p12), (p13 p14 p15), bounded by rectangles R3 to R8; R1 and R2 bound the leaves one level up. The last panel shows an update of p2.]
R-tree updates
• An update in the R-tree is a pair of operations:
  • Delete(obj_id, (xold, yold))
  • Insert(obj_id, (xnew, ynew))
• Insert:
  • Traverse one path down the tree, at each node using a heuristic to choose a subtree
  • Traverse up the tree as high as necessary, propagating splits and/or adjustments of minimum bounding rectangles (MBRs)
• Delete:
  • Perform a query at (xold, yold) to find the point
  • Potentially several paths down the tree are traversed!
  • Traverse up the tree as high as necessary, propagating adjustments of MBRs
• Four tree traversals in total!
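The cost of the delete-time query can be illustrated with a minimal sketch. It is not the R-tree implementation from the paper: the `Node` class, the `find_leaf` function, and the two-leaf tree with made-up coordinates are assumptions chosen only to show that overlapping MBRs force the search down more than one path.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Point = Tuple[float, float]


def contains(r: Rect, p: Point) -> bool:
    """True if point p lies inside rectangle r (borders included)."""
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


@dataclass
class Node:
    # Hypothetical in-memory node; a real R-tree node is a disk page.
    name: str
    mbr: Rect
    children: List["Node"] = field(default_factory=list)     # empty for leaves
    points: Dict[int, Point] = field(default_factory=dict)   # obj_id -> location


def find_leaf(node: Node, obj_id: int, p: Point, visited: List[str]) -> Optional[Node]:
    """Top-down search: descend into every child whose MBR contains p."""
    visited.append(node.name)
    if not node.children:  # leaf level
        return node if node.points.get(obj_id) == p else None
    for child in node.children:
        if contains(child.mbr, p):
            found = find_leaf(child, obj_id, p, visited)
            if found is not None:
                return found
    return None


# Two leaves whose MBRs overlap around (5, 5); object 2 is stored in leaf B.
leaf_a = Node("A", (0, 0, 6, 6), points={1: (1, 1)})
leaf_b = Node("B", (4, 4, 10, 10), points={2: (5, 5)})
root = Node("root", (0, 0, 10, 10), children=[leaf_a, leaf_b])

visited: List[str] = []
leaf = find_leaf(root, 2, (5, 5), visited)
print(leaf.name, visited)  # B ['root', 'A', 'B']
```

Leaf A is read and rejected before leaf B is found, which is the extra path the slide refers to.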
Data structure
• An unmodified R-tree is used
• An ID-index is added
  • A disk-based hash table mapping obj_IDs to leaf page numbers
• A main-memory summary of the tree is maintained:
  • For each non-leaf node: level, MBR, child pointers, pointer to the corresponding disk page
  • For each leaf node: one bit recording whether the node is full
• For standard node fan-outs, the size of the main-memory summary is much less than 1% of the total index size
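As a rough sketch of the two auxiliary structures listed above, the following models the ID-index as a dictionary and the summary as a table of non-leaf entries plus one full bit per leaf. The class and method names (`InternalEntry`, `TreeSummary`, `parent_of`, `non_full_siblings`), the page numbers, and the coordinates are illustrative assumptions, and the paper's ID-index is disk-based rather than an in-memory dictionary.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class InternalEntry:
    page: int             # disk page of this non-leaf node
    level: int            # level in the tree
    mbr: Rect             # bounding rectangle of the node
    children: List[int]   # page numbers of the child nodes


class TreeSummary:
    """Main-memory summary: non-leaf entries plus one 'full' bit per leaf."""

    def __init__(self, entries: List[InternalEntry], leaf_full: Dict[int, bool]):
        self.internal = {e.page: e for e in entries}
        self.leaf_full = dict(leaf_full)

    def parent_of(self, page: int) -> Optional[InternalEntry]:
        """Find the parent of a page without touching the disk."""
        for entry in self.internal.values():
            if page in entry.children:
                return entry
        return None

    def non_full_siblings(self, leaf_page: int) -> List[int]:
        """Leaf pages under the same parent that still have room."""
        parent = self.parent_of(leaf_page)
        if parent is None:
            return []
        return [c for c in parent.children
                if c != leaf_page and not self.leaf_full[c]]


# Made-up layout: page 0 is the root R, pages 1 and 2 are R1 and R2,
# pages 3 to 8 are the leaves R3 to R8.
summary = TreeSummary(
    entries=[
        InternalEntry(page=0, level=3, mbr=(0, 0, 100, 100), children=[1, 2]),
        InternalEntry(page=1, level=2, mbr=(0, 0, 50, 100), children=[3, 4, 5]),
        InternalEntry(page=2, level=2, mbr=(50, 0, 100, 100), children=[6, 7, 8]),
    ],
    leaf_full={3: False, 4: False, 5: True, 6: True, 7: False, 8: True},
)

# ID-index: object ID -> leaf page (a plain dict stands in for the hash table).
id_index = {"p1": 3, "p2": 3, "p3": 4, "p4": 4, "p5": 5, "p6": 5, "p7": 5}

leaf_page = id_index["p2"]
print(summary.parent_of(leaf_page).page, summary.non_full_siblings(leaf_page))  # 1 [4]
```

With these two lookups an update can reach the leaf of an object directly and inspect its parent and siblings without a root-to-leaf traversal.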
[Figure: the example R-tree (root R, internal nodes R1 and R2, leaves R3 to R8) shown with the hash table pointing at leaf pages, a main-memory summary table with columns level, BR, childptrs (rows: 3 R, 2 R1, 2 R2), and one full/not-full bit per leaf.]
Case 1: new location is outside the root BR
• standard top-down update
[Figure: p3 moves to a location outside the root rectangle R.]
Case 2: new location remains inside its rectangle
• write new p2 location
[Figure: p2 moves within its leaf rectangle R3.]
Case 3: new location is outside its rectangle
• enlarge rectangle
• if p3 inside R4
  • write new R4
  • write new p3 location
[Figure: p3 moves just outside leaf rectangle R4, which is enlarged to cover it.]
Case 4: new location is far outside its rectangle
• enlarging does not help
• deletion does not cause an underflow
• if new p5 is in the BR of a non-full sibling
  • delete old p5
  • get sibling node
  • insert new p5 into sibling
[Figure: p5 leaves R5 and is inserted into the sibling leaf R4.]
Case 5: new location is far outside its rectangle
• enlarging does not help
• no siblings, no underflow
• delete old p2
• findParent(pNode, newLocation)
• do a standard R-tree insert at the found parent node
[Figure: p2 leaves its leaf under R1 and is inserted into R6 under R2.]
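The five cases can be read as a decision chain. The sketch below only classifies an update; it does not perform it. The function name `classify_update`, the `Action` enum, and all coordinates are hypothetical, the enlarged leaf rectangle is passed in precomputed, and sending an underflowing leaf back to a top-down update is an assumption, since the slides only state that cases 4 and 5 apply when there is no underflow.

```python
from enum import Enum
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Point = Tuple[float, float]


class Action(Enum):
    TOP_DOWN = "standard top-down update"                 # case 1
    IN_PLACE = "write new location in the leaf"           # case 2
    EXTEND_MBR = "enlarge leaf rectangle, then write"     # case 3
    MOVE_TO_SIBLING = "delete, insert into sibling"       # case 4
    ASCEND = "delete, findParent, insert from ancestor"   # case 5


def contains(r: Rect, p: Point) -> bool:
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


def classify_update(new_loc: Point,
                    root_mbr: Rect,
                    leaf_mbr: Rect,
                    enlarged_leaf_mbr: Rect,
                    siblings: List[Tuple[Rect, bool]],  # (sibling MBR, is_full)
                    causes_underflow: bool) -> Action:
    if not contains(root_mbr, new_loc):
        return Action.TOP_DOWN
    if contains(leaf_mbr, new_loc):
        return Action.IN_PLACE
    if contains(enlarged_leaf_mbr, new_loc):
        return Action.EXTEND_MBR
    if causes_underflow:
        return Action.TOP_DOWN  # assumed fallback, not stated on the slides
    for sibling_mbr, is_full in siblings:
        if not is_full and contains(sibling_mbr, new_loc):
            return Action.MOVE_TO_SIBLING
    return Action.ASCEND


root = (0, 0, 100, 100)
leaf = (10, 10, 20, 20)
enlarged = (10, 10, 23, 20)
sibs = [((30, 10, 40, 20), False), ((10, 30, 20, 40), True)]

for loc in [(150, 50), (15, 15), (22, 15), (35, 15), (15, 35)]:
    print(loc, classify_update(loc, root, leaf, enlarged, sibs, False).name)
```

The ordering puts the cheapest outcomes first: cases 2 and 3 touch only the leaf page (and, for case 3, its rectangle), case 4 touches one extra leaf, and only case 5 climbs the tree.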
Epsilon ε
[Figure: a leaf rectangle around p3 and p4 with the enlargement distance ε marked.]
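One way to picture the role of ε is the sketch below. It rests on assumptions that the slide itself does not spell out: the leaf rectangle grows only on the sides the object moved past, by at most ε per side, and never beyond the parent rectangle. The function name `extend_mbr` and the coordinates are hypothetical.

```python
from typing import Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Point = Tuple[float, float]


def contains(r: Rect, p: Point) -> bool:
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


def extend_mbr(leaf: Rect, parent: Rect, new_loc: Point, eps: float) -> Optional[Rect]:
    """Return the enlarged leaf rectangle if it covers new_loc, else None.

    Assumption: each side the point crossed moves outward by eps,
    clipped so the result stays inside the parent rectangle.
    """
    xmin, ymin, xmax, ymax = leaf
    x, y = new_loc
    if x < xmin:
        xmin = max(xmin - eps, parent[0])
    elif x > xmax:
        xmax = min(xmax + eps, parent[2])
    if y < ymin:
        ymin = max(ymin - eps, parent[1])
    elif y > ymax:
        ymax = min(ymax + eps, parent[3])
    grown = (xmin, ymin, xmax, ymax)
    return grown if contains(grown, new_loc) else None


leaf = (10, 10, 20, 20)
parent = (0, 0, 22, 100)

print(extend_mbr(leaf, parent, (22, 15), eps=3))  # (10, 10, 22, 20)
print(extend_mbr(leaf, parent, (25, 15), eps=3))  # None
print(extend_mbr(leaf, parent, (9, 21), eps=3))   # (7, 10, 20, 23)
```

A larger ε lets more updates finish at the leaf, at the price of larger and more overlapping leaf rectangles, which is the tradeoff the ε experiments measure.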
Movement of Objects Between Siblings
• When moving an object to a sibling, redistribute other objects
[Figure: sibling leaves R3 and R4 holding points p1 to p7, illustrating the redistribution.]
Related Work
• Lazy updates for the R-tree [Kwon et al. 2002]
  • Leaf-level bounding rectangles are enlarged equally in all directions.
  • Parent pointers are added to the R-tree:
    • Expensive to maintain!
  • Query performance deteriorates because of increases in BR overlap.
• This can be called the localized bottom-up update (LBU) approach
Effect of ε
[Figure, updates: two plots showing average disk I/O and total CPU time (s) versus ε (0 to 0.03), each comparing TD, LBU, and GBU.]
Effect of ε
[Figure, queries: two plots showing average disk I/O and total CPU time (s) versus ε (0 to 0.03), each comparing TD, LBU, and GBU.]
Varying Buffer Size
[Figure: two plots, one for updates and one for queries, showing average disk I/O versus buffer size (0 to 10% of the size of the dataset) for TD, LBU, and GBU.]
Scalability
[Figure: two plots, one for updates and one for queries, showing average disk I/O versus size of the data set (0 to 10 million) for TD, LBU, and GBU.]
Strong points
• Content:
  • The study is rather deep: it explores the possibilities in between localized updates and top-down updates
  • Concurrency is addressed
  • Extensive experiments
  • A cost model is presented, showing theoretically the merit of the proposed approach
• Form:
  • Good order of presentation: first a simpler algorithm, then a general one
Weak points
• Content:
  • It requires a non-constant amount of main memory to work
  • It does not utilize all the available main memory
  • Data structure and algorithms are rather complex
  • Too many parameters to adjust
• Form:
  • A couple of errors in the pseudocode
  • Pseudocode does not have line numbers
  • Algorithm 3 pseudocode is not very clear
  • Symbols used in formulas are not always explained (e.g., Section 4.2)
Conclusion
• Addressed the problem of handling frequent updates in R-trees
• Proposed a generalized bottom-up update strategy for R-trees
• Significantly better performance than top-down and localized bottom-up update.
• Future work
  • Application to other multi-dimensional indexes
  • Better theoretical analysis of the tradeoff between global-ness and update cost
• Acknowledgment: Christian S. Jensen for most of the slides