Supporting Frequent Updates in R-Trees: A Bottom-Up Approach
people.cs.aau.dk/~simas/dat5_06/presentations/rtreeupdate2.pdf

September 20, 2006, Simonas Šaltenis

Supporting Frequent Updates in R-Trees:

A Bottom-Up Approach

Christian S. Jensen, Aalborg University, Denmark

Mong Li Lee, Wynne Hsu, Bin Cui, Keng Lik Teo, National University of Singapore, Singapore

VLDB 2003

presented by Simonas Šaltenis


Motivation

• New data management applications monitor continuous processes.

  • Tracking 2D moving objects

• Updates are frequent.

• Updates are likely to exhibit locality.

• Existing R-tree updates work in a top-down manner, performing two index traversals.

• The delete operation is particularly expensive.

  • It traverses several partial or full paths from the root to the leaf level

• Key idea: do localized updates that consider fewer placements of updated values.


Outline

• Motivation

• Background – the R-tree

• Generalized bottom-up update

  • Data structure

  • Algorithms

  • Optimizations, tuning parameters

• Related work

• Performance study

• Strong and weak points

• Conclusion

[Figure sequence (Background: the R-tree): an example R-tree built over points p1–p15. Leaf-level rectangles R3–R8 hold the points and are grouped under the internal rectangles R1 and R2.]

R-tree updates

• An update in the R-tree is a pair of operations:

  • Delete(obj_id, (xold, yold))

  • Insert(obj_id, (xnew, ynew))

• Insert:

  • Traverse one path down the tree, choosing a subtree heuristically at each node

  • Traverse up the tree as high as necessary, propagating splits and/or adjustments of MBRs

• Delete:

  • Perform a query at (xold, yold) to find the point

  • Potentially several paths down the tree are traversed!

  • Traverse up the tree as high as necessary, propagating adjustments of MBRs

• Four tree traversals in total!
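To make the cost of the Delete search concrete, here is a minimal Python sketch of a top-down point lookup over overlapping rectangles. The `Node` class, the `find_leaf` function, and the two-leaf tree are illustrative assumptions, not code from the paper; the sketch only shows that when leaf MBRs overlap, the search may read a leaf that does not hold the object before it finds the right one.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    mbr: tuple                                    # (xmin, ymin, xmax, ymax)
    children: list = field(default_factory=list)  # child nodes (non-leaf only)
    entries: dict = field(default_factory=dict)   # obj_id -> (x, y) (leaf only)

def contains(mbr, p):
    xmin, ymin, xmax, ymax = mbr
    return xmin <= p[0] <= xmax and ymin <= p[1] <= ymax

def find_leaf(node, obj_id, p, visited):
    """Top-down search as used by Delete; records every node it reads."""
    visited.append(node)
    if not node.children:                         # leaf: check for the object
        return node if node.entries.get(obj_id) == p else None
    for child in node.children:
        if contains(child.mbr, p):                # every matching MBR must be tried
            hit = find_leaf(child, obj_id, p, visited)
            if hit is not None:
                return hit
    return None

# Hypothetical tree: the point (5, 5) lies inside both leaf rectangles.
leaf_a = Node(mbr=(0, 0, 6, 6), entries={"p1": (1, 1)})
leaf_b = Node(mbr=(4, 4, 10, 10), entries={"p2": (5, 5)})
root = Node(mbr=(0, 0, 10, 10), children=[leaf_a, leaf_b])

visited = []
found = find_leaf(root, "p2", (5, 5), visited)
print(len(visited))  # root, leaf_a (a miss), leaf_b
```

In this toy tree the search reads three nodes even though the tree has only two levels; in a deeper tree each such miss can cost a partial or full root-to-leaf path.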




Data structure

• Unmodified R-tree is used

• ID-index is added

  • A disk-based hash table mapping obj_IDs to leaf page numbers

• Main-memory summary of the tree is maintained:

  • For each non-leaf node: level, MBR, child pointers, pointer to the corresponding disk page

  • For each leaf node: one bit recording whether the node is full

  • For standard node fan-outs, the size of the main-memory summary is much less than 1% of the total index size
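The two auxiliary structures can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a plain `dict` stands in for the disk-based hash table, and the names `SummaryEntry`, `Summary`, `locate_leaf`, `non_full_siblings`, and the page number 100 are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class SummaryEntry:
    """Main-memory record for one non-leaf node."""
    level: int
    mbr: tuple                                    # (xmin, ymin, xmax, ymax)
    children: list = field(default_factory=list)  # page numbers of child nodes
    page: int = 0                                 # disk page of the node itself

@dataclass
class Summary:
    nonleaf: dict = field(default_factory=dict)   # page number -> SummaryEntry
    leaf_full: dict = field(default_factory=dict) # leaf page number -> full bit

# ID-index: obj_id -> leaf page number (a dict replaces the on-disk hash table).
id_index = {"p1": 1, "p2": 1, "p3": 2}

summary = Summary()
summary.nonleaf[100] = SummaryEntry(level=2, mbr=(0, 0, 10, 10),
                                    children=[1, 2, 3], page=100)
summary.leaf_full.update({1: False, 2: False, 3: True})

def locate_leaf(obj_id):
    """Bottom-up entry point: jump straight to the object's leaf page."""
    return id_index[obj_id]

def non_full_siblings(parent_page, leaf_page):
    """Sibling leaves with free space, found without any disk access."""
    entry = summary.nonleaf[parent_page]
    return [c for c in entry.children
            if c != leaf_page and not summary.leaf_full[c]]

print(locate_leaf("p2"), non_full_siblings(100, 1))
```

The point of the design is that both lookups avoid a root-to-leaf traversal: the ID-index replaces the Delete search, and the summary answers questions about siblings and ancestors from memory.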

[Figure: the example R-tree with root R, internal nodes R1 and R2, and leaves R3–R8 holding points p1–p15, shown together with the hash table and the main-memory summary.]

Main-memory summary, non-leaf entries:

level   BR   childptrs
3       R    ....
2       R1   ....
2       R2   ....

Main-memory summary, leaf entries (1 = full):

leaf   1  2  3  4  5  6
full   0  0  1  1  0  1

Case 1: new location is outside the root BR

• standard top-down update

[Figure: the example tree with its hash table and summary, illustrated with point p3.]

Case 2: new location remains inside its rectangle

• write new p2 location

[Figure: p2 moving within its leaf rectangle R3.]

Case 3: new location is outside its rectangle

• enlarge rectangle

• if the new p3 is inside the enlarged R4

  • write new R4

  • write new p3 location

[Figure: p3 moving just outside its leaf rectangle R4, which is enlarged to cover it.]

Case 4: new location is far outside its rectangle

• enlarging does not help

• deletion does not cause an underflow

• if new p5 is in the BR of a non-full sibling

  • delete old p5

  • get sibling node

  • insert new p5 into sibling

[Figure: p5 moving between the sibling leaves R5 and R4, with the hash table updated.]

Case 5: new location is far outside its rectangle

• enlarging does not help

• no siblings, no underflow

• delete old p2

• findParent(pNode, newLocation)

• do a standard R-tree insert at the found parent node

[Figure: p2 leaving its leaf and being inserted under R2, near R6, with the hash table and summary shown.]
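The five cases can be read as one decision cascade. The Python sketch below is an illustration under assumptions, not the paper's algorithm: the function names, the per-axis ε test, and the `leaf_count`/`min_fill` underflow rule are hypothetical. The slides do not say what happens when deletion would cause an underflow, so the sketch returns `None` for that situation instead of guessing.

```python
def contains(mbr, p):
    xmin, ymin, xmax, ymax = mbr
    return xmin <= p[0] <= xmax and ymin <= p[1] <= ymax

def within_eps(mbr, p, eps):
    """True if p is at most eps outside mbr along each axis (assumed rule)."""
    xmin, ymin, xmax, ymax = mbr
    dx = max(xmin - p[0], 0, p[0] - xmax)
    dy = max(ymin - p[1], 0, p[1] - ymax)
    return dx <= eps and dy <= eps

def choose_case(new_loc, root_mbr, leaf_mbr, eps, siblings, leaf_count, min_fill):
    """Return the slide case (1-5) that applies, or None if not covered.

    siblings: (mbr, is_full) pairs for the leaves under the same parent.
    """
    if not contains(root_mbr, new_loc):
        return 1                      # standard top-down update
    if contains(leaf_mbr, new_loc):
        return 2                      # write the new location in place
    if within_eps(leaf_mbr, new_loc, eps):
        return 3                      # enlarge the leaf rectangle
    if leaf_count - 1 < min_fill:
        return None                   # underflow: not described on the slides
    if any(contains(m, new_loc) and not full for m, full in siblings):
        return 4                      # move the object to a non-full sibling
    return 5                          # findParent, then a standard insert

# Hypothetical geometry for illustration.
ROOT = (0, 0, 10, 10)
LEAF = (2, 2, 4, 4)
EPS = 0.5
SIBLINGS = [((5, 2, 7, 4), False), ((2, 5, 4, 7), True)]

for loc in [(11, 3), (3, 3), (4.4, 3), (6, 3), (3, 6)]:
    print(loc, choose_case(loc, ROOT, LEAF, EPS, SIBLINGS, leaf_count=3, min_fill=1))
```

The cheaper cases are tested first, so an update that stays local never pays for the more global ones.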


Epsilon ε

[Figure: a leaf rectangle with points p3 and p4 and the enlargement distance ε marked.]


Movement of Objects Between Siblings

• When moving an object to a sibling, redistribute other objects

[Figure: sibling leaves R3 and R4 with points p1–p7.]



Related Work

• Lazy updates for R-tree [Kwon et al. 2002]

  • Leaf-level bounding rectangles are enlarged equally in all directions.

  • Parent pointers are added to the R-tree: expensive to maintain!

  • Query performance deteriorates because of increases in BR overlap.

  • It can be called the localized bottom-up update (LBU) approach
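A small worked example shows why enlarging rectangles equally in all directions increases overlap. The Python sketch below uses two hypothetical adjacent leaf rectangles and an assumed ε of 0.5; the function names are illustrative and nothing here is taken from the cited work.

```python
def grow(mbr, eps):
    """Enlarge a rectangle by eps on every side."""
    xmin, ymin, xmax, ymax = mbr
    return (xmin - eps, ymin - eps, xmax + eps, ymax + eps)

def overlap_area(a, b):
    """Area of the intersection of two rectangles (0 if they only touch)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0) * max(height, 0)

# Two adjacent leaves that share an edge but no area.
left = (0, 0, 4, 4)
right = (4, 0, 8, 4)

before = overlap_area(left, right)
after = overlap_area(grow(left, 0.5), grow(right, 0.5))
print(before, after)
```

Before enlargement the two rectangles share no area; afterwards they share a strip 1.0 wide and 5.0 high, so a query that falls in that strip must visit both leaves.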


Effect of ε

[Figure: updates. Two plots against epsilon (0 to 0.03) comparing TD, LBU, and GBU: Avg Disk I/O (axis 0 to 20) and Total CPU Time in seconds (axis 0 to 300).]


Effect of ε

[Figure: queries. Two plots against epsilon (0 to 0.03) comparing TD, LBU, and GBU: Avg Disk I/O (axis 0 to 300) and Total CPU Time in seconds (axis 0 to 300).]


Varying Buffer Size

[Figure: Avg Disk I/O against buffer size (0 to 10% of the size of the dataset) for TD, LBU, and GBU. Updates panel: axis 0 to 25. Queries panel: axis 70 to 90.]


Scalability

[Figure: Avg Disk I/O against size of data set (0 to 10 million) for TD, LBU, and GBU. Updates panel: axis 0 to 25. Queries panel: axis 0 to 700.]



Strong points

• Content:

  • The study is rather deep: it explores the possibilities between localized updates and top-down updates

  • Concurrency is addressed

  • Extensive experiments

  • A cost model is presented, showing theoretically the merit of the proposed approach

• Form:

  • Good order of presentation: first a simpler algorithm, then a general one


Weak points

• Content:

  • It requires a non-constant amount of main memory to work

  • It does not utilize all the available main memory

  • Data structure and algorithms are rather complex

  • Too many parameters to adjust

• Form:

  • A couple of errors in the pseudo-code

  • Pseudo-code does not have line numbers

  • Algorithm 3 pseudo-code is not very clear

  • Symbols used in formulas are not always explained (e.g., Section 4.2)


Conclusion

• Addressed the problem of handling frequent updates in R-trees

• Proposed a generalized bottom-up update strategy for R-trees

• Significantly better performance than top-down and localized bottom-up update.

• Future work:

  • Application to other multi-dimensional indexes

  • Better theoretical analysis of the tradeoff between global-ness and update cost

• Acknowledgment: Christian S. Jensen for most of the slides
