R-Trees 2-dimensional indexing structure
R-trees
• 2-dimensional version of the B-tree:
[Figure: B-tree of maximum degree 8 (node degree between 3 and 8); internal nodes with k children have k-1 split values]
R-trees
• Can store:
– a set of polygons (regions of a subdivision)
– a set of polygonal lines (or boundaries)
– a set of points
– a mix of the above
• Stored objects may overlap
R-trees
• Originally by Guttman, 1984
• Dozens of variations and optimizations since
• Suitable for windowing, point location and intersection queries
• Heuristic structure, no order bounds (no O(..) worst-case guarantees)
• Tree with higher degree: suitable for external (disk) storage (short search paths); one node per disk block
Definition R-tree
• Every internal node contains entries (rectangle, pointer to child node)
• All leaves contain entries (rectangle, pointer to an object in the database or file)
• Each rectangle is the minimal bounding rectangle (MBR) of its child node or object
• The root has between 2 and M entries (unless it is a leaf)
• All other nodes have at least m and at most M entries
• All leaves have the same depth
• m > 1 and M > 2m (e.g. m = 200, M = 1000)
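The definition above can be turned into a small invariant checker. This Python sketch assumes rectangles are `(xmin, ymin, xmax, ymax)` tuples and uses hypothetical `Entry`/`Node` classes; `check` asserts the entry bounds, the MBR property and equal leaf depth, and returns the height of the tree.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mbr(rects: List[Rect]) -> Rect:
    """Minimal bounding rectangle of a non-empty list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


@dataclass
class Entry:
    rect: Rect
    child: Optional["Node"] = None  # used in internal nodes
    obj: Any = None                 # used in leaves (e.g. an object id)


@dataclass
class Node:
    leaf: bool
    entries: List[Entry] = field(default_factory=list)


def check(node: Node, m: int, M: int, is_root: bool = True) -> int:
    """Assert the R-tree invariants below `node`; return the subtree height."""
    n = len(node.entries)
    if is_root:
        # A root that is itself a leaf may hold fewer than 2 entries.
        assert (0 if node.leaf else 2) <= n <= M, "root must have 2..M entries"
    else:
        assert m <= n <= M, "non-root node must have m..M entries"
    if node.leaf:
        return 1
    heights = set()
    for e in node.entries:
        # Every rectangle must be the MBR of its child's rectangles.
        assert e.rect == mbr([c.rect for c in e.child.entries]), "rectangle is not an MBR"
        heights.add(check(e.child, m, M, is_root=False))
    assert len(heights) == 1, "leaves are not all at the same depth"
    return heights.pop() + 1
```

For example, a root with two leaves holding two entries each passes `check(root, m=2, M=5)` and returns height 2.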
Object descriptions
Grouping of objects
Windowing query: the fewer rectangles are intersected, the fewer subtrees the query has to descend into
Grouping of objects
• Objects close together in the same leaves ⇒ small rectangles ⇒ queries descend into only a few subtrees
• Group the child nodes under a parent node such that small rectangles arise
Heuristics for fast queries
• Small area of rectangles
• Small perimeter of rectangles
• Little overlap among rectangles
• Well-filled nodes (less deep tree ⇒ fewer disk accesses on each search path)
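The four heuristics can be measured per node. This sketch (rectangles as `(xmin, ymin, xmax, ymax)` tuples; the function names are illustrative) computes total area, total perimeter, pairwise overlap and fill ratio, so that two candidate groupings can be compared: lower area, perimeter and overlap and a higher fill are better.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def perimeter(r: Rect) -> float:
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


def intersection_area(a: Rect, b: Rect) -> float:
    """Area of a ∩ b (0 if they do not overlap)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def node_quality(rects: List[Rect], M: int) -> Dict[str, float]:
    """Heuristic quality measures for the rectangles stored in one node."""
    return {
        "area": sum(area(r) for r in rects),
        "perimeter": sum(perimeter(r) for r in rects),
        "overlap": sum(intersection_area(a, b) for a, b in combinations(rects, 2)),
        "fill": len(rects) / M,
    }
```

For two 2 × 2 squares that overlap in a unit square, with M = 4, it reports area 8, perimeter 16, overlap 1 and fill 0.5.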
Example R-tree
[Figure: object descriptions in the example R-tree; point containment query]
Searching in an R-tree
• Q is the query object (point, window, object)
• For each rectangle R in the current node, if Q and R intersect:
– at an internal node: search recursively in the subtree under the pointer at R
– at a leaf: get the object corresponding to R and test it for intersection with Q
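A recursive version of this search, as a Python sketch: rectangles are `(xmin, ymin, xmax, ymax)` tuples, a point query uses a degenerate rectangle `(x, y, x, y)`, and `exact` is a hypothetical callback standing in for the exact object test at the leaves (the MBRs only act as a filter).

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def intersects(a: Rect, b: Rect) -> bool:
    """Closed rectangles a and b share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Entry:
    rect: Rect
    child: Optional["Node"] = None  # internal nodes
    obj: Any = None                 # leaves


@dataclass
class Node:
    leaf: bool
    entries: List[Entry] = field(default_factory=list)


def search(node: Node, q: Rect, exact: Callable[[Any, Rect], bool]) -> List[Any]:
    """Report all objects under `node` that intersect the query rectangle q."""
    result = []
    for e in node.entries:
        if not intersects(e.rect, q):
            continue                       # prune: nothing below can meet q
        if node.leaf:
            if exact(e.obj, q):            # refine: test the object itself against q
                result.append(e.obj)
        else:
            result.extend(search(e.child, q, exact))
    return result
```

When the stored objects are themselves rectangles, `intersects` can serve as the exact test.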
Inserting in an R-tree
• Determine the minimal bounding rectangle (MBR) R of the new object
• When not yet at a leaf (Choose Subtree):
– determine the rectangle whose area increment after insertion of R is smallest
– enlarge this rectangle if necessary and insert R into its subtree
• At a leaf:
– if there is space, insert; otherwise, Split Node
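The Choose Subtree descent can be sketched as follows. It assumes the same tuple rectangles and hypothetical `Entry`/`Node` classes, breaks ties by the smaller area (as Guttman does), and leaves Split Node to the caller: `insert` only returns the leaf that received the new entry.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def mbr2(a: Rect, b: Rect) -> Rect:
    """MBR of two rectangles."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


@dataclass
class Entry:
    rect: Rect
    child: Optional["Node"] = None
    obj: Any = None


@dataclass
class Node:
    leaf: bool
    entries: List[Entry] = field(default_factory=list)


def choose_subtree(node: Node, r: Rect) -> Entry:
    """Entry whose rectangle needs the smallest area increment to include r
    (ties broken by the smaller area)."""
    return min(node.entries,
               key=lambda e: (area(mbr2(e.rect, r)) - area(e.rect), area(e.rect)))


def insert(root: Node, r: Rect, obj: Any) -> Node:
    """Insert (r, obj) and return the leaf that received it.

    The caller runs Split Node if the returned leaf now holds M + 1 entries."""
    node = root
    while not node.leaf:
        e = choose_subtree(node, r)
        e.rect = mbr2(e.rect, r)  # enlarge the chosen rectangle if necessary
        node = e.child
    node.entries.append(Entry(r, obj=obj))
    return node
```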
Split Node
• Divide the M+1 rectangles into two groups, each with at least m and at most M rectangles
• Make a node for each group, with the rectangles and corresponding subtrees as entries
• Hang the two new nodes under the parent node in the place of the overfull node; determine the new MBRs (if the root was overfull, make a new root with two child nodes)
• If the parent has M+1 children, repeat Split Node with this parent
Split Node, example
[Figure: the new MBRs after Split Node]
Strategies for Split Node, I
• Determine the rectangles R1 and R2 whose combined MBR is largest: the seeds for sets S1 and S2
• While |S1|, |S2| < M - m and not all rectangles are distributed:
– take a not yet distributed rectangle Rj and add it to the set whose MBR area increases least
• Add any remaining rectangles to the other set
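Strategy I in Python, as a sketch under a few assumptions. The slide picks the pair with the largest MBR; to keep seed selection linear, this version uses Guttman's linear seed choice (per axis, the rectangle with the highest low side versus the one with the lowest high side, separation normalized by the total width), which approximates that pair. Ties in the distribution step go to the set with the smaller area, then fewer entries. The function returns two lists of indices.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def mbr2(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(box: Rect, r: Rect) -> float:
    """Area increment of box when it is enlarged to include r."""
    return area(mbr2(box, r)) - area(box)


def linear_pick_seeds(rects: List[Rect]) -> Tuple[int, int]:
    """Per axis, pair the rectangle with the highest low side and the one with the
    lowest high side; keep the pair with the largest normalized separation."""
    best, seeds = float("-inf"), (0, 1)
    for lo, hi in ((0, 2), (1, 3)):  # x-axis, then y-axis
        i = max(range(len(rects)), key=lambda k: rects[k][lo])
        j = min(range(len(rects)), key=lambda k: rects[k][hi])
        if i == j:
            continue
        width = max(r[hi] for r in rects) - min(r[lo] for r in rects)
        sep = (rects[i][lo] - rects[j][hi]) / width if width > 0 else 0.0
        if sep > best:
            best, seeds = sep, (i, j)
    return seeds


def linear_split(rects: List[Rect], m: int, M: int) -> List[List[int]]:
    """Split M + 1 rectangles into two groups of indices (strategy I)."""
    i, j = linear_pick_seeds(rects)
    groups, boxes = [[i], [j]], [rects[i], rects[j]]
    rest = [k for k in range(len(rects)) if k not in (i, j)]
    while rest and len(groups[0]) < M - m and len(groups[1]) < M - m:
        k = rest.pop(0)  # any not yet distributed rectangle
        # smallest area increment; ties: smaller area, then fewer entries
        g = min((0, 1), key=lambda g: (enlargement(boxes[g], rects[k]),
                                       area(boxes[g]), len(groups[g])))
        groups[g].append(k)
        boxes[g] = mbr2(boxes[g], rects[k])
    # One group reached M - m entries: the rest goes to the smaller group.
    groups[0 if len(groups[0]) < len(groups[1]) else 1].extend(rest)
    return groups
```

With m = 2 and M = 5, both groups end up with at least m entries: one group stops at M - m = 3 and the other receives the remaining m + 1 = 3.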
Linear R-tree of Guttman, 1984
Example Split Node I
Strategies for Split Node, II
• Determine R1 and R2 with largest area(MBR(R1 ∪ R2)) - area(R1) - area(R2): the seeds for sets S1 and S2
• While |S1|, |S2| < M - m and not all rectangles are distributed:
– for every not yet distributed rectangle Rj, determine:
d1 = area increment of MBR(S1 ∪ Rj) (* w.r.t. MBR(S1) *)
d2 = area increment of MBR(S2 ∪ Rj) (* w.r.t. MBR(S2) *)
– choose the Rj with maximal |d1 - d2| and add it to the set with the smallest area increment
• Add any remaining rectangles to the other set
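Strategy II as a Python sketch (same tuple rectangles; indices are returned). The seeds are the pair with the most dead space; in every round the rectangle with the largest |d1 - d2| is placed, with ties going to the set with the smaller area, then fewer entries (an assumption of this sketch).

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def mbr2(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(box: Rect, r: Rect) -> float:
    return area(mbr2(box, r)) - area(box)


def waste(a: Rect, b: Rect) -> float:
    """Dead space: area(MBR(a ∪ b)) - area(a) - area(b)."""
    return area(mbr2(a, b)) - area(a) - area(b)


def quadratic_split(rects: List[Rect], m: int, M: int) -> List[List[int]]:
    """Split M + 1 rectangles into two groups of indices (strategy II)."""
    n = len(rects)
    i, j = max(((a, b) for a in range(n) for b in range(a + 1, n)),
               key=lambda p: waste(rects[p[0]], rects[p[1]]))
    groups, boxes = [[i], [j]], [rects[i], rects[j]]
    rest = [k for k in range(n) if k not in (i, j)]
    while rest and len(groups[0]) < M - m and len(groups[1]) < M - m:
        # d[k] = (d1, d2): area increments of MBR(S1) and MBR(S2) for rectangle k
        d = {k: (enlargement(boxes[0], rects[k]), enlargement(boxes[1], rects[k]))
             for k in rest}
        k = max(rest, key=lambda k: abs(d[k][0] - d[k][1]))  # strongest preference
        rest.remove(k)
        g = min((0, 1), key=lambda g: (d[k][g], area(boxes[g]), len(groups[g])))
        groups[g].append(k)
        boxes[g] = mbr2(boxes[g], rects[k])
    groups[0 if len(groups[0]) < len(groups[1]) else 1].extend(rest)
    return groups
```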
Quadratic R-tree of Guttman, 1984
Example Split Node, II
Strategies for Split Node, III
• Determine R1 and R2 with largest area(MBR(R1 ∪ R2)) - area(R1) - area(R2): the seeds for sets S1 and S2 (* same as quadratic R-tree *)
• Determine the axis with the largest normalized separation of R1 and R2
(x-separation / x-range of MBR(R1 ∪ R2), or y-separation / y-range of MBR(R1 ∪ R2))
• Sort the rectangles along that axis (by lower-left corner) and split them evenly into two subsets of size (M+1) / 2
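A sketch of Greene's split under one assumption the slide leaves open: the separation along an axis is taken to be the gap between the two seeds (the larger of their lower coordinates minus the smaller of their upper coordinates, negative when they overlap). When M + 1 is odd, the second subset gets the extra rectangle in this sketch. It returns the chosen axis (0 = x, 1 = y) and two index lists.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def mbr2(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def waste(a: Rect, b: Rect) -> float:
    return area(mbr2(a, b)) - area(a) - area(b)


def greene_split(rects: List[Rect]) -> Tuple[int, List[int], List[int]]:
    n = len(rects)
    # seeds: same as the quadratic split
    i, j = max(((a, b) for a in range(n) for b in range(a + 1, n)),
               key=lambda p: waste(rects[p[0]], rects[p[1]]))
    union = mbr2(rects[i], rects[j])
    best_axis, best_sep = 0, float("-inf")
    for axis, (lo, hi) in enumerate(((0, 2), (1, 3))):
        # assumed separation: gap between the seeds along this axis
        sep = max(rects[i][lo], rects[j][lo]) - min(rects[i][hi], rects[j][hi])
        extent = union[hi] - union[lo]
        norm = sep / extent if extent > 0 else 0.0
        if norm > best_sep:
            best_axis, best_sep = axis, norm
    # sort by the lower-left corner coordinate on the chosen axis, split in half
    order = sorted(range(n), key=lambda k: rects[k][best_axis])
    half = n // 2
    return best_axis, order[:half], order[half:]
```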
Greene’s split, 1989
Example Split Node, III
Y-axis has largest normalized separation
Deletion from an R-tree
• Find the leaf containing the object and delete the object; determine the new (possibly smaller) MBR
• If the node is too empty (< m entries):
– delete the node from its parent (recursively, since the parent may now become too empty)
– insert all entries of the deleted node into the R-tree again
• Note: entries/subtrees are always reinserted at the level they came from
[Figure: an orphaned subtree is reinserted as a rectangle on the middle level; an orphaned object is inserted in a leaf]
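Deletion with reinsertion can be sketched as follows. Each node stores its `level` (0 for leaves) so that orphaned entries can be reinserted at the level they came from. Assumptions: an object is identified by its rectangle and value, and the hypothetical helper `insert_at_level` omits Split Node, so it only suffices while no node overflows. After reinsertion, a root left with a single child is replaced by that child.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Entry:
    rect: Rect
    child: Optional["Node"] = None  # internal nodes
    obj: Any = None                 # leaves


@dataclass
class Node:
    level: int  # 0 for leaves
    entries: List[Entry] = field(default_factory=list)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def mbr(rects: List[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def contains(a: Rect, b: Rect) -> bool:
    return a[0] <= b[0] and a[1] <= b[1] and b[2] <= a[2] and b[3] <= a[3]


def find_path(node: Node, rect: Rect, obj: Any) -> Optional[List[Tuple[Node, int]]]:
    """(node, entry index) pairs from `node` down to the leaf entry of obj."""
    for i, e in enumerate(node.entries):
        if node.level == 0:
            if e.obj == obj and e.rect == rect:
                return [(node, i)]
        elif contains(e.rect, rect):
            below = find_path(e.child, rect, obj)
            if below:
                return [(node, i)] + below
    return None


def insert_at_level(root: Node, entry: Entry, level: int) -> None:
    """Put entry into a node at `level` (least area increment; no Split Node)."""
    node = root
    while node.level > level:
        e = min(node.entries, key=lambda e: area(mbr([e.rect, entry.rect])) - area(e.rect))
        e.rect = mbr([e.rect, entry.rect])
        node = e.child
    node.entries.append(entry)


def delete(root: Node, rect: Rect, obj: Any, m: int) -> Node:
    """Delete obj and return the (possibly new) root."""
    path = find_path(root, rect, obj)
    if path is None:
        return root  # not in the tree
    leaf, i = path[-1]
    del leaf.entries[i]
    orphans = []  # (level, entry) pairs to reinsert
    for parent, idx in reversed(path[:-1]):  # walk back up towards the root
        child = parent.entries[idx].child
        if len(child.entries) < m:
            del parent.entries[idx]  # drop the underfull node, keep its entries
            orphans += [(child.level, e) for e in child.entries]
        else:
            parent.entries[idx].rect = mbr([e.rect for e in child.entries])  # shrink MBR
    for level, e in orphans:  # reinsert at the level each entry came from
        insert_at_level(root, e, level)
    while root.level > 0 and len(root.entries) == 1:  # single child becomes root
        root = root.entries[0].child
    return root
```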
R*-trees
• Experimentally determined measures for choices at insertion (Choose Subtree, Split Node)
• Experimentally determined algorithms for:
– Choose Subtree
– Split Node
R*-trees; Choose Subtree
• At nodes directly above leaves: Choose entry (rectangle) with smallest overlap-increase
• At higher nodes: Choose entry (rectangle) with smallest area-increase (same as before)
$\mathrm{overlap}(R_k) = \sum_{i=1,\, i \neq k}^{p} \mathrm{area}(R_k \cap R_i)$, where $R_1, \dots, R_p$ are the entry rectangles
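R* Choose Subtree as a Python sketch: `rects` are the entry rectangles of the current node, `children_are_leaves` tells whether the node lies directly above the leaves, and `overlap(k, rects)` sums area(R_k ∩ R_i) over all i ≠ k. Breaking ties by area increase and then by area is an assumption of this sketch.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def mbr2(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def intersection_area(a: Rect, b: Rect) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def overlap(k: int, rects: List[Rect]) -> float:
    """overlap(R_k) = sum of area(R_k ∩ R_i) over all i != k."""
    return sum(intersection_area(rects[k], r) for i, r in enumerate(rects) if i != k)


def choose_subtree_rstar(rects: List[Rect], r: Rect, children_are_leaves: bool) -> int:
    """Index of the entry rectangle to descend into when inserting r."""
    def area_inc(k: int) -> float:
        return area(mbr2(rects[k], r)) - area(rects[k])

    if children_are_leaves:
        def overlap_inc(k: int) -> float:
            enlarged = rects[:k] + [mbr2(rects[k], r)] + rects[k + 1:]
            return overlap(k, enlarged) - overlap(k, rects)
        # smallest overlap increase; ties: smallest area increase, then smallest area
        return min(range(len(rects)),
                   key=lambda k: (overlap_inc(k), area_inc(k), area(rects[k])))
    # higher levels: smallest area increase; ties: smallest area
    return min(range(len(rects)), key=lambda k: (area_inc(k), area(rects[k])))
```

With A = (0, 0, 2, 2), B = (3, 0, 20, 2) and r = (2.5, 2.5, 3.5, 3), enlarging A is cheaper in area (6.5 versus 18.5) but makes A overlap B, so the overlap criterion picks B while the area criterion picks A.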
R*-trees; Split Node
Determine split axis:
• For both the x- and the y-axis:
– sort the rectangles by their smallest and by their largest coordinate
– for each sorting, determine the M - 2m + 2 allowed distributions into two groups
– determine for each distribution the perimeters of the two MBRs
– add up all these perimeters
• Choose the axis with smallest sum of perimeters
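[Figure: the M + 1 sorted entries; the first m always go to group 1, the last m always go to group 2, and the split point lies among the M - 2m + 1 entries in between]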
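Choosing the split axis as a Python sketch: rectangles are tuples, `distributions` generates the M - 2m + 2 allowed cuts of one sorted sequence (group 1 gets the first m - 1 + k entries, k = 1, …, M - 2m + 2), and both sortings of an axis contribute to its perimeter sum. The function names are illustrative.

```python
from typing import Iterator, List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def perimeter(r: Rect) -> float:
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


def mbr(rects: List[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def distributions(ordered: List[Rect], m: int, M: int) -> Iterator[Tuple[List[Rect], List[Rect]]]:
    """The M - 2m + 2 allowed distributions of M + 1 sorted rectangles."""
    for k in range(1, M - 2 * m + 3):
        yield ordered[: m - 1 + k], ordered[m - 1 + k:]


def choose_split_axis(rects: List[Rect], m: int, M: int) -> int:
    """0 for the x-axis, 1 for the y-axis."""
    sums = []
    for lo, hi in ((0, 2), (1, 3)):
        s = 0.0
        for key in (lo, hi):  # sort by smallest, then by largest coordinate
            ordered = sorted(rects, key=lambda r: r[key])
            for g1, g2 in distributions(ordered, m, M):
                s += perimeter(mbr(g1)) + perimeter(mbr(g2))
        sums.append(s)
    return 0 if sums[0] <= sums[1] else 1
```

For two columns of three unit squares at x = 0 and x = 10, with m = 2 and M = 5, the x-axis sum is 168 against 304 for the y-axis, so the x-axis is chosen.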
R*-trees; Split Node
Determine split index (given the split axis):
• Choose, among the distributions along the split axis (M - 2m + 2 per sorting), the one with the smallest area of intersection of the two MBRs
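Given the axis, the split index can be sketched as below. It scans the distributions of both sortings along that axis and keeps the one whose two MBRs have the smallest intersection area; ties are broken by the total area of the two MBRs, as in the original R*-tree description. Rectangles are again tuples and the names are illustrative.

```python
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def mbr(rects: List[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def intersection_area(a: Rect, b: Rect) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def choose_split_index(rects: List[Rect], axis: int, m: int, M: int) -> Tuple[List[Rect], List[Rect]]:
    """Best distribution along `axis` (0 = x, 1 = y) of M + 1 rectangles."""
    lo, hi = (0, 2) if axis == 0 else (1, 3)
    best: Optional[Tuple[List[Rect], List[Rect]]] = None
    best_score: Optional[Tuple[float, float]] = None
    for key in (lo, hi):  # both sortings along the chosen axis
        ordered = sorted(rects, key=lambda r: r[key])
        for k in range(1, M - 2 * m + 3):
            g1, g2 = ordered[: m - 1 + k], ordered[m - 1 + k:]
            b1, b2 = mbr(g1), mbr(g2)
            score = (intersection_area(b1, b2), area(b1) + area(b2))
            if best_score is None or score < best_score:
                best, best_score = (g1, g2), score
    return best
```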
Nearest neighbor queries
• An R-tree can be used for nearest neighbor queries
• Idea: perform a depth-first search, maintain the closest object found so far, and use its distance to prune subtrees whose rectangles lie farther away
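[Figure: depth-first nearest neighbor search; nodes visited in the order 1 to 5, with the closest object so far, the queried subtrees and the pruned subtrees marked]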
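The depth-first search with pruning, sketched in Python: `mindist` is the distance from the query point to a rectangle, children are visited closest-first, and a subtree is skipped once its rectangle is at least as far away as the closest object so far. The sketch assumes the stored objects are their own rectangles (or points), so the leaf distance is exact; `Entry`/`Node` are hypothetical classes.

```python
import math
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Point = Tuple[float, float]


@dataclass
class Entry:
    rect: Rect
    child: Optional["Node"] = None
    obj: Any = None


@dataclass
class Node:
    leaf: bool
    entries: List[Entry] = field(default_factory=list)


def mindist(p: Point, r: Rect) -> float:
    """Euclidean distance from point p to rectangle r (0 if p lies inside r)."""
    dx = max(r[0] - p[0], 0.0, p[0] - r[2])
    dy = max(r[1] - p[1], 0.0, p[1] - r[3])
    return math.hypot(dx, dy)


def nearest(node: Node, p: Point, best: Tuple[float, Any] = (math.inf, None)) -> Tuple[float, Any]:
    """Depth-first branch-and-bound search; returns (distance, object)."""
    if node.leaf:
        for e in node.entries:
            d = mindist(p, e.rect)  # assumption: the rectangle is the object itself
            if d < best[0]:
                best = (d, e.obj)
        return best
    # visit children closest-first
    for e in sorted(node.entries, key=lambda e: mindist(p, e.rect)):
        if mindist(p, e.rect) >= best[0]:
            break  # prune this and all remaining (farther) subtrees
        best = nearest(e.child, p, best)
    return best
```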
Forced reinsert
• Build an R-tree by repeated insertion: the rectangles inserted first are possibly badly placed
• Experiment:
– build an R-tree by inserting 20,000 rectangles
– again, but afterwards delete the first 10,000 inserted rectangles and insert them again
• Search time improvement of 20-50%!
Summary R-trees
• Versatile 2-dimensional search tree (referred to as: indexing structure, or spatial index)
• Some variant is used in most GIS
• Well-suited for windowing, point location, intersection, and nearest neighbor queries
• Heuristic structure, no order bounds (no O(..) worst-case guarantees)
• Dynamic: insertions and deletions supported
• Tree with higher degree: well-suited for external (disk) storage (short search paths)