R-Trees Extension of B+-trees. Collection of d-dimensional rectangles. A point in d-dimensions is a trivial rectangle.

Dec 24, 2015

Hester Lamb

R-Trees

• Extension of B+-trees.

• Collection of d-dimensional rectangles.

• A point in d-dimensions is a trivial rectangle.

Non-rectangular Data

• Non-rectangular data may be represented by minimum bounding rectangles (MBRs).
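
As a minimal sketch of what an MBR is, the snippet below computes the bounding rectangle of a polygon given by its vertices. The 2-D `(xmin, ymin, xmax, ymax)` tuple layout and the name `mbr_of_points` are assumptions made for illustration; they do not come from the slides.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


def mbr_of_points(points: Iterable[Point]) -> Rect:
    """Smallest axis-aligned rectangle that contains every point."""
    pts = list(points)
    if not pts:
        raise ValueError("need at least one point")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return (min(xs), min(ys), max(xs), max(ys))


# A triangle (non-rectangular data) and the rectangle that would be indexed for it.
triangle = [(1.0, 1.0), (4.0, 2.0), (2.0, 5.0)]
triangle_mbr = mbr_of_points(triangle)
```

A single point yields a degenerate rectangle with zero width and height, which matches the "trivial rectangle" remark on the first slide.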

Operations

• Insert

• Delete

• Find all rectangles that intersect a query rectangle.

• Good for large rectangle collections stored on disk.

R-Trees—Structure

• Data nodes (leaves) contain rectangles.

• Index nodes (non-leaves) contain MBRs for data in subtrees.

• The MBR of the rectangles or MBRs in a non-root node is stored in its parent node.

R-Trees—Structure

• R-tree of order M.

• Each node other than the root has between m <= ceil(M/2) and M rectangles/MBRs.

• Typically, m = ceil(M/2). Assume m = ceil(M/2) henceforth.

• Root has between 2 and M rectangles/MBRs.

• Each index node has as many MBRs as children.

• All data nodes are at the same level.
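
The occupancy rules above can be written down as a small checker. This is only a sketch: the `Node` class, its field names, and `check_structure` are hypothetical, and the check covers entry counts and leaf depth only (it does not verify that stored MBRs are tight).

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


@dataclass
class Node:
    is_leaf: bool
    rects: List[Rect] = field(default_factory=list)       # data rectangles (leaf) or child MBRs (index)
    children: List["Node"] = field(default_factory=list)  # stays empty for leaves


def check_structure(root: Node, M: int) -> bool:
    """True if every node respects the order-M occupancy rules, with m = ceil(M/2)."""
    m = math.ceil(M / 2)
    leaf_depths = set()

    def visit(node: Node, depth: int, is_root: bool) -> bool:
        low = 2 if is_root else m          # the root only needs 2 entries
        if not (low <= len(node.rects) <= M):
            return False
        if node.is_leaf:
            leaf_depths.add(depth)
            return True
        if len(node.children) != len(node.rects):  # one MBR per child
            return False
        return all(visit(c, depth + 1, False) for c in node.children)

    # All data nodes must sit at the same level.
    return visit(root, 0, True) and len(leaf_depths) == 1
```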

Example

• R-tree of order 4.

• Each node may have up to 4 rectangles/MBRs.

Example

• Possible partitioning of our example data into 12 leaves.

Example

• Possible R-tree of order 4 with 12 leaves.

• Leaves are data nodes that contain 4 input rectangles each.

• a-p are MBRs.

[Figure: tree whose root holds MBRs m, n, o, p and whose next level holds MBRs a through l.]

Example

• Possible corresponding grouping.

[Figure: built up over three slides, showing the MBRs a through l grouped under the root MBRs m, n, o, and p.]


Query

• Report all rectangles that intersect a given rectangle.

• Start at root and find all MBRs that overlap query.

• Search corresponding subtrees recursively.
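
The recursive search can be sketched as follows. The `Node` layout, the 2-D tuple representation, and the choice to count touching rectangles as intersecting are all assumptions for this illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


def intersects(a: Rect, b: Rect) -> bool:
    # Closed rectangles: sharing an edge or corner counts as intersecting (an assumption).
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    is_leaf: bool
    rects: List[Rect] = field(default_factory=list)       # data rectangles or child MBRs
    children: List["Node"] = field(default_factory=list)  # parallel to rects in index nodes


def search(node: Node, query: Rect) -> List[Rect]:
    """Report every data rectangle under `node` that intersects `query`."""
    if node.is_leaf:
        return [r for r in node.rects if intersects(r, query)]
    hits: List[Rect] = []
    for mbr, child in zip(node.rects, node.children):
        if intersects(mbr, query):       # skip subtrees whose MBR misses the query
            hits.extend(search(child, query))
    return hits
```

Because sibling MBRs may overlap, a query can descend into several subtrees at the same level; only subtrees whose MBR misses the query are pruned.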

Query

• Search m.

[Figure: example query on the R-tree above. The root MBRs m, n, o, and p are tested against the query rectangle, and the search descends into the subtree of m.]

Insert

• Similar to insertion into B+-tree but may insert into any leaf; leaf splits in case capacity exceeded.

• Which leaf to insert into?

• How to split a node?

Insert: Leaf Selection

• Follow a path from root to leaf.

• At each node move into subtree whose MBR area increases least with addition of new rectangle.

[Figures: the four candidate choices at a node with MBRs m, n, o, and p: insert into m, insert into n, insert into o, insert into p.]
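
The least-enlargement rule for one step of the descent might look like the sketch below. The function names are hypothetical, and breaking ties by the smaller current MBR area is an added assumption, not something stated on the slide.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    """MBR of two rectangles."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(mbr: Rect, new: Rect) -> float:
    """How much the MBR's area grows if `new` is added under it."""
    return area(union(mbr, new)) - area(mbr)


def choose_subtree(mbrs: List[Rect], new: Rect) -> int:
    """Index of the child MBR that needs the least area increase to cover `new`."""
    return min(range(len(mbrs)),
               key=lambda i: (enlargement(mbrs[i], new), area(mbrs[i])))
```

Repeating `choose_subtree` at each index node, from the root down, yields the leaf that receives the new rectangle.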

Insert: Split A Node

• Split set of M+1 rectangles/MBRs into 2 sets A and B.

• A and B each have at least m rectangles/MBRs.

• Sum of areas of MBRs of A and B is minimum.

[Figure: example with M = 8, m = 4, built up over three slides.]

Insert: Split A Node

• Exhaustive search for best A and B.

• Compute area(MBR(A)) + area(MBR(B)) for each possible A. Note: for each A, the B is unique.

• Select partition that minimizes this sum.

• When |A| = m = ceil(M/2), number of choices for A is

$$\frac{(M+1)!}{m!\,(M+1-m)!}$$

• Impractical for large M.
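
A brute-force version of this search is sketched below to show why the cost grows so quickly. The helper names are hypothetical, and the sketch tries every allowed size of A (not only |A| = m), so each partition is examined twice, once from each side.

```python
from itertools import combinations
from math import comb
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def mbr(rects: List[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def exhaustive_split(rects: List[Rect], m: int) -> Optional[Tuple[float, List[Rect], List[Rect]]]:
    """Return (cost, A, B) minimizing area(MBR(A)) + area(MBR(B)) with |A|, |B| >= m."""
    n = len(rects)
    best = None
    for size in range(m, n - m + 1):
        for idx in combinations(range(n), size):
            chosen = set(idx)
            A = [rects[i] for i in idx]
            B = [rects[i] for i in range(n) if i not in chosen]  # B is determined by A
            cost = area(mbr(A)) + area(mbr(B))
            if best is None or cost < best[0]:
                best = (cost, A, B)
    return best  # None when fewer than 2*m rectangles are given


# Number of candidate sets A of size m for M = 8, m = 4 (9 rectangles to split).
choices_for_M8 = comb(9, 4)
```

For M = 8 and m = 4 the formula gives 9!/(4! 5!) = 126 candidate sets of size m, which is manageable, but the count grows exponentially with M.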

Insert: Split A Node

• Grow A and B using a clustering strategy.

• Start with a seed rectangle a for A and b for B.

• Grow A and B one rectangle at a time.

• Stop when the M+1 rectangles have been partitioned into A and B.

Insert: Split A Node

• Quadratic Method: seed selection.

• Let S be the set of M+1 rectangles to be partitioned.

• Find a and b in S that maximize

$$\operatorname{area}(\operatorname{MBR}(a,b)) - \operatorname{area}(a) - \operatorname{area}(b)$$

[Figure: example with M = 8, m = 4.]
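
Seed selection for the quadratic method can be sketched as a scan over all pairs. The name `pick_seeds_quadratic` and the tuple layout are assumptions; the quantity being maximized is the one given on the slide.

```python
from itertools import combinations
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def pick_seeds_quadratic(rects: List[Rect]) -> Tuple[int, int]:
    """Indices (a, b) maximizing area(MBR(a,b)) - area(a) - area(b)."""
    def wasted(pair: Tuple[int, int]) -> float:
        i, j = pair
        return area(union(rects[i], rects[j])) - area(rects[i]) - area(rects[j])

    # Every unordered pair is examined once, hence the quadratic cost in |S|.
    return max(combinations(range(len(rects)), 2), key=wasted)
```

The pair chosen is the one that would waste the most area if both rectangles were placed in the same group, so they make natural starting points for two separate groups.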

Insert: Split A Node

• Quadratic Method: assign remaining rectangles/MBRs.

• Find an unassigned rectangle c that maximizes

$$\bigl|\,(\operatorname{area}(\operatorname{MBR}(A,c)) - \operatorname{area}(\operatorname{MBR}(A))) - (\operatorname{area}(\operatorname{MBR}(B,c)) - \operatorname{area}(\operatorname{MBR}(B)))\,\bigr|$$

• Assign c to partition whose area increases least.

• Continue assigning in this way until all remaining rectangles must necessarily be assigned to one of the two partitions for that partition to have m rectangles.

[Figure: example with M = 8, m = 4, built up over four slides.]
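
The assignment loop of the quadratic method could be sketched like this. The function takes the two seed indices as arguments; the names are hypothetical, and sending ties to A is a simplifying assumption.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def growth(mbr: Rect, c: Rect) -> float:
    return area(union(mbr, c)) - area(mbr)


def quadratic_distribute(rects: List[Rect], ia: int, ib: int, m: int):
    """Grow A and B from seeds rects[ia] and rects[ib]; returns (A, B)."""
    A, B = [rects[ia]], [rects[ib]]
    mbr_a, mbr_b = rects[ia], rects[ib]
    rest = [r for k, r in enumerate(rects) if k not in (ia, ib)]
    while rest:
        # Forced case: a group needs every remaining rectangle to reach m.
        if len(A) + len(rest) == m:
            A.extend(rest)
            break
        if len(B) + len(rest) == m:
            B.extend(rest)
            break
        # Pick the rectangle with the strongest preference for one group.
        c = max(rest, key=lambda r: abs(growth(mbr_a, r) - growth(mbr_b, r)))
        rest.remove(c)
        if growth(mbr_a, c) <= growth(mbr_b, c):   # ties go to A (assumption)
            A.append(c)
            mbr_a = union(mbr_a, c)
        else:
            B.append(c)
            mbr_b = union(mbr_b, c)
    return A, B
```

Each round rescans the unassigned rectangles, so the loop is quadratic in the number of rectangles, which matches the method's name.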

Insert: Split A Node

• Linear Method: seed selection.

• Choose a and b to have maximum normalized separation.

[Figures, M = 8, m = 4, built up over seven slides: separation in x-dimension; rectangles with max x-separation; divide by x-width to normalize; separation in y-dimension; rectangles with max y-separation; divide by y-width to normalize.]
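
A sketch of linear seed selection in 2-D follows. It assumes, as in the usual description of this method, that the most separated pair in a dimension is the rectangle with the highest low side and the one with the lowest high side, and that the "width" used for normalization is the extent of the whole set S in that dimension. The names and the fallback for degenerate input are hypothetical.

```python
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


def pick_seeds_linear(rects: List[Rect]) -> Tuple[int, int]:
    """Indices of the pair with the largest normalized separation over x and y."""
    best: Optional[Tuple[float, int, int]] = None
    for dim in (0, 1):                    # 0 = x, 1 = y
        lo, hi = dim, dim + 2             # positions of the low and high side
        i_high_low = max(range(len(rects)), key=lambda i: rects[i][lo])
        i_low_high = min(range(len(rects)), key=lambda i: rects[i][hi])
        if i_high_low == i_low_high:
            continue                      # same rectangle: no pair in this dimension
        width = max(r[hi] for r in rects) - min(r[lo] for r in rects)
        if width == 0:
            continue
        # Separation between the two extremes, normalized by the set's width.
        sep = (rects[i_high_low][lo] - rects[i_low_high][hi]) / width
        if best is None or sep > best[0]:
            best = (sep, i_low_high, i_high_low)
    if best is None:
        return (0, 1)                     # arbitrary fallback (assumption)
    return best[1], best[2]
```

Each dimension needs only one pass over the rectangles, which is where the linear cost comes from.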

Insert: Split A Node

• Linear Method: assign remainder.

• Assign remaining rectangles in random order.

• Rectangle is assigned to partition whose MBR area increases least.

• Stop when all remaining rectangles must be assigned to one of the partitions so that the partition has its minimum required m rectangles.

[Figure: example with M = 8, m = 4.]
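
The remainder assignment for the linear method differs from the quadratic one only in how the next rectangle is picked. A minimal sketch, with hypothetical names and an optional `rng` parameter added so runs can be made repeatable:

```python
import random
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def linear_distribute(rects: List[Rect], ia: int, ib: int, m: int,
                      rng: Optional[random.Random] = None):
    """Assign the non-seed rectangles in random order; returns (A, B)."""
    rng = rng or random.Random()
    A, B = [rects[ia]], [rects[ib]]
    mbr_a, mbr_b = rects[ia], rects[ib]
    rest = [r for k, r in enumerate(rects) if k not in (ia, ib)]
    rng.shuffle(rest)                      # random order, no search for the "best" c
    while rest:
        # Forced case: a group needs every remaining rectangle to reach m.
        if len(A) + len(rest) == m:
            A.extend(rest)
            break
        if len(B) + len(rest) == m:
            B.extend(rest)
            break
        c = rest.pop()
        grow_a = area(union(mbr_a, c)) - area(mbr_a)
        grow_b = area(union(mbr_b, c)) - area(mbr_b)
        if grow_a <= grow_b:               # ties go to A (assumption)
            A.append(c)
            mbr_a = union(mbr_a, c)
        else:
            B.append(c)
            mbr_b = union(mbr_b, c)
    return A, B
```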

Delete

• If the leaf doesn't become deficient, simply readjust the MBRs on the path from the root.

• If the leaf becomes deficient, get a rectangle from the nearest sibling (if possible) and readjust MBRs.

• Otherwise, combine with a sibling as in a B+-tree.

• Could instead do a more global reorganization to get a better R-tree.

Variants

• R*-tree: leaf selection and node overflows in insertion are handled differently.

• Hilbert R-tree

Related Structures

• R+-tree

• Index nodes have non-overlapping rectangles.

• A data object may be represented in several data nodes.

• No upper bound on size of a data node.

• No bounds (lower/upper) on degree of an index node.

Related Structures

• Cell tree

• Combines BSP and R+-tree concepts.

• Index nodes have non-overlapping convex polyhedrons.

• No lower/upper bound on size of a data node.

• Lower bound (but not upper) on degree of an index node.