
ICS 421 Spring 2010

Indexing (1)

Asst. Prof. Lipyeow Lim

Information & Computer Science Department

University of Hawaii at Manoa

02/18/2010 Lipyeow Lim -- University of Hawaii at Manoa


How to speed up queries?


SELECT *
FROM Sailors
WHERE age > 40

[Figure: the file of records for Sailors, shown as an array of Sailor tuples/records]
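To make the question concrete, here is a minimal Python sketch that answers the query above in two ways: by scanning the whole array of records, and by first building a sorted list of (age, position) pairs and binary-searching it. The sample rows, the tuple layout, and the function names are assumptions made for illustration only.

```python
from bisect import bisect_right

# Hypothetical Sailors rows: (sid, sname, rating, age).
sailors = [
    (22, "Dustin", 7, 45.0),
    (31, "Lubber", 8, 55.5),
    (58, "Rusty", 10, 35.0),
    (64, "Horatio", 7, 35.0),
]

def scan(records, min_age):
    """Full scan: look at every record, keep those with age > min_age."""
    return [r for r in records if r[3] > min_age]

def build_age_index(records):
    """Sorted list of (age, position in the record array)."""
    return sorted((r[3], i) for i, r in enumerate(records))

def index_lookup(records, index, min_age):
    """Binary-search for the first entry with age > min_age, then read on."""
    start = bisect_right(index, (min_age, float("inf")))
    return [records[i] for _, i in index[start:]]

age_index = build_age_index(sailors)
print(scan(sailors, 40))
print(index_lookup(sailors, age_index, 40))
```

The scan touches every record, while the lookup skips straight to the qualifying entries; the rest of the slides are about organizing such a sorted structure into pages.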


Binary Search Trees

• Given a search value:
– If value < node.value, then follow the left pointer
– Else follow the right pointer

• How do we generalize each index node to an index page?
• How do we generalize this to search pages of records?


[Figure: binary search tree over the sorted keys 18 20 21 21 26 27 28 30 31 34 37 41 45. Root: 28. Second level: 21, 34. Third level: 20, 26, 30, 41.]
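The left/right rule can be sketched in Python as follows. The node keys are the ones in the figure; how the sorted keys are split into leaf buckets is an assumption chosen to be consistent with the rule (keys smaller than a node's value go left, all others go right), and the names `Node` and `find_leaf` are illustrative.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Node:
    key: int
    left: Union["Node", List[int]]   # subtree or leaf bucket with keys < key
    right: Union["Node", List[int]]  # subtree or leaf bucket with keys >= key

def find_leaf(node, value):
    """Follow left if value < node.key, else right, until a leaf bucket."""
    while isinstance(node, Node):
        node = node.left if value < node.key else node.right
    return node

# Assumed bucket layout for the keys 18 20 21 21 26 27 28 30 31 34 37 41 45.
tree = Node(
    28,
    Node(21, Node(20, [18], [20]), Node(26, [21, 21], [26, 27])),
    Node(34, Node(30, [28], [30, 31]), Node(41, [34, 37], [41, 45])),
)

leaf = find_leaf(tree, 27)
print(leaf, 27 in leaf)
```

Each node here holds a single key and two pointers; an index page generalizes this to many keys and many pointers per node.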


Indexes

• What do we store in the index nodes? Let k be the key value for an index entry:

1. Data record with key value k

2. <k, rid of data record with key value k>

3. <k, list of rids of data records with key value k>

• What kind of queries does the index support?

– Range

– Point (or equality)
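The three storage alternatives listed above can be compared side by side in a small Python sketch. The sample rows and the rid layout (page number, slot number) are assumptions made for illustration; they are not taken from the slides.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Rid = Tuple[int, int]  # assumed rid layout: (page number, slot number)

@dataclass(frozen=True)
class Sailor:
    sid: int
    sname: str
    rating: int
    age: int

# Hypothetical data file: rid -> record.
records: Dict[Rid, Sailor] = {
    (0, 0): Sailor(22, "Dustin", 7, 45),
    (0, 1): Sailor(31, "Lubber", 8, 55),
    (1, 0): Sailor(58, "Rusty", 10, 35),
    (1, 1): Sailor(64, "Horatio", 7, 35),
}

# Alternative 1: the entry is the data record itself, ordered by key k = age.
alt1: List[Sailor] = sorted(records.values(), key=lambda s: s.age)

# Alternative 2: one <k, rid> entry per record.
alt2: List[Tuple[int, Rid]] = sorted((s.age, rid) for rid, s in records.items())

# Alternative 3: one <k, list of rids> entry per distinct key value.
alt3: Dict[int, List[Rid]] = {}
for age, rid in alt2:
    alt3.setdefault(age, []).append(rid)

print([s.age for s in alt1])
print(alt2)
print(alt3)
```

Alternatives 2 and 3 keep the records in a separate file and store only rids; alternative 3 stores each key value once, which is more compact when keys repeat.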


Indexed Sequential Access Method (ISAM)

• Static (m+1)-way Search Tree


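Index page layout: P0 | K1 | P1 | K2 | P2 | ... | Km | Pm (each <Ki, Pi> pair is an index entry)

[Figure: ISAM structure. Non-leaf pages on top; leaf pages below, made up of primary pages with overflow pages attached.]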
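Choosing which of the m+1 pointers to follow inside one index page can be sketched in Python with a binary search over the page's keys. The convention used here, that pointer Pi leads to keys k with Ki <= k < K(i+1), is a common textbook convention and an assumption of this sketch; the function name `choose_child` is illustrative.

```python
from bisect import bisect_right

def choose_child(keys, pointers, value):
    """Return the pointer to follow for `value` in an (m+1)-way index page.

    keys     = [K1, ..., Km], sorted
    pointers = [P0, P1, ..., Pm]
    Assumed convention: P0 covers value < K1, Pi covers Ki <= value < K(i+1).
    """
    assert len(pointers) == len(keys) + 1
    return pointers[bisect_right(keys, value)]

keys = [20, 33, 51, 63]
pointers = ["P0", "P1", "P2", "P3", "P4"]
for v in (10, 20, 27, 97):
    print(v, "->", choose_child(keys, pointers, v))
```

With m = 1 this reduces to the left/right decision of a binary search tree node.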


ISAM: Example

• Store data records in the leaf pages

• Do we still need the file of records?


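[Figure: ISAM tree on Sailors (sid, sname, rating, age), keyed on age. Root: 40. Index level: 20 33 | 51 63. Leaf pages: 10 15 | 20 27 | 33 37 | 40 46 | 51 55 | 63 97. Inserting a new record with age 98 places 98 in an overflow page attached to the leaf page 63 97.]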


ISAM Facts

• File creation: Leaf (data) pages are allocated sequentially, sorted by search key; then index pages are allocated, then space for overflow pages.

• Index entries: <search key value, page id>; they 'direct' the search for data entries, which are in leaf pages.

• Search: Start at the root; use key comparisons to go to a leaf. Cost = O(log_F N), where F = # entries per index page and N = # leaf pages.

• Insert: Find the leaf the data entry belongs to and put it there. If the leaf is full, allocate an overflow page and put the entry there.

• Delete: Find the entry and remove it from the leaf; if this empties an overflow page, de-allocate the page.

• Static tree structure: inserts/deletes affect only leaf pages.
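The behaviour described above can be sketched in Python. This is a simplified illustration, not a full ISAM: the index is flattened to a single sorted list of separator keys instead of a multi-level tree, each page holds two entries, and the class and method names are assumptions. The point it shows is that the separators are fixed at file creation, while inserts and deletes only touch leaf and overflow pages.

```python
from bisect import bisect_right

PAGE_CAPACITY = 2  # assumed page size, matching the two-entry leaves of the example

class LeafPage:
    def __init__(self, keys):
        self.keys = list(keys)  # primary page
        self.overflow = []      # chain of overflow pages, each a list of keys

class Isam:
    def __init__(self, sorted_keys):
        # File creation: leaf pages allocated sequentially, in key order.
        self.leaves = [LeafPage(sorted_keys[i:i + PAGE_CAPACITY])
                       for i in range(0, len(sorted_keys), PAGE_CAPACITY)]
        # Static index (flattened to one level here): never changes afterwards.
        self.separators = [leaf.keys[0] for leaf in self.leaves[1:]]

    def _leaf_for(self, key):
        return self.leaves[bisect_right(self.separators, key)]

    def search(self, key):
        leaf = self._leaf_for(key)
        return key in leaf.keys or any(key in page for page in leaf.overflow)

    def insert(self, key):
        leaf = self._leaf_for(key)
        if len(leaf.keys) < PAGE_CAPACITY:
            leaf.keys.append(key)
            leaf.keys.sort()
            return
        # Primary page full: use (or allocate) an overflow page.
        if not leaf.overflow or len(leaf.overflow[-1]) >= PAGE_CAPACITY:
            leaf.overflow.append([])
        leaf.overflow[-1].append(key)

    def delete(self, key):
        leaf = self._leaf_for(key)
        if key in leaf.keys:
            leaf.keys.remove(key)
            return True
        for page in leaf.overflow:
            if key in page:
                page.remove(key)
                if not page:  # empty overflow page: de-allocate it
                    leaf.overflow.remove(page)
                return True
        return False

isam = Isam([10, 15, 20, 27, 33, 37, 40, 46, 51, 55, 63, 97])
isam.insert(98)
print(isam.leaves[-1].keys, isam.leaves[-1].overflow)
```

Long overflow chains are the price of the static structure: searches in a heavily updated region degrade toward a linear scan of the chain.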


B+ Tree Index

• Insert/delete at log_F N cost; keep tree height-balanced. (F = fanout, N = # leaf pages)

• Minimum 50% occupancy (except for root). Each node contains d <= m <= 2d entries. The parameter d is called the order of the tree.

• Supports equality and range searches efficiently.
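To get a feel for the log_F N cost, the following Python sketch counts how many index levels are needed for a given fanout and number of leaf pages. The fanout of 100 and the one million leaf pages are made-up numbers for illustration, and `index_levels` is a hypothetical helper name.

```python
def index_levels(fanout, leaf_pages):
    """Smallest h with fanout**h >= leaf_pages, i.e. ceil(log_F N)."""
    levels = 0
    reach = 1  # number of leaf pages reachable through `levels` index levels
    while reach < leaf_pages:
        reach *= fanout
        levels += 1
    return levels

# Hypothetical numbers: N = 1,000,000 leaf pages.
print(index_levels(100, 1_000_000))  # fanout F = 100
print(index_levels(2, 1_000_000))    # binary tree, F = 2
```

With a fanout of 100 three index levels are enough, whereas a binary tree over the same leaves needs 20; this is why index nodes are sized to fill a whole page.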


[Figure: B+ tree index. Index entries at the top; data entries in the leaf pages (the "sequence set") at the bottom.]


B+ Tree: Search Example

• Leaf entries store <key,rid> pairs

• What is the order ?

• Search for: age=5, age=15, age>=24


[Figure: B+ tree. Root: 13 17 24 30. Leaf pages: 2* 3* 5* 7* | 14* 16* | 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*]
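The three searches in this example can be traced with a short Python sketch over the root keys and leaf contents of the example tree. Grouping the leaf keys into pages by the root's separators, representing the leaf chain as a plain list, and the function names are assumptions of this sketch; rids are omitted.

```python
from bisect import bisect_right

root_keys = [13, 17, 24, 30]
leaves = [
    [2, 3, 5, 7],
    [14, 16],
    [19, 20, 22],
    [24, 27, 29],
    [33, 34, 38, 39],
]

def point_search(key):
    """Equality search: one root-to-leaf descent, then look inside the leaf."""
    leaf = leaves[bisect_right(root_keys, key)]
    return key in leaf

def range_search_from(low):
    """Range search key >= low: descend once, then follow the leaf sequence."""
    first = bisect_right(root_keys, low)
    result = [k for k in leaves[first] if k >= low]
    for leaf in leaves[first + 1:]:
        result.extend(leaf)
    return result

print(point_search(5))        # age = 5
print(point_search(15))       # age = 15
print(range_search_from(24))  # age >= 24
```

The range search shows why the leaves form a "sequence set": after a single descent, the remaining answers are read by walking the leaf pages in order.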


Inserting a new data entry

• Find correct leaf L.
• Put data entry onto L.
– If L has enough space, done!
– Else, must split L (into L and a new node L2)
  • Redistribute entries evenly, copy up middle key.
  • Insert index entry pointing to L2 into parent of L.
• This can happen recursively
– To split index node, redistribute entries evenly, but push up middle key. (Contrast with leaf splits.)
• Splits "grow" tree; root split increases height.
– Tree growth: gets wider or one level taller at top.
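The contrast between "copy up" and "push up" is easiest to see in code. The Python sketch below shows only the two split steps on plain lists, not a complete insert routine; the function names, the string child labels, and the choice to give the extra entry to the right half are assumptions of the sketch.

```python
def split_leaf(entries):
    """Split an overfull leaf. The middle key is COPIED up: it stays in the
    new right leaf and also becomes the separator in the parent."""
    mid = len(entries) // 2
    left, right = entries[:mid], entries[mid:]
    return left, right, right[0]

def split_index(keys, children):
    """Split an overfull index node. The middle key is PUSHED up: it moves
    to the parent and appears in neither half."""
    mid = len(keys) // 2
    left = (keys[:mid], children[:mid + 1])
    right = (keys[mid + 1:], children[mid + 1:])
    return left, keys[mid], right

# Leaf 2 3 5 7 after adding 8 (order d = 2, so 5 entries overflow).
print(split_leaf([2, 3, 5, 7, 8]))

# Index node 13 17 24 30 after receiving the copied-up key 5.
print(split_index([5, 13, 17, 24, 30], ["A", "B", "C", "D", "E", "F"]))
```

A leaf must keep every data entry, so its separator is duplicated in the parent; an index key only directs the search, so keeping it in one place is enough.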


Example: Insert 8*


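[Figure: inserting 8* into the tree with root 13 17 24 30 and leaf pages 2* 3* 5* 7* | 14* 16* | 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*. The leaf 2* 3* 5* 7* splits into 2* 3* and 5* 7* 8*; 5 is copied up to the parent node. The parent node then splits into 5 13 and 24 30; 17 is pushed up into the parent node, which becomes the new root.]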


Deleting a data entry

• Start at root, find leaf L where entry belongs.
• Remove the entry.
– If L is at least half-full, done!
– If L has only d-1 entries,
  • Try to re-distribute, borrowing from sibling (adjacent node with same parent as L).
  • If re-distribution fails, merge L and sibling.
• If merge occurred, must delete entry (pointing to L or sibling) from parent of L.
• Merge could propagate to root, decreasing height.
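The leaf-level part of deletion can be sketched in Python as below. For brevity the sketch assumes the sibling is the right-hand neighbour under the same parent and does not model the parent; instead it returns what the parent would have to do. The function name, the return values, and the sample leaves are illustrative assumptions.

```python
def delete_entry(leaf, right_sibling, key, d):
    """Delete `key` from `leaf` in a B+ tree of order d.

    Returns (action, new_separator):
      ("done", None)           leaf is still at least half full
      ("redistributed", k)     borrowed from sibling; parent separator becomes k
      ("merged", None)         sibling emptied into leaf; parent entry must go
    """
    leaf.remove(key)
    if len(leaf) >= d:
        return "done", None
    if len(right_sibling) > d:
        # Sibling can spare an entry: borrow its smallest one.
        leaf.append(right_sibling.pop(0))
        return "redistributed", right_sibling[0]
    # Sibling is at minimum occupancy: merge the two leaves.
    leaf.extend(right_sibling)
    right_sibling.clear()
    return "merged", None

leaf, sibling = [19, 20, 22], [24, 27, 29]
print(delete_entry(leaf, sibling, 19, d=2), leaf, sibling)
print(delete_entry(leaf, sibling, 20, d=2), leaf, sibling)
print(delete_entry(leaf, sibling, 24, d=2), leaf, sibling)
```

After a merge the parent loses an entry and may itself fall below d entries, which is how a merge can propagate toward the root.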


Miscellaneous

• How do we handle data with duplicates?
– Overflow buckets
– Make rid part of the key
– Each data entry stores <key, list of rids>

• Clustered vs. unclustered indexes
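Two of the duplicate-handling options above can be sketched in Python: making the rid part of the key, and storing a list of rids per key. Plain integers stand in for rids, and the sample values and function names are assumptions for illustration.

```python
from bisect import bisect_left, bisect_right, insort
from collections import defaultdict

pairs = [(21, 7), (18, 3), (21, 2), (26, 9)]  # hypothetical (key, rid) pairs

# Option: make rid part of the key, so every composite (key, rid) is unique.
composite = []
for key, rid in pairs:
    insort(composite, (key, rid))

def rids_for(entries, key):
    """All rids for `key`: a range search over the composite (key, rid)."""
    lo = bisect_left(entries, (key, float("-inf")))
    hi = bisect_right(entries, (key, float("inf")))
    return [rid for _, rid in entries[lo:hi]]

# Option: each data entry stores <key, list of rids>.
rid_lists = defaultdict(list)
for key, rid in pairs:
    rid_lists[key].append(rid)

print(composite)
print(rids_for(composite, 21))
print(dict(rid_lists))
```

With composite keys an equality search on the original key becomes a short range search, while the rid-list option keeps one entry per key at the cost of variable-length entries.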


[Figure: CLUSTERED vs. UNCLUSTERED index. In both panels, index entries (index file) direct the search for data entries, and data entries point to data records (data file).]


Bulk Loading a B+ Tree

• If we have a large collection of records and we want to create a B+ tree on some field, doing so by repeatedly inserting records is very slow.

• Bulk loading can be done much more efficiently.

• Initialization: Sort all data entries, insert pointer to first (leaf) page in a new (root) page.


[Figure: a new Root page pointing to the first of the sorted pages of data entries, which are not yet in the B+ tree: 3* 4* | 6* 9* | 10* 11* | 12* 13* | 20* 22* | 23* 31* | 35* 36* | 38* 41* | 44*]


Bulk Loading (cont.)

• Index entries for leaf pages are always entered into the right-most index page just above the leaf level. When this fills up, it splits. (The split may go up the right-most path to the root.)

• Much faster than repeated inserts, especially when one considers locking!
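A minimal Python sketch of this procedure follows. It keeps only the right-most path of index pages in memory, appends one index entry per leaf page, and splits upward when a page overflows. The capacity of two keys per index page, the use of each leaf's first key as its separator, and the names `IndexNode`, `bulk_load`, and `insert_rightmost` are assumptions of the sketch; leaf pages are plain lists of keys.

```python
class IndexNode:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []

def insert_rightmost(path, level, key, child, max_keys):
    """Append <key, child> to the right-most index page at `level`;
    on overflow, split and push the middle key up the right-most path."""
    node = path[level]
    node.keys.append(key)
    node.children.append(child)
    if len(node.keys) <= max_keys:
        return
    mid = len(node.keys) // 2
    up = node.keys[mid]
    right = IndexNode(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
    path[level] = right  # the new page is now the right-most one at this level
    if level == 0:
        path.insert(0, IndexNode([up], [node, right]))  # tree grows at the top
    else:
        insert_rightmost(path, level - 1, up, right, max_keys)

def bulk_load(leaf_pages, max_keys=2):
    """Build the index levels over already-sorted leaf pages."""
    root = IndexNode(children=[leaf_pages[0]])  # initialization step
    path = [root]                               # right-most path, root first
    for leaf in leaf_pages[1:]:
        insert_rightmost(path, len(path) - 1, leaf[0], leaf, max_keys)
    return path[0]

pages = [[3, 4], [6, 9], [10, 11], [12, 13], [20, 22],
         [23, 31], [35, 36], [38, 41], [44]]
root = bulk_load(pages)
print(root.keys, [child.keys for child in root.children])
```

Because entries only ever arrive at the right-most page, pages to the left are never revisited, which is part of why bulk loading needs far less page access and locking than repeated inserts.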


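[Figure: bulk loading in progress over the data entry pages 3* 4* | 6* 9* | 10* 11* | 12* 13* | 20* 22* | 23* 31* | 35* 36* | 38* 41* | 44*. First snapshot: Root 10 20 above index pages 6 | 12 | 23 35; the remaining data entry pages are not yet in the B+ tree. Second snapshot: Root 20; second level 10 and 35; third level 6 | 12 | 23 | 38; the remaining data entry pages are not yet in the B+ tree.]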


Creating Indexes

• Most DBMSs (e.g., DB2) support only B+ tree indexes:

CREATE INDEX myIdx ON mytable(col1, col3)
CREATE UNIQUE INDEX myUniqIdx ON mytable(col2, col5)
CREATE INDEX myIdx ON mytable(col1, col3) CLUSTER

• If a primary key is specified in the CREATE TABLE statement, an (unclustered) index is automatically created for the PK.

• To create a clustered PK index:
– Create table without PK constraint
– Create index on PK with cluster option
– Alter table to add PK constraint

• To get rid of unused indexes: DROP INDEX myIdx;
