DATA WAREHOUSING Physical Design

Diseño fisico indices_2

Jun 20, 2015


Claudia Gomez

Page 1: Diseño fisico indices_2

DATA WAREHOUSING Physical Design

Page 2: Diseño fisico indices_2


Page 3: Diseño fisico indices_2

Indexes provide efficient access to relevant records
▪ Based on the values of particular attribute(s)
▪ Same idea as the index in the back of a book

An index is a "thin" copy of a relation
▪ Not all columns from the relation are included
▪ The index is sorted in a particular way

An index supports efficient lookup
▪ Useful when filters are selective
▪ Avoids scanning rows that will be filtered out

Page 4: Diseño fisico indices_2

Indexes are organized based on some search key
▪ Column (or set of columns) whose values are used to access the index
▪ Organization can be sorting or hashing

An index is built for some relation
▪ One index entry per record in the relation

An index consists of <Value, RID> pairs
▪ Value = value of the search key for this record
▪ RID = record identifier
  ▪ Tells the DBMS where the record is stored
  ▪ Usually (page number, offset in page)
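The <Value, RID> structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the relation, its pages, and the helper names `build_index`, `lookup`, and `fetch` are hypothetical, and a real DBMS would keep the sorted entries in a B-tree on disk rather than in a list.

```python
from bisect import bisect_left, bisect_right

# Hypothetical relation stored as pages of records.
# A RID is the pair (page number, offset in page).
pages = [
    [{"A": 5, "B": 10}, {"A": 3, "B": 7}],
    [{"A": 5, "B": 2}, {"A": 8, "B": 1}],
]

def build_index(pages, key):
    """Return a sorted list of (value, RID) pairs, one entry per record."""
    entries = []
    for page_no, page in enumerate(pages):
        for offset, record in enumerate(page):
            entries.append((record[key], (page_no, offset)))
    entries.sort()
    return entries

def lookup(index, value):
    """Return the RIDs of all records whose search key equals value."""
    # The key list is rebuilt on every call to keep the sketch short.
    keys = [v for v, _ in index]
    lo = bisect_left(keys, value)
    hi = bisect_right(keys, value)
    return [rid for _, rid in index[lo:hi]]

def fetch(pages, rid):
    """Follow a RID to the stored record."""
    page_no, offset = rid
    return pages[page_no][offset]

index_a = build_index(pages, "A")
rids = lookup(index_a, 5)
print(rids)
print([fetch(pages, rid) for rid in rids])
```

Only the records whose RIDs come back from `lookup` are ever fetched, which is the point of a selective filter: the other rows are never touched.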

Page 5: Diseño fisico indices_2

Traditional Access Methods

B-trees, hash tables, R-trees, grids, …

Popular in Warehouses

Covering indexes

Multi-column indexes

Join indexes

Bitmap indexes


Page 6: Diseño fisico indices_2

Idea behind fact index: thinner version of the fact table
▪ Index takes up less space than the fact table
▪ Fewer I/Os required to scan it

Page 7: Diseño fisico indices_2

An index has one index entry per fact table row
▪ Regardless of how many columns are in the index

Page 8: Diseño fisico indices_2

Sometimes an index has all the data you need
▪ Allows an index-only query plan
▪ Not necessary to access the actual tuples
▪ Such an index is called a covering index

SELECT COUNT(*) FROM R WHERE A=5
▪ Use the index on A
▪ Count the number of <5, RID> entries
▪ No need to look up the records referenced by the RIDs
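One way to observe an index-only plan is with SQLite through Python's standard `sqlite3` module. This is a sketch, not the system the slides assume: the table contents and the index name `idx_r_a` are made up for illustration, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (A INTEGER, B INTEGER, C TEXT)")
conn.executemany(
    "INSERT INTO R VALUES (?, ?, ?)",
    [(5, 10, "x"), (3, 7, "y"), (5, 2, "z"), (8, 1, "w")],  # illustrative rows
)
conn.execute("CREATE INDEX idx_r_a ON R (A)")  # hypothetical index name

query = "SELECT COUNT(*) FROM R WHERE A = 5"
count = conn.execute(query).fetchone()[0]

# The last column of each EXPLAIN QUERY PLAN row is a human-readable description.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]

print(count)
print(plan)
```

On recent SQLite builds the plan description mentions a covering index on `idx_r_a`, meaning the count is computed from index entries without visiting the table rows.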

Page 9: Diseño fisico indices_2

Multi-column indexes are very useful in data warehousing
▪ We say such an index has a composite key

Example: B-Tree index on (A,B)
▪ Search key is the (A,B) combination
▪ Index entries are sorted by A value
▪ Entries with the same A value are sorted by B value
▪ Called a lexicographic sort

SELECT SUM(B) FROM R WHERE A=5
▪ Our (A,B) index covers this query!

Coverage vs. size trade-off
▪ More attributes in search key → index covers more queries
▪ More attributes in search key → index takes up more disk space
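The lexicographic ordering and the covered SUM(B) query can be illustrated with a sorted list standing in for the B-Tree. The rows and the function name `sum_b_where_a` are assumptions made for this sketch; the list position of each row plays the role of its RID.

```python
from bisect import bisect_left

# Hypothetical rows of relation R as (A, B) values; list position acts as the RID.
rows = [(5, 10), (3, 7), (5, 2), (8, 1), (5, 4)]

# Composite index on (A, B): entries sorted by A, then by B (lexicographic order).
index_ab = sorted((a, b, rid) for rid, (a, b) in enumerate(rows))

def sum_b_where_a(index, a_value):
    """Answer SELECT SUM(B) FROM R WHERE A = a_value from the index alone.

    Returns 0 when nothing matches (SQL's SUM would return NULL instead).
    """
    pos = bisect_left(index, (a_value,))  # first entry with A >= a_value
    total = 0
    while pos < len(index) and index[pos][0] == a_value:
        total += index[pos][1]  # B is stored in the index entry itself
        pos += 1
    return total

print(index_ab)
print(sum_b_where_a(index_ab, 5))
```

Because every entry already carries B, the RIDs are never followed. An index on A alone would locate the same entries but would then need one record fetch per match to read B.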

Page 10: Diseño fisico indices_2


Page 11: Diseño fisico indices_2


Advantages

efficient computation of joins involving first index columns (or all columns)

Disadvantages

useful only for specific join combinations

▪ for general usage, it is necessary to store a high number of indices

required space may be significant

▪ joins always involve the fact table

Page 12: Diseño fisico indices_2


Base table

Cust  Region   Type
C1    Asia     Retail
C2    Europe   Dealer
C3    Asia     Dealer
C4    America  Retail
C5    Europe   Dealer

Index on Region

RecID  Asia  Europe  America
1      1     0       0
2      0     1       0
3      1     0       0
4      0     0       1
5      0     1       0

Index on Type

RecID  Retail  Dealer
1      1       0
2      0       1
3      0       1
4      1       0
5      0       1

Query:

Get customers with region = 'Asia' AND type = 'Dealer'
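The query on this example can be worked through in Python, using one integer per distinct value as the bit vector. The helper names `build_bitmap_index` and `matching_records` are hypothetical, and real systems store bitmaps in compressed on-disk structures rather than Python integers.

```python
# Base table from the example; record i (0-based) corresponds to RecID i + 1.
customers = [
    {"Cust": "C1", "Region": "Asia", "Type": "Retail"},
    {"Cust": "C2", "Region": "Europe", "Type": "Dealer"},
    {"Cust": "C3", "Region": "Asia", "Type": "Dealer"},
    {"Cust": "C4", "Region": "America", "Type": "Retail"},
    {"Cust": "C5", "Region": "Europe", "Type": "Dealer"},
]

def build_bitmap_index(rows, column):
    """Map each distinct value to an int whose bit i is set if row i has that value."""
    bitmaps = {}
    for rec_id, row in enumerate(rows):
        bitmaps[row[column]] = bitmaps.get(row[column], 0) | (1 << rec_id)
    return bitmaps

def matching_records(bitmap):
    """Return the 0-based positions of the set bits."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

region_index = build_bitmap_index(customers, "Region")
type_index = build_bitmap_index(customers, "Type")

# region = 'Asia' AND type = 'Dealer' becomes a single bitwise AND.
hits = region_index["Asia"] & type_index["Dealer"]
result = [customers[i]["Cust"] for i in matching_records(hits)]
print(result)
```

Only C3 has both bits set, so it is the only row that has to be fetched from the base table.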

Page 13: Diseño fisico indices_2

Bitmap indexes are good if the domain cardinality is small
Most useful for attributes with low or medium cardinality
▪ Not good for something like LastName


Page 14: Diseño fisico indices_2

Index intersection plans with bitmap indexes are fast
▪ Just perform a bitwise AND!
▪ Index intersection with B-Trees requires a join

Page 15: Diseño fisico indices_2

Bitmap indexes save space for low-cardinality attributes
▪ As compared to a B-Tree or hash index

Page 16: Diseño fisico indices_2

Bit vectors can be compressed

Compression pros and cons
▪ Reduce storage space → reduce number of I/Os required
▪ Need to compress/uncompress → increase CPU work required
▪ Each compression scheme negotiates this trade-off differently
▪ Operate directly on compressed bitmap → improved performance
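To make the trade-off concrete, here is a minimal run-length encoding sketch. Run-length encoding is chosen only as a simple illustration; the slides do not name a specific compression scheme, and the function names below are hypothetical. `rle_and` shows what operating directly on the compressed form can look like: it ANDs two encoded bitmaps run by run, without expanding them.

```python
def rle_encode(bits):
    """Encode a bit string such as '0001100' as a list of (bit, run_length) pairs."""
    runs = []
    for bit in bits:
        if runs and runs[-1][0] == bit:
            runs[-1] = (bit, runs[-1][1] + 1)
        else:
            runs.append((bit, 1))
    return runs

def rle_decode(runs):
    """Expand (bit, run_length) pairs back into a bit string."""
    return "".join(bit * length for bit, length in runs)

def rle_and(a, b):
    """Bitwise AND of two encoded bitmaps of equal length, without decoding them."""
    out = []
    ia = ib = 0
    left_a = a[0][1] if a else 0
    left_b = b[0][1] if b else 0
    while ia < len(a) and ib < len(b):
        step = min(left_a, left_b)  # overlap of the two current runs
        bit = "1" if a[ia][0] == "1" and b[ib][0] == "1" else "0"
        if out and out[-1][0] == bit:
            out[-1] = (bit, out[-1][1] + step)  # extend the previous run
        else:
            out.append((bit, step))
        left_a -= step
        left_b -= step
        if left_a == 0:
            ia += 1
            left_a = a[ia][1] if ia < len(a) else 0
        if left_b == 0:
            ib += 1
            left_b = b[ib][1] if ib < len(b) else 0
    return out

x = rle_encode("1110000011")
y = rle_encode("0110000111")
print(x)
print(rle_decode(rle_and(x, y)))
```

The work done by `rle_and` is proportional to the number of runs rather than the number of bits, which is why sparse bitmaps with long runs benefit most.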


Page 17: Diseño fisico indices_2

Bit matrix which precomputes the join between a dimension and the fact table

one column for each dimension RID

one row for each fact table RID

cell (i,j) is 1 if fact table tuple i joins dimension tuple j, 0 otherwise
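The bit matrix described above can be sketched as a list of rows in Python. The dimension and fact tables, the column names, and the function `build_join_index` are all hypothetical; the sketch assumes every fact row references exactly one existing dimension row.

```python
# Hypothetical dimension and fact tables; list positions act as RIDs.
dimension = [
    {"cust": "C1", "region": "Asia"},
    {"cust": "C2", "region": "Europe"},
    {"cust": "C3", "region": "Asia"},
]
fact = [
    {"cust": "C2", "amount": 10},
    {"cust": "C1", "amount": 5},
    {"cust": "C3", "amount": 7},
    {"cust": "C1", "amount": 3},
]

def build_join_index(fact, dimension, key):
    """Return a matrix where cell (i, j) is 1 if fact row i joins dimension row j."""
    dim_rid = {row[key]: j for j, row in enumerate(dimension)}
    matrix = [[0] * len(dimension) for _ in fact]
    for i, fact_row in enumerate(fact):
        matrix[i][dim_rid[fact_row[key]]] = 1
    return matrix

join_index = build_join_index(fact, dimension, "cust")

# Total amount for customers in Asia: pick the dimension columns that satisfy
# the predicate, then read the matching fact rows straight off the matrix.
asia_cols = [j for j, d in enumerate(dimension) if d["region"] == "Asia"]
fact_rids = [i for i, row in enumerate(join_index) if any(row[j] for j in asia_cols)]
total = sum(fact[i]["amount"] for i in fact_rids)

print(join_index)
print(fact_rids, total)
```

The join itself is never executed at query time: the matrix already records which fact rows belong to which dimension row, so the predicate on the dimension translates directly into a set of fact RIDs.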

Page 18: Diseño fisico indices_2

Indexing dimensions
▪ Attributes frequently involved in selection predicates
▪ If domain cardinality is high, then B-tree index
▪ If domain cardinality is low, then bitmap index

Indices for join
▪ Indexing only foreign keys in the fact table is rarely appropriate
▪ Star join index should be used with caution (column order issue)
▪ Bitmapped join index is suggested (if available)

Indices for group by
▪ Use materialized views