Bitmap Indices for Data Warehouse Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY

Jan 01, 2016

Transcript
Page 1: Bitmap Indices for Data Warehouse Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY.

Bitmap Indices for Data Warehouse

Jianlin Feng
School of Software
SUN YAT-SEN UNIVERSITY

Page 2

Star Schema Vs. Multi-dimensional Range Queries

store
storeId  city
c1       nyc
c2       sfo
c3       la

product
prodId  name  price
p1      bolt  10
p2      nut   5

sale
orderId  date    custId  prodId  storeId  qty  amt
o100     1/7/97  53      p1      c1       1    12
o102     2/7/97  53      p2      c1       2    11
o105     3/8/97  111     p1      c3       5    50

customer
custId  name   address    city
53      joe    10 main    sfo
81      fred   12 main    sfo
111     sally  80 willow  la

SUM (qty * amt)

WHERE ProdId in [p1.. p10] AND custId < 200
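
As a rough illustration, the range query above can be evaluated directly over the three sale rows shown on this slide. This is only a sketch: the helper names (prod_number, range_query) are made up here, and it assumes product ids always have the form "p" followed by a number.

```python
# Rows of the sale fact table from the slide:
# (orderId, date, custId, prodId, storeId, qty, amt)
SALE = [
    ("o100", "1/7/97", 53, "p1", "c1", 1, 12),
    ("o102", "2/7/97", 53, "p2", "c1", 2, 11),
    ("o105", "3/8/97", 111, "p1", "c3", 5, 50),
]

def prod_number(prod_id):
    """Numeric part of a product id such as 'p1' (assumed id format)."""
    return int(prod_id[1:])

def range_query(rows, prod_lo=1, prod_hi=10, cust_max=200):
    """SUM(qty * amt) WHERE prodId in [p<lo> .. p<hi>] AND custId < cust_max."""
    return sum(
        qty * amt
        for _order, _date, cust, prod, _store, qty, amt in rows
        if prod_lo <= prod_number(prod) <= prod_hi and cust < cust_max
    )

total = range_query(SALE)
print(total)  # 284
```

All three rows qualify, so the sum is 1*12 + 2*11 + 5*50. A full scan like this is the baseline that the indices on the following slides try to beat.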

Page 3

Characteristics of Multi-Dimensional Range Queries in Data Warehouse

Ad-Hoc
  Given N dimensions (attributes), every combination is possible: 2^N combinations
  A Data Cube equals 2^N GROUP-BYs

High Dimensions ( > 20)

Large Number of Records
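
To make the 2^N count concrete, here is a small sketch that enumerates every possible grouping for a list of dimensions. The dimension names are taken from the sale table; the function name is hypothetical.

```python
from itertools import combinations

def all_group_bys(dimensions):
    """Every subset of the dimensions, from the empty grouping to the full one."""
    return [
        subset
        for k in range(len(dimensions) + 1)
        for subset in combinations(dimensions, k)
    ]

dims = ["date", "custId", "prodId", "storeId"]
group_bys = all_group_bys(dims)
print(len(group_bys))  # 16, i.e. 2**4
```

With more than 20 dimensions the count already exceeds a million, which is why an index cannot be prepared for each combination in advance.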

Page 4

Multi-Dimensional Index Fails!

R-Trees or KD-Trees
  Effective only for a moderate number of dimensions
  Efficient only for queries involving all indexed dimensions.

For Ad-hoc Range Queries, Projection Index is usually better, and Bitmap Index is even better.

Page 5

Projection Index

Fix the order of the records in the base table Store

Project records along some dimension, i.e., a single column
  Keeping the record order
  Keeping the duplicates

Like “array” in C language

base table: store
storeId  city
c1       nyc
c2       sfo
c3       la

Projection Index on storeId
c1
c2
c3
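
A projection index can be sketched as a plain list that copies one column while keeping row order and duplicates, so that a position in the list is the record id (RID). The function names below are illustrative only.

```python
# Base table in a fixed record order: (storeId, city)
STORE = [("c1", "nyc"), ("c2", "sfo"), ("c3", "la")]

def build_projection_index(table, column):
    """Copy one column out of the table, keeping row order and duplicates."""
    return [row[column] for row in table]

def select_rids(projection, predicate):
    """Scan the projected column and return the matching record ids (positions)."""
    return [rid for rid, value in enumerate(projection) if predicate(value)]

store_id_index = build_projection_index(STORE, column=0)
print(store_id_index)                                     # ['c1', 'c2', 'c3']
print(select_rids(store_id_index, lambda s: s >= "c2"))   # [1, 2]
```

The scan touches only the one column the predicate needs, which is the main saving over reading whole rows.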

Page 6

Multi-dimensional Range Queries: A General Idea

Build an index for each dimension (attribute);
  A Projection Index
  A B-Tree
    1 Primary B-Tree, N - 1 Secondary B-Trees

For each involved dimension, use the index on that dimension to select records;

“AND” the records to get the final answer set.

Page 7

How to make the “AND” operation fast?

Projection Index (B-Tree is similar)
  Scan each involved dimension, and return a set of RIDs.
  Intersect the RID sets
    Sets have different lengths
    We can use Sort and Merge to do the Intersection

Life is easier when all the sets have the same length and are in the same order
  Use 1/0 to record the membership of each record
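
The two ways of doing the “AND” can be compared in a short sketch: a merge-style intersection of sorted RID lists, and a single bitwise AND over 1/0 membership vectors. Python integers stand in for the bit vectors here; that representation is an assumption of this example, not something the slides prescribe.

```python
def intersect_sorted(a, b):
    """Merge-style intersection of two sorted RID lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def to_bitmap(rids):
    """Membership vector as an int: bit r is 1 when record r qualifies."""
    bitmap = 0
    for r in rids:
        bitmap |= 1 << r
    return bitmap

def to_rids(bitmap):
    """Positions of the 1-bits, i.e. the qualifying record ids."""
    return [r for r in range(bitmap.bit_length()) if (bitmap >> r) & 1]

rids_a = [0, 2, 3, 7]
rids_b = [2, 5, 7]
merged = intersect_sorted(rids_a, rids_b)
anded = to_rids(to_bitmap(rids_a) & to_bitmap(rids_b))
print(merged, anded)  # [2, 7] [2, 7]
```

Both give the same answer, but the bitmap version needs no sorting and no comparisons, only one AND per machine word.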

Page 8

General Ideas of Bitmap Index

Fix the order of records in the base table

Suppose the base table has m records

For each dimension
  For each distinct dimension value (as the KEY)
    Build a bitmap with m bits (as the POSITIONS)
  A bitmap is like an Inverted Index

“AND”, “OR” operations realized by bitwise logical operations
  Well supported by hardware
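
A minimal sketch of this idea, using the prodId and storeId columns of the sale table from Page 2: one bitmap per distinct value, with bit i set when record i holds that value. Bitmaps are again modelled as Python integers, and the function names are hypothetical.

```python
from collections import defaultdict

def build_bitmap_index(column):
    """Map each distinct value (the KEY) to an m-bit bitmap (the POSITIONS).

    Bitmaps are Python ints; bit i is 1 when record i holds that value.
    """
    index = defaultdict(int)
    for rid, value in enumerate(column):
        index[value] |= 1 << rid
    return dict(index)

def rids(bitmap):
    """Record ids whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Columns of the sale table, in base-table order.
prod_id = ["p1", "p2", "p1"]
store_id = ["c1", "c1", "c3"]

by_prod = build_bitmap_index(prod_id)
by_store = build_bitmap_index(store_id)

# prodId = p1 AND storeId = c1
print(rids(by_prod["p1"] & by_store["c1"]))   # [0]
# storeId = c1 OR storeId = c3
print(rids(by_store["c1"] | by_store["c3"]))  # [0, 1, 2]
```

Each dimension is indexed independently, so any ad-hoc combination of dimensions can be answered by combining the per-dimension bitmaps.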

Page 9

Basic Bitmap Index

P. O’Neil, Model 204, 1987

Page 10

Size of Bitmap Indices

Number of Bitmaps (Indices)
  How to build bitmap indices for dimensions with a large number of distinct values
    e.g., a Temperature dimension

Size (i.e., Length) of a Single Bitmap

Page 11

Three Solutions

Encoding
  Reduce the Number of Bitmaps

Binning
  Reduce the Number of Bitmaps

Compression
  Reduce the Size of a Single Bitmap

Page 12

Encoding Strategies

Equality-encoded
  Good for equality queries, such as “temperature == 100”
  Basic Bitmap Index

Bit-sliced index
  Assume dimension A has c distinct values; use ⌈log2 c⌉ bitmaps to represent each record (its value)

Range-encoded
  Good for one-sided range queries, such as “Pressure < 56.7”

Interval-encoded
  Good for two-sided range queries, such as “35.8 < Pressure < 56.7”
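
The sketch below illustrates two of these encodings on a toy column. It assumes the values have already been mapped to integer codes 0 .. c-1, and it assumes the usual range-encoding convention in which bitmap v marks the records with code <= v; the slides do not spell out either detail, so treat both as assumptions.

```python
def rids(bitmap):
    """Record ids whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

def build_bit_sliced(codes, c):
    """One bitmap per binary digit of the code: ceil(log2(c)) bitmaps."""
    n_slices = max(1, (c - 1).bit_length())
    slices = [0] * n_slices
    for rid, code in enumerate(codes):
        for k in range(n_slices):
            if (code >> k) & 1:
                slices[k] |= 1 << rid
    return slices

def bit_sliced_equals(slices, value, m):
    """Records whose code equals value: AND each slice or its complement."""
    all_rows = (1 << m) - 1
    result = all_rows
    for k, s in enumerate(slices):
        result &= s if (value >> k) & 1 else (all_rows & ~s)
    return result

def build_range_encoded(codes, c):
    """Bitmap v marks records with code <= v, for v in 0 .. c-2."""
    bitmaps = [0] * (c - 1)
    for rid, code in enumerate(codes):
        for v in range(code, c - 1):
            bitmaps[v] |= 1 << rid
    return bitmaps

codes = [3, 0, 5, 2, 5, 1, 7, 4]     # 8 records, c = 8 distinct codes 0..7
slices = build_bit_sliced(codes, c=8)
print(len(slices))                                      # 3 bitmaps instead of 8
print(rids(bit_sliced_equals(slices, 5, len(codes))))   # [2, 4]

ranges = build_range_encoded(codes, c=8)
print(rids(ranges[2]))               # code <= 2, i.e. code < 3: [1, 3, 5]
```

The bit-sliced index saves space but touches every slice per query, while the range-encoded index answers a one-sided range by reading a single bitmap.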

Page 13
Page 14

Binning

Encoding mainly considers discrete dimension values
  Usually integers

Basic Ideas of Binning
  Build a bitmap index for a bin instead of for a distinct value
  The Number of Bitmaps has nothing to do with the number of distinct values in a dimension.

Pros and Cons
  Pros: control the number of bitmaps via controlling the number of bins.
  Cons: need to check original dimension values to decide if the records really satisfy query conditions.

Page 15

A Binning Example: Values of Dimension A lie in [0, 100]
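
The figure for this example is not part of the transcript, so the following is only a sketch under assumed bin boundaries (five equal-width bins over [0, 100]). It shows both sides of the trade-off: bins entirely below the query bound are answered from bitmaps alone, while the bin that straddles the bound needs a candidate check against the original values.

```python
from bisect import bisect_right

# Hypothetical bin boundaries for a dimension A whose values lie in [0, 100]:
# bins are [0, 20), [20, 40), [40, 60), [60, 80), [80, 100].
EDGES = [20, 40, 60, 80]

def bin_of(value):
    """Index of the bin that contains value."""
    return bisect_right(EDGES, value)

def build_binned_index(values):
    """One bitmap per bin; bit i is 1 when record i falls into that bin."""
    bitmaps = [0] * (len(EDGES) + 1)
    for rid, v in enumerate(values):
        bitmaps[bin_of(v)] |= 1 << rid
    return bitmaps

def query_less_than(values, bitmaps, x):
    """RIDs with value < x; only the bin containing x needs a candidate check."""
    edge_bin = bin_of(x)
    hits = 0
    for b in range(edge_bin):            # bins entirely below x
        hits |= bitmaps[b]
    candidates = bitmaps[edge_bin]       # bin that straddles x
    result = [i for i in range(len(values)) if (hits >> i) & 1]
    result += [i for i in range(len(values))
               if (candidates >> i) & 1 and values[i] < x]
    return sorted(result)

values = [5.0, 37.5, 62.1, 33.0, 99.9, 20.0]
index = build_binned_index(values)
print(query_less_than(values, index, 35))  # [0, 3, 5]
```

Record 1 (37.5) sits in the same bin as the bound 35 but fails the check, which is exactly the extra work the slide lists under "Cons".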

Page 16

Compression Strategies

General-purpose compression methods
  Software packages are widely available
  Tradeoff between query processing and compression ratio
    De-compress data first

Specific methods
  BBC (Byte-aligned Bitmap Code), Antoshenkov, 1994, 1996.
    Adopted since Oracle 7.3
  WAH (Word-aligned Hybrid Bitmap code), Wu et al., 2004, 2006.
    Used in Lawrence Berkeley Lab for high-energy physics
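
The "de-compress data first" cost of a general-purpose method can be seen in a small sketch using Python's standard zlib module. The bitmap contents are synthetic and the helper name is made up for illustration.

```python
import zlib

M = 1_000_000                      # number of records (bits per bitmap)

def sparse_bitmap(step):
    """Bitmap as bytes with one 1-bit every `step` bytes (illustrative data)."""
    buf = bytearray(M // 8)
    for i in range(0, len(buf), step):
        buf[i] = 0x01
    return bytes(buf)

a = sparse_bitmap(1000)
b = sparse_bitmap(1500)
packed_a = zlib.compress(a)
packed_b = zlib.compress(b)
print(len(a), len(packed_a))       # raw vs. compressed size in bytes

# The AND cannot run on the compressed bytes: both operands are inflated first.
raw_a = zlib.decompress(packed_a)
raw_b = zlib.decompress(packed_b)
anded = int.from_bytes(raw_a, "little") & int.from_bytes(raw_b, "little")
print(bin(anded).count("1"))       # 42 records satisfy both conditions
```

The compression ratio is good on sparse bitmaps, but every query pays for a full decompression, which is the overhead the bitmap-specific schemes aim to avoid.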

Page 17

WAH (Word-aligned Hybrid Bitmap code)

Based on run-length encoding
  For consecutive 0s or 1s in a bit sequence (part of a bitmap)

Use machine WORD as the unit for compression
  Instead of BYTE in BBC

Design Goal: reduce the overhead of de-compression, in order to speed up query response.

Page 18

Run-length encoding

Bit sequence B: 11111111110001110000111111110001001

fill: a set of consecutive identical bits (all 0s or all 1s)
  The first 10 bits in B
  fill = count “+” bit value
  1111111111 = 10 “+” 1

tail: a set of mixed 0s and 1s
  The last 8 bits in B

Run: Run = fill + tail

Basic Ideas of WAH
  Define fill and tail appropriately so that they can be stored in WORDs.
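
Plain run-length encoding of the bit sequence B can be sketched as follows. Note that this simple version turns every stretch of identical bits into a (count, bit value) pair, including the short stretches that the slide would keep verbatim as a tail; it is meant only to show the "count + bit value" idea.

```python
from itertools import groupby

def run_length_encode(bits):
    """Collapse a bit string into (count, bit value) pairs, one per fill."""
    return [(len(list(group)), bit) for bit, group in groupby(bits)]

def run_length_decode(pairs):
    """Expand (count, bit value) pairs back into the bit string."""
    return "".join(bit * count for count, bit in pairs)

B = "11111111110001110000111111110001001"
encoded = run_length_encode(B)
print(encoded[0])   # (10, '1'): the first fill on the slide, 10 "+" 1
print(run_length_decode(encoded) == B)  # True
```

The last eight bits (10001001) produce five pairs here, which shows why storing such mixed stretches as a literal tail is cheaper than encoding them as fills.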

Page 19

Word-aligned Hybrid Bitmap code: 32-bit WORD
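
The figure for this slide is missing from the transcript. The sketch below follows the word layout as WAH is commonly described for 32-bit words, which should be read as an assumption here: the bitmap is cut into 31-bit groups; a mixed group becomes a literal word (top bit 0, then the 31 bits); a run of all-0 or all-1 groups becomes one fill word (top bit 1, next bit the fill value, low 30 bits the number of groups). Trailing bits that do not fill a whole group are returned separately.

```python
GROUP = 31                      # payload bits per 32-bit word
FILL_FLAG = 1 << 31             # most significant bit: 1 = fill word, 0 = literal
COUNT_MASK = (1 << 30) - 1      # low 30 bits of a fill word: number of groups

def wah_encode(bits):
    """Encode a bit string; returns (words, leftover_bits).

    leftover_bits holds the trailing bits that do not fill a whole 31-bit group.
    """
    words = []
    n_full = len(bits) // GROUP
    for g in range(n_full):
        group = bits[g * GROUP:(g + 1) * GROUP]
        if group in ("0" * GROUP, "1" * GROUP):
            fill_bit = int(group[0])
            last = words[-1] if words else 0
            same_fill = (last & FILL_FLAG) and ((last >> 30) & 1) == fill_bit
            if same_fill and (last & COUNT_MASK) < COUNT_MASK:
                words[-1] = last + 1          # extend the previous fill by one group
            else:
                words.append(FILL_FLAG | (fill_bit << 30) | 1)
        else:
            words.append(int(group, 2))       # literal word, top bit stays 0
    return words, bits[n_full * GROUP:]

bits = "1" + "0" * 20 + "1" * 3 + "0" * 79 + "1" * 25   # 128 bits
words, leftover = wah_encode(bits)
print([f"{w:08X}" for w in words], leftover)
# ['40000380', '80000002', '001FFFFF'] 1111
```

The 128-bit input shrinks to three words plus four leftover bits, and because every unit is a whole machine word, two encoded bitmaps can be ANDed word by word without unpacking bytes.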

Page 20

Characteristics of Industrial Products

Model 204 (Pat O’Neil, 1987)
  The first to adopt a bitmap index
  Basic Bitmap Index, No binning, No compression
  Now owned by Computer Corporation of America

Oracle (1995)
  Adopted compressed bitmap index since 7.3
  Probably uses BBC for compression; Equality-encoded, No binning.

Sybase IQ
  Bit-sliced index (Pat O’Neil et al., 1997)
  No binning, No compression
  For dimensions with a small number of distinct values, use Basic Bitmap Index.

Page 21

References

Kurt Stockinger, Kesheng Wu, Bitmap Indices for Data Warehouses, In Wrembel R., Koncilia Ch.: Data Warehouses and OLAP: Concepts, Architectures and Solutions. Idea Group, Inc. 2006.