Source: web.stanford.edu/class/cs245/slides/05-Storage-Formats-p2.pdf

Data Storage & Indexes

Instructor: Matei Zaharia

cs245.stanford.edu

Outline

Co-designing storage and compute (paper)

Indexes

CS 245 2

C-Store Storage

The storage construct was a “projection”; what does that mean?

C-Store Compression

Five types of compression:
» Null suppression
» Dictionary encoding
» Run-length encoding
» Bit-vector encoding
» Lempel-Ziv

Tradeoff: size vs ease of computation
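To make the size vs. ease-of-computation tradeoff concrete, here is a minimal sketch of run-length encoding on one column. The column values and the helper names (rle_encode, rle_decode, count_equal) are illustrative assumptions, not C-Store's actual block API.

```python
from itertools import groupby

def rle_encode(values):
    """Collapse each run of equal values into a (value, run_length) pair."""
    return [(v, len(list(run))) for v, run in groupby(values)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [v for v, n in pairs for _ in range(n)]

def count_equal(pairs, target):
    """Count rows equal to target directly on the compressed form."""
    return sum(n for v, n in pairs if v == target)

# Hypothetical sorted-ish column with long runs.
column = ["CA"] * 4 + ["NY"] * 2 + ["CA"] * 3
encoded = rle_encode(column)
print(encoded)
print(count_equal(encoded, "CA"))
```

The count runs over three pairs instead of nine values, which is the kind of computation that stays easy on compressed data. A general-purpose scheme such as Lempel-Ziv would typically need decompression first.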

API for Compressed Blocks

Using the Block API

Data Size with Each Scheme

[Figure: data size under each compression scheme. Panels: (a) runs of length 50, (b) runs of length 1000.]

Performance with Each Scheme

[Figure: performance under each compression scheme. Panels: (a) runs of length 50, (b) runs of length 1000.]

How would the results change on SSDs?

Key Operations on an Index

Find all records with a given value for a key
» Key can be one field or a tuple of fields (e.g. country=“US” AND state=“CA”)
» In some cases, only one matching record

Find all records with key in a given range

Find nearest neighbor to a data point?

Tradeoffs in Indexing

Improved query performance

Size of indexes

Cost to update indexes

Some Types of Indexes

Conventional indexes

B-trees

Hash indexes

Multi-key indexing

Many standard data structures, but adapted to work well on disk

Sequential File

Dense Index

[Figure: sequential file with a dense index. Index blocks hold keys 10, 20, 30, 40 | 50, 60, 70, 80 | 90, 100, 110, 120.]

Sparse Index

[Figure: sequential file with a sparse index. Index blocks hold keys 10, 30, 50, 70 | 90, 110, 130, 150 | 170, 190, 210, 230.]

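2-level sparse index

[Figure: sequential file with a two-level sparse index. First-level blocks hold keys 10, 30, 50, 70 | 90, 110, 130, 150 | 170, 190, 210, 230. Second-level keys shown: 170, 250 | 330, 410, 490, 570.]

File and 2nd level index blocks need not be contiguous on disk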

Sparse vs Dense Tradeoff

Sparse: Less space usage, can keep more of index in memory

Dense: Can tell whether any record exists without accessing file

(Later: sparse better for insertions, dense needed for secondary indexes)
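The tradeoff can be sketched in a few lines. The file layout below (two records per block, keys 10 to 80) is an assumption chosen to resemble the earlier figures, and the helper names are hypothetical.

```python
import bisect

# Assumed layout: a sorted file stored as blocks of two records each.
blocks = [[10, 20], [30, 40], [50, 60], [70, 80]]

# Dense index: one entry per record (key -> block number).
dense_index = {key: b for b, block in enumerate(blocks) for key in block}

# Sparse index: one entry per block (the first key stored in it).
sparse_keys = [block[0] for block in blocks]

def dense_contains(key):
    """Existence check answered from the index alone, with no file access."""
    return key in dense_index

def sparse_lookup(key):
    """Find the candidate block via the index, then read that block."""
    b = bisect.bisect_right(sparse_keys, key) - 1
    if b < 0:
        return None, False          # key is smaller than every indexed key
    return b, key in blocks[b]      # this membership test is the file access

print(len(dense_index), len(sparse_keys))
print(sparse_lookup(40), sparse_lookup(45), dense_contains(45))
```

The sparse index holds 4 entries against 8 for the dense one, but it cannot rule out key 45 without reading block 1.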

Search key of an index

Primary index (on primary key of ordered files)

Secondary index

Dense index (contains all search key values)

Sparse index

Multi-level index

Handling Duplicate Keys

For a primary index, can point to first instance of each item (assuming blocks are linked)

For a secondary index, need to point to a list of records since they can be anywhere

Deletion: Sparse Index

[Figure: sequential file with a sparse index; index blocks hold keys 10, 30, 50, 70 | 90, 110, 130, 150.]

– delete record 40

– delete records 30 & 40

Deletion: Dense Index

[Figure: sequential file with a dense index; index blocks hold keys 10, 20, 30, 40 | 50, 60, 70, 80.]

– delete record 30

Insertion: Sparse Index

[Figure: sequential file with a sparse index holding keys 10, 30, 40, 60.]

– insert record 34

• Our lucky day! We have free space where we need it!

• Illustrated: Immediate reorganization
• Variation:
– insert new block (chained file)
– update index

Overflow blocks (reorganize later...)

Secondary Indexes

[Figure: file with its ordering field marked; the records are not sorted on the indexed key.]

Sparse index (keys 30, 20, 80, ...): does not make sense! The file is not sorted on this key, so an index entry says nothing about where the keys between entries are stored.

Secondary Indexes

Dense index:

[Figure: dense index blocks with keys 10, 20, 30, 40 | 50, 60, 70, ... point to individual records of the file (ordering field marked). A sparse higher level holds keys 10, 50, 90, ...]

With Secondary Indexes

Lowest level is dense

Other levels are sparse

Pointers are record pointers (not block)

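Duplicate Values in Secondary Indexes

[Figure: index entries (keys 10, 20, 30, 40 | 50, 60, ...) point to buckets, and each bucket holds the record pointers for one key value.]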

Another Benefit of Buckets

Can compute complex queries through Boolean operations on record pointer lists

Consider an employee table with foreign keys for department and floor:

EmpID  Name   DeptID  FloorID
1      Alice  2       1
2      Bob    2       2

Referenced tables: (DeptID, …) and (FloorID, …)

Query: Get Employees in (Toy Dept) AND (2nd floor)

[Figure: the “Toy” entry of the Dept. index and the “2nd” entry of the Floor index each point to a bucket of pointers into the Employee file.]

Intersect “Toy” bucket and “2nd floor” buckets to get list of matching employees
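The bucket intersection can be sketched with sets of record pointers. The two employee rows come from the table above; treating DeptID 2 as the Toy department and FloorID 2 as the 2nd floor is an assumption made only for this example.

```python
# Employee file: EmpID -> (Name, DeptID, FloorID), as in the table above.
employees = {1: ("Alice", 2, 1), 2: ("Bob", 2, 2)}

# Assumed mappings (not given on the slide).
TOY_DEPT_ID = 2
SECOND_FLOOR_ID = 2

# Secondary indexes with buckets: key value -> set of record pointers (EmpIDs).
dept_index, floor_index = {}, {}
for emp_id, (_, dept_id, floor_id) in employees.items():
    dept_index.setdefault(dept_id, set()).add(emp_id)
    floor_index.setdefault(floor_id, set()).add(emp_id)

# Boolean AND on the pointer lists; the Employee file is read only for matches.
matches = dept_index.get(TOY_DEPT_ID, set()) & floor_index.get(SECOND_FLOOR_ID, set())
names = [employees[emp_id][0] for emp_id in sorted(matches)]
print(names)
```

An OR query would use set union in the same way, again without touching the Employee file until the final fetch.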

This Idea is Used in Text Information Retrieval

Documents

...the cat is fat ...

...was raining cats and dogs...

...Fido the dog ...

Inverted lists

Common Technique: More Info in Index Entries

[Figure: inverted list for “cat” whose entries carry a type, a position and a location, e.g. Title 5, Title 100, Author 10, Abstract 57, Title 12.]

Answer queries like “cat within 5 words of dog”
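A minimal sketch of a positional inverted index that can answer such a proximity query. It stores only word positions (not the type field), does no stemming, and document 4 is a hypothetical addition so that one document contains both exact words.

```python
from collections import defaultdict

def build_index(docs):
    """Map word -> {doc_id: [positions of the word in that document]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    return index

def within(index, a, b, k):
    """Doc ids where some occurrence of a is within k words of some b."""
    postings_a, postings_b = index.get(a, {}), index.get(b, {})
    hits = []
    for doc_id in sorted(postings_a.keys() & postings_b.keys()):
        if any(abs(pa - pb) <= k
               for pa in postings_a[doc_id] for pb in postings_b[doc_id]):
            hits.append(doc_id)
    return hits

docs = {
    1: "the cat is fat",
    2: "it was raining cats and dogs",
    3: "fido the dog",
    4: "the dog chased the cat",   # hypothetical document
}
index = build_index(docs)
print(within(index, "cat", "dog", 5))
```

Only the two inverted lists are consulted; the documents themselves are never rescanned at query time.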

Conventional Indexes

Pros:
- Simple
- Index is sequential file (good for scans)

Cons:
- Inserts expensive, and/or
- Lose sequentiality & balance

B-Trees

Another type of index
» Give up on sequentiality of index
» Try to get “balance”

Note: the exact data structure we’ll look at is a B+ tree, but plain old “B-trees” are similar

B+ Tree Example (n = 3)

[Figure: B+ tree with its root marked. Keys shown: 3, 5, 11, 30, 35, 100.]

Sample Non-Leaf

Keys: 57 81 95

The four pointers lead to keys k < 57, 57 ≤ k < 81, 81 ≤ k < 95, and k ≥ 95.
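Choosing which pointer to follow in this node is a binary search over its keys. A small sketch, where child_slot is a hypothetical helper name:

```python
import bisect

keys = [57, 81, 95]   # the sample non-leaf node

def child_slot(keys, k):
    """Index of the pointer to follow for search key k.

    Slot 0 covers k < keys[0]; slot i covers keys[i-1] <= k < keys[i];
    the last slot covers k >= keys[-1].
    """
    return bisect.bisect_right(keys, k)

for k in (10, 57, 80, 81, 95, 200):
    print(k, "-> pointer", child_slot(keys, k))
```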

Sample Leaf Node

[Figure: leaf node with keys 57, 81, 95, reached from a non-leaf node; its last pointer leads to the next leaf in sequence.]

Size of Nodes on Disk

n + 1 pointers, n keys

(Fixed size nodes)

Don’t Want Nodes to be Too Empty

Use at least

Non-leaf: ⌈(n+1)/2⌉ pointers

Leaf: ⌊(n+1)/2⌋ pointers to data

Example: n = 3

[Figure: full node vs. minimum node, for non-leaf and leaf nodes. Keys shown: 3, 5, 11 and 30, 35.]

B+ Tree Rules (tree of order n)

1. All leaves are at same lowest level (balanced tree)

2. Pointers in leaves point to records, except for “sequence pointer”

B+ Tree Rules (tree of order n)

(3) Number of pointers/keys for B+ tree:

                     Max ptrs  Max keys  Min ptrs→data  Min keys
Non-leaf (non-root)  n+1       n         ⌈(n+1)/2⌉      ⌈(n+1)/2⌉-1
Leaf (non-root)      n+1       n         ⌊(n+1)/2⌋      ⌊(n+1)/2⌋
Root                 n+1       n         2*             1

* When there is only one record in the B+ tree, min pointers in the root is 1 (the other pointers are null)
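The occupancy rules are easy to check numerically. This sketch evaluates them for a given order n; the function name and dictionary layout are illustrative choices.

```python
import math

def bplus_bounds(n):
    """Pointer/key limits per node type for a B+ tree of order n."""
    half_up = math.ceil((n + 1) / 2)     # ceil((n+1)/2)
    half_down = (n + 1) // 2             # floor((n+1)/2)
    return {
        "non_leaf": {"max_ptrs": n + 1, "max_keys": n,
                     "min_ptrs": half_up, "min_keys": half_up - 1},
        "leaf":     {"max_ptrs": n + 1, "max_keys": n,
                     "min_ptrs": half_down, "min_keys": half_down},
        # Root minimum drops to 1 pointer when the tree holds one record.
        "root":     {"max_ptrs": n + 1, "max_keys": n,
                     "min_ptrs": 2, "min_keys": 1},
    }

for n in (3, 4):
    print(n, bplus_bounds(n))
```

For n = 3 both non-leaf and leaf nodes need at least 2 pointers; for n = 4 a non-leaf node needs 3 while a leaf still needs 2.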

Insert Into B+ Tree

(a) simple case
» space available in leaf

(b) leaf overflow

(c) non-leaf overflow

(d) new root

(a) Insert key = 32, n = 3

[Figure: B+ tree with leaf keys 3, 5, 11 and 30, 31; the leaf holding 30, 31 has space for 32.]

(b) Insert key = 7, n = 3

[Figure: B+ tree with leaf keys 3, 5, 11 and 30, 31; the leaf holding 3, 5, 11 is full, so inserting 7 overflows it and the leaf splits.]

(c) Insert key = 160, n = 3

[Figure: non-leaf overflow.]

(d) New root, insert 45, n = 3

[Figure: root with keys 10, 20, 30 over leaves (1, 2, 3), (10, 12), (20, 25), (30, 32, 40). Inserting 45 splits the last leaf into (30, 32) and (40, 45) and sends key 40 up; the root then overflows and splits, and a new root with key 30 is created.]
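The four insertion cases can be sketched in one recursive routine. This is a simplified illustration, not a full implementation: it keeps keys only (no record pointers or leaf sequence pointers), and the class and function names are hypothetical. The starting tree in slide_example is the one from case (d).

```python
import bisect

class Node:
    def __init__(self, leaf, keys=None, children=None):
        self.leaf = leaf
        self.keys = keys or []
        self.children = children or []    # child nodes; unused in leaves

class BPlusTree:
    def __init__(self, n, root=None):
        self.n = n                        # maximum keys per node
        self.root = root or Node(leaf=True)

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:             # case (d): root split, grow a new root
            sep, right = split
            self.root = Node(False, [sep], [self.root, right])

    def _insert(self, node, key):
        """Insert key below node; return (separator, new_right_node) on split."""
        if node.leaf:
            bisect.insort(node.keys, key)
            if len(node.keys) <= self.n:  # case (a): space available in leaf
                return None
            mid = (len(node.keys) + 1) // 2   # case (b): leaf overflow
            right = Node(True, node.keys[mid:])
            node.keys = node.keys[:mid]
            return right.keys[0], right   # first key of new leaf is copied up
        i = bisect.bisect_right(node.keys, key)
        split = self._insert(node.children[i], key)
        if split is None:
            return None
        sep, right = split
        node.keys.insert(i, sep)
        node.children.insert(i + 1, right)
        if len(node.keys) <= self.n:
            return None
        mid = len(node.keys) // 2         # case (c): non-leaf overflow
        up = node.keys[mid]               # middle key moves up, is not copied
        new = Node(False, node.keys[mid + 1:], node.children[mid + 1:])
        node.keys = node.keys[:mid]
        node.children = node.children[:mid + 1]
        return up, new

def slide_example():
    """Tree from case (d): root 10, 20, 30 over four leaves, n = 3."""
    leaves = [Node(True, ks)
              for ks in ([1, 2, 3], [10, 12], [20, 25], [30, 32, 40])]
    return BPlusTree(3, Node(False, [10, 20, 30], leaves))

tree = slide_example()
tree.insert(45)
print(tree.root.keys, [c.keys for c in tree.root.children])
```

Splits propagate upward through the return value, so the new-root case needs no special handling beyond the check in insert.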

Deletion from B+tree

(a) Simple case: no example

(b) Coalesce with neighbor (sibling)

(c) Re-distribute keys

(d) Cases (b) or (c) at non-leaf

(b) Coalesce with sibling
» Delete 50

[Figure: parent with keys 10, 40, 100 over leaves (10, 20, 30) and (40, 50). After 50 is deleted, the remaining key 40 is merged into the sibling leaf.]

(c) Redistribute keys
» Delete 50

[Figure: parent with keys 10, 40, 100 over leaves (10, 20, 30, 35) and (40, 50). After 50 is deleted, a key is moved over from the sibling leaf.]

(d) Non-leaf coalesce
– Delete 37

[Figure: leaves (1, 3), (10, 14), (20, 22), (25, 26), (30, 37), (40, 45) under non-leaf keys 10, 20 and 30, 40. Deleting 37 coalesces leaves and then non-leaf nodes; the merged non-leaf node becomes the new root.]

B+ Tree Deletion in Practice

Often, coalescing is not implemented
» Too hard and not worth it! (Most datasets just grow in size over time.)

Interesting Problem:

For B+ tree, how large should n be?

n is number of keys / node

Sample Assumptions:

(1) Time to read node from disk is (S + Tn) msec.

(2) Once block in memory, use binary search to locate key: (a + b log_2 n) msec. For some constants a, b; assume a << S

(3) Assume B+tree is full, i.e., # nodes to examine is log_n N where N = # records

Can Get: f(n) = time to find a record

[Figure: plot of f(n) versus n, with n_opt marked on the n axis.]

Find n_opt by setting f’(n) = 0

Answer is n_opt = “a few hundred” in practice

Exercise

f(n) = log_n N * (S + T n + a + b log_2 n)

S = 14000 μs
T = 0.2 μs
b = 0.002 μs
a = 0 μs
N = 10,000,000
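One way to work the exercise is to evaluate f(n) directly and scan for the minimum instead of solving f’(n) = 0 by hand. The search range of 2 to 100,000 keys per node is an arbitrary assumption.

```python
import math

# Constants from the exercise (times in microseconds).
S, T, a, b = 14000.0, 0.2, 0.0, 0.002
N = 10_000_000

def f(n):
    """Time to find a record: (levels examined) * (cost per node)."""
    levels = math.log(N, n)                       # log_n N
    per_node = S + T * n + a + b * math.log2(n)   # read node + binary search
    return levels * per_node

# Brute-force scan over candidate node sizes.
n_opt = min(range(2, 100_001), key=f)
print(n_opt, f(n_opt))
print(f(10), f(100), f(1000))
```

Because N enters f(n) only through the multiplicative factor log_n N = ln N / ln n, changing N rescales the curve but does not move n_opt.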

N = 10 Million Records

[Figure: f(n) versus n for S = 14000, T = 0.2, b = 0.002, a = 0, N = 10,000,000; times in microseconds.]

N = 100 Million Records

[Figure: f(n) versus n for S = 14000, T = 0.2, b = 0.002, a = 0, N = 100,000,000; times in microseconds.]

Hash Indexes

[Figure: key → h(key) → block-sized buckets holding records or pointers, with an overflow bucket attached to a full bucket.]

Buckets can contain records or pointers to file

Chaining is used to handle bucket overflow
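A minimal sketch of a static hash index with chained overflow buckets. Bucket count, bucket capacity, and the use of Python's built-in hash as h(key) are assumptions for illustration; lookup also reports how many buckets it had to read.

```python
class Bucket:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = []       # (key, record pointer) pairs
        self.overflow = None    # next bucket in the chain, if any

class HashIndex:
    def __init__(self, num_buckets, capacity):
        self.buckets = [Bucket(capacity) for _ in range(num_buckets)]

    def _home(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, ptr):
        bucket = self._home(key)
        # Walk the chain until a bucket with free space is found.
        while len(bucket.entries) >= bucket.capacity:
            if bucket.overflow is None:
                bucket.overflow = Bucket(bucket.capacity)
            bucket = bucket.overflow
        bucket.entries.append((key, ptr))

    def lookup(self, key):
        """Return (matching pointers, number of buckets read)."""
        ptrs, reads = [], 0
        bucket = self._home(key)
        while bucket is not None:
            reads += 1
            ptrs += [p for k, p in bucket.entries if k == key]
            bucket = bucket.overflow
        return ptrs, reads

idx = HashIndex(num_buckets=2, capacity=2)
for k in (0, 2, 4, 1):
    idx.insert(k, f"r{k}")
print(idx.lookup(4), idx.lookup(1))
```

Keys 0, 2 and 4 all hash to the same bucket, so finding key 4 costs two bucket reads while key 1 costs one.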

Hash vs Tree Indexes

+ O(1) instead of O(log N) disk accesses

– Can’t efficiently do range queries

Challenge: Resizing

Hash tables try to keep occupancy in a fixed range (50-80%) and slow down beyond that
» Too much chaining

How to resize the table when this happens?
» In memory: just move everything, amortized cost is pretty low
» On disk: moving everything is expensive!

Extendible Hashing

Tree-like design for hash tables that allows cheap resizing while requiring 2 IOs / access

Extendible Hashing: 2 Ideas

(a) Use i of b bits output by hash function

h(K) → 00110101

i will grow over time; the first i bits of each key’s hash are used to map it to a bucket

Extendible Hashing: 2 Ideas

(b) Use a directory with pointers to buckets

[Figure: h(K)[0..i] selects a directory entry, which points to a bucket.]

Example: 4-bit h(K), 2 keys/bucket

[Figure: directory with global depth i = 1; a bucket with local depth 1 holds 1001 and 1100.]

Insert 0010

Insert 1010

[Figure: inserting 1010 overflows the bucket holding 1001 and 1100, so a new directory is created.]

Example

[Figure: after the insert, a bucket with local depth 2 holds 1001 and 1010. Later frames show further inserts ending with a bucket of local depth 2 that holds 0000 and 0001.]

Note: still need chaining if values of h(K) repeat and fill a bucket
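A compact sketch of extendible hashing with 4-bit hash values and 2 keys per bucket, as in the example. It stores the hash values themselves as keys. The inserted keys 0001, 0111 and 0000 are assumptions chosen to reach bucket contents like those in the figures, and appending to a bucket that cannot split any further is only a stand-in for real overflow chaining.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth          # local depth
        self.keys = []

class ExtendibleHash:
    def __init__(self, bits=4, capacity=2):
        self.bits = bits            # b: width of the hash value
        self.capacity = capacity    # keys per bucket
        self.i = 1                  # global depth
        self.directory = [Bucket(1), Bucket(1)]

    def _slot(self, h):
        return h >> (self.bits - self.i)      # first i bits of h

    def insert(self, h):
        bucket = self.directory[self._slot(h)]
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(h)
            return
        if bucket.depth == self.bits:
            bucket.keys.append(h)   # cannot split: stand-in for chaining
            return
        if bucket.depth == self.i:
            # Double the directory: old entry s becomes entries 2s and 2s+1.
            self.directory = [b for b in self.directory for _ in (0, 1)]
            self.i += 1
        # Split the bucket on its next hash bit.
        bucket.depth += 1
        new = Bucket(bucket.depth)
        for slot, b in enumerate(self.directory):
            if b is bucket and (slot >> (self.i - bucket.depth)) & 1:
                self.directory[slot] = new
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            self.directory[self._slot(k)].keys.append(k)
        self.insert(h)              # retry; may trigger another split

table = ExtendibleHash(bits=4, capacity=2)
for h in (0b0001, 0b1001, 0b1100, 0b1010, 0b0111, 0b0000):
    table.insert(h)
for slot, b in enumerate(table.directory):
    print(format(slot, "02b"), b.depth, [format(k, "04b") for k in b.keys])
```

Only the overflowing bucket is rewritten on a split; doubling the directory copies pointers, not data, which is what keeps resizing cheap.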

Motivation

Example: find records where

DEPT = “Toy” AND SALARY > 50k

Strategy I:

Use one index, say Dept.

Get all Dept = “Toy” records and check their salary

Strategy II:

Use 2 indexes; manipulate pointers

[Figure: pointer lists for “Toy” (Dept index) and “Sal > 50k” (Salary index).]

Strategy III:

Multi-key index

One idea:

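Example

[Figure: a Dept index with entries Art, Sales, Toy; entries point to Salary indexes (keys shown: 10k, 15k, 17k, 21k and 12k, 15k, 15k, 19k), which point to records.]

Example record: Name=Joe, DEPT=Sales, SALARY=15k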
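The nested layout in this example can be sketched as a dictionary of per-department sorted salary lists. The class name and every row other than Joe's are hypothetical; salaries are in thousands.

```python
import bisect

class DeptSalaryIndex:
    """Outer index on DEPT; each entry holds an inner index sorted by SALARY."""

    def __init__(self):
        self.by_dept = {}   # dept -> (sorted salaries, names in the same order)

    def insert(self, dept, salary, name):
        salaries, names = self.by_dept.setdefault(dept, ([], []))
        pos = bisect.bisect_right(salaries, salary)
        salaries.insert(pos, salary)
        names.insert(pos, name)

    def salary_above(self, dept, threshold):
        """Names with DEPT = dept AND SALARY > threshold."""
        if dept not in self.by_dept:
            return []
        salaries, names = self.by_dept[dept]
        return names[bisect.bisect_right(salaries, threshold):]

index = DeptSalaryIndex()
index.insert("Sales", 15, "Joe")    # the example record
index.insert("Toy", 60, "Ann")      # hypothetical rows
index.insert("Toy", 40, "Bo")
index.insert("Toy", 50, "Cy")
print(index.salary_above("Toy", 50))
```

This layout serves DEPT = x AND SALARY in a range well, but a query on salary alone would have to visit every department's inner index.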

k-d Tree

Splits dimensions in any order to hold k-dimensional data

[Figure: k-d tree over points a through o, with split values 25, 15, 35, 20.]

Efficient range queries in both dimensions
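A minimal in-memory sketch of a k-d tree with a range query. It cycles through the dimensions round-robin, which is one simple choice among the "any order" the slide allows, and the sample points are made up.

```python
def build(points, depth=0):
    """Build a k-d tree as nested dicts, splitting on the median point."""
    if not points:
        return None
    axis = depth % len(points[0])             # round-robin split dimension
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build(points[:mid], depth + 1),       # coords <= split value
        "right": build(points[mid + 1:], depth + 1),  # coords >= split value
    }

def range_query(node, lo, hi, out=None):
    """Collect points p with lo[d] <= p[d] <= hi[d] in every dimension d."""
    if out is None:
        out = []
    if node is None:
        return out
    p, axis = node["point"], node["axis"]
    if all(l <= c <= h for l, c, h in zip(lo, p, hi)):
        out.append(p)
    if lo[axis] <= p[axis]:       # the range may extend into the left subtree
        range_query(node["left"], lo, hi, out)
    if p[axis] <= hi[axis]:       # the range may extend into the right subtree
        range_query(node["right"], lo, hi, out)
    return out

points = [(25, 15), (35, 20), (10, 40), (30, 5), (50, 30), (20, 25)]
tree = build(points)
print(sorted(range_query(tree, lo=(15, 10), hi=(40, 30))))
```

Subtrees that lie entirely outside the query rectangle along the split dimension are skipped, which is where the efficiency over a full scan comes from.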

Summary

Wide range of indexes for different data types and queries (e.g. range vs exact)

Key concerns: query time, cost to update, and size of index

Next: given all these storage data structures, how do we run our queries?
