Source: web.stanford.edu/class/cs245/slides/05-Storage-Formats-p2.pdf

Data Storage & Indexes

Instructor: Matei Zaharia

cs245.stanford.edu

Outline

Co-designing storage and compute (paper)

Indexes

CS 245 2


C-Store Storage

The storage construct was a “projection”; what does that mean?


C-Store Compression

Five types of compression:
» Null suppression
» Dictionary encoding
» Run-length encoding
» Bit-vector encoding
» Lempel-Ziv

Tradeoff: size vs. ease of computation
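To make the size-vs-computation tradeoff concrete, here is a minimal sketch of two of the listed schemes, dictionary encoding followed by run-length encoding, applied to one in-memory column. It is not C-Store's block API; the helper names (`dictionary_encode`, `run_length_encode`, `count_value_rle`) are hypothetical. What it illustrates is that some operations, such as counting a value, can run directly on the compressed runs.

```python
from typing import Dict, List, Tuple

# Hypothetical helpers for illustration only; not C-Store's actual API.


def dictionary_encode(column: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each value with a small integer code; return (dictionary, codes)."""
    dictionary: List[str] = []
    code_of: Dict[str, int] = {}
    codes: List[int] = []
    for value in column:
        if value not in code_of:
            code_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(code_of[value])
    return dictionary, codes


def run_length_encode(column: List[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs: List[Tuple[int, int]] = []
    for value in column:
        if runs and runs[-1][0] == value:
            runs[-1] = (value, runs[-1][1] + 1)
        else:
            runs.append((value, 1))
    return runs


def count_value_rle(runs: List[Tuple[int, int]], target: int) -> int:
    """Count rows equal to target by visiting each run once, without decompressing."""
    return sum(length for value, length in runs if value == target)


if __name__ == "__main__":
    states = ["CA", "CA", "CA", "NY", "NY", "CA"]
    dictionary, codes = dictionary_encode(states)
    runs = run_length_encode(codes)
    print(dictionary, codes, runs)
    print(count_value_rle(runs, dictionary.index("CA")))
```

Longer runs mean fewer (value, length) pairs, so both the stored size and the work done by `count_value_rle` shrink; with very short runs, run-length encoding saves little.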

API for Compressed Blocks

Using the Block API

Data Size with Each Scheme

[Figure: (a) runs of length 50; (b) runs of length 1000]

Performance with Each Scheme

[Figure: (a) runs of length 50; (b) runs of length 1000]

How would the results change on SSDs?


Key Operations on an Index

Find all records with a given value for a key
» Key can be one field or a tuple of fields (e.g. country=“US” AND state=“CA”)
» In some cases, only one matching record

Find all records with key in a given range

Find nearest neighbor to a data point?

Tradeoffs in Indexing

Improved query performance

Size of indexes

Cost to update indexes

Some Types of Indexes

Conventional indexes

B-trees

Hash indexes

Multi-key indexing


Many standard data structures, but adapted to work well on disk

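Sequential File

[Figure: records sorted by key, two per block: (10, 20), (30, 40), (50, 60), (70, 80), (90, 100)]

Dense Index

[Figure: the same file with a dense index, one entry per record, packed into index blocks (10, 20, 30, 40), (50, 60, 70, 80), (90, 100, 110, 120)]

Sparse Index

[Figure: the same file with a sparse index, one entry per data block (the block's first key), packed into index blocks (10, 30, 50, 70), (90, 110, 130, 150), (170, 190, 210, 230)]

2-level sparse index

[Figure: a second sparse level with one entry per first-level index block: (10, 90, 170, 250), (330, 410, 490, 570)]

File and 2nd-level index blocks need not be contiguous on disk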

Sparse vs Dense Tradeoff

Sparse: Less space usage, can keep more of index in memory

Dense: Can tell whether any record exists without accessing file

(Later: sparse better for insertions, dense needed for secondary indexes)
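A minimal in-memory sketch of the two lookups, assuming data blocks are Python lists of sorted keys standing in for disk blocks, and indexes are sorted lists. The sparse index keeps the first key of each block and binary-searches (`bisect_right`) for the only block that could hold a key; the dense index answers "does key k exist?" by itself. All names here are illustrative, not a real storage API.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional

# Hypothetical in-memory stand-ins for disk blocks: sorted records, two per block.
BLOCKS: List[List[int]] = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

# Sparse index: one entry (the first key) per data block.
SPARSE: List[int] = [block[0] for block in BLOCKS]

# Dense index: one entry per record.
DENSE: List[int] = [key for block in BLOCKS for key in block]


def sparse_lookup(key: int) -> Optional[int]:
    """Return the number of the block holding key, or None."""
    pos = bisect_right(SPARSE, key) - 1  # last index entry <= key
    if pos < 0:
        return None
    # The sparse index only says where key *could* be: read the block to check.
    return pos if key in BLOCKS[pos] else None


def dense_exists(key: int) -> bool:
    """Answer existence from the index alone, without reading any data block."""
    pos = bisect_left(DENSE, key)
    return pos < len(DENSE) and DENSE[pos] == key
```

`sparse_lookup` has to inspect a data block to confirm a key, while `dense_exists` never touches `BLOCKS`; in exchange, `SPARSE` has one entry per block instead of one per record.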

Terms

Search key of an index

Primary index (on primary key of ordered files)

Secondary index

Dense index (contains all search key values)

Sparse index

Multi-level index

Handling Duplicate Keys

For a primary index, can point to first instance of each item (assuming blocks are linked)

For a secondary index, need to point to a list of records since they can be anywhere

Deletion: Sparse Index

[Figure: data blocks (10, 20), (30, 40), (50, 60), (70, 80); sparse index blocks (10, 30, 50, 70), (90, 110, 130, 150)]

– delete record 40: 40 is not an index entry, so only the data block changes and the index is untouched

– delete record 30: 30 was the first key of its block, so 40 moves up and the index entry 30 is replaced by 40

– delete records 30 & 40: their block becomes empty, so its index entry is removed and the remaining entries 50 and 70 shift up in the index block

Deletion: Dense Index

[Figure: data blocks (10, 20), (30, 40), (50, 60), (70, 80); dense index blocks (10, 20, 30, 40), (50, 60, 70, 80)]

– delete record 30: the index entry for 30 is removed, and 40 is shifted up to fill the gap

Insertion: Sparse Index

[Figure: data blocks (10, 20), (30, free), (40, 50), (60, free); sparse index (10, 30, 40, 60)]

– insert record 34: our lucky day! We have free space where we need it, so 34 goes into the block holding 30 and the index does not change

– insert record 15: the block (10, 20) is full, so records shift: the blocks become (10, 15) and (20, 30), and the index entry 30 becomes 20
» Illustrated: immediate reorganization
» Variation: insert a new block (chained file) and update the index

– insert record 25: 25 is placed in an overflow block (reorganize later...)

Secondary Indexes

[Figure: a file sequenced by a different ordering field; its blocks hold search-key values (30, 50), (20, 70), (80, 40), (100, 10), (90, 60)]

Sparse index: 30, 20, 80, 100, 90, ... does not make sense! Because the file is not sorted on this key, the first key of a block says nothing about which other keys that block holds.

Dense index: 10, 20, 30, 40 | 50, 60, 70, ... (one entry per record, in key order), with a sparse higher level: 10, 50, 90, ...

With Secondary Indexes

Lowest level is dense

Other levels are sparse

Pointers are record pointers (not block)

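Duplicate Values in Secondary Indexes

[Figure: data blocks holding repeated key values (10 and 20, 20 and 40, 10 and 40, 10 and 40, 30 and 40); a dense index 10, 20, 30, 40 | 50, 60, ... in which each entry points to a bucket of pointers to every record with that value]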

Another Benefit of Buckets

Can compute complex queries through Boolean operations on record pointer lists

Consider an employee table with foreign keys for department and floor:

EmpID  Name   DeptID  FloorID
1      Alice  2       1
2      Bob    2       2

FloorID  …
1
2

DeptID  …
1
2

Query: Get Employees in (Toy Dept) AND (2nd floor)

[Figure: the Dept. index entry “Toy” and the Floor index entry “2nd” each point to a bucket of pointers into the Employee file]

Intersect “Toy” bucket and “2nd floor” buckets to get list of matching employees
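The bucket intersection can be sketched with Python sets of record pointers. In this sketch a record pointer is just the EmpID, the Dept and Floor indexes are dictionaries mapping a key value to its bucket, and the department/floor labels and the extra employees (Carol, Dan) are hypothetical. An on-disk system would more likely merge sorted pointer lists than build in-memory sets, but the Boolean logic is the same.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical employees: (emp_id, name, dept, floor). emp_id plays the role
# of a record pointer.
EMPLOYEES: List[Tuple[int, str, str, str]] = [
    (1, "Alice", "Toy", "1st"),
    (2, "Bob", "Toy", "2nd"),
    (3, "Carol", "Shoe", "2nd"),
    (4, "Dan", "Toy", "2nd"),
]


def build_bucket_index(rows: List[Tuple[int, str, str, str]], column: int) -> Dict[str, Set[int]]:
    """Secondary index: each key value maps to a bucket of record pointers."""
    index: Dict[str, Set[int]] = {}
    for row in rows:
        index.setdefault(row[column], set()).add(row[0])
    return index


dept_index = build_bucket_index(EMPLOYEES, column=2)
floor_index = build_bucket_index(EMPLOYEES, column=3)


def employees_in(dept: str, floor: str) -> List[int]:
    """AND of two predicates = intersection of two buckets; no records are read yet."""
    return sorted(dept_index.get(dept, set()) & floor_index.get(floor, set()))
```

Only the pointers in the final intersection need to be followed into the Employee file; OR and NOT queries map onto set union and difference in the same way.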

This Idea is Used in Text Information Retrieval

Documents

...the cat is fat ...

...was raining cats and dogs...

...Fido the dog ...

Inverted lists: one bucket per word (“cat”, “dog”) listing the documents that contain it

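Common Technique: More Info in Index Entries

[Figure: inverted-list entries for “cat” and “dog” pointing into documents d1, d2, d3; each entry also records the location type and word position, e.g. Title 5, Title 100, Author 10, Abstract 57, Title 12]

Answer queries like “cat within 5 words of dog”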
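A minimal sketch of an inverted index whose entries carry word positions, so that "cat within 5 words of dog" can be answered from the index alone. It assumes whitespace tokenization, lower-casing, and a crude plural strip (so "cats" indexes as "cat"), uses the three document snippets above under hypothetical ids d1 to d3, and omits the location type (Title, Author, Abstract) from the figure.

```python
from collections import defaultdict
from typing import Dict, List, Set

# The three snippets, with hypothetical document ids.
DOCS: Dict[str, str] = {
    "d1": "the cat is fat",
    "d2": "was raining cats and dogs",
    "d3": "Fido the dog",
}


def normalize(word: str) -> str:
    """Lower-case and strip a trailing 's' (a crude stand-in for real stemming)."""
    word = word.lower()
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def build_index(docs: Dict[str, str]) -> Dict[str, Dict[str, List[int]]]:
    """word -> {doc_id -> [positions]}: an inverted list with positions."""
    index: Dict[str, Dict[str, List[int]]] = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, token in enumerate(text.split()):
            index[normalize(token)][doc_id].append(pos)
    return index


def within(index: Dict[str, Dict[str, List[int]]], w1: str, w2: str, k: int) -> Set[str]:
    """Docs where w1 and w2 occur within k words of each other."""
    postings1, postings2 = index.get(w1, {}), index.get(w2, {})
    hits: Set[str] = set()
    for doc_id in postings1.keys() & postings2.keys():  # docs containing both words
        if any(abs(p1 - p2) <= k for p1 in postings1[doc_id] for p2 in postings2[doc_id]):
            hits.add(doc_id)
    return hits
```

Without positions, the index could only report that d2 contains both words; storing positions lets the distance check run without reading the documents.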

Conventional Indexes

Pros:
- Simple
- Index is sequential file (good for scans)

Cons:
- Inserts expensive, and/or
- Lose sequentiality & balance


B-Trees

Another type of index
» Give up on sequentiality of index
» Try to get “balance”

Note: the exact data structure we’ll look at is a B+ tree, but plain old “B-trees” are similar

B+ Tree Example (n = 3)

[Figure: root (100); non-leaf nodes (30) and (120, 150, 180); leaves (3, 5, 11), (30, 35), (100, 101, 110), (120, 130), (150, 156, 179), (180, 200)]

Sample Non-Leaf

[Figure: non-leaf node with keys 57, 81, 95 and four pointers: to keys $k < 57$, to keys $57 \le k < 81$, to keys $81 \le k < 95$, and to keys $k \ge 95$]
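Picking which pointer to follow in a non-leaf node is a binary search over its keys. This sketch assumes a node is a sorted list of n keys plus a list of n + 1 children (here just labels; `choose_child` is a hypothetical name). `bisect_right` returns how many keys are ≤ k, which is exactly the index of the child whose range contains k under the rule above (a key equal to a separator goes right).

```python
from bisect import bisect_right
from typing import List, TypeVar

Child = TypeVar("Child")


def choose_child(keys: List[int], children: List[Child], k: int) -> Child:
    """Follow pointer i such that keys[i-1] <= k < keys[i] (ends are open)."""
    assert len(children) == len(keys) + 1, "a non-leaf with n keys has n + 1 pointers"
    return children[bisect_right(keys, k)]


if __name__ == "__main__":
    keys = [57, 81, 95]
    # Hypothetical child labels describing each pointer's key range.
    children = ["k<57", "57<=k<81", "81<=k<95", "k>=95"]
    for k in (10, 57, 80, 81, 95, 200):
        print(k, "->", choose_child(keys, children, k))
```

With a few hundred keys per node, this in-memory search is cheap compared with reading the node from disk, which is why the node-size analysis later charges it only $b \log_2 n$.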

Sample Leaf Node

[Figure: leaf node with keys 57, 81, 95, reached from a non-leaf node; its pointers lead to the record with key 57, the record with key 81, the record with key 95, and to the next leaf in sequence]

Size of Nodes on Disk

n + 1 pointers, n keys

(Fixed size nodes)

Don’t Want Nodes to be Too Empty

Use at least:

Non-leaf: $\lceil (n+1)/2 \rceil$ pointers

Leaf: $\lfloor (n+1)/2 \rfloor$ pointers to data

Example: n = 3

Non-leaf: full node (120, 150, 180); min. node (30)

Leaf: full node (3, 5, 11); min. node (30, 35)

B+ Tree Rules (tree of order n)

1. All leaves are at same lowest level (balanced tree)

2. Pointers in leaves point to records, except for “sequence pointer”

B+ Tree Rules (tree of order n)

(3) Number of pointers/keys for B+ tree:

Node type            Max ptrs  Max keys  Min ptrs→data               Min keys
Non-leaf (non-root)  n+1       n         $\lceil (n+1)/2 \rceil$     $\lceil (n+1)/2 \rceil - 1$
Leaf (non-root)      n+1       n         $\lfloor (n+1)/2 \rfloor$   $\lfloor (n+1)/2 \rfloor$
Root                 n+1       n         2*                          1

* When there is only one record in the B+ tree, min pointers in the root is 1 (the other pointers are null)
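The table can be turned into a small helper for checking occupancy limits at a given order n. This is a sketch with hypothetical function names and node-kind strings; it returns the table's values as (min, max) pairs, where the max of n + 1 pointers counts every slot and the leaf minimum counts pointers to data.

```python
from typing import Tuple


def pointer_limits(n: int, kind: str, single_record_tree: bool = False) -> Tuple[int, int]:
    """(min, max) pointers for a node of a B+ tree of order n, per the table.

    kind: "nonleaf" or "leaf" (both non-root), or "root" (hypothetical labels).
    """
    max_ptrs = n + 1
    if kind == "nonleaf":
        return (n + 2) // 2, max_ptrs          # ceil((n + 1) / 2)
    if kind == "leaf":
        return (n + 1) // 2, max_ptrs          # floor((n + 1) / 2) pointers to data
    if kind == "root":
        return (1 if single_record_tree else 2), max_ptrs
    raise ValueError(f"unknown node kind: {kind!r}")


def key_limits(n: int, kind: str) -> Tuple[int, int]:
    """(min, max) keys, following the table's Min keys and Max keys columns."""
    min_ptrs, _ = pointer_limits(n, kind)
    if kind == "nonleaf":
        return min_ptrs - 1, n                 # p child pointers are separated by p - 1 keys
    if kind == "leaf":
        return min_ptrs, n                     # one key per data pointer
    return 1, n                                # root
```

For n = 3 this gives a non-leaf minimum of 2 pointers (1 key) and a leaf minimum of 2 keys, matching the min. nodes (30) and (30, 35) in the example.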

Insert Into B+ Tree

(a) simple case
» space available in leaf

(b) leaf overflow

(c) non-leaf overflow

(d) new root

(a) Insert key = 32, n = 3

[Figure: root (100) whose left child (30) points to leaves (3, 5, 11) and (30, 31). The leaf (30, 31) has a free slot, so 32 is simply added: (30, 31, 32).]

(b) Insert key = 7, n = 3

[Figure: same tree. The leaf (3, 5, 11) is full, so it splits into (3, 5) and (7, 11), and the new separator key 7 is inserted into the parent, which becomes (7, 30).]
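The leaf-level step of insertion can be sketched as follows, assuming a leaf is a sorted Python list of at most n keys (record pointers omitted) and using the hypothetical name `insert_into_leaf`. On overflow, the left half keeps ceil((n + 1) / 2) keys and the first key of the new right leaf is returned as the separator for the parent; propagating that separator upward (cases (c) and (d)) is not shown.

```python
from bisect import insort
from typing import List, Optional, Tuple


def insert_into_leaf(
    leaf: List[int], key: int, n: int
) -> Tuple[List[int], Optional[List[int]], Optional[int]]:
    """Insert key into a sorted leaf holding at most n keys.

    Returns (left, right, separator). If no split was needed, right and
    separator are None; otherwise separator must be added to the parent.
    """
    keys = list(leaf)
    insort(keys, key)              # keep keys sorted
    if len(keys) <= n:             # (a) simple case: space available in leaf
        return keys, None, None
    split = (n + 2) // 2           # left keeps ceil((n + 1) / 2) keys
    left, right = keys[:split], keys[split:]
    return left, right, right[0]   # (b) leaf overflow: copy first right key up
```

Both halves end up with at least floor((n + 1) / 2) keys, so the new leaves satisfy the leaf minimum from the rules table.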

(c) Insert key = 160, n = 3

[Figure: root (100) with right child (120, 150, 180) over leaves including (150, 156, 179) and (180, 200). The leaf (150, 156, 179) is full, so it splits into (150, 156) and (160, 179), and 160 must be added to the parent. The parent (120, 150, 180) is also full, so it splits into (120, 150) and (180), pushing 160 up into the root, which becomes (100, 160).]

(d) New root, insert 45, n = 3

[Figure: root (10, 20, 30) with leaves (1, 2, 3), (10, 12), (20, 25), (30, 32, 40). The leaf (30, 32, 40) splits into (30, 32) and (40, 45), sending 40 up. The root would then need keys 10, 20, 30, 40, so it splits into (10, 20) and (40), and 30 becomes the new root.]

Deletion from B+tree

(a) Simple case: no example

(b) Coalesce with neighbor (sibling)

(c) Re-distribute keys

(d) Cases (b) or (c) at non-leaf


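(b) Coalesce with sibling
» Delete 50

[Figure, n = 4: parent (10, 40, 100) with neighboring leaves (10, 20, 30) and (40, 50). Deleting 50 leaves (40) below the leaf minimum, so 40 moves into its sibling, giving (10, 20, 30, 40), and key 40 is removed from the parent.]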

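(c) Redistribute keys
» Delete 50

[Figure, n = 4: parent (10, 40, 100) with neighboring leaves (10, 20, 30, 35) and (40, 50). Deleting 50 leaves (40) too empty, and the full sibling leaves no room to coalesce, so 35 moves over instead, giving (10, 20, 30) and (35, 40), and the parent key 40 becomes 35.]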

(d) Non-leaf coalesce
– Delete 37

[Figure, n = 4: root (25); non-leaf nodes (10, 20) and (30, 40); leaves (1, 3), (10, 14), (20, 22), (25, 26), (30, 37), (40, 45). After 37 is deleted, the leaf holding 30 is too empty and is coalesced with a sibling leaf. Its parent is then left with too few pointers and is coalesced with its sibling (10, 20), pulling down the root key 25; the merged non-leaf node becomes the new root.]

B+ Tree Deletion in Practice

Often, coalescing is not implemented
» Too hard and not worth it! (Most datasets just grow in size over time.)

Interesting Problem:

For B+ tree, how large should n be?

n is number of keys / node


Sample Assumptions:

(1) Time to read node from disk is $(S + Tn)$ msec.

(2) Once block in memory, use binary search to locate key: $(a + b \log_2 n)$ msec, for some constants $a$, $b$; assume $a \ll S$.

(3) Assume B+ tree is full, i.e., # nodes to examine is $\log_n N$, where $N$ = # records.

Can Get: $f(n)$ = time to find a record

[Plot: $f(n)$ against $n$, with its minimum at $n_{\text{opt}}$]

Find $n_{\text{opt}}$ by setting $f'(n) = 0$

Answer is $n_{\text{opt}}$ = “a few hundred” in practice

Exercise

$$f(n) = \log_n N \cdot \left(S + Tn + a + b \log_2 n\right)$$

S = 14000 μs, T = 0.2 μs, b = 0.002 μs, a = 0 μs, N = 10,000,000

[Plot: $f(n)$ in microseconds vs. $n$ for N = 10 million records (S = 14000, T = 0.2, b = 0.002, a = 0)]

[Plot: $f(n)$ in microseconds vs. $n$ for N = 100 million records (S = 14000, T = 0.2, b = 0.002, a = 0)]
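The exercise can be checked numerically. This sketch evaluates $f(n)$ with the exercise constants (all in microseconds) and scans integer n for the minimum; the scan range and the function names are assumptions. Because $\log_n N = \ln N / \ln n$, N only scales $f(n)$ by a constant factor, and with a = 0 the optimum satisfies $n(\ln n - 1) = S/T$, so it depends only on the ratio S/T. With these particular constants the scan should land near n ≈ 8,700.

```python
import math

# Exercise constants, in microseconds (a = 0 as given).
S, T, A, B = 14000.0, 0.2, 0.0, 0.002


def f(n: int, N: int) -> float:
    """Time to find a record: (# levels) * (read one node + binary search in it)."""
    levels = math.log(N) / math.log(n)           # log_n N, assuming a full tree
    per_node = S + T * n + A + B * math.log2(n)  # disk read + in-memory search
    return levels * per_node


def n_opt(N: int, n_max: int = 50_000) -> int:
    """Brute-force scan over node sizes 2..n_max (the range is an assumption)."""
    return min(range(2, n_max + 1), key=lambda n: f(n, N))


if __name__ == "__main__":
    for N in (10_000_000, 100_000_000):
        best = n_opt(N)
        print(N, best, round(f(best, N)), round(f(300, N)))
```

Changing S/T (for example a much cheaper seek relative to transfer time) moves the optimum; with a = 0, changing N does not.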


Hash Indexes

[Figure: h(key) maps each key to one of a sequence of buckets; each bucket holds records or record pointers, and a full bucket links to an overflow bucket]

Buckets (block sized) can contain records or pointers to file

Chaining is used to handle bucket overflow
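A minimal sketch of a static hash index with block-sized buckets and overflow chaining. The bucket capacity, the number of buckets, the class names, and the use of Python's built-in `hash` are assumptions for illustration; each bucket stores (key, record pointer) pairs and links to an overflow bucket when full.

```python
from typing import Any, List, Optional, Tuple


class Bucket:
    """A block-sized bucket: up to `capacity` entries plus an overflow link."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: List[Tuple[Any, Any]] = []
        self.overflow: Optional["Bucket"] = None


class HashIndex:
    """Hypothetical static hash index; the number of buckets never changes."""

    def __init__(self, num_buckets: int = 4, capacity: int = 2) -> None:
        self.buckets = [Bucket(capacity) for _ in range(num_buckets)]

    def _home(self, key: Any) -> Bucket:
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key: Any, ptr: Any) -> None:
        bucket = self._home(key)
        while len(bucket.entries) >= bucket.capacity:  # full: follow or extend chain
            if bucket.overflow is None:
                bucket.overflow = Bucket(bucket.capacity)
            bucket = bucket.overflow
        bucket.entries.append((key, ptr))

    def lookup(self, key: Any) -> List[Any]:
        """Return all record pointers for key (duplicates allowed)."""
        result: List[Any] = []
        bucket: Optional[Bucket] = self._home(key)
        while bucket is not None:  # each hop would be one more block read on disk
            result.extend(p for k, p in bucket.entries if k == key)
            bucket = bucket.overflow
        return result

    def chain_length(self, key: Any) -> int:
        """Number of blocks in the chain that key hashes to."""
        length, bucket = 0, self._home(key)
        while bucket is not None:
            length, bucket = length + 1, bucket.overflow
        return length
```

A lookup costs one block read when the home bucket has not overflowed; as chains grow, every extra overflow block adds a read, which is what motivates resizing.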

Hash vs Tree Indexes

+ O(1) instead of O(log N) disk accesses

– Can’t efficiently do range queries


Challenge: Resizing

Hash tables try to keep occupancy in a fixed range (50-80%) and slow down beyond that
» Too much chaining

How to resize the table when this happens?
» In memory: just move everything, amortized cost is pretty low
» On disk: moving everything is expensive!

Extendible Hashing

Tree-like design for hash tables that allows cheap resizing while requiring 2 IOs / access


Extendible Hashing: 2 Ideas

(a) Use i of b bits output by hash function

[Figure: h(K) produces a b-bit string (e.g. 00110101), of which only the first i bits are used]

i will grow over time; the first i bits of each key’s hash are used to map it to a bucket

(b) Use a directory with pointers to buckets

[Figure: a directory with one entry for each value of the first i bits of h(K), each entry pointing to a bucket]

Example: 4-bit h(K), 2 keys/bucket

[Figure: global depth i = 1. Directory entry 0 points to a bucket with local depth 1 holding 0001; entry 1 points to a bucket with local depth 1 holding 1001 and 1100.]

Insert 0010: it maps to the bucket for prefix 0, which still has a free slot, so no split is needed.

Insert 1010 (starting from the state in the figure): it maps to the full bucket for prefix 1. That bucket's local depth equals the global depth, so the directory doubles to i = 2 (entries 00, 01, 10, 11) and the bucket splits on the second bit:
» 00, 01 → bucket with local depth 1: 0001
» 10 → bucket with local depth 2: 1001, 1010
» 11 → bucket with local depth 2: 1100

Insert 0111, then 0000: both map to the bucket shared by 00 and 01. 0111 fills it, and 0000 overflows it. Its local depth (1) is less than i = 2, so it splits without doubling the directory:
» 00 → local depth 2: 0000, 0001
» 01 → local depth 2: 0111
» 10 → local depth 2: 1001, 1010
» 11 → local depth 2: 1100

Note: still need chaining if values of h(K) repeat and fill a bucket
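A sketch of extendible hashing that replays the example, assuming keys are given directly as their 4-bit hash values (strings such as "1010"), buckets hold 2 keys, and the directory is indexed by the first i bits. The class and method names are hypothetical. A full bucket whose local depth equals i first doubles the directory; otherwise only the directory entries that pointed to it are re-pointed. Repeated hash values would still need overflow chaining, which this sketch does not implement: it raises an error instead.

```python
from typing import List

BITS = 4       # width of h(K) in the example
CAPACITY = 2   # keys per bucket


class Bucket:
    def __init__(self, depth: int) -> None:
        self.depth = depth            # local depth
        self.keys: List[str] = []


class ExtendibleHash:
    """Hypothetical in-memory model; keys are their own 4-bit hash strings."""

    def __init__(self) -> None:
        self.i = 1                                       # global depth
        self.directory: List[Bucket] = [Bucket(1), Bucket(1)]

    def _slot(self, h: str) -> int:
        return int(h[: self.i], 2)                       # first i bits of h(K)

    def insert(self, h: str) -> None:
        bucket = self.directory[self._slot(h)]
        if len(bucket.keys) < CAPACITY:
            bucket.keys.append(h)
            return
        if bucket.depth == BITS:
            raise RuntimeError("all hash bits used: an overflow chain would be needed")
        if bucket.depth == self.i:
            # Double the directory: the entry for prefix p becomes p0 and p1.
            self.directory = [b for b in self.directory for _ in (0, 1)]
            self.i += 1
        # Split on bit number bucket.depth (0-based, counting from the left).
        zero, one = Bucket(bucket.depth + 1), Bucket(bucket.depth + 1)
        for key in bucket.keys:
            (one if key[bucket.depth] == "1" else zero).keys.append(key)
        for slot, b in enumerate(self.directory):
            if b is bucket:
                prefix = format(slot, f"0{self.i}b")
                self.directory[slot] = one if prefix[bucket.depth] == "1" else zero
        self.insert(h)  # retry; may split again if every key went to one side
```

Only the split bucket's keys move, and the directory (small enough to stay in memory) is the only structure that ever doubles, which is where the cheap resizing comes from.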

Multi-key Indexing: Motivation

Example: find records where DEPT = “Toy” AND SALARY > 50k

Strategy I:

Use one index, say Dept.

Get all Dept = “Toy” records and check their salary

[Figure: a single index I1 on Dept]

Strategy II:

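Use 2 indexes; manipulate pointers

[Figure: the Dept index yields the pointers for Toy and the Salary index yields the pointers for Sal > 50k; the two pointer lists are combined]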

Strategy III:

Multi-key index

One idea:

[Figure: a first-level index I1 whose entries point to second-level indexes I2, I3]

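Example

[Figure: a Dept index with entries Art, Sales, Toy; each entry points to its own Salary index (e.g. 10k, 15k, 17k, 21k and 12k, 15k, 15k, 19k), whose entries point to records such as Name=Joe, DEPT=Sales, SALARY=15k]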
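One way to sketch this multi-key index in memory: the first-level index maps each Dept to its own salary index, kept as a sorted list of (salary, record id) pairs. The query DEPT = “Toy” AND SALARY > 50k then becomes one dictionary lookup followed by a binary search and a scan of that department's salaries only. The records, the function names, and the list-based second level are hypothetical.

```python
from bisect import bisect_right, insort
from typing import Dict, List, Tuple

# First level (Dept) -> second level: sorted (salary, record_id) pairs.
MultiKeyIndex = Dict[str, List[Tuple[int, int]]]


def build_multikey_index(records: List[Tuple[int, str, str, int]]) -> MultiKeyIndex:
    """records are hypothetical (record_id, name, dept, salary) tuples."""
    index: MultiKeyIndex = {}
    for rid, _name, dept, salary in records:
        insort(index.setdefault(dept, []), (salary, rid))
    return index


def dept_and_salary_above(index: MultiKeyIndex, dept: str, min_salary: int) -> List[int]:
    """Record ids with DEPT = dept AND SALARY > min_salary."""
    salaries = index.get(dept, [])
    # (min_salary, inf) sorts after every (min_salary, rid), so this skips salary <= min.
    start = bisect_right(salaries, (min_salary, float("inf")))
    return [rid for _salary, rid in salaries[start:]]
```

Compared with Strategy I, no Toy record with a low salary is ever read; compared with Strategy II, there is no separate pointer list for the salary condition to intersect.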

k-d Tree

Splits dimensions in any order to hold k-dimensional data

[Figure: points a through o in a 2-D space, recursively divided by axis-aligned split lines, shown next to the corresponding binary tree whose leaves hold small groups of points]

Efficient range queries in both dimensions
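A minimal 2-D k-d tree sketch with a rectangular range query. Unlike the slides, which allow splitting dimensions in any order, this version simply alternates x and y by depth and splits at the median point; the names `build_kdtree` and `range_query` and the sample points are hypothetical. The search skips any subtree on a side of the split that the query rectangle cannot reach, which is what makes range queries efficient in both dimensions.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


class Node:
    def __init__(self, point: Point, axis: int,
                 left: Optional["Node"], right: Optional["Node"]) -> None:
        self.point, self.axis, self.left, self.right = point, axis, left, right


def build_kdtree(points: List[Point], depth: int = 0) -> Optional[Node]:
    """Build a k-d tree, alternating the split axis (0 = x, 1 = y) by depth."""
    if not points:
        return None
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2  # the median point becomes the split
    return Node(pts[mid], axis,
                build_kdtree(pts[:mid], depth + 1),
                build_kdtree(pts[mid + 1:], depth + 1))


def range_query(node: Optional[Node], lo: Point, hi: Point, out: List[Point]) -> List[Point]:
    """Collect points p with lo[d] <= p[d] <= hi[d] in both dimensions."""
    if node is None:
        return out
    p, a = node.point, node.axis
    if all(lo[d] <= p[d] <= hi[d] for d in (0, 1)):
        out.append(p)
    if lo[a] <= p[a]:  # left subtree holds coordinates <= p[a]
        range_query(node.left, lo, hi, out)
    if p[a] <= hi[a]:  # right subtree holds coordinates >= p[a]
        range_query(node.right, lo, hi, out)
    return out
```

Points with the same coordinate as the split may fall on either side, so the two pruning tests use non-strict comparisons; a small query rectangle then visits only the few subtrees it overlaps.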

Summary

Wide range of indexes for different data types and queries (e.g. range vs exact)

Key concerns: query time, cost to update, and size of index

Next: given all these storage data structures, how do we run our queries?
