class 6 column stores 3 - Harvard University

Jun 24, 2022

Page 1: class 6 column stores 3 - Harvard University

column stores 3.0

prof. Stratos Idreos

HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/

class 6

Page 2: class 6 column stores 3 - Harvard University

CS165, Fall 2015 Stratos Idreos 2/33

research tuesday coming up

9/29, 6:30pm, Pierce 213 + blackboard (chocolate included)

wilson, class 15
alex, class 16
& stratos

can we scan big data without even touching the data?
can we do joins without doing joins?

no handouts + recording

Page 3: class 6 column stores 3 - Harvard University

[figure: columns A B C D on disk are loaded into memory, then one of two options]

option1: early tuple reconstruction/materialization into rows (A B C), then a row-store engine

option2: keep the columns as they are (A), then a column-store engine

Page 4: class 6 column stores 3 - Harvard University

working over fixed width & dense columns

select
j = 0;
for (i = 0; i < size; i++)
    if (column[i] > v)
        res[j++] = i;    /* keep the position of each qualifying value */

no function calls, no indirections, no auxiliary data, min ifs
easy to prefetch next data values

fetch
j = 0;
for (i = 0; i < size; i++)
    inter2[j++] = column[inter1[i]];    /* here size is the number of positions in inter1 */
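The select and fetch loops on this slide can be tried out with a small Python sketch. The function names `select_gt` and `fetch` and the sample columns are assumptions made for illustration; only the loop logic comes from the slide.

```python
def select_gt(column, v):
    """Scan a dense column and return the positions whose value is > v."""
    res = []
    for i in range(len(column)):
        if column[i] > v:
            res.append(i)
    return res


def fetch(column, ids):
    """Return the values of `column` at the given positions, in order."""
    return [column[i] for i in ids]


# Hypothetical sample data: two columns of the same table.
column_a = [3, 16, 56, 9, 11]
column_b = [12, 34, 75, 45, 49]

ids = select_gt(column_a, 10)    # positions of the qualifying A values
values_b = fetch(column_b, ids)  # the B values of the same rows
```

Because both columns are dense and fixed width, a position returned by the select is all the fetch needs to find the matching value in another column.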

Page 5: class 6 column stores 3 - Harvard University

[figure: plan: select A<10 -> IDs -> fetch B -> select B<20 -> IDs -> fetch C -> min C]

late tuple reconstruction/materialization
only reconstruct to present results

no need to assemble tuples
minimize memory footprint
minimize data we are moving up the memory hierarchy
but requires new processing engine

Page 6: class 6 column stores 3 - Harvard University

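select min(C) from R where A<10 & B<20

[figure: the plan select A<10 -> IDs -> fetch B -> select B<20 -> IDs -> fetch C -> min C over columns A B C D, drawn twice]

column-at-a-time: each operator works on a whole column

vector-at-a-time: each operator works on one vector (chunk) of a column at a time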
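As a rough sketch of the two execution styles for `select min(C) from R where A<10 & B<20`, the Python below runs the same plan once over whole columns and once over fixed-size chunks. The function names, the `vector_size` parameter, and the sample data are assumptions for illustration, not part of the slides.

```python
def min_c_column_at_a_time(a, b, c):
    """Each operator consumes and produces a full intermediate column."""
    ids = [i for i in range(len(a)) if a[i] < 10]       # select A<10
    b_vals = [b[i] for i in ids]                         # fetch B
    ids = [i for i, v in zip(ids, b_vals) if v < 20]     # select B<20
    c_vals = [c[i] for i in ids]                         # fetch C
    return min(c_vals) if c_vals else None               # min C


def min_c_vector_at_a_time(a, b, c, vector_size=4):
    """Same plan, but intermediates never exceed one vector of positions."""
    best = None
    for start in range(0, len(a), vector_size):
        end = min(start + vector_size, len(a))
        ids = [i for i in range(start, end) if a[i] < 10]  # select A<10
        ids = [i for i in ids if b[i] < 20]                # select B<20
        for i in ids:                                      # fetch C, running min
            if best is None or c[i] < best:
                best = c[i]
    return best


# Hypothetical sample data.
A = [5, 12, 7, 9, 30, 2, 8, 40, 1, 6]
B = [15, 3, 25, 19, 4, 18, 30, 2, 11, 50]
C = [70, 10, 5, 42, 1, 33, 8, 2, 64, 9]

result_columns = min_c_column_at_a_time(A, B, C)
result_vectors = min_c_vector_at_a_time(A, B, C)
```

Both variants return the same answer; the difference is the size of the intermediates each operator materializes, which is what decides whether they stay in cache.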

Page 7: class 6 column stores 3 - Harvard University

Query and Query Plan (MAL Algebra)

select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40

Initial Status (row ids 1 to 10)

Relation R
Ra: 3 16 56 9 11 27 8 41 19 35
Rb: 12 34 75 45 49 58 97 75 42 55
Rc: 12 34 53 23 78 65 33 21 29 0

Relation S
Sa: 17 49 58 99 64 37 53 61 32 50
Sb: 11 35 62 44 29 78 19 81 26 23

plan steps and their intermediate results

1. inter1 = select(Ra,5,20) -> ids 2 4 5 7 9
2. inter2 = reconstruct(Rb,inter1) -> ids 2 4 5 7 9, values 34 45 49 97 42
3. inter3 = select(inter2,40,50) -> ids 4 5 9
4. join_input_R = reconstruct(Rc,inter3) -> ids 4 5 9, values 23 78 29
5. inter4 = select(Sa,50,65) -> ids 3 5 7 8 10
6. inter5 = reconstruct(Sb,inter4) -> ids 3 5 7 8 10, values 62 29 19 81 23
7. join_input_S = reverse(inter5) -> values 62 29 19 81 23, ids 3 5 7 8 10
8. join_res_R_S = join(join_input_R,join_input_S) -> (R id, S id) pairs (4,10) (9,5)
9. inter6 = voidTail(join_res_R_S) -> R ids 4 9
10. inter7 = reconstruct(Ra,inter6) -> values 9 19
11. result = sum(inter7) -> 28

Steps 3 and 5 use the corrected bounds shown on the slide, select(inter2,40,50) and select(Sa,50,65), in place of the select(inter2,30,40) and select(Sa,55,65) printed in the plan listing; the intermediate results match the corrected bounds. The 30<S.a<40 in the query text does not match either version of step 5.
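The eleven plan steps can be replayed with a minimal Python sketch. The column values are read off the slide's Initial Status figure, split so that they reproduce the intermediates shown there. The operator implementations are assumptions written for this example and are not MonetDB's MAL operators. In particular, `select` is assumed to include its lower bound and exclude its upper bound: the slide's data need the lower bound included (Sa = 50 at row 10 qualifies) but leave the upper bound open.

```python
def base(values):
    """A base column as (row id, value) pairs with dense 1-based ids."""
    return list(enumerate(values, start=1))


def select(col, low, high):
    """Ids whose value v satisfies low <= v < high (bounds are an assumption)."""
    return [rid for rid, v in col if low <= v < high]


def reconstruct(col, ids):
    """Fetch (id, value) pairs of `col` for the given ids, in id-list order."""
    lookup = dict(col)
    return [(rid, lookup[rid]) for rid in ids]


def reverse(col):
    """Swap head and tail: (id, value) becomes (value, id)."""
    return [(v, rid) for rid, v in col]


def join(left, right):
    """Match left tails with right heads; return (left id, right id) pairs."""
    index = {}
    for v, sid in right:
        index.setdefault(v, []).append(sid)
    return [(rid, sid) for rid, v in left for sid in index.get(v, [])]


def void_tail(col):
    """Drop the tail, keeping only the head ids."""
    return [head for head, _ in col]


Ra = base([3, 16, 56, 9, 11, 27, 8, 41, 19, 35])
Rb = base([12, 34, 75, 45, 49, 58, 97, 75, 42, 55])
Rc = base([12, 34, 53, 23, 78, 65, 33, 21, 29, 0])
Sa = base([17, 49, 58, 99, 64, 37, 53, 61, 32, 50])
Sb = base([11, 35, 62, 44, 29, 78, 19, 81, 26, 23])

inter1 = select(Ra, 5, 20)
inter2 = reconstruct(Rb, inter1)
inter3 = select(inter2, 40, 50)      # corrected bounds from the slide
join_input_R = reconstruct(Rc, inter3)
inter4 = select(Sa, 50, 65)          # corrected bounds from the slide
inter5 = reconstruct(Sb, inter4)
join_input_S = reverse(inter5)
join_res_R_S = join(join_input_R, join_input_S)
inter6 = void_tail(join_res_R_S)
inter7 = reconstruct(Ra, inter6)
result = sum(v for _, v in inter7)
```

Every intermediate is either a list of ids or a list of (id, value) pairs, so no full tuple of R or S is assembled at any point in the plan.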

Page 15: class 6 column stores 3 - Harvard University

update row7=(A=a,B=b,C=c,D=d)

row layout (A B C D stored together): cost 1 page

vs

column layout (A, B, C, D stored separately): cost N pages, N = # of columns

Page 16: class 6 column stores 3 - Harvard University

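[figure: base data (columns A B C D) next to pending updates (columns A B C D)]

update: goes to the pending updates
query: reads the base data and the pending updates
periodically: pending updates are merged into the base data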

Page 17: class 6 column stores 3 - Harvard University

relational table

          A    B    C
tuple 1   a1   b1   c1
tuple 2   a2   b2   c2
tuple 3   a3   b3   c3
tuple 4   a4   b4   c4
tuple 5   a5   b5   c5
tuple 6   a6   b6   c6

inserts, deletes affect whole table

Page 18: class 6 column stores 3 - Harvard University

[figure: column A with its pending inserts and pending deletes]

update = delete followed by insert

what information do we need to remember?
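One possible answer to the question above, as a minimal Python sketch: keep the base column untouched, remember inserted values and deleted positions on the side, let queries combine all three, and fold the pending changes in periodically. The class `Column` and its methods are hypothetical names for this example. As a simplification, deletes can only target base positions, not values that are still pending inserts.

```python
class Column:
    """Hypothetical column with read-optimized base data and pending updates."""

    def __init__(self, values):
        self.base = list(values)       # dense base data, position = row id
        self.pending_inserts = []      # values added since the last merge
        self.pending_deletes = set()   # base positions marked as deleted

    def insert(self, value):
        self.pending_inserts.append(value)

    def delete(self, position):
        self.pending_deletes.add(position)

    def update(self, position, value):
        # update = delete followed by insert
        self.delete(position)
        self.insert(value)

    def scan(self):
        """What a query sees: base minus pending deletes, plus pending inserts."""
        live = [v for i, v in enumerate(self.base)
                if i not in self.pending_deletes]
        return live + self.pending_inserts

    def merge(self):
        """Run periodically: fold the pending updates into the base data."""
        self.base = self.scan()
        self.pending_inserts = []
        self.pending_deletes = set()


col = Column([10, 20, 30])
col.insert(40)
col.update(1, 25)            # replaces the value at base position 1
before_merge = col.scan()
col.merge()
after_merge = col.scan()
```

In a full table the same positions would have to be tracked for every column of the row, which is why inserts and deletes affect the whole table.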

Page 19: class 6 column stores 3 - Harvard University

early vs late tuple reconstruction

Assume a column-store database with a table R(A,B,C,D,E). All attributes are integers. Our workload has two classes of queries:

1) select max(B),max(C),max(D),max(E) from R where A>v1
2) select B+C+D+E from R where A>v1

Should we use late or early tuple reconstruction plans? For each query, draw the 2 possible plans and the respective operators, explain which one is best and give the total cost.

Page 20: class 6 column stores 3 - Harvard University

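[figure: two plans for query 1, max(B), max(C), max(D), max(E)]

late TR: sel A -> IDs; then IDs -> fetch B -> max, IDs -> fetch C -> max, IDs -> fetch D -> max, and so on -> result

hybrid: sel A -> IDs -> fetch B, C, D, E together -> one operator computes max(B), max(C), max(D), max(E) -> result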
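The two plans for query 1 can be sketched in Python as follows. The function names and sample data are assumptions for illustration, and the sketch shows only the shape of each plan, not its cost.

```python
def select_ids(a, v1):
    """sel A: positions where A > v1."""
    return [i for i in range(len(a)) if a[i] > v1]


def query1_late(a, b, c, d, e, v1):
    """Late TR: one fetch and one max operator per column."""
    ids = select_ids(a, v1)
    if not ids:
        return None
    result = []
    for col in (b, c, d, e):
        fetched = [col[i] for i in ids]   # intermediate column
        result.append(max(fetched))
    return tuple(result)


def query1_hybrid(a, b, c, d, e, v1):
    """Hybrid: build (B, C, D, E) rows for qualifying ids, then one pass."""
    ids = select_ids(a, v1)
    if not ids:
        return None
    rows = [(b[i], c[i], d[i], e[i]) for i in ids]   # selective early reconstruction
    best = list(rows[0])
    for row in rows[1:]:
        for k, v in enumerate(row):
            if v > best[k]:
                best[k] = v
    return tuple(best)


# Hypothetical sample data for R(A,B,C,D,E).
A = [1, 8, 3, 9, 7]
B = [10, 20, 30, 40, 50]
C = [5, 4, 3, 2, 1]
D = [7, 7, 9, 1, 0]
E = [2, 6, 4, 8, 3]

late_result = query1_late(A, B, C, D, E, 5)
hybrid_result = query1_hybrid(A, B, C, D, E, 5)
```

Both plans read the same base columns; they differ in whether the four fetched columns are kept apart or glued into rows before the aggregation.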

Page 32: class 6 column stores 3 - Harvard University

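[figure: two plans for query 2, B+C+D+E]

late TR: sA -> IDs; IDs -> fetch B -> r1; IDs -> fetch C -> r2; +(r1,r2) -> r3; IDs -> fetch D -> r4; +(r3,r4) -> res; E follows the same pattern

hybrid: sA -> IDs -> fetch B, C, D, E together -> +(B,C,D,E) -> result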
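A minimal Python sketch of the two plans for query 2, with names and data assumed for illustration. The late plan materializes an intermediate column after every addition, while the hybrid plan adds the four values of each reconstructed row in a single pass.

```python
def query2_late(a, b, c, d, e, v1):
    """Late TR: pairwise additions over fetched columns."""
    ids = [i for i in range(len(a)) if a[i] > v1]   # sA
    r1 = [b[i] for i in ids]                        # fetch B
    r2 = [c[i] for i in ids]                        # fetch C
    r3 = [x + y for x, y in zip(r1, r2)]            # +(r1,r2)
    r4 = [d[i] for i in ids]                        # fetch D
    r5 = [x + y for x, y in zip(r3, r4)]            # +(r3,r4)
    r6 = [e[i] for i in ids]                        # fetch E
    return [x + y for x, y in zip(r5, r6)]          # final addition


def query2_hybrid(a, b, c, d, e, v1):
    """Hybrid: reconstruct (B, C, D, E) rows, then one +(B,C,D,E) pass."""
    ids = [i for i in range(len(a)) if a[i] > v1]   # sA
    rows = [(b[i], c[i], d[i], e[i]) for i in ids]
    return [bv + cv + dv + ev for bv, cv, dv, ev in rows]


# Hypothetical sample data for R(A,B,C,D,E).
A = [1, 8, 3, 9, 7]
B = [10, 20, 30, 40, 50]
C = [5, 4, 3, 2, 1]
D = [7, 7, 9, 1, 0]
E = [2, 6, 4, 8, 3]

late_sums = query2_late(A, B, C, D, E, 5)
hybrid_sums = query2_hybrid(A, B, C, D, E, 5)
```

The late plan writes three intermediate result columns (r3, r5 and the final one) where the hybrid plan writes one, which is the kind of extra pass over the data the exercise asks you to account for.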

Page 42: class 6 column stores 3 - Harvard University

default
late tuple reconstruction (rather safe choice)

open topic for optimization
when and how to do selective early reconstruction

issues to consider
transformation overhead
materialization overhead
extra passes over the data
it may be almost for free (sometimes in hash joins)
dynamic code generation to fit data layouts

DSM vs. NSM: CPU performance tradeoffs in block-oriented query processing
Marcin Zukowski, Niels Nes, Peter A. Boncz
International Workshop on Data Management on New Hardware (DaMoN), 2008

Page 43: class 6 column stores 3 - Harvard University

[figure: memory hierarchy below the CPU: registers, on chip cache, on board cache, memory, disk; labels: data, compute]

[figure: speed over time, one line for cpu and one for mem]

compression = less data to move, but more computation

Page 44: class 6 column stores 3 - Harvard University

[figure: row layout (A B C D together) next to column layout (A, B, C, D separate)]

which one gives better compression?

Page 45: class 6 column stores 3 - Harvard University

[figure: column A stored as codes, next to a dictionary for A]

can we do something like huffman coding?
any side-effects?

(check: Business Analytics in (a) Blink from readings)

Page 46: class 6 column stores 3 - Harvard University

fixed width is key…
ok, and how do we store variable length data?

dictionary of strings

fixed width codes that point to dictionary entries
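A small Python sketch of the idea, assuming a sorted dictionary and 2-byte unsigned codes (type code `"H"` of the standard `array` module). The function names and the sample strings are made up for this example.

```python
from array import array


def dictionary_encode(values):
    """Replace variable-length strings by fixed-width codes into a dictionary."""
    dictionary = sorted(set(values))                 # one entry per distinct string
    code_of = {s: code for code, s in enumerate(dictionary)}
    codes = array("H", (code_of[s] for s in values)) # fixed-width unsigned codes
    return dictionary, codes


def decode(dictionary, codes):
    """Follow each code back to its dictionary entry."""
    return [dictionary[code] for code in codes]


def select_equal(dictionary, codes, target):
    """Equality select that compares codes, not strings."""
    if target not in dictionary:
        return []
    code = dictionary.index(target)                  # one dictionary lookup
    return [i for i, c in enumerate(codes) if c == code]


# Hypothetical string column.
cities = ["boston", "athens", "boston", "cambridge", "athens", "boston"]

dictionary, codes = dictionary_encode(cities)
boston_ids = select_equal(dictionary, codes, "boston")
```

The code column is again dense and fixed width, so the tight select and fetch loops from earlier in the lecture apply to it unchanged. Sorting the dictionary is a design choice made here: it keeps code order equal to string order, so range predicates can also run on codes.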

Page 47: class 6 column stores 3 - Harvard University

essential column-stores features
virtual ids
late tuple reconstruction (if ever)
vectorized execution
compression
fixed-width columns

[figure: bar chart "Performance of Column-Oriented Optimizations"; y-axis: runtime (sec), 0 to 45; x-axis: Column Store, Row Store; legend: Baseline, –Late Materialization, –Compression, –Join Optimization, –Tuple-at-a-time]

Column-stores vs. row-stores: how different are they really?
D. Abadi, S. Madden, and N. Hachem
ACM SIGMOD Conference on Management of Data, 2008

Page 48: class 6 column stores 3 - Harvard University

[figure: columns A B C D on disk are loaded into memory, then one of two options]

option1: early tuple reconstruction/materialization into rows (A B C), then a row-store engine

option2: keep the columns as they are (A), then a column-store engine

Page 49: class 6 column stores 3 - Harvard University

but why now… weren’t all those design options obvious in the past as well?

moving data from disk
moving data from memory
computation

1) big memories
2) cpu vs memory speed

Page 50: class 6 column stores 3 - Harvard University

main-memory systems

optimized for the memory wall

with or without persistent data

Page 51: class 6 column stores 3 - Harvard University

other system categories
noSQL, new SQL, key-value stores, matlab, etc.

column-stores = bad name
modern systems

Page 52: class 6 column stores 3 - Harvard University

other data models
rdf, json, xml, arrays, sciences?

Page 53: class 6 column stores 3 - Harvard University

data layouts
column-storage
row-storage

H2O: A Hands-free Adaptive Store
Ioannis Alagiannis, Stratos Idreos, and Anastassia Ailamaki
In Proc. of the ACM SIGMOD Inter. Conference on Management of Data, 2014

Page 54: class 6 column stores 3 - Harvard University

first part done: basic concepts in modern systems

coming up: indexing and fast scans

Page 55: class 6 column stores 3 - Harvard University

reading

The Design and Implementation of Modern Column-store Database Systems (Sections: all -4.6 & 4.8)
by D. Abadi, P. Boncz, S. Harizopoulos, S. Idreos, S. Madden

IEEE Data Engineering Bulletin, 35(1), March 2012
Special Issue on Column-stores (9 short overview papers)

Page 56: class 6 column stores 3 - Harvard University

research papers

Integrating compression and execution in column-oriented database systems
Daniel J. Abadi, Samuel Madden, Miguel Ferreira
In Proc. of the ACM SIGMOD Inter. Conference on Management of Data, 2006

Updating a cracked database
Stratos Idreos, Martin Kersten, Stefan Manegold
In Proc. of the ACM SIGMOD Inter. Conference on Management of Data, 2007

Positional update handling in column stores
Sándor Héman, Marcin Zukowski, Niels J. Nes, Lefteris Sidirourgos, Peter A. Boncz
In Proc. of the ACM SIGMOD Inter. Conference on Management of Data, 2010

Column-stores vs. row-stores: how different are they really?
D. Abadi, S. Madden, and N. Hachem
In Proc. of the ACM SIGMOD Inter. Conference on Management of Data, 2008

Page 57: class 6 column stores 3 - Harvard University

DATA SYSTEMS

prof. Stratos Idreos

class 6

column-stores 3.0