column stores 3.0
prof. Stratos Idreos
HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/
class 6
CS165, Fall 2015 Stratos Idreos /33
research tuesday coming up
9/29 6:30pm Pierce 213 + blackboard (chocolate included)
wilson, class 15
alex, class 16
& stratos
can we scan big data without even touching the data?
can we do joins without doing joins?
no handouts + recording

[figure: columns A B C D on disk, loaded into memory in one of two ways]
option1: early tuple reconstruction/materialization (A B C), then a row-store engine
option2: keep the columns as they are, processed by a column-store engine

working over fixed width & dense columns

select:
for (i = 0, j = 0; i < size; i++)
    if (column[i] > v)
        res[j++] = i;

fetch:
for (i = 0, j = 0; i < size; i++)
    inter2[j++] = column[inter1[i]];

no function calls, no indirections, no auxiliary data, minimal ifs
easy to prefetch the next data values
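The two loops above can be written out as small C functions. This is only a sketch: the names `select_gt` and `fetch`, the `int` value type, and the `size_t` position type are assumptions made for illustration, not something the slides prescribe.

```c
#include <stddef.h>

/* Select: write the positions i with column[i] > v into res.
   res must have room for size entries. Returns the number of hits. */
size_t select_gt(const int *column, size_t size, int v, size_t *res) {
    size_t j = 0;
    for (size_t i = 0; i < size; i++)
        if (column[i] > v)
            res[j++] = i;
    return j;
}

/* Fetch: gather the values of column at the given positions into out. */
void fetch(const int *column, const size_t *pos, size_t npos, int *out) {
    for (size_t i = 0; i < npos; i++)
        out[i] = column[pos[i]];
}
```

Both functions are tight loops over dense arrays: the select reads its column sequentially, and the fetch reads a sorted position list sequentially while gathering from the second column.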

[figure: plan over columns A, B, C: A<10 -> IDs -> fetch B -> B<20 -> IDs -> fetch C -> min]
late tuple reconstruction/materialization
only reconstruct to present results
no need to assemble tuples
minimize memory footprint
minimize data we are moving up the memory hierarchy
but requires new processing engine

select min(C) from R where A<10 & B<20
[figure: the plan A<10 -> IDs -> fetch B -> B<20 -> IDs -> fetch C -> min, drawn twice over columns A B C D: column-at-a-time vs vector-at-a-time]
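To make the two execution styles concrete, here is a sketch of the query `select min(C) from R where A<10 & B<20` in both forms. The function names, the tiny `VECTOR_SIZE`, and the use of `INT_MAX` for "no qualifying row" are assumptions for illustration only.

```c
#include <limits.h>
#include <stddef.h>

#define VECTOR_SIZE 4  /* tiny on purpose, for illustration only */

/* Column-at-a-time: each step runs over its whole input and materializes
   the full intermediate result before the next step starts.
   ids1 and ids2 must each have room for n positions. */
int min_c_column_at_a_time(const int *A, const int *B, const int *C,
                           size_t n, size_t *ids1, size_t *ids2) {
    size_t n1 = 0, n2 = 0;
    for (size_t i = 0; i < n; i++)      /* select A < 10 -> IDs */
        if (A[i] < 10) ids1[n1++] = i;
    for (size_t i = 0; i < n1; i++)     /* fetch B by IDs, select B < 20 */
        if (B[ids1[i]] < 20) ids2[n2++] = ids1[i];
    int min = INT_MAX;
    for (size_t i = 0; i < n2; i++)     /* fetch C by IDs, aggregate */
        if (C[ids2[i]] < min) min = C[ids2[i]];
    return min;                         /* INT_MAX if nothing qualifies */
}

/* Vector-at-a-time: the same operators, applied to one small chunk of
   positions at a time, so the intermediates stay small. */
int min_c_vector_at_a_time(const int *A, const int *B, const int *C, size_t n) {
    size_t ids1[VECTOR_SIZE], ids2[VECTOR_SIZE];
    int min = INT_MAX;
    for (size_t start = 0; start < n; start += VECTOR_SIZE) {
        size_t end = start + VECTOR_SIZE < n ? start + VECTOR_SIZE : n;
        size_t n1 = 0, n2 = 0;
        for (size_t i = start; i < end; i++)
            if (A[i] < 10) ids1[n1++] = i;
        for (size_t i = 0; i < n1; i++)
            if (B[ids1[i]] < 20) ids2[n2++] = ids1[i];
        for (size_t i = 0; i < n2; i++)
            if (C[ids2[i]] < min) min = C[ids2[i]];
    }
    return min;
}
```

Both functions return the same answer. The difference is the size of the intermediates: whole-column position lists in the first, at most `VECTOR_SIZE` positions in the second.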

Query and Query Plan (MAL Algebra)

select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40

Initial Status

Relation R
pos  Ra  Rb  Rc
1     3  12  12
2    16  34  34
3    56  75  53
4     9  45  23
5    11  49  78
6    27  58  65
7     8  97  33
8    41  75  21
9    19  42  29
10   35  55   0

Relation S
pos  Sa  Sb
1    17  11
2    49  35
3    58  62
4    99  44
5    64  29
6    37  78
7    53  19
8    61  81
9    32  26
10   50  23

Plan and intermediate results (positions are 1-based; pairs are (position, value) unless noted)
1. inter1 = select(Ra,5,20)                          -> positions 2, 4, 5, 7, 9
2. inter2 = reconstruct(Rb,inter1)                   -> (2,34) (4,45) (5,49) (7,97) (9,42)
3. inter3 = select(inter2,40,50)                     -> positions 4, 5, 9
4. join_input_R = reconstruct(Rc,inter3)             -> (4,23) (5,78) (9,29)
5. inter4 = select(Sa,50,65)                         -> positions 3, 5, 7, 8, 10
6. inter5 = reconstruct(Sb,inter4)                   -> (3,62) (5,29) (7,19) (8,81) (10,23)
7. join_input_S = reverse(inter5)                    -> (value, position): (62,3) (29,5) (19,7) (81,8) (23,10)
8. join_res_R_S = join(join_input_R,join_input_S)    -> (R position, S position): (4,10) (9,5)
9. inter6 = voidTail(join_res_R_S)                   -> positions 4, 9
10. inter7 = reconstruct(Ra,inter6)                  -> 9, 19
11. result = sum(inter7)                             -> 28

Steps 3 and 5 use the corrected ranges shown on the slide, select(inter2,40,50) and select(Sa,50,65); they match the intermediate results drawn in the figure. The range on S.a therefore differs from the 30<S.a<40 written in the query text.
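The eleven plan steps can be traced in code. The sketch below hard-codes the column values as read from the slide's figure, uses 0-based array indexes (slide position p is index p-1), and treats select bounds as inclusive, which is an assumption chosen because it reproduces the intermediates drawn on the slide. The helper names are hypothetical, and a nested-loop join stands in for whatever join algorithm a real engine would choose.

```c
#include <stddef.h>

#define N 10

/* Column values as read from the slide; slide position p is index p-1. */
static const int Ra[N] = {3, 16, 56, 9, 11, 27, 8, 41, 19, 35};
static const int Rb[N] = {12, 34, 75, 45, 49, 58, 97, 75, 42, 55};
static const int Rc[N] = {12, 34, 53, 23, 78, 65, 33, 21, 29, 0};
static const int Sa[N] = {17, 49, 58, 99, 64, 37, 53, 61, 32, 50};
static const int Sb[N] = {11, 35, 62, 44, 29, 78, 19, 81, 26, 23};

/* select on a base column: keep positions whose value lies in [lo, hi]. */
static size_t select_col(const int *col, size_t n, int lo, int hi, size_t *pos) {
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (col[i] >= lo && col[i] <= hi) pos[k++] = i;
    return k;
}

/* select on an intermediate (positions + values): keep qualifying positions. */
static size_t select_inter(const size_t *pos, const int *val, size_t n,
                           int lo, int hi, size_t *out) {
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (val[i] >= lo && val[i] <= hi) out[k++] = pos[i];
    return k;
}

/* reconstruct: fetch the column values at the given positions. */
static void reconstruct(const int *col, const size_t *pos, size_t n, int *val) {
    for (size_t i = 0; i < n; i++) val[i] = col[pos[i]];
}

/* Runs steps 1-11 of the plan and returns sum(R.a). */
int run_plan(void) {
    size_t inter1[N], inter3[N], inter4[N], inter6[N];
    int inter2[N], join_input_R[N], inter5[N], inter7[N];

    size_t n1 = select_col(Ra, N, 5, 20, inter1);                  /* step 1 */
    reconstruct(Rb, inter1, n1, inter2);                           /* step 2 */
    size_t n3 = select_inter(inter1, inter2, n1, 40, 50, inter3);  /* step 3 */
    reconstruct(Rc, inter3, n3, join_input_R);                     /* step 4 */
    size_t n4 = select_col(Sa, N, 50, 65, inter4);                 /* step 5 */
    reconstruct(Sb, inter4, n4, inter5);                           /* steps 6-7 */

    /* Steps 8-9: nested-loop join on R.c = S.b, keeping only R positions.
       With this data each R row matches at most one S row, so N slots suffice. */
    size_t n6 = 0;
    for (size_t i = 0; i < n3; i++)
        for (size_t j = 0; j < n4; j++)
            if (join_input_R[i] == inter5[j]) inter6[n6++] = inter3[i];

    reconstruct(Ra, inter6, n6, inter7);                           /* step 10 */
    int sum = 0;
    for (size_t i = 0; i < n6; i++) sum += inter7[i];              /* step 11 */
    return sum;
}
```

Every operator consumes and produces plain arrays of positions or values; no tuple is ever assembled, which is the point of the late reconstruction plan.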

update row7=(A=a,B=b,C=c,D=d)
row-store (A B C D stored together) vs column-store (A, B, C, D stored separately)
row-store cost: 1 page
column-store cost: N pages, N=# of columns
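
[figure: columns A B C D as base data, plus pending updates; an update goes to the pending updates, a query reads both base data and pending updates, and the pending updates are merged into the base data periodically]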

relational table

          A   B   C   …
tuple 1   a1  b1  c1
tuple 2   a2  b2  c2
tuple 3   a3  b3  c3
tuple 4   a4  b4  c4
tuple 5   a5  b5  c5
tuple 6   a6  b6  c6

inserts, deletes affect whole table

[figure: column A with pending inserts and pending deletes]
update = delete followed by insert
what information do we need to remember?
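One possible answer, sketched under assumptions: remember the new values for pending inserts and the base positions for pending deletes, and let every query combine them with the base column. The struct layout, the sorted delete list, and the name `column_sum` are all hypothetical choices for illustration, not the design of any particular system.

```c
#include <stddef.h>

/* One column together with its pending updates (hypothetical layout). */
typedef struct {
    const int *base;        /* dense, fixed-width base data */
    size_t nbase;
    const int *inserts;     /* pending inserts: new values not yet in base */
    size_t ninserts;
    const size_t *deletes;  /* pending deletes: base positions, sorted ascending */
    size_t ndeletes;
} column_t;

/* Sum over the live values: base minus pending deletes, plus pending inserts. */
long column_sum(const column_t *c) {
    long sum = 0;
    size_t d = 0;  /* cursor into the sorted delete list */
    for (size_t i = 0; i < c->nbase; i++) {
        if (d < c->ndeletes && c->deletes[d] == i) {
            d++;            /* position i is deleted: skip it */
            continue;
        }
        sum += c->base[i];
    }
    for (size_t i = 0; i < c->ninserts; i++)
        sum += c->inserts[i];
    return sum;
}
```

An update of the value at base position p is then one entry in `deletes` (p) plus one entry in `inserts` (the new value). The sketch does not handle deleting a value that is itself still a pending insert; that case needs extra bookkeeping.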

early vs late tuple reconstruction
Assume a column-store database with a table R(A,B,C,D,E). All attributes are integers. Our workload has two classes of queries:
1) select max(B),max(C),max(D),max(E) from R where A>v1
2) select B+C+D+E from R where A>v1
Should we use late or early tuple reconstruction plans? For each query, draw the 2 possible plans and the respective operators, explain which one is best and give the total cost.
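
query 1: max(B), max(C), max(D), max(E)
[figure, late TR plan: sel A -> IDs; then per column: IDs + B -> max, IDs + C -> max, IDs + D -> max, ... -> result]
[figure, hybrid plan: sel A -> IDs; IDs + B, C, D, E -> max(B), max(C), max(D), max(E) -> result]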

query 2: B+C+D+E
[figure, late TR plan: sA -> IDs; IDs + B -> r1; IDs + C -> r2; +(r1,r2) -> r3; IDs + D -> r4; +(r3,r4) -> ... -> res]
[figure, hybrid plan: sA -> IDs; IDs + B, C, D, E -> +(B,C,D,E) -> result]

default: late tuple reconstruction (rather safe choice)
open topic for optimization: when and how to do selective early reconstruction
issues to consider:
transformation overhead
materialization overhead
extra passes over the data
it may be almost for free (sometimes in hash joins)
dynamic code generation to fit data layouts

DSM vs. NSM: CPU performance tradeoffs in block-oriented query processing
Marcin Zukowski, Niels Nes, Peter A. Boncz
International Workshop on Data Management on New Hardware (DaMoN) 2008

[figure: memory hierarchy: CPU, registers, on chip cache, on board cache, memory, disk]
[figure: speed over time, cpu vs mem]
compression = data & computation
but

[figure: columns A B C D stored in two layouts, side by side]
which one gives better compression?

can we do something like huffman coding? any side-effects?
[figure: column A and its dictionary]
(check: Business Analytics in (a) Blink from readings)
fixed width is key… ok and how do we store variable length data?
dictionary of strings
fixed width codes that point to dictionary entries
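A minimal sketch of this idea, assuming one-byte codes and a linear-scan dictionary (both are simplifications picked for brevity; the names `dict_t`, `dict_encode`, `encode_column`, and `select_eq` are hypothetical):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_DICT 256  /* one-byte codes: at most 256 distinct strings */

typedef struct {
    const char *entries[MAX_DICT];  /* dictionary: code -> string */
    size_t count;
} dict_t;

/* Return the code of s, adding s to the dictionary if it is new.
   Returns -1 if the dictionary is full. Only the pointer is stored,
   so the caller must keep s alive. */
int dict_encode(dict_t *d, const char *s) {
    for (size_t i = 0; i < d->count; i++)
        if (strcmp(d->entries[i], s) == 0) return (int)i;
    if (d->count == MAX_DICT) return -1;
    d->entries[d->count] = s;
    return (int)d->count++;
}

/* Encode a column of strings into fixed-width one-byte codes.
   Returns 0 on success, -1 if the dictionary overflows. */
int encode_column(dict_t *d, const char *const *values, size_t n, uint8_t *codes) {
    for (size_t i = 0; i < n; i++) {
        int code = dict_encode(d, values[i]);
        if (code < 0) return -1;
        codes[i] = (uint8_t)code;
    }
    return 0;
}

/* Equality selection on the codes: the constant is encoded once,
   then the scan compares fixed-width integers only. */
size_t select_eq(const uint8_t *codes, size_t n, uint8_t code, size_t *res) {
    size_t j = 0;
    for (size_t i = 0; i < n; i++)
        if (codes[i] == code) res[j++] = i;
    return j;
}
```

The encoded column is again a dense, fixed-width array, so the same tight select and fetch loops apply; the strings are touched only when a result is decoded through `entries`.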

essential column-stores features
virtual ids
late tuple reconstruction (if ever)
vectorized execution
compression
fixed-width columns

[figure: Performance of Column-Oriented Optimizations; y-axis: Runtime (sec), 0 to 45; x-axis: Column Store, Row Store; legend: Baseline, –Late Materialization, –Compression, –Join Optimization, –Tuple-at-a-time]
Column-stores vs. row-stores: how different are they really?
D. Abadi, S. Madden, and N. Hachem
ACM SIGMOD Conference on Management of Data, 2008

[figure, repeated: columns A B C D on disk, loaded into memory in one of two ways]
option1: early tuple reconstruction/materialization (A B C), then a row-store engine
option2: keep the columns as they are, processed by a column-store engine

but why now… weren't all those design options obvious in the past as well?
moving data from disk
moving data from memory
computation
1) big memories
2) cpu vs memory speed
main-memory systems
optimized for the memory wall
with or without persistent data

other system categories
noSQL, new SQL, key-value stores, matlab, etc.
column-stores = bad name
modern systems

other data models
rdf, json, xml, arrays, sciences?

data layouts
column-storage
row-storage
…
H2O: A Hands-free Adaptive Store
Ioannis Alagiannis, Stratos Idreos, and Anastassia Ailamaki
In Proc. of the ACM SIGMOD Inter. Conference on Management of Data, 2014
first part done: basic concepts in modern systems
coming up: indexing and fast scans

reading

The Design and Implementation of Modern Column-store Database Systems (Sections: all -4.6 & 4.8)
by D. Abadi, P. Boncz, S. Harizopoulos, S. Idreos, S. Madden

IEEE Data Engineering Bulletin, 35(1), March 2012
Special Issue on Column-stores (9 short overview papers)

research papers

Integrating compression and execution in column-oriented database systems
Daniel J. Abadi, Samuel Madden, Miguel Ferreira
In Proc. of the ACM SIGMOD Inter. Conference on Management of Data, 2006

Updating a cracked database
Stratos Idreos, Martin Kersten, Stefan Manegold
In Proc. of the ACM SIGMOD Inter. Conference on Management of Data, 2007

Positional update handling in column stores
Sándor Héman, Marcin Zukowski, Niels J. Nes, Lefteris Sidirourgos, Peter A. Boncz
In Proc. of the ACM SIGMOD Inter. Conference on Management of Data, 2010

Column-stores vs. row-stores: how different are they really?
D. Abadi, S. Madden, and N. Hachem
In Proc. of the ACM SIGMOD Inter. Conference on Management of Data, 2008

DATA SYSTEMS
prof. Stratos Idreos
class 6
column-stores 3.0