IntegriDB: Verifiable SQL for Outsourced Databases Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou University of Maryland

Jan 18, 2016

Page 1:

IntegriDB: Verifiable SQL for Outsourced Databases

Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou

University of Maryland

Page 2:

What is a verifiable database?

[Diagram with three parties: owner, server, client. Labels: database and digest δ; SQL query; result and proof; update, giving database’ and digest δ’. A small example table (columns ID, name, age; two rows with ages 12 and 24) is shown, with its cell text truncated in the transcript.]

Page 3:

Our contributions

IntegriDB: A system for Verifiable SQL

• Expressive: multidimensional RANGE, JOIN, SUM, COUNT, MAX, MIN, etc.; limited nesting
  Validated on native SQL queries from TPC-H benchmark

• Dynamic: Efficient updates

• Scalable: Validated on tables from TPC-H benchmark (6 million rows)

Page 4:

Example

Query #19 of the TPC-H benchmark (http://www.tpc.org/tpch)

SELECT SUM (l_extendedprice * (1 - l_discount)) AS revenue
FROM lineitem, part
WHERE
  ( p_partkey = l_partkey
    AND p_brand = 'Brand#41'
    AND p_container IN ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
    AND l_quantity >= 7 AND l_quantity <= 7 + 10
    AND p_size BETWEEN 1 AND 5
    AND l_shipmode IN ('AIR', 'AIR REG')
    AND l_shipinstruct = 'DELIVER IN PERSON' )
  OR
  ( p_partkey = l_partkey
    AND p_brand = 'Brand#14'
    AND p_container IN ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
    AND l_quantity >= 14 AND l_quantity <= 14 + 10
    AND p_size BETWEEN 1 AND 10
    AND l_shipmode IN ('AIR', 'AIR REG')
    AND l_shipinstruct = 'DELIVER IN PERSON' )
  OR
  ( p_partkey = l_partkey
    AND p_brand = 'Brand#23'
    AND p_container IN ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
    AND l_quantity >= 25 AND l_quantity <= 25 + 10
    AND p_size BETWEEN 1 AND 15
    AND l_shipmode IN ('AIR', 'AIR REG')
    AND l_shipinstruct = 'DELIVER IN PERSON' );

Page 5:

Example

Query #19 of the TPC-H benchmark (http://www.tpc.org/tpch); query text as on page 4.

Executed on tables with 6 million rows (2.8GB)

Proof size: 184.16KB
Verification time: 232ms

Page 6:

Metrics

[Diagram as on page 2: owner, server, client; database, digest δ, SQL query, update, result and proof.]

• setup time
• prover time
• proof size. "Optimal": O(|R|)
• verification time
• update time

Page 7:

Prior solutions

• Generic verifiable computation systems
  • Circuit-based VC [PHCM13, BCTV13, CFHKNPZ14]
  • RAM-based VC [BFRSBW13, BCGTV13, BCTV14]
  Not practical

• Authenticated data structures

  [Comparison table; the per-cell marks are not preserved in the transcript.]
  Columns: Join, Multidim range, Functions, Nested queries, Update
  Rows: Tree-based [YPPK09], Signature-based [PZM09], Multi-range [PPT14], IntegriDB

Page 8:

Multidimensional range queries

Table T:

row_ID  student_ID  age  GPA  First_name
1       747         18   3.5  Alice
2       781         24   3.3  Bob
3       715         21   3.0  Cathy
4       721         20   3.7  David

Page 9:

Multidimensional range queries

Table T:

row_ID  student_ID  age  GPA  First_name
1       747         18   3.5  Alice
2       781         24   3.3  Bob
3       715         21   3.0  Cathy
4       721         20   3.7  David

SELECT * FROM T WHERE (T.age BETWEEN 17 AND 22) AND (T.student_ID > 720)

Result:

row_ID  student_ID  age  GPA  First_name
1       747         18   3.5  Alice
4       721         20   3.7  David

Page 10:

Multidimensional range queries

Table T: as on page 8.

SELECT * FROM T WHERE (T.age BETWEEN 17 AND 22) AND (T.student_ID > 720)

(age, row_ID):
  root:    18, {1,2,3,4}
  level 1: 18, {1,4}    21, {2,3}
  leaves:  18, {1}    20, {4}    21, {3}    24, {2}

Page 11:

Multidimensional range queries

Table T: as on page 8.

SELECT * FROM T WHERE (T.age BETWEEN 17 AND 22) AND (T.student_ID > 720)

(age, row_ID):
  root:    18, {1,2,3,4}
  level 1: 18, {1,4}    21, {2,3}
  leaves:  18, {1}    20, {4}    21, {3}    24, {2}

Result of age: {1,4} ∪ {3} = {1,3,4}

(student_ID, row_ID):
  root:    715, {1,2,3,4}
  level 1: 715, {3,4}    747, {1,2}
  leaves:  715, {3}    721, {4}    747, {1}    781, {2}

Result of student_ID: {4} ∪ {1,2} = {1,2,4}

Final result: {1,3,4} ∩ {1,2,4} = {1, 4}
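
A minimal sketch of the two-column range query above, assuming a balanced binary tree over the sorted column values in which each node stores the row_IDs beneath it. The names T, leaves_for and cover are illustrative and not taken from IntegriDB; the sketch only computes the sets and produces no proofs.

```python
from bisect import bisect_left, bisect_right

# Table T from the slides: row_ID -> column values.
T = {
    1: {"student_ID": 747, "age": 18},
    2: {"student_ID": 781, "age": 24},
    3: {"student_ID": 715, "age": 21},
    4: {"student_ID": 721, "age": 20},
}

def leaves_for(column):
    """Leaves (key, {row_ID}) of the tree for one column, sorted by key."""
    return sorted(((row[column], {rid}) for rid, row in T.items()),
                  key=lambda leaf: leaf[0])

def cover(leaves, lo, hi):
    """Maximal tree nodes whose leaves all satisfy lo <= key <= hi."""
    keys = [key for key, _ in leaves]
    left, right = bisect_left(keys, lo), bisect_right(keys, hi)  # leaves[left:right] match
    nodes = []

    def visit(a, b):  # the node spanning leaves[a:b]
        if a >= b or right <= a or b <= left:
            return  # no overlap with the query range
        if left <= a and b <= right:
            row_ids = set().union(*(ids for _, ids in leaves[a:b]))
            nodes.append((leaves[a][0], row_ids))
            return
        mid = (a + b) // 2
        visit(a, mid)
        visit(mid, b)

    visit(0, len(leaves))
    return nodes

age_nodes = cover(leaves_for("age"), 17, 22)
# student_ID > 720 on integer IDs is the closed range [721, infinity).
sid_nodes = cover(leaves_for("student_ID"), 721, float("inf"))

age_rows = set().union(*(ids for _, ids in age_nodes))
sid_rows = set().union(*(ids for _, ids in sid_nodes))
final_rows = age_rows & sid_rows
print(age_nodes, sid_nodes, final_rows)
```

Each range is answered by a small number of tree nodes, and the final answer is the intersection of the two unions.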

Page 12:

Multidimensional range queries

Table, trees and results as on page 11: the per-column results {1,3,4} and {1,2,4} are intersected to give the final result {1, 4}.

Not efficient!! The per-column results can be much larger than the final result.

Page 13:

Multidimensional range queries

(age, row_ID):
  root:    18, {1,2,3,4}
  level 1: 18, {1,4}    21, {2,3}
  leaves:  18, {1}    20, {4}    21, {3}    24, {2}

Bilinear accumulator [Nguyen05]:

$acc(A) = g^{\prod_{x \in A} (x + s)}$

Final result: {1,3,4} ∩ {1,2,4} = {1, 4}

Page 14:

Multidimensional range queries

(age, row_ID):
  root:    18, acc({1,2,3,4})
  level 1: 18, acc({1,4})    21, acc({2,3})
  leaves:  18, acc({1})    20, acc({4})    21, acc({3})    24, acc({2})

Final result: {1,3,4} ∩ {1,2,4} = {1, 4}

Bilinear accumulator [Nguyen05]:

$acc(A) = g^{\prod_{x \in A} (x + s)}$
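
The accumulator formula can be exercised with a toy stand-in. This is a sketch under loud assumptions: it works in a small subgroup of the integers modulo a prime instead of a pairing group, the parameters P, Q, G and S are made up for illustration, and the check uses the secret s directly where a real verifier would use a pairing equation without knowing s. It only illustrates the algebra of acc(A) = g^{∏(x+s)}.

```python
# Toy parameters (hypothetical, far too small to be secure).
P = 2039   # prime modulus, P = 2*Q + 1
Q = 1019   # prime order of the subgroup generated by G
G = 4      # a square modulo P, hence an element of order Q
S = 123    # the secret s; known here only because this is a toy

def exponent(elements):
    """prod_{x in elements} (x + s), reduced modulo the group order."""
    e = 1
    for x in elements:
        e = e * (x + S) % Q
    return e

def acc(elements):
    """acc(A) = g^{prod_{x in A} (x + s)} in the toy group."""
    return pow(G, exponent(elements), P)

def subset_witness(full_set, subset):
    """Accumulate the elements of full_set that are not in subset."""
    return acc(set(full_set) - set(subset))

def verify_subset(acc_full, subset, witness):
    # Toy check using s; stands in for a pairing-based check.
    return pow(witness, exponent(subset), P) == acc_full

A = {1, 3, 4}
digest = acc(A)
good = verify_subset(digest, {1, 4}, subset_witness(A, {1, 4}))
bad = verify_subset(digest, {2}, subset_witness(A, {1}))
print(good, bad)
```

The witness for a subset is the accumulation of the remaining elements, so raising it to the subset's exponent gives back acc(A) exactly when the claimed subset matches.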

Page 15:

Join queries

Table T:

row_ID  student_ID  age  GPA  First_name
1       747         18   3.5  Alice
2       781         24   3.3  Bob
3       715         21   3.0  Cathy
4       721         20   3.7  David

Table T’:

row_ID  student_ID  course_ID
1       715         ENEE140
2       779         ENEE150
3       781         ENEE340

SELECT * FROM T JOIN T’ ON T.student_ID = T’.student_ID

Result:

student_ID  age  course_ID  GPA  First_name
715         21   ENEE140    3.0  Cathy
781         24   ENEE340    3.3  Bob
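
One way to read this example is as a set intersection on the join column. The sketch below assumes student_ID is unique within each table, as it is in the example; the dictionary layout and variable names are illustrative only.

```python
# student_ID -> (age, GPA, First_name), from Table T.
T = {
    747: (18, 3.5, "Alice"),
    781: (24, 3.3, "Bob"),
    715: (21, 3.0, "Cathy"),
    721: (20, 3.7, "David"),
}
# student_ID -> course_ID, from Table T'.
T_prime = {715: "ENEE140", 779: "ENEE150", 781: "ENEE340"}

# The join keeps exactly the student_IDs present in both tables.
common_ids = sorted(T.keys() & T_prime.keys())

# Rows in the column order of the result: student_ID, age, course_ID, GPA, First_name.
joined = [(sid, T[sid][0], T_prime[sid], T[sid][1], T[sid][2]) for sid in common_ids]
print(joined)
```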

Page 16:

Supporting summation

We extend the bilinear accumulator to support summation.

New bilinear accumulator:

$acc'(A) = g^{\prod_{x \in A} (x^{-1} + s)}$

For $A = \{x_1, \dots, x_n\}$:

$\prod_{x \in A} (x^{-1} + s) = a_n s^n + \dots + a_1 s + a_0$

$= s^n + \dots + \left( \frac{1}{x_2 x_3 \cdots x_n} + \frac{1}{x_1 x_3 \cdots x_n} + \dots + \frac{1}{x_1 x_2 \cdots x_{n-1}} \right) s + \frac{1}{x_1 x_2 \cdots x_n}$

$a_1 / a_0 = x_1 + x_2 + \dots + x_n = sum(A)$
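
The identity a_1 / a_0 = sum(A) can be checked numerically. This is a small sketch using exact rational arithmetic; the function name coefficients is illustrative, and it assumes every element of A is nonzero so that 1/x exists.

```python
from fractions import Fraction

def coefficients(A):
    """Coefficients [a_0, a_1, ..., a_n] of prod_{x in A} (1/x + s) as a polynomial in s."""
    poly = [Fraction(1)]
    for x in A:
        inv = Fraction(1, x)
        nxt = [Fraction(0)] * (len(poly) + 1)
        for i, c in enumerate(poly):
            nxt[i] += c * inv   # multiply by the constant term 1/x
            nxt[i + 1] += c     # multiply by s
        poly = nxt
    return poly

ages = [18, 20, 21, 24]
a = coefficients(ages)
recovered_sum = a[1] / a[0]
print(recovered_sum)
```

The constant term a_0 is the product of the inverses, and a_1 collects the products that leave out one inverse each, so their ratio is the sum of the elements.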

Page 17:

SQL SUM

Table T:

row_ID  student_ID  age  GPA  First_name
1       747         18   3.5  Alice
2       781         24   3.3  Bob
3       715         21   3.0  Cathy
4       721         20   3.7  David

SELECT SUM(age) FROM T: 83

(age, age):
  root:    18, acc'({18,20,21,24})
  level 1: 18, acc'({18,20})    21, acc'({21,24})
  leaves:  18, acc'({18})    20, acc'({20})    21, acc'({21})    24, acc'({24})
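
A sketch of how the (age, age) tree could be built bottom-up, with plain Python sets standing in for the acc' values. It assumes distinct ages, as in the example, and labels each node with its smallest key to match the slide; build_levels is an illustrative name.

```python
def build_levels(values):
    """Levels of a binary tree over sorted values; each node is (smallest key, set of values)."""
    level = [(v, {v}) for v in sorted(values)]
    levels = [level]
    while len(level) > 1:
        merged = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]  # one or two sibling nodes
            merged.append((pair[0][0], set().union(*(vals for _, vals in pair))))
        level = merged
        levels.append(level)
    return levels

levels = build_levels([18, 24, 21, 20])  # ages in row order of Table T
root_key, root_set = levels[-1][0]
total = sum(root_set)
print(levels, total)
```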

Page 18:

SQL functions

Table T, with an extra column row_ID’ = row_ID + 1:

row_ID  row_ID’  student_ID  age  GPA  First_name
1       2        747         18   3.5  Alice
2       3        781         24   3.3  Bob
3       4        715         21   3.0  Cathy
4       5        721         20   3.7  David

SELECT MAX(age) FROM T WHERE T.GPA BETWEEN 3.0 AND 3.5: 24
  reduces to: SELECT age FROM T WHERE (T.GPA BETWEEN 3.0 AND 3.5) AND (T.age >= 24)

SELECT COUNT(*) FROM T WHERE T.age BETWEEN 17 AND 22: 3
  reduces to: SELECT SUM(row_ID’) - SUM(row_ID) FROM T WHERE T.age BETWEEN 17 AND 22
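
A sketch of the two reductions on this slide, run on the example table with plain Python lists. The helper select and the variable names are illustrative, and no proofs are involved; the point is only that MAX becomes a range query and COUNT becomes a difference of two sums.

```python
T = [
    {"row_ID": 1, "student_ID": 747, "age": 18, "GPA": 3.5},
    {"row_ID": 2, "student_ID": 781, "age": 24, "GPA": 3.3},
    {"row_ID": 3, "student_ID": 715, "age": 21, "GPA": 3.0},
    {"row_ID": 4, "student_ID": 721, "age": 20, "GPA": 3.7},
]

def select(rows, predicate):
    return [row for row in rows if predicate(row)]

# MAX(age) with GPA in [3.0, 3.5]: the claimed maximum is right when the range
# query "GPA in range AND age >= claim" returns at least one row, all equal to the claim.
claimed_max = 24
at_least_claim = select(T, lambda r: 3.0 <= r["GPA"] <= 3.5 and r["age"] >= claimed_max)
max_ok = len(at_least_claim) > 0 and all(r["age"] == claimed_max for r in at_least_claim)

# COUNT(*) with age in [17, 22], computed as SUM(row_ID') - SUM(row_ID),
# where row_ID' = row_ID + 1.
matching = select(T, lambda r: 17 <= r["age"] <= 22)
count = sum(r["row_ID"] + 1 for r in matching) - sum(r["row_ID"] for r in matching)
print(max_ok, count)
```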

Page 19:

Nested queries

Table T:

row_ID  student_ID  age  GPA  First_name
1       747         18   3.5  Alice
2       781         24   3.3  Bob
3       715         21   3.0  Cathy
4       721         20   3.7  David

Table T’:

row_ID  student_ID  course_ID
1       715         ENEE140
2       779         ENEE150
3       781         ENEE340

SELECT COUNT(student_ID) FROM (SELECT * FROM T WHERE age>19) JOIN (SELECT * FROM T’ WHERE row_ID >1) ON T.student_ID = T’.student_ID

(T.age, T.student_ID):
  root:    18, {715,721,747,781}
  level 1: 18, {721,747}    21, {715,781}
  leaves:  18, {747}    20, {721}    21, {715}    24, {781}

(T’.row_ID, T’.student_ID):
  root:    1, {715,779,781}
  level 1: 1, {715}    2, {779,781}
  leaves:  2, {779}    3, {781}

Page 20:

Nested queries

Tables and trees as on page 19.

SELECT COUNT(student_ID) FROM (SELECT * FROM T WHERE age>19) JOIN (SELECT * FROM T’ WHERE row_ID >1) ON T.student_ID = T’.student_ID

Final result: COUNT(({721} ∪ {715,781}) ∩ {779,781}) = COUNT({781}) = 1
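
The nested query can be cross-checked against the set computation with an in-memory SQLite database. In this sketch the table name T2 stands in for T’, and the aliases a and b are added so the query is valid SQL; both are assumptions of the sketch. The sets come from the tree nodes covering age > 19 and row_ID > 1.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE T (row_ID INTEGER, student_ID INTEGER, age INTEGER, GPA REAL, First_name TEXT);
INSERT INTO T VALUES
  (1, 747, 18, 3.5, 'Alice'), (2, 781, 24, 3.3, 'Bob'),
  (3, 715, 21, 3.0, 'Cathy'), (4, 721, 20, 3.7, 'David');
CREATE TABLE T2 (row_ID INTEGER, student_ID INTEGER, course_ID TEXT);
INSERT INTO T2 VALUES
  (1, 715, 'ENEE140'), (2, 779, 'ENEE150'), (3, 781, 'ENEE340');
""")

(sql_count,) = con.execute("""
SELECT COUNT(a.student_ID)
FROM (SELECT * FROM T WHERE age > 19) AS a
JOIN (SELECT * FROM T2 WHERE row_ID > 1) AS b
  ON a.student_ID = b.student_ID
""").fetchone()

# Same answer from the tree nodes: age > 19 is covered by {721} and {715,781};
# row_ID > 1 is covered by the node 2, {779,781}.
set_count = len(({721} | {715, 781}) & {779, 781})
print(sql_count, set_count)
con.close()
```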

Page 21:

Efficient update

Updates can be done in logarithmic time

See our paper for details
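
As a rough illustration of why logarithmic update time is plausible, the sketch below counts the nodes on one leaf-to-root path in a balanced binary tree. It assumes that changing one row only affects the nodes on such a path; this is an assumption of the sketch, not a description of the actual update procedure, which is in the paper.

```python
def path_length(n_leaves):
    """Number of nodes from a leaf to the root in a balanced binary tree over n_leaves leaves."""
    depth, span = 0, 1
    while span < n_leaves:
        span *= 2
        depth += 1
    return depth + 1  # count the leaf itself

small = path_length(4)          # the 4-row example table
large = path_length(6_000_000)  # the size of the TPC-H lineitem table
print(small, large)
```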

Page 22:

Implementation

[Architecture diagram. Components: SQL Client, IntegriDB Client, IntegriDB Server, SQL Server, IntegriDB Data Owner. Labels on the arrows and boxes: SQL query, IntegriDB query, subqueries, intermediate results, result & proof, result, digest, ADS, database.]

Page 23:

TPC-H #19 (query text as on page 4)

Page 25:

TPC-H #19 (query text as on page 4), annotated by query type:

• Join query (p_partkey = l_partkey)
• Multi-range on part (the p_brand, p_container and p_size conditions)
• Multi-range on lineitem (the l_quantity, l_shipmode and l_shipinstruct conditions)

Page 26:

Performance on query #19

Tables:
  lineitem: 6 million rows x 16 columns, 2.8GB
  part: 200,000 rows x 9 columns, 54MB

Setup time         25272.76s
Prover time        6422.13s
Verification time  232ms
Proof size         184.16KB
Update time        150ms
Digest size        256 bits

Disadvantages:
• Slow setup time (one-time cost)
• Slow prover time

Advantages:
• Small digest size
• Fast verification time
• Small proof size
• Excellent update time

Page 27:

Expressiveness

• IntegriDB supports
  • 12 out of 22 queries in TPC-H benchmark
  • 94% of the queries in TPC-C benchmark

Page 28:

Future work

• Support larger class of SQL queries
  • Join with duplicates in nested queries
  • Aggregations among columns
  • Arbitrary nesting

• Improve prover time
  • Parallelization
  • Better data structures, precomputation
  • Faster crypto primitives

Page 29:

Code available at http://integridb.github.io/

Thank you!!!

Page 30:

Comparison to generic VC

• Problems of generic circuit-based VC
  • Cannot support updates
  • Cannot support nested queries
  • All queries must be merged into one circuit

• Problems of generic RAM-based VC
  • Compiler of SQL queries introduces high overhead
  • All queries must be merged into one RAM program

• Our comparison: only 1 query, no compiler included for generic VC

Page 31:

Comparison to generic VC

Sum over a 10-dimensional range query on a 1000x10 table:

SELECT SUM(c1) FROM T WHERE (c1 BETWEEN a1 AND b1) AND ... AND (c10 BETWEEN a10 AND b10)

                   Libsnark (circuit-based) [BCTV13]   SNARKs for C (RAM-based) [BCTV14]   IntegriDB
setup time         187.96s                             2000s*                              13.878s
prover time        47.57s                              1000s*                              10.420s
verification time  8ms                                 10ms*                               112ms
proof size         288 Bytes                           288 Bytes                           84 KB

*estimation