IntegriDB: Verifiable SQL for Outsourced Databases
Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou
University of Maryland
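What is a verifiable database?
[Figure: owner, server, and client. The owner outsources the database to the server and produces a digest δ. The client sends an SQL query to the server and receives a result together with a proof. An update turns the database into database' and the digest δ into δ'. The example table has columns ID, name, age.]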
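The roles in the diagram can be sketched with a deliberately naive scheme, assuming the digest is a SHA-256 hash of the whole table and the "proof" is simply the table itself. This is not IntegriDB's construction (its proofs are far smaller than the database); the function names `digest`, `prove`, `verify` and the sample rows are hypothetical and only illustrate who computes what.

```python
import hashlib
import json

def digest(rows):
    """Owner side: hash a canonical serialization of the table."""
    blob = json.dumps(sorted(rows), separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()

def prove(rows, predicate):
    """Server side: answer the query; the naive proof is the full table."""
    result = [r for r in rows if predicate(r)]
    return result, list(rows)

def verify(delta, predicate, result, proof):
    """Client side: check the proof against the digest, then recompute."""
    if digest(proof) != delta:
        return False
    return result == [r for r in proof if predicate(r)]

# Hypothetical table with columns (ID, name, age).
db = [(1, "a", 12), (2, "b", 24)]
older_than_20 = lambda r: r[2] > 20

delta = digest(db)
result, proof = prove(db, older_than_20)
honest_ok = verify(delta, older_than_20, result, proof)
forged_ok = verify(delta, older_than_20, [(1, "a", 12)], proof)

# An update changes both the database and the digest.
db_updated = db + [(3, "c", 30)]
delta_updated = digest(db_updated)
```

In this toy the proof is as large as the database and verification is as slow as running the query locally, which is exactly what a practical scheme has to avoid.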
Our contributions
IntegriDB: A system for Verifiable SQL
• Expressive: multidimensional RANGE, JOIN, SUM, COUNT, MAX, MIN, etc.; limited nesting
  Validated on native SQL queries from the TPC-H benchmark
• Dynamic: Efficient updates
• Scalable: Validated on tables from TPC-H benchmark (6 million rows)
Example
Query #19 of the TPC-H benchmark (http://www.tpc.org/tpch):
SELECT SUM(l_extendedprice * (1 - l_discount)) AS revenue
FROM lineitem, part
WHERE
  ( p_partkey = l_partkey
    AND p_brand = 'Brand#41'
    AND p_container IN ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
    AND l_quantity >= 7 AND l_quantity <= 7 + 10
    AND p_size BETWEEN 1 AND 5
    AND l_shipmode IN ('AIR', 'AIR REG')
    AND l_shipinstruct = 'DELIVER IN PERSON' )
  OR
  ( p_partkey = l_partkey
    AND p_brand = 'Brand#14'
    AND p_container IN ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
    AND l_quantity >= 14 AND l_quantity <= 14 + 10
    AND p_size BETWEEN 1 AND 10
    AND l_shipmode IN ('AIR', 'AIR REG')
    AND l_shipinstruct = 'DELIVER IN PERSON' )
  OR
  ( p_partkey = l_partkey
    AND p_brand = 'Brand#23'
    AND p_container IN ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
    AND l_quantity >= 25 AND l_quantity <= 25 + 10
    AND p_size BETWEEN 1 AND 15
    AND l_shipmode IN ('AIR', 'AIR REG')
    AND l_shipinstruct = 'DELIVER IN PERSON' );
Executed on tables with 6 million rows (2.8GB)
Proof size: 184.16KB
Verification time: 232ms
Metrics
[Figure: the owner, server, and client diagram, annotated with the metrics below.]
• Setup time
• Prover time
• Proof size
"Optimal": O(|R|)
• Verification time
• Update time
Prior solutions
• Generic verifiable computation systems
  • Circuit-based VC [PHCM13, BCTV13, CFHKNPZ14]
  • RAM-based VC [BFRSBW13, BCGTV13, BCTV14]
  Not practical
• Authenticated data structures
  [Table: rows Tree-based [YPPK09], Signature-based [PZM09], Multi-range [PPT14], IntegriDB; columns Join, Multidim range, Functions, Nested queries, Update.]
Multidimensional range queries
Table T:
row_ID  student_ID  age  GPA  First_name
1       747         18   3.5  Alice
2       781         24   3.3  Bob
3       715         21   3.0  Cathy
4       721         20   3.7  David
SELECT * FROM T WHERE (T.age BETWEEN 17 AND 22) AND (T.student_ID > 720)
Result:
row_ID  student_ID  age  GPA  First_name
1       747         18   3.5  Alice
4       721         20   3.7  David
(age, row_ID) tree; each node carries the smallest age below it and the set of row_IDs below it:
  root:      18, {1,2,3,4}
  internal:  18, {1,4}    21, {2,3}
  leaves:    18, {1}    20, {4}    21, {3}    24, {2}
Result of age: {1,4} ∪ {3} = {1,3,4}
(student_ID, row_ID) tree:
  root:      715, {1,2,3,4}
  internal:  715, {3,4}    747, {1,2}
  leaves:    715, {3}    721, {4}    747, {1}    781, {2}
Result of student_ID: {4} ∪ {1,2} = {1,2,4}
Final result: {1,3,4} ∩ {1,2,4} = {1,4}
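The tree walk-through above can be sketched in a few lines, assuming a plain binary tree built bottom-up over the sorted (key, row_ID) pairs. The helper names `build` and `cover` are hypothetical; the sketch only reproduces the set algebra of the example and contains no cryptography.

```python
def build(pairs):
    """Build a binary tree over (key, row_id) pairs sorted by key.
    Each node is a tuple (min_key, max_key, row_ids, left, right)."""
    nodes = [(k, k, frozenset([r]), None, None) for k, r in sorted(pairs)]
    while len(nodes) > 1:
        merged = []
        for i in range(0, len(nodes), 2):
            if i + 1 < len(nodes):
                l, r = nodes[i], nodes[i + 1]
                merged.append((l[0], r[1], l[2] | r[2], l, r))
            else:
                merged.append(nodes[i])
        nodes = merged
    return nodes[0]

def cover(node, lo, hi):
    """Sets stored at the maximal nodes that lie entirely inside [lo, hi]."""
    if node is None or node[1] < lo or node[0] > hi:
        return []
    if lo <= node[0] and node[1] <= hi:
        return [node[2]]
    return cover(node[3], lo, hi) + cover(node[4], lo, hi)

# Table T restricted to (row_ID, student_ID, age).
T = [(1, 747, 18), (2, 781, 24), (3, 715, 21), (4, 721, 20)]
age_tree = build([(age, rid) for rid, _, age in T])
sid_tree = build([(sid, rid) for rid, sid, _ in T])

age_cover = cover(age_tree, 17, 22)            # age BETWEEN 17 AND 22
sid_cover = cover(sid_tree, 721, float("inf"))  # student_ID > 720

age_rows = frozenset().union(*age_cover)
sid_rows = frozenset().union(*sid_cover)
final_rows = age_rows & sid_rows
```

Each dimension contributes a logarithmic number of precomputed sets; the cost lies in proving the unions and the intersection of those sets, which is what the accumulator addresses.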
Not efficient!!
Bilinear accumulator [Nguyen05]: $\mathrm{acc}(A) = g^{\prod_{x \in A}(x + s)}$
(age, row_ID) tree with each set replaced by its accumulator:
  root:      18, acc({1,2,3,4})
  internal:  18, acc({1,4})    21, acc({2,3})
  leaves:    18, acc({1})    20, acc({4})    21, acc({3})    24, acc({2})
Final result: {1,3,4} ∩ {1,2,4} = {1,4}
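A toy version of the accumulator acc(A) = g^{∏_{x∈A}(x+s)} can make the idea concrete. The sketch below is an assumption-heavy illustration: it works in the multiplicative group modulo the Mersenne prime 2^127 - 1 rather than in a pairing-friendly group, and it lets the checker use the secret s directly. A real bilinear accumulator keeps s hidden and checks the same relation through a pairing. The names `exponent`, `subset_witness`, and `check_subset` are hypothetical.

```python
from math import prod

P = 2**127 - 1   # Mersenne prime used as a toy modulus (not pairing-friendly)
G = 3            # assumed base element
S = 123456789    # secret trapdoor s; exposed here only for illustration

def exponent(elems):
    """prod_{x in elems} (x + s), reduced modulo the group order P - 1."""
    return prod((x + S) % (P - 1) for x in elems) % (P - 1)

def acc(elems):
    """acc(A) = g^{prod_{x in A} (x + s)} in the toy group."""
    return pow(G, exponent(elems), P)

def subset_witness(A, B):
    """Witness that B is a subset of A: the accumulator of A minus B."""
    return acc(set(A) - set(B))

def check_subset(acc_A, B, witness):
    """Toy check: witness^{prod_{x in B} (x + s)} must equal acc(A)."""
    return pow(witness, exponent(B), P) == acc_A

A = {1, 2, 3, 4}
B = {1, 4}
acc_A = acc(A)
w = subset_witness(A, B)
subset_ok = check_subset(acc_A, B, w)
```

Whatever the size of A, `acc_A` is a single group element, so every tree node can store a constant-size value in place of its set.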
Join queries
Table T':
row_ID  student_ID  course_ID
1       715         ENEE140
2       779         ENEE150
3       781         ENEE340
SELECT * FROM T JOIN T' ON T.student_ID = T'.student_ID
Result:
student_ID  age  GPA  First_name  course_ID
715         21   3.0  Cathy       ENEE140
781         24   3.3  Bob         ENEE340
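The join result can be reproduced with Python's built-in `sqlite3` module. In this sketch the table T' is named `T2` (an assumption, since a prime is not a plain SQL identifier), and the last lines show that the matching rows are exactly those whose student_ID lies in the intersection of the two key columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE T (row_ID INTEGER, student_ID INTEGER, age INTEGER, "
    "GPA REAL, First_name TEXT)"
)
conn.execute("CREATE TABLE T2 (row_ID INTEGER, student_ID INTEGER, course_ID TEXT)")

t_rows = [
    (1, 747, 18, 3.5, "Alice"),
    (2, 781, 24, 3.3, "Bob"),
    (3, 715, 21, 3.0, "Cathy"),
    (4, 721, 20, 3.7, "David"),
]
t2_rows = [(1, 715, "ENEE140"), (2, 779, "ENEE150"), (3, 781, "ENEE340")]
conn.executemany("INSERT INTO T VALUES (?, ?, ?, ?, ?)", t_rows)
conn.executemany("INSERT INTO T2 VALUES (?, ?, ?)", t2_rows)

joined = conn.execute(
    "SELECT T.student_ID, T.age, T.GPA, T.First_name, T2.course_ID "
    "FROM T JOIN T2 ON T.student_ID = T2.student_ID "
    "ORDER BY T.student_ID"
).fetchall()

# The join keys that survive are the intersection of the two key columns.
key_overlap = {r[1] for r in t_rows} & {r[1] for r in t2_rows}
```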
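Supporting summation
We extend the bilinear accumulator to support summation.
New bilinear accumulator: $\mathrm{acc}'(A) = g^{\prod_{x \in A}(x^{-1} + s)}$
$\prod_{x \in A}(x^{-1} + s) = a_n s^n + \dots + a_1 s + a_0$
$= s^n + \dots + \left(\frac{1}{x_2 x_3 \cdots x_n} + \frac{1}{x_1 x_3 \cdots x_n} + \dots + \frac{1}{x_1 x_2 \cdots x_{n-1}}\right) s + \frac{1}{x_1 x_2 \cdots x_n}$
$a_1 / a_0 = x_1 + x_2 + \dots + x_n = \mathrm{sum}(A)$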
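The identity a_1 / a_0 = sum(A) can be checked numerically with exact rational arithmetic. The sketch below expands ∏_{x∈A}(x^{-1} + s) as a polynomial in s for the ages in table T; the helper name `expand` is hypothetical, and no group operations are involved.

```python
from fractions import Fraction

def expand(values):
    """Coefficients [a_0, a_1, ..., a_n] of prod_{x in values} (1/x + s)."""
    coeffs = [Fraction(1)]
    for x in values:
        inv = Fraction(1, x)
        nxt = [Fraction(0)] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            nxt[i] += c * inv   # contribution of the constant term 1/x
            nxt[i + 1] += c     # contribution of the s term
        coeffs = nxt
    return coeffs

ages = [18, 20, 21, 24]
a = expand(ages)
recovered_sum = a[1] / a[0]
```

For these ages `recovered_sum` equals 83, the value of SELECT SUM(age) FROM T, and the leading coefficient `a[-1]` equals 1.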
SQL SUM
SELECT SUM(age) FROM T: 83
(age, age) tree:
  root:      18, acc'({18,20,21,24})
  internal:  18, acc'({18,20})    21, acc'({21,24})
  leaves:    18, acc'({18})    20, acc'({20})    21, acc'({21})    24, acc'({24})
SQL functions
• MAX: SELECT MAX(age) FROM T WHERE T.GPA BETWEEN 3.0 AND 3.5: 24
  Reduces to the range query: SELECT age FROM T WHERE (T.GPA BETWEEN 3.0 AND 3.5) AND (T.age >= 24)
• COUNT: SELECT COUNT(*) FROM T WHERE T.age BETWEEN 17 AND 22: 3
  Reduces to: SELECT SUM(row_ID') - SUM(row_ID) FROM T WHERE T.age BETWEEN 17 AND 22
  where row_ID' = row_ID + 1 is an extra column (values 2, 3, 4, 5 for table T)
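Both reductions can be sketched on table T. This is one reading of the slide, not the paper's exact procedure: COUNT is obtained as the difference of two sums over a shifted row_ID column, and a claimed MAX is accepted when the range query with the extra condition age >= claimed returns only the claimed value. The function names are hypothetical.

```python
# Table T as (row_ID, student_ID, age, GPA, First_name).
T = [
    (1, 747, 18, 3.5, "Alice"),
    (2, 781, 24, 3.3, "Bob"),
    (3, 715, 21, 3.0, "Cathy"),
    (4, 721, 20, 3.7, "David"),
]

def count_via_sum(rows, age_lo, age_hi):
    """COUNT(*) computed as SUM(row_ID') - SUM(row_ID), row_ID' = row_ID + 1."""
    selected = [r for r in rows if age_lo <= r[2] <= age_hi]
    return sum(r[0] + 1 for r in selected) - sum(r[0] for r in selected)

def check_max(rows, claimed, gpa_lo, gpa_hi):
    """Accept `claimed` as MAX(age) over the GPA range if the query
    'GPA in range AND age >= claimed' is non-empty and returns only `claimed`."""
    hits = [r[2] for r in rows if gpa_lo <= r[3] <= gpa_hi and r[2] >= claimed]
    return len(hits) > 0 and all(age == claimed for age in hits)

count_17_22 = count_via_sum(T, 17, 22)
max_24_ok = check_max(T, 24, 3.0, 3.5)
max_21_ok = check_max(T, 21, 3.0, 3.5)
```

On table T, rows 1, 3, and 4 have ages in [17, 22], so the count is 3; the claim MAX = 24 is accepted and the claim MAX = 21 is rejected because Bob's age 24 also satisfies age >= 21.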
Nested queries
SELECT COUNT(student_ID) FROM (SELECT * FROM T WHERE age > 19) JOIN (SELECT * FROM T' WHERE row_ID > 1) ON T.student_ID = T'.student_ID
(T.age, T.student_ID) tree:
  root:      18, {715,721,747,781}
  internal:  18, {721,747}    21, {715,781}
  leaves:    18, {747}    20, {721}    21, {715}    24, {781}
(T'.row_ID, T'.student_ID) tree:
  root:                         1, {715,779,781}
  children of the root:         1, {715}    2, {779,781}
  children of (2, {779,781}):   2, {779}    3, {781}
Final result: COUNT(({721} ∪ {715,781}) ∩ {779,781}) = COUNT({781}) = 1
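Evaluating the nested query directly on the example tables gives a reference value for the set computation. The sketch uses Python's built-in `sqlite3`, keeps only the columns the query touches, and names T' as `T2` (an assumption, since a prime is not a plain SQL identifier); the subquery aliases `a` and `b` are likewise introduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (row_ID INTEGER, student_ID INTEGER, age INTEGER)")
conn.execute("CREATE TABLE T2 (row_ID INTEGER, student_ID INTEGER, course_ID TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [(1, 747, 18), (2, 781, 24), (3, 715, 21), (4, 721, 20)],
)
conn.executemany(
    "INSERT INTO T2 VALUES (?, ?, ?)",
    [(1, 715, "ENEE140"), (2, 779, "ENEE150"), (3, 781, "ENEE340")],
)

sql_count = conn.execute(
    "SELECT COUNT(a.student_ID) "
    "FROM (SELECT * FROM T WHERE age > 19) AS a "
    "JOIN (SELECT * FROM T2 WHERE row_ID > 1) AS b "
    "ON a.student_ID = b.student_ID"
).fetchone()[0]

# Same answer from the tree nodes: age > 19 is covered by {721} and {715,781},
# row_ID > 1 is covered by the single node {779,781}.
left = {721} | {715, 781}
right = {779, 781}
set_count = len(left & right)
```

Both routes give 1: only student 781 is older than 19 and appears in a T' row with row_ID greater than 1.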
Efficient update
Updates can be done in logarithmic time
See our paper for details
Implementation
[Figure: system architecture. Components: data owner (IntegriDB), client (SQL client, IntegriDB client), server (IntegriDB server, SQL server). Labeled flows: database, digest, ADS; SQL query; query; subqueries; intermediate results; result & proof; result.]
TPC-H #19
The query decomposes into:
• Join query (p_partkey = l_partkey)
• Multi-range on part (p_brand, p_container, p_size)
• Multi-range on lineitem (l_quantity, l_shipmode, l_shipinstruct)
Performance on query #19
lineitem: 6 million rows x 16 columns (2.8GB)
part: 200,000 rows x 9 columns (54MB)

Setup time  Prover time  Verification time  Proof size  Update time  Digest size
25272.76s   6422.13s     232ms              184.16KB    150ms        256 bits

Disadvantages:
• Slow setup time (one-time cost)
• Slow prover time
Advantages:
• Small digest size
• Fast verification time
• Small proof size
• Excellent update time
Expressiveness
• IntegriDB supports
  • 12 out of 22 queries in the TPC-H benchmark
  • 94% of the queries in the TPC-C benchmark
Future work
• Support a larger class of SQL queries
  • Join with duplicates in nested queries
  • Aggregations among columns
  • Arbitrary nesting
• Improve prover time
  • Parallelization
  • Better data structures, precomputation
  • Faster crypto primitives
Code available at http://integridb.github.io/
Thank you!!!
Comparison to generic VC
• Problems of generic circuit-based VC
  • Cannot support updates
  • Cannot support nested queries
  • All queries must be merged into one circuit
• Problems of generic RAM-based VC
  • Compiler of SQL queries introduces high overhead
  • All queries must be merged into one RAM program
• Our comparison: only 1 query, no compiler included for generic VC
Query: SUM over a 10-dimensional range query on a table with 1000 rows x 10 columns
SELECT SUM(c1) FROM T WHERE (c1 BETWEEN a1 AND b1) AND ... AND (c10 BETWEEN a10 AND b10)

                   Libsnark (circuit-based)  SNARKs for C (RAM-based)  IntegriDB
                   [BCTV13]                  [BCTV14]
setup time         187.96s                   2000s*                    13.878s
prover time        47.57s                    1000s*                    10.420s
verification time  8ms                       10ms*                     112ms
proof size         288 bytes                 288 bytes                 84KB
* estimation