Opaque: A Secure Distributed Data Analytics Framework Wenting Zheng, Ankur Dave, Jethro Beekman, Raluca Ada Popa, Joseph Gonzalez, Ion Stoica

Opaque: A Data Analytics Platform with Strong Security: Spark Summit East talk by Wenting Zheng

Apr 12, 2017

Page 1: Opaque: A Data Analytics Platform with Strong Security: Spark Summit East talk by Wenting Zheng

Opaque: A Secure Distributed Data Analytics Framework

Wenting Zheng, Ankur Dave, Jethro Beekman, Raluca Ada Popa, Joseph Gonzalez, Ion Stoica

Complex analytics run on sensitive data

[Figure: client; cloud provider running Spark SQL, MLlib, GraphX, and Spark Streaming; sensitive data]

Threat model

[Figure: client; cloud provider; sensitive data]

Challenge: protect data and preserve functionality

Opaque*: secure data analytics

* Oblivious Platform for Analytic QUEries

[Figure: Spark SQL and Opaque; workloads: SQL, Machine Learning, Graph Analytics]

Prior work

• Computation on encrypted data
– A cryptographic approach using homomorphic encryption
– Either impractically slow (FHE), or limited functionality (CryptDB)

• Hardware-based systems
– Use trusted hardware
– Only single machine computation (Haven), or weaker security guarantees (VC3)

Opaque utilizes trusted hardware

Hardware enclaves

• Hardware-protected containers in presence of malicious OS
• Shielded execution
• Encrypted enclave memory
• Software attestation
• Example: Intel SGX, AMD memory encryption

[Figure: enclave (secret data, code); untrusted OS]

System initialization

• Remote attestation to load Opaque code
• Key exchange protocol
• This is NOT on a per-query basis

[Figure: client; server with database; Opaque SQL operators; hash]
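
The initialization steps above can be sketched roughly as follows. This is a simplified illustration, not Opaque's or SGX's actual protocol: real remote attestation relies on a hardware-signed report and a real key exchange is a full protocol, whereas this sketch only compares a SHA-256 hash of the loaded code against the value the client expects and then releases a key once. The names `OPAQUE_CODE`, `measure`, and `attest_and_provision` are hypothetical.

```python
import hashlib
import hmac
import secrets

# Hypothetical stand-in for the operator code the server loads into the enclave.
OPAQUE_CODE = b"opaque sql operators v1"


def measure(code: bytes) -> str:
    """Hash of the loaded code, i.e. the 'hash' reported back to the client."""
    return hashlib.sha256(code).hexdigest()


def attest_and_provision(reported_hash: str, expected_hash: str) -> bytes:
    """Client side: release a data key only if the reported hash matches."""
    if not hmac.compare_digest(reported_hash, expected_hash):
        raise RuntimeError("attestation failed: unexpected enclave code")
    # Happens once at initialization, not once per query.
    return secrets.token_bytes(16)


expected = measure(OPAQUE_CODE)
session_key = attest_and_provision(measure(OPAQUE_CODE), expected)
```

Because the key is provisioned once, later queries can reuse it without repeating attestation, which is the point of the "NOT on a per-query basis" bullet.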

Query execution

query = SELECT sum(*) FROM table

[Figure: client sends the query to the server; Spark driver (Opaque, Catalyst, scheduler); database; build sequence of values: 1 2 3, then 10 13 4, then 27]
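
A minimal sketch of the flow in the figure, reading its values 10, 13, 4 and 27 as per-partition partial sums and their total (an interpretation, since the slides give only the numbers). The partition contents in `PARTITIONS` are hypothetical, chosen only so the partial sums match.

```python
from typing import List, Tuple

# Hypothetical partition contents; only the partial sums 10, 13, 4 come from the slides.
PARTITIONS: List[List[int]] = [[1, 2, 3, 4], [6, 7], [4]]


def run_sum_query(partitions: List[List[int]]) -> Tuple[List[int], int]:
    """Sum each partition independently, then combine the partial sums."""
    partials = [sum(rows) for rows in partitions]
    return partials, sum(partials)


partials, total = run_sum_query(PARTITIONS)
print(partials, total)  # [10, 13, 4] 27

# If one partial result never arrives, the combine step still succeeds silently.
_, short_total = run_sum_query(PARTITIONS[:2])
print(short_total)  # 23
```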

Problem: cloud can alter distributed computation

• Drop data
• Modify data
• Skip task
• Replay old state

Example: drop data

query = SELECT sum(*) FROM table

[Figure: client; server; Spark driver (Opaque, Catalyst, scheduler); database; build sequence of values: 1 2 3, then 10 13 4, then 10 13, then 23]

Self-verifying computation

Invariant: if computation does not abort, the execution completed so far is correct

If the computation is complete, then the entire query was executed correctly

query = SELECT sum(*) FROM table

[Figure: Task 13, Task 14, Task 15, Task 20; node 20 labeled with 13, 14, 15; values 10, 13, 4]
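
One way such a check could work is sketched below. The slides do not spell out Opaque's actual mechanism, so this is an assumption: each upstream task authenticates its output together with its task ID, and the final task aborts unless it holds a valid output from every task it expects. `KEY`, `emit`, and `final_sum` are hypothetical names, and replay protection is left out.

```python
import hashlib
import hmac

# Hypothetical key shared by the enclaves; the untrusted cloud never sees it.
KEY = b"hypothetical-shared-enclave-key"


def tag(task_id: int, value: int) -> bytes:
    """MAC binding a partial result to the task that produced it."""
    msg = f"{task_id}:{value}".encode()
    return hmac.new(KEY, msg, hashlib.sha256).digest()


def emit(task_id: int, value: int):
    """Output of an upstream task: (task ID, partial sum, MAC)."""
    return (task_id, value, tag(task_id, value))


def final_sum(expected_ids, messages) -> int:
    """Final task: abort on modified or missing inputs, otherwise sum them."""
    seen = {}
    for task_id, value, mac in messages:
        if not hmac.compare_digest(mac, tag(task_id, value)):
            raise RuntimeError("abort: modified data")
        seen[task_id] = value
    if set(seen) != set(expected_ids):
        raise RuntimeError("abort: missing or unexpected task output")
    return sum(seen.values())


outputs = [emit(13, 10), emit(14, 13), emit(15, 4)]
print(final_sum([13, 14, 15], outputs))  # 27
```

With this check, dropping the output of task 15 makes the final task abort instead of returning 23.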

Problem: access pattern leakage

medical table:

ID     Name               Age  Disease
12809  Amanda D. Edwards  40   Diabetes
29489  Robert R. McGowan  56   Diabetes
13744  Kimberly R. Seay   51   Cancer
18740  Dennis G. Bates    32   Diabetes
98329  Ronald S. Ogden    53   Cancer
32591  Donna R. Bridges   26   Diabetes

SELECT count(*) FROM medical GROUP BY disease

Attack viable for both memory and network access patterns!
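
A small sketch of how the leak can arise, assuming a conventional shuffle that routes each row to a destination chosen by its group key (the slides do not detail the attack). Even if every row is encrypted, an observer counting rows per destination recovers the group sizes. `observed_traffic` is a hypothetical helper.

```python
from collections import Counter

# Disease column of the medical table, in table order.
DISEASES = ["Diabetes", "Diabetes", "Cancer", "Diabetes", "Cancer", "Diabetes"]


def observed_traffic(group_keys):
    """Rows per destination, as seen by someone watching a key-routed shuffle."""
    destinations = {}   # group key -> destination index (hidden from the observer)
    traffic = Counter()  # destination index -> number of rows (visible)
    for key in group_keys:
        dest = destinations.setdefault(key, len(destinations))
        traffic[dest] += 1
    return dict(traffic)


print(observed_traffic(DISEASES))  # {0: 4, 1: 2}
```

The counts 4 and 2 are exactly the histogram the GROUP BY query computes, visible without decrypting anything.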

Oblivious mode

Oblivious primitives
• intra-machine o-sort
• inter-machine o-sort
• random permutation

Opaque operators
• project-filter
• low-cardinality agg.
• sort-merge join
• broadcast join
• …

Oblivious operators
• Oblivious Filter
• Oblivious Sort
• Oblivious Aggregation
• Oblivious Join

Oblivious Query Plan

Query optimization
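
To illustrate what makes a sort oblivious, the sketch below uses odd-even transposition sort as a stand-in; the slides do not name the algorithm behind the o-sort primitives, so this choice is an assumption. The sequence of positions compared depends only on the input length, never on the values. Plain Python gives no real side-channel guarantees, so this shows the access pattern only.

```python
def oblivious_sort(values):
    """Sort with a fixed compare-exchange schedule; also return that schedule."""
    a = list(values)
    n = len(a)
    trace = []  # positions touched, in order
    for rnd in range(n):
        # Even rounds compare (0,1), (2,3), ...; odd rounds compare (1,2), (3,4), ...
        for i in range(rnd % 2, n - 1, 2):
            trace.append((i, i + 1))
            lo, hi = min(a[i], a[i + 1]), max(a[i], a[i + 1])
            a[i], a[i + 1] = lo, hi  # always write both slots
    return a, trace


sorted_a, trace_a = oblivious_sort(["Diabetes", "Diabetes", "Cancer", "Diabetes"])
sorted_b, trace_b = oblivious_sort(["Cancer", "Cancer", "Cancer", "Diabetes"])
print(sorted_a)
print(trace_a == trace_b)
```

Two inputs of the same length produce the same trace, so watching which positions are touched reveals nothing about the data.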

Oblivious aggregation

SELECT count(*) FROM medical GROUP BY disease

[Figure: the six rows of the medical table (ID, …, disease) pass through Map, Oblivious sort, Scan (statistics), Boundary processing (result size, offset), and a second Scan; intermediate values "2; 1" and "2; 2"; partial outputs Cancer:2, Diabetes:3, Diabetes:1, DUMMY; final result Cancer:2, Diabetes:4]
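
A single-partition simplification of the sort-then-scan idea, offered as a sketch rather than Opaque's implementation: it skips the boundary processing between partitions shown in the figure and uses Python's built-in `sorted` where an oblivious sort would go. The scan writes exactly one output record per input row, a real count at the end of each group and a dummy otherwise, so the intermediate output size does not depend on how rows are distributed across groups. `DUMMY` and `oblivious_count` are hypothetical names.

```python
DUMMY = ("DUMMY", 0)


def oblivious_count(keys):
    """Return one record per input row: (key, count) at group ends, DUMMY elsewhere."""
    rows = sorted(keys)  # stand-in for an oblivious sort
    out = []
    count = 0
    for i, key in enumerate(rows):
        count += 1
        is_last = i == len(rows) - 1 or rows[i + 1] != key
        out.append((key, count) if is_last else DUMMY)
        if is_last:
            count = 0
    return out


diseases = ["Diabetes", "Diabetes", "Cancer", "Diabetes", "Cancer", "Diabetes"]
padded = oblivious_count(diseases)
final = [record for record in padded if record != DUMMY]
print(len(padded))  # 6
print(final)        # [('Cancer', 2), ('Diabetes', 4)]
```

The last filtering step is shown in the clear for readability; a real system would have to remove the dummies without revealing where they were.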

Page 126: Opaque: A Data Analytics Platform with Strong Security: Spark Summit East talk by Wenting Zheng

Opaque modes

• Encryption mode
  – Data encryption and authentication
  – Computation is integrity protected
  – Snapshot attacker, e.g. external hacker
• Oblivious mode
  – Additionally hide access patterns
  – Persistent attacker, e.g. insider

Trade-off: performance and security
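To make "hide access patterns" concrete, here is a minimal Python sketch of a data-independent sort. It uses odd-even transposition sort, a simple sorting network chosen for brevity; it is not claimed to be the oblivious sort Opaque uses. The function names and the sample values are made up for illustration. The point is that the sequence of index pairs touched depends only on the input length, never on the values.

```python
def compare_exchange(a, i, j):
    """Put the smaller of a[i], a[j] at index i and the larger at index j."""
    lo, hi = min(a[i], a[j]), max(a[i], a[j])
    a[i], a[j] = lo, hi


def transposition_sort(values):
    """Sort with a fixed compare-exchange schedule.

    Returns the sorted list and the trace of index pairs that were accessed.
    """
    a = list(values)
    n = len(a)
    trace = []
    for rnd in range(n):
        # Even rounds touch (0,1), (2,3), ...; odd rounds touch (1,2), (3,4), ...
        for i in range(rnd % 2, n - 1, 2):
            trace.append((i, i + 1))
            compare_exchange(a, i, i + 1)
    return a, trace


# Hypothetical inputs of the same length but different contents.
ages_a = [52, 31, 47, 29, 63, 38]
ages_b = [1, 2, 3, 4, 5, 6]

sorted_a, trace_a = transposition_sort(ages_a)
sorted_b, trace_b = transposition_sort(ages_b)
print(sorted_a)
print(trace_a == trace_b)
```

An observer who only sees which positions are read and written learns nothing about the values from this schedule. The price is extra work compared with a data-dependent sort, which is the performance and security trade-off named on the slide. Python's `min` and `max` are not constant-time, so the sketch models the access pattern only.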

Page 137: Opaque: A Data Analytics Platform with Strong Security: Spark Summit East talk by Wenting Zheng

Query optimization - oblivious

SELECT count(*) FROM medical WHERE age > 30 GROUP BY disease

[Figure: oblivious query plan over the medical table, with operators Project-filter, Obliv. sort, Filter, Low-card. obliv. agg., Scan, Obliv. sort, Aggregate. In the optimized plan the Obliv. sort listed between Project-filter and Filter is removed.]

Reduced # of oblivious sorts by 1

Page 146: Opaque: A Data Analytics Platform with Strong Security: Spark Summit East talk by Wenting Zheng

Query optimization - mixed sensitivity

Patient (P_): P_ID, NAME, AGE, D_ID
Treatment Plan (TP_): T_ID, PID, START_DATE, END_DATE, DOCTOR, COMMENT
Treatment Record (TR_): TR_ID, T_ID, M_ID, DATE, START_TIME, END_TIME, DOSAGE, COMMENT
Disease (D_): D_ID, NAME, G_ID
Medication (M_): M_ID, NAME, D_ID, COST
Gene (G_): G_ID, NAME, COMMENT

SELECT p_name, d_name, med_cost
FROM patient, disease,
     (SELECT d_id, min(cost) AS med_cost FROM medication GROUP BY d_id) AS med
WHERE disease.d_id = patient.d_id AND disease.d_id = med.d_id

|P| < |D| < |M|

[Figure: two join trees over Patient, Disease and γ(Medication), labeled "SQL join order" and "Opaque join order". In the SQL join order, Patient ⨝ Disease is evaluated first and the result is joined with γ(Medication).]
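The effect of join order under mixed sensitivity can be sketched in Python. Everything below is an illustrative assumption rather than Opaque's optimizer. The slide does not say which tables are sensitive, so the sketch assumes only `patient` is. It assumes the Opaque join order joins the two non-sensitive tables first. The rule that a join must run obliviously whenever either input is sensitive, with the result inheriting that sensitivity, is a simplification. `Table` and `evaluate` are hypothetical names, and `med` stands for the aggregated medication subquery.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    name: str
    sensitive: bool


def evaluate(plan):
    """Walk a join plan given as nested (left, right) tuples of Table leaves.

    Returns the resulting Table and the number of joins that had at least
    one sensitive input (assumed here to require oblivious execution).
    """
    if isinstance(plan, Table):
        return plan, 0
    left, n_left = evaluate(plan[0])
    right, n_right = evaluate(plan[1])
    # Sensitivity propagates: the output is sensitive if either input is.
    sensitive = left.sensitive or right.sensitive
    joined = Table(f"({left.name} JOIN {right.name})", sensitive)
    return joined, n_left + n_right + int(sensitive)


# Assumed sensitivity labels (not stated on the slide).
patient = Table("patient", sensitive=True)
disease = Table("disease", sensitive=False)
med = Table("med", sensitive=False)

sql_order = ((patient, disease), med)     # Patient JOIN Disease first
opaque_order = (patient, (disease, med))  # assumed: non-sensitive tables first

_, sql_oblivious_joins = evaluate(sql_order)
result, opaque_oblivious_joins = evaluate(opaque_order)
print(sql_oblivious_joins, opaque_oblivious_joins)
```

Under these assumptions the first order needs two oblivious joins and the second needs one, because the join of the two non-sensitive tables can run without oblivious operators. A cost model based only on the table sizes |P| < |D| < |M| would not capture this.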

Page 159: Opaque: A Data Analytics Platform with Strong Security: Spark Summit East talk by Wenting Zheng

Evaluation setup

• Single machine experiments
  – Intel Xeon E3-1280 v5, 4 cores, 64 GB RAM
  – Intel SGX: 128 MB of enclave page cache (EPC)
  – Hardware mode
• Distributed experiments
  – EC2: five r3.2xlarge instances, 8 cores, 61 GB RAM
  – Simulation mode only

Page 162: Opaque: A Data Analytics Platform with Strong Security: Spark Summit East talk by Wenting Zheng

Evaluation

• How does Opaque compare to Spark SQL?
  – Big Data Benchmark (BDB)
    • Queries 1, 2, 3: filter, aggregation, join
    • 1 million records
• How does Opaque compare to state-of-the-art oblivious systems?
  – GraphSC (graph analytics)
    • PageRank

Page 170: Opaque: A Data Analytics Platform with Strong Security: Spark Summit East talk by Wenting Zheng

Big Data Benchmark (encryption mode)

[Figure: two bar charts, Single machine and Distributed, comparing Spark SQL and Opaque on Query 1, Query 2 and Query 3. x-axis: query number; y-axis: run time (s), log scale from 0.01 to 100.]

With very little cost, you will have data encryption, authentication and computation protection!

Page 179: Opaque: A Data Analytics Platform with Strong Security: Spark Summit East talk by Wenting Zheng

Big Data Benchmark (oblivious mode)

[Figure: two bar charts, Single machine and Distributed, comparing Spark SQL and Opaque on Query 1, Query 2 and Query 3. x-axis: query number; y-axis: run time (s), log scale from 0.01 to 100.]

Page 186: Opaque: A Data Analytics Platform with Strong Security: Spark Summit East talk by Wenting Zheng

PageRank: comparison with GraphSC

Page 187: Opaque: A Data Analytics Platform with Strong Security: Spark Summit East talk by Wenting Zheng

We have an NSDI 2017 paper!

Page 188: Opaque: A Data Analytics Platform with Strong Security: Spark Summit East talk by Wenting Zheng

Open source release

• Available at github.com/ucbrise/opaque
• Opaque is implemented as a Spark package
• Features
  – Supports DataFrame select, filter, group by, join
  – Allows users to specify DataFrames in encryption/oblivious modes
    • Automatic sensitivity propagation in mixed sensitivity

Page 195: Opaque: A Data Analytics Platform with Strong Security: Spark Summit East talk by Wenting Zheng

Open source release

• Extension
  – More functionality requires rewriting operators in C++
  – No UDF support yet
  – Possible solutions
    • Automatically generate C++
    • Run JVM in the enclave
• Deployment
  – Master must be trusted
  – SGX available now on Skylake processors
    • Cloud providers have no support yet

Page 206: Opaque: A Data Analytics Platform with Strong Security: Spark Summit East talk by Wenting Zheng

Demo

Page 207: Opaque: A Data Analytics Platform with Strong Security: Spark Summit East talk by Wenting Zheng

Conclusion

Opaque is a secure distributed analytics platform

[Figure: Opaque, with SQL, Machine Learning and Graph Analytics]

Try it out at github.com/ucbrise/opaque

Wenting Zheng - [email protected]