C2D: Conclave Cloud Dataverse

Privacy-Preserving Scientific Data Analysis in an Open Cloud

Mayank Varia, Andrei Lapets, Ata Turk, Orran Krieger, Robert Bartlett Baron, Ben Getchell, Nicolas Haddad, Parul Singh

Privacy-Preserving Scientific Data Analysis in an Open Cloud C2D: Conclave Cloud Dataverse · 2018-10-01 · •Capability to manage slack resources on MOC ... Dataverse Mass Open

Jul 24, 2020




Data Utility vs Data Privacy

• Companies in Massachusetts want to compute average salary differences across genders, ethnicities, etc., without exposing the average salary of any individual company

• Tier-1 trauma centers in Boston want to generate aggregate reports about the cases they handle without revealing any patient data
  • E.g., how many trauma cases they treated in the aftermath of the Boston Marathon bombing

• Researchers in hospitals want to generate aggregate statistics about rare diseases across multiple hospitals without revealing patient data

• Companies want to run data analytics in the public cloud but do not want to trust any single public cloud provider


Mass Open Cloud

• Multi-vendor public cloud datacenter

• Collaborative effort: 5 universities, government, industry


Dataverse

• Open-source platform for data repositories

• Mechanisms to control access

• Incentives to share data and to credit its use


[Diagram: Dataverse, Mass Open Cloud, Conclave (MPC)]


Images: Facebook, Wikipedia

Toxic: silo data → safeguard privacy

Valuable: share data → new social insights


Data is both toxic and valuable: MPC enables secure data analysis for social good
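To make "secure data analysis" concrete, here is a minimal sketch of additive secret sharing, one classic MPC building block, applied to the salary example from the "Data Utility vs Data Privacy" slide. It is an illustration under stated assumptions, not Conclave's protocol or backend: all parties are simulated in one process, parties are assumed semi-honest (they follow the protocol but are curious), the modulus `MODULUS` is an assumed value large enough that real totals never wrap around, and the payroll and headcount figures are made up.

```python
import secrets

# Assumed public modulus; must exceed any true total so the sum never wraps.
MODULUS = 2**61 - 1


def share(value: int, n_parties: int) -> list[int]:
    """Split `value` into n random shares that sum to `value` mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(private_inputs: list[int]) -> int:
    """Simulate n parties jointly computing the sum of their private inputs.

    Party i sends share j of its input to party j, so every party only ever
    sees uniformly random values, never another party's actual input.
    """
    n = len(private_inputs)
    # all_shares[i][j] is the share that party i sends to party j.
    all_shares = [share(x, n) for x in private_inputs]
    # Each party j locally adds up the shares it received.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)]
    # Publishing the partial sums reveals only the grand total.
    return sum(partial_sums) % MODULUS


# Hypothetical payroll totals (dollars) and headcounts for three companies.
payroll = [5_200_000, 3_900_000, 7_100_000]
headcount = [52, 41, 66]

total_pay = secure_sum(payroll)
total_staff = secure_sum(headcount)
print("average salary across all companies:", total_pay / total_staff)
```

Only the combined payroll and combined headcount are ever reconstructed, so the cross-company average is learned while no single company's own average is revealed. A real deployment would also need authenticated channels between parties and protections against malicious participants, which this sketch omits.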


Conclave: MPC for relational queries on big data

MPC query compilation from (unannotated) relational queries

• Static analysis to minimize MPC use while maintaining security

• Trust annotations to indicate when data sharing in the clear is acceptable for even better performance

Prototype implementation that:

• Connects to existing backend data stacks such as Spark and Hadoop

• Scales four orders of magnitude beyond most MPC engines (to the ~100 GB range)

Code at https://github.com/cici-conclave
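The idea of minimizing MPC use with static analysis and trust annotations can be illustrated with a toy placement pass. The sketch below is hypothetical and much simpler than a real MPC query compiler; it is not Conclave's compiler or API. It assumes each base relation is labeled with the set of parties allowed to see it in the clear (its owner plus anyone named in a trust annotation). It propagates that set through a topologically ordered plan by intersection, and it sends an operator to MPC only when no single party is trusted with all of its inputs. The names `Op` and `plan_placement` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Op:
    """One relational operator in a query plan (hypothetical representation)."""
    name: str
    inputs: list[str]  # names of the relations this operator reads
    output: str        # name of the relation it produces


def plan_placement(trust: dict[str, set[str]], ops: list[Op]) -> dict[str, str]:
    """Assign each operator to cleartext execution at one party, or to MPC.

    `trust` maps every base relation to the parties allowed to see it in the
    clear: its owner plus any parties named in a trust annotation.
    `ops` must be listed in topological order (inputs before consumers).
    """
    trust = dict(trust)  # copy so the caller's mapping is not modified
    placement = {}
    for op in ops:
        # A party can evaluate the operator in the clear only if it is trusted
        # with every input; the output inherits that same set of parties.
        allowed = set.intersection(*(trust[r] for r in op.inputs))
        trust[op.output] = allowed
        # Pick the alphabetically first trusted party so results are deterministic.
        placement[op.name] = f"local:{min(allowed)}" if allowed else "mpc"
    return placement


# Two companies, A and B, each own one salary table.
query = [
    Op("filter_a", ["salaries_a"], "fa"),  # touches only A's data
    Op("filter_b", ["salaries_b"], "fb"),  # touches only B's data
    Op("concat", ["fa", "fb"], "all"),     # combines both parties' data
    Op("avg", ["all"], "result"),          # consumes the combined relation
]

owners_only = {"salaries_a": {"A"}, "salaries_b": {"B"}}
print(plan_placement(owners_only, query))

# Trust annotation: B allows A to see its table in the clear.
annotated = {"salaries_a": {"A"}, "salaries_b": {"A", "B"}}
print(plan_placement(annotated, query))
```

Without annotations, the two filters run locally in the clear at their owners and only `concat` and `avg` fall back to MPC. With B's annotation, A is trusted with every input, so the whole plan runs in the clear at A. This shows how a trust annotation can trade a small amount of disclosure for much better performance.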


The C2D framework

• The C2D framework runs on containers

• Each container stores data owned by a single project

• Containers never share data with each other

• Built an OpenShift / Kubernetes (K8s) container orchestration product with:
  • A built-in job framework
  • The capability to manage slack resources on the MOC

• Integration with Elastic Secure Infrastructure to build trusted, secure bare-metal enclaves for the participating parties is ongoing

• Demo video at https://youtu.be/_vEJmd_rO-0


Benefits of integration

Benefit: Bring cryptographically secure computing to where the data live


Benefit: Leverage the unique open cloud environment to improve performance


Synergistic payoff: Separate the responsibilities and amortize the effort of each expert (developers, IT staff, privacy experts, etc.)


Thanks!
