GraphSC: Parallel Secure Computation Made Easy

Kartik Nayak

With Xiao Shaun Wang, Stratis Ioannidis, Udi Weinsberg, Nina Taft, Elaine Shi

Data Mining on User Data

[Diagram: users send data to a data mining engine, which produces a data model]

Privacy concern!

Companies Computing on Private Data

Graph representing social connections

Graph representing professional connections

Compute user’s influence in both circles

Companies want to run machine learning algorithms

Users/Companies do NOT want to reveal data

Can we enable this in practice?

Cryptography to the rescue: Secure Multiparty Computation

Ensures that we learn only the outcome

Key Challenges

1. Generic Solutions

Lot of work improving individual algorithms

Departure from one-at-a-time approach

2. Convert Program to Run on Secure Computation (cost of obliviousness)

3. Parallelizability

There’s a lot of data: maintain the benefits of parallelism in the insecure setting

With cryptography, computation is expensive

Key Contributions

1. Generic Framework for “Graph-parallel” Algorithms (Challenge: Generic Solutions)

Pregel by [logo]

PageRank

Matrix Factorization using gradient descent

Matrix Factorization using ALS

Risk Minimization using ADMM

And many more

2. Efficiently Convert Graph-parallel Programs to Oblivious Programs (Challenge: Convert program to run on Secure Computation)

Total work blowup is O(log |V|)

Blowup for naïve solution: O(|V|) for sparse graphs

3. Maintain Parallelizability (Challenge: Parallelizability)

Depth of the computation is O(log |V|)

Matrix Factorization, 4K ratings, 32 threads: 1.4 hours [NIWJTB’13] -> < 4 mins

Key Contributions

1. Generic Framework for Graph-parallel Algorithms

2. Efficiently Convert to Oblivious Programs

3. Maintain Parallelizability

Programmer’s favorite model

function bs(val, s, t)
    if (s >= t) return s;
    mid = (s + t) / 2;
    if (val < mem[mid]) return bs(val, s, mid)
    else return bs(val, mid+1, t)

Cryptographer’s favorite model

Programmer’s model: Programs

Oblivious Programs

Cryptographer’s model: Circuits

Intuitively, program traces should not depend on input data


Easy

Hard
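A minimal sketch of the trace intuition above, assuming a hypothetical sorted array `mem` and helper names (`binary_search_trace`, `linear_scan_trace`) that do not come from the talk. It records which memory indices each lookup reads: the binary search reads different indices for different inputs, while the full scan reads the same indices for every input, which is what a circuit needs.

```python
def binary_search_trace(mem, val):
    """Return the list of indices read while binary-searching for val."""
    trace = []
    s, t = 0, len(mem) - 1
    while s < t:
        mid = (s + t) // 2
        trace.append(mid)            # which index is read depends on val
        if val < mem[mid]:
            t = mid
        else:
            s = mid + 1
    return trace


def linear_scan_trace(mem, val):
    """Return (indices read, number of elements <= val) using a full scan."""
    trace = []
    count = 0
    for i in range(len(mem)):
        trace.append(i)              # every index is read, whatever val is
        count += int(mem[i] <= val)  # arithmetic update instead of a branch
    return trace, count


# Illustrative data only.
mem = [1, 3, 5, 7, 9, 11, 13, 15]
print(binary_search_trace(mem, 2), binary_search_trace(mem, 14))
print(linear_scan_trace(mem, 2)[0] == linear_scan_trace(mem, 14)[0])
```

The scan pays O(n) work instead of O(log n); that gap is one simple instance of the cost of obliviousness.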

Achieving Parallelism

Goal: Low Depth Circuits

Oblivious Parallel RAM [BCP’14]

Polylogarithmic Blowup: Not practical

GraphSC: O(log |V|) blowup
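To make "low depth" concrete, here is a small sketch of a computation whose depth is logarithmic in the input size. It is a generic parallel prefix sum, not necessarily the construction GraphSC uses, and the function name `prefix_sums_log_depth` is hypothetical. Each round reads only the previous round's values at fixed positions, so all updates within a round could run in parallel, and the positions touched do not depend on the data.

```python
def prefix_sums_log_depth(values):
    """Inclusive prefix sums computed in ceil(log2(n)) rounds.

    Returns (prefix sums, number of rounds). The loop over positions inside
    a round is written sequentially here, but its iterations are independent.
    """
    out = list(values)
    n = len(out)
    rounds = 0
    step = 1
    while step < n:
        prev = out[:]
        # Position i adds the value `step` places to its left, if there is one.
        out = [prev[i] + (prev[i - step] if i >= step else 0) for i in range(n)]
        step *= 2
        rounds += 1
    return out, rounds


print(prefix_sums_log_depth([1, 2, 3, 4, 5, 6, 7, 8]))
```

For 8 inputs this takes 3 rounds; the total work is O(n log n) while the depth is O(log n).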

“Graph-parallel” algorithms

Pregel by [logo]

[LGKB’10, GLGBG’12, MABDHLC’10, ZCF’10]

Graph-parallel Algorithms

[Figure: example graph on vertices A, B, C, D with numeric labels]

Scatter: Send data to edges

Gather: Aggregate data from edges

Apply: Perform some computation
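The three operations above can be sketched in the clear (no security) as follows. This is an illustration only: the function names, the integer vertex ids, the use of a sum as the aggregate, and the example graph are all assumptions, not the GraphSC API.

```python
def scatter(vertex_data, edges):
    """Scatter: copy each source vertex's value onto its outgoing edges."""
    return [(src, dst, vertex_data[src]) for src, dst in edges]


def gather(num_vertices, edge_data):
    """Gather: each vertex aggregates (here: sums) the values on its incoming edges."""
    acc = [0] * num_vertices
    for _src, dst, value in edge_data:
        acc[dst] += value
    return acc


def apply_step(vertex_data, gathered, fn):
    """Apply: each vertex computes a new value from its old value and its aggregate."""
    return [fn(old, agg) for old, agg in zip(vertex_data, gathered)]


# Hypothetical 4-vertex graph (0..3) with directed edges (src, dst).
vertex_data = [1, 2, 4, 5]
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 0)]

on_edges = scatter(vertex_data, edges)
gathered = gather(len(vertex_data), on_edges)
updated = apply_step(vertex_data, gathered, lambda old, agg: old + agg)
print(gathered, updated)
```

Note that `gather` indexes `acc[dst]` directly, so its memory accesses reveal the graph structure; that is the part that has to change in the oblivious setting.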

Obliviousness of Graph-parallel Algorithms

Do not reveal edge/vertex data

Do not reveal structure of the graph

[Figure: the example graph on vertices A, B, C, D]

Naïve Solution: O(|V|^2)

Our Solution: O(|E| log |V|)
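The talk does not spell out the naïve solution. One natural reading, assumed here, is to store the graph as a dense |V| x |V| table and touch every cell whether or not an edge is present, which hides the structure at O(|V|^2) cost. The names `to_dense` and `naive_oblivious_gather` are hypothetical.

```python
def to_dense(num_vertices, edges):
    """Build a dense table: cell [u][v] = (exists, value), with exists in {0, 1}."""
    dense = [[(0, 0)] * num_vertices for _ in range(num_vertices)]
    for src, dst, value in edges:
        dense[src][dst] = (1, value)
    return dense


def naive_oblivious_gather(num_vertices, dense):
    """Sum incoming edge values per vertex by visiting all |V|^2 cells."""
    gathered = [0] * num_vertices
    for u in range(num_vertices):
        for v in range(num_vertices):
            exists, value = dense[u][v]
            # Same operations for present and absent edges.
            gathered[v] += exists * value
    return gathered


# Hypothetical edges (src, dst, value) on 4 vertices.
edges = [(0, 1, 1), (0, 2, 1), (1, 2, 2), (2, 3, 4), (3, 0, 5)]
print(naive_oblivious_gather(4, to_dense(4, edges)))
```

For sparse graphs, where |E| is close to |V|, this does about |V| times more work than the O(|E|) gather in the clear.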

Oblivious Gather – Key Trick

Oblivious Sort with (v, isVertex)

Single pass

Sort: O(|E| log |V|)

Single pass: O(|E|)

Oblivious Gather: O(|E| log |V|)

Gather in clear: O(|E|)
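A sketch of the sort-then-single-pass idea, under stated assumptions: vertices and edges live in one list of tuples `(v, is_vertex, value)`, the aggregate is a sum, and the oblivious sort is a bitonic sorting network used as a stand-in (it needs O(n log^2 n) compare-exchanges, more than the bound quoted on the slide). All names here are hypothetical and not taken from GraphSC.

```python
PAD_VERTEX = float("inf")  # hypothetical sentinel id so padding sorts last


def bitonic_sort(items, key):
    """Sort in place with a bitonic network; len(items) must be a power of two.

    The sequence of compared positions depends only on len(items).
    """
    n = len(items)
    k = 2
    while k <= n:
        j = k // 2
        while j >= 1:
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    a, b = items[i], items[partner]
                    if ascending:
                        out_of_order = key(a) > key(b)
                    else:
                        out_of_order = key(a) < key(b)
                    # Compare-exchange: both slots are always rewritten.
                    items[i], items[partner] = (b, a) if out_of_order else (a, b)
            j //= 2
        k *= 2


def oblivious_gather(num_vertices, edges):
    """edges: list of (src, dst, value). Returns {vertex: sum of incoming values}."""
    tuples = [(v, 1, 0) for v in range(num_vertices)]
    tuples += [(dst, 0, value) for _src, dst, value in edges]
    size = 1
    while size < len(tuples):
        size *= 2
    tuples += [(PAD_VERTEX, 0, 0)] * (size - len(tuples))

    # Sort by (v, is_vertex): each vertex lands right after its incoming edges.
    bitonic_sort(tuples, key=lambda t: (t[0], t[1]))

    # Single pass: edges add to the running sum; a vertex takes it and resets it.
    passed = []
    acc = 0
    for v, is_vertex, value in tuples:
        passed.append((v, is_vertex, is_vertex * acc + (1 - is_vertex) * value))
        acc = (1 - is_vertex) * (acc + value)

    # Reading the result out in the clear is only for this demonstration.
    return {v: value for v, is_vertex, value in passed if is_vertex}


edges = [(0, 1, 5), (2, 1, 7), (1, 3, 2), (0, 3, 1), (3, 0, 4)]
print(oblivious_gather(4, edges))
```

In plain Python the conditional expressions still branch on data; in a secure computation each compare-exchange and each update in the pass would be evaluated as a fixed circuit, so only the list length is visible.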

Complexity of Our Algorithms

Operation        Sequential Insecure   Naïve Oblivious   Parallel Oblivious   Parallel Oblivious
                 (Total Work)          (Total Work)      (Total Work)         (Parallel Time)
Scatter, Gather  O(|E|)                O(|V|^2)          O(|E| log |V|)       O(log |V|)
Apply            O(|V|)                O(|E|)            O(|E|)               O(1)

Algorithms on GraphSC

Histogram computation

PageRank

Matrix Factorization using gradient descent

Matrix Factorization using alternating least squares

Bellman-Ford shortest path

Bipartite matching

Parallel empirical risk minimization through alternating direction method of multipliers (ADMM)

Pregel by [logo]

Experimental Setup

Cloud 1 (Garblers)

Cloud 2 (Evaluators)

Two Scenarios:

1. LAN

2. Across Data Centers (WAN)

Key Evaluation Results

Algorithm                                  Input Size    Parallel Time (32 processors)
Histogram                                  1K – 0.5M     4 sec – 34 min
PageRank (1 iteration)                     4K – 128K     20 sec – 15.5 min
Matrix Factorization (1 iteration), GD     1K – 32K      47 sec – 34 min
Matrix Factorization (1 iteration), ALS    64 – 4K       2 min – 2.35 hours

Running at Scale

Matrix Factorization using gradient descent: 1M ratings, 6K users, 4K movies [KBV’09]

Max in [NIWJTB’13]: 16K ratings (64x smaller data)

7 machine cluster, 128 processors, 525 GB RAM

Time taken: ~13 hours (1 iteration)

4K ratings, 32 threads: 1.4 hours -> < 4 mins

We used only 7 machines!

13 hours -> a few mins by using more machines

Across Data Centers

PageRank

Garblers: Oregon

Evaluators: N. Virginia

B/W provisioned: 2 Gbps

Time reduces linearly with increasing processors

Conclusion

GraphSC is a parallel secure computation framework for graph-parallel algorithms

www.oblivm.com

Thank you! [email protected]
