Stale Synchronous Parallel Iterations on Flink TRAN Nam-Luc / Engineer @EURA NOVA Research & Development FLINK FORWARD 2015 BERLIN, GERMANY OCTOBER 2015

Jan 08, 2017

Page 1: Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink

Stale Synchronous Parallel Iterations on Flink

TRAN Nam-Luc / Engineer @EURA NOVA Research & Development

FLINK FORWARD 2015, BERLIN, GERMANY, OCTOBER 2015

Page 2: Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink

EURA NOVA? OUR INNOVATION-DRIVEN MODEL & DISRUPTIVE CULTURE

“EURA NOVA is a team of passionate IT experts devoted to providing knowledge & skills to people with great ideas”

Data Science, Distributed computing, Software engineering, Big Data.

KEY FIGURES

Our people:

40 employees, from business engineers to data scientists

7 freelancers

3 founding partners

Our research, since 2009:

2 PhD theses & 18 master's theses with 4 renowned universities

20 publications in conferences as lecturer

4 large R&D projects

3 open-source products

Page 3: Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink

How not to synchronize workers

[Figure: Workers 1 to 10, one of them marked STRAGGLER]

Page 4: Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink

THE BIG PICTURE

Bulk Synchronous Parallelism synchronizes threads after each iteration.

There are always stragglers in a cluster.

In large clusters, this leaves many workers waiting!
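To put a number on that waiting, here is a minimal sketch of one bulk synchronous iteration. The per-worker durations are made-up values chosen for illustration; they are not measurements from the talk.

```python
def bsp_idle_time(durations):
    """Return (iteration_time, total_idle) for one bulk synchronous iteration.

    Every worker waits at the barrier for the slowest one, so the
    iteration lasts as long as the slowest worker's compute time.
    """
    iteration_time = max(durations)
    total_idle = sum(iteration_time - d for d in durations)
    return iteration_time, total_idle


# Hypothetical compute times in seconds: nine fast workers and one straggler.
durations = [1.0] * 9 + [5.0]
iteration_time, total_idle = bsp_idle_time(durations)
print(iteration_time, total_idle)  # 5.0 36.0
```

With these numbers, a single straggler turns a 1-second iteration into a 5-second one and leaves the other nine workers idle for 36 worker-seconds in total.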

Page 5: Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink

Gonna dig me a hole (gonna dig me a hole),

Gonna put a nerd in it (gonna put a nerd in it),

Gonna take a firecracker (gonna take a firecracker)…

Worker 1

Worker 2

Worker 3

Page 6: Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink

CONTRIBUTION

1. STALE SYNCHRONOUS PARALLEL ITERATIONS

Tackling the straggler problem within Flink

2. DISTRIBUTED FRANK-WOLFE ALGORITHM

Applied to LASSO regression as a use case

Page 7: Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink

PART 1: STALE SYNCHRONOUS PARALLEL ITERATIONS ON FLINK

Page 8: Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink

THE STRAGGLER PROBLEM

There are stragglers in distributed processing frameworks …

→ Hardware heterogeneity
→ Skewed data distribution
→ Garbage collection

… especially in the context of data center operating systems:

→ Iteration time not predictable
→ Costly to reschedule!

Page 9: Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink

BULK VS STALE SYNCHRONOUS

Distribution of iterative-convergent algorithms:

[Figure labels: Classic, Explicit synchronization barrier, STALE]
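The difference between the two models can be sketched with a small deterministic simulation. This is an illustration under stated assumptions, not the Flink implementation: `makespan` and the `durations` table are hypothetical, and a worker is allowed to start an iteration only when its clock is at most the slowest worker's clock plus the staleness. With a staleness of 0 this reduces to the classic barrier.

```python
def makespan(durations, staleness):
    """Total run time when durations[i][k] is worker i's compute time for iteration k.

    A worker that has completed k iterations may start the next one only
    once every worker has completed at least k - staleness iterations.
    """
    n_workers = len(durations)
    n_iters = len(durations[0])
    finish = [[0.0] * n_iters for _ in range(n_workers)]
    for k in range(n_iters):
        gate = k - staleness - 1  # iteration index that everyone must have finished
        barrier = max(finish[j][gate] for j in range(n_workers)) if gate >= 0 else 0.0
        for i in range(n_workers):
            own = finish[i][k - 1] if k > 0 else 0.0
            finish[i][k] = max(own, barrier) + durations[i][k]
    return max(row[-1] for row in finish)


# Hypothetical load: in each iteration a different worker is the straggler (3 s instead of 1 s).
durations = [[3.0 if k % 3 == i else 1.0 for k in range(4)] for i in range(3)]
print(makespan(durations, staleness=0))  # bulk synchronous: 12.0
print(makespan(durations, staleness=1))  # stale synchronous: 8.0
```

Because the straggler changes from one iteration to the next, allowing one iteration of slack lets the delays overlap instead of adding up.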

Page 10: Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink

PARAMETER SERVER

How to keep workers up-to-date?

[Figure labels: STALE, Explicit synchronization barrier, Parameter server]

Page 11: Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink

INTEGRATION WITH FLINK

What does Flink need to enable SSP?

1. SSP iteration control model

2. Parameter server

Page 12: Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink

ITERATION CONTROL MODEL IN FLINK

Worker p_i:

    if clock_i <= cluster-wide clock + staleness
        do iteration
        ++clock_i, then send clock_i to the clock synchronization sink
    else
        wait until clock_i <= cluster-wide clock + staleness

Clock Synchronization Sink:

    store clock_i in C
    cluster-wide clock = min(C)
    broadcast cluster-wide clock if changed
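The worker rule and the sink rule on this slide can be sketched as a single-threaded toy model. The class and function names below are hypothetical and only mirror the slide's pseudocode; they are not the classes from the Flink pull request, and real workers would run concurrently.

```python
class ClockSynchronizationSink:
    def __init__(self, n_workers):
        self.clocks = [0] * n_workers  # C on the slide: one clock per worker
        self.cluster_clock = 0

    def receive(self, worker_id, clock):
        """Store clock_i; return the new cluster-wide clock if it changed, else None."""
        self.clocks[worker_id] = clock
        new_min = min(self.clocks)
        if new_min != self.cluster_clock:
            self.cluster_clock = new_min
            return new_min  # this value would be broadcast to all workers
        return None


class Worker:
    def __init__(self, worker_id, staleness):
        self.worker_id = worker_id
        self.staleness = staleness
        self.clock = 0
        self.cluster_clock = 0  # last broadcast value this worker has seen

    def may_iterate(self):
        return self.clock <= self.cluster_clock + self.staleness


def step(worker, workers, sink):
    """Let one worker attempt an iteration; return False if it has to wait."""
    if not worker.may_iterate():
        return False
    worker.clock += 1  # the iteration's actual work would run before this line
    broadcast = sink.receive(worker.worker_id, worker.clock)
    if broadcast is not None:
        for w in workers:
            w.cluster_clock = broadcast
    return True


sink = ClockSynchronizationSink(2)
workers = [Worker(0, staleness=1), Worker(1, staleness=1)]
fast, slow = workers
trace = [step(fast, workers, sink) for _ in range(3)]  # third attempt must wait
trace.append(step(slow, workers, sink))                # slow worker catches up
trace.append(step(fast, workers, sink))                # fast worker is released
print(trace)  # [True, True, False, True, True]
```

Note that the check happens before the iteration, so with this rule a fast worker can finish an iteration while being `staleness + 1` clocks ahead of the slowest one.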

Page 13: Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink

ITERATION CONTROL MODEL IN FLINK

[Figure: three parallel pipelines, each IterationHead → IterationIntermediate → IterationTail with a backchannel. Each pipeline sends "worker done" to the IterationSynchronizationTask, which replies "all workers done". Label: ClockEvent]

Page 14: Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink

ITERATION CONTROL MODEL IN FLINK

[Figure: three parallel pipelines, each IterationHead → IterationIntermediate → IterationTail with a backchannel. Each pipeline sends "Clock p_i" to the ClockSynchronizationTask, which replies with the "cluster-wide clock". Label: ClockEvent]

Page 15: Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink

ITERATION CONTROL MODEL IN FLINK

SuperstepBarrier                ClockHolder
IterationHeadPACTTask           SSPIterationHeadPACTTask
SyncEventHandler                ClockSyncEventHandler
IterationSynchronizationTask    ClockSynchronizationTask

Page 16: Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink

CONVERGENCE CHECK

BULK SYNCHRONOUS PARALLEL

Convergence determined at synchronization barrier

STALE SYNCHRONOUS PARALLEL

Convergence reached when no worker can improve the solution any further
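The two convergence rules can be contrasted with a small sketch. The function names, the notion of a per-worker "improvement" (decrease of the objective) and the tolerance are all assumptions made for illustration; the slide does not say how improvement is measured.

```python
def bsp_converged(improvements_this_iteration, tolerance):
    """Bulk synchronous: all workers report for the same iteration at the
    barrier, so one aggregate check per iteration is enough."""
    return sum(improvements_this_iteration) < tolerance


def ssp_converged(last_improvement_by_worker, tolerance):
    """Stale synchronous: there is no common barrier, so convergence is
    declared only when every worker's latest attempt failed to improve."""
    return all(imp < tolerance for imp in last_improvement_by_worker.values())


print(bsp_converged([0.0, 1e-9], tolerance=1e-6))
print(ssp_converged({0: 0.0, 1: 0.5}, tolerance=1e-6))
```

In the second call worker 1 can still improve the solution, so the stale synchronous run would keep going even though worker 0 has stalled.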

Page 17: Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink

STALE SYNCHRONOUS API

dataSet.Iterate(nIterations)

dataSet.IterateWithSSP(nIterations, staleness)

Page 18: Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink

PARAMETER SERVER

Simple API

RichMapFunctionWithParameterServer extends RichMapFunction {
    update(id, clock, parameter)
    get(id)
}

Architecture

[Figure: Workers connected to a SHARED MODEL held in a DATA GRID]
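As a rough illustration of the two calls, here is an in-memory stand-in for the shared model. It is a sketch only: the class name is hypothetical, and the policy of ignoring updates that carry an older clock is an assumption that the slide does not state.

```python
class InMemoryParameterServer:
    """Toy shared model mapping a parameter id to (clock, parameter)."""

    def __init__(self):
        self._store = {}

    def update(self, param_id, clock, parameter):
        """Store the parameter unless a value with a newer clock is already present."""
        current = self._store.get(param_id)
        if current is None or clock >= current[0]:
            self._store[param_id] = (clock, parameter)

    def get(self, param_id):
        """Return the latest stored parameter, or None if the id is unknown."""
        entry = self._store.get(param_id)
        return None if entry is None else entry[1]


server = InMemoryParameterServer()
server.update("alpha", 1, [0.5, 0.0])
server.update("alpha", 0, [0.1, 0.0])  # older clock: ignored under the assumed policy
print(server.get("alpha"))
```

A real deployment would back this with the data grid shown on the slide rather than a local dictionary.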

Page 19: Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink

PART 2: DISTRIBUTED FRANK-WOLFE ALGORITHM

Page 20: Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink

DISTRIBUTED FRANK-WOLFE ALGORITHM

Solving the current optimization problem:

[Equation not captured in the transcript. Annotations: linear combination of atoms, sparse coefficients]

Distributed version (Bellet et al. 2015):
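The equation on this slide did not survive transcription. As a hedged reconstruction, assuming the slide follows the formulation of Bellet et al. (2015) cited here, the problem has the form below. The symbols are assumptions rather than text from the slide: the columns a_j of A are the atoms, α is the sparse coefficient vector, and β bounds its ℓ1 norm. For the LASSO use case, g would be the squared error against a target vector.

```latex
\min_{\alpha \in \mathbb{R}^n} \; f(\alpha) = g(A\alpha)
\quad \text{s.t.} \quad \|\alpha\|_1 \le \beta,
\qquad \text{where } A\alpha = \sum_{j=1}^{n} \alpha_j a_j
```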

Page 21: Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink

DISTRIBUTED FRANK-WOLFE ALGORITHM

Distributed version (Bellet et al. 2015):

[Figure: atoms 1 to n partitioned across workers W1, W2, W3. Annotations: linear combination of atoms, sparse coefficients]

Page 22: Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink

DISTRIBUTED FRANK-WOLFE ALGORITHM

Distributed version (Bellet et al. 2015):

[Figure: atoms 1 to n partitioned across workers W1, W2, W3]

Page 23: Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink

DISTRIBUTED FRANK-WOLFE ALGORITHM

Distributed version (Bellet et al. 2015):

[Figure: atoms 1 to n partitioned across workers W1, W2, W3]

1. Local selection of atoms

Page 24: Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink

DISTRIBUTED FRANK-WOLFE ALGORITHM

Distributed version (Bellet et al. 2015):

[Figure: atoms 1 to n partitioned across workers W1, W2, W3]

2. Global consensus

Page 25: Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink

DISTRIBUTED FRANK-WOLFE ALGORITHM

Distributed version (Bellet et al. 2015):

[Figure: atoms 1 to n partitioned across workers W1, W2, W3]

3. α coefficients update
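The three steps of the distributed version (local selection, global consensus, coefficient update) can be sketched in a single process. This is an illustration under stated assumptions, not the authors' implementation: the objective is taken to be the least-squares LASSO form 0.5 * ||y - A α||² with ||α||₁ ≤ β, the step size uses exact line search, and every function name is hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def predict(atoms, alpha, dim):
    """Return A @ alpha, where atoms[j] is column j of A."""
    out = [0.0] * dim
    for coef, atom in zip(alpha, atoms):
        if coef != 0.0:
            for t, a in enumerate(atom):
                out[t] += coef * a
    return out


def frank_wolfe_step(atoms, alpha, y, j, beta):
    """Move alpha toward the vertex +/- beta * e_j, with exact line search."""
    r = [yt - pt for yt, pt in zip(y, predict(atoms, alpha, len(y)))]
    grad_j = -dot(atoms[j], r)  # partial derivative of the objective w.r.t. alpha_j
    vertex = [0.0] * len(alpha)
    vertex[j] = -beta if grad_j > 0 else beta
    direction = [v - a for v, a in zip(vertex, alpha)]
    a_dir = predict(atoms, direction, len(y))
    denom = dot(a_dir, a_dir)
    gamma = 0.0 if denom == 0.0 else min(1.0, max(0.0, dot(r, a_dir) / denom))
    return [a + gamma * d for a, d in zip(alpha, direction)]


def distributed_frank_wolfe(atoms, y, partitions, beta, n_iterations):
    """partitions[w] lists the atom indices owned by worker w."""
    alpha = [0.0] * len(atoms)
    for _ in range(n_iterations):
        r = [yt - pt for yt, pt in zip(y, predict(atoms, alpha, len(y)))]
        # 1. Local selection: each worker scans only the atoms it owns.
        local_best = [max(owned, key=lambda j: abs(dot(atoms[j], r)))
                      for owned in partitions]
        # 2. Global consensus: keep the candidate with the largest |gradient|.
        j = max(local_best, key=lambda j: abs(dot(atoms[j], r)))
        # 3. Coefficient update.
        alpha = frank_wolfe_step(atoms, alpha, y, j, beta)
    return alpha


# Tiny made-up problem: two atoms, one per worker.
atoms = [[1.0, 0.0], [0.0, 1.0]]
alpha = distributed_frank_wolfe(atoms, [1.0, 1.0], [[0], [1]], beta=1.0, n_iterations=10)
print(alpha)
```

Each update is a convex combination of the current α and a vertex of the ℓ1 ball, so the coefficients stay feasible and only one new atom can become active per iteration.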

Page 26: Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink

DISTRIBUTED FRANK-WOLFE ALGORITHM

Stale synchronous version:

[Figure: atoms 1 to n partitioned across workers W1, W2, W3, with a Parameter Server]

1. Get α coefficients from parameter server

Page 27: Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink

DISTRIBUTED FRANK-WOLFE ALGORITHM

Stale synchronous version:

[Figure: atoms 1 to n partitioned across workers W1, W2, W3, with a Parameter Server]

2. Local selection of atoms

Page 28: Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink

DISTRIBUTED FRANK-WOLFE ALGORITHM

Stale synchronous version:

[Figure: atoms 1 to n partitioned across workers W1, W2, W3, with a Parameter Server]

3. Compute α coefficients from locally selected atoms

Page 29: Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink

DISTRIBUTED FRANK-WOLFE ALGORITHM

Stale synchronous version:

[Figure: atoms 1 to n partitioned across workers W1, W2, W3, with a Parameter Server]

4. Update α coefficients to parameter server

Page 30: Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink

DISTRIBUTED FRANK-WOLFE ALGORITHM

Stale synchronous version:

[Figure: atoms 1 to n partitioned across workers W1, W2, W3, with a Parameter Server]

Repeat while within staleness bounds
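The four steps of the stale synchronous version can be sketched as a single-threaded simulation. Several simplifications are assumed and none of them come from the slides: the parameter server is a plain dictionary, the objective is the least-squares form 0.5 * ||y - A α||² with ||α||₁ ≤ β, the step size uses exact line search, and `schedule` is a hypothetical list saying which worker gets CPU time next. Reads are always fresh here because nothing runs concurrently; only the staleness bound on the clocks is modelled.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def predict(atoms, alpha, dim):
    """Return A @ alpha, where atoms[j] is column j of A."""
    out = [0.0] * dim
    for coef, atom in zip(alpha, atoms):
        if coef != 0.0:
            for t, a in enumerate(atom):
                out[t] += coef * a
    return out


def frank_wolfe_step(atoms, alpha, y, j, beta):
    """Move alpha toward the vertex +/- beta * e_j, with exact line search."""
    r = [yt - pt for yt, pt in zip(y, predict(atoms, alpha, len(y)))]
    vertex = [0.0] * len(alpha)
    vertex[j] = -beta if -dot(atoms[j], r) > 0 else beta
    direction = [v - a for v, a in zip(vertex, alpha)]
    a_dir = predict(atoms, direction, len(y))
    denom = dot(a_dir, a_dir)
    gamma = 0.0 if denom == 0.0 else min(1.0, max(0.0, dot(r, a_dir) / denom))
    return [a + gamma * d for a, d in zip(alpha, direction)]


def ssp_frank_wolfe(atoms, y, partitions, beta, staleness, schedule):
    server = {"alpha": [0.0] * len(atoms)}  # toy parameter server
    clocks = [0] * len(partitions)
    for w in schedule:
        if clocks[w] > min(clocks) + staleness:
            continue  # outside the staleness bound: this worker waits
        alpha = list(server["alpha"])  # 1. get alpha from the parameter server
        r = [yt - pt for yt, pt in zip(y, predict(atoms, alpha, len(y)))]
        j = max(partitions[w], key=lambda a: abs(dot(atoms[a], r)))  # 2. local selection
        alpha = frank_wolfe_step(atoms, alpha, y, j, beta)  # 3. compute alpha locally
        server["alpha"] = alpha  # 4. update the parameter server
        clocks[w] += 1
    return server["alpha"], clocks


atoms = [[1.0, 0.0], [0.0, 1.0]]
alpha, clocks = ssp_frank_wolfe(atoms, [1.0, 1.0], [[0], [1]], beta=1.0,
                                staleness=1, schedule=[0, 0, 0, 1, 0, 1])
print(alpha, clocks)
```

In this run worker 0 is scheduled three times in a row; its third attempt is skipped because it would exceed the staleness bound, and it resumes once worker 1 has advanced.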

Page 31: Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink

DISTRIBUTED FRANK-WOLFE ALGORITHM

See our full paper for:

∙ full implementation details
∙ properties
∙ application to LASSO regression
∙ convergence proof

N-L Tran, T Peel, S Skhiri, "Distributed Frank-Wolfe under Pipelined Stale Synchronous Parallelism", Proceedings of IEEE BigData 2015, Santa Clara, November 2015

Page 32: Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink

EXPERIMENTS

Application to LASSO regression

5 nodes, 2 GHz, 3 GB RAM

Random sparse 1,000 x 10,000 matrices

Sparsity ratio = 0.001

Generated load: at any time, 1 random node under 100% load for 12 seconds
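A data set of this shape could be generated along the following lines. This is a sketch with assumptions: "sparsity ratio" is read as the probability that an entry is nonzero, nonzero values are drawn from a standard normal distribution, and the function name is hypothetical. The slides do not describe the actual generator.

```python
import random


def random_sparse_atoms(n_rows, n_cols, sparsity, seed=0):
    """Return one {row_index: value} dict per column (atom).

    Each entry is nonzero with probability `sparsity`.
    """
    rng = random.Random(seed)
    columns = []
    for _ in range(n_cols):
        col = {}
        for row in range(n_rows):
            if rng.random() < sparsity:
                col[row] = rng.gauss(0.0, 1.0)
        columns.append(col)
    return columns


# Scaled-down example; the experiment on the slide uses 1,000 x 10,000.
atoms = random_sparse_atoms(100, 1000, sparsity=0.001)
nonzeros = sum(len(col) for col in atoms)
print(len(atoms), nonzeros)
```

Storing only the nonzero entries keeps memory proportional to the number of nonzeros, which matters at the full size where the dense matrix would hold ten million entries.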

Page 33: Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink

RESULTS

[Figure: Convergence of the objective function]

Page 34: Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink

RECAP

Stragglers in a cluster are an issue.

Mitigate them with Stale Synchronous Parallel Iterations.

Page 35: Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink

WANNA TRY IT OUT?

Pull request #967: Stale Synchronous Parallel iterations + API

Pull request #1101: Frank-Wolfe algorithm + LASSO regression

Page 36: Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink

THANK YOU! Do you have any questions?

[email protected]

Page 37: Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink

AGENDA

1. STALE SYNCHRONOUS PARALLEL ITERATIONS
∙ The straggler problem
∙ BSP vs SSP
∙ Integration with Flink
∙ Iteration control model
∙ API

2. DISTRIBUTED FRANK-WOLFE ALGORITHM
∙ Problem statement
∙ Application: LASSO regression
∙ Experiments

Page 38: Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink

RESULTS

[Figure: Sparsity of the coefficients]

Page 39: Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink

PARAMETER SERVER

The parameter server keeps track of the intermediate results

→ Key-object store
→ Distributed, with local caching
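One way to picture "distributed, with local caching" is a worker-side client that answers reads from its own cache while the cached copy is recent enough. The sketch below is hypothetical: the class name, the shared dictionary standing in for the distributed store, and the refresh policy (re-read once the cached copy is more than `staleness` clocks old) are assumptions, not details from the talk.

```python
class CachingClient:
    """Worker-side view of a key-object store with a bounded-staleness cache."""

    def __init__(self, store, staleness):
        self.store = store          # shared dict standing in for the distributed store
        self.staleness = staleness
        self.cache = {}             # key -> (clock when cached, value)

    def get(self, key, worker_clock):
        cached = self.cache.get(key)
        # Serve locally while the cached copy is at most `staleness` clocks old.
        if cached is not None and worker_clock - cached[0] <= self.staleness:
            return cached[1]
        entry = self.store.get(key)
        if entry is None:
            return None
        self.cache[key] = (worker_clock, entry[1])
        return entry[1]

    def update(self, key, worker_clock, value):
        self.store[key] = (worker_clock, value)
        self.cache[key] = (worker_clock, value)


store = {}
writer = CachingClient(store, staleness=1)
reader = CachingClient(store, staleness=1)
writer.update("alpha", 0, 1.0)
print(reader.get("alpha", 0))  # fetched from the store and cached
writer.update("alpha", 1, 2.0)
print(reader.get("alpha", 1))  # still the cached value, within the staleness bound
print(reader.get("alpha", 2))  # cache too old: refreshed from the store
```

The cache trades freshness for fewer round trips to the store, which is the same trade the staleness bound makes for iteration scheduling.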