May 10, 2015

“MBrace: Cloud Computing with Monads” has been accepted for presentation at the Programming Languages and Operating Systems workshop, co-located with the SOSP 2013 conference.
Page 1: Mbrace plos-slides final

MBrace: Cloud Computing with Monads

Jan Dzik, Nick Palladinos, Kostas Rontogiannis, Eirik Tsarpalis, Nikolaos Vathis

Nessos Information Technologies, SA

7th Workshop on Programming Languages and Operating Systems

November 3, 2013

Eirik Tsarpalis (Nessos IT) MBrace: Cloud Computing with Monads PLOS ’13 1 / 29

Page 2: Mbrace plos-slides final

Motivation

Distributed Computation is Challenging.

Key to success: choose the right distribution framework.

Each framework tied to particular programming abstraction.

Map-Reduce, Actor model, Dataflow model, etc.

Page 3: Mbrace plos-slides final

Established distributed frameworks

Restrict to specific distribution patterns.

Not expressive enough for certain classes of algorithms.

Difficult to influence task granularity.

Time consuming to deploy, manage and debug.

Page 4: Mbrace plos-slides final

What is MBrace?

1 A new programming model for the cloud.

2 An elastic, fault tolerant, multitasking cluster infrastructure.

Page 5: Mbrace plos-slides final

In This Talk

Concentrate on the programming model.

Distributed Computation.

Distributed Data.

Benchmarks.

Page 6: Mbrace plos-slides final

The MBrace Programming Model

A monad for composing distribution workflows.

Essentially a continuation monad that admits distribution.

Based on F# computation expressions.

Inspired by the successful F# asynchronous workflows.
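
The idea of a continuation monad exposed through an F# computation expression can be sketched locally. This is a minimal sketch, not MBrace's implementation: the names Cont, ContBuilder, runWith and run are hypothetical, and the sketch runs everything in-process, with no distribution or exception tracking.

```fsharp
// A computation is a function that receives a continuation for its result.
type Cont<'T> = Cont of (('T -> unit) -> unit)

// Hypothetical helper: feed a continuation to a computation.
let runWith (Cont f) (k : 'T -> unit) : unit = f k

// Builder methods that the F# compiler calls when desugaring cont { ... }.
type ContBuilder() =
    member _.Return (x : 'T) : Cont<'T> = Cont (fun k -> k x)
    member _.ReturnFrom (m : Cont<'T>) : Cont<'T> = m
    member _.Bind (m : Cont<'T>, f : 'T -> Cont<'U>) : Cont<'U> =
        Cont (fun k -> runWith m (fun t -> runWith (f t) k))

let cont = ContBuilder()

// Run a computation to completion and extract its result.
let run (m : Cont<'T>) : 'T =
    let result = ref Unchecked.defaultof<'T>
    runWith m (fun v -> result.Value <- v)
    result.Value

// let! desugars to Bind, return desugars to Return.
let addTwice (x : int) = cont {
    let! a = cont { return x + 1 }
    let! b = cont { return a * 2 }
    return a + b
}

let answer = run (addTwice 1)
```

Because every step hands its result to an explicit continuation, an interpreter is free to decide where that continuation runs, which is the property a distributed runtime can exploit.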

Page 7: Mbrace plos-slides final

A Basic cloud workflow

let download (url : string) = cloud {
    let client = new System.Net.WebClient()
    let content = client.DownloadString(url)
    return content
} : Cloud<string>

Page 8: Mbrace plos-slides final

Composing cloud workflows

let downloadSequential () = cloud {
    let! c1 = download "http://m-brace.net/"
    let! c2 = download "http://nessos.gr/"
    let c = c1 + c2
    return c
}

Page 9: Mbrace plos-slides final

Parallel Composition

let downloadParallel () = cloud {
    let! c1,c2 =
        download "http://m-brace.net/"
        <||>
        download "http://nessos.gr/"
    return c1 + c2
}

Page 10: Mbrace plos-slides final

Distribution Primitives: an overview

Binary parallel operator:

<||> : Cloud<'T> -> Cloud<'U> -> Cloud<'T * 'U>

Variadic parallel combinator:

Cloud.Parallel : Cloud<'T> [] -> Cloud<'T []>

Non-deterministic parallel combinator:

Cloud.Choice : Cloud<'T option> [] -> Cloud<'T option>
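
The shape of the three primitives can be tried out on a single machine with F# Async as a stand-in for Cloud. This is only a sketch: the local `<||>`, parallelAll and choice definitions below are hypothetical analogues built on FSharp.Core, not MBrace's combinators, which target a cluster.

```fsharp
// Binary parallel operator: start both computations, then await both results.
let (<||>) (left : Async<'T>) (right : Async<'U>) : Async<'T * 'U> =
    async {
        let! l = Async.StartChild left
        let! r = Async.StartChild right
        let! lv = l
        let! rv = r
        return lv, rv
    }

// Variadic parallel combinator: results keep the order of the inputs.
let parallelAll (jobs : Async<'T> []) : Async<'T []> = Async.Parallel jobs

// Non-deterministic combinator: returns the first Some result,
// or None if every job returns None.
let choice (jobs : Async<'T option> []) : Async<'T option> = Async.Choice jobs

let square (x : int) = async { return x * x }

let pair = (square 3 <||> square 4) |> Async.RunSynchronously

let all = parallelAll [| square 1; square 2; square 3 |] |> Async.RunSynchronously

let firstEven =
    [| 1; 3; 4 |]
    |> Array.map (fun x -> async { return (if x % 2 = 0 then Some x else None) })
    |> choice
    |> Async.RunSynchronously
```

Only one job in the last example produces a value, so the result is deterministic here; with several successful jobs, whichever finishes first wins.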

Page 11: Mbrace plos-slides final

Cloud Monad: additional constructs

Monadic for loops.

Monadic while loops.

Monadic exception handling.
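
Since the cloud monad is modelled on F# asynchronous workflows, the same three constructs can be illustrated with a plain async block. This is a local sketch only; sumOfQuotients is a hypothetical example and nothing here is distributed.

```fsharp
let sumOfQuotients (divisors : int list) : Async<int> = async {
    // A ref cell is used because closures cannot capture 'let mutable' locals.
    let total = ref 0
    try
        // Monadic for loop: each iteration may bind another workflow.
        for d in divisors do
            let! q = async { return 100 / d }
            total.Value <- total.Value + q
        // Monadic while loop: the condition is re-evaluated between iterations.
        while total.Value > 50 do
            total.Value <- total.Value - 50
        return total.Value
    with :? System.DivideByZeroException ->
        // Monadic exception handling: integer division by zero lands here.
        return -1
}

let ok = sumOfQuotients [ 1; 2; 4 ] |> Async.RunSynchronously
let failed = sumOfQuotients [ 1; 0 ] |> Async.RunSynchronously
```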

Page 12: Mbrace plos-slides final

Example: Inverse squares

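let inverseSquares (inputs : int []) = cloud {
    let jobs : Cloud<float> [] =
        [|
            for i in inputs ->
                // Note: floating-point division by zero yields infinity in .NET
                // rather than raising, so the handler below shows the syntax only.
                cloud { return 1.0 / float (i * i) }
        |]
    try
        let! results = Cloud.Parallel jobs
        return Array.sum results
    with :? DivideByZeroException ->
        return -1.0
}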

Page 13: Mbrace plos-slides final

How is it all executed?

Scheduler/worker cluster organization.

Symbolic execution stack (free monad/trampolines).

Scheduler interprets “monadic skeleton”.

Native “leaf expressions” dispatched to workers.

Symbolic stack winds across multiple machines.
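
The split between a "monadic skeleton" and native "leaf expressions" can be sketched with a tiny interpreter that keeps its own stack of continuations. This is an illustration under strong assumptions, not the MBrace scheduler: the Workflow type and run function are hypothetical, results are restricted to int to avoid existential types, and leaves run in-process instead of being sent to workers.

```fsharp
// The skeleton is plain data, so an interpreter can inspect and move it.
type Workflow =
    | Leaf of (unit -> int)                 // native code; a real system would ship this to a worker
    | Bind of Workflow * (int -> Workflow)  // sequencing node handled by the interpreter

let run (workflow : Workflow) : int =
    // 'stack' is the symbolic execution stack: the continuations still to be applied.
    // The loop is tail recursive, so deep Bind chains do not grow the native stack.
    let rec loop (current : Workflow) (stack : (int -> Workflow) list) =
        match current with
        | Bind (inner, next) -> loop inner (next :: stack)
        | Leaf work ->
            let value = work ()
            match stack with
            | [] -> value
            | next :: rest -> loop (next value) rest
    loop workflow []

let example =
    Bind (Leaf (fun () -> 20), fun a ->
        Bind (Leaf (fun () -> a + 1), fun b ->
            Leaf (fun () -> b * 2)))

let result = run example
```

Because the pending continuations live in an ordinary list rather than on the machine stack, such a stack could in principle be serialized and resumed elsewhere.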

Page 14: Mbrace plos-slides final

A Map-Reduce implementation

let rec mapReduce (map : 'T -> Cloud<'R>)
                  (reduce : 'R -> 'R -> Cloud<'R>)
                  (identity : 'R)
                  (input : 'T list) =
    cloud {
        match input with
        | [] -> return identity
        | [value] -> return! map value
        | _ ->
            // List.split is assumed to be a helper that halves the list;
            // it is not part of FSharp.Core.
            let left, right = List.split input
            let! r1, r2 =
                (mapReduce map reduce identity left)
                <||>
                (mapReduce map reduce identity right)
            return! reduce r1 r2
    }
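
The same divide-and-conquer scheme can be run locally by substituting Async for Cloud. This is a sketch under that assumption: it uses List.splitAt from FSharp.Core for the halving step and Async.StartChild in place of the `<||>` operator, so it shows the recursion pattern only, not cluster execution.

```fsharp
let rec mapReduce (map : 'T -> Async<'R>)
                  (reduce : 'R -> 'R -> Async<'R>)
                  (identity : 'R)
                  (input : 'T list) : Async<'R> =
    async {
        match input with
        | [] -> return identity
        | [value] -> return! map value
        | _ ->
            // Halve the input; both halves are non-empty for lists of length >= 2.
            let left, right = List.splitAt (List.length input / 2) input
            // Start both halves before awaiting either result.
            let! l = Async.StartChild (mapReduce map reduce identity left)
            let! r = Async.StartChild (mapReduce map reduce identity right)
            let! r1 = l
            let! r2 = r
            return! reduce r1 r2
    }

// Usage: total length of a few strings.
let totalLength =
    mapReduce (fun (s : string) -> async { return s.Length })
              (fun a b -> async { return a + b })
              0
              [ "cloud"; "monad"; "mbrace" ]
    |> Async.RunSynchronously
```

The recursion depth decides how finely the work is split, which is one way to read the talk's point about influencing task granularity.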

Page 15: Mbrace plos-slides final

What about Data Distribution?

MBrace does NOT include a storage service (for now).

Relies on third-party storage services.

Storage Provider plugin architecture.

Out-of-the-box support for FileSystem, SQL and Azure.

Future support for HDFS and Amazon S3.

Page 16: Mbrace plos-slides final

The MBrace Data Programming Model

Storage services interfaced through data primitives.

Data primitives act as references to distributed resources.

Initialized or updated through the monad.

Come in immutable or mutable flavors.

Page 17: Mbrace plos-slides final

Cloud Ref

Simplest distributed data primitive of MBrace.

Generic reference to a stored value.

Conceptually similar to ML ref cells.

Immutable by design.

Cached in worker nodes for performance.
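
Why immutability makes caching safe can be shown with a small in-memory model. Everything here is hypothetical (Store, StoredRef, newRef are illustrative names, and a dictionary stands in for the storage provider); it is a sketch of the idea, not the CloudRef implementation.

```fsharp
open System
open System.Collections.Generic

// Hypothetical stand-in for a storage provider; counts reads to expose caching.
type Store() =
    let blobs = Dictionary<Guid, obj>()
    let mutable reads = 0
    member _.Write (value : obj) : Guid =
        let key = Guid.NewGuid()
        blobs.[key] <- value
        key
    member _.Read (key : Guid) : obj =
        reads <- reads + 1
        blobs.[key]
    member _.Reads = reads

// An immutable reference: the key never points at a different value,
// so the first fetched copy can be cached for the lifetime of the reference.
type StoredRef<'T>(store : Store, key : Guid) =
    let cached = lazy (unbox<'T> (store.Read key))
    member _.Key = key
    member _.Value : 'T = cached.Value

let newRef (store : Store) (value : 'T) : StoredRef<'T> =
    StoredRef<'T>(store, store.Write (box value))

let store = Store()
let r = newRef store [| 1; 2; 3 |]
let total = Array.sum r.Value       // first access reads from the store
let count = Array.length r.Value    // second access is served from the cache
```

With a mutable reference the cached copy could go stale, which is presumably why the mutable flavors mentioned earlier are kept separate.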

Page 18: Mbrace plos-slides final

Cloud Ref: Example

let createRef (inputs : int []) = cloud {
    let! ref = CloudRef.New inputs
    return ref : CloudRef<int []>
}

let deRef (ref : CloudRef<int []>) = cloud {
    let content = ref.Value
    return content : int []
}

Page 19: Mbrace plos-slides final

Application: Data Sharding

type DistribTree<'T> =
    | Leaf of 'T
    | Branch of CloudRef<DistribTree<'T>> * CloudRef<DistribTree<'T>>

let rec map (f : 'T -> 'S) (tree : DistribTree<'T>) =
    cloud {
        match tree with
        | Leaf t -> return! CloudRef.New (Leaf (f t))
        | Branch(l,r) ->
            let! l', r' = map f l.Value <||> map f r.Value
            return! CloudRef.New (Branch(l',r'))
    }

Page 20: Mbrace plos-slides final

Cloud File

References files in the distributed store.

Untyped, immutable, binary blobs.

Page 21: Mbrace plos-slides final

Cloud File : Example

let getSize (file : CloudFile) = cloud {
    let! bytes = CloudFile.ReadAllBytes file
    return bytes.Length / 1024
}

cloud {
    let! files = CloudDir.GetFiles "/path/to/files"
    let jobs = Array.map getSize files
    let! sizes = Cloud.Parallel jobs
    return Array.sum sizes
}

Page 22: Mbrace plos-slides final

Performance

We tested MBrace against Hadoop.

Both frameworks were run on Windows Azure.

Clusters consisted of 4, 8, 16 and 32 quad-core nodes.

Two algorithms were tested, grep and k-means.

Source code available on GitHub.

Page 23: Mbrace plos-slides final

Distributed Grep (Windows Azure)

Count occurrences of given pattern from input files.

Straightforward Map-Reduce algorithm.

Input data was 32, 64, 128 and 256 GB of text.

Page 24: Mbrace plos-slides final

Distributed Grep (Windows Azure)

[Plot: time in seconds (0 to 400) against worker cores (20 to 120); series: MBrace, Hadoop.]

Page 25: Mbrace plos-slides final

k-means Clustering (Windows Azure)

Centroid computation out of a set of vectors.

Iterative algorithm.

Not naturally definable with Map-Reduce workflows.

Hadoop implementation from Apache Mahout library.

Input was 10^6 randomly generated, 100-dimensional points.

Page 26: Mbrace plos-slides final

k-means Clustering (Windows Azure)

[Plot: time in seconds (0 to 1,500) against worker cores (20 to 120); series: MBrace, Hadoop.]

Page 27: Mbrace plos-slides final

Conclusions

A big data platform for the .NET framework.

Language-integrated cloud workflows.

User-specifiable parallelism patterns and task granularity.

Distributed exception handling.

Pluggable storage services.

Data API integrated with programming model.

Page 28: Mbrace plos-slides final

Future Work

Improved C# support.

A rich library of combinators and parallelism patterns.

A LINQ provider for data parallelism.

Support for the Mono framework and Linux.

Page 29: Mbrace plos-slides final

Thank You!

Questions?

http://m-brace.net