Programming the cloud with Skywriting

Post on 12-Jul-2015


Derek Murray

Outline
•  Existing parallel programming models
•  Introduction to Skywriting
•  Ongoing work

SMP programming
•  Symmetric multiprocessing
   – All cores share the same address space
   – Usually assumes cache-coherency
•  Multi-threaded programs
   – Threads created dynamically
   – Shared, writable data structures
   – Synchronisation
      •  Atomic compare-and-swap
      •  Mutexes, Semaphores
      •  {Software, Hardware} Transactional Memory

Distributed programming
•  Shared nothing*
   – Processors communicate by message passing
   – Standard assumption for large supercomputers
   – ...and data centres
   – ...and recent manycore machines (e.g. Intel SCC)
•  Explicit messages
   – MPI, Pregel, Actor-based
•  Implicit messages: task parallelism
   – MapReduce, Dryad

Task parallelism
•  Master-worker architecture
   – Master maintains task queue
   – Workers execute independent tasks in parallel
•  Fault tolerance
   – Re-execute tasks on failed workers
   – Speculatively re-execute “slow” tasks
•  Load balancing
   – Workers consume tasks at their own rate
   – Task granularity must be optimised
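The master-worker pattern above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how MapReduce, Dryad or Skywriting implement it: threads stand in for workers, `run_master` and `fail_once` are hypothetical names, and a "failure" is simulated by skipping the first attempt of a chosen task so that it is put back on the queue.

```python
import queue
import threading


def run_master(tasks, num_workers, fail_once=frozenset()):
    """Run independent (fn, arg) tasks on worker threads.

    fail_once is a hypothetical set of task ids whose first attempt is
    treated as a worker failure, so the task is re-queued and re-executed.
    """
    pending = queue.Queue()              # the master's task queue
    for task_id, (fn, arg) in enumerate(tasks):
        pending.put((task_id, fn, arg))

    results = {}
    attempts = {}
    lock = threading.Lock()

    def worker():
        # Each worker pulls tasks at its own rate (load balancing).
        while True:
            item = pending.get()
            if item is None:             # sentinel: no more work
                pending.task_done()
                return
            task_id, fn, arg = item
            with lock:
                attempts[task_id] = attempts.get(task_id, 0) + 1
                first_try = attempts[task_id] == 1
            if first_try and task_id in fail_once:
                pending.put(item)        # simulated failure: re-execute later
            else:
                value = fn(arg)
                with lock:
                    results[task_id] = value
            pending.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    pending.join()                       # wait until every task has completed
    for _ in threads:
        pending.put(None)
    for t in threads:
        t.join()
    return [results[i] for i in range(len(tasks))], attempts
```

Re-execution is only safe here because the tasks are independent and have no side effects, which is the same assumption the slide relies on.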

Task graph
[Figure: task graph with two tasks, A and B]
A runs before B
B depends on A

MapReduce
•  Two kinds of task: map and reduce
•  User-specified map and reduce functions
   – map() a record to a list of key-value pairs
   – reduce() a key and the list of related values
•  Tasks apply functions to data partitions
[Figure: three map tasks (M) and three reduce tasks (R)]
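As a concrete instance of the two user-specified functions, here is a minimal single-process sketch of word counting in Python. It is an assumption-laden illustration rather than the Hadoop or Google MapReduce API: `map_fn`, `reduce_fn` and `map_reduce` are hypothetical names, and the "shuffle" is just an in-memory dictionary.

```python
from collections import defaultdict


def map_fn(record):
    """map(): turn one record (a line of text) into key-value pairs."""
    return [(word, 1) for word in record.split()]


def reduce_fn(key, values):
    """reduce(): combine a key with the list of values emitted for it."""
    return key, sum(values)


def map_reduce(partitions, map_fn, reduce_fn):
    """Apply map_fn to every record of every partition, group by key, then reduce."""
    groups = defaultdict(list)
    for partition in partitions:         # conceptually one map task per partition
        for record in partition:
            for key, value in map_fn(record):
                groups[key].append(value)    # stand-in for the shuffle phase
    # Conceptually one reduce call per key; a real system groups keys into reduce tasks.
    return dict(reduce_fn(key, values) for key, values in sorted(groups.items()))
```

For example, `map_reduce([["a b a"], ["b c"]], map_fn, reduce_fn)` counts words across two partitions of one record each.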

Dryad
•  Task graph is first class
   – Vertices run arbitrary code
   – Multiple inputs and outputs
   – Channels specify data transport
•  Graph must be acyclic and finite
   – Permits topological sorting
   – Prevents unbounded iteration
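To show why an acyclic, finite graph permits topological sorting, the following sketch orders tasks so that every task comes after the tasks it depends on, and rejects a graph that contains a cycle. It uses Kahn's algorithm as a generic technique; the function name and the dictionary representation are assumptions for illustration and are not Dryad's interface.

```python
from collections import deque


def topological_order(deps):
    """deps maps each task to the tasks it depends on; returns a valid run order."""
    tasks = set(deps)
    for prereqs in deps.values():
        tasks.update(prereqs)

    # Count unmet dependencies and record who is waiting on whom.
    remaining = {t: len(set(deps.get(t, ()))) for t in tasks}
    dependants = {t: [] for t in tasks}
    for task, prereqs in deps.items():
        for p in set(prereqs):
            dependants[p].append(task)

    ready = deque(sorted(t for t in tasks if remaining[t] == 0))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for d in sorted(dependants[task]):
            remaining[d] -= 1
            if remaining[d] == 0:        # all inputs available: task can run
                ready.append(d)

    if len(order) != len(tasks):
        # Some tasks never became ready, so they sit on a cycle.
        raise ValueError("task graph contains a cycle")
    return order
```

A loop such as "repeat until converged" would need an edge from a later task back to an earlier one, which is exactly the cycle this check rejects.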

Architecture
[Figure labels: Driver program, Code, Results]
while (!converged)
    do work in parallel;

Existing systems
[Figure labels: Driver program; Code and Results, repeated four times]
while (…)
    submitJob();

Existing systems
[Figure labels: Driver program, Code, Results]
while (…)
    submitJob();

Skywriting
[Figure labels: Code, Results]
while (…)
    doStuff();
•  Turing-complete language for job specification
•  Whole job executed on the cluster

Spawning a Skywriting task
function f(arg1, arg2) { … }

result = spawn(f, [arg1, arg2]);

// result is a “future”
value_of_result = *result;
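For readers more familiar with Python, the spawn-and-dereference pattern is loosely analogous to submitting work to an executor and later asking the returned future for its value. The sketch below is an analogy only: it runs on local threads rather than a cluster, the body of `f` is a hypothetical placeholder, and `spawn_and_dereference` is an illustrative name.

```python
from concurrent.futures import ThreadPoolExecutor


def f(arg1, arg2):
    # Hypothetical task body; the slide leaves it elided.
    return arg1 + arg2


def spawn_and_dereference(arg1, arg2):
    with ThreadPoolExecutor(max_workers=1) as pool:
        result = pool.submit(f, arg1, arg2)   # roughly: result = spawn(f, [arg1, arg2]);
        # result is a future; asking for its value blocks until the task finishes.
        value_of_result = result.result()     # roughly: value_of_result = *result;
        return value_of_result
```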

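Building a task graph
function f(x, y) { … }
function g(x, y) { … }
function h(x, y) { … }

a = spawn(f, [7, 8]);
b = spawn(g, [a, 0]);
c = spawn(g, [a, 1]);
d = spawn(h, [b, c]);
return d;

[Figure: task graph in which f produces a, two g tasks consume a and produce b and c, and h consumes b and c to produce d]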
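The same diamond-shaped graph can be imitated in Python by passing futures as arguments. This is a sketch under several assumptions: the bodies of `f`, `g` and `h` are hypothetical, the `spawn` helper is illustrative, and arguments that are futures are resolved automatically before the task body runs, which may differ from how Skywriting treats them.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def resolve(value):
    """Return the value of a future, or the value itself if it is not a future."""
    return value.result() if isinstance(value, Future) else value


def spawn(pool, fn, args):
    """Illustrative spawn: run fn once all future-valued arguments are available."""
    return pool.submit(lambda: fn(*[resolve(a) for a in args]))


# Hypothetical task bodies; the slide leaves them elided.
def f(x, y):
    return x + y


def g(x, y):
    return x * 10 + y


def h(x, y):
    return (x, y)


def build_and_run():
    with ThreadPoolExecutor(max_workers=4) as pool:
        a = spawn(pool, f, [7, 8])
        b = spawn(pool, g, [a, 0])   # depends on a
        c = spawn(pool, g, [a, 1])   # depends on a, independent of b
        d = spawn(pool, h, [b, c])   # depends on b and c
        return d.result()
```

Each task here waits only on tasks submitted before it, so the pool always has a task that can make progress.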

Iterative algorithm
current = …;
do {
    prev = current;
    a = spawn(f, [prev, 0]);
    b = spawn(f, [prev, 1]);
    c = spawn(f, [prev, 2]);
    current = spawn(g, [a, b, c]);
    done = spawn(h, [current]);
} while (!*done);

Iterative algorithm
[Figure: task graph of successive iterations, each with three f tasks feeding g and h]
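A data-dependent loop of this shape can be approximated in Python as follows. The task bodies are hypothetical stand-ins chosen so the loop converges (each round maps x to x/2 + 1, whose fixed point is 2), and, unlike the Skywriting version, this sketch resolves every future in the driving function instead of passing futures between tasks.

```python
from concurrent.futures import ThreadPoolExecutor


def f(prev, i):
    # Hypothetical partial update computed by one parallel task.
    return prev / 2 + i


def g(a, b, c):
    # Hypothetical combiner: the mean of the three parts equals prev / 2 + 1.
    return (a + b + c) / 3


def h(current):
    # Hypothetical convergence test: has x -> x / 2 + 1 stopped changing current?
    return abs(current - (current / 2 + 1)) < 1e-6


def iterate(initial):
    current = initial
    with ThreadPoolExecutor(max_workers=3) as pool:
        while True:
            prev = current
            parts = [pool.submit(f, prev, i) for i in range(3)]   # three f tasks in parallel
            a, b, c = (p.result() for p in parts)
            current = pool.submit(g, a, b, c).result()
            done = pool.submit(h, current).result()               # roughly: *done
            if done:
                return current
```

The number of iterations is not known in advance, which is the property a static acyclic task graph cannot express.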

Aside: recursive algorithm
function f(x) {
    if (/* x is small enough */) {
        return /* do something with x */;
    } else {
        x_lo = /* bottom half of x */;
        x_hi = /* top half of x */;
        return [spawn(f, [x_lo]),
                spawn(f, [x_hi])];
    }
}
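One way to mimic this recursive pattern in Python is for each task to return either a value or a list of futures for its two halves, without waiting on them. The sketch below makes assumptions the slide does not state: "small enough" means at most `threshold` elements, "do something" means summing them, and the helper names `make_f`, `flatten` and `recursive_sum` are illustrative.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def make_f(pool, threshold):
    def f(x):
        if len(x) <= threshold:           # x is small enough
            return sum(x)                 # hypothetical "do something with x"
        mid = len(x) // 2
        # Spawn both halves and return the futures without blocking on them.
        return [pool.submit(f, x[:mid]), pool.submit(f, x[mid:])]
    return f


def flatten(value):
    """Resolve nested futures and lists into a flat list of leaf results."""
    if isinstance(value, Future):
        return flatten(value.result())
    if isinstance(value, list):
        return [leaf for item in value for leaf in flatten(item)]
    return [value]


def recursive_sum(data, threshold=2):
    # threshold must be at least 1 so that every split makes progress.
    with ThreadPoolExecutor(max_workers=4) as pool:
        f = make_f(pool, threshold)
        return sum(flatten(pool.submit(f, data)))
```

Because tasks never block inside the pool, a small fixed number of worker threads is enough however deep the recursion goes; only the caller waits on results.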

Performance case studies
•  All experiments used Amazon EC2
   – m1.small instances, running Ubuntu 8.10
•  Microbenchmark
•  Smith-Waterman

Job creation overhead
[Figure: overhead (seconds, axis 0 to 60) against number of workers (axis 0 to 100), with series for Hadoop and Skywriting]

Smith-Waterman data flow

Parallel Smith-Waterman
[Figure: time (seconds, axis 0 to 350) against number of tasks (axis ticks at 1, 10, 100, 1000, 10000)]

Future work: manycore
•  Trade-offs are different
   – Centralised master may become a bottleneck
   – Switch to local work-queues and work-stealing
   – Distributed scoreboards for futures
   – Optimised interpreter/compilation?
•  Multi-scale hybrid approaches
   – Multiple cores
   – Multiple machines
   – Multiple clouds...

Future work: message-passing
•  Language designed for batch processing
   – However, batches may be infinitely long!
•  Can we apply it to streaming data?
   – Log files
   – Financial reports
   – Sensor data
•  Can we include data dependencies?
   – High- and low-frequency streams

Conclusions
•  Turing-complete programming language for cloud computing
•  Runs real jobs with low overhead
•  Lots more still to do!

Questions?
•  Email
   –  Derek.Murray@cl.cam.ac.uk
•  Project website
   –  http://www.cl.cam.ac.uk/netos/skywriting/
