Ph.D. Defense

Nov 01, 2014



These slides are from my Ph.D. defense at the University of California, Santa Barbara. They describe the research tools we contributed to advance how science is performed on cloud systems.

  • Automated Configuration and Deployment of Applications in Heterogeneous Cloud Environments. Chris Bunch, Ph.D. Defense, November 30, 2012

  • Public Cloud Computing
      • Utility-oriented approach to computing
      • Pay for only resources that you use
      • Rent resources from large datacenters maintained by Amazon, Microsoft, Google
      • Don't maintain a rack in your office - just use somebody else's rack
  • Using the Cloud for Apps
      • Cloud services have seen uptake in:
          • Web services domain
          • High performance computing
          • General-purpose applications
  • Challenges in Cloud Adoption
      • Primary barriers to entry:
          • Wide array of services
          • Varying cost models
          • Many technologies providing APIs
  • Plethora of Services
      • Storage Services
      • Queue Services
      • Compute Services
      • Fully Managed Software Stacks
          • Web services only
          • MapReduce only
  • Varying Cost Models
      • Unlimited usage per hour (EC2)
      • Unlimited usage per wall-clock hour (Azure)
      • First 15 minutes, then charge per minute (App Engine)
      • Meter per API call (SQS, App Engine)
  • Accessing Services via APIs
      • Need an API to connect your application to the cloud service
      • First-party native libraries, per-language
          • Typically only for popular languages
      • Cross-language serialization services
          • Convert from your language to popular language
  • Thesis Question
      • How can we enable applications to be executed on cloud systems, by automatically configuring and deploying applications across cloud offerings that vary based on the type of service offered, cost model employed, and APIs via which services are exposed?
  • Our Solution
      • Provide research tools to execute computationally intensive applications
      • Automatically configure and deploy applications for use with cloud services
      • Programming language support, to facilitate expressive workflows
  • Design Space
        Domain                      Language / Platform Support
        Web Services                AppScale (IEEE CLOUD '10)
        High Performance Computing  Neptune (ScienceCloud '10, DataCloud '12)
        General Purpose             MEDEA (IPDPS '13)*, Exodus (CCGrid '13)*
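The time-based billing styles listed under "Varying Cost Models" can give different charges for the same job. The sketch below is only an illustration: the rounding rules are simplified assumptions rather than the providers' actual billing logic, and every method name is hypothetical. The per-API-call model is left out.

```ruby
# Simplified, assumed billing rules for illustration only.

# Per-hour billing: every started hour of usage is charged in full.
def billed_hours_per_hour(minutes_used)
  (minutes_used / 60.0).ceil
end

# Per-wall-clock-hour billing: every clock hour the job overlaps is charged,
# so the minute within the hour at which the job starts (0..59) matters.
def billed_hours_wall_clock(start_minute, minutes_used)
  ((start_minute + minutes_used) / 60.0).ceil
end

# Minimum-then-per-minute billing: an assumed 15-minute minimum,
# after which each additional minute is charged.
def billed_minutes_with_minimum(minutes_used, minimum = 15)
  [minutes_used, minimum].max
end

job_minutes = 20
per_hour     = billed_hours_per_hour(job_minutes)        # one started hour
wall_clock   = billed_hours_wall_clock(50, job_minutes)  # job crosses an hour boundary
with_minimum = billed_minutes_with_minimum(5)            # short job pays the minimum
```

Under these assumptions, a 20-minute job that starts at minute 50 of an hour is billed for one hour per-hour but for two wall-clock hours, which is the kind of difference that makes clouds hard to compare by hand.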
  • Cloud Computing
      • Three tiers of abstraction:
          • Infrastructure: Scalable hardware
          • Platform: Scalable software stack
          • Software: Scalable applications
  • PaaS for Science
      • Need a cloud that is extensible to:
          • Services from competing cloud vendors
          • Differing cost models from each cloud
          • Varying APIs offered by cloud vendors
      • And it must be open source!
  • Introducing AppScale
      • An open source implementation of the Google App Engine APIs
      • Deploys over Amazon EC2 or Eucalyptus
      • Configures and deploys automatically
      • User only needs to specify the number of nodes to run over
  • One-Button Deployment
  • Limitations
      • Recipes are statically defined
      • Limited to three-tier web applications
      • Runtime environment is restricted to enable autoscaling
      • Not cost-aware
  • HPC in the Cloud
      • Easy access to vast resources
      • Hard to automatically configure and deploy libraries
      • Requires in-depth knowledge of each technology required
      • Hard to get performance on opaque cloud
      • Wide range of APIs for similar services
  • Introducing Neptune
      • A domain specific language for running HPC applications
      • Supports MPI, UPC, X10 programs
      • Configures and deploys automatically
      • Scientists need only specify the number of nodes to execute over
  • One-Button Deployment
  • Language-Based Deployment
        neptune :type => :mpi,
                :code => "/home/user/cpi",
                :nodes_to_use => 32,
                :output => "/output/cpi-32"
  • Automated Application Execution
      • Calls to neptune() are translated into SOAP messages, dispatched to AppScale
      • AppScale pulls in library support that details how to run each type of job
      • Acquires nodes, runs job, saves output
      • Cost awareness for VMs
  • Limitations
      • Recipes for each framework are static
      • Must be pre-defined by an expert user
      • Software must be pre-installed on VMs
      • Metadata not easily accessible
      • Limited by underlying hardware
  • Problem Domain
      • Easy access to vast resources
      • Hard to automatically configure and deploy
      • Hard to evaluate services because of:
          • The abstractions they expose
          • The cost model they charge with
          • Varying APIs for each language
  • Introducing MEDEA
      • Extends Neptune to provide an execution model for applications
      • Abstract away compute, storage, queue services via a common interface
      • Automatically manage cost for the user
      • Automatically connect competing APIs
  • High-Level Design
      • Scripting language support
          • Maximizes flexibility and interoperability with other code
      • Deployment engine (PaaS layer)
          • Automatically configure and deploy applications over cloud services
  • System Design
  • Scripting Language Support
      • Extends the Neptune DSL
      • Adds a function call, medea()
      • Users specify code, inputs, services to use
      • (M)essages the Deployment Service with this data, called a task
  • n-body in AWS
        result = medea(
          :executable => "python",
          :code => "/home/user/",
          :compute => "ec2",
          :storage => "s3",
          :queue => "sqs")
        puts result.stdout
  • n-body in Azure
        result = medea(
          :executable => "python",
          :code => "/home/user/",
          :compute => "azure-compute",
          :storage => "azure-storage",
          :queue => "azure-queue")
        puts result.stdout
  • Deployment Engine
      • Extends the AppScale PaaS
      • Consists of two new services:
          • Task Manager: Manages workers, tasks
          • Task Worker: Executes tasks
      • Receives task (M)essages from Scripting Language Support, code, inputs to execute
  • (E, D) Queue services
      • Task Manager (E)nqueues the task
      • Task Workers (D)equeue tasks
      • Both use a QueueInterface, requiring:
          • FIFO queue: Push / Pop / Queue Length
      • Supports Amazon, Google, Microsoft, VMWare Queues
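The QueueInterface named on the "(E, D) Queue services" slide requires push, pop, and queue length on a FIFO queue. A minimal sketch of what such an abstraction could look like follows. It is not MEDEA's actual code: the method names and the in-memory backend are assumptions made for illustration, and a real backend would wrap a cloud queue service instead.

```ruby
# Hypothetical sketch of a queue abstraction. Only the operations
# (push, pop, queue length) come from the slide; all names are assumed.
module QueueInterface
  def push(item)
    raise NotImplementedError, "#{self.class} must implement push"
  end

  def pop
    raise NotImplementedError, "#{self.class} must implement pop"
  end

  def length
    raise NotImplementedError, "#{self.class} must implement length"
  end
end

# In-memory stand-in for a cloud queue service (illustration only).
class InMemoryQueue
  include QueueInterface

  def initialize
    @items = []
  end

  # Adds an item at the back of the queue.
  def push(item)
    @items.push(item)
    self
  end

  # FIFO: removes and returns the oldest item, or nil when empty.
  def pop
    @items.shift
  end

  def length
    @items.length
  end
end

# A manager enqueues tasks; a worker dequeues them in arrival order.
queue = InMemoryQueue.new
queue.push({ :code => "task-a" })
queue.push({ :code => "task-b" })
first = queue.pop
```

Because callers depend only on the three operations, swapping one queue service for another would mean adding a class, not changing the manager or worker code.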
  • (E) Compute services
      • Task Manager spawns Task Workers in the specified clouds
      • Task Workers (E)xecute tasks
      • Follows ComputeInterface, requiring:
          • dispatch_task / get_task_status / get_task_results
      • Supports EC2, Eucalyptus, App Engine, Azure
  • (A) Storage services
      • Task Workers store the following outputs:
          • Standard output of job
          • Standard error of job
          • Metadata
      • User's script (A)ccesses result of job
      • Supports S3, App Engine, Azure, and AppScale datastores (HBase, Cassandra, etc.)
  • Use Cases
      • Execute scientific apps and share the results
      • Execute quickly (but expensively)
      • Execute inexpensively (but slowly)
      • Community cloud for benchmarking programming language performance
  • Scientific Use Cases
      • Computational systems biology application
      • Simulates conditions found in yeast
      • Written in Python, Java
      • Deploy to EC2, App Engine, Azure
      • All values are the average of five runs
  • Scientific App Execution
  • Scientific App Execution
  • Polyglot Science
      • Implementations of the n-body application in eleven programming languages
      • Execute with Amazon EC2, SQS, and S3
      • Measure time taken to execute, cost
      • All values are the average of ten runs
  • n-body benchmark
  • n-body in Amazon EC2
        Language  Per-Second Cost
        C         $0.0069
        Java      $0.0075
        Python    $0.5876
        Ruby      $2.1944
        Scala     $0.0075
  • n-body across clouds
        Cloud                Cost To Execute
        Amazon EC2           $0.32
        App Engine (Java)    $0.0013
        App Engine (Python)  $0.0049
        Microsoft Azure      $0.02
  • Related Work
      • Pegasus / Swift (WORKS '11)
      • YCSB (SOCC '10), YCSB++ (SOCC '11)
      • Elastisizer (SOCC '11)
      • Condor / StratUm (BIOINFORMATICS '12)
      • AME (WORKS '11)
      • Google App Engine Pipeline API
  • Review
      • MEDEA automatically configures and deploys applications, over multiple clouds
      • Abstracts away cloud compute, storage, and queue services from the user
      • Extensible to support other clouds
      • Programming language support to enable Turing-complete workflows
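The ComputeInterface from the "(E) Compute services" slide names three operations: dispatch_task, get_task_status, and get_task_results. The sketch below shows one way a backend might satisfy them. It is an assumption-laden illustration, not the MEDEA implementation: the LocalCompute class, the status symbols, and the use of a Ruby lambda as the task are all hypothetical, and the task runs in-process rather than on a cloud VM.

```ruby
# Hypothetical local backend; only the three method names come from the slide.
class LocalCompute
  def initialize
    @tasks = {}
    @next_id = 0
  end

  # Runs the task immediately, records its outcome, and returns a task id.
  # The stored fields mirror the outputs listed on the storage slide
  # (standard output and standard error).
  def dispatch_task(task)
    id = (@next_id += 1)
    begin
      @tasks[id] = { :status => :finished, :stdout => task.call.to_s, :stderr => "" }
    rescue StandardError => e
      @tasks[id] = { :status => :failed, :stdout => "", :stderr => e.message }
    end
    id
  end

  # Returns :finished or :failed for a previously dispatched task.
  def get_task_status(id)
    @tasks.fetch(id)[:status]
  end

  # Returns the captured output of a previously dispatched task.
  def get_task_results(id)
    record = @tasks.fetch(id)
    { :stdout => record[:stdout], :stderr => record[:stderr] }
  end
end

compute = LocalCompute.new
ok_id  = compute.dispatch_task(lambda { 6 * 7 })
bad_id = compute.dispatch_task(lambda { raise "boom" })
```

A cloud-backed class with the same three methods could presumably be substituted without touching the code that dispatches tasks, which is the point of hiding each compute service behind one interface.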
  • Limitations
      • Does not intelligently schedule
      • Many different hardware profiles offered by compute services
      • Hard to use them effectively because of:
          • Opaque pricing models
          • Lack of Cost APIs
  • Introducing Exodus
      • An Application Programming Interface (API)
      • Determines how to optimally execute tasks, when optimal means:
          • Minimizing cost
          • Minimizing total execution time
          • User-defined functions
  • System Design
  • API Support
      • Adds a Neptune function call, exodus()
      • Users specify :optimize_for: cost, performance, or a user's function
      • Profiles code locally or remotely
      • Estimates time and cost to use each instance type at each number of machines
      • Constructs and executes tasks via MEDEA
  • example
        results = exodus(
          :clouds_to_use => [:AmazonEC2],
          :code => "/home/user/",
          :num_tasks => 1000,
          :optimize_for => :cost)
  • example
        results = exodus :clouds_to_use => [:AmazonEC2],
                         :code => "/home/user/",
                         :num_tasks => 1000,
                         :optimize_for => float func(t,c) {...}
  • Cost-Aware Science
      • Same app as evaluated with MEDEA
      • Computational systems biology application
      • Written in C
      • Try to optimize cost, performance, or a weighted average of the two
      • All values are the average of five runs
  • Time v. Cost
  • Related Work
      • RO-BURST (CCGrid 2012)
          • Cannot schedule a priori
      • Bicer, Chiu, and Agrawal (CCGrid 2012)
          • Cost-aware middleware for MapReduce
          • Java apps only, can budget based on time or cost
  • Review
      • Exodus automatically optimizes application deployment over multiple clouds
      • Extensible to support evolving use cases
      • Programming language support to enable Turing-complete problem descriptions
  • Contributions
      • AppScale cloud platform
      • Neptune programming language
      • MEDEA extensions to Neptune
      • Exodus extensions to MEDEA
      • In combination
  • Impact
      • Publications in peer-reviewed conferences
      • Best Paper award for Neptune at HPDC's ScienceCloud
      • All work done released as open source
      • >10,000 downloads of AppScale / Neptune
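The Exodus slides describe estimating time and cost for each instance type at each number of machines, then choosing according to :optimize_for (cost, performance, or a user's function of time and cost). The selection step might be sketched as follows. This is a guess at the shape of the logic, not Exodus itself: the Estimate struct, the choose_configuration method, the instance type names, and all figures are made up for illustration.

```ruby
# Hypothetical record of one profiled configuration (all values invented).
Estimate = Struct.new(:instance_type, :machines, :time_s, :cost_usd)

# Picks the estimate with the lowest score. optimize_for is :cost,
# :performance, or any object responding to call(time_s, cost_usd).
def choose_configuration(estimates, optimize_for)
  scorer =
    case optimize_for
    when :cost        then lambda { |_t, c| c }
    when :performance then lambda { |t, _c| t }
    else optimize_for # assumed to be a user-supplied callable
    end
  estimates.min_by { |e| scorer.call(e.time_s, e.cost_usd) }
end

estimates = [
  Estimate.new("small", 1, 3600.0, 0.10),
  Estimate.new("small", 4, 1000.0, 0.40),
  Estimate.new("large", 2, 600.0, 1.00)
]

cheapest = choose_configuration(estimates, :cost)
fastest  = choose_configuration(estimates, :performance)

# A user-defined function: an equal-weight blend of hours taken and dollars spent.
balanced = choose_configuration(estimates, lambda { |t, c| 0.5 * (t / 3600.0) + 0.5 * c })
```

With these invented numbers the three goals pick three different configurations, which mirrors the cost-versus-time trade-off the deck evaluates.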
  • Future Work
      • Autoscaling in conjunction with IaaS
      • Adaptive profiling for app execution
      • Cost-aware fault tolerance
      • Budgeting and deadlines for entire Exodus programs, across invocations to e