Running Scientific Workflow Applications on the Amazon EC2 Cloud
Bruce Berriman, NASA Exoplanet Science Institute, IPAC
Gideon Juve, Ewa Deelman, Karan Vahi, Gaurang Mehta, Information Sciences Institute, USC
Benjamin Berman, USC Epigenome Center
Phil Maechling, So Cal Earthquake Center
Clouds (Utility Computing)
Pay for what you use rather than purchase compute and storage resources that end up underutilized
Analogous to household utilities
Originated in the business domain to provide services for small companies who did not want to maintain an IT Department
Provided by data centers that are built on compute and storage virtualization technologies.
Clouds are built with commodity hardware. They are a “new purchasing paradigm” rather than a new technology.
Benefits and Concerns
Benefits
Pay only for what you need
Elasticity - increase or decrease capacity within minutes
Ease strain on local physical plant
Control local system administration costs
Concerns
What if they become oversubscribed and users cannot increase capacity on demand?
How will the cost structure change with time?
If we become dependent on them, will we be at the cloud providers’ mercy?
Are clouds secure?
Are they up to the demands of science applications?
Cloud Providers
Pricing structures vary widely
Amazon EC2 charges for hourly usage
Skytap charges per month
IBM requires an annual subscription
Savvis offers servers for purchase
Uses
Running business applications
Web hosting
Provide additional capacity for heavy loads
Application testing
Provider
Amazon.com EC2
AT&T Synaptic Hosting
GNi Dedicated Hosting
IBM Computing on Demand
Rackspace Cloud Servers
Savvis Open Cloud
ServePath GoGrid
Skytap Virtual Lab
3Tera
Unisys Secure
Verizon Computing
Zimory Gateway
Source: InformationWeek, 9/4/09
Purposes of Our Study
How useful is cloud computing for scientific workflow applications?
An experimental study of the performance of three workflows with different I/O, memory and CPU requirements on a commercial cloud
A comparison of the performance of cloud resources and typical HPC resources, and
An analysis of the various costs associated with running workflows on a commercial cloud.
Clouds are well suited to processing workflows
Workflows are loosely coupled applications composed of tasks connected by data
Allocate resources as needed for processing tasks and decrease scheduling overheads
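To make "tasks connected by data" concrete, here is a minimal sketch of a workflow as a directed acyclic graph, using Python's standard `graphlib` module. The four task names are hypothetical and are not taken from any of the three applications in the study. The sketch groups tasks into levels whose members have no data dependencies on each other, so they could run in parallel on separately allocated resources.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks whose output it reads.
workflow = {
    "project_a": set(),
    "project_b": set(),
    "diff": {"project_a", "project_b"},
    "mosaic": {"diff"},
}

def execution_levels(dag):
    """Group tasks into levels; tasks within one level can run in parallel."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    levels = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose inputs are all available
        levels.append(ready)
        sorter.done(*ready)                 # mark them finished, unlocking successors
    return levels

levels = execution_levels(workflow)
print(levels)
```

A workflow manager does far more than this (data staging, retries, scheduling), but the level structure shows why capacity can be added or released as the workflow widens and narrows.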
Establish equivalent software environments on the two platforms
“Submit” host used to send jobs to EC2 or Abe
All workflows used the Pegasus Workflow Management System with DAGMan and Condor
Montage Performance (I/O bound)
Slowest on m1.small; fastest on the machines with the most cores: m1.xlarge, c1.xlarge, abe.lustre and abe.local.
The parallel file system on abe.lustre offers a big performance advantage for I/O-bound applications; to match it, cloud providers would need to offer parallel file systems and high-speed networks.
Virtualization overhead <10%
Broadband Performance (Memory bound)
Lower I/O requirements, so there is not much difference between abe.lustre and abe.local; both have 8 GB of memory. Only slightly worse performance on c1.xlarge, which has 7.5 GB of memory.
Poor performance on c1.medium, which has only 1.7 GB of memory. Cores may sit idle to prevent the system from running out of memory.
Virtualization overhead small
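The idle-core effect can be illustrated with a small sketch: the number of tasks that can run at once is capped by both the core count and the memory available per task. The memory figures come from the slide. The core counts (2 for c1.medium, 8 for c1.xlarge) and the 1 GB per-task footprint are assumptions made for illustration, not values from the presentation.

```python
def concurrent_tasks(cores, memory_gb, task_memory_gb):
    """Tasks that can run at once without exceeding available memory."""
    if task_memory_gb <= 0:
        raise ValueError("task_memory_gb must be positive")
    fits_in_memory = int(memory_gb // task_memory_gb)
    return min(cores, fits_in_memory)

# Assumed per-task footprint of 1 GB (hypothetical).
TASK_GB = 1.0

c1_medium = concurrent_tasks(cores=2, memory_gb=1.7, task_memory_gb=TASK_GB)
c1_xlarge = concurrent_tasks(cores=8, memory_gb=7.5, task_memory_gb=TASK_GB)
print(c1_medium, c1_xlarge)
```

Under these assumptions, c1.medium can keep only one of its two cores busy, which is the kind of idling the slide describes.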
Epigenome Performance (CPU Bound)
c1.xlarge, abe.lustre and abe.local give best performance – they are the three most powerful machines (64-bit, 2.3-2.6 GHz)
The parallel file system on abe.lustre offers little benefit.
Virtualization overhead is roughly 10%, the largest of the three applications, because the application competes with the OS for CPU.
Resource Cost Analysis
You get what you pay for!
The cheapest instances are the least powerful.
Instance     Cost ($/hr)
m1.small     0.10
m1.large     0.40
m1.xlarge    0.80
c1.medium    0.20
c1.xlarge    0.80
c1.medium is a good choice for Montage, but more powerful processors are better for the other two applications.
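As a rough sketch of how the hourly prices above translate into a per-run cost, the function below multiplies the instance rate by the billed hours. Rounding partial hours up to a whole hour is an assumption about the billing model, and the runtimes in the example are hypothetical, not measurements from the study.

```python
import math

# Hourly prices from the table above ($/hr).
HOURLY_RATE = {
    "m1.small": 0.10,
    "m1.large": 0.40,
    "m1.xlarge": 0.80,
    "c1.medium": 0.20,
    "c1.xlarge": 0.80,
}

def compute_cost(instance, runtime_hours, round_up=True):
    """Cost of one run; partial hours are billed as full hours if round_up."""
    billed_hours = math.ceil(runtime_hours) if round_up else runtime_hours
    return HOURLY_RATE[instance] * billed_hours

# Hypothetical runtimes: a cheap instance running long vs. a costly one running briefly.
slow_cheap = compute_cost("m1.small", 5.2)
fast_costly = compute_cost("c1.xlarge", 0.6)
print(f"m1.small: ${slow_cheap:.2f}  c1.xlarge: ${fast_costly:.2f}")
```

The comparison only becomes meaningful once measured runtimes are substituted: a faster instance can be cheaper overall if it finishes in proportionally fewer billed hours.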
Data Transfer Costs
Operation       Cost ($/GB)
Transfer In     0.10
Transfer Out    0.17
For Broadband and Epigenome, it is economical to transfer data out of the cloud.
For Montage, the output is larger than the input, so the costs to transfer data out are equal to or higher than the processing costs for all but one processing instance.
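The transfer charges can be worked through with a short sketch using the per-GB prices listed above. The input and output sizes in the example are hypothetical; they only illustrate a workflow whose output is larger than its input, as described for Montage.

```python
# Per-GB transfer prices from the table above ($/GB).
TRANSFER_IN_PER_GB = 0.10
TRANSFER_OUT_PER_GB = 0.17

def transfer_cost(input_gb, output_gb):
    """Return the inbound, outbound, and total transfer cost in dollars."""
    cost_in = input_gb * TRANSFER_IN_PER_GB
    cost_out = output_gb * TRANSFER_OUT_PER_GB
    return {"in": cost_in, "out": cost_out, "total": cost_in + cost_out}

# Hypothetical sizes: 4 GB staged in, 8 GB of products transferred out.
costs = transfer_cost(input_gb=4.0, output_gb=8.0)
print({name: round(value, 2) for name, value in costs.items()})
```

Because outbound data is both larger and priced higher per GB in this example, the outbound charge dominates the total.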
Assume 1,000 2MASS mosaics of 4 sq deg, centered on M17, per month for 3 years. Assume a c1.medium processor on Amazon EC2.
Conclusions
Clouds can be used effectively and fairly efficiently for scientific applications. The virtualization overhead is low.
The high speed network and parallel file systems give HPC clusters a significant performance advantage over cloud computing for I/O bound applications.
On Amazon EC2, the primary cost for Montage is data transfer. Processing is the primary cost for Broadband and Epigenome.
Amazon EC2 offers no dramatic cost benefits over a locally mounted image-mosaic service.
Reference: G. Juve, E. Deelman, K. Vahi, G. Mehta, B. Berriman, B. P. Berman,
and P. Maechling, "Scientific Workflow Applications on Amazon EC2," in
Cloud Computing Workshop in Conjunction with e-Science, Oxford, UK: