Researchers at Clemson University assigned a student summer intern to explore bioinformatics cloud solutions that leverage MPI, the OrangeFS parallel file system, AWS CloudFormation templates, and a cluster scheduler. The result was an AWS cluster that runs bioinformatics code optimized using MPI-IO. We give an overview of the process and show how easy it is to create clusters in AWS.
An MPI-IO Cloud Cluster Bioinformatics Summer Project
Brandon Posey, Dougal Ballantyne, Boyd Wilson
November 13, 2013
Filesystems on AWS
What filesystems *MUST* you use on AWS?
The one that meets the needs of your unique application!
Some things to consider:
• Total amount of storage required?
• Resilience required?
• Expected number of clients?
• Locality of servers and clients?
• Average file sizes? (KB, MB, GB, TB)
• Block sizes used by applications?
• IO profile? Read/Write %?
• Typical IO use case?
Filesystems on AWS are all about building blocks!
Building Blocks
• Amazon Elastic Compute Cloud (Amazon EC2)
  – 1 ECU to 88 ECU of compute power
  – 613MB to 240GB of memory
  – Networking: shared, EBS-optimized, or dedicated 10Gb
• Ephemeral disk (instance storage)
  – 150GB to 3360GB per instance
  – HDD and SSD
  – FREE! (part of instance cost)
• Amazon Elastic Block Store (Amazon EBS)
  – 1GB to 1000GB per volume
  – Standard and Provisioned IOPS
  – Multiple volumes per instance
  – Supports snapshots to Amazon S3
[Figure: Amazon EBS and ephemeral disk]
Storage-optimized EC2 instances
http://aws.amazon.com/ec2/instance-types/
"This family includes the HI1 and HS1 instance types, and provides you with Intel Xeon processors and direct-attached storage options optimized for applications with specific disk I/O and storage capacity requirements."
• HI1 instances feature SSD storage
• HS1 instances feature direct-attached HDD storage
Amazon EBS-optimized instances
http://aws.amazon.com/ebs/
"To enable your Amazon EC2 instances to fully utilize the IOPS provisioned on an EBS volume, you can launch selected Amazon EC2 instance types as “EBS-Optimized” instances."
Temporary Storage
• Local ephemeral disk for scratch
• Distributed filesystem for high-performance scratch
  – OrangeFS
  – Lustre
  – Ceph
• Pull data from Amazon S3
How much?
• With Amazon S3, you pay for what you use
• With Amazon EBS, you pay for what you provision
• Keeping data in Amazon S3 and pulling only what is needed helps manage cost
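To make the cost trade-off concrete, here is a back-of-the-envelope sketch comparing "everything on provisioned EBS" with "everything in S3, plus EBS only for the working set". The per-GB-month rates are hypothetical placeholders chosen for illustration, not actual AWS prices, and the model ignores request and data-transfer charges.

```python
# Back-of-the-envelope monthly storage cost. All rates are HYPOTHETICAL
# placeholders (not real AWS prices); request and transfer charges are ignored.


def cost_all_on_ebs(total_gb, ebs_rate):
    """EBS is billed on provisioned size, so the whole dataset is paid for."""
    return total_gb * ebs_rate


def cost_s3_plus_ebs_scratch(total_gb, working_set_gb, s3_rate, ebs_rate):
    """S3 holds the full dataset (pay for use); EBS is provisioned only for the working set."""
    if working_set_gb > total_gb:
        raise ValueError("working set cannot exceed the total dataset")
    return total_gb * s3_rate + working_set_gb * ebs_rate


if __name__ == "__main__":
    HYPOTHETICAL_S3_RATE = 0.03   # placeholder, USD per GB-month
    HYPOTHETICAL_EBS_RATE = 0.10  # placeholder, USD per GB-month
    total_gb, working_gb = 10_000, 500
    print("all on EBS:      ", cost_all_on_ebs(total_gb, HYPOTHETICAL_EBS_RATE))
    print("S3 + EBS scratch:", cost_s3_plus_ebs_scratch(
        total_gb, working_gb, HYPOTHETICAL_S3_RATE, HYPOTHETICAL_EBS_RATE))
```

Whatever the real rates are, the second strategy wins whenever the working set is a small fraction of the dataset and S3 storage is cheaper per GB than provisioned EBS.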
How fast?
• Ephemeral storage can deliver up to 2.2GB/sec
  – more instances == more throughput
• Amazon EBS volumes support up to 4000 IOPS each
  – more volumes == more IOPS
• Amazon S3 scales horizontally
  – more clients == more throughput
  – more connections == more throughput
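One common way to get "more connections == more throughput" from Amazon S3 is to split an object into byte ranges and fetch them concurrently. The sketch below assumes a caller-supplied `fetch_range` function (hypothetical, e.g. an HTTP GET carrying a `Range` header); it is not part of any AWS SDK.

```python
# Split an object into contiguous byte ranges and fetch them concurrently.
# `fetch_range` is a HYPOTHETICAL caller-supplied function that takes an HTTP
# Range header value such as "bytes=0-1023" and returns those bytes.
from concurrent.futures import ThreadPoolExecutor


def byte_ranges(size, parts):
    """Return Range header values ("bytes=start-end", end inclusive) covering `size` bytes."""
    if size <= 0 or parts <= 0:
        raise ValueError("size and parts must be positive")
    parts = min(parts, size)          # never create empty ranges
    base, extra = divmod(size, parts)
    ranges, start = [], 0
    for i in range(parts):
        length = base + (1 if i < extra else 0)  # spread the remainder over the first ranges
        end = start + length - 1
        ranges.append(f"bytes={start}-{end}")
        start = end + 1
    return ranges


def parallel_get(fetch_range, size, parts, max_workers=8):
    """Fetch every range on its own connection/thread and reassemble in order."""
    ranges = byte_ranges(size, parts)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        chunks = list(pool.map(fetch_range, ranges))  # map() preserves input order
    return b"".join(chunks)
```

The same reasoning applies to the other rows of the slide: aggregate throughput grows roughly with the number of independent streams (instances, EBS volumes, or S3 connections) until some other component becomes the bottleneck.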
Making filesystems persist • Use Amazon EBS for block storage • Use Amazon EBS snapshots for recovery • Use a replicated distributed filesystem
Automating deployments • AWS CloudFormation • Drive storage through parameters • Easy to set up and tear down • Track template changes in SCM
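As an illustration of driving storage through template parameters, the sketch below emits a minimal CloudFormation template in which the size of an EBS scratch volume is a stack parameter. The logical names (`ScratchVolumeSize`, `VolumeZone`, `ScratchVolume`) are hypothetical choices, not taken from the project's templates.

```python
# Emit a minimal AWS CloudFormation template whose EBS scratch-volume size is a
# stack parameter. Logical names are HYPOTHETICAL, chosen for illustration.
import json


def scratch_volume_template(default_gb=100, max_gb=1000):
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Parameterized EBS scratch volume (illustrative sketch)",
        "Parameters": {
            "ScratchVolumeSize": {
                "Type": "Number",
                "Default": str(default_gb),
                "MinValue": 1,
                "MaxValue": max_gb,
                "Description": "Size of the EBS scratch volume in GB",
            },
            "VolumeZone": {
                "Type": "AWS::EC2::AvailabilityZone::Name",
                "Description": "Must match the Availability Zone of the instance that attaches it",
            },
        },
        "Resources": {
            "ScratchVolume": {
                "Type": "AWS::EC2::Volume",
                "Properties": {
                    "AvailabilityZone": {"Ref": "VolumeZone"},
                    "Size": {"Ref": "ScratchVolumeSize"},
                },
            },
        },
        "Outputs": {
            # Ref on an AWS::EC2::Volume resolves to the volume ID.
            "ScratchVolumeId": {"Value": {"Ref": "ScratchVolume"}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(scratch_volume_template(), indent=2))
```

Saved as `scratch.json`, such a template could be launched with `aws cloudformation create-stack --stack-name scratch-demo --template-body file://scratch.json --parameters ParameterKey=ScratchVolumeSize,ParameterValue=500 ParameterKey=VolumeZone,ParameterValue=us-east-1a` and torn down with `aws cloudformation delete-stack --stack-name scratch-demo`. Because the generator is plain code, it can live in SCM next to the templates it produces.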
Solutions on AWS • OrangeFS from Omnibond
• Red Hat Storage 2.0
• Intel Cloud Edition Lustre - Private Beta
Customer presentation
RNA-Seq Differential Gene Expression Workflow
Clemson University professor Dr. Alex Feltus had been discussing with Eddie Duffy and Dr. Barr Von Oehsen how to optimize the Gene Expression Workflow. As a result, a summer project with Brandon Posey was started to carry out this optimization in the AWS cloud. The FastQ steps were the longest-running stages of the workflow, so that is where the optimization started.
*Workflow chart provided with permission from Allele Systems (www.allelesystems.com)
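Parallelizing a FastQ step with MPI-IO typically means giving each rank its own byte range of the input file, with every range starting on a record boundary. The sketch below shows one way to compute such ranges; it assumes standard 4-line FASTQ records and works on an in-memory `bytes` object, whereas a real MPI-IO program would have each rank read its range from the shared file (for example with `MPI_File_read_at`). It is an illustration, not the project's code.

```python
# Partition FASTQ data into per-rank byte ranges that start on record
# boundaries. Assumes 4-line records (header, sequence, '+', quality).


def next_record_start(data, pos):
    """Return the first offset >= pos where a FASTQ record header begins."""
    size = len(data)
    if pos <= 0:
        return 0
    # Advance to the start of a line unless pos already sits on one.
    if data[pos - 1:pos] != b"\n":
        newline = data.find(b"\n", pos)
        if newline == -1:
            return size
        pos = newline + 1
    while pos < size:
        line_end = data.find(b"\n", pos)
        if line_end == -1:
            return size
        # Quality lines may also begin with '@', so only accept a header if
        # the line after the next one is the '+' separator.
        if data[pos:pos + 1] == b"@":
            seq_end = data.find(b"\n", line_end + 1)
            if seq_end != -1 and data[seq_end + 1:seq_end + 2] == b"+":
                return pos
        pos = line_end + 1
    return size


def split_points(data, num_ranks):
    """Return num_ranks + 1 offsets; rank r processes data[points[r]:points[r + 1]]."""
    size = len(data)
    points = [0]
    for rank in range(1, num_ranks):
        points.append(next_record_start(data, rank * size // num_ranks))
    points.append(size)
    return points
```

Splitting by byte offset rather than by record count lets each rank find its starting point by scanning only a few lines near its offset, instead of reading the whole file first.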
OrangeFS – Scalable Parallel File System on AWS
Available on the AWS Marketplace and brought to you by Omnibond
[Figure: OrangeFS instances forming a unified high-performance file system, with Amazon DynamoDB and Amazon EBS volumes]
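A parallel file system such as OrangeFS gets its scalability by spreading each file's data across several servers. The sketch below models simple round-robin striping: the file is cut into fixed-size strips dealt out to the servers in turn. OrangeFS's default "simple stripe" distribution works along these lines, but the strip size and server count used here are assumed example values, and the real layout details may differ.

```python
# Round-robin striping model: fixed-size strips are assigned to servers in
# turn. STRIP and SERVERS below are ASSUMED example values.


def locate(offset, strip_size, num_servers):
    """Map a logical file offset to (server index, offset within that server's data)."""
    strip = offset // strip_size
    server = strip % num_servers
    local = (strip // num_servers) * strip_size + offset % strip_size
    return server, local


def bytes_per_server(offset, length, strip_size, num_servers):
    """Count how many bytes of the read [offset, offset + length) each server supplies."""
    counts = [0] * num_servers
    pos, end = offset, offset + length
    while pos < end:
        server, _ = locate(pos, strip_size, num_servers)
        strip_end = (pos // strip_size + 1) * strip_size
        chunk = min(end, strip_end) - pos  # stop at the strip boundary or end of read
        counts[server] += chunk
        pos += chunk
    return counts


if __name__ == "__main__":
    STRIP = 64 * 1024  # assumed strip size
    SERVERS = 4        # assumed number of OrangeFS servers
    print(bytes_per_server(0, 1024 * 1024, STRIP, SERVERS))
```

A large read is served by every server at once, while a small read stays on one server; this is why adding OrangeFS instances (and their EBS volumes) raises aggregate bandwidth for large sequential I/O such as MPI-IO workloads.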
Cloud Cluster Built using AWS, Torque/Maui, OrangeFS
[Figure: cluster architecture showing OrangeFS WebDAV and Torque / Maui]
Optimization Areas
• Data is uploaded and retrieved via the OrangeFS WebDAV interface
• MPI jobs are submitted via the Torque resource manager and Maui scheduler
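Both optimization areas can be sketched from the client side with the Python standard library: build a WebDAV `PUT` that uploads an input file to OrangeFS, and render a Torque batch script that launches an MPI program. The server URL, paths, job name, and program name are hypothetical placeholders, and the script layout is one plausible form rather than the project's actual job script.

```python
# (1) Build a WebDAV PUT request for uploading input data, and (2) render a
# Torque/PBS script for an MPI job. URL, paths, and program are HYPOTHETICAL.
import urllib.request


def webdav_upload_request(base_url, remote_path, payload):
    """Build (but do not send) an HTTP PUT that stores `payload` at `remote_path`."""
    url = base_url.rstrip("/") + "/" + remote_path.lstrip("/")
    request = urllib.request.Request(url, data=payload, method="PUT")
    request.add_header("Content-Type", "application/octet-stream")
    return request


def torque_mpi_script(job_name, nodes, ppn, walltime, command):
    """Return a batch script requesting nodes x ppn cores and launching `command` under MPI."""
    return "\n".join([
        "#!/bin/bash",
        f"#PBS -N {job_name}",
        f"#PBS -l nodes={nodes}:ppn={ppn}",
        f"#PBS -l walltime={walltime}",
        "cd $PBS_O_WORKDIR",  # start in the directory qsub was run from
        f"mpirun -np {nodes * ppn} {command}",
    ]) + "\n"


if __name__ == "__main__":
    req = webdav_upload_request("http://orangefs.example.com/webdav",
                                "/data/sample.fastq", b"@r1\nACGT\n+\nIIII\n")
    print(req.get_method(), req.full_url)  # send with urllib.request.urlopen(req)
    print(torque_mpi_script("fastq-mpi", 4, 8, "02:00:00", "./fastq_tool input.fastq"))
```

The generated script would be saved (for example as `job.pbs`) and submitted with `qsub job.pbs`; Torque queues it and Maui decides when and where it runs. With an MPI build that integrates with Torque, the explicit `-np` count may be unnecessary.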