Amazon Web Services: The Platform for Data Science Jamie Kinney [email protected]

Amazon Web Services: The Platform for Data Science · jkinney.s3.amazonaws.com/AWS_for_Scientists_Overview_ANL.pdf

Mar 31, 2018

Page 1:

Amazon Web Services:

The Platform for Data Science

Jamie Kinney [email protected]

Page 2:

The challenge…

On-premises infrastructure leads to static, lowest-common-denominator hardware…

…and either long lines for use or low utilization (or both!)

Page 3:

What’s at stake?

Page 4:

Scientific Community Requirements

- Computation on demand
  - A flexible, on-demand and cost-effective infrastructure
- Data Management
  - Handling growth of data
  - Long term storage
  - Data transfer between participants
- Data Analysis
  - A robust infrastructure
  - Scalability
  - Flexibility
- Reproducibility of results
  - A programmable infrastructure

Page 5:

AWS Platform

Your Applications

Building Block Services

Application Platform Services
• Content Distribution: Amazon CloudFront
• Messaging: Amazon SNS, Amazon SQS, Amazon SES
• Parallel Processing: Elastic MapReduce
• Libraries & SDKs: Java, PHP, Python, Ruby, .NET

Management & Administration
• Identity & Access: AWS IAM, Identity Federation, Consolidated Billing
• Web Interface: Management Console
• Monitoring: Amazon CloudWatch
• Deployment & Automation: AWS Elastic Beanstalk, AWS CloudFormation

Simple Workflow Service

Foundation Services
• Compute: Amazon EC2, Auto Scaling
• Storage: Amazon S3, Amazon EBS, Amazon Storage Gateway
• Database: Amazon RDS, Amazon SimpleDB, Amazon ElastiCache, Amazon DynamoDB
• Networking: Amazon VPC, Elastic Load Balancing, Amazon Route 53, AWS Direct Connect

AWS Global Infrastructure
• Regions
• Availability Zones
• Edge Locations

Page 6:

AWS Pace of Innovation

New Service Announcements & Updates (count per year)

2011: 74, including:
• AWS Oregon Region
• Elastic Beanstalk (Beta)
• Amazon SES (Beta)
• AWS CloudFormation
• Amazon RDS for Oracle
• AWS Direct Connect
• AWS GovCloud (US)
• Amazon ElastiCache
• VPC Virtual Networking
• VPC Dedicated Instances
• SMS Text Notification
• CloudFront Live Streaming
• AWS Tokyo Region
• SAP RDS on EC2
• SAP BO on EC2
• Win Srv 2008 R2 on EC2
• Win Srv 2003 VM Import
• Amazon S3 SSE

2010: 61, including:
• Amazon SNS
• Amazon CloudFront
• Amazon Route 53
• S3 Bucket Policies
• RDS Multi-AZ Support
• RDS Reserved Databases
• AWS Import/Export
• AWS IAM Beta
• AWS Singapore Region
• Cluster Instances for EC2
• Micro Instances for EC2
• Amazon Linux AMI
• Oracle Apps on EC2
• SUSE Linux on EC2
• VM Import for EC2

2009: 48, including:
• Amazon RDS
• Amazon VPC
• Amazon EMR
• EC2 Auto Scaling
• EC2 Reserved Instances
• EC2 Elastic Load Balancing
• AWS Import/Export
• AWS Mngmt Console
• Win Srv 2008 on EC2
• IBM Apps on EC2

2008: 24, including:
• Amazon SimpleDB
• Amazon CloudFront
• Amazon EBS
• EC2 Availability Zones
• EC2 Elastic IP Addresses

2007: 9, including:
• Amazon FPS
• Red Hat Enterprise on EC2

“AWS is extraordinarily innovative, exceptionally agile and very responsive to the market.”

And wait, there’s more!

Dec 2011 – Feb 2012

ElastiCache in 4 more AWS Regions

CloudFront & Route 53 in 3 new edge locations

S3 announces Multi-object delete

SES now supports SMTP

EMR supports Hadoop 0.20.205 and Pig 0.9.1

New AWS Region in Sao Paulo, Brazil

VPC adds multiple network interfaces

EMR support for cc2.8xlarge

S3 announces Object Expiration

SNS adds support for Delivery Policies

SNS adds support for Message Formatting

Direct Connect adds four new locations

AWS Free Usage Tier for Windows

Amazon DynamoDB

AWS IAM Identity Federation

AWS Storage Gateway

AWS Simple Workflow Service

March 2012

New m1.medium (2 ECUs and 3.75 GB RAM)

Lowered Reserved Instance pricing: EC2 by 37% and RDS by 42%

Page 7:

Global Infrastructure for Global Enterprises

AWS Regions
• US West (Northern California)
• US West (Oregon)
• US East (Northern Virginia)
• GovCloud (US ITAR Region)
• EU (Ireland)
• Asia Pacific (Singapore)
• Asia Pacific (Tokyo)
• South America (Sao Paulo)

AWS Edge Locations

Page 8:

AWS Regions and Availability Zones

Customer Decides Where Applications and Data Reside

Page 9:

Built to Enterprise & Gov’t Security Requirements

Security & Compliance Resources
• Security & Compliance Center: http://aws.amazon.com/security
• Security Overview & Best Practices
• AWS Risk & Compliance Whitepaper
• Creating HIPAA Compliant Applications

Hardware, Software & Network
• Systematic change management
• Phased updates deployment
• Safe storage decommission
• Automated monitoring and self-audit
• Advanced network protection systems

Certifications and Accreditations
• ISO 27001
• SSAE 16 / ISAE 3402 / SOC1 (formerly U.S. standard SAS-70 Type II)
• FISMA Moderate Controls; ITAR region
• HIPAA applications certified on AWS
• Payment Card Industry (PCI) Data Security Standard (DSS) Level 1
• DIACAP Controls

Physical
• Datacenters in nondescript facilities
• Physical access strictly controlled
• Must pass two-factor authentication at least twice for floor access
• Physical access logged and audited

Page 10:

Amazon Simple Storage Service (S3)

• Distributed, replicated object store
• 99.999999999% durability
• ~1 trillion objects and > 700,000 requests/second
• Store anything…pictures, XML docs, encrypted blobs
• You determine the AWS region, we replicate across AZs
• AWS Import/Export Service
• AWS Storage Gateway

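To make the durability figure on this slide concrete, here is a minimal arithmetic sketch. It assumes (the slide does not say so) that 99.999999999% is an annual, per-object figure and that object losses are independent. The function names are illustrative only.

```python
from fractions import Fraction

# 99.999999999% durability, taken from the slide. Treating it as an
# annual, per-object probability is an assumption of this sketch.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_losses(num_objects: int) -> Fraction:
    """Expected number of objects lost per year under the stated assumption."""
    return num_objects * (1 - DURABILITY)


def years_per_expected_loss(num_objects: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(num_objects)


if __name__ == "__main__":
    n = 10_000
    years = int(years_per_expected_loss(n))
    print(f"{n:,} stored objects: roughly one loss every {years:,} years")
```

Exact fractions are used instead of floats so that the eleven nines are not lost to rounding.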
Page 11:

Amazon Elastic Compute Cloud (EC2)

• Virtual machines running Windows or Linux
• Full Windows admin or Linux root privileges
• Instance types ranging from t1.micro to cc2.8xlarge
• HPC instances have 10 Gb full bisection bandwidth
• Ephemeral storage, Elastic Block Store and SSDs
• We constantly modernize our infrastructure

Page 12:

Amazon EC2 Pricing Models

• On-Demand
• Reserved Instances
  - Light
  - Moderate
  - Heavy
• Spot
• Dedicated Instances

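One way to choose between the On-Demand and Reserved models listed above is a break-even calculation. The sketch below is illustrative only: the upfront fee and hourly rates are hypothetical numbers in cents, not AWS prices, and the function name is an assumption.

```python
from fractions import Fraction

HOURS_PER_YEAR = 8_760


def break_even_hours(upfront_cents: int, on_demand_cents: int, reserved_cents: int) -> Fraction:
    """Hours of use after which a reservation becomes cheaper than On-Demand.

    Solves: on_demand * h = upfront + reserved * h
    """
    if reserved_cents >= on_demand_cents:
        raise ValueError("reserved hourly rate must be below the on-demand rate")
    return Fraction(upfront_cents, on_demand_cents - reserved_cents)


if __name__ == "__main__":
    # Hypothetical prices: $1,000 upfront, $0.50/h On-Demand, $0.25/h reserved.
    hours = break_even_hours(100_000, 50, 25)
    share = float(hours / HOURS_PER_YEAR)
    print(f"break-even after {int(hours)} hours ({share:.0%} of one year)")
```

If the expected utilization is below the break-even share, On-Demand (or Spot, for interruptible work) is likely the cheaper choice under these assumptions.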
Page 13:

Compute: Amazon EC2 Instances

Page 14:

cc2.8xlarge

• 2 × Intel Xeon E5-2670 (“Sandy Bridge” architecture)
• 16 cores w/ HT
• 60.5 GB RAM
• 3.4 TB disk
• HVM

Page 15:

hi1.4xlarge

• 2 × 1 TB SSD LUNs
• 16 cores
• 60.5 GB RAM
• 35 ECUs
• 10 Gigabit Ethernet

Page 16:

#42

Page 17:

Top-5 Pharma Client

Page 18:

Not just about high performance infrastructure

- Choose the right infrastructure best suited for your applications and pipelines

- A programmable infrastructure

- No longer bound by physical limits

- No queued jobs

- Experiment at scale
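As an illustration of what "a programmable infrastructure" can look like in practice, the sketch below requests a small cluster from code. It assumes boto3, the current AWS SDK for Python, which postdates these slides; the AMI ID and region passed in are placeholders chosen by the caller, and the helper names are hypothetical.

```python
def build_launch_request(ami_id: str, instance_type: str, count: int) -> dict:
    """Build keyword arguments for the EC2 RunInstances call."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return {
        "ImageId": ami_id,              # placeholder AMI supplied by the caller
        "InstanceType": instance_type,  # e.g. "cc2.8xlarge" from the slides
        "MinCount": count,
        "MaxCount": count,
    }


def launch(request: dict, region: str) -> list:
    """Start the instances and return their IDs.

    Needs boto3 and valid AWS credentials, and it incurs charges.
    """
    import boto3  # imported lazily so the request builder works without the SDK

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.run_instances(**request)
    return [instance["InstanceId"] for instance in response["Instances"]]


if __name__ == "__main__":
    # Only builds and prints the request; nothing is launched here.
    print(build_launch_request("ami-00000000", "cc2.8xlarge", 4))
```

Keeping the request builder separate from the call that spends money makes the cluster definition easy to review and to store alongside the analysis code, which also helps with reproducibility.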

Page 19:

12.7 Teraflops for < $35/hour!

Page 20:

Amazon VPC Architecture

[Diagram labels: Customer’s Network; Router; Secure VPN Connection over the Internet; VPN Gateway; Amazon Web Services Cloud; Customer’s isolated AWS resources; Subnets; NAT; Internet]
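The subnets in this architecture are slices of the address range assigned to the VPC. The following sketch shows the address arithmetic with the Python standard library only; the 10.0.0.0/16 range and the /24 subnet size are hypothetical values, not taken from the slide.

```python
import ipaddress
from itertools import islice


def carve_subnets(vpc_cidr: str, new_prefix: int, count: int) -> list:
    """Return the first `count` subnets of size /new_prefix inside vpc_cidr."""
    vpc = ipaddress.ip_network(vpc_cidr)
    return list(islice(vpc.subnets(new_prefix=new_prefix), count))


if __name__ == "__main__":
    # Hypothetical VPC range, split into three /24 subnets.
    for subnet in carve_subnets("10.0.0.0/16", 24, 3):
        print(subnet, "usable host addresses:", subnet.num_addresses - 2)
```

Choosing a VPC range that does not overlap the customer's own network keeps routing over the VPN connection unambiguous.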

Page 21:

Database Options

Self-Managed
• Database Server on Amazon EC2
  - Your choice of database running on Amazon EC2
  - Bring Your Own License (BYOL)

Managed Databases
• Amazon Relational Database Service (RDS)
  - Oracle or MySQL offered as a service
  - Flexible Licensing: BYOL or License Included
• Amazon DynamoDB
  - NoSQL data store
  - SSD storage
  - Seamless scalability with zero administration

Page 22:

Higher-Level Services

Libraries & SDKs
• Developer Centers: Your choice of programming language (Java, PHP, Python, Ruby, .NET) and mobile platform (Android, iOS)

Parallel Processing
• Amazon Elastic MapReduce: Allows customers to easily and cost-effectively process vast amounts of data utilizing a Hadoop framework running on Amazon EC2 instances

Messaging
• Amazon Simple Queue Service: Reliable and highly scalable message queue for cloud applications
• Amazon Simple Notification Service: Push notifications from the cloud to subscribers or client applications
• Amazon Simple Email Service: Send bulk and transactional emails in a quick and cost-effective manner
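For readers new to the Hadoop model that Elastic MapReduce runs, here is a minimal single-process sketch of the map, shuffle and reduce phases using a word count. It does not use any EMR or Hadoop API; the function names are illustrative, and a real job would distribute these phases across EC2 instances.

```python
from collections import defaultdict


def map_phase(line: str):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines) -> dict:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["AWS for data science", "data at scale"]))
```

Because each map call sees only one line and each reduce call sees only one key, both phases can run in parallel on as many machines as the data requires.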

Page 23:

Deployment & Administration Services

Deployment · Monitoring · Automation

• AWS CloudFormation: Use application templates to create a collection of related AWS resources in order to provision and update them in an orderly and predictable way
• Amazon CloudWatch: Monitor AWS resources and track metrics to gain insight and react immediately to keep applications running smoothly
• AWS Elastic Beanstalk: Provision an Apache Tomcat environment and deploy your Java applications in minutes
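A CloudFormation template is a JSON document that declares the resources to create. The sketch below builds a minimal one in Python; the resource name `AnalysisNode`, the description text and the AMI ID are hypothetical, and a real template would usually add parameters, security groups and outputs.

```python
import json


def build_template(ami_id: str, instance_type: str) -> dict:
    """Return a minimal CloudFormation template declaring one EC2 instance."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Hypothetical single-instance stack (illustration only)",
        "Resources": {
            "AnalysisNode": {  # logical resource name chosen for this sketch
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "ImageId": ami_id,
                    "InstanceType": instance_type,
                },
            }
        },
    }


if __name__ == "__main__":
    # Placeholder AMI ID; substitute a real image before creating a stack.
    print(json.dumps(build_template("ami-00000000", "m1.medium"), indent=2))
```

Keeping the template under version control alongside the analysis code means the same environment can be recreated later, which supports the reproducibility requirement raised earlier in the deck.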

Page 24:

Data Collaboration

• Storage Services
  • Amazon S3
  • Amazon EBS
  • Amazon DynamoDB
• Transfer Services
  • AWS Import/Export
  • AWS Storage Gateway
• Identity and Access Management
  • Federation
• Encryption features
  • Amazon S3 Server Side Encryption
  • Client side encryption
  • Key Management (Partners)

Page 25:

Leverage Public Datasets available on AWS

• A centralized repository of public data sets
• Seamless integration with cloud-based applications
• No charge to the community
• http://aws.amazon.com/publicdatasets/

• Some data sets of interest:
  - 1000 Genomes project
  - Ensembl
    - Annotated Human Genome Data – for FASTA
  - Illumina
    - Jay Flatley Human Genome Data Set
  - YRI Trio Dataset
  - The Cannabis Sativa Genome
  - GenBank
  - UniGene
  - Influenza Virus
    - (including updated Swine Flu sequences)

Page 26:

AWS Grant Program

• http://aws.amazon.com/education/
• Recipients selected based on:
  • Uniqueness of work
  • Application of Amazon Web Services
  • Ability to disseminate work publicly via papers, events or public relations
  • Great way to develop new public data sets and AMIs

Page 27:

Consolidated Billing with IAM

• Allows you to get one bill for multiple accounts
• You can easily track each account's costs and download the cost data in CSV format
• You may be able to reduce costs by combining usage from all the accounts to qualify for volume pricing discounts
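The volume-discount point can be illustrated with a small tiered-pricing calculation. The tier boundary and per-GB prices below are hypothetical values in cents, not AWS rates, and the function name is an assumption.

```python
# Hypothetical tiers: (tier size in GB, price in cents per GB).
# A size of None means "all remaining usage".
HYPOTHETICAL_TIERS = [(1_000, 12), (None, 10)]


def tiered_cost(usage_gb: int, tiers=HYPOTHETICAL_TIERS) -> int:
    """Cost in cents when each tier's price applies only to usage within that tier."""
    remaining = usage_gb
    cost = 0
    for size, price in tiers:
        in_tier = remaining if size is None else min(remaining, size)
        cost += in_tier * price
        remaining -= in_tier
        if remaining == 0:
            break
    return cost


if __name__ == "__main__":
    accounts = [800, 800]  # GB used by two separately billed accounts
    separate = sum(tiered_cost(gb) for gb in accounts)
    combined = tiered_cost(sum(accounts))
    print(f"separate: {separate} cents, combined: {combined} cents")
```

With these made-up numbers, neither account reaches the cheaper tier on its own, but their combined usage does, so the single consolidated bill is lower.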

Page 28:
Page 29:
Page 30:

www.adscfd.com AeroDynamic Solutions, Inc. (877) RICHCFD

Air Force Conducts Large-Scale Aerodynamic Simulation with Amazon EC2

To speed the development of more fuel efficient and durable jet engines, the U.S. Air Force Research Laboratory and AeroDynamic Solutions (ADS) partnered with Amazon Web Services to devise an effective design simulation solution. With Amazon Elastic Compute Cloud (Amazon EC2), ADS proved that large-scale aerodynamic simulations can be dialed up on-demand and performed affordably and within the time constraints of commercial design.

Background

For the world’s leading manufacturers of jet engines, product development remains an extremely costly and time-consuming task. Modern designs have pushed traditional analysis methods to the limit, demanding the use of advanced simulation techniques to better tackle performance and durability issues before committing to hardware. To address these concerns, the Turbine Branch of the United States Air Force Research Laboratory (AFRL) and ADS joined forces to advance one such simulation technique—large-scale time accurate simulation—for the U.S. gas turbine industry. The Turbine Branch is responsible for advancing the technical capability of turbo propulsion systems, and ADS provides Computational Fluid Dynamics (CFD) software and analysis services to the world’s manufacturers of jet engines, industrial gas turbines, and compressors.

Time accurate simulation enables designers to understand how time-varying aerodynamic loads can lead to performance loss and structural fatigue. Though this analytical method has long been available, it has largely remained out of reach for commercial design due to its high computational cost and long turnaround time.

To carry out large-scale time accurate simulation, hundreds of clustered processors may often be required, necessitating enormous upfront hardware, software and support personnel costs. As a result, this type of simulation has largely remained out of reach to all but the largest of gas turbine manufacturers. Another problem is that time accurate simulation can take weeks to run, rendering it impractical for commercial design cycles.

Working under an SBIR Phase II award from the U.S. Air Force, ADS enhanced its large-scale time accurate analysis capabilities to tackle these issues. As part of this effort, ADS turned to Amazon Web Services (AWS). Using Amazon EC2, AeroDynamic Solutions gained the capabilities of a large commercial cluster on demand and at a fraction of the cost.

Approach

To demonstrate the capabilities of the ADS/AWS solution, the U.S. Air Force-designed Notre Dame HiLT 1.5 stage turbine was analyzed for unsteady effects. Consisting of 165 passages with 60/70/35 airfoil counts per row, the HiLT turbine is a highly loaded, transonic, low pressure turbine representative of today's modern designs. For the analysis, one-fifth (1/5) of the full wheel (12/14/7) was simulated for a complete revolution. After generating the mesh and completing initial 3-D multi-stage analysis, the EC2-enabled ADS solution performed as follows:

• The mesh (10.6 million elements) was partitioned into 40 blocks for parallel execution on the ADS solver Code Leo.

• 40 processors were dynamically provisioned on Amazon EC2 utilizing five cc1 cluster compute instances.

• Code Leo was invoked across the 40-processor cluster, simulating 10,500 time steps with 20 inner iterations per time step.

• Results were gathered and delivered to local servers for post-processing and analysis.

• EC2 instances were deleted upon completion.

Security issues were addressed as well: SAS 70 Type II certification and VPN-level access were required; uploaded and downloaded data was encrypted; dedicated cc1 instances were provisioned to ensure that data mingling did not occur; and data was purged upon completion of the case.

Results

The results of this case were impressive. Using Amazon EC2 the large-scale, time accurate simulation was turned around in just 72 hours with computing infrastructure costs well below $1,000. Additionally, time accurate analysis revealed critical insights that were not detected using traditional analysis techniques—most notably a 2% drop in efficiency relative to conventional 3-D multi-stage steady predictions.
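The figures quoted in the Approach and Results sections can be cross-checked with a few lines of arithmetic. In the sketch below every constant comes from the case study text; the one assumption is that all five instances were billed for the full 72 hours, which the text does not state.

```python
from fractions import Fraction

# Figures quoted in the case study.
AIRFOILS_PER_ROW = (60, 70, 35)
MESH_ELEMENTS = 10_600_000
BLOCKS = 40
PROCESSORS = 40
INSTANCES = 5
TIME_STEPS = 10_500
INNER_ITERATIONS = 20
RUN_HOURS = 72
BUDGET_USD = 1_000

total_passages = sum(AIRFOILS_PER_ROW)                    # full wheel
fifth_wheel = tuple(n // 5 for n in AIRFOILS_PER_ROW)     # simulated sector
elements_per_block = MESH_ELEMENTS // BLOCKS
processors_per_instance = PROCESSORS // INSTANCES
total_inner_iterations = TIME_STEPS * INNER_ITERATIONS

# Assumption: all five instances ran, and were billed, for the whole 72 hours.
instance_hours = RUN_HOURS * INSTANCES
max_hourly_rate = Fraction(BUDGET_USD, instance_hours)    # USD per instance-hour

if __name__ == "__main__":
    print("passages:", total_passages, "sector:", fifth_wheel)
    print("elements per block:", elements_per_block)
    print("processors per instance:", processors_per_instance)
    print("inner iterations in total:", total_inner_iterations)
    print(f"rate that would reach the $1,000 mark: ${float(max_hourly_rate):.2f}/instance-hour")
```

Under that billing assumption, any per-instance rate below the printed value is consistent with infrastructure costs "well below $1,000".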

Dr. John Clark, Turbine Branch, Turbine Engine Division, Propulsion Directorate of the Air Force Research Laboratory, explains the importance of this case: “Advancing turbine durability and performance remains critical for the U.S. gas turbine industry. The combination of high fidelity time accurate analysis from ADS and on-demand CFD analysis resources from Amazon makes it possible for turbine manufacturers to tackle these issues during design—quickly and without the need for large hardware investment.”

George Fan, CEO of AeroDynamic Solutions, is equally thrilled with the results: “Traditional 3D steady analysis techniques are no longer sufficient to support the design of today’s advanced jet engines. To improve durability and performance, advanced analysis capabilities such as time accurate simulation must be made to work within the accuracy, time, and cost constraints of a commercial design cycle. We’re delighted to be working closely with the Air Force and AWS to make this a reality for designers large and small.”

http://aws.amazon.com/solutions/case-studies/aerodynamic-solutions/

Page 31:

Example: Galaxy

- An open, web-based platform – http://usegalaxy.org
- Perform, reproduce and share complete analyses
- Automatically tracks and manages data provenance and provides support for capturing the context and intent of computational methods

Page 32:

Example: Ion Flux

-  Services to analyze DNA sequence data for researchers and health professionals in genomic medicine

http://www.ionflux.com/

Page 33:

For More Information…

• http://www.bigdatahpc.com
• http://aws.amazon.com/ec2/spot-and-science/
• http://aws.amazon.com/hpc-applications/
• http://aws.amazon.com/ec2/instance-types/
• http://aws.typepad.com

Page 34:

Thank you!

Jamie Kinney [email protected] Twitter: @jamiekinney

http://linkedin.com/pub/jamie-kinney/0/b33/668