Accelerating Time to Science: Transforming Research in the Cloud

Jan 23, 2018

Jamie Kinney
Transcript
Page 1: Accelerating Time to Science: Transforming Research in the Cloud

Accelerating Time to Science: Transforming Research in the Cloud

Jamie Kinney - @jamiekinney
Director of Scientific Computing, a.k.a. “SciCo” – Amazon Web Services

Page 2: Accelerating Time to Science: Transforming Research in the Cloud

Why does Amazon care about Scientific Computing?

• In order to meaningfully change our world for the better by accelerating the pace of scientific discovery
• It is a great application of AWS with a broad customer base
• The scientific community helps us innovate on behalf of all customers
– Streaming data processing & analytics
– Exabyte-scale data management solutions and exaflop-scale compute
– Collaborative research tools and techniques
– New AWS regions
– Significant advances in low-power compute, storage and data centers
– Efficiencies which will lower our costs and therefore pricing for all customers

Page 3: Accelerating Time to Science: Transforming Research in the Cloud

Why Did We Create SciCo?

In order to make it easy to find, discover and use AWS for scientific computing at any scale. Specifically, SciCo helps AWS:

• More effectively support global “Big Science” collaborations
• Develop a solution-centric focus for engaging with the global scientific and engineering communities
• Accelerate the development of a scientific computing ecosystem on AWS
• Educate and evangelize our role in scientific computing

Page 4: Accelerating Time to Science: Transforming Research in the Cloud

Our Virtuous Cycle

Source – Virtuous Cycle: Jeff Bezos, September 2001

Page 5: Accelerating Time to Science: Transforming Research in the Cloud

How is AWS Used for Scientific Computing?

• High Performance Computing (HPC) for Engineering and Simulation
• High Throughput Computing (HTC) for Data-Intensive Analytics
• Collaborative Research Environments
• Monte Carlo Simulations
• Data Visualization
• Hybrid Supercomputing Centers
• Citizen Science
• Science-as-a-Service
• Internet of Things (IoT)
• Serverless Computing

Page 6: Accelerating Time to Science: Transforming Research in the Cloud

Why do researchers love using AWS?

• Time to Science: Access research infrastructure in minutes
• Low Cost: Pay-as-you-go pricing
• Elastic: Easily add or remove capacity
• Globally Accessible: Easily collaborate with researchers around the world
• Secure: A collection of tools to protect data and privacy
• Scalable: Access to effectively limitless capacity

Page 7: Accelerating Time to Science: Transforming Research in the Cloud

Research Grants

AWS provides free usage credits to help researchers:
• Teach advanced courses
• Explore new projects
• Create resources for the scientific community

aws.amazon.com/grants

Page 8: Accelerating Time to Science: Transforming Research in the Cloud

Amazon Public Data Sets

Page 9: Accelerating Time to Science: Transforming Research in the Cloud

Public Data Sets

AWS hosts “gold standard” reference data at our expense in order to catalyze rapid innovation and increased AWS adoption.

A few examples:
• 1,000 Genomes ~250 TB
• Cancer Genomics Data Sets ~5 PB (TCGA & ICGC)
• Astronomy Data and APIs 1 PB+
• Common Crawl
• OpenStreetMap
• Census Data
• Climate, Meteorology, & Earth Observing Data
• IRRI 3000 Rice Genome Project

Page 10: Accelerating Time to Science: Transforming Research in the Cloud

NEXRAD on AWS

• NEXRAD Level-II archives going back to 1991
• Real-time data via Unidata
• Tools and tutorials:
– Using Python with NEXRAD on AWS
– THREDDS Data Server
– Creating Static and Animated Maps with NEXRAD on AWS
– AWS JavaScript S3 Explorer
– NOAA’s Weather & Climate Toolkit
– Unidata’s Integrated Data Viewer

https://aws.amazon.com/noaa-big-data/nexrad/
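
As a rough illustration of what "Using Python with NEXRAD on AWS" can look like, the sketch below builds the S3 location for one radar site on one day. The slide does not give a bucket name or key layout; the bucket `noaa-nexrad-level2` and the `YYYY/MM/DD/SITE/` prefix scheme are assumptions here, and the helper names are hypothetical.

```python
from datetime import date

# Assumption: neither the bucket name nor the key layout appears on the slide.
NEXRAD_BUCKET = "noaa-nexrad-level2"


def nexrad_day_prefix(day: date, site: str) -> str:
    """Return the key prefix for one radar site on one day.

    Assumes archive keys are organised as YYYY/MM/DD/SITE/<volume file>.
    """
    return f"{day:%Y/%m/%d}/{site.upper()}/"


def nexrad_day_uri(day: date, site: str) -> str:
    """Return an s3:// URI that could be handed to an S3 listing tool."""
    return f"s3://{NEXRAD_BUCKET}/{nexrad_day_prefix(day, site)}"


if __name__ == "__main__":
    # Example: the KTLX radar site on 20 May 2013.
    print(nexrad_day_uri(date(2013, 5, 20), "ktlx"))
```

The resulting prefix could then be passed to an S3 client's listing call to enumerate that day's volume scans; no network access is needed to run the sketch itself.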

Page 11: Accelerating Time to Science: Transforming Research in the Cloud

Restricted-access genomics on AWS

aws.amazon.com/genomics

Page 12: Accelerating Time to Science: Transforming Research in the Cloud

The Cancer Genome Atlas (TCGA) and ICGC

• Raw and processed genomic, transcriptomic, and epigenomic data from thousands of cancer patients
• Controlled-access data sets available to qualified researchers, with access administered by Seven Bridges Genomics and the Ontario Institute for Cancer Research
• Available to users of the NCI’s Cancer Genomics Cloud (CGC) running on AWS
• Users can interact with the data via the CGC web portal or the CGC’s APIs
• The PanCancer Launcher is an open-source system to create EC2 instances, enqueue analysis work items, trigger Docker-based analytic pipelines, and clean up launched resources when work is complete.

https://aws.amazon.com/public-data-sets/tcga/
https://aws.amazon.com/public-data-sets/icgc/
https://aws.amazon.com/blogs/aws/new-aws-public-data-sets-tcga-and-icgc/

Page 13: Accelerating Time to Science: Transforming Research in the Cloud

A few examples…

Page 14: Accelerating Time to Science: Transforming Research in the Cloud

High Throughput Computing at Scale

The Large Hadron Collider @ CERN includes 6,000+ researchers from over 40 countries and produces approximately 25 PB of data each year.

The ATLAS and CMS experiments are using AWS for Monte Carlo simulations, processing, and analysis of LHC data.

Page 15: Accelerating Time to Science: Transforming Research in the Cloud

Peering with all global research networks

Image courtesy John Hover - Brookhaven National Lab

Page 16: Accelerating Time to Science: Transforming Research in the Cloud

Data-Intensive Computing

The Square Kilometer Array will link 250,000 radio telescopes together, creating the world’s most sensitive telescope. The SKA will generate zettabytes of raw data, publishing exabytes annually over 30-40 years.

Researchers are using AWS to develop and test:
• Data processing pipelines
• Image visualization tools
• Exabyte-scale research data management
• Collaborative research environments

aws.amazon.com/solutions/case-studies/icrar/

Page 17: Accelerating Time to Science: Transforming Research in the Cloud

Astrocompute in the Cloud Program

• AWS is adding 1 PB of SKA precursor data to the Amazon Public Data Sets program
• We are also providing $500K in AWS Research Grants for the SKA to direct towards projects focused on:
– High-throughput data analysis
– Image analysis algorithms
– Data mining discoveries (e.g. ML, CV and data compression)
– Exascale data management techniques
– Collaborative research enablement

https://www.skatelescope.org/ska-aws-astrocompute-call-for-proposals/

Page 18: Accelerating Time to Science: Transforming Research in the Cloud

Astronomy Data Visualization on AWS

Page 19: Accelerating Time to Science: Transforming Research in the Cloud

Nepal Earthquake

Individuals around the world are analyzing before/after imagery of Kathmandu in order to more effectively direct emergency response and recovery efforts

Page 20: Accelerating Time to Science: Transforming Research in the Cloud

Estimating Trees & Shrubs in Sub-Sahara

• NASA’s Center for Climate Simulation (NCCS) used AWS to process satellite imagery in order to estimate the tree and brush biomass over the entire arid and semi-arid zone on the south side of the Sahara.
• By using AWS, NASA was able to reduce the processing time from ~1 year to less than a month.
• By estimating biomass, NASA is able to develop more accurate climate models which will help us better understand and mitigate the effects of climate change.

Page 21: Accelerating Time to Science: Transforming Research in the Cloud

High Performance Computing

Simulations in the Automotive Sector
• Crash and materials simulations
• Fluid and thermal dynamics simulations
• Car body aerodynamics
• Electronics and electromagnetic simulations

Honda materials science simulations on AWS:
• Deploying scalable HPC clusters on AWS Spot – up to 1000 C3 instances
• Running more simulations than before, for more accurate results

“Cloud offers us an opportunity, as we can innovate faster than before.”
- Ayumi Tada, IT System Administrator, Honda R&D

Page 22: Accelerating Time to Science: Transforming Research in the Cloud

Social Analytics on AWS

We Feel is a project that explores whether social media can provide an accurate, real-time signal of the world’s emotional state. It analyzes approximately 27 million tweets/day.

A collaboration between CSIRO, The Black Dog Institute, Amazon Web Services and GNIP.

The outcomes?
1. We can now monitor, in real time, the emotional health of the world
2. Seamlessly scale infrastructure up or down in direct relation to social activity
3. Amazon’s Big Data platform enables real-time trend analysis, queries of historical data and geospatial analytics

http://wefeel.csiro.au/

Page 23: Accelerating Time to Science: Transforming Research in the Cloud
Page 24: Accelerating Time to Science: Transforming Research in the Cloud

Enabling Global Collaboration

Bring the users to the data, don’t send the data to the users

Page 26: Accelerating Time to Science: Transforming Research in the Cloud

Baylor College of Medicine CHARGE

Baylor College of Medicine Human Genome Sequencing Center and DNAnexus, using the Mercury Pipeline for the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium

Supports 300+ researchers around the world

Analyzed the genomes of over 14,000 individuals, encompassing 3,751 whole genomes and 10,940 whole exomes (~1 PB of data)

Used 3.3 million core hours over 4 weeks to complete the job 5.7x faster than what could have been accomplished on-premise

The outcomes?
• Easier collaboration
• Faster time to science
• Cost-effective: On-premise was prohibitively expensive
• No longer constrained by on-premise capacity
• Scientists focusing on science as opposed to infrastructure

https://aws.amazon.com/solutions/case-studies/baylor/
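
For a sense of scale, the figures quoted on this slide can be turned into a few derived numbers. This is a back-of-the-envelope sketch, not part of the original talk: it assumes the 4 weeks is the elapsed cloud run time and that "5.7x faster" compares elapsed time against an on-premise run. All variable names are illustrative.

```python
# Figures quoted on the slide.
CORE_HOURS = 3.3e6        # total core hours used
CLOUD_WEEKS = 4           # elapsed time on AWS (assumed to be wall-clock)
SPEEDUP = 5.7             # quoted speedup versus on-premise
WHOLE_GENOMES = 3_751
WHOLE_EXOMES = 10_940

# Total samples analysed ("over 14,000 individuals").
samples = WHOLE_GENOMES + WHOLE_EXOMES

# Average number of cores busy at any moment during the run.
elapsed_hours = CLOUD_WEEKS * 7 * 24
avg_cores = CORE_HOURS / elapsed_hours

# Implied on-premise duration, if the speedup applies to elapsed time.
on_prem_weeks = CLOUD_WEEKS * SPEEDUP

if __name__ == "__main__":
    print(f"samples analysed:    {samples:,}")
    print(f"average cores busy:  {avg_cores:,.0f}")
    print(f"implied on-premise:  {on_prem_weeks:.1f} weeks")
```

Under these assumptions the job kept roughly 4,900 cores busy on average, and the same work on-premise would have taken a little under half a year.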

Page 27: Accelerating Time to Science: Transforming Research in the Cloud

Schrodinger & Cycle Computing: Computational Chemistry for Better Solar Power

Simulation by Mark Thompson of the University of Southern California to see which of 205,000 organic compounds could be used for photovoltaic cells for solar panel material.

An estimated 264 years of computation, completed in 18 hours.

• 156,314 core cluster, 8 regions
• 1.21 petaFLOPS (Rpeak)
• $33,000 or 16¢ per molecule

Loosely Coupled
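
The per-molecule cost on this slide follows directly from the other quoted figures, and the same numbers give the wall-clock compression. The short check below is an illustration only; the variable names are hypothetical, and the year length of 365.25 days is an assumption.

```python
# Figures quoted on the slide.
TOTAL_COST_USD = 33_000
COMPOUNDS = 205_000
ESTIMATED_YEARS = 264     # estimated computation time
ACTUAL_HOURS = 18         # time the run actually took

# Cost per screened compound, in US cents.
cents_per_molecule = TOTAL_COST_USD * 100 / COMPOUNDS

# Ratio of estimated compute time to actual elapsed time.
# Assumption: one year is taken as 365.25 days.
HOURS_PER_YEAR = 365.25 * 24
time_compression = ESTIMATED_YEARS * HOURS_PER_YEAR / ACTUAL_HOURS

if __name__ == "__main__":
    print(f"cost per molecule: {cents_per_molecule:.1f} cents")
    print(f"time compression:  {time_compression:,.0f}x")
```

The first result rounds to the 16¢ shown on the slide. Because each compound can be evaluated independently (the "loosely coupled" label), the work spreads across a very large cluster without tight communication between nodes.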

Page 28: Accelerating Time to Science: Transforming Research in the Cloud

Science-as-a-Service

Globus Genomics, DNAnexus, and Seven Bridges Genomics offer inexpensive, easy-to-use, and secure platforms for processing and analyzing genomic data.

The Weather Company pushes four gigabytes of data to AWS each second in order to deliver 15 billion forecasts each day to their customers around the world.

aws.amazon.com/solutions/case-studies/the-weather-company/
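
To put the Weather Company rates in common units, the sketch below converts the two quoted figures into a daily data volume and a per-second forecast rate. It assumes both rates are sustained around the clock and uses decimal units (1 TB = 1,000 GB); neither assumption comes from the slide.

```python
# Figures quoted on the slide.
INGEST_GB_PER_SECOND = 4
FORECASTS_PER_DAY = 15e9

SECONDS_PER_DAY = 24 * 60 * 60

# Daily ingest volume, assuming the 4 GB/s rate is sustained all day.
gb_per_day = INGEST_GB_PER_SECOND * SECONDS_PER_DAY
tb_per_day = gb_per_day / 1000  # decimal terabytes

# Average forecast delivery rate, assuming an even spread over the day.
forecasts_per_second = FORECASTS_PER_DAY / SECONDS_PER_DAY

if __name__ == "__main__":
    print(f"ingest per day:       {tb_per_day:,.1f} TB")
    print(f"forecasts per second: {forecasts_per_second:,.0f}")
```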

Page 29: Accelerating Time to Science: Transforming Research in the Cloud

Citizen Science

The Asteroid Data Hunters competition used AWS to develop better mechanisms for finding near-Earth asteroids. The top algorithm is 18% better at finding asteroids!

Page 30: Accelerating Time to Science: Transforming Research in the Cloud

Internet of Things

• Connect and manage devices
• Secure device connections & data
• Process & act upon device data
• Read & set device state at any time

https://aws.amazon.com/iot

John Deere’s Vice President of Technology Information Solutions Patrick Pinkston describing how the 178-year-old company is using Amazon’s cloud.

John Deere is using AWS IoT solutions to help farmers track planting a crop down to the seed, both at the individual seed level and by the acreage of the field.

https://www.youtube.com/watch?v=uq4kQPsM4cQ

Page 31: Accelerating Time to Science: Transforming Research in the Cloud

Thank you!

Jamie Kinney
[email protected]
@jamiekinney

Page 32: Accelerating Time to Science: Transforming Research in the Cloud

Additional resources…

• aws.amazon.com/big-data
• aws.amazon.com/compliance
• aws.amazon.com/datasets
• aws.amazon.com/grants
• aws.amazon.com/genomics
• aws.amazon.com/hpc
• aws.amazon.com/security