Advancing Life Sciences Research with High Performance Computing and Cyberinfrastructure Ian Stokes-Rees Harvard Medical School SHOW - Making Biology Binary, June 2010
Page 1: 2010 06 pre_show_computing_lifesciences_stokesrees

Advancing Life Sciences Research with High Performance Computing and Cyberinfrastructure

Ian Stokes-Rees, Harvard Medical School

SHOW - Making Biology Binary, June 2010

Page 2: 2010 06 pre_show_computing_lifesciences_stokesrees

Dengue Virus Movie

animation, not simulation, informed by science

Page 3: 2010 06 pre_show_computing_lifesciences_stokesrees

digizyme.com

Page 4: 2010 06 pre_show_computing_lifesciences_stokesrees

Ian Stokes-Rees, NEBioGrid, Harvard Medical School June 23rd, 2010

Science Behind the Movie

Multi-scale

Data intensive

Dynamic

Models

Simulation

Analysis

Page 5: 2010 06 pre_show_computing_lifesciences_stokesrees

Water channel through aquaporin tetramer in lipid bilayer

Tajkhorshid, E., Nollert, P., Jensen, M.O., Miercke, L.J., O'Connell, J., Stroud, R.M., and Schulten, K. (2002). Science 296, 525-530

Page 6: 2010 06 pre_show_computing_lifesciences_stokesrees

Molecular Dynamics

Computationally intensive

Necessarily parallel

Nanosecond scale today

Millisecond to second tomorrow

Rapidly growing interest
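
To make the gap between "nanosecond scale today" and "millisecond to second tomorrow" concrete, the sketch below counts the integration steps each timescale would need. It assumes a 2 femtosecond timestep, a common choice in atomistic molecular dynamics but not a figure given in the talk; the function and constant names are illustrative.

```python
# Assumption: a 2 fs integration timestep, typical for atomistic MD
# but not stated in the talk.
TIMESTEP_FS = 2

# Femtoseconds in one unit of simulated time.
FS_PER_UNIT = {
    "nanosecond": 10**6,
    "microsecond": 10**9,
    "millisecond": 10**12,
    "second": 10**15,
}


def steps_needed(unit: str, timestep_fs: int = TIMESTEP_FS) -> int:
    """Number of integration steps needed to simulate one `unit` of time."""
    return FS_PER_UNIT[unit] // timestep_fs


if __name__ == "__main__":
    for unit in FS_PER_UNIT:
        print(f"1 {unit}: {steps_needed(unit):.1e} steps")
```

Under that assumption, one simulated second needs a billion times more steps than one simulated nanosecond. Because each step depends on the one before it, the speedup has to come from parallelism within a step.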

Page 7: 2010 06 pre_show_computing_lifesciences_stokesrees
Page 8: 2010 06 pre_show_computing_lifesciences_stokesrees

48 cores, single system image

Page 9: 2010 06 pre_show_computing_lifesciences_stokesrees

GPU Computing 200-800 stream processing cores per card

Page 10: 2010 06 pre_show_computing_lifesciences_stokesrees

NextGen Sequencing

Page 11: 2010 06 pre_show_computing_lifesciences_stokesrees
Page 12: 2010 06 pre_show_computing_lifesciences_stokesrees
Page 13: 2010 06 pre_show_computing_lifesciences_stokesrees

Collaborations and Communities

Page 14: 2010 06 pre_show_computing_lifesciences_stokesrees

Tufts University School of Medicine

Boston Life Sciences

Universities

Hospitals

Pharmaceuticals

Research Institutes

Page 15: 2010 06 pre_show_computing_lifesciences_stokesrees

Rice University: E. Nikonowicz, Y. Shamoo, Y.J. Tao

CalTech: P. Bjorkman, W. Clemons, G. Jensen, D. Rees

Stanford: A. Brunger, K. Garcia, T. Jardetzky

UCSF: JJ Miranda, Y. Cheng

UC Davis: H. Stahlberg

UCSD: T. Nakagawa, H. Viadiu

WesternU: M. Swairjo

U. Washington: T. Gonen

Washington U. School of Med.: T. Ellenberger, D. Fremont

Vanderbilt Center for Structural Biology: W. Chazin, B. Eichman, M. Egli, B. Lacy, M. Ohi, C. Sanders, B. Spiller, M. Stone, M. Waterman

Rosalind Franklin: D. Harrison

Harvard and Affiliates: N. Beglova, S. Blacklow, B. Chen, J. Chou, J. Clardy, M. Eck, B. Furie, R. Gaudet, M. Grant, S.C. Harrison, J. Hogle, D. Jeruzalmi, D. Kahne, T. Kirchhausen, A. Leschziner, K. Miller, A. Rao, T. Rapoport, M. Samso, P. Sliz, T. Springer, G. Verdine, G. Wagner, L. Walensky, S. Walker, T. Walz, J. Wang, S. Wong

Cornell U.: R. Cerione, B. Crane, S. Ealick, M. Jin, A. Ke, NE-CAT, R. Oswald, C. Parrish, H. Sondermann

Brandeis U.: N. Grigorieff

Tufts U.: K. Heldwein

UMass Medical: W. Royer

NIH: M. Mayer

U. Maryland: E. Toth

Yale U.: T. Boggon, D. Braddock, Y. Ha, E. Lolis, K. Reinisch, J. Schlessinger, F. Sigworth, F. Zhou

Columbia U.: Q. Fan

Rockefeller U.: R. MacKinnon

Thomas Jefferson: J. Williams

Not Pictured: University of Toronto: L. Howell, E. Pai, F. Sicheri; NHRI (Taiwan): G. Liou; Trinity College, Dublin: Amir Khan

Page 16: 2010 06 pre_show_computing_lifesciences_stokesrees

If the particle physicists can use it...

Page 17: 2010 06 pre_show_computing_lifesciences_stokesrees

Open Science Grid

opensciencegrid.org

Page 18: 2010 06 pre_show_computing_lifesciences_stokesrees

Grid Computing

Federated and scalable

Secure

Standardized

Compute sharing & cycle scavenging

Dynamic formation of collaborations

Data sharing

Page 19: 2010 06 pre_show_computing_lifesciences_stokesrees
Page 20: 2010 06 pre_show_computing_lifesciences_stokesrees

Protein Structure Studies

Page 21: 2010 06 pre_show_computing_lifesciences_stokesrees
Page 22: 2010 06 pre_show_computing_lifesciences_stokesrees
Page 23: 2010 06 pre_show_computing_lifesciences_stokesrees

Ian Stokes-Rees, http://sbgrid.org

Acknowledgements

Piotr Sliz: PI and SBGrid team leader

Ian Levesque: Systems Architect

Ben Eisenbraun: Software Curator

Peter Doherty: Grid Administrator

Caitlin Colgrove: Intern Software Engineer

Steve Jahl: System Administrator

Page 24: 2010 06 pre_show_computing_lifesciences_stokesrees

Summary

Compute power increasingly affordable

New computational techniques

New hardware (multi-core, GPU)

Grid and cloud computing

Fast networking, cheap storage

Scientists developing necessary skills

Be in touch - [email protected]

Page 25: 2010 06 pre_show_computing_lifesciences_stokesrees

Extras

Page 26: 2010 06 pre_show_computing_lifesciences_stokesrees

Ian Stokes-Rees, SBGrid, Harvard Medical School October 13th, 2009

How to get a structural biologist using CI

Ease of use

No command line

X.509 credentials (initial request, VOs, proxies, Roles, etc.) are really complicated

Support infrastructure (mailing lists, tickets, phone, training)

Killer apps

They will use it if they see peers using it to advance scientific goals

They will use it if some novel workflows or workflow patterns are established

Data management is a big problem for everyone (see bonus, time permitting) -- we believe grid infrastructure could provide a solution

Security

Data needs to be secure ...

... but users still want to control sharing/access

Roadblocks

Reliability of underlying infrastructure and difficulty in debugging

Applications tied to GUIs, rudimentary interfaces

Page 27: 2010 06 pre_show_computing_lifesciences_stokesrees

Security Challenges

Identity Management

Mixture of .htpasswd, PAM, X.509, and application-specific IDs

Complexity of X.509 (and associated paraphernalia) confuses users

account creation, use, and management

Virtual Organization hierarchies and user-driven collaborations

Inheritance of rights/policies

How to allow users to easily create and manage groups

Merging security policies

Site/resource, VO, and user policies need to be merged

Encryption and Privacy Preservation

Generic mechanisms for encryption and key management

Preserving privacy of actions and data in federated grid environment
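
One possible reading of "site/resource, VO, and user policies need to be merged" is that an action is allowed only when every layer allows it. The sketch below shows that intersection rule. The slides do not specify the merge rule, so this rule, the principals, and the action names are all assumptions made for illustration.

```python
from functools import reduce

# Hypothetical policy layers: each maps a principal to the set of
# actions that layer permits. Names and actions are illustrative only.
site_policy = {"alice": {"read", "write", "execute"}, "bob": {"read"}}
vo_policy = {"alice": {"read", "write"}, "bob": {"read", "write"}}
user_policy = {"alice": {"read"}, "bob": {"read", "write"}}


def merge_policies(*layers: dict[str, set[str]]) -> dict[str, set[str]]:
    """Intersect the layers: an action survives only if every layer allows it.

    A principal missing from a layer is treated as having no rights there.
    """
    principals = set().union(*layers)
    return {
        p: reduce(set.intersection, (layer.get(p, set()) for layer in layers))
        for p in principals
    }


effective = merge_policies(site_policy, vo_policy, user_policy)
```

With these sample layers, both principals end up with read access only, because the most restrictive layer wins. Other merge rules (for example, letting a site policy override the rest) would need a different combiner.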

Page 28: 2010 06 pre_show_computing_lifesciences_stokesrees

Security Work

Meta data system

Provide more generic pointers to ACLs and encryption keys

Extension of GACL system

Include non-X.509 ID tokens as policy principals

Allow GACL policies to apply to web framework objects (pyGACL)

Simple replicated key system for file encryption

Use of meta-data framework to point to encryption key (and replicas)

Use GACL to control key access (regular file)

Libraries to automatically read/write encrypted files

Future

VO hierarchies

Tools for user driven ACL management

Tools for policy management (merging site, VO and user policies)
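
The replicated key idea above (a meta-data entry pointing to an encryption key and its replicas, with an ACL controlling key access) might look roughly like the sketch below. It does not use the real GACL API and performs no encryption. The `KeyRecord` class, the `locate_key` function, the storage locations, and the principal strings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class KeyRecord:
    """Hypothetical meta-data entry pointing at a key and its replicas."""
    replicas: list[str]  # locations that each hold a copy of the key
    readers: set[str] = field(default_factory=set)  # principals allowed to fetch it


def locate_key(record: KeyRecord, principal: str, available: set[str]) -> str:
    """Return the first reachable replica if `principal` may read the key."""
    if principal not in record.readers:
        raise PermissionError(f"{principal} may not read this key")
    for location in record.replicas:
        if location in available:
            return location
    raise LookupError("no key replica is reachable")


record = KeyRecord(
    replicas=["store-a/key1", "store-b/key1"],
    # Principals can be X.509 DNs or other ID tokens; both are plain strings here.
    readers={"x509:/DC=org/CN=Example User", "local:alice"},
)
```

Calling `locate_key(record, "local:alice", {"store-b/key1"})` falls through to the second replica when the first is unreachable, while an unlisted principal is refused before any replica is consulted.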

Page 29: 2010 06 pre_show_computing_lifesciences_stokesrees

Page 30: 2010 06 pre_show_computing_lifesciences_stokesrees

Page 31: 2010 06 pre_show_computing_lifesciences_stokesrees

Page 32: 2010 06 pre_show_computing_lifesciences_stokesrees

Page 33: 2010 06 pre_show_computing_lifesciences_stokesrees
