A Global Grid for Analysis of Arthropod Evolution

Craig A. Stewart, Rainer Keller, Richard Repasky, Matthias Hess, David Hart, Matthias Müller, Ray Sheppard, Uwe Wössner, Martin Aumüller, Huian Li, Donald K. Berry, John Colbourne

Indiana University – University Information Technology Services

Höchstleistungsrechenzentrum Stuttgart (High Performance Computing Center Stuttgart)

Indiana University – Center for Genomics and Bioinformatics

Jan 16, 2016


Page 2

License Terms

• Please cite this presentation as: Stewart, C.A., R. Keller, R. Repasky, M. Hess, D. Hart, M. Müller, R. Sheppard, U. Wössner, M. Aumüller, H. Li, D.K. Berry and J. Colbourne. A Global Grid for Analysis of Arthropod Evolution. 2004. Presentation. Presented at: Grid2004 - 5th IEEE/ACM International Workshop on Grid Computing (Pittsburgh, PA, 8 Nov 2004). Available from: http://hdl.handle.net/2022/14784

• Portions of this document that originated from sources outside IU are shown here and used by permission or under licenses indicated within this document.

• Items indicated with a © or denoted with a source url are under copyright and used here with permission. Such items may not be reused without permission from the holder of copyright except where license terms noted on a slide permit reuse.

• Except where otherwise noted, the contents of this presentation are copyright 2004 by the Trustees of Indiana University. This content is released under the Creative Commons Attribution 3.0 Unported license (http://creativecommons.org/licenses/by/3.0/). This license includes the following terms: You are free to share – to copy, distribute and transmit the work and to remix – to adapt the work under the following conditions: attribution – you must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). For any reuse or distribution, you must make clear to others the license terms of this work.

Page 3

Outline

• The biological problem

• The software used

• The global grid

• What we learned

• Acknowledgements

Page 4

Biological problem

Are Hexapods (animals with six legs) a single evolutionary group? Are ecdysozoans (animals that shed their skins) a single evolutionary group?
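Asking whether a group is "a single evolutionary group" amounts to asking whether it is monophyletic: whether some node of the inferred tree has exactly those taxa, and no others, as its descendants. The sketch below shows that check on a rooted tree written as nested tuples. It is a minimal illustration and not part of the project's software; it assumes the unrooted maximum likelihood tree has already been rooted (for example with an outgroup), and the taxon names are hypothetical placeholders, not data from this study.

```python
from typing import FrozenSet, Iterable, List, Tuple, Union

# A rooted tree: a leaf is a taxon name, an internal node is a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def leaf_set(tree: Tree) -> FrozenSet[str]:
    """All taxa at or below this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clades(tree: Tree) -> List[FrozenSet[str]]:
    """Leaf set of every node; each node defines one clade."""
    found = [leaf_set(tree)]
    if not isinstance(tree, str):
        for child in tree:
            found.extend(clades(child))
    return found


def is_monophyletic(tree: Tree, taxa: Iterable[str]) -> bool:
    """True if some node has exactly `taxa` as its descendants."""
    return frozenset(taxa) in clades(tree)


# Hypothetical rooted tree; the names are placeholders, not results of this study.
example = ((("insect_1", "insect_2"), "crustacean_1"), "outgroup")
print(is_monophyletic(example, {"insect_1", "insect_2"}))      # True
print(is_monophyletic(example, {"insect_1", "crustacean_1"}))  # False
```

Collecting every clade is quadratic in the number of taxa, which is irrelevant at 67 taxa but would matter for much larger trees.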

Page 6

Phylogenetic inference

• Goal – reconstruct evolutionary history by comparison of DNA sequences

• NP-hard problem

• Heuristic approach used in maximum likelihood inference

• Data are available; the analysis had never been attempted because of its computational demands
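The quantity a maximum likelihood search evaluates for every candidate tree is the probability of the aligned sequences given that tree. The sketch below computes it with Felsenstein's pruning algorithm under the Jukes-Cantor (JC69) substitution model, which is deliberately simpler than the model fastDNAml uses; it illustrates the calculation and is not fastDNAml's code. The tree encoding (a leaf is `(taxon, branch_length)`, an internal node is `(children, branch_length)`) is an assumption of this sketch, and any character other than A, C, G or T is treated as missing data.

```python
import math
from typing import Dict

BASES = "ACGT"


def jc69_prob(i: str, j: str, t: float) -> float:
    """P(base j at the end of a branch | base i at its start), JC69, length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def partial_likelihoods(node, site: Dict[str, str]) -> Dict[str, float]:
    """Pruning step: L[x] = P(observed bases below node | node has base x)."""
    label, _branch = node
    if isinstance(label, str):  # leaf
        observed = site[label]
        if observed not in BASES:  # gap or ambiguity code: treat as missing
            return {x: 1.0 for x in BASES}
        return {x: 1.0 if x == observed else 0.0 for x in BASES}
    result = {x: 1.0 for x in BASES}
    for child in label:
        child_l = partial_likelihoods(child, site)
        t = child[1]
        for x in BASES:
            result[x] *= sum(jc69_prob(x, y, t) * child_l[y] for y in BASES)
    return result


def log_likelihood(tree, alignment: Dict[str, str]) -> float:
    """Sum of per-site log-likelihoods; sites are assumed independent."""
    n_sites = len(next(iter(alignment.values())))
    total = 0.0
    for k in range(n_sites):
        site = {name: seq[k] for name, seq in alignment.items()}
        root_l = partial_likelihoods(tree, site)
        # JC69 equilibrium frequency is 1/4 for every base.
        total += math.log(sum(0.25 * root_l[x] for x in BASES))
    return total


# Hypothetical three-taxon example; the root's own branch length is ignored.
tree = ((("t1", 0.1), ("t2", 0.2), ("t3", 0.3)), 0.0)
alignment = {"t1": "ACGT", "t2": "ACGA", "t3": "ACTT"}
print(log_likelihood(tree, alignment))
```

A heuristic search repeats this evaluation (plus branch-length optimization, omitted here) for each tree it visits, which is why the analysis needs so much computing time.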

Page 7

Why this project on a grid?

• Important & time-sensitive biological question requiring massive computer resources

• A biologically oriented code that scales well

• Grid middleware environment & collaboration tool well suited to the task at hand

• Opportunity to create a grid spanning every continent on earth (except Antarctica)

Page 8

Software and data analysis

• Non-grid preparatory work
– Download sequences from NCBI (67 taxa, 12,162 bp, mitochondrial genes for 12 proteins)
– Align sequences with Multi-Clustal
– Determine rate parameters with TREE-PUZZLE

• Grid preparatory work
– Analyze performance of fastDNAml with Vampir
– Meetings via Access Grid & COVISE

• The grid software
– PACX-MPI – Grid/MPI middleware
– COVISE – Collaboration and visualization
– fastDNAml – Maximum likelihood phylogenetics
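Clustal-family aligners start from pairwise alignments of all sequence pairs, build a guide tree from them, and then align the sequences progressively along that tree. The sketch below shows only the pairwise building block, Needleman-Wunsch global alignment by dynamic programming, with simple hypothetical scores (match +1, mismatch -1, gap -2). It is not Multi-Clustal's scoring scheme or its progressive procedure; real aligners use substitution matrices and affine gap penalties.

```python
from typing import Tuple


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> Tuple[int, str, str]:
    """Needleman-Wunsch: best global alignment score plus one optimal alignment."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),  # (mis)match
                              score[i - 1][j] + gap,             # gap in b
                              score[i][j - 1] + gap)             # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(global_align("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

The table has (n+1)(m+1) cells, so each pairwise step costs O(nm) time; progressive methods avoid the much larger cost of aligning all sequences simultaneously.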

Page 9

fastDNAml

• ML analysis of phylogenetic trees based on DNA sequences

• Foreman/worker MPI program

• Fault tolerance for grid computing built into the program since 1998

• For 67 taxa: 2.12 × 10^109 possible unrooted trees

• Goal: 300 bootstraps with 10 jumbles each, i.e. 3000 executions (more than 3x typical!)
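The figures on this slide can be checked with a few lines of code. The number of distinct unrooted binary trees for n taxa is (2n-5)!! = 3 · 5 ⋯ (2n-5), which for n = 67 is a 110-digit number of about 2.1 × 10^109. The workload comes from bootstrapping (resampling alignment columns with replacement) and jumbling (in fastDNAml, randomizing the order in which taxa are added to the growing tree), with one execution per (bootstrap, jumble) pair. The function names and seeding scheme below are hypothetical; this is a sketch, not the project's job scripts.

```python
import random
from typing import Dict, List, Tuple


def num_unrooted_trees(n_taxa: int) -> int:
    """(2n-5)!! distinct unrooted binary trees for n_taxa >= 3."""
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):  # odd factors 3, 5, ..., 2n-5
        count *= k
    return count


def bootstrap_alignment(alignment: Dict[str, str], seed: int) -> Dict[str, str]:
    """Resample columns with replacement; every taxon gets the same columns."""
    rng = random.Random(seed)
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def jumble_order(taxa: List[str], seed: int) -> List[str]:
    """Random taxon addition order for one stepwise-addition search."""
    order = list(taxa)
    random.Random(seed).shuffle(order)
    return order


def make_jobs(n_bootstraps: int, n_jumbles: int) -> List[Tuple[int, int]]:
    """One (bootstrap index, jumble index) pair per execution."""
    return [(b, j) for b in range(n_bootstraps) for j in range(n_jumbles)]


trees = num_unrooted_trees(67)
print(len(str(trees)))           # 110
print(len(make_jobs(300, 10)))   # 3000
```

Each job is independent of the others, which is what makes this workload a natural fit for distribution across many loosely coupled machines.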

Page 10

• PACX-MPI (PArallel Computer eXtension) enables seamless execution of MPI-conforming parallel applications on a Grid.

• The application is recompiled and linked with PACX-MPI.

• Communication between MPI processes within a machine is done with the vendor MPI, while communication to other parts of the Metacomputer is done via the connecting network.

• Key advantages:
– The optimized vendor MPI library is used.
– Two daemons (MPI processes) take care of communication between systems, which allows communication to be bundled.
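The routing rule on this slide can be modeled in a few lines: a message whose sender and receiver run on the same machine stays on the vendor MPI, while messages between machines are handed to the communication daemons, which can bundle everything travelling between the same pair of machines. The sketch below is a conceptual model only; it does not use or imitate PACX-MPI's API or wire protocol, and the machine names and rank layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (source global rank, destination global rank, payload)
Message = Tuple[int, int, str]


def route(
    messages: List[Message], machine_of: Dict[int, str]
) -> Tuple[List[Message], Dict[Tuple[str, str], List[Message]]]:
    """Split traffic into on-machine messages and per-machine-pair bundles."""
    local: List[Message] = []
    bundles: Dict[Tuple[str, str], List[Message]] = defaultdict(list)
    for msg in messages:
        src, dst, _payload = msg
        src_machine, dst_machine = machine_of[src], machine_of[dst]
        if src_machine == dst_machine:
            local.append(msg)  # would go through the vendor MPI library
        else:
            # would go through the daemons: one bundle per direction and pair
            bundles[(src_machine, dst_machine)].append(msg)
    return local, dict(bundles)


# Hypothetical layout: ranks 0-1 on one machine, ranks 2-3 on another.
machine_of = {0: "site_a", 1: "site_a", 2: "site_b", 3: "site_b"}
msgs = [(0, 1, "x"), (0, 2, "y"), (1, 3, "z"), (2, 3, "w"), (3, 0, "v")]
local, bundles = route(msgs, machine_of)
print(len(local), {pair: len(m) for pair, m in bundles.items()})
```

In this toy run, three inter-machine messages collapse into two bundles (one per direction), while the two on-machine messages never touch the wide-area network.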

Page 11

COVISE

• COllaborative VIsualization and Simulation Environment.

• Focus: collaborative & interactive use of supercomputers

• Interactive startup of calculations on the Grid

• Real-time visualization of the results

Page 12

Application framework

Work of Matthias Hess, HLRS

Page 14

GliederfüsslerGrid ("Gliederfüssler" is German for "arthropod")

Page 15

The Metacomputers

• SGI Origin 2000: 32 processors, CEPBA (Spain)
• Linux cluster: 64 processors, AIST (Japan)
• Linux cluster: 12 processors, ANU (Australia)
• T3E: 128 processors, HLRS (Germany)
• IBM SP: 64 processors, IUB (US)
• DEC Alpha: 4 processors, USP (Brazil)
• Sun Fire 6800: 16 processors, NUS (Singapore)
• Hitachi SR8000: 32 processors, Germany
• Cray T3E: 128 processors, MCC (UK)
• Cray T3E: 32 processors, PSC (US)
• IBM SP (Blue Horizon): 32 processors, SDSC (US)
• DEC Alpha (Lemieux): 64 processors, PSC (US)
• Linux system: 1 processor, ISET'com (Tunisia)

8 types of systems (several on Top500 list & TeraGrid); 6+ vendors; 641 processors; 9 countries; 6 continents

Page 16

Results of one run

Page 17

Conclusions

• Results
– The grid actually worked (HPC Challenge award)
– Real science was done (500 runs, 5,318,281 trees analyzed, 7800 CPU hours used)

• Lessons learned
– Access Grid was essential
– CVS is good
– Importance of fault tolerance & interaction of fault tolerance with network speeds
– Importance of the grid frameworks
– Firewall issues & value of PACX-MPI

• Going forward
– The key value of the grid approach was in reducing wall-clock time to amounts tolerable for the application scientists!
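The fault-tolerance lesson can be illustrated with a foreman/worker loop in which the foreman simply re-queues the task of any worker it loses. The sketch below is a sequential simulation under stated assumptions: a lost worker is modeled as a `ConnectionError` raised by a hypothetical `do_task` callback, whereas a real grid run would have to detect the loss through timeouts or failed communication. It mirrors the idea the slides credit to fastDNAml, not fastDNAml's actual MPI code.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List


def run_foreman(
    tasks: Iterable[int],
    workers: List[str],
    do_task: Callable[[str, int], float],
) -> Dict[int, float]:
    """Dispatch tasks round by round; re-queue any task whose worker is lost."""
    pending = deque(tasks)
    alive = list(workers)
    results: Dict[int, float] = {}
    while pending:
        if not alive:
            raise RuntimeError(f"all workers lost, {len(pending)} tasks unfinished")
        for worker in list(alive):  # iterate over a copy so workers can be dropped
            if not pending:
                break
            task = pending.popleft()
            try:
                results[task] = do_task(worker, task)
            except ConnectionError:
                alive.remove(worker)  # stop sending work to the lost worker
                pending.append(task)  # its task goes back in the queue
    return results


def flaky_task(worker: str, task: int) -> float:
    """Hypothetical work function: worker "w2" behaves like a crashed node."""
    if worker == "w2":
        raise ConnectionError(worker)
    return float(task * task)


print(run_foreman(range(6), ["w1", "w2", "w3"], flaky_task))
```

On a real wide-area grid the hard part is the detection step this simulation skips: a timeout short enough to recover quickly can misclassify a worker behind a slow link as dead, which is the interaction between fault tolerance and network speed noted above.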

Page 18

Acknowledgments

• This research was supported in part by the Indiana Genomics Initiative. The Indiana Genomics Initiative of Indiana University is supported in part by Lilly Endowment Inc.

• This work was supported in part by Shared University Research grants from IBM, Inc. to Indiana University.

• This material is based upon work supported by the National Science Foundation under Grant No. 0116050 and Grant No. CDA-9601632. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).

• Assistance with this presentation: John Herrin, Malinda Lingwall, W. Les Teach, Jennifer Fairman

• Thanks to the SciNet team and SC2003 organizers!

Page 19

Jennifer Steinbachs, Center for Genomics and Bioinformatics, Indiana University
Gary W. Stuart, Center for Genomics and Bioinformatics, Indiana University
Michael Resch, HLRS, University of Stuttgart
Eric Wernert, UITS, Indiana University
Markus Buchhorn, Australian National University
Hiroshi Takemiya, National Institute of Advanced Industrial Science & Technology, Japan
Rim Belhaj, ISET'Com, Tunisia
Wolfgang E. Nagel, ZHR, Technical University of Dresden
Sergiu Sanielevici, Pittsburgh Supercomputing Center
Sergio Takeo Kofuji, LCCA/CCE-USP
David Bannon, Victorian Partnership for Advanced Computing, Australia
Norihiro Nakajima, Japan Atomic Energy Research Institute
Rosa Badia, CEPBA-IBM Research Institute
Mark A. Miller, San Diego Supercomputer Center
Hyungwoo Park, Korea Institute of Science and Technology Information
Rick Stevens, Argonne National Laboratory
Fang-Pang Lin, National Center for High Performance Computing
John Brooke, Manchester Computing
David Moffett, Purdue University
Tan Tin Wee, National University of Singapore
Greg Newby, Arctic Region Supercomputing Center
J.C.T. Poole, CACR, Caltech
Ramched Hamza, Sup'Com, Tunisia
Mary Papakhian, John N. Huffman, UITS, Indiana University
Leigh Grundhoeffer, UITS, Indiana University
Ray Sheppard, UITS, Indiana University
Peter Cherbas, Center for Genomics and Bioinformatics, Indiana University
Stephen Pickles, Neil Stringfellow, CSAR, University of Manchester
Arthurina Breckenridge, HLRS, University of Stuttgart

Page 20

Our partners

Page 21

Questions?

Be sure to check out the current issue of Communications of the ACM, in particular its special section on bioinformatics and the article “The Emerging Role of BioGrids”.