The iPlant Collaborative: A Cyberinfrastructure for the Life Sciences
Naim Matasci, BIO5 / The iPlant Collaborative

Dec 05, 2014

Naim Matasci

iPlant presentation given at NESCent in June 2012 for the participants of Phylotastic
Transcript
Page 1: The iPlant Collaborative: A Cyberinfrastructure for the Life Sciences

The iPlant Collaborative: A Cyberinfrastructure for the Life Sciences

Naim Matasci
BIO5 / The iPlant Collaborative

Page 2

What is iPlant?

Page 3

http://www.ncbi.nlm.nih.gov/Genbank/genbankstats.html

Page 4
Page 5

Problem 1: Data Volume

• Cost of analysis follows Moore's Law:
  – 1 student with 1 computer to analyze 1 Mb of data produced in 2001
  – 200 students and 200 computers to analyze all data produced for the same cost today (10 Gb)
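
One way to read the arithmetic behind these figures is sketched below. The slide does not state its assumptions, so the following are guesses made only for illustration: "today" is 2012 (the year of the talk), 1 Gb equals 1,000 Mb, and the compute available per computer at constant cost doubles every two years.

```python
# Assumptions (NOT stated on the slide): "today" is 2012, 1 Gb = 1,000 Mb,
# and compute per computer at constant cost doubles every 2 years.
DATA_2001_MB = 1
DATA_TODAY_MB = 10 * 1000
YEARS_ELAPSED = 2012 - 2001
DOUBLING_PERIOD_YEARS = 2.0


def computers_needed(data_then, data_now, years, doubling_period):
    """Estimate how many computers are needed today to keep up with the data.

    One computer sufficed for `data_then`; each computer has since become
    2 ** (years / doubling_period) times more capable.
    """
    data_growth = data_now / data_then
    compute_growth = 2 ** (years / doubling_period)
    return data_growth / compute_growth


if __name__ == "__main__":
    n = computers_needed(DATA_2001_MB, DATA_TODAY_MB,
                         YEARS_ELAPSED, DOUBLING_PERIOD_YEARS)
    print(f"Data growth: {DATA_TODAY_MB / DATA_2001_MB:,.0f}x")
    print(f"Computers needed today: about {n:.0f}")
```

Under these assumptions the estimate lands in the same range as the 200 computers quoted on the slide; a different doubling period would shift it considerably.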

Page 6

Problem 2: Fragmented Analytical Landscape

1. Tools separated by compute platform, data format, integration issues, and programming model
2. Mixture of desktop, command line, database, and web-based tools
3. Labor-intensive, fragile solutions devised to reach scientific objectives
4. Little ability to share results and analytical methods
5. Lack of reproducibility

Page 7

Scalability

ABI 3730 DNA Analyzer, Illumina Genome Analyzer, Joe Felsenstein ca. 1980, Ranger Cluster at TACC

Page 8
Page 9
Page 10

Major Ways to Access iPlant

• Storing and sharing data large and small: iPlant Data Storage
• Integrated web-based analysis: The Discovery Environment
• Cloud computing: Atmosphere
• Applications: TNRS, TreeViewer, PhytoBisque, etc.
• Scientific networking, knowledgebase and information exchange: My-Plant.org
• Educational tools: DNASubway
• Embedding iPlant CI capabilities into software: The Foundation API
• High Performance Computing for experts: TeraGrid/XSEDE

Page 11

Why is the tree of life important?

“Knowledge of evolutionary relationships is fundamental to biology, yielding new insights across the plant sciences, from comparative genomics and molecular evolution, to plant development, to the study of adaptation, speciation, community assembly, and ecosystem functioning.”

Page 12

Nothing in biology makes sense except in the light of evolution.

T. G. Dobzhansky

Page 13
Page 14

C3 to C4 Photosynthesis

Xin-Guang et al. 2008

Page 15

"We combined geospatial and molecular sequence data from two public archives to produce a 1,230-taxon phylogeny of the grasses with accompanying climate data for all species, extracted from more than 1.1 million herbarium specimens."

Edwards and Smith, 2010

Page 16

"Here we show that grasses are ancestrally a warm-adapted clade and that C4 evolution was not correlated with shifts between temperate and tropical biomes. Instead, 18 of 20 inferred C4 origins were correlated with marked reductions in mean annual precipitation."

Page 17

New Possibilities

Illumina Genome Analyzer, Ranger Cluster at TACC, Acer phylogeny (Ackerly 2009), Green Plant ToL

Page 18
Page 19

Just Ask

Page 20

Atmosphere

Page 21
Page 22
Page 23
Page 24

iPlant's APIs – The Foundation API

Service Endpoint   Role
IO                 File storage, retrieval and management. Database interoperability
DATA               File format conversion
APPS               Registration and discovery of HPC applications
JOB                Submission and management of compute jobs
SYSTEMS            Availability and info about XSEDE hosts
PROFILE            User profile discovery
AUTH               Token based secure authentication
POSTIT             URL shortener
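
To show how client code might organize the services listed above, here is a minimal sketch. The service names and roles come from the slide; the base URL, the lowercase path convention, and the helper functions `endpoint_url` and `find_services` are hypothetical and are not taken from the Foundation API itself.

```python
# Service endpoints and roles exactly as listed on the slide.
FOUNDATION_SERVICES = {
    "IO": "File storage, retrieval and management. Database interoperability",
    "DATA": "File format conversion",
    "APPS": "Registration and discovery of HPC applications",
    "JOB": "Submission and management of compute jobs",
    "SYSTEMS": "Availability and info about XSEDE hosts",
    "PROFILE": "User profile discovery",
    "AUTH": "Token based secure authentication",
    "POSTIT": "URL shortener",
}

# Hypothetical base URL, for illustration only; not the real service address.
BASE_URL = "https://foundation.example.org"


def endpoint_url(service, base_url=BASE_URL):
    """Build a URL for a service, ASSUMING lowercase path segments."""
    key = service.upper()
    if key not in FOUNDATION_SERVICES:
        raise KeyError(f"Unknown service: {service}")
    return f"{base_url}/{key.lower()}"


def find_services(keyword):
    """Return the names of services whose role text contains the keyword."""
    needle = keyword.lower()
    return sorted(name for name, role in FOUNDATION_SERVICES.items()
                  if needle in role.lower())


if __name__ == "__main__":
    print(endpoint_url("job"))
    print(find_services("compute"))
```

Keeping the table as a plain dictionary makes it easy to validate a service name before any request is built.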

Page 25

Consumer Applications

Page 26
Page 27
Page 28

iPlant Data Store

Dramatization: Not the actual iPlant Data Store

Page 29

Overview of the iPlant Data Store

Some important items we won't see in the demo

Source            Destination   Copy Method   Time (seconds)
CD                My Computer   cp            320
Berkeley Server   My Computer   scp           150
External Drive    My Computer   cp            36
USB2.0 Flash      My Computer   cp            30
iDS               My Computer   iget          18
My Computer       My Computer   cp            15

Close to optimum conditions; transfer between Univ. of Arizona and UC Berkeley

100 GB: 29m15s

1 GB / 17.5 seconds
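
The per-gigabyte figure follows directly from the 100 GB timing, as the short sketch below shows. The function names are illustrative, and the throughput conversion assumes 1 GB = 1,000 MB, which the slide does not specify.

```python
import re


def parse_duration(text):
    """Convert a duration such as '29m15s' into seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(?:(\d+)s)?", text)
    if not text or match is None:
        raise ValueError(f"Unrecognized duration: {text!r}")
    minutes = int(match.group(1) or 0)
    seconds = int(match.group(2) or 0)
    return minutes * 60 + seconds


def seconds_per_gb(total_gb, total_seconds):
    """Average time needed to transfer one gigabyte."""
    return total_seconds / total_gb


def throughput_mb_per_s(total_gb, total_seconds, mb_per_gb=1000):
    """Average throughput; mb_per_gb=1000 is an assumption."""
    return total_gb * mb_per_gb / total_seconds


if __name__ == "__main__":
    elapsed = parse_duration("29m15s")
    print(f"Elapsed: {elapsed} s")
    print(f"Per GB: {seconds_per_gb(100, elapsed):.2f} s")
    print(f"Throughput: {throughput_mb_per_s(100, elapsed):.1f} MB/s")
```

The 100 GB transfer took 1,755 seconds, or 17.55 seconds per gigabyte, which the slide rounds to 17.5.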

Page 30
Page 31

Tree Visualization

• > 500K taxa
• Fast
• Web based, platform independent
• Semantic zooming
• Metadata-driven display of information

Page 32

iPlant Tree Viewer

http://portnoy.iplantcollaborative.org/

Page 33

LIVE TREE VIEW DEMO

Page 34

Obstacles

• Number of taxa
• Taxa names

Page 35

Taxonomic uncertainty

1. Non-existent names
• Misspellings
• Contamination
• Annotations
• Morphospecies
• Digitization issues (frame shifts, character encoding)
• Lexical variants (digitization conventions)

2. Synonymy
• Nomenclatural synonyms
• Taxonomic synonyms / concepts

3. Misidentifications, incomplete identifications
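
Misspellings, the first category above, can often be caught by approximate string matching against a list of known names. The sketch below uses Python's standard `difflib` module purely as an illustration; it is not a description of how TNRS or any other iPlant tool works. The reference list is a small sample of binomials taken from the Centaurium example elsewhere in this talk, and the 0.8 cutoff is an arbitrary choice.

```python
import difflib

# Small illustrative reference list (binomials only, authorship omitted).
REFERENCE_NAMES = [
    "Centaurium curvistamineum",
    "Centaurium minimum",
    "Centaurium muhlenbergii",
    "Centaurodes muhlenbergii",
    "Erythraea curvistaminea",
    "Erythraea minima",
    "Erythraea muhlenbergii",
]


def best_match(query, names=REFERENCE_NAMES, cutoff=0.8):
    """Return the closest reference name, or None if nothing is close enough.

    Matching is case-insensitive; `cutoff` is an arbitrary similarity
    threshold between 0 and 1.
    """
    by_lower = {name.lower(): name for name in names}
    hits = difflib.get_close_matches(query.lower(), list(by_lower),
                                     n=1, cutoff=cutoff)
    return by_lower[hits[0]] if hits else None


if __name__ == "__main__":
    for submitted in ["Centaurium muhlenbergi", "erythraea minma",
                      "Quercus alba"]:
        print(f"{submitted!r} -> {best_match(submitted)!r}")
```

A lookup like this addresses only spelling; synonymy and misidentification need curated taxonomic data and cannot be resolved from string similarity alone.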

Page 36

a) Centaurium curvistamineum (Wittr.) Abrams (1951)

b) Centaurium minimum (Howell) Piper (1915)

c) Centaurium muhlenbergii (Griseb.) Wight ex Piper (1906)

d) Centaurium muhlenbergii (Griseb.) Wight ex Piper forma albiflorum (Suksd.) St. John (1937)

e) Centaurium muhlenbergii (Griseb.) Wight ex Piper var. albiflorum Suksd. (1927)

f) Centaurodes muhlenbergii (Griseb.) Kuntze (1891)

g) Erythraea curvistaminea Wittr. (1886)

h) Erythraea minima Howell (1901)

i) Erythraea muhlenbergii Griseb. (1839)

Image: Gordon Leppig & Andrea J. Pickart
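
Name strings like the ones listed above mix the binomial, the authorship, and the publication year, so a first step in handling them is to split those parts. The following is a minimal sketch with a deliberately simple regular expression. It assumes every string starts with a genus and a species epithet and ends with a four-digit year in parentheses, and it leaves any infraspecific rank inside the authorship text. The `ParsedName` type and `parse_name` function are illustrative names.

```python
import re
from dataclasses import dataclass

NAME_PATTERN = re.compile(
    r"^(?P<genus>[A-Z][a-z]+) (?P<epithet>[a-z]+) "
    r"(?P<rest>.*?)\s*\((?P<year>\d{4})\)$"
)


@dataclass
class ParsedName:
    binomial: str    # genus plus species epithet
    authorship: str  # everything between the epithet and the year
    year: int


def parse_name(text):
    """Split a name string into binomial, authorship text, and year."""
    match = NAME_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"Cannot parse name: {text!r}")
    return ParsedName(
        binomial=f"{match['genus']} {match['epithet']}",
        authorship=match["rest"],
        year=int(match["year"]),
    )


NAMES = [
    "Centaurium curvistamineum (Wittr.) Abrams (1951)",
    "Centaurium minimum (Howell) Piper (1915)",
    "Centaurium muhlenbergii (Griseb.) Wight ex Piper (1906)",
    "Centaurodes muhlenbergii (Griseb.) Kuntze (1891)",
    "Erythraea curvistaminea Wittr. (1886)",
    "Erythraea minima Howell (1901)",
    "Erythraea muhlenbergii Griseb. (1839)",
]

if __name__ == "__main__":
    # List the names in order of publication year.
    for parsed in sorted(map(parse_name, NAMES), key=lambda p: p.year):
        print(parsed.year, parsed.binomial, "|", parsed.authorship)
```

Sorting by year only orders the names chronologically; it says nothing about which name is currently accepted.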

Page 37
Page 38
Page 39

Request Tool Installation

Apps -> Create -> New App

Create New -> Request Tool Installation

Fill out forms and submit.

Receive response in 2-5 days.