The iPlant Collaborative Community Cyberinfrastructure for Life Science Roger Barthelson/Uwe Hilgert iPlant / University of Arizona.

Post on 26-Dec-2015



The iPlant Collaborative: Vision

www.iPlantCollaborative.org

Enable life science researchers and educators to use and extend cyberinfrastructure

Community-driven organization builds cyberinfrastructure for biological sciences

iPlant Collaborative: A virtual organization

UA, TACC, CSHL

Biological Cyberinfrastructure: Big Data in Biology

2003: ABI 3730 Sequencer

Human Genome: $2.7 billion, 13 years

2014: Oxford Nanopore MinION

Human Genome: $900, 6 hours
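The gap between the two sequencing data points above can be expressed as simple ratios. The following is a minimal sketch that uses only the slide's figures; the hours-per-year constant is an assumption introduced for the conversion, and the variable names are illustrative.

```python
# Ratios implied by the two human-genome data points on the slide.
HOURS_PER_YEAR = 365.25 * 24  # assumption: average calendar year

cost_2003_usd, years_2003 = 2.7e9, 13  # slide values for 2003
cost_2014_usd, hours_2014 = 900, 6     # slide values for 2014

cost_ratio = cost_2003_usd / cost_2014_usd
time_ratio = (years_2003 * HOURS_PER_YEAR) / hours_2014

print(f"Cost reduction: about {cost_ratio:,.0f}x")
print(f"Time reduction: about {time_ratio:,.0f}x")
```

Under these assumptions the cost falls by a factor of roughly three million, and the elapsed time by a factor of roughly nineteen thousand.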

The Egalitarian Genome: Next Generation Sequencing 2014

“BGI, based in China, is the world’s largest genomics research institute, with 167 DNA sequencers producing the equivalent of 2,000 human genomes a day.

BGI churns out so much data that it often cannot transmit its results to clients or collaborators over the Internet or other communications lines because that would take weeks. Instead, it sends computer disks containing the data, via FedEx.”

The Big Data Problem: Storage and Analysis

Biological Cyberinfrastructure: The Problem of Big Data in Biology

Biology’s Other Big Data

Phenomics

Visualization

How iPlant CI Enables Discovery

Challenge: Create an easy-to-use platform powerful enough to handle data-intensive biology

Many bioinformatics tools are “off limits” to those without specialized computational backgrounds (“command line”).

• Data Store
• Discovery Environment – 100s of tools/apps
• Atmosphere – Cloud Computing
• Bisque – Image Analysis Environment
• APIs

iPlant APIs

Resources

The Biology App Store

The iPlant Collaborative: What is cyberinfrastructure?

Scalable, accessible computation: data storage, cloud services, and software tools

• Manage Data
• Share Data
• Analyze Data
• Utilize Big Data Tech
• Facilitate Collaborations
• Connect Resources
• Manage Access

Enable science (verifiable, reproducible, tractable)

The iPlant Collaborative: What iPlant offers

• Data Management & Storage Resources
• Access to High Performance Computing Resources
• Tool Integration System
• Application Programming Interfaces (APIs)
• Cloud Computing
• Genotype To Phenotype Science Enablement
• Tree of Life Science Enablement
• Image Analysis Platform
• Support for Molecular Breeding Platform (IBP)
• Support for AgMIP
• Others to come...


How iPlant CI Enables Discovery

Challenge: Navigate biology’s “data deluge”

• HT image data – GBs per day
• HT sequence data – TBs per run

[Figure: iPlant Data Store, replicated between Arizona and Texas. Connected compute resources: Grid Computing, Cloud Computing, HPC, Community, Super Computing. Access routes: iDrop, WebDAV, Foundation API, DE, i-commands.]

iPlant Data Store: Scalable, Reliable, Redundant, High-Performance, Connected

How iPlant CI Enables Discovery

Solution: iPlant Data Store

All data within the same platform: speed and accessibility

• Access your data from multiple iPlant services

• Automatic, redundant data backup between the University of Arizona and the University of Texas

• Multiple ways to share data with collaborators

• Multi-threaded, high-speed transfers

• Default 100 GB allocation; allocations over 1 TB available with justification

Getting 1 GB onto my computer takes...

Source               Time (s)
CD                   320
Berkeley Server      150
External Drive       36*
USB2.0 Flash         30
iPlant Data Store    18*
My Computer          15
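The timings in the table above can be converted into effective transfer rates, which makes the sources easier to compare. This is a minimal sketch under stated assumptions: a gigabyte is taken as 1000 MB, the unexplained asterisks are dropped, and the function names and the linear extrapolation helper are illustrative rather than part of any iPlant tool.

```python
# Seconds needed to move 1 GB, copied from the slide's table.
# The asterisks on two entries are not explained in the source and are omitted.
TRANSFER_SECONDS = {
    "CD": 320,
    "Berkeley Server": 150,
    "External Drive": 36,
    "USB2.0 Flash": 30,
    "iPlant Data Store": 18,
    "My Computer": 15,
}

MB_PER_GB = 1000  # assumption: decimal gigabytes


def throughput_mb_per_s(seconds, size_gb=1.0):
    """Effective transfer rate in MB/s for size_gb moved in the given seconds."""
    return size_gb * MB_PER_GB / seconds


def estimated_seconds(source, size_gb):
    """Naive linear extrapolation of the 1 GB timing to another data size."""
    return TRANSFER_SECONDS[source] * size_gb


for source, seconds in TRANSFER_SECONDS.items():
    print(f"{source:<18} {throughput_mb_per_s(seconds):6.1f} MB/s")
```

For example, `estimated_seconds("iPlant Data Store", 100)` extrapolates the 1 GB timing to the default 100 GB allocation. Real transfers rarely scale perfectly linearly, so the result is only a rough guide.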

How iPlant CI Enables Discovery

What iPlant data solutions mean for a bovine breeder

“It's kind of like being in that COPD commercial where the weight is lifted off your chest, only with iPlant, we have access to more computational power, so we can get to projects much faster and we can do big projects that our machines may not have allowed us to do previously!

The ability to transport 2TB of data overnight using the iRODS system was particularly helpful because previously, we had been mailing hard drives which is not an optimal solution to sharing big data.”

James Koltes, Iowa State

How iPlant CI Enables Discovery

Solution: Discovery Environment

An extensible platform for science

• High-powered computing
• Data sharing/collaboration
• Easy-to-use interface
• Virtually limitless apps
• Analysis history (provenance)

iPlant’s Discovery Environment: Web Interface for Hundreds of Applications

(Some) Apps in Discovery Environment (for sequencing studies)

• Sequence Quality Control
  – FastQC
  – Fastx Toolkit
  – Sabre, Scythe, Sickle (paired-end trimming)
  – SGA cleanup (paired-end quality trimming)
  – Coming soon…
      Sequence induction, assessment, and trimming pipeline
      Mira contaminant detection and removal

(Some) Apps in Discovery Environment (for sequencing studies)

• Genome Assembly
  – ABySS
  – SOAPdenovo2
  – Velvet
  – Newbler
  – Contig analysis tools
      With or without reference sequence for comparison
  – Coming soon…
      Minimus2
      Mira
      PacBioToCA or PBJelly?

(Some) Apps in Discovery Environment (for sequencing studies)

• Transcript assembly/RNASeq
  – TopHat, Cufflinks, Cuffmerge, CuffDiff
  – Oases
  – Trinity
  – Newbler
  – Scarf
  – Coming soon…
      Open pipeline for transcript expression analysis (quantitative RNASeq)
      Mira transcriptome assembly


How iPlant CI Enables Discovery

What the Discovery Environment means to bench biologists

“In one week I was able to align my RNA-Seq samples using a method that previously took me a month on my bioinformatics computers…

Being able to access my data any time and from anywhere – priceless.

The DE interface is intuitive and easy to use...[and] will allow greater continuity and comparability between different experiments from different laboratories.”

Richard Barker – Univ. Wisconsin, Madison

How iPlant CI Enables Discovery

Challenge: Collaborate and access software on demand

Frustrated bioinformaticians serving the needs of several users

+ works well / powerful
- expensive / complex

Cartoon: http://phdhumor.blogspot.com/2008/12/on-lazy-day-for-bioinformatician.html

How iPlant CI Enables Discovery

iPlant Solution: Atmosphere

On-demand computing resource built on a cloud infrastructure

• Virtual Machine pre-configured with:
  – Software
  – Memory requirements
  – Processing power

• iPlant authentication, storage, and HPC capabilities

• Build custom images/appliances and share with community

• Cross-platform desktop access to GUI applications in the cloud (using VNC)

Atmosphere: Your Cloud, Your Way

Google Cloud

Atmosphere

Atmosphere: Select a Machine Image, Launch

How iPlant CI Enables Discovery

What Atmosphere means to bioinformaticians

“What my users used to call me for, they now do on their own through Atmosphere. Now I can scale up my user community”

Nathan Miller, Univ. Wisconsin, Madison

• BLAST 400k transcripts against NCBI nr in 36 h vs. 2 months

• Use iPlant Data Store to move 1500 high-res images per day for analysis
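The two figures above can be turned into a rough speedup and an average image rate. This is a minimal sketch: the 30-day month is an assumption (the slide does not define "2 months"), and the helper name `speedup` is illustrative.

```python
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30  # assumption: the slide does not say how long a "month" is


def speedup(baseline_hours, new_hours):
    """How many times faster the new run is than the baseline."""
    return baseline_hours / new_hours


# "36 h vs. 2 months" for BLASTing 400k transcripts against NCBI nr.
baseline_hours = 2 * DAYS_PER_MONTH * HOURS_PER_DAY
blast_speedup = speedup(baseline_hours, 36)
print(f"BLAST speedup: about {blast_speedup:.0f}x")

# 1500 images per day is, on average, one image every N seconds.
seconds_per_image = HOURS_PER_DAY * 3600 / 1500
print(f"One image every {seconds_per_image:.1f} s on average")
```

With a 30-day month the BLAST comparison works out to roughly a 40-fold speedup; a different definition of "month" shifts that figure slightly.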

“iPlant is a great equalizer.” Mike Covington, UC Davis

The iPlant Collaborative: Your colleagues

Leadership Team: Steve Goff – UA, Dan Stanzione – TACC, Matt Vaughn – TACC, Nirav Merchant – UA, Eric Lyons – UA, Doreen Ware – CSHL

Staff: Greg Abram, Sonali Aditya, Ritu Arora, Roger Barthelson, Rob Bovill, Brad Boyle, Gordon Burleigh, John Cazes, Mike Conway, Victor Cordero, Rion Dooley, Aaron Dubrow, Andy Edmonds, Dmitry Fedorov, John Fonner, Melyssa Fratkin, Michael Gatto, Utkarsh Gaur, Cornel Ghiban, Steve Gregory, Mathew Helmke, Natalie Henriques, Uwe Hilgert, Nicole Hopkins, Logan Johnson, Chris Jordan, Kathleen Kennedy, Mohammed Khalfan, David Knapp, Lars Koersterk, Sangeeta Kuchimanchi, Kristian Kvilekval, Sue Lauter, Tina Lee, Zhenyuan Lu, Eric Lyons, Aaron Marcuse-Kubitz, Naim Matasci, Robert McLay, Nathan Miller, Steve Mock, Martha Narro, Shannon Oliver, Benoit Parmentier, Jmatt Peterson, Dennis Roberts, Paul Sarando, Jerry Schneider, Edwin Skidmore, Brandon Smith, Mary Margaret Sprinkle, Sriram Srinivasan, Josh Stein, Lisa Stillwell, Jonathan Strootman, Peter Van Buren, Hans Vasquez-Gross, Rebeka Villarreal, Ramona Walls, Liya Wang, Anton Westveld, Jason Williams, John Wregglesworth, Weijia Xu

Faculty Advisors & Collaborators: Ali Akoglu, Kobus Barnard, Volker Brendel, Timothy Clausner, Sally Elgin, Brian Enquist, Damian Gessler, Ruth Grene, John Hartman, Matthew Hudson, David Lowenthal, B.S. Manjunath, David Neale, Brian O’Meara, Sudha Ram, David Salt, Mark Schildhauer, Neelima Sinha, Doug Soltis, Pam Soltis, Edgar Spalding, Alexis Stamatakis, Rick Stevens, James Taylor, Brett Tyler, Steve Welch

Students: Peter Bailey, Jeremy Beaulieu, Devi Bhattacharya, Storme Briscoe, YaDi Chen, David Choi, Barbara Dobrin, John Donoghue, Yekatarina Khartianova, Chris La Rose, Amgad Madkour, Aniruddha Marathe, Andre Mercer, Kurt Michaels, Zack Pierce, Andrew Predoehl, Sathee Ravindranath, Kyle Simek, Gregory Striemer, Jason Vandeventer, Nicholas Woodward, Kuan Yang

Postdocs: Barbara Banbury, Christos Noutsos, Solon Pissis, Brad Ruhfel

Michael Schatz – CSHL, David Micklos – CSHL, Ann Stapleton – UNCW, Ron Vetter – UNCW

Connect with iPlant!

Twitter: @iPlantCollab #iPlant

Facebook: facebook.com/iPlantCollab

LinkedIn: iplant.co/iPlantCollabLinkedIn

Google+: iplant.com/iPlantGooglePlus
