Page 1: ICAR 2015 Plenary - Chris Town

araport.org @araport 1

Araport: your one-stop shop for Arabidopsis data in the 21st century

www.araport.org

Chris Town

J. Craig Venter Institute

Page 2

Funded in late 2013, the Araport Project aims to integrate a wide range of data types, using a data federation approach via web services wherever possible. It will also provide the infrastructure that enables community members to mobilize their own data and expose it through the Araport interface.

Page 3

What are the data types we are aiming to integrate?

Araport categories:
• Genome features
• Genes and proteins
• Pathways and networks
• Orthologs and paralogs (among model organisms; among plants)
• Germplasm and phenotypes

Data types:
• Physical interactions
• Genetic interactions
• Metabolic pathways
• Regulatory networks
• Expression
• Functional annotation
• Publication
• Gene families
• Gene structures
• Population variants
• Epigenetics
• Binding sites
• Mutants and ecotypes
• Genotypes and alleles
• Phenotypes

Page 4

Where are the data coming from?

Araport categories: genome features; genes and proteins; pathways and networks; orthologs and paralogs (among model organisms; among plants); germplasm and phenotypes.

Data types: physical interactions; genetic interactions; metabolic pathways; regulatory networks; expression; functional annotation; publication; gene families; gene structures; 1001 Genomes variants; epigenetics; binding sites; mutants and ecotypes; genotypes and alleles; phenotypes.

Data sources: TAIR; BAR; CoGe; Phytozome; NCBI; UniProt (EBI); KEGG; Panther; ABRC; NASC; AraCyc; Ensembl Compara; AGRIS; AraLip; VIB.

Key: major data centers; early adoption groups; broader community.

Page 5

How are the data assimilated?

User interfaces: Chado / JBrowse; ThaleMine (gene list analysis, gene report, query, web services); Science Apps; custom analysis.

Data and assimilation mode:
• Warehousing: TAIR10; array expression; interactions; pathways (KEGG); publications and GeneRIF (NCBI, UniProt)
• Real-time federation: co-expression (ATTED); ePictographs (BAR); 1001 Genomes variants (Ensembl); epigenetics (EPIC-CoGe); germplasm, genotype and phenotype
• Genome data (>70 tracks): Araport11 updated models; RNA-seq by tissue; T-DNA-seq; PEAT, DRS, …
• Community data/tools
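The slide names two ways of assimilating data: warehousing (copying data into Araport ahead of time) and real-time federation (querying the provider when a user asks). A minimal Python sketch of the difference follows. The in-memory `UPSTREAM` dict, `upstream_lookup`, `Warehouse` and `Federator` are hypothetical stand-ins, not Araport code; a real federated source would be a remote web service.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical upstream provider: a mutable dict standing in for a remote
# data service, keyed by AGI locus identifier.
UPSTREAM: Dict[str, List[str]] = {"AT1G01010": ["interactor-A"]}


def upstream_lookup(locus: str) -> List[str]:
    """Return a copy of the provider's current record for a locus."""
    return list(UPSTREAM.get(locus, []))


class Warehouse:
    """Warehousing: copy records into a local store at load time."""

    def __init__(self) -> None:
        self.store: Dict[str, List[str]] = {}

    def load(self, loci: Iterable[str], fetch: Callable[[str], List[str]]) -> None:
        for locus in loci:
            self.store[locus] = fetch(locus)

    def get(self, locus: str) -> Optional[List[str]]:
        # Served from the local snapshot; stale until the next load().
        return self.store.get(locus)


class Federator:
    """Real-time federation: forward every query to the provider."""

    def __init__(self, fetch: Callable[[str], List[str]]) -> None:
        self.fetch = fetch

    def get(self, locus: str) -> List[str]:
        # Always current, but only works while the provider is reachable.
        return self.fetch(locus)


if __name__ == "__main__":
    warehouse = Warehouse()
    warehouse.load(["AT1G01010"], upstream_lookup)
    federator = Federator(upstream_lookup)

    # The provider adds a new record after the warehouse was loaded.
    UPSTREAM["AT1G01010"].append("interactor-B")

    print(warehouse.get("AT1G01010"))  # ['interactor-A']
    print(federator.get("AT1G01010"))  # ['interactor-A', 'interactor-B']
```

The trade-off the sketch illustrates: a warehoused copy answers quickly and independently of the provider but can go stale until it is reloaded, while a federated query is always current but depends on the provider being available.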

Page 6

User interfaces: JBrowse

Data types: Chromosomes, Transcripts, Proteins, Expression, Interaction, Publications, Orthologs
Actions: Scroll & zoom, Track layering, Data integration

Page 7

User interfaces: ThaleMine

Data types: Function, Interaction, Expression, Publications
Actions: Search, Drill down, List manipulation, Save results
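ThaleMine lists "List manipulation" among its actions. As an illustration only, the sketch below treats gene lists as sets of AGI locus IDs and combines them with union, intersection and difference. The function name `combine_lists` and the example IDs are hypothetical and say nothing about real experimental results, and this is not a description of how ThaleMine itself implements its list tools.

```python
from typing import Dict, Iterable, List


def combine_lists(a: Iterable[str], b: Iterable[str]) -> Dict[str, List[str]]:
    """Combine two gene lists with basic set operations (sorted output)."""
    set_a, set_b = set(a), set(b)
    return {
        "union": sorted(set_a | set_b),
        "intersection": sorted(set_a & set_b),
        "a_minus_b": sorted(set_a - set_b),
        "b_minus_a": sorted(set_b - set_a),
    }


if __name__ == "__main__":
    # Hypothetical lists; the duplicate in list_a is collapsed by set().
    list_a = ["AT1G01010", "AT2G33380", "AT5G52310", "AT1G01010"]
    list_b = ["AT5G52310", "AT2G33380", "AT4G34000"]
    for name, genes in combine_lists(list_a, list_b).items():
        print(f"{name}: {genes}")
```

Sorting the output makes saved results reproducible regardless of the order in which genes were added to each list.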

Page 8

User interfaces: Science Apps Workspace

Prototype highlights
• Skilled third parties can create apps
• Features configurable workspaces
• Supports analysis, visualization, query, and access apps
• Features an App Store (not shown) for discovery
• Apps are mobile-responsive

Page 9

From web site to web services

Page 10

GFP Reporter Images via Web Services

GFP Reporter Images via Science Apps
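Pages 9 and 10 describe the move from web pages to web services: the same GFP reporter images can be fetched by a program and shown inside a Science App. A minimal sketch of the client side, assuming a JSON service; `BASE_URL`, the `locus`/`format` query parameters and the `result`/`image_url` response fields are hypothetical placeholders, not Araport's actual API. The response is a canned string so the example runs offline.

```python
import json
from typing import List
from urllib.parse import urlencode

# HYPOTHETICAL endpoint: example.org is a placeholder, not an Araport URL.
BASE_URL = "https://api.example.org/gfp-reporter/images"


def build_request_url(locus: str) -> str:
    """Encode the query parameters for a (hypothetical) image lookup."""
    return f"{BASE_URL}?{urlencode({'locus': locus, 'format': 'json'})}"


def parse_image_urls(payload: str) -> List[str]:
    """Pull image URLs out of a JSON response with an assumed shape."""
    document = json.loads(payload)
    return [record["image_url"] for record in document.get("result", [])]


# Canned response standing in for what such a service might return.
SAMPLE_RESPONSE = json.dumps({
    "result": [
        {"locus": "AT1G01010", "image_url": "https://img.example.org/a.png"},
        {"locus": "AT1G01010", "image_url": "https://img.example.org/b.png"},
    ]
})

if __name__ == "__main__":
    print(build_request_url("AT1G01010"))
    # https://api.example.org/gfp-reporter/images?locus=AT1G01010&format=json
    print(parse_image_urls(SAMPLE_RESPONSE))
```

A live client would pass the URL to an HTTP library such as `urllib.request.urlopen` and feed the response body to `parse_image_urls`; a web page and a Science App could share that same parsing step.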

Page 11

The richness of data sets in Araport depends upon community participation

We are developing the infrastructure that will allow Arabidopsis researchers to mobilize their data and integrate it into the Araport site.

Community Participation: Developing for Araport

Developer workshop at TACC (Texas Advanced Computing Center), Nov. 2014. Another workshop will be held this fall.

Page 12

We are proactively engaging community contributors:
Blake Meyers, Nick Provart, Erich Grotewold, John Browse, Waltraud Schulze, Sue Rhee, Harvey Millar, Basil Nikolau

Please write if you are interested: [email protected]

Page 13

Re-annotation of the Col-0 genome: Araport11

Structural annotation:
• Reads from the Sequence Read Archive, binned by 11 tissues/organs
• TopHat alignment to TAIR10, then genome-guided Trinity assembly binned by 11 tissues/organs
• De novo Trinity assembly binned by 11 tissues/organs
• Concatenating the de novo and genome-guided assemblies for each tissue/organ
• 11 transcriptomes assembled by PASA
• Annotation updates by PASA; union of 11 sets of splice isoforms
• Append novel gene models to TAIR10
• UniProt protein, NCBI and MAKER-P models: 747 unique models, mapped to intergenic regions
• Literature; manual evaluation
• Assigning locus IDs; novel transcribed regions
• Final gene set: 233 changes, 112 additions
• Functional annotation

Tracks: Araport11 mapping coverage; Araport11 spliced junctions; Araport11 transcript assembly

Result: Araport11 protein-coding genes (TAIR10 plus updated and novel gene models)
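One step on the slide takes the union of the 11 per-tissue sets of splice isoforms. A minimal sketch of that set union, assuming each isoform is reduced to its intron chain (an ordered tuple of intron start/end coordinates) so that identical structures seen in several tissues collapse to one. The data layout, the `union_isoforms` name and the coordinates are hypothetical, and this is not PASA's algorithm, which does considerably more (assembling spliced alignments and updating gene models).

```python
from typing import Dict, List, Set, Tuple

Intron = Tuple[int, int]          # (start, end) genomic coordinates
IntronChain = Tuple[Intron, ...]  # an isoform's ordered introns


def union_isoforms(
    per_tissue: Dict[str, Dict[str, List[IntronChain]]]
) -> Dict[str, Set[IntronChain]]:
    """Merge per-tissue isoforms into one non-redundant set per locus."""
    merged: Dict[str, Set[IntronChain]] = {}
    for loci in per_tissue.values():
        for locus, chains in loci.items():
            merged.setdefault(locus, set()).update(chains)
    return merged


if __name__ == "__main__":
    # Hypothetical structures for one locus in three of the 11 tissues.
    per_tissue = {
        "root": {"LOCUS_1": [((100, 200), (300, 400)), ((100, 200),)]},
        "leaf": {"LOCUS_1": [((100, 200), (300, 400))]},
        "flower": {"LOCUS_1": [((100, 250), (300, 400))]},
    }
    merged = union_isoforms(per_tissue)
    print(len(merged["LOCUS_1"]))  # 3 distinct intron chains
```

Keying isoforms by intron chain ignores differences in transcript ends, which is one simple way to decide that two tissues observed "the same" splice isoform.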

Page 14

Araport11 Protein-Coding Genes: Pre-release Annotation Statistics

Structural annotation                           TAIR10        Araport11
Number of protein-coding loci                   27,416        28,565
Number of transcripts including isoforms        35,385        50,203
Number of TAIR10 transcripts with altered CDS   -             933 (3.3%)
Number of TAIR10 transcripts with altered UTRs  -             25,079 (88.2%)
Number of loci with splice isoforms             5,665 (18%)   10,946 (38%)
Number of novel loci                            -             1,162
Novel transcribed regions not yet classified    -             554

Functional annotation
Loci retaining TAIR10 functional description: 21,690
Loci receiving new functional description: 7,122

Data are available from Araport through JBrowse, FTP and web services.
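The headline changes in the statistics table can be checked with simple arithmetic. The short sketch below uses only the figures on the slide; the dictionary keys and the `changes` helper are labels chosen here for illustration.

```python
from typing import Dict, Tuple

# Figures copied from the pre-release statistics table.
TAIR10 = {"protein_coding_loci": 27_416, "transcripts": 35_385, "loci_with_isoforms": 5_665}
ARAPORT11 = {"protein_coding_loci": 28_565, "transcripts": 50_203, "loci_with_isoforms": 10_946}


def changes(old: Dict[str, int], new: Dict[str, int]) -> Dict[str, Tuple[int, float]]:
    """Absolute and percentage change for each shared statistic."""
    return {k: (new[k] - old[k], 100 * (new[k] - old[k]) / old[k]) for k in old}


if __name__ == "__main__":
    for name, (delta, pct) in changes(TAIR10, ARAPORT11).items():
        print(f"{name}: +{delta:,} ({pct:.1f}%)")
    # protein_coding_loci: +1,149 (4.2%)
    # transcripts: +14,818 (41.9%)
    # loci_with_isoforms: +5,281 (93.2%)
    share = 100 * ARAPORT11["loci_with_isoforms"] / ARAPORT11["protein_coding_loci"]
    print(f"Araport11 loci with isoforms: {share:.1f}%")  # 38.3%
```

The numbers show that most of the growth is in transcripts rather than loci: roughly 4% more protein-coding loci, but about 42% more transcripts and nearly twice as many loci with splice isoforms.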

Page 15

Araport offers a number of options for community input

Page 16

Community Annotation of Araport11 Genes using Web Apollo

Page 17

Learn more about Araport at ICAR

Visit our poster in the “Systems biology and new approaches” session.

Come to our workshop, “The Arabidopsis information portal for users and developers”: Tuesday 4:15-6:00 pm, Room 242 A-B.
• Agnes Chan (J. Craig Venter Institute): A Guided Tour of Araport
• Matt Vaughn (Texas Advanced Computing Center): Developing Apps: Exposing your data through Araport
• Nick Provart (University of Toronto): A Community Collaborator Perspective: Case study 1 - BioAnalytic Resource
• Blake Meyers (University of Delaware): A Community Collaborator Perspective: Case study 2 - Small RNA DBs

Enter our “Design an App” competition and win an iPad!

Page 18

Acknowledgements

J. Craig Venter Institute • Chris Town • Jason Miller • Agnes Chan • Erik Ferlanti • Vivek Krishnakumar • Irina Belyaeva • Maria Kim • Chia-Yi Cheng • Seth Schobel

University of Cambridge • Gos Micklem • Sergio Contrino Former members • Ben Rosen • Svetlana Karamycheva • Eleanor Pence

Texas Advanced Computing Center • Matt Vaughn • Steve Mock • Rion Dooley • Matt Hanlon • Joe Stubbs • Walter Moreira • Chris Jordan

TAIR • Eva Huala • Bob Muller