Transcript
araport.org @araport 1
Araport: your one-stop-shop for Arabidopsis data in the 21st century
www.araport.org
Chris Town
J. Craig Venter Institute
Funded in late 2013, the Araport Project aims to integrate a wide range of data types, using a
data federation approach via web services wherever possible. It will also provide the
infrastructure to enable community members to mobilize their own data and expose it through the
Araport interface.
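The federation idea can be sketched in a few lines: instead of copying every data set into one warehouse, the portal asks each provider at query time and merges the answers. This is only an illustration under stated assumptions, not Araport's implementation; `federated_query`, the two stand-in fetchers, and the sample record are all hypothetical, and a real portal would issue web-service requests where the stand-ins are.

```python
from typing import Callable, Dict, List

# Hypothetical interface: a fetcher takes a gene ID and returns annotation strings.
Fetcher = Callable[[str], List[str]]


def federated_query(gene_id: str, sources: Dict[str, Fetcher]) -> Dict[str, List[str]]:
    """Ask every source at query time and collect the answers by source name."""
    results: Dict[str, List[str]] = {}
    for name, fetch in sources.items():
        try:
            results[name] = fetch(gene_id)
        except Exception:
            # A federated portal has to tolerate a provider being unavailable.
            results[name] = []
    return results


# Stand-ins for remote web services (hypothetical, no network access).
def fake_expression(gene_id: str) -> List[str]:
    return [f"{gene_id}: expression record"]


def fake_interactions(gene_id: str) -> List[str]:
    raise ConnectionError("provider offline")


merged = federated_query(
    "AT1G01010",
    {"expression": fake_expression, "interactions": fake_interactions},
)
print(merged)
```

The trade-off this illustrates: federated results are always current, but each query depends on every provider responding, which is why the failure case is handled explicitly.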
What are the data types we are aiming to integrate?
Araport
• Among model organisms
• Among plants
Genome features
Genes and Proteins
Pathways and Networks
Orthologs and Paralogs
Germplasm and Phenotypes
• Physical interactions
• Genetic interactions
• Metabolic pathways
• Regulatory networks
• Expression
• Functional annotation
• Publication
• Gene families
• Gene structures
• Population variants
• Epigenetics
• Binding sites
• Mutants and ecotypes
• Genotypes and alleles
• Phenotypes
Where are the data coming from?
Araport
• Among model organisms
• Among plants
Genome features
Genes and Proteins
Pathways and Networks
Orthologs and Paralogs
Germplasm and Phenotypes
BAR
CoGe
TAIR
Phytozome
TAIR
BAR
• Physical interactions
• Genetic interactions
• Metabolic pathways
• Regulatory networks
• Expression
• Functional annotation
• Publication
• Gene families
• Gene structures
• 1001 variants
• Epigenetics
• Binding sites
• Mutants and ecotypes
• Genotypes and alleles
• Phenotypes
NCBI
Uniprot EBI
KEGG
Panther
Key: Major Data Centers, Early Adoption Groups, Broader Community
TAIR
ABRC
NASC
AraCyc
Ensembl Compara
AGRIS
AGRIS
AraLip
VIB
Chado JBrowse
Science
Apps
Custom
analysis
ThaleMine Gene List
Analysis
Gene
Report
Query, Web
Services
TAIR10
Array Expression
Interactions
Pathways (KEGG)
Publications, GeneRIF (NCBI, UniProt)
Warehousing
Real-time federation
Co-expression (ATTED)
ePictographs (BAR)
1001 Genomes Variants (Ensembl)
Epigenetics (EPIC-CoGe)
Germplasm
Genotype
Phenotype
Araport11 Updated models
RNA-seq by tissue
Genome data T-DNA-seq
PEAT, DRS, …
(>70 tracks)
Community
Data/Tools
Real-time federation
Real-time federation
Warehousing
How are the data assimilated?
User interfaces
User interfaces: JBrowse
Data types: Chromosomes, Transcripts, Proteins, Expression, Interaction, Publications, Orthologs
Actions: Scroll & zoom, Track layering, Data integration
User interfaces: ThaleMine
Data types: Function, Interaction, Expression, Publications
Actions: Search, Drill down, List manipulation, Save results
User interfaces: Science Apps Workspace
Prototype Highlights
• Skilled third parties can create apps
• Features configurable workspaces
• Supports analysis, visualization, query, and access apps
• Features an App Store (not shown) for discovery
• Apps are mobile responsive
From web site to web services
GFP Reporter Images via Web Services
GFP Reporter Images via Science Apps
Community Participation: Developing for Araport
The richness of data sets in Araport depends upon community participation.
We are developing the infrastructure that will allow Arabidopsis researchers to mobilize
their data and integrate it into the Araport site.
Developer workshop at TACC, Nov. 2014. Another workshop will be held this fall.
Blake Meyers Nick Provart
Erich Grotewold John Browse
Waltraud Schultze Sue Rhee
Harvey Millar Basil Nikolau
We are proactively engaging community contributors
Please write if you are interested: araport@jcvi.org
Re-annotation of the Col-0 genome: Araport11
(Workflow diagram; box labels listed in extraction order)
Sequence Read Archive
De novo Trinity Assembly
Binned by 11 Tissue/Organ
Concatenating De Novo Assembly and Genome-Guided Assembly for each Tissue/Organ
Araport11 Protein-Coding Genes
TAIR10 plus
TopHat Alignment to TAIR10
Araport11 Mapping Coverage
Genome-Guided Trinity Assembly
Binned by 11 Tissue/Organ
Araport11 Transcript Assembly
Araport11 Spliced Junction
11 Transcriptomes Assembled by PASA
Append Novel Gene Models to TAIR10
Annotation Updates by PASA
Union of 11 sets of splice isoforms
Functional Annotation
UniProt Protein NCBI and MAKER-P
Assembly
747 unique models
Mapping to intergenic regions
Literature
Assigning Locus ID
Novel transcribed regions
Manual evaluation
Final gene set
233 changes, 112 additions
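The "Union of 11 sets of splice isoforms" step can be pictured as a set union over per-tissue transcript structures. The following is a minimal sketch, assuming each isoform is identified by its locus plus its tuple of exon coordinates. It is not the PASA implementation; the tissue names, locus name, and coordinates are invented for illustration, and only two tissues are shown instead of eleven.

```python
from typing import Dict, Set, Tuple

# An isoform is identified by (locus, exon structure); exons are (start, end) pairs.
Isoform = Tuple[str, Tuple[Tuple[int, int], ...]]


def union_isoforms(per_tissue: Dict[str, Set[Isoform]]) -> Set[Isoform]:
    """Merge per-tissue isoform sets, keeping each distinct structure once."""
    merged: Set[Isoform] = set()
    for isoforms in per_tissue.values():
        merged |= isoforms
    return merged


# Invented example data: two tissues sharing one isoform of a hypothetical locus.
shared = ("LOCUS_A", ((100, 200), (300, 400)))
root_only = ("LOCUS_A", ((100, 200), (350, 400)))
per_tissue = {
    "root": {shared, root_only},
    "leaf": {shared},
}

all_isoforms = union_isoforms(per_tissue)
print(len(all_isoforms))
```

Because identical structures collapse to a single entry, an isoform seen in several tissues is counted once, while tissue-specific isoforms are retained.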
Araport11 Protein Coding Genes: Pre-release. Annotation Statistics
Structural Annotation
                                                 TAIR10          Araport11
Number of protein coding loci                    27,416          28,565
Number of transcripts including isoforms         35,385          50,203
Number of TAIR10 transcripts with altered CDS    -               933 (3.3%)
Number of TAIR10 transcripts with altered UTRs   -               25,079 (88.2%)
Number of loci with splice isoform               5,665 (18%)     10,946 (38%)
Number of novel loci                             -               1,162
Novel transcribed regions not yet classified     -               554
Functional Annotation
Loci retaining TAIR10 functional description: 21,690
Loci receiving new functional description: 7,122
Data are available from Araport through JBrowse, FTP and web services.
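As a quick check on the structural annotation figures, the change between releases can be computed directly from the counts reported above. The variable names are illustrative only, and the derived ratios are simple arithmetic on the slide's numbers rather than statistics reported by the project.

```python
# Counts taken from the pre-release structural annotation statistics.
tair10_loci, araport11_loci = 27_416, 28_565
tair10_transcripts, araport11_transcripts = 35_385, 50_203

# Net change in protein-coding loci between the two annotations.
net_loci_change = araport11_loci - tair10_loci

# Relative growth in the number of transcripts (including isoforms).
transcript_increase_pct = (
    100 * (araport11_transcripts - tair10_transcripts) / tair10_transcripts
)

# Average number of transcripts per protein-coding locus in each release.
per_locus_tair10 = tair10_transcripts / tair10_loci
per_locus_araport11 = araport11_transcripts / araport11_loci

print(f"Net change in protein-coding loci: {net_loci_change}")
print(f"Increase in transcripts: {transcript_increase_pct:.1f}%")
print(f"Transcripts per locus: {per_locus_tair10:.2f} -> {per_locus_araport11:.2f}")
```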
Araport offers a number of options for community input
Community Annotation of Araport11 Genes using Web Apollo
Learn more about Araport at ICAR
Visit our poster in the “Systems biology and new approaches” session
Come to our workshop: Tuesday 4.15-6.00 pm, Room 242 A-B.
The Arabidopsis information portal for users and developers
Agnes Chan (J. Craig Venter Institute)
A Guided Tour of Araport
Matt Vaughn (Texas Advanced Computing Center)
Developing Apps: Exposing your data through Araport
Nick Provart (University of Toronto)
A Community Collaborator Perspective: Case study 1 - BioAnalytic Resource
Blake Meyers (University of Delaware)
A Community Collaborator Perspective: Case study 2 - Small RNA DBs
Enter our “Design an App” competition and win an iPad!
Acknowledgements
J. Craig Venter Institute • Chris Town • Jason Miller • Agnes Chan • Erik Ferlanti • Vivek Krishnakumar • Irina Belyaeva • Maria Kim • Chia-Yi Cheng • Seth Schobel
University of Cambridge • Gos Micklem • Sergio Contrino
Former members • Ben Rosen • Svetlana Karamycheva • Eleanor Pence
Texas Advanced Computing Center • Matt Vaughn • Steve Mock • Rion Dooley • Matt Hanlon • Joe Stubbs • Walter Moreira • Chris Jordan
TAIR • Eva Huala • Bob Muller