APOLLO: Scalable and collaborative genome curation
Monica Munoz-Torres, PhD | @monimunozto
Nathan Dunn, Colin Diesh*, Deepak Unni*, Seth Carbon, Heiko Dietze, Christopher Mungall, Nicole Washington, Ian Holmes*, Christine Elsik*, and Suzanna E. Lewis
Berkeley Bioinformatics Open-Source Projects
Genomics Division, Lawrence Berkeley National Laboratory
8th International Biocuration Conference. Beijing, China. 24 April, 2015
OUTLINE
• LAST TIME: where we left off last year
• IMPROVEMENTS: architecture, scalability, features
• COLLABORATIONS: JBrowse & GenSAS
• FUTURE PLANS: what lies on the horizon
APOLLO: genome annotation editing tool
• Web-based, integrated with JBrowse.
• Supports real-time collaboration!
• Automatic generation of ready-made computable data.
• Supports annotation of genes, pseudogenes, tRNAs, snRNAs, snoRNAs, ncRNAs, miRNAs, TEs, and repeats.
• Intuitive annotation, gestures, and pull-down menus to create and edit transcript and exon structures, insert comments (CV, free-form text), GO terms, etc.
DETAILS FROM OUR LAST UPDATE
• ~100 institutions worldwide
• >60 genomes across the tree of life: from plants to arthropods, to fungi, to fish and other vertebrates including human, cattle, and dog
Image credits: ©BroadInstitute.org; Nature Rev Gen 2009; ©alexanderwild.com; ©outdooralabama.com; National Agricultural Library
LESSONS WE HAVE LEARNED
What we have learned:
• Collaborative work distills invaluable knowledge
• We must enforce strict rules and formats
• We must evolve with the data
• A little training goes a long way
• NGS poses additional challenges
HIGHLIGHTED IMPROVEMENTS: scalability
• Easier deployment, more detailed documentation
• Supports multiple organisms per server, improved comparative tools
• Easier to query the data and build extensions
• More flexible user interface via removable side-dock with customizable tabs; better search functionality, validation checks, and editing capability
• Allows larger set of sequence annotations based on the Sequence Ontology
• Offers fine-grained user and group level permissions
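The user and group level permissions in the last bullet can be pictured with a minimal sketch. Everything in it is an assumption made for illustration, not Apollo's implementation: the `Permission` levels, the `PermissionTable` class, the user and group names, and the rule that a user's effective permission on an organism is the strongest one granted directly or through any group.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Permission(IntEnum):
    """Hypothetical permission levels, ordered from weakest to strongest."""
    NONE = 0
    READ = 1
    WRITE = 2
    ADMIN = 3


@dataclass
class PermissionTable:
    # (user, organism) -> permission granted directly to the user
    user_grants: dict = field(default_factory=dict)
    # (group, organism) -> permission granted to every member of the group
    group_grants: dict = field(default_factory=dict)
    # user -> set of group names the user belongs to
    memberships: dict = field(default_factory=dict)

    def effective(self, user: str, organism: str) -> Permission:
        """Return the strongest permission the user holds on the organism."""
        levels = [self.user_grants.get((user, organism), Permission.NONE)]
        for group in self.memberships.get(user, set()):
            levels.append(self.group_grants.get((group, organism), Permission.NONE))
        return max(levels)

    def can(self, user: str, organism: str, needed: Permission) -> bool:
        """True if the user's effective permission meets the required level."""
        return self.effective(user, organism) >= needed


# Illustrative data only: one user, one group, two organisms.
table = PermissionTable()
table.memberships["ana"] = {"bee_curators"}
table.group_grants[("bee_curators", "Apis mellifera")] = Permission.WRITE
table.user_grants[("ana", "Bos taurus")] = Permission.READ
```

With this data, `table.can("ana", "Apis mellifera", Permission.WRITE)` holds through group membership, while the same user can only read the second organism.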
NEW APOLLO ARCHITECTURE: simpler, more flexible
Web-based client + annotation-editing engine + server-side data service
Architecture diagram (Apollo v2.0):
• Annotators (client): Google Web Toolkit (GWT) / Bootstrap; JBrowse (Dojo / jQuery)
• Client-server communication: REST / JSON, WebSockets
• Annotation Engine (Server): Annotations, Security, Preferences, Organisms, Tracks
• Security: Shiro, LDAP, OAuth
• JBrowse data: one data directory per organism (JBrowse Data Organism 1, JBrowse Data Organism 2); track formats BAM, BED, VCF, GFF3, BigWig; loads genomic evidence for the selected organism
• Single Data Store: PostgreSQL, MySQL, MongoDB, ElasticSearch
NEW! Grails controllers (J2EE servlet) route requests to the appropriate JBrowse data directory for a given organism.
Load genomic evidence for selected organism.
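The per-organism routing described on this slide can be sketched as follows. This is an illustration in Python, not the Grails controller code: the registry `ORGANISM_DATA_DIRS`, the directory paths, and the function `resolve_track_file` are all hypothetical names chosen for the example.

```python
from pathlib import PurePosixPath

# Hypothetical registry: organism name -> JBrowse data directory.
ORGANISM_DATA_DIRS = {
    "organism_1": PurePosixPath("/data/jbrowse/organism_1"),
    "organism_2": PurePosixPath("/data/jbrowse/organism_2"),
}


def resolve_track_file(organism: str, relative_path: str) -> PurePosixPath:
    """Map a request for a track file to the selected organism's directory."""
    try:
        base = ORGANISM_DATA_DIRS[organism]
    except KeyError:
        raise LookupError(f"unknown organism: {organism!r}") from None
    requested = PurePosixPath(relative_path)
    # Reject absolute paths and parent-directory hops so a request cannot
    # escape the organism's own data directory.
    if requested.is_absolute() or ".." in requested.parts:
        raise ValueError(f"unsafe path: {relative_path!r}")
    return base / requested
```

For example, `resolve_track_file("organism_2", "tracks/genes.gff3")` resolves inside the second organism's directory, so one server can serve evidence for several organisms from separate data directories.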
NEW! A single, queryable datastore houses annotations.
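To illustrate what a single, queryable datastore makes possible, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for the relational stores named in the diagram. The `feature` table, its columns, the sample rows, and the function `annotations_in_region` are assumptions made for this example and do not reflect Apollo's actual schema.

```python
import sqlite3

# In-memory stand-in for the annotation datastore (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE feature (
           id INTEGER PRIMARY KEY,
           organism TEXT, seq_name TEXT, so_type TEXT, name TEXT,
           fmin INTEGER, fmax INTEGER, owner TEXT)"""
)
conn.executemany(
    "INSERT INTO feature (organism, seq_name, so_type, name, fmin, fmax, owner) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("Apis mellifera", "Group1", "gene", "geneA", 100, 500, "ana"),
        ("Apis mellifera", "Group1", "pseudogene", "psB", 450, 900, "ben"),
        ("Apis mellifera", "Group2", "gene", "geneC", 100, 500, "ana"),
        ("Bos taurus", "Group1", "gene", "geneD", 100, 500, "ana"),
    ],
)


def annotations_in_region(conn, organism, seq_name, start, end):
    """Return annotations overlapping the half-open interval [start, end)."""
    rows = conn.execute(
        """SELECT name, so_type, fmin, fmax FROM feature
           WHERE organism = ? AND seq_name = ? AND fmin < ? AND fmax > ?
           ORDER BY fmin""",
        (organism, seq_name, end, start),
    )
    return rows.fetchall()
```

Because all organisms share one store, the same query works for any of them by changing the `organism` argument, which is the kind of extension the scalability bullets describe as easier to build.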
HIGHLIGHTED IMPROVEMENTS: scalability
• Improvements to architecture: easier deployment, better documentation
• Supports multiple organisms per server, improved comparative tools
• Easier to query the data and build extensions
• More flexible user interface via removable side-dock with customizable tabs; better search functionality, validation checks, and editing capability
• Allows larger set of sequence annotations based on the Sequence Ontology
• Offers fine-grained user and group level permissions
HIGHLIGHTED IMPROVEMENTS: removable side dock with customizable tabs
Tabs: Tracks, Organism, Users, Groups, Preferences, Annotations, Reference Sequence
HIGHLIGHTED IMPROVEMENTS: annotation details, exon boundaries, data export
Screenshots: Annotations and Reference Sequences tabs
HIGHLIGHTED IMPROVEMENTS: visible in the Apollo window
Automatically calculates upstream and downstream acceptor and donor sites.
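A minimal sketch of how nearest upstream and downstream splice sites could be located. It assumes canonical dinucleotides only (GT for donor sites, AG for acceptor sites) on the forward strand; the slide does not say how Apollo computes these sites, and the function names `find_sites` and `nearest_sites` are hypothetical.

```python
def find_sites(seq: str, motif: str) -> list[int]:
    """Return 0-based start offsets of every occurrence of motif in seq.

    The motif is expected in uppercase; the sequence is uppercased here.
    """
    seq = seq.upper()
    hits = []
    i = seq.find(motif)
    while i != -1:
        hits.append(i)
        i = seq.find(motif, i + 1)
    return hits


def nearest_sites(seq: str, position: int, motif: str):
    """Return (nearest upstream, nearest downstream) motif offsets.

    Offsets equal to position are skipped; None marks a missing side.
    """
    hits = find_sites(seq, motif)
    upstream = max((h for h in hits if h < position), default=None)
    downstream = min((h for h in hits if h > position), default=None)
    return upstream, downstream


# Illustrative sequence only; GT marks candidate donors, AG candidate acceptors.
example = "CCAGGTAAGTCCTTAGGT"
donors = nearest_sites(example, 8, "GT")      # (4, 16)
acceptors = nearest_sites(example, 8, "AG")   # (7, 14)
```

An editor could use such offsets to snap an exon boundary to the closest candidate site in either direction.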
OTHER IMPROVEMENTS: behind the scenes
https://github.com/GMOD/Apollo
APOLLO: demonstration
See Apollo Demonstration Video at: https://youtu.be/VgPtAP_fvxY
COLLABORATIONS: Apollo is open-source and extensible
Apollo users can add software to support their own workflow.
Examples:
• GenSAS (the Genome Sequence Annotation Server): whole-genome structural annotation pipeline.
• i5K Workspace@NAL: space to display and share genome assemblies & gene models, and conduct manual annotation efforts.
FUTURE PLANS: currently working on
JOIN US
http://GenomeArchitect.org/
Please bring your suggestions, requests, and contributions to:
• Nathan Dunn, Apollo Technical Lead
• Suzi Lewis, Principal Investigator, BBOP
Special thanks to:
• Stephen Ficklin (GenSAS, Washington State University)
• Deepak Unni and Colin Diesh (Apollo Developers, University of Missouri)
• Eric Yao (JBrowse, UC Berkeley)
• Berkeley Bioinformatics Open-source Projects (BBOP), Berkeley Lab: Web Apollo and Gene Ontology teams. Suzanna E. Lewis (PI).
• § Christine G. Elsik (PI). University of Missouri.
• * Ian Holmes (PI). University of California Berkeley.
• Arthropod genomics community: i5K Steering Committee (esp. Sue Brown (Kansas State)), Alexie Papanicolaou (UWS), BGI, Oliver Niehuis (1KITE http://www.1kite.org/), and the Honey Bee Genome Sequencing Consortium.
• Apollo is supported by NIH grants 5R01GM080203 from NIGMS, and 5R01HG004483 from NHGRI; by Contract No. 60-8260-4-005 from the National Agricultural Library (NAL) at the United States Department of Agriculture (USDA); and by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
• Insect images used with permission: http://AlexanderWild.com
• For your attention, thank you!
Thanks!
BBOP
• Web Apollo: Nathan Dunn, Colin Diesh §, Deepak Unni §
• Gene Ontology: Chris Mungall, Seth Carbon, Heiko Dietze
JBrowse
• Eric Yao *
NAL at USDA
• Monica Poelchau, Christopher Childers, Gary Moore
HGSC at BCM
• fringy Richards, Dan Hughes, Kim Worley
Links
• Web Apollo: http://GenomeArchitect.org
• i5K: http://arthropodgenomes.org/wiki/i5K
• GO: http://GeneOntology.org