DAS/2: Next Generation Distributed Annotation System
Gregg Helt (1), Steve Chervitz (1), Tony Cox (2), Andrew Dalke (3), Allen Day (4), Ed Erwin (1), Ed Griffiths (2), and Lincoln Stein (4)
(1) Affymetrix, Inc. (2) Sanger Institute (3) Dalke Scientific (4) Cold Spring Harbor Laboratory (5) University of Alabama
Distributed Annotation System (DAS) Overview
A specification designed for sharing genome annotations
Defines client requests and server responses
Simplified Web Services approach: HTTP GET, URLs, XML
Intended to be simple to implement
No central annotation authority
Intended to support client-side integration of annotations from different servers
First draft specification Spring 2000
Last major change to DAS/1 was Spring 2002
Grant from NIH awarded June 2004 for development of next-generation DAS/2
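The "HTTP GET, URLs, XML" approach above can be sketched as follows. This is a minimal illustration, not the specification: the URL layout is loosely modeled on a DAS/1-style features request, the server `http://example.org` and data source `hg17` are hypothetical, and the XML sample is a simplified stand-in for a real server response.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def features_url(server, source, segment, start, stop):
    """Build a GET URL asking for features on one segment (assumed layout)."""
    query = urlencode({"segment": f"{segment}:{start},{stop}"}, safe=":,")
    return f"{server}/das/{source}/features?{query}"


# Simplified, hand-written response used instead of a live server.
SAMPLE_RESPONSE = """<DASGFF>
  <GFF version="1.0">
    <SEGMENT id="AC003027" start="1000" stop="2000">
      <FEATURE id="feat1" label="example feature">
        <TYPE id="exon">exon</TYPE>
        <START>1200</START>
        <END>1350</END>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""


def parse_features(xml_text):
    """Extract one plain dict per FEATURE element in the XML document."""
    root = ET.fromstring(xml_text)
    features = []
    for segment in root.iter("SEGMENT"):
        for feature in segment.iter("FEATURE"):
            features.append({
                "segment": segment.get("id"),
                "id": feature.get("id"),
                "type": feature.find("TYPE").get("id"),
                "start": int(feature.findtext("START")),
                "end": int(feature.findtext("END")),
            })
    return features


if __name__ == "__main__":
    print(features_url("http://example.org", "hg17", "AC003027", 1000, 2000))
    print(parse_features(SAMPLE_RESPONSE))
```

A client only needs an HTTP library and an XML parser, which is what makes this style of service simple to implement.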
DAS: Multiple Servers, Multiple Clients
[Figure: a reference server holding sequences AC003027, AC005122, and M10154, and several annotation servers supplying annotations (labels include WI1029, AFM820, AFM1126, WI443) on those sequences]
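Client-side integration across several annotation servers can be sketched as a merge keyed on the reference sequence ID. The sequence and marker IDs below come from the figure, but the server names and all coordinates are invented for illustration.

```python
from collections import defaultdict


def integrate(server_responses):
    """Merge annotations from several servers, grouped by reference sequence.

    server_responses maps a server name to a list of
    (sequence_id, label, start, end) tuples.
    """
    merged = defaultdict(list)
    for server_name, annotations in server_responses.items():
        for seq_id, label, start, end in annotations:
            merged[seq_id].append((start, end, label, server_name))
    # Sort each sequence's annotations by position for display.
    return {seq_id: sorted(rows) for seq_id, rows in merged.items()}


if __name__ == "__main__":
    # Hypothetical servers and coordinates.
    responses = {
        "annotation-server-1": [("AC003027", "WI1029", 500, 700),
                                ("M10154", "AFM820", 100, 250)],
        "annotation-server-2": [("AC003027", "AFM1126", 50, 120),
                                ("AC005122", "WI443", 900, 1100)],
    }
    for seq_id, rows in integrate(responses).items():
        print(seq_id, rows)
```

The merge works only because every server refers to the same reference sequence IDs, which is the role the reference server plays.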
Widespread Adoption of DAS/1
Server implementations
– Dazzle, ProServer, LDAS
Server sites
– Ensembl, UCSC, TIGR, KEGG, WormBase, Affymetrix, etc.
Clients
– GBrowse, Ensembl, Dasty, IGB
Libraries
– BioPerl, BioJava, JDAS
DAS extensions
– GeneDAS (non-positional annotations)
– DAS web services registry
– SPICE (protein structures)
– DALEC (asynchronous analysis)
Ensembl is an ensemble of DAS servers
GBrowse on Ensembl
Distributed GBrowse
[Figure: diagram with nodes labeled MyGBrowse, GBrowse 1, GBrowse 2, MODs, Ensembl, and UCSC, connected by DAS links]
DAS Limitations
No ontology (controlled vocabulary) of feature types.
– Is a “gene” from DAS server 1 the same as a “gene” from DAS server 2?
Not particularly extensible.
Ambiguous semantics for retrieving features that overlap a range on the genome.
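The slide does not say which ambiguities were meant. As one plausible illustration, the sketch below shows three reasonable readings of "features in a range" that disagree on the same feature; the function names and coordinates are assumptions for illustration only.

```python
def overlaps_closed(f_start, f_end, q_start, q_end):
    """Overlap when both ranges include their end coordinates."""
    return f_start <= q_end and f_end >= q_start


def overlaps_half_open(f_start, f_end, q_start, q_end):
    """Overlap when ranges are half-open: start included, end excluded."""
    return f_start < q_end and f_end > q_start


def contained(f_start, f_end, q_start, q_end):
    """Stricter reading: the feature must lie entirely inside the query."""
    return q_start <= f_start and f_end <= q_end


if __name__ == "__main__":
    feature = (100, 200)
    query = (200, 300)
    print(overlaps_closed(*feature, *query))     # True: the ranges share coordinate 200
    print(overlaps_half_open(*feature, *query))  # False: the feature ends where the query starts
    print(contained(*feature, *query))           # False
```

Two servers that pick different readings would return different feature sets for the same request.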
Development of DAS/2 Specification
Enhancements have largely been motivated by initial discussions on the DAS mailing list.
– Series of RFCs collected
– Though informal, still a long process!
Most recent DAS/2 draft specification is available at http://biodas.org/documents/das2/das2_protocol.html (tied to CVS repository), so anyone can review and comment
Feedback from the DAS developer and user communities will continue to guide future iterations of the DAS/2 specification
Keep track of overlap bounds of all previous queries
Instead of filter = “overlaps:S/x:y”, use filter = “overlaps:S/x:y; within:S/L:R”
If annotation A is not contained within L:R, then either:
i) its bounds cross L, in which case it must overlap QueryL
ii) its bounds cross R, in which case it must overlap QueryR
iii) both
Therefore, if the client has used this approach for all previous queries (and restricts other filtering to a single “type” filter), then for QueryC no annotations will be returned that were already returned in a previous query
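The argument above can be checked with a small simulation. This is a sketch under stated assumptions, not the DAS/2 implementation: coordinates are treated as half-open integer ranges, L is taken to be the end of the nearest previously queried range to the left of the new query, R the start of the nearest one to the right, and new queries are assumed to fall in regions not yet queried. The `Feature` type, `server_query`, and `RangeTrackingClient` are hypothetical names; the filter string only mirrors the notation on the slide.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    name: str
    start: int
    end: int


def filter_string(seq, x, y, left, right):
    """Render the combined filter in the slide's notation."""
    return f"overlaps:{seq}/{x}:{y}; within:{seq}/{left}:{right}"


def server_query(features, overlaps, within):
    """Stand-in server: features overlapping x:y and contained in L:R."""
    (x, y), (left, right) = overlaps, within
    return [f for f in features
            if f.start < y and f.end > x            # overlaps x:y
            and f.start >= left and f.end <= right]  # within L:R


class RangeTrackingClient:
    def __init__(self, features, seq_length):
        self.features = features
        self.seq_length = seq_length
        self.queried = []  # (start, end) of every previous query

    def query(self, x, y):
        if any(s < y and e > x for s, e in self.queried):
            raise ValueError("sketch only handles not-yet-queried regions")
        # L and R bound the unqueried gap that surrounds the new query.
        left = max((e for s, e in self.queried if e <= x), default=0)
        right = min((s for s, e in self.queried if s >= y),
                    default=self.seq_length)
        result = server_query(self.features, (x, y), (left, right))
        self.queried.append((x, y))
        return result


if __name__ == "__main__":
    feats = [Feature("A", 120, 180), Feature("B", 190, 260),
             Feature("C", 260, 300), Feature("D", 340, 420),
             Feature("E", 150, 450), Feature("F", 600, 700)]
    client = RangeTrackingClient(feats, seq_length=1000)
    print([f.name for f in client.query(100, 200)])  # QueryL
    print([f.name for f in client.query(400, 500)])  # QueryR
    print([f.name for f in client.query(250, 350)])  # QueryC
    print(filter_string("S", 250, 350, 200, 400))
```

In this run QueryC overlaps B, D, and E as well as C, but B and E were already returned by QueryL and D by QueryR, so only C comes back.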
Solution #2: DAS/2 Validation Suite
Verify whether a DAS/2 server is compliant with the specification.
– Critical for improving interoperability between clients and servers developed by different groups.
Standalone tool and web application, written in Python
– Enter a URL for a DAS/2 server
– Get an HTML report about DAS/2 compliance
Reference dataset
– Sequences and annotations that can be loaded into a DAS/2 server for additional validation of server implementation/configuration
Source code available at: http://sourceforge.net/projects/dasypus/
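To give a flavor of what one compliance check might involve, here is a minimal sketch. It is not code from the validation suite and does not reflect its API: the function name, the specific checks, and the expected root element passed in are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def check_response(status, content_type, body, expected_root):
    """Return a list of problems found in one server response (empty if none)."""
    problems = []
    if status != 200:
        problems.append(f"unexpected HTTP status {status}")
    if "xml" not in content_type.lower():
        problems.append(f"content type {content_type!r} is not XML")
    try:
        root = ET.fromstring(body)
    except ET.ParseError as exc:
        problems.append(f"body is not well-formed XML: {exc}")
    else:
        # Drop any "{namespace}" prefix before comparing element names.
        local_name = root.tag.rsplit("}", 1)[-1]
        if local_name != expected_root:
            problems.append(f"root element is {local_name!r}, "
                            f"expected {expected_root!r}")
    return problems


if __name__ == "__main__":
    # "SOURCES" is used here only as a placeholder root element name.
    print(check_response(200, "application/xml", "<SOURCES/>", "SOURCES"))
    print(check_response(404, "text/html", "<html>", "SOURCES"))
```

A real validator would run many such checks per request type and collect the results into a report.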
Implemented in Java in the Integrated Genome Browser
– IGB (“ig-bee”): a visualization application developed at Affymetrix
– Supports data loading via a variety of formats and mechanisms
– Full implementation of DAS/2 read client, partial implementation of DAS/2 writeback
Handles large amounts of genome-scale data
– Loads hundreds of thousands of sequence annotations at once
– Loads dense quantitative graphs with millions of data points
– Maintains real-time responsiveness to user interactions
– Includes features to support exploratory data analysis
– Plugin architecture for customized extensions
Source code released under Common Public License
– http://genoviz.sourceforge.net