SADI for GMOD: Semantic Web Services for Model Organism Databases
Ben Vandervalk, Luke McCarthy, Edward Kawas, Mark Wilkinson
James Hogg Research Centre, Heart + Lung Institute, University of British Columbia
http://code.google.com/p/sadi/wiki/SADIforGMOD
SADI for GMOD is a collection of ready-made SADI services for accessing sequence feature data in RDF form. The services were developed as an add-on for the GMOD (Generic Model Organism Database) project, which is a popular toolkit for building model organism databases and their associated websites (e.g. FlyBase).
Background
Background: Model Organism Databases
• several organisms are studied extensively by biologists, e.g. yeast, mouse, fruit fly
• each model organism has its own database:
• sequences (DNA, RNA, protein)
• sequence features (e.g. genes)
• research publications
• experimental results
• biochemical pathways
• phenotype images
• evolutionary trees (for closely related species)
All images were obtained from Wikipedia and are in the public domain.
Background: Sequence Features
[Figure: genome browser view showing position on the DNA sequence, with a promoter track, a gene track, and a transcript track. Source: Lincoln Stein, http://www.sequenceontology.org/gff3.shtml]
• sequence features (a.k.a. sequence annotations) are regions of a DNA or protein sequence with a certain type (e.g. 'gene')
• in genome browsers, different types of sequence annotations are displayed in separate tracks
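To make the idea of a typed region concrete, here is a minimal sketch that reads one line of GFF3 (the tab-delimited format described at the sequenceontology.org page cited above) into a small record. The `SequenceFeature` class and `parse_gff3_line` function are hypothetical names chosen for illustration, and the parser ignores score, phase, and multi-valued attributes.

```python
from dataclasses import dataclass
from urllib.parse import unquote


@dataclass
class SequenceFeature:
    """A typed region of a sequence (hypothetical record for illustration)."""
    seqid: str        # sequence the feature lies on, e.g. a chromosome
    source: str       # program or database that produced the feature
    type: str         # feature type, e.g. 'gene'
    start: int        # 1-based start coordinate
    end: int          # 1-based end coordinate (inclusive)
    strand: str       # '+', '-', or '.'
    attributes: dict  # key=value pairs from column 9


def parse_gff3_line(line: str) -> SequenceFeature:
    """Parse a single GFF3 feature line (9 tab-separated columns)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    seqid, source, ftype, start, end, _score, strand, _phase, attrs = cols
    attributes = {}
    if attrs != ".":
        for pair in attrs.split(";"):
            if not pair:
                continue
            key, _, value = pair.partition("=")
            # GFF3 percent-encodes reserved characters in attribute values.
            attributes[unquote(key)] = unquote(value)
    return SequenceFeature(unquote(seqid), source, ftype,
                           int(start), int(end), strand, attributes)


# Example feature line taken from the GFF3 specification.
example = "ctg123\t.\tgene\t1000\t9000\t.\t+\t.\tID=gene00001;Name=EDEN"
feature = parse_gff3_line(example)
print(feature.type, feature.start, feature.end, feature.attributes["Name"])
```

A genome browser would group such records by `type` to draw one track per feature type.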
Background: Sequence Features
autogenerated image from http://flybase.org/cgi-bin/gbrowse/dmel/
Many types of biological data are represented as sequence features:
• promoters
• chromosome bands
• genes
• transcripts
• CDSs
• proteins
• protein domains
• transposons
• non-coding RNAs
• ESTs
• many more...
Background: Distributed Annotation System (DAS)
[Figure: a genome browser view (autogenerated image from http://flybase.org/cgi-bin/gbrowse/dmel/) connected to four DAS servers; each request is an HTTP GET and each response is DAS XML.]
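The request/response exchange can be sketched roughly as follows, assuming the DAS 1.x `features` command and its DASGFF response layout. The server name `das.example.org`, the data source `dmel`, and the sample XML are made up for illustration, and no network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote


def das_features_url(server, data_source, segment, start, stop):
    """Build the URL for a DAS 1.x 'features' request on one segment."""
    return (f"{server.rstrip('/')}/das/{quote(data_source)}/features"
            f"?segment={quote(segment)}:{start},{stop}")


# Hand-written sample response (illustrative only, not real data).
SAMPLE_DAS_XML = """<DASGFF>
  <GFF version="1.0" href="http://das.example.org/das/dmel/features">
    <SEGMENT id="2L" start="1" stop="20000">
      <FEATURE id="gene0001" label="example gene">
        <TYPE id="gene">gene</TYPE>
        <START>1000</START>
        <END>9000</END>
        <ORIENTATION>+</ORIENTATION>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""


def parse_das_features(xml_text):
    """Extract (segment, id, type, start, end) from a DASGFF document."""
    root = ET.fromstring(xml_text)
    features = []
    for segment in root.iter("SEGMENT"):
        for feat in segment.iter("FEATURE"):
            features.append({
                "segment": segment.get("id"),
                "id": feat.get("id"),
                "type": feat.find("TYPE").get("id"),
                "start": int(feat.findtext("START")),
                "end": int(feat.findtext("END")),
            })
    return features


url = das_features_url("http://das.example.org", "dmel", "2L", 1, 20000)
das_features = parse_das_features(SAMPLE_DAS_XML)
print(url)
print(das_features)
```

A DAS client repeats this for every server it knows about and merges the resulting feature lists into tracks.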
Background: Limitations of the Distributed Annotation System (DAS)
integrating data from DAS servers requires specialized software (“DAS clients”)
other types of data (e.g. biochemical pathways, experimental results) cannot be automatically integrated with sequence feature data
most bioinformatics analysis software (e.g. BLAST) does not speak DAS
SADI for GMOD: Semantic Web Services for Model Organism Databases
SADI (Semantic Automated Discovery and Integration)
• Standard for Web services that consume/generate RDF
• Motivation: automated integration of bioinformatics data and software
GMOD (Generic Model Organism Database)
• Toolkit for building a model organism database and website
• Collection of related open source projects: e.g. Chado, GBrowse, Pathway Tools
• Many sites use GMOD components: FlyBase, BeetleBase, dictyBase, etc.
SADI in a Nutshell
• to invoke a SADI service:
  o HTTP POST an RDF document to the service URL
  o e.g. $ curl --data @input.rdf http://sadiframework.org/examples/hello
• to get service metadata:
  o HTTP GET on service URL
  o returns an RDF document with service name, description, etc.
  o e.g. $ curl http://sadiframework.org/examples/hello
• structure of input/output data is described in OWL
  o service provider specifies one input OWL class and one output OWL class
• strengths of SADI
  o no framework-specific messaging formats or ontologies
  o supports batch processing of inputs
  o supports long-running services (asynchronous services)
more info: http://sadiframework.org/
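The two curl calls above can be mirrored in Python with the standard library. This is a minimal sketch rather than an official SADI client: the `application/rdf+xml` headers are an assumption about what a service accepts, the empty RDF document is a placeholder rather than valid input for the hello service, and the helper names are hypothetical. The requests are only built here; call `send()` to contact the service.

```python
import urllib.request

# Service URL taken from the slide.
SERVICE_URL = "http://sadiframework.org/examples/hello"

# Placeholder input: an empty RDF/XML document. A real invocation would
# contain an instance of the service's input OWL class.
INPUT_RDF = (b'<rdf:RDF xmlns:rdf='
             b'"http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>')


def build_invocation(service_url, rdf_xml):
    """HTTP POST of an RDF document to the service URL (invokes the service)."""
    return urllib.request.Request(
        service_url,
        data=rdf_xml,
        headers={
            # Assumed content negotiation; adjust to what the service accepts.
            "Content-Type": "application/rdf+xml",
            "Accept": "application/rdf+xml",
        },
        method="POST",
    )


def build_metadata_request(service_url):
    """HTTP GET on the service URL (returns the service description as RDF)."""
    return urllib.request.Request(
        service_url,
        headers={"Accept": "application/rdf+xml"},
        method="GET",
    )


def send(request, timeout=30):
    """Send a prepared request and return the raw response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


invoke_request = build_invocation(SERVICE_URL, INPUT_RDF)
metadata_request = build_metadata_request(SERVICE_URL)
print(invoke_request.get_method(), invoke_request.full_url)
print(metadata_request.get_method(), metadata_request.full_url)
```

Because both the input and the output are plain RDF, the response body can be handed to any RDF parser without SADI-specific code.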
SADI for GMOD Services
• SADI services for accessing sequence feature data
• implemented as Perl CGI scripts
Service Name                      Input                  Relationship                Output
get_feature_info                  database identifier    is about                    feature description
get_features_overlapping_region   genomic coordinates    overlaps                    collection of feature descriptions
get_sequence_for_region           genomic coordinates    is represented by           DNA, RNA, or amino acid sequence
get_child_features                feature description    has part / derives into     collection of feature descriptions
get_parent_features               feature description    is part of / derives from   collection of feature descriptions
SADI for GMOD: Structure of Service Input/Output RDF
SADI for GMOD: Setting up the Services
1. Load your GFF files into a Bio::DB::SeqFeature::Store database (MySQL)
2. Install SADI for GMOD dependencies with CPAN
3. Download the SADI for GMOD tarball and unpack into cgi-bin
4. Set DB connection parameters in cgi-bin/sadi.gmod/sadi.gmod.conf
5. Configure Dbxref mappings in cgi-bin/sadi.gmod/dbxref.conf
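Step 1 is commonly done with BioPerl's `bp_seqfeature_load.pl` script. The sketch below only assembles the command line so it can be inspected before running. The database name, user, and file name are hypothetical, and the option names are assumed from the script's documented usage, so check them against `bp_seqfeature_load.pl --help` on your installation.

```python
import shlex


def build_load_command(dsn, user, password, gff_files, create=False):
    """Assemble a bp_seqfeature_load.pl command for a MySQL-backed store."""
    cmd = ["bp_seqfeature_load.pl",
           "--adaptor", "DBI::mysql",   # MySQL storage adaptor
           "--dsn", dsn,
           "--user", user,
           "--password", password]
    if create:
        # Assumed behaviour: --create (re)initializes the database and
        # erases existing contents, so it is off by default here.
        cmd.append("--create")
    cmd.extend(gff_files)
    return cmd


# Hypothetical connection details and input file.
load_command = build_load_command(
    dsn="dbi:mysql:my_mod_features",
    user="gmod",
    password="secret",
    gff_files=["genes.gff3"],
)
print(shlex.join(load_command))
# To run it: subprocess.run(load_command, check=True)
```

The same connection details are the ones step 4 asks for in cgi-bin/sadi.gmod/sadi.gmod.conf.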