SADI for GMOD: Semantic Web Services for Model Organism Databases

Posted on 12-Jan-2015


DESCRIPTION

SADI for GMOD is a collection of ready-made SADI services for accessing sequence feature data in RDF form. The services were developed as an add-on for the GMOD (Generic Model Organism Database) project, which is a popular toolkit for building model organism databases and their associated websites (e.g. FlyBase).

Transcript

SADI for GMOD: Semantic Web Services for Model Organism Databases

Ben Vandervalk, Luke McCarthy, Edward Kawas, Mark Wilkinson

James Hogg Research Centre, Heart + Lung Institute, University of British Columbia

http://code.google.com/p/sadi/wiki/SADIforGMOD

Background

Background: Model Organism Databases

• several organisms are studied extensively by biologists, e.g. yeast, mouse, fruit fly

• each model organism has its own database: 

• sequences (DNA, RNA, protein)

• sequence features (e.g. genes)

• research publications

• experimental results

• biochemical pathways

• phenotype images

• evolutionary trees (for closely related species)

All images were obtained from Wikipedia and are in the public domain.

Background: Sequence Features

[Figure: promoter, gene, and transcript tracks plotted against position on a DNA sequence; image: Lincoln Stein, http://www.sequenceontology.org/gff3.shtml]

Sequence features (a.k.a. sequence annotations) are regions of a DNA or protein sequence with a certain type (e.g. 'gene'). In genome browsers, different types of sequence annotations are displayed in separate tracks.
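The GFF3 specification cited above stores each feature as one tab-separated line with nine columns: seqid, source, type, start, end, score, strand, phase, and attributes. Coordinates are 1-based and inclusive. The following is a minimal Python sketch, assuming well-formed input. It parses such a line into a typed region and tests whether two regions overlap. The example line is hypothetical and is assembled from the FlyBase gene FBgn0040037 coordinates that appear later in this deck. `Feature`, `parse_gff3_line` and `overlaps` are illustrative names and are not part of GMOD.

```python
from dataclasses import dataclass, field
from urllib.parse import unquote


@dataclass
class Feature:
    """One GFF3 feature: a typed region on a reference sequence."""
    seqid: str
    source: str
    type: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # '+', '-', '.' or '?'
    attributes: dict = field(default_factory=dict)


def parse_gff3_line(line: str) -> Feature:
    """Parse one tab-separated, 9-column GFF3 feature line."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    seqid, source, ftype, start, end, _score, strand, _phase, attrs = cols
    attributes = {}
    if attrs != ".":
        # Attributes are tag=value pairs separated by ';' (URL-escaped).
        for pair in attrs.split(";"):
            if pair:
                key, _, value = pair.partition("=")
                attributes[unquote(key)] = unquote(value)
    return Feature(seqid, source, ftype, int(start), int(end), strand, attributes)


def overlaps(a: Feature, b: Feature) -> bool:
    """Closed intervals on the same sequence overlap unless one ends
    before the other starts."""
    return a.seqid == b.seqid and a.start <= b.end and b.start <= a.end


# Hypothetical GFF3 line: gene FBgn0040037 on chromosome 4, minus strand.
gene = parse_gff3_line("4\tFlyBase\tgene\t26994\t32391\t.\t-\t.\tID=FBgn0040037")
print(gene)
```

The overlap test is the same closed-interval comparison a region query such as get_features_overlapping_region has to make. A real store would index the features rather than compare them pairwise.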

Background: Sequence Features

autogenerated image from http://flybase.org/cgi-bin/gbrowse/dmel/

Many types of biological data are represented as sequence features:

• promoters
• chromosome bands
• genes
• transcripts
• CDSs
• proteins
• protein domains
• transposons
• non-coding RNAs
• ESTs
• many more...

Background: Distributed Annotation System (DAS)

[Figure: a genome browser (autogenerated image from http://flybase.org/cgi-bin/gbrowse/dmel/) sends an HTTP GET request to each of four DAS servers, and each server responds with DAS XML]
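Each request in the figure is a plain HTTP GET that returns XML. As a rough sketch of the work a DAS client must do, the Python below builds a DAS 1.x `features` request URL for one segment and extracts feature IDs, types, and coordinates from the DASGFF response. The server URL and data source name (`dmel`) are hypothetical. The XML sample is a trimmed, made-up response that reuses the FlyBase gene coordinates from later slides. The element names are based on our reading of the DAS 1.x specification and should be checked against it.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def das_features_url(server: str, dsn: str, ref: str, start: int, stop: int) -> str:
    """Build a DAS 1.x 'features' request for the segment ref:start,stop."""
    query = urlencode({"segment": f"{ref}:{start},{stop}"}, safe=":,")
    return f"{server.rstrip('/')}/das/{dsn}/features?{query}"


def parse_dasgff(xml_text: str) -> list:
    """Return (id, type, start, end, orientation) for each FEATURE element."""
    root = ET.fromstring(xml_text)
    results = []
    for feature in root.iter("FEATURE"):
        type_el = feature.find("TYPE")
        results.append((
            feature.get("id"),
            type_el.get("id") if type_el is not None else None,
            int(feature.findtext("START")),
            int(feature.findtext("END")),
            feature.findtext("ORIENTATION"),
        ))
    return results


# Hypothetical, trimmed DASGFF response (FBgn0040037 on chromosome 4).
SAMPLE_DASGFF = """<?xml version="1.0" standalone="no"?>
<DASGFF>
  <GFF version="1.0" href="http://das.example.org/das/dmel/features">
    <SEGMENT id="4" start="26994" stop="32391">
      <FEATURE id="FBgn0040037" label="FBgn0040037">
        <TYPE id="gene">gene</TYPE>
        <START>26994</START>
        <END>32391</END>
        <ORIENTATION>-</ORIENTATION>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>
"""

print(das_features_url("http://das.example.org", "dmel", "4", 26994, 32391))
print(parse_dasgff(SAMPLE_DASGFF))
```

A client has to repeat this for every server and then merge the results itself. That is the "specialized software" burden listed in the limitations below.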

Background: Limitations of the Distributed Annotation System (DAS)

• integrating data from DAS servers requires specialized software (“DAS clients”)

• other types of data (e.g. biochemical pathways, experimental results) cannot be automatically integrated with sequence feature data

• most bioinformatics analysis software (e.g. BLAST) does not speak DAS


SADI for GMOD: Semantic Web Services for Model Organism Databases

SADI (Semantic Automated Discovery and Integration)

• Standard for Web services that consume/generate RDF

• Motivation: automated integration of bioinformatics data and software

GMOD (Generic Model Organism Database)

• Toolkit for building a model organism database and website

• Collection of related open source projects, e.g. Chado, GBrowse, Pathway Tools

• Many sites use GMOD components: FlyBase, BeetleBase, DictyBase, etc. 

SADI in a Nutshell

• to invoke a SADI service:
  o HTTP POST an RDF document to the service URL
  o e.g. $ curl --data @input.rdf http://sadiframework.org/examples/hello

• to get service metadata:
  o HTTP GET on service URL
  o returns an RDF document with service name, description, etc.
  o e.g. $ curl http://sadiframework.org/examples/hello

• structure of input/output data is described in OWL
  o service provider specifies one input OWL class and one output OWL class

• strengths of SADI
  o no framework-specific messaging formats or ontologies
  o supports batch processing of inputs
  o supports long-running services (asynchronous services)

more info: http://sadiframework.org/
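The two curl calls above correspond to two plain HTTP requests. The following minimal Python sketch uses only the standard library and prepares the same requests without sending them. `build_invocation`, `build_metadata_request` and `send` are hypothetical helper names. The `application/rdf+xml` media type is an assumption based on the `.rdf` file in the example. Check which RDF serializations a given service accepts before relying on it.

```python
import urllib.request

# Example service URL taken from the slide above.
SERVICE_URL = "http://sadiframework.org/examples/hello"


def build_invocation(service_url: str, rdf_payload: str,
                     content_type: str = "application/rdf+xml") -> urllib.request.Request:
    """Invoke a service: HTTP POST an RDF document to the service URL.
    The content type is an assumption, not taken from the SADI spec."""
    return urllib.request.Request(
        service_url,
        data=rdf_payload.encode("utf-8"),
        headers={"Content-Type": content_type, "Accept": content_type},
        method="POST",
    )


def build_metadata_request(service_url: str) -> urllib.request.Request:
    """Fetch service metadata: HTTP GET on the same URL returns an RDF
    document with the service name, description, etc."""
    return urllib.request.Request(
        service_url, headers={"Accept": "application/rdf+xml"}, method="GET"
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send a prepared request and return the RDF response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, `send(build_metadata_request(SERVICE_URL))` should return the same RDF as the `curl` metadata call, provided the example service is still online. Because one input document can hold many input instances, a single POST can carry a whole batch.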

SADI for GMOD Services

• SADI services for accessing sequence feature data

• implemented as Perl CGI scripts

Service Name | Input | Relationship | Output
get_feature_info | database identifier | is about | feature description
get_features_overlapping_region | genomic coordinates | overlaps | collection of feature descriptions
get_sequence_for_region | genomic coordinates | is represented by | DNA, RNA, or amino acid sequence
get_child_features | feature description | has part / derives into | collection of feature descriptions
get_parent_features | feature description | is part of / derives from | collection of feature descriptions
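Roughly speaking, SADI clients build workflows by feeding one service's output into another service's input. The toy Python below imitates only that idea. It copies the table above into a dictionary and runs a breadth-first search for the shortest chain of services from one data type to another. This is a deliberate simplification. Real SADI matching works on OWL classes, whereas here types are compared as plain strings. As a result, the toy does not know that a feature description also carries genomic coordinates. `plan_chain` is a hypothetical helper.

```python
from collections import deque

# (input, relationship, output) for each service, copied from the table above.
SERVICES = {
    "get_feature_info": ("database identifier", "is about", "feature description"),
    "get_features_overlapping_region": ("genomic coordinates", "overlaps",
                                        "collection of feature descriptions"),
    "get_sequence_for_region": ("genomic coordinates", "is represented by",
                                "DNA, RNA, or amino acid sequence"),
    "get_child_features": ("feature description", "has part / derives into",
                           "collection of feature descriptions"),
    "get_parent_features": ("feature description", "is part of / derives from",
                            "collection of feature descriptions"),
}


def plan_chain(start_type: str, goal_type: str):
    """Breadth-first search for the shortest list of services leading
    from start_type to goal_type; None if no chain exists."""
    queue = deque([(start_type, [])])
    seen = {start_type}
    while queue:
        current, chain = queue.popleft()
        if current == goal_type:
            return chain
        for name, (inp, _relationship, out) in sorted(SERVICES.items()):
            if inp == current and out not in seen:
                seen.add(out)
                queue.append((out, chain + [name]))
    return None


print(plan_chain("database identifier", "collection of feature descriptions"))
```

Breadth-first search returns a shortest chain. Here that is get_feature_info followed by get_child_features, because get_child_features is examined before get_parent_features, which produces the same output type.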

SADI for GMOD: Structure of Service Input/Output RDF

Input RDF (N3), sent by HTTP POST to get_feature_info:

    @prefix lsrn: <http://purl.oclc.org/SADI/LSRN/> .
    @prefix GeneID: <http://lsrn.org/GeneID:> .

    GeneID:49962 a lsrn:GeneID_Record ;
        sio:SIO_000008 [                # p = 'has attribute'
            a lsrn:GeneID_Identifier ;
            sio:SIO_000300 "49962"      # p = 'has value'
        ] .

Output RDF (N3) returned by get_feature_info:

    @prefix lsrn: <http://purl.oclc.org/SADI/LSRN/> .
    @prefix GeneID: <http://lsrn.org/GeneID:> .
    @prefix FlyBase: <http://flybase.org/cgi-bin/sadi.gmod/feature?id=> .
    @prefix GenBank: <http://lsrn.org/GB:> .

    # p = 'is about'
    GeneID:49962 sio:SIO_000332 FlyBase:FBgn0040037 .

    # feature
    FlyBase:FBgn0040037 a SO:SO_0000704 ;       # o = 'gene'
        range:position [
            a range:RangedSequencePosition ;
            sio:SIO_000053 [                    # p = 'has proper part'
                a range:StartPosition ;
                sio:SIO_000300 26994
            ] ;
            sio:SIO_000053 [                    # p = 'has proper part'
                a range:EndPosition ;
                sio:SIO_000300 32391
            ] ;
            range:in_relation_to _:minus_strand_seq
        ] .

    _:minus_strand_seq sio:SIO_000011 [         # p = 'represents'
        a strand:MinusStrand ;
        sio:SIO_000093 GenBank:AE014135         # p = 'is proper part of'
    ] .

    # reference feature (chromosome)
    FlyBase:4                                   # chromosome 4
        a SO:SO_0000105 .                       # o = 'chromosome arm'
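The input document above follows a simple pattern. There is one lsrn:GeneID_Record per identifier, and the identifier value is attached through 'has attribute' and 'has value'. The following minimal Python sketch generates such a document for one or more GeneIDs, so that several inputs travel in one request, in line with SADI's batch processing. It adds a `sio:` prefix declaration that uses the namespace from the SPARQL query on the demo slide. The function name `gene_id_input` and the numeric-ID check are our own assumptions.

```python
# lsrn and GeneID prefixes come from the input example above; the sio
# namespace is the one declared in the SPARQL query on the demo slide.
PREFIXES = (
    "@prefix lsrn: <http://purl.oclc.org/SADI/LSRN/> .\n"
    "@prefix GeneID: <http://lsrn.org/GeneID:> .\n"
    "@prefix sio: <http://semanticscience.org/resource/> .\n"
)


def gene_id_input(gene_ids: list) -> str:
    """Build one N3 input document containing a lsrn:GeneID_Record for
    every identifier, so several inputs are sent in a single request."""
    records = []
    for gid in gene_ids:
        # NCBI GeneIDs are integers; reject anything else before it reaches RDF.
        if not gid.isdigit():
            raise ValueError(f"not a numeric GeneID: {gid!r}")
        records.append(
            f"GeneID:{gid} a lsrn:GeneID_Record ;\n"
            f"    sio:SIO_000008 [               # p = has attribute\n"
            f"        a lsrn:GeneID_Identifier ;\n"
            f'        sio:SIO_000300 "{gid}"     # p = has value\n'
            f"    ] .\n"
        )
    return PREFIXES + "\n" + "\n".join(records)


print(gene_id_input(["49962"]))
```

The resulting text can be posted to get_feature_info in the same way as the curl example on the "SADI in a Nutshell" slide. The service's accepted N3 media type should be checked first.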

SADI for GMOD Demo

SADI Client Software

• SHARE Query Engine: SPARQL Query => SADI Workflow
  http://biordf.net/cardioSHARE/query

• SADI Taverna Plugin: Design SADI Workflows
  http://sadiframework.org/content/2010/05/03/sadi-taverna-plugin-tutorial/

Demo with SHARE Query Engine

[Figure: a SPARQL query and the SADI workflow that SHARE derives from it]

"What proteins are homologous to FlyBase protein FBpp0091047?"

PREFIX FlyBase: <http://lsrn.org/FLYBASE:>
PREFIX sio: <http://semanticscience.org/resource/>
PREFIX sadi: <http://sadiframework.org/ontologies/properties.owl#>

SELECT ?homolog
WHERE {
    # SIO_000332 = 'is about'
    FlyBase:FBpp0091047 sio:SIO_000332 ?protein .
    ?protein sadi:hasSequence ?sequence .

    # SIO_010302 = 'is homologous to'
    ?protein sio:SIO_010302 ?homolog .
}

Acknowledgements

Team
• Mark Wilkinson: Principal Investigator
• Luke McCarthy: Lead Programmer, SADI & SHARE
• Edward Kawas: Perl Programmer, SADI

Funding

Microsoft Research

http://sadiframework.org/

SADI Training Course

 

“Web Publishing of Scientific Data and Services”
October 22nd-23rd, 2011

University of British Columbia (next door!)

Learn how to:

=> semantically describe service functionality in OWL
=> publish Semantic Web services using the SADI framework

More info: http://sadiframework.org/training

Extra Slides

SADI for GMOD: Setting up the Services

1. Load your GFF files into a Bio::DB::SeqFeature::Store database (mysql)

2. Install SADI for GMOD dependencies with CPAN

3. Download the SADI for GMOD tarball and unpack it into cgi-bin

4. Set DB connection parameters in cgi-bin/sadi.gmod/sadi.gmod.conf

   [GENERAL]
   db_adaptor = Bio::DB::SeqFeature::Store
   db_args = -adaptor DBI::mysql -dsn dbi:mysql:database=flybase
   base_url = http://flybase.org/cgi-bin/sadi.gmod/

5. Configure Dbxref mappings in cgi-bin/sadi.gmod/dbxref.conf

   [DBXREF_TO_LSRN]
   SwissProt = UniProt
   UniProtKB = UniProt
   SwissProt/TrEMBL = UniProt
   ...

6. Register the services in the public SADI registry: http://sadiframework.org/registry

more info: http://code.google.com/p/sadi/wiki/SADIforGMOD
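The files in steps 4 and 5 are simple INI-style key = value files. The sketch below reads the two fragments with Python's configparser. It also shows one plausible use of the DBXREF_TO_LSRN table: rewriting a GFF Dbxref such as UniProtKB:P12345 into an LSRN URI. Several parts of this are assumptions. The URI pattern http://lsrn.org/<namespace>:<id> is inferred from the GeneID, GB, and FLYBASE prefixes used in these slides. The accession is only a placeholder, and `dbxref_to_lsrn_uri` is a hypothetical helper. The Perl services may behave differently.

```python
import configparser

# Fragments copied from steps 4 and 5 (only the dbxref entries shown there).
SADI_GMOD_CONF = """
[GENERAL]
db_adaptor = Bio::DB::SeqFeature::Store
db_args = -adaptor DBI::mysql -dsn dbi:mysql:database=flybase
base_url = http://flybase.org/cgi-bin/sadi.gmod/
"""

DBXREF_CONF = """
[DBXREF_TO_LSRN]
SwissProt = UniProt
UniProtKB = UniProt
SwissProt/TrEMBL = UniProt
"""


def load(text: str) -> configparser.ConfigParser:
    """Parse an INI-style fragment, splitting only on '=' because values
    such as Bio::DB::SeqFeature::Store contain colons."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep database names case-sensitive
    parser.read_string(text)
    return parser


def dbxref_to_lsrn_uri(dbxref: str, mapping: dict):
    """Hypothetical helper: 'UniProtKB:P12345' -> 'http://lsrn.org/UniProt:P12345'.
    Returns None when the database prefix has no LSRN mapping."""
    db, sep, accession = dbxref.partition(":")
    if not sep or db not in mapping:
        return None
    return f"http://lsrn.org/{mapping[db]}:{accession}"


general = load(SADI_GMOD_CONF)["GENERAL"]
mapping = dict(load(DBXREF_CONF)["DBXREF_TO_LSRN"])
print(general["base_url"])
print(dbxref_to_lsrn_uri("UniProtKB:P12345", mapping))
```

Setting `optionxform = str` matters here. By default configparser lowercases keys, which would make "UniProtKB" fail to match a Dbxref prefix spelled the same way in the GFF file.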
