RKB – A Semantic Knowledge Base for RNA
Michel Dumontier 1, José Cruz-Toledo 1, Marc Parisien 2, Francois Major 2
1 Carleton University
2 Université de Montréal
May 10, 2015
Carleton University -- Dumontier Lab dumontierlab.com
Objectives
i. Represent the biochemistry of nucleic acids and their structural characteristics, including base pairing and stacking
ii. Represent context-specific knowledge
iii. Capture the structural annotation generated by MC-Annotate
5/25/2009
Guided design
• Modeling with Upper Level Ontologies
– Interoperability and semantic coherency
– New Upper Level Ontology (NULO)
• Distinguishes objects, qualities, roles, processes and spatial regions
• Based on BFO/RO, but for OWL
Biological Modeling
• Objects: occupy space
– Nucleic acids, nucleotides, riboses and phosphates
• Qualities: intrinsic categorical or numeric-valued properties
– A nucleotide bears the quality of conformation
• Roles: defined by extrinsic interactions
– A C3’ atom may hold the exo role during some sugar puckering
• Processes: entities that extend in time
– Structure determination, an interaction
Contextual Modeling of Nucleic Acids
• Base stacking varies across different XRD/NMR models
• We need to know in which model a given piece of information is found
• We want to set the stage for representing simulations
RKB populated with PDB, MC-Annotate
• The ontology population involved 3 steps:
i. Assigning names
ii. Asserting class membership
iii. Assigning relations between entities
• The following naming convention was used:
– Objects
• Polymer: PDBID_cCHAIN
• Residue: PDBID_cCHAIN_rRESIDUE
• Atom: PDBID_cCHAIN_rRESIDUE_aAtom
– Qualities/Roles
• PDBID_mMODEL_cCHAIN_rRESIDUE_type
– Processes
• Structure determination: PDBID_mMODEL
• Interaction: PDBID_mMODEL_PROCESSTYPE_PARTICIPANT
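The naming convention above can be sketched as a handful of string builders. This is a minimal illustration, not the RKB's actual population code: the function names are hypothetical, and the chain, residue, atom and type values used in the usage note are made up for the example.

```python
def polymer_id(pdb_id, chain):
    # Polymer: PDBID_cCHAIN
    return f"{pdb_id}_c{chain}"


def residue_id(pdb_id, chain, residue):
    # Residue: PDBID_cCHAIN_rRESIDUE
    return f"{polymer_id(pdb_id, chain)}_r{residue}"


def atom_id(pdb_id, chain, residue, atom):
    # Atom: PDBID_cCHAIN_rRESIDUE_aAtom
    return f"{residue_id(pdb_id, chain, residue)}_a{atom}"


def quality_or_role_id(pdb_id, model, chain, residue, kind):
    # Quality/Role: PDBID_mMODEL_cCHAIN_rRESIDUE_type (model-specific)
    return f"{pdb_id}_m{model}_c{chain}_r{residue}_{kind}"


def structure_determination_id(pdb_id, model):
    # Structure determination process: PDBID_mMODEL
    return f"{pdb_id}_m{model}"


def interaction_id(pdb_id, model, process_type, participant):
    # Interaction process: PDBID_mMODEL_PROCESSTYPE_PARTICIPANT
    return f"{pdb_id}_m{model}_{process_type}_{participant}"
```

For example, `structure_determination_id("1B36", 1)` yields `1B36_m1`, the local name used for model 1 of PDB:1B36 in the queries later in the deck. Note that object names carry no model number, while quality, role and process names do.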
Support for Leontis-Westhof Nomenclature
• The RKB incorporates the LW nomenclature
• It describes the three edges available for H-bonding interactions in purines (R) and pyrimidines (Y)
• Atom composition:
i. Watson-Crick Edge: A(N6)/G(O6), R(N1), A(C2)/G(N2), U(O4)/C(N4), Y(N3) and Y(O2)
ii. Hoogsteen Edge (C-H edge for Y): A(N6)/G(O6), R(N7), U(O4)/C(N4) and Y(C5)
iii. Sugar Edge: A(C2)/G(N2), R(N3), Y(O2) and O2’
• cis and trans orientations: the relative orientation of the glycosidic bonds of the two interacting nucleotides
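As a rough illustration of the atom composition listed above, the edges can be written out as a lookup table per base, with R (purines) expanded to A and G and Y (pyrimidines) expanded to C and U. The table name `LW_EDGES` and the helper `edges_for_atom` are hypothetical and are not part of the RKB.

```python
# Edge -> atoms for each base, transcribed from the atom composition list.
# R (purines) = A, G; Y (pyrimidines) = C, U. O2' is the ribose hydroxyl oxygen.
LW_EDGES = {
    "A": {"Watson-Crick": {"N6", "N1", "C2"},
          "Hoogsteen": {"N6", "N7"},
          "Sugar": {"C2", "N3", "O2'"}},
    "G": {"Watson-Crick": {"O6", "N1", "N2"},
          "Hoogsteen": {"O6", "N7"},
          "Sugar": {"N2", "N3", "O2'"}},
    "C": {"Watson-Crick": {"N4", "N3", "O2"},
          "Hoogsteen": {"N4", "C5"},
          "Sugar": {"O2", "O2'"}},
    "U": {"Watson-Crick": {"O4", "N3", "O2"},
          "Hoogsteen": {"O4", "C5"},
          "Sugar": {"O2", "O2'"}},
}


def edges_for_atom(base, atom):
    """Return the sorted names of the edges of `base` that contain `atom`."""
    return sorted(edge for edge, atoms in LW_EDGES[base].items() if atom in atoms)
```

Atoms at the corners of a base are shared by two edges: `edges_for_atom("A", "N6")` returns both the Hoogsteen and the Watson-Crick edge, whereas `edges_for_atom("U", "C5")` returns only the Hoogsteen (C-H) edge.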
Support for LW+ Nomenclature
• The extension assigns faces to each edge:
– WC edge: Wh, Ww and Ws faces
– Hoogsteen edge: C8 (purines only), Hh, Hw and Bh faces
– Sugar edge: Bs, Ss (purines only), Sw and O2’ faces
• The Bh and Bs faces involve the Hoogsteen-side amino/keto group and the sugar-side amino/keto group, respectively.
• The C8 face was introduced for the C8-H8 donor group in purines.
Describing Base Pairs
• Base pairs are composed of interactions between the edges or faces of the interacting bases
• Role chains capture additional knowledge:
– Objects that participate in sub-processes (face interactions) are also participants of the whole process (base pair)
hasPart ◦ hasParticipant -> hasParticipant
– Objects are involved in processes when their qualities are participants in those processes
isBearerOf ◦ isParticipantIn -> isParticipantIn
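The effect of the two role chains can be sketched with a small forward-chaining loop over triples. This is only an illustration of what a reasoner would infer from a chain of the form first ◦ second -> result; the function `apply_chain` and the individual names (`bp1`, `fi1`, `resA`, `faceHh`) are hypothetical, and an OWL reasoner would do this work in the actual RKB.

```python
def apply_chain(triples, first, second, result):
    """Add (x, result, z) for every x -first-> y -second-> z, repeated to a fixpoint."""
    triples = set(triples)
    changed = True
    while changed:
        changed = False
        inferred = {
            (x, result, z)
            for (x, p, y) in triples if p == first
            for (y2, q, z) in triples if q == second and y2 == y
        }
        if not inferred <= triples:
            triples |= inferred
            changed = True
    return triples


# A base pair process with one face interaction as a part (made-up individuals).
part_facts = {
    ("bp1", "hasPart", "fi1"),
    ("fi1", "hasParticipant", "resA"),
}
with_participants = apply_chain(
    part_facts, "hasPart", "hasParticipant", "hasParticipant")

# A residue bearing a face quality that participates in the face interaction.
bearer_facts = {
    ("resA", "isBearerOf", "faceHh"),
    ("faceHh", "isParticipantIn", "fi1"),
}
with_involvement = apply_chain(
    bearer_facts, "isBearerOf", "isParticipantIn", "isParticipantIn")
```

After the first call the base pair `bp1` has `resA` as a participant, and after the second the residue `resA` is itself a participant in `fi1`. The loop runs to a fixpoint because the result property is also the second link of the chain, so a new inference can enable another one.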
The RKB is compatible with both the LW and the Saenger nomenclature for base pairs
• The semantics of the RKB enable the use of consistent base pair naming schemes
• The A-A base pair in model 4 of PDB:1B36 can be classified as a member of the following classes:
– Saenger type II
– LW Trans Hoogsteen/Hoogsteen (8)
NucleotideBasePair
and ParallelBasePair
and TransBasePair
and HoogsteenHoogsteenBasePair
and hasAgent exactly 2 AMP
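To make the class expression above concrete, the following sketch checks it against a plain data record. It assumes a closed world in which all classes and agents of a base pair are known, which is a simplification: OWL reasoning is open-world, and the `BasePair` record and `satisfies_expression` helper are hypothetical names used only for this example.

```python
from dataclasses import dataclass

# The four named classes in the intersection.
REQUIRED_CLASSES = frozenset({
    "NucleotideBasePair",
    "ParallelBasePair",
    "TransBasePair",
    "HoogsteenHoogsteenBasePair",
})


@dataclass(frozen=True)
class BasePair:
    classes: frozenset   # class names asserted for the base pair
    agent_types: tuple   # residue type of each agent, e.g. ("AMP", "AMP")


def satisfies_expression(pair):
    """Check the intersection plus the 'hasAgent exactly 2 AMP' restriction."""
    has_all_classes = REQUIRED_CLASSES <= pair.classes
    amp_agents = sum(1 for t in pair.agent_types if t == "AMP")
    return has_all_classes and amp_agents == 2


aa_pair = BasePair(REQUIRED_CLASSES, ("AMP", "AMP"))
ag_pair = BasePair(REQUIRED_CLASSES, ("AMP", "GMP"))
```

Here `satisfies_expression(aa_pair)` holds, while `ag_pair` fails the cardinality restriction because only one of its agents is an AMP.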
Sugar Puckering
• The ribose ring adopts two distinct puckering modes: envelope and twist
• Classification into either geometry depends on the position of the carbon atoms of the ribose relative to its C5’ atom
• Carbon atoms in a ribose thus bear either the endo or the exo role with respect to the plane formed by the other atoms
Sugar Puckering (cont’d)
Our implementation of situational modeling ensures that each object is represented by a single entity throughout its lifetime. This avoids creating multiple distinct instances of the same object, each with different attributes, for every spatio-temporal context.
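One way to picture this is a single object whose context-dependent qualities and roles are filed under the model in which they were observed. The sketch below is an assumption-laden illustration rather than the RKB's data model: the `Residue` class, the chain and residue number, and the role names `C3pEndo` and `C2pEndo` are all made up for the example.

```python
from collections import defaultdict


class Residue:
    """One entity per residue; model-specific qualities and roles hang off it."""

    def __init__(self, pdb_id, chain, number):
        self.pdb_id, self.chain, self.number = pdb_id, chain, number
        # Context-free object name: PDBID_cCHAIN_rRESIDUE
        self.name = f"{pdb_id}_c{chain}_r{number}"
        # model number -> {kind: model-specific quality/role identifier}
        self.by_model = defaultdict(dict)

    def assert_in_model(self, model, kind):
        # Quality/role name: PDBID_mMODEL_cCHAIN_rRESIDUE_type
        ident = f"{self.pdb_id}_m{model}_c{self.chain}_r{self.number}_{kind}"
        self.by_model[model][kind] = ident
        return ident


residue = Residue("1B36", "A", 5)
residue.assert_in_model(1, "C3pEndo")   # hypothetical role in model 1
residue.assert_in_model(4, "C2pEndo")   # hypothetical role in model 4
```

The residue keeps one name, `1B36_cA_r5`, while its roles are recorded separately for models 1 and 4, so no per-model copy of the residue is needed.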
RKB is SPARQL accessible
• SPARQL is a graph query language
• The instantiated ontology was loaded into Virtuoso 6
• SPARQL endpoint
– http://codemonkey.dumontierlab.com/sparql/
• Specify graphs to restrict the search
– http://semanticscience.org/rkb/mcannotate/pdb/dna
– http://semanticscience.org/rkb/mcannotate/pdb/rna
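A query can be restricted to one of these graphs through the standard SPARQL protocol parameters `query` and `default-graph-uri`. The sketch below only builds the request URL and sends nothing, since the endpoint listed above may no longer be reachable; the helper name `build_query_url` is hypothetical.

```python
from urllib.parse import urlencode

ENDPOINT = "http://codemonkey.dumontierlab.com/sparql/"
RNA_GRAPH = "http://semanticscience.org/rkb/mcannotate/pdb/rna"


def build_query_url(query, graph=RNA_GRAPH, endpoint=ENDPOINT):
    """Return a GET URL that runs `query` against `graph` at `endpoint`."""
    # 'query' and 'default-graph-uri' are defined by the SPARQL protocol.
    params = {"query": query, "default-graph-uri": graph}
    return f"{endpoint}?{urlencode(params)}"


url = build_query_url("SELECT * WHERE { ?s ?p ?o } LIMIT 1")
```

Passing the DNA graph URI as `graph` instead would restrict the same query to the DNA structures.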
Query 1: Find all face interactions (model 1 of PDB:1B36)
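# rdf: is declared explicitly so the query is portable across SPARQL engines
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ss: <http://semanticscience.org/>
SELECT DISTINCT ?faceInteraction
WHERE {
  ?pair ss:isProperPartOf <http://semanticscience.org/pdb/1B36_m1> .
  ?pair ss:hasProperPart ?faceInteraction .
  ?faceInteraction rdf:type ss:FaceInteraction .
}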
Nucleotide base pairs are composed of one or more face interactions. Where these are known, as in the MC-Annotate results, the query retrieves all 18 face interaction instances in this model.
See results: http://tinyurl.com/porxdb
Query 2: Find all C8 mediated base pairs (model 1 of PDB:1B36)
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ss: <http://semanticscience.org/>
SELECT DISTINCT ?faceInteraction ?residue ?hasC8Face
WHERE {
  ?pair ss:isProperPartOf <http://semanticscience.org/pdb/1B36_m1> .
  ?pair ss:hasProperPart ?faceInteraction .
  ?faceInteraction rdf:type ss:FaceInteraction .
  # the selected variable ?hasC8Face must be the one bound in the patterns
  ?hasC8Face ss:isAgentIn ?faceInteraction .
  ?hasC8Face rdf:type ss:C8Face .
  ?residue ss:hasQuality ?hasC8Face .
}
Results: http://tinyurl.com/r7b5e4
Face interactions are mediated by the faces of bases. Nucleotides and their face qualities are related by the hasQuality relation, whereas faces are agents in the face interaction, and are related by the hasAgent relation.
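The join that this pattern performs can be mimicked over a handful of in-memory triples. The sketch below is a toy stand-in for the SPARQL engine, with made-up individuals (`res_G1`, `res_A2`, `face_C8_1`, `face_Hh_2`, `fi_1`) and a hypothetical helper `c8_mediated`; it only shows how residue, face and face interaction are linked by hasQuality and isAgentIn.

```python
# Toy data: two residues, each bearing one face, both faces acting in one interaction.
triples = {
    ("res_G1", "hasQuality", "face_C8_1"),
    ("face_C8_1", "type", "C8Face"),
    ("face_C8_1", "isAgentIn", "fi_1"),
    ("res_A2", "hasQuality", "face_Hh_2"),
    ("face_Hh_2", "type", "HoogsteenHoogsteenFace"),
    ("face_Hh_2", "isAgentIn", "fi_1"),
    ("fi_1", "type", "FaceInteraction"),
}


def c8_mediated(triples):
    """Return sorted (interaction, residue, face) rows where the face is a C8Face."""
    rows = []
    for (face, p, cls) in triples:
        if p != "type" or cls != "C8Face":
            continue
        for (agent, p2, interaction) in triples:
            if agent != face or p2 != "isAgentIn":
                continue
            if (interaction, "type", "FaceInteraction") not in triples:
                continue
            for (residue, p3, quality) in triples:
                if p3 == "hasQuality" and quality == face:
                    rows.append((interaction, residue, face))
    return sorted(rows)


rows = c8_mediated(triples)
```

Only the residue bearing the C8 face is returned; the partner residue `res_A2` takes part in the same interaction but through a different face.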
Query 3: Find base pairs involving a GMP sugar-sugar face (model 1 of PDB:1B36)
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ss: <http://semanticscience.org/>
SELECT DISTINCT ?faceInteraction ?residue ?hasSSFace
WHERE {
  ?pair ss:isProperPartOf <http://semanticscience.org/pdb/1B36_m1> .
  ?pair ss:hasProperPart ?faceInteraction .
  ?faceInteraction rdf:type ss:FaceInteraction .
  ?hasSSFace rdf:type ss:SugarSugarFace .
  ?hasSSFace ss:isAgentIn ?faceInteraction .
  ?residue ss:hasQuality ?hasSSFace .
  ?residue rdf:type ss:GMP .
}
Results found at: http://tinyurl.com/qpup8z
This query builds on Query 2: it requires an Ss face to be on a GMP that is participating in a base pair. Two GMPs in this structure have this face participating in base pairs with other nucleotides.
Query 4: Find Hoogsteen – O2’ face interactions (model 1 of PDB:1B36)
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ss: <http://semanticscience.org/>
SELECT DISTINCT ?faceInteraction ?residue1 ?residue2 ?hasHhFace ?hasO2pFace
WHERE {
  ?pair ss:isProperPartOf <http://semanticscience.org/pdb/1B36_m1> .
  ?pair ss:hasProperPart ?faceInteraction .
  ?faceInteraction rdf:type ss:FaceInteraction .
  ?hasHhFace rdf:type ss:HoogsteenHoogsteenFace .
  ?hasHhFace ss:isAgentIn ?faceInteraction .
  ?hasO2pFace rdf:type ss:O2pFace .
  ?hasO2pFace ss:isAgentIn ?faceInteraction .
  ?residue1 ss:hasQuality ?hasHhFace .
  ?residue2 ss:hasQuality ?hasO2pFace .
}
Results found at: http://tinyurl.com/oo4fp8
The LW+ nomenclature describes base interactions in more detail. This query returns a single base pair in this structure.
Future Directions
• Specify Saenger nomenclature
• Map the output of other structural annotators (e.g. 3DNA)
• Extend structural knowledge with the 6 backbone angles
– Range restrictions on classes
• SWRL / DL-safe rules or SPARQL query required to specify cyclic motifs
• Publish as part of Bio2RDF network
RKB Availability
• Creative Commons License
• Google Code Project:
– http://semanticscience.org
• Instructions: http://code.google.com/p/semanticscience/wiki/RKBDownload
References
• Dumontier, M., et al. (2009). RKB: A Semantic Web Knowledge Base for RNA. Accepted in Bio-Ontologies 2009, Stockholm, Sweden.
• Smith, B., et al. (2005). Relations in biomedical ontologies. Genome Biol, 6(5): R46.
• Leontis, N. B. and Westhof, E. (2001). Geometric nomenclature and classification of RNA base pairs. RNA, 7(4): 499-512.
• Lemieux, S. and Major, F. (2002). RNA canonical and non-canonical base pairing types: a recognition method and complete repertoire. Nucleic Acids Res, 30(19): 4250-4263.
• Major, F. and Thibault, P. (2005). Computer Modeling of RNA Three-Dimensional Structures. In Encyclopedia of Molecular Cell Biology and Molecular Medicine, R.A. Meyers, Editor. Wiley-VCH Verlag GmbH & Co., Weinheim, p. 605-636.