Accurate biochemical knowledge starting with precise structure-based criteria for molecular identity

Michel Dumontier, Ph.D.
Assistant Professor of Bioinformatics
Department of Biology, School of Computer Science
Institute of Biochemistry, Ottawa Institute of Systems Biology
Carleton University

01/04/2009 NCBO Seminar Series::Michel Dumontier
Problem Statement (I)
• Although biochemical events can be described with reference to specific chemical substances, we may want to describe them at finer or coarser levels of (mereological) granularity.
– residue : post-translational modification
– collection of residues : motif/domain/interaction site
– atom : atomic interactions, catalytic mechanism
– collection of atoms : binding/catalytic site, interaction
• This requires identifiers for parts, regions (contiguous and non-contiguous), aggregates/complexes.
• However, we do not (AFAIK) have a precise (reproducible) methodology to automatically generate these!
Bio2RDF: 2.3B triples of SPARQL-accessible linked biological data!
Chemical Parts!
Case Study: HIF1α
Hypoxia-Inducible Factor 1, alpha chain (uniprot:Q16665)
Master transcriptional regulator of the adaptive response to hypoxia
• Under normoxic conditions, HIF1α is hydroxylated on Pro-402 and Pro-564 in the oxygen-dependent degradation domain (ODD) by EGLN1/PHD1 and EGLN2/PHD2. EGLN3/PHD3 has also been shown to hydroxylate Pro-564. The hydroxylated prolines promote interaction with VHL, initiating rapid ubiquitination and subsequent proteasomal degradation.
• These forms are structurally different
• Each exhibits distinct functionality!
• Yet most databases (UniProt/GenBank) don’t have separate identifiers for them
• Reactome has an internal identifier for referring to different forms, but it links to UniProt entries and doesn’t provide an explicit description of the structure that the identifier corresponds to!
So
We have a clear need for being able to refer to distinct biochemical entities, based at least on their structure.
We also need to refer to arbitrary structural parts.
Should we generate all the combinations a priori???
-> NO!!
Should we be able to automatically generate the identifier from the structural attributes?
-> YES!!!
Should we semantically annotate (manually or otherwise) those forms known to be involved in specific processes???
-> YES!!!
What identifiers are unique for a given structure?
InChI
• IUPAC International Chemical Identifier (InChI)
• A data string that provides
1. the structure of a chemical compound
2. the convention for drawing the structure
• Different compounds must have different identifiers. Several attributes can be used to distinguish one compound from another.
– Chemical graph (connection table)
– Formula
– Atom type (only some atoms explicit)
– Bond type
– Stereochemistry
– Mobile/fixed hydrogens (tautomers)
– Isotopic composition
– Atomic charge
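The attributes listed above correspond to slash-separated layers inside an InChI string. As a rough illustration (not part of the original slides), the Python sketch below splits an InChI into its layers. The function name `split_inchi` and the human-readable layer names are assumptions made for this example; only the single-letter prefixes come from the InChI format itself.

```python
# Single-letter prefixes of common InChI layers. The formula layer has no
# prefix. The descriptive names on the right are illustrative labels only.
LAYER_NAMES = {
    "c": "connections",
    "h": "hydrogens",
    "q": "charge",
    "p": "protons",
    "b": "double_bond_stereo",
    "t": "tetrahedral_stereo",
    "m": "stereo_inversion_flag",
    "s": "stereo_type",
    "i": "isotopic",
    "f": "fixed_hydrogens",
    "r": "reconnected",
}


def split_inchi(inchi):
    """Return a list of (layer_name, value) pairs for an InChI string.

    A list is used rather than a dict because prefixes may repeat
    (for example, an 'h' layer can appear again after the 'f' layer).
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string: %r" % inchi)
    parts = inchi[len(prefix):].split("/")
    layers = [("version", parts[0])]
    for part in parts[1:]:
        if not part:
            continue  # skip empty segments
        name = LAYER_NAMES.get(part[0])
        if name is None:
            # No known lowercase prefix: treat the segment as the formula.
            layers.append(("formula", part))
        else:
            layers.append((name, part[1:]))
    return layers
```

For ethanol, `split_inchi("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")` yields the version, the formula `C2H6O`, the connection table `1-2-3`, and the hydrogen layer.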
Chemical Knowledge for the Semantic Web. Mykola Konyk, Alexander De Leon, and Michel Dumontier. LNBI. 2008. 5109:169-176. Data Integration in the Life Sciences (DILS2008). Evry, France.
Knowledge of functional groups is important in chemical synthesis, pharmaceutical design and lead optimization.
Functional groups describe chemical reactivity in terms of atoms and their connectivity, and exhibit characteristic chemical behavior when present in a compound.
Describing chemical functional groups in OWL-DL for the classification of chemical compounds
N Villanueva-Rosales, M Dumontier. 2007. OWLED, Innsbruck, Austria.
But what if we have a modification that isn’t contained in the ontology?
• No problem... define your own term, with the corresponding structural description (InChI, SMILES), and add it to an ontology document...
– If you’re using OWL, you can add the import statement and publish it.
• And, of course, you should submit it to the appropriate ontology development teams (and later make your term equivalent to theirs).
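One way to picture "define your own term and add the import statement" is the Python sketch below, which writes a small Turtle document. Everything named here is hypothetical: the function `custom_term_turtle`, the `ex:` namespace, and the `ex:hasInChI` annotation property are invented for illustration and are not taken from the talk or from any published ontology.

```python
def _turtle_literal(value):
    """Quote a string as a Turtle literal, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def custom_term_turtle(ontology_iri, imported_iri, term_iri, parent_iri,
                       label, inchi):
    """Build a Turtle document that declares one user-defined class.

    The document imports an existing ontology (owl:imports), makes the new
    term a subclass of a class from it, and attaches the structural
    description through a hypothetical annotation property ex:hasInChI.
    """
    lines = [
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "@prefix ex: <http://example.org/my-terms#> .",  # assumed namespace
        "",
        "<%s> a owl:Ontology ;" % ontology_iri,
        "    owl:imports <%s> ." % imported_iri,
        "",
        "ex:hasInChI a owl:AnnotationProperty .",
        "",
        "<%s> a owl:Class ;" % term_iri,
        "    rdfs:subClassOf <%s> ;" % parent_iri,
        "    rdfs:label %s ;" % _turtle_literal(label),
        "    ex:hasInChI %s ." % _turtle_literal(inchi),
    ]
    return "\n".join(lines) + "\n"
```

Calling the function with your own IRIs, a label, and the InChI of the modification returns text that can be saved and published as an ontology document; the IRIs and the structural description must be supplied by whoever defines the term.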
So what if... we describe the structural features of the molecule with OWL (sequence + PTMs), and generate an identifier from one of its serializations (RDF/XML?)
That way we have the explicit description as the identifier, in a form that is compatible with the semantic web.
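A minimal Python sketch of the idea follows. It swaps the OWL/RDF/XML serialization for a much simpler canonical text string (an assumption made to keep the example short), then derives two kinds of identifier from it: a reversible URL-safe Base64 encoding, so the description itself is recoverable from the identifier, and a fixed-length SHA-256 digest. All function names and the `accession|name@position` layout are hypothetical.

```python
import base64
import hashlib


def canonical_description(accession, modifications):
    """Serialize an accession plus (position, name) modifications.

    Modifications are de-duplicated and sorted so that the same molecular
    form always yields the same text, whatever order it was described in.
    """
    mods = sorted(set(modifications))
    mod_text = ",".join("%s@%d" % (name, pos) for pos, name in mods)
    return "%s|%s" % (accession, mod_text)


def reversible_identifier(description):
    """Encode the description so it can be recovered from the identifier."""
    return base64.urlsafe_b64encode(description.encode("utf-8")).decode("ascii")


def decode_identifier(identifier):
    """Recover the description from a reversible identifier."""
    return base64.urlsafe_b64decode(identifier.encode("ascii")).decode("utf-8")


def digest_identifier(description):
    """Fixed-length alternative; compact, but not reversible."""
    return hashlib.sha256(description.encode("utf-8")).hexdigest()
```

The reversible form matches the "explicit description as the identifier" idea most closely; the digest trades that property for a constant length, which may matter for long sequences.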
Uniprot example revisited
Under normoxic conditions, HIF1α is hydroxylated on Pro-402 and Pro-564 in the oxygen-dependent degradation domain (ODD) by EGLN1/PHD1 and EGLN2/PHD2. The hydroxylated prolines promote interaction with VHL, initiating rapid ubiquitination and subsequent proteasomal degradation.
:A rdfs:subClassOf :Hydroxylation
:A :hasParticipant (:0#r402 and :Substrate)
:A :hasParticipant (:1#r402 and :Product)
:A :hasParticipant (:5 and :Enzyme)

:B rdfs:subClassOf :Interaction
:B :hasParticipant (:2#r402 or :3#r564 or :4#r402,r564)
:B :hasParticipant (:6)
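The participant references above appear to combine an identifier for a molecular form with a `#` fragment listing residues (for example `:4#r402,r564`). Assuming that reading is right, a small Python sketch of a parser for such references could look like this; `PartRef` and `parse_part_ref` are hypothetical names, and the exact grammar is inferred from the examples rather than specified in the talk.

```python
from typing import NamedTuple, Tuple


class PartRef(NamedTuple):
    form: str                   # identifier of the molecular form, e.g. "4"
    residues: Tuple[int, ...]   # residue positions named in the fragment


def parse_part_ref(ref):
    """Parse references such as ':4#r402,r564' or ':6'."""
    form, sep, fragment = ref.lstrip(":").partition("#")
    if not form:
        raise ValueError("missing form identifier in %r" % ref)
    if not sep:
        return PartRef(form, ())  # whole molecule, no residue fragment
    residues = []
    for token in fragment.split(","):
        # Each token is expected to be 'r' followed by a residue number.
        if not token.startswith("r") or not token[1:].isdigit():
            raise ValueError("bad residue token %r in %r" % (token, ref))
        residues.append(int(token[1:]))
    return PartRef(form, tuple(residues))
```

A reference without a fragment, such as `:6`, parses to a form with an empty residue tuple, which here stands for the whole molecule.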
• We need a precise method to generate identifiers for biopolymers and arbitrary sets of their parts.
• Consistent identifier generation will allow anybody to report findings in terms of the biopolymers for which they were observed, whether or not those biopolymers exist in a database, and will allow us to link biochemical knowledge at finer levels of granularity.
• (At least) two identifier schemes were put forward to initiate discussion, with the goal of setting a standard naming convention.