PENN STATE Compatible text, visual and mathematical representations for biological process ontologies. Nigam Shah, Penn State University.
Post on 28-Mar-2015
PENNSTATE
Compatible text, visual and mathematical representations for biological process ontologies
Nigam Shah
Penn State University
Ontologies in Molecular Biology
• An ontology is a formal way of representing knowledge.
– In an ontology, concepts are described both by their meaning and by their relationships to each other.*
• Gene Ontology
• 43 open ontologies under OBO
– First name ‘things’ … then name ‘relations’.
• If we specify the ‘logic’ of combining ‘things’ and ‘relations’, we can write hypotheses about biological processes in a formal manner and evaluate them for consistency with existing information.
* Bard and Rhee, Nature Reviews Genetics, Vol 5, March 2004, pg 213
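To make "first name things, then name relations, then specify the logic of combining them" concrete, here is a minimal sketch assuming a toy ontology. The `THINGS` and `RELATIONS` tables and the `is_well_formed` helper are hypothetical and are not taken from the Gene Ontology or any OBO file; each relation simply declares which categories of thing it may connect, and a statement is accepted only if it respects that.

```python
# Hypothetical toy ontology: each 'thing' has a category, and each 'relation'
# states which categories it may connect (its domain and range).
THINGS = {
    "protein P": "protein",
    "gene X": "gene",
    "nucleus": "location",
    "galactose": "small_molecule",
}
RELATIONS = {
    "induces_transcription_of": ("protein", "gene"),
    "localized_to": ("protein", "location"),
}


def is_well_formed(subject: str, relation: str, obj: str) -> bool:
    """Accept a statement only if its terms are named and their categories fit the relation."""
    if subject not in THINGS or obj not in THINGS or relation not in RELATIONS:
        return False
    domain, range_ = RELATIONS[relation]
    return THINGS[subject] == domain and THINGS[obj] == range_


print(is_well_formed("protein P", "induces_transcription_of", "gene X"))  # True
print(is_well_formed("gene X", "induces_transcription_of", "protein P"))  # False: wrong categories
print(is_well_formed("protein P", "activates", "gene X"))  # False: 'activates' is not a named relation
```

Real ontologies express such domain and range restrictions in their own formats; the dictionaries here only illustrate the idea.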
Hypotheses and Events
An hypothesis about a biological process is a statement about relationships within a biological system.
Protein P induces transcription of gene X
We define an ‘event’ as a relationship between two biological entities, which we call ‘agents’.
Testing events
Protein P induces transcription of gene X
[Figure: protein P, the nucleus, and the promoter of gene X]
Implicit claims (that can be made explicit):
1. P is a transcription factor.
2. P is a transcriptional activator.
3. P is localized to the nucleus.
4. P can bind to the promoter of gene X.
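Making the implicit claims explicit can be sketched as a lookup from an event's relationship to a list of claim templates. This is a minimal sketch assuming a hypothetical `IMPLICIT_CLAIMS` table that covers only the "induces transcription" relationship from this slide; the `Event` class and `explicit_claims` function are illustrative names, not HyBrow's own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    agent_a: str   # e.g. a protein
    operator: str  # the relationship between the two agents
    agent_b: str   # e.g. a gene


# Hypothetical table: for each operator, the claims it implies.
# "{a}" and "{b}" are filled in with agent_a and agent_b.
IMPLICIT_CLAIMS = {
    "induces_transcription_of": [
        "{a} is a transcription factor.",
        "{a} is a transcriptional activator.",
        "{a} is localized to the nucleus.",
        "{a} can bind to the promoter of {b}.",
    ],
}


def explicit_claims(event: Event) -> list[str]:
    """Spell out the claims an event makes implicitly (empty if none are known)."""
    templates = IMPLICIT_CLAIMS.get(event.operator, [])
    return [t.format(a=event.agent_a, b=event.agent_b) for t in templates]


for claim in explicit_claims(Event("P", "induces_transcription_of", "gene X")):
    print(claim)
```

Each explicit claim is then something that can be checked on its own against existing annotations.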
Hypothesis Ontology
• It is expressive enough to describe the galactose system at a coarse level of detail.
• It is compatible with other ontology efforts, e.g. GO, so that GO annotations can be used directly in HyBrow.
• We have also developed a grammar to write hypotheses using events from this ontology.
Grammar for a hypothesis
A hypothesis consists of at least one event stream.
An event stream is a sequence of one or more events or event streams with logical joints (or operators) between them.
An event has exactly one agent_a, exactly one agent_b and exactly one operator (i.e. a relationship between the two agents). It also records the physical location where the event happened, the genetic context of the organism, and any experimental perturbations associated with the event.
A logical joint is the connective between two event streams.
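One way to mirror this grammar in code is with a few record types: an event carries its two agents, its operator and its context, and an event stream nests events or other streams with a logical joint between each adjacent pair. The class names, field names, the `JOINTS` set and the `events_in` helper are hypothetical; the joint symbols `+` and `and` are borrowed from the `hy1` example later in the talk.

```python
from dataclasses import dataclass, field
from typing import Union

JOINTS = {"+", "and"}  # hypothetical: the two joints seen in the hy1 example


@dataclass(frozen=True)
class Event:
    agent_a: str
    operator: str                        # relationship between agent_a and agent_b
    agent_b: str
    location: str                        # where the event happens, e.g. "nuc"
    genetic_context: str                 # e.g. "wt" for wild type
    perturbations: tuple[str, ...] = ()  # associated experimental perturbations


@dataclass
class EventStream:
    # items[0] joints[0] items[1] joints[1] items[2] ...
    items: list[Union[Event, "EventStream"]]
    joints: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.items:
            raise ValueError("an event stream needs at least one event or stream")
        if len(self.joints) != len(self.items) - 1:
            raise ValueError("need exactly one logical joint between adjacent items")
        if any(j not in JOINTS for j in self.joints):
            raise ValueError(f"unknown logical joint in {self.joints}")


@dataclass
class Hypothesis:
    streams: list[EventStream]

    def __post_init__(self) -> None:
        if not self.streams:
            raise ValueError("a hypothesis needs at least one event stream")


def events_in(item: Union[Event, EventStream]) -> list[Event]:
    """Flatten a (possibly nested) event stream into its events, left to right."""
    if isinstance(item, Event):
        return [item]
    return [ev for sub in item.items for ev in events_in(sub)]


ev2 = Event("Gal3p", "Binds_to_promoter", "gal1", location="nuc", genetic_context="wt")
ev3 = Event("Gal3p", "induce", "gal1", location="nuc", genetic_context="wt",
            perturbations=("presence_of galactose",))
stream = EventStream([ev2, ev3], joints=["+"])
hypothesis = Hypothesis([stream])
print([ev.operator for ev in events_in(stream)])  # ['Binds_to_promoter', 'induce']
```

Validating the shape in the constructors means a malformed hypothesis is rejected as soon as it is built, before any evaluation against data.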
Making Hypotheses with increasing ‘formality’
1. Controlled Vocabulary
2. Formal Language
3. Context-Free Grammar
We have developed a formal language and grammar for representing an hypothesis as a sequence of events.
We use ‘constraints and rules’ to decide if an hypothesis is a valid production of the language.
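To illustrate what "a valid production of the language" means on the syntactic side, here is a small recursive-descent recognizer for hypothesis expressions like `hy1`. It is a sketch under assumptions: the grammar (events named `ev<digits>`, joints `+` and `and`, parentheses for grouping) is inferred from that one example and is not the published HyBrow grammar, and the constraints and rules that judge a hypothesis against prior knowledge are not modeled here.

```python
import re

# Hypothetical token set, inferred from the single example "(ev0+ev1) and (ev2+ev3)".
TOKEN_RE = re.compile(r"\s*(ev\d+|\+|and\b|\(|\))")


def tokenize(text: str) -> list[str]:
    """Split a hypothesis expression into event ids, joints and parentheses."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at position {pos}: {text[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def is_valid_hypothesis(text: str) -> bool:
    """Recognize: stream := term (joint term)* ; term := EVENT | '(' stream ')' ;
    joint := '+' | 'and'."""
    try:
        tokens = tokenize(text)
    except ValueError:
        return False
    pos = 0

    def term() -> bool:
        nonlocal pos
        if pos < len(tokens) and tokens[pos].startswith("ev"):
            pos += 1
            return True
        if pos < len(tokens) and tokens[pos] == "(":
            pos += 1
            if stream() and pos < len(tokens) and tokens[pos] == ")":
                pos += 1
                return True
        return False

    def stream() -> bool:
        nonlocal pos
        if not term():
            return False
        while pos < len(tokens) and tokens[pos] in ("+", "and"):
            pos += 1
            if not term():
                return False
        return True

    # Valid only if one stream consumes every token.
    return stream() and pos == len(tokens)


print(is_valid_hypothesis("(ev0+ev1) and (ev2+ev3)"))  # True
print(is_valid_hypothesis("(ev0+ev1 and"))             # False: unbalanced
```

Because the grammar is context-free, one function per nonterminal is enough; nesting depth is handled by the recursion between `term` and `stream`.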
The mathematical representation
A biological event is any occurrence for which we gather experimental data.
Hypotheses make testable statements about combinations of biological events.
http://conferences.computer.org/bioinformatics/CSB2003/SectA.html#Poster9
Constraints and Rules
• Consistency of an hypothesis with prior knowledge is evaluated by applying constraints and rules.
• A constraint is a statement specifying the evidence that contradicts or supports an event.
– E.g. a protein must be in the nucleus to bind to a promoter.
• A rule comprises the ‘steps’ for deciding whether a constraint is satisfied or violated.
Binds_to_promoter [P, g]:
Annotation constraints
– If the cellular location of P is not nucleus, give a penalty.
– If the biological process is not transcription, give a penalty.
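The Binds_to_promoter rule can be sketched as a function that inspects a protein's annotations and accumulates penalties. This is a minimal sketch assuming annotations arrive as a dictionary with hypothetical `cellular_location` and `biological_process` keys (loosely echoing GO's cellular component and biological process aspects) and a penalty of 1 per violated constraint; the talk does not give the actual penalty weights, and the example annotations below are made up.

```python
from dataclasses import dataclass


@dataclass
class RuleResult:
    penalty: int
    reasons: list[str]


def binds_to_promoter_rule(protein: str, gene: str,
                           annotations: dict[str, set[str]]) -> RuleResult:
    """Apply the annotation constraints for Binds_to_promoter[protein, gene].

    A constraint counts as violated when the required term is absent from
    the protein's annotation set (hypothetical penalty of 1 each).
    """
    penalty, reasons = 0, []
    if "nucleus" not in annotations.get("cellular_location", set()):
        penalty += 1
        reasons.append(f"cellular location of {protein} is not nucleus "
                       f"(needed to bind the {gene} promoter)")
    if "transcription" not in annotations.get("biological_process", set()):
        penalty += 1
        reasons.append(f"biological process of {protein} is not transcription")
    return RuleResult(penalty, reasons)


# Made-up annotations for the placeholder agents P and g.
ok = binds_to_promoter_rule("P", "g", {"cellular_location": {"nucleus"},
                                       "biological_process": {"transcription"}})
bad = binds_to_promoter_rule("P", "g", {"cellular_location": {"cytoplasm"}})
print(ok.penalty, bad.penalty)  # 0 2
```

Returning the reasons alongside the score keeps the justification for each penalty available for display.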
A point-and-click interface
Visual language representation
Uses a formal visual language:
1. Direct composition of hypotheses in a format akin to reaction pathway diagrams
2. Translatable to other representation forms
Other notations:
Cook notation (BioD), Kohn notation
Multiple ‘views’ of the ontology
• Once we have an ontology for hypotheses … it can be represented
– As text files that users type.
– As formal constructs that can be evaluated for validity in a formal manner.
– As files that are ‘browsed’ using special programs.
• Having such equivalent formats allows us to perform computer-aided hypothesis evaluation.
Multiple equivalent representations
Biological process described in a formal language
ev0 = Gal2p transports galactose in mem in wt
ev1 = galactose activate Gal3p in wt in cyt
ev2 = Gal3p Binds_to_promoter gal1 in wt in nuc
ev3 = Gal3p induce gal1 in presence_of galactose in wt in nuc
hy1 = (ev0+ev1) and (ev2+ev3)
XML format?
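The slide leaves the XML view as an open question. As one possibility, the four events and `hy1` could be serialized with Python's standard `xml.etree.ElementTree`. The element and attribute names (`hypothesis`, `expression`, `event`, `agent_a`, `agent_b`, `qualifier`) are invented for this sketch and do not reflect any HyBrow schema.

```python
import xml.etree.ElementTree as ET

# The four events from the slide, already split into fields (hypothetical layout:
# agent_a, operator, agent_b, qualifiers in their original order).
EVENTS = {
    "ev0": ("Gal2p", "transports", "galactose", ["mem", "wt"]),
    "ev1": ("galactose", "activate", "Gal3p", ["wt", "cyt"]),
    "ev2": ("Gal3p", "Binds_to_promoter", "gal1", ["wt", "nuc"]),
    "ev3": ("Gal3p", "induce", "gal1", ["presence_of galactose", "wt", "nuc"]),
}


def hypothesis_to_xml(name: str, expression: str, events: dict) -> str:
    """Build one <hypothesis> element holding its expression and its events."""
    root = ET.Element("hypothesis", name=name)
    ET.SubElement(root, "expression").text = expression
    for ev_id, (a, op, b, qualifiers) in events.items():
        ev = ET.SubElement(root, "event", id=ev_id, operator=op)
        ET.SubElement(ev, "agent_a").text = a
        ET.SubElement(ev, "agent_b").text = b
        for q in qualifiers:
            ET.SubElement(ev, "qualifier").text = q
    return ET.tostring(root, encoding="unicode")


xml_text = hypothesis_to_xml("hy1", "(ev0+ev1) and (ev2+ev3)", EVENTS)
print(xml_text)
```

Since the XML carries the same fields as the text lines, either view can be regenerated from the other, which is the point of having equivalent representations.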
Evaluating an hypothesis
Demo
[Figure: system diagram with components Inference rules, Event Handler, Justification routines, Neighboring events generator, Hypothesis parser and ranking rules, Result formatter, Visual Widget, Hypothesis file, Browser, User, Database]
Screen shot of the output
A. Representation of an hypothesis in terms of events (ev = event).
B. Holding the mouse over a neighboring hypothesis (b1) shows which event was replaced to create it.
C. Plot of support versus conflicts for the submitted and neighboring hypotheses (n1, b1). Clicking on n1 submits that hypothesis as the ‘seed’.
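Panel B describes neighboring hypotheses as ones that differ from the submitted hypothesis by a single replaced event, and panel C compares support with conflicts. A minimal sketch of that idea follows, under several assumptions: hypotheses are flattened to tuples of event ids (ignoring the `+`/`and` structure), the pool of alternative events and the per-event (support, conflicts) counts are made up, and ranking by support minus conflicts is an illustrative choice, since the talk shows only the plot.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(frozen=True)
class Scored:
    events: tuple[str, ...]
    support: int
    conflicts: int
    replaced: Optional[str] = None  # event swapped out to create this neighbor


def score(events: tuple[str, ...], evidence: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum hypothetical per-event (support, conflicts) counts."""
    return (sum(evidence[e][0] for e in events),
            sum(evidence[e][1] for e in events))


def neighbors(seed: tuple[str, ...], alternatives: dict[str, list[str]],
              evidence: dict[str, tuple[int, int]]) -> Iterator[Scored]:
    """Yield hypotheses that differ from the seed by exactly one replaced event."""
    for i, old in enumerate(seed):
        for new in alternatives.get(old, []):
            events = seed[:i] + (new,) + seed[i + 1:]
            yield Scored(events, *score(events, evidence), replaced=old)


def rank(seed, alternatives, evidence) -> list[Scored]:
    """Score the seed and its neighbors; sort by support minus conflicts (assumed)."""
    candidates = [Scored(seed, *score(seed, evidence))]
    candidates += neighbors(seed, alternatives, evidence)
    return sorted(candidates, key=lambda h: h.support - h.conflicts, reverse=True)


# Made-up evidence counts and one made-up alternative event "ev2b".
evidence = {"ev0": (3, 0), "ev1": (2, 1), "ev2": (1, 2), "ev3": (2, 0), "ev2b": (2, 0)}
alternatives = {"ev2": ["ev2b"]}
for h in rank(("ev0", "ev1", "ev2", "ev3"), alternatives, evidence):
    print(h.events, h.support, h.conflicts, h.replaced)
# ('ev0', 'ev1', 'ev2b', 'ev3') 9 1 ev2
# ('ev0', 'ev1', 'ev2', 'ev3') 8 3 None
```

Recording `replaced` on each neighbor is what lets an interface report which event was swapped, as in panel B.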
Credits
• Stephen Racunas, sar147@psu.edu
• Nina Fedoroff (Mentor), nvf1@psu.edu
More on project website:
www.hybrow.org &
Aug 1st @ 11:10 AM.