Page 1: Computations using pathways and networks Nigam Shah nigam@stanford.edu.

Computations using pathways and networks

Nigam Shah nigam@stanford.edu

THE GOAL = MAKING SENSE OF HIGH THROUGHPUT DATA

High throughput data

• “High throughput” is one of those fuzzy terms that is never really defined anywhere.

• Genomics data is considered high throughput if:
  • You cannot “look” at your data to interpret it.
  • Generally speaking, it means ~1000 or more genes and 20 or more samples.

• There are about 40 different high throughput genomics data generation technologies.
  • DNA, mRNA, proteins, metabolites … all can be measured.

How does ontology help?

• An ontology provides an organizing framework for creating “abstractions” of the high throughput data

• The simplest ontologies (i.e. terminologies, controlled vocabularies) provide the most bang-for-the-buck
  • Gene Ontology (GO) is the prime example

• More structured ontologies, such as those that represent pathways and higher-order biological concepts, still have to demonstrate real utility.

Gene Ontology to analyze microarray data

Using GO annotations

Descriptions built by connecting/linking ontology terms

Biologists interpret a list of genes and form a result statement such as:

The photosynthesis genes located in the chloroplast are repressed in response to ozone stress and have the ABRE binding site enriched in their promoters.

… more structure

?<link>?

<Some MF> in <Some BP>

OBOL

Relations Ontology

Between-ontology structure

… more structure [beyond GO]: PATO

The building blocks of phenotype descriptions: EQ

Entity (bearer), such as spermatocyte, wing

Quality (property, attribute)

- a kind of dependent continuant

Formally, an EQ description defines:

- a Quality which inheres_in a bearer entity

The building blocks are combined according to the Pheno-syntax

www.fruitfly.org/~cjm/formats
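
As a rough illustration of the EQ idea above, the sketch below pairs an entity with a quality and spells out the inheres_in relation. The class name `EQ`, its field names, and the example labels are assumptions made for this sketch; it does not reproduce the Pheno-syntax itself, and real annotations would use ontology term identifiers rather than free-text labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EQ:
    """An Entity-Quality phenotype description (illustrative sketch only)."""
    entity: str   # the bearer, e.g. an anatomy or cell-type term
    quality: str  # the quality that inheres in the bearer

    def as_statement(self) -> str:
        # Spell out the relation named on the slide: Quality inheres_in Entity.
        return f"{self.quality} inheres_in {self.entity}"


# Illustrative labels only; not taken from any real annotation set.
small_wing = EQ(entity="wing", quality="decreased size")
print(small_wing.as_statement())
```

Keeping the entity and the quality as separate fields is what would let a tool query on either one independently.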

Semantically structured annotations

1. Relationship ontology
2. Mouse Pathology ontology
3. Tissue/Organ
4. Gene ontology

mRNA of genes encoding proteins with mf in bp at cc is increased in sample-id which shows some pathology in some tissue in some organ

Basal layer of organ shows membranous staining

Queries enabled:
1. Identify all images with a specific pathology
2. Identify cases with pathology and some gene expression changes
3. Correlate changes in biological processes with changes in morphology

Discovery enabled:
1. Classify samples in expression space and “look” for histological changes that correlate with it.
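
To make the first two queries above concrete, here is a minimal sketch in which each structured annotation is a record and each query is a filter over those records. The `Annotation` class, its field names, and the three sample records are all hypothetical; the sketch keys on sample identifiers rather than images, and it is not the system described in the talk.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Annotation:
    """One structured annotation; field names are illustrative assumptions."""
    sample_id: str
    pathology: str                      # term from a pathology ontology
    tissue: str
    organ: str
    bp: Optional[str] = None            # GO biological process of the changed genes
    mrna_change: Optional[str] = None   # e.g. "increased"; None if nothing recorded


def samples_with_pathology(annotations: List[Annotation], pathology: str) -> List[str]:
    """Query 1: every sample annotated with the given pathology."""
    return sorted({a.sample_id for a in annotations if a.pathology == pathology})


def samples_with_pathology_and_change(annotations: List[Annotation], pathology: str) -> List[str]:
    """Query 2: samples with the pathology and a recorded expression change."""
    return sorted({a.sample_id for a in annotations
                   if a.pathology == pathology and a.mrna_change is not None})


# Made-up records, for illustration only.
annotations = [
    Annotation("s1", "hyperplasia", "basal layer", "skin", "cell proliferation", "increased"),
    Annotation("s2", "hyperplasia", "basal layer", "skin"),
    Annotation("s3", "necrosis", "cortex", "kidney", "apoptosis", "increased"),
]

print(samples_with_pathology(annotations, "hyperplasia"))
print(samples_with_pathology_and_change(annotations, "hyperplasia"))
```

The queries only work because every field holds a term from a shared vocabulary; with free-text descriptions the equality tests would not be reliable.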

WHY

HOW

Open Questions/Challenges

• Creation/acceptance of a systematic formalism for creating expressive annotations (e.g. associated_with, involves).

• A generic tool that uses ontologies and allows the user to compose terms and cross-ontology annotations
  • Easy term/annotation composition
  • Control the number of alternative [compositional] statements allowed

Pathways to analyze array data

“Pathways” to analyze array data

• The notion of a cancer signaling pathway can serve as an organizing framework for interpreting microarray expression data.

• On examining a relatively small set of genes based on prior biological knowledge about a given pathway, the analysis becomes more specific.

Reactome’s sky painter

Operations on pathway resources

Custom code | RDF + SPARQL | OWL + SWRL

Verify a pathway resource Proofreading Reactome[1]

In progress In progress

Perform integrated querying of multiple pathway resources

Hard (“wrapper” approaches)

PKB[2]

Verify multiple pathway resources

Too hard (there are ~200)

Merge and compare multiple pathway resources

“Reason” over pathway resources

[1] A case study in pathway knowledgebase verification, BMC Bioinformatics 2006, 7:196
[2] Pathway Knowledge Base: An Integrated pathway resource using BioPAX, Submitted to Applied Ontology

Merge and compare pathway resources

• Given a set of ‘nodes’ and some ‘links’ among them, query multiple pathway sources and fill in the most plausible interactions between the nodes.
  • Plausible = not contradicted by existing data and knowledge

• Current pathway resources [in BioPAX] cannot support this because the manner in which ‘nodes’ and ‘links’ are identified is arbitrary.
  • Reactome has started to connect the pathway steps with GO biological processes.

• BioPAX lets pathway sources “export” their nodes and links.
  • … but p53 in resource A is still different from P53 in resource B
  • … and Activate in resource A is still different from activates in resource B
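
The identity problem in the last bullet can be shown with a small sketch. Two resources export the same interaction under different labels, so a naive comparison finds nothing in common; only after both are mapped to shared identifiers do the edges line up. The mapping tables `NODE_IDS` and `LINK_IDS`, the placeholder target `geneX`, and the helper `normalize` are hypothetical and hand-built for this example; producing such mappings reliably is exactly the unsolved part.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str, str]  # (source node, link, target node)

# The same interaction as exported by two hypothetical resources.
resource_a: Set[Edge] = {("p53", "Activate", "geneX")}
resource_b: Set[Edge] = {("P53", "activates", "geneX")}

# Hand-built, hypothetical mappings from local labels to shared identifiers.
NODE_IDS: Dict[str, str] = {"p53": "TP53", "P53": "TP53", "geneX": "geneX"}
LINK_IDS: Dict[str, str] = {"Activate": "activates", "activates": "activates"}


def normalize(edge: Edge) -> Edge:
    """Rewrite one edge using the shared node and link identifiers."""
    source, link, target = edge
    return (NODE_IDS[source], LINK_IDS[link], NODE_IDS[target])


# Comparing raw labels: the two resources appear to share nothing.
naive_overlap = resource_a & resource_b

# Comparing normalized edges: the shared interaction becomes visible.
shared = {normalize(e) for e in resource_a} & {normalize(e) for e in resource_b}

print(naive_overlap)
print(shared)
```

A common export format makes the edges readable by one parser, but the set intersection only becomes non-empty once the identifiers themselves agree.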

Problem

• I have no clue what a pathway is!
  • A set or series of interactions, often forming a network, which biologists have found useful to group together for organizational, historic, biophysical or other reasons.

• The complexity and abstraction represented in a pathway is decided by its author attempting to represent the interactions between a set of genes, proteins, and small molecules.

“Networks” to analyze high throughput genomic data

Building networks

• Take a high throughput dataset

• Define a notion of ‘relatedness’ depending on the dataset
  • Co-expression for microarray data
  • Co-occurrence for literature networks
  • …

• List [node]--<link>--[node] pairs

• Find a good graph drawing program!
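
The recipe above can be sketched for the co-expression case. The example below assumes Pearson correlation as the notion of relatedness and an arbitrary cutoff of 0.9; both are choices made for this sketch, not something the slides prescribe, and the expression values are made up. The function names `pearson` and `coexpression_edges` are likewise hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def coexpression_edges(expression: Dict[str, List[float]],
                       threshold: float = 0.9) -> List[Tuple[str, str, str]]:
    """List [node]--<link>--[node] triples for gene pairs above the cutoff."""
    edges = []
    for gene_1, gene_2 in combinations(sorted(expression), 2):
        if pearson(expression[gene_1], expression[gene_2]) >= threshold:
            edges.append((gene_1, "co-expressed", gene_2))
    return edges


# Made-up expression profiles across four samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [1.0, -1.0, 1.0, -1.0],
}
edges = coexpression_edges(expression)
print(edges)
```

The resulting triples are what a graph drawing program would take as input; the choice of cutoff largely decides how dense the drawn network is.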

Nice hairball but …

From Long et al, in Trends in Biochemical Sciences, vol 32, no 7.

Srinivasan B, Snow R, Shah N and Batzoglou S in Interactome Networks conference @ CSHL

From Srinivasan et al, in Briefings in Bioinformatics August 2007.

Hypotheses/Models to analyze high throughput genomic data

Events and Implicit claims

A hypothesis is a statement about relationships (among objects) within a biological system.

Protein P induces transcription of gene X

An ‘event’ is a relationship between two biological entities.

Implicit claims that can be tested:
1. P is a transcription factor.
2. P is a transcriptional activator.
3. P is localized to the nucleus.
4. P can bind to the promoter of gene X.

[Diagram: P, promoter | gene X]
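
One way to picture how an event expands into testable claims is a lookup from the event's operator to claim templates. The table `IMPLICIT_CLAIMS` and the function `implicit_claims` below are hypothetical; the only entry is the example from this slide, and nothing here is meant to describe how the author's system derives claims.

```python
from typing import Dict, List

# Hypothetical lookup: operator -> templates for the claims it implies.
# {a} stands for the first entity of the event, {b} for the second.
IMPLICIT_CLAIMS: Dict[str, List[str]] = {
    "induces transcription of": [
        "{a} is a transcription factor",
        "{a} is a transcriptional activator",
        "{a} is localized to the nucleus",
        "{a} can bind to the promoter of {b}",
    ],
}


def implicit_claims(agent_a: str, operator: str, agent_b: str) -> List[str]:
    """Expand one event into the claims its operator implies (empty if unknown)."""
    templates = IMPLICIT_CLAIMS.get(operator, [])
    return [template.format(a=agent_a, b=agent_b) for template in templates]


claims = implicit_claims("P", "induces transcription of", "gene X")
for claim in claims:
    print(claim)
```

Each claim produced this way can be checked separately against data, which is what makes a single event statement testable.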

Representing Events Explicitly

A hypothesis consists of at least one event stream

An event stream is a sequence of one or more events or event streams with logical joints (or operators) between them.

An event has exactly one agent_a, exactly one agent_b and exactly one operator (i.e. a relationship between the two agents). It also has a physical location that denotes ‘where’ the event happened, the genetic context of the organism and associated experimental perturbations when the event happened.

A logical joint is the conjunction between two event streams.
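
The definitions above map naturally onto a few record types. The sketch below is one possible encoding, not the author's implementation: the class names, the choice to store joints as strings, and the example values ("nucleus", "wild type", "and", the binding event) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Event:
    """Exactly one agent_a, one operator and one agent_b, plus context."""
    agent_a: str
    operator: str                       # the relationship between the two agents
    agent_b: str
    location: str                       # 'where' the event happened
    genetic_context: str                # genetic context of the organism
    perturbations: Tuple[str, ...] = () # associated experimental perturbations


@dataclass
class EventStream:
    """Events or nested streams, with one logical joint between neighbors."""
    items: List[Union[Event, "EventStream"]]
    joints: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.items:
            raise ValueError("an event stream needs at least one event")
        if len(self.joints) != len(self.items) - 1:
            raise ValueError("need exactly one joint between neighboring items")


@dataclass
class Hypothesis:
    """A hypothesis consists of at least one event stream."""
    streams: List[EventStream]

    def __post_init__(self) -> None:
        if not self.streams:
            raise ValueError("a hypothesis needs at least one event stream")


# Illustrative values only.
binding = Event("P", "binds to promoter of", "gene X",
                location="nucleus", genetic_context="wild type")
induction = Event("P", "induces transcription of", "gene X",
                  location="nucleus", genetic_context="wild type")
hypothesis = Hypothesis([EventStream([binding, induction], joints=["and"])])
print(len(hypothesis.streams[0].items))
```

Checking the cardinality rules in `__post_init__` means a malformed hypothesis is rejected when it is built rather than when it is evaluated.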

User interfaces

Hypothesis described in natural language

Biological process described in a formal language

Evaluating a hypothesis

A. Representation of a hypothesis in terms of events (ev = event)

B. Holding the mouse on a neighboring hypothesis (b1) shows what event was replaced to create it

C. Plot of the support versus conflicts for submitted and neighboring hypotheses (n1, b1). Clicking on n1 submits that hypothesis as the ‘seed’

HyBrow: lessons learnt

• The minimum requirement for a formal representation:
  • Ability to represent data, information, knowledge
  • A language to unambiguously express your “thought experiment” (your model, hypothesis, theory, theorem etc.)

• A reasoning framework to evaluate the outcome/validity/accuracy of your thought experiment

• Project Home page: www.hybrow.org

Pathways as “models”?

• Pathways are assumed to be models representing biological processes, without actually knowing the modeling formalism in which the model is valid.

• The ‘language’ of writing out a pathway doesn’t really have a grammar and/or a logic

• Most pathways end up being lists of heterogeneous sets of “steps” (in terms of the time of execution, the place of execution, the abstraction level, the kind of ‘thing’ passed along etc…)

• Lots of discussion on the requirements of data providers; where are the users/consumers and their use cases?

Claims

• Pathways are useful only if they can serve as “models” [accurate representations] of a process
  • Hence whatever needs to be done to ensure that a pathway is a valid model in at least one formalism should be required of the pathway author.

• A pathway representation that doesn’t solve the problem of uniquely identifying entities doesn’t solve the problem of integrating pathways.

• We just end up with marked up, structured information from multiple providers, without actually integrating anything.

Success of projects in the Biomedical domain

[Figure axes: KR complexity (minimal to high); computational complexity (minimal to high)]
