ALGORITHMIC & ANALYTICAL METHODS FOR FUNCTIONAL CHARACTERIZATION OF MOLECULAR INTERACTION NETWORKS Mehmet Koyutürk PURDUE UNIVERSITY DEPARTMENT OF COMPUTER SCIENCE Joint work with Jayesh Pandey, Wojciech Szpankowski, and Ananth Grama

Jan 02, 2016

Page 1

ALGORITHMIC & ANALYTICAL METHODS FOR FUNCTIONAL CHARACTERIZATION OF MOLECULAR INTERACTION NETWORKS

Mehmet Koyutürk

PURDUE UNIVERSITY
DEPARTMENT OF COMPUTER SCIENCE

Joint work with Jayesh Pandey, Wojciech Szpankowski, and Ananth Grama

Page 2

OUTLINE

Biological motivation
  Gene regulation, molecular annotation, pathway annotation

Formal framework
  Functional attribute networks: multigraph model

Algorithmic challenges
  Statistical interpretability, non-monotonicity

Statistical model
  Conditioning on building blocks to emphasize modularity

Resulting tool
  NARADA, algorithms, implementation, results


Page 4

GENE REGULATION

Gene expression is the process of synthesizing a functional protein coded by the corresponding gene

Genes (and their products) regulate (promote or suppress) the extent of each other’s expression

Any step of gene expression can be modulated
  Transcription, translation, post-transcriptional modification, RNA transport, mRNA degradation…

[Figure: Negative ligand-independent transcriptional regulation at chromatin level]

Page 5

GENE REGULATORY NETWORKS

Abstraction: organization of regulatory interactions in the cell
  Genes are nodes, regulatory interactions are directed edges
  Boolean network model: edges are signed, indicating up-regulation (promotion) and down-regulation (suppression)

[Figure: Flowering time in Arabidopsis. Legend: gene, up-regulation, down-regulation]
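The signed, directed abstraction above can be sketched in a few lines of Python. This is only an illustration: the gene names, the edges, and the helper names `build_network` and `path_sign` are hypothetical, and treating the product of edge signs as the net effect along a path is a common convention assumed here, not something stated on the slide.

```python
from collections import defaultdict

# Hypothetical toy network; gene names and signs are illustrative only.
# +1 marks up-regulation (promotion), -1 marks down-regulation (suppression).
SIGNED_EDGES = [
    ("g1", "g2", +1),
    ("g2", "g3", -1),
    ("g1", "g3", +1),
]

def build_network(edges):
    """Return adjacency lists: regulator -> list of (target, sign)."""
    network = defaultdict(list)
    for regulator, target, sign in edges:
        if sign not in (+1, -1):
            raise ValueError("sign must be +1 or -1")
        network[regulator].append((target, sign))
    return dict(network)

def path_sign(network, path):
    """Assumed convention: net effect of a path is the product of edge signs."""
    effect = 1
    for regulator, target in zip(path, path[1:]):
        signs = [s for t, s in network.get(regulator, []) if t == target]
        if not signs:
            raise KeyError(f"no edge {regulator} -> {target}")
        effect *= signs[0]
    return effect

network = build_network(SIGNED_EDGES)
print(path_sign(network, ["g1", "g2", "g3"]))  # promotion followed by suppression
```

Storing the sign on the edge rather than on the node keeps the same gene free to promote one target and suppress another.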

Page 6

MOLECULAR ANNOTATION

Similar systems that involve different molecules (genes, proteins) in different species

Functional annotation of genes provides a unified understanding of the underlying principles

Gene Ontology: a library of molecular annotation
  Molecular function: What is the role of a gene?
  Biological process: In which processes is a gene involved?
  Cellular component: Where is a gene’s product localized?

We refer to each annotation class as a functional attribute

Page 7

FROM MOLECULES TO SYSTEMS

Networks are species-specific
Annotation is at the molecular level
Map networks from gene space to function space
  Can generate a library of annotated (sub-)networks

[Figure: Network of Gene Ontology terms based on significance of pairwise interactions in S. cerevisiae Synthetic Gene Array (SGA) network (Tong et al., Science, 2004)]

Page 8

INDIRECT REGULATION

[Figure: two diagrams over genes g1 to g6]

Assessment of pairwise interactions is simple, but not adequate


Page 10

FUNCTIONAL ATTRIBUTE NETWORK

Multigraph model
  A gene is associated with multiple functional attributes
  A functional attribute is associated with multiple genes
  Functional attributes are represented by nodes
  Genes are represented by ports, reflecting context

[Figure: a gene network on genes g1 to g6 and the corresponding functional attribute network]
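One way to picture the multigraph model is to map every gene-level edge to parallel edges between the attributes of its endpoints, while remembering which gene pair produced each one. The sketch below does that; the toy genes, attributes, edges, and the function name `attribute_multigraph` are hypothetical and chosen only for illustration, and it may differ from how the actual tool stores ports.

```python
from collections import defaultdict

# Hypothetical toy data: gene names, attribute names, and edges are illustrative.
GENE_EDGES = [("g1", "g2"), ("g3", "g2"), ("g2", "g4")]
ANNOTATION = {
    "g1": {"A"},
    "g2": {"B", "C"},
    "g3": {"A"},
    "g4": {"C"},
}

def attribute_multigraph(gene_edges, annotation):
    """Map each gene edge to parallel edges between functional attributes.

    Returns {(source_attr, target_attr): [(source_gene, target_gene), ...]}.
    The gene pair kept with every parallel edge plays the role of the ports:
    it records the gene context in which the attribute-level edge occurs.
    """
    multigraph = defaultdict(list)
    for src, dst in gene_edges:
        for a in sorted(annotation.get(src, ())):
            for b in sorted(annotation.get(dst, ())):
                multigraph[(a, b)].append((src, dst))
    return dict(multigraph)

fan = attribute_multigraph(GENE_EDGES, ANNOTATION)
for (a, b), ports in sorted(fan.items()):
    print(a, "->", b, "via", ports)
```

Keeping the gene pairs means two attribute edges can be chained into a path only when they meet at the same gene, which is what distinguishes a multipath from a walk in a collapsed attribute graph.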

Page 11

FREQUENCY OF A MULTIPATH

A pathway of functional attributes occurs in various contexts in the gene network
  Multipath in the functional attribute network

Frequency of multipath?

[Figure: two example multipaths with frequencies 4 and 0]
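A minimal sketch of counting the frequency of a multipath follows, assuming that frequency means the number of gene-level directed walks whose i-th gene carries the i-th functional attribute. That definition, the toy data, and the name `multipath_frequency` are assumptions made for illustration.

```python
# Hypothetical toy data, chosen only to illustrate the counting.
GENE_EDGES = [("g1", "g2"), ("g3", "g2"), ("g2", "g4"), ("g2", "g5")]
ANNOTATION = {
    "g1": {"A"}, "g3": {"A"},
    "g2": {"B"},
    "g4": {"C"}, "g5": {"C"},
}

def multipath_frequency(gene_edges, annotation, attribute_path):
    """Count gene-level walks whose i-th gene carries the i-th attribute."""
    successors = {}
    for src, dst in gene_edges:
        successors.setdefault(src, []).append(dst)

    # counts[g] = number of matching partial walks that end at gene g
    counts = {g: 1 for g, attrs in annotation.items()
              if attribute_path[0] in attrs}
    for attribute in attribute_path[1:]:
        extended = {}
        for gene, ways in counts.items():
            for nxt in successors.get(gene, []):
                if attribute in annotation.get(nxt, ()):
                    extended[nxt] = extended.get(nxt, 0) + ways
        counts = extended
    return sum(counts.values())

print(multipath_frequency(GENE_EDGES, ANNOTATION, ["A", "B", "C"]))  # 4
print(multipath_frequency(GENE_EDGES, ANNOTATION, ["C", "B", "A"]))  # 0
```

The dynamic-programming table avoids listing every walk explicitly, so the cost grows with the number of edges times the path length rather than with the number of occurrences.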

Page 12

SIGNIFICANCE OF A PATHWAY

We want to identify multipaths with unusual frequency
  These might correspond to modular pathways

Frequency alone is not a good measure of statistical significance
  The distribution of functional attributes among genes is not uniform
  The degree distribution in the gene network is highly skewed
  Pathways that contain common functional attributes have high frequency, but they are not necessarily interesting


Page 14

STATISTICAL INTERPRETABILITY

We are interested in identifying statistically over-represented patterns
  Null hypothesis: the pattern is sparse
  Additional positive observation => more significance
  Additional negative observation => less significance

[Figure: patterns A, B, and B’]
P(B) < P(A)
P(B’) > P(A)
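The two requirements above can be checked on a simple measure. The sketch below uses a binomial tail probability as the p-value, which is an assumption made for illustration and not necessarily the measure used in this work; the counts and the null rate are hypothetical.

```python
from math import comb

def binomial_tail(n, k, q):
    """P(X >= k) for X ~ Binomial(n, q): p-value of seeing k hits in n trials."""
    return sum(comb(n, i) * q**i * (1 - q)**(n - i) for i in range(k, n + 1))

# Hypothetical numbers: 10 observations, 4 of them positive, null hit rate 0.2.
p_a = binomial_tail(10, 4, 0.2)
p_b = binomial_tail(11, 5, 0.2)        # one additional positive observation
p_b_prime = binomial_tail(11, 4, 0.2)  # one additional negative observation

# A smaller p-value means a more significant pattern.
print(p_b < p_a < p_b_prime)
```

Adding a hit lowers the p-value and adding a miss raises it, which is the behaviour a statistically interpretable measure is expected to show.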

Page 15

MONOTONICITY

Frequency is a monotonic measure
  If a pathway is frequent, then all of its sub-paths are frequent
  Algorithmic advantage: enumerate all frequent patterns in a bottom-up fashion
  Commonly exploited in traditional data mining applications

Statistically interpretable measures are not monotonic!
  Statistical significance fluctuates in the search space
  Existing data mining algorithms do not apply
  Significance of pathways is non-monotonic in two dimensions: GO hierarchy and path space

Page 16

GO HIERARCHY

Functional attributes are organized in a hierarchical manner
  “regulation of steroid biosynthetic process” is a “regulation of steroid metabolic process” and is part of “steroid biosynthetic process”

Statistically interpretable measures are not monotonic with respect to the GO hierarchy
  A pattern corresponding to a child may be more significant or less significant than that corresponding to its parent
  Common example: identification of significantly enriched GO terms in a set of genes (Ontologizer, VAMPIRE)
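The enrichment example can be made concrete with a standard hypergeometric test. This is a generic sketch, not necessarily the procedure used by the tools named above, and all gene counts are hypothetical. In both scenarios the parent term annotates a superset of the child's genes, yet the order of the two p-values flips.

```python
from math import comb

def hypergeom_tail(N, K, n, k):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)

# Hypothetical setting: 100 genes in total, a study set of 10 genes.
# Scenario 1: a specific child term is fully contained in the study set,
# while its broad parent term is hit about as often as expected by chance.
child_1 = hypergeom_tail(100, 5, 10, 5)
parent_1 = hypergeom_tail(100, 50, 10, 6)

# Scenario 2: the child term is hit once, but its parent term is hit six times.
child_2 = hypergeom_tail(100, 5, 10, 1)
parent_2 = hypergeom_tail(100, 10, 10, 6)

print(child_1 < parent_1)  # child more significant than parent
print(child_2 > parent_2)  # child less significant than parent
```

Because neither direction of the hierarchy guarantees a smaller p-value, a search cannot prune a term just because its parent or child is insignificant.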

Page 17

MONOTONICITY W.R.T. GO

[Figure: a gene network on genes g1 to g5 and a GO DAG whose terms annotate the gene sets {g1, g2, g3}, {g1, g2, g4}, {g1, g2}, {g4}, and {g3}; the three patterns compared in the figure satisfy P( ) < P( ) < P( )]

Page 18

PATHWAY LENGTH

[Figure: two pairs of pathways of different length, one pair with P( ) > P( ) and the other with P( ) < P( )]

Open problems
  How can we effectively search in the pathway space, where significance fluctuates?
  How can we find optimal resolution in functional attribute space?


Page 20

STATISTICAL MODEL: INSIGHT

Emphasize modularity of pathways
  Condition on frequency of building blocks
  Evaluate the significance of the coupling of building blocks

[Figure: gene network on genes g1 to g7 with the frequencies φ( ) of a pathway and its building blocks (values shown: 2, 5, 4) and the resulting comparison of p-values P( ); the pattern arguments are graphical]

Page 21

STATISTICAL MODEL: FORMULATION

We denote each frequency random variable by Φ and its realization by φ

Significance of pathway π123 (p123):
  p_{123} = P(\Phi_{123} \ge \phi_{123} \mid \Phi_{12} = \phi_{12}, \Phi_{23} = \phi_{23}, \Phi_1 = \phi_1, \Phi_2 = \phi_2, \Phi_3 = \phi_3)

[Figure: pathway π123 with node frequencies Φ1, Φ2, Φ3, edge frequencies Φ12, Φ23, and path frequency Φ123]

Page 22

SIGNIFICANCE OF A PATHWAY

Assume that regulatory interactions are independent
  There are φ12 φ23 possible pairs of π12 and π23 edges
  The probability that a pair of π12 and π23 edges goes through the same gene (which corresponds to an occurrence of π123) is 1/φ2
  The probability that at least φ123 of these pairs go through the same gene can be bounded by
    p_{123} \le \exp(\phi_{12} \phi_{23} H_q(t)), where q = 1/\phi_2 and t = \phi_{123} / (\phi_{12} \phi_{23})
  H_q(t) = t \log(q/t) + (1 - t) \log((1 - q)/(1 - t)) is the negative of the Kullback-Leibler divergence between Bernoulli distributions with parameters t and q, so the bound is informative when t > q
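The bound can be evaluated directly. The sketch below follows the formula on this slide with q = 1/φ2 and t = φ123 / (φ12 φ23); the function names, the handling of the edge cases t <= q and t = 1, and the input frequencies are assumptions made for illustration. An exact binomial tail is included only as a sanity check of the bound under the independence assumption.

```python
from math import comb, exp, log

def neg_divergence(q, t):
    """H_q(t) = t log(q/t) + (1-t) log((1-q)/(1-t)), for 0 < q, t < 1."""
    return t * log(q / t) + (1 - t) * log((1 - q) / (1 - t))

def pathway_pvalue_bound(phi_12, phi_23, phi_2, phi_123):
    """Upper bound exp(phi_12 * phi_23 * H_q(t)) on p_123."""
    n = phi_12 * phi_23   # possible pairs of pi_12 and pi_23 edges
    q = 1.0 / phi_2       # chance that a pair goes through the same gene
    t = phi_123 / n       # observed fraction of coupled pairs
    if t <= q:
        return 1.0        # assumed fallback: the bound is trivial here
    if t >= 1.0:
        return q ** n     # limit of the bound as t approaches 1
    return exp(n * neg_divergence(q, t))

def binomial_tail(n, k, q):
    """Exact P(X >= k) for X ~ Binomial(n, q), used only as a sanity check."""
    return sum(comb(n, i) * q**i * (1 - q)**(n - i) for i in range(k, n + 1))

# Hypothetical frequencies: phi_12 = 4, phi_23 = 5, phi_2 = 5, phi_123 = 8.
bound = pathway_pvalue_bound(4, 5, 5, 8)
exact = binomial_tail(4 * 5, 8, 1 / 5)
print(exact <= bound < 1.0)
```

The bound is looser than the exact tail but needs only a constant number of arithmetic operations, which matters when many candidate pathways are scored.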

Page 23

BASELINE MODEL

A single regulatory interaction is the shortest pathway
  Arbitrary degree distribution: the number of edges leaving and entering each functional attribute is specified
  Edges are assumed to be independent

The frequency of a regulatory interaction is a hypergeometric random variable
  Can derive a similar bound for the p-value of a single regulatory interaction


Page 25

ALGORITHMIC ISSUES

Significance is not monotonic
  Need to enumerate all pathways?

Strongly significant pathways
  A pathway is strongly significant if all of its building blocks and their coupling are significant (defined recursively)
  Allows pruning the search space effectively

Shortcutting common functional attributes
  Transcription factors, DNA binding genes, etc. are responsible for mediating regulation
  Shortcut these terms and consider the regulatory effect of different processes on each other directly
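The recursive definition of strong significance suggests a bottom-up search that extends only pathways that are already strongly significant. The sketch below is one possible reading, not the NARADA implementation: it assumes that the building blocks of a pathway are its prefix and its suffix, each one term shorter, and it takes the significance test as a hypothetical callable `is_significant`.

```python
from functools import lru_cache

def make_strong_significance(is_significant):
    """Wrap a per-pathway significance test into the recursive strong version.

    is_significant(path) is a hypothetical callable that returns True when the
    coupling of the building blocks of path is significant on its own.
    """
    @lru_cache(maxsize=None)
    def strongly_significant(path):
        if len(path) == 2:  # a single regulatory interaction
            return is_significant(path)
        # Assumed decomposition: the prefix and the suffix one term shorter.
        return (strongly_significant(path[:-1])
                and strongly_significant(path[1:])
                and is_significant(path))
    return strongly_significant

def enumerate_strongly_significant(attributes, is_significant, max_length):
    """Grow pathways one term at a time, extending only strong ones."""
    strong = make_strong_significance(is_significant)
    level = [(a, b) for a in attributes for b in attributes if strong((a, b))]
    found = list(level)
    for _ in range(max_length - 2):
        level = [p + (a,) for p in level for a in attributes if strong(p + (a,))]
        found.extend(level)
    return found

# Hypothetical oracle: the set of pathways whose coupling is significant.
SIGNIFICANT = {("A", "B"), ("B", "C"), ("C", "D"),
               ("A", "B", "C"), ("A", "B", "C", "D")}
result = enumerate_strongly_significant(
    ["A", "B", "C", "D"], SIGNIFICANT.__contains__, max_length=4)
print(result)
```

In this toy run the pathway A, B, C, D is significant on its own but is not reported, because its building block B, C, D is not significant. That is the price of pruning: the search space shrinks, and pathways that are significant without being strongly significant are skipped.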

Page 26

NARADA
http://www.cs.purdue.edu/homes/jpandey/narada/

A software tool for identification of significant pathways

Queries
  Given functional attribute T, find all significant pathways that originate at T
  Given functional attribute T, find all significant pathways that terminate at T
  Given a sequence of functional attributes T1, T2, …, Tk, find all occurrences of the corresponding pathway

Identified pathways are displayed as a tree
  User can explore back and forth between the gene network and the functional attribute network

Page 27

RESULTS

E. coli transcription network obtained from RegulonDB
  3159 regulatory interactions between 1364 genes
  Using Gene Ontology, 881 of these genes are mapped to 318 processes

Pathway length           2     3     4     5
All                    427   580  1401   942
Strongly significant   427   208   183   142
Common terms shortcut  184   119     3     1

Page 28

MOLYBDATE ION TRANSPORT

[Figure: Significant regulatory pathways that originate at molybdate ion transport]

[Figure: Their occurrences in the gene network]

Page 29

WHAT IS SIGNIFICANT?

Molybdate ion transport regulates various processes directly
  Mo-molybdopterin cofactor biosynthesis, oligopeptide transport, cytochrome complex assembly

It regulates various other processes indirectly
  Through DNA-dependent regulation of transcription, two-component signal transduction system, nitrate assimilation
  Regulation of these mediator processes is not significant on its own!
  NARADA captures modularity of indirect regulation!

Page 30

CONCLUSION

Mapping gene regulatory networks to functional attribute space demonstrates great potential
  Abstract, unified understanding of regulatory systems

Algorithmically, a wide range of new challenges
  Bounding interpretable statistical measures
  Handling resolution in functional attribute space
  Generalizing the definition of a pathway

Discovering new information
  Projecting identified “canonical” patterns on other networks to discover new regulatory relationships

Page 31

ACKNOWLEDGMENTS

Ananth Grama

Wojciech Szpankowski

Shankar Subramaniam

Yohan Kim

Jayesh Pandey