Graph mining in bioinformatics Laur Tooming

Graph mining in bioinformatics Laur Tooming. Graphs in biology Graphs are often used in bioinformatics for describing processes in the cell Vertices are.

Dec 26, 2015

Transcript

Graph mining in bioinformatics

Laur Tooming


Graphs in biology

• Graphs are often used in bioinformatics for describing processes in the cell

• Vertices are genes or proteins

• The meaning of an edge depends on the type of the graph

– Protein-protein interaction

– Gene regulation


What we’re looking for

• We want to find sets of genes that have a biological meaning.

• Idea: find graph-theoretically relevant sets of vertices and find out if they are also biologically meaningful.

• Simple example: connected components

• A more advanced idea: graph clustering. Find subgraphs that have a high edge density.
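Both ideas on this slide can be illustrated with a small sketch. The function names and the toy edge list below are assumptions made for illustration; they do not come from the slides. The first function finds connected components, and the second measures the edge density of a vertex set as the fraction of possible undirected edges that are present.

```python
from collections import defaultdict


def connected_components(edges):
    """Return the connected components of an undirected graph as sorted vertex lists."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    components = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        # Depth-first traversal collects everything reachable from `start`.
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return components


def edge_density(vertices, edges):
    """Fraction of the possible undirected edges among `vertices` that are present."""
    vertex_set = set(vertices)
    n = len(vertex_set)
    if n < 2:
        return 0.0
    present = {
        frozenset((u, v))
        for u, v in edges
        if u in vertex_set and v in vertex_set and u != v
    }
    return len(present) / (n * (n - 1) / 2)


# Hypothetical toy graph: a triangle A-B-C and a separate edge D-E.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("D", "E")]
components = connected_components(edges)
triangle_density = edge_density(["A", "B", "C"], edges)
```

A set with density close to 1 is nearly a clique, which is the kind of subgraph a clustering method tries to find.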


Markov Cluster Algorithm (MCL)

• If there is cluster structure in a graph, random walks tend to remain in a cluster for a long time

• The graph is modelled as a column-stochastic matrix: the entries in each column sum to 1

• a_ij: the probability that a random walk leaving vertex j goes to vertex i on the next step

• Bigger edge weight means greater probability of choosing that edge

Stijn van Dongen, Graph Clustering by Flow Simulation. PhD thesis, University of Utrecht, May 2000. http://micans.org/mcl/
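As a minimal sketch of the matrix described on this slide, the following turns a weighted adjacency matrix into a column-stochastic transition matrix. The function name and the 3-vertex example weights are hypothetical; the convention that entry [i][j] describes the step from j to i follows the definition of a_ij above.

```python
def column_stochastic(weights):
    """Normalise each column of a square non-negative weight matrix to sum to 1."""
    n = len(weights)
    result = [[0.0] * n for _ in range(n)]
    for j in range(n):
        column_sum = sum(weights[i][j] for i in range(n))
        if column_sum == 0:
            raise ValueError(f"vertex {j} has no outgoing weight")
        for i in range(n):
            # Heavier edges out of j get a proportionally larger probability.
            result[i][j] = weights[i][j] / column_sum
    return result


# Hypothetical weights: weights[i][j] is the weight of the edge from j to i.
weights = [
    [0, 1, 3],
    [1, 0, 1],
    [3, 1, 0],
]
transition = column_stochastic(weights)
```

In this example a walk leaving vertex 0 moves to vertex 2 three times as often as to vertex 1, because the edge to vertex 2 carries three times the weight.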


Markov Cluster Algorithm (MCL)

• Two procedures, inflation and expansion, are applied alternately

• Expansion: matrix squaring

– considers longer random walks

• Inflation: raising entries to some power, then rescaling columns so the matrix remains stochastic

– weakens weak edges and strengthens strong ones

• Converges to a steady state
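The alternation of expansion and inflation can be sketched in plain Python. This is an illustrative toy, not the reference implementation from micans.org: the function names, the inflation value of 2, the added self-loops, the convergence tolerance and the rule for reading clusters off the final matrix are all assumptions made for this example.

```python
def normalise_columns(matrix):
    """Rescale every column to sum to 1."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] for j in range(n)] for i in range(n)]


def expand(matrix):
    """Expansion: square the matrix, i.e. take two random-walk steps at once."""
    n = len(matrix)
    return [
        [sum(matrix[i][k] * matrix[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def inflate(matrix, power):
    """Inflation: raise entries to `power`, then make columns stochastic again."""
    return normalise_columns([[x ** power for x in row] for row in matrix])


def mcl(adjacency, inflation=2.0, max_iterations=100, tolerance=1e-9):
    n = len(adjacency)
    # Assumption: add a self-loop of weight 1 to every vertex before normalising.
    matrix = normalise_columns(
        [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    )
    for _ in range(max_iterations):
        updated = inflate(expand(matrix), inflation)
        change = max(abs(updated[i][j] - matrix[i][j]) for i in range(n) for j in range(n))
        matrix = updated
        if change < tolerance:
            break
    # Each row that still carries flow defines a cluster: the columns it attracts.
    clusters = set()
    for row in matrix:
        members = frozenset(j for j, value in enumerate(row) if value > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(cluster) for cluster in clusters)


# Hypothetical graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
two_triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
clusters = mcl(two_triangles)
```

On this graph the flow across the single bridging edge shrinks with every inflation step, so the two triangles end up as separate clusters. A larger inflation power generally gives finer clusters.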


Markov Cluster Algorithm (MCL)

Images from http://micans.org/mcl/ani/mcl-animation.html


Betweenness centrality clustering

• An edge between different clusters is on many shortest paths from one cluster to another.

• An edge inside a cluster is on fewer shortest paths, because there are more alternative paths inside a cluster.

• Betweenness centrality of an edge: the number of shortest paths in the graph containing that edge.

• Remove edges with the highest centrality from the graph to obtain a clustering.

• Optimisations:

– instead of all shortest paths, pick a sample of vertices and calculate shortest paths from them

– remove several edges at once
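The basic procedure, without the optimisations, might look like the sketch below. It assumes an unweighted, undirected graph without self-loops, and it uses Brandes-style accumulation, which splits credit fractionally when a pair of vertices has several shortest paths (the convention used in Girvan-Newman clustering). The function names, the stopping rule (a target number of clusters) and the example graph are assumptions for illustration.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def edge_betweenness(adjacency):
    """Shortest-path betweenness of every edge (unweighted, undirected)."""
    betweenness = defaultdict(float)
    for source in adjacency:
        # Breadth-first search counting shortest paths from `source`.
        distance = {source: 0}
        path_count = defaultdict(float)
        path_count[source] = 1.0
        predecessors = defaultdict(list)
        order = []
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
                if distance[neighbour] == distance[node] + 1:
                    path_count[neighbour] += path_count[node]
                    predecessors[neighbour].append(node)
        # Walk back from the farthest vertices, pushing path credit onto edges.
        dependency = defaultdict(float)
        for node in reversed(order):
            for parent in predecessors[node]:
                share = path_count[parent] / path_count[node] * (1.0 + dependency[node])
                betweenness[frozenset((parent, node))] += share
                dependency[parent] += share
    # Every unordered vertex pair was counted once from each endpoint.
    return {edge: value / 2.0 for edge, value in betweenness.items()}


def components(adjacency):
    seen, result = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], []
        while stack:
            node = stack.pop()
            members.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        result.append(sorted(members))
    return result


def betweenness_clustering(edges, target_clusters):
    """Remove the most central edge until `target_clusters` components exist."""
    adjacency = build_adjacency(edges)
    while len(components(adjacency)) < target_clusters:
        scores = edge_betweenness(adjacency)
        if not scores:
            break  # no edges left to remove
        u, v = max(scores, key=scores.get)
        adjacency[u].discard(v)
        adjacency[v].discard(u)
    return components(adjacency)


# Hypothetical graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
clusters = betweenness_clustering(edges, target_clusters=2)
```

Here all nine shortest paths between the two triangles pass through the edge 2-3, so it is removed first. Recomputing betweenness after every removal is the expensive part, which is what the sampling and batch-removal optimisations are meant to reduce.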


GraphWeb

• Web interface for analysing biological graphs

• Simple syntax for entering graphs

– multiple datasets

– directed edges

– edge weights

• Visualising graphs with GraphViz

• Finding biological meaning with g:Profiler

ds1: A > B 10

ds2: A > B 4

ds1: B C 5

ds2: C > D 12
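To make the example lines above concrete, here is a hypothetical parser for them. It is not GraphWeb's actual parser, and the grammar is inferred only from the four lines on this slide: a dataset label followed by a colon, two vertex names, an optional ">" marking a directed edge, and an optional trailing weight (assumed to default to 1).

```python
from typing import NamedTuple


class Edge(NamedTuple):
    dataset: str
    source: str
    target: str
    directed: bool
    weight: float


def parse_edge_line(line):
    """Parse one line such as 'ds1: A > B 10' (assumed grammar, see above)."""
    dataset, _, rest = line.partition(":")
    tokens = rest.split()
    directed = ">" in tokens
    tokens = [token for token in tokens if token != ">"]
    if len(tokens) == 2:
        source, target = tokens
        weight = 1.0  # assumption: a missing weight means 1
    elif len(tokens) == 3:
        source, target = tokens[:2]
        weight = float(tokens[2])
    else:
        raise ValueError(f"cannot parse edge line: {line!r}")
    return Edge(dataset.strip(), source, target, directed, weight)


lines = ["ds1: A > B 10", "ds2: A > B 4", "ds1: B C 5", "ds2: C > D 12"]
parsed = [parse_edge_line(line) for line in lines]
```

Under this reading, "ds1: B C 5" is an undirected edge of weight 5 in dataset ds1, while the other three lines are directed edges.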


Combining several datasets

• Whether or not there is an edge between two vertices is determined in biological experiments, which may sometimes give false results.

• For a given graph different sources may give different information. Some sources may be more trustworthy than others.

• We would like to combine different sources and assess the trustworthiness of each edge in the resulting graph.

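• Edge weight in summary graph: sum over datasets

– w(e, G) = Σ_i w(e, G_i) · w(G_i), where w(e, G_i) is the weight of edge e in dataset G_i and w(G_i) is the weight given to dataset G_i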
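The summary-graph formula can be worked through on a small example. The edge weights below reuse the numbers from the GraphWeb syntax example (ignoring edge direction for simplicity); the dataset weights 1.0 and 0.5 are hypothetical trust values chosen for illustration, as is the function name.

```python
from collections import defaultdict


def combine_datasets(datasets, dataset_weights):
    """Summary weight of each edge: sum over datasets of edge weight times dataset weight."""
    summary = defaultdict(float)
    for name, edges in datasets.items():
        for edge, weight in edges.items():
            summary[edge] += weight * dataset_weights[name]
    return dict(summary)


# Edge weights per dataset; an edge missing from a dataset contributes 0.
datasets = {
    "ds1": {("A", "B"): 10, ("B", "C"): 5},
    "ds2": {("A", "B"): 4, ("C", "D"): 12},
}
# Hypothetical trust in each source: ds1 is trusted twice as much as ds2.
dataset_weights = {"ds1": 1.0, "ds2": 0.5}

summary = combine_datasets(datasets, dataset_weights)
```

The edge A-B appears in both datasets, so its summary weight is 10 · 1.0 + 4 · 0.5 = 12, while C-D, seen only in the less trusted dataset, gets 12 · 0.5 = 6.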


Combining several datasets


The end