The power of graphs for analyzing biological datasets
Davy Suvee, Janssen Pharmaceutica

The power of graphs to analyze biological data

Dec 05, 2014

Transcript
Page 1: The power of graphs to analyze biological data

the power of graphs for analyzing biological datasets

Davy Suvee

Janssen Pharmaceutica

Page 2: The power of graphs to analyze biological data

about me

who am i ...

Davy Suvee (@DSUVEE)

➡ working as an IT lead / software architect @ Janssen Pharmaceutica
• dealing with big scientific data sets
• hands-on expertise in big data and NoSQL technologies

➡ founder of datablend
• provide big data and NoSQL consultancy
• share practical knowledge and big data use cases via blog

Page 3: The power of graphs to analyze biological data

outline

➡ getting visual insights into big data sets
★ gene expression clustering (MongoDB, Neo4j, Gephi)
★ mutation prevalence (Cassandra, Neo4j, Gephi)

➡ fluxgraph, a time machine for your graphs ...

Page 4: The power of graphs to analyze biological data

insights in big data

➡ typical approach through warehousing
★ star schema with fact tables and dimension tables

Page 6: The power of graphs to analyze biological data

insights in big data

★ real-time visualization
★ filtering
★ metrics
★ layout
★ modular [1, 2]

1. http://gephi.org/plugins/neo4j-graph-database-support/
2. http://github.com/datablend/gephi-blueprints-plugin

Page 7: The power of graphs to analyze biological data

gene expression clustering

➡ oncology data set:
★ 4,800 samples
★ 27,000 genes

➡ Question:
★ for a particular subset of samples, which genes are co-expressed?

Page 8: The power of graphs to analyze biological data

mongodb for storing gene expressions

{ "_id" : { "$oid" : "4f1fb64a1695629dd9d916e3" },
  "sample_name" : "122551hp133a21.cel",
  "genomics_id" : 122551,
  "sample_id" : 343981,
  "donor_id" : 143981,
  "sample_type" : "Tissue",
  "sample_site" : "Ascending colon",
  "pathology_category" : "MALIGNANT",
  "pathology_morphology" : "Adenocarcinoma",
  "pathology_type" : "Primary malignant neoplasm of colon",
  "primary_site" : "Colon",
  "expressions" : [ { "gene" : "X1_at", "expression" : 5.54217719084415 },
                    { "gene" : "X10_at", "expression" : 3.92335121981739 },
                    { "gene" : "X100_at", "expression" : 7.81638155662255 },
                    { "gene" : "X1000_at", "expression" : 5.44318512260619 },
                    … ] }

Page 9: The power of graphs to analyze biological data

pearson correlation through map-reduce

pearson correlation

 x    y
43   99
21   65
25   79
42   75
57   87
59   81

r ≈ 0.53
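The example table can be checked with a few lines of Java. This is a minimal sketch, not the map-reduce job from the talk (the slides do not show it); the class and method names are made up for illustration. It computes the coefficient from running sums, which is the form that lends itself to map-reduce because each sum can be accumulated per chunk and added up afterwards.

```java
/** Minimal sketch: Pearson correlation of two equally long series (hypothetical helper). */
final class PearsonSketch {

    /** Returns r computed from the running sums of x, y, x*y, x*x and y*y. */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two series of equal length >= 2");
        }
        int n = x.length;
        double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0, sumYY = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumXY += x[i] * y[i];
            sumXX += x[i] * x[i];
            sumYY += y[i] * y[i];
        }
        double numerator = n * sumXY - sumX * sumY;
        double denominator =
                Math.sqrt((n * sumXX - sumX * sumX) * (n * sumYY - sumY * sumY));
        return numerator / denominator; // NaN if one of the series is constant
    }

    public static void main(String[] args) {
        // The x/y pairs from the slide.
        double[] x = {43, 21, 25, 42, 57, 59};
        double[] y = {99, 65, 79, 75, 87, 81};
        System.out.println("r = " + pearson(x, y));
    }
}
```

For the six pairs on the slide this gives r of roughly 0.53.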

Page 10: The power of graphs to analyze biological data

co-expression graph

➡ create a node for each gene
➡ if the correlation between two genes is >= 0.8, draw an edge between both nodes
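The construction rule above can be sketched in plain Java with an adjacency map. This is only an illustration under simple assumptions: the gene names and expression values are invented, the graph lives in memory instead of Neo4j, and all pairs are compared in one loop, whereas the talk computes the correlations through map-reduce.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: co-expression graph as an adjacency map. */
final class CoExpressionSketch {

    /** Pearson correlation via mean-centred sums. */
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i] / n;
            meanY += y[i] / n;
        }
        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < n; i++) {
            cov += (x[i] - meanX) * (y[i] - meanY);
            varX += (x[i] - meanX) * (x[i] - meanX);
            varY += (y[i] - meanY) * (y[i] - meanY);
        }
        return cov / Math.sqrt(varX * varY);
    }

    /** One node per gene; an undirected edge whenever r >= threshold. */
    static Map<String, Set<String>> build(Map<String, double[]> expressions, double threshold) {
        Map<String, Set<String>> graph = new LinkedHashMap<String, Set<String>>();
        for (String gene : expressions.keySet()) {
            graph.put(gene, new TreeSet<String>()); // node, initially without edges
        }
        List<String> genes = new ArrayList<String>(expressions.keySet());
        for (int i = 0; i < genes.size(); i++) {
            for (int j = i + 1; j < genes.size(); j++) {
                String a = genes.get(i);
                String b = genes.get(j);
                if (pearson(expressions.get(a), expressions.get(b)) >= threshold) {
                    graph.get(a).add(b);
                    graph.get(b).add(a);
                }
            }
        }
        return graph;
    }

    public static void main(String[] args) {
        Map<String, double[]> expressions = new LinkedHashMap<String, double[]>();
        expressions.put("geneA", new double[]{1, 2, 3, 4});   // invented values
        expressions.put("geneB", new double[]{2, 4, 6, 8.5}); // rises with geneA
        expressions.put("geneC", new double[]{4, 1, 3, 2});   // unrelated pattern
        System.out.println(build(expressions, 0.8));
    }
}
```

With 27,000 genes the all-pairs loop would visit about 364 million gene pairs, so a real run needs to be distributed or pruned.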

Page 11: The power of graphs to analyze biological data

co-expression graph

Page 12: The power of graphs to analyze biological data

graphs and time ...

➡ fluxgraph: a blueprints-compatible graph on top of Datomic

➡ make FluxGraph fully time-aware
★ travel your graph through time
★ time-scoped iteration of vertices and edges
★ temporal graph comparison

➡ towards a time-aware graph ...

➡ reproducible graph state

Page 13: The power of graphs to analyze biological data

travel through time

FluxGraph fg = new FluxGraph();

Vertex davy = fg.addVertex();
davy.setProperty("name", "Davy");
Vertex peter = ...
Vertex michael = ...

Edge e1 = fg.addEdge(davy, peter, "knows");

[diagram: vertices Davy, Peter and Michael; a "knows" edge from Davy to Peter]

Page 18: The power of graphs to analyze biological data

travel through time

Date checkpoint = new Date();

davy.setProperty("name", "David");

Edge e2 = fg.addEdge(davy, michael, "knows");

[diagram: after the checkpoint, Davy is renamed to David; "knows" edges now run from David to Peter and from David to Michael]

Page 22: The power of graphs to analyze biological data

travel through time

[diagram: two states of the graph on a timeline. At "checkpoint": Davy, Peter and Michael, with Davy "knows" Peter. At "current time": David, Peter and Michael, with David "knows" Peter and David "knows" Michael.]

➡ by default, the graph is read at the current time

➡ to read the graph as it was at the checkpoint:

fg.setCheckpointTime(checkpoint);
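The idea of reading a graph "as of" a checkpoint can be illustrated with a single property whose history is kept. The sketch below is an assumption-laden toy, not how FluxGraph or Datomic work internally: the class name, the use of logical `long` timestamps and the `asOf` method are all invented for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: one property with its full history, readable "as of" a time. */
final class TimeTravelSketch {

    // logical time -> value that became valid at that time
    private final TreeMap<Long, String> history = new TreeMap<Long, String>();

    /** Records a value that is valid from the given logical time onward. */
    void set(long time, String value) {
        history.put(time, value);
    }

    /** Returns the value visible at the given time, or null if none was set yet. */
    String asOf(long time) {
        Map.Entry<Long, String> entry = history.floorEntry(time);
        return entry == null ? null : entry.getValue();
    }

    /** Returns the latest value, mirroring the "current time by default" behaviour. */
    String current() {
        return history.isEmpty() ? null : history.lastEntry().getValue();
    }

    public static void main(String[] args) {
        TimeTravelSketch name = new TimeTravelSketch();
        name.set(1L, "Davy");   // written before the checkpoint
        long checkpoint = 5L;
        name.set(10L, "David"); // written after the checkpoint
        System.out.println(name.current());        // prints David
        System.out.println(name.asOf(checkpoint)); // prints Davy
    }
}
```

Nothing is overwritten: a later write only adds a newer entry, which is what keeps earlier states reproducible.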

Page 24: The power of graphs to analyze biological data

time-scoped iteration

➡ how to find the version of the vertex you are interested in?

[diagram: timeline with the versions of one vertex: Davy (t1), Davy' (t2), Davy'' (t3), Davy''' (t current); each change creates a new version; "next" links point forward in time and "previous" links point backward]

Vertex previousDavy = davy.getPreviousVersion();
Iterable<Vertex> allDavy = davy.getNextVersions();
Iterable<Vertex> selDavy = davy.getPreviousVersions(filter);
Interval valid = davy.getTimerInterval();
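One way to picture these calls is a doubly linked chain of versions, each valid for a time interval. The sketch below is purely illustrative: `Version`, `VersionFilter`, `chain` and `previousVersions` are invented names, the timestamps are logical numbers, and none of it is taken from FluxGraph's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of a version chain with previous/next links. */
final class VersionChainSketch {

    /** One version of a vertex, valid during [validFrom, validTo). */
    static final class Version {
        final String name;
        final long validFrom;
        final long validTo; // Long.MAX_VALUE for the current version
        Version previous;   // older neighbour, null for the first version
        Version next;       // newer neighbour, null for the current version

        Version(String name, long validFrom, long validTo) {
            this.name = name;
            this.validFrom = validFrom;
            this.validTo = validTo;
        }
    }

    /** Stand-in for the filter argument shown on the slide. */
    interface VersionFilter {
        boolean accept(Version version);
    }

    /** Links versions given oldest first and returns the newest one. */
    static Version chain(List<Version> oldestFirst) {
        Version prev = null;
        for (Version v : oldestFirst) {
            v.previous = prev;
            if (prev != null) {
                prev.next = v;
            }
            prev = v;
        }
        return prev;
    }

    /** Walks the "previous" links, newest first, keeping accepted versions. */
    static List<Version> previousVersions(Version start, VersionFilter filter) {
        List<Version> result = new ArrayList<Version>();
        for (Version v = start.previous; v != null; v = v.previous) {
            if (filter.accept(v)) {
                result.add(v);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Version current = chain(Arrays.asList(
                new Version("Davy", 1, 2),
                new Version("Davy'", 2, 3),
                new Version("Davy''", 3, 4),
                new Version("Davy'''", 4, Long.MAX_VALUE)));
        // Earlier versions that were still valid at or after logical time 3.
        for (Version v : previousVersions(current, x -> x.validTo > 3)) {
            System.out.println(v.name + " [" + v.validFrom + ", " + v.validTo + ")");
        }
    }
}
```

The filter decides which versions are returned, and the interval on each version says when that version was the valid one.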

Page 30: The power of graphs to analyze biological data

time-scoped iteration

➡ When does an element change?

➡ vertex:
★ setting or removing a property
★ adding it to or removing it from an edge
★ being removed

➡ edge:
★ setting or removing a property
★ being removed

➡ ... and each element is time-scoped!

Page 33: The power of graphs to analyze biological data

temporal graph comparison

[diagram: the graph at the checkpoint (Davy, Peter, Michael; Davy "knows" Peter) next to the current graph (David, Peter, Michael; David "knows" Peter and Michael)]

➡ what changed?

Page 34: The power of graphs to analyze biological data

temporal graph comparison

➡ difference (A , B) = union (A , B) - B

➡ ... as an (immutable) graph!

[diagram: difference ( , ) of the two graph states = the vertex David with one "knows" edge]
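The difference formula can be tried out on plain sets. In this sketch each graph state is flattened into a set of strings, one per vertex or edge state; that encoding, the element ids (`v1`, `e2`, ...) and the class name are assumptions made for illustration and say nothing about how FluxGraph represents or compares graphs.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical sketch: temporal difference on graph states encoded as string sets. */
final class GraphDifferenceSketch {

    /** difference(A, B) = union(A, B) - B, returned as an unmodifiable set. */
    static Set<String> difference(Set<String> a, Set<String> b) {
        Set<String> result = new LinkedHashSet<String>(a);
        result.addAll(b);    // union(A, B)
        result.removeAll(b); // ... minus everything that is in B
        return Collections.unmodifiableSet(result);
    }

    public static void main(String[] args) {
        Set<String> checkpoint = new LinkedHashSet<String>(Arrays.asList(
                "v1{name=Davy}", "v2{name=Peter}", "v3{name=Michael}",
                "e1{v1-knows->v2}"));
        Set<String> current = new LinkedHashSet<String>(Arrays.asList(
                "v1{name=David}", "v2{name=Peter}", "v3{name=Michael}",
                "e1{v1-knows->v2}", "e2{v1-knows->v3}"));
        // Everything in the current state that was not there at the checkpoint.
        System.out.println(difference(current, checkpoint));
        // prints [v1{name=David}, e2{v1-knows->v3}]
    }
}
```

What remains is the renamed vertex and the new "knows" edge, which matches the result shown on the slide.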

Page 36: The power of graphs to analyze biological data

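use case: longitudinal patient data

[diagram: one patient's graph over time. t1: patient. t2: patient linked to "smoking". t3: patient linked to "smoking". t4: patient linked to "cancer". t5: patient linked to "cancer" and "death".]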

Page 37: The power of graphs to analyze biological data

use case: longitudinal patient data

➡ historical data for 15,000 patients over a period of 10 years (2001-2010)

➡ example analysis:
★ if a male patient is no longer smoking in 2005
★ what are the chances of getting lung cancer in 2010, comparing
   • patients that smoked before 2005
   • patients that never smoked

Page 39: The power of graphs to analyze biological data

use case: longitudinal patient data

➡ get all male non-smokers in 2005

// assuming Joda-Time 2.x, whose DateTime constructors also take hour and minute
fg.setCheckpointTime(new DateTime(2005, 12, 31, 0, 0).toDate());

Iterator<Vertex> males = fg.getVertices("gender", "male").iterator();

// OUT is com.tinkerpop.blueprints.Direction.OUT (static import)
while (males.hasNext()) {
    Vertex p2005 = males.next();
    // true if the patient has a smokingStatus edge at the checkpoint
    boolean smoking2005 = p2005.getEdges(OUT, "smokingStatus").iterator().hasNext();
}

Page 42: The power of graphs to analyze biological data

use case: longitudinal patient data

➡ which patients were smoking before 2005?

boolean smokingBefore2005 = ((FluxVertex) p2005).getPreviousVersions(new TimeAwareFilter() {
    public TimeAwareElement filter(TimeAwareVertex element) {
        return element.getEdges(OUT, "smokingStatus").iterator().hasNext() ? element : null;
    }
}).iterator().hasNext();

Page 43: The power of graphs to analyze biological data

use case: longitudinal patient data

➡ which patients have cancer in 2010?

Graph g = fg.difference(smokerws, time2010.toDate(), time2005.toDate());

★ smokerws: working set of smokers

➡ extract the patients that have an edge to the cancer node

Page 45: The power of graphs to analyze biological data

Questions?