© 2013 SpringOne 2GX. All rights reserved. Do not distribute without permission.
Redbasin Networks: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis
By Smita Kulkarni Gudur and Manoj Joshi
Friday, September 6, 13
Introduction
Smitha Kulkarni Gudur, CEO
Manoj Joshi, CTO
Allan Grimes, VP Business
Neeta Potdar, HR & Admin
Redbasin Networks Overview
Redbasin Networks provides a cloud based platform for cancer drug researchers in Pharma and Bio-tech.
Redbasin is a scalable technology and platform that allows Life Science researchers to gain insights about viable drug molecules and pathways.
Cancer Ecosystem Today (It’s complex!)
EPA
CDC
Universities
NIH/NLM
Hospitals, Treatment Centers
Biotech Labs
Legal
Instrument vendors
Certification, Approval
Lab tests
Patients
Insurance
Pharma
Contract Research Organization
Drug Labs
FDA
Cancer Market Research
US cancer spending $108b
89m deaths, 2005-2015
Redbasin Networks
10% of top 200 drugs cancer related, generate $1b/yr
1.5m new cancer deaths
Spring Data: Redbasin Cancer Research
Spring Data
Protein Gene Disease Drug Antibody Ligand Complex Epigenetics
MongoDB Neo4j Redis HBase Lucene
Cloud
REST API
XML JSON
Typical Drug Life Cycle Costs
Why Not Go Relational?
Oncological meta-data is multi-dimensional
Pervasive joins are a drag on performance
Unpredictable schemas during mining
Temporality is difficult to represent
Redbasin Core Data Technologies
• Mongo
• Neo4J
• Redis
• Lucene
• HBase/Hadoop
Why Mongo?
Lots of XML and JSON documents
Very easy to use
High performance and scalability
Strong Java & REST Support
Why Neo4j?
Neo4j is a modern graph database
Very easy to use
Complex features that are used less often have been dropped
Strong Java & REST Support
How does Redbasin use Neo4J?
We have 225 oncology dimensions
Everything is either a node, a relationship, or a property
We use indexes liberally
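The node, relationship, and property model above can be sketched in plain Java. This is only an in-memory illustration and not the Neo4j API; the names `GraphSketch`, `SketchNode`, `SketchRel`, and the exact-match index layout are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for a property graph; not the Neo4j API.
class GraphSketch {
    // A node carries only properties.
    static class SketchNode {
        final Map<String, Object> props = new HashMap<>();
    }

    // A relationship has a type, a start node, an end node, and its own properties.
    static class SketchRel {
        final String type;
        final SketchNode start;
        final SketchNode end;
        final Map<String, Object> props = new HashMap<>();

        SketchRel(String type, SketchNode start, SketchNode end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    final List<SketchRel> rels = new ArrayList<>();
    // Exact-match index: "key=value" -> node.
    final Map<String, SketchNode> index = new HashMap<>();

    // Creates a node with one property and indexes it under that property.
    SketchNode createNode(String key, Object value) {
        SketchNode n = new SketchNode();
        n.props.put(key, value);
        index.put(key + "=" + value, n);
        return n;
    }

    SketchRel relate(SketchNode start, String type, SketchNode end) {
        SketchRel r = new SketchRel(type, start, end);
        rels.add(r);
        return r;
    }

    // Returns the indexed node, or null when nothing matches.
    SketchNode lookup(String key, Object value) {
        return index.get(key + "=" + value);
    }
}
```

Indexing every node on creation is what keeps later lookups cheap; the cost is extra memory per indexed property.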
Numerous dimensions and sub-dimensions in Redbasin’s big data
Protein Gene Disease Drug Antibody Ligand
Epigenetics Ontology
Aminoacid
Structure PD/PK Physicochemical
Research Experiment
Interaction
Researcher Institute
Pathway
Organism Instrument Method
Enzyme
Time Location FDA Pharma ClinicalTrial
Dimensions have sub-dimensions
Principal Dimension: Pharmacodynamics (What does the drug do to the body?)
Sub-dimensions: Absorption, Distribution, Metabolism, Elimination, Toxicity
Data is Logical. But Big Data is not.
Real-time lookups
Understands human idiosyncrasies
Logical
Impressive computational abilities
Data is more than just data
Asymptotic convergence to human
No enterprise! Just plain cloud...
Perhaps a Nebula(e), but why?
• Contextual correlation
• Ontology driven
• Multi-dimensional
• Hierarchical
• Fractal like
• Clustering
• Dynamic/Evolving
• Stars (facts) are born
• Zoom for details
• Humongous
• Transparency
• Dynamic metadata*
• Interconnected
• Graph like
• Complexity
How does Redbasin use Spring Data?
Redbasin Cloud connects to hundreds of cancer data sources
Redbasin uses contextual mining to create dynamic models
We map nodes, relationships, and attributes to the Redbasin Object Model
We separate analytics from queries
Neo4J Node Index Example
IndexHits<Node> pNodeHits = drugIdIndex.get(DRUG_ID, drugConceptCode);
if (pNodeHits != null && pNodeHits.size() > 0) { // if node already exists
    drugNode = pNodeHits.getSingle();
    if (drugNode != null) {
        if (!drugNode.hasProperty(DRUG_CONCEPT_CODE)) {
            drugNode.setProperty(DRUG_CONCEPT_CODE, drugConceptCode);
        }
        if (!drugNode.hasProperty(BioEntityTypes.NODE_TYPE)) {
            drugNode.setProperty(BioEntityTypes.NODE_TYPE, BioEntityTypes.RB_DRUG);
        }
    }
}
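The slide shows only the branch where the node already exists. A minimal sketch of the full get-or-create pattern follows, assuming a missing node should be created and indexed. It uses an in-memory map instead of a Neo4j index, and the class name, property keys, and type label are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for the index lookup pattern; not the Neo4j API.
class NodeIndexSketch {
    static final String DRUG_CONCEPT_CODE = "drugConceptCode"; // hypothetical property key
    static final String NODE_TYPE = "nodeType";                // hypothetical property key
    static final String RB_DRUG = "Drug";                      // hypothetical type label

    // Index from concept code to the node's property map.
    private final Map<String, Map<String, Object>> drugIdIndex = new HashMap<>();

    Map<String, Object> getOrCreateDrug(String drugConceptCode) {
        Map<String, Object> drugNode = drugIdIndex.get(drugConceptCode);
        if (drugNode == null) {
            // Not indexed yet: create the node and index it.
            drugNode = new HashMap<>();
            drugIdIndex.put(drugConceptCode, drugNode);
        }
        // Fill in any property that is still missing, leaving existing values alone.
        drugNode.putIfAbsent(DRUG_CONCEPT_CODE, drugConceptCode);
        drugNode.putIfAbsent(NODE_TYPE, RB_DRUG);
        return drugNode;
    }

    int size() {
        return drugIdIndex.size();
    }
}
```

Calling `getOrCreateDrug` twice with the same code returns the same node, so repeated imports from overlapping data sources do not create duplicates.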
Spring Stack: Spring Data with Mongo JSON
"@molecule_type" : "complex",
"@id" : "208314",
"Name" : {
    "@name_type" : "PF",
    "@long_name_type" : "preferred symbol",
    "@value" : "TXA2/TP beta/beta Arrestin3/RAB11/GDP"
},
"ComplexComponentList" : [
    { "@molecule_idref" : "202489" },
    { "@molecule_idref" : "202493",
      "PTMExpression" : [
          { "@protein" : "O75228", "@position" : "239", "@aa" : "C", "@modification" : "palmitoylation" }
Redbasin data grows and changes over time
Spring Data with Mongo Objects
Collection ideal for Redbasin’s unstructured data
Retrieve nested objects with ease
"participantList.experimentalRoleList.experimentalRole.xref.secondaryRef.@db" : "pubmed"
DBObject utilities well suited for mapping to BioEntities
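To illustrate how a dotted path reaches a nested value, here is a minimal sketch that walks nested `Map` objects. It is not the MongoDB driver: real Mongo dot notation also traverses arrays, which this sketch does not handle, and the name `NestedPathSketch` is hypothetical.

```java
import java.util.Map;

// Plain-Java illustration of dotted-path lookup over nested maps; not the MongoDB driver.
class NestedPathSketch {
    // Walks the dotted path one key at a time; returns null if any step is missing.
    static Object resolve(Map<String, Object> doc, String dottedPath) {
        Object current = doc;
        for (String key : dottedPath.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }
}
```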
Spring Data: Redis
Key
Value
Usage: ontology and taxonomy lookups stored as unique key-value pairs, and auto-completion, since our data is “N” column based
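Prefix-based auto-completion over a sorted set of terms can be sketched as below. This is an in-memory stand-in using `TreeSet`, not Redis or Jedis calls, and it is not necessarily how Redbasin implements it; the class name `AutoCompleteSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// In-memory sketch of prefix completion over a sorted set; not a Redis client.
class AutoCompleteSketch {
    private final TreeSet<String> terms = new TreeSet<>();

    void add(String term) {
        terms.add(term.toLowerCase(Locale.ROOT));
    }

    // Returns up to `limit` stored terms that start with the prefix, in sorted order.
    List<String> complete(String prefix, int limit) {
        String lo = prefix.toLowerCase(Locale.ROOT);
        // Upper bound that sorts after ordinary strings sharing this prefix.
        String hi = lo + Character.MAX_VALUE;
        List<String> out = new ArrayList<>();
        for (String t : terms.subSet(lo, true, hi, false)) {
            if (out.size() == limit) {
                break;
            }
            out.add(t);
        }
        return out;
    }
}
```

Because the set is sorted, all candidates for a prefix sit in one contiguous range, so a lookup never scans unrelated terms.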
Redis - Ontology Lookups
Ontology Lookups Can Be Very Handy
Redis - Analytics Cache
MineBot and Multi-entity Analytics is Nifty
Redis - Managing Aliases
Gene Aliases for Instance are Numerous
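Alias management amounts to mapping many names onto one canonical symbol. A minimal in-memory sketch follows; it stands in for a key-value store and is not Jedis or Spring Data Redis code. The class name `AliasSketch` and the case-insensitive normalization rule are assumptions.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// In-memory sketch of alias resolution; each alias is a key whose value is the canonical symbol.
class AliasSketch {
    private final Map<String, String> aliasToSymbol = new HashMap<>();

    // Registers the symbol itself and every alias under a normalized key.
    void register(String symbol, String... aliases) {
        aliasToSymbol.put(normalize(symbol), symbol);
        for (String alias : aliases) {
            aliasToSymbol.put(normalize(alias), symbol);
        }
    }

    // Returns the canonical symbol, or null when the name is unknown.
    String canonical(String name) {
        return aliasToSymbol.get(normalize(name));
    }

    // Assumption: names are matched case-insensitively and ignoring surrounding whitespace.
    private static String normalize(String s) {
        return s.trim().toUpperCase(Locale.ROOT);
    }
}
```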
Redis - Key Value Pairs
Large Number of Key Value Pairs
Key Value
ATP Adenosine Tri-phosphate
Redis - Slaves
Redis Slaves Simply Work
Redis - Monitor
https://github.com/nkrode/RedisLive
Redis - Subgraph Caching
• Subgraph Similarity Analytics
• Pathway Rules Cache
Redis - Spring data
• Using connection package Jedis
• Spring’s data access exception for Redis driver
• Built abstraction - Redis template
• Not using pubsub support
• Using our own JSON/XML mapping serializers
• Atomic counter for Redis - useful
• Sorting (using) and pipelining (not using)
• Not using the Spring 3.1 cache abstraction
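The atomic counter mentioned above behaves like the sketch below: each increment is indivisible, and a counter that does not exist yet starts from zero, so the first increment returns 1 (the semantics of the Redis INCR command). This is an in-memory stand-in built on `java.util.concurrent`, not Spring Data Redis code, and `CounterSketch` is a hypothetical name.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// In-memory stand-in for a named atomic counter; not a Redis client.
class CounterSketch {
    private final ConcurrentHashMap<String, AtomicLong> counters = new ConcurrentHashMap<>();

    // Atomically adds one to the named counter and returns the new value.
    // A missing counter starts at zero, so the first call returns 1.
    long increment(String key) {
        return counters.computeIfAbsent(key, k -> new AtomicLong()).incrementAndGet();
    }

    // Reads the current value without changing it; unknown counters read as 0.
    long get(String key) {
        AtomicLong c = counters.get(key);
        return c == null ? 0L : c.get();
    }
}
```

Concurrent callers never lose an update, which is the property that makes such a counter useful for shared tallies.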
Spring Data: Redis Usage
Key | Value
NCBI_TAXONOMY_ID, Key: 9606 | Homo sapiens
DISEASE_CODE, Key: x46859 | Metastases from colorectal carcinoma
HGNC_ID (Human Gene Identifier), Key: 1817 | CEACAM5
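Lookups like these need keys that stay unique across identifier types. One common convention, assumed here and not confirmed by the slides, is to prefix each id with its namespace. The sketch is an in-memory map, not Redis, and `NamespacedKeySketch` and the `:` separator are hypothetical choices.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory sketch of namespaced key-value lookups; not a Redis client.
class NamespacedKeySketch {
    private final Map<String, String> store = new HashMap<>();

    // Builds a composite key such as "NCBI_TAXONOMY_ID:9606" (separator is an assumption).
    static String key(String namespace, String id) {
        return namespace + ":" + id;
    }

    void put(String namespace, String id, String value) {
        store.put(key(namespace, id), value);
    }

    // Returns the stored value, or null when the namespace/id pair is unknown.
    String get(String namespace, String id) {
        return store.get(key(namespace, id));
    }
}
```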
Redbasin vs Other BioModels
Redbasin | Other BioModels
Focused on Oncology | No focus on any specific Disease
Commercial/public domain correlations | Focused on academic knowledge
Information density is “infinite” | Information size is “infinite”
Temporality/pathway dependent | No time element
Hybrid vendor strategy | No co-existence scenario
One cloud for all Oncology | Typically downloadable software
Neo4J Node Validation
Beclin 1 Gene
Bcl-2 Protein
Apoptosis
binds-to
inhibits
Biologically aware nodes and relationships
Spring Data Relationship Entity
@RelationshipEntity
public class BioRelation { }
Annotation for @RelationshipEntity
Metadata for recognition of a relationship class
Convenient relationship abstraction
Relationships always have start/end nodes
@RelationshipEntity
public class BioRelation {
    @EndNode private Object endNode;
    @StartNode private Object startNode;
}
• A unique field must be marked as @EndNode
• A unique field must be marked as @StartNode
• Field can be any variable name
• Flexibility for the programmer
• Must be @BioEntity class
Spring Data Relationship Entity
@RelationshipEntity
public class BioRelation {
    .....
    @GraphId private Long id;
    .....
}
• Id of the relationship
• This is an unreliable field
• But we have it here for reference
Spring Data Relationship Entity
@RelationshipEntity
public class BioRelation {
    .....
    @RelProperty private String name;
    ....
}
• @RelProperty tells if this is a property
• There could be non-property fields
• The property here is “name”
• It’s always a String
Spring Data Relationship Entity
@RelationshipEntity
public class BioRelation {
    ....
    @RelType private String relType;
    @RelProperty private String message;
}
• @RelType is the actual relation
• message is a default @RelProperty
Spring Data Relationship Entity
@RelationshipEntity
public class BioRelation {
    @EndNode private Object endNode;
    @StartNode private Object startNode;
    @GraphId private Long id;
    @RelProperty private String name;
    @RelType private String relType;
    @RelProperty private String message;
}
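A mapper could turn the `@RelProperty` fields of such a class into relationship properties by reflection. The following is a sketch under that assumption, not the actual Redbasin mapper: the annotations are local stand-ins, and `RelPropertySketch`, `SampleRelation`, and the sample values are hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

class RelPropertySketch {
    // Stand-in annotations, declared locally so the sketch is self-contained.
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
    @interface RelProperty {}

    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
    @interface RelType {}

    static class SampleRelation {
        @RelType private String relType = "inhibits";
        @RelProperty private String name = "sample";
        @RelProperty private String message = "hello";
        private String scratch = "ignored"; // non-property field
    }

    // Collects every @RelProperty field into a name -> value map, sorted by field name.
    static Map<String, Object> properties(Object relation) {
        Map<String, Object> out = new TreeMap<>();
        for (Field f : relation.getClass().getDeclaredFields()) {
            if (f.isAnnotationPresent(RelProperty.class)) {
                try {
                    f.setAccessible(true); // the fields are private
                    out.put(f.getName(), f.get(relation));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException("cannot read " + f.getName(), e);
                }
            }
        }
        return out;
    }
}
```

Fields without `@RelProperty`, including the `@RelType` field, are left out of the property map.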
Spring Data-isms
@Retention(RetentionPolicy.RUNTIME)
public @interface BioEntity {
    public BioTypes bioType();
}
@Retention(RetentionPolicy.RUNTIME)
public @interface RelationshipEntity { }
Spring Data-isms Neo4j
@Retention(RetentionPolicy.RUNTIME)
public @interface RelatedTo {
    public Direction direction() default Direction.BOTH;
    BioRelTypes relType() default BioRelTypes.DEFAULT_RELATION;
    public Class<?> elementClass() default Object.class;
    public BioTypes endNodeBioType() default BioTypes.UNKNOWN;
    public BioTypes startNodeBioType() default BioTypes.UNKNOWN;
}
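Annotation defaults like these can be read back at run time. The sketch below redeclares a stand-in `@RelatedTo` with minimal enums so that it compiles on its own. Only `BOTH`, `DEFAULT_RELATION`, and `UNKNOWN` come from the slide; every other enum constant, plus `RelatedToSketch`, `SampleGene`, and `metadata`, is hypothetical.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Field;

class RelatedToSketch {
    enum Direction { INCOMING, OUTGOING, BOTH }      // INCOMING and OUTGOING are hypothetical here
    enum BioTypes { UNKNOWN, GENE, PROTEIN }         // GENE and PROTEIN are hypothetical
    enum BioRelTypes { DEFAULT_RELATION, BINDS_TO }  // BINDS_TO is hypothetical

    @Retention(RetentionPolicy.RUNTIME)
    @interface RelatedTo {
        Direction direction() default Direction.BOTH;
        BioRelTypes relType() default BioRelTypes.DEFAULT_RELATION;
        Class<?> elementClass() default Object.class;
        BioTypes endNodeBioType() default BioTypes.UNKNOWN;
        BioTypes startNodeBioType() default BioTypes.UNKNOWN;
    }

    static class SampleGene {
        @RelatedTo
        Object anything; // relies entirely on the defaults

        @RelatedTo(direction = Direction.OUTGOING, relType = BioRelTypes.BINDS_TO,
                   startNodeBioType = BioTypes.GENE, endNodeBioType = BioTypes.PROTEIN)
        Object boundProtein;
    }

    // Reads the @RelatedTo metadata of a named field; returns null if the field is not annotated.
    static RelatedTo metadata(Class<?> type, String fieldName) {
        try {
            Field f = type.getDeclaredField(fieldName);
            return f.getAnnotation(RelatedTo.class);
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("no such field: " + fieldName, e);
        }
    }
}
```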
Bio Entity
@Retention(RetentionPolicy.RUNTIME)
public @interface BioEntity {
    public BioTypes bioType();
}
• This is usually a node in Neo4J
• @Retention - How long to retain annotations?
• CLASS - Annotations are to be recorded in the class file by the compiler but need not be retained by the VM at run time.
• RUNTIME - Annotations are to be recorded in the class file by the compiler and retained by the VM at run time, so they may be read reflectively.
• SOURCE - Annotations are to be discarded by the compiler.
End Node annotation
package com.redbasin.bio.meta;
@Retention(RetentionPolicy.RUNTIME)
@Target({ ElementType.ANNOTATION_TYPE, ElementType.FIELD })
public @interface Reference {}
@Retention(RetentionPolicy.RUNTIME)
@Target({ ElementType.FIELD, ElementType.METHOD })
@Reference
public @interface EndNode {}
• There is no concept of start and end nodes in Neo4J
• This is a Redbasin abstraction
• The @Reference can be used by annotation types and fields only
• The annotation @EndNode can be used by methods and fields only
• It cannot be used by classes or other elements
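Because `@EndNode` is itself annotated with `@Reference`, code can treat any field as a reference either directly or through such a meta-annotation. The sketch below shows one way to detect both cases. It redeclares the two annotations locally, and `ReferenceSketch`, `SampleRelation`, and `referenceFields` are hypothetical names, not part of the Redbasin code base.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ReferenceSketch {
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ ElementType.ANNOTATION_TYPE, ElementType.FIELD })
    @interface Reference {}

    @Retention(RetentionPolicy.RUNTIME)
    @Target({ ElementType.FIELD, ElementType.METHOD })
    @Reference
    @interface EndNode {}

    static class SampleRelation {
        @EndNode Object target;   // a reference through the meta-annotation
        @Reference Object direct; // a reference directly
        String plain;             // not a reference
    }

    // Sorted names of fields that are references, directly or via a meta-annotated annotation.
    static List<String> referenceFields(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            boolean isRef = f.isAnnotationPresent(Reference.class);
            for (Annotation a : f.getAnnotations()) {
                if (a.annotationType().isAnnotationPresent(Reference.class)) {
                    isRef = true;
                }
            }
            if (isRef) {
                names.add(f.getName());
            }
        }
        Collections.sort(names); // reflection does not guarantee field order
        return names;
    }
}
```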
Redbasin Open Doc Share
https://github.com/redbasin/redbasin-org
• It’s our “social taxonomy” for scientific documents
• github community project
• Scientists can collaborate over zillions of documents and media
• Downloadable code, can run in cloud mode
• Can be modified to support any data access
• Redbasin.org uses it for collaboration in schools
• A Spring champion cause, underprivileged schools
What can developers do?
• Help us with development of our public domain API
• We support jQuery, D3.js, JSON/XML, REST and more
• We support Android, iOS on mobiles/tablets
• Spring Data integration - developer plugins
Redbasin Cloud Projects
OpenStack Project
Cloud Foundry Integration
AWS Project
Why have Java developers chosen Spring?
DI
AOP
TX
Core Model
J(2)EE usability
Testable, lightweight model for programming
Application Portability
Powerful Service Abstractions
Deployment Flexibility
Spring
Deploy to Cloud or on premise
Big, Fast, Flexible Data
Web, Integration, Batch
Core Model
GemFire
Spring Stack
DI AOP TX JMS JDBC
MVC Testing
ORM OXM Scheduling
JMX REST Caching Profiles Expression
Spring Framework
HATEOAS
JPA 2.0 JSF 2.0 JSR-250 JSR-330 JSR-303 JTA JDBC 4.1
Java EE 1.4+/SE5+
JMX 1.0+
WebSphere 6.1+
WebLogic 9+
GlassFish 2.1+
Tomcat 5+
OpenShift
Google App Eng.
Heroku
AWS Beanstalk
Cloud Foundry
Spring Web Flow Spring Security
Spring Batch Spring Integration
Spring Security OAuth
Spring Social
Twitter LinkedIn Facebook
Spring Web Services
Spring AMQP
Spring Data
Redis HBase
MongoDB JDBC
JPA QueryDSL
Neo4j
GemFire
Solr Splunk
HDFS MapReduce Hive
Pig Cascading
Spring for Apache Hadoop
SI/Batch
Spring XD
Learn More. Stay Connected.
Contact Redbasin: bit.ly/redbasin
Talk to us on Twitter: @springcentral
Find session replays on YouTube: spring.io/video