TranSMART Hackathon Introduction Amsterdam 2015

Hackathon Introduction TRANSMART ANNUAL MEETING 2015

AMSTERDAM, OCTOBER 19, 2015

Kees van Bochove, Chair Architecture Working Group @ tranSMART Foundation | CEO @ The Hyve

Hackathon Topics

• Building a proof of concept (POC) that uses SparkR on Amazon EC2 as a computational backend for tranSMART 1.3

• Improving the visual analytics in tranSMART by updating or adding analytics workflows in the SmartR plugin

Apache Spark

• Largest and most active open-source project in data science as of 2015

• Seen by many as a ‘replacement’ for Hadoop [MapReduce] in the big data area

• Implements lessons learned from Hadoop; built from the ground up as a framework to support data scientists

• Core concept: in-memory datasets and lazy evaluation
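
The core concept above can be illustrated with a toy model. This is a minimal sketch in plain Python, not the Spark API: the class `LazyDataset` and the helpers `square` and `is_even` are hypothetical names made up for this example. Transformations are only recorded; nothing runs until an action (`collect`) is called, and the result is then kept in memory for reuse.

```python
class LazyDataset:
    """Toy model of a lazily evaluated dataset (illustration only, not Spark)."""

    def __init__(self, source, ops=()):
        self._source = source    # any re-iterable, e.g. a list or range
        self._ops = tuple(ops)   # pending (kind, function) pairs
        self._cached = None      # filled on the first collect()

    def map(self, fn):
        # Transformation: record the step, compute nothing yet.
        return LazyDataset(self._source, self._ops + (("map", fn),))

    def filter(self, pred):
        # Transformation: record the step, compute nothing yet.
        return LazyDataset(self._source, self._ops + (("filter", pred),))

    def collect(self):
        # Action: run the recorded pipeline once and keep the result in memory.
        if self._cached is None:
            items = iter(self._source)
            for kind, fn in self._ops:
                items = map(fn, items) if kind == "map" else filter(fn, items)
            self._cached = list(items)
        return list(self._cached)


calls = []  # records every element that actually gets computed


def square(x):
    calls.append(x)
    return x * x


def is_even(x):
    return x % 2 == 0


pipeline = LazyDataset(range(6)).map(square).filter(is_even)
calls_before_action = list(calls)  # still empty: nothing has run yet
result = pipeline.collect()        # the whole pipeline runs here
```

Calling `pipeline.collect()` a second time returns the in-memory copy without calling `square` again.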

SparkR architecture

Hackathon Goal: SparkR integration

• Task: Integrate Spark with the tranSMART database by implementing the Spark RDD interface in the tranSMART core API

• Goal: Demonstrate scalability on the compute side (scalability on the database side remains limited by the relational database paradigm)

• Benefit: Spark-compatible applications (e.g. machine learning and big data analytics tools) can be used on top of tranSMART
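
In Spark, an RDD implementation mainly supplies a list of partitions and a way to compute the rows of one partition. The sketch below mimics that split in plain Python against an in-memory SQLite table, only to make the task concrete. It is not the tranSMART core API or the Spark interface itself: the `observation` table, its columns, and the functions `make_demo_db`, `get_partitions`, and `compute` are assumptions made for illustration.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway in-memory table (hypothetical schema, not tranSMART's)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE observation (patient_id INTEGER, value REAL)")
    conn.executemany(
        "INSERT INTO observation VALUES (?, ?)",
        [(pid, pid * 1.5) for pid in range(1, 11)],
    )
    return conn


def get_partitions(min_id, max_id, num_partitions):
    """Split the non-empty inclusive id range into contiguous (lo, hi) chunks."""
    total = max_id - min_id + 1
    size = -(-total // num_partitions)  # ceiling division
    return [
        (lo, min(lo + size - 1, max_id))
        for lo in range(min_id, max_id + 1, size)
    ]


def compute(conn, partition):
    """Lazily yield the rows that belong to one partition."""
    lo, hi = partition
    cursor = conn.execute(
        "SELECT patient_id, value FROM observation "
        "WHERE patient_id BETWEEN ? AND ? ORDER BY patient_id",
        (lo, hi),
    )
    yield from cursor


conn = make_demo_db()
partitions = get_partitions(1, 10, 3)
# Each partition could be processed by a separate worker; here they run in turn.
partition_sums = [sum(value for _, value in compute(conn, p)) for p in partitions]
```

Every partition still queries the same relational database, which is the limit on database-side scalability noted in the goal above; only the per-partition computation can be spread across workers.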

Current Architecture

Goal architecture

SparkR architecture

SmartR

• Plugin for tranSMART 1.3 written by Sascha Herzinger, University of Luxembourg, for IMI eTRIKS

• Currently being extended by The Hyve and Sanofi

Current SmartR analytics

• Boxplot

• Correlation Analysis

• Heatmap

• Timeline Analysis

• Your ideas?

Hackathon Goal: Visual Analytics

• Improve existing analytics workflows (e.g. heatmap)

• Add new workflows

• D3.js library for building visualizations
