Hackathon Introduction
TRANSMART ANNUAL MEETING 2015
AMSTERDAM, OCTOBER 19, 2015
Kees van Bochove, Chair Architecture Working Group @ tranSMART Foundation | CEO @ The Hyve
Hackathon Topics
• Building a proof of concept (POC) around using SparkR on Amazon EC2 as a computational backend for tranSMART 1.3
• Improving the visual analytics in tranSMART by updating or adding analytics workflows in the SmartR plugin
Apache Spark
• Largest and most active open source project in data science as of 2015
• Seen by many as a ‘replacement’ for Hadoop [MapReduce] in the big data area
• Implements lessons learned from Hadoop; built from the ground up as a framework to support data scientists
• Core concepts: in-memory datasets and lazy evaluation
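The two core concepts can be illustrated with a toy model. The sketch below is plain Python, not Spark's API: the class `LazyDataset` and its methods are hypothetical names chosen for illustration. Transformations are only recorded, the source is read when `collect()` is called, and `cache()` keeps the result in memory so later calls do not read the source again.

```python
from typing import Callable, Iterable, List, Optional


class LazyDataset:
    """Toy stand-in for a Spark dataset (hypothetical, not Spark's API)."""

    def __init__(self, source: Callable[[], Iterable], steps=()):
        self._source = source          # function that reads the raw data
        self._steps = tuple(steps)     # recorded transformations, not yet run
        self._use_cache = False
        self._cache: Optional[List] = None

    def map(self, fn):
        # Lazy: returns a new dataset with one more recorded step.
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        # Lazy: nothing is read or computed here either.
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    def cache(self):
        # Ask for the result to be kept in memory after the first action.
        self._use_cache = True
        return self

    def collect(self):
        # Action: this is the point where the recorded steps actually run.
        if self._cache is not None:
            return list(self._cache)
        items = iter(self._source())
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        result = list(items)
        if self._use_cache:
            self._cache = result
        return list(result)


reads = []  # grows by one every time the source is actually read


def load():
    reads.append(1)
    return range(10)


squares = LazyDataset(load).filter(lambda x: x % 2 == 0).map(lambda x: x * x).cache()
reads_before_collect = len(reads)  # 0: defining the pipeline read nothing

first = squares.collect()   # [0, 4, 16, 36, 64], reads the source once
second = squares.collect()  # same result, served from the in-memory cache
```

After both `collect()` calls, `reads` has length 1: the second call never touched the source.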
SparkR architecture
Hackathon Goal: SparkR integration
• Task: Integrate Spark with the tranSMART database by implementing the Spark RDD interface in the tranSMART core API
• Goal: Demonstrate scalability on the compute side (scalability on the database side remains limited by the relational database paradigm)
• Benefit: Spark-compatible applications (e.g. machine learning and big data analytics tools) can run on top of tranSMART
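The task above comes down to exposing database rows as a dataset that is split into partitions, each of which can be computed independently. The following is a rough sketch of that idea in plain Python, using an in-memory SQLite table as a stand-in for the relational store. Everything here is an assumption made for illustration: the class `PartitionedQuery`, the table `observation`, and its columns are hypothetical and are not the tranSMART core API or Spark's RDD interface.

```python
import sqlite3
from typing import Callable, Iterator, List, Tuple


class PartitionedQuery:
    """Hypothetical RDD-like wrapper: each partition is a range of row ids."""

    def __init__(self, conn: sqlite3.Connection, table: str,
                 lo: int, hi: int, num_partitions: int):
        self.conn = conn
        self.table = table  # trusted constant in this sketch, never user input
        step = -(-(hi - lo + 1) // num_partitions)  # ceiling division
        self.partitions: List[Tuple[int, int]] = [
            (start, min(start + step - 1, hi))
            for start in range(lo, hi + 1, step)
        ]

    def compute(self, partition: Tuple[int, int]) -> Iterator[tuple]:
        # Fetch only the rows that belong to this partition.
        start, end = partition
        sql = (f"SELECT id, value FROM {self.table} "
               "WHERE id BETWEEN ? AND ? ORDER BY id")
        return iter(self.conn.execute(sql, (start, end)))

    def map_partitions(self, fn: Callable[[Iterator[tuple]], float]) -> List[float]:
        # A real cluster would run these on separate workers; here they
        # run sequentially, which is enough to show the structure.
        return [fn(self.compute(p)) for p in self.partitions]


# Stand-in data: ten rows with value = id * 1.5.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE observation (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany("INSERT INTO observation VALUES (?, ?)",
                 [(i, i * 1.5) for i in range(1, 11)])

rdd = PartitionedQuery(conn, "observation", lo=1, hi=10, num_partitions=3)
partial_sums = rdd.map_partitions(lambda rows: sum(v for _, v in rows))
total = sum(partial_sums)  # 82.5, combined from [15.0, 39.0, 28.5]
```

The compute side scales by adding workers for the per-partition function, while every partition still issues a query against the same relational database, which is the limitation noted in the goal.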
Current Architecture
Goal architecture
SparkR architecture
SmartR
• Plugin for tranSMART 1.3, written by Sascha Herzinger (University of Luxembourg) for IMI eTRIKS
• Currently being extended by The Hyve and Sanofi
Current SmartR analytics
• Boxplot
• Correlation Analysis
• Heatmap
• Timeline Analysis
• Your ideas?
• TODO: add screenshots
Hackathon Goal: Visual Analytics
• Improve existing analytics workflows (e.g. the heatmap)
• Add new workflows
• Use the D3.js library for building visualizations