fortiss GmbH
An-Institut Technische Universität München
Madrid, 2015-09-01
Stream Processing on Demand for Lambda Architectures
European Workshop on Performance Engineering (EPEW) 2015
Johannes Kroß¹, Andreas Brunnert¹, Christian Prehofer¹, Thomas A. Runkler², Helmut Krcmar³
¹ fortiss GmbH, ² Siemens AG, ³ Technische Universität München
pmw.fortiss.org Madrid, 2015-09-01
Agenda
• Motivation
• Stream Processing On Demand
• Experimental Validation
• Related Work
• Conclusion and Future Work
Motivation
• Various complementary big data technologies with different characteristics
• Development of complex systems of systems
→ Performance issues and high resource requirements (Brunnert et al. 2014)
Big data technologies (word cloud): Teradata Aster, Apache Hadoop, Apache Spark, Apache Storm, Apache HBase, S4, EMC Greenplum, IBM Netezza, HP Vertica, SAP, Apache Kafka, Cassandra, Apache Samza, MongoDB, ElephantDB, Voldemort, Apache Flume, Hana, Amazon Kinesis, Cloudera, Hortonworks, MapR, VoltDB, Autonomy, splunk, tableau, TIBCO, Pentaho
Data Processing in the Lambda Architecture
Motivation
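Figure: incoming data is fed into both the batch layer (data set → view) and the speed layer (view); queries merge the results of this double processing
• Incoming data: Apache Kafka, Apache Flume, Scribe, *MQ
• Batch layer: Apache Spark, Hadoop MapReduce, HDFS
• Speed layer: Apache Storm, Apache Samza, Apache Spark Streaming, Amazon Kinesis
• Batch views: ElephantDB, Voldemort, Impala, Apache HBase
• Speed-layer views: Apache HBase, Cassandra, Redis
• Connectors: Kafka-Spout, Camus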
• Enable real-time queries on big data
• Design principles:
  + Data immutability
  + Recomputation
  + Fault-tolerance
  - Resource requirements
Adapted from Marz and Warren (2015)
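To make the data flow on this slide concrete, the following is a minimal Python sketch of the Lambda principles, assuming a toy workload that counts events per key. The event format, the function names `batch_view` and `query`, and the class `SpeedLayer` are hypothetical illustrations, not part of the presented approach or of any of the listed frameworks. The batch layer recomputes its view from an append-only master data set (immutability, recomputation), the speed layer processes the same events incrementally (double processing), and a query merges both views.

```python
from collections import Counter
from typing import List, Tuple

Event = Tuple[int, str]  # hypothetical event format: (timestamp, key)


def batch_view(master: List[Event], until: int) -> Counter:
    """Batch layer: recompute the view from scratch over the immutable,
    append-only master data set (all events with timestamp < until)."""
    return Counter(key for ts, key in master if ts < until)


class SpeedLayer:
    """Speed layer: incrementally maintain a real-time view of the events
    that the latest batch view does not cover yet."""

    def __init__(self) -> None:
        self.view: Counter = Counter()

    def on_event(self, event: Event) -> None:
        self.view[event[1]] += 1

    def expire(self) -> None:
        # Simplification: assumes no events arrived during the batch run,
        # so the whole real-time view is covered by the new batch view.
        self.view.clear()


def query(batch: Counter, realtime: Counter, key: str) -> int:
    # Merge step: combine the batch view and the real-time view.
    return batch[key] + realtime[key]


master: List[Event] = [(1, "a"), (2, "b"), (3, "a")]
bv = batch_view(master, until=4)
speed = SpeedLayer()
for event in [(4, "a"), (5, "c")]:
    master.append(event)   # data is only appended, never updated
    speed.on_event(event)  # double processing: the speed layer sees it too
print(query(bv, speed.view, "a"))  # 3

bv = batch_view(master, until=6)  # next batch iteration recomputes the view
speed.expire()
print(query(bv, speed.view, "a"))  # 3
```

The sketch also hints at the resource drawback listed above: every event is handled twice, once by the speed layer and again by each batch recomputation.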
A Novel Approach
Stream Processing On Demand
Figure: Lambda architecture with a batch process (1), a stream process on demand, and a decision-making model (2)
Iterative procedure:
1) Regular batch iteration (in parallel with the stream process)
2) Decision-making model: decide whether stream processing is additionally required in the next batch iteration
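The iterative procedure above can be sketched as a loop in which each batch iteration is followed by a decision for the next one. The slides do not specify the decision-making model, so this sketch assumes a hypothetical one: batch runtime is predicted as data volume divided by a fixed throughput, the next volume is forecast by naive linear trend extrapolation, and stream processing is demanded when the predicted runtime exceeds a time constraint. The names `predict_runtime` and `plan_iterations` and all parameters are illustrative only.

```python
from typing import List


def predict_runtime(volume: float, throughput: float) -> float:
    # Hypothetical performance model: batch runtime grows linearly with volume.
    return volume / throughput


def plan_iterations(volumes: List[float], throughput: float,
                    time_constraint: float) -> List[bool]:
    """For each batch iteration, return whether a stream process is
    additionally run in parallel with it."""
    if not volumes:
        return []
    stream_demanded = [False]  # assumption: the first iteration is batch only
    for i in range(len(volumes) - 1):
        # 1) Regular batch iteration over volumes[i] (elided in this sketch).
        # 2) Decision-making model for iteration i + 1, using only data
        #    observed so far: a naive linear trend forecast (hypothetical).
        growth = volumes[i] - volumes[i - 1] if i > 0 else 0.0
        forecast = volumes[i] + growth
        stream_demanded.append(
            predict_runtime(forecast, throughput) > time_constraint)
    return stream_demanded


print(plan_iterations([100, 200, 300, 400], throughput=10, time_constraint=35))
# [False, False, False, True]
```

Because the decision for iteration i + 1 uses only volumes observed up to iteration i, the stream process can be started before the next batch iteration begins; a more accurate performance model would replace `predict_runtime` and the forecast.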
Chronological Sequence of Batch and Stream Processes
Stream Processing On Demand
Figure: timeline (axis marks x, y, z) of batch processes i, j, and k, which must finish before x, y, and z, respectively, and of stream processes j and k, associated with times y and z
Decision point: whether batch process k will exceed its time constraint, and hence whether stream processes j and k are demanded
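The decision point on this slide can be expressed as a simple check, sketched here under stated assumptions: the planned start time and predicted duration of batch process k are taken as given inputs (how they are predicted is not shown on the slide), and the time constraint "time < z" is modeled as a strict deadline. The class `BatchProcess` and the function `demanded_streams` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BatchProcess:
    name: str
    start: float               # planned start time (hypothetical input)
    predicted_duration: float  # from some performance prediction (assumed)
    deadline: float            # time constraint, e.g. z for batch process k

    def predicted_end(self) -> float:
        return self.start + self.predicted_duration


def demanded_streams(j_name: str, k: BatchProcess) -> Tuple[str, ...]:
    """Decision point: if batch process k is predicted to violate its
    time constraint (end < deadline), stream processes j and k are demanded."""
    if k.predicted_end() >= k.deadline:
        return (f"stream process {j_name}", f"stream process {k.name}")
    return ()


k = BatchProcess("k", start=20.0, predicted_duration=12.0, deadline=30.0)
print(demanded_streams("j", k))
# ('stream process j', 'stream process k')
```

If the predicted end of batch process k stays before z (for example, a predicted duration of 9.0 with the same start and deadline), the function returns an empty tuple and no stream processing is demanded.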