YeezyScore: A comparison of stream processing software
By: Kat Chuang
@katychuang
10 mins
High level overview
Batch
Streaming
Microbatching
                     Storm Trident     Spark Streaming
Released             2011              2010
Delivery semantics   Exactly once      Exactly once
State management     Yes               Yes
Latency              Seconds           Seconds
Output               MapState          Resilient Distributed Dataset (RDD)
Throughput           10k/node/sec?     400k/node/sec?
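
To make the micro-batching idea concrete, here is a minimal sketch in plain Python. It does not use the Storm or Spark APIs; the function name `micro_batches`, the `(timestamp, payload)` event shape, and the fixed `interval` are assumptions made for illustration only. Events are grouped into consecutive fixed-length time windows, and each window is handed on as one small batch.

```python
def micro_batches(events, interval):
    """Group a time-ordered stream into fixed-length batches.

    events:   iterable of (timestamp, payload), sorted by timestamp
              (hypothetical shape, chosen for this sketch)
    interval: batch length, in the same units as the timestamps
    Yields one list of payloads per window; empty windows yield [].
    """
    batch = []
    window_end = None
    for ts, payload in events:
        if window_end is None:
            # The first event opens the first window.
            window_end = ts + interval
        while ts >= window_end:
            # Close every window that ended before this event arrived.
            yield batch
            batch = []
            window_end += interval
        batch.append(payload)
    if batch:
        # Flush the last, partially filled window.
        yield batch


if __name__ == "__main__":
    stream = [(0.0, "a"), (0.4, "b"), (1.1, "c"), (3.2, "d")]
    for i, b in enumerate(micro_batches(stream, 1.0)):
        print(i, b)
```

The batch interval sets a floor on latency: a message waits until its window closes before it is processed, which is consistent with the "Seconds" latency listed for both systems.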
Test Cases
Metrics
1. Does every message pass through the pipeline?
2. How long does each message take to process?
Data
1. Timestamps
[Diagram: Pipelines. In each of the two pipelines, Timestamp1 goes in and the pair (Timestamp1, Timestamp2) comes out.]
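
As a rough sketch of how the two metrics could be computed from the timestamp pairs, assuming each message also carries an identifier (the message IDs, the `pipeline_metrics` name, and the dictionary keys below are hypothetical and not from the slides): completeness is the fraction of sent messages that came out the other end, and per-message processing time is Timestamp2 minus Timestamp1.

```python
from statistics import mean


def pipeline_metrics(sent, received):
    """Compute completeness and latency for one pipeline run.

    sent:     dict of message_id -> timestamp1 (when the message entered)
    received: list of (message_id, timestamp1, timestamp2) tuples
              emitted at the end of the pipeline
    Message IDs are an assumption of this sketch.
    """
    latency_by_id = {}
    duplicates = 0
    for msg_id, t1, t2 in received:
        if msg_id in latency_by_id:
            # Seen before: counts against exactly-once delivery.
            duplicates += 1
            continue
        latency_by_id[msg_id] = t2 - t1

    missing = sorted(set(sent) - set(latency_by_id))
    latencies = list(latency_by_id.values())
    return {
        "delivered_fraction": len(latency_by_id) / len(sent) if sent else 0.0,
        "missing": missing,
        "duplicates": duplicates,
        "mean_latency": mean(latencies) if latencies else None,
        "max_latency": max(latencies) if latencies else None,
    }


if __name__ == "__main__":
    sent = {"a": 0.0, "b": 1.0, "c": 2.0}
    received = [("a", 0.0, 0.5), ("b", 1.0, 2.5), ("a", 0.0, 0.6)]
    print(pipeline_metrics(sent, received))
```

Plotting the per-message latencies against Timestamp1 would give a scatterplot of the kind the slides refer to.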
1. Does every message pass through the pipeline?
This is a scatterplot
2. How long does each message take to process?
This is a scatterplot
Storm Trident vs Spark Streaming

Storm Trident
Stream processing framework that also does micro-batching.
Great for transforming or computing as data flows in.
Complex event processing (CEP), continuous computation.
Task-parallel computations, e.g. reading Twitter streams.

Spark Streaming
Batch processing framework that also does micro-batching.
Great for combining with historical data.
ML algorithms included. Requires HDFS-backed data source.
Data-parallel computations, e.g. offering recommendations.
Kat Chuang
Data Engineering Fellow
#DE-2015c
hello@katychuang.com
Github: katychuang
Twitter: katychuang
IG: katychuang.nyc