Insight DE project

Post on 16-Jan-2017




YeezyScore: A comparison of stream processing software

By: Kat Chuang

@katychuang

10 mins

High-level overview


Batch

Streaming

Microbatching
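As a rough illustration of what micro-batching means here, the sketch below groups a time-ordered stream into fixed-width windows and hands each window downstream as one small batch. The `Event` shape, the function name `micro_batches`, and the one-second interval are assumptions made for this example; neither framework exposes this exact API.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical event shape: (arrival time in seconds, payload).
Event = Tuple[float, str]


def micro_batches(events: Iterable[Event], interval: float) -> Iterator[List[Event]]:
    """Group time-ordered events into consecutive windows of `interval` seconds."""
    batch: List[Event] = []
    window_end = None
    for ts, payload in events:
        if window_end is None:
            # The first event opens the first window.
            window_end = ts + interval
        # Close every window that ended before this event arrived.
        # A window with no events is emitted as an empty batch.
        while ts >= window_end:
            yield batch
            batch = []
            window_end += interval
        batch.append((ts, payload))
    if batch:
        yield batch


events = [(0.0, "a"), (0.4, "b"), (1.2, "c"), (3.1, "d")]
batches = list(micro_batches(events, interval=1.0))
for i, b in enumerate(batches):
    print(i, [payload for _, payload in b])
```

A pure streaming system would process each event as it arrives, and a pure batch system would wait for the whole dataset; micro-batching sits between the two, which is why latency is bounded below by the batch interval.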

                     Storm Trident      Spark Streaming
Released             2011               2010
Delivery semantics   Exactly once       Exactly once
State management     Yes                Yes
Latency              Seconds            Seconds
Output               MapState           Resilient Distributed Dataset (RDD)
Throughput           10k/node/sec?      400k/node/sec?

Test Cases

Metrics

1. Does every message pass through the pipeline?

2. How long does each message take to process?

Data

1. Timestamps


Pipelines

Timestamp1 -> (Timestamp1, Timestamp2)

Timestamp1 -> (Timestamp1, Timestamp2)
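One way the timestamp pairs in the pipelines slide could be produced is sketched below: each message is tagged with Timestamp1 on entry, and Timestamp2 is recorded once processing finishes. The helper names `stamp_in` and `stamp_out` and the injectable `clock` parameter are hypothetical, introduced only for this illustration. On a real cluster the two stamps may come from different machines, so this assumes their clocks are synchronized.

```python
import time
from typing import Callable, Tuple

Clock = Callable[[], float]


def stamp_in(payload: str, clock: Clock = time.time) -> Tuple[float, str]:
    """Attach Timestamp1 when the message enters the pipeline."""
    return (clock(), payload)


def stamp_out(
    message: Tuple[float, str],
    process: Callable[[str], object],
    clock: Clock = time.time,
) -> Tuple[float, float]:
    """Run the processing step, then emit (Timestamp1, Timestamp2)."""
    ts1, payload = message
    process(payload)  # stand-in for the real pipeline work
    ts2 = clock()
    return (ts1, ts2)


# Usage with a fake clock so the example is deterministic.
ticks = iter([100.0, 100.25])
fake_clock = lambda: next(ticks)
pair = stamp_out(stamp_in("hello", fake_clock), str.upper, fake_clock)
print(pair)
```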


1. Does every message pass through the pipeline?


This is a scatterplot
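The first test case can be checked by comparing the message IDs that were sent with the IDs that came out of the pipeline. The sketch below is a minimal version of that check, assuming every message carries a unique ID; the function name `delivery_report` is hypothetical. A missing ID means a message was lost, and a duplicate ID means one was processed more than once, so an exactly-once pipeline should report neither.

```python
from typing import Hashable, Iterable, Set, Tuple


def delivery_report(
    sent_ids: Iterable[Hashable], received_ids: Iterable[Hashable]
) -> Tuple[Set[Hashable], Set[Hashable]]:
    """Return (missing, duplicated) message IDs."""
    seen: Set[Hashable] = set()
    duplicated: Set[Hashable] = set()
    for msg_id in received_ids:
        if msg_id in seen:
            duplicated.add(msg_id)  # arrived more than once
        seen.add(msg_id)
    missing = set(sent_ids) - seen  # sent but never arrived
    return missing, duplicated


missing, duplicated = delivery_report([1, 2, 3, 4], [1, 2, 2, 4])
print("missing:", missing, "duplicated:", duplicated)
```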

2. How long does each message take to process?


This is a scatterplot
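Per-message processing time follows directly from the recorded pairs: latency is Timestamp2 minus Timestamp1. The sketch below summarizes those latencies, assuming the pairs have already been collected as (Timestamp1, Timestamp2) tuples in seconds. The function name `latency_summary` and the choice of a nearest-rank 95th percentile are assumptions for this example, not something the slides specify.

```python
import math
import statistics
from typing import Dict, Iterable, Tuple


def latency_summary(pairs: Iterable[Tuple[float, float]]) -> Dict[str, float]:
    """Summarize per-message latency (Timestamp2 - Timestamp1)."""
    latencies = sorted(ts2 - ts1 for ts1, ts2 in pairs)
    if not latencies:
        raise ValueError("no timestamp pairs to summarize")
    n = len(latencies)
    # Nearest-rank percentile: smallest value with at least 95% of data at or below it.
    p95_index = math.ceil(0.95 * n) - 1
    return {
        "count": n,
        "mean": statistics.mean(latencies),
        "median": statistics.median(latencies),
        "p95": latencies[p95_index],
        "max": latencies[-1],
    }


pairs = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
summary = latency_summary(pairs)
print(summary)
```

Plotting each latency against Timestamp1 would give the kind of scatterplot the slide refers to.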

Storm Trident vs. Spark Streaming

Storm Trident

Stream processing framework that also does micro-batching.

Great for transforming or computing as data flows in.

Complex event processing (CEP), continuous computation.

Task-parallel computations, e.g. reading Twitter streams

Spark Streaming

Batch processing framework that also does micro-batching.

Great for combining with historical data.

ML algorithms included. Requires HDFS-backed data source.

Data-parallel computations, e.g. offering recommendations

Kat Chuang
Data Engineering Fellow
#DE-2015c

hello@katychuang.com
GitHub: katychuang
Twitter: katychuang
IG: katychuang.nyc
