Page 1: Comparing processing frameworks v7

Testing Processing Frameworks

Spark Streaming and Storm

Gabriela Choy

Page 2: Comparing processing frameworks v7

Spark / Storm

                     Spark                               Storm
Implemented in       Scala                               Clojure, Java
Delivery semantics   Exactly once                        At least once; exactly once with Trident
API                  Python, Java, Scala                 Java, Scala, Clojure, Python, etc.
                                                         Trident: Java, Scala, Clojure
Processing model     Batch; micro-batches with           Record at a time; Trident allows
                     Spark Streaming (~500 ms)           micro-batches
Latency              1-2 seconds                         Sub-second
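
To make the delivery-semantics row concrete, here is a minimal, framework-independent sketch in plain Python. It is not Storm or Spark code. It assumes each message carries a unique ID, a detail the slides do not state; `count_words` and the sample messages are hypothetical and only illustrate why a replayed message inflates a word count under at-least-once delivery.

```python
from collections import Counter

def count_words(deliveries, deduplicate=False):
    """Count words over a stream of (message_id, text) deliveries.

    With deduplicate=False, a redelivered message is counted again,
    which is what at-least-once delivery permits.
    With deduplicate=True, each message ID is applied only once,
    so the result matches exactly-once processing.
    """
    counts = Counter()
    seen_ids = set()
    for message_id, text in deliveries:
        if deduplicate:
            if message_id in seen_ids:
                continue  # replayed message: skip it
            seen_ids.add(message_id)
        counts.update(text.split())
    return counts

# Hypothetical stream in which message 2 is replayed after a failure.
deliveries = [(1, "spark storm"), (2, "storm"), (2, "storm")]

at_least_once = count_words(deliveries)
exactly_once = count_words(deliveries, deduplicate=True)
print(at_least_once["storm"], exactly_once["storm"])  # prints "3 2"
```

Tracking seen IDs in memory is only one way to get exactly-once results; it is used here because it keeps the sketch short.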

Page 3: Comparing processing frameworks v7

Pipeline

[Diagram: two pipelines, one per framework; labels: Producer/s, Streaming, MySQL]

Each pipeline is run independently

Page 4: Comparing processing frameworks v7

Node Specifications

Spark Streaming

● 4 AWS nodes, m3.medium

● Zookeeper 3.4.6

● Kafka 0.8.2.1

● Spark (Streaming) 1.3

Storm

● 4 AWS nodes, m3.medium

● Zookeeper 3.4.6

● Kafka 0.8.2.1

● Storm 0.9.5

Page 5: Comparing processing frameworks v7

Cluster Configuration

Spark Streaming: 1 master node, 3 workers

[Diagram: Master node, Worker 1, Worker 2, Worker 3]

Page 6: Comparing processing frameworks v7

Cluster Configuration

Storm: 1 Nimbus, 3 supervisors

[Diagram: Nimbus, Supervisor 1, Supervisor 2, Supervisor 3]

Page 7: Comparing processing frameworks v7

Metric

Throughput: the amount of data processed per unit of time. It is measured:

● By changing the batch size

● By changing the load (i.e. scaling up the number of producers)

● The benchmark program is word count.
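
As a rough illustration of the metric, the sketch below runs a word count over micro-batches and reports throughput as records per second. It is plain Python, not the benchmark code used for these tests; the function names, the sample records, and the 4-second window are all hypothetical.

```python
from collections import Counter

def micro_batches(records, batch_interval):
    """Group (timestamp_seconds, line) records into fixed-length batches.

    Intervals that contain no records are omitted from the result.
    """
    batches = {}
    for timestamp, line in records:
        batch_index = int(timestamp // batch_interval)
        batches.setdefault(batch_index, []).append(line)
    return [batches[i] for i in sorted(batches)]

def word_count(lines):
    """Count whitespace-separated words across a list of lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def throughput(records, elapsed_seconds):
    """Records processed per second over the measurement window."""
    return len(records) / elapsed_seconds

# Hypothetical input: four records arriving over a 4-second window.
records = [(0.2, "to be or"), (0.9, "not to be"), (1.5, "to stream"), (3.1, "or not")]

batches = micro_batches(records, batch_interval=2)
totals = Counter()
for batch in batches:
    totals.update(word_count(batch))

print(len(batches), totals["to"], throughput(records, elapsed_seconds=4))  # prints "2 3 1.0"
```

Changing `batch_interval` changes how many records land in each batch, which is the knob varied in the Spark Streaming tests.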

Page 8: Comparing processing frameworks v7

Tests for Spark Streaming

# Producers    Batch Interval
1              1s, 2s, 3s, 4s, 6s
4              1s, 2s, 3s, 4s, 6s
8              1s, 2s, 3s, 4s, 6s

Page 9: Comparing processing frameworks v7

Throughput for 1 producer with 95% CI
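
The slides do not say how the 95% confidence interval was computed. One common approach, sketched here as an assumption, is a normal approximation around the mean of repeated measurements. The sample values are hypothetical, not results from these tests; with only a handful of runs, a Student t critical value would give a wider interval than the z value used here.

```python
import math
import statistics

def mean_confidence_interval(samples, z=1.96):
    """Return (mean, lower, upper) using a normal approximation.

    z=1.96 corresponds to a two-sided 95% interval.
    """
    mean = statistics.mean(samples)
    standard_error = statistics.stdev(samples) / math.sqrt(len(samples))
    margin = z * standard_error
    return mean, mean - margin, mean + margin

# Hypothetical throughput measurements (records/second) from repeated runs.
samples = [980.0, 1010.0, 1005.0, 995.0, 1010.0]

mean, lower, upper = mean_confidence_interval(samples)
print(f"{mean:.1f} [{lower:.1f}, {upper:.1f}]")
```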

Page 10: Comparing processing frameworks v7

Throughput for 1 producer

Page 11: Comparing processing frameworks v7

Throughput for 4 producers with 95% CI

Page 12: Comparing processing frameworks v7

Throughput for 4 producers

Page 13: Comparing processing frameworks v7

Throughput for 8 producers

Page 14: Comparing processing frameworks v7

Throughput for 8 producers

Page 15: Comparing processing frameworks v7

Tests for Storm

# Producers    Tuples Emitted-Acked
1              10 min
4              10 min
8              10 min

Preliminary results for Storm

Page 16: Comparing processing frameworks v7

Tuples Emitted per Second

Page 17: Comparing processing frameworks v7

Tuples Acked per Second

Page 18: Comparing processing frameworks v7

Spout Latency

Page 19: Comparing processing frameworks v7

Takeaways

● The batch interval in Spark Streaming should be set by monitoring processing times and load size.

● For Storm, as the number of producers increases, so do throughput and spout latency.
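
The first takeaway can be sketched as a simple check: a Spark Streaming job keeps up with its input only when each batch is processed in less time than the batch interval. The helper below is a hypothetical illustration in plain Python; the function name, the 0.8 headroom factor, and the observed processing times are assumptions, not measurements from these tests.

```python
def smallest_stable_interval(processing_times, headroom=0.8):
    """Pick the smallest batch interval whose batches keep up with the load.

    processing_times maps a batch interval (seconds) to the processing
    times (seconds) observed for batches at that interval.
    An interval counts as stable when the slowest batch finishes within
    headroom * interval. Returns None if no interval qualifies.
    """
    for interval in sorted(processing_times):
        if max(processing_times[interval]) <= headroom * interval:
            return interval
    return None

# Hypothetical processing times observed at each tested batch interval.
observed = {
    1: [0.9, 1.2, 1.4],
    2: [1.3, 1.7, 1.5],
    3: [1.8, 2.1, 2.0],
    4: [2.4, 2.2, 2.6],
}

print(smallest_stable_interval(observed))  # prints 3
```

Because processing times grow with load, the same check would need to be repeated as the number of producers changes.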

Page 20: Comparing processing frameworks v7

Would like to add:

● Increase number of producers. Use real data.

● Add a graph as a second use case.

● Dashboard to monitor live streaming.

Page 21: Comparing processing frameworks v7

About Me

Gabriela Choy

BSc in Chemical Engineering, ULA, Venezuela. MSc in Statistics, UT Dallas.

Previously: Worked in Device Reliability Engineering at View, Inc.