Devops Spark Streaming

Feb 18, 2017

Marilyn Waldman
Transcript
Page 1: Devops Spark Streaming
Page 2: Devops Spark Streaming

Agenda

Distributed Processing
Two Use Cases
Spark
Spark Streaming Ecosystem

Page 3: Devops Spark Streaming

Distributed Processing

Page 7: Devops Spark Streaming

1. https://spark.apache.org/docs/latest/cluster-overview.html

Berkeley AMPLab, 2009
Fast, general-purpose cluster computing platform
10X to 100X faster than Hadoop MapReduce - runs in-memory on top of Hadoop

Page 9: Devops Spark Streaming

1. Open source implementation of Resilient Distributed Datasets (RDDs)

2. Advanced DAG execution engine supporting acyclic data flow and in-memory computing

3. Java, Scala, Python and R

4. Mesos, YARN, Standalone, Cloud, Notebook

5. HDFS, Hive, Cassandra, HBase, Tachyon, Hadoop

RDDs + DAG + Lazy Execution
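
A minimal sketch of how RDDs, a DAG of transformations, and lazy execution fit together. This is plain Python, not the Spark API: `ToyRDD`, `square`, and `calls` are hypothetical names made up for illustration, and the lineage here is a simple linear chain, which is the most trivial kind of DAG. Transformations only record what should happen; the action at the end is what runs the chain.

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD: records lineage, computes only on an action."""

    def __init__(self, source, ops=()):
        self._source = source    # base data (a plain list here)
        self._ops = tuple(ops)   # lineage: the chain of pending transformations

    def map(self, fn):
        # Transformation: returns a new ToyRDD with a longer lineage, does no work.
        return ToyRDD(self._source, self._ops + (("map", fn),))

    def filter(self, pred):
        # Transformation: also lazy, only extends the lineage.
        return ToyRDD(self._source, self._ops + (("filter", pred),))

    def collect(self):
        # Action: walks the lineage for every element and actually computes.
        out = []
        for item in self._source:
            keep = True
            for kind, fn in self._ops:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):
                    keep = False
                    break
            if keep:
                out.append(item)
        return out


calls = []

def square(x):
    calls.append(x)   # side effect that shows when the work really happens
    return x * x

rdd = ToyRDD([1, 2, 3, 4]).map(square).filter(lambda x: x > 4)
pending = len(calls)     # nothing has executed yet
result = rdd.collect()   # the action triggers the whole chain
```

Because each `ToyRDD` keeps its source and its list of operations, a lost result could be recomputed from that lineage, which is the idea behind the "resilient" in RDD.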

Page 11: Devops Spark Streaming

credit: Pietro Michiardi - Spark Internals

Page 12: Devops Spark Streaming

credit: http://spark.apache.org/docs/1.0.0/streaming-programming-guide.html

Spark Streaming ecosystem

Page 13: Devops Spark Streaming

credit: http://techblog.netflix.com/2015/03/can-spark-streaming-survive-chaos-monkey.html

NETFLIX ARCHITECTURE

Page 19: Devops Spark Streaming

Spark Streaming Set Up

Page 22: Devops Spark Streaming

ZooKeeper is a system for distributed coordination and service discovery

It is highly available

ZooKeeper Features

Distributed coordination
Distributed queues
Distributed locks
Discovery service
Leader election
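
Leader election, the last feature listed above, is commonly built on ephemeral sequential nodes: each candidate registers a node, the lowest sequence number is the leader, and a node disappears when its owner's session ends. The sketch below imitates that idea in plain Python. It does not talk to ZooKeeper; `ToyEnsemble` and the broker names are hypothetical.

```python
import itertools


class ToyEnsemble:
    """Hypothetical in-memory stand-in for ephemeral sequential nodes."""

    def __init__(self):
        self._seq = itertools.count()
        self.nodes = {}   # sequence number -> client name

    def join(self, client):
        # Each candidate gets the next sequence number, like a sequential node.
        seq = next(self._seq)
        self.nodes[seq] = client
        return seq

    def session_lost(self, client):
        # Ephemeral nodes vanish when their owner's session ends.
        self.nodes = {s: c for s, c in self.nodes.items() if c != client}

    def leader(self):
        # The candidate holding the lowest sequence number is the leader.
        return self.nodes[min(self.nodes)] if self.nodes else None


ensemble = ToyEnsemble()
for name in ("broker-1", "broker-2", "broker-3"):
    ensemble.join(name)

first_leader = ensemble.leader()
ensemble.session_lost("broker-1")   # simulate the leader crashing
second_leader = ensemble.leader()
```

When the current leader drops out, the next lowest sequence number takes over without any extra negotiation between the remaining candidates.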

Page 23: Devops Spark Streaming

Distributed: runs on a set of servers called brokers
Scalable
Publisher-Subscriber System - topic-based subscription
Reliable - messages passed to Kafka are replicated and persisted to disk
Preserves message order within a partition
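
A rough sketch of the topic-based publish-subscribe and ordering properties above, assuming a topic is split into append-only partitions and that messages with the same key always go to the same partition. This is an in-memory toy, not a Kafka client: `ToyTopic`, the group names, and the CRC32 partitioner are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict


class ToyTopic:
    """Hypothetical in-memory topic: a list of append-only partition logs."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]
        self.offsets = defaultdict(int)   # (group, partition) -> next offset to read

    def publish(self, key, value):
        # Same key -> same partition, so per-key order is preserved.
        p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p

    def poll(self, group, partition):
        # Each consumer group tracks its own position in each partition.
        log = self.partitions[partition]
        start = self.offsets[(group, partition)]
        self.offsets[(group, partition)] = len(log)
        return log[start:]


topic = ToyTopic(num_partitions=2)
for i in range(3):
    topic.publish("user-a", f"a{i}")
    topic.publish("user-b", f"b{i}")

def read_all(group):
    seen = []
    for p in range(len(topic.partitions)):
        seen.extend(topic.poll(group, p))
    return seen

analytics = read_all("analytics")   # every subscribing group sees every message
billing = read_all("billing")
a_values = [v for k, v in analytics if k == "user-a"]
```

Order is only guaranteed inside one partition, which is why the sketch routes by key: all events for `user-a` stay in sequence even though the topic as a whole has no single global order.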

Page 24: Devops Spark Streaming

Credit: http://www.michael-noll.com/blog/2013/03/13/running-a-multi-broker-apache-kafka-cluster-on-a-single-node/

localhost:2181

Page 26: Devops Spark Streaming

credit: https://spark.apache.org/docs/latest/streaming-programming-guide.html#a-quick-example
credit: Jeremy Freeman

Page 27: Devops Spark Streaming

Semantics

At most once
At least once
Exactly once

credit: https://spark.apache.org/docs/latest/streaming-programming-guide.html

Spark Streaming supports "at least once" and, with Kafka, "exactly once"
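
A small sketch of the difference between these semantics, assuming a processor that writes output first and commits its position afterwards. A crash between the two steps causes a replay, which is "at least once". Making the output idempotent (here, keyed by offset) is one common way to get an exactly-once effect on top of that. This is plain Python, not Spark or Kafka code; `run_at_least_once` and both sinks are hypothetical.

```python
def run_at_least_once(messages, sink, crash_at_offset):
    """Deliver (offset, value) pairs in batches of two, with one simulated crash."""
    committed = 0     # offset of the next message not yet acknowledged
    crashed = False
    while committed < len(messages):
        batch = messages[committed:committed + 2]
        for offset, value in batch:
            sink(offset, value)        # output is written first...
        if not crashed and committed == crash_at_offset:
            crashed = True             # ...then a simulated crash before the commit,
            continue                   # so the same batch is delivered again
        committed += len(batch)        # commit only after the output succeeded


messages = list(enumerate([10, 20, 30, 40]))   # (offset, value) pairs

appended = []          # naive sink: replayed messages show up twice

def append_sink(offset, value):
    appended.append(value)

by_offset = {}         # idempotent sink: a replay overwrites the same key

def idempotent_sink(offset, value):
    by_offset[offset] = value

run_at_least_once(messages, append_sink, crash_at_offset=2)
run_at_least_once(messages, idempotent_sink, crash_at_offset=2)
```

The naive sink ends up with the replayed batch counted twice, while the offset-keyed sink holds each message exactly once even though it was delivered more than once.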

Page 30: Devops Spark Streaming

http://spark.apache.org/docs/latest/streaming-kafka-integration.html#approach-2-direct-approach-no-receivers

Page 33: Devops Spark Streaming

Lambda Architecture - combine batch and streaming data
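
A minimal sketch of the Lambda Architecture idea, assuming page-view events with timestamps: a batch layer recomputes counts over everything up to a cutoff, a speed layer counts only what arrived after it, and a query merges the two views. The event data, `BATCH_CUTOFF`, and the function names are hypothetical, and plain Python stands in for the batch and streaming engines.

```python
from collections import Counter

BATCH_CUTOFF = 100   # assumed timestamp of the last completed batch run

events = [
    ("home", 1), ("cart", 2), ("home", 3),   # already covered by the batch run
    ("home", 101), ("cart", 102),            # arrived after the batch run
]

def batch_view(all_events):
    # Batch layer: complete but stale, recomputed from the full history.
    return Counter(page for page, ts in all_events if ts <= BATCH_CUTOFF)

def speed_view(all_events):
    # Speed layer: covers only the recent events the batch view has not seen.
    return Counter(page for page, ts in all_events if ts > BATCH_CUTOFF)

def query(page, batch, speed):
    # Serving layer: merge both views at read time.
    return batch[page] + speed[page]

batch = batch_view(events)
speed = speed_view(events)
home_views = query("home", batch, speed)
cart_views = query("cart", batch, speed)
```

Once the next batch run finishes, the cutoff moves forward and the speed view for the covered period can be thrown away.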

credit: Strata+Hadoop NYC

Page 34: Devops Spark Streaming

Combine machine learning with real-time data

credit: Strata+Hadoop NYC

Page 36: Devops Spark Streaming

Combine SQL with real-time data

credit: Strata+Hadoop NYC

Page 38: Devops Spark Streaming

The End