Jul 14, 2015
SPARK STREAMING OVERVIEW
Spark SQL
Spark Streaming
MLlib (machine learning)
GraphX (graph)
• Kafka decouples producers and consumers of information: producers are never blocked, and they do not need to know who the final consumers are.
• Each consumer keeps control of its own read offset.
• On-demand topic creation.
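The three points above can be illustrated with a toy model. This is a minimal plain-Python sketch, not the Kafka client API: the names `TopicLog`, `Broker`, and `Consumer` are hypothetical, and a real topic is partitioned and replicated, which the sketch ignores.

```python
from collections import defaultdict


class TopicLog:
    """Toy append-only log standing in for one topic (hypothetical)."""

    def __init__(self):
        self.records = []

    def append(self, record):
        # Producers only append; they never wait for any consumer.
        self.records.append(record)
        return len(self.records) - 1  # position of the new record


class Broker:
    """Toy broker that creates a topic the first time it is used."""

    def __init__(self):
        self.topics = defaultdict(TopicLog)  # on-demand topic creation

    def send(self, topic, record):
        return self.topics[topic].append(record)


class Consumer:
    """Each consumer tracks its own read offset per topic."""

    def __init__(self, broker):
        self.broker = broker
        self.offsets = defaultdict(int)

    def poll(self, topic, max_records=10):
        log = self.broker.topics[topic].records
        start = self.offsets[topic]
        batch = log[start:start + max_records]
        self.offsets[topic] = start + len(batch)
        return batch

    def seek(self, topic, offset):
        # Rewinding is purely a consumer-side decision.
        self.offsets[topic] = offset


broker = Broker()
for i in range(5):
    broker.send("clicks", f"event-{i}")

fast, slow = Consumer(broker), Consumer(broker)
fast_batch = fast.poll("clicks")      # reads all five records
slow_batch = slow.poll("clicks", 2)   # reads only the first two
slow.seek("clicks", 0)                # replay from the beginning
replayed = slow.poll("clicks", 1)
```

Because the offset lives in the consumer, a slow or replaying consumer has no effect on the producer or on other consumers.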
• ETL and ELT, wide catalog of sources and sinks
• Flexible design of topologies and agent deployment strategies.
• Data transformation, thanks to interceptors.
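The slide does not show how an interceptor is written, so the following is only a generic sketch of the idea in plain Python: events flow from a source to a sink, and each interceptor may modify or drop them on the way. The event layout (`headers` plus `body`) and every function name here are assumptions made for illustration.

```python
def add_header(key, value):
    """Build an interceptor that tags every event with a fixed header."""
    def interceptor(event):
        headers = dict(event["headers"], **{key: value})
        return {"headers": headers, "body": event["body"]}
    return interceptor


def drop_blank(event):
    """Filtering interceptor: returning None discards the event."""
    return event if event["body"].strip() else None


def run_chain(events, interceptors):
    """Pass each event through the interceptors in order, source to sink."""
    for event in events:
        for interceptor in interceptors:
            event = interceptor(event)
            if event is None:
                break
        if event is not None:
            yield event


source = [{"headers": {}, "body": line} for line in ["GET /a", "   ", "GET /b"]]
chain = [drop_blank, add_header("origin", "web-01")]
sink = list(run_chain(source, chain))
```

The order of the chain matters: placing the filter first avoids enriching events that are about to be discarded.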
readClob, readCSV, readLine, readMultiLine, readAvro, readJson
addCurrentTime, addLocalHost, geoIP, findReplace, Split
generateUUID, decompress, If, extractJsonPaths, detectMimeType
xquery, extractURIComponents, xslt, Grok (regular expressions)
exec
spooling
logger
CASSANDRA
Kafka
STRATIO DEEP
Shark (SQL)
Spark Streaming
MLlib (machine learning)
GraphX (graph)
RDD, what is that?
Spark Streaming: Overall view
Discretized Stream or DStream.
Overall view
Input DStreams and Receivers.
• Basic (distributed with Spark Streaming).
• Advanced (available as dependency).
Basic sources
• File Stream.
• Sockets.
• Actors (Akka).
• Queue RDDs (Testing).
Advanced sources
Do It Yourself
• Code onStart()
• Code onStop()
• Code receive()
• Custom Receiver ready!
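In Spark's Scala/Java API a custom receiver extends `Receiver` and implements `onStart()` and `onStop()`; `receive()` is conventionally a helper started from `onStart()` that hands each record to `store()`. The sketch below mirrors that lifecycle in plain Python so it runs without Spark. `SketchReceiver`, its in-memory `stored` list, and `wait_until_drained()` are hypothetical stand-ins, not Spark classes or methods.

```python
import threading


class SketchReceiver:
    """Plain-Python stand-in for a custom receiver; not the Spark class."""

    def __init__(self, lines):
        self._lines = lines               # hypothetical finite data source
        self._stopped = threading.Event()
        self._thread = None
        self.stored = []                  # stands in for Spark's store(...)

    def on_start(self):
        # Must return quickly, so the blocking read runs in its own thread.
        self._thread = threading.Thread(target=self.receive, daemon=True)
        self._thread.start()

    def on_stop(self):
        # Signal the receiving thread and wait for it to finish.
        self._stopped.set()
        if self._thread is not None:
            self._thread.join()

    def receive(self):
        # Read until the source is exhausted or a stop is requested.
        for line in self._lines:
            if self._stopped.is_set():
                break
            self.store(line)

    def store(self, record):
        self.stored.append(record)

    def wait_until_drained(self):
        # Demo helper only: block until the finite source has been read.
        self._thread.join()


receiver = SketchReceiver(["a", "b", "c"])
receiver.on_start()
receiver.wait_until_drained()
receiver.on_stop()
```

The key design point is that `on_start()` does not block: it only launches the thread that does the receiving, and `on_stop()` is responsible for shutting that thread down.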
• map(func), flatMap(func), filter(func), count()
• repartition(numPartitions)
• union(otherStream)
• reduce(func), countByValue(), reduceByKey(func, [numTasks])
• join(otherStream, [numTasks]), cogroup(otherStream, [numTasks])
• transform(func)
• updateStateByKey(func)
• window(windowLength, slideInterval)
• countByWindow(windowLength, slideInterval)
• reduceByWindow(func, windowLength, slideInterval)
• reduceByKeyAndWindow(func, windowLength, slideInterval, [numTasks])
• countByValueAndWindow(windowLength, slideInterval, [numTasks])
• print()
• foreachRDD(func)
• saveAsObjectFiles(prefix, [suffix])
• saveAsTextFiles(prefix, [suffix])
• saveAsHadoopFiles(prefix, [suffix])
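The window operations in the list above share the same two parameters. As a rough illustration of what `windowLength` and `slideInterval` mean, here is a plain-Python sketch over a list of batches. It is not the Spark API: lengths are counted in batches here for simplicity, whereas Spark takes durations that must be multiples of the batch interval, and the helper names are hypothetical.

```python
from collections import Counter


def window(batches, window_length, slide_interval):
    """Yield (end_index, records) for each window over discretized batches.

    A window is emitted every `slide_interval` batches and covers the last
    `window_length` batches (fewer at the very start of the stream).
    """
    for end in range(slide_interval, len(batches) + 1, slide_interval):
        start = max(0, end - window_length)
        merged = [rec for batch in batches[start:end] for rec in batch]
        yield end, merged


def count_by_value_and_window(batches, window_length, slide_interval):
    """Count occurrences of each value inside every window."""
    for end, records in window(batches, window_length, slide_interval):
        yield end, dict(Counter(records))


batches = [["a", "b"], ["a"], ["c", "a"], ["b"]]
windows = list(window(batches, window_length=3, slide_interval=2))
counts = list(count_by_value_and_window(batches, 3, 2))
```

With a window of three batches sliding every two, consecutive windows overlap by one batch, so the record in the second batch is counted in both results.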
• Stateful transformations (updateStateByKey, reduceByKeyAndWindow).
• As a fault-tolerance mechanism, to recover when the driver crashes.
HDFS (or another fault-tolerant, Hadoop-compatible file system) is mandatory if you are going to use operations that require checkpointing.
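To see why stateful transformations depend on checkpointing, consider this plain-Python sketch of the `updateStateByKey` idea: the state after each batch depends on every previous batch, so it has to be saved somewhere durable in order to survive a restart. This is an illustration under simplifying assumptions, not Spark code: the JSON file in a temporary directory stands in for a checkpoint directory, and all helper names are hypothetical.

```python
import json
import os
import tempfile


def update_state_by_key(batch, state, update_func):
    """Apply update_func(new_values, old_state) per key, as one batch step."""
    grouped = {}
    for key, value in batch:
        grouped.setdefault(key, []).append(value)
    new_state = {}
    # Every known key is updated, even if the batch has no values for it.
    for key in set(state) | set(grouped):
        updated = update_func(grouped.get(key, []), state.get(key))
        if updated is not None:  # returning None drops the key
            new_state[key] = updated
    return new_state


def running_count(new_values, old_count):
    return (old_count or 0) + sum(new_values)


def save_checkpoint(path, batch_index, state):
    with open(path, "w") as fh:
        json.dump({"batch": batch_index, "state": state}, fh)


def load_checkpoint(path):
    if not os.path.exists(path):
        return 0, {}
    with open(path) as fh:
        data = json.load(fh)
    return data["batch"], data["state"]


batches = [[("a", 1), ("b", 1)], [("a", 1)], [("b", 1), ("b", 1)]]
checkpoint = os.path.join(tempfile.mkdtemp(), "state.json")

# First run: process two batches, then pretend the driver crashes.
state = {}
for i, batch in enumerate(batches[:2], start=1):
    state = update_state_by_key(batch, state, running_count)
    save_checkpoint(checkpoint, i, state)

# Restart: recover the saved state and resume with the remaining batches.
done, recovered = load_checkpoint(checkpoint)
for batch in batches[done:]:
    recovered = update_state_by_key(batch, recovered, running_count)
```

Without the saved state, the restarted run would have to replay the stream from the beginning to rebuild the counts.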
Configuration parameters
• spark.streaming.receiver.maxRate
• spark.streaming.concurrentJobs
• spark.streaming.receiver.writeAheadLog.enable
• spark.streaming.unpersist
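One common way to set such parameters is through `--conf key=value` arguments to `spark-submit`. The sketch below only builds that argument list in plain Python. The values are illustrative assumptions, not recommendations, and the write-ahead log property is written with Spark's spelling, `spark.streaming.receiver.writeAheadLog.enable`.

```python
# Illustrative values only; tune them for the actual workload.
settings = {
    "spark.streaming.receiver.maxRate": "1000",   # records/second per receiver
    "spark.streaming.concurrentJobs": "1",
    "spark.streaming.receiver.writeAheadLog.enable": "true",
    "spark.streaming.unpersist": "true",
}


def to_submit_args(conf):
    """Render a settings dict as spark-submit `--conf key=value` arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


submit_args = to_submit_args(settings)
```

The resulting list can be placed between `spark-submit` and the application jar or script when launching the job.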
Each node has mutable state, and for each record it has to update that state and send new records.