Atmosphere 2014: When Storm hits data. Data streams processing in real time - Marcin Stanislawski

Apr 16, 2017

PROIDEA
Transcript
Page 1: Atmosphere 2014: When Storm hits data. Data streams processing in real time - Marcin Stanislawski

WHEN STORM HITS DATA.
DATA STREAMS PROCESSING IN REAL TIME.

MARCIN STANISLAWSKI

Page 2

WHO AM I?
Architect/Developer at Interia.pl
Storm and Hadoop user
Github: webik
Twitter: @unilama

Page 3

BIG DATA

HADOOP

Page 4

WELCOME TO THE ZOO

Page 5

Page 6

RUN JOB

COFFEE BREAK*

RESULTS

* - there are some solutions

Page 7

IMPALA

implemented in C++
non-MapReduce solution

Page 8

KIJI

KijiREST
HDFS/HBase/Cassandra

Page 9

BATCH PROCESSING VS. STREAMING

Page 10

Page 11

Page 12

STREAMING SOLUTIONS
Yahoo S4
Akka
Spark Streaming
Storm

Page 13

STORM: WHAT IS THAT?

Page 14

README.MD
Storm is a distributed realtime computation system.
Storm is simple, can be used with any programming language, and is a lot of fun to use!

Page 15

CURRENT STATUS
Apache Incubation
Included in Hortonworks Data Platform
Contributed by Yahoo
Easy deploy to Amazon EC2

Page 16

WHO USES

Page 17

BASIC IDEA

Page 18

SPOUTS
TAKE EVENTS FROM:

Kafka
Kestrel
RabbitMQ
...

AND PASS THEM TO...

Page 19

BOLTS
TUPLES ARE PROCESSED IN THE WAY THAT YOU IMPLEMENT

Page 20

EVENTS ARE TUPLES
( 1, "TEST", "ATMOSPHERE", "2014-05-20 10:00:40", ... )

OBJECTS ARE SERIALIZED USING KRYO

Page 21

WRITTEN IN JAVA & CLOJURE
TOPOLOGIES ARE DAGS
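
The spout, bolt and DAG ideas from the last few slides can be sketched without Storm at all. The following is a toy, single-process illustration in Python, not Storm's API: `sentence_spout`, `split_bolt`, `CountBolt` and `run_topology` are hypothetical names invented for this sketch, and the `edges` dictionary stands in for the topology wiring.

```python
from collections import defaultdict, deque

def sentence_spout():
    """Hypothetical spout: emits one-field tuples from a fixed in-memory source."""
    for line in ["storm hits data", "data streams in real time"]:
        yield (line,)

def split_bolt(tup):
    """Bolt: emits one tuple per word of the incoming sentence."""
    return [(word,) for word in tup[0].split()]

class CountBolt:
    """Bolt with local state: emits (word, running count) for every word seen."""
    def __init__(self):
        self.counts = defaultdict(int)

    def __call__(self, tup):
        word = tup[0]
        self.counts[word] += 1
        return [(word, self.counts[word])]

def run_topology(spout, bolts, edges):
    """Push every spout tuple through the DAG described by `edges`."""
    results = []  # tuples emitted by bolts that have no downstream component
    queue = deque((name, tup) for tup in spout for name in edges["spout"])
    while queue:
        name, tup = queue.popleft()
        for out in bolts[name](tup):
            downstream = edges.get(name, [])
            if not downstream:
                results.append(out)
            queue.extend((nxt, out) for nxt in downstream)
    return results

counter = CountBolt()
results = run_topology(
    sentence_spout(),
    bolts={"split": split_bolt, "count": counter},
    edges={"spout": ["split"], "split": ["count"]},  # spout -> split -> count
)
```

In a real cluster each component would run as many parallel tasks on different workers; the sketch only shows how tuples flow along the edges of the graph.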

Page 22

ARCHITECTURE
Nimbus
Nodes (Supervisors)
UI
DRPC

Page 23

EACH EVENT IS PROCESSED ONE OR MORE TIMES.

Page 24

ACKING FRAMEWORK
Each tuple must be acked or failed
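
Why "acked or failed" leads to "one or more times" can be shown with a small Python sketch. It assumes a spout that remembers every emitted message until it is acked and re-queues it on failure. `ReplayingSpout` and `flaky_bolt` are hypothetical; the method names only mirror the `nextTuple` / `ack` / `fail` callbacks of a Storm spout.

```python
from collections import deque

class ReplayingSpout:
    """Hypothetical spout that keeps every emitted message until it is acked."""
    def __init__(self, messages):
        self.queue = deque(enumerate(messages))  # (msg_id, payload) waiting to be sent
        self.pending = {}                        # msg_id -> payload, emitted but not acked

    def next_tuple(self):
        if not self.queue:
            return None
        msg_id, payload = self.queue.popleft()
        self.pending[msg_id] = payload
        return msg_id, payload

    def ack(self, msg_id):
        # Fully processed: the message can be forgotten.
        del self.pending[msg_id]

    def fail(self, msg_id):
        # Not processed: put the message back so it is emitted again.
        self.queue.append((msg_id, self.pending.pop(msg_id)))

def flaky_bolt(payload, seen):
    """Returns False (fail) the first time it sees "b", True (ack) otherwise."""
    seen.append(payload)
    return not (payload == "b" and seen.count(payload) == 1)

spout = ReplayingSpout(["a", "b", "c"])
seen = []
while (item := spout.next_tuple()) is not None:
    msg_id, payload = item
    if flaky_bolt(payload, seen):
        spout.ack(msg_id)
    else:
        spout.fail(msg_id)
```

The bolt sees "b" twice, which is the price of the at-least-once guarantee: downstream logic has to tolerate replays or use one of the transactional state types.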

Page 25

TUPLES TRACKING
each tuple has a random 64-bit id
a running xor is kept of all tuple ids that have been created and/or acked in the tree
if that xor value equals 0, the tuple tree is fully processed
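
The xor trick works because every tuple id enters the running value twice, once when the tuple is created and once when it is acked, and `x ^ x == 0`. A minimal Python sketch of the idea follows. `TupleTree` and `new_tuple_id` are hypothetical names for this illustration, not Storm classes.

```python
import random

def new_tuple_id():
    """Random non-zero 64-bit id."""
    return random.getrandbits(64) or 1

class TupleTree:
    """Hypothetical tracker for one spout tuple and every tuple derived from it."""
    def __init__(self):
        self.ack_val = 0  # xor of all ids created and/or acked so far

    def created(self, tuple_id):
        self.ack_val ^= tuple_id

    def acked(self, tuple_id):
        self.ack_val ^= tuple_id

    def fully_processed(self):
        return self.ack_val == 0

tree = TupleTree()
root = new_tuple_id()
tree.created(root)                      # spout emits the root tuple

child_a, child_b = new_tuple_id(), new_tuple_id()
tree.created(child_a)                   # a bolt emits two tuples anchored to the root...
tree.created(child_b)
tree.acked(root)                        # ...and acks the root

tree.acked(child_a)
done_early = tree.fully_processed()     # child_b is still outstanding
tree.acked(child_b)
done_final = tree.fully_processed()
```

The tracker needs only a single 64-bit value per spout tuple, however large the tree grows. The trade-off is a tiny chance (on the order of 2^-64) that the value hits zero before the tree is really complete.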

Page 26

COMMUNICATION
Between:

Tasks: LMAX Disruptor
Workers: ØMQ -> Netty

Page 27

TRIDENT
high-level abstraction
plays the same role as Cascading/Scalding in the Hadoop world

Page 28

SPOUT
Key difference - producing Stream(s)

Page 29

STREAM
Chain of batches, with multiplication ability

Page 30

STREAM OPERATIONS
Functions
Filters
Projections
Joins
Merges
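
To make the five operations concrete, here is a toy Python sketch that applies each of them to one small batch of tuples represented as dictionaries. It is not Trident's API: the helper names (`each`, `keep`, `project`, `merge`, `join`) and the sample data are assumptions made for the illustration.

```python
def each(batch, fn):
    """Function: append the fields computed by `fn` to every tuple."""
    return [{**t, **fn(t)} for t in batch]

def keep(batch, predicate):
    """Filter: drop tuples for which `predicate` is false."""
    return [t for t in batch if predicate(t)]

def project(batch, fields):
    """Projection: keep only the named fields."""
    return [{f: t[f] for f in fields} for t in batch]

def merge(*batches):
    """Merge: combine several streams with the same fields into one."""
    return [t for batch in batches for t in batch]

def join(left, right, key):
    """Join: pair tuples from two streams that share the same `key` value."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

clicks = [{"user": 1, "url": "/a"}, {"user": 2, "url": "/b"}]
more_clicks = [{"user": 1, "url": "/c"}]
users = [{"user": 1, "name": "ann"}, {"user": 2, "name": "bob"}]

stream = merge(clicks, more_clicks)
stream = each(stream, lambda t: {"depth": t["url"].count("/")})
stream = keep(stream, lambda t: t["user"] == 1)
stream = join(stream, users, "user")
result = project(stream, ["name", "url"])
```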

Page 31

STATE
Operations:

Grouping
Aggregate
Query

Page 32

STATE TYPES
non-transactional
transactional
opaque transactional
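
The difference between the three state types shows up when a batch is replayed. The Python sketch below rests on two assumptions: every batch carries a transaction id (`txid`), and the state is a plain dictionary. The three update functions are hypothetical and only illustrate the bookkeeping each type needs.

```python
def non_transactional_update(state, txid, delta):
    """No bookkeeping: a replayed batch is simply counted again."""
    state["value"] += delta

def transactional_update(state, txid, delta):
    """Assumes a replayed batch has identical content: skip it if already applied."""
    if state.get("txid") == txid:
        return
    state["value"] += delta
    state["txid"] = txid

def opaque_update(state, txid, delta):
    """Tolerates a replayed batch with different content by redoing it from `prev`."""
    if state.get("txid") == txid:
        state["value"] = state["prev"] + delta
    else:
        state["prev"] = state["value"]
        state["value"] += delta
        state["txid"] = txid

plain = {"value": 0}
non_transactional_update(plain, 1, 5)
non_transactional_update(plain, 1, 5)   # replay of batch 1 is double counted

trans = {"value": 0}
transactional_update(trans, 1, 5)
transactional_update(trans, 1, 5)       # replay of batch 1 is ignored

opaque = {"value": 0, "prev": 0}
opaque_update(opaque, 1, 5)
opaque_update(opaque, 1, 7)             # batch 1 replayed with different content
```

After the replays, `plain` holds 10, `trans` holds 5 and `opaque` holds 7. The stronger guarantees cost extra fields stored next to each value.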

Page 33

STATE
In memory state
NoSQL databases
External systems via APIs

Page 34

DRPC

Page 35

DRPC TOPOLOGY
NAMED DRPC SPOUT
USES MAIN TOPOLOGY STATES
GENERATES ONE TUPLE OUTPUT
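
The request/response flow of DRPC can be sketched in a few lines of Python. This is a toy, in-process stand-in and not Storm's DRPC server: `ToyDrpcServer`, `count_topology` and the `word_counts` dictionary (standing in for state maintained by the main topology) are all hypothetical.

```python
import itertools

# Stand-in for state kept up to date by the main topology (made-up values).
word_counts = {"storm": 3, "data": 5}

def count_topology(request):
    """Toy DRPC topology: one (args, request_id) tuple in, exactly one result tuple out."""
    args, request_id = request
    total = sum(word_counts.get(word, 0) for word in args.split())
    return [(request_id, str(total))]

class ToyDrpcServer:
    """Hands each call a request id and matches the single result tuple back to it."""
    def __init__(self):
        self._next_id = itertools.count(1)
        self._topologies = {}

    def register(self, function_name, topology):
        self._topologies[function_name] = topology

    def execute(self, function_name, args):
        request_id = next(self._next_id)
        results = self._topologies[function_name]((args, request_id))
        if len(results) != 1 or results[0][0] != request_id:
            raise RuntimeError("expected exactly one result tuple for this request")
        return results[0][1]

server = ToyDrpcServer()
server.register("count", count_topology)
answer = server.execute("count", "storm data hadoop")
```

The named function plays the role of the named DRPC spout: the client only knows the name and the arguments, and gets back the one output tuple.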

Page 36

DRPC ELEMENTS
THRIFT SERVER(S)
WITH PREDEFINED SPOUT AND BOLT

Page 37

ARE YOU PROGRAMMING IN A NON-JVM LANGUAGE?
NO PROBLEM :)

Ruby
Python
Perl
PHP
...

Page 38

STREAMING API
API defined in Thrift
JSON-based communication
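
A non-JVM component talks to Storm by exchanging JSON messages over its standard input and output. The Python sketch below illustrates that style of exchange with in-memory streams. It assumes each message is a JSON document terminated by a line containing only `end`, and that the field names (`id`, `tuple`, `command`, `anchors`) follow Storm's multi-lang protocol; treat both as assumptions to check against the protocol documentation. `shout_bolt` is a hypothetical bolt.

```python
import io
import json

def write_message(stream, message):
    """Serialize one message as JSON followed by an 'end' line."""
    stream.write(json.dumps(message))
    stream.write("\nend\n")

def read_message(stream):
    """Read lines up to the next 'end' marker and parse them as one JSON message."""
    lines = []
    while True:
        line = stream.readline()
        if not line:
            return None  # stream closed before a complete message arrived
        line = line.rstrip("\n")
        if line == "end":
            return json.loads("\n".join(lines))
        lines.append(line)

def shout_bolt(incoming):
    """Hypothetical bolt: emit the upper-cased word, anchored to the input, then ack it."""
    word = incoming["tuple"][0]
    return [
        {"command": "emit", "anchors": [incoming["id"]], "tuple": [word.upper()]},
        {"command": "ack", "id": incoming["id"]},
    ]

# In-memory streams stand in for the component's stdin and stdout.
inbox, outbox = io.StringIO(), io.StringIO()
write_message(inbox, {"id": "42", "comp": "1", "stream": "default", "task": 9, "tuple": ["storm"]})
inbox.seek(0)

for reply in shout_bolt(read_message(inbox)):
    write_message(outbox, reply)

outbox.seek(0)
first, second = read_message(outbox), read_message(outbox)
```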

Page 39

RED STORM
Writing topologies in Ruby

Page 40

REAL TIME ALGORITHMS

Page 41

SIMPLE OPERATIONS
Sum
Count
Multiplication
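
These operations are simple in a streaming setting because each one can be updated from the previous result and the new value alone, without storing the stream. A minimal Python sketch, with `RunningAggregates` as a hypothetical name:

```python
class RunningAggregates:
    """Keeps sum, count and product up to date one value at a time."""
    def __init__(self):
        self.total = 0
        self.count = 0
        self.product = 1

    def update(self, value):
        self.total += value
        self.count += 1
        self.product *= value

    def mean(self):
        # Derived on demand from the two running values.
        return self.total / self.count if self.count else None

agg = RunningAggregates()
for value in [2, 3, 4]:
    agg.update(value)
```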

Page 42

MAXIMUM AND MINIMUM
don't lose current value
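
One way to read "don't lose current value": the extremes of a new batch must be merged with the stored result, never written over it. A small Python sketch of that merge, where `merge_extremes` is a hypothetical helper and the stored state is a `(minimum, maximum)` pair or `None`:

```python
def merge_extremes(stored, batch):
    """Combine the stored (min, max) with a new batch without discarding either."""
    if not batch:
        return stored                 # an empty batch must not reset the state
    low, high = min(batch), max(batch)
    if stored is None:
        return (low, high)
    return (min(stored[0], low), max(stored[1], high))

state = None
history = []
for batch in [[5, 3], [4], [], [9, 1]]:
    state = merge_extremes(state, batch)
    history.append(state)
```

Overwriting the state with `(min(batch), max(batch))` would have reported `(4, 4)` after the second batch and lost the earlier extremes.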

Page 43

USUALLY TWO TOPOLOGIES

Page 44

LEARNING
Classification
Clustering

Page 45

MODEL
Evaluator
Visualiser

Page 46

BASIC ELEMENT TABLE

Page 47

SIMPLE EXAMPLE

Page 48

ALGORITHM EXAMPLES
k-means clustering
statistical test (T, F, Z, Chi2)
Hidden Markov Models
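
As an illustration of the first item, k-means has a well-known online variant that suits streams: each arriving point moves only its nearest centroid, by a step of 1/n where n is the number of points that centroid has absorbed. The Python sketch below shows that sequential update. It is one possible formulation, not necessarily the one used in the talk, and the function names and sample points are assumptions.

```python
def nearest(centroids, point):
    """Index of the centroid with the smallest squared Euclidean distance to `point`."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((c - p) ** 2 for c, p in zip(centroids[i], point)),
    )

def online_kmeans_update(centroids, counts, point):
    """Assign `point` to its nearest centroid and move that centroid towards it."""
    i = nearest(centroids, point)
    counts[i] += 1
    rate = 1.0 / counts[i]  # the centroid stays the running mean of its points
    centroids[i] = [c + rate * (p - c) for c, p in zip(centroids[i], point)]
    return i

# Each centroid is seeded from one initial point, hence the starting counts of 1.
centroids = [[0.0, 0.0], [10.0, 10.0]]
counts = [1, 1]

assignments = [
    online_kmeans_update(centroids, counts, point)
    for point in [(1.0, 1.0), (9.0, 11.0), (0.0, 2.0)]
]
```

Only the centroids and their counts are kept between tuples, so the model fits naturally into bolt or Trident state.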

Page 49

ADVERT TIME :)

Page 50

STORMUNIT
http://github.com/webik/StormUnit

MAVEN MOJO - COMING SOON :)
http://github.com/webik/storm-maven

Page 51

WHAT NEXT...

Page 52

SUMMINGBIRD
Write once, run on:

Storm
Hadoop (Scalding)
Amazon Kinesis

Page 53

MAYBE BACK INTO THE ZOO
STORM YARN

Page 54

THANK YOU.

Page 55

QUESTIONS?