Jan 26, 2015
Apache Storm
A distributed, real-time computation system
Some content borrowed from Nathan Marz's presentation of a similar name
Ryan Lanman
Objectives
1. Their Motivation
2. Our Motivation
3. Storm Basics
4. Demo
Their Motivation
How Storm Came To Be
What They Wanted
• Guaranteed data processing
• Horizontal scalability
• Fault-tolerance
• No intermediate message brokers!
• Higher level abstraction than message passing
• “Just works”
Our Motivation
Why We (Eventually) Chose Storm
Lumify Ingest
Raw Data
Text Extraction
Entity Extraction
Text Highlighting
Location Extraction
Full Text Indexing
Issues
• No Reducers
• High DB Read/Writes
• Batch-style processing
• M/R Overhead
• Zero Fault Tolerance
What We Really Wanted
• Distributed, Stream-type Processing
• Simple Logical DAG
• Better Fault Tolerance
Storm Ingest Workflow
[Diagram nodes: Raw Data, Content Sorter, Documents, Video, Images, Text Extraction, Video Frame Splitting, Video Frame Text Extraction, Image Text Extraction, Text, …]
Storm Basics
What the heck’s a Topology?
Storm Cluster
[Diagram: 1 Nimbus node, 3 Zookeeper nodes, 5 Supervisor nodes]
Storm Data Concepts
• Tuples
• Streams
• Spouts
• Bolts
• Topologies
Tuples
• Single unit of data in Storm
• Examples
  – Tweet
  – User Activity Log Entry
  – File Info
Streams
An unbounded sequence of Tuples
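As a rough illustration of Tuples and Streams, the sketch below models a Tuple as a small named record and a Stream as a generator with no end. It is plain Python and not the Storm API; the names `Tweet` and `tweet_stream` are made up for this example.

```python
from collections import namedtuple
from itertools import count, islice

# Hypothetical tuple type: one tweet with two named fields.
Tweet = namedtuple("Tweet", ["user", "text"])

def tweet_stream():
    """Yield an endless sequence of Tweet tuples (a stand-in for a Stream)."""
    for i in count():
        yield Tweet(user=f"user{i % 3}", text=f"message number {i}")

# A stream never finishes, so a consumer only ever sees a finite slice of it.
first_three = list(islice(tweet_stream(), 3))
```

Because the generator never terminates, anything that consumes it has to process tuples one at a time as they arrive, which is the main difference from the batch-style processing described earlier.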
Spouts
Producers of Streams
Bolts
Process input streams to create new streams
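The Spout and Bolt roles can be sketched with ordinary Python generators: a spout produces a stream, and each bolt reads one stream and emits another. This is only a conceptual model, not Storm code; `sentence_spout`, `split_bolt`, and `filter_bolt` are hypothetical names, and the sample sentences are invented.

```python
from itertools import islice

def sentence_spout():
    """Spout stand-in: produces a stream of one-field tuples, forever."""
    sentences = ["storm is fast", "storm is fault tolerant"]
    i = 0
    while True:
        yield (sentences[i % len(sentences)],)
        i += 1

def split_bolt(stream):
    """Bolt stand-in: consumes sentence tuples, emits one tuple per word."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)

def filter_bolt(stream, stop_words):
    """Bolt stand-in: drops tuples whose word is in stop_words."""
    for (word,) in stream:
        if word not in stop_words:
            yield (word,)

# Chain spout -> bolt -> bolt and take the first five output tuples.
pipeline = filter_bolt(split_bolt(sentence_spout()), {"is"})
words = list(islice(pipeline, 5))
```

Each stage only knows about the stream it reads and the stream it writes, which is what lets stages be chained freely.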
Examples
Spout Examples
• HDFS Filesystem Spout
• Kafka Queue Spout
Bolt Examples
• Filtering
• Aggregation
• DB Operations
Topologies
[Diagram: a topology fed by three Spouts]
Demo
Demo Topology
[Diagram nodes: Twitter Hosebird Spout, SentenceSplitter, WordCount, Accumulo]
Stream groupings: Shuffle Grouping, Fields Grouping
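The two groupings named on the demo slide differ in how tuples are routed to the parallel tasks of the next component: shuffle grouping spreads tuples randomly, while fields grouping sends every tuple with the same field value to the same task. The sketch below simulates that routing in plain Python. It assumes shuffle grouping feeds SentenceSplitter and fields grouping (on the word) feeds WordCount, which is the usual word-count arrangement but is not spelled out in the slide text. The task counts, the CRC-based hash, and all function names are assumptions made for the example, and the Accumulo write is left out.

```python
import random
import zlib
from collections import Counter

NUM_SPLITTERS = 2  # assumed parallelism, chosen only for the example
NUM_COUNTERS = 3

def shuffle_task(rng, num_tasks):
    """Shuffle grouping: any task may receive the tuple."""
    return rng.randrange(num_tasks)

def fields_task(word, num_tasks):
    """Fields grouping: the same word always maps to the same task."""
    return zlib.crc32(word.encode("utf-8")) % num_tasks

def run(sentences, seed=0):
    rng = random.Random(seed)
    # Spout -> SentenceSplitter tasks, routed at random.
    splitter_inbox = [[] for _ in range(NUM_SPLITTERS)]
    for sentence in sentences:
        splitter_inbox[shuffle_task(rng, NUM_SPLITTERS)].append(sentence)
    # SentenceSplitter -> WordCount tasks, routed by the word itself.
    counters = [Counter() for _ in range(NUM_COUNTERS)]
    for inbox in splitter_inbox:
        for sentence in inbox:
            for word in sentence.lower().split():
                counters[fields_task(word, NUM_COUNTERS)][word] += 1
    return counters

counters = run(["the cow jumped", "the moon", "the cow"])
totals = sum(counters, Counter())
```

Fields grouping is what makes the per-task counts correct: since a given word only ever reaches one WordCount task, no two tasks hold partial counts for the same word.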