SKETCHY: A COMPLEX EVENT PROCESSING NETWORK FOR SPAM DETECTION. Matt Weiden / SoundCloud Ltd.
WHO?
My name is Matt Weiden. Pleased to meet you.
• Backend Engineer, SoundCloud’s Trust, Safety & Security Team
• Previously Cognitive Science, BCI research
Contributors
• Rany Keddo
• Michael Brückner
• Astera Schneeweisz
• Others
WHAT? INFERENCE FROM RELATED STREAMS OF DATA
The problem: How quickly and efficiently can we draw aggregate inferences from large streams of related events?
What inferences could we make? How quickly and efficiently can we make them?
[Figure: Posts, Views, and Follows over Time]
DRINKING FROM A FIREHOSE.
Performing this for a whole site might take a little more thought.
WHAT (MORE SPECIFICALLY)? INFERENCE FROM RELATED STREAMS OF DATA
How quickly and efficiently can we draw aggregate inferences from large streams of related events?
HOW? EVENT-DRIVEN ARCHITECTURE
Event-Driven Architecture (EDA)
• Near realtime
• Only process the data once*
• Operate on incremental sub-goal results
• Build ‘Complex Events’ by adding ‘Context’
• Asynchronous, pipelined parallelism
• Broadcast reusable events and complex events
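As a rough illustration of "only process the data once" and "incremental sub-goal results", the sketch below keeps running per-user counts that are updated as each event arrives, so an aggregate can be read at any time without re-scanning history. The event fields (`user`, `kind`) and the `follows_per_post` ratio are assumptions made for this example and are not taken from Sketchy.

```python
from collections import Counter, defaultdict


class RunningCounts:
    """Per-user counts of each event kind, updated once per event."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def observe(self, event):
        # Each event is touched exactly once; no batch re-scan is needed.
        self.counts[event["user"]][event["kind"]] += 1

    def follows_per_post(self, user):
        # Hypothetical aggregate derived from the incremental counts.
        c = self.counts[user]
        return c["follow"] / max(c["post"], 1)


events = [
    {"user": "a", "kind": "post"},
    {"user": "a", "kind": "follow"},
    {"user": "a", "kind": "follow"},
    {"user": "b", "kind": "view"},
]

agg = RunningCounts()
for e in events:
    agg.observe(e)
```

After the loop, `agg.follows_per_post("a")` reflects every event seen so far, and a new event only costs one counter update.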
HOW? EVENT PROCESSING NETWORK
Event Processing Networks (EPNs) implement EDA
• Represented as a directed acyclic graph of
  • Event producers
  • Event processing agents (EPAs)
    • enrich events
    • transform events into complex events
    • detect patterns
  • Event consumers
  • Event channels
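The four node types listed above can be sketched in a few lines. This is a minimal, synchronous toy and not Sketchy's API: `Channel`, `LengthAgent`, and `CollectingConsumer` are hypothetical names, and the "enrichment" (adding a text length) is only a placeholder for real business logic.

```python
class Channel:
    """Event channel: forwards each event to every subscribed node."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def publish(self, event):
        for handler in self.subscribers:
            handler(event)


class LengthAgent:
    """Hypothetical EPA: enriches a message event with its text length."""

    def __init__(self, out):
        self.out = out

    def __call__(self, event):
        self.out.publish({**event, "length": len(event["text"])})


class CollectingConsumer:
    """Event consumer: here it only records what it receives."""

    def __init__(self):
        self.seen = []

    def __call__(self, event):
        self.seen.append(event)


# Wire the graph: producer -> raw channel -> agent -> enriched channel -> consumer.
raw, enriched = Channel(), Channel()
raw.subscribe(LengthAgent(enriched))
consumer = CollectingConsumer()
enriched.subscribe(consumer)

# The producer introduces events into the network.
for text in ["hello", "buy now"]:
    raw.publish({"text": text})
```

Because nodes only know about channels, an agent can be swapped or added without touching the producer or the consumer.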
HOW? EVENT PROCESSING NETWORK
Sketchy is an EPN that implements EDA
• Prevents text and social graph spam at SoundCloud
• Open-source
• Modular
  • written as a flexible library, adaptable
  • many common components available out of the box
• Battle tested
  • ingests many sensitive event types at SoundCloud
HOW? EVENT PROCESSING NETWORK
Event producers introduce events into a network
[Diagram: Producer; Event Channels A, B, and C; Producer or EPA 2; Consumer or EPA 4]
HOW? EVENT PROCESSING NETWORK
Event channels route events through the network
[Diagram: Producer or EPA 1 → Event Channel → Consumer or EPA 3]
HOW? EVENT PROCESSING NETWORK
Event processing agents contain business logic
[Diagram: Event Processing Agent; Event Channels A and B on each side; DB 1; cache]
HOW? EVENT PROCESSING NETWORK
Event consumers act on the results of processing in the network
[Diagram: Consumer; Event Channels A, B, and C]
HOW? DO EPNs ACHIEVE EDA’s GOALS?
• Asynchronous, pipelined parallelism
The node-to-node flow allows parallel, asynchronous computation.
[Diagram: Producer or EPA 1 → Event Channel → Consumer or EPA 3]
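One way to picture the node-to-node flow is a pipeline of threads connected by queues, where each stage works on a different event at the same time. The sketch below uses only the Python standard library; the `stage` helper, the `STOP` sentinel, and the two string-cleaning steps are assumptions chosen for illustration and say nothing about how Sketchy schedules its agents.

```python
import queue
import threading

STOP = object()  # sentinel marking the end of the stream


def stage(fn, inbox, outbox):
    """Start a thread that applies fn to each event from inbox
    and forwards the result to outbox."""

    def run():
        while True:
            event = inbox.get()
            if event is STOP:
                outbox.put(STOP)  # pass the sentinel downstream
                return
            outbox.put(fn(event))

    t = threading.Thread(target=run)
    t.start()
    return t


# Queues play the role of event channels between nodes.
q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()
threads = [
    stage(str.strip, q1, q2),  # first agent: trim whitespace
    stage(str.lower, q2, q3),  # second agent: normalize case
]

# Producer: the second stage may already be working while we keep producing.
for text in ["  Hello ", " WORLD  "]:
    q1.put(text)
q1.put(STOP)

# Consumer: drain the last channel.
results = []
while True:
    item = q3.get()
    if item is STOP:
        break
    results.append(item)

for t in threads:
    t.join()
```

Each stage has a single worker reading a FIFO queue, so event order is preserved end to end.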
HOW? DO EPNs ACHIEVE EDA’s GOALS?
• Asynchronous, pipelined parallelism
  Source: http://www2.engr.arizona.edu/~ece462/Lec03-pipe/
• Build ‘Complex Events’ by putting events into the context in which they occur
  This is possible by aggregating events and/or summarizing them with data from external sources.
HOW? DO EPNs ACHIEVE EDA’s GOALS?
Abstract example of a complex event being created.
[Diagram: Event Processing Agent with DB 1 and cache; EVENT5 in, EVENT5 +context (E1 E2 E3 E4) out]
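The abstract example can be mimicked with an agent that remembers recent events per user and attaches them as context to each new event. This is only a sketch: `ContextAgent`, the `user` and `id` fields, and the in-memory `deque` standing in for the DB and cache are all assumptions made for the example.

```python
from collections import defaultdict, deque


class ContextAgent:
    """Hypothetical EPA that turns an event into a complex event
    by attaching earlier events from the same user."""

    def __init__(self, window=4):
        # Stand-in for the DB/cache: the last `window` events per user.
        self.recent = defaultdict(lambda: deque(maxlen=window))

    def process(self, event):
        history = self.recent[event["user"]]
        complex_event = {
            "event": event,
            "context": list(history),  # snapshot taken before adding the new event
        }
        history.append(event)
        return complex_event


agent = ContextAgent(window=4)
out = [agent.process({"user": "u1", "id": i}) for i in range(1, 6)]
```

Here the fifth result carries event 5 together with events 1 to 4 as its context, while the first result has an empty context.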
HOW? DO EPNs ACHIEVE EDA’s GOALS?
• Build ‘Complex Events’ by putting events into the context in which they occur
In Sketchy the bulk agent stores a text fingerprint context in memcached.
[Diagram: MSG 4 → bulkStatisticsAgent (stores fingerprint) → bulkDetectorAgent (finds similar fingerprints by Jaccard distance) → Bulk! complex event; fingerprints of M1 M2 M3 M4 are held in memcached]
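To make the fingerprint idea concrete, the sketch below fingerprints each message as a set of word shingles and flags a message as bulk when enough stored fingerprints lie within a Jaccard distance threshold. The slides do not say how Sketchy computes fingerprints, so the shingling scheme, the thresholds, and the `BulkDetector` class are assumptions, and a plain list stands in for memcached.

```python
def fingerprint(text, k=3):
    """Set of k-word shingles (an assumed fingerprint, not Sketchy's own)."""
    words = text.lower().split()
    if len(words) < k:
        return frozenset([" ".join(words)])
    return frozenset(" ".join(words[i:i + k]) for i in range(len(words) - k + 1))


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B|; 0.0 means identical sets, 1.0 means disjoint."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


class BulkDetector:
    """Hypothetical detector: a message is bulk if at least `min_similar`
    earlier messages are within `max_distance` of it."""

    def __init__(self, max_distance=0.5, min_similar=2):
        self.max_distance = max_distance
        self.min_similar = min_similar
        self.store = []  # stand-in for the fingerprints kept in memcached

    def observe(self, text):
        fp = fingerprint(text)
        similar = sum(
            1 for old in self.store
            if jaccard_distance(fp, old) <= self.max_distance
        )
        self.store.append(fp)
        return similar >= self.min_similar


messages = [
    "check out my new track here",
    "check out my new track now",
    "check out my new track today",
    "great set last night thanks",
]
detector = BulkDetector()
flags = [detector.observe(m) for m in messages]
```

With these inputs only the third message is flagged, because it is the first one that has two near-duplicates already stored.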
HOW? DO EPNs ACHIEVE EDA’s GOALS?
• Broadcast events and complex events wherever their reuse is possible
The event channel can send messages in this fashion.
[Diagram: Producer or EPA 1 and Producer or EPA 2 → Event Channel → Consumer or EPA 3 and Consumer or EPA 4]
A common use case in a Sketchy network.
[Diagram: messageCreateIngester; junkStatisticsAgent; junkDetectorAgent; signalEmitterAgent; rateLimiterAgent]
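Broadcasting means one ingested event is handed to several agents, each of which reuses it for its own purpose. In the sketch below the agent names echo the diagram, but their behaviour (a per-user message counter and a fixed per-user limit) is invented for illustration and is not how Sketchy's agents are implemented.

```python
from collections import Counter


class BroadcastChannel:
    """Delivers every published event to all subscribed handlers."""

    def __init__(self):
        self.handlers = []

    def subscribe(self, handler):
        self.handlers.append(handler)

    def publish(self, event):
        for handler in self.handlers:
            handler(event)


LIMIT = 2                      # hypothetical per-user message limit
messages_per_user = Counter()  # state of the statistics agent
seen_by_limiter = Counter()    # state of the rate limiter agent
limited_users = set()


def statistics_agent(event):
    # Reuses the event to maintain aggregate statistics.
    messages_per_user[event["user"]] += 1


def rate_limiter_agent(event):
    # Reuses the same event to enforce a simple limit.
    seen_by_limiter[event["user"]] += 1
    if seen_by_limiter[event["user"]] > LIMIT:
        limited_users.add(event["user"])


channel = BroadcastChannel()
channel.subscribe(statistics_agent)
channel.subscribe(rate_limiter_agent)

# The ingester publishes each message-create event once.
for user in ["u1", "u1", "u2", "u1"]:
    channel.publish({"user": user, "kind": "message_create"})
```

The ingester does not need to know how many agents are listening, so new detectors can be attached to the same channel later.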
SKETCHY@SOUNDCLOUD
[Diagram: messageCreateIngester; junkStatisticsAgent; junkDetectorAgent; signalEmitterAgent; rateLimiterAgent]
Storm is a framework for building EPNs at scale
MOVE SKETCHY’S LOGIC TO TWITTER’S STORM?
                      STORM                               Sketchy’s Network Components
Language              Scala                               Scala
Parallelism           Multiple workers on multiple hosts  Multiple workers on single host
Deployment            ‘Nimbus’ & Zookeeper                Bazooka
Messaging Guarantees  atLeastOnce, atMostOnce             Not yet
Hadoop Integration    Yes                                 No
LEARN MORE
• Event Processing Networks
  • Sharon and Etzion, “Event Processing Network, A Conceptual Model,” VLDB, 2007
• Sketchy
  • https://github.com/soundcloud/sketchy-core
• Storm
  • Toshniwal et al., “Storm@Twitter,” SIGMOD, 2014
  • https://storm.incubator.apache.org
THANK YOU. QUESTIONS?
Matt Weiden / SoundCloud Ltd. @mweiden, [email protected]