Cloud Event Processing Analyze ∙ Sense ∙ Respond CloudConnect March 8, 2011
Dec 23, 2014
CLOUDEVENTPROCESSING
Welcome
• High Velocity Big Data
• What is Complex Event Processing?
• Analyzing Time Series with SAX
• What is Map/Reduce?
• Correlating with Historical Data
• Using the Cloud
• Questions
Data Growth*
[Chart: data growth across Category 1 to Category 4; vertical axis 0 to 18]
*It would appear that things will actually get worse, not better
High Velocity Big Data
• What is Big Data?
  – You’ve got Big Data issues when you can’t turn the data into information fast enough to act on:
    • Earthquake
    • Brownout
    • Market Crash
    • Terrorist Event
  – You’ve got Big Data when you have to consider its actual physicality
• What is High Velocity Big Data?
  – Big Data In Flight…
    • You don’t get to store it before you analyze it
What is Complex Event Processing?
• Complex Event Processing (CEP) delivers high-speed processing of many events across all the layers of an organization, identifying only the most meaningful events within the event cloud, analyzing their impact, and taking subsequent action in real time.
  – From Wikipedia
What? What is CEP?
• Domain Specific Language
  – Makes it easier to deal with events
• Continuous Query
  – Select symbol, side, price from tradeStream
• Time/Length Windows
  – Select symbol, side, avg(price) from tradeStream.win:time(10 minutes) group by symbol, side
• Pattern Matching
  – select a.* from pattern [every a=FIXNewOrderSingle -> (timer:interval(30 seconds) and not FIXNewOrderSingle(a.Side!=Side and a.OrderQty = OrderQty and a.Symbol = Symbol))]
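The time-window query above (average price per symbol and side over the last 10 minutes) can be sketched in plain Python to show the kind of state a CEP engine keeps behind such a query. This is a minimal sketch, not how any particular engine implements it: the class name `TimeWindowAverage`, its method names, and the assumption that events arrive in timestamp order are all illustrative.

```python
from collections import defaultdict, deque


class TimeWindowAverage:
    """Average price per (symbol, side) over a sliding time window.

    Hypothetical helper for illustration; assumes events arrive in
    non-decreasing timestamp order (timestamps in seconds).
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()           # (timestamp, key, price), oldest first
        self.sums = defaultdict(float)  # running price total per key
        self.counts = defaultdict(int)  # number of live events per key

    def on_event(self, timestamp, symbol, side, price):
        """Add one trade, expire old trades, and return the current averages."""
        key = (symbol, side)
        self.events.append((timestamp, key, price))
        self.sums[key] += price
        self.counts[key] += 1
        self._expire(timestamp)
        return {k: self.sums[k] / self.counts[k] for k in self.counts}

    def _expire(self, now):
        # Drop every event that has fallen out of the window.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, key, price = self.events.popleft()
            self.sums[key] -= price
            self.counts[key] -= 1
            if self.counts[key] == 0:
                del self.counts[key]
                del self.sums[key]


window = TimeWindowAverage(window_seconds=600)  # 10 minutes
window.on_event(0, "IBM", "BUY", 100.0)
print(window.on_event(60, "IBM", "BUY", 102.0))  # {('IBM', 'BUY'): 101.0}
```

Every event inside the window stays in memory, which is why many windows over many events can exhaust a single machine.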
Wouldn’t It Be Cool
• Select * from everything where itsInteresting = toMe in last 10 minutes;
• Select * from everything where earthQuake > .8;
• Select * from everything where terroristsWillStrike > .9;
CEP – Current Benefits*
• Really Fast!
• Low Latency!
• Provides a ‘ready made’ framework to build real-time pattern matching applications
• Think at a higher level
  – Productivity
*your mileage may vary, widely
CEP – Current Limitations
• Memory Bound
  – If you have a lot of events and windows, you risk running out of memory on a single machine
• Compute Bound
  – To ensure high throughput and low latency, most CEP engines are actually doing simplistic things
    • e.g. Filtering events
• Black Box
  – What’s going on in there?
Checkpoint
• Ok, so by using Complex Event Processing
  – You can analyze data in flight
  – But
    • You’re constrained by:
      – Available compute
      – Memory
• Because there’s still too much data to process on one machine…
The Problem With Time Series
• Dimensionality
  – How can I recognize something?
• Distance Measures
  – How do I find similar occurrences?
• Time
  – By the time I process the data, the information has little value…
Symbolic Aggregate Approximation
• SAX reduces numerical data to a short string, or SAX word.
• Thousands of data points of numerical, continuous data become ‘ABCEDEFGH’
• The SAX approximation of the data fits in main memory, yet retains features of interest
• Creating SAX words from historical and streaming data allows us to perform all kinds of magic…
[Figure: SAX Encoding. A time series (x-axis 0 to 120) is cut into segments labeled a, b, or c, giving the SAX word baabccbc]
SAX Advantages:
• Patterns identified and described using SAX actually look like the underlying data
• Other algorithms sometimes don’t actually describe the underlying patterns or take way too much work to be useful in real time
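The encoding described on this slide can be sketched in a few lines. The sketch below follows the usual published SAX recipe (z-normalize, average into equal segments with Piecewise Aggregate Approximation, then map each segment mean to a letter using breakpoints that split a standard normal distribution into equally likely regions). The function names and the requirement that the series length be a multiple of the segment count are simplifying assumptions, not something the slides specify.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def breakpoints(alphabet_size):
    """Cut points dividing a standard normal into equally likely regions."""
    normal = NormalDist()
    return [normal.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


def znormalize(values):
    """Rescale to zero mean and unit standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # a flat series has no shape to encode
    return [(v - mu) / sigma for v in values]


def paa(values, segments):
    """Piecewise Aggregate Approximation: the mean of each equal-sized segment."""
    if len(values) % segments:
        raise ValueError("series length must be a multiple of the segment count")
    size = len(values) // segments
    return [mean(values[i * size:(i + 1) * size]) for i in range(segments)]


def sax_word(values, segments, alphabet_size):
    """Encode a numeric series as a SAX word of `segments` letters."""
    cuts = breakpoints(alphabet_size)
    reduced = paa(znormalize(values), segments)
    # bisect_right counts the cut points at or below v, which is the letter index.
    return "".join(chr(ord("a") + bisect_right(cuts, v)) for v in reduced)


print(sax_word([1, 2, 3, 4, 5, 6, 7, 8], segments=4, alphabet_size=4))  # abcd
```

A steadily rising series becomes `abcd`, so the word still reads like the shape of the data it came from.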
SAX – 5 Use Cases
• Indexing
  – Given a time series, find similar time series in the database
• Clustering
  – Find natural groupings in the time series
• Classification
  – Automagically sort patterns found in time series into categories
• Summarization
  – Condense verbose data into meaningful information
• Anomaly Detection
  – Find surprising, interesting, or unexpected behavior
Why SAX is Cool
• Lower Bounding
  – The distance between two SAX words never exceeds the distance between the underlying (normalized) series, so the patterns identified and described using SAX actually look like the underlying data
• Dimensionality Reduction
  – Previously intractable problems become possible in real time
• Other algorithms sometimes don’t describe underlying patterns, or take way too much work to be useful in real time
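Lower bounding is usually expressed through the MINDIST measure from the SAX literature: a distance computed on two SAX words that is guaranteed not to exceed the Euclidean distance between the z-normalized series they were built from. The sketch below is an illustration under assumptions: words use the letters `a`, `b`, `c`, … in order, both words share the same length and alphabet size, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def breakpoints(alphabet_size):
    """Cut points dividing a standard normal into equally likely regions."""
    normal = NormalDist()
    return [normal.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


def mindist(word_a, word_b, original_length, alphabet_size):
    """MINDIST between two SAX words built from series of `original_length` points."""
    if len(word_a) != len(word_b):
        raise ValueError("SAX words must have the same length")
    cuts = breakpoints(alphabet_size)
    total = 0.0
    for letter_a, letter_b in zip(word_a, word_b):
        low, high = sorted((ord(letter_a) - ord("a"), ord(letter_b) - ord("a")))
        # Identical or adjacent letters contribute nothing; otherwise use the
        # gap between the breakpoints that separate the two regions.
        if high - low > 1:
            total += (cuts[high - 1] - cuts[low]) ** 2
    return sqrt(original_length / len(word_a)) * sqrt(total)


print(mindist("abcd", "abcd", original_length=8, alphabet_size=4))  # 0.0
print(mindist("aa", "dd", original_length=8, alphabet_size=4))
```

Because the measure never overestimates, a search can discard any candidate whose MINDIST already exceeds the threshold without touching the raw data.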
A Day’s Worth of IBM
Normalized & PAA Applied
And Finally, SAX
[Figure: the series labeled with SAX symbols; fragments visible in the image: ED D, BC C, ABCCE, FG, EDDCCBC]
Checkpoint
• We’ve reduced dimensionality
• We know where we are
  – The current pattern is AABASDGF
• We’re calculating it in ‘real-time’*
  – Using Complex Event Processing
• But
  – There’s still too much data to process on one machine…
• How can we process more data in the same amount of time?
*I much prefer the term event-driven
What is Map/Reduce?
• Framework for processing ginormous datasets using a large number of computers (nodes) in a cluster.
• "Map": Master node takes the input, chops it up into smaller sub-problems, and distributes those to worker nodes. The worker node processes that smaller problem, and passes the answer back to its master node.
• "Reduce": Takes the answers to all the sub-problems and combines them in a way to get the output - the answer to the problem it was originally trying to solve.
  – From Wikipedia
What? What is Map/Reduce?
• WordCount Example (classic)
  – Map scans text for words and emits {word,1}
  – Combine collapses key values on the same node: {word,1,1,1} -> {word,3}
  – Shuffle/Sort merges results from different nodes
    • {node A,"NoSQL",50} {node B,"Oracle",50} {node B,"NoSQL",50} becomes
    • {node A,"NoSQL",50} {node B,"NoSQL",50} {node B,"Oracle",50}
  – Reduce
    • Outputs {"NoSQL",100} {"Oracle",50}
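The four WordCount steps above can be traced in a single process. This is only a sketch of the data flow, with two in-memory strings standing in for the text held on node A and node B; the function names are illustrative and no real Map/Reduce framework is involved.

```python
from collections import Counter, defaultdict


def map_phase(text):
    """Map: emit a (word, 1) pair for every word in the text."""
    return [(word, 1) for word in text.split()]


def combine(pairs):
    """Combine: collapse the pairs produced on one node into per-word totals."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return list(totals.items())


def shuffle(node_outputs):
    """Shuffle/Sort: group the counts from every node by word, sorted by word."""
    grouped = defaultdict(list)
    for pairs in node_outputs:
        for word, count in pairs:
            grouped[word].append(count)
    return dict(sorted(grouped.items()))


def reduce_phase(grouped):
    """Reduce: sum the grouped counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


# Stand-ins for the text stored on two nodes.
node_a = combine(map_phase("NoSQL " * 50))
node_b = combine(map_phase("Oracle " * 50 + "NoSQL " * 50))
print(reduce_phase(shuffle([node_a, node_b])))  # {'NoSQL': 100, 'Oracle': 50}
```

In a real cluster the map and combine steps run where the data lives, and only the much smaller combined pairs cross the network during the shuffle.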
SAX and Map/Reduce
• SAX is an ‘embarrassingly parallel’ problem
• Using parallel processing allows SAX words to be computed more quickly
• Using Streaming Map/Reduce provides results even faster, increasing the value of data even more
  – Partition by symbol and sort by timestamp
  – Calculate SAX words for each symbol, in parallel
• CEP Time Windows to the Rescue!
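The two steps named above (partition by symbol and sort by timestamp, then compute one SAX word per symbol in parallel) might look like the sketch below. It makes several assumptions: ticks are `(timestamp, symbol, price)` tuples, one window's worth of ticks is already in hand, and a local thread pool stands in for the worker nodes of a cluster. The compact `sax_word` helper is the standard encoding written inline so the example runs on its own.

```python
from bisect import bisect_right
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from statistics import NormalDist, mean, pstdev


def sax_word(values, segments, alphabet_size):
    """Compact SAX encoding; assumes len(values) is a multiple of segments."""
    mu, sigma = mean(values), pstdev(values)
    normalized = [(v - mu) / sigma if sigma else 0.0 for v in values]
    size = len(normalized) // segments
    cuts = [NormalDist().inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]
    return "".join(
        chr(ord("a") + bisect_right(cuts, mean(normalized[i * size:(i + 1) * size])))
        for i in range(segments)
    )


def partition_by_symbol(ticks):
    """Map step: route every (timestamp, symbol, price) tick to its symbol."""
    partitions = defaultdict(list)
    for timestamp, symbol, price in ticks:
        partitions[symbol].append((timestamp, price))
    return partitions


def reduce_symbol(item, segments=4, alphabet_size=4):
    """Reduce step: sort one symbol's ticks by timestamp and encode the prices."""
    symbol, points = item
    prices = [price for _, price in sorted(points)]
    return symbol, sax_word(prices, segments, alphabet_size)


def sax_by_symbol(ticks):
    """Compute one SAX word per symbol, one task per partition."""
    partitions = partition_by_symbol(ticks)
    with ThreadPoolExecutor() as pool:  # stand-in for a cluster of workers
        return dict(pool.map(reduce_symbol, partitions.items()))


ticks = [(t, "IBM", float(t + 1)) for t in range(8)]
ticks += [(t, "AAPL", float(8 - t)) for t in range(8)]
ticks.reverse()  # arrival order does not matter; each partition is sorted
print(sax_by_symbol(ticks))
```

No partition needs anything from another partition, which is what makes the problem embarrassingly parallel.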
Checkpoint
• CEP is great, but I still have to tell it what I’m looking for, right?
• SAX can help us reduce dimensionality. What else can it do for us?
• How do I relate Streaming Data to Historical Data?
• How do I do this while the Information still has value?
High Velocity Big Data Pattern
[Diagram. Labels on the streaming side: OnRamp, Events, Map (×3), SAX, Reduce, Context. Labels on the historical side: Historical Events, Map (×3), Reduce]
So What Do We Need?
• Complex Event Processing
• The Algorithm (SAX)
• Processing Model – Streaming Map/Reduce
• Context – The Historical Aspect
• What Do We Call This?
What is DarkStar?
  – Platform as a Service (PaaS)
    • Provides Distributed
      – Complex Event Processing
      – Streaming Map/Reduce
      – Messaging
      – Web Services
      – Monitoring/Management
  – Applications are built on top, or inside
    • SAX runs inside of DarkStar
      – SAX is not a component of DarkStar, but an add-in library
  – And deployed in a cluster
    • Virtualized Resources
DarkStar
• What patterns are occurring in my data, right now?
  – CEP based streaming Map/Reduce
    • Use a cluster of machines
• When did this pattern happen before?
  – Database with embedded Map/Reduce
    • No need to move data outside the database for processing
The Cloud
• Elastic Resource
  – Grows/Shrinks according to demand
• Virtualization
  – Efficient utilization of compute
• The Previously Unthinkable
  – Is now possible, if not already commonplace
• Peering can provide access to Big Pipes and Secure Data
Thank You!
• Questions?
• Contact Me
  – Colin Clark
  – @EventCloudPro
  – [email protected]