WSO2 Complex Event Processor
Sriskandarajah Suhothayan (Suho), Srinath Perera
WSO2 Inc.
Outline
• Big Data
• Complex Event Processing
• Basic Constructs of Query Language
• CEP Solution Patterns
• Scale, HA and Performance
• Demo
Why is Big Data hard?
• How to store? Assuming 1 TB per computer, it takes 1000 computers to store 1 PB
• How to move? Assuming a 10 Gb/s network, it takes about 13 minutes to copy 1 TB, or about 9 days to copy 1 PB
• How to search? Assuming each record is 1 KB and one machine can process 1000 records per second, it takes about 277 CPU hours (roughly 11.6 days) to process 1 TB and about 32 CPU years to process 1 PB
• How to process?
  – How to convert algorithms to work at large scale
  – How to create new algorithms
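The estimates above are simple arithmetic, so they can be checked with a short sketch. It assumes decimal units (1 TB = 10^12 bytes), a link that sustains its full line rate, and one machine scanning 1 KB records at 1000 records per second; the function names are illustrative only.

```python
# Back-of-envelope estimates; decimal units (1 TB = 10**12 bytes) are an assumption.
TB = 10**12
PB = 10**15


def copy_seconds(size_bytes, link_bits_per_sec):
    """Time to move size_bytes over a link running at its full line rate."""
    return size_bytes * 8 / link_bits_per_sec


def scan_seconds(size_bytes, record_bytes=1000, records_per_sec=1000):
    """Time for one machine to process every record exactly once."""
    return size_bytes / record_bytes / records_per_sec


if __name__ == "__main__":
    for label, size in (("1 TB", TB), ("1 PB", PB)):
        copy_hours = copy_seconds(size, 10e9) / 3600
        scan_days = scan_seconds(size) / 86400
        print(f"{label}: copy at 10 Gb/s takes {copy_hours:.1f} h, "
              f"single-machine scan takes {scan_days:.1f} CPU days")
```

Real transfers rarely reach line rate, so the copy times are best read as lower bounds.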
http://www.susanica.com/photo/9
Why is it hard (Contd.)?
• Systems built of many computers
• That handle lots of data
• Running complex logic
• This pushes us to the frontier of Distributed Systems and Databases
• More data does not mean there is a simple model
• Some models can be as complex as the system
http://www.flickr.com/photos/mariachily/5250487136, Licensed CC
Big data Processing Technologies Landscape
WSO2 Bigdata Offerings
Scenarios of Event Processing
[Figure: application classes plotted by latency (months, days, hours, minutes, seconds, 100 ms, < 1 ms) against aggregate data rate (0 to ~1M events/second): Relational Database Applications, Operational Analytics Applications, Monitoring Applications, Financial Trading Applications, Manufacturing Applications, Data Warehousing Applications, Web Analytics Applications]
CEP Target Scenarios
CEP Is & Is NOT!
• Is NOT!
  • Simple filters
    • Simple Event Processing
    • E.g. Is this a gold or platinum customer?
  • Joining multiple event streams
    • Event Stream Processing
• Is!
  • Processing multiple event streams
  • Identifying meaningful patterns among streams
  • Using temporal windows
    • E.g. Notify if there is a 10% increase in overall trading activity AND the average price of commodities has fallen 2% in the last 4 hours
What is ?
Query Functions of CEP
• Filter
• Transformation
• Window + { Aggregation, group by }
• Join
• Event Sequence
• Event Table
CEP Architecture
Event Streams
• An event stream is a sequence of events
• Event streams are defined by Stream Definitions
• Event streams have inflows and outflows
• Inflows can be from
  • Event builders: convert incoming XML, JSON, etc. events to an event stream
  • Execution plans
• Outflows are to
  • Event formatters: convert an event stream to XML, JSON, etc. events
  • Execution plans
Stream Definition
{
  'name':'phone.retail.shop',
  'version':'1.0.0',
  'nickName': 'Phone_Retail_Shop',
  'description': 'Phone Sales',
  'metaData':[
    {'name':'clientType','type':'STRING'}
  ],
  'correlationData':[
    {'name':'transactionID','type':'STRING'}
  ],
  'payloadData':[
    {'name':'brand','type':'STRING'},
    {'name':'quantity','type':'INT'},
    {'name':'total','type':'INT'},
    {'name':'user','type':'STRING'}
  ]
}
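To make the role of a stream definition concrete, here is a minimal sketch that checks an incoming event against the definition above. It is not the WSO2 CEP API: `validate_event`, `PYTHON_TYPES`, and the nested-dict event layout are assumptions made for illustration.

```python
# Hypothetical helper; the real CEP server validates events internally.
STREAM_DEFINITION = {
    "name": "phone.retail.shop",
    "version": "1.0.0",
    "metaData": [{"name": "clientType", "type": "STRING"}],
    "correlationData": [{"name": "transactionID", "type": "STRING"}],
    "payloadData": [
        {"name": "brand", "type": "STRING"},
        {"name": "quantity", "type": "INT"},
        {"name": "total", "type": "INT"},
        {"name": "user", "type": "STRING"},
    ],
}

# Assumed mapping from the definition's type names to Python types.
PYTHON_TYPES = {"STRING": str, "INT": int}


def validate_event(definition, event):
    """Return a list of problems; an empty list means the event fits the stream."""
    errors = []
    for section in ("metaData", "correlationData", "payloadData"):
        values = event.get(section, {})
        for attr in definition.get(section, []):
            name, expected = attr["name"], PYTHON_TYPES[attr["type"]]
            if name not in values:
                errors.append(f"{section}.{name} is missing")
            elif not isinstance(values[name], expected):
                errors.append(f"{section}.{name} should be {attr['type']}")
    return errors


if __name__ == "__main__":
    event = {
        "metaData": {"clientType": "mobile"},
        "correlationData": {"transactionID": "T-1"},
        "payloadData": {"brand": "Nokia", "quantity": 2, "total": 600, "user": "sam"},
    }
    print(validate_event(STREAM_DEFINITION, event))
```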
Event Format
• Standard event formats are available for
  • XML
  • JSON
  • Text
  • Map
  • WSO2 Event
• If events adhere to the standard format, they do not need data mapping.
• If events do not adhere to the standard format, custom event mapping should be configured in the Event Builder and Event Formatter as appropriate.
Event Format
Standard XML event format
<events>
  <event>
    <metaData>
      <tenant_id>2</tenant_id>
    </metaData>
    <correlationData>
      <activity_id>ID5</activity_id>
    </correlationData>
    <payloadData>
      <clientPhoneNo>0771117673</clientPhoneNo>
      <clientName>Mohanadarshan</clientName>
      <clientResidenceAddress>15, Alexendra road, California</clientResidenceAddress>
      <clientAccountNo>ACT5673</clientAccountNo>
    </payloadData>
  </event>
</events>
CEP Execution Plan
● Is an isolated logical execution unit
● Each execution plan imports some of the event streams available in CEP and defines the execution logic using queries and exports the results as output event streams.
● Has one-to-one relationship with CEP Backend Runtime.
● Has many-to-many relationship with Event Streams.
● Each execution plan spawns a Siddhi Engine Instance.
CEP Solution patterns
1. Transformation - project, translate, enrich, split
2. Filter
3. Composition / Aggregation / Analytics
   ● basic stats, group by, moving averages
4. Join multiple streams
5. Detect patterns
   ● Coordinating events over time
   ● Trends - increasing, decreasing, stable, non-increasing, non-decreasing, mixed
6. Blacklisting
7. Building a profile
Siddhi Query Structure
define stream <event stream>(<attribute> <type>,<attribute> <type>, ...);
from <event stream>
select <attribute>,<attribute>, ...
insert into <event stream> ;
Siddhi Query : Projection
define stream TempStream(deviceID long, roomNo int, temp double);
from TempStream
select roomNo, temp
insert into OutputStream ;
Siddhi Query : Inferred Streams
from TempStream
select roomNo, temp
insert into OutputStream ;
define stream OutputStream(roomNo int, temp double);
Siddhi Query : Enrich
from TempStream
select roomNo, temp, 'C' as scale
insert into OutputStream ;
define stream OutputStream(roomNo int, temp double, scale string);
Siddhi Query : Enrich
from TempStream
select deviceID, roomNo, avg(temp) as avgTemp
insert into OutputStream ;
Siddhi Query : Transformation
from cseEventStream[price >= 20 and symbol=='IBM']
select symbol, volume
insert into StockQuote ;

from TempStream
select concat(deviceID, '-', roomNo) as uid, toFahrenheit(temp) as tempInF, 'F' as scale
insert into OutputStream ;
Siddhi Query : Split
from TempStream
select roomNo, temp
insert into RoomTempStream ;

from TempStream
select deviceID, temp
insert into DeviceTempStream ;
Siddhi Query : Filter
from TempStream [temp > 30.0 and roomNo != 2043]
select roomNo, temp
insert into HotRoomsStream ;
Siddhi Query : Window
from TempStream
select roomNo, avg(temp) as avgTemp
insert into HotRoomsStream ;
Siddhi Query : Window
from TempStream#window.time(1 min)
select roomNo, avg(temp) as avgTemp
insert into HotRoomsStream ;
Siddhi Query : Window
from TempStream#window.time(1 min)
select roomNo, avg(temp) as avgTemp
group by roomNo
insert into HotRoomsStream ;
Siddhi Query : Batch Window
from TempStream#window.timeBatch(5 min)
select roomNo, avg(temp) as avgTemp
group by roomNo
insert into HotRoomsStream ;
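The difference between a sliding time window and a batch window can be illustrated outside Siddhi. The sketch below is an assumption-laden approximation, not Siddhi's implementation: events carry explicit timestamps in seconds, the sliding window expires events that are `window_secs` old or older, and batches are aligned to multiples of `batch_secs`.

```python
from collections import defaultdict, deque


def sliding_avg(events, window_secs):
    """events: time-ordered (timestamp_secs, room_no, temp) tuples.
    After every arrival, yield (room_no, average temp of that room in the window)."""
    window = deque()
    for ts, room, temp in events:
        window.append((ts, room, temp))
        # Drop events that have fallen out of the sliding window.
        while window and window[0][0] <= ts - window_secs:
            window.popleft()
        temps = [t for _, r, t in window if r == room]
        yield room, sum(temps) / len(temps)


def batch_avg(events, batch_secs):
    """Yield one {room_no: average temp} dict per batch, only when a batch closes."""
    current, temps = None, defaultdict(list)
    for ts, room, temp in events:
        bucket = ts // batch_secs
        if current is not None and bucket != current:
            yield {r: sum(v) / len(v) for r, v in temps.items()}
            temps = defaultdict(list)
        current = bucket
        temps[room].append(temp)
    if temps:  # flush the last, still open batch
        yield {r: sum(v) / len(v) for r, v in temps.items()}


if __name__ == "__main__":
    readings = [(0, 1, 20.0), (30, 1, 30.0), (90, 1, 40.0)]
    print(list(sliding_avg(readings, 60)))
    print(list(batch_avg(readings, 60)))
```

The sliding version emits an updated average for every incoming event, while the batch version stays silent until a whole batch has been collected.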
Siddhi Query : Join
define stream TempStream (deviceID long, roomNo int, temp double);
define stream RegulatorStream (deviceID long, roomNo int, isOn bool);
from TempStream[temp > 30.0]#window.time(1 min) as T
  join RegulatorStream[isOn == false]#window.length(1) as R
  on T.roomNo == R.roomNo
select T.roomNo, R.deviceID, 'start' as action
insert into RegulatorActionStream ;
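One way to read the join query is as two small buffers that are matched whenever either side receives an event. The following sketch mimics that reading under stated assumptions: events arrive as tagged tuples with explicit timestamps, and the function name and tuple layout are hypothetical rather than part of Siddhi.

```python
from collections import deque


def join_hot_rooms(events, temp_window_secs=60):
    """events: time-ordered tuples, either
         ("temp", ts, device_id, room_no, temp) or
         ("reg",  ts, device_id, room_no, is_on).
    Returns (room_no, regulator_device_id, "start") actions."""
    hot = deque()    # time window: (ts, room_no) of readings above 30.0
    last_off = None  # length(1) window: latest switched-off regulator event
    actions = []
    for kind, ts, device_id, room_no, value in events:
        # Expire hot readings that have left the time window.
        while hot and hot[0][0] <= ts - temp_window_secs:
            hot.popleft()
        if kind == "temp" and value > 30.0:
            hot.append((ts, room_no))
            if last_off is not None and last_off[1] == room_no:
                actions.append((room_no, last_off[0], "start"))
        elif kind == "reg" and not value:
            last_off = (device_id, room_no)
            actions.extend((r, device_id, "start") for _, r in hot if r == room_no)
    return actions


if __name__ == "__main__":
    stream = [
        ("reg", 0, 7, 101, False),
        ("temp", 10, 1, 101, 35.0),
        ("temp", 20, 2, 102, 40.0),
        ("reg", 30, 8, 102, False),
    ]
    print(join_hot_rooms(stream))
```

As in the query, the filters are applied before the windows, so a regulator event with `isOn == true` never replaces the stored switched-off event.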
Siddhi Query : Detect Trend
from t1=TempStream, t2=TempStream [t1.temp < t2.temp and t1.deviceID == t2.deviceID]+
  within 5 min
select t1.temp as initialTemp, t2.temp as finalTemp, t1.deviceID, t1.roomNo
insert into IncreaingHotRoomsStream ;
Siddhi Query : Partition
define partition Device by TempStream.deviceID ;
define partition Temp by
  range TempStream.temp <= 0 as 'ICE',
  range TempStream.temp > 0 and TempStream.temp < 100 as 'WATER',
  range TempStream.temp > 100 as 'VAPOUR' ;
Siddhi Query : Detect Trend per Partition
define partition Device by TempStream.deviceID ;
from t1=TempStream, t2=TempStream [t1.temp < t2.temp and t1.deviceID == t2.deviceID]+
  within 5 min
select t1.temp as initialTemp, t2.temp as finalTemp, t1.deviceID, t1.roomNo
insert into IncreaingHotRoomsStream
partition by Device ;
Siddhi Query : Detect Pattern
define stream Purchase (price double, cardNo long,place string);
from every (a1 = Purchase[price < 10] -> a3= ..) -> a2 = Purchase[price >10000 and a1.cardNo == a2.cardNo]
  within 1 day
select a1.cardNo as cardNo, a2.price as price, a2.place as place
insert into PotentialFraud ;
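The idea behind this pattern (a small purchase followed by a very large one on the same card within a day) can be sketched in plain code. The version below is a simplification that ignores the elided `a3` step of the query; the function name and the tuple layout are assumptions made for illustration.

```python
DAY_SECS = 24 * 60 * 60


def potential_fraud(purchases, within_secs=DAY_SECS):
    """purchases: time-ordered (timestamp_secs, price, card_no, place) tuples.
    Returns (card_no, price, place) for each large purchase that follows a
    small purchase on the same card within within_secs."""
    pending = {}  # card_no -> timestamps of small purchases awaiting a match
    alerts = []
    for ts, price, card_no, place in purchases:
        if price < 10:
            pending.setdefault(card_no, []).append(ts)
        elif price > 10000:
            starts = pending.pop(card_no, [])
            # One alert per small purchase still inside the time bound.
            alerts.extend(
                (card_no, price, place)
                for start in starts
                if ts - start <= within_secs
            )
    return alerts


if __name__ == "__main__":
    history = [
        (0, 5.0, 111, "shop"),
        (10, 8.0, 222, "shop"),
        (3600, 15000.0, 111, "online"),
        (10 + 2 * DAY_SECS, 12000.0, 222, "online"),
    ]
    print(potential_fraud(history))
```

Card 222 raises no alert in the usage example because its large purchase arrives two days after the small one, outside the one-day bound.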
Siddhi Query : Define Event Table
define table CardUserTable (name string, cardNum long) ;
define table CardUserTable (name string, cardNum long)
  from ('datasource.name'='CardDataSource', 'table.name'='UserTable', 'caching.algorithm'='LRU') ;
Cache types supported
● Basic: A size-based algorithm based on FIFO.
● LRU (Least Recently Used): The least recently used event is dropped when the cache is full.
● LFU (Least Frequently Used): The least frequently used event is dropped when the cache is full.
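As an illustration of the LRU policy listed above, here is a minimal cache sketch built on Python's `OrderedDict`. It only demonstrates the eviction rule; the class name and methods are hypothetical and say nothing about how the CEP event table cache is actually implemented.

```python
from collections import OrderedDict


class LRUCache:
    """Keeps at most `capacity` rows; evicts the least recently used one."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._rows = OrderedDict()  # oldest use first, most recent use last

    def get(self, key):
        if key not in self._rows:
            return None
        self._rows.move_to_end(key)  # a read counts as a use
        return self._rows[key]

    def put(self, key, value):
        self._rows[key] = value
        self._rows.move_to_end(key)
        if len(self._rows) > self.capacity:
            self._rows.popitem(last=False)  # drop the least recently used row


if __name__ == "__main__":
    cache = LRUCache(2)
    cache.put(1001, "Alice")
    cache.put(1002, "Bob")
    cache.get(1001)           # 1001 is now the most recently used
    cache.put(1003, "Carol")  # evicts 1002
    print(cache.get(1002), cache.get(1001))
```

A FIFO ("Basic") cache would differ only in that reads do not refresh an entry's position.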
Siddhi Query : Query Event Table
define stream Purchase (price double, cardNo long, place string);
define table CardUserTable (name string, cardNum long) ;
from Purchase#window.length(1) join CardUserTable
  on Purchase.cardNo == CardUserTable.cardNum
select Purchase.cardNo as cardNo, CardUserTable.name as name, Purchase.price as price
insert into PurchaseUserStream ;
Siddhi Query : Insert into Event Table
define stream FraudStream (price double, cardNo long, userName string);
define table BlacklistedUserTable (name string, cardNum long) ;
from FraudStream
select userName as name, cardNo as cardNum
insert into BlacklistedUserTable ;
Siddhi Query : Update into Event Table
define stream LoginStream (userID string, islogin bool, loginTime long);
define table LastLoginTable (userID string, time long) ;
from LoginStream
select userID, loginTime as time
update LastLoginTable
  on LoginStream.userID == LastLoginTable.userID ;
Siddhi Extensions
● Function extension
● Aggregator extension
● Window extension
● Transform extension
Siddhi Query : Function Extension
from TempStream
select deviceID, roomNo, custom:toKelvin(temp) as tempInKelvin, 'K' as scale
insert into OutputStream ;
Siddhi Query : Aggregator Extension
from TempStream
select deviceID, roomNo, temp, custom:stdev(temp) as stdevTemp, 'C' as scale
insert into OutputStream ;
Siddhi Query : Window Extension
from TempStream#window.custom:lastUnique(roomNo,2 min)
select *
insert into OutputStream ;
Siddhi Query : Transform Extension
from XYZSpeedStream#transform.custom:getVelocityVector(v,vx,vy,vz)
select velocity, direction
insert into SpeedStream ;
CEP Event Adaptors
● For receiving and publishing events
● Has the configurations to connect to external endpoints
● Has many-to-one relationship with Event Streams
CEP Event Adaptors
Support for several transports (network access)
● SOAP
● HTTP
● JMS
● SMTP
● SMS
● Thrift
● Kafka
Supported data formats
● XML
● JSON
● Map
● Text
● WSO2Event - WSO2 data format over Thrift for high-performance event transfer, supporting Java/C/C++/C# via Thrift language bindings
CEP Event Adaptors
Supports database writes using Map messages
● Cassandra
● MySQL
● H2
Supports custom event adaptors via its pluggable architecture!
Monitoring & Debugging : Event Flow
● Visualization of the Event Stream flow in CEP
● Helps to get the big picture
● Good for debugging
Monitoring & Debugging : Event Tracer
• Dumps message traces in a textual format
• Before and after processing each stage of the event flow
Monitoring & Debugging : Event Statistics
• Real-time statistics
  • via visual illustrations & JMX
• Time-based request & response counts
• Stats on all components of the CEP server
Real Time Dashboard
• Provides tools to configure gadgets
  • Currently supports RDBMS only
• Powered by WSO2 User Engagement Server (WSO2 UES)
Performance Results
• Same-JVM performance (Siddhi vs. Esper; M means million), 4-core machine
  • Filters: 8M events/sec vs. Esper 2M
  • Windows: 2.5M events/sec vs. Esper 1M
  • Patterns: 1.4M events/sec, about 10X faster than Esper
• Over-the-network performance (using the Thrift-based WSO2 event format), 8-core machine
  • Filter: 0.25M (250K) events/sec
CEP High Availability
Execution plan in “RedundantNode” based distributed processing mode
<executionPlan name="RedundantNodeExecutionPlan" statistics="enable" trace="enable" xmlns="http://wso2.org/carbon/eventprocessor">
  ...
  <siddhiConfiguration>
    <property name="siddhi.enable.distributed.processing">RedundantNode</property>
    <property name="siddhi.persistence.snapshot.time.interval.minutes">0</property>
  </siddhiConfiguration>
  ...
</executionPlan>
HA / Persistence
• Option 1: Side by side
  • Recommended
  • Takes 2X hardware
  • Gives zero downtime
• Option 2: Snapshot and restore
  • Uses less hardware
  • Will lose events between snapshots
  • Downtime during recovery
  • ** In some scenarios you can use event tables to keep intermediate state
Scaling
• Vertical scaling
  • Can be distributed as a pipeline
• Horizontal scaling
  • Queries like windows, patterns, and joins have shared state
  • Hard to distribute!
Scaling (Contd.)
• Currently users have to set up the pipeline manually (the WSO2 team can help)
• Work is underway to support the above pipeline and distributed operators out of the box
Lambda Architecture
Demo
Scenario
MyPizzaShop – On time delivery or free Pizza Offer !!!
Order Event
{
  "event": {
    "correlationData": {
      "orderNo": "0023"
    },
    "payloadData": {
      "orderInfo": "2 L PEPPERONI",
      "amount": "25.70",
      "name": "James Mark",
      "address": "29BX Finchwood Ave, Clovis, CA 93611",
      "tpNo": "(626)446-4601"
    }
  }
}
Delivered to customer event
correlation_orderNo:23,isDelivered:true
Email Notification
Hi Alis Miranda
Your order for 1 L CHICKEN pizza will be delivered in 30 mins to 779 Burl Ave, Clovis, CA 93611.
The total cost of the order is $14.5.
If you didn't get the pizza within 30 min you will be eligible to have those pizzas for free..!!
MyPizzaShop
Final Payment Notification
<event xmlns="http://wso2.org/carbon/event">
  <correlationData>
    <orderNo>3</orderNo>
  </correlationData>
  <payloadData>
    <name>James Clark</name>
    <amount>54.0</amount>
  </payloadData>
</event>
Thank You