WSO2 Product Release Webinar
WSO2 Complex Event Processor 2.0.1
Simplifying High Performant Data Processing
S. Suhothayan (Suho), Software Engineer, Data Technologies Team.
Outline
• What is Complex Event Processing?
• WSO2 CEP Server & SOA integration
• The Siddhi Runtime CEP Engine
• High availability, persistence and scalability of WSO2 CEP
• How CEP can be combined with Business Activity Monitoring (BAM)
• Demo
Complex Event Processing?
Complex Event Processing is about listening to events and detecting patterns in near real-time without storing all events.
WSO2 Inc.
CEP Is & Is NOT!
• Is NOT!
  o Simple filters - Simple Event Processing
    - E.g. Is this a gold or platinum customer?
  o Joining multiple event streams - Event Stream Processing
• Is!
  o Processing multiple event streams
  o Identifying meaningful patterns among streams
  o Using temporal windows
    - E.g. Notify if there is a 10% increase in overall trading activity AND the average price of commodities has fallen 2% in the last 4 hours
WSO2 CEP Server
• Enterprise-grade server for CEP runtimes
• Supports several transports (network access)
• Supports several data formats
• Supports multiple CEP runtimes
• Governance
• Monitoring
• Tools (WSO2 Dev Studio)
CEP Brokers
• An adaptor for receiving and publishing events
• Holds the configuration for connecting to external endpoints
• Has a many-to-many relationship with CEP engines
CEP Brokers
• Support for several transports (network access) and data formats
  o SOAP/WS-Eventing
    - XML messages
  o REST
    - JSON messages
  o JMS
    - Map messages
    - XML messages
    - Text messages
  o SMTP (Email)
    - Text messages
  o Thrift
    - WSO2 Event (WSO2 data format)
    - High-performance event capturing & delivery framework; supports Java/C/C++/C# via Thrift language bindings
• Brokers are pluggable!
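Because brokers are pluggable adaptors with a many-to-many link to buckets, one way to picture them is a common interface with one implementation per transport. The sketch below assumes exactly that; `BrokerAdaptor`, `InMemoryBroker`, `subscribe` and `publish` are hypothetical names for illustration, not the WSO2 CEP broker API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical adaptor contract, NOT the WSO2 CEP broker API: it only illustrates
// a pluggable component that receives and publishes events on named topics.
interface BrokerAdaptor {
    void subscribe(String topic, Consumer<Map<String, Object>> listener);

    void publish(String topic, Map<String, Object> event);
}

// Hypothetical in-memory adaptor; a real one would wrap JMS, SOAP, REST, SMTP or Thrift.
class InMemoryBroker implements BrokerAdaptor {
    private final Map<String, List<Consumer<Map<String, Object>>>> listeners = new HashMap<>();

    @Override
    public void subscribe(String topic, Consumer<Map<String, Object>> listener) {
        listeners.computeIfAbsent(topic, t -> new ArrayList<>()).add(listener);
    }

    @Override
    public void publish(String topic, Map<String, Object> event) {
        // Every listener on the topic receives the event, so one broker can feed many buckets.
        List<Consumer<Map<String, Object>>> subs =
                listeners.getOrDefault(topic, Collections.emptyList());
        for (Consumer<Map<String, Object>> listener : subs) {
            listener.accept(event);
        }
    }
}
```

Swapping transports then means providing another `BrokerAdaptor` implementation while buckets keep subscribing by topic name.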
CEP Buckets
• An isolated logical execution unit
• Each CEP bucket has a set of
  o Queries
  o Input & output event mappings
• Has a one-to-one relationship with a CEP back-end runtime engine
Open-source CEP Runtimes for Buckets
• Siddhi
  o Apache License, a Java library, tuple-based event model
  o Supports distributed processing
  o Supports multiple query models
    - Based on a SQL-like language
    - Filters, windows, joins, ordering and others
• Esper, http://esper.codehaus.org (Deprecated)
  o GPLv2 License, a Java library; events can be XML, Map or Object
  o Supports multiple query models
    - Based on a SQL-like language
    - Filters, windows, joins, ordering and others
• Drools Fusion (Deprecated)
  o Apache License, a Java library
  o Support for temporal reasoning + windows
Developer Studio UI
• Eclipse-based tool to define buckets
• Can manage the configurations throughout the production lifecycle
• Note: 2.1.0 does not yet support text output mapping
Monitoring
• Provides real-time statistical visualizations of request and response counts over time, per CEP server, bucket, broker and topic.
Siddhi Queries
• Filters and projection
• Windows
  o Events are processed within temporal windows (e.g. for aggregation and joins)
  o Time window vs. length window
• Joins
  o Join two streams
• Event ordering
  o Identify event sequences and patterns
Filters
• Filters the events by conditions
• Conditions
  o >, <, ==, >=, <=, !=
  o contains, instanceof
  o and, or, not
• Example
from <stream-name> [<conditions>]* insert into <stream-name>
from cseEventStream[price >= 20 and symbol=='IBM'] insert into StockQuote symbol, volume
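To make the filter-and-project semantics of the example concrete, here is a plain-Java sketch of what that query computes for a list of events. It does not use the Siddhi API; events are modelled as `Map<String, Object>` tuples, and `FilterQuery`, `CONDITION`, `apply` and `event` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Plain-Java illustration of
//   from cseEventStream[price >= 20 and symbol=='IBM'] insert into StockQuote symbol, volume
// It mirrors the semantics only; it is not how Siddhi evaluates queries.
class FilterQuery {

    // The filter condition: price >= 20 and symbol == 'IBM'.
    static final Predicate<Map<String, Object>> CONDITION =
            e -> ((Number) e.get("price")).doubleValue() >= 20 && "IBM".equals(e.get("symbol"));

    // Keeps events that satisfy the condition and projects them onto the given attributes.
    static List<Map<String, Object>> apply(List<Map<String, Object>> events,
                                           Predicate<Map<String, Object>> condition,
                                           String... attributes) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> e : events) {
            if (!condition.test(e)) {
                continue; // filtered out
            }
            Map<String, Object> projected = new LinkedHashMap<>();
            for (String a : attributes) {
                projected.put(a, e.get(a)); // projection: keep only the selected attributes
            }
            out.add(projected);
        }
        return out;
    }

    // Builds one cseEventStream tuple.
    static Map<String, Object> event(String symbol, double price, long volume) {
        Map<String, Object> e = new LinkedHashMap<>();
        e.put("symbol", symbol);
        e.put("price", price);
        e.put("volume", volume);
        return e;
    }
}
```

Calling `FilterQuery.apply(events, FilterQuery.CONDITION, "symbol", "volume")` returns only IBM events priced at 20 or more, each reduced to `symbol` and `volume`.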
Window
• Types of windows
  o (Time | Length) (Sliding | Batch) windows
• Types of aggregate functions
  o sum, avg, max, min
• Example
from <stream-name> [<conditions>]#window.<window-name>(<parameters>) insert [<output-type>] into <stream-name>
from cseEventStream[price >= 20]#window.lengthBatch(50) insert into StockQuote symbol, avg(price) as avgPrice group by symbol having avgPrice>50
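A length-batch window collects a fixed number of events, aggregates them once the batch is full, and then starts an empty batch. The sketch below mimics the example query in plain Java under that reading; `LengthBatchAvg` and `add` are hypothetical names and this is not Siddhi's window implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of
//   from cseEventStream[price >= 20]#window.lengthBatch(50)
//   insert into StockQuote symbol, avg(price) as avgPrice group by symbol having avgPrice > 50
class LengthBatchAvg {
    private final int batchSize;
    private final double havingThreshold;
    private final List<String> symbols = new ArrayList<>();
    private final List<Double> prices = new ArrayList<>();

    LengthBatchAvg(int batchSize, double havingThreshold) {
        this.batchSize = batchSize;
        this.havingThreshold = havingThreshold;
    }

    // Feeds one event. Returns symbol -> avgPrice for groups passing 'having'
    // when the batch fills up; otherwise returns an empty map.
    Map<String, Double> add(String symbol, double price) {
        Map<String, Double> result = new LinkedHashMap<>();
        if (price < 20) {
            return result; // filter [price >= 20]: the event never enters the window
        }
        symbols.add(symbol);
        prices.add(price);
        if (symbols.size() < batchSize) {
            return result;
        }
        // Batch is full: accumulate {sum, count} per symbol (group by symbol).
        Map<String, double[]> sumAndCount = new LinkedHashMap<>();
        for (int i = 0; i < symbols.size(); i++) {
            double[] acc = sumAndCount.computeIfAbsent(symbols.get(i), s -> new double[2]);
            acc[0] += prices.get(i);
            acc[1] += 1;
        }
        for (Map.Entry<String, double[]> e : sumAndCount.entrySet()) {
            double avg = e.getValue()[0] / e.getValue()[1];
            if (avg > havingThreshold) {
                result.put(e.getKey(), avg); // having avgPrice > threshold
            }
        }
        // Batch windows do not overlap: the next batch starts empty.
        symbols.clear();
        prices.clear();
        return result;
    }
}
```

With `new LengthBatchAvg(50, 50)` this matches the slide's parameters; a sliding length window would instead drop only the oldest event on each arrival and emit on every event.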
Join
• Joins two streams based on a condition and window
• Unidirectional: only events arriving on the unidirectional stream trigger the join
• Example
from <stream>#<window> [unidirectional] join <stream>#<window> on <condition> within <time> insert into <stream>
from TickEvent[symbol=='IBM']#window.length(2000) join NewsEvent#window.time(5 min) insert into JoinStream *
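The join example keeps ticks in a length window and news items in a time window, and pairs an arriving event with whatever is currently held in the other window. The plain-Java sketch below illustrates that, including the effect of `unidirectional` on the tick side. It is not the Siddhi API; `WindowedJoin`, `Tick`, `News`, `onTick` and `onNews` are hypothetical names, and timestamps are assumed to arrive in order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Plain-Java illustration of
//   from TickEvent[symbol=='IBM']#window.length(2000) join NewsEvent#window.time(5 min) ...
// When 'tickUnidirectional' is true, only arriving ticks trigger output.
class WindowedJoin {
    static final class Tick {
        final long ts; final String symbol; final double price;
        Tick(long ts, String symbol, double price) { this.ts = ts; this.symbol = symbol; this.price = price; }
    }

    static final class News {
        final long ts; final String text;
        News(long ts, String text) { this.ts = ts; this.text = text; }
    }

    private final int tickWindowLength;
    private final long newsWindowMillis;
    private final boolean tickUnidirectional;
    private final Deque<Tick> ticks = new ArrayDeque<>();
    private final Deque<News> news = new ArrayDeque<>();

    WindowedJoin(int tickWindowLength, long newsWindowMillis, boolean tickUnidirectional) {
        this.tickWindowLength = tickWindowLength;
        this.newsWindowMillis = newsWindowMillis;
        this.tickUnidirectional = tickUnidirectional;
    }

    // Removes news items that have fallen out of the time window at time 'now'.
    private void expireNews(long now) {
        while (!news.isEmpty() && news.peekFirst().ts + newsWindowMillis <= now) {
            news.pollFirst();
        }
    }

    // A tick arrives; returns joined rows formatted as "symbol|price|newsText".
    List<String> onTick(Tick t) {
        List<String> out = new ArrayList<>();
        if (!"IBM".equals(t.symbol)) {
            return out; // filter [symbol=='IBM']
        }
        ticks.addLast(t);
        if (ticks.size() > tickWindowLength) {
            ticks.pollFirst(); // length window keeps only the latest N ticks
        }
        expireNews(t.ts);
        for (News n : news) {
            out.add(t.symbol + "|" + t.price + "|" + n.text);
        }
        return out;
    }

    // A news item arrives; it only triggers output when the join is not unidirectional.
    List<String> onNews(News n) {
        expireNews(n.ts);
        news.addLast(n);
        List<String> out = new ArrayList<>();
        if (tickUnidirectional) {
            return out; // news only fills its window
        }
        for (Tick t : ticks) {
            out.add(t.symbol + "|" + t.price + "|" + n.text);
        }
        return out;
    }
}
```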
Pattern
• Checks whether condition A happens before/after condition B
• Can do iterative checks via the “every” keyword
• With “within <time>”, Siddhi only emits matches whose events occur within that time of each other
• Example
from [every] <condition> -> [every] <condition> … <condition> within <time> insert into <stream-name> (<attribute-name>* | * )
from every (a1 = purchase[price < 10]) -> a2 = purchase[price > 10000 and a1.cardNo==a2.cardNo]
within 1 day insert into potentialFraud a1.cardNo as cardNo, a2.price as price, a2.place as place
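[Figure: event stream a1 x1 k5 a2 n7 y1; the pattern matches a1 followed by a2 even though other events occur between them]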
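The fraud pattern can be read as a small state machine: `every` lets each small purchase (a1) start its own pending match, and a later large purchase on the same card (a2) within one day completes it. The plain-Java sketch below follows that reading. It is not the Siddhi engine; `FraudPattern`, `Purchase` and `onPurchase` are hypothetical names, and purchases are assumed to arrive in timestamp order.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Plain-Java illustration of
//   from every (a1 = purchase[price < 10]) -> a2 = purchase[price > 10000 and a1.cardNo==a2.cardNo]
//   within 1 day insert into potentialFraud a1.cardNo as cardNo, a2.price as price, a2.place as place
class FraudPattern {
    static final long ONE_DAY_MS = 24L * 60 * 60 * 1000;

    static final class Purchase {
        final long ts; final String cardNo; final double price; final String place;
        Purchase(long ts, String cardNo, double price, String place) {
            this.ts = ts; this.cardNo = cardNo; this.price = price; this.place = place;
        }
    }

    // Pending partial matches: each small purchase (a1) waits for its own a2.
    private final List<Purchase> pending = new ArrayList<>();

    // Returns "cardNo,price,place" for every match completed by this purchase.
    List<String> onPurchase(Purchase p) {
        List<String> alerts = new ArrayList<>();
        Iterator<Purchase> it = pending.iterator();
        while (it.hasNext()) {
            Purchase a1 = it.next();
            if (p.ts - a1.ts > ONE_DAY_MS) {
                it.remove(); // outside 'within 1 day': drop the partial match
                continue;
            }
            if (p.price > 10000 && a1.cardNo.equals(p.cardNo)) {
                alerts.add(a1.cardNo + "," + p.price + "," + p.place);
                it.remove(); // this a1 instance is complete
            }
        }
        if (p.price < 10) {
            pending.add(p); // 'every': each small purchase starts a new match
        }
        return alerts;
    }
}
```

Because only a1 is under `every`, each small purchase is reported at most once, and unrelated purchases in between do not break the match.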
Sequence
• Regular expressions supported
  o * - zero or more matches (reluctant)
  o + - one or more matches (reluctant)
  o ? - zero or one match (reluctant)
  o or - either event
• Events matched by * or + must be referenced with square brackets to access a specific occurrence (e.g. b1[0])
from <event-regular-expression> within <time> insert into <stream>
from a1 = requestOrder[action == "buy"], b1 = cseEventStream[price > a1.price and symbol==a1.symbol]+, b2 = cseEventStream[price < b1.price] insert into purchaseOrder a1.symbol as symbol, b1[0].price as firstPrice, b2.price as orderPrice
[Figure: event stream a1 b1 b1 b2 n7 y1; the sequence matches the consecutive events a1, b1, b1, b2]
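Unlike a pattern, a sequence requires consecutive events, so any non-matching event breaks the partial match. The simplified plain-Java state machine below illustrates the example query under stated assumptions: only one partial match is tracked at a time, `b1.price` in b2's condition is taken to mean the most recent b1, and matching is reluctant (an event completes b2 before it extends b1+). `SequenceMatcher`, `Ev` and `onEvent` are hypothetical names; this is not the Siddhi engine.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration of the sequence
//   a1 = requestOrder[action == "buy"],
//   b1 = cseEventStream[price > a1.price and symbol == a1.symbol]+,
//   b2 = cseEventStream[price < b1.price]
// over one merged, ordered event stream.
class SequenceMatcher {
    static final class Ev {
        final String stream; final String action; final String symbol; final double price;
        Ev(String stream, String action, String symbol, double price) {
            this.stream = stream; this.action = action; this.symbol = symbol; this.price = price;
        }
    }

    private Ev a1;                                 // start of the current partial match
    private final List<Ev> b1 = new ArrayList<>(); // occurrences matched by b1+

    // Returns "symbol,firstPrice,orderPrice" when the sequence completes, else null.
    String onEvent(Ev e) {
        boolean isQuote = "cseEventStream".equals(e.stream);
        if (a1 != null && isQuote) {
            // Reluctant matching: try to finish with b2 first (assumes b1.price = latest b1).
            if (!b1.isEmpty() && e.price < b1.get(b1.size() - 1).price) {
                String out = a1.symbol + "," + b1.get(0).price + "," + e.price; // b1[0].price
                reset();
                return out;
            }
            if (e.price > a1.price && e.symbol.equals(a1.symbol)) {
                b1.add(e); // one more b1 occurrence
                return null;
            }
        }
        // Sequences need consecutive events: anything else breaks the partial match.
        reset();
        if ("requestOrder".equals(e.stream) && "buy".equals(e.action)) {
            a1 = e; // a new sequence may start here
        }
        return null;
    }

    private void reset() {
        a1 = null;
        b1.clear();
    }
}
```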
• We compared Siddhi with Esper, the widely used open-source CEP engine.
• For the evaluation, we set up different queries on both systems, pushed events into each system, and measured the time until all of them were processed.
• We used an Intel(R) Xeon(R) X3440 @ 2.53GHz (4 cores, 8M cache) with 8GB RAM, running Debian with the 2.6.32-5-amd64 kernel.
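The measurement method (push a fixed number of events, time until all are processed, report events per second) can be sketched as follows. This is an illustrative harness, not the benchmark code behind the results; the synthetic prices and the in-process `DoublePredicate` standing in for the filter query are assumptions, and `ThroughputHarness`, `run`, `measure` and `kiloEventsPerSec` are hypothetical names.

```java
import java.util.function.DoublePredicate;

// Illustrative "push N events, time until done" harness (hypothetical, not the
// code used for the published Siddhi/Esper or WSO2 CEP numbers).
class ThroughputHarness {

    // Converts an event count and elapsed nanoseconds into kilo events per second.
    static double kiloEventsPerSec(long events, long elapsedNanos) {
        return events / (elapsedNanos / 1e9) / 1000.0;
    }

    // Pushes 'events' synthetic prices (0..9, repeating) through the query
    // and returns how many matched, so the work is observable.
    static long run(long events, DoublePredicate query) {
        long matched = 0;
        for (long i = 0; i < events; i++) {
            double price = i % 10;
            if (query.test(price)) {
                matched++;
            }
        }
        return matched;
    }

    // Times one run of a "price > 6" filter and returns the throughput.
    static double measure(long events) {
        long start = System.nanoTime();
        run(events, price -> price > 6);
        long elapsed = Math.max(1, System.nanoTime() - start); // guard against a zero interval
        return kiloEventsPerSec(events, elapsed);
    }
}
```

For example, 1,000,000 events processed in 4 seconds corresponds to 250 kilo events/sec, the unit used in the throughput results.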
Performance Results
Simple filter without window
Performance Comparison With ESPER
from StockTick[price > 6] return symbol, price
State machine query for pattern matching
Performance Comparison With ESPER
From f=FraudWarningEvent -> p=PINChangeEvent(accountNumber=f.accountNumber) return accountNumber;
Performance of WSO2 CEP
• Here we published data from two client publisher nodes to the CEP server node, and sent the notifications triggered by CEP to a client subscriber node.
• To test the worst-case scenario, 100% of the data published to CEP is received at the subscriber node after processing (no data is filtered).
• We used an Intel® Core™ i7-2630QM CPU @ 2.00GHz, 8 cores, 8GB RAM running Ubuntu 12.04 (3.2.0-32-generic kernel) for running CEP, and an Intel® Core™ i3-2350M CPU @ 2.30GHz, 4 cores, 4GB RAM running Ubuntu 12.04 (3.2.0-32-generic kernel) for the three client nodes.
Simple filter without window
Performance of WSO2 CEP
from StockTick[price > 6] return symbol, price
WSO2 CEP Throughput (average, by number of clients)

# Clients   Throughput (kilo events/sec)
1           67
2           135
3           181
4           210
5           212
6           232
7           245
8           250
9           234
10          186
50          187
100         112
HA/Persistence
• Ability to recover runtime state in the case of a failure
• Enables queries to span lifetimes much longer than server uptime
• Takes periodic snapshots and stores all state information in a scalable persistence store (Apache Cassandra)
• Supports pluggable persistence stores
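The snapshot-and-restore idea behind HA/persistence can be sketched as below: a query serializes its state, a scheduler writes that snapshot to a pluggable store at a fixed period, and a fresh instance restores from the latest snapshot after a failure. `PersistenceStore`, `InMemoryStore`, `CountingQuery` and `persistEvery` are hypothetical names; Java serialization and an in-memory map stand in for the product's actual format and for Apache Cassandra.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical pluggable store contract (the slides name Apache Cassandra as the real store).
interface PersistenceStore {
    void save(String queryId, byte[] snapshot);

    byte[] load(String queryId); // null if no snapshot exists
}

// Hypothetical in-memory store, standing in for Cassandra in this sketch.
class InMemoryStore implements PersistenceStore {
    private final Map<String, byte[]> data = new HashMap<>();

    @Override
    public synchronized void save(String queryId, byte[] snapshot) {
        data.put(queryId, snapshot.clone());
    }

    @Override
    public synchronized byte[] load(String queryId) {
        byte[] s = data.get(queryId);
        return s == null ? null : s.clone();
    }
}

// A stateful query (running count per symbol) whose state can be snapshotted and restored.
class CountingQuery {
    private HashMap<String, Long> counts = new HashMap<>();

    synchronized void onEvent(String symbol) {
        counts.merge(symbol, 1L, Long::sum);
    }

    synchronized long count(String symbol) {
        return counts.getOrDefault(symbol, 0L);
    }

    // Serializes the whole state into bytes for the persistence store.
    synchronized byte[] snapshot() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(counts);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Replaces the current state with a saved snapshot (e.g. after a restart).
    @SuppressWarnings("unchecked")
    synchronized void restore(byte[] snapshot) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(snapshot))) {
            counts = (HashMap<String, Long>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Periodically writes a snapshot to the store; the caller must shut the executor down.
    ScheduledExecutorService persistEvery(PersistenceStore store, String queryId, long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> store.save(queryId, snapshot()),
                periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

Keeping the store behind an interface is what makes it pluggable: only `save` and `load` change when Cassandra replaces the in-memory map.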
Scaling
• Vertical scaling
  o Can be distributed as a pipeline
• Horizontal scaling
  o Queries such as windows, patterns and joins have shared state, and hence are hard to distribute!
  o Use a distributed cache (Hazelcast) to achieve this: shared memory and batch processing
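Distributing a query as a pipeline means splitting it into stages that hand events to one another. The sketch below runs two hypothetical stages (a filter and an aggregator) on separate threads in one JVM, connected by a `BlockingQueue`. Treating each stage as a separate node and the queue as a network link (e.g. a JMS topic) is an assumption for illustration, not a description of WSO2 CEP internals; `QueryPipeline` and `run` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative two-stage query pipeline inside one JVM; each stage stands in for a node.
class QueryPipeline {
    private static final class Msg {
        final double price;
        final boolean end; // end-of-stream marker
        Msg(double price, boolean end) { this.price = price; this.end = end; }
    }

    // Stage 1 filters price > 6 and forwards matches; stage 2 sums what it receives.
    static double run(List<Double> prices) {
        BlockingQueue<Msg> link = new LinkedBlockingQueue<>();
        double[] total = new double[1];
        Thread filterStage = new Thread(() -> {
            for (double p : prices) {
                if (p > 6) {
                    link.add(new Msg(p, false));
                }
            }
            link.add(new Msg(0, true)); // tell the next stage the stream has ended
        });
        Thread sumStage = new Thread(() -> {
            try {
                for (Msg m = link.take(); !m.end; m = link.take()) {
                    total[0] += m.price;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        filterStage.start();
        sumStage.start();
        try {
            filterStage.join();
            sumStage.join(); // join() makes total[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return total[0];
    }

    // Convenience overload for literal inputs.
    static double run(double... prices) {
        List<Double> list = new ArrayList<>();
        for (double p : prices) {
            list.add(p);
        }
        return run(list);
    }
}
```

Stateless stages such as filters pipeline easily; the stateful window, pattern and join stages are the ones that need shared state (e.g. a distributed cache) to scale out.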
Event Recording
• Ability to record all or some of the events for future processing
• A few options
  o Publish them to a Cassandra cluster using the WSO2 data bridge API or BAM (data in Cassandra can then be processed with Hadoop using WSO2 BAM)
  o Write them to a distributed cache
  o Custom Thrift-based event recorder
Scenario
• Monitoring a stock exchange for game-changing moments
• Two input event streams
  o Event stream of stock quotes from a stock exchange
  o Event stream of word counts on various company names from Twitter pages
• Check whether the last traded price of a stock has changed significantly (by 2%) within the last minute, and people are tweeting about that company (> 10) within the last minute
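Input events
• Input events are JMS Maps
  o Stock Exchange Stream
import java.util.HashMap;
import java.util.Map;

// 'publisher' is the demo's JMS publishing client; its setup is not shown on the slide.
Map<String, Object> map1 = new HashMap<String, Object>();
map1.put("symbol", "MSFT");   // stock symbol
map1.put("price", 26.36);     // last traded price
publisher.publish("AllStockQuotes", map1);
  o Twitter Stream
Map<String, Object> map2 = new HashMap<String, Object>();
map2.put("company", "MSFT");  // company name
map2.put("wordCount", 8);     // word count for that company name
publisher.publish("TwitterFeed", map2);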
Queries
from allStockQuotes[win.time(60000)]
insert into fastMovingStockQuotes symbol, price, avg(price) as averagePrice
group by symbol
having ((price > averagePrice*1.02) or (averagePrice*0.98 > price))

from twitterFeed[win.time(60000)]
insert into highFrequentTweets company as company, sum(wordCount) as words
group by company
having (words > 10)

from fastMovingStockQuotes[win.time(60000)] as fastMovingStockQuotes
join highFrequentTweets[win.time(60000)] as highFrequentTweets
on fastMovingStockQuotes.symbol==highFrequentTweets.company
insert into predictedStockQuotes fastMovingStockQuotes.symbol as company, fastMovingStockQuotes.averagePrice as amount, highFrequentTweets.words as words
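The three scenario queries can be traced end to end with a simplified plain-Java model: a 60-second window of quotes with a per-symbol average, a 60-second window of tweet word counts summed per company, and a check that combines the two. This is not the Siddhi API. It assumes in-order timestamps and, as a simplification, evaluates the join only when a quote arrives (the slide's join can also be triggered by tweet aggregates). `StockTweetMonitor`, `Alert`, `onQuote` and `onTweet` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified illustration of the fastMovingStockQuotes / highFrequentTweets / join queries.
class StockTweetMonitor {
    static final long WINDOW_MS = 60_000;

    static final class Alert {
        final String company; final double amount; final long words;
        Alert(String company, double amount, long words) {
            this.company = company; this.amount = amount; this.words = words;
        }
    }

    private static final class Quote {
        final long ts; final String symbol; final double price;
        Quote(long ts, String symbol, double price) { this.ts = ts; this.symbol = symbol; this.price = price; }
    }

    private static final class Tweet {
        final long ts; final String company; final long wordCount;
        Tweet(long ts, String company, long wordCount) { this.ts = ts; this.company = company; this.wordCount = wordCount; }
    }

    private final Deque<Quote> quotes = new ArrayDeque<>();
    private final Deque<Tweet> tweets = new ArrayDeque<>();

    // Drops events that have left the 60-second window.
    private void expire(long now) {
        while (!quotes.isEmpty() && quotes.peekFirst().ts + WINDOW_MS <= now) quotes.pollFirst();
        while (!tweets.isEmpty() && tweets.peekFirst().ts + WINDOW_MS <= now) tweets.pollFirst();
    }

    void onTweet(long ts, String company, long wordCount) {
        expire(ts);
        tweets.addLast(new Tweet(ts, company, wordCount));
    }

    // Returns an alert if the quote is fast moving and the company is heavily tweeted, else null.
    Alert onQuote(long ts, String symbol, double price) {
        expire(ts);
        quotes.addLast(new Quote(ts, symbol, price));
        double sum = 0; // avg(price) ... group by symbol; the window includes this quote
        int n = 0;
        for (Quote q : quotes) {
            if (q.symbol.equals(symbol)) { sum += q.price; n++; }
        }
        double averagePrice = sum / n;
        boolean fastMoving = price > averagePrice * 1.02 || averagePrice * 0.98 > price;
        long words = 0; // sum(wordCount) ... group by company
        for (Tweet t : tweets) {
            if (t.company.equals(symbol)) words += t.wordCount;
        }
        return (fastMoving && words > 10) ? new Alert(symbol, averagePrice, words) : null;
    }
}
```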
Alert
• As an email
Hi,
Within the last minute, people have been tweeting about {company} {words} times, and the last traded price of {company} has changed by 2% and is now trading at ${amount}.
From CEP
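The e-mail alert is a text template whose {company}, {words} and {amount} placeholders are filled from the predictedStockQuotes output event. A minimal sketch of that substitution, assuming simple string replacement, is shown below; `AlertTemplate`, `render` and `alertFor` are hypothetical names and this is not WSO2 CEP's output-mapping code.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper that fills {placeholder} fields of the e-mail alert from an output event.
class AlertTemplate {
    static final String TEMPLATE =
            "Hi,\nWithin the last minute, people have been tweeting about {company} {words} times, "
            + "and the last traded price of {company} has changed by 2% and is now trading at ${amount}.\n"
            + "From CEP";

    // Replaces every {name} in the template with the matching attribute value.
    static String render(String template, Map<String, String> attributes) {
        String result = template;
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            result = result.replace("{" + e.getKey() + "}", e.getValue());
        }
        return result;
    }

    // Builds the attribute map from one predictedStockQuotes event (company, amount, words).
    static String alertFor(String company, double amount, long words) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("company", company);
        attrs.put("amount", String.format(Locale.ROOT, "%.2f", amount)); // fixed '.' separator
        attrs.put("words", Long.toString(words));
        return render(TEMPLATE, attrs);
    }
}
```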
Useful links
• WSO2 CEP 2.0.1
http://wso2.com/products/complex-event-processor/
• Distributed Processing Sample With Siddhi CEP and ActiveMQ JMS Broker
http://suhothayan.blogspot.com/2012/08/distributed-processing-sample-for-wso2.html
• Creating Custom Data Publishers to BAM/CEP
http://wso2.org/library/articles/2012/07/creating-custom-agents-publish-events-bamcep
• WSO2 BAM 2.0.1
http://wso2.com/products/business-activity-monitor/