Norikra in action
Data/Stream Processing Meetup (2013/06/28)
TAGOMORI Satoshi (@tagomoris)
Saturday, June 29, 2013
TAGOMORI Satoshi (@tagomoris)
LINE Corp.
Ruby, Perl, Node.js, Hadoop, ...
System Overview
[Architecture diagram. Components: Web Servers, Fluentd Cluster, Archive Storage (scribed), Fluentd Watchers, Graph Tools, Notifications (IRC), Hadoop Cluster (HDFS, YARN), webhdfs, Huahin Manager, hiveserver, Shib, ShibUI, Norikra. Data paths are labeled STREAM, BATCH, and SCHEDULED BATCH.]
Stream query
Custom fluentd plugin: not so casual enough
xQL: declarative language
stream processing for optional data fields
no more schema management
connectivity with Fluentd
Stream query: vs stored data query
No more query wait time
Immediate result for time batch
No more storages
No more query execution management
Once registered, a query runs forever
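A minimal sketch of the "register once, runs forever" idea, assuming a fixed-length time batch. The class `TimeBatchQuery` and its methods are hypothetical and are not part of Norikra or Esper. A real engine closes batches on a timer, while this sketch closes them when a later event arrives.

```ruby
# Hypothetical sketch: a registered query that aggregates events per fixed time batch.
class TimeBatchQuery
  attr_reader :results

  def initialize(batch_seconds, &aggregate)
    @batch_seconds = batch_seconds
    @aggregate = aggregate   # block that turns one batch of events into a result
    @batch_start = nil
    @events = []
    @results = []
  end

  # Called for every incoming event; `time` is in seconds.
  def push(time, event)
    @batch_start ||= time
    # Close every batch that ended before this event arrived.
    while time >= @batch_start + @batch_seconds
      flush
      @batch_start += @batch_seconds
    end
    @events << event
  end

  private

  def flush
    @results << @aggregate.call(@events)
    @events = []
  end
end

# Roughly "SELECT count(*) FROM ... .win:time_batch(10 minutes)"
query = TimeBatchQuery.new(600) { |events| events.size }
query.push(0,    { 'shop' => 'LINE' })
query.push(30,   { 'shop' => 'Cookpad' })
query.push(610,  { 'shop' => 'Cookpad' })  # closes the first batch
query.push(1900, { 'shop' => 'LINE' })     # closes the second batch and an empty third one
```

No storage and no per-run scheduling are involved: results accumulate in `query.results` as batches close.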
Esper
"Esper and Event Processing Language (EPL) provide a highly scalable, memory-efficient, in-memory computing, SQL-standard, minimal latency, real-time streaming Big Data processing engine for medium to high-velocity and high-variety data."
http://esper.codehaus.org/
Norikra is not only for Fluentd.
Norikra query: vs Fluentd custom plugin
SQL!!!
No more restarts for new queries
register queries whenever we want
No more private plugins
No more fat Fluentd configurations
Norikra
Full features of Esper on JRuby
Simple RPC: msgpack-rpc-over-http
Simple RPC Server: mizuno (jetty + rack)
Simple Client Library: norikra-client
Exactly the same code for CRuby/JRuby
Norikra
[Architecture diagram. Norikra Server (on JVM, JRuby): Norikra Engine with Esper Instance (Query Engine), Type Definition Manager, and Output Event Pool; RPC Server mizuno (Jetty + Rack) with Rack RPC Handler. NorikraClient (JRuby) and NorikraClient (CRuby) connect via msgpack-rpc-over-http.]
Norikra Query: target "sales"
goods_id:5 price:49.8 num:1 shop:"LINE"
goods_id:2 price:12.5 num:3 shop:"Cookpad"
goods_id:4 price:36.6 num:10 shop:"Cookpad"
SELECT shop, sum(price*num) AS amount
FROM sales.win:time_batch(10 minutes)
GROUP BY shop
goods_id:5 price:49.8 num:1 shop:"LINE"
goods_id:2 price:12.5 num:3 shop:"Cookpad" affiliate:"BiS"
SELECT affiliate, count(*) AS cnt
FROM sales.win:time_batch(1 hour)
GROUP BY affiliate
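What the two queries compute over a single batch can be sketched in plain Ruby. This is only an illustration: the helper names are hypothetical, the batch combines the sample events above into one list (an assumption), and the windowing itself is left out. The second helper assumes that only events carrying the `affiliate` field reach a query that refers to it.

```ruby
# One batch of "sales" events, based on the sample events above.
batch = [
  { 'goods_id' => 5, 'price' => 49.8, 'num' => 1,  'shop' => 'LINE' },
  { 'goods_id' => 2, 'price' => 12.5, 'num' => 3,  'shop' => 'Cookpad', 'affiliate' => 'BiS' },
  { 'goods_id' => 4, 'price' => 36.6, 'num' => 10, 'shop' => 'Cookpad' }
]

# SELECT shop, sum(price*num) AS amount ... GROUP BY shop
def amount_by_shop(events)
  events.group_by { |e| e['shop'] }
        .transform_values { |es| es.sum { |e| e['price'] * e['num'] } }
end

# SELECT affiliate, count(*) AS cnt ... GROUP BY affiliate
# Assumption: events without the 'affiliate' field never reach this query.
def count_by_affiliate(events)
  events.select { |e| e.key?('affiliate') }
        .group_by { |e| e['affiliate'] }
        .transform_values(&:size)
end

amounts = amount_by_shop(batch)      # one amount per shop
counts  = count_by_affiliate(batch)  # one count per affiliate value
```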
Esper and Norikra
Esper:
queries for streams
stream: a set of field-type pairs of events
users need to know the field set variations (or manage 'map subtypes' on their own)
Norikra:
queries for targets
target: virtual name for the union of field set variations
users don't need to know the details of a target
automated stream inheritance of norikra's target
Base typedef
Query typedef
Data typedef
b_xxxxxxxxx
minimal fieldset definition:
name: 'string'
id: 'long'
valid: 'boolean'
action_type: 'string'
automated stream inheritance of norikra's target
Base typedef
Query typedef
Data typedef
b_xxxxxxxxx
event data fieldset definition:
name: 'string'
id: 'long'
valid: 'boolean'
action_type: 'string'
product_code: 'string'
charge: 'integer'
shop_code: 'long'
e_xxxxxxxx1
automated stream inheritance of norikra's target
Base typedef
Query typedef
Data typedef
b_xxxxxxxxx
e_xxxxxxxx1 e_xxxxxxxx2
event data fieldset definition:
name: 'string'
id: 'long'
valid: 'boolean'
action_type: 'string'
product_code: 'string'
charge: 'integer'
shop_code: 'long'
affiliate: 'string'
automated stream inheritance of norikra's target
Base typedef
Query typedef
Data typedef
b_xxxxxxxxx
e_xxxxxxxx1 e_xxxxxxxx2
new query:
SELECT count(*)
FROM target.win:time_batch(1 min)
WHERE affiliate.length() > 0
automated stream inheritance of norikra's target
Base typedef
Query typedef
Data typedef
b_xxxxxxxxx
e_xxxxxxxx1 e_xxxxxxxx2'
event data fieldset definition:
name: 'string'
id: 'long'
valid: 'boolean'
action_type: 'string'
affiliate: 'string'
q_xxxxxxxx0
new query:
SELECT count(*)
FROM target.win:time_batch(1 min)
WHERE affiliate.length() > 0
automated stream inheritance of norikra's target
Base typedef
Query typedef
Data typedef
b_xxxxxxxxx
e_xxxxxxxx1 e_xxxxxxxx2'
q_xxxxxxxx0
Registered EPL:
SELECT count(*)
FROM q_xxxxxxxx0.win:time_batch(1 min)
WHERE affiliate.length() > 0
automated stream inheritance of norikra's target
Base typedef
Query typedef
Data typedef
b_xxxxxxxxx
e_xxxxxxxx1' e_xxxxxxxx2'
q_xxxxxxxx0
e_xxxxxxxx3'
q_xxxxxxxx1
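The inheritance steps above can be approximated with a small model that compares field names only. This is a simplified sketch under stated assumptions, not Norikra's actual implementation: the `Target` class, its methods, and the query name `q0` are hypothetical, and field types are ignored.

```ruby
require 'set'

# Hypothetical, simplified model of a target; not Norikra's real classes.
class Target
  attr_reader :delivered

  def initialize(base_fields)
    @base = base_fields.to_set   # minimal fieldset (base typedef)
    @queries = {}                # query name => fieldset the query needs
    @delivered = Hash.new { |hash, name| hash[name] = [] }
  end

  # A query typedef is modeled as the base fields plus the fields the query refers to.
  def register(name, fields)
    @queries[name] = @base | fields.to_set
  end

  # An event reaches a query when its fieldset covers the query's fieldset.
  def send_event(event)
    fieldset = event.keys.to_set
    return false unless @base.subset?(fieldset)  # does not fit the base typedef
    @queries.each do |name, required|
      @delivered[name] << event if required.subset?(fieldset)
    end
    true
  end
end

target = Target.new(%w[name id valid action_type])
target.register('q0', %w[affiliate])

e1 = { 'name' => 'a', 'id' => 1, 'valid' => true, 'action_type' => 'buy',
       'product_code' => 'p1', 'charge' => 100, 'shop_code' => 10 }
e2 = e1.merge('affiliate' => 'BiS')

target.send_event(e1)  # fits the base typedef only
target.send_event(e2)  # also covers the fieldset of q0
```

In this model only `e2` ends up in `target.delivered['q0']`, mirroring how a data typedef with `affiliate` sits under the query typedef while one without it does not.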
Output data pooling
Output event data: pushed
Event pushing brings many problems
Pooling + fetch
typical use case: aggregation
-> not so many outputs
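A minimal sketch of "pooling + fetch", assuming that output events are kept per query until a client asks for them. The `OutputPool` class, its method names, and the query name are illustrative and not Norikra's actual API.

```ruby
# Hypothetical sketch of an output event pool; names are illustrative.
class OutputPool
  def initialize
    @pool = Hash.new { |hash, query_name| hash[query_name] = [] }
    @mutex = Mutex.new
  end

  # Called by the engine whenever a query emits an output event.
  def push(query_name, event)
    @mutex.synchronize { @pool[query_name] << event }
  end

  # Called by a client: returns and clears everything pooled so far for the query.
  def fetch(query_name)
    @mutex.synchronize { @pool.delete(query_name) || [] }
  end
end

pool = OutputPool.new
pool.push('amount_by_shop', { 'shop' => 'LINE', 'amount' => 49.8 })
pool.push('amount_by_shop', { 'shop' => 'Cookpad', 'amount' => 403.5 })

first  = pool.fetch('amount_by_shop')  # both pooled events
second = pool.fetch('amount_by_shop')  # empty: the pool was drained by the first fetch
```

Because aggregation queries emit few events, holding them in memory until the next fetch stays cheap, and the server never needs to know where its consumers are.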
fluent-plugin-norikra
Fluentd plugin to use Norikra
Norikra server autostart
Automatically defined target
Pre-defined queries for each target
fluent-plugin-norikra
installation
`gem install fluent-plugin-norikra`
configuration
see DEMO
Demo: bootstrap
rbenv shell jruby-1.7.4
gem install norikra
which norikra
rbenv shell 2.0.0-pxxx
gem install fluent-plugin-norikra
vi demo.conf
fluentd -c demo.conf
Demo: query streams
some messages over fluent-cat
register queries with norikra-client
more messages over fluent-cat & norikra-client
roadmap of norikra
Norikra is still UNDER DEVELOPMENT
Norikra feature updates (JOINs, etc)
Web GUI
query & target list management
save & restore metadata
Distributed & orchestrated nodes
See also:
http://fluentd.org/
http://fluentd.org/plugin/
https://github.com/tagomoris/norikra
https://github.com/tagomoris/norikra-client
https://github.com/tagomoris/fluent-plugin-norikra
http://esper.codehaus.org/
"Fluentd: The ruby based middleware across the world"
http://www.slideshare.net/tagomoris/fluentd-in-tkrk10
"Log analysis system with Hadoop in livedoor 2013 Winter"
http://www.slideshare.net/tagomoris/log-analysis-with-hadoop-in-livedoor-2013