Norikra in action

Data/Stream Processing Meetup (2013/06/28)
TAGOMORI Satoshi (@tagomoris)

Saturday, June 29, 2013

TAGOMORI Satoshi (@tagomoris)
LINE corp.

Ruby, Perl, Node.js, Hadoop, ...

System Overview

[Diagram: Web Servers, Fluentd Cluster, Archive Storage (scribed), Fluentd Watchers, Graph Tools, Notifications (IRC), Hadoop Cluster (HDFS, YARN), webhdfs, Huahin Manager, hiveserver, Shib, ShibUI, and Norikra; paths labeled STREAM, BATCH, and SCHEDULED BATCH]

Stream query

Custom Fluentd plugin: not so casual enough
xQL: declarative language
streams processing
for optional data fields
no more schema management
connectivity with Fluentd

Stream query: vs stored data query

No more query wait time

Immediate result for time batch

No more storage

No more query execution management

Once registered, a query runs forever

Norikra: not only for Fluentd.

Norikra query: vs Fluentd custom plugin

SQL!!!

No more restarts for new queries

register queries whenever we want

No more private plugins

No more fat Fluentd configurations

Norikra

Full features of Esper, on JRuby

Simple RPC: msgpack-rpc-over-http

Simple RPC Server: mizuno (jetty + rack)

Simple Client Library: norikra-client

The same code for CRuby/JRuby

Norikra

[Diagram labels: Norikra Server (on JVM); Norikra Engine; Esper Instance (Query Engine); Type Definition Manager; Output Event Pool; RPC Server: mizuno (Jetty + Rack); Rack RPC Handler; NorikraClient (JRuby); NorikraClient (CRuby); msgpack-rpc-over-http]

Norikra Query: target "sales"

goods_id:5 price:49.8 num:1 shop:"LINE"
goods_id:2 price:12.5 num:3 shop:"Cookpad"
goods_id:4 price:36.6 num:10 shop:"Cookpad"

SELECT shop, sum(price*num) AS amount
FROM sales.win:time_batch(10 minutes)
GROUP BY shop

goods_id:5 price:49.8 num:1 shop:"LINE"
goods_id:2 price:12.5 num:3 shop:"Cookpad" affiliate:"BiS"

SELECT affiliate, count(*) AS cnt
FROM sales.win:time_batch(1 hour)
GROUP BY affiliate
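
To make the first query concrete, here is a minimal plain-Python sketch of what it computes for one batch of events. It is an illustration only, not Norikra or Esper code: the function name `amount_by_shop` is hypothetical, and the time window is assumed to have already collected the three sample events into one batch.

```python
from collections import defaultdict

# Sample events copied from the slide. In a real deployment they would
# arrive over time and the engine would group them into 10-minute batches.
events = [
    {"goods_id": 5, "price": 49.8, "num": 1, "shop": "LINE"},
    {"goods_id": 2, "price": 12.5, "num": 3, "shop": "Cookpad"},
    {"goods_id": 4, "price": 36.6, "num": 10, "shop": "Cookpad"},
]


def amount_by_shop(batch):
    """Hypothetical helper mirroring:
    SELECT shop, sum(price*num) AS amount ... GROUP BY shop
    """
    totals = defaultdict(float)
    for event in batch:
        totals[event["shop"]] += event["price"] * event["num"]
    return dict(totals)


result = amount_by_shop(events)
for shop, amount in result.items():
    print(shop, round(amount, 2))
```

For this batch the totals come to roughly 49.8 for "LINE" and 403.5 for "Cookpad".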

Esper and Norikra

Esper: queries for streams
stream: a set of field-type pairs of events
users need to know the field set variations (or manage 'map subtypes' on their own)

Norikra: queries for targets
target: virtual name for the union of field set variations
users don't need to know the details of a target

Automated stream inheritance of Norikra's target

Base typedef

Query typedef

Data typedef

b_xxxxxxxxx

minimal fieldset definition:
name: 'string'
id: 'long'
valid: 'boolean'
action_type: 'string'

Automated stream inheritance of Norikra's target

Base typedef

Query typedef

Data typedef

b_xxxxxxxxx

e_xxxxxxxx1

event data fieldset definition:
name: 'string'
id: 'long'
valid: 'boolean'
action_type: 'string'
product_code: 'string'
charge: 'integer'
shop_code: 'long'

Automated stream inheritance of Norikra's target

Base typedef

Query typedef

Data typedef

b_xxxxxxxxx

e_xxxxxxxx1 e_xxxxxxxx2

event data fieldset definition:
name: 'string'
id: 'long'
valid: 'boolean'
action_type: 'string'
product_code: 'string'
charge: 'integer'
shop_code: 'long'
affiliate: 'string'

Automated stream inheritance of Norikra's target

Base typedef

Query typedef

Data typedef

b_xxxxxxxxx

e_xxxxxxxx1 e_xxxxxxxx2

new query:
SELECT count(*)
FROM target.win:time_batch(1min)
WHERE affiliate.length() > 0

Automated stream inheritance of Norikra's target

Base typedef

Query typedef

Data typedef

b_xxxxxxxxx

e_xxxxxxxx1 e_xxxxxxxx2'

event data fieldset definition:
name: 'string'
id: 'long'
valid: 'boolean'
action_type: 'string'
affiliate: 'string'

q_xxxxxxxx0

new query:
SELECT count(*)
FROM target.win:time_batch(1min)
WHERE affiliate.length() > 0

Automated stream inheritance of Norikra's target

Base typedef

Query typedef

Data typedef

b_xxxxxxxxx

e_xxxxxxxx1 e_xxxxxxxx2'

q_xxxxxxxx0

Registered EPL:
SELECT count(*)
FROM q_xxxxxxxx0.win:time_batch(1min)
WHERE affiliate.length() > 0

Automated stream inheritance of Norikra's target

Base typedef

Query typedef

Data typedef

b_xxxxxxxxx

e_xxxxxxxx1' e_xxxxxxxx2'

q_xxxxxxxx0

e_xxxxxxxx3'

q_xxxxxxxx1
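
One way to read the build-up above is as a subset check: an event fieldset can feed a query fieldset when it carries every field the query needs. The sketch below illustrates that reading in plain Python. It is an assumption about the idea, not Norikra's implementation; the names `with_fields`, `can_feed`, `e1`, `e2`, and `q0` are hypothetical stand-ins for the typedefs shown on the slides.

```python
# Minimal fieldset shared by every event of the target (from the slides).
BASE_FIELDS = {
    "name": "string",
    "id": "long",
    "valid": "boolean",
    "action_type": "string",
}


def with_fields(base, extra):
    """Return a new fieldset: the base fields plus some extra ones."""
    return {**base, **extra}


def can_feed(data_fields, query_fields):
    """True if the event fieldset has every field the query needs,
    with the same type (assumed matching rule, for illustration)."""
    return all(
        data_fields.get(name) == ftype for name, ftype in query_fields.items()
    )


# Hypothetical stand-ins for e_xxxxxxxx1, e_xxxxxxxx2 and q_xxxxxxxx0.
e1 = with_fields(
    BASE_FIELDS,
    {"product_code": "string", "charge": "integer", "shop_code": "long"},
)
e2 = with_fields(e1, {"affiliate": "string"})
q0 = with_fields(BASE_FIELDS, {"affiliate": "string"})

matches = {name: can_feed(fields, q0) for name, fields in {"e1": e1, "e2": e2}.items()}
print(matches)
```

Under this reading, only the fieldset that includes `affiliate` is routed to the query that references `affiliate`, which matches the diagram where only some event typedefs gain a prime mark after the query is registered.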

Output data pooling

Output event data: pushed

Event pushing brings many problems

Pooling + fetch

typical use case: aggregation

-> not so many outputs
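
The pooling + fetch idea can be sketched as a small in-memory pool: the engine appends output events per query, and clients pull them when they are ready. This is a minimal illustration under assumed names (`OutputPool`, `push`, `fetch`, and the query name `sales_by_shop` are all hypothetical), not Norikra's actual Output Event Pool.

```python
import threading
from collections import defaultdict


class OutputPool:
    """Hypothetical in-memory pool keyed by query name."""

    def __init__(self):
        self._lock = threading.Lock()
        self._events = defaultdict(list)

    def push(self, query_name, event):
        """Called by the engine side whenever a query emits an output event."""
        with self._lock:
            self._events[query_name].append(event)

    def fetch(self, query_name):
        """Called by a client: return pooled events and clear them."""
        with self._lock:
            return self._events.pop(query_name, [])


pool = OutputPool()
pool.push("sales_by_shop", {"shop": "LINE", "amount": 49.8})
pool.push("sales_by_shop", {"shop": "Cookpad", "amount": 403.5})

first = pool.fetch("sales_by_shop")   # both pooled events
second = pool.fetch("sales_by_shop")  # empty: already fetched
print(len(first), len(second))
```

Because aggregation queries emit few output events, holding them in memory until the next fetch stays cheap, and the server never needs to know where its consumers are.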

fluent-plugin-norikra

Fluentd plugin to use Norikra

Norikra server autostart

Automatically defined target

Pre-defined queries for each target

fluent-plugin-norikra

installation

`gem install fluent-plugin-norikra`

configuration

see DEMO

Demo: bootstrap

rbenv shell jruby-1.7.4
gem install norikra
which norikra
rbenv shell 2.0.0-pxxx
gem install fluent-plugin-norikra
vi demo.conf
fluentd -c demo.conf

Demo: query streams

some messages over fluent-cat

register queries with norikra-client

more messages over fluent-cat & norikra-client

Roadmap of Norikra

Norikra is still UNDER DEVELOPMENT

Norikra feature updates (JOINs, etc.)
Web GUI
query & target list management
save & restore metadata
Distributed & orchestrated nodes

See also:
http://fluentd.org/
http://fluentd.org/plugin/
https://github.com/tagomoris/norikra
https://github.com/tagomoris/norikra-client
https://github.com/tagomoris/fluent-plugin-norikra
http://esper.codehaus.org/

"Fluentd: The ruby based middleware across the world"
http://www.slideshare.net/tagomoris/fluentd-in-tkrk10

"Log analysis system with Hadoop in livedoor 2013 Winter"
http://www.slideshare.net/tagomoris/log-analysis-with-hadoop-in-livedoor-2013
