Realtime Analytics with Cassandra Acunu Analytics Tom Wilkie, Acunu 21st August 2012

Jan 24, 2015

My talk at NoSQL Now 2012
Transcript
Page 1: Realtime Analytics with Cassandra

Realtime Analytics with Cassandra

Acunu Analytics

Tom Wilkie, Acunu
21st August 2012

Page 2: Realtime Analytics with Cassandra

• Motivation / alternatives
• What is it?
• How does it work?
• Approximate Analytics
• What’s it good for?

Page 3: Realtime Analytics with Cassandra

• Motivation / alternatives

• What is it?
• How does it work?
• Approximate Analytics
• What’s it good for?

Page 4: Realtime Analytics with Cassandra

Why bother?

“Companies that can harness big data will trample data incompetents”

The Economist, May 26th 2011

Page 5: Realtime Analytics with Cassandra

time          page                           session id    duration
...           ...                            ...           ...
14:58:03.234  /index.html                    248.180.3.40  175
14:58:03.409  /csi/csi/council/freedom.html  248.180.3.40  1234
14:58:03.877  /docs/access/chapter8.txt      99.1.10.178   52
14:58:03.877  /docs/access/chapter8.txt      99.1.10.178   52
14:58:03.877  /docs/access/chapter8.txt      99.1.10.178   52
14:58:03.409  /csi/csi/council/freedom.html  248.180.3.40  1234
14:58:03.877  /docs/access/chapter8.txt      99.1.10.178   52
14:58:03.409  /csi/csi/council/freedom.html  248.180.3.40  1234
14:58:03.877  /docs/access/chapter8.txt      99.1.10.178   52

Page 6: Realtime Analytics with Cassandra

Live & historical aggregates...
Trends...
Drill downs and roll ups

Combining “big” and “real-time” is hard

Page 7: Realtime Analytics with Cassandra

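Solution | Con

[The solution names appeared as logos on the slide and are not in the transcript; only the "Con" entries survive.]

• Scalability $$$
• Not realtime
• Spartan query semantics => complex, DIY solutions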

Page 8: Realtime Analytics with Cassandra

• Motivation / alternatives
• What is it?

• How does it work?
• Approximate Analytics
• What’s it good for?

Page 9: Realtime Analytics with Cassandra

• Aggregate incrementally, on the fly
• Store live + historical aggregates

[Diagram labels: click stream, sensor data, etc.; events; Acunu Analytics; counter updates]
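A minimal sketch of what "aggregate incrementally, on the fly" can look like, assuming each event is a dict and that COUNT and AVG are kept as a running count and sum per group. The class name `RunningAggregate`, the `ingest` helper and the (hour, category) grouping are illustrative choices, not Acunu's implementation.

```python
from collections import defaultdict


class RunningAggregate:
    """Keeps COUNT and AVG for one group without storing raw events."""

    def __init__(self):
        self.count = 0
        self.total = 0

    def update(self, value):
        # Constant work per event: bump the counter and the running sum.
        self.count += 1
        self.total += value

    @property
    def average(self):
        return self.total / self.count if self.count else None


# One aggregate per (hour, category) group, created lazily on first event.
aggregates = defaultdict(RunningAggregate)


def ingest(event):
    hour = event["time"][:2]  # "14:58:03" -> "14"
    aggregates[(hour, event["category"])].update(event["loadTime"])


# Hypothetical events, shaped like the click-stream rows shown earlier.
events = [
    {"time": "14:58:03", "category": "docs", "loadTime": 52},
    {"time": "14:58:09", "category": "docs", "loadTime": 148},
    {"time": "15:01:44", "category": "home", "loadTime": 175},
]
for e in events:
    ingest(e)

docs_14 = aggregates[("14", "docs")]
print(docs_14.count, docs_14.average)  # 2 100.0
```

Because only a count and a sum are stored per group, the live value and the historical value come from the same record, and query cost does not depend on how many events were seen.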

Page 10: Realtime Analytics with Cassandra

{
  time : TIME(HOUR; MIN; SEC),
  page : PATH(/),
  category : STRING,
  loadTime : LONG
}

{
  select : ["COUNT", "AVG(loadTime)"],
  where : "time, ?path",
  group : "time, ?category"
}

Page 11: Realtime Analytics with Cassandra

Dashboard UI

Page 12: Realtime Analytics with Cassandra

• Motivation / alternatives
• What is it?
• How does it work?

• Approximate Analytics
• What’s it good for?

Page 13: Realtime Analytics with Cassandra

• count, grouped by ... day
• count distinct (session)
• count ... geography
• ... browser
• avg(duration)

Page 14: Realtime Analytics with Cassandra

Data Definition

time : TIME(HOUR; MIN; SEC),
cust_id : LONG,
session_id : LONG,
geography : STRING,
browser : STRING,
load_time : LONG

Query Patterns

{
  select: "COUNT"
  patterns: [
    { where : "?time", group : "?time" },
    { where : "", group : "geography" },
    { where : "", group : "browser" }
  ]
},
{
  select: ["COUNT_DISTINCT(session_id)", "AVG(load_time)"],
  where: "time",
  group: ""
}

Page 15: Realtime Analytics with Cassandra

21:00      all→1345   :00→45    :01→62     :02→87     ...
22:00      all→3221   :00→22    :01→19     :02→104    ...
...        ...
UK         all→228    user01→1  user14→12  user99→7   ...
US         all→354    user01→4  user04→8   user56→17  ...
...
UK, 22:00  all→1904   ...
∅          all→87314  UK→238    US→354     ...

{
  cust_id: user01,
  session_id: 102,
  geography: UK,
  browser: IE,
  time: 22:02,
}

Page 16: Realtime Analytics with Cassandra

21:00      all→1345   :00→45    :01→62     :02→87     ...
22:00      all→3222   :00→22    :01→19     :02→105    ...
...        ...
UK         all→229    user01→2  user14→12  user99→7   ...
US         all→354    user01→4  user04→8   user56→17  ...
...
UK, 22:00  all→1905   ...
∅          all→87315  UK→239    US→354     ...

{
  cust_id: user01,
  session_id: 102,
  geography: UK,
  browser: IE,
  time: 22:02,
}

Page 22: Realtime Analytics with Cassandra

21:00      all→1345   :00→45    :01→62     :02→87     ...
22:00      all→3222   :00→22    :01→19     :02→105    ...
...        ...
UK         all→229    user01→2  user14→12  user99→7   ...
US         all→354    user01→4  user04→8   user56→17  ...
...
UK, 22:00  all→1905   ...
∅          all→87315  UK→239    US→354     ...

• where time 21:00-22:00, count(*)
• where time 22:00-23:00, group by minute
• where geography=UK, group all by user
• count all
• group all by geo
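The counter rows and queries on these slides can be sketched as follows. This is only an illustration under stated assumptions: nested dicts stand in for rows of counter columns, the row keys mirror the ones on the slides, and the `ingest` function and its choice of which counters to bump are hypothetical rather than Acunu's actual code.

```python
from collections import defaultdict

# rows[row_key][column] -> counter; a stand-in for counter columns in a store.
rows = defaultdict(lambda: defaultdict(int))

# Seed a few counters with the "before" values shown on the slides.
rows["22:00"].update({"all": 3221, ":02": 104})
rows["UK"].update({"all": 228, "user01": 1})
rows["UK, 22:00"].update({"all": 1904})
rows["∅"].update({"all": 87314, "UK": 238})


def ingest(event):
    """Bump every counter the event contributes to (one increment each)."""
    hour, minute = event["time"].split(":")
    hour_key = f"{hour}:00"
    geo = event["geography"]

    rows[hour_key]["all"] += 1               # count per hour
    rows[hour_key][f":{minute}"] += 1        # ... grouped by minute
    rows[geo]["all"] += 1                    # count per geography
    rows[geo][event["cust_id"]] += 1         # ... grouped by user
    rows[f"{geo}, {hour_key}"]["all"] += 1   # count per geography and hour
    rows["∅"]["all"] += 1                    # count of everything
    rows["∅"][geo] += 1                      # ... grouped by geography


# The event from the slides.
ingest({"cust_id": "user01", "session_id": 102, "geography": "UK",
        "browser": "IE", "time": "22:02"})

# Queries become reads of one row instead of scans over raw events.
print(rows["22:00"]["all"])   # where time 22:00-23:00, count(*)     -> 3222
print(rows["22:00"][":02"])   # same hour, group by minute, at :02   -> 105
print(rows["UK"]["user01"])   # where geography=UK, group by user    -> 2
print(rows["∅"]["UK"])        # group all by geo, UK column          -> 239
```

The trade-off in this layout is that one event costs several increments at write time, in exchange for each pre-defined query being answered from a single row.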

Page 23: Realtime Analytics with Cassandra

• Motivation / alternatives
• What is it?
• How does it work?
• Approximate Analytics

• What’s it good for?

Page 24: Realtime Analytics with Cassandra

Approximate Analytics

Exact

Large Scale

Real-time

Page 25: Realtime Analytics with Cassandra

Count Distinct

Plan A: keep a list of all the things you’ve seen; count them at query time

Quick to update ... but at scale ...
• Takes lots of space
• Takes a long time to query

Page 26: Realtime Analytics with Cassandra

Approximate Distinct

item  hash            leading zeroes  max so far
x     00101001110...  2               2
y     11010100111...  0               2
z     00011101011...  3               3
...

max # leading zeroes seen so far

... to see a max of M takes about 2^M items
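The idea in the table above can be sketched in a few lines. A uniformly random hash starts with at least M zero bits with probability 2^-M, so a longest run of M zeroes suggests that roughly 2^M distinct items have gone by. The sketch assumes a 32-bit hash taken from the top of a SHA-1 digest; the function names `hash32`, `leading_zeroes` and `estimate_distinct` are illustrative.

```python
import hashlib

HASH_BITS = 32  # assumption: use the top 32 bits of a SHA-1 digest


def hash32(item: str) -> int:
    digest = hashlib.sha1(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def leading_zeroes(h: int, bits: int = HASH_BITS) -> int:
    # bit_length() is the position of the highest set bit, so the gap up to
    # `bits` is the number of leading zeroes (all `bits` of them when h == 0).
    return bits - h.bit_length()


def estimate_distinct(items) -> int:
    max_zeroes = 0
    for item in items:
        # Only one small integer of state is kept, however many items arrive.
        max_zeroes = max(max_zeroes, leading_zeroes(hash32(item)))
    # Seeing a run of M zeroes suggests roughly 2^M distinct items.
    return 2 ** max_zeroes


# The three hashes from the table, read as 11-bit values.
print(leading_zeroes(0b00101001110, bits=11))  # 2
print(leading_zeroes(0b11010100111, bits=11))  # 0
print(leading_zeroes(0b00011101011, bits=11))  # 3
```

Repeated items hash to the same value, so they never change the maximum. A single maximum is a very noisy estimate, though, and it can only ever be a power of two.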

Page 27: Realtime Analytics with Cassandra

Approximate Distinct

to reduce variance, average over m = 2^k sub-streams

item  hash            index, zeroes  max so far
x     00101001110...  0, 0           0,0,0,0
y     11010100111...  3, 1           0,0,1,0
z     00011101011...  0, 1           1,0,1,0
...

take the harmonic mean
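The sub-stream scheme above resembles the HyperLogLog family of estimators, so the sketch below follows that published estimator; treating the slide as HyperLogLog is an assumption here, not a statement about Acunu's code. Two details differ from the table: each register stores leading zeroes + 1, so that 0 can mean "never touched", and a small-cardinality correction (linear counting) is applied. The class name `ApproxDistinct`, the 64-bit SHA-1 prefix and K = 10 are illustrative choices.

```python
import hashlib
import math

K = 10            # assumption: 2^10 = 1024 sub-streams
M = 1 << K
HASH_BITS = 64


def hash64(item: str) -> int:
    digest = hashlib.sha1(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class ApproxDistinct:
    def __init__(self):
        self.registers = [0] * M

    def add(self, item: str) -> None:
        h = hash64(item)
        index = h >> (HASH_BITS - K)                  # first K bits pick the sub-stream
        rest = h & ((1 << (HASH_BITS - K)) - 1)       # remaining bits
        zeroes = (HASH_BITS - K) - rest.bit_length()  # leading zeroes of the rest
        # Store zeroes + 1 so that 0 means "register never touched".
        self.registers[index] = max(self.registers[index], zeroes + 1)

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / M)  # bias constant, valid for M >= 128
        # Harmonic mean of 2^register across sub-streams, scaled by alpha * M.
        raw = alpha * M * M / sum(2.0 ** -r for r in self.registers)
        empty = self.registers.count(0)
        if raw <= 2.5 * M and empty:
            # Small-cardinality correction (linear counting).
            return M * math.log(M / empty)
        return raw


counter = ApproxDistinct()
for i in range(100_000):
    counter.add(f"session-{i}")
    counter.add(f"session-{i}")  # duplicates do not move the estimate
print(round(counter.estimate()))  # an estimate of the 100,000 distinct sessions
```

The state is M small integers regardless of how many items are added. The standard error of this estimator is about 1.04 / sqrt(M), roughly 3% for M = 1024, which is the price paid for avoiding the full list of seen items.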

Page 28: Realtime Analytics with Cassandra

• Motivation / alternatives
• What is it?
• How does it work?
• Approximate Analytics
• What’s it good for?

Page 29: Realtime Analytics with Cassandra

Was it worth it?

Page 30: Realtime Analytics with Cassandra

What’s Coming?

• Ad Hoc: same queries, but without the need to pre-define them
• Geolocation: support for location-based events and queries
• Drill down: see the events that make up any given aggregate

Page 31: Realtime Analytics with Cassandra

• Motivation / alternatives
• What is it?
• How does it work?
• Approximate Analytics
• What’s it good for?

Page 32: Realtime Analytics with Cassandra

Manufacturing

Systems Monitoring

Financial Services

Social Media Ad Analytics

Oil + Gas

Page 33: Realtime Analytics with Cassandra

“Up and running in about 4 hours”

“We found out a competitor was scraping our data”

“We keep discovering use cases we hadn’t thought of”

Page 34: Realtime Analytics with Cassandra

Page 35: Realtime Analytics with Cassandra

www.acunu.com @acunu

Apache, Apache Cassandra, Cassandra, Hadoop, and the eye and elephant logos are trademarks of the Apache Software Foundation.
