Counters for real-time statistics Aug 2011

Jan 07, 2016

Osvaldo Mendez

Counters for real-time statistics

Aug 2011

Quick Cassandra storage primer

Standard columns

Idempotent writes: last client timestamp wins

Stores byte[]; can have validators

No internal locking

No read before write

Example:

set Users['ecapriolo']['fname']='ed';
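The "last client timestamp wins" rule can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `LwwRow`, `write` and `read` are hypothetical names, not Cassandra APIs, and the model reconciles eagerly inside `write` for brevity, whereas the slide notes that Cassandra itself does not read before writing. Handling of equal timestamps is also simplified.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of one row of standard columns.
// Each column keeps its value together with the client-supplied timestamp.
class LwwRow {
    private static final class Cell {
        final String value;
        final long timestamp;
        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final Map<String, Cell> columns = new HashMap<>();

    // Keep the incoming cell only if its timestamp is newer than the stored one.
    // (Simplification: on equal timestamps the stored cell is kept.)
    void write(String column, String value, long clientTimestamp) {
        Cell current = columns.get(column);
        if (current == null || clientTimestamp > current.timestamp) {
            columns.put(column, new Cell(value, clientTimestamp));
        }
    }

    String read(String column) {
        Cell cell = columns.get(column);
        return cell == null ? null : cell.value;
    }

    public static void main(String[] args) {
        LwwRow row = new LwwRow();
        row.write("fname", "ed", 100L);
        row.write("fname", "ed", 100L);     // replayed write: state is unchanged
        row.write("fname", "edward", 90L);  // older timestamp: ignored
        System.out.println(row.read("fname")); // prints "ed"
    }
}
```

Replaying the same write leaves the row unchanged, which is what makes retries safe for standard columns.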

Counter columns

Stores integral values only

Can be incremented or decremented with a single RPC

Local read before write

Merged on read

Example:

incr followers['ecapriolo']['x'] by 30
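A rough way to picture "local read before write" and "merged on read" is a counter split into per-replica partial counts. The sketch below is a deliberately simplified model, not Cassandra's actual implementation; `ShardedCounter` and the replica names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of a counter column:
// each replica owns a partial count, and a read sums the partial counts.
class ShardedCounter {
    private final Map<String, Long> partialByReplica = new HashMap<>();

    // An increment reads and updates only the receiving replica's partial count.
    void increment(String replicaId, long delta) {
        partialByReplica.merge(replicaId, delta, Long::sum);
    }

    // A read merges all partial counts into a single value.
    long read() {
        long total = 0;
        for (long partial : partialByReplica.values()) {
            total += partial;
        }
        return total;
    }

    public static void main(String[] args) {
        ShardedCounter followers = new ShardedCounter();
        followers.increment("replicaA", 30);
        followers.increment("replicaB", 5);
        followers.increment("replicaA", -10); // a decrement is a negative delta
        System.out.println(followers.read()); // prints 25
    }
}
```

Unlike the standard-column write, an increment is not idempotent in this model: applying the same delta twice changes the total.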

Counters combine powers with:

composite keys: incr stats['user/date']['page'] by 1;

scale to distribute writes

And you get:

A distributed system to record events

Pre-calculated real-time stats
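To make the composite-key idea concrete, here is a minimal sketch of building a 'user/date' row key and seeing that different keys can land on different nodes. The placement function is a toy stand-in (a hash modulo the node count), not Cassandra's partitioner, and `CompositeKeys`, `rowKey` and `nodeFor` are hypothetical names.

```java
// Hypothetical helper showing how a composite row key spreads counter writes.
class CompositeKeys {
    // Build the row key in the 'user/date' shape used on the slide.
    static String rowKey(String user, String date) {
        return user + "/" + date;
    }

    // Toy placement only: hash the key onto one of nodeCount nodes.
    // A real cluster uses its configured partitioner instead.
    static int nodeFor(String rowKey, int nodeCount) {
        return Math.floorMod(rowKey.hashCode(), nodeCount);
    }

    public static void main(String[] args) {
        String[] dates = {"2011-08-01", "2011-08-02", "2011-08-03"};
        for (String date : dates) {
            String key = rowKey("ecapriolo", date);
            System.out.println(key + " -> node " + nodeFor(key, 4));
        }
    }
}
```

Because the date is part of the key, one user's increments are spread over many rows instead of piling onto a single one.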

Other ways to collect and report

Store in files, process into reports

Example: data -> hdfs -> hive queries -> reports

Light work on front end

Heavy on back end

Store into relational database

Example: data -> rdbms (ind) -> rt queries & reports -> reports

Divides work between front end and back end

Indexes can become choke points

Example data set

url        | username | event_time          | time_to_serve_millis
/page1.htm | edward   | 2011-01-02 04:01:04 | 45
/page1.htm | stacey   | 2011-01-02 04:01:05 | 46
/page1.htm | stacey   | 2011-01-02 04:02:07 | 40
/page2.htm | edward   | 2011-01-02 04:02:45 | 22

“Query” one: hit counts bucketed by minute

page       | time             | count
/page1.htm | 2011-01-02 04:01 | 2
/page1.htm | 2011-01-02 04:02 | 1
/page2.htm | 2011-01-02 04:02 | 1

“Query” two: resources consumed by user per hour

user   | time          | total_time_to_serve
edward | 2011-01-02 04 | 67
stacey | 2011-01-02 04 | 86
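For comparison with the counter approach, the two report tables can be reproduced by a plain batch pass over the example data set. This is a sketch of the "heavy on back end" style, not code from the deck: `BatchReports` and its methods are hypothetical, and it assumes event times are strings shaped like "2011-01-02 04:01:04" so that a prefix gives the bucket.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical batch version of the two "queries": scan every record, then report.
class BatchReports {
    // Each row is {url, username, event_time, time_to_serve_millis}.
    static final String[][] SAMPLE = {
        {"/page1.htm", "edward", "2011-01-02 04:01:04", "45"},
        {"/page1.htm", "stacey", "2011-01-02 04:01:05", "46"},
        {"/page1.htm", "stacey", "2011-01-02 04:02:07", "40"},
        {"/page2.htm", "edward", "2011-01-02 04:02:45", "22"},
    };

    // "Query" one: count hits per page per minute.
    static Map<String, Long> hitsByMinute(String[][] rows) {
        Map<String, Long> hits = new TreeMap<>();
        for (String[] row : rows) {
            String minute = row[2].substring(0, 16); // "yyyy-MM-dd HH:mm"
            hits.merge(row[0] + " | " + minute, 1L, Long::sum);
        }
        return hits;
    }

    // "Query" two: sum time to serve per user per hour.
    static Map<String, Long> millisByUserHour(String[][] rows) {
        Map<String, Long> totals = new TreeMap<>();
        for (String[] row : rows) {
            String hour = row[2].substring(0, 13); // "yyyy-MM-dd HH"
            totals.merge(row[1] + " | " + hour, Long.parseLong(row[3]), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        hitsByMinute(SAMPLE).forEach((k, v) -> System.out.println(k + " | " + v));
        millisByUserHour(SAMPLE).forEach((k, v) -> System.out.println(k + " | " + v));
    }
}
```

Every report run rescans all records here; the counter design does the same arithmetic once, at insert time.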

Turn a record line into a POJO

class Record {
    String url, username;
    Date date;
    int timeToServe;
}

Use your imagination here:

public static List<Record> readRecords(String file) throws Exception {
    // body not shown on the slide
}
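The slide leaves readRecords() to the reader. One possible sketch follows, assuming the input file holds one pipe-delimited line per event in the layout of the example data set; that file format, the `RecordReader` class and the `parseLine` helper are assumptions for illustration, not the author's code. `Record` is repeated so the block compiles on its own.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

class Record {
    String url, username;
    Date date;
    int timeToServe;
}

// Hypothetical reader for lines like:
// /page1.htm | edward | 2011-01-02 04:01:04 | 45
class RecordReader {
    static Record parseLine(String line) {
        String[] fields = line.trim().split("\\s*\\|\\s*");
        if (fields.length != 4) {
            throw new IllegalArgumentException("expected 4 fields: " + line);
        }
        Record r = new Record();
        r.url = fields[0];
        r.username = fields[1];
        try {
            // SimpleDateFormat is not thread-safe, so create one per call.
            r.date = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").parse(fields[2]);
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad event_time: " + fields[2], e);
        }
        r.timeToServe = Integer.parseInt(fields[3]);
        return r;
    }

    public static List<Record> readRecords(String file) throws Exception {
        List<Record> records = new ArrayList<>();
        for (String line : Files.readAllLines(Paths.get(file))) {
            // Skip blank lines and an optional "url | username | ..." header.
            if (line.trim().isEmpty() || line.startsWith("url")) {
                continue;
            }
            records.add(parseLine(line));
        }
        return records;
    }
}
```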

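writeRecord() method

// Needs java.text.DateFormat and java.text.SimpleDateFormat, plus the Thrift classes
// from org.apache.cassandra.thrift and org.apache.cassandra.utils.ByteBufferUtil.
public static void writeRecord(Cassandra.Client c, Record r) throws Exception {
    // One formatter per bucket size
    DateFormat bucketByMinute = new SimpleDateFormat("yyyy-MM-dd HH:mm");
    DateFormat bucketByDay = new SimpleDateFormat("yyyy-MM-dd");
    DateFormat bucketByHour = new SimpleDateFormat("yyyy-MM-dd HH");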

“Query” 1: page counts by minute

    // Row key: day + "-" + url; column name: minute bucket; increment by 1
    CounterColumn counter = new CounterColumn();
    ColumnParent cp = new ColumnParent("page_counts_by_minute");
    counter.setName(ByteBufferUtil.bytes(bucketByMinute.format(r.date)));
    counter.setValue(1);
    c.add(ByteBufferUtil.bytes(bucketByDay.format(r.date) + "-" + r.url),
          cp, counter, ConsistencyLevel.ONE);

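“Query” 2: usage by user per hour

    // Row key: day + "-" + username; column name: hour bucket; increment by time to serve.
    // The column family keeps the name user_usage_by_minute although the buckets are hourly.
    CounterColumn counter2 = new CounterColumn();
    ColumnParent cp2 = new ColumnParent("user_usage_by_minute");
    counter2.setName(ByteBufferUtil.bytes(bucketByHour.format(r.date)));
    counter2.setValue(r.timeToServe);
    c.add(ByteBufferUtil.bytes(bucketByDay.format(r.date) + "-" + r.username),
          cp2, counter2, ConsistencyLevel.ONE);
} // end of writeRecord()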

How this works
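The effect of the two add() calls can be pictured with an in-memory stand-in for a counter column family: a map from row key to a sorted map of column name to running total. This is a sketch, not the Thrift client; `CounterColumnFamily` and `load` are hypothetical, and event times are assumed to be strings like "2011-01-02 04:01:04" so that prefixes give the day, hour and minute buckets.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a counter column family:
// row key -> (column name -> running total).
class CounterColumnFamily {
    final Map<String, Map<String, Long>> rows = new TreeMap<>();

    // Same shape as the counter add in writeRecord(): bump one column of one row.
    void add(String rowKey, String columnName, long delta) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .merge(columnName, delta, Long::sum);
    }

    // Replay events of the form {url, username, event_time, millis} into both families.
    static void load(String[][] events, CounterColumnFamily pageCounts,
                     CounterColumnFamily userUsage) {
        for (String[] e : events) {
            String day = e[2].substring(0, 10);    // "yyyy-MM-dd"
            String hour = e[2].substring(0, 13);   // "yyyy-MM-dd HH"
            String minute = e[2].substring(0, 16); // "yyyy-MM-dd HH:mm"
            pageCounts.add(day + "-" + e[0], minute, 1);
            userUsage.add(day + "-" + e[1], hour, Long.parseLong(e[3]));
        }
    }

    public static void main(String[] args) {
        String[][] events = {
            {"/page1.htm", "edward", "2011-01-02 04:01:04", "45"},
            {"/page1.htm", "stacey", "2011-01-02 04:01:05", "46"},
            {"/page1.htm", "stacey", "2011-01-02 04:02:07", "40"},
            {"/page2.htm", "edward", "2011-01-02 04:02:45", "22"},
        };
        CounterColumnFamily pageCounts = new CounterColumnFamily();
        CounterColumnFamily userUsage = new CounterColumnFamily();
        load(events, pageCounts, userUsage);
        System.out.println(pageCounts.rows);
        System.out.println(userUsage.rows);
    }
}
```

Each event touches exactly one column in each family, so the report is already assembled by the time anyone reads it.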

Results

[default@counttest] list user_usage_by_minute;
-------------------
RowKey: 2011-01-02-stacey
=> (counter=2011-01-02 04, value=86)
-------------------
RowKey: 2011-01-02-edward
=> (counter=2011-01-02 04, value=67)

More Results

[default@counttest] list page_counts_by_minute;
-------------------
RowKey: 2011-01-02-/page1.htm
=> (counter=2011-01-02 04:01, value=2)
=> (counter=2011-01-02 04:02, value=1)
-------------------
RowKey: 2011-01-02-/page2.htm
=> (counter=2011-01-02 04:02, value=1)

Recap

Counters pushed work to the “front end”

Data is bucketed, sorted, and indexed on insert

Data is already “ready” on read

Designed around how you want to read data

Distributed writes across the cluster

Bucketed data by time, user, page, etc.

Different from a single table/index contention point

Questions?

Full code at: http://www.jointhegrid.com/highperfcassandra/?cat=7