Drinking from the Firehose Real-Time Metrics Samantha Quiñones

Drinking from the Firehose - Real-time Metrics

Jul 15, 2015

Transcript
Page 1: Drinking from the Firehose - Real-time Metrics

Drinking from the Firehose Real-Time Metrics

Samantha Quiñones

Page 2: Drinking from the Firehose - Real-time Metrics

@ieatkillerbees http://samanthaquinones.com

Page 3: Drinking from the Firehose - Real-time Metrics
Page 4: Drinking from the Firehose - Real-time Metrics

“How would you let editors test how well different headlines perform for the same piece of content?”

Page 5: Drinking from the Firehose - Real-time Metrics

Measuring User Behavior

• Application path

• Use patterns

• Mouse & attention tracking

Page 6: Drinking from the Firehose - Real-time Metrics
Page 7: Drinking from the Firehose - Real-time Metrics

Multivariate Testing

• Sort all users into groups

• 1 control group receives unaltered content

• 1 or more groups receive altered content

• Measure behavioral statistics (CTR, abandon rate, time on page, scroll depth) for each group
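A minimal sketch of the grouping and measurement steps, assuming every user carries a stable string ID and that hashing that ID is an acceptable way to sort users into groups. The deck does not say how groups were assigned, so `fnv1a`, `assignGroup`, `ctr`, and the sample headlines are hypothetical names for illustration only.

```javascript
// FNV-1a 32-bit hash over UTF-16 code units. It is deterministic, so the
// same user ID always lands in the same group.
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Group 0 is the control group; groups 1..groupCount-1 receive altered content.
function assignGroup(userId, groupCount) {
  return fnv1a(userId) % groupCount;
}

// Click-through rate per group, computed from raw counters.
function ctr(stats) {
  return stats.map(g => (g.impressions === 0 ? 0 : g.clicks / g.impressions));
}

const headlines = ["Original headline", "Variant A", "Variant B"]; // hypothetical
const group = assignGroup("user-12345", headlines.length);
console.log("user-12345 sees:", headlines[group]);
```

Hashing rather than random assignment means a returning user keeps seeing the same variant without any stored state.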

Page 8: Drinking from the Firehose - Real-time Metrics

State Monitoring

• Debugging

• Load Monitoring

Page 9: Drinking from the Firehose - Real-time Metrics

And then…?

Page 10: Drinking from the Firehose - Real-time Metrics

• Augmented intelligence for content creators

• Quality prediction

Page 11: Drinking from the Firehose - Real-time Metrics

What if content could change itself based on the weather?

Page 12: Drinking from the Firehose - Real-time Metrics

Managing Big Data

Page 13: Drinking from the Firehose - Real-time Metrics

How big is big?

Page 14: Drinking from the Firehose - Real-time Metrics

~1,300,000,000 events per DAY

Page 15: Drinking from the Firehose - Real-time Metrics

~40 datapoints per EVENT

Page 16: Drinking from the Firehose - Real-time Metrics

~15,000 events per SECOND

Page 17: Drinking from the Firehose - Real-time Metrics

Containing ~600,000 datapoints

Page 18: Drinking from the Firehose - Real-time Metrics

At a rate up to 25 megabytes / second
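The per-second figures on these slides can be cross-checked with a little arithmetic. This is only a rough sketch: it assumes 1 megabyte = 10^6 bytes and treats the "up to" 25 MB/s rate as if it coincided with the ~15,000 events/second figure, so the per-event size is an estimate rather than a measured value.

```javascript
const eventsPerSecond = 15000;      // from the slides (approximate)
const datapointsPerEvent = 40;      // from the slides (approximate)
const peakBytesPerSecond = 25e6;    // assumption: 1 MB = 10^6 bytes

// 15,000 events/s x 40 datapoints/event
const datapointsPerSecond = eventsPerSecond * datapointsPerEvent;

// Rough payload size implied by the peak byte rate
const bytesPerEvent = peakBytesPerSecond / eventsPerSecond;
const bytesPerDatapoint = bytesPerEvent / datapointsPerEvent;

console.log(datapointsPerSecond, "datapoints/second");
console.log(Math.round(bytesPerEvent), "bytes/event (estimate)");
console.log(Math.round(bytesPerDatapoint), "bytes/datapoint (estimate)");
```

The product matches the ~600,000 datapoints on the slide, and the implied event size is on the order of 1.7 KB.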

Page 19: Drinking from the Firehose - Real-time Metrics

[Diagram: Collectors, RabbitMQ Farm]

Page 20: Drinking from the Firehose - Real-time Metrics

[Diagram: RabbitMQ Farm, Hadoop]

Page 21: Drinking from the Firehose - Real-time Metrics

Hadoop

• Framework for distributed storage and processing of data

• Designed to make managing very large datasets simple with…

• Well-documented, open-source, common libraries

• Optimizations for commodity hardware

Page 22: Drinking from the Firehose - Real-time Metrics

Hadoop Distributed File System

• Modeled after Google File System

• Stores logical files across multiple systems

• Rack-aware

• No read-write concurrency

Page 23: Drinking from the Firehose - Real-time Metrics

MapReduce

• Framework for massively parallel data processing tasks

Page 24: Drinking from the Firehose - Real-time Metrics

Map

<?php
$document = "I'm a little teapot short and stout here is my handle here is my spout";

/**
 * Emits 1 for every word equal to $target_word and 0 for every other word.
 *
 * Outputs: [0,0,0,0,0,0,0,0,1,0,0,0,1,0,0]
 */
function map($target_word, $document)
{
    return array_map(
        function ($word) use ($target_word) {
            if ($word === $target_word) {
                return 1;
            }
            return 0;
        },
        preg_split('/\s+/', $document)
    );
}

echo json_encode(map("is", $document)) . PHP_EOL;

Page 25: Drinking from the Firehose - Real-time Metrics

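Reduce

<?php
$data = [0,0,0,0,0,0,0,0,1,0,0,0,1,0,0];

/**
 * Sums the values emitted by the map step.
 *
 * Outputs: 2
 */
function reduce($data)
{
    return array_reduce(
        $data,
        function ($count, $value) {
            return $count + $value;
        },
        0 // explicit initial value, so an empty input yields 0 rather than null
    );
}

echo reduce($data) . PHP_EOL;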
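The two PHP snippets above run on a single document. The point of MapReduce is that the same steps can run on separate partitions of the data and be combined afterwards. A minimal JavaScript sketch of that idea follows, assuming the teapot text is split into two partitions; in Hadoop each partition could be handled by a different node, which this single-process example only imitates.

```javascript
// Two partitions of the input (the split point is arbitrary).
const documents = [
  "I'm a little teapot short and stout",
  "here is my handle here is my spout",
];

// Map: emit 1 for each occurrence of the target word, 0 otherwise.
function map(targetWord, document) {
  return document.split(/\s+/).map(word => (word === targetWord ? 1 : 0));
}

// Reduce: sum the emitted values.
function reduce(values) {
  return values.reduce((count, value) => count + value, 0);
}

// Each partition is mapped and reduced independently; the partial counts
// are then reduced once more to produce the total.
const partials = documents.map(doc => reduce(map("is", doc)));
const total = reduce(partials);

console.log("partial counts:", partials, "total:", total);
```

The total agrees with the single-document result of 2, because summing is associative and the partial sums can be combined in any order.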

Page 26: Drinking from the Firehose - Real-time Metrics

Hadoop Limitations

• Hadoop jobs are batched and take significant time to run

• Data may not be available for 1+ hours after collection

Page 27: Drinking from the Firehose - Real-time Metrics

“How would you let editors test how well different headlines perform for the same piece of content?”

Page 28: Drinking from the Firehose - Real-time Metrics

Consider Shelf-life

• Most articles are relevant for < 24 hours

• Interest peaks < 3 hours

Page 29: Drinking from the Firehose - Real-time Metrics

Real-Time Pipelines

Page 30: Drinking from the Firehose - Real-time Metrics

[Diagram: Collectors, RabbitMQ Farm, Streamers]

Page 31: Drinking from the Firehose - Real-time Metrics
Page 32: Drinking from the Firehose - Real-time Metrics
Page 33: Drinking from the Firehose - Real-time Metrics

Version 1 (PoC)

[Diagram: Streamer, Receiver, StatsD Cluster, ElasticSearch]

Page 34: Drinking from the Firehose - Real-time Metrics
Page 35: Drinking from the Firehose - Real-time Metrics

this.visit = function (record) {
    // Parse the raw user-agent string into structured fields, if present.
    if (record.userAgent) {
        var parser = new UAParser();
        parser.setUA(record.userAgent);
        var user_agent = parser.getResult();
        return {
            user_agent: user_agent
        };
    }
    return {};
};

Page 36: Drinking from the Firehose - Real-time Metrics
Page 37: Drinking from the Firehose - Real-time Metrics

Findings

• Max throughput per receiver: 300 events/second

• ~70 receivers needed for prod

• StatsD key format creates data redundancy and reduces data richness
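One plausible reading of the StatsD finding, sketched in JavaScript: StatsD counters are flat lines of the form `<key>:<value>|c`, so a rich event record has to be broken into one key per dimension, and questions that span dimensions need a key per combination. The event fields and the `pageview.` key prefix are hypothetical; only the `key:value|c` line format comes from StatsD itself.

```javascript
// One rich event record (field names are hypothetical).
const event = { page: "article-42", device: "tablet", country: "US" };

// Flatten the record into one StatsD counter line per dimension.
// After this step the link between the three values is lost.
function toStatsdLines(ev) {
  return Object.entries(ev).map(
    ([field, value]) => `pageview.${field}.${value}:1|c`
  );
}

// Answering "tablet views of article-42 from the US" would need a key for
// each combination of values; the number of keys is the product of the
// number of distinct values per dimension.
function combinationCount(cardinalities) {
  return cardinalities.reduce((product, n) => product * n, 1);
}

console.log(toStatsdLines(event));
console.log(combinationCount([1000, 5, 200]), "keys for 1000 pages x 5 devices x 200 countries");
```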

Page 38: Drinking from the Firehose - Real-time Metrics

Version 1 (PoC)

[Diagram: Streamer, Receiver, StatsD Cluster, ElasticSearch]

Page 39: Drinking from the Firehose - Real-time Metrics

Transits & Terminals

• Transits - Short-term, in-memory, volatile storage for data with a life-span up to a few seconds

• Terminals - Destinations for data that either store, abandon, or transmit

Page 40: Drinking from the Firehose - Real-time Metrics

An efficient real-time data pathway consists of a network of transits and terminals, where no single node acts as both a transit and a terminal at the same time.
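The distinction can be sketched as two small classes. This is an illustration, not the pipeline's actual code: the class names, the fixed capacity, and the choice to drop records when a transit is full are all assumptions made for the example.

```javascript
// A transit holds records briefly in memory and hands them on.
// It never aggregates or persists anything.
class Transit {
  constructor(next, capacity) {
    this.next = next;         // downstream node (another transit or a terminal)
    this.capacity = capacity; // maximum records held at once
    this.buffer = [];
    this.dropped = 0;
  }
  accept(record) {
    if (this.buffer.length >= this.capacity) {
      this.dropped++;         // assumption: shed load instead of blocking
      return;
    }
    this.buffer.push(record);
  }
  flush() {
    const batch = this.buffer;
    this.buffer = [];
    batch.forEach(record => this.next.accept(record));
  }
}

// A terminal is an endpoint. This one stores what it receives.
class StoreTerminal {
  constructor() {
    this.records = [];
  }
  accept(record) {
    this.records.push(record);
  }
}

// Two transits in front of one terminal.
const store = new StoreTerminal();
const second = new Transit(store, 10);
const first = new Transit(second, 2);
["a", "b", "c"].forEach(record => first.accept(record));
first.flush();
second.flush();
console.log(store.records, "dropped:", first.dropped);
```

Keeping the two roles in separate nodes means a slow terminal can be scaled or replaced without touching the code that moves data.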

Page 41: Drinking from the Firehose - Real-time Metrics

StatsD

• Acts as a transit, taking data and passing it along…

• BUT

• Acts as a terminal, aggregating keys in memory and becoming a transit after a time or buffer threshold.

Page 42: Drinking from the Firehose - Real-time Metrics

Version 2

[Diagram: Streamer, Receiver, ElasticSearch, RabbitMQ]

Page 43: Drinking from the Firehose - Real-time Metrics
Page 44: Drinking from the Firehose - Real-time Metrics

RabbitMQ

• Lightweight message broker

• Allows complex message routing without application-level logic

• Can buffer 90-120 seconds of traffic
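As an example of routing without application-level logic: RabbitMQ topic exchanges route on dot-separated routing keys, where `*` in a binding pattern matches exactly one word and `#` matches zero or more words. The broker performs this matching itself. The sketch below only re-implements the rule in plain JavaScript to show what can live in broker configuration; the routing keys are hypothetical and the deck does not say which exchange types were used.

```javascript
// Returns true if a topic binding pattern matches a routing key.
function topicMatches(pattern, routingKey) {
  const p = pattern.split(".");
  const k = routingKey === "" ? [] : routingKey.split(".");

  function match(i, j) {
    if (i === p.length) return j === k.length;
    if (p[i] === "#") {
      // "#" may absorb zero or more words.
      for (let skip = j; skip <= k.length; skip++) {
        if (match(i + 1, skip)) return true;
      }
      return false;
    }
    if (j === k.length) return false;
    if (p[i] === "*" || p[i] === k[j]) return match(i + 1, j + 1);
    return false;
  }

  return match(0, 0);
}

// Hypothetical routing keys of the form <event>.<country>.<device>
console.log(topicMatches("pageview.*.tablet", "pageview.us.tablet"));
console.log(topicMatches("pageview.#", "pageview.us.tablet"));
console.log(topicMatches("pageview.*", "pageview.us.tablet"));
```

With bindings like these, a consumer interested only in tablet pageviews receives just that subset, and no producer code has to change.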

Page 45: Drinking from the Firehose - Real-time Metrics

Version 2

• Eliminated eventing and improved performance

• Replaced StatsD with RabbitMQ

• Data records are kept together

• No longer works with Kibana (sadface)

Page 46: Drinking from the Firehose - Real-time Metrics

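// Variant 1: consume the input with Array.shift()
while (buffer.length > 0) {
    var char = buffer.shift();
    if ('\n' === char) {
        queue.push(new Buffer(outbuf.join('')));
        outbuf.length = 0; // start a new record
        continue;
    }
    outbuf.push(char);
}

// Variant 2: walk a copy of the input by index
var i = 0;
var tBuf = buffer.slice();
while (i < buffer.length) {
    var char = tBuf[i++];
    if ('\n' === char) {
        queue.push(new Buffer(outbuf.join('')));
        outbuf.length = 0; // start a new record
        continue;          // do not copy the newline into the next record
    }
    outbuf.push(char);
}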
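For comparison, a sketch of the same newline-splitting task that scans with `indexOf` instead of handling one character at a time. It is an alternative written for illustration, not the code used in Version 2, and it assumes chunks arrive as already-decoded strings; `LineSplitter` and `onLine` are hypothetical names.

```javascript
// Splits newline-delimited text that arrives in arbitrary chunks.
class LineSplitter {
  constructor(onLine) {
    this.onLine = onLine; // callback invoked once per complete line
    this.remainder = "";  // partial line carried over between chunks
  }

  write(chunk) {
    const text = this.remainder + chunk;
    let start = 0;
    let end;
    // Emit every complete line found in the combined text.
    while ((end = text.indexOf("\n", start)) !== -1) {
      this.onLine(text.slice(start, end));
      start = end + 1;
    }
    // Keep whatever follows the last newline for the next chunk.
    this.remainder = text.slice(start);
  }
}

const received = [];
const splitter = new LineSplitter(line => received.push(line));
splitter.write("a\nb");
splitter.write("c\n\nd");
console.log(received, "pending:", JSON.stringify(splitter.remainder));
```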

Page 47: Drinking from the Firehose - Real-time Metrics

Findings

• Max throughput per receiver: 600 events/second

• ~35 receivers needed for prod

• Micro-optimized code became increasingly brittle and hard to maintain as custom logic was needed for every edge case

Page 48: Drinking from the Firehose - Real-time Metrics

Version 2

[Diagram: Streamer, Receiver, ElasticSearch, RabbitMQ]

Page 49: Drinking from the Firehose - Real-time Metrics

Need to Get Serious

• Very high throughput

• Multi-threaded worker pool with large memory buffers

• Static & dynamic optimization

• Efficient memory management for extremely volatile in-memory data

• Eliminate any processing overhead. Receiver must be a Transit

Page 50: Drinking from the Firehose - Real-time Metrics

And also…

• Not GoLang (because no one on the team is familiar with it)

• Not Rust (because no one on the team wants to be familiar with it)

• Not C (because C)

Page 51: Drinking from the Firehose - Real-time Metrics

mfw java :(

Page 52: Drinking from the Firehose - Real-time Metrics
Page 53: Drinking from the Firehose - Real-time Metrics

Why Java?

• Solid static & dynamic analysis and optimizations in the source-to-bytecode (S2BC) & JIT compilers

• Clients for the stuff I needed to talk to

• Well-supported within AOL & within my team

Page 54: Drinking from the Firehose - Real-time Metrics

Version 3

[Diagram: Streamer, Receiver, ElasticSearch, RabbitMQ, Processor/Router]

Page 55: Drinking from the Firehose - Real-time Metrics
Page 56: Drinking from the Firehose - Real-time Metrics

import java.util.ArrayList;
import java.util.List;
// Assumed: log4j, since the slides call logger.debug(), which
// java.util.logging.Logger does not provide.
import org.apache.log4j.Logger;

public class StreamReader {
    private static final Logger logger = Logger.getLogger(StreamReader.class.getName());
    private StreamerQueue queue = new StreamerQueue();
    private StreamProcessor processor;
    private List<StreamReader.BeaconWorkerThread> workerThreads = new ArrayList<>();
    private RtStreamerClient client;

    public StreamReader(String streamerURI, AmqpClient amqpClient, String appID,
                        String tpcFltrs, String rfFltrs, String bt) {
        ArrayList queueList = new ArrayList();
        this.processor = new StreamProcessor(amqpClient);
        int numThreads = 8;

        // Start a fixed pool of worker threads.
        // (The BeaconWorkerThread inner class is not shown on the slide.)
        for (int i = 0; i < numThreads; ++i) {
            StreamReader.BeaconWorkerThread worker = new StreamReader.BeaconWorkerThread();
            this.workerThreads.add(worker);
            worker.start();
        }

        // Hand the shared queue to the streamer client, which fills it.
        queueList.add(this.queue);
        this.client = new RtStreamerClient(streamerURI, appID, tpcFltrs, rfFltrs, bt, queueList);
    }
}

Page 57: Drinking from the Firehose - Real-time Metrics

public class StreamProcessor {
    private static final Logger logger = Logger.getLogger(StreamProcessor.class.getName());
    private AmqpClient amqpClient;

    public StreamProcessor(AmqpClient amqpClient) {
        this.amqpClient = amqpClient;
    }

    // Forward one event to AMQP without inspecting or transforming it.
    public void send(String data) throws Exception {
        this.amqpClient.send(data.getBytes());
        logger.debug("Sent event " + data + " to AMQP");
    }
}

Page 58: Drinking from the Firehose - Real-time Metrics

[Diagram: Network Input, Linked List Queues (a bank of parallel Queue boxes), Network Output]
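A minimal sketch of a linked-list queue, to show why the structure suits a transit: enqueue and dequeue are both constant-time, whereas removing from the front of an array (as `Array.shift()` does) generally has to move the remaining elements. This is a single-threaded JavaScript illustration with hypothetical names; the Java receiver's actual queue class is not shown in the deck.

```javascript
class LinkedQueue {
  constructor() {
    this.head = null; // next node to dequeue
    this.tail = null; // most recently enqueued node
    this.size = 0;
  }

  // O(1): append a node after the current tail.
  enqueue(value) {
    const node = { value, next: null };
    if (this.tail) {
      this.tail.next = node;
    } else {
      this.head = node;
    }
    this.tail = node;
    this.size++;
  }

  // O(1): detach and return the head value, or undefined when empty.
  dequeue() {
    if (!this.head) return undefined;
    const { value } = this.head;
    this.head = this.head.next;
    if (!this.head) this.tail = null;
    this.size--;
    return value;
  }
}

const q = new LinkedQueue();
q.enqueue("event-1");
q.enqueue("event-2");
console.log(q.dequeue(), q.dequeue(), q.dequeue());
```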

Page 59: Drinking from the Firehose - Real-time Metrics

Findings

• Max throughput per receiver: 2600 events/second

• ~10 receivers needed for prod
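The node counts in the three Findings slides line up with a fixed capacity target. The sketch below assumes that target was about 21,000 events/second, a figure inferred from 70 × 300 and 35 × 600 rather than stated anywhere in the deck.

```javascript
// Nodes required to sustain a target rate at a given per-node throughput.
function receiversNeeded(targetEventsPerSecond, perNodeEventsPerSecond) {
  return Math.ceil(targetEventsPerSecond / perNodeEventsPerSecond);
}

const target = 21000; // assumption: inferred from the Version 1 and 2 findings
const perNodeRates = { v1: 300, v2: 600, v3: 2600 }; // from the Findings slides

const needed = Object.fromEntries(
  Object.entries(perNodeRates).map(([version, rate]) => [
    version,
    receiversNeeded(target, rate),
  ])
);
console.log(needed);
```

Under that assumption Version 3 needs 9 nodes, which is in line with the "~10" on the slide once some headroom is allowed.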

Page 60: Drinking from the Firehose - Real-time Metrics
Page 61: Drinking from the Firehose - Real-time Metrics

Why ElasticSearch

• Open-source search engine built on Lucene

• Highly-distributed storage engine

• Clusters nicely

• Built-in aggregations like whoa

Page 62: Drinking from the Firehose - Real-time Metrics

Aggregations

• Geographic Boxing & Radius Grouping

• Time-Series

• Histograms

• Min/Max/Avg Statistical Evaluation

• MapReduce (coming soon!)

Page 63: Drinking from the Firehose - Real-time Metrics

• How many users viewed my post on an Android tablet in portrait mode within 10 miles of Denton, TX?

• What is the average time from start of page-load to first click for readers on Linux desktops between 3am and 5am?

• Given two sets of link texts, which has the higher CTR for a randomized sample of readers on tablet devices?
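A sketch of how the first question might be expressed as an Elasticsearch request body, written as a JavaScript object. It uses the `bool`/`filter` query form of Elasticsearch 2.0 and later, which may differ from the version used at the time of the talk. Apart from `location.geopoint`, which appears in the deck's own D3 code, every field name and value here is hypothetical, the coordinates for Denton are approximate, and the geopoint field is assumed to be mapped as a `geo_point`.

```javascript
const query = {
  size: 0, // only the aggregation result is needed, not the matching documents
  query: {
    bool: {
      filter: [
        { term: { post_id: "my-post" } },         // hypothetical field
        { term: { "device.type": "tablet" } },    // hypothetical field
        { term: { "device.os": "android" } },     // hypothetical field
        { term: { orientation: "portrait" } },    // hypothetical field
        {
          geo_distance: {
            distance: "10mi",
            "location.geopoint": { lat: 33.21, lon: -97.13 }, // approx. Denton, TX
          },
        },
      ],
    },
  },
  aggs: {
    // Approximate count of distinct users among the matching events.
    unique_users: { cardinality: { field: "user_id" } }, // hypothetical field
  },
};

console.log(JSON.stringify(query, null, 2));
```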

Page 64: Drinking from the Firehose - Real-time Metrics

Browser to Browser in < 5 seconds

Page 65: Drinking from the Firehose - Real-time Metrics

But wait…

Is that “real-time”?

Page 66: Drinking from the Firehose - Real-time Metrics
Page 67: Drinking from the Firehose - Real-time Metrics

Real-Time for Real

• Live analysis of data as it is collected

• Active visualization of very short-term trends in data

Page 68: Drinking from the Firehose - Real-time Metrics

Potential Problems

• Small sample sizes for new datasets / small analysis windows

• Data volumes too high for end-user comprehension

• Data volumes too high for end-user hardware/network connections
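One common way to address the last two problems is to summarize on the server and push only a small digest per time window to the browser. The sketch below is an assumption about how that could look, not a description of the pipeline: `WindowSummarizer` is a hypothetical name, and it keeps the first few events of each window as examples rather than a random sample.

```javascript
// Collapses a high-volume event stream into one small summary per window.
class WindowSummarizer {
  constructor(sampleSize) {
    this.sampleSize = sampleSize; // how many example events to keep per window
    this.reset();
  }

  reset() {
    this.count = 0;
    this.samples = [];
  }

  // Called for every incoming event.
  add(event) {
    this.count++;
    if (this.samples.length < this.sampleSize) {
      this.samples.push(event);
    }
  }

  // Called once per window (for example from setInterval); returns the
  // summary to send to clients and starts a new window.
  flush() {
    const summary = { count: this.count, samples: this.samples };
    this.reset();
    return summary;
  }
}

const summarizer = new WindowSummarizer(2);
for (let i = 0; i < 1000; i++) {
  summarizer.add({ id: i });
}
console.log(summarizer.flush());
```

The client then receives one message per window regardless of how many events arrived, which bounds both bandwidth and rendering work.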

Page 69: Drinking from the Firehose - Real-time Metrics

Version 4

[Diagram: Streamer, Receiver, ElasticSearch, RabbitMQ, Processor/Router, Websocket Server]

Page 70: Drinking from the Firehose - Real-time Metrics
Page 71: Drinking from the Firehose - Real-time Metrics

D3JS

• Open-source data visualization library written in JavaScript

Page 72: Drinking from the Firehose - Real-time Metrics

function plot(point) {
    var points = svg.selectAll("circle")
        .data([point], function (d) { return d.id; });

    // Draw a small red dot at the event's location, then grow and fade it
    // out over 10 seconds before removing it.
    // parseFloat keeps fractional degrees (parseInt would truncate them).
    points.enter()
        .append("circle")
        .attr("cx", function (d) {
            return projection([parseFloat(d.location.geopoint.lon), parseFloat(d.location.geopoint.lat)])[0];
        })
        .attr("cy", function (d) {
            return projection([parseFloat(d.location.geopoint.lon), parseFloat(d.location.geopoint.lat)])[1];
        })
        .attr("r", function (d) { return 1; })
        .style('fill', 'red')
        .style('fill-opacity', 1)
        .style('stroke', 'red')
        .style('stroke-width', '0.5px')
        .style('stroke-opacity', 1)
        .transition()
        .duration(10000)
        .style('fill-opacity', 0)
        .style('stroke-opacity', 0)
        .attr('r', '32px')
        .remove();
}

var buffer = [];
var socket = io();
socket.on('geopoint', function (point) {
    if (point.location.geopoint) {
        plot(point);
    }
});

Page 73: Drinking from the Firehose - Real-time Metrics
Page 74: Drinking from the Firehose - Real-time Metrics
Page 75: Drinking from the Firehose - Real-time Metrics

By the way…

x_n = x + r * cos(2π * n / v)
y_n = y + r * sin(2π * n / v)

where n = ordinal of vertex, v = number of vertices, r = radius (distance from the center to each vertex), and x,y = center of the polygon
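The vertex formula translates directly into a small function, using cosine for the x coordinate and sine for the y coordinate. A minimal sketch, with `polygonVertices` as a hypothetical name and `r` taken to be the distance from the center to each vertex:

```javascript
// Returns the v vertices of a regular polygon centered on (x, y).
function polygonVertices(x, y, r, v) {
  const vertices = [];
  for (let n = 0; n < v; n++) {
    const angle = (2 * Math.PI * n) / v;
    vertices.push([x + r * Math.cos(angle), y + r * Math.sin(angle)]);
  }
  return vertices;
}

// A hexagon of radius 10 centered on (100, 100)
console.log(polygonVertices(100, 100, 10, 6));
```

The first vertex (n = 0) always lies directly to the right of the center, at (x + r, y).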

Page 76: Drinking from the Firehose - Real-time Metrics

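var views = 0;
var socket = io();
socket.on('pageview', function (point) {
    views++;
});

function tick() {
    // Record the number of pageviews seen since the last tick.
    data.push(views);
    views = 0;

    // Redraw the line, then slide it left and schedule the next tick.
    path
        .attr("d", line)
        .attr("transform", null)
        .transition()
        .duration(500)
        .ease("linear")
        .attr("transform", "translate(" + x(0) + ",0)")
        .each("end", tick);

    // Drop the oldest data point.
    data.shift();
}

tick();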

Page 77: Drinking from the Firehose - Real-time Metrics

Pageview Heartbeat

Page 78: Drinking from the Firehose - Real-time Metrics
Page 79: Drinking from the Firehose - Real-time Metrics

Real-Time Profiling

Page 80: Drinking from the Firehose - Real-time Metrics

Receiver Layer

Receiver Buffer/Transit

Processing & Routing Layer

Processing & Routing Transit

Storage Engine

End-User Consumable Queues

Layers are

• Geographically decoupled

• Capable of independent scaling

• Fully encapsulated with no cross-layer dependencies

Page 81: Drinking from the Firehose - Real-time Metrics

Interfaces

Input Stream (Java)

Routing (node.js)

Filtering (node.js)

Aggregation (PHP)

Visualization (D3JS)

MV Testing (PHP)

Page 82: Drinking from the Firehose - Real-time Metrics

Languages & Tools

RabbitMQ

Hadoop

ElasticSearch

PHP

JS (node)

JS (D3)

Java

MySQL

Page 83: Drinking from the Firehose - Real-time Metrics

Where are we Now?

• It took 6 months to build a rock-solid data pipeline

• Entry points from:

• User data collectors

• Application code

Page 84: Drinking from the Firehose - Real-time Metrics

That was the easy part.

Page 85: Drinking from the Firehose - Real-time Metrics

What’s next?

Page 86: Drinking from the Firehose - Real-time Metrics

• Live debugging & runtime profiling

• Embeddable visualizations

• On-demand stream filters

• Predictive performance analysis

• Real-time sentiment analysis

Page 87: Drinking from the Firehose - Real-time Metrics

???

Page 88: Drinking from the Firehose - Real-time Metrics

@ieatkillerbees http://samanthaquinones.com

https://joind.in/13742