Scaling Realtime at DISQUS

Post on 22-Oct-2015


Adam Hitchcock (@NorthIsUp)

Scaling Realtime at DISQUS

Sunday, 17 March, 13


If this is interesting to you...

we’re hiring: disqus.com/jobs

what is DISQUS?


why do realtime?

๏ getting new data to the user asap
๏ for increased engagement
๏ and it looks awesome
๏ and we can sell (or trade) it

DISQUS sees a lot of traffic

Google Analytics: March 2012 - Feb 2013

realertime

๏ currently active on all DISQUS sites
๏ tested ‘dark’ on our existing network
๏ during testing:
  ๏ 1.5 million concurrently connected users
  ๏ 45 thousand new connections per second
  ๏ 165 thousand messages/second
  ๏ < 0.2 seconds latency end to end

so, how did we do it?


Node.js and MongoDB!


This is PyCon. We used Python.

and some other Technology You Know™

thoonk redis queue
some python glue
nginx push stream
and long(er) polling

architecture overview


old-june

[Diagram: DISQUS writes New Posts to memcache; DISQUS embed clients poll memcache every 5 seconds]

june-july

[Diagram: DISQUS → New Posts → redis pub/sub → Flask FE cluster → HA Proxy → DISQUS embed clients]

july-october

[Diagram: DISQUS → New Posts → redis queue → “python glue” Gevent server → redis pub/sub → Flask FE cluster → HA Proxy → DISQUS embed clients]

august-october

[Diagram: DISQUS → New Posts → redis queue → “python glue” Gevent server → redis pub/sub → Flask FE cluster → HA Proxy → DISQUS embed clients. Server counts shown on the slide: 2, 14 BIG, 6 servers, 5 servers, and a truncated label “2 for”]

lots of servers, we can do better

october-now

[Diagram: DISQUS → New Posts → redis queue → “python glue” Gevent server → http post → nginx pub endpoint → nginx + push stream module → DISQUS embed clients. Server counts shown on the slide: 2 and 5]

Why still 5 for this? Network memory restriction; we can’t fix this without kernel hacking, tweaking, etc.
(if you know how, tell us, then apply for a job, then fix it for us)

october-now

[Diagram: New Posts → django → thoonk queue → Formatter → Publishers → http post → nginx pub endpoint → nginx + push stream module → DISQUS embed clients; Publishers also feed other realtime stuff]


the thoonk queue

๏ django post_save and post_delete hooks
๏ thoonk is a queue on top of redis
๏ implemented as a DFA
๏ provides job semantics
  ๏ useful for end to end acking
  ๏ reliable job processing in distributed system
๏ did I mention it’s on top of redis?
๏ uses zset to store items == ranged queries
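The job semantics on this slide can be pictured as a small state machine: a job is available, then claimed by a worker, then either finished (acked) or returned to the queue. The sketch below is an in-memory illustration only, not Thoonk's actual API: `JobQueue` and its methods `put`, `get`, `finish`, `retry`, and `available_between` are hypothetical names, and a sorted Python list stands in for the redis zset.

```python
import bisect
import itertools


class JobQueue(object):
    """Toy job queue: available -> claimed -> finished (or back to available)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._available = []   # sorted (score, job_id) pairs, standing in for a zset
        self._claimed = {}     # job_id -> score, for jobs handed to a worker
        self._payloads = {}    # job_id -> payload

    def put(self, payload, score=0):
        job_id = next(self._ids)
        self._payloads[job_id] = payload
        bisect.insort(self._available, (score, job_id))
        return job_id

    def get(self):
        """Claim the lowest-scored job; its payload is kept until it is acked."""
        score, job_id = self._available.pop(0)
        self._claimed[job_id] = score
        return job_id, self._payloads[job_id]

    def finish(self, job_id):
        """Ack: the job is done and can be forgotten."""
        del self._claimed[job_id]
        del self._payloads[job_id]

    def retry(self, job_id):
        """Nack: return a claimed job so another worker can take it."""
        score = self._claimed.pop(job_id)
        bisect.insort(self._available, (score, job_id))

    def available_between(self, low, high):
        """Ranged query over scores, the kind of lookup a sorted set allows."""
        return [job_id for score, job_id in self._available if low <= score <= high]
```

Because a claimed job is only dropped in `finish`, a worker that crashes mid-job leaves it recoverable, which is the property end-to-end acking relies on.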


the python glue

๏ listens to a thoonk queue
๏ cleans & formats message
  ๏ this is the final format for end clients
  ๏ compress data now
๏ publish message to nginx and other firehoses
  ๏ forum:id, thread:id, user:id, post:id

[Diagram inset: Formatter → Publishers]
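As a rough illustration of the glue step, the sketch below formats a post once, compresses it once, and pairs the result with each channel name in the `forum:id`, `thread:id`, `user:id`, `post:id` scheme. The function names and the post's field names (`id`, `forum`, `thread`, `user`, `body`) are assumptions made for the example, not the DISQUS schema.

```python
import json
import zlib


def channels_for(post):
    """Channel names a post goes out on: forum:id, thread:id, user:id, post:id."""
    return [
        'forum:%s' % post['forum'],
        'thread:%s' % post['thread'],
        'user:%s' % post['user'],
        'post:%s' % post['id'],
    ]


def format_message(post):
    """Keep only the fields clients need, serialize once, compress once."""
    cleaned = {key: post[key] for key in ('id', 'forum', 'thread', 'user', 'body')}
    payload = json.dumps(cleaned, sort_keys=True).encode('utf-8')
    return zlib.compress(payload)


def fan_out(post):
    """Pair the single formatted payload with every channel it is published on."""
    message = format_message(post)
    return [(channel, message) for channel in channels_for(post)]
```

Doing the cleaning and compression before the fan-out means the work is paid once per post rather than once per channel or per subscriber.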

gevent is nice

# the code is too big to show here, so just import it
# http://bitly.com/geventspawn

from realertime.lib.spawn import Watchdog
from realertime.lib.spawn import TimeSensitiveBackoff
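The slide does not show what `Watchdog` or `TimeSensitiveBackoff` do, so the following is only a guess at the general idea behind a time-aware backoff, not the realertime code. `ResettingBackoff` and all of its parameters are hypothetical: the delay doubles on each failure, and the failure count is forgotten once enough time has passed since the last one.

```python
import time


class ResettingBackoff(object):
    """Exponential backoff that forgets old failures after a quiet period."""

    def __init__(self, base=0.5, cap=30.0, quiet_period=60.0, clock=time.time):
        self.base = base
        self.cap = cap
        self.quiet_period = quiet_period
        self.clock = clock          # injectable so the behaviour can be tested
        self.failures = 0
        self.last_failure = None

    def next_delay(self):
        """Record a failure and return how long to wait before restarting."""
        now = self.clock()
        if self.last_failure is not None and now - self.last_failure > self.quiet_period:
            self.failures = 0  # the task ran cleanly for a while, start over
        self.last_failure = now
        delay = min(self.cap, self.base * (2 ** self.failures))
        self.failures += 1
        return delay
```

A supervising loop could sleep for `next_delay()` seconds each time a spawned worker dies before respawning it.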

data pipelines

class Pipeline(object):
    def parse_data(self, data):
        raise NotImplementedError('No ParserMixin used')

    def compute_data(self, data, parsed_data):
        raise NotImplementedError('No ComputeMixin used')

    def publish_data(self, data, parsed_data, computed_data):
        raise NotImplementedError('No PublisherMixin used')

    def handle(self, data):
        parsed_data = self.parse_data(data)
        computed_data = self.compute_data(data, parsed_data)
        return self.publish_data(data, parsed_data, computed_data)

Example Mixins

import json
import urllib2

class JSONParserMixin(Pipeline):
    def parse_data(self, data):
        return json.loads(data)

class AnnomizeDataMixin(Pipeline):
    # a compute step, so it overrides compute_data
    def compute_data(self, data, parsed_data):
        return {}

class SuperSecureEncryptDataMixin(Pipeline):
    def compute_data(self, data, parsed_data):
        # Python 2 str codec; assumes parsed_data is a string
        return parsed_data.encode('rot13')

class HTTPPublisherMixin(Pipeline):
    def publish_data(self, data, parsed_data, computed_data):
        u = urllib2.urlopen(self.dat_url, computed_data)
        return u

class FilePublisherMixin(Pipeline):
    def publish_data(self, data, parsed_data, computed_data):
        with open(self.output, 'a') as f:
            f.write(computed_data)

Finished Pipeline

class JSONAnnonHTTPPipeline(
        JSONParserMixin,
        AnnomizeDataMixin,
        HTTPPublisherMixin):
    pass

class JSONSecureHTTPPipeline(
        JSONParserMixin,
        SuperSecureEncryptDataMixin,
        HTTPPublisherMixin):
    pass

class JSONAnnonFilePipeline(
        JSONParserMixin,
        AnnomizeDataMixin,
        FilePublisherMixin):
    pass
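To see the composition run end to end without a network or a file, here is a self-contained Python 3 sketch of the same pattern. `Pipeline` and `JSONParserMixin` follow the slides; `DropEmailMixin`, `ListPublisherMixin`, and `JSONDropEmailListPipeline` are hypothetical stand-ins invented for the example.

```python
import json


class Pipeline(object):
    def parse_data(self, data):
        raise NotImplementedError('No ParserMixin used')

    def compute_data(self, data, parsed_data):
        raise NotImplementedError('No ComputeMixin used')

    def publish_data(self, data, parsed_data, computed_data):
        raise NotImplementedError('No PublisherMixin used')

    def handle(self, data):
        parsed_data = self.parse_data(data)
        computed_data = self.compute_data(data, parsed_data)
        return self.publish_data(data, parsed_data, computed_data)


class JSONParserMixin(Pipeline):
    def parse_data(self, data):
        return json.loads(data)


class DropEmailMixin(Pipeline):  # hypothetical compute step
    def compute_data(self, data, parsed_data):
        return {k: v for k, v in parsed_data.items() if k != 'email'}


class ListPublisherMixin(Pipeline):  # hypothetical publisher that stores output in memory
    def publish_data(self, data, parsed_data, computed_data):
        self.published = getattr(self, 'published', [])
        self.published.append(computed_data)
        return computed_data


class JSONDropEmailListPipeline(JSONParserMixin, DropEmailMixin, ListPublisherMixin):
    pass
```

Each mixin overrides exactly one stage, so Python's method resolution order picks the parser, the compute step, and the publisher from different base classes, and a missing stage fails loudly with `NotImplementedError`.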

real live DISQUS code

class FEOrbitalNginxMultiplexer(
        SchemaTransformerMixin,
        JSONFormatterMixin,
        SelfChannelsMixin,
        HTTPPublisherMixin):

    def __init__(self, domains, api_version=1):
        schema_namespace = 'orbital'
        self.channels = ('orbital', )
        super(FEOrbitalNginxMultiplexer, self).__init__(
            domains=domains,
            api_version=api_version,
            schema_namespace=schema_namespace)

class FEPublicAckingMultiplexer(
        PublicTransformerMixin,
        JSONFormatterMixin,
        FEChannelsMixin,
        ThoonkQueuePubSubPublisherMixin):

    def __init__(self, domains, api_version):
        schema_namespace = 'general'
        super(FEPublicAckingMultiplexer, self).__init__(
            domains=domains,
            api_version=api_version,
            schema_namespace=schema_namespace)


nginx push stream

๏ follow John Watson (@wizputer) for updated #humblebrags as we ramp up traffic

๏ an example config can be found here: http://bit.ly/disqus-nginx-push-stream

http://wiki.nginx.org/HttpPushStreamModule

nginx push stream

๏ Replaced webservers and Redis Pub/Sub
๏ But starting with Pub/Sub was important for us
  ๏ Encouraged us to over publish on keys

nginx push stream

๏ Turned on for 70% of our network...
  ๏ ~950K subscribers (peak single machine)
  ๏ peak 40 MBytes/second (per machine)
  ๏ CPU usage is still well under 15%
  ๏ 99.845% active writes (the socket is written to often enough to come up as ACTIVE)

config push stream

location = /pub {
    allow 127.0.0.1;
    deny all;

    push_stream_publisher admin;
    set $push_stream_channel_id $arg_channel;
}

location ^~ /sub/ {
    # to maintain api compatibility we need this
    location ~ /sub/(.*)/(.*)$ {
        # Url encoding things? $1%3A2$2
        set $push_stream_channels_path $1:$2;

        push_stream_subscriber streaming;
        push_stream_content_type application/json;
    }
}

examples

# Subs
curl -s 'localhost/sub/forum/cnn'
curl -s 'localhost/sub/thread/907824578'
curl -s 'localhost/sub/user/northisup'

# Pubs
curl -s -X POST 'localhost/pub?channel=forum:cnn' \
    -d '{"some sort": "of json data"}'

curl -s -X POST 'localhost/pub?channel=thread:907824578' \
    -d '{"more": "json data"}'

curl -s -X POST 'localhost/pub?channel=user:northisup' \
    -d '{"the idea": "I think you get it by now"}'
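The same calls can be made from Python. The sketch below only builds the URLs and the request object, mirroring the `/sub/<kind>/<id>` and `/pub?channel=<kind>:<id>` shapes from the curl examples; `sub_url` and `pub_request` are hypothetical helper names, and `localhost` assumes an nginx configured like the one on the config slide.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def sub_url(channel, host='localhost'):
    """'forum:cnn' -> 'http://localhost/sub/forum/cnn'."""
    kind, ident = channel.split(':', 1)
    return 'http://%s/sub/%s/%s' % (host, quote(kind, safe=''), quote(ident, safe=''))


def pub_request(channel, body, host='localhost'):
    """Build (but do not send) the POST that publishes body on a channel."""
    url = 'http://%s/pub?%s' % (host, urlencode({'channel': channel}, safe=':'))
    return Request(url, data=body.encode('utf-8'), method='POST')
```

Passing the result of `pub_request(...)` to `urllib.request.urlopen` would send it; that step is left out here so the example runs without a server.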

measure nginx

location = /push-stream-status {
    allow 127.0.0.1;
    deny all;

    push_stream_channels_statistics;
    set $push_stream_channel_id $arg_channel;
}


long(er) polling

onProgress: function () {
    var self = this;
    var resp = self.xhr.responseText;
    var advance = 0;
    var rows;

    // If server didn't push anything new, do nothing.
    if (!resp || self.len === resp.length) return;

    // Server returns JSON objects, one per line.
    rows = resp.slice(self.len).split('\n');

    _.each(rows, function (obj) {
        advance += (obj.length + 1);
        obj = JSON.parse(obj);
        self.trigger('progress', obj);
    });
    self.len += advance;
}
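The idea in the handler above, remembering how much of a growing response body has been consumed and parsing only the new lines, can be sketched in Python as follows. `LineStreamParser` is a hypothetical name, and unlike the slide's handler this version also waits for a trailing newline before parsing a row, which is an added assumption about how partial chunks should be handled.

```python
import json


class LineStreamParser(object):
    """Turns a growing response body into parsed JSON rows, one per line."""

    def __init__(self):
        self.consumed = 0  # how many characters of the body are already handled

    def feed(self, response_text):
        """Return the rows completed since the last call."""
        new_text = response_text[self.consumed:]
        end = new_text.rfind('\n')
        if end == -1:
            return []  # no complete line yet, wait for more data
        complete = new_text[:end]
        self.consumed += end + 1
        return [json.loads(line) for line in complete.split('\n') if line]
```

Calling `feed` with the full body each time matches how `responseText` accumulates on a long-lived XHR: the caller never has to track offsets itself.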

Soon... EventSource

// Currently EventSource has CORS issues
var ev = new EventSource(dat_url);
ev.addEventListener("Post", handlePostEvent);

test, measure, repeat


test

๏ Darktime
  ๏ use existing network to load test
  ๏ (user complaints when it didn’t work...)
๏ Darkesttime
  ๏ load testing a single thread
๏ have knobs you can twiddle

measure

๏ measure all the things!
  ๏ especially when the numbers don’t line up
๏ measuring is hard in distributed systems
๏ try to express things as +1 and -1 if you can
๏ Sentry for measuring exceptions
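One way to read the "+1 and -1" advice: if every process reports only increments and decrements, the cluster-wide value is just the sum, with no coordination needed. The sketch below illustrates that under this assumption; `Gauges` and `merge` are hypothetical names, not part of Scales or any DISQUS library.

```python
from collections import Counter


class Gauges(object):
    """Tracks current values purely from +1 / -1 events."""

    def __init__(self):
        self.values = Counter()

    def incr(self, name):
        self.values[name] += 1

    def decr(self, name):
        self.values[name] -= 1


def merge(*gauges):
    """Summing per-process gauges gives the cluster-wide value."""
    total = Counter()
    for gauge in gauges:
        total.update(gauge.values)
    return dict(total)
```

For example, a connection handler would call `incr('connections')` on open and `decr('connections')` on close, and a collector would `merge` the gauges from every process.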

pretty graphs


how does it really scale?

[Graph annotations: POPE; white smoke; francis announced]

maths


it’s been a busy few weeks


wha?

๏ People do weird stuff with your stuff
  ๏ turned off this server in Oct 2012
  ๏ Still getting 100 req/sec

lessons

๏ do hard (computation) work early
๏ end-to-end acks are good, but expensive
๏ redis/nginx pubsub is effectively free

If this was interesting to you...

psst, we’re hiring: disqus.com/jobs

special thanks

๏ the team at DISQUS
  ๏ like jeff a.k.a. @nfluxx who had to review all my code
๏ and especially our dev-ops guys
  ๏ like john watson a.k.a. @wizputer who found the nginx-push-stream module

slide full o’ links

๏ Nginx push stream module: http://wiki.nginx.org/HttpPushStreamModule
๏ Thoonk (redis queue): http://github.com/andyet/thoonk.py
๏ Sentry (distributed traceback aggregation): http://github.com/dcramer/sentry
๏ Gevent (python coroutines and greenlets): http://gevent.org/
๏ Scales (in-app metrics): http://github.com/Greplin/scales

code.disqus.com

Come find me here!

PyCon 2013
Santa Clara Convention Center, Hall A-B
Santa Clara, CA


we are still hiring

psst, we’re hiring: disqus.com/jobs

Questions I have

๏ What is the best kernel config for webscale concurrency? Nginx?
๏ I <3 gevent, but what if I want to pypy?
๏ Nginx + lua? Seems kind of awesome.
๏ Composing data pipelines: good or bad?
๏ I didn’t have time to mention:
  ๏ Kafka, what is it good for?
  ๏ Seriously, why not RabbitMQ?

Adam Hitchcock (@NorthIsUp)

DISQUSsion?
