Adam Hitchcock @NorthIsUp Making DISQUS Realtime

2012 07 making disqus realtime@euro python

Jul 16, 2015


Technology

Adam Hitchcock
Transcript
Page 1: 2012 07 making disqus realtime@euro python

Adam Hitchcock @NorthIsUp

Making DISQUS Realtime

Page 2: 2012 07 making disqus realtime@euro python

what is DISQUS?

Page 3: 2012 07 making disqus realtime@euro python
Page 4: 2012 07 making disqus realtime@euro python

why do realtime?

๏ getting new data to the user asap
๏ increased engagement
๏ looks awesome
๏ we can sell it

Page 5: 2012 07 making disqus realtime@euro python
Page 6: 2012 07 making disqus realtime@euro python

how many of you currently have a realtime component?

Page 7: 2012 07 making disqus realtime@euro python

realtime

๏ polls memcache
๏ is kinda #failscale

Page 8: 2012 07 making disqus realtime@euro python

DISQUS sees a lot of traffic

Google Analytics: May 29 2012 - June 28 2012

Page 9: 2012 07 making disqus realtime@euro python

realertime

๏ currently active on all DISQUS 2012 sites
๏ tested ‘dark’ on ~50% of our network
๏ 1.5 million concurrently connected users
๏ 45 thousand new connections per second
๏ 165 thousand messages/second
๏ ~0.2 seconds latency end to end

Page 10: 2012 07 making disqus realtime@euro python

so, how did we do it?

Page 11: 2012 07 making disqus realtime@euro python

technology

๏ node.js and mongodb for webscale

Page 12: 2012 07 making disqus realtime@euro python

technology

๏ just kidding :) we used python

Page 13: 2012 07 making disqus realtime@euro python

technology

๏ gevent
๏ gunicorn
๏ flask
๏ thoonk (a queue built on redis)
๏ redis
๏ nginx
๏ haproxy

Page 14: 2012 07 making disqus realtime@euro python

architecture overview

Page 15: 2012 07 making disqus realtime@euro python

architecture overview

[Diagram. Labels: New Posts (django); redis queue; “Backend” Gevent server; redis pub/sub; “Frontend” Gunicorn and Flask; nginx + haproxy]

Page 16: 2012 07 making disqus realtime@euro python

architecture overview

[Diagram. Labels: DISQUS; New Posts; thoonk queue; Formatter, Multiplexer, Publisher; redis pub/sub; Listener, Sub Pool, Requests; “Incoming HTTP requests from the”]

Page 17: 2012 07 making disqus realtime@euro python

the backend

Page 18: 2012 07 making disqus realtime@euro python

the backend

๏ listens to a Thoonk queue
๏ cleans & formats message
    ๏ this is the final format before http publish
๏ compress data now
๏ publish message to pubsub
    ๏ forum:id, thread:id, user:id, post:id

[Diagram: Formatter, Multiplexer, Publisher]
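A rough sketch of the format, compress, and publish steps listed above. It assumes JSON as the wire format and zlib for compression (the slides name neither). `channels_for`, `format_message`, `fan_out`, and the post fields are hypothetical names; `publish` is any callable taking `(channel, payload)`, such as a redis client's `publish` method.

```python
import json
import zlib


def channels_for(post):
    """Build the pub/sub channel names listed on the slide for one post."""
    return [
        "forum:%s" % post["forum"],
        "thread:%s" % post["thread"],
        "user:%s" % post["user"],
        "post:%s" % post["id"],
    ]


def format_message(post):
    """Clean and serialize the post once, then compress the result."""
    body = json.dumps(
        {"id": post["id"], "message": post["message"].strip()},
        sort_keys=True,
    )
    return zlib.compress(body.encode("utf-8"))


def fan_out(post, publish):
    """Format a post a single time and publish that payload to every channel."""
    payload = format_message(post)
    for channel in channels_for(post):
        publish(channel, payload)
    return payload
```

Doing the formatting and compression once in the backend means the frontend only has to copy bytes, which fits the "do hard work early" lesson later in the deck.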

Page 19: 2012 07 making disqus realtime@euro python

the backend

๏ average processing time is ~0.2 seconds
๏ queue maintenance
    ๏ ACK timeouts (5 seconds-ish)

Page 20: 2012 07 making disqus realtime@euro python

random redis lessons

๏ separate pub/sub and non pub/sub redis usage by physical node

๏ transactions can be prickly

Page 21: 2012 07 making disqus realtime@euro python

the backend

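    from time import time

    # redis key for the 'claimed' zset
    # (assumed to be scored by claim time in ms, as the cutoff below implies)
    claimed = thoonk_worker.feed_claimed

    # what jobs to re-queue: anything claimed more than MAX_AGE seconds ago
    too_late = int((time() - MAX_AGE) * 1000)

    # get and cancel jobs; too_late is a score, not a rank,
    # so select with zrangebyscore rather than zrange
    job_ids = redis.zrangebyscore(claimed, 0, too_late)
    for job_id in job_ids:
        thoonk_worker.cancel(job_id)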

Page 22: 2012 07 making disqus realtime@euro python

gevent is nice

    # the code is too big to show here, so just import it
    # http://bitly.com/geventspawn

    from realertime.lib.spawn import Watchdog
    from realertime.lib.spawn import TimeSensitiveBackoff

Page 23: 2012 07 making disqus realtime@euro python

the frontend

Page 24: 2012 07 making disqus realtime@euro python

the frontend

๏ needs to be fast!
๏ pools redis connections
๏ routes messages from pubsub to http

Page 25: 2012 07 making disqus realtime@euro python

the frontend

๏ new request!
๏ create/register a subscription with the pool
๏ sub pool returns a (python) queue based on the channel

[Diagram: Listener, Sub Pool, Requests]

Page 26: 2012 07 making disqus realtime@euro python

the frontend

๏ Listener receives message on a pubsub channel
๏ If that channel has a subscriber, pass it on
๏ subscriber then passes message on to all appropriate requests

[Diagram: Listener, Sub Pool, Requests]
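The listener / sub pool / requests flow above could be sketched roughly as follows. This is not the DISQUS code: `SubPool` and its methods are hypothetical, it assumes one queue per waiting request, and it uses the standard library `queue` module so it runs without gevent (a gevent app would more likely use `gevent.queue.Queue`).

```python
import queue
from collections import defaultdict


class SubPool:
    """Maps each pub/sub channel to the queues of the requests waiting on it."""

    def __init__(self):
        self._queues = defaultdict(set)

    def subscribe(self, channel):
        """Register a new request and hand back its private queue."""
        q = queue.Queue()
        self._queues[channel].add(q)
        return q

    def unsubscribe(self, channel, q):
        """Drop a finished request; forget the channel when nobody is left."""
        self._queues[channel].discard(q)
        if not self._queues[channel]:
            del self._queues[channel]

    def dispatch(self, channel, data):
        """Called by the listener: copy one message to every waiting request."""
        delivered = 0
        for q in self._queues.get(channel, ()):
            q.put({"channel": channel, "data": data})
            delivered += 1
        return delivered
```

A request handler would call `subscribe`, block on its queue, and call `unsubscribe` when the connection ends; messages for channels with no subscriber are simply dropped.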

Page 27: 2012 07 making disqus realtime@euro python

long pollingish

๏ long held http connection
๏ stream JSON over this http connection

Page 28: 2012 07 making disqus realtime@euro python

long pollingish

    from gevent import Timeout

    def __subscription_generator(self, q):
        # Returns a generator for the WSGI response
        to = Timeout(self.timeout_duration)
        to.start()
        try:
            while True:
                queue_data = q.get()
                # one per line
                yield queue_data['data'] + '\n'
        except Timeout as t:
            # swallow only our own timeout; re-raise anyone else's
            if t is not to:
                raise
        finally:
            # stop the timer if we exit early, then release the queue
            to.cancel()
            self.unsubscribe(q)
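Since the generator above emits one message per line, whatever reads the stream has to reassemble lines from arbitrary network chunks. A minimal sketch of that reader, assuming each line is a JSON document and chunks arrive as text; `iter_messages` is a hypothetical helper, not part of the talk:

```python
import json


def iter_messages(chunks):
    """Yield one decoded JSON object per complete line in a chunked stream."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # A chunk may hold several lines, or end in the middle of one.
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            if line:
                yield json.loads(line)
```

Anything left in the buffer without a trailing newline is treated as an incomplete message and is not yielded.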

Page 29: 2012 07 making disqus realtime@euro python

pooling redis pub/sub

    from gevent import Timeout

    # old way was pretty failscale
    def subscribe(redis, channel):
        pubsub = redis.pubsub()
        pubsub.subscribe(channel)
        with Timeout(30):
            # listen() is itself a generator, so iterate it
            for message in pubsub.listen():
                yield message

Page 30: 2012 07 making disqus realtime@euro python

pooling redis pub/sub

    from gevent.queue import Queue

    pipe = Queue()
    # each command is one (action, channel) tuple
    pipe.put(('subscribe', 'thread:12345'))
    pipe.put(('unsubscribe', 'forum:cnn'))

... elsewhere ...

    # new way is
    def listener(pubsub, pipe):
        for data in pubsub.listen():
            # handle data here...

            # handle new subscriptions
            if not pipe.empty():
                action, channel = pipe.get_nowait()
                getattr(pubsub, action)(channel)
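To try the control-pipe pattern without a redis server, here is a self-contained sketch. `FakePubSub` is a hypothetical stand-in for a pub/sub connection, `handle` is an assumed callback, and the loop drains every pending command per message instead of one, which is a variation on the slide rather than what it shows. It uses the standard library `queue` so it runs without gevent.

```python
import queue


class FakePubSub:
    """Stand-in for a redis pub/sub connection (hypothetical, for illustration)."""

    def __init__(self, messages):
        self.messages = messages
        self.channels = set()

    def subscribe(self, channel):
        self.channels.add(channel)

    def unsubscribe(self, channel):
        self.channels.discard(channel)

    def listen(self):
        for message in self.messages:
            yield message


def listener(pubsub, pipe, handle):
    """Single loop that both delivers messages and applies queued commands."""
    for data in pubsub.listen():
        handle(data)
        # Drain every pending (action, channel) command before the next message.
        while True:
            try:
                action, channel = pipe.get_nowait()
            except queue.Empty:
                break
            getattr(pubsub, action)(channel)
```

One connection then serves every channel, at the cost that commands are only applied when a message arrives, so a quiet connection delays them.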

Page 31: 2012 07 making disqus realtime@euro python

timeouts?

๏ needless reclaiming of ‘resources’
๏ maximize usage of cheap things
    ๏ connection count
๏ minimize expensive things
    ๏ requests per second

Page 32: 2012 07 making disqus realtime@euro python

test, measure, repeat

Page 33: 2012 07 making disqus realtime@euro python

testing

๏ Darktime
    ๏ use existing network to loadtest
    ๏ (user complaints when it didn’t work...)
๏ Darkesttime
    ๏ load testing a single thread
๏ have knobs you can twiddle

Page 34: 2012 07 making disqus realtime@euro python

stats

๏ measure all the things!
๏ especially when the numbers don’t line up
๏ is hard in distributed systems
๏ try to express things as +1 and -1 if you can
๏ i used scales from greplin “metrics for py”
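One way to read "express things as +1 and -1": count each connection up when it opens and down when it closes, so the running total is the number currently open and a drift away from zero points at a leak. The sketch below is plain Python with a hypothetical `Gauge` class; it does not use or imitate the scales API.

```python
import threading
from contextlib import contextmanager


class Gauge:
    """A counter meant to be moved in +1 / -1 steps."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def add(self, delta):
        with self._lock:
            self._value += delta

    @property
    def value(self):
        return self._value

    @contextmanager
    def track(self):
        """+1 on entry, -1 on exit, even if the body raises."""
        self.add(1)
        try:
            yield
        finally:
            self.add(-1)
```

Wrapping each request in `with open_connections.track():` keeps the increment and decrement paired, which is what makes the numbers line up.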

Page 35: 2012 07 making disqus realtime@euro python

lessons

๏ do hard work early
๏ defer work that you might never need
๏ end-to-end acks are good, but expensive
๏ timeouts are not free
๏ greenlets are effectively free
๏ pubsub is effectively free

Page 36: 2012 07 making disqus realtime@euro python

nginx lessons

    location / {
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_redirect off;

        # this line is really important
        proxy_buffering off;

        if (!-f $request_filename) {
            proxy_pass http://app_server;
            break;
        }
    }

http://gunicorn.org/deploy.html

Page 37: 2012 07 making disqus realtime@euro python

slide full o’ links

๏ Gevent (python coroutines and greenlets)
    http://gevent.org/
๏ Gunicorn (python pre-fork WSGI server)
    http://gunicorn.org/
๏ Thoonk (redis queue)
    https://github.com/andyet/thoonk.py
๏ Sentry (log aggregation)
    https://github.com/dcramer/sentry
๏ Scales (in-app metrics)
    https://github.com/Greplin/scales

code.disqus.com

Page 38: 2012 07 making disqus realtime@euro python

special thanks

๏ the team at DISQUS
๏ especially our dev-ops guys
๏ and jeff who had to review all my code

Page 39: 2012 07 making disqus realtime@euro python

open questions

๏ best system config for thousands of rps?
๏ how to make the front end faster?
    ๏ something faster than pywsgi?
    ๏ FapWS?
    ๏ libevent -> libev? (i.e. gevent 1.0)
    ๏ dump wsgi for raw sockets? (last resort)
๏ best internal python pub/sub option?

Page 40: 2012 07 making disqus realtime@euro python

DISQUSsion?

psst, we’re hiring
disqus.com/jobs

Page 41: 2012 07 making disqus realtime@euro python

Adam Hitchcock @NorthIsUp

Making DISQUS Realtime