I got 10 trillion problems, but logging ain’t one
John Graham-Cumming

10 Trillion

Jan 12, 2016


Dragan Zubac

Trillion http requests
Transcript
Page 1: 10 Trillion

I got 10 trillion problems, but logging ain’t one

John Graham-Cumming

Page 11

10 trillion HTTP requests per month

Page 12

4 MHz of log lines (roughly 4 million log lines per second)
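The "4 MHz" figure is consistent with the monthly request count. A back-of-envelope check in Go, assuming a 30-day month and roughly one log line per HTTP request (both are our assumptions, not stated on the slides):

```go
package main

import "fmt"

// secondsPerMonth assumes a 30-day month; real months have 28 to 31 days.
const secondsPerMonth = 30 * 24 * 60 * 60

// perSecond converts a monthly total into an average per-second rate.
func perSecond(perMonth float64) float64 {
	return perMonth / secondsPerMonth
}

func main() {
	rate := perSecond(10e12) // 10 trillion HTTP requests per month
	// Assumption: about one log line per request.
	fmt.Printf("about %.1f million log lines per second\n", rate/1e6) // about 3.9 million
}
```

This is an average; peak rates would be higher.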

Page 14

A log processing* company that also runs a CDN and web security service

* not storage

Page 18

The Data Team

400 TB/day*

* 146 PB per year
* and that’s the compressed size

122,000,000,000 floppies
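The storage figures can be checked with a few lines of Go. The 146 PB follows directly from 400 TB × 365 days in decimal units. The floppy count is our reconstruction: it only comes out near 122 billion if one assumes 1.2 MB disks (5.25-inch high density); with 1.44 MB disks it would be nearer 101 billion.

```go
package main

import "fmt"

const (
	tb = 1e12 // decimal terabyte, in bytes
	pb = 1e15 // decimal petabyte, in bytes
)

// yearlyBytes scales a daily volume to a 365-day year.
func yearlyBytes(dailyBytes float64) float64 {
	return dailyBytes * 365
}

// disksNeeded returns how many disks of diskBytes each would hold totalBytes.
func disksNeeded(totalBytes, diskBytes float64) float64 {
	return totalBytes / diskBytes
}

func main() {
	year := yearlyBytes(400 * tb)
	fmt.Printf("%.0f PB per year\n", year/pb) // 146 PB per year
	// Assumption: the disk size behind the slide's floppy count is not stated.
	fmt.Printf("%.1f billion floppies at 1.2 MB\n", disksNeeded(year, 1.2e6)/1e9)
	fmt.Printf("%.1f billion floppies at 1.44 MB\n", disksNeeded(year, 1.44e6)/1e9)
}
```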

Page 19

Privacy

Page 21

Three Things

• Provide charts for our customers
• Give customers data about attacks
• Automatically spot attacks in real time

Page 27

Things we ❤️

• NGINX + LuaJIT
• Cap’n Proto
• Apache Kafka
• Redis
• Go
• Postgres and CitusDB
• Streaming Algorithms

https://blog.cloudflare.com/tag/go/

Page 28

NGINX + LuaJIT

• Every request executes Lua code… lots of it
• LuaJIT is very fast
• Mixture of human-written Lua and generated code
• http://wiki.nginx.org/HttpLuaModule

Page 29

Lua for WAF

Page 30

Generated Code

Page 31

NGINX + LuaJIT

• Go program receives log events from NGINX in Cap’n Proto format
• Batches events
• Compresses using LZ4
• Sends via TLS to Data
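The forwarding steps above (batch, compress, send) can be sketched in Go. This is a minimal illustration under stated assumptions, not the program from the talk: the `batcher` type and `decodeBatch` function are hypothetical, events are opaque byte slices rather than Cap’n Proto messages, and the standard library's `compress/gzip` stands in for LZ4, which Go's standard library does not include. The TLS step is omitted; the compressed batch would be written to a connection such as one opened with `crypto/tls.Dial`.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/binary"
	"fmt"
	"io"
)

// batcher (hypothetical) accumulates serialized log events and emits one
// compressed batch once maxEvents have been queued.
type batcher struct {
	maxEvents int
	events    [][]byte
}

// add queues one event and returns a compressed batch when the batch is
// full, or nil otherwise.
func (b *batcher) add(event []byte) ([]byte, error) {
	b.events = append(b.events, event)
	if len(b.events) < b.maxEvents {
		return nil, nil
	}
	return b.flush()
}

// flush length-prefixes each event (uint32, big-endian), compresses the
// whole batch with gzip (standing in for LZ4) and resets the buffer.
func (b *batcher) flush() ([]byte, error) {
	var out bytes.Buffer
	zw := gzip.NewWriter(&out)
	for _, e := range b.events {
		var hdr [4]byte
		binary.BigEndian.PutUint32(hdr[:], uint32(len(e)))
		if _, err := zw.Write(hdr[:]); err != nil {
			return nil, err
		}
		if _, err := zw.Write(e); err != nil {
			return nil, err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	b.events = b.events[:0]
	return out.Bytes(), nil
}

// decodeBatch reverses flush: decompress, then split on the length prefixes.
func decodeBatch(batch []byte) ([][]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(batch))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	var events [][]byte
	for {
		var hdr [4]byte
		if _, err := io.ReadFull(zr, hdr[:]); err == io.EOF {
			return events, nil // clean end of stream
		} else if err != nil {
			return nil, err
		}
		e := make([]byte, binary.BigEndian.Uint32(hdr[:]))
		if _, err := io.ReadFull(zr, e); err != nil {
			return nil, err
		}
		events = append(events, e)
	}
}

func main() {
	b := &batcher{maxEvents: 3}
	for _, line := range []string{"GET /a", "GET /b", "POST /c"} {
		batch, err := b.add([]byte(line))
		if err != nil {
			panic(err)
		}
		if batch != nil {
			events, _ := decodeBatch(batch)
			fmt.Println(len(events), "events in one compressed batch")
		}
	}
}
```

Batching amortizes compression and TLS framing overhead across many small events; a real forwarder would likely also flush on a timer so quiet periods do not delay delivery.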

Page 32

Cap’n Proto

• Insanely fast
• Saw 20x speedup over cjson
• It’s a wire format and an in-memory representation
• Extend with no penalty
• https://capnproto.org/
• Our interface from Lua: https://blog.cloudflare.com/introducing-lua-capnproto-better-serialization-in-lua/

Page 33

Apache Kafka

• Fast, scalable, resilient queue
• Queue on a cluster, not a single machine
• Allows clusters of readers to process queue messages
• https://kafka.apache.org/
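"Clusters of readers" works because a Kafka topic is split into partitions that are shared out among the members of a consumer group. The sketch below is a simplified, standalone take on the idea behind Kafka's range assignor (sort consumers, give each a contiguous block of partitions, and give the first `numPartitions % consumers` members one extra). It is not Kafka client code; in practice the client library's configured assignor does this, and the consumer names here are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// rangeAssign spreads numPartitions partitions of one topic over a group
// of consumers in contiguous blocks, in sorted consumer order.
func rangeAssign(consumers []string, numPartitions int) map[string][]int {
	sorted := append([]string(nil), consumers...)
	sort.Strings(sorted)
	out := make(map[string][]int, len(sorted))
	if len(sorted) == 0 {
		return out
	}
	per, extra := numPartitions/len(sorted), numPartitions%len(sorted)
	next := 0
	for i, c := range sorted {
		n := per
		if i < extra {
			n++ // the first `extra` consumers take one more partition
		}
		for p := next; p < next+n; p++ {
			out[c] = append(out[c], p)
		}
		next += n
	}
	return out
}

func main() {
	a := rangeAssign([]string{"go-consumer-2", "go-consumer-1", "go-consumer-3"}, 8)
	for _, c := range []string{"go-consumer-1", "go-consumer-2", "go-consumer-3"} {
		fmt.Println(c, a[c])
	}
}
```

With 8 partitions and 3 consumers, the first two get three partitions each and the third gets two. Adding a reader to the group triggers a rebalance that spreads the same partitions over more processes.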

Page 34

Apache Kafka

• A cluster of Go programs processes log messages
• Generates detailed attack logs for customers
• Feeds aggregates to Postgres
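A Go consumer of this kind might fan messages out to several goroutines and merge their results. The sketch below is an assumption-laden illustration, not the talk's code: a channel stands in for messages read from a Kafka partition, `logEvent` is a hypothetical and heavily simplified record, and the aggregate is a per-zone count of blocked requests. Writing the result to Postgres is not shown.

```go
package main

import (
	"fmt"
	"sync"
)

// logEvent is a hypothetical, heavily simplified log record.
type logEvent struct {
	Zone    string // customer zone, e.g. "example.com"
	Blocked bool   // whether the request was blocked as an attack
}

// aggregate drains events with `workers` goroutines (workers must be >= 1).
// Each goroutine keeps a private per-zone count, so the hot path needs no
// locking; the partial maps are merged after all workers finish.
func aggregate(events <-chan logEvent, workers int) map[string]int {
	partials := make([]map[string]int, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		partials[w] = make(map[string]int)
		wg.Add(1)
		go func(counts map[string]int) {
			defer wg.Done()
			for ev := range events {
				if ev.Blocked {
					counts[ev.Zone]++
				}
			}
		}(partials[w])
	}
	wg.Wait()
	total := make(map[string]int)
	for _, p := range partials {
		for zone, n := range p {
			total[zone] += n
		}
	}
	return total
}

func main() {
	events := make(chan logEvent)
	go func() {
		defer close(events)
		for i := 0; i < 1000; i++ {
			events <- logEvent{Zone: "example.com", Blocked: i%4 == 0}
		}
	}()
	fmt.Println(aggregate(events, 4)) // map[example.com:250]
}
```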

Page 35

Attack Log

Page 36

Postgres and CitusDB

• Go processes produce 1-minute roll-ups of customer analytics and insert them into CitusDB
• Later, 1-hour, 1-day, etc. roll-ups are created
• CitusDB is a sharded, replicated Postgres implementation for very fast queries
• https://blog.cloudflare.com/scaling-out-postgresql-for-cloudflare-analytics-using-citusdb/
• https://www.citusdata.com/
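The layered roll-ups described above (1 minute, then 1 hour, then 1 day) can be expressed as one re-bucketing step applied repeatedly. A minimal Go sketch, assuming rows keyed by a customer zone and a bucket start in Unix seconds (UTC); `rollupKey` and `rollup` are hypothetical names, and inserting the rows into CitusDB is not shown.

```go
package main

import (
	"fmt"
	"time"
)

// Bucket widths in seconds.
const (
	minute int64 = 60
	hour         = 60 * minute
	day          = 24 * hour
)

// rollupKey identifies one aggregate row: a customer zone and the start
// of its time bucket in Unix seconds (UTC).
type rollupKey struct {
	Zone   string
	Bucket int64
}

// rollup re-buckets counts into windows of `width` seconds and sums them.
// Feeding it 1-minute rows with width=hour gives 1-hour rows; feeding those
// back in with width=day gives 1-day rows. Assumes non-negative timestamps.
func rollup(in map[rollupKey]int, width int64) map[rollupKey]int {
	out := make(map[rollupKey]int)
	for k, n := range in {
		start := k.Bucket - k.Bucket%width
		out[rollupKey{k.Zone, start}] += n
	}
	return out
}

func main() {
	t0 := time.Date(2016, 1, 12, 10, 0, 0, 0, time.UTC).Unix()
	perMinute := map[rollupKey]int{
		{"example.com", t0}:             120,
		{"example.com", t0 + minute}:    80,
		{"example.com", t0 + 61*minute}: 50,
	}
	hourly := rollup(perMinute, hour)
	daily := rollup(hourly, day)
	fmt.Println(hourly[rollupKey{"example.com", t0}], hourly[rollupKey{"example.com", t0 + hour}]) // 200 50
	fmt.Println(daily[rollupKey{"example.com", t0 - t0%day}])                                   // 250
}
```

Because sums compose, each coarser roll-up can be built from the previous one rather than from raw log lines.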

Page 38

• 128 GB RAM
• 2x Intel® Xeon® Processor E5-2630 v3 (16 cores)
• 10G Ethernet
• Disks vary
• Custom made for us by Quanta

Page 39

• 40 machines for Kafka
  ◦ 400 TB of compressed, replicated 500-byte log lines
  ◦ Ingest at ~15 Gbps
  ◦ ~50 TB of spinning rust per node
• 5 machines for CitusDB
  ◦ Analytics for 2 million customers
  ◦ ~12 TB of SSD per node
• > 100 machines for consumers written in Go
  ◦ The analytics roll-up processes
  ◦ Attack detection
  ◦ Botnet analysis
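The Kafka numbers line up with the earlier "4 MHz" log rate. A back-of-envelope check in Go, assuming about 4 million lines per second at roughly 500 bytes each, measured before compression (our assumption), plus the raw disk total across the Kafka nodes:

```go
package main

import "fmt"

// ingestGbps estimates network ingest from a log-line rate and an average
// line size in bytes.
func ingestGbps(linesPerSec, bytesPerLine float64) float64 {
	return linesPerSec * bytesPerLine * 8 / 1e9
}

// rawCapacityTB is the total disk across a cluster.
func rawCapacityTB(nodes int, tbPerNode float64) float64 {
	return float64(nodes) * tbPerNode
}

func main() {
	fmt.Printf("~%.0f Gbps\n", ingestGbps(4e6, 500))                         // ~16 Gbps
	fmt.Printf("%.0f TB of disk across the Kafka nodes\n", rawCapacityTB(40, 50)) // 2000 TB
}
```

Roughly 16 Gbps is close to the ~15 Gbps ingest on the slide.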

Page 40

Streaming Algorithms

• Space Saving Algorithm
  ◦ Efficient Computation of Frequent and Top-k Elements in Data Streams: https://icmi.cs.ucsb.edu/research/tech_reports/reports/2005-23.pdf
  ◦ Hasn’t worked well for ‘long tail data’
• HyperLogLog
  ◦ Counting distinct elements
  ◦ https://github.com/aggregateknowledge/postgresql-hll
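The Space-Saving algorithm from the paper above can be sketched in Go. This is a minimal version, not the implementation used in the talk: it keeps at most k counters and finds the minimum by linear scan (O(k) per miss), rather than using the paper's Stream-Summary structure. The `err` field records how much of an item's count may be overestimation. One plausible reading of the slide's long-tail caveat: when many items have similar small counts, newly admitted items inherit counts that are mostly error.

```go
package main

import (
	"fmt"
	"sort"
)

// counter holds an estimated count and its maximum overestimation.
type counter struct {
	count, err int
}

// spaceSaving monitors at most k items.
type spaceSaving struct {
	k        int
	counters map[string]*counter
}

func newSpaceSaving(k int) *spaceSaving {
	if k < 1 {
		k = 1
	}
	return &spaceSaving{k: k, counters: make(map[string]*counter, k)}
}

// Observe processes one stream element.
func (s *spaceSaving) Observe(item string) {
	if c, ok := s.counters[item]; ok {
		c.count++
		return
	}
	if len(s.counters) < s.k {
		s.counters[item] = &counter{count: 1}
		return
	}
	// Evict the item with the smallest count; the newcomer inherits that
	// count (+1) and records it as possible overestimation.
	var minItem string
	var minC *counter
	for it, c := range s.counters {
		if minC == nil || c.count < minC.count {
			minItem, minC = it, c
		}
	}
	delete(s.counters, minItem)
	s.counters[item] = &counter{count: minC.count + 1, err: minC.count}
}

// Top returns up to n monitored items, highest estimated count first.
func (s *spaceSaving) Top(n int) []string {
	items := make([]string, 0, len(s.counters))
	for it := range s.counters {
		items = append(items, it)
	}
	sort.Slice(items, func(i, j int) bool {
		ci, cj := s.counters[items[i]].count, s.counters[items[j]].count
		if ci != cj {
			return ci > cj
		}
		return items[i] < items[j]
	})
	if len(items) > n {
		items = items[:n]
	}
	return items
}

func main() {
	s := newSpaceSaving(3)
	for _, ip := range []string{"10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.4", "10.0.0.1"} {
		s.Observe(ip)
	}
	fmt.Println(s.Top(1)) // [10.0.0.1]
}
```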

Page 41

https://cloudflare.github.io/