MySQL Performance Monitoring using StatsD and Graphite
Art van Scheppingen, Head of Database Engineering

May 11, 2015

Transcript
Page 1: MySQL Performance Monitoring

MySQL Performance Monitoring using StatsD and Graphite
Art van Scheppingen, Head of Database Engineering

Page 2: MySQL Performance Monitoring

Overview

1. Who are we?
2. What monitoring tools do we use?
3. What are StatsD, Collectd and Graphite?
4. How MySQL logs to StatsD
5. Graphing examples
6. Challenges
7. Questions?

Page 3: MySQL Performance Monitoring

Who are we?
Who is Spil Games?

Page 4: MySQL Performance Monitoring

Facts

• Company founded in 2001
• 350+ employees worldwide
• 180M+ unique visitors per month
• Over 50M registered users
• 45 portals in 19 languages

• Casual games
• Social games
• Real-time multiplayer games
• Mobile games

• 35+ MySQL clusters
• 60k queries per second (3.5 billion queries per day)

Page 5: MySQL Performance Monitoring

Geographic Reach

180 million monthly active users (*)

(*) Source: Google Analytics, August 2012

Page 6: MySQL Performance Monitoring

Brands

Girls, Teens and Family

spielen.com, juegos.com, gamesgames.com, games.co.uk

Page 7: MySQL Performance Monitoring

Monitoring

We use(d) many, many, many monitoring tools so far!

Page 8: MySQL Performance Monitoring

Existing monitoring systems we use(d)

• Opsview/Nagios (mainly availability)
• Cacti (using Baron Schwartz/Percona templates)
• MONyog
• Good ol’ RRD

Page 9: MySQL Performance Monitoring

Opsview/Nagios

• Strong points:
  • Easy to create (Nagios) plugins
  • Slaves for scaling out

• Weak points:
  • Stats gathering through polling
  • Low granularity (1 to 5 minutes)
  • Difficult URIs for graphs

Page 10: MySQL Performance Monitoring

Cacti

• Strong points:
  • Awesome Percona templates
  • Great overviews and graphs

• Weak points:
  • Hard to add new metrics (to 90+ servers)
  • Not scalable
  • Low granularity (1 to 5 minutes)
  • Hard to correlate

Page 11: MySQL Performance Monitoring

MONyog

• Strong points:
  • Easy to set up
  • Compare any server with another
  • Compare configurations

• Weak points:
  • “Closed source”
  • Not scalable
  • Jack of all trades

Page 12: MySQL Performance Monitoring

Poll limitations

• Limited to a set interval
• Data gets averaged out
• (Host) checks are run serially
• A slowdown in a run means less or no data
• Scaling: add more masters/slaves
• Setting up an SSH connection is slow
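The "data gets averaged out" point can be illustrated with a little arithmetic. The sketch below uses made-up numbers (a steady 1000 queries per second with one 10-second burst of 6000) and compares what a 5-minute poll would report with what 2-second samples would show; none of the figures come from the talk.

```python
# Hypothetical queries-per-second samples taken every 2 seconds over 5 minutes:
# a steady 1000 qps with one 10-second burst of 6000 qps.
samples = [1000] * 150
samples[70:75] = [6000] * 5

# A poller with a 5-minute interval only sees the average over the window.
five_minute_average = sum(samples) / len(samples)

# Fine-grained samples still contain the burst.
peak = max(samples)

print("5-minute average: %.2f qps, peak: %d qps" % (five_minute_average, peak))
```

With these assumed numbers the average rises only slightly above the baseline while the peak is six times higher, which is the kind of event a coarse polling interval hides.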

Page 13: MySQL Performance Monitoring

Difficult to add a new metric

host065
bash-3.2# netstat -s | grep "listen queue"
    26 times the listen queue of a socket overflowed

host066
bash-3.2# netstat -s | grep "listen queue"
    33 times the listen queue of a socket overflowed

Page 14: MySQL Performance Monitoring

Other things you can’t do!

Page 15: MySQL Performance Monitoring

StatsD + Collectd + Graphite
What are they?

Page 16: MySQL Performance Monitoring

What is Graphite?

• Highly scalable real-time graphing system
• Collects numeric time-series
• Backend daemon Carbon
  • Carbon-cache: receives data
  • Carbon-aggregator: aggregates data
  • Carbon-relay: replication and sharding
• RRD or Whisper database

Page 17: MySQL Performance Monitoring

Graphite’s capabilities

• Each metric is in its own bucket
• Periods make folders
  • prod.syseng.mmm.<hostname>.admin_offline

• Metric types
  • Counters
  • Gauges

• Retention can be set using a regex
  • [mysql]
  • pattern = ^prod\.syseng\.mysql\..*$
  • retentions = 2s:1d,1m:3d,5m:7d,1h:5y
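To make the retention string concrete: each comma-separated archive reads as "seconds per point : how long to keep it". The sketch below is only an illustration of that arithmetic, not Graphite's own parser; the helper names are invented here, and a year is assumed to be 365 days.

```python
# Seconds per unit; a year is assumed to be 365 days for this illustration.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "y": 31536000}

def to_seconds(spec):
    """Convert a value such as '2s' or '5y' to seconds."""
    return int(spec[:-1]) * UNIT_SECONDS[spec[-1]]

def archives(retentions):
    """Return (seconds_per_point, number_of_points) for each archive."""
    result = []
    for archive in retentions.split(","):
        precision, duration = archive.split(":")
        step = to_seconds(precision)
        result.append((step, to_seconds(duration) // step))
    return result

for step, points in archives("2s:1d,1m:3d,5m:7d,1h:5y"):
    print("%5d s per point, %6d points" % (step, points))
```

Read this way, the configuration keeps 2-second data for one day and progressively coarser data for longer periods, so every metric has a fixed storage cost that can be estimated in advance.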

Page 18: MySQL Performance Monitoring

What is Collectd?

• Unix daemon that gathers system statistics
• Over 90 (input/output) plugins
• Plugin to send metrics to Graphite/Carbon
• Very useful for system metrics

Page 19: MySQL Performance Monitoring

What is StatsD?

• Front-end proxy for Graphite/Carbon (by Etsy)
• Node.js daemon (implementations in other languages exist too)
• Receives UDP (on localhost)
• Buffers metrics locally
• Periodically flushes data to Graphite/Carbon (TCP)
• Client libraries available in just about any language
• Send any metric you like!

Page 20: MySQL Performance Monitoring

StatsD functions

• update_stats
• increment/decrement
• set
• gauge
• timers
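Under the hood, each of these functions boils down to one short text datagram sent over UDP. The sketch below shows that line format in Python, assuming the standard StatsD type codes (`c` for counters, `g` for gauges, `s` for sets, `ms` for timers); `format_metric` and `send` are names invented for this illustration, and the metric paths are examples.

```python
import socket

def format_metric(name, value, metric_type, sample_rate=None):
    """Build one StatsD datagram: <name>:<value>|<type>[|@<rate>]."""
    line = "%s:%s|%s" % (name, value, metric_type)
    if sample_rate is not None:
        line += "|@%s" % sample_rate
    return line

def send(line, host="localhost", port=8125):
    """Fire-and-forget UDP send; no reply is expected or read."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(line.encode("ascii"), (host, port))
    finally:
        sock.close()

# One example line per metric type (printed here rather than sent).
print(format_metric("prod.app1.pages_rendered", 1, "c"))       # counter
print(format_metric("prod.app1.page_concurrency", 10, "g"))    # gauge
print(format_metric("prod.app1.unique_users", 4711, "s"))      # set
print(format_metric("prod.app1.rendering_time", 320, "ms"))    # timer
```

Passing any of these lines to `send()` would deliver it to a StatsD daemon listening on localhost:8125. Because UDP is connectionless, the application does not block or fail when the daemon is down.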

Page 21: MySQL Performance Monitoring

StatsD PHP code examples

PHP:
$statsd = new StatsD();
$statsd->increment("prod.app1.pages_rendered", 1);
$statsd->gauge("prod.app1.page_concurrency", 10);
$statsd->set("prod.app1.unique_users", $userid);
…
$start = microtime(true);
serve_out_content_to_clients();
$statsd->timing("prod.app1.rendering_time", (microtime(true) - $start) * 1000);

Library:
https://github.com/etsy/statsd/blob/master/examples/php-example.php

Page 22: MySQL Performance Monitoring

Our Graphite cluster(s)

[Diagram. Labels: Client requesting graphs; Loadbalancer (port 443); Graphite Rendering Cluster; Carbon relay; storage clusters DEV, SYSENG, SERVICES1, SERVICES2; Server-1, Server-2, Server-n; Loadbalancer (port 2003); node counts: 8 nodes, 3 nodes, 2 nodes.]

Page 23: MySQL Performance Monitoring

Graphite Storage Clusters

Page 24: MySQL Performance Monitoring

Collectd

[Diagram. Labels: Collectd; Gather data plugins (CPU, DISK, LOAD, …); Carbon TCP; 30 second interval.]

Page 25: MySQL Performance Monitoring

StatsD

[Diagram. Labels: Application Level (# OF LOGINS, CACHE HIT/MISS); MySQL_Statsd (STATUS, INNODB STATUS); StatsD on localhost:8125 UDP; Carbon TCP; 2 second interval.]

Page 26: MySQL Performance Monitoring

Global scale?

Page 27: MySQL Performance Monitoring

MySQL + StatsD

How do we use them?

Page 28: MySQL Performance Monitoring

Why use StatsD over Collectd?

• MySQL plugin for Collectd
  • Sends SHOW STATUS
  • No INNODB STATUS
  • Plugin not flexible

• DBI plugin for Collectd
  • Metrics based on columns

• Different granularity needed
• Separate daemon (with persistent connection)
• StatsD is easy as ABC

Page 29: MySQL Performance Monitoring

MySQL StatsD daemon

• Written in Python
• Gathers data every 0.5 seconds
• Sends to StatsD (localhost) after every run
• Easy to set up: no configuration
• Persistent connection
• Baron Schwartz’ InnoDB status parser (Cacti poller)
• Other interesting metrics and counters
  • Information Schema
  • MySQL 5.5/5.6 Performance Schema
  • MariaDB specific
  • Galera specific
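The daemon's source is not shown in the talk, so the following is only a sketch of one way such a collector could turn two consecutive SHOW STATUS snapshots into StatsD counter lines. The function names, the sample values and the host name `db1` are assumptions made for illustration; the snapshots are plain dictionaries so the example runs without a database connection.

```python
def counter_deltas(previous, current):
    """Return how much each status counter grew between two snapshots.

    Counters that are missing from the previous snapshot, or that went
    backwards (for example after a server restart), are skipped.
    """
    deltas = {}
    for name, value in current.items():
        if name in previous and value >= previous[name]:
            deltas[name] = value - previous[name]
    return deltas

def to_statsd_lines(prefix, deltas):
    """Format the deltas as StatsD counter lines under the given prefix."""
    return ["%s.%s:%d|c" % (prefix, name.lower(), delta)
            for name, delta in sorted(deltas.items())]

# Two hypothetical snapshots taken 0.5 seconds apart.
before = {"Questions": 1000, "Com_select": 700}
after = {"Questions": 1030, "Com_select": 720}

for line in to_statsd_lines("prod.syseng.mysql.db1.status",
                            counter_deltas(before, after)):
    print(line)
```

Sending deltas rather than absolute values lets StatsD sum them per flush interval, so the graphed series reflects the rate of change instead of an ever-growing total.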

Page 30: MySQL Performance Monitoring

MySQL StatsD overview

[Diagram. Labels: MySQL; Collector; SHOW STATUS; SHOW INNODB STATUS; SHOW VARIABLES; Persistent connection; StatsD; Flushed every 0.5 seconds.]

Page 31: MySQL Performance Monitoring

MySQL Multi Master patch

• Perl (Net::Statsd)
• Sends any status change to StatsD (localhost)
• Non-blocking (thanks to UDP)
• Draw as infinite in Graphite

Page 32: MySQL Performance Monitoring

MMM Perl code example

use Net::Statsd;
$Net::Statsd::HOST = 'localhost'; # Default
$Net::Statsd::PORT = 8125; # Default

…

# ONLINE -> HARD_OFFLINE
unless ($ping && $mysql) {
    Net::Statsd::update_stats('prod.syseng.mmm.'.$host.'.hard_offline', 1);
    FATAL sprintf("State of host '%s' changed from %s to HARD_OFFLINE (ping: %s, mysql: %s)", $host, $state, ($ping? 'OK' : 'not OK'), ($mysql? 'OK' : 'not OK'));
    $agent->state('HARD_OFFLINE');
}

…

Page 33: MySQL Performance Monitoring

Other metrics

• Deployments
• User initiated actions
  • Logins
  • High scores
  • Comments / ratings
  • Images uploaded
  • Payments

• Application metrics
  • Error counts
  • Cache statistics (cache hit/miss)
  • Request timers
  • Image sizes

Page 34: MySQL Performance Monitoring

Start graphing!
Now it starts to get interesting!

Page 35: MySQL Performance Monitoring

What is important for you?

• Identify your KPIs
• Don’t graph everything
  • More graphs == less overview
  • Combine metrics
  • Stack clusters

Page 36: MySQL Performance Monitoring

Correlate!

• Include other metrics into your graphs
  • Deployments
  • Failover(s)

• Combine application metrics with your database
• Other influences
  • Solar flares
  • Start of the new Maya calendar

Page 37: MySQL Performance Monitoring

Graphite Graphing Engine

• URI based rendering API
• Support for wildcards
  • stats.prod.syseng.mysql.*.status.com_select
  • sumSeries(stats.prod.syseng.mysql.*.status.com_select)
  • aliasByNode(stats.prod.syseng.mysql.*.status.com_select, 4)

• Many functions
  • Nth percentile
  • Holt-Winters Forecast
  • Timeshift
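Because the rendering API is URI based, a graph is just a URL with one or more `target` parameters. The sketch below builds such a URL with Python's standard library; `render_url` is a helper invented here, `graphitehost` is a placeholder host, and the `from` and `format` values are illustrative choices.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def render_url(base, targets, **params):
    """Build a Graphite render URL with one 'target' parameter per series."""
    query = [("target", target) for target in targets] + sorted(params.items())
    return base + "/render/?" + urlencode(query)

# Node 4 of the metric path is the host name, so each line is labelled per host.
target = "aliasByNode(stats.prod.syseng.mysql.*.status.com_select, 4)"

# 'from' is a Python keyword, so the extra parameters are passed as a dict.
url = render_url("https://graphitehost", [target],
                 **{"from": "-1d", "format": "json"})
print(url)

# Decoding the query string returns the original target expression.
print(parse_qs(urlsplit(url).query)["target"][0])
```

Letting `urlencode` handle the escaping avoids hand-encoding parentheses, commas and quotes, which is where long render URLs tend to go wrong.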

Page 38: MySQL Performance Monitoring

Graphite Aggregator

syseng => {
    nodes => ["databasehost1", "databasehost2"],
    copying_relay_instances => 8,
    hashing_relay_instances => 8,
    cache_instances => 8,
    aggregation => {
        0 => {
            name => "mysql",
            pattern => '.*\.mysql\..*',
            send_raw => 1,
        },
    }
}

stats.<env>.syseng.mysql.cluster1.status.questions.all (2) = sum stats.<env>.syseng.mysql.*.status.questions
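The aggregation rule reads as "every 2 seconds, sum all series matching the input pattern and write the result to the output metric". The Python sketch below mimics that sum for a single interval. It is an illustration of the idea rather than Carbon's implementation; the function names, host names and values are assumptions, and only whole-node `*` wildcards are handled.

```python
import re

def pattern_to_regex(pattern):
    """Translate a dotted metric pattern; '*' matches exactly one path node."""
    parts = [r"[^.]+" if part == "*" else re.escape(part)
             for part in pattern.split(".")]
    return re.compile(r"^" + r"\.".join(parts) + r"$")

def aggregate_sum(pattern, datapoints):
    """Sum the values of all metrics whose name matches the pattern."""
    regex = pattern_to_regex(pattern)
    return sum(value for name, value in datapoints.items() if regex.match(name))

# Hypothetical values received during one 2-second interval.
interval = {
    "stats.prod.syseng.mysql.db1.status.questions": 120,
    "stats.prod.syseng.mysql.db2.status.questions": 80,
    "stats.prod.syseng.mysql.db1.status.com_select": 90,
}

total = aggregate_sum("stats.prod.syseng.mysql.*.status.questions", interval)
print(total)  # 200
```

Pre-aggregating at write time means a cluster-wide graph reads one stored series instead of summing every host's series on each render.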

Page 39: MySQL Performance Monitoring

Graphite web interface

Page 40: MySQL Performance Monitoring

Graphite Example URL

https://graphitehost/render/?width=722&height=357&_salt=1366550446.553&rightDashed=1&target=alias%28sumSeries%28stats.prod.services.profilar.request.total.count.*%29%2C%22Number%20of%20profile%20requests%22%29&target=alias%28secondYAxis%28sumSeries%28stats_counts.prod.syseng.mysql.<node1>.status.questions%2C%20stats_counts.prod.syseng.mysql.<node2>.status.questions%29%29%2C%22Number%20of%20queries%20profiles%20cluster%22%29&from=00%3A00_20130415&until=23%3A59_20130421

Page 41: MySQL Performance Monitoring

Graphite Example URL

Page 42: MySQL Performance Monitoring

Other examples: MMM

Page 43: MySQL Performance Monitoring

Other examples: timeshift

Page 44: MySQL Performance Monitoring

Other examples: multiple weeks

Page 45: MySQL Performance Monitoring

Challenges
The road ahead

Page 46: MySQL Performance Monitoring

What challenges do we have?

• MySQL_statsd rewrite necessary (not open source yet)
• No alerting through Graphite (yet)
• Machine learning
• Eternal hunger for more metrics
• Abuse of the system

Page 47: MySQL Performance Monitoring

What lessons have we learned?

• Persistent connections + repeatable read
  • History list length skyrocketed

• Too many metrics slow down graphing
• Too many metrics can kill a host

• EstatsD for Erlang

Page 48: MySQL Performance Monitoring

Questions…

Page 49: MySQL Performance Monitoring

Practical links

• Graphite: http://graphite.readthedocs.org/en/latest/
• Collectd: https://collectd.org/
• StatsD on Github by Etsy: https://github.com/etsy/statsd/wiki
• Etsy on StatsD: http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/

Page 50: MySQL Performance Monitoring

Thank you!

• Presentation can be found at: http://spil.com/perconasc2013
• If you wish to contact me: [email protected]
• Don’t forget to rate my talk!