Monitoring PostgreSQL using Time-Series systems like Graphite and/or Grafana with OpenCollector

Jan Wieck - OpenSCG
Overview
• What is Monitoring?
• Graphite & Carbon
• Grafana
• Why use Carbon?
• Why use Graphite AND Grafana?
• PostgreSQL Metric Data
• OpenCollector
• osinfofdw
What is Monitoring?
• Capture Time-Series data
  • Metric-Name, Value, Timestamp
• Visualize Time-Series data
• Define alerts based on Time-Series data
• Statistical analysis of Time-Series data
• Getting an alert when your primary DB server is down is covered by the above!
Graphite & Carbon
• Carbon is a server for collecting Time-Series data
  • Simple line based protocol on port 2003
  • Python-pickle protocol on port 2004
• Graphite is a WEB based GUI on top of Carbon
  • Some Dashboard functionality
Example Graphite screen
Grafana
• Grafana is more Dashboard focused
• Grafana can use many Time-Series data sources
  • Graphite
  • Elasticsearch
  • CloudWatch
  • InfluxDB
  • OpenTSDB
  • KairosDB
  • Prometheus
Example Grafana dashboard
Why use Carbon?
Carbon provides an extremely simple protocol to send Time-Series data:
#!/bin/sh
CHOST="graphite.host.name"
CPORT="2003"
METRIC="test.PI"
VALUE="3.1415"

echo "$METRIC $VALUE `date +%s`" | nc $CHOST $CPORT
Not a very useful metric, but consider capturing the runtime of a shell script based cron job.

Carbon also provides a Python-pickle based protocol on port 2004 that can be used to send hundreds of metric points condensed in one send(2).
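That pickle protocol can be sketched in a few lines of Python. The length-prefixed framing below follows Graphite's documented pickle receiver format; the host name is the same placeholder as in the shell example.

```python
# Minimal sketch of feeding Carbon's pickle protocol (port 2004).
# Host name is a placeholder; metric names are examples.
import pickle
import socket
import struct
import time

def build_pickle_message(metrics):
    """Pack a list of (path, (timestamp, value)) tuples into the
    length-prefixed pickle payload Carbon expects on port 2004."""
    payload = pickle.dumps(metrics, protocol=2)
    header = struct.pack("!L", len(payload))  # 4-byte big-endian length
    return header + payload

def send_metrics(metrics, host="graphite.host.name", port=2004):
    with socket.create_connection((host, port)) as sock:
        sock.sendall(build_pickle_message(metrics))

if __name__ == "__main__":
    now = int(time.time())
    # Hundreds of points could go into this single list / single send.
    send_metrics([("test.PI", (now, 3.1415)),
                  ("test.E", (now, 2.7182))])
```

One connection, one send, many data points — this is what makes the pickle port attractive for bulk collectors.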
Why use Graphite AND Grafana?
• Grafana is more Dashboard focused
  • Templating makes it easy to define one Dashboard and use it for many hosts/databases
  • Getting to a Dashboard is easier
  • Can define Alerts
  • Looks cool
• Graphite is better at ad-hoc graphing
  • The metric tree is easier to navigate than clicking through Grafana's pull down system
However …
This isn’t a talk advertising Graphite or Grafana.
This is a talk about capturing monitoring data from PostgreSQL and delivering it into a Time-Series data system. Carbon/Graphite and Grafana are example destinations.
PostgreSQL Metric Data
PostgreSQL produces quite a number of data points.
• On the table level
  • about 30 metric points
• On the index level
  • about 6 metric points
• On the database level
  • about 20 metric points
Those per table/index numbers are not of concern when you look at your typical benchmark database.

But what about a database with 1,800 tables and 13,000 indexes?

Now we are talking about 132,000 metric points every time interval! Captured every minute that is 7.9M per hour, 190M per day, 17.1B per quarter. Don't do that with snapshots captured inside the DB.
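The arithmetic is easy to verify (a back-of-the-envelope check, assuming a one-minute capture interval and a 90-day quarter):

```python
# Metric volume for a database with 1,800 tables and 13,000 indexes,
# at ~30 metric points per table and ~6 per index.
tables, indexes = 1_800, 13_000
per_table, per_index = 30, 6

points_per_interval = tables * per_table + indexes * per_index
per_hour = points_per_interval * 60      # one-minute capture interval
per_day = per_hour * 24
per_quarter = per_day * 90               # ~one quarter

print(points_per_interval)   # 132000
print(per_hour)              # 7920000  (~7.9M)
print(per_day)               # 190080000  (~190M)
print(per_quarter)           # 17107200000  (~17.1B)
```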
That isn't as exotic as it looks at first glance.

PostgreSQL system views like pg_stat_all_tables will report every single metric point even if a table or index hasn't been used for the past 12 months.

How many dead tables (schemas) does your database have?

A generic monitoring system can't tell them apart.
But that isn't all. Many metrics are presented as a continuously increasing counter, but the useful value is actually their increase per second.

Examples:
• Tuples inserted, updated, deleted, fetched
• Index/Sequential scans

This is the same as for OS statistics like:
• Network operations
• Disk operations
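The conversion from cumulative counters to per-second rates can be sketched like this (an illustration of the general technique, not any particular tool's code):

```python
# Convert cumulative counter samples into per-second rates -- the
# same derivative a collector (or Graphite's perSecond()) computes.
def counter_to_rates(samples):
    """samples: list of (timestamp, counter_value), ascending by time.
    Returns one (timestamp, rate_per_second) per interval."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        delta = v1 - v0
        if delta < 0:
            # Counter went backwards: treat it as a reset
            # (e.g. pg_stat_reset() or a server restart).
            delta = v1
        rates.append((t1, delta / (t1 - t0)))
    return rates

print(counter_to_rates([(0, 1000), (60, 4000), (120, 4600)]))
# [(60, 50.0), (120, 10.0)]
```

Doing this once, in the collector, spares every dashboard from having to apply the derivative itself.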
While that is efficient inside of the PostgreSQL server for collecting the data, it is rather inconvenient when browsing it in a system like Graphite or Grafana.

Sure, they can apply a function like perSecond() and it is only 20 mouse clicks away …
OpenCollector
• OpenCollector is a PostgreSQL monitoring daemon sponsored by OpenSCG
• It is designed to address the aforementioned problems
• JSON configuration files define the entire operation
  • Target Carbon server
  • Source Database(s)
  • Queries to run and what metrics they return
  • Sparse metric reporting
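Sparse metric reporting can be illustrated with a short sketch (an illustration of the idea, not OpenCollector's actual code): a metric whose counter has not moved since the previous interval is simply not sent, so dead tables and indexes stop producing data points.

```python
# Sketch of sparse metric reporting: only emit metrics whose value
# changed since the previous interval.  Illustrative only -- this is
# not OpenCollector's implementation.
def sparse_report(previous, current):
    """previous/current: dicts of metric name -> counter value.
    Returns only the metrics that changed since last interval."""
    return {name: value
            for name, value in current.items()
            if previous.get(name) != value}

prev = {"t1.seq_scan": 10, "t2.seq_scan": 0}
curr = {"t1.seq_scan": 14, "t2.seq_scan": 0}
print(sparse_report(prev, curr))   # {'t1.seq_scan': 14}
```

With 1,800 tables of which most are idle, this is the difference between 132,000 points per minute and a few thousand.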
An example from the sample configs:

"name": "global_stats",
"prefix": "database:{datname}.global_stats",
"query": [
    "SELECT ",
    "    datname, numbackends::float8, ",
    "    xact_commit::float8, xact_rollback::float8, ",
    "    blks_read::float8, blks_hit::float8, ",
    "    pg_catalog.pg_database_size(datid)::float8, ",
    "    pg_xlog_location_diff(pg_current_xlog_insert_location(), '0/0') ",
    "FROM pg_catalog.pg_stat_database ",
    "WHERE datname = current_database() "
],
"result": [
    { "name": "datname", "type": "internal" },
    { "name": "numbackends", "type": "value" },
    { "name": "xact_commit", "type": "counter" },
    ...
]
• Since the queries are in config files, you can customize them
  • Additional WHERE clauses
  • Change from pg_stat_all_ to pg_stat_user_
  • Add your own, application specific queries
• OpenCollector is modular and allows adding other things
• OpenCollector is open source
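An application-specific query might look like this, written in the same style as the sample config above (a hypothetical example: the schema, table, and metric names are invented for illustration):

```json
{
    "name": "app_orders",
    "prefix": "database:{datname}.app_orders",
    "query": [
        "SELECT 'orders' AS name, ",
        "    count(*)::float8 AS pending ",
        "FROM app.orders WHERE status = 'pending' "
    ],
    "result": [
        { "name": "name", "type": "internal" },
        { "name": "pending", "type": "value" }
    ]
}
```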
osinfofdw
• osinfofdw is another open source project sponsored by OpenSCG
• A Multicorn based FDW around Python-psutil
  • Access OS level statistics via SELECT
    • CPU usage
    • Memory usage
    • Disk IO
    • Network IO
    • Filesystem information
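Once the foreign tables are in place, OS statistics become plain SQL (a hypothetical usage sketch — the actual table and column names exposed by osinfofdw may differ):

```sql
-- Hypothetical foreign table backed by osinfofdw / psutil;
-- names are illustrative, not the project's actual schema.
SELECT cpu_percent, mem_used, mem_total
FROM osinfo.cpu_memory;
```

The appeal is that OS metrics can then be collected over the same database connection, and with the same query-driven config, as the PostgreSQL metrics.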
Links
• https://bitbucket.org/openscg/opencollector
• https://bitbucket.org/openscg/osinfofdw
Questions?