Monitoring PostgreSQL using Time-Series systems like Graphite and/or Grafana with OpenCollector

Jan Wieck - OpenSCG
Overview
• What is Monitoring?
• Graphite & Carbon
• Grafana
• Why use Carbon?
• Why use Graphite AND Grafana?
• PostgreSQL Metric Data
• OpenCollector
• osinfofdw
What is Monitoring?
• Capture Time-Series data
  • Metric-Name, Value, Timestamp
• Visualize Time-Series data
• Define alerts based on Time-Series data
• Statistical analysis of Time-Series data
• Getting an alert when your primary DB server is down is covered by the above!
Graphite & Carbon
• Carbon is a server for collecting Time-Series data
  • Simple line based protocol on port 2003
  • Python-pickle protocol on port 2004
• Graphite is a WEB based GUI on top of Carbon
  • Some Dashboard functionality
Example Graphite screen
Grafana
• Grafana is more Dashboard focused
• Grafana can use many Time-Series data sources
  • Graphite
  • Elasticsearch
  • CloudWatch
  • InfluxDB
  • OpenTSDB
  • KairosDB
  • Prometheus
Example Grafana dashboard
Why use Carbon?
Carbon provides an extremely simple protocol to send Time-Series data:
#!/bin/sh
CHOST="graphite.host.name"
CPORT="2003"
METRIC="test.PI"
VALUE="3.1415"

echo "$METRIC $VALUE `date +%s`" | nc $CHOST $CPORT
Not a very useful metric, but consider capturing the runtime of a shell script based cron job.

Carbon also provides a Python-pickle based protocol on port 2004 that can be used to send hundreds of metric points condensed in one send(2).
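That pickle protocol can be sketched in a few lines of Python. The length-prefixed framing below follows Graphite's documented pickle receiver format; the host name is the same placeholder as in the shell example.

```python
# Minimal sketch of feeding Carbon's pickle protocol (port 2004).
# Host name is a placeholder; metric names are examples.
import pickle
import socket
import struct
import time

def build_pickle_message(metrics):
    """Pack a list of (path, (timestamp, value)) tuples into the
    length-prefixed pickle payload Carbon expects on port 2004."""
    payload = pickle.dumps(metrics, protocol=2)
    header = struct.pack("!L", len(payload))  # 4-byte big-endian length
    return header + payload

def send_metrics(metrics, host="graphite.host.name", port=2004):
    with socket.create_connection((host, port)) as sock:
        sock.sendall(build_pickle_message(metrics))

if __name__ == "__main__":
    now = int(time.time())
    # Hundreds of points could go into this single list / single send.
    send_metrics([("test.PI", (now, 3.1415)),
                  ("test.E", (now, 2.7182))])
```

One connection, one send, many data points — this is what makes the pickle port attractive for bulk collectors.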
Why use Graphite AND Grafana?
• Grafana is more Dashboard focused
  • Templating makes it easy to define one Dashboard and use it for many hosts/databases
  • Getting to a Dashboard is easier
  • Can define Alerts
  • Looks cool
• Graphite is better at ad-hoc graphing
  • The metric tree is easier to navigate than clicking through Grafana's pull down system
However …
This isn’t a talk advertising Graphite or Grafana.
This is a talk about capturing monitoring data from PostgreSQL and delivering it into a Time-Series data system. Carbon/Graphite and Grafana are example destinations.
PostgreSQL Metric Data
PostgreSQL produces quite a number of data points.
• On the table level
  • about 30 metric points
• On the index level
  • about 6 metric points
• On the database level
  • about 20 metric points
Those per table/index numbers are not of concern when you look at your typical benchmark database.

But what about a database with 1,800 tables and 13,000 indexes?

Now we are talking about 132,000 metric points every time interval! Captured every minute that is 7.9M per hour, 190M per day, 17.1B per quarter. Don't do that with snapshots captured inside the DB.
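The arithmetic is easy to verify (a back-of-the-envelope check, assuming a one-minute capture interval and a 90-day quarter):

```python
# Metric volume for a database with 1,800 tables and 13,000 indexes,
# at ~30 metric points per table and ~6 per index.
tables, indexes = 1_800, 13_000
per_table, per_index = 30, 6

points_per_interval = tables * per_table + indexes * per_index
per_hour = points_per_interval * 60      # one-minute capture interval
per_day = per_hour * 24
per_quarter = per_day * 90               # ~one quarter

print(points_per_interval)   # 132000
print(per_hour)              # 7920000  (~7.9M)
print(per_day)               # 190080000  (~190M)
print(per_quarter)           # 17107200000  (~17.1B)
```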
That isn't as exotic as it looks at first glance.

PostgreSQL system views like pg_stat_all_tables will report every single metric point even if a table or index hasn't been used for the past 12 months.

How many dead tables (schemas) does your database have?

A generic monitoring system can't tell them apart.
But that isn't all. Many metrics are presented as a continuously increasing counter, but the useful value is actually their increase per second.

Examples:
• Tuples inserted, updated, deleted, fetched
• Index/Sequential scans

This is the same as for OS statistics like:
• Network operations
• Disk operations
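The conversion from cumulative counters to per-second rates can be sketched like this (an illustration of the general technique, not any particular tool's code):

```python
# Convert cumulative counter samples into per-second rates -- the
# same derivative a collector (or Graphite's perSecond()) computes.
def counter_to_rates(samples):
    """samples: list of (timestamp, counter_value), ascending by time.
    Returns one (timestamp, rate_per_second) per interval."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        delta = v1 - v0
        if delta < 0:
            # Counter went backwards: treat it as a reset
            # (e.g. pg_stat_reset() or a server restart).
            delta = v1
        rates.append((t1, delta / (t1 - t0)))
    return rates

print(counter_to_rates([(0, 1000), (60, 4000), (120, 4600)]))
# [(60, 50.0), (120, 10.0)]
```

Doing this once, in the collector, spares every dashboard from having to apply the derivative itself.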
While that is efficient inside of the PostgreSQL server for collecting the data, it is rather inconvenient when browsing it in a system like Graphite or Grafana.

Sure, they can apply a function like perSecond() and it is only 20 mouse clicks away …
OpenCollector
• OpenCollector is a PostgreSQL monitoring daemon sponsored by OpenSCG
• It is designed to address the aforementioned problems
• JSON configuration files define the entire operation
  • Target Carbon server
  • Source Database(s)
  • Queries to run and what metrics they return
  • Sparse metric reporting
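Sparse metric reporting can be illustrated with a short sketch (an illustration of the idea, not OpenCollector's actual code): a metric whose counter has not moved since the previous interval is simply not sent, so dead tables and indexes stop producing data points.

```python
# Sketch of sparse metric reporting: only emit metrics whose value
# changed since the previous interval.  Illustrative only -- this is
# not OpenCollector's implementation.
def sparse_report(previous, current):
    """previous/current: dicts of metric name -> counter value.
    Returns only the metrics that changed since last interval."""
    return {name: value
            for name, value in current.items()
            if previous.get(name) != value}

prev = {"t1.seq_scan": 10, "t2.seq_scan": 0}
curr = {"t1.seq_scan": 14, "t2.seq_scan": 0}
print(sparse_report(prev, curr))   # {'t1.seq_scan': 14}
```

With 1,800 tables of which most are idle, this is the difference between 132,000 points per minute and a few thousand.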
An example from the sample configs:

"name": "global_stats",
"prefix": "database:{datname}.global_stats",
"query": [
    "SELECT ",
    "    datname, numbackends::float8, ",
    "    xact_commit::float8, xact_rollback::float8, ",
    "    blks_read::float8, blks_hit::float8, ",
    "    pg_catalog.pg_database_size(datid)::float8, ",
    "    pg_xlog_location_diff(pg_current_xlog_insert_location(), '0/0') ",
    "FROM pg_catalog.pg_stat_database ",
    "WHERE datname = current_database() "
],
"result": [
    { "name": "datname", "type": "internal" },
    { "name": "numbackends", "type": "value" },
    { "name": "xact_commit", "type": "counter" },
    ...
]
• Since the queries are in config files, you can customize them
  • Additional WHERE clauses
  • Change from pg_stat_all_ to pg_stat_user_
  • Add your own, application specific queries
• OpenCollector is modular and allows adding other things
• OpenCollector is open source
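An application-specific query might look like this, written in the same style as the sample config above (a hypothetical example: the schema, table, and metric names are invented for illustration):

```json
{
    "name": "app_orders",
    "prefix": "database:{datname}.app_orders",
    "query": [
        "SELECT 'orders' AS name, ",
        "    count(*)::float8 AS pending ",
        "FROM app.orders WHERE status = 'pending' "
    ],
    "result": [
        { "name": "name", "type": "internal" },
        { "name": "pending", "type": "value" }
    ]
}
```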
osinfofdw
• osinfofdw is another open source project sponsored by OpenSCG
• A Multicorn based FDW around Python-psutil
  • Access OS level statistics via SELECT
    • CPU usage
    • Memory usage
    • Disk IO
    • Network IO
    • Filesystem information
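Once the foreign tables are in place, OS statistics become plain SQL (a hypothetical usage sketch — the actual table and column names exposed by osinfofdw may differ):

```sql
-- Hypothetical foreign table backed by osinfofdw / psutil;
-- names are illustrative, not the project's actual schema.
SELECT cpu_percent, mem_used, mem_total
FROM osinfo.cpu_memory;
```

The appeal is that OS metrics can then be collected over the same database connection, and with the same query-driven config, as the PostgreSQL metrics.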
Links
• https://bitbucket.org/openscg/opencollector
• https://bitbucket.org/openscg/osinfofdw
Questions?