Monitoring your Swift cluster health Christian Schwede Principal Software Engineer, Red Hat OpenStack Summit Vancouver, May 2015
Page 1: Monitoring Swift - OpenStack Summit May 2015, Vancouver

Monitoring your Swift cluster health

Christian Schwede
Principal Software Engineer, Red Hat
OpenStack Summit Vancouver, May 2015

All good things come in threes

● Swift Architecture

● Basic Monitoring

● Metrics

A short Swift overview

Proxy server

PUT http://swift.com/v1/account/container/objectname

[Diagram: four storage nodes, each with four disks and running Server, Replicator, Auditor and Updater processes]

Basic Monitoring

Basic monitoring

● Services available?

curl http://server:port/healthcheck → “200 OK”

● Drives OK?

swift-drive-audit

● Checking replication, auditors, updaters, async_pending, ...

swift-recon

● Check data availability

swift-dispersion-report

● Audit a specific account/container?

swift-account-audit
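
The first check in the list above can be automated with a few lines of Python. This is only a sketch: the endpoint URLs are placeholders, and it treats any answer other than HTTP 200 (including connection errors) as "not healthy". The `fetch` parameter is an assumption added here so the logic can be exercised without a running cluster.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 503.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, ...
        return None


def check_services(endpoints, fetch=fetch_status):
    """Map each endpoint to True if its /healthcheck answers with 200."""
    results = {}
    for endpoint in endpoints:
        url = endpoint.rstrip("/") + "/healthcheck"
        results[endpoint] = fetch(url) == 200
    return results
```

Calling `check_services(["http://server:port"])` with real host and port values returns a dictionary that a monitoring system could turn into alerts.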

Metrics

Collecting metrics

[28.381567892711667, 1430596860],

[26.190797487908338, 1430596920],

[28.006374835958336, 1430596980],

[28.425395488741668, 1430597040],

[27.621122305142339, 1430597100],

[30.334730943041667, 1430597160],

[31.013429164883334, 1430597220],

[28.327365745216325, 1430597280],

[27.783294518800002, 1430597340],

[27.764280637108341, 1430597400],

?
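
One way to make sense of raw pairs like these is to decode them. The sketch below assumes each pair is `[value, timestamp]` with the timestamp in Unix seconds (the order Graphite uses in its JSON output); what the values measure is not stated on the slide, so the code only reports the time range, the spacing between samples and the mean.

```python
from datetime import datetime, timezone

# Pairs copied from the slide, assumed to be [value, unix_timestamp].
DATAPOINTS = [
    [28.381567892711667, 1430596860],
    [26.190797487908338, 1430596920],
    [28.006374835958336, 1430596980],
    [28.425395488741668, 1430597040],
    [27.621122305142339, 1430597100],
    [30.334730943041667, 1430597160],
    [31.013429164883334, 1430597220],
    [28.327365745216325, 1430597280],
    [27.783294518800002, 1430597340],
    [27.764280637108341, 1430597400],
]


def summarize(datapoints):
    """Return time range, distinct sample spacings and mean of the values."""
    timestamps = [ts for _, ts in datapoints]
    values = [value for value, _ in datapoints]
    steps = sorted({b - a for a, b in zip(timestamps, timestamps[1:])})
    return {
        "start": datetime.fromtimestamp(timestamps[0], tz=timezone.utc),
        "end": datetime.fromtimestamp(timestamps[-1], tz=timezone.utc),
        "steps_seconds": steps,
        "mean": sum(values) / len(values),
    }


if __name__ == "__main__":
    for key, value in summarize(DATAPOINTS).items():
        print(key, value)
```

Under that assumption the samples are one minute apart and cover ten minutes on 2 May 2015 (UTC).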

Swift, statsd & graphite interaction

object-server, object-replicator → statsd → carbon-cache → whisperdb → graphite-web

collectd → carbon-cache
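
To illustrate the first hop in this chain, the sketch below formats and sends a metric using the plain-text statsd line format (`name:value|type`, optionally followed by `|@sample_rate`) over UDP. The metric names in the usage note are illustrative, and host and port are assumptions (8125 is the usual statsd default); Swift's own services emit their metrics internally, so this is not how Swift itself is wired up.

```python
import socket


def format_metric(name, value, metric_type, sample_rate=1.0):
    """Build one statsd line, e.g. b'some.counter:1|c'."""
    line = f"{name}:{value}|{metric_type}"
    if sample_rate < 1.0:
        # Tell statsd that only a fraction of events is being reported.
        line += f"|@{sample_rate}"
    return line.encode("ascii")


def send_metric(payload, host="127.0.0.1", port=8125):
    """Fire-and-forget: send the payload to statsd in one UDP datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

For example, `send_metric(format_metric("proxy-server.object.GET.200.timing", 12.5, "ms"))` reports a timer value in milliseconds, and the type `"c"` reports a counter increment.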

Installation & configuration

Packages & important configuration files

● statsd

● python-carbon

● graphite-web

● graphite-web-selinux

● collectd

/etc/swift/*-server.conf

/etc/collectd.conf

/etc/statsd/config.js

/etc/carbon/storage-schemas.conf

/etc/carbon/storage-aggregation.conf
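
On the Swift side, statsd reporting is switched on with `log_statsd_*` options in the `*-server.conf` files. The sketch below parses a sample configuration and extracts those settings, which can help verify that every node points at the intended statsd host. The option names follow the Swift admin guide; the values, including the `node1` prefix, are made up for illustration.

```python
import configparser

# Hypothetical excerpt of an /etc/swift/*-server.conf file.
SAMPLE_CONF = """
[DEFAULT]
log_statsd_host = 127.0.0.1
log_statsd_port = 8125
log_statsd_default_sample_rate = 1.0
log_statsd_metric_prefix = node1
"""


def statsd_settings(conf_text):
    """Return the statsd target configured in conf_text, or None if unset."""
    # Interpolation is disabled so literal '%' characters are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    defaults = parser.defaults()
    host = defaults.get("log_statsd_host")
    if not host:
        # Without a host, no statsd metrics are sent.
        return None
    return {
        "host": host,
        "port": int(defaults.get("log_statsd_port", 8125)),
        "prefix": defaults.get("log_statsd_metric_prefix", ""),
    }


if __name__ == "__main__":
    print(statsd_settings(SAMPLE_CONF))
```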

Retention period & Aggregation method

[Chart: sample value (0 to 10) plotted over time (0 to 10)]
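
When data ages past a retention period, several fine-grained samples are combined into one coarser sample, and the aggregation method decides how. The sketch below is a simplified illustration of that idea, not Whisper's actual implementation: it groups consecutive samples into buckets and applies the chosen method to each bucket.

```python
def downsample(values, points_per_bucket, method="average"):
    """Combine every points_per_bucket samples into one aggregated value."""
    aggregators = {
        "average": lambda bucket: sum(bucket) / len(bucket),
        "sum": sum,
        "min": min,
        "max": max,
        "last": lambda bucket: bucket[-1],
    }
    aggregate = aggregators[method]
    return [
        aggregate(values[i:i + points_per_bucket])
        for i in range(0, len(values), points_per_bucket)
    ]


if __name__ == "__main__":
    samples = [1, 2, 3, 4, 5, 6]
    print(downsample(samples, 3, "average"))
    print(downsample(samples, 3, "sum"))
```

The choice matters: averaging a counter hides how many events happened in the longer interval, while summing a timing value produces a number with no meaning, so counters are typically aggregated with `sum` and timings with `average`.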

Working with graphite-web

Selected Metrics

Thank you!

#openstack-swift: cschwede

@cschwede_de

References

● docs.openstack.org/developer/swift/admin_guide.html#cluster-telemetry-and-monitoring

● docs.openstack.org/developer/swift/admin_guide.html#reporting-metrics-to-statsd

● github.com/etsy/statsd/blob/master/docs/graphite.md

● graphite.readthedocs.org/en/latest/

● graphite.readthedocs.org/en/latest/functions.html

● collectd.org/documentation/manpages/collectd.conf.5.shtml#plugin_write_graphite

Used graphite functions

1a groupByNode(stats.counters.*.proxy-server.object.*.2*.xfer.count, 5, "avg")

1b groupByNode(stats.timers.*.proxy-server.object.*.2*.timing.median, 5, "avg")

2a substr(stats.timers.*.proxy-server.object.*.2*.timing.count, 5,6)

2b substr(stats.timers.*.proxy-server.object.*.4*.timing.count, 5,7)

3 substr(avg(*.cpu.*.cpu.wait), 4)

4 substr(lowestCurrent(*.df.*.df_complex.free,5), 0, 1)

5 groupByNode(stats.counters.*.object-replicator.partition.update.count.*.count, 2, "sum")

6 substr(*.counters.*.proxy-server.*.handoff_count.count, 4, 5)

7 groupByNode(*.filecount.*_async_pending.files, 0, "sum")
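
Any of these expressions can also be evaluated outside the graphite-web UI through its render API, which accepts the expression as a `target` parameter. The sketch below only builds the request URL; the base URL is a placeholder, and the `from` and `format` values are example choices.

```python
from urllib.parse import urlencode


def render_url(base_url, target, time_from="-1h", output_format="json"):
    """Build a graphite-web render URL for one target expression."""
    query = urlencode({
        "target": target,        # the graphite function expression
        "from": time_from,       # relative start time, e.g. the last hour
        "format": output_format, # json returns [value, timestamp] pairs
    })
    return f"{base_url.rstrip('/')}/render?{query}"


if __name__ == "__main__":
    expression = (
        'groupByNode(stats.timers.*.proxy-server.object.*.2*.timing.median, '
        '5, "avg")'
    )
    print(render_url("http://graphite.example.com", expression))
```

Fetching the resulting URL returns the series as JSON, which is convenient for feeding alerting scripts or dashboards.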