Prometheus Is Good for Your Small Startup ShuttleCloud Corp. 2016

Jan 23, 2018

ShuttleCloud
Page 1: Prometheus Is Good for Your Small Startup - ShuttleCloud Corp. - 2016

Nacho Carretero @carretops, @ShuttleCloud, @ShuttleCloudEng

ShuttleCloud

• Techstars 2011

• Chicago & Madrid

• Email & contacts import API

• ISPs and email & address book providers

ShuttleFacts

• Gmail: 3 million users with our API

• High availability (HA) SLA: 99.5%

• 6 TB/h

• 18k+ migrations per day

• ~30 million emails per day

• ~3 million contacts per day

• 247 providers around the world

What to expect from this presentation

• Disclaimers

• In the beginning…

• A new dawn

• Middle Ages

• Modern History

• Back to the Future

Disclaimers

Disclaimer #1

Disclaimer #2

Disclaimer #3

In the beginning…

(What we had)

Automatic Scripts

Pingdom

Dashboard and Stats

Status Importer

[Diagram: Status Importer, Status DB, Dashboard Publisher, Email Reports, Views]

A new dawn

(New metric and alert systems)

Why Prometheus?

• Metrics have labels, which gives flexibility (labels can be added or changed)

• No need for external services (e.g. Sensu with RabbitMQ)

• Service discovery from our DNS

• Easy to install and test

Some first steps

• Bronze Age: targets were fed statically using Ansible

• Iron Age: DNS service discovery

• Operations metrics from node_exporter

• Other metrics via the textfile collector in node_exporter
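The textfile collector reads *.prom files from a directory (configured on node_exporter with its --collector.textfile.directory flag) and exposes their contents alongside the host metrics. A minimal sketch of a script that feeds it, assuming metrics are plain gauges; the function names, directory and metric name are hypothetical:

```python
import os
import tempfile


def render_gauges(metrics):
    """Render {name: (help_text, value)} in the Prometheus text exposition format."""
    lines = []
    for name, (help_text, value) in sorted(metrics.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def write_textfile(directory, basename, metrics):
    """Atomically (re)write <directory>/<basename>.prom for the textfile collector.

    Data goes to a temporary file in the same directory and is then renamed
    into place, so node_exporter never reads a half-written file. The temporary
    name does not end in .prom, so the collector ignores it.
    """
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(render_gauges(metrics))
        # mkstemp creates files as 0600; make it readable by the node_exporter user.
        os.chmod(tmp_path, 0o644)
        final_path = os.path.join(directory, basename + ".prom")
        os.replace(tmp_path, final_path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path
```

Run from cron, e.g. write_textfile("/var/lib/node_exporter/textfile", "couchdb", {...}), where the path is only an example. The write-then-rename step matters because node_exporter may read the directory at any moment during a scrape.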

Some first alerts

• Only Operations Alerts:

• Hard drive usage

• InstanceDown

• Absolute thresholds:

• HD at 85% capacity: send email

• HD at 90% capacity: page
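In Prometheus these thresholds would be two alerting rules over node_exporter's filesystem metrics, routed to an email receiver and a pager receiver. The sketch below only restates the decision logic in Python so the two levels are explicit; the function and parameter names are hypothetical:

```python
def disk_alert_action(size_bytes, avail_bytes, email_at=0.85, page_at=0.90):
    """Return "page", "email" or "none" for a filesystem, using absolute
    thresholds: 85% used -> email, 90% used -> page."""
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    used = 1 - avail_bytes / size_bytes
    # Check the stricter threshold first so a 95% full disk pages instead of emailing.
    if used >= page_at:
        return "page"
    if used >= email_at:
        return "email"
    return "none"
```

For example, a 100 GB disk with 12 GB available is 88% used and only triggers an email.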

Middle Ages

(Business Metrics: Operation Exporter)

Status Importer

[Diagram: Status Importer, Status DB, Dashboard Publisher, Email Reports, Views]

Replicating behaviour

[Diagram: Status Importer, Status DB, Dashboard Publisher, Email Reports, Prometheus Publisher]

Push vs Pull

[Diagram: Prometheus Publisher, ?]

Push Gateway

• Prometheus Pushgateway (https://github.com/prometheus/pushgateway)

• Acts as a metrics cache

• Publisher push frequency vs Prometheus scrape interval

[Diagram: Prometheus Publisher, Push Gateway]
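The Pushgateway accepts metrics in the text format over HTTP at /metrics/job/<job>, and a PUT replaces every metric previously pushed under that grouping key. It then serves the last pushed values until they are overwritten, so if Prometheus scrapes more often than the Publisher pushes, several scrapes see the same stale value. A minimal stdlib sketch, assuming a gateway at a hypothetical host on the default port 9091 and a hypothetical job name:

```python
import urllib.parse
import urllib.request


def build_push_request(gateway_url, job, body, instance=None):
    """Build (but do not send) a PUT that replaces the metric group for `job`."""
    if not job:
        raise ValueError("job must be non-empty")
    path = "/metrics/job/" + urllib.parse.quote(job, safe="")
    if instance is not None:
        path += "/instance/" + urllib.parse.quote(instance, safe="")
    return urllib.request.Request(
        gateway_url.rstrip("/") + path,
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


def push(gateway_url, job, body, timeout=5.0):
    """Send the push and return the HTTP status code."""
    req = build_push_request(gateway_url, job, body)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

A cron-driven Publisher would call push("http://pushgateway.example:9091", "prometheus_publisher", text) once per run; nothing tells Prometheus when a new value arrived, which is the frequency mismatch listed above.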

Metrics

• 3 main metrics:

• incoming_operations_last_10_min*

• incoming_status_last_10_min*

• migrations_last_5_min

• Gauges (and we were wrong!)

Consequences of using gauges

• Alerts kept firing unnecessarily

[Figure: a 10 min window ending at t=now, shown twice]
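One plausible way a *_last_10_min gauge keeps an alert firing, assuming the Publisher computed it as a count over a sliding 10-minute window: a burst shorter than a minute stays inside the window, so an alert on the gauge keeps firing for up to 10 minutes after the problem is gone. The simulation below is a hypothetical illustration of that effect, not the Publisher's code:

```python
def count_last_window(event_times, now, window_seconds=600):
    """What a *_last_10_min gauge would report at `now`: events in (now - window, now]."""
    return sum(1 for t in event_times if now - window_seconds < t <= now)


def firing_minutes(event_times, threshold, start, end, window_seconds=600):
    """Minute offsets from `start` at which `gauge > threshold` holds,
    evaluating once per minute up to and including `end`."""
    return [
        m
        for m in range((end - start) // 60 + 1)
        if count_last_window(event_times, start + 60 * m, window_seconds) > threshold
    ]
```

With 50 events during the first 50 seconds and a threshold of 10, the condition holds at every evaluation from minute 1 through minute 10, even though nothing happened after second 49.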

HA

• The Publisher code is not ready to run as two instances

• Publisher operations are not atomic

[Diagram: two Prometheus Publisher instances and a Push Gateway]

Clusterize

• Clustering solution because of other services in the same ecosystem

[Diagram: two Prometheus Publisher instances and a Push Gateway]

Modern History

(Business Metrics: Revamp)

Journey to the Past

[Diagram: Prometheus Publisher, Push Gateway]

New Architecture

[Diagram: Operation Exporter, Status DB]

From Publisher to Exporter

Prometheus Publisher

• Cron-based script

• Gauges

• Aggregation in Publisher

• Stateful

Operation Exporter

• Standalone App (Dockerized)

• Counters

• Aggregation in Prometheus

• Prometheus handles resets

• Stateless
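"Prometheus handles resets" refers to how rate() and increase() treat counters: any decrease between two consecutive samples is read as a restart from zero, so the value after the drop is counted as new increase. A simplified sketch of that rule (real Prometheus also extrapolates to the edges of the range window, which is left out here); the function names are hypothetical:

```python
def counter_increase(samples):
    """Total increase over (timestamp, value) samples, treating any drop as a reset."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        if cur >= prev:
            total += cur - prev
        else:
            # The counter went down: the exporter restarted from 0,
            # so everything counted since the restart is `cur`.
            total += cur
    return total


def per_second_rate(samples):
    """Average per-second increase between the first and last sample."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    elapsed = samples[-1][0] - samples[0][0]
    return counter_increase(samples) / elapsed
```

For samples (0, 10), (15, 25), (30, 40), (45, 5), (60, 20) the increase is 15 + 15 + 5 + 15 = 50, even though the raw value dropped at t=45. This is why a stateless exporter can restart freely, while a gauge computed in the Publisher has no such safety net.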

From Publisher to Exporter

[Diagram: Prometheus Publisher, Operation Exporter]

HA

• Operation Exporter is Stateless

• Aggregation in Prometheus:

• max(…) without(instance)

[Diagram: two Operation Exporter instances]
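With two stateless exporters reading the same Status DB, every series exists twice, differing only in the instance label. max(…) without(instance) collapses each pair into one series and keeps working if one replica is down. A sketch of that aggregation over instant-vector samples; the label names and values are hypothetical:

```python
from collections import defaultdict


def max_without(series, drop=("instance",)):
    """Emulate PromQL `max(...) without(instance)` on instant-vector samples.

    `series` is a list of (labels_dict, value). The labels in `drop` are
    removed and the max is taken per remaining label set. (PromQL aggregation
    also drops the metric name; the sample labels here omit it.)
    """
    groups = defaultdict(list)
    for labels, value in series:
        key = tuple(sorted((k, v) for k, v in labels.items() if k not in drop))
        groups[key].append(value)
    return {key: max(values) for key, values in groups.items()}
```

Two replicas reporting 120 and 118 for the same provider produce a single series with value 120; max is one reasonable choice here because both replicas are counting the same underlying events.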

Metrics

• 3 metrics:

• operation_requests_total

• operation_statuses_total

• operation_errors_total

• Counters \ (^_^) /
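A stateless exporter recomputes its counters on every scrape and serves them on /metrics. A minimal stdlib sketch, assuming a hypothetical collect() callable that returns the totals (for example by querying the Status DB); the port and label names are also assumptions:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def escape_label_value(value):
    """Escape backslash, newline and double quote as the text format requires."""
    return value.replace("\\", "\\\\").replace("\n", "\\n").replace('"', '\\"')


def render_counters(counters):
    """counters: {metric_name: [(labels_dict, value), ...]} -> exposition text."""
    out = []
    for name in sorted(counters):
        out.append(f"# TYPE {name} counter")
        for labels, value in counters[name]:
            label_str = ",".join(
                f'{k}="{escape_label_value(str(v))}"' for k, v in sorted(labels.items())
            )
            out.append(f"{name}{{{label_str}}} {value}" if label_str else f"{name} {value}")
    return "\n".join(out) + "\n"


def make_handler(collect):
    """`collect` runs on every scrape, so the exporter itself keeps no state."""

    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            body = render_counters(collect()).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep scrape logs quiet

    return MetricsHandler
```

Usage: HTTPServer(("", 8000), make_handler(collect)).serve_forever(). Since the totals come from the database on each scrape, two copies of the exporter report the same values, which is what makes the max(…) without(instance) setup work.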

Alerts

• predict_linear() vs absolute thresholds
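PromQL's predict_linear() fits a least-squares line to the samples in a range and extrapolates it a given number of seconds ahead, so an alert can fire on "free space will reach zero within N hours" instead of on a fixed percentage. A stdlib sketch of that regression; the function name and the example window are hypothetical:

```python
def predict_linear(samples, eval_time, seconds_ahead):
    """Least-squares line through (timestamp, value) samples, evaluated at
    eval_time + seconds_ahead, in the spirit of PromQL predict_linear()."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = [t - eval_time for t, _ in samples]  # time relative to evaluation
    ys = [v for _, v in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("samples need distinct timestamps")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # fitted value at eval_time
    return intercept + slope * seconds_ahead
```

With free space falling from 10 GB to 9 GB over the last hour, the prediction 4 hours out is 5 GB (no alert), while 10 hours out it is negative. An absolute 85% threshold would fire at the same fill level whether the disk is filling in minutes or in months.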

Alert Manager

• Upgraded from 0.0.4 to 0.1.1 (now 0.5)

• Alert routing as a condition-based tree vs a condition-based list

• Similar alerts are grouped into one notification

• Improved PagerDuty integration
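In the tree model an alert walks down from the root route: the first child whose matchers fit handles it, and if no child matches, the parent does. Alerts that reach the same route and share the group_by label values are sent as one notification. A simplified sketch of both ideas (real Alertmanager also supports continue, regex matchers, and inheriting settings such as group_by from the parent); receiver names are hypothetical:

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Route:
    receiver: str
    match: dict = field(default_factory=dict)
    group_by: tuple = ()
    routes: list = field(default_factory=list)

    def matches(self, labels):
        return all(labels.get(k) == v for k, v in self.match.items())


def find_route(route, labels):
    """Depth-first: the first matching child wins; otherwise this node handles it."""
    for child in route.routes:
        if child.matches(labels):
            return find_route(child, labels)
    return route


def group_alerts(root, alerts):
    """Bucket alerts into notifications keyed by (receiver, group_by label values)."""
    groups = defaultdict(list)
    for labels in alerts:
        r = find_route(root, labels)
        key = (r.receiver, tuple(labels.get(name, "") for name in r.group_by))
        groups[key].append(labels)
    return dict(groups)
```

Two InstanceDown alerts with severity="page" from different hosts end up in a single PagerDuty notification, while a disk warning falls through to the root email receiver.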

Current Architecture?

[Diagram: Operation Exporter, Status DB]

Dashboard

[Diagram: Operation Exporter, Status DB, Dashboard Publisher]

Grafana

[Diagram: Operation Exporter, Status DB]

Our new fancy Dashboard

Anything missing?

[Diagram: Operation Exporter, Status DB, Status Importer, Email Reports]

Blackbox Exporter

• Metrics on certificate expiry dates

• Kudos to: http://www.robustperception.io/get-alerted-before-your-ssl-certificates-expire/
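The blackbox exporter exposes the earliest certificate expiry it sees as a Unix timestamp (probe_ssl_earliest_cert_expiry), and the alert compares that timestamp with the current time minus a safety margin. The same check in stdlib Python, shown only to make the arithmetic concrete; the function names and the 30-day margin are hypothetical:

```python
import ssl
import time


def seconds_until_expiry(not_after, now=None):
    """`not_after` is a certificate's notAfter string, e.g. 'Jan  5 09:34:43 2018 GMT'."""
    expiry = ssl.cert_time_to_seconds(not_after)  # Unix timestamp (UTC)
    if now is None:
        now = time.time()
    return expiry - now


def expires_soon(not_after, margin_days=30, now=None):
    """True if the certificate expires within `margin_days`."""
    return seconds_until_expiry(not_after, now) < margin_days * 86400
```

On a verified TLS connection, the notAfter string is available as sock.getpeercert()["notAfter"]. With the blackbox exporter, Prometheus does the probing, so only the comparison needs to live in an alert rule.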

Finally

[Diagram: Operation Exporter, Status DB, Status Importer, Email Reports, Blackbox Exporter]

Approximate usage

• Monitor 200 instances with:

• 1 Prometheus instance in GCE (n1-standard-2)

• 1 HD (~30 GB used with the default 15d retention period)

• 1 Meta-monitoring instance (f1-micro)
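A rough back-of-the-envelope from these figures, assuming the disk is in steady state (it holds about 15 days of samples) and load is spread evenly across the 200 instances:

```latex
\frac{30\ \text{GB}}{15\ \text{d}} \approx 2\ \text{GB/d},
\qquad
\frac{2\ \text{GB/d}}{200\ \text{instances}} \approx 10\ \text{MB/d per instance}
```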

Back to the future

(Next Steps)

CouchDB Exporter

• Currently exporting metrics with textfile via node_exporter

Prometheus in HA

[Diagram: two Operation Exporter instances]

Simplify current Alert System

[Diagram: one super critical project and two normal projects]

Instrumenting Code

• Currently:

• Operation Metrics

• Business Metrics (~Blackbox Monitoring)

• Missing: metrics from our services

THANK YOU! @ShuttleCloud

@ShuttleCloudEng
