Host Health Monitoring with `docker run` Noah Zoschke @nzoschke [email protected] 10 / 28 / 2015
Apr 12, 2017

Page 1: Host Health Monitoring with Docker Run

Host Health Monitoring with `docker run`

Noah Zoschke @nzoschke

[email protected] 10 / 28 / 2015

Page 2: Host Health Monitoring with Docker Run

Health Monitoring circa 1999

• Nagios Core
  • Event scheduler
  • Event processor
  • Alert manager

• Host groups config
  • Ping
  • HTTP
  • SSH

• Nagios Remote Plugin Executor
  • SNMP
  • load
  • disk

photo credit: https://en.wikipedia.org/wiki/Nagios

Page 3: Host Health Monitoring with Docker Run

Health Monitoring circa 2012

• AMI
  • Chef / Ansible

• ELB / Health Check
  • Protocol: HTTP (or HTTPS, TCP, SSL)
  • Port: 80
  • Path: /index.html
  • Timeout / Interval: 5s / 30s
  • Unhealthy / Healthy Threshold: 2 / 10

• EC2 / Status Checks
  • Loss of network
  • Loss of power
  • Host software problems
  • Host hardware problems

• ASG

photo credit: http://aws.amazon.com/architecture/
http://blog.domenech.org/2012/11/aws-ec2-auto-scaling-basic-configuration.html
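
The ELB health check settings on this slide map onto the shorthand that `aws elb configure-health-check` accepts for its `--health-check` option. The sketch below builds that string from the slide's values and estimates how long a state change takes, assuming the time is roughly the number of consecutive checks times the interval (timeouts and jitter are ignored). The helper names are illustrative and not from the talk.

```shell
# Values from the slide.
protocol=HTTP
port=80
path=/index.html
timeout=5               # seconds
interval=30             # seconds
unhealthy_threshold=2   # consecutive failures
healthy_threshold=10    # consecutive successes

# Build the shorthand string for `aws elb configure-health-check --health-check`.
health_check_spec() {
  printf 'Target=%s:%s%s,Interval=%s,Timeout=%s,UnhealthyThreshold=%s,HealthyThreshold=%s\n' \
    "$protocol" "$port" "$path" "$interval" "$timeout" \
    "$unhealthy_threshold" "$healthy_threshold"
}

# Rough estimate: consecutive checks needed times the check interval.
seconds_to_unhealthy() { echo $((interval * unhealthy_threshold)); }
seconds_to_healthy() { echo $((interval * healthy_threshold)); }

health_check_spec
echo "unhealthy after roughly $(seconds_to_unhealthy)s"
echo "healthy again after roughly $(seconds_to_healthy)s"
```

It could be applied with `aws elb configure-health-check --load-balancer-name my-elb --health-check "$(health_check_spec)"`, where `my-elb` is a placeholder name.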

Page 4: Host Health Monitoring with Docker Run

But you probably still need…

• Nagios for monitoring

• or Zabbix, Ganglia, Sensu…

• or OpsView, SolarWinds…

• or Pingdom, Datadog…

• To provide system feedback

• ASG SetInstanceHealth

photo credit: http://itomibhaa.deviantart.com/art/Who-watches-the-Watchmen-276285938
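
One way to feed an external check back into the ASG is the SetInstanceHealth API, exposed in the AWS CLI as `aws autoscaling set-instance-health`. A minimal sketch, assuming a check is any command whose exit code signals health. It prints the CLI call instead of running it, and the function names and instance ID are made up for illustration.

```shell
# Print (rather than run) the AWS CLI call that reports a status to the ASG.
# $1: instance ID, $2: Healthy or Unhealthy
set_instance_health_cmd() {
  echo "aws autoscaling set-instance-health --instance-id $1 --health-status $2"
}

# Hypothetical wrapper: run any check command and report on its exit code.
# $1: instance ID, remaining arguments: the check command
report_health() {
  instance_id=$1
  shift
  if "$@" >/dev/null 2>&1; then
    set_instance_health_cmd "$instance_id" Healthy
  else
    set_instance_health_cmd "$instance_id" Unhealthy
  fi
}

# Demo with stand-in checks that always pass / always fail.
report_health i-0123456789abcdef0 true
report_health i-0123456789abcdef0 false
```

Piping the printed line to `sh`, or replacing the `echo` with the command itself, would make the call for real.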

Page 5: Host Health Monitoring with Docker Run

Health Monitoring circa 2016, the age of containers

• Generic AMI
  • Docker

• ECS
  • Container scheduling and re-scheduling as a service

• ASG / EC2 / Status Checks
  • Simple monitoring container

photo credit: https://github.com/docker/swarm

Page 6: Host Health Monitoring with Docker Run

[Diagram: an ECS cluster of three instances in an ASG, each running ecs-agent and dockerd, plus an api ELB and a rails ELB.]

Containers and memory:

• api: 128 MB
• registry: 256 MB
• rails web.1, web.2, web.3: 1024 MB each
• rails worker.1, worker.2, worker.3, worker.4: 256 MB each
• data worker.1, worker.2: 512 MB each

Page 7: Host Health Monitoring with Docker Run

[Cluster diagram, same as on page 6]

Failure Scenarios

• web.2 container crashes

• web.2 port unresponsive

• ecs-agent fails

• dockerd fails

• Instance hardware fails

• Instance fails to register with ECS

• Instance userspace gets wacky

Page 8: Host Health Monitoring with Docker Run

Failure Scenarios

• web.2 container crashes

• web.2 port unresponsive

• ecs-agent fails

• dockerd fails

photo credit: http://paper-replika.com/index.php?option=com_content&view=article&id=76&Itemid=207693

> reschedule task

Page 9: Host Health Monitoring with Docker Run

Container Schedulers are the new watchman

• Container process monitoring

• Service health check monitoring

• Automatic re-scheduling

photo credit: http://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_life_cycle.html

Page 10: Host Health Monitoring with Docker Run

[Cluster diagram, same as on page 6]

Failure Scenarios

• web.2 container crashes

• web.2 port unresponsive

• ecs-agent fails

• dockerd fails

• Instance hardware fails

• Instance fails to register with ECS

• Instance userspace gets wacky

Still need to configure an ASG to maintain capacity…

Page 11: Host Health Monitoring with Docker Run

[Cluster diagram, same as on page 6]

Failure Scenarios

• web.2 container crashes

• web.2 port unresponsive

• ecs-agent fails

• dockerd fails

• Instance hardware fails

• Instance fails to register with ECS

• Instance userspace gets wacky

Still need a monitor…

Page 12: Host Health Monitoring with Docker Run

[Cluster diagram, same as on page 6]

Health Monitoring circa 2016, the age of containers

• Schedule a monitor process in container cluster

• Describe ASG and ECS membership

• Mark all instances unregistered with ECS unhealthy

• `docker run` a user space health check on every instance

• Mark instances that fail to connect to Docker unhealthy

• Mark instances that fail user space health check unhealthy

No Nagios server + plugins!
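
The steps on this slide can be sketched as one pass over the ASG members. This is an illustration under stated assumptions, not the Convox monitor: instance IDs arrive as whitespace-separated lists (in practice they might come from `aws autoscaling describe-auto-scaling-groups` and `aws ecs describe-container-instances`), and the check is any command that takes an instance ID and exits non-zero when Docker is unreachable or the user space check fails. `monitor_pass` and `stub_check` are hypothetical names.

```shell
# One pass of the monitor. Prints one line per instance that should be
# marked unhealthy, with the reason.
# $1: instance IDs in the ASG, $2: instance IDs registered with ECS,
# $3: name of a check command that takes an instance ID.
monitor_pass() {
  asg_ids=$(echo $1)   # unquoted on purpose: collapse whitespace to single spaces
  ecs_ids=$(echo $2)
  check=$3
  for id in $asg_ids; do
    case " $ecs_ids " in
      *" $id "*)
        # Registered with ECS: run the Docker / user space check.
        "$check" "$id" >/dev/null 2>&1 || echo "$id Unhealthy check-failed"
        ;;
      *)
        # In the ASG but unknown to ECS.
        echo "$id Unhealthy not-registered-with-ecs"
        ;;
    esac
  done
}

# Demo with made-up IDs and a stub check that fails only for i-ccc.
stub_check() { [ "$1" != "i-ccc" ]; }
monitor_pass "i-aaa i-bbb i-ccc" "i-aaa i-ccc" stub_check
```

Each printed line would then translate into a SetInstanceHealth call for that instance, leaving the ASG to replace it.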

Page 13: Host Health Monitoring with Docker Run

Partial Failure Scenarios battle scars

• web.2 container crashes

• web.2 port unresponsive

• ecs-agent fails

• dockerd fails

• Instance hardware fails

• Instance fails to register with ECS

• Instance userspace gets wacky

• Disk full

• Disk partition corrupt / read-only

• Network packet loss

• CPU steal

• Kernel bugs triggered

• Security vulnerabilities

• Security breaches

• …
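
Many of these partial failures can be turned into a small check with a pass/fail result. As one example, here is a sketch of a disk-full check that reads `df -P` output. The 90% threshold and the function name are assumptions, and mount points containing spaces are not handled. Run inside a container, the host filesystems would also have to be visible there, for instance through a bind mount.

```shell
# Print mount points whose usage is at or above a limit (percent).
# Reads `df -P` style output on stdin; $1 is the limit.
full_disks() {
  awk -v limit="$1" '
    NR > 1 {                      # skip the header line
      use = $5                    # Capacity column, e.g. "97%"
      sub(/%/, "", use)
      if (use + 0 >= limit) print $6
    }'
}

# Demo on canned output; on a host this would be: df -P | full_disks 90
full_disks 90 <<'EOF'
Filesystem     1024-blocks    Used Available Capacity Mounted on
/dev/xvda1         8123812 7800000    323812      97% /
/dev/xvdb         51475068 1000000  50475068       2% /data
EOF
```

A non-empty result would count as a failed check for that instance.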

Page 14: Host Health Monitoring with Docker Run

User Space Health Check

$ docker run busybox sh -c 'dmesg | grep "Remounting filesystem read-only"'

# why not: $ docker run health-check

To package, distribute and run common binaries and scripts (top, netstat, smartmontools, etc.)
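
Note that the `grep` in the command on this slide exits 0 when the message is found, so a packaged health check would likely invert that result. A minimal sketch of such an entry point, assuming the kernel log arrives on stdin (for example `dmesg | check_readonly_remount`). The function name and output strings are hypothetical.

```shell
# Exit 0 when the kernel log on stdin shows no read-only remount,
# exit 1 (and say why) when it does.
check_readonly_remount() {
  if grep -q "Remounting filesystem read-only"; then
    echo "unhealthy: filesystem remounted read-only"
    return 1
  fi
  echo "healthy"
}

# Demo on canned log lines.
echo "eth0: link up" | check_readonly_remount
echo "EXT4-fs (xvda1): Remounting filesystem read-only" | check_readonly_remount \
  || echo "exit status: $?"
```

With the exit code carrying the verdict, whatever calls `docker run` only needs to look at the container's exit status.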

Page 15: Host Health Monitoring with Docker Run

Thanks!

Slides available on Medium / SlideShare

https://medium.com/@nzoschke/host-health-monitoring-with-docker-run-46315eb38286

http://www.slideshare.net/nzoschke/host-health-monitoring-with-docker-run

Open source Golang monitor available on GitHub

https://github.com/convox/rack/blob/master/api/workers/cluster.go

Questions / feedback to @nzoschke or [email protected]