Performance Monitoring for Docker environments
• Monitoring Docker
• Anomaly detection
• Live demo
Container monitoring challenges
• Scale & dynamic behavior:
Number of containers >> number of servers
Containers come and go at a much faster pace
• Diversity:
Different application technologies
Overload of metrics to monitor and alert on
Monolithic application monitoring
• (Virtualized) OS → System / Infrastructure monitoring
• Application → Application performance monitoring (APM)
• End user → Real user monitoring (RUM)
Microservices monitoring
• (Virtualized) OS → System / Infrastructure monitoring
• Containers running application components → Container monitoring + in-container application monitoring
• End user → Real user monitoring (RUM)
What to monitor?
• Hosts (CPU, memory, disk)
• Orchestrator (services, volumes, replication controllers, …)
• Containers (CPU, memory, disk, network, …)
• Container internals (application, database, caching, etc.)
• Impact on user and application performance
Lightweight monitoring for lightweight microservices environment
Docker stats API
$ docker stats
CONTAINER      CPU %   MEM USAGE / LIMIT    MEM %   NET I/O            BLOCK I/O
1285939c1fd3   0.07%   796 KiB / 64 MiB     1.21%   788 B / 648 B      3.568 MB / 512 KB
9c76f7834ae2   0.07%   2.746 MiB / 64 MiB   4.29%   1.266 KB / 648 B   12.4 MB / 0 B
d1ea048f04e4   0.03%   4.583 MiB / 64 MiB   6.30%   2.854 KB / 648 B   27.7 MB / 0 B
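The same numbers are available programmatically. As a minimal sketch, here is a parser for the `docker stats` table above; it assumes the default CLI column layout (CONTAINER, CPU %, MEM USAGE / LIMIT, MEM %, NET I/O, BLOCK I/O):

```python
# Parse the tabular `docker stats --no-stream` output into dicts.
# Sketch: column layout assumed to match the default CLI output above.

def parse_docker_stats(output):
    rows = []
    for line in output.strip().splitlines()[1:]:   # skip the header row
        parts = line.split()
        rows.append({
            "container": parts[0],
            "cpu_pct": float(parts[1].rstrip("%")),
            "mem_usage": " ".join(parts[2:4]),     # e.g. "796 KiB"
            "mem_limit": " ".join(parts[5:7]),     # e.g. "64 MiB"
            "mem_pct": float(parts[7].rstrip("%")),
        })
    return rows

sample = """CONTAINER CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O
1285939c1fd3 0.07% 796 KiB / 64 MiB 1.21% 788 B / 648 B 3.568 MB / 512 KB
9c76f7834ae2 0.07% 2.746 MiB / 64 MiB 4.29% 1.266 KB / 648 B 12.4 MB / 0 B
d1ea048f04e4 0.03% 4.583 MiB / 64 MiB 6.30% 2.854 KB / 648 B 27.7 MB / 0 B"""
stats = parse_docker_stats(sample)
```

In practice the Docker SDK for Python returns the same data as structured JSON via `container.stats(stream=False)`, which is sturdier than scraping CLI text.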
Docker API
docker run \
--volume=/:/rootfs:ro \
--volume=/var/run:/var/run:rw \
--volume=/sys:/sys:ro \
--volume=/var/lib/docker/:/var/lib/docker:ro \
--publish=8080:8080 \
--detach=true \
--name=cadvisor \
google/cadvisor:latest
open http://<your-hostname>:8080/
cAdvisor
agent runs in 1 container or on host
container resource usage
basic application monitoring
$15 / month / server
Datadog (datadoghq.com)
kernel module captures system calls
container resource usage
basic application monitoring
Sysdig (sysdig.com)
Heavyweight, deep application monitoring
Designed for monolithic applications in a specific programming language
Too many dynamic metrics to handle with static alerts
Putting an agent inside a container is an anti-pattern
$100+ / month / server
APM vendors
● Extra work in setting up, maintaining, and supporting
● Generic tools, no specific container or cluster visualizations
● No Real User Monitoring
● No out-of-the-box anomaly detection and predictive analytics
Prometheus
Open source
Performance Monitoring for Docker environments
Anomaly detection
Anomaly: definition
Static alerts
Static alert limitations
• seasonality
• correlations
• changing or dynamic environment
Challenges
statistical significance ⇏ relevance
Simple technique: 3-σ rule
Exponential smoothing: α=0.03, z=3
Does not work with seasonal data
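The technique above can be sketched as follows: an exponentially smoothed mean and variance, with points more than z smoothed standard deviations away flagged as anomalies (α = 0.03 and z = 3 as on the slide; the warm-up window, function name, and sample data are illustrative):

```python
# 3-sigma rule on top of exponential smoothing.
# Sketch: alpha=0.03 and z=3 as above; mean/variance seeded from a warm-up window.
import math
import statistics

def three_sigma_anomalies(series, alpha=0.03, z=3.0, warmup=10):
    mean = statistics.fmean(series[:warmup])        # seed from warm-up window
    var = statistics.pvariance(series[:warmup])
    anomalies = []
    for i in range(warmup, len(series)):
        x = series[i]
        if var > 0 and abs(x - mean) > z * math.sqrt(var):
            anomalies.append(i)                     # flag; don't pollute the model
            continue
        diff = x - mean
        mean += alpha * diff                        # smoothed mean
        var = (1 - alpha) * var + alpha * diff * diff   # smoothed variance
    return anomalies

normal = [10.0, 10.2, 9.8, 10.1, 9.9] * 3
data = normal[:10] + [10.0, 10.1] + [50.0] + normal[10:]
print(three_sigma_anomalies(data))                  # -> [12]
```

Skipping the model update on a flagged point keeps a single spike from inflating the variance and masking subsequent anomalies.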
Holt-Winters
● seasonal exponential smoothing
● works quite well on ‘laboratory data’
● calculation of prediction intervals relies on normal distribution after removal of seasonality
● ⇒ on our real-world seasonal data this generates too many false positives
Sliding window approach
model
evaluation of new data
Local outlier factor
Existing instance-based machine learning technique (lazy, ~kNN)
Based on concept of local density
local outlier factor(A) = (average density of kNN of point A) / (density at point A)
LOF >> 1 ⇒ outlier
en.wikipedia.org/wiki/Local_outlier_factor
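A from-scratch sketch of LOF following the definition on that page (k and the sample points are illustrative; real deployments would use an optimised implementation such as scikit-learn's `LocalOutlierFactor`):

```python
# Local outlier factor from scratch (kNN-based). Sketch for small data sets.
import math

def lof(points, k=3):
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # k nearest neighbours of each point (index 0 in the sort is the point itself)
    knn = [sorted(range(n), key=lambda j: dist[i][j])[1:k + 1] for i in range(n)]
    kdist = [dist[i][knn[i][-1]] for i in range(n)]   # distance to k-th neighbour
    def lrd(i):
        # local reachability density: inverse of mean reachability distance
        return k / sum(max(kdist[j], dist[i][j]) for j in knn[i])
    lrds = [lrd(i) for i in range(n)]
    # LOF: average density of the kNN relative to the point's own density
    return [sum(lrds[j] for j in knn[i]) / (k * lrds[i]) for i in range(n)]

cluster = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
scores = lof(cluster + [(10.0, 10.0)], k=3)          # last point is isolated
```

Here `scores[-1]` comes out far above 1 while the clustered points stay near 1, matching the rule LOF >> 1 ⇒ outlier.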
Load balance detector
Compare multiple signals (mean + variance) in a load-balanced environment
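One way to sketch such a detector: in a load-balanced group the same metric should behave similarly on every container, so flag containers whose mean deviates strongly from the group (only means are compared here; variances could be tested the same way, and all names, data, and the z threshold are illustrative):

```python
# Load-balance detector sketch: flag containers whose metric mean deviates
# strongly from the peer-group mean.
import statistics

def unbalanced_containers(signals, z=1.5):
    """signals: {container_name: [metric samples]} -> names out of balance."""
    means = {name: statistics.fmean(vals) for name, vals in signals.items()}
    group_mean = statistics.fmean(means.values())
    group_std = statistics.pstdev(means.values())
    if group_std == 0:
        return []
    return [name for name, m in means.items()
            if abs(m - group_mean) > z * group_std]

signals = {
    "web-1": [40, 42, 41, 39],
    "web-2": [41, 40, 42, 40],
    "web-3": [40, 41, 39, 41],
    "web-4": [90, 92, 91, 89],   # receiving far more load than its peers
}
print(unbalanced_containers(signals))   # -> ['web-4']
```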
Anomaly detection @ service level
Lightweight agent
• Server metrics from the OS
• Container and cluster metrics from Kubernetes and Docker APIs
• Application metrics from log files and management interfaces
• Business & custom metrics from various sources
Contextual events
• Container lifecycle
• Deployments & software releases
• Infrastructure changes
• Custom events
CoScale approach
Scalable Architecture
[Architecture diagram]
• HAProxy load balancer: HTTPS handling
• Application / API nodes
• PostgreSQL: metadata
• Cassandra: metric data
• Elasticsearch: event data
• Analysis workers, alerting workers, data workers
• RUM: Boomerang.js
• Agent: log & API parsing
DEMO
Backup slides
Local outlier factor, no strong model assumption
heavy process
Local outlier factor, no free lunch
Scaling: comparing apples and oranges
scale ⇒ distance ⇒ density ⇒ LOF-score
Autoscaling? (Mahalanobis distance) ⇒ enlarges dimensions with low variance
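A common remedy is per-feature z-score standardisation before computing distances, so that no single unit (bytes vs. percent) dominates the Euclidean distances LOF relies on. A sketch (data and function name are illustrative):

```python
# Per-feature z-score standardisation prior to distance-based analysis.
# Sketch: mixing CPU % (0-100) with memory in bytes would otherwise let
# the memory dimension dominate every distance.
import statistics

def standardise(points):
    cols = list(zip(*points))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]   # guard constant features
    return [tuple((v - m) / s for v, m, s in zip(p, means, stds))
            for p in points]

raw = [(20.0, 1.2e9), (25.0, 1.1e9), (90.0, 1.15e9)]    # (CPU %, memory bytes)
scaled = standardise(raw)
```

After scaling, each feature has mean 0 and unit variance, so CPU and memory contribute comparably to the LOF score.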
“Curse of dimensionality”
dimensionality reduction preprocessing (e.g. PCA), but don’t throw the anomalies out with the bathwater
Choosing cross-sections of data to analyze together, e.g.
• different metrics on the same container
• the same metric on different containers