Performance Monitoring for Docker environments
• Monitoring Docker
• Anomaly detection
• Live demo
Container monitoring challenges
• Scale & dynamic behavior:
Number of containers >> number of servers
Containers come and go at a much faster pace
• Diversity:
Different application technologies
Overload of metrics to monitor and alert on
Monolithic application monitoring
• (Virtualized) OS → System / Infrastructure monitoring
• Application → Application performance monitoring (APM)
• End user → Real user monitoring (RUM)
Microservices monitoring
• (Virtualized) OS → System / Infrastructure monitoring
• Containers running application components → Container monitoring + in-container application monitoring
• End user → Real user monitoring (RUM)
What to monitor?
• Hosts (CPU, memory, disk)
• Orchestrator (services, volumes, replication controllers, …)
• Containers (CPU, memory, disk, network, …)
• Container internals (application, database, caching, etc.)
• Impact on user and application performance
Lightweight monitoring for lightweight microservices environment
Docker stats API
$ docker stats
CONTAINER      CPU %   MEM USAGE / LIMIT    MEM %   NET I/O            BLOCK I/O
1285939c1fd3   0.07%   796 KiB / 64 MiB     1.21%   788 B / 648 B      3.568 MB / 512 KB
9c76f7834ae2   0.07%   2.746 MiB / 64 MiB   4.29%   1.266 KB / 648 B   12.4 MB / 0 B
d1ea048f04e4   0.03%   4.583 MiB / 64 MiB   6.30%   2.854 KB / 648 B   27.7 MB / 0 B
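The same numbers are available programmatically. As a minimal sketch, here is a parser for the `docker stats` table above; it assumes the default CLI column layout (CONTAINER, CPU %, MEM USAGE / LIMIT, MEM %, NET I/O, BLOCK I/O):

```python
# Parse the tabular `docker stats --no-stream` output into dicts.
# Sketch: column layout assumed to match the default CLI output above.

def parse_docker_stats(output):
    rows = []
    for line in output.strip().splitlines()[1:]:   # skip the header row
        parts = line.split()
        rows.append({
            "container": parts[0],
            "cpu_pct": float(parts[1].rstrip("%")),
            "mem_usage": " ".join(parts[2:4]),     # e.g. "796 KiB"
            "mem_limit": " ".join(parts[5:7]),     # e.g. "64 MiB"
            "mem_pct": float(parts[7].rstrip("%")),
        })
    return rows

sample = """CONTAINER CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O
1285939c1fd3 0.07% 796 KiB / 64 MiB 1.21% 788 B / 648 B 3.568 MB / 512 KB
9c76f7834ae2 0.07% 2.746 MiB / 64 MiB 4.29% 1.266 KB / 648 B 12.4 MB / 0 B
d1ea048f04e4 0.03% 4.583 MiB / 64 MiB 6.30% 2.854 KB / 648 B 27.7 MB / 0 B"""
stats = parse_docker_stats(sample)
```

In practice the Docker SDK for Python returns the same data as structured JSON via `container.stats(stream=False)`, which is sturdier than scraping CLI text.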
Docker API
docker run \
--volume=/:/rootfs:ro \
--volume=/var/run:/var/run:rw \
--volume=/sys:/sys:ro \
--volume=/var/lib/docker/:/var/lib/docker:ro \
--publish=8080:8080 \
--detach=true \
--name=cadvisor \
google/cadvisor:latest
open http://<your-hostname>:8080/
cAdvisor
agent runs in 1 container or on host
container resource usage
basic application monitoring
$15 / month / server
Datadog (datadoghq.com)
kernel module captures system calls
container resource usage
basic application monitoring
Sysdig (sysdig.com)
Heavyweight, deep application monitoring
Designed for monolithic applications in a specific programming language
Too many dynamic metrics to handle with static alerts
Putting an agent inside a container is an anti-pattern
$100+ / month / server
APM vendors
● Extra work in setting up, maintaining, and supporting
● Generic tools, no specific container or cluster visualizations
● No Real User Monitoring
● No out-of-the-box anomaly detection and predictive analytics
Prometheus
Open source
Performance Monitoring for Docker environments
Anomaly detection
Anomaly: definition
Static alerts
Static alert limitations
• seasonality
• correlations
• changing or dynamic environment
Challenges
statistical significance ⇏ relevance
Simple technique: 3-σ rule
Exponential smoothing: α=0.03, z=3
Does not work with seasonal data
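The technique above can be sketched as follows: an exponentially smoothed mean and variance, with points more than z smoothed standard deviations away flagged as anomalies (α = 0.03 and z = 3 as on the slide; the warm-up window, function name, and sample data are illustrative):

```python
# 3-sigma rule on top of exponential smoothing.
# Sketch: alpha=0.03 and z=3 as above; mean/variance seeded from a warm-up window.
import math
import statistics

def three_sigma_anomalies(series, alpha=0.03, z=3.0, warmup=10):
    mean = statistics.fmean(series[:warmup])        # seed from warm-up window
    var = statistics.pvariance(series[:warmup])
    anomalies = []
    for i in range(warmup, len(series)):
        x = series[i]
        if var > 0 and abs(x - mean) > z * math.sqrt(var):
            anomalies.append(i)                     # flag; don't pollute the model
            continue
        diff = x - mean
        mean += alpha * diff                        # smoothed mean
        var = (1 - alpha) * var + alpha * diff * diff   # smoothed variance
    return anomalies

normal = [10.0, 10.2, 9.8, 10.1, 9.9] * 3
data = normal[:10] + [10.0, 10.1] + [50.0] + normal[10:]
print(three_sigma_anomalies(data))                  # -> [12]
```

Skipping the model update on a flagged point keeps a single spike from inflating the variance and masking subsequent anomalies.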
Holt-Winters
● seasonal exponential smoothing
● works quite well on ‘laboratory data’
● calculation of prediction intervals relies on normal distribution after removal of seasonality
● ⇒ on our real-world seasonal data this generates too many false positives
Sliding window approach
model
evaluation of new data
Local outlier factor
Existing instance-based machine learning technique (lazy, ~kNN)
Based on concept of local density
local outlier factor(A) = (average density of kNN of point A) / (density at point A)
LOF >> 1 ⇒ outlier
en.wikipedia.org/wiki/Local_outlier_factor
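A from-scratch sketch of LOF following the definition on that page (k and the sample points are illustrative; real deployments would use an optimised implementation such as scikit-learn's `LocalOutlierFactor`):

```python
# Local outlier factor from scratch (kNN-based). Sketch for small data sets.
import math

def lof(points, k=3):
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # k nearest neighbours of each point (index 0 in the sort is the point itself)
    knn = [sorted(range(n), key=lambda j: dist[i][j])[1:k + 1] for i in range(n)]
    kdist = [dist[i][knn[i][-1]] for i in range(n)]   # distance to k-th neighbour
    def lrd(i):
        # local reachability density: inverse of mean reachability distance
        return k / sum(max(kdist[j], dist[i][j]) for j in knn[i])
    lrds = [lrd(i) for i in range(n)]
    # LOF: average density of the kNN relative to the point's own density
    return [sum(lrds[j] for j in knn[i]) / (k * lrds[i]) for i in range(n)]

cluster = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
scores = lof(cluster + [(10.0, 10.0)], k=3)          # last point is isolated
```

Here `scores[-1]` comes out far above 1 while the clustered points stay near 1, matching the rule LOF >> 1 ⇒ outlier.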
Load balance detector
Compare multiple signals (mean + variance) in a load-balanced environment
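One way to sketch such a detector: in a load-balanced group the same metric should behave similarly on every container, so flag containers whose mean deviates strongly from the group (only means are compared here; variances could be tested the same way, and all names, data, and the z threshold are illustrative):

```python
# Load-balance detector sketch: flag containers whose metric mean deviates
# strongly from the peer-group mean.
import statistics

def unbalanced_containers(signals, z=1.5):
    """signals: {container_name: [metric samples]} -> names out of balance."""
    means = {name: statistics.fmean(vals) for name, vals in signals.items()}
    group_mean = statistics.fmean(means.values())
    group_std = statistics.pstdev(means.values())
    if group_std == 0:
        return []
    return [name for name, m in means.items()
            if abs(m - group_mean) > z * group_std]

signals = {
    "web-1": [40, 42, 41, 39],
    "web-2": [41, 40, 42, 40],
    "web-3": [40, 41, 39, 41],
    "web-4": [90, 92, 91, 89],   # receiving far more load than its peers
}
print(unbalanced_containers(signals))   # -> ['web-4']
```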
Anomaly detection @ service level
Lightweight agent
• Server metrics from the OS
• Container and cluster metrics from Kubernetes and Docker APIs
• Application metrics from log files and management interfaces
• Business & custom metrics from various sources
Contextual events
• Container lifecycle
• Deployments & software releases
• Infrastructure changes
• Custom events
CoScale approach
Scalable Architecture
[Architecture diagram]
• HAProxy load balancer: HTTPS handling
• Application / API nodes
• PostgreSQL: metadata
• Cassandra: metric data
• Elasticsearch: event data
• Analysis workers, alerting workers, data workers
• RUM: Boomerang.js
• Agent: log & API parsing
DEMO
Backup slides
Local outlier factor, no strong model assumption
heavy process
Local outlier factor, no free lunch
Scaling: comparing apples and oranges
scale ⇒ distance ⇒ density ⇒ LOF-score
Autoscaling? (Mahalanobis distance) ⇒ enlarges dimensions with low variance
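A common remedy is per-feature z-score standardisation before computing distances, so that no single unit (bytes vs. percent) dominates the Euclidean distances LOF relies on. A sketch (data and function name are illustrative):

```python
# Per-feature z-score standardisation prior to distance-based analysis.
# Sketch: mixing CPU % (0-100) with memory in bytes would otherwise let
# the memory dimension dominate every distance.
import statistics

def standardise(points):
    cols = list(zip(*points))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]   # guard constant features
    return [tuple((v - m) / s for v, m, s in zip(p, means, stds))
            for p in points]

raw = [(20.0, 1.2e9), (25.0, 1.1e9), (90.0, 1.15e9)]    # (CPU %, memory bytes)
scaled = standardise(raw)
```

After scaling, each feature has mean 0 and unit variance, so CPU and memory contribute comparably to the LOF score.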
“Curse of dimensionality”
dimensionality reduction preprocessing (e.g. PCA), but don’t throw the anomalies out with the bathwater
Choosing cross-sections of data to analyze together, e.g.
• different metrics on the same container
• the same metric on different containers