Telemetry & Analytics
Izabella Raulin
https://github.com/IzabellaRaulin
https://github.com/intelsdi-x/snap
© 2016 Intel Corporation
Agenda
1. Software Defined Infrastructure Team
2. Data center scheduling and workload management
• Intelligent Resource Orchestration
3. Role of telemetry in resource orchestration
4. Snap – an open source telemetry tool
• How to get started with Snap
• How to monitor at scale easily
• Snap DEMO
The next level of cloud evolution
brings data centers that are:
• smarter
• self-aware
• self-optimizing
• self-scaling
• self-healing
“When technological progress increases the efficiency with which a resource is used (reducing the amount necessary for any one use), but the rate of consumption of that resource rises because of increasing demand.”
Jevons Paradox
Intelligent Resource Orchestration
Watch: Observe environment, collect metrics
Decide: Based on observation and knowledge, determine the best decision
Act: Take an action on that decision
Learn: Learn from that decision and improve the future ones
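The Watch, Decide, Act, Learn cycle can be sketched as a small control loop. This is a minimal illustration only, not code from the talk or from Snap: the `Metrics` and `Orchestrator` types, the CPU threshold, the "scale-up" action and the sample values are all hypothetical.

```go
package main

import "fmt"

// Metrics is a hypothetical snapshot produced by the Watch step.
type Metrics struct {
	CPULoad int // CPU utilisation in percent
}

// Orchestrator carries the "knowledge" that Decide uses and Learn refines.
type Orchestrator struct {
	Threshold int // scale up when CPULoad exceeds this percentage
}

// Decide picks an action from the observation and the current knowledge.
func (o *Orchestrator) Decide(m Metrics) string {
	if m.CPULoad > o.Threshold {
		return "scale-up"
	}
	return "hold"
}

// Act applies the chosen action to a replica count and returns the new count.
func Act(action string, replicas int) int {
	if action == "scale-up" {
		return replicas + 1
	}
	return replicas
}

// Learn looks at the load observed after acting. If a scale-up was not
// enough to bring the load under the threshold, it lowers the threshold
// so that the next decision reacts earlier.
func (o *Orchestrator) Learn(action string, after Metrics) {
	if action == "scale-up" && after.CPULoad > o.Threshold {
		o.Threshold -= 5
	}
}

func main() {
	o := &Orchestrator{Threshold: 80}
	replicas := 1

	// Watch: in a real system these samples would come from a telemetry agent.
	samples := []Metrics{{CPULoad: 50}, {CPULoad: 90}, {CPULoad: 85}, {CPULoad: 70}}

	for i := 0; i+1 < len(samples); i++ {
		action := o.Decide(samples[i])
		replicas = Act(action, replicas)
		o.Learn(action, samples[i+1]) // the next sample is the observed outcome
		fmt.Printf("load=%d%% action=%s replicas=%d threshold=%d%%\n",
			samples[i].CPULoad, action, replicas, o.Threshold)
	}
}
```

The point of the sketch is that the quality of every decision depends on the Watch step: without reliable telemetry there is nothing to decide on and nothing to learn from.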
What to measure and how to measure it?
perf
ethtool
iostat
pidstat
netstat
htop
collectd
diamond
Every potential indicator has its own set of tools, and those tools do not necessarily integrate with one another
• How to collect and combine metrics gathered from different tools?
• How to avoid writing custom scripts?
• Where to store collected data?
• How to monitor at scale easily?
• How to correlate data to produce valuable analysis?
• How to visualize collected metrics?
• How to compare data and collaborate with other teams on it?
What to measure? How to measure it? What next?
snap – an open telemetry framework
Easily collect, process, and publish telemetry data at scale
• Empower systems to expose a consistent set of telemetry data
• Simplify telemetry ingestion across ubiquitous storage systems
• Improve the deployment model, packaging and flexibility for collecting telemetry
• Allow flexible processing of telemetry data on agent (e.g. filtering and decoration)
• Provide powerful clustered control of telemetry workflows across small or large clusters: Tribe
Snap is not intended to:
• Operate as an analytics platform – it is intended to feed them
• Compete with existing metric/monitoring/telemetry agents
snap | Workflow
Collect telemetry data via plugins for:
Hardware: SNMP, CPU, Disk, NIC, Intel NodeManager, Intel PCM, SMART, …
Containers and VMs: Cgroups, Docker, Libvirt, Mesos, Perf events, Processes, …
Applications and Services: Apache, Cassandra, Ceph, Etcd, HAProxy, InfluxDB, MySQL, NFS, RabbitMQ, …
OpenStack: Nova, Cinder, Glance, Keystone, Neutron
snap | Collectors
Filter, alter or append metadata as many times as you need via plugins for:
Filtering
Anomaly Detection
Statistics and Normalization
Encryption for all or part of the data set
Injection of remote queries for tokens
snap | Processors
Publish data as many times as you need via plugins for:
Dashboard Tools: Graphite, Grafana, Riemann
Queues and Logs: RabbitMQ, Kafka, File
Databases: PostgreSQL, InfluxDB, OpenTSDB, MySQL, HANA, Etcd, KairosDB
Storing the same telemetry on independent pipelines.
snap | Publishers
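The collect, process, publish workflow described on the last three slides can be sketched as a chain of small interfaces. This is an illustrative sketch only and is not Snap's plugin API: the `Metric`, `Collector`, `Processor` and `Publisher` types, the `RunOnce` helper, and the `/example/...` namespaces are all hypothetical names chosen for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// Metric is a hypothetical, simplified telemetry data point.
type Metric struct {
	Namespace string
	Value     float64
	Tags      map[string]string
}

// The three plugin roles, modelled as hypothetical interfaces.
type Collector interface{ Collect() []Metric }
type Processor interface{ Process([]Metric) []Metric }
type Publisher interface{ Publish([]Metric) error }

// staticCollector returns fixed sample values instead of reading real hardware.
type staticCollector struct{}

func (staticCollector) Collect() []Metric {
	return []Metric{
		{Namespace: "/example/cpu/load", Value: 0.42},
		{Namespace: "/example/mem/used", Value: 1024},
	}
}

// prefixFilter keeps metrics under a namespace prefix (filtering) and
// sets a tag recording which filter passed them (decoration).
type prefixFilter struct{ prefix string }

func (f prefixFilter) Process(in []Metric) []Metric {
	var out []Metric
	for _, m := range in {
		if strings.HasPrefix(m.Namespace, f.prefix) {
			m.Tags = map[string]string{"filtered_by": f.prefix}
			out = append(out, m)
		}
	}
	return out
}

// memoryPublisher stores metrics in a slice; a real publisher would write
// to a database, queue or dashboard.
type memoryPublisher struct{ stored []Metric }

func (p *memoryPublisher) Publish(ms []Metric) error {
	p.stored = append(p.stored, ms...)
	return nil
}

// RunOnce collects once, applies every processor in order, then hands the
// same result to every publisher.
func RunOnce(c Collector, procs []Processor, pubs []Publisher) error {
	ms := c.Collect()
	for _, p := range procs {
		ms = p.Process(ms)
	}
	for _, p := range pubs {
		if err := p.Publish(ms); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	a, b := &memoryPublisher{}, &memoryPublisher{}
	err := RunOnce(staticCollector{},
		[]Processor{prefixFilter{prefix: "/example/cpu"}},
		[]Publisher{a, b})
	fmt.Println(err, len(a.stored), len(b.stored))
}
```

Passing two publishers mirrors the idea of storing the same telemetry on independent pipelines: each destination receives the same processed data.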
List of all available plugins:
https://github.com/intelsdi-x/snap/blob/master/docs/PLUGIN_CATALOG.md
(*) Right now snap only supports Linux and OS X (Darwin)
snap | Plugins
snap | Plugin Lifecycle
a) Plugin load
• Dynamic, does not require restart
• Snap is automatically informed by the plugin of its features and metrics
• Dynamically extends the metric catalog when loaded
b) Plugin unload
• Removes metrics from catalog automatically
c) Plugin swap
• Swaps in a newer plugin version for an old one in a safe transaction
Dynamic plugin operations mean loading, updating, and unloading plugins without restarting snap or extra configuration management. That allows simple and secure bug fixes, security patching, and improved accuracy in production.
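The load, unload and swap operations above can be sketched as a metric catalog guarded by a lock. This is a hypothetical model for illustration and not Snap's implementation: the `Plugin` and `Catalog` types and their methods are assumptions made for this example.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Plugin is a hypothetical description of a loaded plugin.
type Plugin struct {
	Name    string
	Version int
	Metrics []string // namespaces the plugin reports when it is loaded
}

// Catalog tracks loaded plugins and, through them, the available metrics.
type Catalog struct {
	mu      sync.Mutex
	plugins map[string]Plugin
}

func NewCatalog() *Catalog {
	return &Catalog{plugins: map[string]Plugin{}}
}

// Load registers a plugin at runtime; its metrics extend the catalog.
func (c *Catalog) Load(p Plugin) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, ok := c.plugins[p.Name]; ok {
		return fmt.Errorf("plugin %q already loaded", p.Name)
	}
	c.plugins[p.Name] = p
	return nil
}

// Unload removes a plugin; its metrics disappear from the catalog with it.
func (c *Catalog) Unload(name string) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, ok := c.plugins[name]; !ok {
		return fmt.Errorf("plugin %q not loaded", name)
	}
	delete(c.plugins, name)
	return nil
}

// Swap replaces a loaded plugin with a newer version while holding the
// lock, so readers see either the old or the new version, never neither.
func (c *Catalog) Swap(p Plugin) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	old, ok := c.plugins[p.Name]
	if !ok {
		return fmt.Errorf("plugin %q not loaded", p.Name)
	}
	if p.Version <= old.Version {
		return fmt.Errorf("version %d is not newer than %d", p.Version, old.Version)
	}
	c.plugins[p.Name] = p
	return nil
}

// Metrics lists every namespace currently exposed, in sorted order.
func (c *Catalog) Metrics() []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	var out []string
	for _, p := range c.plugins {
		out = append(out, p.Metrics...)
	}
	sort.Strings(out)
	return out
}

func main() {
	c := NewCatalog()
	_ = c.Load(Plugin{Name: "cpu", Version: 1, Metrics: []string{"/example/cpu/load"}})
	_ = c.Swap(Plugin{Name: "cpu", Version: 2,
		Metrics: []string{"/example/cpu/idle", "/example/cpu/load"}})
	fmt.Println(c.Metrics())
	_ = c.Unload("cpu")
	fmt.Println(len(c.Metrics()))
}
```

Deriving the metric list from the loaded plugins, rather than storing it separately, is one simple way to keep the catalog consistent after every operation.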
Everything is Challenging At Scale
Send a task to one host and have it replicated to all hosts
snap | Tribe
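The idea of sending a task to one host and seeing it on all hosts can be illustrated with a toy, in-memory model. It says nothing about how Tribe actually propagates tasks between machines: the `Group` and `Node` types, the `Join` and `Submit` methods, and the host and task names are hypothetical.

```go
package main

import "fmt"

// Group is a hypothetical set of hosts that share the same tasks.
type Group struct {
	members []*Node
}

// Node is one host in the group, holding its own copy of the task list.
type Node struct {
	Name  string
	Tasks []string
	group *Group
}

// Join adds a new host to the group and returns it.
func (g *Group) Join(name string) *Node {
	n := &Node{Name: name, group: g}
	g.members = append(g.members, n)
	return n
}

// Submit sends a task to this node, which copies it to every member of
// its group, itself included.
func (n *Node) Submit(task string) {
	for _, m := range n.group.members {
		m.Tasks = append(m.Tasks, task)
	}
}

func main() {
	g := &Group{}
	a := g.Join("host-a")
	g.Join("host-b")
	g.Join("host-c")

	// The operator talks to a single host only.
	a.Submit("collect-cpu-every-1s")

	for _, m := range g.members {
		fmt.Println(m.Name, m.Tasks)
	}
}
```

In a real cluster the copy step would happen over the network and would have to cope with hosts joining late or being unreachable, which this sketch deliberately ignores.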
[Diagram: Physical/VM hosts sharing a workload split into Scheduler, Collection, Processing and Publishing]
snap | Distributed workload
snap up to find more
Github https://github.com/intelsdi-x/snap
Slack channel https://intelsdi-x.slack.com/messages/snap-telemetry/
Medium blog posts https://medium.com/intel-sdi
“Snap and Kubernetes: together at last” written by Andrzej Kuriata
“Setting Up Your Snap Development Environment” written by Sarah Han
“Measuring Snap performance” written by Olivier Cano
“What I mean by telemetry” written by Matthew Brender
“The Guts of Tasks: How Snap Gathers Telemetry” written by Matthew Brender
See the latest release at https://github.com/intelsdi-x/snap/releases
Golang
http://www.meetup.com/GoLang-User-Group-Wroclaw/
https://www.facebook.com/GolangTricity