Monitoring “unknown unknowns” With Machine Intelligence

Monitoring "unknown unknowns" - Guy Fighel - DevOpsDays Tel Aviv 2017

Jan 22, 2018

Transcript
Page 1: Monitoring "unknown unknowns" - Guy Fighel - DevOpsDays Tel Aviv 2017

Monitoring “unknown unknowns” With Machine Intelligence

Page 2: Monitoring "unknown unknowns" - Guy Fighel - DevOpsDays Tel Aviv 2017

@guyfig

On-Call Engineer by Nature

"If a tree falls in a forest and no one is around to hear it, does it make a sound?"

Page 3: Monitoring "unknown unknowns" - Guy Fighel - DevOpsDays Tel Aviv 2017

Observability is a superset of monitoring and instrumentation.

Making systems debuggable and understandable

@mipsytipsy

Do you really know what to observe?

Instrumentation - mostly Developer driven

What is the output? Dashboard? Exploration tool?

Page 4: Monitoring "unknown unknowns" - Guy Fighel - DevOpsDays Tel Aviv 2017

Observability In Control Theory

One can determine the behavior of the entire system from the system's outputs
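
For a linear time-invariant system, the control-theory definition comes with a concrete test. The notation below (state x, input u, output y, matrices A, B, C, D, state dimension n) is the standard textbook form and is not taken from the slides; it is included only to show what "determine the system from its outputs" means mathematically.

```latex
\dot{x}(t) = A\,x(t) + B\,u(t), \qquad y(t) = C\,x(t) + D\,u(t)

\mathcal{O} =
\begin{bmatrix}
C \\ CA \\ CA^{2} \\ \vdots \\ CA^{n-1}
\end{bmatrix},
\qquad
\text{the system is observable} \iff \operatorname{rank}(\mathcal{O}) = n
```

When the rank is below n, some internal states never show up in the outputs, which is the formal counterpart of a failure that no dashboard can reveal.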

Page 6: Monitoring "unknown unknowns" - Guy Fighel - DevOpsDays Tel Aviv 2017

The Observability Quadrant (Based on Johari window)

- Static thresholds, Defined Alerts, Static Runbooks

- Anomaly Detection, Predictions, External Knowledge

- Knowledge, Recommendations, Auto Collaboration

- Inference, Auto Correlations, Semantic Analysis, Decision making

Page 7: Monitoring "unknown unknowns" - Guy Fighel - DevOpsDays Tel Aviv 2017

Human-Driven Detection

Set thresholds to find patterns

Simulate based on known

Use percentiles, basic stats
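
A minimal sketch of the human-driven approach named above: derive a threshold from a percentile of past samples and flag anything above it. The latency values, the function names and the choice of the 90th percentile are illustrative assumptions, not data or settings from the talk.

```python
import statistics


def percentile_threshold(samples, pct=99):
    """Return the pct-th percentile of samples (1 <= pct <= 99)."""
    # quantiles(n=100) returns 99 cut points; index pct - 1 is the pct-th one.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return cuts[pct - 1]


def flag_breaches(samples, threshold):
    """Return the indexes of samples strictly above the threshold."""
    return [i for i, value in enumerate(samples) if value > threshold]


# Made-up request latencies in milliseconds, with one obvious spike.
latencies_ms = [12, 14, 13, 15, 11, 13, 12, 250, 14, 13]

p90 = percentile_threshold(latencies_ms, pct=90)
breaches = flag_breaches(latencies_ms, p90)

# Basic stats for comparison; note how a single spike drags the mean upward.
mean_ms = statistics.mean(latencies_ms)
median_ms = statistics.median(latencies_ms)
```

A threshold like this only finds what someone thought to measure, which is the limitation the following slides address.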

Page 8: Monitoring "unknown unknowns" - Guy Fighel - DevOpsDays Tel Aviv 2017
Page 9: Monitoring "unknown unknowns" - Guy Fighel - DevOpsDays Tel Aviv 2017

Will That Help in a “Fire-Fighting” Mode?

Page 10: Monitoring "unknown unknowns" - Guy Fighel - DevOpsDays Tel Aviv 2017

Find The Problem

Thresholds? Baseline? Anomaly?

- Scale matters

- Stationary noise matters

- Use Autocorrelation
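
To make the autocorrelation point concrete, here is a small sketch that estimates the dominant period of a metric from its sample autocorrelation. The sine-wave series and the lag search range are assumptions chosen for illustration; real metrics are noisier and usually need detrending first.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of series at the given lag (0 < lag < len)."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        return 0.0  # a constant series carries no correlation information
    covariance = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance / variance


# Synthetic metric with a cycle of 24 samples (for example, hourly data
# with a daily pattern), repeated for 10 cycles.
series = [math.sin(2 * math.pi * t / 24) for t in range(240)]

# Skip very small lags: neighbouring samples are always strongly correlated.
best_lag = max(range(12, 37), key=lambda lag: autocorrelation(series, lag))
```

Once the period is known, a sample can be compared with the value one period earlier instead of with a fixed threshold.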

Page 11: Monitoring "unknown unknowns" - Guy Fighel - DevOpsDays Tel Aviv 2017

Preprocessing Data

sklearn.preprocessing
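
The slide names only the module, so the following is one possible use rather than the speaker's pipeline: `StandardScaler` from `sklearn.preprocessing` puts metrics with very different units on a comparable scale before any distance-based or statistical comparison. The two columns and their values are made up.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical samples; columns are [cpu_percent, requests_per_second].
X = np.array(
    [
        [35.0, 1200.0],
        [40.0, 1500.0],
        [38.0, 1300.0],
        [90.0, 5200.0],
        [37.0, 1250.0],
    ]
)

# Center each column on zero and scale it to unit variance.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
```

After scaling, a jump in requests per second no longer dominates a jump in CPU percentage simply because its raw numbers are larger.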

Page 13: Monitoring "unknown unknowns" - Guy Fighel - DevOpsDays Tel Aviv 2017

Find The Problem

[Figure: CPU utilization over time in minutes, at 90%, annotated "EC2 instance changed from t2.small to m3.xl"]

Anomaly?

Events & context matter

Page 14: Monitoring "unknown unknowns" - Guy Fighel - DevOpsDays Tel Aviv 2017

Use Enrichment
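
One way to read "enrichment" is attaching recent change events to each metric sample, so that a threshold breach can be judged in context. The sketch below assumes that reading; the class names, the 10-minute window and the sample data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeEvent:
    minute: int
    description: str


@dataclass
class MetricPoint:
    minute: int
    cpu_percent: float
    context: list = field(default_factory=list)


def enrich(points, events, window_minutes=10):
    """Attach to each point the events that happened in the preceding window."""
    for point in points:
        point.context = [
            event.description
            for event in events
            if 0 <= point.minute - event.minute <= window_minutes
        ]
    return points


def unexplained_breaches(points, threshold=90.0):
    """Return breaches that have no recent change event attached."""
    return [p for p in points if p.cpu_percent >= threshold and not p.context]


events = [ChangeEvent(8, "EC2 instance changed from t2.small to m3.xl")]
points = enrich(
    [
        MetricPoint(0, 30.0),
        MetricPoint(5, 32.0),
        MetricPoint(10, 95.0),
        MetricPoint(15, 93.0),
        MetricPoint(40, 96.0),
    ],
    events,
)
suspicious = unexplained_breaches(points)
```

Here the breaches right after the instance change carry that event as context, while the later breach has none and is the one worth a closer look.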

Page 15: Monitoring "unknown unknowns" - Guy Fighel - DevOpsDays Tel Aviv 2017

What Can Machines Do?

Process different types of data, transform it fast and handle huge amounts in real-time

Automate and adapt Anomaly Detection

Apply Semantic text similarities to find patterns (Information Retrieval)

Apply auto correlation models

Evolve and adapt (over time) based on human interaction

Page 16: Monitoring "unknown unknowns" - Guy Fighel - DevOpsDays Tel Aviv 2017

The Goal - Centralization

Observability for systems with imperfect outputs

Event enrichment, symptom detection and inference

Automatic Outlier Detection

Automatic Correlation

Get closer to the Control Theory mathematical definition
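
As an illustration of automatic outlier detection without a hand-set threshold, the sketch below uses the modified z-score based on the median absolute deviation (MAD). This is one common robust technique, not necessarily the one used by the speaker; the cutoff of 3.5 is a conventional default and the data is made up.

```python
import statistics


def mad_outliers(values, cutoff=3.5):
    """Return indexes whose modified z-score exceeds the cutoff."""
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        return []  # more than half the values are identical; score is undefined
    # 0.6745 makes the MAD comparable to a standard deviation for normal data.
    return [
        i
        for i, v in enumerate(values)
        if 0.6745 * abs(v - median) / mad > cutoff
    ]


# Made-up request latencies in milliseconds, with one obvious spike.
latencies_ms = [12, 14, 13, 15, 11, 13, 12, 250, 14, 13]
outliers = mad_outliers(latencies_ms)
```

Because the median and MAD are barely affected by the spike itself, the detector adapts to the data without anyone choosing a threshold in advance.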

Page 17: Monitoring "unknown unknowns" - Guy Fighel - DevOpsDays Tel Aviv 2017

Pick The Right Tools

https://github.com/turi-code/SFrame

Page 18: Monitoring "unknown unknowns" - Guy Fighel - DevOpsDays Tel Aviv 2017

Use a Common Schema

- Define the model. Use a single schema (Apache Avro)

- Events are agnostic. They can represent logs, stack traces, metrics, user actions, HTTP events, etc.

- Every event should have a set of common fields as well as optional key/value attributes
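
A sketch of what such a schema could look like, written as an Avro record definition held in a Python dictionary. The field names, the namespace and the event kinds are assumptions for illustration; the slides specify only common fields plus optional key/value attributes.

```python
import json

# Hypothetical Avro record schema for a generic event.
EVENT_SCHEMA = {
    "type": "record",
    "name": "Event",
    "namespace": "example.monitoring",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "timestamp_ms", "type": "long"},
        {"name": "source", "type": "string"},
        {
            "name": "kind",
            "type": {
                "type": "enum",
                "name": "EventKind",
                "symbols": ["LOG", "STACK_TRACE", "METRIC", "USER_ACTION", "HTTP"],
            },
        },
        {"name": "body", "type": "string"},
        # Optional key/value attributes; defaults to an empty map.
        {
            "name": "attributes",
            "type": {"type": "map", "values": "string"},
            "default": {},
        },
    ],
}

# Fields without a default are the common fields every event must carry.
REQUIRED_FIELDS = [f["name"] for f in EVENT_SCHEMA["fields"] if "default" not in f]


def missing_fields(event):
    """Return the required field names absent from an event dictionary."""
    return [name for name in REQUIRED_FIELDS if name not in event]


# The JSON form is what would be stored as an .avsc schema file.
schema_json = json.dumps(EVENT_SCHEMA, indent=2)
```

The `missing_fields` helper is only a presence check; an Avro library would be needed for full type validation and serialization.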

Page 19: Monitoring "unknown unknowns" - Guy Fighel - DevOpsDays Tel Aviv 2017

Best Practices

Deterministic models are better to start with (Fuzzy Logic, Rules)

Choose your logic and start running it across your data (schema)

Apply similarity checks to strings first (TF-IDF, BM25, Fuzzy, other classifiers)

Look into correlations, starting with the simple, obvious ones, before building classifiers (Unsupervised/Semi-supervised learning is much more relevant overall)

Build your prediction models on time series data first. (Statistics has solid models)

Time and context are dimensions you will be able to start addressing
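
The string-similarity practice can be tried with very little code. Below is a minimal TF-IDF plus cosine-similarity sketch in plain Python; the whitespace tokenizer, the smoothed IDF formula and the log messages are illustrative assumptions, and a production system would more likely use an existing library.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse {term: weight} TF-IDF vector per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency, so no term gets a zero weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * idf[t] for t, c in counts.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


messages = [
    "connection timeout to database db-1",
    "connection timeout to database db-2",
    "disk usage above 90 percent on host web-3",
]
vectors = tfidf_vectors(messages)
same_symptom = cosine(vectors[0], vectors[1])
different_symptom = cosine(vectors[0], vectors[2])
```

The two database timeouts score as near-duplicates while the disk alert scores zero, which is enough to group events before any classifier is trained.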

Page 20: Monitoring "unknown unknowns" - Guy Fighel - DevOpsDays Tel Aviv 2017

Use It In Production

- Your team == your users

- Ask for feedback

- Re-calculate relevancy

- Apply Recommendations based on your own team's knowledge
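
One simple way to re-calculate relevancy from team feedback is an exponential moving average over helpful/not-helpful votes. This is a hypothetical sketch: the scoring rule, the learning rate and the recommendation names are assumptions, not a description of any particular product.

```python
def update_relevancy(current, helpful, learning_rate=0.2):
    """Move a score in [0, 1] toward 1.0 on helpful feedback, else toward 0.0."""
    target = 1.0 if helpful else 0.0
    return current + learning_rate * (target - current)


def rank(scores):
    """Return recommendation names ordered from most to least relevant."""
    return sorted(scores, key=scores.get, reverse=True)


# Both recommendations start with the same neutral score.
scores = {"restart worker pool": 0.5, "scale out web tier": 0.5}

# An on-call engineer marks one as helpful and the other as not helpful.
scores["restart worker pool"] = update_relevancy(scores["restart worker pool"], True)
scores["scale out web tier"] = update_relevancy(scores["scale out web tier"], False)

ordered = rank(scores)
```

A small learning rate keeps one vote from reordering everything, while repeated feedback from the team steadily reshapes the ranking.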

Page 21: Monitoring "unknown unknowns" - Guy Fighel - DevOpsDays Tel Aviv 2017

github.com/signifai

@guyfig

Thank You