DEBS 2010 – 4th ACM International Conference on Distributed Event-Based Systems, Cambridge, United Kingdom
HOLMES: An event-driven solution to monitor data centers through continuous queries and machine learning
Pedro Henriques dos Santos Teixeira, Ricardo Gomes Clemente, Ronald Andreu Kaiser, Denis Almeida Vieira Jr
Transcript
Page 1: Debs2010

HOLMES: An event-driven solution to monitor data centers through continuous queries and machine learning

Pedro Henriques dos Santos Teixeira
Ricardo Gomes Clemente
Ronald Andreu Kaiser
Denis Almeida Vieira Jr

Page 2: Debs2010

Topics

• Motivation
• Use Case
• The Solution
  • Overview
  • System architecture
  • CEP
  • Machine learning
  • CEP & Machine learning integration
  • Visualization and User Interface
• Conclusion

Page 3: Debs2010

Motivation

Page 4: Debs2010

Motivation

• Dynamic, constantly growing environment
• Need to understand our environment
• Too many dependencies
• Can't afford downtime

Page 5: Debs2010

Motivation

• Monitoring can be tricky
• Anticipate the inevitable and try to avoid chaos
• 1.2K servers
• 14K+ monitored items
• Correlation

Page 6: Debs2010

Use Case

Page 7: Debs2010

Use Case

• Big Brother Brazil
• New world record
• 151 million votes in 2 days
• Peaks of 13,500 votes per minute (~225 votes/s)
• DDoS attack detected

Page 8: Debs2010

Overview

Page 9: Debs2010

Page 10: Debs2010

The System Architecture

Page 11: Debs2010

HOLMES

Page 12: Debs2010

System architecture – modules and their purposes

• CEP module: known problems
• Machine learning module: unknown problems
• Visualization module: situational awareness
• Storage: event history/log

Page 13: Debs2010

CEP

Page 14: Debs2010

CEP

• Reacting to incidents in real time is a requirement for data center monitoring

• Expressing abstract, business-level rules is desirable

• Correlation of events through user-defined queries

Page 15: Debs2010

CEP - Esper

• Open source CEP implementation

• Supports an Event Processing Language (EPL)

• High throughput, a requirement in our context

• Easy to embed in our application

Page 16: Debs2010

CEP – simple example

SELECT avg(response_time) FROM HTTP.win:time(5 min)

[Figure: events E1 to E5 arrive on the event stream and fall inside a 5-minute window; each event Ei carries a response time, shown as 4, 3, 2, 3 and 5 time units (t.u.).]
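To make the window semantics concrete, here is a minimal Python sketch of what the query above computes: the average of response_time over the events that arrived in the last 5 minutes. This is not Esper code. The class name TimeWindowAverage and the arrival timestamps are assumptions made for illustration; the response times are the five values shown on the slide, in an assumed order of arrival.

```python
from collections import deque


class TimeWindowAverage:
    """Average of a numeric field over a sliding time window (illustrative, not Esper)."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def add(self, timestamp, value):
        """Insert an event, expire old ones, and return the current average."""
        self.events.append((timestamp, value))
        self.total += value
        # Drop events that have fallen out of the window.
        while self.events and self.events[0][0] <= timestamp - self.window_seconds:
            _, old_value = self.events.popleft()
            self.total -= old_value
        return self.total / len(self.events)


window = TimeWindowAverage(window_seconds=5 * 60)
# Hypothetical arrival times in seconds; response times in time units.
samples = [(0, 5), (60, 3), (120, 2), (180, 3), (240, 4)]
averages = [window.add(ts, rt) for ts, rt in samples]
# A later event pushes the two oldest events out of the window.
late_average = window.add(400, 1)
```

With all five events inside the window the average is 17 / 5 = 3.4 time units. Once the event at t = 400 s arrives, the events at 0 s and 60 s expire and the average is taken over the remaining four.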

Page 17: Debs2010

If the number of sessions increases by 10% in a 3-minute window, the average CPU usage of the web farm does not increase by 5%, and the number of slow queries in the database is higher than 10, then we have reached a database contention situation. Raise an alarm!
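The rule above can be sketched as a plain predicate over two consecutive 3-minute windows. This is only an illustration in Python, not the EPL query used in HOLMES. It assumes that "increase" means relative growth against the previous window, and the names WindowStats, relative_increase and database_contention are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Aggregates for one 3-minute window (field names are hypothetical)."""
    sessions: int
    avg_cpu: float     # average CPU usage of the web farm, in percent
    slow_queries: int  # slow queries reported by the database


def relative_increase(previous, current):
    """Fractional growth from previous to current (0.10 means +10%)."""
    if previous == 0:
        return float("inf") if current > 0 else 0.0
    return (current - previous) / previous


def database_contention(previous, current):
    """True when all three conditions of the rule hold (assumed interpretation)."""
    sessions_up = relative_increase(previous.sessions, current.sessions) >= 0.10
    cpu_flat = relative_increase(previous.avg_cpu, current.avg_cpu) < 0.05
    many_slow_queries = current.slow_queries > 10
    return sessions_up and cpu_flat and many_slow_queries


before = WindowStats(sessions=1000, avg_cpu=40.0, slow_queries=2)
# Sessions +15%, CPU +2.5%, 14 slow queries: the rule fires.
alarm = database_contention(before, WindowStats(1150, 41.0, 14))
# Same load but CPU +20%: the web farm is busy, so the rule does not fire.
no_alarm = database_contention(before, WindowStats(1150, 48.0, 14))
```

The point of the rule is the combination: more sessions without a matching rise in web-farm CPU, together with slow queries, suggests the bottleneck is in the database rather than in the web tier.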

Page 18: Debs2010

Machine learning

“any signal, which is totally predictable, carries no information” - Shannon

Page 19: Debs2010

Machine learning characteristics

• FRAHST learns to detect anomalous behaviors

• Unsupervised streaming algorithm

• Linear complexity in the number of data streams

Page 20: Debs2010

FRAHST, state-of-the-art

For further information, see reference [12] in our paper.

Page 21: Debs2010

Anomaly detection

Page 22: Debs2010

CEP & Machine Learning Integration

• Users choose the data streams to be correlated

• CEP module aggregates events

• Notifications are raised whenever a change in rank is detected
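The three steps above can be sketched as a small pipeline: events from the user-selected streams are grouped into one vector per time interval (the aggregation role played by the CEP module), each vector is handed to a detector, and a notification is recorded when the rank reported by the detector changes. The detector below is a placeholder so that the sketch runs; it does not implement FRAHST. All names (aggregate, notify_on_rank_change, toy_rank, the stream names) are hypothetical.

```python
from collections import defaultdict


def aggregate(events, streams, interval):
    """Group (timestamp, stream, value) events into one mean-value vector per interval."""
    buckets = defaultdict(lambda: defaultdict(list))
    for timestamp, stream, value in events:
        if stream in streams:  # only the streams the user chose to correlate
            buckets[int(timestamp // interval)][stream].append(value)
    vectors = []
    for bucket in sorted(buckets):
        per_stream = buckets[bucket]
        # A stream with no events in a bucket is filled with 0.0 (simplifying assumption).
        vectors.append([
            sum(per_stream[s]) / len(per_stream[s]) if per_stream[s] else 0.0
            for s in streams
        ])
    return vectors


def notify_on_rank_change(vectors, estimate_rank):
    """Return (index, old_rank, new_rank) for every vector where the rank changes."""
    notifications = []
    previous_rank = None
    for index, vector in enumerate(vectors):
        rank = estimate_rank(vector)
        if previous_rank is not None and rank != previous_rank:
            notifications.append((index, previous_rank, rank))
        previous_rank = rank
    return notifications


def toy_rank(vector, threshold=100.0):
    """Placeholder detector: counts components above a threshold. NOT a real rank estimate."""
    return 1 + sum(1 for v in vector if v > threshold)


streams = ["sessions", "cpu"]
events = [
    (0, "sessions", 50.0), (1, "cpu", 30.0),
    (10, "sessions", 60.0), (11, "cpu", 35.0),
    (20, "sessions", 400.0), (21, "cpu", 36.0),
    (22, "disk", 999.0),  # ignored: not a selected stream
]
vectors = aggregate(events, streams, interval=10)
alerts = notify_on_rank_change(vectors, toy_rank)
```

In this toy run the third interval contains a spike in sessions, the placeholder rank goes from 1 to 2, and a single notification is recorded. In a real deployment the estimate_rank callable would be backed by the streaming algorithm rather than a threshold.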

Page 23: Debs2010

Visualization and User Interface

Page 24: Debs2010

Visualization and User Interface

• Users can create Perspectives

• Real-time dashboard customization

• Event history visualization

Page 25: Debs2010

Dashboards

Page 26: Debs2010

Page 27: Debs2010

Conclusion

• Successful implementation and acceptance in a real use case

• New challenges
  • Improve situational awareness & prediction
  • Make the creation of queries more intuitive

Page 28: Debs2010

This presentation:

http://www.slideshare.net/intelie/debs2010

Our Nagios Plugin source code:

http://github.com/intelie/neb2activemq

Intelligent Monitoring with Esper:

http://esper.codehaus.org/tutorials/tutorial/presentations.html

Denis Vieira Jr. - [email protected]
Ronald Kaiser - [email protected]