26/05/2004 HEPIX, Edinburgh, May 24-28: Lemon Web Monitoring, Miroslav Šiket, CERN IT/FIO, http://cern.ch/lemon-status

Dec 10, 2015


Rachel Bradley
Page 1:

Lemon Web Monitoring

Miroslav Šiket, CERN IT/FIO

http://cern.ch/lemon-status

Page 2:

Outline

Concepts
Design and architecture
Web visualization
Deployment
Current development

Page 3:

Concepts of Monitoring

Monitoring information in computer centers
  CERN: ~2000 computers and ~70 clusters
  Huge amount of data: ~150 metrics per host
  High demand for organizing the information so that it is easily accessible and easy to parse

Variety of views for different groups of users: sysadmins, users, managers

Lemon tries to do the job by incorporating many relatively new technologies

Page 4:

Monitoring information

We generally have three types of data:
  Performance metrics: CPU usage, load averages, memory use, disk use/performance, sockets, network, …
  Exceptions: high load, swap use over 90%, service down, …
  Status information: uptime, boot time, kernel version, …

Heartbeat

All data is gathered at different frequencies, from every 60 s to once a day or on boot.

About 1 GB of data a day
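As an illustration of the three data types listed on this slide, the following is a minimal Python sketch of how a sample and a simple exception rule might be modelled. The class names, the metric names `swap_used_percent` and `swap_high`, and the record layout are assumptions made for this example and are not taken from Lemon itself; only the 90% swap threshold comes from the slide.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Kind(Enum):
    PERFORMANCE = "performance"   # CPU usage, load averages, memory use, ...
    EXCEPTION = "exception"       # high load, swap use over 90%, service down, ...
    STATUS = "status"             # uptime, boot time, kernel version, ...


@dataclass(frozen=True)
class Sample:
    host: str
    metric: str        # hypothetical metric name
    kind: Kind
    timestamp: int     # seconds since the epoch
    value: object      # a number for performance metrics, text for status


SWAP_LIMIT_PERCENT = 90.0  # threshold quoted on the slide


def swap_exception(sample: Sample) -> Optional[Sample]:
    """Return an exception sample if swap use exceeds the limit, else None."""
    if sample.metric != "swap_used_percent":   # hypothetical metric name
        return None
    if float(sample.value) <= SWAP_LIMIT_PERCENT:
        return None
    return Sample(sample.host, "swap_high", Kind.EXCEPTION,
                  sample.timestamp, sample.value)
```

Calling `swap_exception` on a performance sample with a value above 90 yields a new sample of kind `EXCEPTION`; any other sample yields `None`.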

Page 5:

Lemon Architecture

Page 6:

Components (I)

MSA (Monitoring Sensor Agent) and MS (Monitoring Sensor) - MS measures data and MSA provides transport to MR

MR (Monitoring Repository) with backend to Oracle, MySQL, flat file,…

Correlation Engine – framework for creating metric correlations

Alarm Broker (prototype) – daemon for handling exceptions and communication between alarm GUI and MR

Page 7:

Components (II)

Anamon (Analysis of MONitoring information) – Java-based GUI for real-time visualization of metrics

SOAP/WSDL – MR provides Web services extension for any additional users

RRD/Apache/PHP framework for easy access to the pre-processed information

CDB (Configuration Database) – many components access this information, which is part of the Quattor framework at CERN

Page 8:

RRD Tool Framework

RRD (Round Robin Database)
  Data is organized in time-series files of aging information
  Supported types: Gauge, Counter, Derive, Absolute
  Framework for storing measurement averages, min, max, derivatives, …
  Provides graphing capabilities
  Provides simple mathematical operations on stored data
  Data does not expand in size with time
  Provides export to XML, flat file formats

Is widely used by many applications: MRTG, Ganglia, CDF Farm Control, FBSNG WWW
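The idea that round-robin data "does not expand in size with time" can be sketched in a few lines of Python. This is a simplified illustration of the principle, not the RRD Tool file format or its API: a fixed number of slots is allocated once, several raw samples are consolidated into one value (average, min or max), and the oldest slot is overwritten when the archive is full. The class and parameter names are hypothetical.

```python
from typing import List, Optional


class RoundRobinArchive:
    """Fixed-size archive: each slot holds one consolidated value."""

    def __init__(self, slots: int, samples_per_slot: int, how: str = "average"):
        if how not in ("average", "min", "max"):
            raise ValueError("how must be 'average', 'min' or 'max'")
        self.slots: List[Optional[float]] = [None] * slots
        self.samples_per_slot = samples_per_slot
        self.how = how
        self._pending: List[float] = []   # raw samples waiting to be consolidated
        self._next = 0                    # index of the slot to overwrite next

    def update(self, value: float) -> None:
        """Add one raw sample; consolidate once enough samples have arrived."""
        self._pending.append(value)
        if len(self._pending) < self.samples_per_slot:
            return
        if self.how == "average":
            consolidated = sum(self._pending) / len(self._pending)
        elif self.how == "min":
            consolidated = min(self._pending)
        else:
            consolidated = max(self._pending)
        self.slots[self._next] = consolidated            # overwrite the oldest slot
        self._next = (self._next + 1) % len(self.slots)
        self._pending.clear()

    def history(self) -> List[float]:
        """Consolidated values, oldest first."""
        ordered = self.slots[self._next:] + self.slots[:self._next]
        return [v for v in ordered if v is not None]


archive = RoundRobinArchive(slots=3, samples_per_slot=2)
for raw in [1, 3, 5, 7, 9, 11, 13, 15]:
    archive.update(raw)
print(archive.history())   # the first consolidated value has been overwritten
```

Eight samples produce four averages, but only the three most recent are kept, so storage stays constant however long the archive runs.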

Page 9:

Framework Architecture

The RRD Tool framework is used to store and manipulate data

A daemon retrieves data from the Monitoring Repository at 5-minute intervals

Data is pre-processed and RRD files are updated

Apache/PHP and the RRD tools access these files and create statistics per host and per cluster

In connection with CDB, configuration information is also provided

JpGraph (PHP) provides graphical access to information from the MR that is not available through the RRD framework
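The retrieve, pre-process and update cycle described on this slide could look roughly like the Python sketch below. It is an assumption-laden illustration, not the actual Lemon daemon: the record layout, the `fetch` and `store` callables, and the choice of a plain average as the per-cluster statistic are all hypothetical. Only the 5-minute interval and the per-host and per-cluster split come from the slide.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, Iterable, List, Tuple

POLL_INTERVAL_S = 300  # the 5-minute interval from the slide

# (host, cluster, metric, value); this record layout is an assumption
Record = Tuple[str, str, str, float]
Key = Tuple[str, str]


def preprocess(records: Iterable[Record]) -> Tuple[Dict[Key, float], Dict[Key, float]]:
    """Build per-host values and per-cluster averages for one cycle."""
    per_host: Dict[Key, float] = {}
    grouped: Dict[Key, List[float]] = defaultdict(list)
    for host, cluster, metric, value in records:
        per_host[(host, metric)] = value
        grouped[(cluster, metric)].append(value)
    per_cluster = {key: mean(values) for key, values in grouped.items()}
    return per_host, per_cluster


def run_cycle(fetch: Callable[[], Iterable[Record]],
              store: Callable[[str, Key, float], None]) -> int:
    """One daemon cycle: fetch from the repository, pre-process, store."""
    per_host, per_cluster = preprocess(fetch())
    for key, value in per_host.items():
        store("host", key, value)        # stands in for updating a host RRD file
    for key, value in per_cluster.items():
        store("cluster", key, value)     # stands in for updating a cluster RRD file
    return len(per_host) + len(per_cluster)
```

A real daemon would call `run_cycle` once every `POLL_INTERVAL_S` seconds, with `fetch` querying the Monitoring Repository and `store` writing to the RRD files. Passing both in as callables keeps the sketch testable without either system.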

Page 10:

Cluster information

Page 11:

Host information

Page 12:

JpGraph and host reboots

Page 13:

Scalability

Scalability is usually an issue with large scale monitoring frameworks

Our framework currently encompasses ~2000 computers at CERN and is scalable to more than 10,000 computers

RRD Tool reduces the need to access the MR (Oracle) directly and provides cached information

Our framework provides support for RRD framework clusters and is expandable; it currently uses about 40 of the most common performance metrics
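To give a feel for the load these numbers imply, here is a back-of-the-envelope calculation in Python. It assumes one stored value per metric per host in every 5-minute cycle, which the slides do not state explicitly. The host counts, the figure of about 40 metrics, and the 5-minute interval are taken from the slides.

```python
def updates_per_second(hosts: int, metrics_per_host: int, interval_s: int) -> float:
    """Average rate of stored values, assuming one value per metric per cycle."""
    return hosts * metrics_per_host / interval_s


current = updates_per_second(2000, 40, 300)      # today's ~2000 computers
target = updates_per_second(10_000, 40, 300)     # the stated scalability goal

print(f"current: about {current:.0f} updates/s")
print(f"target:  about {target:.0f} updates/s")
```

Under that assumption the framework handles roughly 267 updates per second today, and about 1333 per second at 10,000 computers.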

Page 14:

Issues and future work

The RRD Tool framework lacks certain features that we have added to it: support for uploading historical data, easy removal and addition of metrics, …

Current development:
  Dynamic configuration of stored data in connection with CDB (configuration DB)
  Packaging and providing a site-independent structure
  Expanding the framework for Web displays: on-demand correlations, manipulation of cluster configuration, …
  Summary displays for exception metrics

Page 15:

Conclusion

The framework is currently in deployment at CERN

It already helps sysadmins, developers, and experiments in data challenges

Framework provides an easy overview of the computing capabilities at our computing center

It is alive and is currently being improved to suit user needs, to provide centralized information, and to provide more functionality