AFFAIR – fabric monitoring
ROOT 2005
Tome Antičić (tome.anticic@cern.ch)
Ruđer Bošković Institute, Zagreb, Croatia
ALICE, CERN
AFFAIR: a flexible fabric and application information recorder
Why/What is AFFAIR?
[Figure: ALICE data acquisition (DATE) architecture. Labels recoverable from the diagram: detectors (inner tracking system, TPC, TRD, particle identification, muon, trigger detectors); trigger system with L0, L1 and L2 triggers (1.2 msec, 5.5 msec, 88.0 msec) and trigger data; DDLs feeding RORCs in the LDCs (multiplicities x 278, x 435, x 334 and 216 appear on the diagram); multi-event buffers; L3 trigger; EDM; Gigabit switch; GDCs (x 50); switch to PDS at 1.25 GB/s. Link rates shown: 40 Mb/s, 60 Mb/s, 100 Mb/s.]
DDL - Detector Data Link
RORC - Read-Out Receiver Card
LDC - Local Data Concentrator
GDC - Global Data Collector
EDM - Event Destination Manager
PDS - Permanent Data Storage
AFFAIR can also run in standalone mode (without DATE)
Requirements
Monitor system performance (bandwidth, CPU, disk usage, …)
Monitor DATE performance (LDC/GDC/DDL bandwidth, events recorded, …)
Updates needed every 10 sec, or even less
Should be as "invisible" as possible:
  No growing logfiles (or better yet, none) on monitored nodes
  Not CPU intensive
  Not network intensive
Web access to processed, real-time data in the form of graphs, histograms, …
Scalable: should work equally well for 10 as for 1000 computers
All monitored data should be permanently stored for offline analysis
Has to work with no lost data, no crashes, and no maintenance
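As an illustration of the bandwidth requirement, here is a minimal sketch of how a collector might turn two cumulative byte-counter readings, taken about 10 sec apart, into a rate. The names `Sample` and `bandwidth` and the wrap-around handling are assumptions made for this example; the slides do not show how AFFAIR itself computes its values.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One reading of a cumulative counter (hypothetical structure)."""
    timestamp: float  # seconds
    byte_count: int   # cumulative bytes seen so far


def bandwidth(prev: Sample, curr: Sample, counter_bits: int = 64) -> float:
    """Average bytes per second between two samples.

    Tolerates a single wrap-around of a counter that is
    `counter_bits` wide.
    """
    dt = curr.timestamp - prev.timestamp
    if dt <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr.byte_count - prev.byte_count
    if delta < 0:
        # The counter wrapped between the two readings.
        delta += 1 << counter_bits
    return delta / dt
```

Keeping only the previous sample in memory means the monitored node needs no logfile at all, which fits the "invisible" requirement above.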
So some choices were made which may not be optimal, but get the job done:
Round robin: an excellent way to write/read a file fast and easily, with no performance loss
  Works with a fixed amount of data (fixed time depth), so the file size does not change
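The round-robin idea can be sketched as a file with a fixed number of slots, where each new record overwrites the oldest one. This is only an illustration of the technique: the class name `RoundRobinFile`, the header layout and the record format are assumptions for this example, not the storage format AFFAIR actually uses.

```python
import os
import struct

RECORD = struct.Struct("<dd")  # (timestamp, value), 16 bytes per slot
HEADER = struct.Struct("<QQ")  # (slot count, total records ever written)


class RoundRobinFile:
    """Fixed-size file holding only the most recent `slots` records."""

    def __init__(self, path, slots):
        self.path = path
        if not os.path.exists(path):
            # Pre-allocate every slot so the file never grows afterwards.
            with open(path, "wb") as f:
                f.write(HEADER.pack(slots, 0))
                f.write(b"\x00" * (RECORD.size * slots))
        # For an existing file, the slot count stored in its header wins.
        with open(path, "rb") as f:
            self.slots, self.written = HEADER.unpack(f.read(HEADER.size))

    def append(self, timestamp, value):
        """Overwrite the oldest slot with a new record."""
        slot = self.written % self.slots
        with open(self.path, "r+b") as f:
            f.seek(HEADER.size + slot * RECORD.size)
            f.write(RECORD.pack(timestamp, value))
            self.written += 1
            f.seek(0)
            f.write(HEADER.pack(self.slots, self.written))

    def read(self):
        """Return the stored records, oldest first."""
        count = min(self.written, self.slots)
        records = []
        with open(self.path, "rb") as f:
            for i in range(self.written - count, self.written):
                f.seek(HEADER.size + (i % self.slots) * RECORD.size)
                records.append(RECORD.unpack(f.read(RECORD.size)))
        return records
```

With one slot per update interval, the slot count fixes the time depth: for example, 360 slots at 10 sec updates would cover one hour, and the file size stays constant no matter how long the monitor runs.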
All graphs are created using one configuration file
  It completely defines units, labels, whether graphs are aggregated, and whether graphs are superimposed
  Thus no code intervention is needed to create the plots
New monitored variables can be added and configured easily
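To show what driving all plots from one configuration file could look like, here is a small sketch that reads graph definitions from an INI-style text. The file format, the section names and the keys (`label`, `units`, `aggregate`, `superimpose`) are hypothetical, chosen to mirror the properties listed above; the slides do not show AFFAIR's real configuration syntax.

```python
import configparser

# Hypothetical configuration: one section per monitored variable.
EXAMPLE_CONFIG = """
[cpu_load]
label = CPU load
units = %
aggregate = no
superimpose = yes

[ldc_bandwidth]
label = LDC bandwidth
units = MB/s
aggregate = yes
superimpose = no
"""


def load_graph_specs(text):
    """Parse graph definitions into a dict keyed by variable name."""
    # Interpolation is disabled so a literal "%" can be used as a unit.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    specs = {}
    for name in parser.sections():
        section = parser[name]
        specs[name] = {
            "label": section.get("label", fallback=name),
            "units": section.get("units", fallback=""),
            "aggregate": section.getboolean("aggregate", fallback=False),
            "superimpose": section.getboolean("superimpose", fallback=False),
        }
    return specs
```

In such a scheme, adding a new monitored variable would amount to adding one more section to the file, with no change to the plotting code.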
GUI in progress
But not easy: as far as I am aware, rows of data cannot easily be added
Conclusion
AFFAIR successfully monitors hundreds of nodes
  Field tested in ALICE Data Challenges
ROOT is a huge part of it
It is a work in progress:
  Much more detailed offline analysis
  Add a feature to see performance data/plots on mobiles/palm pilots
  A lot more work on the GUI
  Add high/low warnings
  …