CERN IT Department
CH-1211 Geneva 23
Switzerland
www.cern.ch/it
Open projects in Grid Monitoring
IT-GS-MDS Section Meeting
25th January 2008
WLCG Monitoring Working Groups
• 3 groups proposed by Ian Bird to the LCG Management Board, Oct 06.
– Goal to improve the reliability of the WLCG grid
• Grid Services
– Grid sensors
– Transport
– Repositories
– Views
– …
• System Management
– Fabric management
– Best Practices
– Security
– …
• System Analysis
– Application monitoring
– …
Reliability is our reason
• Our goal is to improve the reliability of the Grid
• WLCG availability level for a Tier-2 is 95%
– Greater for Tier-0 & Tier-1s
• What do we need to do?
– Detect problems before users do!
– Reduce time to respond to problems
• Approach is to put the monitoring and alarms close to the site administrators
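To make the 95% Tier-2 figure concrete, the sketch below converts an availability target into a downtime budget. It assumes availability is a plain uptime fraction over a fixed period; the slides do not define how WLCG computes it, and the function names, the 30-day period, and the sample uptime values are illustrative only.

```python
def downtime_budget_hours(availability_target: float, period_hours: float) -> float:
    """Hours a service may be unavailable in a period and still meet the target.

    Assumption: availability = uptime / period, with no special handling
    of scheduled downtime.
    """
    if not 0.0 <= availability_target <= 1.0:
        raise ValueError("availability_target must be between 0 and 1")
    if period_hours <= 0:
        raise ValueError("period_hours must be positive")
    return (1.0 - availability_target) * period_hours


def meets_target(up_hours: float, period_hours: float, availability_target: float) -> bool:
    """True if the measured uptime fraction reaches the target."""
    if period_hours <= 0:
        raise ValueError("period_hours must be positive")
    return up_hours / period_hours >= availability_target


TIER2_TARGET = 0.95            # Tier-2 level quoted on the slide
HOURS_IN_30_DAYS = 30 * 24     # illustrative reporting period

budget = downtime_budget_hours(TIER2_TARGET, HOURS_IN_30_DAYS)
print(f"Downtime budget over 30 days at 95%: {budget:.1f} hours")
print("690 h up meets target:", meets_target(690, HOURS_IN_30_DAYS, TIER2_TARGET))
print("680 h up meets target:", meets_target(680, HOURS_IN_30_DAYS, TIER2_TARGET))
```

Under these assumptions a Tier-2 site has about a day and a half of downtime per month to spend, which is why detecting problems early and shortening response time matter.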
High-level Model
See https://twiki.cern.ch/twiki/pub/LCG/GridServiceMonitoringInfo/0702-WLCG_Monitoring_for_Managers.pdf for details
[Diagram: high-level monitoring model. Labels shown: LEMON, Nagios, SAM, R-GMA, SAME-WS, GridView, Experiment Dashboard, GridIce, HTTP, LDAP, GOCDB, Dashboard]
Projects
• Nagios Site Monitoring (Emir)
– NCG rewrite, local tests on service (Emir)
– Improved Publishers (Pranabesh)
– Yaim for Nagios
• Messaging Infrastructure (James, Daniel)
• OSG-SAM Integration using Messaging (Piotr, Arvind Gopu, Rob Quick)
• GridMap (Max)
– Dashboard Integration
• GridView (4xBARC)
– Including quattorization
– GridView using Messaging for producers (GridView)
Projects
• SLS/Nagios Integration (Joanna ASGC, Sebastien Lopienski)
• APEL using Messaging (STFC, Piotr)
• RDF Schema for monitoring (Piotr, …)
• New SAM Portal (IT-GS, …)
– (using CMS SAM Work?)
• Management Dashboard (John Shade, …)
• LEMON site monitoring (James)
• GOCDB as Topology Database (STFC effort, 1 BARC from Feb'08)
• "Service Cards" (Oliver Keeble, 1 BARC from Feb'08)
CCRC Reporting requirements
Measuring according to MoU
• WLCG MoU is what sites have agreed to
– But we don’t measure it right now!
CCRC’08 GridMap
• Combines Production Status of service with availability
– And dashboard metrics (in a 3rd map)
Summary
• We’re involved in many projects
– Most of the effort is external
– CERN does architecture, project management, coordination
• Main areas
– Nagios site monitoring
– Messaging for monitoring
– SAM/GridView futures
– CCRC’08 and WLCG operational monitoring