Introduction
LHCOPN dashboard (proposal functional design)
Monitor Working Group:
• Initiated in Bologna, 10th & 11th December 2009
• WLCG MB mandate (see URL below)
• First meeting 22nd January 2010
• TC 26th May 2010
• TC 15th June 2010
• Barcelona 28th and 29th June 2010: first proposal
Chairman: John Shade (CERN)
Website: https://twiki.cern.ch/twiki/bin/view/LHCOPN/MonWG
The full version of the functional design proposal is available at the above URL.
My name: Hanno Pet <[email protected]> (NL-T1 / SARA)
SARA Computing & Networking service, 25-6-2010
LHC experiments and WLCG users do not have enough insight into the functioning of the LHCOPN because:
• Monitoring is decentralized at T0/T1 sites
• Monitoring is not accessible to them
The dashboard should solve these problems!
Requirements (1/4)
The requirements of the dashboard are as follows:
• Must only provide information about the LHCOPN keeping in mind the way application layers are using the LHCOPN. This means a full mesh of measurements is required
• Must provide correct and up to date information about each site’s IPv4 connectivity in the LHCOPN
• Must be simple for the LHC experiments and the WLCG user community
• Must provide more in-depth information for the T0/T1 sites router operators. The router operators must be able to drill down into the dashboard to see which measurements are causing the degraded or down status
Requirements (2/4)
• Must display a full mesh of end-to-end IPv4 unicast connectivity in the LHCOPN between each T0/T1 site
• Must use the application programming interface (API) of the perfSONAR-MDM measurement points to collect the data which is necessary for the functioning of the dashboard
• Must collect and display One Way Delay data gathered by the perfSONAR-MDM measurement points (and other parameters in the future)
• Must store (historical) data in its own database
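The full-mesh requirement above can be illustrated with a minimal sketch. It assumes the site list is the set of names that appear in this proposal's example weekly view; the function name `full_mesh` is hypothetical and not part of perfSONAR-MDM. Each direction is a separate path because One Way Delay is measured per direction.

```python
from itertools import permutations

# Site names as they appear in the example weekly view of this proposal
# (assumption: this is the set of sites the dashboard has to cover).
SITES = [
    "CH-CERN", "IT-INFN-CNAF", "US-BNL", "US-FNAL-CMS", "FR-CCIN2P3",
    "DE-KIT", "NDGF", "NL-T1", "ES-PIC", "UK-T1-RAL", "TW-ASGC", "CA-TRIUMF",
]


def full_mesh(sites):
    """Return every ordered (source, destination) pair of distinct sites."""
    return list(permutations(sites, 2))


pairs = full_mesh(SITES)
print(len(pairs))  # 12 sites give 12 * 11 = 132 directed paths
```

With n sites the dashboard has to track n * (n - 1) directed paths, which is why a simple top-level view with drill-down is needed.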
Requirements (3/4)
• Must add new data from perfSONAR-MDM measurement points to its own database every <to be defined> minute(s)
• Must refresh dashboard status each <to be defined> minute
• Must provide an API for T0/T1 sites to generate alarms in their own NMS
• Must be able to make end-to-end IPv4 unicast connectivity reports
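As a rough illustration of the storage and reporting requirements above, the following sketch keeps collected samples in a local SQLite database and derives a connectivity report from it. The table layout, the function names and the 60-second spacing of the example timestamps are all assumptions made for illustration; the actual collection interval is still to be defined, and fetching data from the perfSONAR-MDM API is not shown.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the dashboard's own database (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS owd_sample ("
        " src TEXT NOT NULL,"
        " dst TEXT NOT NULL,"
        " ts INTEGER NOT NULL,"        # start of the interval, Unix seconds
        " received INTEGER NOT NULL,"  # 1 if any OWAMP packet arrived, else 0
        " PRIMARY KEY (src, dst, ts))"
    )
    return conn


def add_samples(conn, rows):
    """Insert (src, dst, ts, received) rows; re-collected intervals are overwritten."""
    conn.executemany("INSERT OR REPLACE INTO owd_sample VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def availability_report(conn, start_ts, end_ts):
    """Return {(src, dst): percentage of stored intervals with connectivity}."""
    cur = conn.execute(
        "SELECT src, dst, 100.0 * SUM(received) / COUNT(*) "
        "FROM owd_sample WHERE ts >= ? AND ts < ? GROUP BY src, dst",
        (start_ts, end_ts),
    )
    return {(src, dst): pct for src, dst, pct in cur}


conn = open_store()
add_samples(conn, [
    ("NL-T1", "CH-CERN", 0, 1),
    ("NL-T1", "CH-CERN", 60, 1),
    ("NL-T1", "CH-CERN", 120, 0),  # no packets arrived in this interval
    ("NL-T1", "CH-CERN", 180, 1),
])
report = availability_report(conn, 0, 240)
print(report)
```

Keeping the historical data locally means reports can be generated without querying the measurement points again.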
Requirements (4/4)
• Must be accessible via a web (https) interface for the LHC experiments and WLCG users with a grid certificate
• More detailed information will be available for the T0/T1 sites router operators with a grid certificate
• Must provide an explanation of the impact if end to end IPv4 unicast connectivity between two sites becomes degraded or down or if no data is available
Current perfSONAR-MDM implementation in LHCOPN (1/2)
The GEANT application service desk has installed perfSONAR-MDM measurement points at each T0/T1 site with the following applications/tools:
• Weathermap based on End to End Monitoring (E2EMON) information
• E2EMON information (no E2EMON measurement point)
• perfSONAR User Interface (UI)
• Alarm Service (Prototype based on Nagios)
Current perfSONAR-MDM implementation in LHCOPN (2/2)
• Hades Performance Measurements
• Bandwidth Test Control / Achievable Bandwidth (BWCTL, automated 1Gbit/s TCP Bandwidth Control Test)
• One Way Delay (OWD) measurements using OWAMP
• One Way Delay Variance / Jitter (OWDV) measurements using OWAMP
• Packet loss (measured between Hades nodes)
• Traceroute (number of hops between each pair of Hades nodes)
• Possibly duplicate packets (measured between Hades nodes)
• Possibly out of order packets (measured between Hades nodes)
Current perfSONAR-MDM setup and future dashboard
Dashboard approach
The first version of the dashboard must be based on:
• The “keep it simple” principle
• The data which perfSONAR-MDM is already collecting at the moment
Proposal is to use One Way Delay (OWD) (using One Way Active Measurement Protocol (OWAMP)) to make the first version of the dashboard to “monitor” end-to-end IPv4 connectivity between each site in the LHCOPN (full mesh).
So OWAMP is “only” used to monitor connectivity and not yet used to monitor the delay itself.
Later versions of the dashboard could include parameters that are new(er) to perfSONAR-MDM (i.e. packet loss, traceroute, achievable bandwidth, interface status, BGP status, OWD and OWDV)
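The first-version idea, using OWAMP results purely as a connectivity signal, might be sketched as follows. The input format (one list of delay values per measurement interval) and the function name `availability` are assumptions for illustration; they do not describe the actual perfSONAR-MDM data format.

```python
def availability(intervals):
    """Return the percentage of intervals in which connectivity was observed.

    `intervals` is a list with one entry per measurement interval; each entry
    is the list of one-way delay values (in seconds) received in that interval.
    An interval counts as connected if at least one packet arrived. The delay
    values themselves are deliberately not inspected.
    Returns None when there are no intervals at all (no data).
    """
    if not intervals:
        return None
    connected = sum(1 for delays in intervals if len(delays) > 0)
    return 100.0 * connected / len(intervals)


# Four intervals, one of them without any received packet.
samples = [[0.012, 0.013], [0.012], [], [0.014]]
print(availability(samples))  # 75.0
```

Because only the presence of packets matters, a path with a high but stable delay would still count as fully available in this first version.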
How it might look (1/3) (current view)
End to End IPv4 unicast connectivity availability (current view)
End to End IPv4 unicast connectivity availability weekly view 17-6-2010
1 2 3 4 5 6 7 Availability
From IT-INFN-CNAF to US-BNL 95%
From IT-INFN-CNAF to CH-CERN 100%
From IT-INFN-CNAF to US-FNAL-CMS 85%
From IT-INFN-CNAF to FR-CCIN2P3 100%
From IT-INFN-CNAF to DE-KIT 100%
From IT-INFN-CNAF to NDGF 100%
From IT-INFN-CNAF to NL-T1 85%
From IT-INFN-CNAF to ES-PIC 100%
From IT-INFN-CNAF to UK-T1-RAL 100%
From IT-INFN-CNAF to TW-ASGC 100%
From IT-INFN-CNAF to CA-TRIUMF 96%
Status on the dashboard
The status of the end-to-end IPv4 unicast connectivity between sites must be shown on the dashboard in the following way:
• Normal, availability of the end-to-end IPv4 unicast connectivity between sites A and B is 100% in the given timeframe
• Degraded, availability of the end-to-end IPv4 unicast connectivity between sites A and B is less than 100% but above 0% in the given timeframe
• Down, availability of the end-to-end IPv4 unicast connectivity between sites A and B is 0% in the given timeframe
• No data, the dashboard server can connect to the perfSONAR-MDM measurement point on site but receives no data from the measurement archives.
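The four states above can be expressed as a small classification function. This is a minimal sketch: the names `Status` and `classify` are hypothetical, `None` is used as an assumed marker for "no data", and 0% is treated as Down rather than Degraded so that the two states do not overlap.

```python
from enum import Enum


class Status(Enum):
    NORMAL = "Normal"
    DEGRADED = "Degraded"
    DOWN = "Down"
    NO_DATA = "No data"


def classify(availability_percent):
    """Map an availability percentage for one timeframe to a dashboard status.

    `None` means the measurement archive returned no data for the timeframe.
    """
    if availability_percent is None:
        return Status.NO_DATA
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    if availability_percent == 100:
        return Status.NORMAL
    if availability_percent == 0:
        return Status.DOWN
    return Status.DEGRADED


# Values taken from the example weekly view.
print(classify(100).value)  # Normal
print(classify(95).value)   # Degraded
```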
Notifications
Notification should be done via:
• E-mail
• RSS-feeds
• API for integration into T0/T1 site NMS systems for raising alarms
• Grid Notifications for LHC experiments
We need to discuss this with grid notification experts at the LHC experiments and ask them how they would integrate this in their dashboards.
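Whatever the delivery channel, notifications would probably be driven by status changes per path. The sketch below assumes the dashboard keeps the previous and current status of each directed path in a dictionary; the function names, the event fields and the message wording are hypothetical, and the actual sending via e-mail, RSS or an NMS API is not shown.

```python
import json


def status_changes(previous, current):
    """Return one event per path whose status differs from the previous refresh."""
    events = []
    for (src, dst) in sorted(current):
        old = previous.get((src, dst))
        new = current[(src, dst)]
        if old != new:
            events.append({"src": src, "dst": dst, "old": old, "new": new})
    return events


def to_json(event):
    """Machine-readable form, e.g. for a site NMS polling the dashboard API."""
    return json.dumps(event, sort_keys=True)


def to_text(event):
    """Human-readable form, e.g. for an e-mail subject or an RSS item title."""
    return "LHCOPN path {src} -> {dst}: {old} -> {new}".format(**event)


previous = {
    ("IT-INFN-CNAF", "NL-T1"): "Normal",
    ("IT-INFN-CNAF", "CH-CERN"): "Normal",
}
current = {
    ("IT-INFN-CNAF", "NL-T1"): "Degraded",
    ("IT-INFN-CNAF", "CH-CERN"): "Normal",
}
events = status_changes(previous, current)
for event in events:
    print(to_text(event))
    print(to_json(event))
```

Deriving every channel from the same event keeps the e-mail, RSS and API views consistent with each other.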
Questions
Interesting to know:
• Is this the right direction for the dashboard?
• Is perfSONAR-MDM able to support this?
• Is it possible to use OWAMP like this?
• Are T0/T1 sites going to use this?
• Are the LHC experiments going to use this?
• Are WLCG users (physicists) going to use this?
• Do we agree on the functional design?
WRAP UP
Read the full version of the functional design!
Please send your comments about this functional design to [email protected] before the 5th of July 2010!!