Page 1:

Management of the LHCb DAQ Network

Guoming Liu*†, Niko Neufeld*

* CERN, Switzerland; † University of Ferrara, Italy

Page 2:

Outline

Introduction to LHCb DAQ system

Network Monitoring based on SCADA system

Network Configuration

Network Debugging

Status of LHCb network installation and deployment

Page 3:

LHCb online system

The LHCb Online system consists of three major components:

Data Acquisition (DAQ): transfers the event data from the detector front-end electronics to the permanent storage

Timing and Fast Control (TFC): drives all stages of the data readout of the LHCb detector between the front-end electronics and the online processing farm

Experiment Control System (ECS): controls and monitors all parts of the experiment: the DAQ system, the TFC system, the High Level Trigger farm, the Detector Control System, the experiment's infrastructure, etc.

Page 4:

LHCb online system

[Diagram: LHCb online system. The front-end electronics (FEE) of the sub-detectors (VELO, ST, OT, RICH, ECal, HCal, Muon) feed the readout boards, which send the event data through the readout network switches to the HLT farm CPUs for event building; data also flow to the MON farm and to CASTOR storage. The TFC system distributes the LHC clock, the L0 trigger and the MEP requests, and the Experiment Control System (ECS) exchanges control and monitoring data with all components.]

Page 5:

LHCb online network

Two large-scale Ethernet networks:

DAQ network: dedicated to data acquisition

Control network: for the instruments and computers in the LHCb experiment

In total: ~170 switches, ~9000 ports

[Diagram: LHCb online network topology, showing the DAQ, control and monitoring networks: TELL1/UKL1 readout boards, TFC and L0 trigger systems connected through edge switches; the core DAQ switch (sw-daq-01) and control core (sw-ux-01); aggregation switches (sw-agg-01/02); HLT farm, calibration farm and monitoring switches; storage aggregation switches; and 10 G data aggregation links.]

Page 6:

LHCb DAQ network

DAQ works in a push mode

Components:

Readout boards: TELL1/UKL1, ~330 in total

Aggregation switches

Core DAQ switch: Force10 E1200i, supports up to 1260 GbE ports, switch capacity 3.5 Tb/s

Edge switches

[Diagram: DAQ data flow. ~330 readout boards feed the aggregation switches and the core switch, which distributes the event data through ~50 edge switches to the HLT CPUs; accepted events pass through the storage aggregation to CASTOR.]

Page 7:

LHCb DAQ network

Protocols:

Readout: MEP, a light-weight datagram protocol over IP

Storage: standard TCP/IP

Network throughputs:

Readout: ~35 GByte/s (~280 Gb/s), from an L0 trigger accept rate of 1 MHz and an average event size of ~35 kByte

Storage: ~70 MByte/s (~560 Mb/s), from an HLT accept rate of ~2 kHz
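
As a quick cross-check of these numbers, the short sketch below (plain Python, with the rates and event size taken from this slide) reproduces the readout and storage bandwidth figures; it is illustrative arithmetic only, not part of the online software.

    # Back-of-the-envelope check of the DAQ throughput figures quoted above.
    # All values are taken from the slide; nothing here queries the real system.
    L0_ACCEPT_RATE_HZ = 1.0e6      # L0 trigger accept rate: 1 MHz
    HLT_ACCEPT_RATE_HZ = 2.0e3     # HLT accept rate: ~2 kHz
    AVG_EVENT_SIZE_BYTES = 35.0e3  # average event size: ~35 kByte

    def rate_to_bandwidth(event_rate_hz, event_size_bytes):
        """Return (GByte/s, Gbit/s) for a given event rate and event size."""
        byte_rate = event_rate_hz * event_size_bytes
        return byte_rate / 1e9, byte_rate * 8 / 1e9

    readout_gbyte, readout_gbit = rate_to_bandwidth(L0_ACCEPT_RATE_HZ, AVG_EVENT_SIZE_BYTES)
    storage_gbyte, storage_gbit = rate_to_bandwidth(HLT_ACCEPT_RATE_HZ, AVG_EVENT_SIZE_BYTES)

    print(f"Readout: {readout_gbyte:.0f} GByte/s = {readout_gbit:.0f} Gb/s")              # ~35 GByte/s, ~280 Gb/s
    print(f"Storage: {storage_gbyte * 1e3:.0f} MByte/s = {storage_gbit * 1e3:.0f} Mb/s")  # ~70 MByte/s, ~560 Mb/s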

Page 8:

Network Monitoring

Part of the LHCb ECS:

Uses the same tool and framework

Provides the same operation interface

Implementation:

Monitoring and integration: PVSS and JCOP

Data collection: various front-end processors

Data exchange: Distributed Information Management (DIM)

Page 9:

Network Monitoring

Monitoring the status of the LHCb DAQ network at different levels: topology, IP routing, traffic, hardware/system

[Figure: Architecture of the Network Monitoring]

Page 10:

Network Monitoring

Monitoring the status of the LHCb DAQ network at different levels: topology, IP routing, traffic, hardware/system

[Figure: Structure of the Finite State Machine for Network Monitoring]

Page 11:

Network Monitoring: Topology

The topology is quite “static”

NeDi: an open-source tool to discover the network

Discovery of the network topology, based on the Link Layer Discovery Protocol (LLDP): queries the neighbors of the seed device, then the neighbors of those neighbors, and so on until all devices in the network have been discovered (see the sketch below)

Discovery of the network nodes

All information is stored in the database and can be queried by PVSS

PVSS monitors the topology only (the uplinks between the switches); the nodes are monitored by Nagios
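
The neighbor-walking discovery described above is essentially a breadth-first search over LLDP adjacencies. The sketch below illustrates the idea in Python; the get_neighbors callable is a hypothetical stand-in for NeDi's actual SNMP/LLDP query, not its real code.

    from collections import deque

    def discover_topology(seed, get_neighbors):
        """Breadth-first walk of the network starting from a seed switch.
        'get_neighbors' returns the LLDP neighbors of a device; in NeDi this
        corresponds to querying the switch's LLDP information via SNMP."""
        visited, links = set(), []
        queue = deque([seed])
        while queue:
            device = queue.popleft()
            if device in visited:
                continue
            visited.add(device)
            for neighbor in get_neighbors(device):
                links.append((device, neighbor))   # record the uplink
                if neighbor not in visited:
                    queue.append(neighbor)
        return visited, links

    # Toy adjacency just to exercise the walk (not the real LHCb topology):
    toy = {"core": ["agg1", "agg2"], "agg1": ["edge1"], "agg2": ["edge2"]}
    devices, uplinks = discover_topology("core", lambda d: toy.get(d, []))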

Page 12:

Network Monitoring: IP routing

Monitoring the status of the routing with the Internet Control Message Protocol (ICMP), specifically “ping”

Three stages for the DAQ:

The entire readout of an event, from the readout board to the HLT farm. ICMP is not fully implemented in the readout board, so a general-purpose computer is inserted to simulate the readout board: it tests the status of the readout board using “arping” and the availability of the HLT nodes using “ping”

Selected events from the HLT to the LHCb online storage

From the online storage to CERN CASTOR

The front-end script gets the results and exchanges the messages with PVSS using DIM (see the sketch below)
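
A minimal sketch of such a front-end check, assuming the standard Linux ping and arping utilities; the host and interface names are placeholders, and the DIM publication is only indicated by a comment.

    import subprocess

    def ping_ok(host, count=1, timeout_s=2):
        """Return True if 'host' answers an ICMP echo request (Linux 'ping')."""
        result = subprocess.run(
            ["ping", "-c", str(count), "-W", str(timeout_s), host],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return result.returncode == 0

    def arping_ok(host, interface, count=1):
        """Return True if 'host' answers an ARP request ('arping', typically needs root)."""
        result = subprocess.run(
            ["arping", "-c", str(count), "-I", interface, host],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return result.returncode == 0

    # Placeholder names, not the real LHCb hosts:
    readout_board_reachable = arping_ok("tell1-example", "eth0")
    hlt_node_reachable = ping_ok("hlt-node-example")
    # The results would then be published to PVSS via DIM by the front-end script.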

Page 13:

Network Monitoring: traffic

Front-end processors:

Collect all the interface counters from the network devices using SNMP: input and output traffic; input and output errors and discards (see the sketch below)

Exchange the data as a DIM server

PVSS:

Receives the data via the PVSS-DIM bridge

Analyzes the traffic and archives it

Displays the current status and the trending of the bandwidth utilization

Issues alarms in case of error
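
As an illustration of the counter collection, the sketch below polls the standard IF-MIB octet and error counters with the Net-SNMP command-line tools; the host name and community string are placeholders, and the DIM publication is only hinted at in a comment.

    import subprocess

    # Standard IF-MIB counter OIDs (indexed by interface):
    OIDS = {
        "ifInOctets":  "1.3.6.1.2.1.2.2.1.10",
        "ifOutOctets": "1.3.6.1.2.1.2.2.1.16",
        "ifInErrors":  "1.3.6.1.2.1.2.2.1.14",
        "ifOutErrors": "1.3.6.1.2.1.2.2.1.20",
    }

    def snmp_get_counter(host, community, oid, if_index):
        """Read one counter value with Net-SNMP's snmpget."""
        out = subprocess.check_output(
            ["snmpget", "-v2c", "-c", community, "-Oqv", host, f"{oid}.{if_index}"],
            text=True)
        return int(out.split()[-1])

    def poll_interface(host, community, if_index):
        """Return a dict of counter values for one switch port."""
        return {name: snmp_get_counter(host, community, oid, if_index)
                for name, oid in OIDS.items()}

    # Placeholder host and community, not the production values:
    # counters = poll_interface("sw-example", "public", 1)
    # The front-end processor then publishes these values as a DIM service.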

Page 14:

Network Monitoring: traffic

Page 15:

Network Monitoring: hardware/system

A syslog server is set up to receive the syslog messages from the network devices and parse them.

When a network device runs into problems, error messages are generated and sent to the syslog server, as configured in the device:

Hardware: temperature, fan status, power supply status

System: CPU, memory, login authentication, etc.

Syslog can collect some information not covered by SNMP

All the collected messages are communicated to PVSS (see the sketch below)
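
For illustration, a minimal UDP syslog receiver along these lines could look as follows (standard syslog port 514; the severity filter and the forwarding to PVSS are only indicated, this is not the actual LHCb implementation).

    import re
    import socket

    # A syslog message starts with a priority field "<PRI>"; severity = PRI % 8,
    # where severities 0-3 mean emergency/alert/critical/error.
    PRI_RE = re.compile(r"^<(\d+)>")

    def run_syslog_server(bind_addr="0.0.0.0", port=514):
        """Receive syslog datagrams and flag error-level messages (port 514 needs root)."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind((bind_addr, port))
        while True:
            data, (src_ip, _) = sock.recvfrom(4096)
            message = data.decode(errors="replace")
            match = PRI_RE.match(message)
            severity = int(match.group(1)) % 8 if match else None
            if severity is not None and severity <= 3:
                # Error or worse: this is where it would be forwarded to PVSS.
                print(f"ERROR from {src_ip}: {message.strip()}")

    # run_syslog_server()  # the real server also parses the hardware/system fields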

Page 16:

Network Configuration

The LHCb online network system is quite large: different devices with different operating systems and command sets. Luckily it is also quite static, and only a few features are essential for configuring the network devices.

Currently a set of Python scripts is used for configuring the network devices, using the pexpect module for interactive CLI access (see the sketch below):

Initial setup of a newly installed switch

Firmware upgrade

Configuration file backup and restore
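
The flavor of such a script is sketched below for the configuration-backup case, assuming an SSH login with a password prompt; the prompts, command and credentials are generic placeholders rather than the actual LHCb scripts or any vendor's exact CLI.

    import pexpect

    def backup_config(host, user, password, outfile):
        """Log in to a switch and save its running configuration (generic CLI)."""
        child = pexpect.spawn(f"ssh {user}@{host}", timeout=30)
        child.expect("[Pp]assword:")
        child.sendline(password)
        child.expect("#")                       # placeholder for the privileged prompt
        child.sendline("show running-config")   # generic command, adapt to the vendor CLI
        child.expect("#")
        with open(outfile, "w") as f:
            f.write(child.before.decode(errors="replace"))
        child.sendline("exit")
        child.close()

    # backup_config("sw-example", "admin", "secret", "sw-example.cfg")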

Page 17:

Network Configuration

NeDi CLI access:

Web-based interface

Possible to select a set of switches by type, IP, name, etc.

Can apply a batch of commands to the selected switches

Page 18:

Network Diagnostics Tools

sFlow sampler:

sFlow is a mechanism to capture packet headers and collect statistics from the device, especially in high-speed networks

Samples the packets on a switch port and displays the header information

Very useful for debugging packet-loss problems, e.g. those caused by a wrong IP or MAC address

Relatively high-speed traffic monitoring:

Queries the counters of selected interfaces using SNMP or the CLI, with a finer time resolution

Shows the utilization of the selected interfaces (see the sketch below)
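
The fine-grained utilization display boils down to differencing the octet counters between two polls. A sketch of that calculation in Python, reusing a counter-polling function such as the poll_interface() sketch shown earlier; the link speed and poll interval are illustrative values, and counter wrap-around is ignored for brevity.

    import time

    def utilization_percent(poll, link_speed_bps=1e9, interval_s=1.0):
        """Poll an interface twice and return (rx %, tx %) of the link speed.
        'poll' is any callable returning a dict with ifInOctets/ifOutOctets."""
        first = poll()
        time.sleep(interval_s)
        second = poll()
        rx_bps = (second["ifInOctets"] - first["ifInOctets"]) * 8 / interval_s
        tx_bps = (second["ifOutOctets"] - first["ifOutOctets"]) * 8 / interval_s
        return 100.0 * rx_bps / link_speed_bps, 100.0 * tx_bps / link_speed_bps

    # Example with placeholder host and port index:
    # rx, tx = utilization_percent(lambda: poll_interface("sw-example", "public", 1))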

Page 19:

Status of Network Installation and Deployment

Current setup:

2 aggregation switches

Only 2 linecards inserted in the core DAQ switch

For an L0 trigger rate of ~200 kHz

Upgrade for 1 MHz full-speed readout:

Core DAQ switch: Force10 E1200i, 14 linecards, 1260 GbE ports, will be ready at the end of June

Upgrade from TeraScale to ExaScale: doubles the switch capacity, and all ports run at line rate

All readout boards will be connected to the core DAQ switch directly

Page 20: