LHCb Trigger & DAQ: an Introductory Overview
Niko Neufeld, CERN/PH Department
Yandex, Moscow, July 3rd 2012
Seminar "Using modern information technologies to solve modern problems in particle physics", Yandex Moscow office, July 3, 2012

Niko Neufeld, CERN
Transcript
Page 1: Niko Neufeld "A 32 Tbit/s Data Acquisition System"

LHCb Trigger & DAQ: an Introductory Overview

Niko Neufeld, CERN/PH Department

Yandex, Moscow, July 3rd 2012

Page 2: Niko Neufeld "A 32 Tbit/s Data Acquisition System"

The Large Hadron Collider


Page 3: Niko Neufeld "A 32 Tbit/s Data Acquisition System"


Physics, Detectors, Trigger & DAQ


[Block diagram: rare physics needs many collisions → a high-rate collider and fast electronics; detector signals feed the Trigger and the Data Acquisition, trigger decisions steer the Event Filter, and the selected data flow to Mass Storage.]

Page 4: Niko Neufeld "A 32 Tbit/s Data Acquisition System"


The Data Acquisition Challenge at LHC

• 15 million detector channels
• sampled at 40 MHz
• ≈ 15,000,000 channels × 40,000,000 samples/s × 1 byte

• ≈ 600 TB/s
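As a back-of-the-envelope check of this arithmetic (a minimal sketch; the one byte per channel per sampling is the assumption already implicit in the slide's numbers):

```python
# Naive LHC detector data rate, assuming ~1 byte per channel per sampling
channels = 15_000_000        # detector channels
sampling_rate = 40_000_000   # Hz (LHC bunch-crossing rate)
bytes_per_sample = 1         # assumption behind the slide's estimate

rate = channels * sampling_rate * bytes_per_sample   # bytes per second
print(f"{rate / 1e12:.0f} TB/s")                     # -> 600 TB/s
```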


Page 5: Niko Neufeld "A 32 Tbit/s Data Acquisition System"


Should we read everything?

• A typical collision is “boring”
  – Although we also need some of these “boring” data as a cross-check, as a calibration tool, and for some important “low-energy” physics

• “Interesting” physics is about 6–8 orders of magnitude rarer (EWK & Top)

• “Exciting” physics involving new particles/discoveries is 9 orders of magnitude below σ_tot
  – 100 GeV Higgs: 0.1 Hz*
  – 600 GeV Higgs: 0.01 Hz

• We just need to efficiently identify these rare processes from the overwhelming background before reading out & storing the whole event

[Rate scale shown on the slide: total ≈ 10^9 Hz; ≈ 5 x 10^6 Hz; EWK: 20 – 100 Hz; ≈ 10 Hz]

*Note: this is just the production rate, properly finding it is much rarer!

Page 6: Niko Neufeld "A 32 Tbit/s Data Acquisition System"


Know Your Enemy: pp Collisions at 14 TeV at 10^34 cm^-2 s^-1

• σ(pp) = 70 mb → > 7 x 10^8 interactions/s (!)

• In ATLAS and CMS* 20 – 30 minimum-bias events overlap

• H → ZZ → 4 muons: the cleanest (“golden”) signature

Reconstructed tracks with pt > 25 GeV

And this (not the H though…)

repeats every 25 ns…

*) LHCb @ 4 x 10^33 cm^-2 s^-1 isn’t much nicer, and ALICE (Pb-Pb) is even busier
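A quick sanity check of these numbers (a minimal sketch; the number of filled bunches is an assumption of mine, not something stated on the slide):

```python
# Interaction rate and pile-up at nominal LHC conditions
sigma_pp   = 70e-27    # inelastic pp cross-section, 70 mb expressed in cm^2
luminosity = 1e34      # cm^-2 s^-1
spacing    = 25e-9     # bunch spacing in s

rate = sigma_pp * luminosity          # ~7e8 interactions per second
pileup_avg = rate * spacing           # averaged over all 25 ns slots
# Assumption: ~2808 filled bunches out of 3564 possible 25 ns slots
pileup = pileup_avg * 3564 / 2808

print(f"rate ≈ {rate:.1e}/s, pile-up ≈ {pileup:.0f} per filled crossing")
```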

Page 7: Niko Neufeld "A 32 Tbit/s Data Acquisition System"

Trivial DAQ with a real trigger 2


[Diagram: the Sensor feeds a Delay → ADC → Processing → storage chain and, in parallel, a Discriminator forming the Trigger; a Busy Logic (Set/Clear flip-flop with an AND-NOT gate) lets the Trigger Start the ADC only while the system is not busy; the ADC raises an Interrupt towards Processing, and Ready clears the busy state when done.]

Deadtime (%) is the ratio between the time the DAQ is busy and the total time.
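To make the deadtime definition concrete, here is a minimal simulation sketch (the Poisson trigger arrivals and the fixed per-event busy time are illustrative assumptions; the busy logic simply ignores triggers that arrive while the DAQ is busy):

```python
import random

def simulate_deadtime(trigger_rate=100e3, busy_time=5e-6, duration=1.0, seed=7):
    """Return (deadtime fraction, fraction of triggers accepted)."""
    rng = random.Random(seed)
    t, busy_until = 0.0, 0.0
    busy_total, triggers, accepted = 0.0, 0, 0
    while t < duration:
        t += rng.expovariate(trigger_rate)   # next trigger (Poisson arrivals)
        triggers += 1
        if t >= busy_until:                  # DAQ ready: accept and become busy
            accepted += 1
            busy_until = t + busy_time
            busy_total += busy_time
    return busy_total / duration, accepted / triggers

dead, eff = simulate_deadtime()
print(f"deadtime ≈ {dead:.1%}, accepted triggers ≈ {eff:.1%}")
```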

Page 8: Niko Neufeld "A 32 Tbit/s Data Acquisition System"

A “simple” 40 MHz track trigger – the LHCb PileUp system


Page 9: Niko Neufeld "A 32 Tbit/s Data Acquisition System"

Finding vertices in FPGAs

• Use r-coordinates of hits in Si-detector discs (detector geometry made for this task!)

• Find coincidences between hits on two discs

• Count & histogram
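The same idea expressed in software (a minimal sketch: the disc positions, the toy track model and the 5 mm histogram binning are illustrative assumptions, not the FPGA firmware; the straight-line extrapolation z_v = (r2·z1 − r1·z2)/(r2 − r1) is the geometry behind the coincidence):

```python
import random
from collections import Counter

Z1, Z2 = -220.0, -300.0     # assumed z positions (mm) of the two silicon discs

def vertex_z(r1, r2):
    """z where the straight line through (Z1, r1) and (Z2, r2) meets the beam axis."""
    return (r2 * Z1 - r1 * Z2) / (r2 - r1)

# Toy event: hits from straight tracks coming from two primary vertices
random.seed(3)
hits1, hits2 = [], []
for zv in (-15.0, 40.0):                    # true vertex positions (mm)
    for _ in range(20):
        slope = random.uniform(0.02, 0.3)   # dr/d|z| of a toy track
        hits1.append(slope * (zv - Z1))
        hits2.append(slope * (zv - Z2))

# Coincidences: histogram the extrapolated z of every hit pair, then count
histo = Counter()
for r1 in hits1:
    for r2 in hits2:
        if r2 > r1:                                       # pairs pointing back towards the beam spot
            histo[round(vertex_z(r1, r2) / 5) * 5] += 1   # 5 mm bins

for z, n in histo.most_common(3):                         # the most populated bins mark the vertices
    print(f"z ≈ {z:+.0f} mm: {n} coincidences")
```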


Page 10: Niko Neufeld "A 32 Tbit/s Data Acquisition System"

LHCb Pileup: finding multiple vertices and quality


Comparing with the “offline” truth (full tracking, calibration, alignment)

Page 11: Niko Neufeld "A 32 Tbit/s Data Acquisition System"

LHCb Pileup Algorithm

• Time-budget for this algorithm: about 2 µs

• Runs in conventional FPGAs in a radiation-safe area

• Limited to low pile-up (ok for LHCb)


Page 12: Niko Neufeld "A 32 Tbit/s Data Acquisition System"

After the Trigger: Detector Read-out and DAQ

Page 13: Niko Neufeld "A 32 Tbit/s Data Acquisition System"

DAQ design guidelines

• Scalability – change in event-size, luminosity (pile-up!)
• Robust (very little dead-time, high efficiency, non-expert operators) → intelligent control systems
• Use industry-standard, commercial technologies (long-term maintenance) → PCs, Ethernet
• Low cost → PCs, standard LANs
• High bandwidth (many Gigabytes/s) → use local area networks (LAN)
• “Creative” & “Flexible” (open for new things) → use software and reconfigurable logic (FPGAs)


Page 14: Niko Neufeld "A 32 Tbit/s Data Acquisition System"

One network to rule them all

• Ethernet, IEEE 802.3xx, has almost become synonymous with Local Area Networking
• Ethernet has many nice features: cheap, simple, cheap, etc…
• Ethernet does not:
  – guarantee delivery of messages
  – allow multiple network paths
  – provide quality of service or bandwidth assignment (albeit to a varying degree this is provided by many switches)
• Because of this raw Ethernet is rarely used; usually it serves as a transport medium for IP, UDP, TCP etc…


Ethernet flow control

• Flow-control in standard Ethernet is only defined between immediate neighbors
• A sending station is free to throw away Xoff’ed frames (and often does)

Page 15: Niko Neufeld "A 32 Tbit/s Data Acquisition System"

Generic DAQ implemented on a LAN


Typical numbers of pieces:

Detector                                     1
Custom links from the detector               1000
“Readout Units” for protocol adaptation      100 to 1000
Powerful core routers                        2 to 8
Edge switches                                50 to 100
Servers for event filtering                  > 1000

Page 16: Niko Neufeld "A 32 Tbit/s Data Acquisition System"


Congestion

2• "Bang" translates into

random, uncontrolled packet-loss

• In Ethernet this is perfectly valid behavior and implemented by many low-latency devices

• This problem comes from synchronized sources sending to the same destination at the same time

• Either a higher level “event-building” protocol avoids this congestion or the switches must avoid packet loss with deep buffer memories
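A rough estimate of the buffering this implies for a store & forward switch (a minimal sketch; the number of sources, fragment size and link speed are illustrative values, not the parameters of a specific experiment):

```python
# Worst case: N synchronized sources send a fragment to the same destination at once;
# everything that cannot drain through the output link must sit in switch buffers.
n_sources  = 300          # synchronized readout sources (illustrative)
fragment   = 2_000        # bytes per source and event (illustrative)
link_speed = 10e9 / 8     # 10 Gbit/s output link, in bytes/s

burst = n_sources * fragment           # data converging on one output port
drain_time = burst / link_speed        # time needed to forward the burst
buffer_needed = burst - fragment       # upper bound: all but one fragment queued

print(f"burst {burst/1e3:.0f} kB, drains in {drain_time*1e6:.0f} µs, "
      f"buffer ≥ {buffer_needed/1e3:.0f} kB on that output port")
```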


Page 17: Niko Neufeld "A 32 Tbit/s Data Acquisition System"


Push-Based Event Building with store & forward switching and load-balancing

Data Acquisition Switch, Event Manager, Event Builders EB1–EB3:

1. Event Builders notify the Event Manager of available capacity (“Send me an event!”)
2. The Event Manager ensures that data are sent only to nodes with available capacity (“Send next event to EB1”, “Send next event to EB2”, “Send next event to EB3”)
3. The readout system relies on feedback from the Event Builders

Sources do not buffer – so the switch must buffer to avoid packet loss due to overcommitment
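The control flow described above, as a minimal sketch in Python (the credit-based push scheme only; class and method names are illustrative, not actual DAQ software):

```python
from collections import deque

class EventManager:
    """Assigns each new event to a builder that has announced free capacity."""
    def __init__(self):
        self.credits = deque()                 # builder ids, one entry per free slot
    def announce_capacity(self, builder_id):   # "Send me an event!"
        self.credits.append(builder_id)
    def next_destination(self):                # "Send next event to EB<n>"
        return self.credits.popleft() if self.credits else None

class EventBuilder:
    def __init__(self, builder_id, n_sources, manager):
        self.id, self.n_sources, self.manager = builder_id, n_sources, manager
        self.partial = {}                      # event_id -> set of sources seen
        manager.announce_capacity(builder_id)  # initial credit
    def receive(self, event_id, source_id):
        seen = self.partial.setdefault(event_id, set())
        seen.add(source_id)
        if len(seen) == self.n_sources:        # event complete -> hand back a credit
            del self.partial[event_id]
            self.manager.announce_capacity(self.id)

# Tiny demo: 4 readout sources push fragments of 6 events to 3 builders
em = EventManager()
n_sources = 4
builders = {b: EventBuilder(b, n_sources, em) for b in range(3)}
for event_id in range(6):
    dest = em.next_destination()               # only nodes with capacity get data
    for source_id in range(n_sources):         # sources push, they never buffer
        builders[dest].receive(event_id, source_id)
    print(f"event {event_id} built on EB{dest}")
```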

Page 18: Niko Neufeld "A 32 Tbit/s Data Acquisition System"


LHCb DAQ

[Diagram of the LHCb DAQ: the front-end (FE) electronics of the sub-detectors (VELO, ST, OT, RICH, ECal, HCal, Muon, L0 Trigger) feed Readout Boards; the L0 trigger and the LHC clock drive the TFC system, which distributes Timing and Fast Control signals and collects MEP requests; the Readout Boards push event data at 55 GB/s through the readout network (core and edge switches) to the HLT CPU farm, where event building takes place; a monitoring (MON) farm receives a fraction of the data; 200 – 300 MB/s leave the farm towards storage; the whole system is supervised by the Experiment Control System (ECS), which carries control and monitoring data.]

Average event size: 55 kB
Average rate into farm: 1 MHz
Average rate to tape: 4 – 5 kHz

Page 19: Niko Neufeld "A 32 Tbit/s Data Acquisition System"

LHCb DAQ

• Events are very small (about 55 kB total) – each read-out board contributes about 200 bytes (only!!)
  – A UDP message on Ethernet takes 8 + 14 + 20 + 8 + 4 = 54 bytes of overhead → ~25% overhead(!)
• LHCb uses coalescence of messages, packing about 10 to 15 events into one message (called MEP) → message rate is ~ 80 kHz (c.f. CMS, ATLAS)
• Protocol is a simple, single-stage push; every farm-node builds complete events; the TTC system is used to assign IP addresses coherently to the read-out boards
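The effect of this coalescing in numbers (a minimal sketch using the figures from the slide; the choice of 12 events per MEP is simply the midpoint of the quoted 10 to 15):

```python
# Per-datagram overhead on Ethernet: preamble 8 + Ethernet 14 + IP 20 + UDP 8 + FCS 4
overhead   = 8 + 14 + 20 + 8 + 4     # = 54 bytes
fragment   = 200                     # bytes per read-out board and event
event_rate = 1_000_000               # Hz into the farm

# Without coalescing: one datagram per event and per board
print(f"per event : {overhead / fragment:.0%} overhead, "
      f"{event_rate / 1e3:.0f} kHz messages per board")

# With MEPs: ~12 events packed into one datagram per board
events_per_mep = 12
print(f"per MEP   : {overhead / (events_per_mep * fragment):.1%} overhead, "
      f"{event_rate / events_per_mep / 1e3:.0f} kHz messages per board")
```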


Page 20: Niko Neufeld "A 32 Tbit/s Data Acquisition System"

DAQ network parameters

Experiment  Link load [%]                     Technology           Protocol   Event building
ALICE       30                                Ethernet             TCP/IP     pull
                                              InfiniBand (HLT)                pull (RDMA)
ATLAS       20 (L2) / 50 (Event-collection)   10 Gbit/s Ethernet   TCP/IP     pull
CMS         65                                Myrinet              Myrinet    push (with credits)
            40 – 80                           Ethernet             TCP/IP     pull
LHCb        40 – 80                           Ethernet             UDP        push


Page 21: Niko Neufeld "A 32 Tbit/s Data Acquisition System"

LHC Trigger/DAQ parameters (as seen 2011/12)

Experiment  # Levels  Level-0/1/2 Trigger Rate (Hz)   Event Size (Byte)   Network Bandw. (GB/s)   Storage MB/s (Event/s)
ALICE       4         Pb-Pb 500 / p-p 10^3            5x10^7 / 2x10^6     25                      4000 (10^2) / 200 (10^2)
ATLAS       3         LV-1 10^5 / LV-2 3x10^3         1.5x10^6            6.5                     700 (6x10^2)
CMS         2         LV-1 10^5                       10^6                100                     ~1000 (10^2)
LHCb        2         LV-0 10^6                       5.5x10^4            55                      250 (4.5x10^3)


Page 22: Niko Neufeld "A 32 Tbit/s Data Acquisition System"


High Level Trigger Farms

And that, in simple terms, is what we do in the High Level Trigger

Page 23: Niko Neufeld "A 32 Tbit/s Data Acquisition System"


Online Trigger Farms 2012

                                      ALICE                                 ATLAS                    CMS                                      LHCb
# cores (+ hyperthreading)            2700                                  17000                    13200                                    15500
# servers (mainboards)                –                                     ~ 2000                   ~ 1300                                   1574
total available cooling power (kW)    ~ 500                                 ~ 820                    800                                      525
total available rack-space (Us)       ~ 2000                                2400                     ~ 3600                                   2200
CPU type(s)                           AMD Opteron, Intel 54xx, Intel 56xx   Intel 54xx, Intel 56xx   Intel 54xx, Intel 56xx, Intel E5-2670    Intel 5450, Intel 5650, AMD 6220

And counting…

Page 24: Niko Neufeld "A 32 Tbit/s Data Acquisition System"

Not yet approved!

LHC planning


Long Shutdown 1 (LS1): CMS: Myrinet → InfiniBand / Ethernet; ATLAS: merge L2 and Event-Collection infrastructures
Long Shutdown 2 (LS2): ALICE continuous read-out; LHCb 40 MHz read-out
Long Shutdown 3 (LS3): CMS track-trigger

Page 25: Niko Neufeld "A 32 Tbit/s Data Acquisition System"

Motivation

• The LHC (large hadron collider) collides protons every 25 ns (40 MHz)

• Each collision produces about 100 kB of data in the detector

• Currently a pre-selection in custom electronics rejects 97.5% of these events; unfortunately a lot of them contain interesting physics

• In 2017 the detector will be changed so that all events can be read-out into a standard compute platform for detailed inspection


Page 26: Niko Neufeld "A 32 Tbit/s Data Acquisition System"

LHCb after LS2


• Ready for an all-software trigger (resources permitting)
• 0-suppression on front-end electronics mandatory!
• Event-size about 100 kB, readout-rate up to 40 MHz
• Will need a network scalable up to 32 Tbit/s: InfiniBand, 10/40/100 Gigabit Ethernet?
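The 32 Tbit/s follows directly from the two numbers above (a minimal sketch):

```python
event_size   = 100e3     # bytes per event
readout_rate = 40e6      # Hz, i.e. every bunch crossing

bandwidth = event_size * readout_rate                 # bytes per second
print(f"{bandwidth / 1e12:.0f} TB/s "
      f"= {bandwidth * 8 / 1e12:.0f} Tbit/s")         # 4 TB/s = 32 Tbit/s
```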

Page 27: Niko Neufeld "A 32 Tbit/s Data Acquisition System"

Key figures

• Minimum required bandwidth: > 32 Tbit/s
• # of 100 Gigabit/s links: > 320
• # of compute units: > 1500
• An event (“snapshot of a collision”) is about 100 kB of data
• # of events processed every second: 10 to 40 million
• # of events retained after filtering: 20000 to 30000 (data reduction of at least a factor 1000)


Page 28: Niko Neufeld "A 32 Tbit/s Data Acquisition System"

GBT: custom radiation-hard link over MMF, 3.2 Gbit/s (about 10000)

Input into DAQ network (10/40 Gigabit Ethernet or FDR IB) (1000 to 4000)

Output from DAQ network into compute unit clusters (100 Gbit Ethernet / EDR IB) (200 to 400 links)

Compute units could be servers with GPUs or other coprocessors

LHCb DAQ as of 2018


[Diagram: the Detector in the cavern, separated by 100 m of rock from the Readout Units; GBT links from the detector feed the Readout Units, which send into the DAQ network connecting to the Compute Units.]

Page 29: Niko Neufeld "A 32 Tbit/s Data Acquisition System"

Readout Unit

• Readout Unit needs to collect custom-links• Some pre-processing• Buffering• Coalescing of data-fragment reduce message-rate /

transport overheads• Needs an FPGA• Sends data using standard network protocol (IB, Ethernet)• Sending of data can be done directly from the FPGA or via

a standard network silicon• Works together with Compute Units to build events
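A minimal sketch of what fragment coalescing means in practice (the header layout and field names are illustrative assumptions, not the LHCb MEP format):

```python
import struct

def coalesce(fragments, first_event_id):
    """Pack consecutive event fragments into one multi-event packet (MEP-like)."""
    payload = struct.pack("<IH", first_event_id, len(fragments))   # toy header
    for frag in fragments:
        payload += struct.pack("<H", len(frag)) + frag             # length-prefixed
    return payload

# One ~200-byte fragment per event, 12 events per packet (see the MEP slide)
fragments = [bytes(200) for _ in range(12)]
mep = coalesce(fragments, first_event_id=1000)
print(f"{len(fragments)} fragments of 200 B -> one {len(mep)} B message")
```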


Page 30: Niko Neufeld "A 32 Tbit/s Data Acquisition System"

Compute Unit

• A compute unit is a destination for the event-data fragments from the readout units
• It assembles the fragments into a complete “event” and runs various selection algorithms on this event
• About 0.1 % of events are retained
• A compute unit will be a high-density server platform (mainboard with standard CPUs), probably augmented with a co-processor card (like an Intel MIC or a GPU)


Page 31: Niko Neufeld "A 32 Tbit/s Data Acquisition System"

Future DAQ systems: trends

• Certainly LAN based
  – InfiniBand deserves a serious evaluation for high bandwidth (> 100 GB/s)
  – In Ethernet, if DCB works, we might be able to build networks from smaller units; otherwise we will stay with large store & forward boxes
• The trend to “trigger-free” → do everything in software → bigger DAQ will continue
  – Physics data-handling in commodity CPUs
• Will there be a place for many-core / coprocessor cards (Intel MIC / CUDA)?
  – IMHO this will depend on whether we can establish a development framework which allows for long-term maintenance of the software by non-“geek” users, much more than on the actual technology


Page 32: Niko Neufeld "A 32 Tbit/s Data Acquisition System"

Fat-Tree Topology for One Slice

• 48-port 10 GbE switches
• Mix readout-boards (ROB) and filter-farm servers in one switch:
  – 15 x readout-boards
  – 18 x servers
  – 15 x uplinks
• Non-blocking switching → use 65% of installed bandwidth (classical DAQ only 50%)
• Each slice accommodates:
  – 690 x inputs (ROBs)
  – 828 x outputs (servers)
• Ratio (servers/ROBs) is adjustable
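Checking that the slice numbers quoted above are consistent (a minimal sketch; the number of leaf switches is derived here, it is not quoted on the slide):

```python
ports = 48
robs_per_switch, servers_per_switch, uplinks_per_switch = 15, 18, 15
assert robs_per_switch + servers_per_switch + uplinks_per_switch == ports

inputs_per_slice = 690
leaf_switches = inputs_per_slice // robs_per_switch        # 690 / 15 = 46 switches
outputs_per_slice = leaf_switches * servers_per_switch     # 46 * 18 = 828 servers

print(f"{leaf_switches} leaf switches, {inputs_per_slice} ROB inputs, "
      f"{outputs_per_slice} server outputs per slice")
```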


Page 33: Niko Neufeld "A 32 Tbit/s Data Acquisition System"


Pull-Based Event Building

Data Acquisition Switch, Event Manager, Event Builders EB1–EB3:

1. Event Builders notify the Event Manager of available capacity (“Send me an event!”)
2. The Event Manager elects an event-builder node (“EB1, get next event”, “EB2, get next event”)
3. Readout traffic is driven by the Event Builders (“Send event to EB1!” sent to every readout source)

Page 34: Niko Neufeld "A 32 Tbit/s Data Acquisition System"

Summary

• Large modern DAQ systems are based entirely (mostly) on Ethernet and big PC-server farms

• Bursty, uni-directional traffic is a challenge for the network and the receivers, and requires substantial buffering in the switches

• The future:
  – It seems that buffering in switches is being reduced (latency vs. buffering)
  – Advanced flow-control is coming, but it will need to be tested whether it is sufficient for DAQ
  – Ethernet is still strongest, but InfiniBand looks like a very interesting alternative
  – Integrated protocols (RDMA) can offload servers, but will be more complex
  – Integration of GPUs, non-Intel processors and other many-cores will need to be studied

• For the DAQ and triggering the question is not if we can do it, but how we can do it so that we can afford it!


Page 35: Niko Neufeld "A 32 Tbit/s Data Acquisition System"

More Stuff

Page 36: Niko Neufeld "A 32 Tbit/s Data Acquisition System"


Cut-through switching: Head-of-Line Blocking

• A packet to node 4 must wait even though the port to node 4 is free

• The reason for this is the First-In-First-Out (FIFO) structure of the input buffer

• Queuing theory tells us* that for random traffic (and infinitely many switch ports) the throughput of the switch will go down to 58.6% → that means on a 100 Mbit/s network the nodes will "see" effectively only ~ 58 Mbit/s (see the simulation sketch below)

*) "Input Versus Output Queueing on a Space-Division Packet Switch"; Karol, M. et al. ; IEEE Trans. Comm., 35/12


Page 37: Niko Neufeld "A 32 Tbit/s Data Acquisition System"

GBT: custom radiation-hard link over MMF, 3.2 Gbit/s (about 10000)

Input into DAQ network (10/40 Gigabit Ethernet or FDR IB) (1000 to 4000)

Output from DAQ network into compute unit clusters (100 Gbit Ethernet / EDR IB) (200 to 400 links)

Event-building


[Diagram: the Detector in the cavern, separated by 100 m of rock from the Readout Units; GBT links from the detector feed the Readout Units, which send into the DAQ network connecting to the Compute Units.]

Readout Units send to Compute Units; Compute Units receive passively ("Push-architecture")

Page 38: Niko Neufeld "A 32 Tbit/s Data Acquisition System"

Runcontrol

(Image © Warner Bros.)

Page 39: Niko Neufeld "A 32 Tbit/s Data Acquisition System"


Runcontrol challenges

• Start, configure and control O(10000) processes on farms of several 1000 nodes

• Configure and monitor O(10000) front-end elements

• Fast database access, caching, pre-loading, parallelization, and all this 100% reliable!

Page 40: Niko Neufeld "A 32 Tbit/s Data Acquisition System"


Runcontrol technologies

• Communication:
  – CORBA (ATLAS)
  – HTTP/SOAP (CMS)
  – DIM (LHCb, ALICE)

• Behavior & Automation:
  – SMI++ (ALICE)
  – CLIPS (ATLAS)
  – RCMS (CMS)
  – SMI++ (in PVSS) (used also in the DCS)

• Job/Process control:
  – Based on XDAQ, CORBA, …
  – FMC/PVSS (LHCb, does also fabric monitoring)

• Logging:
  – log4C, log4j, syslog, FMC (again), …