Page 1: Resource Utilization in the ATLAS Data Acquisition System

Sander Klous, on behalf of the ATLAS Collaboration
Real-Time, May 2010
28/5/2010

Page 2: Contents

Introduction of the ATLAS DataFlow system

Modeling DataFlow and Resource Utilization

Cost monitoring explained

Example of performance data analysis

Conclusions


Page 3: DataFlow (1)

[Figure: schematic of the ATLAS DataFlow system, from the detector front-end through the Read Out System and Event Builders to Local Event Storage, with a stream to the muon calibration centers.]

Acronyms:
▪ Frontend Electronics (FE)
▪ Read Out Driver (ROD)
▪ Region of Interest (RoI)
▪ Read Out Buffer (ROB)
▪ Read Out System (ROS)
▪ Trigger Level 2 (L2)
▪ Event Filter (EF)

Page 4: Modeling DataFlow and resource utilization

Historically, studies have been done with different levels of detail:

Paper model (static model)
▪ Back-of-the-envelope calculations
▪ Average data volumes and data fragmentation info

Dynamic model (computer simulation)
▪ Discrete event model of the DataFlow system
▪ Cross-check with results of the paper model
▪ Additional information on queuing in the system

How do these studies match with reality? What predictions can be made for the future?
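To make the dynamic model concrete, here is a minimal discrete-event sketch of a single DataFlow queue. The arrival and service rates and the simulation length are illustrative assumptions, not the parameters of the actual ATLAS TDAQ simulation.

```cpp
#include <algorithm>
#include <cstdio>
#include <random>

int main() {
    std::mt19937 rng(42);
    // Illustrative rates: ~5 kHz request arrivals, ~8 kHz service capacity.
    std::exponential_distribution<double> arrival(5000.0);
    std::exponential_distribution<double> service(8000.0);

    double t = 0.0, busyUntil = 0.0, totalWait = 0.0;
    const long events = 100000;
    for (long i = 0; i < events; ++i) {
        t += arrival(rng);                      // next request arrives
        double start = std::max(t, busyUntil);  // queues if the server is busy
        totalWait += start - t;                 // time spent waiting in the queue
        busyUntil = start + service(rng);       // request is served
    }
    std::printf("mean queuing delay: %.3f ms\n", 1e3 * totalWait / events);
}
```

A model like this gives exactly the kind of queuing information the paper model cannot: not just average throughput, but how delays build up when a component runs near capacity.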

Page 5: Discrete event model (TDR 2003)

[Figure: discrete event model of the DataFlow system, as presented in the 2003 Technical Design Report.]

Page 6: Cost monitoring in the real DAQ system

Introduce a mechanism in the running DAQ system to:
▪ Collect performance info (i.e. resource utilization) on the fly, on an event-by-event basis
▪ Group performance information together
▪ Use this information to validate the model: trigger rates, processing times, access to information fragments
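A minimal sketch of what collecting performance info on the fly can look like: a scoped timer that records per-algorithm wall-clock times into an in-memory record for each event. CostRecord and ScopedTimer are illustrative names, not ATLAS TDAQ classes.

```cpp
#include <chrono>
#include <cstdio>
#include <string>
#include <vector>

struct AlgTiming {
    std::string name;
    double millis;  // wall-clock time spent in the algorithm
};

struct CostRecord {
    unsigned long eventId;
    std::vector<AlgTiming> timings;
};

// Measures the lifetime of a scope and appends it to the event's record.
class ScopedTimer {
    CostRecord& rec_;
    std::string name_;
    std::chrono::steady_clock::time_point start_;
public:
    ScopedTimer(CostRecord& rec, std::string name)
        : rec_(rec), name_(std::move(name)),
          start_(std::chrono::steady_clock::now()) {}
    ~ScopedTimer() {
        auto stop = std::chrono::steady_clock::now();
        double ms = std::chrono::duration<double, std::milli>(stop - start_).count();
        rec_.timings.push_back({name_, ms});
    }
};

int main() {
    CostRecord rec{1234, {}};
    {
        ScopedTimer t(rec, "MinBiasAlgo");
        // ... run the HLT algorithm here ...
    }
    for (const auto& a : rec.timings)
        std::printf("event %lu: %s took %.3f ms\n",
                    rec.eventId, a.name.c_str(), a.millis);
}
```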

Page 7: Obtaining input data from the real system

[Figure: where in the running system the performance data is obtained.]

Page 8: Intermezzo (1): Event structure and transport

▪ Data driven
▪ Event contains multiple parts: header, meta data, payload
▪ Meta data added by: L2 (L2 result), EF (EF result)

[Figure: event layout: Event Header, L2 result, EF result; event payload with Detector A, Detector B, etc.]
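The structure described above can be captured in a small sketch; the field names and types are illustrative assumptions, not the actual ATLAS event format.

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct DetectorFragment {
    std::string detector;       // e.g. "Detector A", "Detector B"
    std::vector<uint8_t> data;  // raw fragment bytes
};

struct Event {
    // Header
    uint64_t eventId;
    uint64_t l1AcceptTime;

    // Meta data, appended as the event flows through the system (data driven)
    std::vector<uint8_t> l2Result;  // added by Trigger Level 2
    std::vector<uint8_t> efResult;  // added by the Event Filter

    // Payload
    std::vector<DetectorFragment> payload;
};
```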

Page 9: Intermezzo (2): Partial event building (PEB) and stripping

▪ Reduced event payload: calibration events do not need all detector data
▪ Smaller events: partially built at LVL2, or stripped before storage by the EF or SFO
▪ Improved efficiency: disk (less storage capacity), network (reduced bandwidth), CPU (bypass L2/EF if possible)

[Figure: event layout: Event Header, L2 result, EF result; event payload with Detector A, Detector B, etc.]
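A minimal sketch of the stripping step, under the assumption that the selection is simply a set of detectors to keep; in the real system the selection is trigger-driven and applied at LVL2 (partial building) or by the EF/SFO (stripping).

```cpp
#include <algorithm>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

struct DetectorFragment { std::string detector; std::vector<uint8_t> data; };
struct Event {
    uint64_t eventId;
    std::vector<DetectorFragment> payload;  // header and L2/EF results omitted
};

// Keep only the fragments needed for calibration; drop everything else.
// Header, L2 result and EF result are never stripped.
void stripPayload(Event& ev, const std::set<std::string>& keepDetectors) {
    ev.payload.erase(
        std::remove_if(ev.payload.begin(), ev.payload.end(),
                       [&](const DetectorFragment& f) {
                           return keepDetectors.count(f.detector) == 0;
                       }),
        ev.payload.end());
}
```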

Page 10: Collect and ship performance data

Performance data stored in the L2/EF result:

Each event:
▪ L1 accept time and HLT host local time
▪ HLT application ID
▪ L1 and HLT trigger counters
▪ L1 and HLT trigger decision bits

Every 10th event:
▪ Start/stop times of HLT algorithms
▪ HLT trigger requesting the HLT algorithm
▪ RoI information, ROB IDs, ROB request time and ROB size
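The two-tier scheme above can be sketched as two record types plus a sampling predicate; all names and types are illustrative assumptions.

```cpp
#include <cstdint>
#include <vector>

struct PerEventData {                    // stored for each event
    uint64_t l1AcceptTime;
    uint64_t hltHostLocalTime;
    uint32_t hltApplicationId;
    uint32_t l1TriggerCounter, hltTriggerCounter;
    std::vector<bool> l1DecisionBits, hltDecisionBits;
};

struct RobRequest {                      // one ROB access
    uint32_t robId;
    uint64_t requestTime;
    uint32_t robSize;
};

struct DetailedData {                    // stored for every 10th event
    uint64_t algStartTime, algStopTime;  // per HLT algorithm
    uint32_t requestingTrigger;          // HLT trigger requesting the algorithm
    std::vector<RobRequest> robRequests; // RoI information and ROB accesses
};

// Sampling predicate: detailed data only for every 10th event.
bool recordDetails(uint64_t eventNumber) {
    return eventNumber % 10 == 0;
}
```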

Page 11: PEB and performance data

Transport the information by piggybacking on rejected events, which can be built partially:
▪ Without event payload (only the L2/EF result)
▪ Avoids mixing with other data

Collection rate of buffered information:
▪ Every Nth rejected event (N = 100), the cost algorithm fires and the buffer is serialized
▪ Typically less than 1 MB/second is collected
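A minimal sketch of the piggybacking logic: buffer records as they arrive, and every Nth rejected event hand the serialized buffer over for a payload-free partial build. The CostShipper name and the byte-vector serialization are assumptions.

```cpp
#include <cstdint>
#include <vector>

class CostShipper {
    std::vector<uint8_t> buffer_;   // accumulated performance records
    unsigned rejected_ = 0;
    static constexpr unsigned N = 100;
public:
    void addRecord(const std::vector<uint8_t>& rec) {
        buffer_.insert(buffer_.end(), rec.begin(), rec.end());
    }
    // Called for each rejected event. Every Nth rejection, the buffered
    // data is handed over to be attached as the only content (the L2/EF
    // result) of a partially built, payload-free event.
    std::vector<uint8_t> onRejectedEvent() {
        if (++rejected_ % N != 0) return {};
        std::vector<uint8_t> out;
        out.swap(buffer_);          // serialize and reset the buffer
        return out;
    }
};
```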

Page 12: DataFlow (2)

[Figure: the DataFlow schematic from Page 3, annotated with the points where performance data is buffered, alongside the Event Builders, Local Event Storage, and the stream to the muon calibration centers.]

Page 13: Results

Separate stream with performance data:
▪ Automatic NTuple production and analysis
▪ Results listed on HTML pages: trigger rates, trigger sequences, processing times

Feedback information for:
▪ Operations and menu coordination
▪ Performance studies, modeling and extrapolation
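As an illustration of the automatic analysis step, a minimal sketch that aggregates collected records into trigger rates and mean processing times. The Record type, sample values and live time are assumptions; the real chain produces NTuples and HTML pages.

```cpp
#include <cstdio>
#include <map>
#include <string>
#include <vector>

struct Record { std::string trigger; double wallMs; };

int main() {
    // Illustrative sample of collected records.
    std::vector<Record> records = {
        {"L2_mbSp", 4.2}, {"L2_mbSp", 250.0}, {"L2_mu4", 1.1}};
    double runSeconds = 1.0;  // assumed live time covered by the sample

    std::map<std::string, std::pair<long, double>> byTrigger;  // count, time sum
    for (const auto& r : records) {
        auto& [n, sum] = byTrigger[r.trigger];
        ++n;
        sum += r.wallMs;
    }
    for (const auto& [trig, cs] : byTrigger)
        std::printf("%-10s rate %.1f Hz, mean time %.1f ms\n",
                    trig.c_str(), cs.first / runSeconds, cs.second / cs.first);
}
```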

Page 14: Example performance study (step 1)

The online L2 monitoring shows a long tail in the event processing time (wall-clock time):

[Figure: processing time distribution for Trigger Steering @ L2, run 142165, L1 BPTX seeding at 5 kHz.]

Page 15: Example performance study (step 2)

In our new tool, we identify the dominant algorithm responsible for the long tail:

[Figure: per-algorithm processing time; the Minimum Bias algorithm @ L2 dominates.]

Page 16: Example performance study (step 3)

With our tool we can investigate the different aspects of the algorithm:
▪ CPU consumption is healthy
▪ Typical retrieval time is about 1 ms
▪ The problem is in ROB retrieval (congestion? ROS problem?)
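This decomposition can be sketched by splitting each call's wall-clock time into CPU time and ROB-retrieval time and flagging abnormal retrievals. The sample values and the 10 ms threshold are illustrative assumptions.

```cpp
#include <cstdio>
#include <vector>

struct AlgCall { double wallMs, cpuMs; };  // retrieval time = wallMs - cpuMs

int main() {
    // Illustrative calls: CPU time is stable, one call sits in the tail.
    std::vector<AlgCall> calls = {
        {2.0, 1.5}, {2.2, 1.6}, {180.0, 1.4}};
    for (const auto& c : calls) {
        double robMs = c.wallMs - c.cpuMs;   // time spent waiting for ROB data
        if (robMs > 10.0)                    // far above the ~1 ms typical value
            std::printf("slow ROB retrieval: %.1f ms (cpu %.1f ms)\n",
                        robMs, c.cpuMs);
    }
}
```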

Page 17: Conclusions

▪ Cost monitoring is a valuable new tool for performance measurements
▪ The tool makes intelligent use of existing features in the ATLAS TDAQ system
▪ The tool is operational and working fine, as demonstrated with the example
▪ Next steps: validate the MC event performance model with real data; model with higher luminosity MC events (extrapolate); make cost monitoring available online