Page 1: ATLAS Data Challenges

ATLAS Data Challenges

LCG - PEB meeting

CERN December 12th 2001

Gilbert Poulard, CERN EP-ATC

Page 2: ATLAS Data Challenges


Outline
- ATLAS Data Challenges
- Some considerations

Page 3: ATLAS Data Challenges


ATLAS Data Challenges

Goal: understand and validate our computing model, our data model, our software and our technology choices.

How? By iterating on a set of DCs of increasing complexity.

Ideally: start with data which looks like real data
• Run the filtering and reconstruction chain
• Store the output data into our database
• Run the analysis
• Produce physics results
• Study performance issues, database technologies, analysis scenarios, ...
• Identify weaknesses, bottlenecks, etc. (but also good points)

But we need to produce the 'data' and satisfy 'some' communities:
• Simulation will be part of DC0 & DC1
• Data needed by the HLT community

Page 4: ATLAS Data Challenges


ATLAS Data Challenges: DC0

Three 'original' paths involving databases:

Generator -> Geant3 -> (Zebra -> Objy) -> Athena reconstruction -> simple analysis
This is the "primary" chain (100,000 events). Purpose: this is the principal continuity test.

Atlfast chain: Generator -> Atlfast -> simple analysis
Demonstrated for Lund, but the (transient) software is changing. Purpose: continuity test.

Physics TDR -> (Zebra -> Objy) -> Athena reconstruction -> simple analysis
Purpose: Objy test?

Additional path:
Generator -> Geant4 -> (Objy)
Purpose: robustness test (100,000 events)

Page 5: ATLAS Data Challenges


ATLAS Data Challenges: DC0

Originally: November-December 2001
- 'Continuity' test through the software chain; the aim is primarily to check the state of readiness for DC1
- We plan ~100k Z+jet events, or similar
- Software works:
  • issues to be checked include G3 simulation running with the 'latest' version of the geometry, and reconstruction running
  • data must be written to / read from the database

Now:
- Before Xmas
  • ~30k events (full simulation) + ~30k events (conversion)
  • G4 robustness test (~100k events)
- Early January
  • repeat the exercise with a new release (full chain)
- DC0: end January
  • statistics to be defined (~100k events)

Page 6: ATLAS Data Challenges


ATLAS Data Challenges: DC1

DC1: February-July 2002
- Reconstruction & analysis on a large scale: learn about the data model and I/O performance; identify bottlenecks, ...
- Use of GRID as and when possible and appropriate
- Data management:
  • use (evaluate) more than one database technology (Objectivity and ROOT I/O); relative importance under discussion
  • learn about distributed analysis
- Should involve CERN & outside-CERN sites; site planning is going on, and an incomplete list already includes sites from Canada, France, Italy, Japan, UK, US, Russia
- Scale: 10^7 events in 10-20 days, O(1000) PCs
- Data needed by HLT & physics groups (others?); simulation & pile-up will play an important role; shortcuts may be needed (especially for HLT)!
- Checking of Geant4 versus Geant3

Page 7: ATLAS Data Challenges


ATLAS Data Challenges: DC1

DC1 will have two distinct phases:
- first, production of events for the HLT TDR, where the primary concern is the delivery of events to the HLT community;
- second, testing of software (G4, databases, detector description, etc.) with delivery of events for physics studies.
The software will change between these two phases.

Simulation & pile-up will be of great importance; the strategy is to be defined (I/O rate, number of "event" servers?).

As we want to do it 'world-wide', we will 'port' our software to the GRID environment and use the GRID middleware as much as possible (an ATLAS kit is to be prepared).

Page 8: ATLAS Data Challenges


ATLAS Data Challenges: DC2

DC2: Spring-Autumn 2003. The scope will depend on what has and has not been achieved in DC0 & DC1. At this stage the goal includes:
- Use of the 'TestBed' which will be built in the context of Phase 1 of the "LHC Computing Grid Project"
  • scale: a sample of 10^8 events
  • system at a complexity X% of the 2006-2007 system
- Extensive use of the GRID middleware
- Geant4 should play a major role
- Physics samples could (should) have 'hidden' new physics
- Calibration and alignment procedures should be tested

Maybe to be synchronized with "Grid" developments.

Page 9: ATLAS Data Challenges


DC scenario

Production chain:
Event generation -> Detector simulation -> Pile-up -> Detector response -> Reconstruction -> Analysis

These steps should be as independent as possible.

Page 10: ATLAS Data Challenges


Production stream for DC0-1

  Step                                   Input      Output            Framework
  Event generation                       none       OO-db             Athena
    (Pythia, Herwig, Isajet)
  Detector simulation (Geant3/Dice)      OO-db      FZ                Atlsim
  Detector simulation (Geant4)           OO-db      OO-db             FADS/Goofy
  Pile-up (DC1) & detector response      OO-db      FZ                Atlsim/Dice
    (Atlsim, Dice)                       OO-db      OO-db             Athena
  Data conversion                        FZ         OO-db             Athena
  Reconstruction                         OO-db      OO-db, "Ntuple"   Athena
  Analysis                               "Ntuple"   -                 Paw / Root, Anaphe / Jas

"OO-db" is used for "OO database"; it could be Objectivity, ROOT I/O, ...
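Purely as an illustration of the point that these steps should be as independent as possible, the sketch below (hypothetical Python, not ATLAS production code; the Step class and check_stream helper are invented here, and the ambiguous pile-up step is omitted) models the stream as steps that declare only the data formats they read and write:

```python
# Hypothetical sketch of the DC0/DC1 production stream: each step declares only
# the data format it consumes and produces, so steps stay independent of each
# other's internals. Names and classes are illustrative, not real ATLAS tools.
from dataclasses import dataclass

@dataclass
class Step:
    name: str
    tools: str            # generators / simulation engines / frameworks used
    input_format: str     # "none", "OO-db", "FZ", "Ntuple", ...
    output_formats: str   # formats written, "/"-separated if more than one

DC01_STREAM = [
    Step("event generation", "Pythia, Herwig, Isajet (Athena)", "none", "OO-db"),
    Step("detector simulation (G3)", "Geant3/Dice (Atlsim)", "OO-db", "FZ"),
    Step("detector simulation (G4)", "Geant4 (FADS/Goofy)", "OO-db", "OO-db"),
    Step("data conversion", "Athena", "FZ", "OO-db"),
    Step("reconstruction", "Athena", "OO-db", "OO-db / Ntuple"),
    Step("analysis", "Paw/Root, Anaphe/Jas", "Ntuple", "none"),
]

def check_stream(stream: list[Step]) -> None:
    """Check each step reads only a format that some earlier step produced."""
    available = {"none"}
    for step in stream:
        assert step.input_format in available, f"{step.name}: no {step.input_format}"
        available.update(fmt.strip() for fmt in step.output_formats.split("/"))

check_stream(DC01_STREAM)
```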

Page 11: ATLAS Data Challenges


DC0

[Diagram: DC0 production flow linking the Pythia/Isajet/Herwig generators, HepMC event storage (Obj., Root), ATLFAST (OO) with ntuple output (Obj., Root), G3/DICE with GENZ/ZEBRA ("ntuple-like") output, Physics TDR data, ATHENA reconstruction (RD event? OO-DB?), and combined ntuples (Obj., Root).]

Missing:
- filter, trigger
- HepMC in Root
- ATLFAST output in Root (TObjects)
- link MC truth - ATLFAST
- reconstruction output in Obj., Root
- EDM (e.g. G3/DICE input to ATHENA)

Page 12: ATLAS Data Challenges


DC1

[Diagram: DC1 production flow, as for DC0 but with MyGeneratorModule added to the generators and a Geant4 path (Obj.) alongside G3/DICE with its GENZ/ZEBRA ("ntuple-like") output; HepMC events are stored in Obj./Root, ATLFAST (OO) produces ntuples (Obj., Root), and ATHENA reconstruction (RD event? OO-DB?) produces combined ntuples (Obj., Root).]

Missing:
- filter, trigger
- detector description
- HepMC in Root
- digitisation
- ATLFAST output in Root (TObjects)
- pile-up
- link MC truth - ATLFAST
- reconstruction output in Obj., Root
- EDM (e.g. G3/DICE, G4 input to ATHENA)

Page 13: ATLAS Data Challenges


DC0 G4 Robustness Test

Test plan: two kinds of tests:
- A 'large-N' generation with the ATLAS detector geometry
  • detailed geometry for the muon system (input from AMDB)
  • a crude geometry for the InnerDetector and Calorimeter
- A 'large-N' generation with a test beam geometry
  • TileCal test beam for electromagnetic interactions

Physics processes:
- Higgs -> 4 muons (by Pythia)   <-- main target
- Minimum bias events            <-- if possible

Page 14: ATLAS Data Challenges


DC0 G4 Robustness Test

Expected data size and CPU required (only for the ATLAS detector geometry):

                                    per event    1,000 events
  4-vectors database                ~ 50 KB      ~ 50 MB
  Hits/Hit-collections database     ~ 1.5 MB     ~ 1.5 GB
  CPU (Pentium III, 800 MHz)        ~ 60 sec     ~ 17 hours

(See the note below for these numbers.)
[Note] Not the final number. It includes a safety factor to reserve extra disk space.

Required resources (only for the ATLAS detector geometry):
- PC farm: ~ 10 CPUs (5 machines with dual processors)
- Disk space: ~ 155 GB
- Process period: ~ 1 week
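As a rough cross-check (assuming the test processes the ~100,000 events quoted earlier for the G4 robustness path; that assumption is made explicit in the comment), the per-event numbers above reproduce the quoted disk and time budget:

```python
# Rough sanity check of the DC0 G4 robustness-test resource estimate.
# Assumption (not stated on this slide): the full test is ~100,000 events,
# as quoted for the G4 robustness path elsewhere in the talk.
n_events = 100_000

kb_per_event_4vectors = 50      # ~50 KB of 4-vectors per event
mb_per_event_hits = 1.5         # ~1.5 MB of hits/hit-collections per event
sec_per_event_cpu = 60          # ~60 s per event on a Pentium III 800 MHz
n_cpus = 10                     # ~10 CPUs (5 dual-processor machines)

disk_gb = (n_events * kb_per_event_4vectors / 1e6     # KB -> GB
           + n_events * mb_per_event_hits / 1e3)      # MB -> GB
wall_days = n_events * sec_per_event_cpu / n_cpus / 86400

print(f"disk ~ {disk_gb:.0f} GB")                          # ~155 GB, as quoted
print(f"wall time ~ {wall_days:.1f} days on {n_cpus} CPUs") # ~1 week, as quoted
```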

Page 15: ATLAS Data Challenges


Data management

It is a key issue. Evaluation of more than one technology is part of DC1.

Infrastructure has to be put in place:
- for Objectivity & ROOT I/O
  • software, hardware, tools to manage the data: creation, replication, distribution, ...

Tools are needed to run the production: "bookkeeping", "cataloguing", "job submission", ... (a toy sketch is given below)
We intend to use GRID tools as much as possible:
  • Magda for DC0
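Purely illustrative: a minimal sketch of the kind of record such bookkeeping/cataloguing tools have to manage. The FileRecord fields and the register/locate helpers are invented here; they are not Magda's actual schema or interface.

```python
# Hypothetical sketch of a production bookkeeping record; the fields and the
# in-memory "catalogue" below are illustrative, not Magda's real schema.
from dataclasses import dataclass, field

@dataclass
class FileRecord:
    logical_name: str            # logical file name used across sites
    replicas: list[str]          # physical locations (site:path) of replicas
    dataset: str                 # production dataset (e.g. a DC0 sample name)
    step: str                    # production step: generation, simulation, ...
    n_events: int                # number of events contained in the file
    size_mb: float               # file size, for storage accounting

catalogue: list[FileRecord] = []

def register(record: FileRecord) -> None:
    """Add a newly produced file to the catalogue (bookkeeping)."""
    catalogue.append(record)

def locate(logical_name: str) -> list[str]:
    """Return the known replicas of a logical file (cataloguing)."""
    return [replica for rec in catalogue if rec.logical_name == logical_name
            for replica in rec.replicas]
```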

Page 16: ATLAS Data Challenges


DC1-HLT - CPU

                   Number of events   Time per event   Total time    Total time
                                      (sec SI95)       (sec SI95)    (hours SI95)
  simulation       10^7               3000             3 x 10^10     10^7
  reconstruction   10^7               640              6.4 x 10^9    2 x 10^6

Based on experience from the Physics TDR.
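The totals in the table are simple products of the event count and the per-event cost; a minimal check:

```python
# Reproduce the DC1-HLT CPU totals from the per-event numbers in the table.
n_events = 1e7

for step, si95_sec_per_event in (("simulation", 3000), ("reconstruction", 640)):
    total_sec = n_events * si95_sec_per_event    # total CPU in SI95 seconds
    total_hours = total_sec / 3600               # total CPU in SI95 hours
    print(f"{step}: {total_sec:.1e} SI95*sec = {total_hours:.1e} SI95*hours")
# simulation:     3.0e+10 SI95*sec = 8.3e+06 SI95*hours (~10^7 on the slide)
# reconstruction: 6.4e+09 SI95*sec = 1.8e+06 SI95*hours (~2 x 10^6 on the slide)
```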

Page 17: ATLAS Data Challenges


DC1-HLT - data

                   Number of events   Event size (MB)   Total size (GB)   Total size (TB)
  simulation       10^7               2                 20000             20
  reconstruction   10^7               0.5               5000              5

Page 18: ATLAS Data Challenges


DC1-HLT - data with pile-up

  L (luminosity)   Number of events   Event size (MB)        Total size (GB)   Total size (TB)
  2 x 10^33        1.5 x 10^6         (1) 2.6   (2) 4.7      4000 / 7000       4 / 7
  10^34            1.5 x 10^6         (1) 6.5   (2) 17.5     10000 / 26000     10 / 26

In addition to the 'simulated' data, assuming 'filtering' after simulation (~14% of the events kept).
- (1) keeping only 'digits'
- (2) keeping 'digits' and 'hits'
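These volumes again follow from event count times event size (as do the 20 TB and 5 TB on the previous slide); a minimal check:

```python
# Check the DC1-HLT pile-up data volumes from the numbers in the table.
# The ~14% filter applied to the 10^7 simulated events keeps ~1.4 x 10^6,
# consistent with the ~1.5 x 10^6 events used on the slide.
n_events = 1.5e6

# (luminosity label, event size in MB for digits only, for digits + hits)
cases = [("2 x 10^33", 2.6, 4.7), ("10^34", 6.5, 17.5)]
for lumi, mb_digits, mb_digits_hits in cases:
    tb_digits = n_events * mb_digits / 1e6          # MB -> TB
    tb_both = n_events * mb_digits_hits / 1e6
    print(f"L = {lumi}: ~{tb_digits:.0f} TB (digits), ~{tb_both:.0f} TB (digits+hits)")
# L = 2 x 10^33: ~4 TB (digits),  ~7 TB (digits+hits)
# L = 10^34:    ~10 TB (digits), ~26 TB (digits+hits)
```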

Page 19: ATLAS Data Challenges


Ramp-up scenario @ CERN

[Chart: planned number of CPUs at CERN (vertical axis, 0 to 400) versus week in 2002 (weeks 7 to 26).]

Page 20: ATLAS Data Challenges


Some considerations (1):
- We consider that LCG is crucial for our success.
- We agree to have as many common projects as possible under the control of the project.
- We think that a high priority should be given to the development of the shared Tier0 & shared Tier1 centers.
- We are interested in "cross-grid" projects, obviously to avoid duplication of work.
- We consider the interoperability between the US and EU Grids as very important (Magda as a first use case).

Page 21: ATLAS Data Challenges


Some considerations (2):
- We would like to set up a really distributed production system (simulation, reconstruction, analysis) making use, already for DC1, of the GRID tools (especially those of EU-DataGrid Release 1).
- The organization of the operation of the infrastructure should be defined and put in place.
- We need a 'stable' environment during the data challenges and a clear picture of the available resources as soon as possible.

Page 22: ATLAS Data Challenges


Some considerations (3):
- We consider that the discussion on the common persistence technology should start as soon as possible under the umbrella of the project.
- We think that other common items (e.g. dictionary languages, release tools, etc.) are worthwhile (not with the same priority), but we must ask what is desirable and what is necessary.
- We think that the plan for the simulation should be understood.