ATLAS Distributed Computing: Computing Model, Data Management, Production System, Distributed Analysis, Information System, Monitoring
Alexei Klimentov, Brookhaven National Laboratory
NEC2009, Varna, 10 September 2009
Introduction
The title that Vladimir gave me cannot be done in 20 minutes.
I'll talk about the Distributed Computing components, but I am certainly biased, as any Operations person is.
ATLAS Collaboration
6 continents, 37 countries, 169 institutions, 2800 physicists, 700 students, >1000 technical and support staff
Albany, Alberta, NIKHEF Amsterdam, Ankara, LAPP Annecy, Argonne NL, Arizona, UT Arlington, Athens, NTU Athens, Baku, IFAE Barcelona, Belgrade, Bergen, Berkeley LBL and UC, HU Berlin, Bern, Birmingham, Bogotá, Bologna, Bonn, Boston, Brandeis, Bratislava/SAS Kosice, Brookhaven NL, Buenos Aires, Bucharest, Cambridge, Carleton, Casablanca/Rabat, CERN, Chinese Cluster, Chicago, …
Job brokering is done by the PanDA Service (bamboo) according to input data and site availability. (A. Read, Mar 2009)
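To make the brokering idea concrete, here is a hedged Python sketch; the function, attribute names, and selection policy below are illustrative only, not the actual bamboo code.

    # Hypothetical sketch of PanDA-style brokering: attribute names and the
    # site-selection policy are invented for illustration.
    def broker_job(job, sites):
        """Pick a site that holds the job's input data and can run it."""
        candidates = [
            s for s in sites
            if s.status == "online"                       # site availability
            and job.input_dataset in s.resident_datasets  # data locality
            and s.free_job_slots > 0
        ]
        if not candidates:
            return None  # job waits until data or slots become available
        # One possible policy: prefer the site with the most free slots.
        return max(candidates, key=lambda s: s.free_job_slots)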
Data Processing Cycle
Data processing at CERN (Tier-0 processing): first-pass processing of the primary event stream.
The derived datasets (ESD, AOD, DPD, TAG) are distributed from the Tier-0 to the Tier-1s.
RAW data (received from the Event Filter Farm) are exported within 24 hours; this is why first-pass processing can also be done by the Tier-1s (though this facility was not used during the LHC beam and cosmic ray runs).
Data reprocessing at the Tier-1s: 10 Tier-1 centers worldwide, each taking a subset of the RAW data (Tier-1 shares range from 5% to 25%); the ATLAS production facilities at CERN can be used in case of emergency.
Each Tier-1 reprocesses its share of the RAW data; the derived datasets are distributed ATLAS-wide.
See P. Nevski's talk "LHC Computing" at NEC2009.
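As an illustration of the share mechanism, the following hedged sketch assigns RAW datasets to Tier-1s in proportion to their shares; the site names and share values are placeholders, not the real ATLAS numbers.

    # Toy model of share-based RAW distribution. Only a few Tier-1s are
    # listed, and the share values are made up for the example.
    TIER1_SHARES = {"BNL": 0.25, "IN2P3": 0.15, "FZK": 0.10, "RAL": 0.10}

    def assign_datasets(datasets, shares):
        """Send each dataset to the Tier-1 furthest below its target share."""
        assigned = {t1: [] for t1 in shares}
        total = len(datasets)
        for ds in datasets:
            # Deficit = target fraction minus fraction assigned so far.
            t1 = max(shares, key=lambda s: shares[s] - len(assigned[s]) / total)
            assigned[t1].append(ds)
        return assigned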
ATLAS Data Simulation and Reprocessing
[Plot: running jobs, reprocessing, Sep 2008 - Sep 2009]
The Production System is in continuous operation: 10 clouds use LFC as the file catalog and PanDA as the job executor.
CPUs are under-utilized on average; the peak rate is 33k jobs/day.
ProdSys can produce 100 TB/week of Monte Carlo data.
Average walltime efficiency is over 90%.
The system does data simulation and data reprocessing.
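Since jobs are sent to where the data are, the executor first resolves file replicas in the catalog. The sketch below uses an invented catalog interface standing in for an LFC-style lookup; it is not the real LFC Python bindings.

    # Hypothetical catalog-driven dispatch: `catalog.replicas()` is an
    # invented interface for an LFC-style replica query.
    def sites_holding(catalog, lfn):
        """Return the set of sites holding a replica of the given file."""
        return {replica.site for replica in catalog.replicas(lfn)}

    def dispatch(job, catalog):
        sites = sites_holding(catalog, job.input_lfn)
        if not sites:
            raise LookupError("no replica of %s registered" % job.input_lfn)
        return sites  # the broker then picks one of these sites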
ATLAS Distributed Analysis
Probably the most important area at this point. It depends on a functional data management system and job management system.
There are two widely used distributed analysis tools (Ganga and pathena); they capture the great majority of users. We expect the usage to grow substantially in the preparation for, and especially during, the 2009/10 run.
Present/traditional use cases: AOD/DPD analysis is clearly very important, but jobs also run over selected RAW data (for detector debugging, studies, etc.).
ATLAS jobs go to the data. (J. Elmsheuser, Sep 2009)
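For a flavor of what a user writes, here is a minimal sketch in the style of a Ganga ATLAS job definition of that period; the exact class and attribute names varied between Ganga releases, so treat them as approximate, and the dataset names are placeholders.

    # Sketch of a Ganga-style analysis job (run inside the Ganga shell).
    # Class and attribute names are approximate for Ganga circa 2009.
    j = Job()
    j.application = Athena()
    j.application.option_file = 'AnalysisSkeleton_topOptions.py'
    j.inputdata = DQ2Dataset()
    j.inputdata.dataset = 'mc08.105200.example.AOD'  # placeholder dataset
    j.outputdata = DQ2OutputDataset()
    j.backend = Panda()  # the job is brokered to where the data reside
    j.submit()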
ATLAS Grid Information System (AGIS)
The overall purpose of the ATLAS Grid Information System is to store and to expose the static, dynamic, and configuration parameters needed by ATLAS Distributed Computing (ADC) applications. AGIS is a database-oriented system.
The first AGIS proposal came from G. Poulard, followed by the pioneering work of R. Pezoa and R. Rocha in summer 2008 and the definition of basic design principles implemented in the 'dashboards'. Development is now led by the ATLAS BINP group.
Today's situation: various configuration parameters and information about available resources, services, and their status and properties are extracted from different sources, or are defined in different configuration files (sometimes Grid information is even hard-coded in application programs).
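To illustrate the difference an information system makes, here is a hypothetical lookup against an AGIS-like store; the schema, table, and endpoint are invented for the example (the real AGIS is ORACLE-based; SQLite is used only to keep the sketch self-contained).

    import sqlite3

    # Invented schema illustrating the AGIS idea: applications query one
    # information system instead of hard-coding site parameters.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE site_config (site TEXT, parameter TEXT, value TEXT)")
    conn.execute("INSERT INTO site_config VALUES "
                 "('BNL', 'srm_endpoint', 'srm://example.bnl.gov')")  # placeholder

    def get_parameter(site, parameter):
        """Fetch a configuration parameter instead of hard-coding it."""
        row = conn.execute(
            "SELECT value FROM site_config WHERE site=? AND parameter=?",
            (site, parameter)).fetchone()
        return row[0] if row else None

    print(get_parameter("BNL", "srm_endpoint"))  # srm://example.bnl.gov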
AGIS Architecture Overview
The system architecture should make it possible to add new classes of information or site configuration parameters, to reconfigure the ATLAS cloud topology and production queues, and to add and modify user information.
AGIS is an ORACLE-based information system.
AGIS stores, as read-only, data extracted from external databases (e.g., OIM, GOCDB, BDII), together with ADC configuration information, which can be modified.
The synchronization of AGIS content with the external sources will be done by agents (data providers).

Dashboard work in progress (…shboard/wiki/WorkInProgress):
  o HTTP for transport
  o JSON for data serialization
• Attempt to have a common (single) dashboard client application
  o Built using the Google Web Toolkit (GWT)
• Source data exposed directly from its source (like the Panda database)
  o Avoid aggregation databases like we have today
  o Server-side technology left open
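The HTTP + JSON transport choice amounts to a client along these lines; the URL and the response shape below are placeholders, not a real dashboard endpoint.

    import json
    import urllib.request

    # Minimal sketch of a dashboard client: fetch JSON-serialized monitoring
    # data over HTTP. The endpoint and payload structure are assumptions.
    def fetch_running_jobs(base_url="http://dashboard.example.org/api"):
        with urllib.request.urlopen(base_url + "/jobs?state=running") as resp:
            payload = json.load(resp)   # JSON for data serialization
        return payload["jobs"]          # assumed response shape

    for job in fetch_running_jobs():
        print(job["site"], job["status"])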
Summary & Conclusions
The ATLAS Collaboration has developed a set of software and middleware tools that give all members of the collaboration access to data for physics analysis, independently of their geographical location.
The main building blocks of this infrastructure are:
the Distributed Data Management system;
Ganga and pathena for distributed analysis on the Grid;
the Production System to (re)process and simulate ATLAS data.
Almost all required functionality is already provided and is extensively used, for simulated data as well as for real data from beam and cosmic ray events.
The Grid Information System technical proposal is finalized, and the system must be in production by the end of the year.
Monitoring system standardization is in progress.