ATLAS Distributed Analysis Plans
David Adams, BNL
December 2, 2003
ATLAS software workshop, CERN
ADA Plans ATLAS SW – Grid session December 2, 2003 2
Contents
DAC mandate
Scope
Strategy
Scenario for first release
Plans for the first release
GANGA status
DIAL status
Deliverables for the first release
Conclusions
DAC Mandate
Distributed Analysis Coordinator
• Is responsible for coordinating the development of software tools for distributed analysis and their integration into the ATLAS software environment
• Start with the analysis of existing tools such as GANGA, DIAL, AtCom…
• Provide users with transparent access to metadata of different sorts as well as to event data in all stages of processing
• Participate actively in the definition of LCG projects such as ARDA
• Is a member of relevant LCG committees and working groups
Scope
Analysis (not necessarily distributed)
• Supports the manipulation and extraction of summary data (e.g. histograms) from any type of event data
– AOD, ESD, …
• Supports user-level production of event data
– e.g. MC generation, simulation and reconstruction
Distributed analysis
• Extends the extraction and production support to include distributed processing and distributed data
• Natural extension of non-distributed analysis
• Easily invoked from any ATLAS analysis environment
– including Python, ROOT, command line
– easily ported to any future environment (e.g. JAS)
Strategy
Implement DA as a collection of grid services
• As described in ARDA document
• Use ARDA components where possible
• Add missing and ATLAS-specific pieces
Provide clients for ATLAS analysis environments
• Python, ROOT, command line
Regular releases
• Perhaps for each SW week and ATLAS X.0
• Provide useful tool
• Demonstrate functionality
• Expand functionality with each release
Strategy (cont)
Look to common projects for most of the pieces
• ARDA, GANGA, DIAL, …
• Share as much as possible with ATLAS production
– Also distributed
– Similar interfaces and code for bulk and user-level production
• ADA (ATLAS distributed analysis) must identify these pieces and tie them together
Deployment
• ADA services must be deployed at relevant sites
• Provide testing and monitoring of these services
• Work with facilities to deploy and maintain
– Also to develop facility-specific features
Scenario for first release
Here is a scenario for user interaction with the first release of ADA
• Authenticate
– Proxy from authentication service
• Choose application
– E.g. PAW to process DC1 ntuples
– Or Athena to process DC2 AOD
– Also Athena reconstruction?
• Define task
– Analysis: provide code to define and fill histograms
– Production: athena job options, maybe code
– Perhaps select starting point from provenance catalog
• Select input dataset
– From dataset metadata catalog service
Scenario for first release (cont)
• Create job configuration
– Response time, role, …
• Locate processing service
• Submit job
– Application, task, dataset, configuration
• While job is running
– Query service for status and partial results
– Examine partial results (e.g. histograms)
– Kill job if results are bad
• When job is finished
– Examine complete result
– Modify task or select new dataset and repeat
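The submit, monitor and kill steps of this scenario can be sketched from the client side. This is a minimal illustration only: `ToyProcessingService`, `Job`, `run_scenario` and the "bad result" rule are invented for the example and are not an ADA, DIAL or GANGA interface. Authentication and service location are left out, and the "service" is an in-memory stand-in that fills a histogram a few entries at a time.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """A submitted job as seen by the client (hypothetical structure)."""
    application: str
    task: str
    dataset: list
    config: dict
    processed: int = 0
    histogram: dict = field(default_factory=dict)


class ToyProcessingService:
    """In-memory stand-in for a processing service (not a real API)."""

    def submit(self, application, task, dataset, config):
        return Job(application, task, list(dataset), dict(config))

    def advance(self, job, n_entries=2):
        """Pretend the service processed a few more entries of the dataset."""
        chunk = job.dataset[job.processed:job.processed + n_entries]
        for value in chunk:
            job.histogram[value] = job.histogram.get(value, 0) + 1
        job.processed += len(chunk)

    def status(self, job):
        return "done" if job.processed >= len(job.dataset) else "running"


def run_scenario(service, dataset, max_allowed=None):
    """Submit a job, look at partial results while it runs, kill it if they look bad."""
    config = {"response_time": "interactive"}  # hypothetical job configuration
    job = service.submit("toy-app", "count-values", dataset, config)
    while service.status(job) == "running":
        service.advance(job)
        # Examine the partial histogram; here "bad" means a value above max_allowed.
        if max_allowed is not None and any(v > max_allowed for v in job.histogram):
            return "killed", job.histogram
    return "done", job.histogram
```

Calling `run_scenario(ToyProcessingService(), [1, 2, 2, 3])` runs to completion, while passing `max_allowed=2` stops the job as soon as the partial histogram contains the value 3.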
Plan for first release
Schedule
• Implement and deploy in advance of March 2004 software workshop
• Provide starting point for discussion at that meeting
Building blocks
• Code and developers in GANGA and DIAL
– Following sections summarize current status
• LCG project following from ARDA
– Just starting; so don’t wait but
– Stay closely coupled to that project
• Open to contributions (especially effort) from others
Ganga: status update (1)
Work since September software week has focused on refactoring, to create a system that is more modular and more flexible
• In short-term (next 1-2 months), changes will mainly affect developers
• In longer term (in time for DC2) will see significant gains for users: improvements in functionality, ease of use and stability
Have introduced PyBus software bus, developed by W. Lavrijsen with contributions from K. Harrison
• Allows association between module and logical name to be made at run time
• Makes system more configurable: supports ATLAS/LHCb customizations and user add-ons
Moving to XML-based job description
• Mechanics have been worked out, but still defining details of XML schema
• Aim to have job description consistent with DIAL (and others?)
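The software-bus idea on this slide, binding a logical name to a module at run time, can be illustrated in a few lines. This is a sketch of the general technique only: `ToyBus` and its methods are invented for the example and do not reproduce the PyBus interface. Standard-library modules stand in for experiment-specific components.

```python
import importlib


class ToyBus:
    """Maps logical names to module names and imports the module on first use."""

    def __init__(self):
        self._bindings = {}  # logical name -> module name
        self._loaded = {}    # logical name -> imported module

    def bind(self, logical_name, module_name):
        """Associate (or re-associate) a logical name with a module at run time."""
        self._bindings[logical_name] = module_name
        self._loaded.pop(logical_name, None)  # drop any previously loaded module

    def get(self, logical_name):
        """Return the module currently bound to the logical name."""
        if logical_name not in self._loaded:
            module_name = self._bindings[logical_name]
            self._loaded[logical_name] = importlib.import_module(module_name)
        return self._loaded[logical_name]


bus = ToyBus()
bus.bind("serializer", "json")            # default binding
text = bus.get("serializer").dumps({"a": 1})
bus.bind("serializer", "pickle")          # a customization swaps the implementation
```

Client code only ever asks for `"serializer"`, so an experiment-specific or user-supplied module can replace the default without changes to the callers.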
Ganga: status update (2)
Job-options editor (JOE) is evolving to become a more powerful, standalone component, which will be loaded by Ganga
• Assist user in the creation/modification of Gaudi/Athena job options by presenting the user with a hierarchical view of available options files and helping the user with value entry
• In process of creating Job Options Information Resource (JOIR) database
– JOIR database of job options will facilitate validation by providing valid ranges, valid option choices, and option descriptions
– Considering suggestions from LHCb for improving automated job-option extraction
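Validation against valid ranges, valid choices and option descriptions could look roughly like the sketch below. The slides do not give the JOIR schema, so the record layout, the option names (`Example.MaxEvents`, `Example.Verbosity`) and the function `validate_option` are all hypothetical.

```python
# Hypothetical option records; the slides do not give the JOIR schema.
OPTION_INFO = {
    "Example.MaxEvents": {
        "type": int,
        "range": (0, 1000000),
        "doc": "Maximum number of events to process (illustrative).",
    },
    "Example.Verbosity": {
        "type": str,
        "choices": ("DEBUG", "INFO", "WARNING"),
        "doc": "Message verbosity (illustrative).",
    },
}


def validate_option(name, value, info=OPTION_INFO):
    """Return a list of problems for one option value; an empty list means valid."""
    record = info.get(name)
    if record is None:
        return [f"{name}: unknown option"]
    if not isinstance(value, record["type"]):
        expected = record["type"].__name__
        return [f"{name}: expected {expected}, got {type(value).__name__}"]
    problems = []
    if "range" in record:
        low, high = record["range"]
        if not low <= value <= high:
            problems.append(f"{name}: {value} is outside the valid range [{low}, {high}]")
    if "choices" in record and value not in record["choices"]:
        problems.append(f"{name}: {value!r} is not one of {list(record['choices'])}")
    return problems
```

An editor could call such a check as the user types a value and show the `doc` string alongside any reported problem.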
Ganga: job definition and submission to LCG
[Figure: flow diagram of Ganga job definition and submission to LCG. Labels in the figure: Application; Select Application; Algorithm Flow Edit; AlgorithmFlow; AlgParam Options; Edit AlgParam Options; Select Datasets; Dataset Options; Prepare AlgFlow Options and DLLs; AlgFlow Options; Prepare Sandbox; Sandbox; DLLs; JobOptions; File Catalogue slice; Submit Job; Metadata catalogue; AlgOptions catalogue; File catalogue.]
Ganga: future plans
Plans well defined up to March 2004
• Work towards Ganga/DIAL integration within ADA
• Enable job submission to LCG
• Release improved version of JOE
• Include interface to Pacman 3 for package installation
– Informal Pacman workshop pencilled in for January 2004
• More tentatively, looking at possibilities for interfacing to Atlantis for displaying event data
Request for GridPP funding beyond December 2004 requires ATLAS/LHCb work plan for Ganga up to September 2007
• Need to ensure ATLAS priorities are taken into account
DIAL status
Release 0.60
• Made in November
• Has application to process combined ntuple datasets with PAW
• Command line and ROOT clients
• Processing can be done by instantiating a private scheduler or by contacting a persistent web service
• Dataset catalogs have been implemented
– DSC – dataset selection catalog
– DRC – dataset replica catalog
– Datasets created for all DC1 combined ntuples
DIAL status (cont)
High-level JDL
• DIAL envisions a hierarchy of schedulers
• Interface to these schedulers constitutes a high-level JDL (job definition language)
– Job submission, monitoring and gathering of results
– See figure
• Would like to standardize this JDL so schedulers can be shared between projects and experiments
– See figure
Components of DIAL high-level JDL
[Figure: components and numbered interactions. Component labels: User Analysis (e.g. ROOT), Application (e.g. athena), Task, Code, Dataset 1, Dataset 2, Scheduler, Job 1, Job 2, Result (three boxes). Step labels: 1. Create or locate; 2. select; 3. Create or select; 4. select; 5. submit(app,tsk,ds); 6. splitDataset; 7. create; 9. fill; 10. gather.]
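The submit, splitDataset, fill and gather steps labelled in the figure can be sketched as a toy scheduler. This is an illustration under stated assumptions, not DIAL code: every function name below is invented, the "dataset" is a plain list, the "result" is a `Counter` used as a histogram, and the jobs run one after another in the same process.

```python
from collections import Counter


def split_dataset(dataset, n_parts):
    """Split a dataset (here a list) into at most n_parts contiguous sub-datasets."""
    size = max(1, -(-len(dataset) // n_parts))  # ceiling division, at least 1
    return [dataset[i:i + size] for i in range(0, len(dataset), size)]


def histogram_application(task, sub_dataset):
    """Toy application: fill a histogram with task(entry) for each entry."""
    return Counter(task(entry) for entry in sub_dataset)


def submit(application, task, dataset, n_jobs=2):
    """Toy scheduler: split the dataset, run one job per part, gather the results."""
    sub_datasets = split_dataset(dataset, n_jobs)
    partial_results = [application(task, part) for part in sub_datasets]  # each job fills a result
    total = Counter()
    for result in partial_results:  # gather partial results into one
        total.update(result)
    return total


def parity(entry):
    """Hypothetical user task: classify an entry as even (0) or odd (1)."""
    return entry % 2


result = submit(histogram_application, parity, [1, 2, 3, 4, 5], n_jobs=2)
```

Because results combine by simple addition, the gathered histogram does not depend on how the dataset was split, which is what allows a scheduler to hand sub-datasets to other schedulers in a hierarchy.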
DIAL status: sharing via JDL
[Figure: sharing schedulers via a common high-level JDL. Analysis environments: PI/SEAL, GANGA, ROOT, JAS, command line. Schedulers: PROOF, GANGA-LCG, Condor-G, GCE/Chimera, STAR, JDAP (JAS), DIAL-interactive. Other labels: High-level JDL, Grid services, ATLAS production, Plug-in clients, Portal/switch.]
Deliverables for first release
Comments
• Goal is to support the scenario outlined earlier
• Build on current GANGA and DIAL implementations and plans
• Emergence of ARDA project may change plans
• Coordination with ATLAS production may also lead to changes
• Add more tasks if more ideas and effort are found
Deliverables for first release (cont)
Authentication service
• GSI based
• Support both EDG and US certificates
High-level JDL
• Start from current DIAL interface
• Incorporate ideas from PPDG, ARDA, …
– If available in time
• This defines the interface (WSDL) for the following analysis and production services
Deliverables for first release (cont)
Interactive analysis service
• Build on existing DIAL scheduler service
– Add authentication
– Deploy as web or grid service
• Client schedulers
– Keep command line and ROOT clients
– Add Python (GANGA) client
> Possibly with associated GUI
• Application/task/dataset
– Keep PAW with Fortran task to fill histograms from HBOOK combined ntuples
– Add ROOT with C++ task to fill from ROOT ntuples?
– Add athena with C++ task to fill from AOD?
Deliverables for first release (cont)
User-level batch production service?
• Start from GANGA LCG submission service
– Add high-level JDL
– Requires GANGA to support client-server
• Other candidates for production services:
– GCE/Chimera
– DIAL
– New ATLAS production model
– Switch to choose between these
• Supported production tasks
– Reconstruction
– Simulation?
– Event generation?
– Fill histograms from AOD?
Deliverables for first release (cont)
Dataset and file catalog services
• Functionality:
– Means for users to select an input dataset
– Means for production to register output dataset
– Means for system (e.g. DIAL scheduler) to turn dataset specification into accessible physical files
• Start from AMI and DIAL
• Need file catalog and replication services
– Magda, RLS1, RLS2, …
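The three catalog functions listed above (select, register, resolve to physical files) can be sketched with two in-memory dictionaries, one playing the role of a selection catalog and one of a replica catalog. All names, identifiers, sites and paths below are invented for illustration, and nothing here reflects the actual AMI, DIAL, Magda or RLS interfaces.

```python
# Hypothetical catalog contents; names, identifiers, sites and paths are invented.
SELECTION_CATALOG = {
    "example.combined-ntuple.sample-a": "dataset-0001",  # dataset name -> dataset id
}
REPLICA_CATALOG = {
    "dataset-0001": {  # dataset id -> site -> physical files
        "site-a": ["/data/site-a/sample-a.part1", "/data/site-a/sample-a.part2"],
        "site-b": ["/store/site-b/sample-a.part1", "/store/site-b/sample-a.part2"],
    },
}


def select_dataset(name, selection_catalog=SELECTION_CATALOG):
    """User step: turn a dataset name into a dataset identifier."""
    return selection_catalog[name]


def register_dataset(name, dataset_id, site, files,
                     selection_catalog=SELECTION_CATALOG,
                     replica_catalog=REPLICA_CATALOG):
    """Production step: record an output dataset and where its files live."""
    selection_catalog[name] = dataset_id
    replica_catalog.setdefault(dataset_id, {})[site] = list(files)


def locate_files(dataset_id, site=None, replica_catalog=REPLICA_CATALOG):
    """System step: turn a dataset identifier into physical files, preferring one site."""
    replicas = replica_catalog[dataset_id]
    if site in replicas:
        return replicas[site]
    # No replica at the preferred site: fall back to the first site in sorted order.
    return replicas[sorted(replicas)[0]]
```

A scheduler running at `site-b` would call `locate_files(select_dataset(name), site="site-b")` and receive local paths, while a scheduler at an unlisted site would fall back to another replica.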
Conclusions
Distributed analysis is a new project for ATLAS
Philosophy
• Tightly integrate with non-distributed analysis
• Be neutral
– use client-server mechanism to support different analysis environments and different processing systems
• Be flexible
– capabilities (and hence demands) will change as technology evolves
• Be responsive to evolving user requirements
• Build on existing ideas and projects including GANGA, DIAL, ATLAS production and ARDA
Conclusions (cont)
Plan of action
• Define interface (high-level JDL)
• Quickly implement services for analysis, user-level production and dataset catalogs
• Expose to users, learn lessons and re-implement
• Repeat
More information
• Web site coming soon
• Mail to [email protected]