Page 1: Athena & the Grid Architectural View

[email protected] - Athena & Grid, Architecture View (23may02 - ATLAS/LHCb/GridPP WkShp @ Cosener's)

Athena & the Grid Architectural View

Craig E. Tull

HCG/NERSC/LBNL

ATLAS/LHCb/GridPP Workshop

Cosener's House - May 23, 2002

Page 2: Athena & the Grid Architectural View

What this talk is:

• What this talk is not:
  - Another presentation of GRAPPA.
  - See Rob's talk of yesterday.
• What this talk is:
  - An ATLAS perspective on the view of the Grid from the Athena/Gaudi Framework.
  - A seat-of-the-pants distillation of some impressions from this workshop's presentations.
  - Food for thought and discussions in this afternoon's session.
  - … and slightly Random.

Page 3: Athena & the Grid Architectural View

Athena/GAUDI Architecture

[Diagram. Box labels: Application Manager; Algorithms; Converters; Event Data Service; Transient Event Store; Detector Data Service; Transient Detector Store; Histogram Service; Transient Histogram Store; Persistency Service and Data Files (one of each per store); Message Service; JobOptions Service; Particle Property Service; Other Services.]

Page 4: Athena & the Grid Architectural View

Grid vs. Athena Services

Page 5: Athena & the Grid Architectural View

Bigger Picture

[Diagram. The Athena/GAUDI boxes (Application Manager; Algorithms; Converters; Event Selector; Event Data Service; Transient Event Store; Detector Data Service; Transient Detector Store; Histogram Service; Transient Histogram Store; Persistency Services; Message Service; JobOptions Service; Particle Property Service; Other Services) shown together with external boxes: Analysis Program; OS; Mass Storage; Event Database; PDG Database; DataSet DB; Monitoring Service; Histo Presenter; Job Service; Config. Service; Other.]

Page 6: Athena & the Grid Architectural View

Bucket of Cold Water

Page 7: Athena & the Grid Architectural View

Grid: The new paradigm

• The Grid offers a vision of computer resources that are: Distributed, Heterogeneous, Robust, and Integrated.
• Some concepts are qualitatively new.
  - Resource Discovery, Virtual Data, Reserved QoS
• Some concepts are quantitatively "new".
  - Number of sites/jobs/nodes/users.
• Some concepts are old wine in new skins.
  - Distributed processing
• Some are natural & "obvious" extensions of old concepts.
  - Unix Groups → VO, LFNs

Page 8: Athena & the Grid Architectural View

Grid Projects: Integrated?

• We've heard here about:
  - GANGA, GRAPPA, BOSS, AliEn
  - CMT, Pacman, Packman, DAR
  - WP1 JSS, GriPhyN Planner
  - Magda, WP2 Replica Service
  - NetLogger, Prophesy, GMA, R-GMA, GridView, Ganglia
  - VDL/IVDL, WP1 JDL, Condor ClassAds
  - EDG, PPDG, GriPhyN, GridPP, InfoGrid, CrossGrid, GGF, Monarch,…
• How do we take advantage of Grid capability while protecting ourselves from potential duplication/conflicts of roles & responsibility?

Page 9: Athena & the Grid Architectural View

Grid: Ready for PrimeTime?

• CHEP'98 - First HENP Grid (Clipper) Talk
  - #237 Directions and Issues for High Data Rate Wide Area Network Environments
• Many Grid projects are CS R&D. But production grids do exist (e.g. NASA InfoGrid), and indications are that Grid computing is gaining momentum in the non-HENP (i.e. mainstream) world.
• IBM/Globus Partnership - 12 developers

Page 10: Athena & the Grid Architectural View

ATLAS SW & Grid Projects

• The Grid does now offer advantages & functionality. More will certainly come.
• We cannot afford to wait to be handed the solution.
• APIs to Grid services need to be compatible with, or adapted to, Athena Services.
• ATLAS interests/requirements need to be communicated to Grid researchers/developers & DOE/NSF.
• Timelines for ATLAS need to be defined.
  - Grid timeline is not the same as some others
  - FTE resources avail. are critical input
• Much current work concentrates on issues like:
  - Data Volume, Data Set Distribution, ATLAS Resources (Disk, CPU, HMS), Network Connectivity, $$$, FTE, etc.
• Distributed Computing Model must be defined.
• Control Framework
  - Grid-compatible / Grid-aware, but not Grid-dependent

Page 11: Athena & the Grid Architectural View

Grid aware, but not dependent.

• Interface Technologies
  - Programmatic API (e.g. C, C++, etc.)
  - Scripting as Glue à la Stallman (e.g. Python)
  - JobOptions.{txt,py}
  - Sandbox
  - Others?
    - e.g. SOAP, CORBA, RMI, DCOM, .NET, etc.
• International Standards would help!
  - Global Grid Forum
• Staged approach is called for.
  - Simple Batch model to begin. Add simple Grid functionality via Services. Continual feedback.

Page 12: Athena & the Grid Architectural View

Athena/Grid Interface

• For the programmatic interface to Grid services, we are thinking in terms of Gaudi services to capture and present the functionality of the grid services (not necessarily a one-to-one mapping, BTW).
• I think it is important at this stage (maybe forever) to ensure that the framework is "grid-capable" without being "grid-dependent", i.e. we should always be able to run without grid services available.
  - Gaudi's component architecture makes this approach to using the grid quite natural.
  - How do we switch between Grid/non-Grid?
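One way to picture the Grid/non-Grid switch is a single service interface with two interchangeable implementations, selected by a configuration value. The Python sketch below is only an illustration under stated assumptions: `FileLocatorSvc`, `LocalFileLocator`, `GridFileLocator`, `make_locator` and the `use_grid` option are hypothetical names, not Gaudi or Athena APIs, and the catalogue is a plain dictionary standing in for a real replica service.

```python
from abc import ABC, abstractmethod


class FileLocatorSvc(ABC):
    """Hypothetical service interface: turn a file name into something openable."""

    @abstractmethod
    def locate(self, name: str) -> str:
        ...


class LocalFileLocator(FileLocatorSvc):
    """Non-Grid implementation: names are already local paths."""

    def locate(self, name: str) -> str:
        return name


class GridFileLocator(FileLocatorSvc):
    """Grid implementation: look the logical name up in a catalogue.

    The catalogue is a dict standing in for a real replica service.
    """

    def __init__(self, catalogue: dict):
        self._catalogue = catalogue

    def locate(self, name: str) -> str:
        replicas = self._catalogue.get(name, [])
        if not replicas:
            raise LookupError(f"no replica registered for {name}")
        return replicas[0]


def make_locator(options: dict) -> FileLocatorSvc:
    """Pick the implementation from job options; default to the non-Grid one."""
    if options.get("use_grid", False):
        return GridFileLocator(options.get("catalogue", {}))
    return LocalFileLocator()


# The calling code is identical in both configurations.
standalone = make_locator({})
gridded = make_locator({
    "use_grid": True,
    "catalogue": {
        "lfn://cat.example.org/run1": ["pfn://se.example.org/data/run1.root"],
    },
})
local_name = standalone.locate("/tmp/run1.root")
grid_name = gridded.locate("lfn://cat.example.org/run1")
```

Because the default is the non-Grid implementation, a job with no Grid options still runs, which is the "capable but not dependent" property in miniature.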

Page 13: Athena & the Grid Architectural View

Jul'01: PSEUDOCODE FOR ATLAS SHORT TERM UC01

Logical File Name
    LFN = "lfn://" hostname "/" any_string

Physical File Name
    PFN = "pfn://" hostname "/" path

Transfer File Name
    TFN = "gridftp://" PFN_hostname "/" path

JDL
    InputData = {LFN[]}
    OutputSE = host.domain.name

Worker Node
    LFN[] = WP1.LFNList()
    for (i = 0; i < LFN.length; i++) {
        PFN[] = ReplicaCatalog.getPhysicalFileNames(LFN[i])
        j = Athena.eventSelectionSrv.determineClosestPF(PFN[])
        localFile = GDMP.makeLocal(PFN[j], OutputSE)
        Athena.eventSelectionSrv.open(localFile)
    }

PFN[] = getPhysicalFileNames(LFN)
PFN = getBestPhysicalFileName(PFN[], String[] protocols)
TFN = getTransportFileName(PFN, String protocol)
filename = getPosixFileName(TFN)
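The worker-node loop in the pseudocode above can be mimicked in plain Python to make the LFN to PFN to local-file chain concrete. This is a minimal sketch, not the WP1, WP2 or GDMP interfaces: `FakeReplicaCatalog`, `closest_replica`, `transfer_name`, `make_local` and `resolve_inputs` are hypothetical stand-ins, "closest" is assumed to mean "on the worker node's own host", and nothing is actually copied.

```python
from urllib.parse import urlparse


class FakeReplicaCatalog:
    """In-memory stand-in for the replica catalogue (LFN -> list of PFNs)."""

    def __init__(self, mapping):
        self._mapping = mapping

    def get_physical_file_names(self, lfn):
        return list(self._mapping[lfn])


def closest_replica(pfns, local_host):
    """Assumed policy: prefer a replica on the local host, else take the first."""
    for index, pfn in enumerate(pfns):
        if urlparse(pfn).hostname == local_host:
            return index
    return 0


def transfer_name(pfn):
    """Build a TFN from a PFN by swapping the scheme for gridftp."""
    parsed = urlparse(pfn)
    return f"gridftp://{parsed.hostname}{parsed.path}"


def make_local(pfn, local_host):
    """Pretend to stage the file: return a POSIX path without copying anything."""
    parsed = urlparse(pfn)
    if parsed.hostname == local_host:
        return parsed.path
    return "/scratch" + parsed.path


def resolve_inputs(lfns, catalogue, local_host):
    """Worker-node loop: LFN -> PFN list -> chosen PFN -> local file name."""
    local_files = []
    for lfn in lfns:
        pfns = catalogue.get_physical_file_names(lfn)
        chosen = pfns[closest_replica(pfns, local_host)]
        local_files.append(make_local(chosen, local_host))
    return local_files


catalogue = FakeReplicaCatalog({
    "lfn://cat.example.org/run1": [
        "pfn://se1.example.org/data/run1.root",
        "pfn://wn.example.org/cache/run1.root",
    ],
    "lfn://cat.example.org/run2": ["pfn://se2.example.org/data/run2.root"],
})
files = resolve_inputs(
    ["lfn://cat.example.org/run1", "lfn://cat.example.org/run2"],
    catalogue,
    "wn.example.org",
)
tfn = transfer_name("pfn://se2.example.org/data/run2.root")
```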

Page 14: Athena & the Grid Architectural View

WP2: Replica Manager API (old: pre-SFN terminology)

• addPhysicalFileName(LogicalFileName, PhysicalFileName)
• deletePhysicalFileName(LogicalFileName, PhysicalFileName)
• SFN = getPhysicalFileNames(LogicalFileName)
• copy(PhysicalFileName source, PhysicalFileName destination, String protocol)
• copyAndAddPhysicalFile(PhysicalFileName source, PhysicalFileName destination, LogicalFileName lfn, String protocol)
• generatePhysicalFileName(LogicalFileName filename, PhysicalFileNamePattern)
• estimateCostForCopy(PhysicalFileName source, PhysicalFileName destination, String protocol)
• SFN = getLocationOfBestReplica(LogicalFileName)
• getBestPhysicalFileName(PhysicalFileNameList, ProtocolList)
• getTransportFileName(PhysicalFileName, Protocol)
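To show how a few of these calls fit together, here is a toy in-memory replica manager in Python. It is a sketch under assumptions, not the WP2 implementation: the class name `ToyReplicaManager`, the snake_case method names, and the per-host cost table are all hypothetical, and "best" is simply taken to mean the replica on the host with the lowest assumed cost.

```python
from collections import defaultdict


class ToyReplicaManager:
    """In-memory illustration of a few calls from the list above."""

    def __init__(self, host_cost):
        # host_cost maps a storage host to an assumed access cost (lower is better).
        self._replicas = defaultdict(list)
        self._host_cost = host_cost

    def add_physical_file_name(self, lfn, pfn):
        """Register a replica, ignoring exact duplicates."""
        if pfn not in self._replicas[lfn]:
            self._replicas[lfn].append(pfn)

    def delete_physical_file_name(self, lfn, pfn):
        """Unregister a replica; raises ValueError if it was never added."""
        self._replicas[lfn].remove(pfn)

    def get_physical_file_names(self, lfn):
        return list(self._replicas[lfn])

    def estimate_cost(self, pfn):
        """Look up the assumed cost of the host part of 'pfn://host/path'."""
        host = pfn.split("/")[2]
        return self._host_cost.get(host, float("inf"))

    def get_location_of_best_replica(self, lfn):
        replicas = self._replicas[lfn]
        if not replicas:
            raise LookupError(f"no replica registered for {lfn}")
        return min(replicas, key=self.estimate_cost)


manager = ToyReplicaManager({"near.example.org": 1.0, "far.example.org": 10.0})
lfn = "lfn://cat.example.org/run1"
manager.add_physical_file_name(lfn, "pfn://far.example.org/data/run1.root")
manager.add_physical_file_name(lfn, "pfn://near.example.org/data/run1.root")
best_before = manager.get_location_of_best_replica(lfn)
manager.delete_physical_file_name(lfn, "pfn://near.example.org/data/run1.root")
best_after = manager.get_location_of_best_replica(lfn)
```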

Page 15: Athena & the Grid Architectural View

Athena Distributed Instrumentation

• Part of SuperComputing 2002 ATLAS demo
• IMonitorSvc IChronoStatSvc extension?
  - Abstract application monitoring service.
• Prophesy (http://prophesy.mcs.anl.gov/)
  - An Infrastructure for Analyzing & Modeling the Performance of Parallel & Distributed Applications
  - Normally a Parse & auto-instrument approach (C & FORTRAN).
• NetLogger (http://www-didc.lbl.gov/NetLogger/)
  - End-to-End Monitoring & Analysis of Distributed Systems
  - C, C++, Java, Python, Perl, Tcl APIs
  - Web Service Activation
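An "abstract application monitoring service" might look roughly like the Python sketch below: algorithms publish named measurements to an interface, and the backend behind it can be swapped. All names here (`MonitorSvc`, `NullMonitor`, `RecordingMonitor`, `timed`) are hypothetical; this is neither the IMonitorSvc interface nor the Prophesy or NetLogger API, only an illustration of the idea.

```python
import time
from abc import ABC, abstractmethod
from contextlib import contextmanager


class MonitorSvc(ABC):
    """Hypothetical monitoring interface: publish a named numeric value."""

    @abstractmethod
    def publish(self, name: str, value: float) -> None:
        ...


class NullMonitor(MonitorSvc):
    """Backend used when no monitoring is configured: drops everything."""

    def publish(self, name: str, value: float) -> None:
        pass


class RecordingMonitor(MonitorSvc):
    """Backend that keeps measurements in memory, e.g. for later shipping."""

    def __init__(self):
        self.records = []

    def publish(self, name: str, value: float) -> None:
        self.records.append((name, value))


@contextmanager
def timed(monitor: MonitorSvc, name: str):
    """Time the enclosed block and publish the elapsed wall-clock seconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        monitor.publish(name + ".seconds", time.perf_counter() - start)


monitor = RecordingMonitor()
with timed(monitor, "TrackFinder.execute"):
    total = sum(range(1000))
```

Swapping `RecordingMonitor()` for `NullMonitor()` leaves the instrumented code unchanged, so instrumentation can stay in place even when no monitoring backend is available.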

Page 16: Athena & the Grid Architectural View

WP1: Sandbox

• Working area (input & output) replicated on each CE to which Grid job is submitted.
  - Very convenient & natural.
• My Concerns:
  - Requires network access (with associated privileges) to all CEs on Grid.
    - Could be a huge security issue with local administrators.
  - Not (yet) coordinated with WP2 services.
  - Sandbox contents not customizable to local (CE/SE/PFN) environment.
  - Temptation to Abuse (not for data files)

Page 17: Athena & the Grid Architectural View

[Diagram. Labels: Grid System; Planner; ATLAS planner; JDL; JobOptions; Sandbox; Job; WP1 JSS; WP2 Rep Mgr; Magda; GDB; GDB input; GDB Output fragment; Physical File; Specify input logical filenames; Register output.]

Page 18: Athena & the Grid Architectural View

ATLAS SW & the Grid

• What are the implications of a distributed computing model and grids for:
• The database domain?
  - Extensive in almost any case
• The control framework?
  - Depends upon the model (e.g., distributed data sources versus distributing executables versus distributed execution)
• Other ATLAS software infrastructure?
  - e.g. Build & install tools & kits

Page 19: Athena & the Grid Architectural View

Distributed Processing Models

• Batch-like Processing (à la WP1)
• Distributed Single Event (MPP)
• Client-Server (interactive)
• WAN Data Access (AMS, Clipper)
• File Transfer and Local Processing (GDMP)
• Agent-based Processing (distributed control)
• Check-Point & Migrate (save & restore)
• Scatter & Gather (parallel events)
• Move the data or move the executable?
  - No experiment is planning to write PetaBytes of Code!
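As a small illustration of the Scatter & Gather model in the list above, the Python sketch below hands independent events to a pool of workers and collects the results in input order. It is a toy under stated assumptions: threads in one process stand in for distributed nodes, and `process_event` and `scatter_gather` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def process_event(event):
    """Stand-in for per-event work: sum the hit energies of one event."""
    return sum(event["hits"])


def scatter_gather(events, workers=4):
    """Scatter events to a worker pool, then gather results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_event, events))


events = [{"id": i, "hits": [i, i + 1, i + 2]} for i in range(5)]
results = scatter_gather(events)
```

The pattern works because events are independent of one another; only the gather step needs to know the original ordering.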

Page 20: Athena & the Grid Architectural View

ATLAS Distributed Processing Model

• At this point, it is still not clear what the final ATLAS distributed computing model will be. Although newer ideas like Agent-based Processing have a great deal of appeal, they are as yet unproven in a large-scale production environment.
• A conservative approach would be some combination of Batch-like Processing and File Transfer and Local Processing for batch jobs, with perhaps a Client-Server or Scatter-Gather approach for interactive/analysis jobs.
  - PPDG CS-11 - Interfacing and Integrating Interactive Data Analysis Tools with the Grid and Identifying Common Components and Services

Page 21: Athena & the Grid Architectural View

Data Access Patterns

• Data access patterns of physics jobs also heavily influence our thinking about interacting with the Grid. It is likely that all possible data access patterns will be extant in ATLAS data processing at various stages in that processing. We may find that some data access patterns lend themselves to efficient use of the Grid much better than others.
• Data access patterns include:
  - Sequential Access (reconstruction)
  - Random Access (interactive analysis)
  - File/Data Set Driven (LFN-friendly)
  - Navigational Driven (OODB-like)
  - Query Driven (SQL/OQL/JDO/etc.)

Page 22: Athena & the Grid Architectural View

DB Architectural Elements

• Events are write-once
• Three capabilities to support optimization:
  - Event sharing
  - Data sharing
  - Data placement (clustering)
• Therefore, different storage formats
  - Does not mean different technologies!
  - Different ways to represent events and sets of events.
  - Possible because navigation is separated from storage.
  - Examples…

ATLAS DataBase Architecture - Ed Frank

Page 23: Athena & the Grid Architectural View

Architectural Motif - Extract & Transform

• Architecture will express many storage formats
  - Any job can read any of them without reconfiguration
• Can always extract events for transport, regardless of format
  - Cost depends upon the storage format
• Tier 0 assigned responsibility of keeping a copy of the data in a format such that extraction costs are affordable
  - Archival data format
• Can always transform (write) data into a new format
  - Store in format for local optimization
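The extract and transform operations can be sketched with two toy storage formats that share one reading interface. This Python example is an illustration only: `RowStore`, `ColumnStore` and `transform` are hypothetical names, events are plain dictionaries, and every event is assumed to carry the same keys including a unique `id`.

```python
class RowStore:
    """Event-wise format: one record per event, keyed by event id."""

    def __init__(self, events=()):
        self._events = {event["id"]: dict(event) for event in events}

    def ids(self):
        return list(self._events)

    def extract(self, ids):
        """Cheap here: one dictionary lookup per requested event."""
        return [dict(self._events[i]) for i in ids]


class ColumnStore:
    """Attribute-wise format: one list per attribute, aligned by position."""

    def __init__(self, events=()):
        self._columns = {}
        for event in events:
            for key, value in event.items():
                self._columns.setdefault(key, []).append(value)

    def ids(self):
        return list(self._columns.get("id", []))

    def extract(self, ids):
        """Costlier here: each event is reassembled from every column."""
        rows = []
        for wanted in ids:
            position = self._columns["id"].index(wanted)
            rows.append({key: col[position] for key, col in self._columns.items()})
        return rows


def transform(source, target_cls):
    """Write the contents of one store into a new store of another format."""
    return target_cls(source.extract(source.ids()))


events = [{"id": 1, "pt": 10.0}, {"id": 2, "pt": 20.5}]
rows = RowStore(events)
cols = transform(rows, ColumnStore)
one_event = cols.extract([2])
round_trip = transform(cols, RowStore).extract([1, 2])
```

A job that only calls `ids()` and `extract()` reads either format without reconfiguration, while the cost of `extract()` differs between the two, which mirrors the first two bullets above.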

Page 24: Athena & the Grid Architectural View

Extract and Transform

[Diagram. Labels: Site 1; Site 2; Site 3; "Transport & Install"; "Extract & transform"; "Just Extract"; "Transport, transform & Install".]

ATLAS DataBase Architecture - Ed Frank

Page 25: Athena & the Grid Architectural View

Object Access vs File Access

• ATLAS (like others) is basing its Event Data Model (EDM) on a (transient) Object Data Model.
  - This transient model maps onto a persistent Object Model (not necessarily 1-to-1)
• We require users to think of objects in the transient store at the Algorithm level.
  - Transient Data Store has data access proxy concepts built in to read in objects from persistency to TDS.
• Current Grid products are heavily oriented towards an LFN-like view of data.
  - Perfectly natural, as this is the system-level view of data & a convenient unit for atomic data transfer across the network (e.g. FTP, GridFTP).
• BUT, if we want users to think objects, the object to LFN/PFN mapping has to be somewhere.
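One place the object-to-file mapping could live is behind the data access proxy: the user asks for an object by key, and the proxy consults an index to find the logical file and entry only when the object is first needed. The Python sketch below illustrates that idea under assumptions; `ObjectLocator`, `DataProxy`, `read_entry` and the key format are hypothetical, and a dictionary plays the role of the files.

```python
class ObjectLocator:
    """Hypothetical index from an object key to (LFN, entry number)."""

    def __init__(self, index):
        self._index = dict(index)

    def where(self, key):
        return self._index[key]


class DataProxy:
    """Loads its object from persistency on first access, then caches it."""

    def __init__(self, key, locator, reader):
        self._key = key
        self._locator = locator
        self._reader = reader
        self._loaded = False
        self._value = None

    def get(self):
        if not self._loaded:
            lfn, entry = self._locator.where(self._key)
            self._value = self._reader(lfn, entry)
            self._loaded = True
        return self._value


# A dictionary stands in for files reachable through their logical names.
fake_files = {"lfn://cat.example.org/run1": [{"tracks": 12}, {"tracks": 7}]}
reads = []


def read_entry(lfn, entry):
    """Stand-in reader that records each access so caching can be observed."""
    reads.append((lfn, entry))
    return fake_files[lfn][entry]


locator = ObjectLocator({"run1/event2/Tracks": ("lfn://cat.example.org/run1", 1)})
proxy = DataProxy("run1/event2/Tracks", locator, read_entry)
first = proxy.get()
second = proxy.get()
```

The algorithm-level code sees only the key and the object; the LFN appears solely inside the locator and reader.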

Page 26: Athena & the Grid Architectural View

Ganga Scenarios

• Scenario 1
  - User makes a "high-level" selection of data to process and defines processing job.
    - "High-level" means based on event characteristics and not on file or event identity.
  - High-level event selection uses ATLAS Bookkeeping DataBase (similar to current LArC Bookkeeping data base or BNL's Magda) to select event & logical file identities.
  - Construct JDL for WP1 using LFNs
  - Construct jobOptions.py using PFNs (w/ WP2)
  - Submit job(s) using JDL & jobOptions.py in sandbox.
• Scenario 2 - The same except jobOptions.py now contains LFNs. This requires the Replica Service API-enabled EvtSelector or ConversionSrv.
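The difference between the two scenarios can be shown by generating the job description and the job options from a bookkeeping selection. The Python sketch below is illustrative only: the bookkeeping records, the `n_jets` attribute, the `input_files` option name and the exact JDL punctuation are assumptions, and only the `InputData` and `OutputSE` field names come from the pseudocode slide earlier in the talk.

```python
def select_lfns(bookkeeping, predicate):
    """Return the sorted LFNs of files holding at least one selected event."""
    return sorted({record["lfn"] for record in bookkeeping if predicate(record)})


def build_jdl(lfns, output_se):
    """Render the InputData and OutputSE fields (punctuation is assumed)."""
    quoted = ", ".join(f'"{lfn}"' for lfn in lfns)
    return f"InputData = {{{quoted}}};\nOutputSE = \"{output_se}\";\n"


def build_job_options(names):
    """Render a jobOptions.py fragment; `input_files` is a hypothetical option."""
    return f"input_files = {list(names)!r}\n"


bookkeeping = [
    {"lfn": "lfn://cat.example.org/run1", "n_jets": 3},
    {"lfn": "lfn://cat.example.org/run2", "n_jets": 0},
    {"lfn": "lfn://cat.example.org/run1", "n_jets": 5},
]
# Stand-in for a replica lookup: one PFN per LFN.
replicas = {"lfn://cat.example.org/run1": "pfn://se.example.org/data/run1.root"}

lfns = select_lfns(bookkeeping, lambda record: record["n_jets"] >= 2)
jdl = build_jdl(lfns, "se.example.org")

# Scenario 1: names are resolved to PFNs before submission.
options_scenario1 = build_job_options([replicas[lfn] for lfn in lfns])
# Scenario 2: LFNs go into the options and are resolved inside the job.
options_scenario2 = build_job_options(lfns)
```

In Scenario 1 the resolution happens on the submitting side; in Scenario 2 the same lookup has to be available to the event selector or conversion service at run time.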

Page 27: Athena & the Grid Architectural View

Observation about GUIs

• Several projects are promoting GUIs.
  - WP1, Grappa, AliEn, others.
• Independently written "native" GUIs are notoriously difficult to integrate/make coherent.
• Web-based GUIs are easier to integrate, but offer limited functionality.

Page 28: Athena & the Grid Architectural View

Rule #1: Protect the User

• Real Data vs. Virtual Data
• LFN vs. PFN/TFN/SFN
• Grid Enabled vs. Standalone
• We do not want the user of the Framework to know or care about details like this.
  - Implies: Uniform, abstract access to/specification of data sets (i.e. if Real and Virtual Data are to be used).
  - Dummy (non-Grid) implementations of Grid-enabled Services?
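Uniform access to real and virtual data could be sketched as one data-set interface with two implementations: one backed by stored events, the other by a recipe that is applied only when the events are first requested. The Python example below is a toy under assumptions; `DataSet`, `RealDataSet`, `VirtualDataSet` and `mean_energy` are hypothetical names, and "virtual" is simplified to "derived on demand from another data set".

```python
from abc import ABC, abstractmethod


class DataSet(ABC):
    """Hypothetical uniform view of a data set, real or virtual."""

    @abstractmethod
    def events(self):
        ...


class RealDataSet(DataSet):
    """Events that already exist in storage."""

    def __init__(self, stored):
        self._stored = list(stored)

    def events(self):
        return list(self._stored)


class VirtualDataSet(DataSet):
    """Events defined by a recipe and produced only when first requested."""

    def __init__(self, source, recipe):
        self._source = source
        self._recipe = recipe
        self._cache = None

    def events(self):
        if self._cache is None:
            self._cache = [self._recipe(event) for event in self._source.events()]
        return list(self._cache)


def mean_energy(dataset):
    """User code: works on any DataSet without knowing which kind it is."""
    values = [event["energy"] for event in dataset.events()]
    return sum(values) / len(values)


raw = RealDataSet([{"energy": 10.0}, {"energy": 30.0}])
calibrated = VirtualDataSet(raw, lambda event: {"energy": event["energy"] * 1.5})
raw_mean = mean_energy(raw)
calibrated_mean = mean_energy(calibrated)
```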

Page 29: Athena & the Grid Architectural View

Way Forward/Discussion

• Goal: Give direction to new hires funded by GridPP to ensure that their work has the widest applicability in both ATLAS & LHCb.
• Discussion Questions:
  - Data-File or Data-Object level access?
  - Heterogeneity - How much? (Client vs. Server)
  - Communication Protocols?
  - How to synchronize/coordinate?
    - ATLAS world-wide & Large Active US effort
    - LHCb - no US component => more EDG-centric
  - GAUDI/Athena - Where to draw the line?
    - Grid middleware/Svc Interfaces/Implementations
  - Balance Short-term Usability vs. Long-term Functionality - Remember the mainstream.