Top Banner
Management of Large Scale Data Productions for the CMS Experiment

Presented by L.M.Barone, Università di Roma & INFN

ACAT2000 - FNAL, 16-20 October 2000




The Framework

• The CMS experiment is producing a large amount of MC data for the development of High Level Trigger (HLT) algorithms for fast data reduction at the LHC

• Current production is half traditional (Pythia + CMSIM/Geant3) and half object-oriented (ORCA using Objectivity/DB)


The Problem

• Data size: ~10^6 - 10^7 events at 1 MB/event, i.e. ~10^4 files (typically 500 events/file)

• Resource dispersion: many production sites (CERN, FNAL, Caltech, INFN, etc.)

We are dealing with the current MC productions, not with 2005 data taking
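As a back-of-the-envelope check of the figures above (1 MB/event and 500 events/file are the slide's numbers; the script is only an illustration), the totals work out as follows:

```python
# Rough sizing of a production round, using the slide's figures:
# 10^6 - 10^7 events, ~1 MB per event, typically 500 events per file.
EVENT_SIZE_MB = 1.0
EVENTS_PER_FILE = 500


def production_size(n_events: int) -> tuple[int, float]:
    """Return (number of files, total volume in TB) for n_events."""
    n_files = -(-n_events // EVENTS_PER_FILE)  # ceiling division
    volume_tb = n_events * EVENT_SIZE_MB / 1e6  # decimal units: 1 TB = 10^6 MB
    return n_files, volume_tb


if __name__ == "__main__":
    for n in (10**6, 10**7):
        files, tb = production_size(n)
        print(f"{n:>10,d} events -> {files:>6,d} files, ~{tb:g} TB")
```

That is 2,000 to 20,000 files and roughly 1 to 10 TB per round, consistent with the ~10^4 files quoted.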


The Problem (cont’d)

• Data relocation: data produced at site A are stored centrally (CERN); site B may need a fraction of them; the combinatorics increase with the number of sites

• Objectivity/DB does not make life easier (but the problem would exist anyway)
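The growth of the combinatorics can be made concrete under a simple assumption (ours, not the slide's): any site may need part of the data produced at any other site. The number of ordered (producer, consumer) pairs is then N(N-1), which this sketch counts for the example sites named above:

```python
from itertools import permutations


def transfer_pairs(sites: list[str]) -> list[tuple[str, str]]:
    """All ordered (producer, consumer) pairs of distinct sites."""
    return list(permutations(sites, 2))


if __name__ == "__main__":
    sites = ["CERN", "FNAL", "Caltech", "INFN"]
    for k in range(2, len(sites) + 1):
        pairs = transfer_pairs(sites[:k])
        # Grows as k * (k - 1): 2, 6, 12, ...
        print(f"{k} sites -> {len(pairs)} possible producer/consumer routes")
```

Storing everything centrally at CERN turns each route into an upload plus a download, but the bookkeeping of who needs which fraction still grows with the number of pairs.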


ORCA Production 2000

[Figure: data flow of the ORCA Production 2000. MC production: HEPEVT ntuples → CMSIM → Zebra files with HITS (signal). ORCA production: ORCA ooHit Formatter → Objectivity Database; minimum-bias (MB) events in a separate Objectivity Database; ORCA Digitization (merge signal and MB) → Objectivity Database. HLT group databases: HLT Algorithms → new Reconstructed Objects. Catalog import into mirrored DBs (US, Russia, Italy, ...).]


The Old Days

• Question: how was it done before?

Answer: with a mix of ad hoc scripts and programs and a lot of manual intervention, but the problem was smaller and less dispersed


Requirements for a Solution

• The solution must be as automatic as possible, to decrease manpower

• Tools should be independent of data type and of site

• Network traffic should be optimized (or minimized?)

• Users need complete information on data location


Present Status

• Job creation is managed by a variety of scripts at different sites

• Job submission again goes through diverse methods, from plain UNIX commands to LSF or Condor

• File transfer has been managed up to now by Perl scripts, which are neither generic nor site-independent
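A site-independent layer over the submission methods listed above could start as a simple dispatch table. This is a hypothetical sketch, not one of the existing production scripts: it only builds the command line (`bsub` for LSF, `condor_submit` with a submit description file for Condor, a plain shell for UNIX), and the backend keys and job file names are assumptions.

```python
import shlex

# Hypothetical mapping from a site's batch system to its submission command.
# For Condor the argument must be a submit description file, not the job script.
SUBMIT_COMMANDS = {
    "unix": ["sh"],               # run the job script directly
    "lsf": ["bsub"],              # LSF: bsub <command>
    "condor": ["condor_submit"],  # Condor: condor_submit <submit file>
}


def build_submit_command(backend: str, job_file: str) -> list[str]:
    """Return the argv list that would submit job_file on the given backend."""
    try:
        prefix = SUBMIT_COMMANDS[backend]
    except KeyError:
        raise ValueError(f"unknown batch backend: {backend!r}") from None
    return prefix + [job_file]  # new list; the table itself is not modified


if __name__ == "__main__":
    jobs = [("unix", "run_cmsim.sh"), ("lsf", "run_cmsim.sh"), ("condor", "run_cmsim.sub")]
    for backend, job in jobs:
        print(f"{backend:>6} -> {shlex.join(build_submit_command(backend, job))}")
```

Keeping the site differences in one table means the job-creation scripts above it need not know which batch system a site runs.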


Present Status (cont’d)

• The autumn 2000 production round is a trial towards standardization:
  – same layout (OS, installation)
  – same scripts (T.Wildish) for non-Objy data transfer
  – first use of GRID tools (see talk by A.Samar)
  – validation procedure for production sites


Collateral Activities

• Linux + CMS software automatic installation kit (INFN)

• Globus installation kit (INFN)

• Production monitoring tools with Web interface


What is missing?

• Scripts and tools are still too specific and not robust enough: we need practice at this scale

• Information service needs a clear definition in our context and then an effective implementation (see later)

• File replication management is just appearing and needs careful evaluation


Ideas for Replica Management

• A case study with Objectivity/DB (thanks to C.Grandi, INFN Bologna)
  – can be extended to any kind of file


Cloning federations

• Cloned federations have a local catalog (boot file)
  – each of them can be managed independently; some databases may be attached to (or exist at) only one site
  – “Manual work” is needed to keep the schemas synchronized (this is not the key point today...)
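The independence of cloned federations can be modelled with a toy catalog. In this sketch (a hypothetical data structure, not the Objectivity/DB boot-file format, and ignoring schemas entirely) cloning copies the catalog, after which each clone can attach databases that exist only at its own site:

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Federation:
    """Toy federation: a boot-file name plus a catalog {db_name: location}."""
    boot_file: str
    catalog: dict[str, str] = field(default_factory=dict)

    def clone(self, boot_file: str) -> "Federation":
        # The clone starts from a copy of the catalog but gets its own boot file,
        # so later changes to one federation do not affect the other.
        return Federation(boot_file, copy.deepcopy(self.catalog))

    def attach(self, db_name: str, location: str) -> None:
        self.catalog[db_name] = location


if __name__ == "__main__":
    cern = Federation("CERN.boot", {f"DB{i}": "shift.cern.ch" for i in (1, 2, 3)})
    rc1 = cern.clone("RC1.boot")
    rc1.attach("DB_a", "pc.rc1.net")  # exists only in the RC1 clone
    print(sorted(cern.catalog))
    print(sorted(rc1.catalog))
```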


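Cloning federations

[Figure: the CERN federation (FD) with databases DB1, DB2, DB3, ..., DBn and the CERN boot file is cloned ("Clone FD") into federations at RC1 and RC2, each with its own boot file; additional databases DB_a and DB_b appear only in the cloned federations.]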


Productions

• Using a DB-id pre-allocation system, databases can be produced at RCs and then exported to other sites
  – a notification system is needed to inform other sites when a database is completed
  – this is currently done by GDMP using a publish-subscribe mechanism
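A DB-id pre-allocation scheme can be as simple as handing each RC a disjoint block of ids in advance, so that databases created independently never collide when they are exported. The central allocator, the block size and the class names below are assumptions for illustration; the slide does not say how ids are partitioned.

```python
from dataclasses import dataclass, field


@dataclass
class DbIdAllocator:
    """Central allocator handing out disjoint, contiguous blocks of DB ids."""
    block_size: int
    next_id: int = 1
    blocks: dict[str, list[range]] = field(default_factory=dict)

    def reserve(self, site: str) -> range:
        block = range(self.next_id, self.next_id + self.block_size)
        self.next_id += self.block_size
        self.blocks.setdefault(site, []).append(block)
        return block


class SiteIdPool:
    """Per-site pool that consumes ids from its pre-allocated block."""

    def __init__(self, block: range):
        self._ids = iter(block)

    def new_db_id(self) -> int:
        try:
            return next(self._ids)
        except StopIteration:
            raise RuntimeError("pre-allocated block exhausted; reserve a new one") from None


if __name__ == "__main__":
    alloc = DbIdAllocator(block_size=1000)
    rc1 = SiteIdPool(alloc.reserve("RC1"))
    rc2 = SiteIdPool(alloc.reserve("RC2"))
    print(rc1.new_db_id(), rc1.new_db_id(), rc2.new_db_id())
```

Only the reservation step needs coordination; afterwards each RC creates databases offline with ids that are unique by construction.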


Productions

• When a site receives a notification, it can:
  – ooattachdb the remote site's DB
  – copy the DB and ooattachdb it locally
  – ignore it
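The publish-subscribe flow and the three possible reactions can be sketched as a toy model. This is not GDMP's interface: each site has a hypothetical per-database policy, and "copy and attach" only records a made-up local path (the `/local/` prefix is an assumption) instead of transferring the file and running ooattachdb.

```python
from enum import Enum
from typing import Callable


class Action(Enum):
    ATTACH_REMOTE = 1    # ooattachdb the database where it lives
    COPY_AND_ATTACH = 2  # copy the file, then ooattachdb the local copy
    IGNORE = 3


class Site:
    def __init__(self, host: str, policy: Callable[[str], Action]):
        self.host = host
        self.policy = policy
        self.catalog: dict[str, str] = {}  # db_name -> host::/path actually used

    def on_db_completed(self, db_name: str, location: str) -> None:
        action = self.policy(db_name)
        if action is Action.ATTACH_REMOTE:
            self.catalog[db_name] = location
        elif action is Action.COPY_AND_ATTACH:
            # Placeholder for the file transfer: only the new location is recorded.
            self.catalog[db_name] = f"{self.host}::/local/{db_name}"
        # Action.IGNORE: leave the catalog untouched


class Publisher:
    """Producing site: notifies every subscriber when a database is completed."""

    def __init__(self):
        self.subscribers: list[Site] = []

    def subscribe(self, site: Site) -> None:
        self.subscribers.append(site)

    def publish(self, db_name: str, location: str) -> None:
        for site in self.subscribers:
            site.on_db_completed(db_name, location)


if __name__ == "__main__":
    rc1 = Publisher()
    cern = Site("shift.cern.ch", lambda db: Action.COPY_AND_ATTACH)
    rome = Site("pccms5.roma1.infn.it", lambda db: Action.ATTACH_REMOTE)
    bologna = Site("pccms1.bo.infn.it", lambda db: Action.IGNORE)
    for s in (cern, rome, bologna):
        rc1.subscribe(s)
    rc1.publish("DB5.DB", "pc.rc1.net::/pc/data/DB5.DB")
    print(cern.catalog, rome.catalog, bologna.catalog)
```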


Productions

[Figure: the CERN federation (DB1 ... DBn, CERN boot file) and the RC1 and RC2 federations, each with its own boot file; new databases DBn+1 ... DBn+m and DBn+m+1 ... DBn+m+k, with pre-allocated ids, are produced at the Regional Centres.]


Analysis

• Each site needs a complete catalog with the location of all the datasets; some DBs are local and some are remote

• When several copies of a DB are available, the local catalog should ideally point to the closest one (e.g. as measured by the Network Weather Service, NWS)
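Choosing the closest copy amounts to minimizing a per-host cost. In this sketch the cost table stands in for network measurements such as those an NWS-like service could provide; the replica paths come from the logical/physical example later in this talk, the cost numbers are made up for illustration, and hosts without a measurement are treated as unreachable.

```python
import math
from typing import Optional


def closest_replica(replicas: list[str], cost: dict[str, float]) -> Optional[str]:
    """Pick the replica whose host has the lowest estimated transfer cost.

    Replicas are 'host::/path' strings; cost maps host -> estimated cost
    (e.g. predicted transfer time). Unknown hosts get an infinite cost.
    Returns None if no replica has a known cost.
    """
    def host_cost(replica: str) -> float:
        host = replica.split("::", 1)[0]
        return cost.get(host, math.inf)

    best = min(replicas, key=host_cost, default=None)
    if best is None or math.isinf(host_cost(best)):
        return None
    return best


if __name__ == "__main__":
    replicas = ["shift23.cern.ch::/db45/Hmm2.hits.DB",
                "pccms1.bo.infn.it::/data1/Hmm2.hits.DB",
                "pccms3.pd.infn.it::/data3/Hmm2.hits.DB"]
    # Illustrative costs as seen from a site in Padova (made-up numbers).
    cost = {"pccms3.pd.infn.it": 1.0, "pccms1.bo.infn.it": 5.0, "shift23.cern.ch": 20.0}
    print(closest_replica(replicas, cost))
```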


Information service

• Create an Information Service with information about all the replicas of the databases (GIS?)

• Each RC has a reference catalog, which is updated to take the available replicas into account

• A catalog could even be created on the fly, containing only the datasets needed by a job
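Building a catalog on the fly for one job means filtering the Information Service content down to the requested datasets. The dictionary layout (dataset → database name → replica locations) and the helper `make_job_catalog` are hypothetical; the entries reuse part of the logical/physical example later in this talk.

```python
def make_job_catalog(info_service: dict[str, dict[str, list[str]]],
                     requested: list[str]) -> dict[str, list[str]]:
    """Return {db_name: [replica, ...]} for the requested datasets only."""
    missing = [ds for ds in requested if ds not in info_service]
    if missing:
        raise KeyError(f"datasets not known to the information service: {missing}")
    catalog: dict[str, list[str]] = {}
    for ds in requested:
        for db_name, replicas in info_service[ds].items():
            catalog[db_name] = list(replicas)  # copy, so the job cannot alter the IS view
    return catalog


if __name__ == "__main__":
    info_service = {
        "H->2mu": {
            "Hmm.1.hits.DB": ["pccms1.bo.infn.it::/data1/Hmm1.hits.DB",
                              "shift23.cern.ch::/db45/Hmm1.hits.DB"],
            "Hmm.3.hits.DB": ["shift23.cern.ch::/db45/Hmm3.hits.DB"],
        },
        "H->2e": {
            "Hee.1.hits.DB": ["pccms5.roma1.infn.it::/data/Hee1.hits.DB",
                              "shift49.cern.ch::/db123/Hee1.hits.DB"],
            "Hee.3.hits.DB": ["shift49.cern.ch::/db123/Hee3.hits.DB",
                              "pccms5.roma1.infn.it::/data/Hee3.hits.DB"],
        },
    }
    for db_name, replicas in make_job_catalog(info_service, ["H->2e"]).items():
        print(db_name, replicas)
```

Combined with a replica-selection rule, the mini-catalog would keep a single location per database.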


Analysis

[Figure: the same CERN, RC1 and RC2 federations as in the production layout; for analysis, the databases DBn+1 ... DBn+m+k produced at the RCs also appear (as local or remote entries) in the catalogs of the other sites.]


Logical vs Physical Datasets

• Each dataset is composed of one or more databases
  – datasets are managed by the application software

• Each DB is uniquely identified by a DBid
  – assigning a DBid is the creation of a logical-db

• The physical-db is the file
  – a logical-db has zero, one or more instances

• The IS manages the link between a dataset, its logical-dbs and its physical-dbs
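The three-level mapping described above (dataset → logical-dbs identified by DBid → zero or more physical-db files) can be written down as a small data model. The class and method names are hypothetical; the entries reproduce part of the example on the next slide.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalDb:
    db_id: int  # unique DBid; assigning it creates the logical-db
    name: str
    instances: list[str] = field(default_factory=list)  # physical-db files (0..n)


@dataclass
class InformationService:
    datasets: dict[str, list[int]] = field(default_factory=dict)  # dataset -> DBids
    dbs: dict[int, LogicalDb] = field(default_factory=dict)       # DBid -> logical-db

    def create_logical_db(self, dataset: str, db_id: int, name: str) -> LogicalDb:
        if db_id in self.dbs:
            raise ValueError(f"DBid {db_id} already assigned")
        db = LogicalDb(db_id, name)
        self.dbs[db_id] = db
        self.datasets.setdefault(dataset, []).append(db_id)
        return db

    def add_instance(self, db_id: int, location: str) -> None:
        self.dbs[db_id].instances.append(location)

    def physical_files(self, dataset: str) -> dict[str, list[str]]:
        """Map each logical-db of a dataset to its physical instances."""
        return {self.dbs[i].name: list(self.dbs[i].instances) for i in self.datasets[dataset]}


if __name__ == "__main__":
    info = InformationService()
    info.create_logical_db("H->2mu", 12345, "Hmm.1.hits.DB")
    info.create_logical_db("H->2mu", 12347, "Hmm.3.hits.DB")
    info.add_instance(12345, "pccms1.bo.infn.it::/data1/Hmm1.hits.DB")
    info.add_instance(12345, "shift23.cern.ch::/db45/Hmm1.hits.DB")
    info.add_instance(12347, "shift23.cern.ch::/db45/Hmm3.hits.DB")
    print(info.physical_files("H->2mu"))
```

A logical-db with no instance yet (an allocated DBid whose file is still being produced) is represented naturally by an empty `instances` list.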


Logical vs Physical Datasets

(dataset → logical-db with its DBid → physical-db instances)

Dataset: H → 2μ
  Hmm.1.hits.DB (id=12345)
    pccms1.bo.infn.it::/data1/Hmm1.hits.DB
    shift23.cern.ch::/db45/Hmm1.hits.DB
  Hmm.2.hits.DB (id=12346)
    pccms1.bo.infn.it::/data1/Hmm2.hits.DB
    shift23.cern.ch::/db45/Hmm2.hits.DB
    pccms3.pd.infn.it::/data3/Hmm2.hits.DB
  Hmm.3.hits.DB (id=12347)
    shift23.cern.ch::/db45/Hmm3.hits.DB

Dataset: H → 2e
  Hee.1.hits.DB (id=5678)
    pccms5.roma1.infn.it::/data/Hee1.hits.DB
    shift49.cern.ch::/db123/Hee1.hits.DB
  Hee.2.hits.DB (id=5679)
    pccms5.roma1.infn.it::/data/Hee2.hits.DB
    shift49.cern.ch::/db123/Hee2.hits.DB
  Hee.3.hits.DB (id=5680)
    shift49.cern.ch::/db123/Hee3.hits.DB
    pccms5.roma1.infn.it::/data/Hee3.hits.DB


Database creation

• Each production site has:
  – a production federation, including incomplete databases
  – a reference federation with only complete databases (both local and remote ones)

• When a DB is completed it is attached to the site reference federation

• The IS monitors the reference federations of all the sites and updates the database list
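The database-creation flow can be sketched as follows: a completed database moves from the production federation to the site's reference federation, and the Information Service periodically scans all reference federations to rebuild its database list. Everything here is a toy model with hypothetical names; `scan` stands in for whatever monitoring the IS actually performs, and the ids, names and hosts mirror the example on the next slide.

```python
from dataclasses import dataclass, field


@dataclass
class ProductionSite:
    host: str
    data_dir: str
    production: dict[int, str] = field(default_factory=dict)  # DBid -> name, incomplete DBs
    reference: dict[int, str] = field(default_factory=dict)   # DBid -> name, complete DBs only

    def complete(self, db_id: int) -> None:
        """Move a finished database into the site's reference federation."""
        self.reference[db_id] = self.production.pop(db_id)


def scan(sites: list[ProductionSite]) -> dict[int, tuple[str, list[str]]]:
    """IS view: DBid -> (name, [host::dir, ...]) over all reference federations."""
    db_list: dict[int, tuple[str, list[str]]] = {}
    for site in sites:
        for db_id, name in site.reference.items():
            entry = db_list.setdefault(db_id, (name, []))
            entry[1].append(f"{site.host}::{site.data_dir}")
    return dict(sorted(db_list.items()))


if __name__ == "__main__":
    rc1 = ProductionSite("pc.rc1.net", "/pc/data",
                         production={5: "DB5.DB"}, reference={4: "DB4.DB"})
    cern = ProductionSite("shift.cern.ch", "/shift/data",
                          reference={i: f"DB{i}.DB" for i in range(1, 5)})
    print(5 in scan([rc1, cern]))  # DB5 is still only in RC1's production federation
    rc1.complete(5)
    for db_id, (name, where) in scan([rc1, cern]).items():
        print(f"{db_id:04d} {name} {' '.join(where)}")
```

Because only reference federations are scanned, incomplete databases never reach the global list.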


Database creation

[Figure: RC1 (pc.rc1.net) runs a production federation (RC1 Prod) and a reference federation (RC1 Ref); CERN (shift.cern.ch) runs the CERN FD with DB1 ... DB4. DB4 is already complete and present at both sites; DB5 is produced in RC1 Prod, attached to RC1 Ref when complete, and then also copied to CERN.]

Information Service database list in three successive states:

(1) id 0005 allocated, DB5 not yet complete
    0001 DB1.DB shift.cern.ch::/shift/data
    0002 DB2.DB shift.cern.ch::/shift/data
    0003 DB3.DB shift.cern.ch::/shift/data
    0004 DB4.DB pc.rc1.net::/pc/data shift.cern.ch::/shift/data
    0005
(2) DB5 completed at RC1 (0001-0004 as above)
    0005 DB5.DB pc.rc1.net::/pc/data
(3) DB5 copied to CERN (0001-0004 as above)
    0005 DB5.DB pc.rc1.net::/pc/data shift.cern.ch::/shift/data


Replica Management

• When multiple copies of the same DB exist, each site may choose which copy to use:
  – it should be possible to update the reference federation at given times
  – it should be possible to create on the fly a mini-catalog containing only the datasets requested by a job

• This kind of operation is managed by the application software (e.g. ORCA)


Replica Management

[Figure: the CERN FD (shift.cern.ch) holds DB1, DB2 and DB3; reference federations run at Bologna (BO Ref, pc1.bo.infn.it) and Padova (PD Ref, pc1.pd.infn.it); copies of DB1 and DB2 are made at Bologna.]

Information Service database list, before and after DB2 is replicated to Bologna:

  before:
    0001 DB1.DB shift.cern.ch::/shift/data pc1.bo.infn.it::/data
    0002 DB2.DB shift.cern.ch::/shift/data
    0003 DB3.DB shift.cern.ch::/shift/data
  after:
    0001 DB1.DB shift.cern.ch::/shift/data pc1.bo.infn.it::/data
    0002 DB2.DB shift.cern.ch::/shift/data pc1.bo.infn.it::/data
    0003 DB3.DB shift.cern.ch::/shift/data


Summary of the Case Study

• Basic functionalities of a Replica Manager for production are already implemented in GDMP

• The use of an Information Server would allow easy synchronization of federations and optimized data access during analysis

• The same functionalities offered by the Objectivity/DB catalog may be implemented for other kinds of files


Conclusions (?)

Globus and the various GRID projects try to address the issue of large-scale distributed data access

Their effectiveness is still to be proven

The problem, again, is not the software: it is the organization