Armenuhi Abramyan, Narine Manukyan ALICE team of A.I. Alikhanian National Scientific Laboratory {aabramya, nmanukya}@mail.yerphi.am

Jan 18, 2016


Storage Volume Freeing Service (SVFS): (general) formulation of the problem and solution

Page 2

Offline weekly meeting, 12 August 2013

Problem: Replication of data files -> repletion of SEs -> prevention of further inflow of files to these SEs.

Solution: Regular (partial) cleaning of SEs by removing a certain portion of the replicas.

Step one: Definition of the data files (DFs) and data file sets (FSs) subject to removal.

Step two: Construction of the general removal and cleaning scheme.

Step three: Determination of the order in which FSs are removed (the replica-killing queue).


Page 4

‘Centrally’ created DFs:

• <*>ESD<*>.root
• <*>AOD<*>.root
• <*>QA<*>.root
• OCDB.root
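The `<*>` notation above reads like a shell-style wildcard. As a minimal sketch, assuming that reading and applying the patterns to base file names (not full LFNs), a DF can be assigned its type with Python's `fnmatch`; `central_df_type` is a hypothetical helper, and the first matching pattern wins.

```python
from fnmatch import fnmatchcase

# "<*>" from the slide is read as the shell wildcard "*" (assumption).
# Patterns are tried in this order; the first match decides the type.
CENTRAL_DF_PATTERNS = [
    ("ESD", "*ESD*.root"),
    ("AOD", "*AOD*.root"),
    ("QA", "*QA*.root"),
    ("OCDB", "OCDB.root"),
]


def central_df_type(file_name):
    """Return the DF type of a base file name, or None if no pattern matches."""
    for df_type, pattern in CENTRAL_DF_PATTERNS:
        if fnmatchcase(file_name, pattern):
            return df_type
    return None


for name in ["AliESDs.root", "AliAOD.root", "QAresults.root", "OCDB.root", "galice.root"]:
    print(f"{name:15s} -> {central_df_type(name)}")
```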

Volumes occupied by DFs in the LHC12c directory:

LHC12c             286 TB     (100.00 %)
  RAWs             205.0 TB   (71.7 %)
  ESDs             64.4 TB    (22.5 %)
  AODs             13.0 TB    (4.54 %)
  QAs              3.55 TB    (1.23 %)
  OCDBs            0.03 TB    (0.02 %)
  PWG directories  0.02 TB    (0.01 %)
    • PWG-created root files (?)
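The shares in the table can be recomputed from the listed volumes, which add up to the 286 TB total. A small sketch (the numbers are copied from the table; differences in the last digit for the smallest entries may come from rounding of the listed volumes):

```python
# Volumes in TB, copied from the LHC12c table above.
VOLUMES_TB = {
    "RAWs": 205.0,
    "ESDs": 64.4,
    "AODs": 13.0,
    "QAs": 3.55,
    "OCDBs": 0.03,
    "PWG directories": 0.02,
}


def shares_percent(volumes):
    """Return the total volume and each component's share of it in percent."""
    total = sum(volumes.values())
    return total, {name: 100.0 * v / total for name, v in volumes.items()}


total, shares = shares_percent(VOLUMES_TB)
print(f"LHC12c total: {total:.1f} TB")
for name, pct in shares.items():
    print(f"{name:16s} {VOLUMES_TB[name]:7.2f} TB {pct:7.2f} %")
```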

Data File (DF) = any .root file

What about user-created root files? How much volume do they occupy?

Page 5

Elementary File Sets (EFSs) = zip archives of DFs (Σ of the output .root files of each job). In AliEn, data replication is done in these units.

Types of archives in AliEn:

• root_archive.zip = set of ESD | AOD | QA | PWG-created | user-created DFs
• aod_archive.zip = set of AOD-type DFs
• QA_archive.zip = set of QA-related DFs
• log_archive | log_archive.zip = set of log files

To discuss and decide: it is convenient to perform the removal not at the level of the EFSs themselves, but on their (meaningful) combinations = Composite File Sets (CFSs).

Suggestions of CFSs containing EFSs of the same type:

• Run_AOD, Period_AOD: full set of AOD-type EFSs in a Run, Period
• Run_ESD, Period_ESD: full set of ESD-type EFSs in a Run, Period
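A minimal sketch of how EFSs could be grouped into the suggested Run- and Period-level CFSs, assuming each archive's run, period and type are already known (for root_archive.zip, which mixes several DF types, that is an assumption). The `EFS` record and `build_cfss` are hypothetical names; the CFS keys follow the Run_AOD / Period_AOD pattern with the actual run number or period substituted.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EFS:
    """One elementary file set: a zip archive, the unit AliEn replicates."""
    archive_lfn: str
    period: str    # e.g. "LHC12c"
    run: str       # run number as text
    efs_type: str  # "AOD", "ESD", ... (assumed known for every archive)


def build_cfss(efss):
    """Group EFSs into Run<run>_<type> and <period>_<type> composite file sets."""
    cfss = defaultdict(set)
    for e in efss:
        cfss[f"Run{e.run}_{e.efs_type}"].add(e.archive_lfn)
        cfss[f"{e.period}_{e.efs_type}"].add(e.archive_lfn)
    return dict(cfss)


efss = [
    EFS("/a/1/aod_archive.zip", "LHC12c", "179569", "AOD"),
    EFS("/a/2/aod_archive.zip", "LHC12c", "179570", "AOD"),
    EFS("/a/1/root_archive.zip", "LHC12c", "179569", "ESD"),
]
for name, members in sorted(build_cfss(efss).items()):
    print(name, sorted(members))
```

Each EFS ends up in both a Run-level and a Period-level CFS, so the two kinds of CFS overlap by construction.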


Page 7

1. How many custodial replicas (CMS parlance) of CFSs should be kept?

2. Should the number of custodial replicas depend on the type of CFS (AOD, …, OCDB, …)?

3. Where should custodial replicas be kept (on Tier-1s? on Tier-2s with low CPU?)

Page 8

Global removal = removal from AliEn without regard to the state of individual SEs.

In the global removal approach, any CFS can be removed entirely!

An ordered removal of replicas of the same archive.zip:
1. One keeps the custodial replica(s).
2. One takes into account additional criteria related to the individual sites:
a) the repletion state of the sites where the replicas are placed;
b) the number of failures to access the replicas on the sites where they are placed;
…

[Figure: replicas of archive1.zip … archiveN.zip of a CFS on storage elements SE1, SE2, SE3 … SEN1, SEN2; panels labelled "1+1 CFS" and "1 CFS".]

A problem: which of the replicas of an archive.zip should be removed?
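One way to approach this question, following the ordered-removal criteria listed above, is to rank the non-custodial replicas of an archive.zip and delete the worst ones first. This is a sketch under stated assumptions: the criteria are combined lexicographically (fuller SE first, then more access failures), and `keep`, `Replica` and `replicas_to_remove` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    se: str                  # storage element holding this replica
    custodial: bool          # custodial replicas are never removed
    se_fill_fraction: float  # criterion a): used/capacity of that SE
    access_failures: int     # criterion b): failed accesses on that SE


def replicas_to_remove(replicas, keep=1):
    """Return SE names whose replica of one archive.zip should be deleted.

    `keep` is the number of replicas to leave in place; all custodial
    replicas are always kept, even if there are more of them than `keep`.
    """
    others = [r for r in replicas if not r.custodial]
    n_custodial = len(replicas) - len(others)
    n_removable = max(0, len(replicas) - max(keep, n_custodial))
    # Worst candidates first: fuller SE, then more access failures; SE name breaks ties.
    others.sort(key=lambda r: (-r.se_fill_fraction, -r.access_failures, r.se))
    return [r.se for r in others[:n_removable]]


replicas = [
    Replica("SE1", True, 0.50, 0),
    Replica("SE2", False, 0.95, 0),
    Replica("SE3", False, 0.80, 5),
]
print(replicas_to_remove(replicas, keep=2))  # the fullest non-custodial SE goes first
```

A weighted score could replace the lexicographic key; which combination fits best would have to be checked against real monitoring data.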

Page 9

Since the EFSs that make up a CFS are dispersed over different SEs, an entire CFS cannot be removed in the local approach; removing a CFS is possible only in a manner correlated with the other SEs. No ordering of the replicas of the same archive.zip is needed for removal. A single restriction: custodial replicas are not removed.

The drawback of the local approach! If we give priority to local freeing rather than to the removal of entire CFSs, then only part of the EFSs in a CFS can be removed. As a result, we will have ‘amputated’ CFSs.

[Figure: replicas of archive1.zip … archiveN.zip on SE1, SE2 … SEN1, SEN2; panels labelled "1+1 CFS", "1' CFS" and "1 CFS", with the annotation "(amputated CFS)".]
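The ‘amputated’ outcome can be detected after a local cleaning pass by checking, for each CFS, whether some but not all of its archives lost their replica. A minimal sketch; the CFS membership map and the list of archives removed on the SE are assumed inputs, and `amputated_cfss` is a hypothetical helper.

```python
def amputated_cfss(cfs_members, removed_archives):
    """Return CFSs of which some, but not all, archives lost a replica in one local pass.

    cfs_members: mapping CFS name -> iterable of archive LFNs
    removed_archives: archive LFNs whose replica was deleted on the SE being freed
    """
    removed = set(removed_archives)
    amputated = []
    for cfs, archives in cfs_members.items():
        archives = set(archives)
        hit = archives & removed
        # Partially hit: the CFS copy on this SE is now incomplete.
        if hit and hit != archives:
            amputated.append(cfs)
    return sorted(amputated)


members = {
    "P_AOD": ["a1", "a2", "a3"],
    "P_ESD": ["e1", "e2"],
    "Q_AOD": ["b1"],
}
print(amputated_cfss(members, ["a1", "e1", "e2"]))
```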


Page 11

Replica removal in CMS: local removal scheme

Dataset: the largest homogeneous collection of files related to the same trigger stream, processing campaign and output data format. Replica removal in CMS is done by datasets, the analog of the CFS. (Note: one entire replica of a dataset is stored on one SE.)

Block (analog of the EFS): the smallest unit in computing space, corresponding to a group of files likely to be accessed together. Files are grouped in blocks for bulk data management reasons.

Only Tier-2 sites are cleaned, under the following conditions:
1. The removal procedure starts if the site has less than 10 % (or 15 TB for small SEs) of free space.
2. The removal procedure ends when 30 % (or 25 TB) of the site space has been freed.
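The CMS start/stop conditions can be written as a small cleaning loop. This is a sketch, not the CMS implementation: the `is_small` flag is hypothetical (no size cut for "small SEs" is given here), "(or 25 TB)" is read as the small-SE variant of the stop condition, "30 % freed" is read as the amount freed in one pass, and the candidates are assumed to arrive already ordered for removal.

```python
from dataclasses import dataclass


@dataclass
class Site:
    capacity_tb: float
    free_tb: float
    is_small: bool  # hypothetical flag: "small SE" is not defined on the slide


def should_start(site):
    """Start cleaning below 10 % free space (below 15 TB for small SEs)."""
    if site.is_small:
        return site.free_tb < 15.0
    return site.free_tb < 0.10 * site.capacity_tb


def freeing_target_tb(site):
    """Stop after freeing 30 % of the capacity (25 TB for small SEs, as read here)."""
    return 25.0 if site.is_small else 0.30 * site.capacity_tb


def clean(site, candidates):
    """Remove (name, size_tb) candidates, already in removal order, until the target is met."""
    if not should_start(site):
        return []
    target = freeing_target_tb(site)
    freed, removed = 0.0, []
    for name, size_tb in candidates:
        if freed >= target:
            break
        removed.append(name)
        freed += size_tb
        site.free_tb += size_tb
    return removed


site = Site(capacity_tb=1000.0, free_tb=50.0, is_small=False)
print(clean(site, [("a", 200.0), ("b", 150.0), ("c", 100.0)]), site.free_tb)
```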

Page 12

The ordering of CFSs for removal can be based on the notion of popularity. One example of a popularity measure for a CFS is the total number of calls to the Data Files in that CFS (the total number of occurrences of their LFNs).

We are considering several (popularity-based) models for solving the ordering problem.

Realistic verification of our models and of the final construction is possible only on the basis of fine-grained monitoring data on EFSs.
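A minimal sketch of the popularity measure described above: count how often each LFN occurs in the access records, sum the counts over the DFs of each CFS, and queue the least popular CFSs first (the direction of the ordering is an assumption). `cfs_popularity` and `removal_queue` are hypothetical names, and the CFS names and LFNs in the usage example are invented.

```python
from collections import Counter


def cfs_popularity(cfs_members, accessed_lfns):
    """Popularity of a CFS = total number of occurrences of its DFs' LFNs in the access records."""
    counts = Counter(accessed_lfns)
    return {cfs: sum(counts[lfn] for lfn in lfns) for cfs, lfns in cfs_members.items()}


def removal_queue(cfs_members, accessed_lfns):
    """Order CFSs for removal, least popular first (ties broken by name)."""
    popularity = cfs_popularity(cfs_members, accessed_lfns)
    return sorted(popularity, key=lambda cfs: (popularity[cfs], cfs))


members = {
    "LHC12c_AOD": ["/aod/1.root", "/aod/2.root"],
    "LHC12c_ESD": ["/esd/1.root"],
    "LHC12d_AOD": ["/aod/3.root"],
}
accesses = ["/aod/1.root", "/aod/1.root", "/aod/2.root", "/esd/1.root"] + ["/aod/3.root"] * 4
print(removal_queue(members, accesses))
```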

Page 13

The purpose of the File Access Monitoring Service (FAMoS) is to monitor the frequency of accesses to files in the AliEn File Catalogue.

• Development of FAMoS started in July 2012;

• The code was included in AliEn v2.20 (October 2012) and AliEn v2.21 (January 2013);

• The part of FAMoS that records the details of file accesses was deployed on all AliEn central servers on 6 August 2013. Thanks to Miguel.

Page 14

Attribute          Description
File name          The LFN of a file
SE name            Name of the SE from which the file was accessed
User name          Name of the user who accessed the file
Access time        Time and date when the file was accessed
Operation type     Read or write access
Operation result   Successful or failed access

The source: since all accesses to files in the File Catalogue are authenticated by the AliEn Authentication (Authen) service, a plugin (called attributes) has been included in the Authen service to record the values of the specified attributes (into "Authen_ops" daily log files).

Your comments and suggestions would be highly appreciated!
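The attributes in the table map onto a simple record type. This sketch assumes the records have already been parsed from the Authen_ops logs (their line format is not shown in these slides); `FileAccess` and `successful_reads_per_lfn` are hypothetical names, and counting only successful reads is one possible choice of input for a popularity measure.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FileAccess:
    """One recorded access; fields mirror the attribute table above."""
    lfn: str               # File name
    se_name: str           # SE name
    user_name: str         # User name
    access_time: datetime  # Access time
    operation: str         # Operation type: "read" or "write"
    success: bool          # Operation result


def successful_reads_per_lfn(records):
    """Count successful read accesses per LFN."""
    return Counter(r.lfn for r in records if r.operation == "read" and r.success)


t = datetime(2013, 8, 7, 12, 0)
records = [
    FileAccess("/alice/x.root", "SE1", "user1", t, "read", True),
    FileAccess("/alice/x.root", "SE2", "user2", t, "read", True),
    FileAccess("/alice/x.root", "SE1", "user1", t, "read", False),
    FileAccess("/alice/y.root", "SE1", "user1", t, "write", True),
]
print(successful_reads_per_lfn(records))
```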

Page 15

Data type category   Files within these paths in the AliEn File Catalogue
<Period>_AOD         Real data: /alice/data/…/<Period>/…/ESDs/…/AOD<*>/…
                     Real data: /alice/data/…/<Period>/…/AliAOD<*>.root
                     Simulated data: /alice/sim/…/<AcceleratorPeriod>/…/AliAOD<*>.root
<Period>_ESD         Real data: /alice/data/…/<AcceleratorPeriod>/…/ESDs/… (except files of the AOD category)
                     Real data: /alice/data/…/<AcceleratorPeriod>/…/AliESD<*>.root
                     Simulated data: /alice/sim/…/<AcceleratorPeriod>/…/AliESD<*>.root
COND                 /alice/…/OCDB/…
                     /alice/…/CDB/…
USER                 /alice/cern.ch/user/…
OTHER                other paths
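A sketch of the path classification in this table, using regular expressions. Assumptions: each `…` is read as "any run of path segments", `<*>` as "any characters within one segment", the rules are tried in the order USER, COND, AOD, ESD (so AOD files under ESDs/ are kept out of ESD, as the table requires), and the period is not extracted. `data_category` is a hypothetical helper and the example LFNs are invented.

```python
import re

# "…" -> any run of path segments; "<*>" -> any characters within one segment (assumptions).
CATEGORY_RULES = [
    ("USER", [r"^/alice/cern\.ch/user/"]),
    ("COND", [r"^/alice/(.*/)?(OCDB|CDB)/"]),
    ("AOD", [r"^/alice/data/.*/ESDs/(.*/)?AOD[^/]*/",
             r"^/alice/(data|sim)/.*/AliAOD[^/]*\.root$"]),
    ("ESD", [r"^/alice/data/.*/ESDs/",
             r"^/alice/(data|sim)/.*/AliESD[^/]*\.root$"]),
]
COMPILED = [(cat, [re.compile(p) for p in pats]) for cat, pats in CATEGORY_RULES]


def data_category(lfn):
    """Return USER, COND, AOD, ESD or OTHER; the first matching rule wins."""
    for category, patterns in COMPILED:
        if any(p.search(lfn) for p in patterns):
            return category
    return "OTHER"


EXAMPLES = [
    "/alice/cern.ch/user/x/xuser/out.root",
    "/alice/data/2012/OCDB/GRP/GRP/Data/Run179569_179569_v1_s0.root",
    "/alice/data/2012/LHC12c/000179569/ESDs/pass1/AOD/001/AliAOD.root",
    "/alice/data/2012/LHC12c/000179569/ESDs/pass1/12000179569024.10/AliESDs.root",
    "/alice/sim/2012/LHC12a17a/170000/001/AliAOD.root",
    "/alice/sim/2012/LHC12a17a/170000/001/AliESDs.root",
    "/alice/data/2012/LHC12c/000179569/raw/12000179569024.10.root",
]
for lfn in EXAMPLES:
    print(f"{data_category(lfn):5s} {lfn}")
```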

Page 16

Suggested filters:

/alice/packages/*   *.par
*/bin/*
*.log   *log_archive*
*validation.sh   *.rc
*.sh   *.jdl   *.xml
*.C   *.h   *.cxx

Accesses to files matching these patterns make up ~94 % of all accesses!
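The suggested filters can be applied with shell-style matching before any popularity counting. A minimal sketch using Python's `fnmatch.fnmatchcase` (case-sensitive, so `*.C` does not match `.c` files); note that `*` in `fnmatch` also matches `/`. `is_filtered` and `data_accesses` are hypothetical names and the example LFNs are invented.

```python
from fnmatch import fnmatchcase

# Suggested filter patterns, copied from the slide.
SUGGESTED_FILTERS = [
    "/alice/packages/*", "*.par",
    "*/bin/*",
    "*.log", "*log_archive*",
    "*validation.sh", "*.rc",
    "*.sh", "*.jdl", "*.xml",
    "*.C", "*.h", "*.cxx",
]


def is_filtered(lfn):
    """True if the LFN matches any suggested filter (shell-style, case-sensitive)."""
    return any(fnmatchcase(lfn, pattern) for pattern in SUGGESTED_FILTERS)


def data_accesses(lfns):
    """Drop accesses to packages, logs, scripts, JDLs, macros and similar files."""
    return [lfn for lfn in lfns if not is_filtered(lfn)]


accessed = [
    "/alice/cern.ch/user/x/xuser/run.jdl",
    "/alice/cern.ch/user/x/xuser/bin/aliroot",
    "/alice/sim/2012/LHC12a17a/170000/001/log_archive.zip",
    "/alice/cern.ch/user/x/xuser/Macro.C",
    "/alice/data/2012/LHC12c/000179569/ESDs/pass1/AliESDs.root",
]
print(data_accesses(accessed))
```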
