Armenuhi Abramyan, Narine Manukyan ALICE team of A.I. Alikhanian National Scientific Laboratory {aabramya, nmanukya}@mail.yerphi.am
Jan 18, 2016
Offline weekly meeting, 12 August 2013
Problem: Replication of data files -> repletion (filling up) of Storage Elements (SEs) -> further inflow of files to these SEs is prevented
Solution: Regular (partial) cleaning of SEs by removing a certain portion of replicas
Step one: Definition of the data files (DFs) and data file sets (FSs) subject to removal
Step two: Construction of the general removal and cleaning scheme
Step three: Determination of the order in which FSs are removed (queuing of replicas for deletion)
‘Centrally’ created DFs:
• <*>ESD<*>.root
• <*>AOD<*>.root
• <*>QA<*>.root
• OCDB.root
Volumes occupied by DFs in the LHC12c directory:
Category             Volume      Share
LHC12c (total)       286 TB      100.00 %
__________________________________________
RAWs                 205.0 TB     71.7 %
__________________________________________
ESDs                  64.4 TB     22.5 %
AODs                  13.0 TB      4.54 %
QAs                    3.55 TB     1.23 %
OCDBs                  0.03 TB     0.02 %
__________________________________________
PWG directories        0.02 TB     0.01 %
• PWG-created root files (?)
Data File (DF) = any .root file
What about user-created .root files? How much volume do they occupy?
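The DF definition and the ‘centrally’ created name patterns above can be expressed as a small name classifier. This is a minimal Python sketch, assuming that "<*>" means a shell-style wildcard applied to the file's base name; the function names are hypothetical, and resolving a name that matches several patterns by taking the first match is our assumption.

```python
import fnmatch
import posixpath
from typing import Optional

# Slide patterns with "<*>" read as "*"; checked in this (assumed) order.
CENTRAL_DF_PATTERNS = [
    ("ESD", "*ESD*.root"),
    ("AOD", "*AOD*.root"),
    ("QA", "*QA*.root"),
    ("OCDB", "OCDB.root"),
]


def is_data_file(lfn: str) -> bool:
    """A Data File (DF) is any .root file."""
    return lfn.endswith(".root")


def central_df_type(lfn: str) -> Optional[str]:
    """Return the type of a centrally created DF, or None if no pattern matches."""
    if not is_data_file(lfn):
        return None
    name = posixpath.basename(lfn)
    for df_type, pattern in CENTRAL_DF_PATTERNS:
        if fnmatch.fnmatchcase(name, pattern):
            return df_type
    return None
```

A user-created histogram file such as `myhist.root` is a DF but gets `None` here, which is exactly the category whose volume the open question asks about.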
Elementary File Sets (EFSs) = zip archives of DFs (the collection of output .root files of each job). In AliEn, data replication is done in these units.
Types of archives in AliEn:
• root_archive.zip = set of ESD | AOD | QA | PWG-created | user-created DFs
• aod_archive.zip = set of AOD-type DFs
• QA_archive.zip = set of QA-related DFs
• log_archive | log_archive.zip = set of log files
To discuss and decide: it is convenient to perform the removal not at the level of the EFSs themselves, but on their (meaningful) combinations = Composite File Sets (CFSs).
Suggested CFSs containing EFSs of the same type:
• Run_AOD, Period_AOD - full set of AOD-type EFSs in a Run, Period
• Run_ESD, Period_ESD - full set of ESD-type EFSs in a Run, Period
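A minimal sketch of how EFSs could be grouped into the suggested Run_* and Period_* CFSs. The `EFS` record, its fields and the `"Run_AOD:<run>"` key format are hypothetical illustrations, not AliEn data structures.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EFS:
    """One elementary file set (zip archive of DFs); fields are illustrative."""
    lfn: str       # LFN of the archive
    period: str    # e.g. "LHC12c"
    run: str       # run number as a string
    efs_type: str  # "AOD" or "ESD" in this sketch


def build_cfss(efss):
    """Group EFSs of the same type into Run_<type> and Period_<type> CFSs."""
    cfss = defaultdict(set)
    for efs in efss:
        cfss[f"Run_{efs.efs_type}:{efs.run}"].add(efs.lfn)
        cfss[f"Period_{efs.efs_type}:{efs.period}"].add(efs.lfn)
    return dict(cfss)
```

Each EFS lands in both its Run CFS and its Period CFS, so the two granularities overlap: removing a Period_AOD set removes every Run_AOD set of that period.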
1. How many custodial replicas (in CMS parlance) of CFSs should be kept?
2. Should the number of custodial replicas depend on the type of CFS (AOD, …, OCDB, …)?
3. Where should custodial replicas be kept? (On Tier-1s? On Tier-2s with low CPU?)
Global removal = removal from AliEn without regard to the state of individual SEs.
In the global removal approach, any CFS can be removed entirely!
An ordered removal of replicas of the same archive.zip:
1. Custodial replica(s) are kept.
2. Additional criteria related to the individual sites are taken into account:
a) the repletion state of the sites where the replicas are placed;
b) the number of failures to access the replicas at the sites where they are placed;
………
[Figure: global removal. One of the two replicas of a 1+1 CFS (archive1.zip … archiveN.zip, spread over SE1, SE2, SE3, …, SEN1, SEN2) is removed entirely, leaving 1 CFS.]
A problem: which of the replicas of an archive.zip should be removed?
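The ordering criteria listed above (keep custodial replicas, then take the site repletion state and the access-failure count into account) can be sketched as a sort key. This is a minimal Python sketch; the `Replica` fields, the fill-fraction metric and giving criterion a) priority over b) are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    se: str               # storage element holding this replica
    custodial: bool       # custodial replicas are never removed
    se_fill: float        # assumed metric: used fraction of the SE, 0.0-1.0
    access_failures: int  # failed accesses to this replica on this SE


def removal_order(replicas):
    """Non-custodial replicas, best removal candidate first.

    Assumed priority: fuller SEs first, then more access failures.
    """
    candidates = [r for r in replicas if not r.custodial]
    return sorted(candidates, key=lambda r: (-r.se_fill, -r.access_failures))


def replicas_to_remove(replicas, keep):
    """Replicas of one archive.zip to drop so that `keep` remain.

    Fewer are returned if only custodial replicas would be left to remove.
    """
    excess = max(0, len(replicas) - keep)
    return removal_order(replicas)[:excess]
```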
In the local approach, no ordering of replicas of the same archive.zip for removal is needed. A single restriction: custodial replicas are not removed.
Since the EFSs belonging to a CFS are dispersed over different SEs, entire removal of a CFS is not possible in the local approach; it can be done only in a manner correlated with the other SEs.
The drawback of the local approach! If priority is given to freeing local space rather than to removing an entire CFS, then only some of the EFSs belonging to a CFS can be removed. As a result, we will have ‘amputated’ CFSs.
[Figure: local removal. Starting from a 1+1 CFS (archive1.zip … archiveN.zip, each held on two of SE1, SE2, …, SEN1, SEN2), cleaning individual SEs leaves one complete replica (1 CFS) and one partial replica (1’ CFS, an amputated CFS).]
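To illustrate how local cleaning produces an ‘amputated’ CFS, here is a minimal sketch over a toy placement map. Both helper names are hypothetical, and the sketch assumes a CFS counts as amputated when its archives no longer all have the same number of replicas.

```python
def clean_se_locally(placement, custodial, se):
    """Remove every non-custodial replica held on `se` (placement is modified in place).

    placement: dict archive name -> set of SEs holding a replica of it
    custodial: dict archive name -> set of SEs holding its custodial replica(s)
    Returns the archives that lost a replica on `se`.
    """
    touched = []
    for archive, ses in placement.items():
        if se in ses and se not in custodial.get(archive, set()):
            ses.discard(se)
            touched.append(archive)
    return touched


def is_amputated(placement):
    """Assumed definition: the CFS's archives no longer have equal replica counts."""
    return len({len(ses) for ses in placement.values()}) > 1
```

Cleaning one SE touches only the archives stored there, so the second replica of the CFS survives on the other SEs but is incomplete.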
Dataset (analog of CFS): the largest homogeneous collection of files related to the same trigger stream, processing campaign and output data format. In CMS, replica removal is done by datasets. (Note: one entire replica of a dataset is stored on one SE.)
Block (analog of EFS): the smallest unit in computing space, corresponding to a group of files likely to be accessed together. Files are grouped into blocks for bulk data management reasons.
Replica removal in CMS: local removal scheme
Only Tier-2 sites are cleaned, under the following conditions:
1. The removal procedure starts if the site has less than 10 % (or 15 TB for small SEs) of free space.
2. The removal procedure ends when 30 % (or 25 TB) of the site space has been freed.
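The CMS start/stop thresholds above can be sketched as follows. Assumptions: whether an SE counts as ‘small’ is passed in as a flag (the rule for it is not given here), "30 % (or 25 TB) freed" is read as the amount freed by the procedure, and the candidate list is already ordered by removal preference.

```python
def cms_style_cleanup(capacity_tb, free_tb, candidates, small_se=False):
    """Sketch of the Tier-2 threshold logic for one site.

    candidates: (name, size_tb) pairs, preferred removal first.
    Returns the names of the datasets removed.
    """
    if small_se:
        start = free_tb < 15.0       # absolute threshold for small SEs
        target_tb = 25.0
    else:
        start = free_tb < 0.10 * capacity_tb
        target_tb = 0.30 * capacity_tb
    if not start:
        return []
    removed, freed_tb = [], 0.0
    for name, size_tb in candidates:
        if freed_tb >= target_tb:    # stop condition reached
            break
        removed.append(name)
        freed_tb += size_tb
    return removed
```

The gap between the start threshold (10 %) and the stop target (30 %) acts as hysteresis, so a site is not re-cleaned as soon as a little new data arrives.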
The ordering of CFSs for removal can be based on the notion of popularity. One example of a popularity measure for a CFS is the total number of accesses to the Data Files belonging to that CFS (the total number of occurrences of their LFNs).
We are considering several (popularity-based) models for solving the ordering problem.
Realistic verification of our models and their final construction is possible only on the basis of fine-grained monitoring data on EFSs.
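The example popularity measure (total LFN occurrences per CFS) can be sketched with `collections.Counter`. Putting the least popular CFS first in the removal queue, and breaking ties by name, are assumptions of this sketch; the function names are hypothetical.

```python
from collections import Counter


def cfs_popularity(access_lfns, cfs_members):
    """Popularity of each CFS = total number of accesses to the DFs it contains.

    access_lfns: iterable of LFNs, one entry per recorded access
    cfs_members: dict CFS name -> set of DF LFNs belonging to it
    """
    counts = Counter(access_lfns)  # missing LFNs count as 0
    return {cfs: sum(counts[lfn] for lfn in members)
            for cfs, members in cfs_members.items()}


def removal_queue(access_lfns, cfs_members):
    """CFS names ordered least popular first (ties broken by name)."""
    pop = cfs_popularity(access_lfns, cfs_members)
    return sorted(pop, key=lambda cfs: (pop[cfs], cfs))
```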
The purpose of the File Access Monitoring Service (FAMoS) is to monitor the frequency of accesses to files in the AliEn File Catalogue.
• Work on the development of FAMoS started in July 2012;
• The code was included in AliEn v2.20 (October 2012) and AliEn v2.21 (January 2013);
• The part of FAMoS that records the details of file accesses was deployed on all AliEn central servers on 6 August 2013. Thanks to Miguel.
Attribute          Description
File name          The LFN of the file
SE name            Name of the SE from which the file was accessed
User name          Name of the user by whom the file was accessed
Access time        Time and date when the file was accessed
Operation type     Read or write access
Operation result   Successful or failed access
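The recorded attributes map naturally onto a record type. The line layout parsed below (tab-separated fields, ISO-8601 time, "ok"/"failed" results) is purely hypothetical; the real format of the "Authen_ops" logs is not described here, so `parse_access` only illustrates the idea.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FileAccess:
    """One recorded file access, one field per monitored attribute."""
    lfn: str
    se_name: str
    user_name: str
    access_time: datetime
    operation: str   # "read" or "write"
    success: bool


def parse_access(line: str) -> FileAccess:
    """Parse one line of a HYPOTHETICAL layout:
    lfn <TAB> SE <TAB> user <TAB> ISO-8601 time <TAB> read|write <TAB> ok|failed
    """
    lfn, se, user, when, op, result = line.rstrip("\n").split("\t")
    if op not in ("read", "write"):
        raise ValueError(f"unknown operation type: {op!r}")
    if result not in ("ok", "failed"):
        raise ValueError(f"unknown operation result: {result!r}")
    return FileAccess(lfn, se, user, datetime.fromisoformat(when), op, result == "ok")
```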
The source: since all accesses to files in the File Catalogue are authenticated by the AliEn Authentication (Authen) service, a plugin (called attributes) has been included in the Authen service to record the values of the specified attributes (into "Authen_ops" daily log files).
Your comments and suggestions would be highly appreciated!
Data type category   Files within these paths in the AliEn File Catalogue
<Period>_AOD         Real data:
                       /alice/data/…/<Period>/…/ESDs/…/AOD<*>/…
                       /alice/data/…/<Period>/…/AliAOD<*>.root
                     Simulated data:
                       /alice/sim/…/<AcceleratorPeriod>/…/AliAOD<*>.root
<Period>_ESD         Real data:
                       /alice/data/…/<AcceleratorPeriod>/…/ESDs/… (except files of the AOD category)
                       /alice/data/…/<AcceleratorPeriod>/…/AliESD<*>.root
                     Simulated data:
                       /alice/sim/…/<AcceleratorPeriod>/…/AliESD<*>.root
COND                 /alice/…/OCDB/…
                     /alice/…/CDB/…
USER                 /alice/cern.ch/user/…
OTHER                other paths
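The category table can be sketched as ordered regular-expression rules. Assumptions: every "…" is read as ".*", "<*>" as "[^/]*", the <Period> component is neither checked nor extracted, and AOD is tested before ESD so that AOD-category files under ESDs/ are excluded from the ESD category. The example paths in the usage checks are made up.

```python
import re

# Ordered rules: first category whose pattern matches wins.
CATEGORY_RULES = [
    ("AOD", [r"^/alice/data/.*/ESDs/.*/AOD[^/]*/",
             r"^/alice/data/.*/AliAOD[^/]*\.root$",
             r"^/alice/sim/.*/AliAOD[^/]*\.root$"]),
    ("ESD", [r"^/alice/data/.*/ESDs/",
             r"^/alice/(data|sim)/.*/AliESD[^/]*\.root$"]),
    ("COND", [r"^/alice/.*/OCDB/", r"^/alice/.*/CDB/"]),
    ("USER", [r"^/alice/cern\.ch/user/"]),
]
COMPILED = [(cat, [re.compile(p) for p in pats]) for cat, pats in CATEGORY_RULES]


def categorize(lfn: str) -> str:
    """Return AOD, ESD, COND, USER, or OTHER for an LFN."""
    for category, patterns in COMPILED:
        if any(p.search(lfn) for p in patterns):
            return category
    return "OTHER"
```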
Suggested filters:
/alice/packages/*   *.par
*/bin/*
*.log   *log_archive*
*validation.sh   *.rc
*.sh   *.jdl   *.xml
*.C   *.h   *.cxx
Accesses to files matching these patterns make up ~94 % of all accesses!
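The suggested filters can be applied with Python's standard `fnmatch` module. This sketch uses `fnmatchcase` so that "*.C" stays case-sensitive on every platform; in `fnmatch`, "*" also matches "/", which patterns such as "*/bin/*" rely on here. `filtered_fraction` is a hypothetical helper for measuring the share of filtered accesses in a list of recorded LFNs.

```python
from fnmatch import fnmatchcase

# Patterns as listed in the suggested filters.
FILTERS = [
    "/alice/packages/*", "*.par", "*/bin/*", "*.log", "*log_archive*",
    "*validation.sh", "*.rc", "*.sh", "*.jdl", "*.xml", "*.C", "*.h", "*.cxx",
]


def is_filtered(lfn: str) -> bool:
    """True if the LFN matches any of the suggested filter patterns."""
    return any(fnmatchcase(lfn, pattern) for pattern in FILTERS)


def filtered_fraction(access_lfns) -> float:
    """Share of recorded accesses that hit filtered files."""
    lfns = list(access_lfns)
    if not lfns:
        return 0.0
    return sum(map(is_filtered, lfns)) / len(lfns)
```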