The ATLAS Computing & Analysis Model
Roger Jones, Lancaster University
ATLAS UK 06, IPPP, 20/9/2006
Jan 19, 2018
RWL Jones 13 Sept 2006 Geneva
ATLAS Facilities (Steady State)
• Tier 0 Center at CERN
  • Raw data → Mass storage at CERN and to Tier 1 centers
  • Swift production of Event Summary Data (ESD) and Analysis Object Data (AOD)
  • Ship ESD, AOD to Tier 1 centers → Mass storage at CERN
• Tier 1 Centers distributed worldwide (10 centers)
  • Re-reconstruction of raw data, producing new ESD, AOD (~2 months after arrival and at year end)
  • Scheduled, group access to full ESD and AOD
• Tier 2 Centers distributed worldwide (approximately 30 centers)
  • On demand user physics analysis of shared datasets
  • Monte Carlo Simulation, producing ESD, AOD; ESD, AOD → Tier 1 centers
• CERN Analysis Facility
  • Heightened access to ESD and RAW/calibration data on demand
  • Calibration, detector optimization, some analysis - vital in early stages
• Tier 3 Centers distributed worldwide
  • Physics analysis
New Straw Man Profile

year  energy       luminosity  physics                                               beam time
2007  450+450 GeV  5x10^30     protons - 26 days at 30% overall efficiency           0.7x10^6 seconds
2008  7+7 TeV      0.5x10^33   protons - starting beginning July                     4x10^6 seconds
                               ions - end of run - 5 days at 50% overall efficiency  0.2x10^6 seconds
2009  7+7 TeV      1x10^33     protons - 50% better than 2008                        6x10^6 seconds
                               ions - 20 days of beam at 50% efficiency              10^6 seconds
2010  7+7 TeV      1x10^34     protons (TDR target)                                  10^7 seconds
                               ions (TDR target)                                     2x10^6 seconds
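
The beam-time figures in the profile follow from the run length and the overall efficiency. A minimal sketch of that arithmetic, assuming beam time = days x 86400 s x efficiency (the function and variable names are illustrative, not from the slides):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400 s


def beam_seconds(days: float, efficiency: float) -> float:
    """Effective beam time in seconds for a run of `days` at a given overall efficiency."""
    return days * SECONDS_PER_DAY * efficiency


# (days, overall efficiency) for the rows of the profile that quote both numbers.
runs = {
    "2007 protons": (26, 0.30),
    "2008 ions": (5, 0.50),
    "2009 ions": (20, 0.50),
}

estimates = {name: beam_seconds(days, eff) for name, (days, eff) in runs.items()}

for name, seconds in estimates.items():
    print(f"{name}: {seconds / 1e6:.2f} x 10^6 s")
```

The results round to the quoted 0.7x10^6, 0.2x10^6 and 10^6 seconds. The proton rows for 2008 to 2010 do not quote days or efficiency, so they are not reproduced here.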
Evolution

New T0 Evolution (chart; values by year)
                   2007         2008         2009         2010         2011         2012
Total Disk (TB)    75.14785714  152.4621429  277.3242857  472.3528571  472.3528571  472.3528571
Total Tape (TB)    343.5075     2064.711     4590.2345    10414.158    16238.0815   22062.005
Total CPU (kSI2k)  1910         3705         4058         6105         6105         6105

New T1 Evolution (chart; values by year)
                   2007         2008         2009         2010         2011         2012
Total Disk (TB)    1282.6332    7573.554071  15769.27507  34584.36907  50327.3945   66070.41993
Total Tape (TB)    625.8        6005.32976   11961.95009  24207.43053  38600.13108  55140.05174
Total CPU (kSI2k)  3573.823529  16716.73529  27485.62353  45026.62353  62567.62353  80108.62353

New CAF Evolution (chart; values by year)
                   2007         2008         2009         2010         2011         2012
Total Disk (TB)    213.0040952  993.241619   1638.16381   2584.326667  3552.494524  4457.405238
Total Tape (TB)    57.8035      314.2376667  558.1696667  893.1896667  1161.809667  1430.429667
Total CPU (kSI2k)  827          2076         2502         4158         5649         7139

New T2 Evolution (chart; values by year)
                   2007         2008         2009         2010         2011         2012
Disk (TB)          868.4017714  6096.782224  11160.8019   27798.41335  44374.90338  52135.87483
CPU (kSI2k)        2775.983333  18939.98144  30722.35589  61035.83737  75996.81886  90957.80034
Observations
• The T2s tend to have too high a cpu/disk ratio
  • Optimal use of the T2 resources delivers lots of simulation, with network and T1 disk consequences (although the higher cpu/event will reduce this)
  • The T2 disk only allows about 60% of the required analysis
  • Other models would seriously increase network traffic
• GridPP planned disk/cpu balance is right of course
  • But not the current values
  • And plans are plans until funded!
• Simulation time is crippling - need a real assessment of what is *needed*
• Bigger ESD means fewer ESD events accessed
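
The cpu/disk ratio discussed above can be read off the evolution numbers. A minimal sketch, using the T1 and T2 planning values from the evolution charts rounded to one decimal; whether these planning profiles are the exact comparison the observation has in mind is an assumption, so this only illustrates how the ratio is formed:

```python
YEARS = [2007, 2008, 2009, 2010, 2011, 2012]

# Values from the T1 and T2 evolution charts, rounded to one decimal place.
T1 = {
    "disk_tb": [1282.6, 7573.6, 15769.3, 34584.4, 50327.4, 66070.4],
    "cpu_ksi2k": [3573.8, 16716.7, 27485.6, 45026.6, 62567.6, 80108.6],
}
T2 = {
    "disk_tb": [868.4, 6096.8, 11160.8, 27798.4, 44374.9, 52135.9],
    "cpu_ksi2k": [2776.0, 18940.0, 30722.4, 61035.8, 75996.8, 90957.8],
}


def cpu_per_disk(tier):
    """CPU/disk ratio (kSI2k per TB) for each year of a tier's profile."""
    return [cpu / disk for cpu, disk in zip(tier["cpu_ksi2k"], tier["disk_tb"])]


t1_ratio = cpu_per_disk(T1)
t2_ratio = cpu_per_disk(T2)

for year, r1, r2 in zip(YEARS, t1_ratio, t2_ratio):
    print(f"{year}: T1 {r1:.2f} kSI2k/TB, T2 {r2:.2f} kSI2k/TB")
```

With these numbers the T2 ratio is above the T1 ratio in every year from 2007 to 2012.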
Streaming
• This is an optimisation issue
  • All discussions are about optimisation of data access
• TDR had 4 streams from event filter
  • Primary physics, calibration, express, problem events
  • Calibration stream has split at least once since!
• Now envisage ~10 streams of RAW, ESD, AOD
  • Based on trigger bits (immutable)
  • Optimizes access for detector optimisation
  • Straw man streaming schemes to be tested in large-scale exercises
• Debates between inclusive and exclusive streams (access vs data management) - inclusive may add ~10% to data volumes
• (Some of) All streams to all Tier 1s
  • Raw to archive blocked by stream and time for efficient reprocessing
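
The inclusive/exclusive trade-off can be made concrete with a toy model. In the sketch below the stream names, the trigger-bit masks and the "overlap" stream used for the exclusive case are all hypothetical, and the event sample is constructed so that one event in ten fires two triggers; none of this is taken from an actual ATLAS streaming scheme.

```python
from collections import Counter

# Hypothetical stream definitions: stream name -> trigger-bit mask.
STREAMS = {"muon": 0b0001, "electron": 0b0010, "jet": 0b0100, "minbias": 0b1000}


def inclusive_streams(trigger_bits):
    """Inclusive streaming: the event is written to every stream it fires."""
    return [name for name, mask in STREAMS.items() if trigger_bits & mask]


def exclusive_stream(trigger_bits):
    """Exclusive streaming: the event is written exactly once.

    Events firing several streams go to a shared 'overlap' stream (an assumed
    policy, chosen only for illustration).
    """
    matches = inclusive_streams(trigger_bits)
    if not matches:
        return "unassigned"
    return matches[0] if len(matches) == 1 else "overlap"


def stream_counts(events, inclusive):
    """Number of event copies written per stream."""
    counts = Counter()
    for bits in events:
        if inclusive:
            counts.update(inclusive_streams(bits))
        else:
            counts[exclusive_stream(bits)] += 1
    return counts


# Ten toy events; the last one fires both the muon and electron bits.
events = [0b0001] * 4 + [0b0010] * 3 + [0b0100] * 2 + [0b0011]

inclusive = stream_counts(events, inclusive=True)
exclusive = stream_counts(events, inclusive=False)
overhead = sum(inclusive.values()) / len(events) - 1

print(dict(inclusive), dict(exclusive), f"inclusive overhead: {overhead:.0%}")
```

Inclusive streams keep each stream self-contained for access but store overlapping events more than once; exclusive streams store each event once but an analysis may need to read the overlap stream as well.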
TAG in Tiers
• File-based TAGs allow you to access events within files directly
• Full relational database TAG for selections over large datasets
  • Full relational database too demanding for most Tier 2s
  • Expect Tier 2 to hold file-based TAG for every local dataset
  • Supports event access and limited dataset definition
  • Tier 1 will be expected to hold full database TAG as well as file formats (for distribution)
  • Tentative plans for queued access to full database version
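
As a rough picture of what a file-based TAG offers: each TAG record carries a few event-level quantities plus a pointer to the event inside a data file, so a selection yields a list of entries to read directly rather than a scan of every file. The record layout, attribute names and file names below are hypothetical and are not the ATLAS TAG schema.

```python
from collections import defaultdict
from typing import NamedTuple


class TagRecord(NamedTuple):
    """Hypothetical TAG record: event summary quantities plus a pointer to the event."""
    file_name: str         # data file holding the full event
    entry: int             # position of the event within that file
    n_muons: int           # illustrative selection attribute
    missing_et_gev: float  # illustrative selection attribute


def select(tags, predicate):
    """Return {file name: sorted entry numbers} for the events passing `predicate`."""
    wanted = defaultdict(list)
    for tag in tags:
        if predicate(tag):
            wanted[tag.file_name].append(tag.entry)
    return {name: sorted(entries) for name, entries in wanted.items()}


tags = [
    TagRecord("aod_001.root", 0, 2, 35.0),
    TagRecord("aod_001.root", 1, 0, 12.0),
    TagRecord("aod_001.root", 2, 1, 80.0),
    TagRecord("aod_002.root", 0, 2, 55.0),
    TagRecord("aod_002.root", 1, 2, 5.0),
]

# Dimuon events with significant missing ET; only these entries need to be read.
selection = select(tags, lambda t: t.n_muons >= 2 and t.missing_et_gev > 20.0)
print(selection)
```

A relational-database TAG would answer the same kind of query across many datasets at once, which is where the heavier hosting requirement comes from.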
Getting Going
• Every group should have a Grid User Interface
  • Ideally one on every desktop
  • This was presented about a year ago to HEP SYSMAN
  • But many groups do not seem to have one
• Pressure needed from the grass roots?
• Users need
  • A Grid certificate
  • To join the ATLAS Virtual Organisation
  • http://www.gridpp.ac.uk/deployment/users/
Analysis Resources
• In terms of non-local UK resources, we are already in the Grid Era
  • UK resources are asked for centrally via GridPP
  • These are dominated by production tasks for ATLAS
  • Some additional capacity for analysis and group activity
  • All of this is Grid based - no NFS disk, no local submission
  • If UK groups have identified needs that are not in the ATLAS central planning, please justify them and send them to me
  • We need to know >3 months in advance
  • Quota and fair-share technologies are being rolled out, but at present people must be responsible
  • This is not an infinite resource
  • Using large amounts of storage can block the production
Computing System Commissioning
• This is a staged series of computing exercises
• Analysis is a vital component
  • Need people doing realistic analysis by the Spring
  • If we don't get the bugs found then, physics will suffer
  • Interesting events mixed with background
  • Data dispersed across sites
Conclusions
• Computing Model Data well evolved for placing Raw, ESD and AOD at Tiered centers
  • Still need to understand all the implications of Physics Analysis
  • Distributed Analysis and Analysis Model progressing well
  • But at present, data access is not fit for purpose (action underway)
  • A large ESD blows up the model
  • CPU/Disk imbalances really distort the model
  • The large simulation time per event is crippling in the long term
• SC4/Computing System Commissioning in 2006 is vital.
• Some issues will only be resolved with real data in 2007-8