The ATLAS Computing Model
Roger Jones
Lancaster University
CHEP06
Mumbai 13 Feb. 2006
RWL Jones 13 Feb. 2006 Mumbai
Overview
• Brief summary of ATLAS Facilities and their roles
• Growth of resources
  • CPU, Disk, Mass Storage
• Network requirements
  • CERN ↔ Tier 1 ↔ Tier 2
• Operational Issues and Hot Topics
Computing Resources
• Computing Model fairly well evolved, documented in the C-TDR
  • Externally reviewed
  • http://doc.cern.ch//archive/electronic/cern/preprints/lhcc/public/lhcc-2005-022.pdf
• There are (and will remain for some time) many unknowns
  • Calibration and alignment strategy is still evolving
  • Physics data access patterns MAY be exercised from June
  • Unlikely to know the real patterns until 2007/2008!
  • Still uncertainties on the event sizes and reconstruction times
• Lesson from the previous round of experiments at CERN (LEP, 1989-2000)
  • Reviews in 1988 underestimated the computing requirements by an order of magnitude!
ATLAS Facilities
• Event Filter Farm at CERN
  • Located near the experiment; assembles data into a stream to the Tier 0 Center
• Tier 0 Center at CERN
  • Raw data to mass storage at CERN and to Tier 1 centers
  • Swift production of Event Summary Data (ESD) and Analysis Object Data (AOD)
  • Ship ESD and AOD to Tier 1 centers; mass storage at CERN
• Tier 1 Centers distributed worldwide (10 centers)
  • Re-reconstruction of raw data, producing new ESD and AOD
  • Scheduled, group access to full ESD and AOD
• Tier 2 Centers distributed worldwide (approximately 30 centers)
  • Monte Carlo simulation producing ESD and AOD, shipped to Tier 1 centers
  • On-demand user physics analysis
• CERN Analysis Facility
  • Analysis
  • Heightened access to ESD and RAW/calibration data on demand
• Tier 3 Centers distributed worldwide
  • Physics analysis
Processing
• Tier-0:
  • Prompt first-pass processing on the express/calibration physics stream
  • 24-48 hours later, process the full physics data stream with reasonable calibrations
  → Implies large data movement from T0 → T1s
• Tier-1:
  • Reprocess 1-2 months after arrival with better calibrations
  • Reprocess all resident RAW at year end with improved calibration and software
  → Implies large data movement T1 ↔ T1 and T1 → T2
Analysis model
Analysis model broken into two components
• Scheduled central production of augmented AOD, tuples & TAG collections from ESD
  → Derived files moved to other T1s and to T2s
• Chaotic user analysis of augmented AOD streams, tuples, new selections etc., and individual user simulation and CPU-bound tasks matching the official MC production
  → Modest job traffic between T2s
Inputs to the ATLAS Computing Model (1)
Inputs to the ATLAS Computing Model (2)
[Tables: Tier 2 view; Tier 0 view]
Data Flow
• EF farm → T0
  • 320 MB/s continuous
• T0 raw data → mass storage at CERN
• T0 raw data → Tier 1 centers
• T0 ESD, AOD, TAG → Tier 1 centers
  • 2 copies of ESD distributed worldwide
• T1 → T2
  • Some RAW/ESD, all AOD, all TAG
  • Some group-derived datasets
• T2 → T1
  • Simulated RAW, ESD, AOD, TAG
• T0 → T2: calibration processing?
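The 320 MB/s EF → T0 rate sets the scale of the annual raw-data volume. A quick back-of-envelope check (the ~1e7 live seconds per LHC year is a conventional assumption, not a figure from the slides):

```python
# Rough annual RAW volume implied by the continuous 320 MB/s EF -> T0
# stream. The live-time figure is an assumption, not from the slide.
EF_TO_T0_MB_PER_S = 320      # MB/s, from the slide
LIVE_SECONDS_PER_YEAR = 1e7  # typical LHC running-year assumption

raw_pb_per_year = EF_TO_T0_MB_PER_S * LIVE_SECONDS_PER_YEAR / 1e9  # MB -> PB
print(f"{raw_pb_per_year:.1f} PB/year of RAW")  # -> 3.2 PB/year
```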
ATLAS partial & “average” T1 Data Flow (2008)
[Diagram: “average” Tier-1 data flow among the Tier-0, the T1 disk buffer, tape, disk storage, the T1 CPU farm, the other Tier-1s and each associated Tier-2. Rates recoverable from the diagram:]

Tier-0 → T1 disk buffer (total: 0.044 Hz, 3.74K f/day, 44 MB/s, 3.66 TB/day):

  Dataset  File size    Rate      Files/day  Bandwidth  Volume
  RAW      1.6 GB/file  0.02 Hz   1.7K       32 MB/s    2.7 TB/day
  ESD2     0.5 GB/file  0.02 Hz   1.7K       10 MB/s    0.8 TB/day
  AOD2     10 MB/file   0.2 Hz    17K        2 MB/s     0.16 TB/day
  AODm2    500 MB/file  0.004 Hz  0.34K      2 MB/s     0.16 TB/day

Tier-0 totals (RAW, ESD ×2, AODm ×10): 1 Hz, 85K f/day, 720 MB/s

Other flows:
  RAW → tape: 0.02 Hz, 1.7K f/day, 32 MB/s, 2.7 TB/day
  ESD1, ESD2 (CPU farm ↔ disk storage): 0.02 Hz, 1.7K f/day, 10 MB/s, 0.8 TB/day each
  AODm1 (CPU farm ↔ disk storage): 0.04 Hz, 3.4K f/day, 20 MB/s, 1.6 TB/day
  T1 ↔ other Tier-1s: ESD2 at 0.02 Hz, 10 MB/s, 0.8 TB/day; AODm2 at 0.036 Hz, 3.1K f/day, 18 MB/s, 1.44 TB/day
  T1 → each Tier-2: AODm2 at 0.04 Hz, 3.4K f/day, 20 MB/s, 1.6 TB/day
Plus simulation and analysis data flow
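The per-dataset figures above all follow from file size and event rate; a small sketch reproduces them (dataset names and numbers taken from the diagram, rounded on the slide to two significant figures):

```python
# Back-of-envelope check of the per-dataset T1 rates quoted on the slide.
SECONDS_PER_DAY = 86400

def flow(file_size_mb, rate_hz):
    """Return (files/day, MB/s, TB/day) for files written at rate_hz."""
    files_per_day = rate_hz * SECONDS_PER_DAY
    mb_per_s = file_size_mb * rate_hz
    tb_per_day = mb_per_s * SECONDS_PER_DAY / 1e6
    return files_per_day, mb_per_s, tb_per_day

datasets = {
    "RAW":   (1600, 0.02),   # 1.6 GB/file at 0.02 Hz
    "ESD2":  (500,  0.02),   # 0.5 GB/file
    "AOD2":  (10,   0.2),
    "AODm2": (500,  0.004),
}

for name, (size_mb, hz) in datasets.items():
    f, mbs, tbd = flow(size_mb, hz)
    print(f"{name}: {f:.0f} files/day, {mbs:.0f} MB/s, {tbd:.2f} TB/day")
# RAW comes out at 1728 files/day, 32 MB/s, 2.76 TB/day,
# matching the slide's rounded 1.7K f/day, 32 MB/s, 2.7 TB/day.
```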
Total ATLAS Requirements for 2008
Important points:
• Discussion on disk vs tape storage at Tier-1s
  • Tape in this discussion means low-access secure storage
  • No ‘disk buffers’ included except input to Tier 0
• Storage of simulation data from Tier 2s
  • Assumed to be at T1s
  • Need partnerships to plan networking
  • Must have fail-over to other sites
• Commissioning
  • Requirement of flexibility in the early stages
• Simulation is a tunable parameter in T2 numbers!
• Heavy Ion running still under discussion.
ATLAS T0 Resources
ATLAS T1 Resources
ATLAS T2 Resources
Required Network Bandwidth
• Caveats
  • No safety factors
  • No headroom
  • Just sustained average numbers
  • Assumes no years/datasets are ‘junked’
  • Physics analysis pattern still under study…
T1 ↔ CERN Bandwidth I+O
[Chart: nominal bandwidth, MB/s per month, Jul 2007 - Dec 2010, for ATLAS and ATLAS HI.]
• Mainly outward data movement
The projected time profile of the nominal bandwidth required between CERN and the Tier-1 cloud.
T1 ↔ T1 Bandwidth I+O
[Chart: nominal bandwidth, MB/s per month, Jul 2007 - Dec 2010, for ATLAS and ATLAS HI.]
• About half is scheduled analysis
The projected time profile of the nominal bandwidth required between each Tier-1 and the rest of the Tier-1 cloud.
T1 ↔ T2 Bandwidth I+O
[Chart: nominal bandwidth, MB/s per month, Jul 2007 - Dec 2010, for ATLAS and ATLAS HI.]
The projected time profile of the nominal aggregate bandwidth expected for an average ATLAS Tier- 1 and its three associated Tier-2s.
• Dominated by AOD
Issues 1: T1 Reprocessing
• Reprocessing at Tier 1s is understood in principle
  • In practice, requires efficient recall of data from archive and processing
  • Pinning, pre-staging, DAGs all required?
  • Requires the different storage roles to be well understood
Issues 2: Streaming
• This is *not* a theological issue
  • All discussions are about optimisation of data access
• TDR has 4 streams from the event filter
  • primary physics, calibration, express, problem events
  • Calibration stream has split at least once since!
• At AOD, envisage ~10 streams
• ESD streaming?
  • Straw-man streaming schemes (trigger-based) being agreed
  • Will explore the access improvements in large-scale exercises
  • Will also look at overlaps, bookkeeping etc.
TAG Access
• TAG is a keyed list of variables per event
• Two roles
  • Direct access to an event in a file via a pointer
  • Data collection definition function
• Two formats, file and database
• Now believe large queries require the full database
  • Restricts it to Tier 1s and large Tier 2s/CAF
  • Ordinary Tier 2s hold file-based TAGs corresponding to locally-held datasets
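The two TAG roles can be illustrated with a toy sketch (purely hypothetical record layout and variable names; the real ATLAS TAG is a file/database system, not a Python list):

```python
# Toy sketch of the two TAG roles. Each TAG record carries a few
# selection variables per event plus a pointer back into the event file.
# Record fields and file names here are invented for illustration.

tag_table = [
    {"run": 1, "event": 1, "nmuon": 2, "met": 55.0, "ref": ("AOD_0001.pool", 0)},
    {"run": 1, "event": 2, "nmuon": 0, "met": 12.0, "ref": ("AOD_0001.pool", 1)},
    {"run": 1, "event": 3, "nmuon": 1, "met": 80.0, "ref": ("AOD_0002.pool", 0)},
]

def define_collection(predicate):
    """Role 2: a query over TAG variables defines a data collection --
    here, simply the list of event pointers passing the cut."""
    return [rec["ref"] for rec in tag_table if predicate(rec)]

# Role 1: the pointers give direct access to the selected events,
# with no need to read the event files sequentially.
collection = define_collection(lambda r: r["nmuon"] >= 1 and r["met"] > 50)
print(collection)  # -> [('AOD_0001.pool', 0), ('AOD_0002.pool', 0)]
```

A large query scans every TAG record, which is why the full database format is needed at Tier 1s, while a Tier 2 only needs the file-based TAGs for its local datasets.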
Conclusions
• Computing Model data flow understood for placing RAW, ESD and AOD at tiered centers
  • Still need to understand the data flow implications of physics analysis
• SC4/Computing System Commissioning in 2006 is vital.
• Some issues will only be resolved with real data in 2007-8
Backup Slides
Heavy Ion Running