LHC Data Grid: The GriPhyN Perspective
DOE/NSF Baseline Review of US-CMS Software and Computing
Brookhaven National Lab, Nov. 15, 2000
Paul Avery, University of Florida
http://www.phys.ufl.edu/~avery/
[email protected]
Fundamental IT Challenge
“Scientific communities of thousands, distributed globally, and served by networks with bandwidths varying by orders of magnitude, need to extract small signals from enormous backgrounds via computationally demanding (Teraflops-Petaflops) analysis of datasets that will grow by at least 3 orders of magnitude over the next decade, from the 100 Terabyte to the 100 Petabyte scale.”
- US-CMS: high-energy physics
- US-ATLAS: high-energy physics
- LIGO/LSC: gravity-wave research
- SDSS: Sloan Digital Sky Survey
- Strong partnership with computer scientists
- Design and implement production-scale grids
  - Maximize effectiveness of large, disparate resources
  - Develop common infrastructure, tools, and services
  - Build on existing foundations: PPDG project, Globus tools
  - Integrate and extend capabilities of existing facilities
- $70M total cost (NSF)
  - $12M: R&D
  - $39M: Tier 2 center hardware, personnel
  - $19M: Networking
Importance of GriPhyN
- Fundamentally alters the conduct of scientific research
  - Old: people and resources flow inward to labs
  - New: resources and data flow outward to universities
- Strengthens universities
  - Couples universities to data-intensive science
  - Couples universities to national and international labs
  - Brings front-line research to students
  - Exploits intellectual resources of formerly isolated schools
  - Opens new opportunities for minority and women researchers
- Builds partnerships to drive new IT/science advances
  - Physics and astronomy
  - Application science and computer science
  - Universities and laboratories
  - Fundamental sciences and IT infrastructure
  - Research community and IT industry
GriPhyN R&D Funded
- NSF/ITR results announced Sep. 13
  - $11.9M from the Information Technology Research Program
  - $1.4M in matching funds from universities
  - Largest of all ITR awards
  - Joint NSF oversight from CISE and MPS
- Scope of ITR funding
  - Major costs for people, especially students and postdocs
  - No hardware or professional staff for operations!
  - 2/3 CS + 1/3 application science
  - Industry partnerships being developed
Why a Data Grid: Physical
- Unified system: all computing resources part of the grid
  - Efficient resource use (manage scarcity)
  - Resource discovery / scheduling / coordination truly possible
  - “The whole is greater than the sum of its parts”
- Optimal data distribution and proximity
  - Labs are close to the (raw) data they need
  - Users are close to the (subset) data they need
  - Minimize bottlenecks
- Efficient network use: local > regional > national > oceanic
  - No choke points
- Scalable growth
Why a Data Grid: Political
- A central lab cannot manage / help 1000s of users
  - Easier to leverage resources, maintain control, and assert priorities at the regional / local level
- Cleanly separates functionality
  - Different resource types in different Tiers
  - Organization vs. flexibility
  - Funding complementarity (NSF vs. DOE), targeted initiatives
- New IT resources can be added “naturally”
  - Matching resources at Tier 2 universities
  - Larger institutes can join, bringing their own resources
  - Tap into new resources opened by the IT “revolution”
- Broadens the community of scientists and students
  - Training and education
  - Vitality of the field depends on the University / Lab partnership
GriPhyN Research Agenda
- Virtual Data technologies
  - Derived data, calculable via algorithm (e.g., 90% of HEP data)
  - Instantiated 0, 1, or many times
  - Fetch data vs. execute algorithm
  - Very complex (versions, consistency, cost calculation, etc.)
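The core virtual-data decision — fetch an existing instance of a derived dataset or re-execute the algorithm that produces it — can be sketched as a cost comparison. This is an illustrative toy, not GriPhyN code; the `VirtualDataEntry` class, field names, and cost units are all invented for the example.

```python
# Toy sketch of the virtual-data decision: fetch a materialized copy
# of a derived dataset, or re-run the algorithm that produces it.
# All names and cost units here are hypothetical.

from dataclasses import dataclass

@dataclass
class VirtualDataEntry:
    """Catalog record for one derived dataset."""
    replicas: list          # sites holding a materialized copy (may be empty)
    transfer_cost: float    # estimated cost to fetch the nearest replica
    compute_cost: float     # estimated cost to re-execute the algorithm

def resolve(entry: VirtualDataEntry) -> str:
    """Decide whether to fetch an instance or recompute it."""
    if not entry.replicas:
        return "execute"            # instantiated 0 times: must recompute
    if entry.transfer_cost <= entry.compute_cost:
        return "fetch"              # a nearby copy is cheaper than CPU time
    return "execute"                # recomputing beats a distant transfer

# A dataset with one replica that is cheap to move:
print(resolve(VirtualDataEntry(["tier2-ufl"], 5.0, 120.0)))   # fetch
# The same dataset with no materialized copy anywhere:
print(resolve(VirtualDataEntry([], 5.0, 120.0)))              # execute
```

The slide's "very complex" caveat is exactly what this sketch omits: versioning, consistency between replicas, and estimating the two costs are the hard parts.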
- Planning and scheduling
  - User requirements (time vs. cost)
  - Global and local policies + resource availability
  - Complexity of scheduling in a dynamic environment (hierarchy)
  - Optimization and ordering of multiple scenarios
  - Requires simulation tools, e.g. MONARC
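The planning bullets combine three inputs: the user's time-vs-cost preference, policy constraints on which sites may be used, and estimates for each candidate scenario. A minimal sketch of ordering multiple scenarios under those inputs, with invented site names and a simple weighted score standing in for what was then an open research problem:

```python
# Hypothetical sketch of ranking candidate execution plans against a
# user's time-vs-cost preference and a site-policy filter. The real
# GriPhyN planner was a research topic, not this simple formula.

def rank_plans(plans, time_weight=0.5, allowed_sites=None):
    """Return feasible plans, best first.

    plans: list of dicts with 'site', 'est_time', 'est_cost'.
    time_weight: 1.0 = user cares only about time, 0.0 = only cost.
    allowed_sites: local/global policy filter (None = no restriction).
    """
    feasible = [p for p in plans
                if allowed_sites is None or p["site"] in allowed_sites]
    return sorted(
        feasible,
        key=lambda p: time_weight * p["est_time"]
                      + (1.0 - time_weight) * p["est_cost"],
    )

plans = [
    {"site": "cern", "est_time": 10.0, "est_cost": 90.0},
    {"site": "fnal", "est_time": 40.0, "est_cost": 20.0},
]
# A time-sensitive user whose policy only permits running at FNAL:
best = rank_plans(plans, time_weight=0.9, allowed_sites={"fnal"})[0]
print(best["site"])   # fnal
```

The dynamic-environment difficulty on the slide is that `est_time` and `est_cost` change as the grid's load changes, which is why simulation tools like MONARC were needed to evaluate orderings.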
Virtual Data in Action
[Diagram: a data request flowing among major archive facilities, network caches & regional centers, and local sites]
- A data request may:
  - Compute locally
  - Compute remotely
  - Access local data
  - Access remote data
- Scheduling based on:
  - Local policies
  - Global policies
Local autonomy
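The four request outcomes above, constrained by local autonomy, can be illustrated with a toy routing function. Everything here — the flags, the thresholds, the returned strings — is invented for the sketch; a real scheduler would weigh policies and costs far more carefully.

```python
# Toy illustration of the slide's four outcomes: a request may compute
# locally or remotely, against local or remote data, subject to local
# policy (local autonomy). All names and rules are invented.

def route_request(data_is_local: bool, local_cpu_free: bool,
                  local_policy_allows_remote: bool = True) -> str:
    if local_cpu_free and data_is_local:
        return "compute locally on local data"
    if local_cpu_free and not data_is_local:
        return "compute locally, access remote data"
    if local_policy_allows_remote:
        # Local site is busy: ship the job instead of the data.
        return "compute remotely" + ("" if data_is_local else " on remote data")
    return "queue locally"   # local autonomy: policy forbids off-site work

print(route_request(data_is_local=True, local_cpu_free=False))
# compute remotely
```

The last branch is the point of the "local autonomy" bullet: a site's own policy can override the globally optimal choice.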
Research Agenda (cont.)
- Execution management
  - Co-allocation of resources (CPU, storage, network transfers)
  - Fault tolerance, error reporting
  - Agents (co-allocation, execution)
  - Reliable event service across the Grid
  - Interaction with, and feedback to, planning
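Co-allocation means a job holds CPU, storage, and network capacity together, and fault tolerance means a failure of any one reservation must not leave the others dangling. A hedged sketch of that all-or-nothing pattern; the `Reservation` class is a stand-in, not a real Grid or Globus API:

```python
# Hedged sketch of co-allocation: acquire several resource
# reservations together, rolling everything back if any one fails.
# The Reservation class is invented for illustration.

class Reservation:
    def __init__(self, kind, ok=True):
        self.kind, self.ok = kind, ok
    def acquire(self):
        if not self.ok:
            raise RuntimeError(f"{self.kind} reservation failed")
        return self
    def release(self):
        pass  # a real agent would free the held resource here

def co_allocate(reservations):
    """All-or-nothing acquisition with error reporting."""
    held = []
    try:
        for r in reservations:
            held.append(r.acquire())
        return held
    except RuntimeError as err:
        for r in held:          # fault tolerance: roll back partial holds
            r.release()
        print(f"co-allocation aborted: {err}")
        return None

ok = co_allocate([Reservation("cpu"), Reservation("storage")])
print(ok is not None)    # True
bad = co_allocate([Reservation("cpu"), Reservation("network", ok=False)])
print(bad is None)       # True
```

The "agents" bullet corresponds to running this acquire-or-roll-back logic on behalf of the user at each site, with the error report fed back to the planner.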
- Performance analysis
  - Instrumentation and measurement of all grid components
  - Understand and optimize grid performance
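"Instrumentation and measurement of all grid components" amounts to wrapping each component so every call is timed and the measurements are collected for later analysis. A minimal sketch, assuming a simple decorator and an in-memory store; none of this is a GriPhyN interface:

```python
# Minimal sketch of instrumenting a grid component: time each call
# and collect per-component statistics that a performance-analysis
# tool could aggregate. Purely illustrative names throughout.

import time
from collections import defaultdict

timings = defaultdict(list)   # component name -> list of durations (s)

def instrumented(component):
    """Decorator recording the wall-clock time of each call."""
    def wrap(fn):
        def inner(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                timings[component].append(time.perf_counter() - start)
        return inner
    return wrap

@instrumented("replica-transfer")
def transfer(nbytes):
    time.sleep(0.01)          # stand-in for real work
    return nbytes

transfer(4096)
print(len(timings["replica-transfer"]))   # 1
```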
Virtual Data Toolkit (VDT)
- VDT = virtual data services + virtual data tools
- One of the primary deliverables of the R&D effort
- Ongoing activity + feedback from experiments (5-year plan)
- Technology transfer mechanism to other scientific domains
Schedules, milestones
- Senior computing members in the Project Coordination Group
- Formal liaison with experiments
- Formal presence in the Application area
- GriPhyN PMP: “…the schedule, milestones and deliverables of the [GriPhyN] Project must correspond to those of the experiments to which they belong.”
Long-Term Grid Management
- The CMS Grid is a complex, distributed entity
  - International scope
  - Many sites, from Tier 0 to Tier 3
  - Thousands of boxes
  - Regional, national, and international networks
- Necessary to manage:
  - Performance monitoring
  - Failures
  - Bottleneck identification and resolution
  - Optimization of resource use
  - Changing policies based on resources and CMS priorities
- The USA Grid component can be effectively separated
  - US Grid management coordinated with CMS
- Tools
  - Stochastic optimization with incomplete information
  - Visualization
  - Database performance
  - Many tools provided by GriPhyN performance monitoring
  - Expect substantial help from the IT industry
- Timescale and scope
  - Proto-effort starting in 2004, no later than 2005
  - 46 experts needed to oversee the US Grid
  - Expertise: ODBMS, networks, scheduling, grids, etc.