Global Data Grids for 21st Century Science
Paul Avery, University of Florida
http://www.phys.ufl.edu/~avery/
[email protected]
GriPhyN/iVDGL Outreach Workshop, University of Texas, Brownsville
March 1, 2002
Learn about real-world problems: deployment, testing, applications.

Layered Grid architecture (top to bottom):
- Applications
- Diverse global services
- Core services
- Diverse resources
Data-Intensive Science: 2000-2015
Scientific discovery is increasingly driven by IT:
- Computationally intensive analyses
- Massive data collections
- Data distributed across networks of varying capability
- Geographically distributed collaboration
Dominant factor: data growth (1 Petabyte = 1000 TB)
Medical data:
- X-ray, mammography data, etc. (many petabytes)
- Digitizing patient records (ditto)
X-ray crystallography:
- Bright X-ray sources, e.g., Argonne Advanced Photon Source
Molecular genomics and related disciplines:
- Human Genome, other genome databases
- Proteomics (protein structure, activities, ...)
- Protein interactions, drug delivery
Brain scans (3-D, time dependent)
Virtual Population Laboratory (proposed):
- Database of populations, geography, transportation corridors
- Simulate likely spread of disease outbreaks
(Craig Venter keynote @ SC2001)
Example: High Energy Physics
The "Compact" Muon Solenoid at the LHC (CERN)
[Figure: the CMS detector, with a "Smithsonian standard man" shown for scale]
1800 physicists, 150 institutes, 32 countries

LHC Computing Challenges:
- Complexity of the LHC interaction environment and resulting data
- Scale: petabytes of data per year (100 PB by ~2010-12)
- Global distribution of people and resources
Global LHC Data Grid

Tier structure:
- Tier 0: CERN
- Tier 1: National laboratory
- Tier 2: Regional center (university, etc.)
- Tier 3: University workgroup
- Tier 4: Workstation

[Figure: hierarchy fanning out from Tier 0 (CERN) through Tier 1 and Tier 2 centers to Tier 3 and Tier 4 sites]

Key ideas:
- Hierarchical structure
- Tier 2 centers
Global LHC Data Grid

- One bunch crossing every 25 nsec; 100 triggers per second; each event is ~1 MByte in size
- Physicists work on analysis "channels"; each institute has ~10 physicists working on one or more channels, with a local physics data cache

[Figure: data flows from the online system (~PBytes/sec from the detector, ~100 MBytes/sec out) to the CERN computer center (Tier 0+1, > 20 TIPS); over 2.5 Gbits/sec links to Tier 1 national centers (USA, France, Italy, UK); over ~622 Mbits/sec links to Tier 2 centers; and over 100-1000 Mbits/sec links to Tier 3 institutes (~0.25 TIPS each) and Tier 4 workstations and other portals]

- Experiment CERN/outside resource ratio ~1:2
- Tier0 : (sum of Tier1s) : (sum of Tier2s) ~ 1:1:1
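The rates quoted on this slide fit together with a little arithmetic. The sketch below checks them; the 10^7 seconds of beam time per year is a conventional assumption, not a figure from the slide:

```python
# Back-of-the-envelope check of the LHC data rates quoted above.

CROSSING_INTERVAL_NS = 25        # one bunch crossing every 25 ns
TRIGGERS_PER_SEC = 100           # events kept after the trigger
EVENT_SIZE_BYTES = 1_000_000     # each stored event is ~1 MByte

# Raw collision rate: 1 / 25 ns = 40 million crossings per second.
crossings_per_sec = 1e9 / CROSSING_INTERVAL_NS

# Stored data rate: 100 events/s * 1 MB = ~100 MB/s, matching the
# "~100 MBytes/sec" link out of the online system.
stored_rate_mb_per_sec = TRIGGERS_PER_SEC * EVENT_SIZE_BYTES / 1e6

# Integrated over ~10^7 s of beam time per year (an assumption),
# that is on the order of 1 PB/year of raw event data.
petabytes_per_year = stored_rate_mb_per_sec * 1e7 / 1e9

print(crossings_per_sec, stored_rate_mb_per_sec, petabytes_per_year)
```

The gap between the ~PBytes/sec coming off the detector and the ~100 MBytes/sec actually stored is exactly why the trigger system and the tiered distribution model exist.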
Sloan Digital Sky Survey Data Grid
LIGO (Gravity Wave) Data Grid

[Figure: the Hanford and Livingston observatories feed Tier 1 centers at Caltech and MIT over OC3 and OC48 links via Internet2/Abilene, with OC3 and OC12 links out to LSC (LIGO Scientific Collaboration) Tier 2 sites]
Data Grid Projects
- Particle Physics Data Grid (US, DOE): Data Grid applications for HENP experiments
- GriPhyN (US, NSF): Petascale Virtual-Data Grids
- iVDGL (US, NSF): Global Grid laboratory
- TeraGrid (US, NSF): Distributed supercomputing resources (13 TFlops)
- European Data Grid (EU, EC): Data Grid technologies, EU deployment
- CrossGrid (EU, EC): Data Grid technologies, EU
- DataTAG (EU, EC): Transatlantic network, Grid applications
- Japanese Grid Project (APGrid?) (Japan): Grid deployment throughout Japan

Common threads: collaborations of application scientists and computer scientists; infrastructure development and deployment; Globus based.
Coordination of U.S. Grid Projects
Three U.S. projects:
- PPDG: HENP experiments, short-term tools, deployment
- GriPhyN: Data Grid research, Virtual Data, VDT deliverable
- iVDGL: Global Grid laboratory

Coordination of PPDG, GriPhyN, iVDGL:
- Common experiments + personnel, management integration
- iVDGL as "joint" PPDG + GriPhyN laboratory
- Joint meetings (Jan. 2002, April 2002, Sept. 2002)
- Joint architecture creation (GriPhyN, PPDG)
- Adoption of VDT as common core Grid infrastructure
- Common Outreach effort (GriPhyN + iVDGL)

New TeraGrid project (Aug. 2001):
- 13 TFlops across 4 sites, 40 Gb/s networking
- Goal: integrate into iVDGL, adopt VDT, common Outreach
Worldwide Grid Coordination
Two major clusters of projects:
- "US based": GriPhyN Virtual Data Toolkit (VDT)
- "EU based": different packaging of similar components
GriPhyN = App. Science + CS + Grids
GriPhyN = Grid Physics Network:
- US-CMS (high energy physics)
- US-ATLAS (high energy physics)
- LIGO/LSC (gravity wave research)
- SDSS (Sloan Digital Sky Survey)
- Strong partnership with computer scientists

Goals:
- Design and implement production-scale grids
- Develop common infrastructure, tools, and services (Globus based)
- Integration into the 4 experiments
- Broad application to other sciences via the "Virtual Data Toolkit"
- Strong outreach program

Multi-year project:
- R&D for grid architecture (funded at $11.9M + $1.6M)
- Integrate Grid infrastructure into experiments through the VDT
GriPhyN Research Agenda
Virtual Data technologies (fig.):
- Derived data, calculable via algorithm
- Instantiated 0, 1, or many times (e.g., caches)
- "Fetch value" vs. "execute algorithm"
- Very complex (versions, consistency, cost calculation, etc.)

LIGO example: "Get gravitational strain for 2 minutes around each of 200 gamma-ray bursts over the last year."

For each requested data value, need to:
- Locate the item's location and algorithm
- Determine the costs of fetching vs. calculating
- Plan the data movements and computations required to obtain the results
- Execute the plan
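The "fetch value" vs. "execute algorithm" choice above reduces to a cost comparison. The sketch below is a toy illustration of that decision; all names, URLs, and cost numbers are hypothetical, not GriPhyN's actual planner interface:

```python
# Sketch of the "fetch value" vs. "execute algorithm" decision for a
# virtual data request. Names, URLs, and costs are illustrative only.

def plan_request(item, fetch_cost, compute_cost):
    """Return a minimal plan: fetch a cached replica when one exists
    and fetching is no more expensive than recomputing; otherwise
    (re)run the transformation that derives the item."""
    if item.get("cached") and fetch_cost <= compute_cost:
        return ("fetch", item["replica_url"])
    return ("compute", item["transformation"])

# A derived data product (say, a LIGO strain segment) with one replica.
strain_segment = {
    "cached": True,
    "replica_url": "gsiftp://tier2.example.edu/ligo/strain-0001",
    "transformation": "compute_strain",
}

# Cached and cheap to fetch -> fetch; not cached -> compute.
print(plan_request(strain_segment, fetch_cost=5, compute_cost=50))
print(plan_request({"cached": False, "transformation": "compute_strain"},
                   fetch_cost=0, compute_cost=50))
```

In the real system the costs themselves are hard to estimate (network load, queue depth, replica freshness), which is why the slide calls the problem "very complex".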
Virtual Data in Action
A data request may:
- Compute locally or remotely
- Access local or remote data
Scheduling is based on:
- Local policies
- Global policies
- Cost

[Figure: requests flow across major facilities and archives, regional facilities and caches, and local facilities and caches; cached items are fetched directly]
GriPhyN Research Agenda (cont.)
Execution management:
- Co-allocation of resources (CPU, storage, network transfers)
- Fault tolerance, error reporting
- Interaction, feedback to planning

Performance analysis (with PPDG):
- Instrumentation and measurement of all grid components
- Understand and optimize grid performance

Virtual Data Toolkit (VDT):
- VDT = virtual data services + virtual data tools
- One of the primary deliverables of the R&D effort
- Technology transfer mechanism to other scientific domains
GriPhyN/PPDG Data Grid Architecture

[Diagram: an Application hands a DAG to a Planner, which hands a DAG to an Executor (DAGMan, Kangaroo); both draw on the service layer below, running over Compute and Storage Resources]

Services, with the Globus-based tools providing an initial operational solution:
- Catalog Services: MCAT, GriPhyN catalogs
- Info Services: MDS
- Policy/Security: GSI, CAS
- Monitoring: MDS
- Replica Management: GDMP
- Reliable Transfer Service: GridFTP
- Compute Resource: GRAM
- Storage Resource: GridFTP; GRAM; SRM
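The Planner-to-Executor handoff is a DAG of jobs: DAGMan releases each job only after its parents finish. A minimal sketch of that ordering using Python's standard library (job names are hypothetical; real DAGMan also handles submission, retries, and logging):

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# A tiny analysis DAG: each job maps to the jobs it depends on.
dag = {
    "stage_in": [],                       # move input data to the site
    "analyze_a": ["stage_in"],            # two independent analyses...
    "analyze_b": ["stage_in"],
    "merge": ["analyze_a", "analyze_b"],  # ...merged once both finish
}

# A DAGMan-style executor would submit jobs in this dependency order.
order = list(TopologicalSorter(dag).static_order())
print(order)  # "stage_in" first, "merge" last
```

Expressing the plan as a DAG is what lets the executor exploit the Grid: `analyze_a` and `analyze_b` have no edge between them, so they can run concurrently on different resources.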
Transparency with respect to materialization

Derived Data Catalog (updated upon materialization):

  Id   Trans  Param  Name     ...
  i1   F      X      F.X      ...
  i2   F      Y      F.Y      ...
  i10  G      P      G(P).Y   ...

Transformation Catalog (transformation name, URLs for program location; programs held in program storage):

  Trans  Prog   Cost  ...
  F      URL:f  10    ...
  G      URL:g  20    ...

Derived Metadata Catalog (application-specific attributes mapped to ids, e.g., ... -> i2, i10)
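The catalogs above can be sketched as plain dictionaries. This is a toy model: field names and cost units are illustrative, mirroring the F/G rows in the tables, and the Derived Metadata Catalog is omitted:

```python
# Transformation Catalog: transformation name -> program URL and cost.
transformation_catalog = {
    "F": {"prog": "URL:f", "cost": 10},
    "G": {"prog": "URL:g", "cost": 20},
}

# Derived Data Catalog: derived item -> producing transformation,
# parameter, name, and whether the item is currently materialized.
derived_data_catalog = {
    "i1":  {"trans": "F", "param": "X", "name": "F.X",    "materialized": False},
    "i2":  {"trans": "F", "param": "Y", "name": "F.Y",    "materialized": False},
    "i10": {"trans": "G", "param": "P", "name": "G(P).Y", "materialized": False},
}

def materialize(item_id):
    """Look up the item's transformation, 'run' it, flag the item as
    materialized (the "update upon materialization" arrow in the
    diagram), and return the compute cost charged."""
    entry = derived_data_catalog[item_id]
    trans = transformation_catalog[entry["trans"]]
    # (A real executor would now fetch trans["prog"] and run it.)
    entry["materialized"] = True
    return trans["cost"]

print(materialize("i2"))  # transformation F, cost 10
```

The point of the split is transparency: a request names a derived item (`F.Y`), and the system decides from the catalogs whether to return a materialized copy or to pay the transformation's cost.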
International Virtual-Data Grid Laboratory
- A global Grid laboratory (US, EU, South America, Asia, ...)
- A place to conduct Data Grid tests "at scale"
- A mechanism to create common Grid infrastructure
- A facility to perform production exercises for LHC experiments
- A laboratory for other disciplines to perform Data Grid tests
- A focus of outreach efforts to small institutions
- Funded for $13.65M by NSF

"We propose to create, operate and evaluate, over a sustained period of time, an international research laboratory for data-intensive science."

- Grid Operations Center (GOC): Indiana (2 people); joint work with TeraGrid on GOC development
- Computer science support teams: support, test, upgrade the GriPhyN Virtual Data Toolkit
- Outreach effort integrated with GriPhyN
- Coordination, interoperability
Current iVDGL Participants
- Initial experiments (funded by NSF proposal): CMS, ATLAS, LIGO, SDSS, NVO
- U.S. universities and laboratories (next slide)
- Partners: TeraGrid, EU DataGrid + EU national projects, Japan (AIST, TITECH), Australia
- Complementary EU project: DataTAG (2.5 Gb/s transatlantic network)
U.S. iVDGL Proposal Participants

T2 / Software:
- U Florida (CMS)
- Caltech (CMS, LIGO)
- UC San Diego (CMS, CS)
- Indiana U (ATLAS, GOC)
- Boston U (ATLAS)
- U Wisconsin, Milwaukee (LIGO)
- Penn State (LIGO)
- Johns Hopkins (SDSS, NVO)

CS support:
- U Chicago/Argonne (CS)
- U Southern California (CS)
- U Wisconsin, Madison (CS)

T3 / Outreach:
- Salish Kootenai (Outreach, LIGO)
- Hampton U (Outreach, ATLAS)
- U Texas, Brownsville (Outreach, LIGO)

T1 / Labs (funded elsewhere):
- Fermilab (CMS, SDSS, NVO)
- Brookhaven (ATLAS)
- Argonne Lab (ATLAS, CS)
Initial US-iVDGL Data Grid

[Map legend: Tier 1 (FNAL); proto-Tier 2; Tier 3 university]

[Map: Tier 1 sites at Fermilab and BNL; proto-Tier 2 sites including Caltech, UCSD, Florida, Wisconsin, Indiana, BU, JHU, and PSU; Tier 3 sites at SKC, Brownsville, and Hampton. Other sites to be added in 2002.]
iVDGL Map (2002-2003)

[Map legend: Tier 0/1 facility; Tier 2 facility; Tier 3 facility; 10 Gbps, 2.5 Gbps, 622 Mbps, and other links; DataTAG; Surfnet]

Later: Brazil, Pakistan, Russia, China
Summary
- Data Grids will qualitatively and quantitatively change the nature of collaborations and approaches to computing
- The iVDGL will provide vast experience for new collaborations
- Many challenges remain during the coming transition
- New grid projects will provide rich experience and lessons
- It is difficult to predict the situation even 3-5 years ahead