Grid Physics Network &
Intl Virtual Data Grid Lab
Ian Foster*, for the GriPhyN & iVDGL Projects
SCI PI Meeting, February 18-20, 2004
*Argonne, U.Chicago, Globus; [email protected]
2
Cyberinfrastructure
“A new age has dawned in scientific & engineering research, pushed by continuing progress in computing, information, and communication technology, & pulled by the expanding complexity, scope, and scale of today’s challenges. The capacity of this technology has crossed thresholds that now make possible a comprehensive “cyberinfrastructure” on which to build new types of scientific & engineering knowledge environments & organizations and to pursue research in new ways & with increased efficacy.” [Blue Ribbon Panel report, 2003]
But how will we learn how to build, operate, & use it?
3
Our Approach: Experimental & Collaborative
Experimental procedure: Mix together, and shake well:
Physicists* with an overwhelming need to pool resources to solve fundamental science problems
Computer scientists with a vision of a Grid that will enable virtual communities to share resources
Monitor byproducts:
Heat: sometimes incendiary
Light: considerable, in eScience, computer science, & cyberinfrastructure engineering
Operational cyberinfrastructure (hardware & software), with an enthusiastic and knowledgeable user community, and real scientific benefits
* We use “physicist” as a generic term indicating a non-computer scientist
4
Who are the “Physicists”? – GriPhyN/iVDGL Science Drivers
US-ATLAS, US-CMS (LHC experiments): fundamental nature of matter; 100s of petabytes
LIGO observatory: gravitational wave search; 100s of terabytes
Sloan Digital Sky Survey: astronomical research; 10s of terabytes
[Figure: the science drivers arranged by data growth and community growth, 2001-2009]
+ a growing number of biologists & other scientists
+ computer scientists needing experimental apparatus
5
Common Underlying Problem: Data-Intensive Analysis
Users & resources in many institutions (1000s of users, 100s of institutions, petascale resources) …
… engage in collaborative data analysis, both structured/scheduled & interactive
Many overlapping virtual orgs must:
Define activities
Pool resources
Prioritize tasks
Manage data
…
6
Vision & Goals
Develop the technologies & tools needed to exploit a distributed cyberinfrastructure
Apply and evaluate those technologies & tools in challenging scientific problems
Develop the technologies & procedures to support a persistent cyberinfrastructure
Create and operate a persistent cyberinfrastructure in support of diverse discipline goals
GriPhyN + iVDGL + DOE Particle Physics Data Grid (PPDG) = Trillium
End-to-end
7
Two Distinct but Integrated Projects
Both NSF-funded, with overlapping periods:
GriPhyN: $11.9M (NSF) + $1.6M (match), 2000–2005, CISE
iVDGL: $13.7M (NSF) + $2M (match), 2001–2006, MPS
Basic composition:
GriPhyN: 12 universities, SDSC, 3 labs (~80 people)
iVDGL: 18 institutions, SDSC, 4 labs (~100 people)
Large overlap in people, institutions, experiments, software
GriPhyN (Grid research) vs. iVDGL (Grid deployment):
GriPhyN: 2/3 “CS” + 1/3 “physics” (0% hardware)
iVDGL: 1/3 “CS” + 2/3 “physics” (20% hardware)
Many common elements: Directors, Advisory Committee, linked management; Virtual Data Toolkit (VDT), Grid testbeds, Outreach effort
Build on the Globus Toolkit, Condor, and other technologies
8
Project Specifics: GriPhyN
Develop the technologies & tools needed to exploit a distributed cyberinfrastructure
Apply and evaluate those technologies & tools in challenging scientific problems
Develop the technologies & procedures to support a persistent cyberinfrastructure
Create and operate a persistent cyberinfrastructure in support of diverse discipline goals
GriPhyN Overview
[Architecture diagram: researchers, production managers, and science review drive production and analysis runs (parameters, composition, planning, execution, data) through virtual data, planning, and execution services, layered over Grid services (discovery, sharing) and Grid fabric (instruments, storage elements, data)]
[Virtual Data Toolkit stack: Chimera virtual data system, Pegasus planner, DAGMan, Globus Toolkit, Condor, Ganglia, etc.]
Pipeline stages: pythia_input → pythia.exe → cmsim_input → cmsim.exe → writeHits → writeDigis
begin v /usr/local/demo/scripts/cmkin_input.csh
  file i ntpl_file_path
  file i template_file
  file i num_events
  stdout cmkin_param_file
end

begin v /usr/local/demo/binaries/kine_make_ntpl_pyt_cms121.exe
  pre cms_env_var
  stdin cmkin_param_file
  stdout cmkin_log
  file o ntpl_file
end

begin v /usr/local/demo/scripts/cmsim_input.csh
  file i ntpl_file
  file i fz_file_path
  file i hbook_file_path
  file i num_trigs
  stdout cmsim_param_file
end

begin v /usr/local/demo/binaries/cms121.exe
  condor copy_to_spool=false
  condor getenv=true
  stdin cmsim_param_file
  stdout cmsim_log
  file o fz_file
  file o hbook_file
end

begin v /usr/local/demo/binaries/writeHits.sh
  condor getenv=true
  pre orca_hits
  file i fz_file
  file i detinput
  file i condor_writeHits_log
  file i oo_fd_boot
  file i datasetname
  stdout writeHits_log
  file o hits_db
end

begin v /usr/local/demo/binaries/writeDigis.sh
  pre orca_digis
  file i hits_db
  file i oo_fd_boot
  file i carf_input_dataset_name
  file i carf_output_dataset_name
  file i carf_input_owner
  file i carf_output_owner
  file i condor_writeDigis_log
  stdout writeDigis_log
  file o digis_db
end
(Early) Virtual Data Language
CMS “Pipeline”
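The VDL fragments above are what the slide shows. As a rough illustration of the idea they encode (a toy Python sketch, not the Chimera API or its actual data model, with the file parameter lists trimmed for brevity): each derivation names a transformation plus the files it consumes and produces, so chaining derivations yields a dependency graph from which an executable order, and ultimately a DAGMan DAG, can be planned.

from dataclasses import dataclass, field

@dataclass
class Derivation:
    transformation: str                      # executable or script name
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)

# Simplified version of the six-stage CMS pipeline from the VDL text above.
pipeline = [
    Derivation("cmkin_input.csh", ["template_file"], ["cmkin_param_file"]),
    Derivation("kine_make_ntpl_pyt_cms121.exe", ["cmkin_param_file"], ["ntpl_file"]),
    Derivation("cmsim_input.csh", ["ntpl_file"], ["cmsim_param_file"]),
    Derivation("cms121.exe", ["cmsim_param_file"], ["fz_file", "hbook_file"]),
    Derivation("writeHits.sh", ["fz_file"], ["hits_db"]),
    Derivation("writeDigis.sh", ["hits_db"], ["digis_db"]),
]

def producers(derivations):
    # Toy "virtual data catalog": which derivation produces each file.
    return {f: d for d in derivations for f in d.outputs}

def plan(target, catalog, done=None):
    # Walk back from a requested file and list derivations in executable order.
    done = set() if done is None else done
    d = catalog.get(target)
    if d is None or id(d) in done:
        return []                            # raw input, or already planned
    done.add(id(d))
    steps = []
    for f in d.inputs:
        steps += plan(f, catalog, done)
    return steps + [d]

for step in plan("digis_db", producers(pipeline)):
    print(step.transformation)

Running the sketch prints the six stages in dependency order, from cmkin_input.csh through writeDigis.sh.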
[Derivation tree: each node is a dataset labeled by the parameters used to derive it]
mass = 200
mass = 200, plot = 1
mass = 200, event = 8
mass = 200, decay = ZZ
mass = 200, decay = bb
mass = 200, decay = WW
mass = 200, decay = WW, plot = 1
mass = 200, decay = WW, event = 8
mass = 200, decay = WW, stability = 1
mass = 200, decay = WW, stability = 3
mass = 200, decay = WW, stability = 1, plot = 1
mass = 200, decay = WW, stability = 1, event = 8
mass = 200, decay = WW, stability = 1, LowPt = 20, HighPt = 10000
Scientist discovers an interesting result – wants to know how it was derived.
Search for WW decays of the Higgs Boson where only stable, final-state particles are recorded: stability = 1
…the scientist adds a new derived data branch…
…and continues to investigate…
Virtual Data Example: High Energy Physics
Work and slide by Rick Cavanaugh and Dimitri Bourilkov, University of Florida
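The use case in the figure can be summarized with a toy sketch. This is not the actual Chimera virtual data catalog or its query language; it only illustrates the two operations the slide describes, asking how a derived dataset was produced and branching the analysis with one changed parameter. The dataset names are made up.

# Toy provenance records (not the actual Chimera/virtual data catalog schema):
# each derived dataset remembers its parent and the parameters added at that step.
datasets = {
    "h_m200":           {"params": {"mass": 200},   "parent": None},
    "h_m200_ww":        {"params": {"decay": "WW"},  "parent": "h_m200"},
    "h_m200_ww_stable": {"params": {"stability": 1}, "parent": "h_m200_ww"},
    "h_m200_ww_plot":   {"params": {"plot": 1},      "parent": "h_m200_ww_stable"},
}

def how_was_it_derived(name):
    """Walk parents back to the root, collecting the full parameter set."""
    chain, params = [], {}
    while name is not None:
        record = datasets[name]
        chain.append(name)
        # Parameters set closer to the leaf take precedence over ancestors'.
        params = {**record["params"], **params}
        name = record["parent"]
    return list(reversed(chain)), params

chain, params = how_was_it_derived("h_m200_ww_plot")
print(chain)   # ['h_m200', 'h_m200_ww', 'h_m200_ww_stable', 'h_m200_ww_plot']
print(params)  # {'mass': 200, 'decay': 'WW', 'stability': 1, 'plot': 1}

# "Adds a new derived data branch": reuse the WW parent, change one parameter.
datasets["h_m200_ww_stab3"] = {"params": {"stability": 3}, "parent": "h_m200_ww"}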
12
[Plot: galaxy cluster size distribution (number of clusters vs. number of galaxies, log-log axes)]
[Task graph for the cluster-finding workflow]
Virtual Data Example: Sloan Galaxy Cluster Analysis
Sloan Data
Jim Annis, Steve Kent, Vijay Sehkri, Neha Sharma (Fermilab); Michael Milligan, Yong Zhao (Chicago)
13
Virtual Data Example: NVO/NASA Montage
A small (1200 node) workflow
Construct custom mosaics on demand from multiple data sources
User specifies projection, coordinates, size, rotation, spatial sampling
Work by Ewa Deelman et al., USC/ISI and Caltech
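As a rough illustration of why one mosaic request expands into a workflow of this size (a hypothetical sketch, not the real Montage or Pegasus interfaces; the task names and the image-density constant are assumptions), the request parameters determine how many input images must be reprojected before a final co-addition step.

# Hypothetical sketch: expand one mosaic request into per-image reprojection
# tasks plus a final co-addition task. Task names and the images-per-square-degree
# constant are illustrative assumptions, not the real Montage/Pegasus interfaces.
from dataclasses import dataclass

@dataclass
class MosaicRequest:
    center_ra: float              # degrees
    center_dec: float             # degrees
    size_deg: float               # width/height of the requested square region
    rotation_deg: float = 0.0
    sampling_arcsec: float = 1.0

def expand_to_workflow(req, images_per_sq_deg=300):
    """Return (task_name, input_files) pairs; counts are order-of-magnitude only."""
    n_images = int(req.size_deg ** 2 * images_per_sq_deg)
    tasks = [(f"reproject_{i}", [f"raw_{i}.fits"]) for i in range(n_images)]
    tasks.append(("coadd_final", [f"projected_{i}.fits" for i in range(n_images)]))
    return tasks

workflow = expand_to_workflow(MosaicRequest(center_ra=83.6, center_dec=22.0, size_deg=2.0))
print(len(workflow), "tasks")     # ~1200 tasks under these assumed numbers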
14
Virtual Data Example: Education
(Work in Progress)
“We uploaded the data to the Grid & used the grid analysis tools to find the shower”
15
Project Specifics: iVDGL
Develop the technologies & tools needed to exploit a distributed cyberinfrastructure
Apply and evaluate those technologies & tools in challenging scientific problems
Develop the technologies & procedures to support a persistent cyberinfrastructure
Create and operate a persistent cyberinfrastructure in support of diverse discipline goals
16
iVDGL Goals
Deploy a Grid laboratory:
Support the research mission of data-intensive experiments
Computing & personnel resources at university sites
Provide a platform for computer science development
Prototype and deploy a Grid Operations Center
Integrate Grid software tools into the computing infrastructures of the experiments
Support delivery of Grid technologies: harden VDT & other middleware technologies developed by GriPhyN and other Grid projects
Education and Outreach: enable underrepresented groups & remote regions to participate in international science projects
17
Virtual Data Toolkit
[Build flow: sources (CVS) from NMI, VDT, and contributors (VDS, etc.) are patched, built, and tested on a Condor build & test pool (37 computers), then packaged as GPT source bundles, a Pacman cache, RPMs, and binaries]
Will use NMI processes soon
A unique laboratory for managing, testing, supporting, deploying, packaging, upgrading, & troubleshooting complex sets of software!
18
Virtual Data Toolkit: Tools in VDT 1.1.12
Condor Group: Condor/Condor-G, DAGMan, Fault Tolerant Shell, ClassAds
Globus Alliance: Grid Security Infrastructure (GSI), job submission (GRAM), information service (MDS), data transfer (GridFTP), Replica Location Service (RLS)
EDG & LCG: make-gridmap, Certificate Revocation List updater, Glue Schema/information provider
ISI & UC: Chimera & related tools, Pegasus
NCSA: MyProxy, GSI-OpenSSH
LBL: PyGlobus, NetLogger
Caltech: MonALISA
VDT: VDT System Profiler, configuration software
Others: KX509 (U. Mich.)
19
VDT Growth
[Chart: number of components per VDT release]
VDT 1.0: Globus 2.0b, Condor 6.3.1
VDT 1.1.3, 1.1.4 & 1.1.5: pre-SC 2002
VDT 1.1.7: switch to Globus 2.2
VDT 1.1.8: first real use by LCG
VDT 1.1.11: Grid2003
20
Grid2003: An Operational Grid
28 sites (2100-2800 CPUs) & growing
400-1300 concurrent jobs
7 substantial applications + CS experiments
Running since October 2003
[Map of Grid2003 sites, including Korea]
http://www.ivdgl.org/grid2003
21
Grid2003 Components
Computers & storage at 28 sites (to date): 2800+ CPUs
Uniform service environment at each site:
Globus Toolkit provides basic authentication, execution management, data movement (see the submission sketch after this list)
Pacman installation system enables installation of numerous other VDT and application services
Global & virtual organization services: certification & registration authorities, VO membership services, monitoring services
Client-side tools for data access & analysis: virtual data, execution planning, DAG management, execution management, monitoring
IGOC: iVDGL Grid Operations Center
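A minimal sketch of the submission path named above, assuming Condor-G layered over Globus GRAM as used on Grid2003. The gatekeeper address, script name, and file names are hypothetical; the sketch also assumes a valid GSI proxy already exists (e.g. created with grid-proxy-init) and that condor_submit is on the PATH.

# Minimal sketch of submitting work to a Grid2003-style site via Condor-G over
# Globus GRAM. The gatekeeper host below is hypothetical; real site contact
# strings came from the catalog and monitoring services listed above.
import subprocess, textwrap

def submit_grid_job(executable, gatekeeper, arguments=""):
    submit = textwrap.dedent(f"""\
        universe        = globus
        globusscheduler = {gatekeeper}
        executable      = {executable}
        arguments       = {arguments}
        output          = job.out
        error           = job.err
        log             = job.log
        queue
        """)
    with open("grid_job.submit", "w") as f:
        f.write(submit)
    # Hand the description to Condor-G, which manages the GRAM submission.
    subprocess.run(["condor_submit", "grid_job.submit"], check=True)

submit_grid_job("my_analysis.sh", "gatekeeper.example.edu/jobmanager-condor")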
22
Grid2003 Metrics
Metric                                    Target     Achieved
Number of CPUs                            400        2762 (28 sites)
Number of users                           > 10       102 (16)
Number of applications                    > 4        10 (+CS)
Number of sites running concurrent apps   > 10       17
Peak number of concurrent jobs            1000       1100
Data transfer per day                     > 2-3 TB   4.4 TB max
23
Grid2003 Applications To Date
CMS proton-proton collision simulation
ATLAS proton-proton collision simulation
LIGO gravitational wave search
SDSS galaxy cluster detection
ATLAS interactive analysis
BTeV proton-antiproton collision simulation
SnB biomolecular analysis
GADU/Gnare genome analysis
Various computer science experiments
www.ivdgl.org/grid2003/applications
25
Grid2003 Scientific Impact: E.g., U.S. CMS 2003 Production
10M events produced: largest-ever contribution
Almost double the number of events during the first 25 days vs. 2002, with half the manpower
Production run with 1 person working 50%
400 jobs at once vs. 200 the previous year
Multi-VO sharing
Continuing at an accelerating rate into 2004
Many issues remain: e.g., scaling, missing functionality
26
Grid2003 as CS Research Lab: E.g., Adaptive Scheduling
Adaptive data placement in a realistic environment (K. Ranganathan)
Enables comparisons with simulations (a toy sketch of the compared placement policies follows the figure summary below)
[Plots: average number of idle CPUs, 22-28 Jan, at FNAL_CMS and IU_ATLAS_Tier2; total I/O traffic (MBytes) per site for BNL_ATLAS, CalTech-Grid3, UFlorida-PG, IU_ATLAS_Tier2, UBuffalo-CCR, UCSanDiegoPG, UFlorida-Grid3, UM_ATLAS, Vanderbilt, ANL_HEP, CalTech-PG; total response time (seconds) under Least Loaded, At Data, and Cost Function placement policies]
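The last panel compares three placement policies by name. A toy model of those policies (illustrative only; this is not K. Ranganathan's actual scheduler, and all site numbers and weights below are invented) shows the trade-off being measured: load only, data locality only, or a blend of load and data-movement cost.

# Toy site-selection policies matching the labels in the plot summary above
# ("Least Loaded", "At Data", "Cost Function"). Illustrative only: not the
# actual Grid2003 scheduler; the sample numbers and weights are made up.
def pick_site(sites, policy, transfer_weight=0.5):
    """sites: name -> {'queued_jobs': int, 'has_data': bool, 'xfer_seconds': float}."""
    def cost(s):
        xfer = 0.0 if s["has_data"] else s["xfer_seconds"]
        if policy == "least_loaded":
            return s["queued_jobs"]                               # ignore data location
        if policy == "at_data":
            return (0 if s["has_data"] else 1, s["queued_jobs"])  # data-local sites first
        if policy == "cost_function":
            return s["queued_jobs"] + transfer_weight * xfer      # blend load and transfer cost
        raise ValueError(policy)
    return min(sites, key=lambda name: cost(sites[name]))

sites = {
    "FNAL_CMS":       {"queued_jobs": 40, "has_data": True,  "xfer_seconds": 0},
    "IU_ATLAS_Tier2": {"queued_jobs": 5,  "has_data": False, "xfer_seconds": 600},
    "UFlorida-PG":    {"queued_jobs": 12, "has_data": False, "xfer_seconds": 300},
}
for policy in ("least_loaded", "at_data", "cost_function"):
    print(policy, "->", pick_site(sites, policy))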
27
Grid2003 Lessons Learned
How to operate a Grid:
Add sites, recover from errors, provide information, update software, test applications, …
Tools, services, procedures, docs, organization
Need reliable, intelligent, skilled people
How to scale algorithms, software, process:
“Interesting” failure modes as scale increases
Increasing scale must not overwhelm human resources
How to delegate responsibilities:
At project, virtual org., service, site, application level
Distribution of responsibilities for future growth
How to apply distributed cyberinfrastructure
28
Summary: We Are Building Cyberinfrastructure …
GriPhyN/iVDGL (+ DOE PPDG & LHC, etc.) are:
Creating an (inter)national-scale, multi-disciplinary infrastructure for distributed data-intensive science;
Demonstrating the utility of such infrastructure via a broad set of applications (not just physics!);
Learning many things about how such infrastructures should be created, operated, and evolved; and
Capturing best practices in software & procedures, including VDT, Pacman, monitoring tools, etc.
Unique scale & application breadth: Grid3: 10 apps (science & CS), 28 sites, 2800 CPUs, 1300 jobs, and growing rapidly
CS-applications-operations partnership
Having a strong impact on all three
29
… And Are Open for Business
Virtual Data Toolkit:
Distributed workflow & data management & analysis
Data replication, data provenance, etc.
Virtual organization management
Globus Toolkit, Condor, and other good stuff
Grid2003:
Adapt your applications to use VDT mechanisms and obtain order-of-magnitude increases in performance
Add your site to Grid2003 & join a national-scale cyberinfrastructure
Propose computer science experiments in a unique environment
Write an NMI proposal to fund this work