Cyberinfrastructure From Dreams to Reality Deborah L. Crawford Deputy Assistant Director of NSF for Computer & Information Science & Engineering Workshop for eInfrastructure Rome, December 9, 2003
Dec 18, 2015
2
Daniel E. Atkins, Chair, University of Michigan
Kelvin K. Droegemeier, University of Oklahoma
Stuart I. Feldman, IBM
Hector Garcia-Molina, Stanford University
Michael L. Klein, University of Pennsylvania
David G. Messerschmitt, University of California at Berkeley
Paul Messina, California Institute of Technology
Jeremiah P. Ostriker, Princeton University
Margaret H. Wright, New York University
http://www.communitytechnology.org/nsf_ci_report/
3
In summary then, the opportunity is here to create cyberinfrastructure that enables more ubiquitous, comprehensive knowledge environments that become functionally complete .. in terms of people, data, information, tools, and instruments and that include unprecedented capacity for computational, storage, and communication.
They can serve individuals, teams and organizations in ways that revolutionize what they do, how they do it, and who can participate.
- The Atkins Report
Setting the Stage
4
Desired Characteristics
• Science- and engineering-driven
• Enabling discovery, learning and innovation
• Promising economies of scale and scope
• Supporting data-, instrumentation-, compute- and collaboration-intensive applications
• High-end to desktop
• Heterogeneous
• Interoperable - enabled by collection of reusable, common building blocks
5
Integrated Cyberinfrastructure meeting the needs of a community of communities
[Diagram labels: Applications (Environmental Science, High Energy Physics, Proteomics/Genomics, Learning); Science Gateways; CI Commons; CI Services & Middleware; Distributed Resources; Hardware; Training & Workforce Development; Discovery, Learning & Innovation; Science of CI]
6
Overarching Principles
• Enrich the portfolio
– Demonstrate transformative power of CI across S&E enterprise
– Empower range of CI users – current and emerging
– System-wide evaluation and CI-enabling research inform progress
• Develop intellectual capital
– Catalyze community development and support
– Enable training and professional development
– Broaden participation in the CI enterprise
• Enable integration and interoperability
– Develop shared vision, integrating architectures, common investments
– Promote collaboration, coordination and communication across fields
– Share promising technologies, practices and lessons learned
7
S&E Gateways
CI Commons
Core Activities
- Compute-centric
- Information-intensive
- Instrumentation-enabling
- Interactive-intensive
Integrative CI “system of systems”
CI Planning - A Systems Approach
CI-enabling Research
Domain-specific strategic plans
- Technology/human capital roadmaps
- Gaps and barrier analyses (policy, funding, ..)
System-wide activities
- Education, training
- (Inter)national networks
- Capacity computing
- Science of CI
8
Baselining NSF CI Investments
• Core (examples)
– Protein Data Bank
– Network for Earthquake Engineering Simulation
– International Integrated Microdata Access System
– Partnerships for Advanced Computational Infrastructure
– Circumarctic Environmental Observatory Network
– National Science Digital Library
– Pacific Rim GRID Middleware
• Priority Areas (examples)
– Geosciences Network
– international Virtual Data Grid Laboratory
– Grid Research and Applications Development
.. and others too numerous to mention (~$400M in FY’04)
9
CI Building Blocks
Partnerships for Advanced Computational Infrastructure (PACI)
– Science Gateways (Alpha projects, Expeditions)
– Middleware Technologies (NPACKage, ATG, Access Grid in a Box, OSCAR )
– Computational Infrastructure
NSF Middleware Initiative (NMI)
– Production software releases
– GridsCenter Software Suite, etc.
Early Adopters
– Grid Physics Network (GriPhyN), international Virtual Data Grid Laboratory (iVDGL)
– National Virtual Observatory
– Network for Earthquake Engineering Simulation (NEES)
– Bio-Informatics Research Network (BIRN)
Extensible Terascale Facility (TERAGRID)
– Science Gateways (value-added of integrated system approach)
– Common Teragrid Software Stack (CTSS)
– Compute engines, Data, Instruments, Visualization
10
Extensible Terascale Facility (TERAGRID) A CI Pathfinder
• Pathfinder Role
– integrated with extant CI capabilities
– clear value-added
  • supporting a new class of S&E applications
• Deploy a balanced, distributed system
– not a “distributed computer” but rather
– a distributed “system” using Grid technologies
  • computing and data management
  • visualization and scientific application analysis
  • remote instrumentation access
• Define an open and extensible infrastructure
– an “enabling cyberinfrastructure” demonstration
– extensible beyond original sites with additional funding
  • NCSA, SDSC, ANL, Caltech and PSC
  • ORNL, TACC, Indiana University, Purdue University and Atlanta hub
11
Resource Providers + 4 New Sites
[Diagram: resource provider sites connected through the Extensible Backplane Network]
NCSA: Compute Intensive
– 10 TF IA-64, 128 large memory nodes
– 230 TB Disk Storage, 3 PB Tape Storage
– GPFS and data mining
SDSC: Data Intensive
– 4 TF IA-64, DB2 and Oracle Servers
– 500 TB Disk Storage, 6 PB Tape Storage
– 1.1 TF Power4
PSC: Compute Intensive
– 6 TF EV68, 71 TB Storage
– 0.3 TF EV7 shared-memory, 150 TB Storage Server
ANL: Visualization
– 1.25 TF IA-64, 96 Viz nodes
– 20 TB Storage
Caltech: Data collection analysis
– 0.4 TF IA-64, IA32 Datawulf
– 80 TB Storage
Extensible Backplane Network
– LA Hub, Chicago Hub, Backplane Router
– Link speeds shown: 40 Gb/s and 30 Gb/s
Legend: Cluster, Shared Memory, Visualization Cluster, Disk Storage, Storage Server (node labels: IA64, IA32, Pwr4, EV68, EV7, Sun)
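The capacity figures on this slide can be added up to get a rough sense of the combined system. The following is a minimal sketch, not part of the original talk. It assumes each figure belongs to the site whose label it follows on the slide, and the dictionary layout, field names and function names are hypothetical. The four new sites have no figures on the slide, so they are left out.

```python
"""Tally the capacity figures listed on the "Resource Providers" slide."""

# Figures as transcribed from the slide. Assigning each figure to a site is an
# assumption based on label order; the field names are hypothetical.
SITES = {
    "NCSA": {"tflops": [10.0], "storage_tb": [230], "tape_pb": [3]},
    "SDSC": {"tflops": [4.0, 1.1], "storage_tb": [500], "tape_pb": [6]},
    "PSC": {"tflops": [6.0, 0.3], "storage_tb": [71, 150], "tape_pb": []},
    "ANL": {"tflops": [1.25], "storage_tb": [20], "tape_pb": []},
    "Caltech": {"tflops": [0.4], "storage_tb": [80], "tape_pb": []},
}


def total(field):
    """Sum one field across every site."""
    return sum(sum(site[field]) for site in SITES.values())


def per_site(field):
    """Return a {site: subtotal} mapping for one field."""
    return {name: sum(site[field]) for name, site in SITES.items()}


if __name__ == "__main__":
    print(f"Compute: {total('tflops'):.2f} TF")
    print(f"Disk and server storage: {total('storage_tb')} TB")
    print(f"Tape: {total('tape_pb')} PB")
```

Under these assumptions the listed sites add up to roughly 23 TF of compute, about 1 PB of disk and server storage, and 9 PB of tape.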
12
Common Teragrid Software Stack (CTSS)
• Linux Operating Environment
• Basic and Core Globus Services
– GSI (Grid Security Infrastructure)
– GSI-enabled SSH and GSIFTP
– GRAM (Grid Resource Allocation & Management)
– GridFTP
– Information Service
– Distributed accounting
– MPICH-G2
– Science Portals
• Advanced and Data Services
– Replica Management Tools
– GRAM-2 (GRAM extensions)
– CAS (Community Authorization Service)
– Condor-G (as brokering “super scheduler”)
– SDSC SRB (Storage Resource Broker)
– APST user middleware, etc.
13
TERAGRID as a Pathfinder
• Science Drivers - Gateways
- On-demand computing
- Remote visual steering
- Data-intensive computing
• Systems Integrator/Manager
- Common TERAGRID Software Stack
- User training & services
- TERAGRID Operations Center
• Resource Providers
- Data resources, compute engines, viz, user services
14
Focus on Policy and Social Dynamics
• Policy issues must be considered up front
• Social engineering will be at least as important as software engineering
• Well-defined interfaces will be critical for successful software development
• Application communities will need to participate from the beginning
Fran Berman, SDSC
15
16
17
CI Commons
Goals
• Commercial-grade software – stable, well-supported and well-documented
• User surveys and focus groups inform priority-setting
• Development of “Commons roadmap”
Unanswered questions
• What role does industry play in development and support of products?
• In what timeframe will software and services be available?
• How will customer satisfaction be assessed and by whom?
• What role do standards play – and does an effective standards process exist today?
18
CI Commons
Community Development Approach
• End-user communities willing and able to modify code
• Adds features, repairs defects, improves code
• Customizes common building blocks for domain applications
• Leads to higher quality code, enhances diversity
• Natural way to set priorities
Requires
• Education, training in community development methodologies
• Effective Commons governance plan
• Strong, sustained interaction between Commons developers and community code enhancers
19
Challenging Context
• Cyberinfrastructure Ecology
– Technological change more rapid than institutional change
– Disruptive technology promises unforeseen opportunity
• Seamless Integration of New and Old
– Balancing upgrades of existing and creation of new resources
– Legacy instruments, models, data, methodologies
• Broadening Participation
• Community-Building
• Requires Effective Migration Strategy
20
21
Kelvin Droegemeier, Center for Analysis and Prediction of Storms (CAPS) University of Oklahoma
On-Demand: Severe Weather Forecasting
Several times a week, need multiple hours of dedicated access to a multi-Teraflops system.
22
On Demand: Brain Data Grid
[Diagram labels: Duke; UCLA; Cal Tech; Stanford; U. of MN; Harvard; NCRR Imaging and Computing Resources; UCSD; Cal-(IT)2; SDSC; Deep Web; Surface Web; Wireless “Pad” Web Interface]
Cyberinfrastructure Linking Tele-instrumentation, Data Intensive Computing, and Multi-scale Brain Databases.
Objective: Form a National Scale Testbed for Federating Large Databases Using NIH High Field NMR Centers
Mark Ellisman, Larry Smarr, UCSD
23
• membrane potential (s)
• bath concentration (s) – Inside / Outside
• bath diffusion constant
• channel diffusion constant
• time step-size
• number of time steps
• channel diameter
• channel length
• force profile
• ion trajectory
• ion type
• temperature
• ionic strength
• protein dielectric
• water dielectric
• protein 3-d structure coordinates
• technical specifications *
• partial charges of titratable residues
• pH of bath
• interaction potentials between titratable groups in protein
• temperature
• ionic strength
• protein dielectric
• water dielectric
• protein 3-d struct coord
• technical specifications *
Related by sampling method used for calculation of diffusion constant
in MD simulations
Hole Profile analysis
• protein 3-d structure coordinates
• one position in channel
• approximate channel direction
• technical specifications***
• protein/lipid 3-d struct coord and topology
• force field sets
• ion-water ratio
• ion type/initial positions
• simulation time step-size
• simulation methodology specifications **
Hole Analysis
Electrostatics - I
Molecular Dynamics
Electrostatics - II
Brownian Dynamics
User
Web Portal
TeraGrid Resources
Data
Workflow Manager
Globus Client
Molecular Biology Simulation
Eric Jakobsson, UIUC