Computational Grids: Current Status and Future Directions

Rick Stevens
Director, Mathematics and Computer Science Division, Argonne National Laboratory
Professor, Department of Computer Science
Director, Computation Institute, The University of Chicago

Outline
• Brief History of Grid Computing
• Overview of some US-based grid projects
• The NSF-supported TeraGrid project
• Global Grid Forum
• Access Grid and Collaboration Technologies
• The Emerging BioGrid
• An Invitation
• Conclusions
A Brief History of Grid Computing: I-WAY and the Value of Stunt Science

[Figure: the "Internet" circa 1969, the Internet circa 1999, and the I-WAY, spanning 1995 to 2002]
Evolution of the Grid Concept

• Metacomputing: late 80s
  • Focus on distributed computation
• Gigabit testbeds: early 90s
  • Research, primarily on networking
• I-WAY: 1995
  • Demonstration of application feasibility
• PACIs (National Technology Grid): 1997
• NASA Information Power Grid: 1999
• ASCI DISCOM: 1999
• DOE Science Grid: 2001
• Commercial startups
  • Applied Metasystems, Entropia, Distributed Science, etc.
[Diagram: layered Grid architecture - Applications; Apps Tools; Middleware; Networking; Resources]
SIGGRAPH 89: Science by Satellite

"Using satellite technology…demo of what it might be like to have high-speed fiber-optic links between advanced computers in two different geographic locations."
-- Al Gore, Senator; Chair, US Senate Subcommittee on Science, Technology and Space

"What we really have to do is eliminate distance between individuals who want to interact with other people and with other computers."
-- Larry Smarr, Director, National Center for Supercomputing Applications, UIUC
SIGGRAPH 92 in Chicago Showcase: Science in the 21st Century

"From the atom to the Universe…it's all here. Three dozen projects can now come through the network and appear to be in McCormick Place…Simulations on remote supercomputers or created from data gathered from far-away instruments, these visualizations demonstrate the power of distributed computing, doing computing where the resources are and not necessarily on a single machine."
-- Larry Smarr, Director, National Center for Supercomputing Applications, UIUC

"We have to develop the technology and techniques, and the sociology, to go along with group activities."
-- Sid Karin, Director, San Diego Supercomputer Center, UCSD

[Images: UCSD NCMIR in San Diego; UCSD NCMIR in Chicago]
UCSD National Center for Microscopy and Imaging Research (NCMIR), http://www-ncmir.ucsd.edu
SIGGRAPH 92 in Chicago Showcase: Science in the 21st Century

"VR is a mode of scientific visualization. It's something that lets you get inside of the data. Now, with most computer screens you're outside looking in. In this, you're inside looking out."
-- Tom DeFanti, Director, Electronic Visualization Laboratory, UIC

"In a few years, the network is the computer…It doesn't matter where your supercomputer is, your data resources, your sensors, scanners or satellite data. It can come from anywhere, be stored anywhere, but you can access it, at your fingertips, on your desktop."
-- Maxine Brown, Associate Director, Electronic Visualization Laboratory, UIC

"It's the real start of humans being able to immerse themselves inside the brains of computers, seeing what the computers are seeing."
-- Larry Smarr, Director, National Center for Supercomputing Applications, UIUC

"See things you've never seen before."
-- Tom DeFanti, Director, Electronic Visualization Laboratory, UIC

"Virtual prototyping of new products, from small to large."
-- Rick Stevens, Director, Math and Computer Science Division, Argonne National Lab

"Next year--SuperVROOM…Get rid of the Machine Farm and put gigabit networks in place to talk to computers at remote sites--a whole new level of interaction and communication."
-- Maxine Brown, Associate Director, Electronic Visualization Laboratory, UIC
The Internet evolved from ARPAnet, a research network built in 1969 that was primarily a communications tool for the research community until the invention of the World Wide Web, and later Mosaic, opened it up to the wider community.
I-WAY creators Larry Smarr, Rick Stevens and Tom DeFanti believe the next great wave of evolution on the Internet will be unleashed by the I-WAY.
Supercomputing 95
I-WAY: Information Wide Area Year

"We definitely pushed the envelope. There's a whole community of people now who have a different way of thinking about how to do science and how to do visualization, and have been a part of an experience that will guide or influence how they think about science over the next few years."
-- Rick Stevens, Director, Math and Computer Science Division, Argonne National Lab

"I-PoP machines uniformly configure gateways to supercomputers. I-Soft software creates a necessary standard operating environment."
-- Ian Foster, Associate Director, Math and Computer Science Division, Argonne National Laboratory

"One of the reasons we've been working on virtual-reality technology is because it's an excellent test for this sort of technology. We need the supercomputers to give us the realism and the simulations, and we need the high-speed networks to give us the feel of telepresence, of being somewhere else."
-- Tom DeFanti, Director, Electronic Visualization Lab, UIC

"VR is an intelligent user interface into the whole electronic superhighway. How are people going to talk to computers in the future?"
-- Maxine Brown, Associate Director, Electronic Visualization Lab, UIC
The Emerging Concept of a Computational Grid

Prototyping America's 21st Century Information Infrastructure
The NSF PACI National Technology Grid Prototype, 1997
iGrid 1998 at SC'98
November 7-13, 1998, Orlando, Florida, USA

• 22 demonstrations featured technical innovations and application advancements requiring high-speed networks, with emphasis on remote instrumentation control, tele-immersion, real-time client-server systems, multimedia, tele-teaching, digital video, distributed computing, and high-throughput, high-priority data transfers

www.startap.net/igrid98
iGrid 2000 at INET 2000
July 18-21, 2000, Yokohama, Japan

• 14 regions: Canada, CERN, Germany, Greece, Japan, Korea, Mexico, Netherlands, Singapore, Spain, Sweden, Taiwan, United Kingdom, USA
• 24 demonstrations featuring technical innovations in tele-immersion, large datasets, distributed computing, remote instrumentation, collaboration, streaming media, human/computer interfaces, digital video and high-definition television, and grid architecture development, as well as application advancements in science, engineering, cultural heritage, distance education, media communications, and art and architecture

www.startap.net/igrid2000
iGrid 2002
September 24-26, 2002, Amsterdam, The Netherlands

Proposed iGrid 2002 Demonstrations
• To date, 14 countries/locations proposing 28 demonstrations: Canada, CERN, France, Germany, Greece, Italy, Japan, The Netherlands, Singapore, Spain, Sweden, Taiwan, United Kingdom, United States
• Applications to be demonstrated: art, bioinformatics, chemistry, cosmology, cultural heritage, education, high-definition media streaming, manufacturing, medicine, neuroscience, physics, tele-science
• Grid technologies to be demonstrated: major emphasis on grid middleware, data management grids, data replication grids, visualization grids, data/visualization grids, computational grids, access grids, grid portals
• Other technologies to be demonstrated: optical networks as a data storage medium, logistical networking

www.startap.net/igrid2002
Overview of Some US-Based Grid Projects
GriPhyN, NVO/BIMA, iVDGL, NEESgrid, etc.

• Sloan DSS
• LIGO
• ATLAS and CMS
• ALMA
BIMA Image Pipeline

• Data is transferred from the telescope to NCSA in real time
• Data is ingested into the BIMA Data Archive automatically
• Astronomers use a Web front-end to search, browse, and retrieve data
• Raw data is automatically processed by the pipeline using AIPS++
• Grid technologies are used to distribute the processing

[Diagram: Radio Astronomy Imaging - telescope data flows through a Web Interface into the BIMA Data Archive and the BIMA Image Pipeline (AIPS++), distributed across The Grid]
Pipeline Components: Radio Astronomy Imaging

• Archive System
  • Event Server: signals the arrival of new data
  • Ingest Engine
• Script Generator: determines what processing can take place; matches data to processing recipes
• Queue Manager: submits and monitors jobs on multiple platforms
• Data Manager
• Target platforms: serial (4-processor Linux box), parallel shared-memory (Origin 2000), parallel cluster (NCSA Linux clusters)
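The event-driven flow above can be sketched in a few lines. This is an illustrative sketch only, not the actual NCSA implementation; every name in it (the recipe table, `QueueManager`, `on_new_data`) is a hypothetical stand-in.

```python
# Hypothetical sketch of the BIMA-style pipeline flow: an event server
# signals new data, a script generator matches it to a processing recipe,
# and a queue manager submits jobs to a target platform.

RECIPES = {
    "continuum": {"platform": "linux-cluster", "tool": "aips++"},
    "spectral-line": {"platform": "origin2000", "tool": "aips++"},
}

class QueueManager:
    def __init__(self):
        self.jobs = []

    def submit(self, dataset, recipe):
        """Record a job as queued; a real manager would also monitor it."""
        job = {"dataset": dataset, **recipe, "state": "queued"}
        self.jobs.append(job)
        return job

def on_new_data(dataset, kind, queue):
    """Event-server callback: match the dataset to a recipe and submit it."""
    recipe = RECIPES.get(kind)
    if recipe is None:
        raise ValueError(f"no processing recipe for data type {kind!r}")
    return queue.submit(dataset, recipe)

qm = QueueManager()
job = on_new_data("bima-2002-06-01.uv", "continuum", qm)
print(job["platform"], job["state"])   # linux-cluster queued
```

The point of the structure is that the event server never needs to know which platform runs the job; the recipe match decides that, which is what lets the same pipeline fan out across serial, shared-memory, and cluster targets.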
NEES Grid

• Network For Earthquake Engineering Simulation (NEES)
• Integrate Seamless Testing And Simulation
• Enable New Earthquake Hazard Mitigation
  • Structural, Geotechnical And Tsunami
• Collaboratory Integration
  • Physical Testing Sites
  • Simulation Codes And Data Stores
  • Research Engineers And Practitioners
• National IT Infrastructure
  • Computational Facilities And Networks Via "Middleware"
  • Telepresence (Remote Observation/Operation)
  • Data Sharing
  • Numerical Simulation And Modeling
• NCSA And Civil Engineering Leadership
• ANL Collaboratory And Grid Leadership
LIGO: Laser Interferometer Gravitational Wave Observatory

• Two Widely Separated U.S. Detectors
  • Hanford, WA And Livingston, LA
• Challenges
  • Small Signal-to-Noise Ratio
    • Length Changes Less Than a 1000th of a Nucleus Diameter
  • Extremely Rare Events
    • Less Than One Per Year
• Leaders
  • Caltech And MIT
Particle Physics Grid

• New CERN particle detectors
  • Over 1 petabyte (PB) of data per year
  • Rare events from decay of massive new particles
• CMS and ATLAS experiments at CERN
  • CMS (Compact Muon Solenoid)
  • ATLAS (A Toroidal LHC ApparatuS)

[Image: CERN simulated Higgs decay]
Grid Middleware [toolkits for building Grids]

• PKI-Based Security Infrastructure
• Distributed Directory Services
• Reservations Services
• Meta-Scheduling and Co-Scheduling
• Quality of Service Interfaces
• Grid Policy and Brokering Services
• Common I/O and Data Transport Services
• Meta-Accounting and Allocation Services
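To make two of the listed services concrete, here is a hypothetical sketch of what a reservation and co-scheduling interface might look like. None of these names come from Globus or any real toolkit; the all-or-nothing rollback is the essential co-scheduling idea.

```python
# Hypothetical reservation/co-scheduling interface, illustrating the
# "Reservations Services" and "Co-Scheduling" items above. Not a real API.
from dataclasses import dataclass, field

@dataclass
class Reservation:
    resource: str
    start: int      # window start, epoch seconds
    duration: int   # window length, seconds

@dataclass
class MetaScheduler:
    reservations: list = field(default_factory=list)

    def reserve(self, resource, start, duration):
        """Reserve one resource for a time window, rejecting overlaps."""
        for r in self.reservations:
            if (r.resource == resource
                    and start < r.start + r.duration
                    and r.start < start + duration):
                raise RuntimeError(f"{resource} already reserved in that window")
        res = Reservation(resource, start, duration)
        self.reservations.append(res)
        return res

    def co_schedule(self, resources, start, duration):
        """Co-scheduling: reserve all resources for the same window, or none."""
        made = []
        try:
            for name in resources:
                made.append(self.reserve(name, start, duration))
        except RuntimeError:
            for r in made:                  # roll back partial reservations
                self.reservations.remove(r)
            raise
        return made

sched = MetaScheduler()
sched.co_schedule(["ncsa-cluster", "sdsc-cluster"], start=0, duration=3600)
print(len(sched.reservations))   # 2
```

A later request overlapping either window fails without disturbing the two reservations already made, which is what distinguishes co-scheduling from submitting two independent jobs.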
The NSF-Supported TeraGrid Project
Prototyping cyberinfrastructure for the future

• Computational Grid integrating computing environments at SDSC, NCSA, Caltech, Argonne and PSC (via ETF)
• Common operating environment based on Globus
  • Open Grid Services Architecture
  • Security, data services, scheduling, directories, etc.
• Applications-oriented services
  • Data access, visualization, on-demand access, compute
• GigE to everything
• Fibre Channel to everything
• Myrinet to everything
NVO and TeraGrid
Dataset: 9.5 TB of raw data, 5 million files

Pipeline: 1. Filter raw data; 2. Store sorted, processed data into HPSS via SRB containers, with MCAT updating indexes; 3. Retrieve, analyze data; 4. Output image

Without TeraGrid (16 MB/s, 1 sec/file):
• 60 days to ingest (sort, index, store) the data
• Restricted web-based access to a maximum of one 100 MB file (1 SRB container)

With TeraGrid (250+ MB/s, 0.1 sec/file):
• Significant reduction in time to ingest data: less than 5 days
• Web-based query interface for on-line access to all data
• Support for multiple surveys; distributed join of data across surveys
• Storage and network work together
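The ingest times quoted follow almost entirely from the per-file overhead rather than raw bandwidth (at 16 MB/s, moving 9.5 TB alone would take under 7 days). A quick check against the per-file rates:

```python
# Check the ingest-time figures from the per-file rates quoted above.
# 5 million files at 1 sec/file is ~58 days, matching the quoted 60 days;
# at 0.1 sec/file the per-file cost drops to under 6 days.
FILES = 5_000_000
SECONDS_PER_DAY = 86_400

days_without = FILES * 1.0 / SECONDS_PER_DAY    # 1 sec/file
days_with = FILES * 0.1 / SECONDS_PER_DAY       # 0.1 sec/file

print(round(days_without, 1))   # 57.9
print(round(days_with, 1))      # 5.8
```

The second number is close to, not below, the quoted "less than 5 days"; the extra margin presumably comes from the 15x bandwidth increase overlapping with per-file work, a detail the slide does not spell out.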
Global Grid Forum Working Groups
[Diagram: GGF working-group structure - User Services, Management (GUS); Grid Apps; Collaboration Tools (ACE); various other WGs; higher-level language tools, middleware & components]
Access Grid and Collaboration Technologies
Connecting People and Applications via the Grid

Access Grid: Integrating Group-to-Group Collaboration and Visualization

AG Project Goals
• Enable Group-to-Group Interaction and Collaboration
  • Connecting People and Teams via the Grid
• Improve the User Experience: Go Beyond Teleconferencing
  • Provide a Sense of Presence
  • Support Natural Interaction Modalities
• Use Quality but Affordable Digital IP-Based Audio/Video
  • Leverage IP Open Source Tools
• Enable Complex Multisite Visual and Collaborative Experiences
  • Integrate With High-End Visualization Environments
    • ActiveMural, Powerwall, CAVE Family, Workbenches
• Build on Integrated Grid Services Architecture
• Develop New Tools Specifically to Support Group Collaboration
Long Distance Collaboration
Distributed Exploratory Data Analysis

Access Grid: Distance Visualization
Sites: Argonne, Berkeley Lab, Los Alamos, Princeton, University of Illinois, University of Utah
Corridor One Architecture: Distributing the Visualization Pipeline
Data → Data Analysis → Visualization → Visualization Clients → Display Environment
GeoWall

• Low-cost passive stereo for Geosciences research and teaching
• Working with U of Michigan, U of Minnesota, US Geological Survey, UIC Earth and Environmental Science, and others
• Also in use at SciTech Museum in Aurora and Abraham Lincoln Elementary School in Oak Park
TeraVision

• Specialized hardware for streaming graphics over GigE
• Takes a VGA or DVI plug as input, digitizes it at 1024x768 @ 30 fps, and streams it to remote sites for viewing
• Can be ganged and synched to stream an entire tiled display
• Demo at iGrid streams from Greece to Amsterdam and from EVL to Amsterdam

www.evl.uic.edu/cavern/teranode
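The quoted capture rate fits comfortably inside a GigE link. A back-of-the-envelope check, assuming uncompressed 24-bit color (an assumption; the slide does not state the pixel depth):

```python
# Estimate raw bandwidth of an uncompressed 1024x768 @ 30 fps stream.
# 24-bit color is an assumed value; the slide gives only resolution and rate.
width, height, fps, bits_per_pixel = 1024, 768, 30, 24

mbps = width * height * fps * bits_per_pixel / 1e6
print(round(mbps))   # 566 (Mbit/s), well under GigE's nominal 1000 Mbit/s
```

That ~566 Mbit/s figure also explains why ganging several TeraVision units for a tiled display needs one GigE port per stream rather than a shared link.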
The Continuum at EVL and TRECC: Amplified Work Environment

• Passive stereo display
• AccessGrid
• Digital white board
• Tiled display
The Emerging BioGrid
Challenges of Biology are well suited for Grids

• EUROGRID BioGRID
• Asia Pacific BioGRID
• NC BioGrid
• Bioinformatics Research Network
• Osaka University Biogrid
• Indiana University BioArchive BioGrid
• Many more under development…

Baxevanis, A.D. 2002. Nucleic Acids Research 30: 1-12.
Software Infrastructure in Drug Discovery
Ontologies and Domain-Specific Integration
From Richard Gardner (InCellico)
Future Vision

• Theory and Computation for Systems Biology
  • A focus on what makes things biological
• Integrated Modeling and Prototyping Tools
  • A Matlab for biological modeling
  • Portals and interfaces to existing simulation resources
• Integrated and Federated Databases
  • Frameworks and schema (e.g. DiscoveryLink, AfCS)
  • Exchange infrastructure (e.g. SBML, CellML, etc.)
• International "BioGrids" to Support Analysis, Modeling and Simulation
  • Beyond genomics and molecular modeling
What Does the BioGrid Need to Provide?

• Scalable compute and data beyond that available locally
  • One to two orders of magnitude improvement
• Distributed infrastructure available 24x7 worldwide
  • Biology is a worldwide, 24-hour enterprise
• Integration with local bioinformatics systems for seamless computing and data management
  • Empower local biologists by dramatically extending their power
• Access to state-of-the-art facilities at a fraction of the cost (SPs just add more servers)
  • Purpose-built systems and special-purpose devices
• Centralized support of tools and data
  • Improve access to the latest data
• Bottom line: enables biologists to focus on biology
What We Need to Create

• Grid Bio applications enablement software layer
  • Provides applications access to Grid services
  • Provides OS-independent services
• Grid-enabled versions of bioinformatics data management tools (e.g. DL, SRS, etc.)
  • Need to support virtual databases via Grid services
  • Grid support for commercial databases
• Bioinformatics applications "plug-in" modules
  • End-user tools for a variety of domains
• Support for major existing Bio IT platforms

[Diagram: BioGrid Applications sit on a BioGrid "Vocabulary" layer over Grid Resources (compute, data)]
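The enablement layer and "plug-in" modules described above can be sketched as a small registry: tools register against a thin, OS-independent interface and never touch Grid plumbing directly. All names here (`plugin`, `blast_search`, `FakeGrid`) are hypothetical illustrations, not any real BioGrid software.

```python
# Hypothetical sketch of the bio applications enablement layer:
# bioinformatics tools register as plug-in modules against a minimal,
# OS-independent Grid service interface.

PLUGINS = {}

def plugin(name):
    """Decorator that registers a bioinformatics tool as a plug-in module."""
    def register(fn):
        PLUGINS[name] = fn
        return fn
    return register

@plugin("blast-search")
def blast_search(sequence, grid):
    # A real layer would stage data and invoke Grid services here;
    # this stub just records what would be dispatched.
    return grid.submit(tool="blast", payload=sequence)

class FakeGrid:
    """Stand-in for the Grid services the layer would normally call."""
    def submit(self, tool, payload):
        return {"tool": tool, "size": len(payload), "state": "submitted"}

job = PLUGINS["blast-search"]("ACGTACGT", FakeGrid())
print(job["tool"], job["state"])   # blast submitted
```

Because the plug-in sees only the `grid` object it is handed, the same module could run against a local cluster or a remote virtual database without change, which is the point of the OS-independent services bullet.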
BioGrid Services Model

• Domain Oriented Services
  • Drug Discovery
  • Microbial Engineering
  • Molecular Ecology
  • Oncology Research
• Basic BioGrid Services
  • Integrated Databases
  • Sequence Analysis
  • Protein Interactions
  • Cell Simulation
• Grid Resource Services
  • Compute Services
  • Pipeline Services
  • Data Archive Service
  • Database Hosting
Architecture Requirements for Biology

• Computational Biology is as diverse as Biology itself
  • Need for access to a variety of future systems
• Capacity Computing
  • Clusters for high-throughput support
  • Automation of experimental laboratories
• Capability Computing
  • Current: protein science and bioengineering
  • Future: cell modeling and virtual organisms
• Data Intensive Computing
  • Data mining (genomes, expression data, imaging, etc.)
  • Annotation pipelines
• Purpose-built devices for well-understood problems
  • Sequence analysis, imaging and perhaps protein folding
A Proposed International Systems Biology Grid

• A Data, Experiment and Simulation Grid linking:
  • People [biologists, computer scientists, mathematicians, etc.]
  • Experimental systems [arrays, detectors, MS, MRI, EM, etc.]
  • Databases [data centers, curators, analysis servers]
  • Simulation resources [supercomputers, visualization, desktops]
  • Discovery resources [optimized search servers]
  • Education and teaching resources [classrooms, labs, etc.]
• Potentially finer grain than current Grid projects
  • More laboratory integration [small laboratory interfaces]
  • Many participants will be experimentalists [workflow, visualization]
  • More diversity of data sources and databases [integration, federation]
  • More portals to simulation environments [ASP models]
• Global Grid Forum
  • Life Science Grid research group formed to investigate requirements
  • First meeting at GGF6 in Chicago last week
An Invitation
We wish to extend the TeraGrid to Japan!

• The executive committee of the TeraGrid project would like to propose to RIKEN to join the TeraGrid
• Interconnect high-performance computing resources in the US with those at leading Japanese research laboratories
• Deploy middleware to enable the rapid sharing of data resources and applications services
• We would also welcome discussions on the feasibility of connecting the Earth Systems Simulator to the Grid
  • Sharing of climate modeling expertise
  • Access to large-scale climate databases and repositories
  • Create new Japanese/US research collaborations in HPC
STAR TAP: Enabling the International Grid

• Connected networks: Australia, China, Japan, Korea, Singapore, Taiwan, Canada, Chile, Russia, CERN, Denmark, Finland, France, Iceland, Israel, Netherlands, Norway, Sweden
• US networks: vBNS, Abilene, ESnet, DREN, NREN/NISN

www.startap.net
STAR TAP, the Interconnect for International High-Performance Networks
Distributed Terascale Facility Backplane

• Sites: Vancouver; Seattle (U Wash, Microsoft Research); Washington, DC (multiple NASA, DOE, other sites); Berkeley (LBNL, UCB); Los Angeles (Caltech, NASA JPL, ISI); San Diego (UCSD, CalIT2, SDSC); Chicago; Indianapolis; St. Louis; Urbana (NCSA/UIUC); ANL
• Chicago cross-connect: UIC; NW Univ (Chicago) STARLIGHT Hub; Ill Inst of Tech; Univ of Chicago; Indianapolis (Abilene NOC); StL Gigapop
• CANARIE CA*Net4 (Bell Canada fiber)
Conclusions

• Grid technologies are poised for continued growth and development
• Grids will enable new classes of applications, particularly data-intensive applications
• Collaboration technology will give Grids a human face
• Global Grid Forum provides needed standards for international interoperability
• Biology may be the new driver for long-term Grid developments
Acknowledgements

• DOE, NSF, ANL, UC, Microsoft and IBM for support
• John Wooley (UCSD), Mike Colvin (LLNL/DOE), Richard Gardner (InCellico), Chris Johnson (Utah), Dan Reed (NCSA), Dick Crutcher (NCSA), Fran Berman (SDSC), Ralph Roskies (PSC), Horst Simon (NERSC), Ian Foster (ANL/UC), Larry Smarr (UCSD), Tom DeFanti (EVL/UIC) and others contributed to this talk