Grid Challenges
It’s the vision, stupid
…but it NEEDS TO be followed by operational standards based on real applications…
The Global Grid Forum
25 June 2003
Gordon Bell
Microsoft Corporation
A quick look at some past visions and a challenge
NREN >> Internet
WWW
Challenge: Will match any Grid-enabled application that wins a Gordon Bell Prize for parallelism
FCCSET NREN Plan 11/1987
[Figure: planned network bandwidth (b/s, log scale from 10K to 10G) vs. year, 1988 to 2000: 56K, then 1.5M (Phase 1), 45M (Phase 2), 3G (Optical).]
A factor of 1000 makes a difference
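One reading of "a factor of 1000": a minimal sketch that computes the gain between the bandwidth points on the FCCSET plan (56K, 1.5M, 45M, 3G), assuming the plotted values are bits per second. The labels come from the figure; the function names are illustrative.

```python
# Bandwidth points read off the FCCSET NREN plan figure, assumed to be bits per second.
NREN_POINTS_BPS = {
    "56K": 56e3,
    "Phase 1": 1.5e6,
    "Phase 2": 45e6,
    "Optical": 3e9,
}


def gain(start: str, end: str) -> float:
    """Bandwidth gain factor between two labelled points on the plan."""
    return NREN_POINTS_BPS[end] / NREN_POINTS_BPS[start]


def consecutive_gains() -> dict:
    """Gain factor for each single step of the plan, in plotted order."""
    labels = list(NREN_POINTS_BPS)
    return {f"{a} -> {b}": gain(a, b) for a, b in zip(labels, labels[1:])}


if __name__ == "__main__":
    for step, factor in consecutive_gains().items():
        print(f"{step}: x{factor:,.0f}")
    print(f"56K -> Phase 2: x{gain('56K', 'Phase 2'):,.0f}")
    print(f"Phase 1 -> Optical: x{gain('Phase 1', 'Optical'):,.0f}")
```

Each single step is a factor of roughly 27 to 67; spanning two steps (56K to Phase 2, or Phase 1 to Optical) gives roughly 800x to 2,000x, the order of magnitude the slide calls out.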
ARPAnet Goals c1972 = Grid Goals
[Figure: originating bandwidth (Gb/s, log scale from 1 to 10,000) of U.S. interstate communications traffic, 1990 to 2020, for broadcast TV, email, voice, fax, video on demand, video conferencing, and NSF bb. Source: L. Roberts, 1992.]
Growth in hype vs reality
[Figure: growth curves, c1995 data from Gordon’s WAG, for Infoway speculation, “how great it’ll be” (politicians, telecoms & futurists); Infoway regulation; conferences; WWW; Infoway addiction; lawsuits; books and newspapers.]
Articles per newspaper versus orders per second sent via Internet
[Figure: articles per newspaper and orders per second, c1995 data from Gordon’s WAG.]
Articles about security, privacy, & fraud versus commerce ($M)
Progress...a review
Grid started out with great promise…c1998
Interesting use at NASA for coupled programs
NMI (National Middleware Infrastructure)…State_Tools.gov, funded by NSF.gov
clearly open, clearly not “free”
not the IETF model
Tools vs. standards & evolving working code
Some examples: c1999: Seti@home, folding@home, >> Napster p2p
New Feature Requests
Programmable access to metadata
User-selectable image sizes, i.e. “a map server”
Permission to use TerraServer data within server
NSF ITR project, “Building the Framework for the National Virtual Observatory” is a collaboration of 17 funded and 3 unfunded organizations
Astronomy data centers
National observatories
Supercomputer centers
University departments
Computer science/information technology specialists
PI and project director: Alex Szalay (JHU)
CoPI: Roy Williams (Caltech/CACR)
Scientific Data Exploration
1. A thousand years ago: science was empirical, describing natural phenomena
2. Last few hundred years: a theoretical branch using models and generalizations
3. Last few decades: a computational branch simulating complex phenomena
4. Today: data exploration is emerging, synthesizing theory, experiment, and computation with advanced data management and statistics
Living in an Exponential World
Astronomers have a few hundred TB now
1 pixel (byte) / sq arc second ~ 4 TB
Multi-spectral, temporal, … → 1 PB
They mine it looking for new (kinds of) objects, more instances of interesting ones (quasars), density variations in 400-D space, and correlations in 400-D space
Data doubles every year
Data is public after 1 year
So, 50% of the data is public
Same trend appears in all sciences
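The 50% figure follows from the doubling. A minimal sketch under two stated assumptions: the amount of data taken each year doubles, and data becomes public exactly one year after it is taken. Under that toy model the current year's data always equals all earlier years' data plus one unit, so the public share approaches one half from below.

```python
def public_fraction(years: int) -> float:
    """Fraction of all data gathered through year `years` that is already public.

    Toy model (assumptions, not measurements): 1 unit of data is taken in
    year 0, the yearly amount doubles, and data goes public after one year.
    """
    acquired = [2 ** year for year in range(years + 1)]  # new data taken each year
    total = sum(acquired)
    public = sum(acquired[:-1])  # everything except the current year's data
    return public / total


if __name__ == "__main__":
    for years in (1, 2, 5, 10, 20):
        print(years, round(public_fraction(years), 4))
```

The share starts at 1/3 after one year and climbs toward 0.5 without reaching it.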
Why Is Astronomy Special?
It has no commercial value
  No privacy concerns; results can be freely shared with others
  Great for experimenting with algorithms
It is real and well documented
  High-dimensional (with confidence intervals)
  Spatial, temporal
Diverse and distributed
  Many different instruments from many different places and many different times
The questions are interesting
There is a lot of it (soon petabytes)
GB: It is not over-funded, aka it’s poor
[Figure panels, one per survey and band: IRAS 100 μm, ROSAT ~keV (X-ray), DSS optical, 2MASS 2 μm, IRAS 25 μm, NVSS 20 cm, WENSS 92 cm, GB 6 cm.]
Making Discoveries
When and where are discoveries made?
Always at the edges and boundaries
Going deeper, collecting more data, using more colors…
Metcalfe’s law
Utility of computer networks grows as the number of possible connections: O(N²)
VO: Federation of N archives
Possibilities for new discoveries grow as O(N²)
Current sky surveys have proven this
Very early discoveries from SDSS, 2MASS, DPOSS
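To make the O(N²) claim concrete, a small sketch that counts the distinct pairs of archives a federation could cross-match, n(n-1)/2. The archive list reuses the eight survey bands named in the figure panels above; treating each band as a separate archive is an assumption for illustration.

```python
from itertools import combinations
from typing import List, Tuple

# Hypothetical federation: one archive per survey band named in the figure panels.
ARCHIVES = [
    "IRAS 100um", "ROSAT keV", "DSS optical", "2MASS 2um",
    "IRAS 25um", "NVSS 20cm", "WENSS 92cm", "GB 6cm",
]


def pairwise_links(n: int) -> int:
    """Distinct pairs among n archives: n(n-1)/2, which grows as O(N^2)."""
    return n * (n - 1) // 2


def cross_match_jobs(archives: List[str]) -> List[Tuple[str, str]]:
    """Every pair of archives that a federation could cross-match."""
    return list(combinations(archives, 2))


if __name__ == "__main__":
    print(len(cross_match_jobs(ARCHIVES)), "archive pairs")
    for n in (2, 4, 8, 16, 32):
        print(f"{n:>3} archives -> {pairwise_links(n):>4} pairs")
```

Doubling the number of archives roughly quadruples the number of pairs, which is the slide's O(N²) argument in miniature.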
What can be learned from Sky Server?
It’s about data, not about harvesting flops
1-2 hr. query programs versus 1 wk. programs based on grep
10 minute runs versus 3 day compute & searches
Database viewpoint: 100x speed-ups
Avoid costly re-computation and searches
Use indices and PARALLEL I/O. Read / Write >> 1.
Parallelism is automatic, transparent, and just depends on the number of computers/disks.
Limited experience and talent to use dbases.
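The "use indices, not grep" point can be seen in miniature with Python's built-in sqlite3 module standing in for the SkyServer database. A minimal sketch on a toy catalog with a hypothetical schema (objid, ra, dec, r_mag): the same query is planned as a full table scan before an index exists and as an index range search afterward, and both access paths return the same count.

```python
import random
import sqlite3


def build_catalog(n_objects: int = 20_000, seed: int = 42) -> sqlite3.Connection:
    """Toy photometric catalog in memory; the schema is hypothetical, not SkyServer's."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE objects (objid INTEGER PRIMARY KEY, ra REAL, dec REAL, r_mag REAL)")
    conn.executemany(
        "INSERT INTO objects VALUES (?, ?, ?, ?)",
        ((i, rng.uniform(0, 360), rng.uniform(-90, 90), rng.uniform(14, 24))
         for i in range(n_objects)),
    )
    return conn


def plan(conn: sqlite3.Connection, sql: str, params=()) -> str:
    """Join the 'detail' column (the last one) of SQLite's EXPLAIN QUERY PLAN rows."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params))


BRIGHT = "SELECT COUNT(*) FROM objects WHERE r_mag < ?"
BRIGHT_SCAN = "SELECT COUNT(*) FROM objects NOT INDEXED WHERE r_mag < ?"  # force a scan


def demo(threshold: float = 15.0):
    conn = build_catalog()
    before = plan(conn, BRIGHT, (threshold,))  # no index yet: full table scan, grep-style
    conn.execute("CREATE INDEX idx_r_mag ON objects (r_mag)")
    after = plan(conn, BRIGHT, (threshold,))   # now a range search on the index
    n_indexed = conn.execute(BRIGHT, (threshold,)).fetchone()[0]
    n_scanned = conn.execute(BRIGHT_SCAN, (threshold,)).fetchone()[0]
    conn.close()
    return before, after, n_indexed, n_scanned


if __name__ == "__main__":
    for item in demo():
        print(item)
```

Only the access path changes, not the answer. For scale, the slide's own figures (1-2 hr. versus 1 wk.; 10 min. versus 3 days) work out to speed-ups of roughly 84x to 432x, consistent with its "100x" headline.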
Soon: The Virtual Observatory
Many new surveys are coming
SDSS is a dry run for the next ones
LSST will be 5 TB/night
All the data will be on the Internet
ftp, web services…
Data and applications will be associated with the instruments
Distributed world wide, cross-indexed
Federation is a must
Will be the best telescope in the world
World Wide Telescope
Finds the “needle in the haystack”
Successful demonstrations in Jan’03
Emerging Concepts
Standardizing distributed data access
Web Services, supported on all platforms
XML: Extensible Markup Language
SOAP: Simple Object Access Protocol
WSDL: Web Services Description Language
Standardizing distributed computing
Grid Services
Custom configure remote computing dynamically
Build your own remote computer, and discard
Virtual Data: new data sets on demand
Both needed for Data Exploration
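The data-access half of this slide rests on XML, SOAP, and WSDL. A minimal sketch of what a SOAP 1.1 request looks like on the wire, built and parsed with Python's standard xml.etree.ElementTree. The operation name GetObjectsNear, its parameters, and the urn:example:skyquery namespace are hypothetical; in practice a WSDL document would define them, and the message would usually travel over HTTP.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:skyquery"  # hypothetical namespace for an illustrative service

ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("svc", SERVICE_NS)


def build_request(operation: str, **params: str) -> bytes:
    """Client side: serialize one RPC-style call as a SOAP 1.1 envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = value
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


def _local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.split("}", 1)[1]


def parse_request(payload: bytes):
    """Server side: recover the operation name and its arguments from an envelope."""
    root = ET.fromstring(payload)
    call = root.find(f"{{{SOAP_ENV}}}Body")[0]
    return _local_name(call.tag), {_local_name(c.tag): c.text for c in call}


if __name__ == "__main__":
    msg = build_request("GetObjectsNear", ra="180.0", dec="0.0", radius_arcmin="2")
    print(msg.decode("utf-8"))
    print(parse_request(msg))
```

All parameter values are sent as strings here; a WSDL-driven toolkit would map them to typed XML Schema values.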
Computational Science Simulations based on Web Services
Gerd Heber
Cornell Theory Center
International Conference on Computational Science 2003
Three Flavors of Adaptivity
Application-level
Mathematical model
High/low confidence
The Problem
Do distributed, coupled and adaptive multi-physics simulations of
  Mechanics of chemically-reacting flows
  (Damage) Thermo-Mechanics of solids
Components provided as Web Services
Geography
Cornell University
  Theory Center
  Department of Computer Science
  Department of Civil Engineering
University of Alabama
Mississippi State University
College of William and Mary
Workflow
Web Services
“Web Services are self-contained, modular applications that can be described, published, located, and invoked over a network, …” (IBM)
Service oriented architecture: Publish, find, bind
XML, SOAP, UDDI, WSDL
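Publish, find, bind can be sketched without any network at all. In the toy below a small in-process registry stands in for UDDI and a plain Python callable stands in for a WSDL-described endpoint; every class and function name is hypothetical, and the two solvers merely echo the coupled flow and solid components named in the problem statement above.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ServiceRecord:
    name: str
    category: str
    endpoint: Callable[..., str]  # stands in for a network endpoint described by WSDL


@dataclass
class Registry:
    """Toy in-process registry playing the role UDDI plays for Web Services."""
    records: List[ServiceRecord] = field(default_factory=list)

    def publish(self, record: ServiceRecord) -> None:
        """Provider side: advertise a service under a category."""
        self.records.append(record)

    def find(self, category: str) -> List[ServiceRecord]:
        """Requester side: look up services by category."""
        return [r for r in self.records if r.category == category]


def bind(record: ServiceRecord) -> Callable[..., str]:
    """Return something callable; a real client would generate a SOAP proxy from WSDL."""
    return record.endpoint


# Hypothetical simulation components.
def flow_solver(steps: int) -> str:
    return f"reacting-flow solver advanced {steps} steps"


def solid_solver(steps: int) -> str:
    return f"thermo-mechanics solver advanced {steps} steps"


if __name__ == "__main__":
    registry = Registry()
    registry.publish(ServiceRecord("flow", "fluids", flow_solver))
    registry.publish(ServiceRecord("solid", "solids", solid_solver))
    for record in registry.find("fluids"):
        print(bind(record)(10))
```

Keeping the three roles separate is the point: the requester never needs to know where a component lives until it binds.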
Features and Requirements
Distributed expertise
No porting
Network accessibility (“firewall compliant”)
Platform and language neutrality
Security
Industry standards
Metadata
State
Students shouldn’t waste too much time with coding!
GrADS Vision
• Build a National Problem-Solving System on the Grid
  – Transparent to the user, who sees a problem-solving system
• Software Support for Application Development on Grids
  – Goal: Design and build programming systems for the Grid that broaden the community of users who can develop and run applications in this complex environment
• Challenges:
  – Presenting a high-level application development interface*
      – If programming is hard, the Grid will not reach its potential
  – Designing and constructing applications for adaptability
  – Late mapping of applications to Grid resources
  – Monitoring and control of performance
      – When should the application be interrupted and remapped?
*GB note: This is a superset of the previously unsolved cluster programming problem!
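The question "when should the application be interrupted and remapped?" can be framed as a cost comparison. A minimal sketch of one hypothetical policy, not the GrADS mechanism: remap only if the monitor shows the job running well below its predicted rate and the estimated time to finish elsewhere, including migration cost, beats staying put. The 0.8 tolerance and all field names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class RunStatus:
    work_remaining: float    # units of work left, e.g. solver iterations
    observed_rate: float     # work/s measured by the performance monitor
    predicted_rate: float    # work/s the scheduler expected when it mapped the job
    alternative_rate: float  # estimated work/s on the best alternative resources
    migration_cost_s: float  # seconds to checkpoint, move, and restart


def should_remap(s: RunStatus, tolerance: float = 0.8) -> bool:
    """Hypothetical rule: remap only when performance has fallen below
    `tolerance` x the prediction AND moving is expected to finish sooner."""
    if s.observed_rate >= tolerance * s.predicted_rate:
        return False  # performing as expected: leave the job alone
    if s.observed_rate <= 0:
        return True   # stalled: any working alternative is better
    time_if_stay = s.work_remaining / s.observed_rate
    time_if_move = s.migration_cost_s + s.work_remaining / s.alternative_rate
    return time_if_move < time_if_stay


if __name__ == "__main__":
    print(should_remap(RunStatus(1000, 2.0, 10.0, 10.0, 60.0)))  # slow, lots left
    print(should_remap(RunStatus(100, 5.0, 10.0, 10.0, 60.0)))   # slow, nearly done
```

Checking the contract first keeps the policy from churning on small fluctuations; the cost comparison then stops it from migrating a job that is almost finished.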
GrADSoft Architecture
[Figure: components shown are the source application, software components, whole-program compiler, libraries, binder, configurable object program, resource negotiator, scheduler, Grid runtime system, and real-time performance monitor, with the labels “Performance Problem”, “Performance Feedback”, and “Negotiation”.]
From Carl Kesselman, ISI
Network for Earthquake Eng. Simulation
NEESgrid: US national infrastructure to couple earthquake engineers with experimental facilities, databases, computers, & each other
On-demand access to experiments, data streams, computing, archives, collaboration
Goal: Enable technical communities to create and take responsibility for their own computing environments of personal, data, and program collaboration and distribution
Design based on technology and cost, e.g. networking, apps programs maintenance, databases, and providing 24x7 web and other services
Many alternative styles and locations are possible
Service from existing centers, including many state centers
Software vendors could be encouraged to supply apps web services
NCAR style center based on shared data and apps
Instrument- and model-based databases. Both central & distributed when multiple viewpoints create the whole.
Wholly distributed services supplied by many individual groups
Community/Data Centric: “web service”
Community is responsible
Planned & budgeted as resources
Responsible for its infrastructure
Apps are from community
Computing is integral to work
In sync with technologies
1-3 Tflops/$M; 1-3 PBytes/$M to buy smallish Tflops & PBytes.
New scalables are “centers”
Community can afford and evolve
Dedicated to a community
Program, data & database centric
May be aligned with instruments or other community activities
Output = web service
Can communities form that can supply services?
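The slide's c2003 ratios turn directly into a sizing rule for a community center. A minimal sketch, assuming the budget is divided between compute and storage by a hypothetical compute_share parameter (the slide does not say how the money is split):

```python
from typing import Tuple

Range = Tuple[float, float]

# c2003 price/performance ratios quoted on the slide: 1-3 Tflops/$M, 1-3 PBytes/$M.
TFLOPS_PER_MUSD: Range = (1.0, 3.0)
PBYTES_PER_MUSD: Range = (1.0, 3.0)


def capacity_for_budget(budget_musd: float, compute_share: float = 0.5) -> Tuple[Range, Range]:
    """Return ((low, high) Tflops, (low, high) PBytes) for a budget in $M.

    compute_share is a hypothetical split: the fraction of the budget spent on
    compute; the remainder goes to storage.
    """
    if not 0.0 <= compute_share <= 1.0:
        raise ValueError("compute_share must be between 0 and 1")
    compute_musd = budget_musd * compute_share
    storage_musd = budget_musd * (1.0 - compute_share)
    tflops = (compute_musd * TFLOPS_PER_MUSD[0], compute_musd * TFLOPS_PER_MUSD[1])
    pbytes = (storage_musd * PBYTES_PER_MUSD[0], storage_musd * PBYTES_PER_MUSD[1])
    return tflops, pbytes


if __name__ == "__main__":
    (t_lo, t_hi), (p_lo, p_hi) = capacity_for_budget(2.0)  # hypothetical $2M center
    print(f"{t_lo:g}-{t_hi:g} Tflops plus {p_lo:g}-{p_hi:g} PBytes")
```

For example, a hypothetical $2M center splitting its budget evenly would buy 1 to 3 Tflops plus 1 to 3 PBytes, squarely in the "smallish" range the slide describes.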
Commitment to standards
A general architecture comes largely from understanding the problems
Understanding the problems comes from actually solving such problems
This is bottom-up, based on experience
Microsoft is committed to developing community-wide web services standards…
Is the Grid Forum equally committed?
The End
How can GRIDs become a real, useful, computer structure?
Get a life.
Use the standards and tools.
Adopt an application and/or community…now!