Grid Computing: Concepts, Applications, and Technologies Dheeraj Bhardwaj Department of Computer Science and Engineering Indian Institute of Technology, Delhi
Dec 23, 2015
[email protected] IIT DELHI
Outline
The technology landscape
Grid computing
The Globus Toolkit
Applications and technologies
– Data-intensive; distributed computing; collaborative; remote access to facilities
Grid infrastructure
Open Grid Services Architecture
Global Grid Forum
Summary and conclusions
Living in an Exponential World: (1) Computing & Sensors
Moore’s Law: transistor count doubles every 18 months
Figure: magnetohydrodynamics, star formation
Living in an Exponential World: (2) Storage
Storage density doubles every 12 months
Dramatic growth in online data (1 petabyte = 1000 terabytes = 1,000,000 gigabytes)
– 2000 ~0.5 petabyte
– 2005 ~10 petabytes
– 2010 ~100 petabytes
– 2015 ~1000 petabytes?
Transforming entire disciplines in physical and, increasingly, biological sciences; humanities next?
Data Intensive Physical Sciences
High energy & nuclear physics
– Including new experiments at CERN
Gravity wave searches
– LIGO, GEO, VIRGO
Time-dependent 3-D systems (simulation, data)
– Earth Observation, climate modeling
– Geophysics, earthquake modeling
– Fluids, aerodynamic design
– Pollutant dispersal scenarios
Astronomy: Digital sky surveys
Ongoing Astronomical Mega-Surveys
Large number of new surveys
– Multi-TB in size, 100M objects or larger
– In databases
– Individual archives planned and under way
Multi-wavelength view of the sky
– More than 13 wavelengths covered within 5 years
Impressive early discoveries
– Finding exotic objects by unusual colors
> L,T dwarfs, high redshift quasars
– Finding objects by time variability
> Gravitational micro-lensing
Surveys: MACHO, 2MASS, SDSS, DPOSS, GSC-II, COBE, MAP, NVSS, FIRST, GALEX, ROSAT, OGLE, ...
Coming Floods of Astronomy Data
The planned Large Synoptic Survey Telescope will produce over 10 petabytes per year by 2008!
– All-sky survey every few days, so will have fine-grain time series for the first time
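As a rough illustration of what 10 petabytes per year means as a sustained data rate, the sketch below converts a yearly volume into megabytes per second. It assumes decimal units (1 petabyte = 10^15 bytes, consistent with 1 petabyte = 1,000,000 gigabytes) and a 365-day year of continuous operation; the function name is illustrative only.

```python
PETABYTE = 10**15                   # bytes, decimal units (assumption)
MEGABYTE = 10**6                    # bytes, decimal units (assumption)
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumes continuous operation all year


def sustained_rate_mb_per_s(petabytes_per_year: float) -> float:
    """Average rate in MB/s needed to move the given yearly data volume."""
    return petabytes_per_year * PETABYTE / SECONDS_PER_YEAR / MEGABYTE


# 10 PB/year works out to a little over 300 MB/s, around the clock.
rate = sustained_rate_mb_per_s(10)
print(f"{rate:.0f} MB/s")
```

Any real instrument has downtime, so the peak rate it must support would be higher than this average.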
Data Intensive Biology and Medicine
Medical data
– X-Ray, mammography data, etc. (many petabytes)
– Digitizing patient records (ditto)
X-ray crystallography
Molecular genomics and related disciplines
– Human Genome, other genome databases
– Proteomics (protein structure, activities, …)
– Protein interactions, drug delivery
Virtual Population Laboratory (proposed)
– Simulate likely spread of disease outbreaks
Brain scans (3-D, time dependent)
A Brain is a Lot of Data!
(Mark Ellisman, UCSD)
And comparisons must be made among many
We need to get to one micron to know the location of every cell. We’re just now starting to get to 10 microns. Grids will help get us there and further
An Exponential World: (3) Networks
(Or, Coefficients Matter …)
Network vs. computer performance
– Computer speed doubles every 18 months
– Network speed doubles every 9 months
– Difference = order of magnitude per 5 years
1986 to 2000
– Computers: x 500
– Networks: x 340,000
2001 to 2010
– Computers: x 60
– Networks: x 4000
Moore’s Law vs. storage improvements vs. optical improvements. Graph from Scientific American (Jan-2001) by Cleo Vilett, source Vinod Khosla, Kleiner, Caufield and Perkins.
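The "order of magnitude per 5 years" figure follows directly from the two doubling periods. The sketch below works it out under the idealized assumption of pure exponential growth; the function and variable names are illustrative only.

```python
def growth_factor(months: float, doubling_period_months: float) -> float:
    """Growth over `months` for a quantity that doubles every doubling period."""
    return 2.0 ** (months / doubling_period_months)


FIVE_YEARS = 60  # months

computers = growth_factor(FIVE_YEARS, 18)  # roughly 10x
networks = growth_factor(FIVE_YEARS, 9)    # roughly 100x
relative_gain = networks / computers       # roughly 10x: one order of magnitude

print(f"computers x{computers:.1f}, networks x{networks:.1f}, "
      f"gap x{relative_gain:.1f}")
```

Because the network doubling period is exactly half the computer doubling period, the gap between them grows at the same rate as computer speed itself. Applying the same idealized formula to the 14 years from 1986 to 2000 gives factors of the same order of magnitude as the x 500 and x 340,000 quoted on the slide.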
Evolution of the Scientific Process
Pre-electronic
– Theorize &/or experiment, alone or in small teams; publish paper
Post-electronic
– Construct and mine very large databases of observational or simulation data
– Develop computer simulations & analyses
– Exchange information quasi-instantaneously within large, distributed, multidisciplinary teams
Evolution of Business
Pre-Internet
– Central corporate data processing facility
– Business processes not compute-oriented
Post-Internet
– Enterprise computing is highly distributed, heterogeneous, inter-enterprise (B2B)
– Outsourcing becomes feasible => service providers of various sorts
– Business processes increasingly computing- and data-rich
The Grid
“Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations”
A Comparison
SERIAL
– Fetch/Store
– Compute
PARALLEL
– Fetch/Store
– Compute/Communicate
– Cooperative game
GRID
– Fetch/Store
– Discovery of Resources
– Interaction with remote application
– Authentication / Authorization
– Security
– Compute/Communicate
– Etc.
Distributed Computing vs. GRID
Grid is an evolution of distributed computing
– Dynamic
– Geographically independent
– Built around standards
– Internet backbone
Distributed computing is an “older term”
– Typically built around proprietary software and network
– Tightly couples systems/organization
Web vs. GRID
Web
– Uniform naming access to documents
Grid
– Uniform, high performance access to computational resources
[Figure: http:// links to colleges/R&D labs, software catalogs, and sensor nets]
Is the World Wide Web a Grid?
Seamless naming? Yes
Uniform security and Authentication? No
Information Service? Yes or No
Co-Scheduling? No
Accounting & Authorization? No
User Services? No
Event Services? No
Is the Browser a Global Shell? No
What does the World Wide Web bring to the Grid?
Uniform Naming
A seamless, scalable information service
A powerful new meta-data language: XML
– XML will be standard language for describing information in the grid
– SOAP: Simple Object Access Protocol
> Uses XML for encoding, HTTP for protocol
– SOAP may become a standard RPC mechanism for Grid services
Portal Ideas
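SOAP messages are XML documents, and they are most commonly carried in HTTP POST requests. The sketch below builds (but does not send) such a request using only the Python standard library. The endpoint URL, service namespace, operation name `submitJob`, and parameter `executable` are hypothetical placeholders and do not correspond to any real Grid service; only the SOAP 1.1 envelope namespace is standard.

```python
import urllib.request
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:grid-job"        # hypothetical service namespace
ENDPOINT = "http://grid.example.org/jobs"  # hypothetical endpoint


def build_soap_request(operation: str, params: dict) -> urllib.request.Request:
    """Encode an RPC-style call as a SOAP envelope wrapped in an HTTP POST."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = value
    payload = ET.tostring(envelope, encoding="utf-8", xml_declaration=True)
    return urllib.request.Request(
        ENDPOINT,
        data=payload,  # XML is the encoding
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            "SOAPAction": f'"{SERVICE_NS}#{operation}"',
        },
        method="POST",  # HTTP is the transport
    )


request = build_soap_request("submitJob", {"executable": "/bin/hostname"})
print(request.data.decode("utf-8"))
```

Constructing the `Request` object performs no network activity; passing it to `urllib.request.urlopen` would be the step that actually sends it.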
The Ultimate Goal
In the future, I will not know or care where my application is executed; I will acquire and pay for resources as I need them
Why Grids?
Large-scale science and engineering are done through the interaction of people, heterogeneous computing resources, information systems, and instruments, all of which are geographically and organizationally dispersed.
The overall motivation for “Grids” is to facilitate the routine interactions of these resources in order to support large-scale science and engineering.
An Example Virtual Organization: CERN’s Large Hadron Collider
1800 Physicists, 150 Institutes, 32 Countries
100 PB of data by 2010; 50,000 CPUs?
Grid Communities & Applications: Data Grids for High Energy Physics
[Figure: tiered data grid for high energy physics; tier labels shown: Tier 0, Tier 1, Tier 2, Tier 4]
Sites and capacities shown in the figure
– Online System
– Offline Processor Farm ~20 TIPS
– CERN Computer Centre
– FermiLab ~4 TIPS
– France Regional Centre, Italy Regional Centre, Germany Regional Centre
– Tier2 Centres ~1 TIPS each; Caltech ~1 TIPS
– Institutes ~0.25 TIPS
– Physics data cache
– Physicist workstations
Link rates shown in the figure
– ~PBytes/sec
– ~100 MBytes/sec
– ~622 Mbits/sec, or Air Freight (deprecated)
– ~1 MBytes/sec
There is a “bunch crossing” every 25 nsecs.
There are 100 “triggers” per second
Each triggered event is ~1 MByte in size
Physicists work on analysis “channels”.
Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server
1 TIPS is approximately 25,000 SpecInt95 equivalents
www.griphyn.org www.ppdg.net www.eu-datagrid.org
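The event figures quoted on this slide can be tied together with a little arithmetic. The sketch below is a worked check, not part of the original material: variable names are illustrative, decimal units are assumed, and the yearly volume is only an upper bound that assumes recording around the clock for 365 days.

```python
NS_PER_SECOND = 1_000_000_000
BUNCH_SPACING_NS = 25      # one bunch crossing every 25 ns
TRIGGERS_PER_SECOND = 100  # events kept per second
EVENT_SIZE_MB = 1          # approximate size of one triggered event

# Crossings per second implied by the 25 ns spacing.
crossings_per_second = NS_PER_SECOND // BUNCH_SPACING_NS

# Fraction of crossings that survive the trigger.
kept_fraction = TRIGGERS_PER_SECOND / crossings_per_second

# Recorded data rate, comparable to the ~100 MBytes/sec links in the figure.
recorded_mb_per_s = TRIGGERS_PER_SECOND * EVENT_SIZE_MB

# Upper bound on yearly volume (assumption: continuous running all year).
SECONDS_PER_YEAR = 365 * 24 * 3600
upper_bound_pb_per_year = recorded_mb_per_s * SECONDS_PER_YEAR / 1e9  # MB -> PB

print(crossings_per_second, kept_fraction, recorded_mb_per_s,
      round(upper_bound_pb_per_year, 2))
```

The trigger therefore keeps only a few crossings in every million, yet the recorded stream still adds up to petabytes per year.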
Intelligent Infrastructure: Distributed Servers and Services
The Grid: A Brief History
Early 90s
– Gigabit testbeds, metacomputing
Mid to late 90s
– Early experiments (e.g., I-WAY), academic software projects (e.g., Globus, Legion), application experiments
2002
– Dozens of application communities & projects
– Major infrastructure deployments
– Significant technology base (esp. Globus Toolkit™)
– Growing industrial interest
– Global Grid Forum: ~500 people, 20+ countries
The Grid World: Current Status
Dozens of major Grid projects in scientific & technical computing/research & education
– www.mcs.anl.gov/~foster/grid-projects
Considerable consensus on key concepts and technologies
– Open source Globus Toolkit™ a de facto standard for major protocols & services
Industrial interest emerging rapidly
– IBM, Platform, Microsoft, Sun, Compaq, …
Opportunity: convergence of eScience and eBusiness requirements & technologies
Grid Technologies: Resource Sharing Mechanisms That …
Address security and policy concerns of resource owners and users
Are flexible enough to deal with many resource types and sharing modalities
Scale to large number of resources, many participants, many program components
Operate efficiently when dealing with large amounts of data & computation
Aspects of the Problem
1) Need for interoperability when different groups want to share resources
– Diverse components, policies, mechanisms
– E.g., standard notions of identity, means of communication, resource descriptions
2) Need for shared infrastructure services to avoid repeated development, installation
– E.g., one port/service/protocol for remote access to computing, not one per tool/application
– E.g., Certificate Authorities: expensive to run
A common need for protocols & services
The Hourglass Model
Focus on architecture issues
– Propose set of core services as basic infrastructure
– Use to construct high-level, domain-specific solutions
Design principles
– Keep participation cost low
– Enable local control
– Support for adaptation
– “IP hourglass” model
[Figure: hourglass; labels: Applications, Diverse global services, Core services, Local OS]
Layered Grid Architecture (By Analogy to Internet Architecture)
Application
Collective: “Coordinating multiple resources”: ubiquitous infrastructure services, app-specific distributed services
Resource: “Sharing single resources”: negotiating access, controlling use
Connectivity: “Talking to things”: communication (Internet protocols) & security
Fabric: “Controlling things locally”: access to, & control of, resources
[Figure: shown beside the Internet Protocol Architecture layers: Application, Transport, Internet, Link]
Globus Toolkit™
A software toolkit addressing key technical problems in the development of Grid-enabled tools, services, and applications
– Offer a modular set of orthogonal services
– Enable incremental development of grid-enabled tools and applications
– Implement standard Grid protocols and APIs
– Available under liberal open source license
– Large community of developers & users
– Commercial support
General Approach
Define Grid protocols & APIs
– Protocol-mediated access to remote resources
– Integrate and extend existing standards
– “On the Grid” = speak “Intergrid” protocols
Develop a reference implementation
– Open source Globus Toolkit
– Client and server SDKs, services, tools, etc.
Grid-enable wide variety of tools
– Globus Toolkit, FTP, SSH, Condor, SRB, MPI, …
Learn through deployment and applications
Key Protocols
The Globus Toolkit™ centers around four key protocols
– Connectivity layer:
> Security: Grid Security Infrastructure (GSI)
– Resource layer:
> Resource Management: Grid Resource Allocation Management (GRAM)
> Information Services: Grid Resource Information Protocol (GRIP) and Index Information Protocol (GIIP)
> Data Transfer: Grid File Transfer Protocol (GridFTP)
Also key collective layer protocols
– Info Services, Replica Management, etc.
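The layer-to-protocol grouping on this slide can be written down as a small lookup table. The sketch below is only a restatement of the slide as a data structure; the dictionary layout and the `layer_of` helper are illustrative and are not part of the Globus Toolkit.

```python
from typing import Dict, List, Optional

# Layer -> function -> protocol acronyms, exactly as grouped on the slide.
KEY_PROTOCOLS: Dict[str, Dict[str, List[str]]] = {
    "Connectivity": {
        "Security": ["GSI"],
    },
    "Resource": {
        "Resource Management": ["GRAM"],
        "Information Services": ["GRIP", "GIIP"],
        "Data Transfer": ["GridFTP"],
    },
}


def layer_of(protocol: str) -> Optional[str]:
    """Return the architecture layer a protocol belongs to, or None if unknown."""
    for layer, functions in KEY_PROTOCOLS.items():
        for names in functions.values():
            if protocol in names:
                return layer
    return None


print(layer_of("GridFTP"))
print(layer_of("GSI"))
```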