GRID COMPUTING Sandeep Kumar Poonia Head Of Dept. CS/IT B.E., M.Tech., UGC-NET LM-IAENG, LM-IACSIT,LM-CSTA, LM-AIRCC, LM-SCIEI, AM-UACEE
May 10, 2015
WHY GRID COMPUTING?
40% of mainframes are idle
90% of Unix servers are idle
95% of PC servers are idle
0–15% of mainframe capacity is idle even at peak hour
70% of PC server capacity is idle even at peak hour
Source: “Grid Computing” Dr Daron G Green
OUTLINE
Introduction to Grid Computing
Methods of Grid computing
Grid Middleware
Grid Architecture
ELECTRICAL POWER GRID
ANALOGY
Electrical power grid: users (or electrical appliances) get access to electricity through wall sockets with no care or consideration for where or how the electricity is actually generated. "The power grid" links together power plants of many different kinds.
The Grid: users (or client applications) gain access to computing resources (processors, storage, data, applications, and so on) as needed, with little or no knowledge of where those resources are located or of the underlying technologies, hardware, and operating systems. "The Grid" links together computing resources (PCs, workstations, servers, storage elements) and provides the mechanisms needed to access them.
WHY NEED GRID COMPUTING?
Core networking technology now accelerates at a much
faster rate than advances in microprocessor speeds
Exploiting underutilized resources
Parallel CPU capacity
Virtual resources and virtual organizations for
collaboration
Access to additional resources
WHO NEEDS GRID COMPUTING?
Not just computer scientists…
Scientists "hit the wall" when faced with situations where:
the amount of data they need is huge and the data is stored at different institutions;
the number of similar calculations they have to perform is huge.
Other areas: Government
Business
Education
Industrial design
……
LIVING IN AN EXPONENTIAL WORLD
(1) COMPUTING & SENSORS
Moore's Law: transistor count doubles every 18 months
(Figures: magnetohydrodynamics simulation; star formation)
LIVING IN AN EXPONENTIAL WORLD:
(2) STORAGE
Storage density doubles every 12 months
Dramatic growth in online data (1 petabyte = 1000 terabytes = 1,000,000 gigabytes)
2000 ~0.5 petabyte
2005 ~10 petabytes
2010 ~100 petabytes
2015 ~1000 petabytes?
Transforming entire disciplines in physical and,
increasingly, biological sciences; humanities
next?
DATA INTENSIVE PHYSICAL SCIENCES
High energy & nuclear physics
Including new experiments at CERN
Gravity wave searches
LIGO, GEO, VIRGO
Time-dependent 3-D systems (simulation, data)
Earth Observation, climate modeling
Geophysics, earthquake modeling
Fluids, aerodynamic design
Pollutant dispersal scenarios
Astronomy: Digital sky surveys
ONGOING ASTRONOMICAL MEGA-SURVEYS
Large number of new surveys
Multi-TB in size, 100M objects or larger
In databases
Individual archives planned and under way
Multi-wavelength view of the sky
> 13 wavelength coverage within 5 years
Impressive early discoveries
Finding exotic objects by unusual colors
L,T dwarfs, high redshift quasars
Finding objects by time variability
Gravitational micro-lensing
MACHO
2MASS
SDSS
DPOSS
GSC-II
COBE MAP
NVSS
FIRST
GALEX
ROSAT
OGLE
...
COMING FLOODS OF ASTRONOMY DATA
The planned Large Synoptic Survey Telescope
will produce over 10 petabytes per year by 2008!
All-sky survey every few days, so will have fine-grain
time series for the first time
DATA INTENSIVE BIOLOGY AND MEDICINE
Medical data
X-Ray, mammography data, etc. (many petabytes)
Digitizing patient records
X-ray crystallography
Molecular genomics and related disciplines
Human Genome, other genome databases
Proteomics (protein structure, activities, …)
Protein interactions, drug delivery
Virtual Population Laboratory (proposed)
Simulate likely spread of disease outbreaks
Brain scans (3-D, time dependent)
A BRAIN IS A LOT OF DATA! (Mark Ellisman, UCSD)
We need to get to one micron to know the location of every cell. We're just now starting to get to 10 microns; Grids will help get us there and further.
And comparisons must be made among many brains.
FASTEST VIRTUAL SUPERCOMPUTERS
As of April 2013, Folding@home – 11.4 x86-equivalent (5.8 "native") PFLOPS.
As of March 2013, BOINC – processing on average 9.2 PFLOPS.
As of April 2010, MilkyWay@Home – over 1.6 PFLOPS, with a large amount of this work coming from GPUs.
As of April 2010, SETI@Home – averaging more than 730 TFLOPS.
As of April 2010, Einstein@Home – crunching more than 210 TFLOPS.
As of June 2011, GIMPS – sustaining 61 TFLOPS.
HOW GRID COMPUTING WORKS
Supercomputers, big mainframes…
Idle time, idle CPUs
Source: “The Evolving Computing Model: Grid Computing” Michael Teyssedre
HOW GRID COMPUTING WORKS
Virtual machines, virtual CPUs…
Idle time, idle CPUs
Source: “The Evolving Computing Model: Grid Computing” Michael Teyssedre
HOW GRID COMPUTING WORKS
Grid Computing
0% idle
Source: “The Evolving Computing Model: Grid Computing” Michael Teyssedre
GRID ARCHITECTURE
Autonomous, globally distributed computers/clusters
WHAT IS A GRID? Many definitions exist in the literature
Early definitions – Foster and Kesselman, 1998:
"A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational facilities."
Kleinrock, 1969:
"We will probably see the spread of 'computer utilities', which, like present electric and telephone utilities, will service individual homes and offices across the country."
3-POINT CHECKLIST (FOSTER 2002)
1. Coordinates resources that are not subject to centralized control
2. Uses standard, open, general-purpose protocols and interfaces
3. Delivers nontrivial qualities of service
• e.g., response time, throughput, availability, security
DEFINITION
Grid computing is…
a distributed computing system
in which a group of computers is connected
to create, and work as, one large virtual pool of computing power, storage, databases, applications, and services.
DEFINITION
Grid computing…
Allows a group of computers to share the system
securely and
Optimizes their collective resources to meet
required workloads
By using open standards
GRID COMPUTING
Grid computing is a form of distributed computing whereby a "super and virtual computer" is composed of a cluster of networked, loosely coupled computers, acting in concert to perform very large tasks.
Grid computing (Foster and Kesselman, 1999) is a growing technology that facilitates the execution of large-scale, resource-intensive applications on geographically distributed computing resources.
It facilitates flexible, secure, coordinated, large-scale resource sharing among dynamic collections of individuals, institutions, and resources.
It enables communities ("virtual organizations") to share geographically distributed resources as they pursue common goals.
Ian Foster and Carl Kesselman
A COMPARISON
SERIAL
Fetch/Store
Compute
PARALLEL
Fetch/Store
Compute/
communicate
Cooperative game
GRID
Fetch/Store
Discovery of Resources
Interaction with remote
application
Authentication /
Authorization
Security
Compute/Communicate
Etc
DISTRIBUTED COMPUTING VS. GRID
Grid is an evolution of distributed computing
Dynamic
Geographically independent
Built around standards
Internet backbone
Distributed computing is an "older term"
Typically built around proprietary software and networks
Tightly coupled systems/organizations
WEB VS.
GRID
Web: uniform naming and access to documents
Grid: uniform, high-performance access to computational resources
IS THE WORLD WIDE WEB A
GRID ?
Seamless naming? Yes
Uniform security and Authentication? No
Information Service? Yes or No
Co-Scheduling? No
Accounting & Authorization ? No
User Services? No
Event Services? No
Is the Browser a Global Shell ? No
WHAT DOES THE WORLD WIDE WEB BRING TO
THE GRID ?
Uniform naming
A seamless, scalable information service
A powerful new metadata language: XML
XML will be a standard language for describing information in the Grid
SOAP – Simple Object Access Protocol
Uses XML for encoding, HTTP as the transport protocol
SOAP may become a standard RPC mechanism for Grid services
Portal ideas
THE ULTIMATE GOAL
In the future I will not know or care where my application will be executed, as I will acquire and pay to use these resources as I need them.
WHY GRIDS? Large-scale science and engineering are done
through the interaction of people, heterogeneous
computing resources, information systems, and
instruments, all of which are geographically and
organizationally dispersed.
The overall motivation for "Grids" is to facilitate the routine interactions of these resources in order to support large-scale science and engineering.
AN EXAMPLE VIRTUAL ORGANIZATION:
CERN'S LARGE HADRON COLLIDER
1800 Physicists, 150 Institutes, 32 Countries
100 PB of data by 2010; 50,000 CPUs?
GRID COMMUNITIES & APPLICATIONS:
DATA GRIDS FOR HIGH ENERGY PHYSICS
Tiered architecture (figure, summarized):
Tier 0: CERN Computer Centre – Online System feeding an Offline Processor Farm (~20 TIPS); ~100 MBytes/sec from the detector; physics data cache at ~PBytes/sec.
Tier 1: Regional centres – FermiLab (~4 TIPS), Caltech (~1 TIPS), and the France, Italy, and Germany regional centres; ~622 Mbits/sec links (or air freight, deprecated).
Tier 2: Centres of ~1 TIPS each; ~622 Mbits/sec links.
Tier 4: Institutes (~0.25 TIPS) and physicist workstations; ~1 MBytes/sec links.
There is a "bunch crossing" every 25 nsecs, and there are 100 "triggers" per second; each triggered event is ~1 MByte in size.
Physicists work on analysis "channels". Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server.
1 TIPS is approximately 25,000 SpecInt95 equivalents.
www.griphyn.org www.ppdg.net www.eu-datagrid.org
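The trigger figures above are consistent with the ~100 MBytes/sec link shown in the hierarchy; a quick back-of-the-envelope check (numbers taken directly from the slide):

```python
# Sanity check on the LHC data-rate figures quoted above.
triggers_per_second = 100     # "100 triggers per second"
event_size_mbytes = 1.0       # "each triggered event is ~1 MByte"

rate_mbytes_per_sec = triggers_per_second * event_size_mbytes  # link rate
seconds_per_day = 24 * 60 * 60
data_per_day_tbytes = rate_mbytes_per_sec * seconds_per_day / 1e6
```

This yields 100 MB/s off the online system, i.e. several terabytes per day sustained.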
INTELLIGENT INFRASTRUCTURE:
DISTRIBUTED SERVERS AND SERVICES
THE GRID: A BRIEF HISTORY
Early 90s: gigabit testbeds, metacomputing
Mid to late 90s: early experiments (e.g., I-WAY), academic software projects (e.g., Globus, Legion), application experiments
2002: dozens of application communities & projects; major infrastructure deployments; significant technology base (esp. the Globus Toolkit™); growing industrial interest; Global Grid Forum: ~500 people, 20+ countries
HOW IT EVOLVES
Utility computing
Service grid
Data grid
Processing grid
Virtualization, service orientation, open standards
EARLY ADOPTERS
Academic
Big science
Life science
Nuclear engineering
Simulation…
MARKET POTENTIAL
Financial services:
risk management and compliance
Automotive:
acceleration of product development
Petroleum:
oil discovery
Source: “Perspectives on grid: Grid computing - next-generation distributed computing" Matt Haynos, 01/27/04
Criteria for a Grid:
Coordinates resources that are not subject to
centralized control.
Uses standard, open, general-purpose protocols
and interfaces.
Delivers nontrivial qualities of service (e.g., response time, throughput, availability, security).
Benefits
Exploit underutilized resources
Resource load balancing
Virtualize resources across an enterprise (Data Grids, Compute Grids)
Enable collaboration for virtual organizations
WHY DO WE NEED GRIDS?
Many large-scale problems cannot be solved by a
single computer
Globally distributed data and resources
GRID APPLICATIONS
Data- and computationally intensive applications:
This technology has been applied to computationally intensive scientific, mathematical, and academic problems like drug discovery, economic forecasting, and seismic analysis, and to back-office data processing in support of e-commerce.
A chemist may utilize hundreds of processors to screen thousands of compounds per hour.
Teams of engineers worldwide pool resources to analyze terabytes of structural data.
Meteorologists seek to visualize and analyze petabytes of climate data with enormous computational demands.
Resource sharing
Computers, storage, sensors, networks, …
Sharing always conditional: issues of trust, policy, negotiation, payment, …
Coordinated problem solving
distributed data analysis, computation, collaboration, …
GRID TOPOLOGIES
• Intragrid
– Local grid within an organisation
– Trust based on personal contracts
• Extragrid
– Resources of a consortium of organisations
connected through a (Virtual) Private Network
– Trust based on Business to Business contracts
• Intergrid
– Global sharing of resources through the internet
– Trust based on certification
COMPUTATIONAL GRID
"A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities."
– "The Grid: Blueprint for a New Computing Infrastructure", Kesselman & Foster
Example: Science Grid (US Department of Energy)
DATA GRID
A data grid is a grid computing system that deals with data — the controlled sharing and management of large amounts of distributed data.
Data Grid is the storage component of a grid environment. Scientific and engineering applications require access to large amounts of data, and often this data is widely distributed. A data grid provides seamless access to the local or remote data required to complete compute intensive calculations.
Example :
Biomedical informatics Research Network (BIRN),
the Southern California earthquake Center (SCEC).
BACKGROUND: RELATED
TECHNOLOGIES
Cluster computing
Peer-to-peer computing
Internet computing
CLUSTER COMPUTING
Idea: put some PCs together and get them to
communicate
Cheaper to build than a mainframe
supercomputer
Different sizes of clusters
Scalable – can grow a cluster by adding more PCs
CLUSTER ARCHITECTURE
PEER-TO-PEER COMPUTING
Connect to other computers
Can access files from any computer on the
network
Allows data sharing without going through
central server
Decentralized approach also useful for Grid
PEER TO PEER ARCHITECTURE
METHODS OF GRID COMPUTING
Distributed Supercomputing
High-Throughput Computing
On-Demand Computing
Data-Intensive Computing
Collaborative Computing
Logistical Networking
DISTRIBUTED SUPERCOMPUTING
Combining multiple high-capacity resources on a computational grid into a single, virtual distributed supercomputer.
Tackle problems that cannot be solved on a single system.
Examples: climate modeling, computational chemistry
Challenges include:
Scheduling scarce and expensive resources
Scalability of protocols and algorithms
Maintaining high levels of performance across heterogeneous systems
HIGH-THROUGHPUT COMPUTING
Uses the grid to schedule large numbers of loosely coupled or independent tasks, with the goal of putting unused processor cycles (e.g., from idle workstations) to work.
Unlike distributed supercomputing, the tasks are loosely coupled.
Examples: parameter studies, cryptographic problems
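The idea can be sketched with a pool of workers standing in for idle machines; this is a toy illustration of high-throughput scheduling, not any real grid scheduler's API (the `evaluate` task is hypothetical):

```python
# High-throughput computing sketch: many independent, loosely coupled
# tasks (a parameter study) farmed out to whatever workers are free.
from concurrent.futures import ThreadPoolExecutor

def evaluate(param):
    # Stand-in for one independent task in the study.
    return param * param

params = range(100)
with ThreadPoolExecutor(max_workers=8) as pool:   # workers play idle machines
    results = list(pool.map(evaluate, params))    # order of results preserved
```

Because the tasks share no state, throughput scales with the number of available workers.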
On-Demand Computing
Uses grid capabilities to meet short-term requirements for resources that are not locally accessible.
Models real-time computing demands.
Unlike distributed supercomputing, driven by cost-performance concerns rather than absolute performance.
Dispatches expensive or specialized computations to remote servers.
COLLABORATIVE COMPUTING
Concerned primarily with enabling and
enhancing human-to-human interactions.
Enable shared use of data archives and
simulations
Applications are often structured in terms of a
virtual shared space.
Examples:
Collaborative exploration of large geophysical data sets
Challenges:
Real-time demands of interactive applications
Rich variety of interactions
Data-Intensive Computing
The focus is on synthesizing new information
from data that is maintained in geographically
distributed repositories, digital libraries, and
databases.
Particularly useful for distributed data mining.
Examples:
• High-energy physics experiments generate terabytes of distributed data and need complex queries to detect "interesting" events
• Distributed analysis of Sloan Digital Sky Survey data
LOGISTICAL NETWORKING
Logistical networks focus on exposing storage resources inside networks by optimizing the global scheduling of data transport, and data storage.
Contrasts with traditional networking, which does not explicitly model storage resources in the network.
Provides high-level services for Grid applications.
Called "logistical" because of the analogy it bears with the systems of warehouses, depots, and distribution channels.
P2P COMPUTING VS GRID
COMPUTING
They differ in target communities:
A Grid system deals with a more complex, more powerful, more diverse, and more highly interconnected set of resources than P2P.
A TYPICAL VIEW OF GRID
ENVIRONMENT
User → Resource Broker → Grid Resources, with a Grid Information Service (figure, summarized):
1. A user sends a computation- or data-intensive application to the global Grid in order to speed up its execution.
2. A resource broker distributes the jobs in the application to Grid resources, based on the user's QoS requirements and the details of available Grid resources.
3. Grid resources (clusters, PCs, supercomputers, databases, instruments, etc.) in the global Grid execute the user's jobs and return the processed jobs and computation results.
4. A Grid information service collects the details of the available Grid resources and passes them to the resource broker.
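The broker's matchmaking step can be sketched as a loop that pairs each job with an adequate resource; the resource attributes (`mips`, `cost`) and QoS field (`min_mips`) are hypothetical illustrations, not any real broker's schema:

```python
# Toy resource broker: assign each job to the cheapest resource
# whose capability (MIPS rating) meets the job's QoS requirement.
resources = [
    {"name": "clusterA", "mips": 500,  "cost": 3},
    {"name": "pcB",      "mips": 100,  "cost": 1},
    {"name": "superC",   "mips": 2000, "cost": 9},
]

jobs = [{"id": 1, "min_mips": 400}, {"id": 2, "min_mips": 50}]

def broker(job, resources):
    # Keep only resources that satisfy the QoS requirement…
    candidates = [r for r in resources if r["mips"] >= job["min_mips"]]
    # …and pick the cheapest adequate one.
    return min(candidates, key=lambda r: r["cost"])

schedule = {job["id"]: broker(job, resources)["name"] for job in jobs}
```

A real broker would also weigh load, queue lengths, and data locality reported by the information service.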
GRID MIDDLEWARE
Grids are typically managed by gridware:
a special type of middleware that enables sharing and management of grid components based on user requirements and resource attributes (e.g., capacity, performance)
Software that connects other software components or applications to provide the following functions:
Run applications on suitable available resources (brokering, scheduling)
Provide uniform, high-level access to resources (semantic interfaces; Web Services, Service-Oriented Architectures)
Address inter-domain issues of security, policy, etc. (federated identities)
Provide application-level status monitoring and control
MIDDLEWARES
Globus – University of Chicago
Condor – University of Wisconsin – high-throughput computing
Legion – University of Virginia – virtual workspaces, collaborative computing
IBP (Internet Backplane Protocol) – University of Tennessee – logistical networking
NetSolve – solving scientific problems in heterogeneous environments – high-throughput & data-intensive computing
TWO KEY GRID COMPUTING GROUPS
The Globus Alliance (www.globus.org)
Composed of people from: Argonne National Labs, University of Chicago, University of Southern California Information Sciences Institute, University of Edinburgh, and others.
OGSA/OGSI standards initially proposed by the Globus group
The Global Grid Forum (www.ggf.org)
Heavy involvement of academic groups and industry (e.g., IBM Grid Computing, HP, United Devices, Oracle, UK e-Science Programme, US DOE, US NSF, Indiana University, and many others)
Meets three times annually
Solicits involvement from industry, research groups, and academics
GRID USERS
Many levels of users
Grid developers
Tool developers
Application developers
End users
System administrators
SOME GRID CHALLENGES
Data movement
Data replication
Resource management
Job submission
SOME OF THE MAJOR GRID PROJECTS
Name – URL/Sponsor – Focus
EuroGrid, Grid Interoperability (GRIP) – eurogrid.org, European Union – Create technology for remote access to supercomputing resources & simulation codes; in GRIP, integrate with the Globus Toolkit™
Fusion Collaboratory – fusiongrid.org, DOE Office of Science – Create a national computational collaboratory for fusion research
Globus Project™ – globus.org; DARPA, DOE, NSF, NASA, Microsoft – Research on Grid technologies; development and support of the Globus Toolkit™; application and deployment
GridLab – gridlab.org, European Union – Grid technologies and applications
GridPP – gridpp.ac.uk, U.K. eScience – Create & apply an operational grid within the U.K. for particle physics research
Grid Research Integration Development & Support Center – grids-center.org, NSF – Integration, deployment, and support of the NSF Middleware Infrastructure for research & education
Grid in India-GARUDA
• GARUDA is India's Grid Computing initiative connecting 17 cities across the country.
• The 45 participating institutes in this nationwide project include all the IITs and C-DAC centers and other major institutes in India.
GLOBUS GRID TOOLKIT
Open source toolkit for building Grid systems and
applications
Enabling technology for the Grid
Share computing power, databases, and other tools securely
online
Facilities for:
Resource monitoring
Resource discovery
Resource management
Security
File management
DATA MANAGEMENT IN GLOBUS
TOOLKIT
Data movement
GridFTP
Reliable File Transfer (RFT)
Data replication
Replica Location Service (RLS)
Data Replication Service (DRS)
GRIDFTP
High-performance, secure, reliable data transfer protocol
Optimized for wide-area networks
Superset of the Internet FTP protocol
Features:
Multiple data channels for parallel transfers
Partial file transfers
Third party transfers
Reusable data channels
Command pipelining
MORE GRIDFTP FEATURES
Auto tuning of parameters
Striping
Transfer data in parallel among multiple senders and
receivers instead of just one
Extended block mode
Send data in blocks
Know block size and offset
Data can arrive out of order
Allows multiple streams
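The extended block mode described above is what makes out-of-order arrival harmless: each block carries its own offset, so the receiver can place it directly. A small illustration of that reassembly idea (not the GridFTP wire format itself):

```python
# Illustration of extended-block-mode reassembly: blocks from parallel
# streams arrive out of order, each tagged with an offset.
received = [               # (offset, data) pairs, deliberately shuffled
    (8, b"89abcdef"),
    (0, b"01234567"),
    (16, b"ghij"),
]

buf = bytearray(20)
for offset, data in received:
    buf[offset:offset + len(data)] = data   # place each block by its offset

result = bytes(buf)   # identical to an in-order transfer
```

Since placement depends only on the offset, any number of parallel streams can feed the same file.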
STRIPING ARCHITECTURE
Use "striped" servers
LIMITATIONS OF GRIDFTP
Not a web service protocol (does not employ
SOAP, WSDL, etc.)
Requires client to maintain open socket
connection throughout transfer
Inconvenient for long transfers
Cannot recover from client failures
GRIDFTP
RELIABLE FILE TRANSFER (RFT)
Web service with "job-scheduler" functionality for data movement
User provides source and destination URLs
Service writes job description to a database and moves
files
Service methods for querying transfer status
RFT
REPLICA LOCATION SERVICE (RLS)
Registry to keep track of where replicas exist on physical
storage system
Users or services register files in RLS when files created
Distributed registry
May consist of multiple servers at different sites
Increase scale
Fault tolerance
REPLICA LOCATION SERVICE (RLS)
Logical file name – unique identifier for contents of file
Physical file name – location of copy of file on storage system
User can provide logical name and ask for replicas
Or query to find logical name associated with physical file location
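The two lookup directions above (logical name → replicas, physical location → logical name) amount to a two-way registry. A toy model of that mapping, not the actual RLS API:

```python
# Toy replica catalog: map a logical file name (LFN) to the physical
# file names (PFNs) of its replicas, and support reverse lookup.
catalog = {}   # LFN -> set of PFNs

def register(lfn, pfn):
    catalog.setdefault(lfn, set()).add(pfn)

def replicas(lfn):
    # Forward query: "where are copies of this logical file?"
    return sorted(catalog.get(lfn, set()))

def logical_name(pfn):
    # Reverse query: "which logical file does this copy belong to?"
    return next((lfn for lfn, pfns in catalog.items() if pfn in pfns), None)

register("run42.dat", "gsiftp://siteA/store/run42.dat")
register("run42.dat", "gsiftp://siteB/data/run42.dat")
```

The distributed RLS spreads this table over servers at several sites for scale and fault tolerance.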
DATA REPLICATION SERVICE (DRS)
Pull-based replication capability
Implemented as a web service
Higher-level data management service built on top of RFT
and RLS
Goal: ensure that a specified set of files exists on a storage
site
First, query RLS to locate the desired files
Next, create a transfer request using RFT
Finally, register the new replicas with RLS
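The three steps can be sketched end to end, with the RLS and RFT calls stubbed out as plain functions (`rls_lookup`, `rft_transfer`, and the catalog layout are stand-ins, not the real service interfaces):

```python
# Sketch of the DRS workflow: ensure a set of files exists at a site.
def rls_lookup(lfn, catalog):
    # Step 1: query the (stubbed) RLS for existing replicas.
    return catalog.get(lfn, [])

def rft_transfer(src_pfn, dest_site):
    # Step 2: pretend-transfer via RFT; return the new replica's PFN.
    return f"gsiftp://{dest_site}/{src_pfn.rsplit('/', 1)[-1]}"

def replicate(wanted, catalog, dest_site):
    for lfn in wanted:
        sources = rls_lookup(lfn, catalog)
        if not sources:
            continue                                  # nothing to pull from
        new_pfn = rft_transfer(sources[0], dest_site)
        catalog[lfn].append(new_pfn)                  # Step 3: register replica
    return catalog

catalog = {"run42.dat": ["gsiftp://siteA/run42.dat"]}
catalog = replicate(["run42.dat"], catalog, "siteB")
```

After the run, the catalog records both the original and the newly pulled replica.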
CONDOR
Original goal: high-throughput computing
Harvest wasted CPU power from other machines
Can also be used on a dedicated cluster
Condor-G – Condor interface to Globus resources
CONDOR
Provides many features of batch systems:
job queueing
scheduling policy
priority scheme
resource monitoring
resource management
Users submit their serial or parallel jobs
Condor places them into a queue
Scheduling and monitoring
Informs the user upon completion
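The queueing-plus-priority behaviour listed above can be modelled with a priority queue; this is a toy model of a batch queue, not Condor's actual matchmaking or scheduling logic:

```python
import heapq

# Toy batch queue: lower number = higher priority; FIFO within a priority
# level (the counter breaks ties in submission order).
queue, counter = [], 0

def submit(job, priority):
    global counter
    heapq.heappush(queue, (priority, counter, job))
    counter += 1

def dispatch():
    # Pop the highest-priority (then oldest) waiting job.
    return heapq.heappop(queue)[2]

submit("serial-job", priority=5)
submit("urgent-analysis", priority=1)
submit("parallel-job", priority=5)

order = [dispatch() for _ in range(3)]
```

The urgent job jumps the queue; the two equal-priority jobs dispatch in submission order.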
NIMROD-G
Tool to manage the execution of parametric studies across distributed computers
Manages experiment
Distributing files to remote systems
Performing the remote computation
Gathering results
User submits declarative plan file
Parameters, default values, and commands necessary for
performing the work
Nimrod-G takes advantage of Globus toolkit features
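A declarative plan of the kind described above is essentially a parameter sweep; expanding it into concrete jobs can be sketched as follows (the plan layout, parameter names, and `simulate` command are hypothetical, not Nimrod-G's plan-file syntax):

```python
from itertools import product

# Hypothetical plan: parameter names with candidate values,
# plus a command template to run for each combination.
plan = {
    "parameters": {"temp": [300, 350], "pressure": [1, 2, 5]},
    "command": "simulate --temp {temp} --pressure {pressure}",
}

names = list(plan["parameters"])
jobs = [
    plan["command"].format(**dict(zip(names, values)))
    for values in product(*plan["parameters"].values())   # Cartesian product
]
```

Two temperatures times three pressures yield six independent jobs, which a tool like Nimrod-G would then distribute, run remotely, and gather results from.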
NIMROD-G ARCHITECTURE
GRID CASE STUDIES
Earth System Grid
LIGO
TeraGrid
EARTH SYSTEM GRID
Provide climate studies scientists with access to
large datasets
Data generated by computational models –
requires massive computational power
Most scientists work with subsets of the data
Requires access to local copies of data
ESG INFRASTRUCTURE
Archival storage systems and disk storage systems at
several sites
Storage resource managers and GridFTP servers to
provide access to storage systems
Metadata catalog services
Replica location services
Web portal user interface
EARTH SYSTEM GRID
EARTH SYSTEM GRID INTERFACE
LASER INTERFEROMETER
GRAVITATIONAL WAVE
OBSERVATORY (LIGO)
Instruments at two sites to detect gravitational waves
Each experiment run produces millions of files
Scientists at other sites want these datasets on local storage
LIGO deploys RLS servers at each site to register local
mappings and collect info about mappings at other sites
LARGE SCALE DATA REPLICATION
FOR LIGO
Goal: detection of gravitational waves
Three interferometers at two sites
Generate 1 TB of data daily
Need to replicate this data across 9 sites to make
it available to scientists
Scientists need to learn where data items are,
and how to access them
LIGO
LIGO SOLUTION
Lightweight data replicator (LDR)
Uses parallel data streams, tunable TCP windows, and
tunable write/read buffers
Tracks where copies of specific files can be found
Stores descriptive information (metadata) in a
database
Can select files based on description rather than filename
TERAGRID
NSF high-performance computing facility
Nine distributed sites, each with different capabilities, e.g., computation power, archiving facilities, visualization software
Applications may require more than one site
Data sizes on the order of gigabytes or terabytes
TERAGRID
TERAGRID
Solution: Use GridFTP and RFT with front end
command line tool (tgcp)
Benefits of system:
Simple user interface
High performance data transfer capability
Ability to recover from both client and server software
failures
Extensible configuration
TGCP DETAILS
Idea: hide low level GridFTP commands from users
Copy file smallfile.dat in a working directory to another
system:
tgcp smallfile.dat tg-login.sdsc.teragrid.org:/users/ux454332
GridFTP command:
globus-url-copy -p 8 -tcp-bs 1198372 \
  gsiftp://tg-gridftprr.uc.teragrid.org:2811/home/navarro/smallfile.dat \
  gsiftp://tg-login.sdsc.teragrid.org:2811/users/ux454332/smallfile.dat
GRID ARCHITECTURE
THE HOURGLASS MODEL
Focus on architecture issues
Propose set of core services as
basic infrastructure
Used to construct high-level,
domain-specific solutions
(diverse)
Design principles
Keep participation cost low
Enable local control
Support for adaptation
"IP hourglass" model (figure, top to bottom):
Applications
Diverse global services
Core services
Local OS
LAYERED GRID ARCHITECTURE
(BY ANALOGY TO INTERNET ARCHITECTURE)
Application
Collective: "Coordinating multiple resources" – ubiquitous infrastructure services, app-specific distributed services
Resource: "Sharing single resources" – negotiating access, controlling use
Connectivity: "Talking to things" – communication (Internet protocols) & security
Fabric: "Controlling things locally" – access to, & control of, resources
(The analogous Internet Protocol architecture layers are Application, Transport, Internet, Link.)
EXAMPLE:
DATA GRID ARCHITECTURE
App: Discipline-specific Data Grid application
Collective (App): coherency control, replica selection, task management, virtual data catalog, virtual data code catalog, …
Collective (Generic): replica catalog, replica management, co-allocation, certificate authorities, metadata catalogs, …
Resource: access to data, access to computers, access to network performance data, …
Connectivity: communication, service discovery (DNS), authentication, authorization, delegation
Fabric: storage systems, clusters, networks, network caches, …
SIMULATION TOOLS
GridSim – job scheduling
SimGrid – single-client, multi-server scheduling
Bricks – scheduling
GangSim – Ganglia-based VO simulation
OptorSim – Data Grid simulations
G3S – Grid Security Services Simulator – security services
SIMULATION TOOL
GridSim is a Java-based toolkit for modeling and simulation of distributed resource management and scheduling for conventional Grid environments.
GridSim is based on SimJava, a general-purpose discrete-event simulation package implemented in Java.
All components in GridSim communicate with each other through message-passing operations defined by SimJava.
SALIENT FEATURES OF THE GRIDSIM
It allows modeling of heterogeneous types of resources.
Resources can be modeled as operating under space- or time-shared mode.
Resource capability can be defined in the form of MIPS (Million Instructions Per Second) benchmark ratings.
Resources can be located in any time zone.
Weekends and holidays can be mapped depending on a resource's local time to model non-Grid (local) workload.
Resources can be booked for advance reservation.
Applications with different parallel application models can be simulated.
SALIENT FEATURES OF THE GRIDSIM
Application tasks can be heterogeneous and can be CPU- or I/O-intensive.
There is no limit on the number of application jobs that can be submitted to a resource.
Multiple user entities can submit tasks for execution simultaneously in the same resource, which may be time-shared or space-shared. This feature helps in building schedulers that can use different market-driven economic models for selecting services competitively.
Network speed between resources can be specified.
It supports simulation of both static and dynamic schedulers.
Statistics of all or selected operations can be recorded, and they can be analyzed using GridSim statistics-analysis methods.
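The space-shared mode mentioned above (jobs occupy whole CPUs, first come first served) can be illustrated with a tiny event-driven sketch. This is a conceptual model in Python, not the Java-based GridSim toolkit itself; the 1-MIPS rating and job lengths are made up:

```python
import heapq

# Conceptual sketch of a space-shared resource: each job gets a whole
# CPU, first come first served; jobs queue when all CPUs are busy.
def space_shared(jobs, cpus):
    """jobs: list of (job_id, length_in_MI); each CPU rated at 1 MIPS."""
    free = [0.0] * cpus              # times at which each CPU becomes free
    heapq.heapify(free)
    finish = {}
    for job_id, length in jobs:      # arrival order = submission order
        start = heapq.heappop(free)  # earliest-free CPU
        done = start + length        # 1 MIPS -> 'length' seconds of work
        finish[job_id] = done
        heapq.heappush(free, done)
    return finish

times = space_shared([("j1", 10), ("j2", 10), ("j3", 5)], cpus=2)
```

With two CPUs, the first two jobs run at once and the third waits, finishing at t = 15; a time-shared model would instead interleave all three from t = 0.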
A MODULAR ARCHITECTURE FOR THE GRIDSIM PLATFORM AND COMPONENTS (figure, summarized top to bottom):
Application, user, and Grid-scenario input and results (application config, resource config, user requirements, Grid scenario, output)
Grid resource brokers or schedulers
GridSim Toolkit (application modeling, resource entities, information services, job management, resource allocation, statistics)
Resource modeling and simulation (single CPU, SMPs, clusters, load, network, reservation)
Basic discrete-event simulation infrastructure (SimJava, Distributed SimJava)
Virtual machine
Distributed resources (PCs, workstations, SMPs, clusters)