Parallel and Distributed Computing: Clusters and Grids Information Session Subject Code: 433-498 Rajkumar Buyya Grid Computing and Distributed Systems (GRIDS) Lab. The University of Melbourne Melbourne, Australia www.gridbus.org WW Grid

Jan 11, 2016

Page 1

Parallel and Distributed Computing: Clusters and Grids

Information Session

Subject Code: 433-498

Rajkumar Buyya

Grid Computing and Distributed Systems (GRIDS) Lab.
The University of Melbourne
Melbourne, Australia
www.gridbus.org

WW Grid

Page 2

Scalable HPC: Breaking Administrative Barriers & new challenges

[Figure: PERFORMANCE plotted against administrative barriers, with labels Desktop, SMPs or SuperComputers, Local Cluster, Enterprise Cluster/Grid, Global Cluster/Grid, Inter Planetary Grid!, and "?"]

Administrative Barriers:
• Individual
• Group
• Department
• Campus
• State
• National
• Globe
• Inter Planet
• Galaxy

Page 3

Why SC? Large Scale Explorations need them: Killer Applications.

Solving grand challenge applications using modeling, simulation and analysis:

• Life Sciences
• CAD/CAM
• Aerospace
• Digital Biology
• Military Applications
• Internet & Ecommerce

Page 5

PART 2: Cluster Architectures

The promise of supercomputing to the average PC user?

Page 6

HPCC Books, 2 Volumes - Prentice Hall, 1999

Edited by R. Buyya with contributions from over 100 leading researchers

(www.buyya.com/cluster/)

Page 7

Agenda

• Cluster? Enabling Tech. & Motivations
• Cluster Architecture
• Cluster Components
• Single System Image
• Next Section (after break)
• Case Studies
• Cluster Programming and Application Design
• Resources and Conclusions

Page 8

Rise and Fall of Computer Architectures

• Vector Computers (VC) - proprietary systems: provided the breakthrough needed for the emergence of computational science, but they were only a partial answer.
• Massively Parallel Processors (MPP) - proprietary systems: high cost and a low performance/price ratio.
• Symmetric Multiprocessors (SMP): suffer from limited scalability.
• Distributed Systems: difficult to use and hard to extract parallel performance from.
• Clusters - gaining popularity:
  • High Performance Computing - Commodity Supercomputing
  • High Availability Computing - Mission Critical Applications

Page 9

Cluster computing: Past, Present, Future

[Figure: timeline with marks at 1960, 1980s, 1990, 1995+, and 2000+; surviving label: PDA Clusters]

Page 10

Definition: What is a Cluster?

A cluster is a type of parallel or distributed processing system, which consists of a collection of interconnected stand-alone computers cooperatively working together as a single, integrated computing resource.

A “stand-alone” (whole) computer is one that can be used on its own (full hardware and OS).

Page 11

So What’s So Different about Clusters?

• Commodity Parts?
• Communications Packaging?
• Incremental Scalability?
• Independent Failure?
• Intelligent Network Interfaces?
• Complete System on every node: virtual memory, scheduler, files, …
• Nodes can be used individually or combined...

Page 12

Cluster Computer Architecture

[Figure: layered cluster architecture, from top to bottom]

• Sequential Applications and Parallel Applications
• Parallel Programming Environment
• Cluster Middleware (Single System Image and Availability Infrastructure)
• Several PC/Workstation nodes, each with Communications Software and Network Interface Hardware
• Cluster Interconnection Network/Switch

Page 13

Major issues in Cluster design

• Enhanced Performance (performance @ low cost)
• Enhanced Availability (failure management)
• Single System Image (look-and-feel of one system)
• Size Scalability (physical & application)
• Fast Communication (networks & protocols)
• Load Balancing (CPU, Net, Memory, Disk)
• Security and Encryption (clusters of clusters)
• Distributed Environment (Social issues)
• Manageability (admin. and control)
• Programmability (simple API if required)
• Applicability (cluster-aware and non-aware app.)

Page 14

Scalability Vs. Single System Image

[Figure not recovered; surviving label: UP]

Page 15

Cluster Applications

• Numerous Scientific & Engineering Apps.
• Business Applications:
  • E-commerce Applications (Amazon, eBay, …);
  • Database Applications (Oracle on clusters).
• Internet Applications:
  • ASPs (Application Service Providers);
  • Computing Portals;
  • E-commerce and E-business.
• Mission Critical Applications: command control systems, banks, nuclear reactor control, star-wars, and handling life threatening situations.

Page 16

Science Portals - e.g., Papia system

Papia PC Cluster

• Pentiums.
• Myrinet.
• NetBSD/Linux.
• PM.
• Score-D.
• MPC++.

RWCP - http://www.rwcp.or.jp/papia/

Page 17

Adoption of the Approach

Page 18

Scalable HPC: Breaking Administrative Barriers & new challenges

[Figure repeated from Page 2: PERFORMANCE against administrative barriers, from Desktop to Inter Planetary Grid!]

Page 19

Towards Grid Computing

Page 20

What is Grid?

A paradigm/infrastructure that enables the sharing, selection, & aggregation of geographically distributed resources:

• Computers - PCs, workstations, clusters, supercomputers, laptops, notebooks, mobile devices, PDAs, etc.;
• Software - e.g., ASPs renting expensive special purpose applications on demand;
• Catalogued data and databases - e.g., transparent access to the human genome database;
• Special devices/instruments - e.g., radio telescope - SETI@Home searching for life in the galaxy;
• People/collaborators.

[depending on their availability, capability, cost, and user QoS requirements]

for solving large-scale problems/applications, thus enabling the creation of “virtual enterprises” (VEs).

[Figure label: Wide area]

Page 21

P2P/Grid Applications-Drivers

• Distributed HPC (Supercomputing): Computational science.
• High-Capacity/Throughput Computing: Large scale simulation/chip design & parameter studies.
• Content Sharing (free or paid): Sharing digital contents among peers (e.g., Napster).
• Remote software access/renting services: Application service providers (ASPs) & Web services.
• Data-intensive computing: Drug Design, Particle Physics, Stock Prediction...
• On-demand, realtime computing: Medical instrumentation & Mission Critical.
• Collaborative Computing: Collaborative design, Data exploration, education.
• Service Oriented Computing (SOC): Computing as Competitive Utility: New paradigm, new industries, and new business.

Page 22

A Typical Grid Computing Environment

[Figure: components shown are Application, Grid Resource Broker, Grid Information Service, database, and resources R1, R2, R3, R4, R5, R6, ..., RN]

Page 23

Need Grid tools for managing

Security

Resource Allocation & Scheduling

Data locality

Network Management

System Management

Resource Discovery

Uniform Access

Computational Economy

Application Development Tools

Page 24

mix-and-match

Object-oriented

Internet/partial-P2P

Network enabled Solvers

Market/Computational Economy

Page 25

Many Grid Projects & Initiatives

• Australia: Nimrod-G, GridSim, Virtual Lab, Active Sheets, DISCWorld, ..new coming up
• Europe: UNICORE, MOL, UK eScience, Poland MC Broker, EU Data Grid, EuroGrid, MetaMPI, Dutch DAS, XW, JaWS
• Japan: Ninf, DataFarm
• Korea: ... N*Grid
• USA: Globus, Legion, OGSA, Javelin, AppLeS, NASA IPG, Condor-G, Jxta, NetSolve, AccessGrid, and many more...
• Cycle Stealing & .com Initiatives: Distributed.net, SETI@Home, …, Entropia, UD, Parabon, …
• Public Forums: Global Grid Forum, P2P Working Group, IEEE TFCC, Grid & CCGrid conferences

http://www.gridcomputing.com

Page 26

Grid Computing Projects

GRIDS Lab @ Melbourne

Page 27

The Gridbus Vision: To Enable Service Oriented Grid Computing & Business!

WW Grid

World Wide Grid!

Nimrod-G

Page 28

GRIDS Lab @ the U. of Melbourne

The Gridbus Project: www.gridbus.org

• Grid Economy & Distributed Scheduling (via Nimrod-G Broker): http://www.buyya.com/ecogrid
• GridSim Toolkit: Grid Modeling and Simulation (Java based): http://www.buyya.com/gridsim/
• Libra: Economic Cluster Scheduler: http://www.buyya.com/libra/
• Grid Bank: Accounting, Payment, Enforcement Mechanisms
• World Wide Grid (WWG) testbed: http://www.buyya.com/ecogrid/wwg/
• Application Enabler Projects:
  • Virtual Laboratory Toolset for Drug Design
  • High-Energy Physics and the Grid Network (HEPGrid)
  • Brain Activity Analysis on the Grid
• Cluster and Grid Info Centres: www.buyya.com/cluster/ || www.gridcomputing.com

Page 29

Nimrod/G: A Grid Resource Broker

A resource broker for managing, steering, and executing task farming (parameter sweep/SPMD model) applications on the Grid based on deadline and computational economy.

Based on users’ QoS requirements, our Broker dynamically leases services at runtime depending on their quality, cost, and availability.

Key Features

• A single window to manage & control experiment
• Persistent and Programmable Task Farming Engine
• Resource Discovery
• Resource Trading
• Scheduling & Predictions
• Generic Dispatcher & Grid Agents
• Transportation of data & results
• Steering & data management
• Accounting
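
The deadline-and-budget brokering described on this slide can be illustrated with a minimal Python sketch. It is not Nimrod/G's actual scheduling algorithm: the `Resource` fields, the greedy cheapest-first policy, and all the numbers are assumptions made for illustration. Each resource is assumed to run its jobs one after another, so it can finish at most `deadline // secs_per_job` jobs in time.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str             # hypothetical resource label
    secs_per_job: float   # assumed time to run one job on this resource
    cost_per_job: float   # assumed price charged per job


def plan_cost_optimised(resources, n_jobs, deadline, budget):
    """Greedy sketch: fill the cheapest resources first, within the deadline.

    Returns a dict {resource name: jobs assigned}, or None when the jobs
    cannot all be placed within the deadline and the budget.
    """
    plan = {}
    remaining = n_jobs
    spent = 0.0
    for res in sorted(resources, key=lambda r: r.cost_per_job):
        if remaining == 0:
            break
        # Jobs run one after another on a resource, so this is the most
        # it can finish before the deadline.
        capacity = int(deadline // res.secs_per_job)
        take = min(capacity, remaining)
        if take == 0:
            continue
        plan[res.name] = take
        spent += take * res.cost_per_job
        remaining -= take
    if remaining > 0 or spent > budget:
        return None
    return plan


if __name__ == "__main__":
    pool = [
        Resource("R1", secs_per_job=10, cost_per_job=1.0),
        Resource("R2", secs_per_job=5, cost_per_job=3.0),
        Resource("R3", secs_per_job=2, cost_per_job=8.0),
    ]
    # 20 jobs, 60 second deadline, budget of 100 cost units.
    print(plan_cost_optimised(pool, n_jobs=20, deadline=60, budget=100))
    # {'R1': 6, 'R2': 12, 'R3': 2}
```

Tightening the deadline pushes work onto the faster, dearer resources, and shrinking the budget eventually makes the plan infeasible. A real broker would also need resource discovery, price negotiation, and rescheduling at runtime.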

Page 30

Drug Design: Data Intensive Computing on Grid

It involves screening millions of chemical compounds (molecules) in the Chemical DataBase (CDB) to identify those having potential to serve as drug candidates.

[Figure labels: Protein; Molecules; Chemical Databases (legacy, in .MOL2 format)]

Page 31

MEG (MagnetoEncephaloGraphy) Data Analysis on the Grid: Brain Activity Analysis

[Figure: workflow with numbered steps 1 to 5 and labels Life-electronics laboratory, AIST; Data Generation (64 sensors MEG); Data Analysis; Nimrod-G; World-Wide Grid; Results]

• Provision of expertise in the analysis of brain function
• Provision of MEG analysis

Analysis: all pairs (64x64) of MEG data, shifting the temporal region of MEG data over time from 0 to 29750: 64x64x29750 jobs.

World-Wide Grid: [deadline, budget, optimization preference]

[Collaboration with Osaka University, Japan]
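
The job count quoted on this slide is plain arithmetic: 64 x 64 sensor pairs, each analysed at 29750 temporal shifts. The Python sketch below works it out. The runtime estimate is an illustrative assumption only: the 1 second per job and the node counts are hypothetical figures, not numbers from the slides.

```python
SENSORS = 64     # MEG sensors, from the slide
SHIFTS = 29750   # temporal shifts, as counted on the slide


def total_jobs(sensors: int, shifts: int) -> int:
    """All ordered sensor pairs, each analysed at every temporal shift."""
    return sensors * sensors * shifts


def wall_clock_hours(jobs: int, secs_per_job: float, nodes: int) -> float:
    """Ideal runtime if jobs spread evenly over `nodes` with no overhead."""
    return jobs * secs_per_job / nodes / 3600


if __name__ == "__main__":
    jobs = total_jobs(SENSORS, SHIFTS)
    print(jobs)  # 121856000
    # Hypothetical: 1 second per job, on 1 node and then on 1000 nodes.
    print(wall_clock_hours(jobs, 1.0, 1))
    print(wall_clock_hours(jobs, 1.0, 1000))
```

Roughly 122 million independent jobs is the kind of workload that suits task farming: the jobs do not communicate with one another, so they can be spread over however many Grid nodes the deadline and budget allow.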

Page 32

A Glance at Nimrod-G Broker

[Figure: Nimrod-G broker architecture, with the following labels]

• Nimrod/G Clients
• Nimrod/G Engine
• Schedule Advisor
• Trading Manager (TM)
• Grid Store
• Grid Explorer (GE)
• Grid Dispatcher
• Grid Information Server(s) (GIS)
• Grid Middleware: Globus, Legion, Condor, etc.
• Resource nodes with RM & TS (RM: Local Resource Manager, TS: Trade Server)
• Node legend: G = Globus enabled node, L = Legion enabled node, C = Condor enabled node

See HPCAsia 2000 paper!

Page 33

Active Sheet: Microsoft Excel Spreadsheet Processing on Grid

[Figure labels: Nimrod Proxy; Nimrod-G; World-Wide Grid]

Page 35

GridSim Toolkit: A Java based tool for Grid Scheduling Simulations

[Figure: layered GridSim architecture, from top to bottom]

• Application, User, Grid Scenario’s Input and Results: Application Configuration, Resource Configuration, User Requirements, Grid Scenario, Output
• Grid Resource Brokers or Schedulers
• GridSim Toolkit: Application Modeling, Resource Entities, Information Services, Job Management, Resource Allocation, Statistics
• Resource Modeling and Simulation (with Time and Space shared schedulers): Single CPU, SMPs, Clusters, Load Pattern, Network, Reservation
• Basic Discrete Event Simulation Infrastructure: SimJava, Distributed SimJava
• Virtual Machine (Java, cJVM, RMI)
• PCs, Workstations, SMPs, Clusters, Distributed Resources, . . .
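
To make "discrete event simulation" with a "space shared scheduler" concrete, here is a minimal sketch. It does not use the GridSim or SimJava APIs, and it is written in Python rather than Java for brevity. The function name, the event tuple layout, and the first-come-first-served policy are all assumptions for illustration. Each job holds one processing element (PE) exclusively until it finishes.

```python
import heapq


def simulate_space_shared(jobs, num_pes):
    """Simulate FCFS space-shared scheduling on `num_pes` processing elements.

    `jobs` is a list of (arrival_time, run_time) tuples. Returns the finish
    time of each job, in the same order as `jobs`.
    """
    # Event list ordered by (time, priority, kind, job_id). Completions
    # (priority 0) are handled before arrivals (priority 1) at the same time.
    events = [(arrival, 1, "arrive", i) for i, (arrival, _) in enumerate(jobs)]
    heapq.heapify(events)
    waiting = []                 # FCFS queue of job ids
    free_pes = num_pes
    finish = [None] * len(jobs)

    while events:
        now, _, kind, job_id = heapq.heappop(events)
        if kind == "arrive":
            waiting.append(job_id)
        else:  # a job finished and releases its PE
            finish[job_id] = now
            free_pes += 1
        # Start as many queued jobs as there are free PEs.
        while free_pes > 0 and waiting:
            nxt = waiting.pop(0)
            free_pes -= 1
            heapq.heappush(events, (now + jobs[nxt][1], 0, "finish", nxt))
    return finish


if __name__ == "__main__":
    workload = [(0, 10), (0, 5), (1, 3)]  # (arrival, run time) per job
    print(simulate_space_shared(workload, num_pes=2))  # [10, 5, 8]
    print(simulate_space_shared(workload, num_pes=1))  # [10, 15, 18]
```

The simulation clock jumps from one event to the next instead of ticking in fixed steps, which is what makes the approach cheap for large scenarios. A time-shared scheduler would instead divide each PE among all running jobs and recompute completion times whenever a job arrives or leaves.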

Page 36

Selected GridSim Users!

Page 37

Alessandro Volta in Paris in 1801, inside the French National Institute, shows the battery in the presence of Napoleon I.

Fresco by N. Cianfanelli (1841) (Zoological Section "La Specola" of the Natural History Museum of Florence University)

Page 38

….and in the future, I imagine a Worldwide Power (Electrical) Grid …...

What?!?! This is a mad man…

Oh, mon Dieu!

Page 39

2002 - 1801 = 201 Years

Page 40

Download Software & Information

Nimrod & Parametric Computing: http://www.csse.monash.edu.au/~davida/nimrod/

Economy Grid & Nimrod/G: http://www.buyya.com/ecogrid/

Virtual Laboratory Toolset for Drug Design: http://www.buyya.com/vlab/

Grid Simulation (GridSim) Toolkit (Java based): http://www.buyya.com/gridsim/

World Wide Grid (WWG) testbed: http://www.buyya.com/ecogrid/wwg/

Cluster and Grid Info Centres: www.buyya.com/cluster/ || www.gridcomputing.com

Page 41

Further Information

• Books:
  • High Performance Cluster Computing, V1, V2, R. Buyya (Ed), Prentice Hall, 1999.
  • The GRID, I. Foster and C. Kesselman (Eds), Morgan-Kaufmann, 1999.
• IEEE Task Force on Cluster Computing: http://www.ieeetfcc.org
• Global Grid Forum: www.gridforum.org
• IEEE/ACM CCGrid’xy: www.ccgrid.org
  • CCGrid 2002, Berlin: ccgrid2002.zib.de
• Grid workshop - www.gridcomputing.org

Page 42

Further Information

Cluster Computing Info Centre: http://www.buyya.com/cluster/

Grid Computing Info Centre: http://www.gridcomputing.com

IEEE DS Online - Grid Computing area: http://computer.org/dsonline/gc

Compute Power Market Project: http://www.ComputePower.com

Page 43

Final Word?

Page 44

Backup Slides