
Dec 30, 2015


From PC Clusters to a Global Computational Grid

David Abramson
Head of School
Computer Science and Software Engineering
Monash University

Thanks to
Jon Giddy, DSTC
Rok Sosic, Active Tools
Andrew Lewis, QPSF
Ian Foster, ANL
Rajkumar Buyya, Monash
Tom Peachy, Monash

© David Abramson

Research Model

[Diagram: project timeline. Labels: Nimrod ('94 - '98); Nimrod/O ('97 - '99); Nimrod/G ('98 - ); ActiveSheets ('00 - ); Applications; Commercialisation ('97 - ); funding labels DSTC, ARC, DSTC.]

Parametrised Modelling: Killer App for the Grid?

Study the behaviour of some of the output variables against a range of different input scenarios.

Computations are uncoupled (file transfer)

Allows real time analysis for many applications

More realistic simulations
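A parametrised study of this kind can be sketched as a cross product of input values, with one independent job per combination. The sketch below is illustrative only: the `model` function, the parameter names and the value ranges are assumptions, not taken from Nimrod or Clustor.

```python
import itertools


def model(pressure, temperature):
    """Stand-in for an expensive simulation (hypothetical formula)."""
    return pressure * 2.0 + temperature / 10.0


# Hypothetical parameter ranges; each combination is one independent job.
parameters = {
    "pressure": [1.0, 2.0, 3.0],
    "temperature": [100.0, 200.0],
}


def enumerate_scenarios(params):
    """Yield one dict per point in the cross product of all parameter values."""
    names = list(params)
    for values in itertools.product(*(params[n] for n in names)):
        yield dict(zip(names, values))


scenarios = list(enumerate_scenarios(parameters))

# The jobs share no state, so they could run in any order on any machine.
results = [(s, model(**s)) for s in scenarios]

for scenario, output in results:
    print(scenario, "->", output)
```

Because the runs are uncoupled, the list comprehension could be swapped for any dispatcher that farms the scenarios out to separate machines.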

Working with Small Clusters

Nimrod (1994 - )
– DSTC funded project
– Designed for department-level clusters
– Proof of concept

Clustor (www.activetools.com) (1997 - )
– Commercial version of Nimrod
– Re-engineered

Features
– Workstation orientation
– Access to idle workstations
– Random allocation policy
– Password security

Execution Architecture

[Diagram labels: Input Files; Substitution; Output Files; Root Machine; Computational Nodes.]
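The substitution step in the diagram can be illustrated with a small sketch: a skeleton input file contains placeholders, and each scenario produces one concrete input file. The placeholder syntax here is Python's `string.Template`, not Clustor's own plan-file syntax, and the names `length` and `load` are hypothetical.

```python
from string import Template

# Hypothetical input-file skeleton; $length and $load are placeholder names.
skeleton = Template("beam_length = $length\napplied_load = $load\n")


def make_input_files(template, scenarios):
    """Return one concrete input-file text per scenario."""
    return [template.substitute(s) for s in scenarios]


scenarios = [{"length": 1.5, "load": 100}, {"length": 2.0, "load": 100}]
input_files = make_input_files(skeleton, scenarios)
print(input_files[0])
```

Each generated text would then be shipped to a computational node together with the unchanged model executable.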

Clustor Tools

Clustor by example

[Figure: physical model with applied force f; output is the time to crack in this position. Courtesy Prof Rhys Jones, Dept Mechanical Engineering, Monash University.]

Dispatch cycle using Clustor ...

Sample Applications of Clustor

Bioinformatics: Protein Modelling
Sensitivity experiments on smog formation
Combinatorial Optimization: Meta-heuristic parameter estimation
Ecological Modelling: Control Strategies for Cattle Tick
Electronic CAD: Field Programmable Gate Arrays
Computer Graphics: Ray Tracing
High Energy Physics: Searching for Rare Events
Physics: Laser-Atom Collisions
VLSI Design: SPICE Simulations
Fuzzy Logic Parameter setting
ATM Network Design

SMOG Sensitivity Experiments

[Figure: axes Control ROC and Control NOx; cost annotation ($$$).]

Physics - Laser Interaction

Electronic CAD

Current Application Drivers

Public Health Policy: Dr Dinelli Mather, Monash University & MacFarlane Burnett
Health Standards: Lew Kotler, Australian Radiation Protection and Nuclear Safety Agency
Airframe Simulation: Dr Shane Dunn, AMRL, DSTO
Network Simulation: Dr Mahbun Hassan, Monash

Evolution of the Global Grid

[Diagram labels: Desktop; Department Clusters; Enterprise-Wide Clusters; Shared Supercomputer; Global Clusters.]

The Nimrod Vision ...

"Can we make it 10% smaller?"
"We need the answer by 5 o'clock"

Towards Grid Computing: The Gusto Testbed

Source: www.globus.org & updated

What does the Grid have to offer?

“Dependable, consistent, pervasive access to [high-end] resources”

Dependable: Can provide performance and functionality guarantees

Consistent: Uniform interfaces to a wide variety of resources

Pervasive: Ability to “plug in” from anywhere

Source: www.globus.org

Challenges for the Global Grid

Security

Resource Allocation & Scheduling

Data locality

Network Management

System Management

Resource Location

Uniform Access

Nimrod on Enterprise Wide Networks and the Global Grid

Manual resource location
– Static file of machine names

No resource scheduling
– First come, first served

No cost model
– All machines/users cost alike

Homogeneous access mechanism

Requirements

Users & system managers want to know
– Where it will run
– When it will run
– How much it will cost
– That access is secure
– That a range of access mechanisms is supported

The Globus Project

Basic research in grid-related technologies
– Resource management, QoS, networking, storage, security, adaptation, policy, etc.

Development of Globus toolkit
– Core services for grid-enabled tools & applications

Construction of large grid testbed: GUSTO
– Largest grid testbed in terms of sites & apps

Application experiments
– Tele-immersion, distributed computing, etc.

Source: www.globus.org

Layered Globus Architecture

Applications

High-level Services and Tools
– DUROC, globusrun, MPI, Nimrod/G, MPI-IO, CC++, GlobusView, Testbed Status

Core Services
– Metacomputing Directory Service, GRAM, Globus Security Interface, Heartbeat Monitor, Nexus, Gloperf, GASS

Local Services
– LSF, Condor, MPI, NQE, Easy, TCP, UDP, Solaris, Irix, AIX

Source: www.globus.org

Some issues for Nimrod/G

Resource Location

Need to locate suitable machines for an experiment
– Speed
– Number of processors
– Cost
– Availability
– User account

Available resources will vary across experiment

Supported through Directory Server (Globus MDS)
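The selection criteria listed above can be sketched as a filter over directory entries. This is a minimal illustration and does not use the Globus MDS API: the `Machine` record, its fields and the sample entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    """Hypothetical directory entry; field names are illustrative only."""
    name: str
    speed: float        # relative speed
    processors: int
    cost: float         # cost units per hour
    available: bool
    has_account: bool   # whether the user holds an account on it


def locate(machines, min_speed, min_processors, max_cost):
    """Return the machines that satisfy every criterion for this experiment."""
    return [
        m for m in machines
        if m.available
        and m.has_account
        and m.speed >= min_speed
        and m.processors >= min_processors
        and m.cost <= max_cost
    ]


directory = [
    Machine("fast-remote", 2.0, 16, 5.0, True, True),
    Machine("slow-local", 0.5, 4, 1.0, True, True),
    Machine("busy-super", 4.0, 64, 8.0, False, True),
    Machine("no-account", 2.0, 8, 2.0, True, False),
]

suitable = locate(directory, min_speed=1.0, min_processors=8, max_cost=6.0)
print([m.name for m in suitable])
```

Since available resources vary during an experiment, a query like this would be repeated rather than run once.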

Resource Scheduling

User view
– Solve problem in minimum time

System view
– Spread load across machines

Soft real-time problem through deadlines
– Complete by deadline
– Unreliable resource provision
  – Machine load may change at any time
  – Multiple machine queues

Resource Scheduling ...

Need to establish rate at which a machine can consume jobs

Use deadline as metric for machine performance

Move jobs to machines that are performing well

Remove jobs from machines that are falling behind

[Figure: job timelines for Node 2 and Node 4 against Time.]
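As a rough illustration of using the deadline as the performance metric, a machine's job consumption rate can be estimated from what it has completed so far and compared with what it still holds. The function names and numbers below are assumptions for the sketch, not Nimrod/G internals.

```python
def consumption_rate(jobs_completed, elapsed_hours):
    """Observed jobs per hour; zero until some time has elapsed."""
    return jobs_completed / elapsed_hours if elapsed_hours > 0 else 0.0


def on_track(jobs_remaining, rate, hours_to_deadline):
    """True if the machine can finish its remaining jobs before the deadline."""
    return rate * hours_to_deadline >= jobs_remaining


# Hypothetical figures: both nodes hold 20 jobs with 6 hours to the deadline.
rate_fast = consumption_rate(jobs_completed=12, elapsed_hours=3)
rate_slow = consumption_rate(jobs_completed=3, elapsed_hours=3)

print(on_track(20, rate_fast, 6))  # this node is performing well
print(on_track(20, rate_slow, 6))  # this node is falling behind
```

A node that fails the check is a candidate for having jobs removed; a node that passes with spare capacity can take more.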

Computational Economy

Resource selection based on real money and market mechanisms

A large number of sellers and buyers (resources may be dedicated/shared)

Negotiation: call for tenders/bids and select the offers that meet the requirements

Trading and Advance Resource Reservation

Schedule computations on those resources that meet all requirements

Cost Model

Without cost ANY shared system becomes unmanageable

Charge users more for remote facilities than their own

Choose cheaper resources before more expensive ones

Cost units may be
– Dollars
– Shares in global facility
– Stored in bank
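A non-uniform cost model of this kind can be sketched as a price that depends on who is asking, followed by a cheapest-first ordering. The site names, base costs and the `remote_factor` multiplier are assumptions made for illustration.

```python
def unit_cost(user_site, machine_site, base_cost, remote_factor=2.0):
    """Charge the base cost locally and a multiple of it for remote users."""
    if user_site == machine_site:
        return base_cost
    return base_cost * remote_factor


# (machine name, owning site, base cost in cost units); all hypothetical.
machines = [
    ("local-cluster", "monash", 1.0),
    ("remote-super", "anl", 1.5),
    ("remote-cluster", "dstc", 0.8),
]


def rank_by_cost(user_site, machines):
    """Order machines so that cheaper resources are tried first."""
    priced = [(unit_cost(user_site, site, base), name)
              for name, site, base in machines]
    return [name for cost, name in sorted(priced)]


print(rank_by_cost("monash", machines))
```

With these numbers a Monash user is steered to the local cluster first, even though one remote machine has a lower base cost.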

Cost Model ...

Non-uniform costing

Encourages use of local resources first

Real accounting system can control machine usage

[Figure: cost matrix of users (User 1 to User 5) against machines (Machine 1 to Machine 5), with example entries 1, 3, 2, 1.]

Security

Uses Globus Security Layer

Generic Security Service API using an implementation of SSL (Secure Sockets Layer)

RSA encryption algorithm employing both public and private keys

X509 certificate consisting of
– duration of the permissions
– the RSA public key
– signature of the Certificate Authority (CA)

Uniform Access

Globus Resource Allocation Manager (GRAM) provides interface to range of schemes
– Fork
– Queue (Easy, LoadLeveler, Condor, LSF)

Multiple pathways to same machine (if supported)

Integrated with Security scheme

Nimrod/G Architecture

[Diagram labels: Nimrod/G Clients; Parametric Engine; Persistent Info.; Schedule Advisor; Resource Discovery; Dispatcher; Grid Directory Services; Grid Middleware Services; GUSTO Test Bed.]

Nimrod/G Interactions

[Diagram labels:
– Root node: Parametric Engine, Scheduler, Dispatcher
– MDS server: resource location
– Gatekeeper node: GRAM server (resource allocation), Queuing System (local)
– Computational node: Job Wrapper, User process
– GASS server: file access]

Additional services used implicitly:
• GSI (authentication & authorization)
• Nexus (communication)

A Nimrod/G Client

[Screenshot labels: Cost; Deadline; Available Machines.]

Nimrod/G Scheduling Algorithm

Find a set of machines (MDS search)

Distribute jobs from root to machines

Establish job consumption rate for each machine

For each machine
– Can we meet deadline?
– If not, then return some jobs to root
– If yes, distribute more jobs to resource

If cannot meet deadline with current resource
– Find additional resources
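One pass of the deadline check described above might look like the following sketch. It is a simplified reading of the slide, not the Nimrod/G implementation: capacity is taken as rate times hours remaining, and the machine names and figures are hypothetical.

```python
def rebalance(assigned, rates, hours_left, root_queue=0):
    """One scheduling pass.

    assigned: jobs currently held by each machine
    rates: observed jobs per hour for each machine
    Returns (new assignment, jobs left at root, need_more_machines).
    """
    assigned = dict(assigned)  # do not modify the caller's dict
    capacity = {m: int(rates[m] * hours_left) for m in assigned}

    # Machines that cannot meet the deadline return their excess to root.
    for m in assigned:
        if assigned[m] > capacity[m]:
            root_queue += assigned[m] - capacity[m]
            assigned[m] = capacity[m]

    # Machines with spare capacity take jobs from root, fastest first.
    for m in sorted(assigned, key=lambda name: -rates[name]):
        take = min(capacity[m] - assigned[m], root_queue)
        assigned[m] += take
        root_queue -= take

    # Jobs still at root mean the current machines are not enough.
    return assigned, root_queue, root_queue > 0


rates = {"a": 4.0, "b": 1.0}
print(rebalance({"a": 10, "b": 10}, rates, hours_left=5))
print(rebalance({"a": 10, "b": 10}, rates, hours_left=2))
```

In the first call the slow machine hands five jobs back and the fast one absorbs them; in the second, ten jobs remain at root, which is the signal to look for additional resources.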

Nimrod/G Scheduling algorithm ...

[Flowchart: Locate Machines → Distribute Jobs → Establish Rates → Meet Deadlines? → Re-distribute Jobs, or Locate more Machines.]


Some experimental results

[Graph 2 - GUSTO Usage for Ionization Chamber Study. x-axis: Time (0 to 20); y-axis: Average No Processors (0 to 80); series: 20 hour deadline, 15 hour deadline, 10 hour deadline.]

[Graph 3 - GUSTO Usage for 20 Hour Deadline. x-axis: Time (0 to 20); y-axis: Average No Processors (0 to 20); series: 5, 10, 15, 20 and 50 CUs (Cost Units).]

[Graph 4 - GUSTO Usage for 15 Hour Deadline. x-axis: Time (0 to 20); y-axis: Average No Processors (0 to 20); series: 5, 10, 15, 20 and 50 CUs (Cost Units).]

[Graph 5 - GUSTO Usage for 10 Hour Deadline. x-axis: Time (0 to 20); y-axis: No Processes (0 to 35); series: 5, 10, 15, 20 and 50 CUs (Cost Units).]

Optimal Design using computation - Nimrod/O

Clustor allows exploration of design scenarios
– Search by enumeration

Search for local/global minima based on objective function
– How do I minimise the cost of this design?
– How do I maximise the life of this object?

Objective function evaluated by computational model
– Computationally expensive

Driven by applications

Application Drivers

Complex industrial design problems
– Air quality
– Antenna Design
– Business Simulation
– Mechanical Optimisation

Cost function minimization

Continuous functions - gradient descent

Quasi-Newton BFGS algorithm
– find gradient using finite difference approximation
– line search using bound constrained, parallel method
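The finite difference step lends itself to parallel execution because the n + 1 function evaluations are independent of one another. The sketch below shows only that step (no BFGS update or line search), uses a thread pool as a stand-in for a job dispatcher, and treats `objective` as a hypothetical replacement for an expensive model run.

```python
from concurrent.futures import ThreadPoolExecutor


def objective(x):
    """Stand-in for an expensive model run (hypothetical quadratic)."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


def finite_difference_gradient(f, x, h=1e-6, workers=4):
    """Forward-difference gradient of f at x.

    The base point and the n shifted points are evaluated as
    independent jobs, so they can run concurrently.
    """
    points = [list(x)]
    for i in range(len(x)):
        shifted = list(x)
        shifted[i] += h
        points.append(shifted)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        values = list(pool.map(f, points))

    base = values[0]
    return [(v - base) / h for v in values[1:]]


grad = finite_difference_gradient(objective, [0.0, 0.0])
print(grad)  # close to the analytic gradient (-2, 4) at the origin
```

For an n-dimensional problem this keeps up to n + 1 processors busy per gradient, which is why the approach suits a cluster or grid dispatcher.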

Implementation

Master - slave parallelization

Gradient-determination & line-searching
– tasks queued via IBM LoadLeveler
– (adapt to number of CPUs allocated by the Resource Manager)

Interfaced to existing dispatchers
– Clustor
– Nimrod/G

Architecture

[Diagram labels: Meta-heuristic Search; BFGS; Clustor Plan File; Function Evaluations; Jobs; Clustor Dispatcher; Supercomputer or Cluster Pool.]

Ongoing research

Increased parallelism
– Multi-start for better coverage
– High dimensioned problems
– Addition of other search algorithms
  – Simplex algorithm

Mixed integer problems
– BFGS modified to support mixed integer
– Mixed search/enumeration
– Meta-heuristic based search
  – Adaptive Simulated Annealing (ASA)

Further Information

Nimrod www.csse.monash.edu.au/~davida/nimrod.html

DSTC www.dstc.edu.au

Globus www.globus.org

Activetools www.activetools.com

Our Cluster hathor.csse.monash.edu.au