HPGC 2006 Workshop on High-Performance Grid Computing

HPGC 2006 IPDPS Rhodes Island, Greece, 29.4.2006

HPGC 2006 Workshop on High-Performance Grid Computingat IPDPC 2006 Rhodes Island, Greece, April 25 – 29, 2006

Major HPC Grid Projects

From Grid Testbeds to Sustainable

High-Performance Grid Infrastructures

Wolfgang Gentzsch, D-Grid, RENCI, GGF GFSG, e-IRG

[email protected]

Thanks to: Eric Aubanel, Virendra Bhavsar, Michael Frumkin, Rob F. Van der Wijngaart

mailto:[email protected]


HPGC 2006 Workshop on High-Performance Grid Computingat IPDPC 2006 Rhodes Island, Greece, April 25 – 29, 2006

Major HPC Grid Projects

From Grid Testbeds to Sustainable

High-Performance Grid Infrastructures

Wolfgang Gentzsch, D-Grid, RENCI, GGF GFSG, e-IRG

[email protected]

Thanks to: Eric Aubanel, Virendra Bhavsar, Michael Frumkin, Rob F. Van der Wijngaart

and INTEL

mailto:[email protected]


Focus

… on HPC capabilities of grids

… on sustainable grid infrastructures

… selected six major HPC grid projects:

UK e-Science, US TeraGrid, NAREGI Japan,

EGEE and DEISA Europe, D-Grid Germany

… and I apologize for not mentioning

Your favorite grid project, but…


Too Many Major Grids to mention them all:


UK e-Science Gridstarted in early 2001

$400 Mio

Cambridge

Newcastle

Edinburgh

Oxford

Glasgow

Manchester

Cardiff

SouthamptonLondon

Belfast

DL

RAL Hinxton

Application independent

6Neil Geddes

CCLRC e-Science

NGS Overview:User view

• Resources– 4 Core clusters – UK’s National HPC services– A range of partner contributions

• Access– Support UK academic researchers– Light weight peer review for limited “free” resources

• Central help desk – www.grid-support.ac.uk

7Neil Geddes

CCLRC e-Science

NGS Overview:Oganisational view

• Management– GOSC Board

• Strategic direction– Technical Board

• Technical coordination and policy

• Grid Operations Support Centre– Manages the NGS– Operates the UK CA + over 30 RA’s– Operates central helpdesk– Policies and procedures– Manage and monitor partners

Number of Registered NGS Users

0

50

100

150

200

250

300

14 January2004

23 April2004

01 August2004

09November

2004

17February

2005

28 May2005

05September

2005

14December

2005

Date

Nu

mb

er o

f U

sers

NGS UserRegistrations

Linear (NGS UserRegistrations)

NGS UseUsage Statistics (Total Hours for all 4 Core Nodes)

0

50000

100000

150000

200000

250000

12 34 56 789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100

10110

210

310

410

510

610

710

810

911

011

111

211

311

411

511

611

711

811

912

012

112

212

312

412

512

612

712

812

913

013

113

213

313

413

513

613

7

Users (Anonymous)

Hours User DN

Total

0

5

10

15

20

25

30

35

40

45

50

O=

univ

ersi

teit-

utre

cht

OU

=B

BS

RC

OU

=B

irmin

gha

m

OU

=B

risto

l

OU

=C

ambr

idg

e

OU

=C

ardi

ff

OU

=C

LRC

OU

=C

PP

M

OU

=D

LS

OU

=D

MP

HB

OU

=E

dinb

urgh

OU

=G

lasg

ow

OU

=Im

peria

l

OU

=La

ncas

ter

OU

=Le

eds

OU

=Li

verp

ool

OU

=M

anch

est

er

OU

=N

ewca

stle

OU

=N

ottin

gha

m

OU

=O

AS

IS

OU

=O

xfor

d

OU

=P

orts

mo

uth

OU

=Q

UB

OU

=Q

uee

nMa

ryLo

ndo

n

OU

=R

eadi

ng

OU

=S

heff

ield

OU

=S

outh

amp

ton

OU

=U

CL

OU

=W

arw

ick

OU

=W

estm

inst

er

OU

=Y

ork

Count of OU=

OU=

Total

0

20

40

60

80

100

120

140

160

bbsrc cclrc epsrc nerc pparc AHRC mrc esrc

Count of "RC"

"RC"

Files stored

Users by institution

CPU time by user

Users by discipline

biol

ogy

Larg

e fa

cilit

ies

Eng

. +

Phy

s. S

ci

Env

. S

ci

PP

+ A

stro

nom

y

Med

icin

e

Hum

ani

ties

Soc

iolo

gy

Over 320 users

9Neil Geddes

CCLRC e-Science

NGS Development

• Core Node refresh• Expand partnership

– HPC– Campus Grids– Data Centres– Digital Repositories – Experimental Facilities

• Baseline services– Aim to map user requirements onto standard

solutions– Support convergence/interoperability

• Move further towards project (VO) support– Support collaborative projects

• Mixed economy– Core resources– Shared resources– Project/project/contract specific resources

Baseline Services

Storage Element

Basic File Transfer

Reliable File Transfer

Catalogue Services

Data Management tools

Compute Element

Workload Management

VO specific services

VO Membership Services

DataBase Services

Posix-like I/O

Application Software Installation Tools

Job Monitoring

Reliable Messaging

Information System


The Architecture of Gateway Services

The Users Desktop

SecuritySecurity Data ManagementService

Data ManagementService

AccountingService

AccountingService

Notification ServiceNotification Service

PolicyPolicy Administration& Monitoring

Administration& Monitoring

Grid OrchestrationGrid OrchestrationResource

Allocation

ResourceAllocation

Reservations And Scheduling

Reservations And Scheduling

TeraGrid Gateway Services

Web Services Resource Framework – Web Services Notification

Grid Portal Server

Grid Portal Server

Physical Resource Layer

Core Grid Services

Proxy CertificateServer / vault

Proxy CertificateServer / vault

Application EventsApplication EventsResource BrokerResource Broker

User MetadataCatalog

User MetadataCatalog

Replica MgmtReplica Mgmt

ApplicationWorkflow

ApplicationWorkflow

App. Resourcecatalogs

App. Resourcecatalogs

ApplicationDeployment

ApplicationDeployment

Courtesy Jay Boisseau

11Charlie Catlett ([email protected])

0

10

20

30

40

50

60

70

Oct-04

Nov-04

Dec-04

Jan-05

Feb-05

Mar-05

Apr-05

May-05

Jun-05

Jul-05

Aug-05

Sep-05

Oct-05

Nov-05

Mont

hly U

sage

(Mill

ions

of N

U)

Total Monthly Usage

Monthly Roaming Usage

Annual Growth ~33%

TeraGrid Use

Physics15%

Materials5%

Biology26%

Math1%

GEO4%

Social Science

s

Astronomy9%

CS/Eng7%

ENG12%

Chem21% 600 users

1600 users

12Charlie Catlett ([email protected])

Delivering User Priorities in 2005

Remote File Read/WriteHigh-Performance File TransferCoupled Applications, Co-scheduling

Advanced Reservations

Grid Portal ToolkitsGrid Workflow ToolsBatch MetaschedulingGlobal File SystemClient-Side Computing ToolsBatch Scheduled Parameter Sweep Tools

Partners in Need(breadth of need)

Overall Score(depth of need)

DataGrid ComputingScience Gateways

Results of in-depth discussions with 16

TeraGrid user teams during first annual

user survey (August 2004).

CapabilityType


National Research Grid Infrastructure (NAREGI) 2003-2007

• Petascale Grid Infrastructure R&D for Future Deployment– $45 mil (US) + $16 mil x 5 (2003-2007) = $125 mil total– PL: Ken Miura (FujitsuNII)

• Sekiguchi(AIST), Matsuoka(Titech), Shimojo(Osaka-U), Aoyagi (Kyushu-U)…– Participation by multiple (>= 3) vendors, Fujitsu, NEC, Hitachi, NTT, etc.– NOT AN ACADEMIC PROJECT, ~100FTEs– Follow and contribute to GGF Standardization, esp. OGSA

AIST

Grid Middleware Grid Middleware R&DR&D

SuperSINETSuperSINET

Grid R&D Grid R&D Infrastr.Infrastr.

15 TF => 100TF15 TF => 100TF

National AAA National AAA Infr.Infr.

““NanoGrid”NanoGrid”IMS ~10TFIMS ~10TF

(BioGrid(BioGridRIKEN)RIKEN)

OtherOtherInst.Inst.

National ResearchNational ResearchGrid Middleware R&DGrid Middleware R&D

NanotechNanotechGrid AppsGrid Apps

(Biotech(BiotechGrid Apps)Grid Apps)

(Other(OtherApps)Apps)

Titech

Fujitsu

NECOsaka-U

U-Kyushu Hitachi

Focused “Grand Challenge” Grid Apps Areas

IMS

14

NAREGI Software Stack (Beta Ver. 2006)

Computing Resources and Virtual Organizations

NII IMS Research Organizations

Major University Computing Centers

（（ WSRF (GT4+Fujitsu WP1) + GT4 and other services)WSRF (GT4+Fujitsu WP1) + GT4 and other services)

SuperSINET

Grid-Enabled Nano-Applications (WP6)

Grid PSEGrid Programming (WP2)

-Grid RPC -Grid MPI

Grid Visualization

Grid VM (WP1)

Packag

ing

DistributedInformation Service

(CIM)

Grid Workflow (WFML (Unicore+ WF))

Super Scheduler

Grid Security and High-Performance Grid Networking (WP5)

Data (W

P4)

WP3

WP1WP1

15

GridMPI• MPI applications run on the Grid environment• Metropolitan area, high-bandwidth environment: 10 Gpbs,

500 miles (smaller than 10ms one-way latency)– Parallel Computation

• Larger than metropolitan area– MPI-IO

Wide-areaNetwork

Single (monolithic) MPI applicationover the Grid environment

computing resourcesite A

computing resourcesite A

computing resourcesite B

computing resourcesite B

16

Enabling Grids for E-sciencE

INFSO-RI-508833

EGEE Infrastructure

Scale> 180 sites in 39 countries

~ 20 000 CPUs

> 5 PB storage

> 10 000 concurrent jobs per day

> 60 Virtual Organisations

Country participating

in EGEE

17


INFSO-RI-508833

The EGEE project

• Objectives– Large-scale, production-quality infrastructure for e-Science

leveraging national and regional grid activities worldwide consistent, robust and secure

– improving and maintaining the middleware– attracting new resources and users from industry as well as science

• EGEE – 1st April 2004 – 31 March 2006– 71 leading institutions in 27 countries,

federated in regional Grids

• EGEE-II– Proposed start 1 April 2006 (for 2 years)– Expanded consortium

> 90 partners in 32 countries (also non-European partners)

Related projects, incl. • BalticGrid• SEE-GRID• EUMedGrid• EUChinaGrid• EELA

18


INFSO-RI-508833

Applications on EGEE

• More than 20 applications from 7 domains– High Energy Physics

4 LHC experiments (ALICE, ATLAS, CMS, LHCb) BaBar, CDF, DØ, ZEUS

– Biomedicine Bioinformatics (Drug Discovery, GPS@, Xmipp_MLrefine, etc.) Medical imaging (GATE, CDSS, gPTM3D, SiMRI 3D, etc.)

– Earth Sciences Earth Observation, Solid Earth Physics,

Hydrology, Climate

– Computational Chemistry– Astronomy

MAGIC Planck

– Geo-Physics EGEODE

– Financial Simulation E-GRID

Ano

ther

8 a

pplic

atio

ns fr

om 4

dom

ains

are

in e

valu

atio

n st

age

19


INFSO-RI-508833

Steps for “Grid-enabling” applications II

• Tools to easily access Grid resources through high level Grid middleware (gLite) – VO management (VOMS etc.)– Workload management– Data management– Information and monitoring

• Application can– interface directly to gLite

or – use higher level services such as portals, application specific

workflow systems etc.

20


INFSO-RI-508833

EGEE Performance Measurements

• Information about resources (static & dynamic)– Computing: machine properties (CPUs, memory architecture, ..),

platform properties (OS, compiler, other software, …), load– Data: storage location, access properties, load– Network: bandwidth, load

• Information about applications– Static: computing and data requirements to reduce search space– Dynamic: changes in computing and data requirements (might need re-

scheduling)

Plus• Information about Grid services (static & dynamic)

– Which services available Status Capabilities

21


INFSO-RI-508833

Sustainability: Beyond EGEE-II

• Need to prepare for permanent Grid infrastructure– Maintain Europe’s leading position in global science Grids– Ensure a reliable and adaptive support for all sciences– Independent of project funding cycles– Modelled on success of GÉANT

Infrastructure managed centrally in collaboration with national bodies

Permanent Grid Infrastructure

e-Infrastructures Reflection Group:

e-IRG Mission:

… to support on political, advisory and monitoring level,

the creation of a policy and administrative framework

for the easy and cost-effective shared use of electronic resources in Europe

(focusing on Grid-computing, data storage, and networking resources)

across technological, administrative and national domains.

DEISA PerspectivesTowards cooperative extreme computing in Europe

Victor Alessandrini

IDRIS - CNRS

[email protected]

Fourth EGEE ConferencePise, October 23 - 28, 2005

V. Alessandrini, IDRIS-CNRS 24

The DEISA Supercomputing Environment(21.900 processors and 145 Tf in 2006, more than 190 Tf in 2007)

• IBM AIX Super-cluster

– FZJ-Julich, 1312 processors, 8,9 teraflops peak

– RZG – Garching, 748 processors, 3,8 teraflops peak

– IDRIS, 1024 processors, 6.7 teraflops peak

– CINECA, 512 processors, 2,6 teraflops peak

– CSC, 512 processors, 2,6 teraflops peak

– ECMWF, 2 systems of 2276 processors each, 33 teraflops peak

– HPCx, 1600 processors, 12 teraflops peak

• BSC, IBM PowerPC Linux system (MareNostrum) 4864 processeurs, 40 teraflops peak

• SARA, SGI ALTIX Linux system, 1024 processors, 7 teraflops peak

• LRZ, Linux cluster (2.7 teraflops) moving to SGI ALTIX system (5120 processors and 33 teraflops peak in 2006, 70 teraflops peak in 2007)

• HLRS, NEC SX8 vector system, 646 processors, 12,7 teraflops peak.



DEISA objectives

• To enable Europe’s terascale science by the integration of Europe’s most powerful supercomputing systems.

• Enabling scientific discovery across a broad spectrum of science and technology is the only criterion for success

• DEISA is a European Supercomputing Service built on top of existing national services.

• Integration of national facilities and services, together with innovative operational models

• Main focus is HPC and Extreme Computing applications that cannot by supported by the isolated national services

• Service providing model is the transnational extension of national HPC centers: – Operations, – User Support and Applications Enabling, – Network Deployment and Operation, – Middleware services.



About HPC

• Dealing with large complex systems requires exceptional computational resources. For algorithmic reasons, resources grow much faster than the systems size and complexity.

• Dealing with huge datasets, involving large files. Typical datasets are several PBytes.

• Little usage of commercial or public domain packages. Most applications are corporate codes incorporating specialized know how. Specialized user support is important.

• Codes are fine tuned and targeted for a relatively small number of well identified.computing platforms. They are extremely sensitive to the production environment.

• Main requirement for high performance is bandwidth (processor to memory, processor to processor, node to node, system to system).



HPC and Grid Computing

• Problem: the speed of light is not big enough• Finite signal propagation speed boosts message passing latencies in a

WAN from a few microseconds to tens of milliseconds (if A is in Paris and B in Helsinki)

• If A and B are two halves of a tightly coupled complex system, communications are frequent and the enhanced latencies will kill performance.

• Grid computing works best for embarrassingly parallel applications, or coupled software modules with limited communications.

• Example: A is an ocean code, and B an atmospheric code. There is no bulk interaction.

• Large, tightly coupled parallel applications should be run in a single platform. This is why we still need high end supercomputers.

• DEISA implements this requirement by rerouting jobs and balancing the computational workload at a European scale.


Applications for Grids• Single-CPU Jobs: jobmix, many users, many serial applications,

suitable for grid (e.g in universities and research centers)

• Array Jobs: 100s/1000s of jobs, one user, one serial application, varying input parameters, suitable for grid (e.g. parameter studies in Optimization, CAE, Genomics, Finance)

• Massively Parallel Jobs, loosely coupled: one job, one user, one parallel application, no/low communication, scalable, fine-tune for grid (time-explicit algorithms, film rendering, pattern recognition)

• Parallel Jobs, tightly coupled: one job, one user, one parallel application, high interprocs communication, not suitable for distribution over the grid, but for parallel system in the grid (time-implicit algorithms, direct solvers, large linear algebra equation systems)


Objectives of e-Science Initiative

Building one Grid Infrastructure in Germany Combine existing German grid activities

Development of e-science services for the research community Science Service Grid: „Services for Scientists“

Important: Sustainability Production grid infrastructure after the funding period Integration of new grid communities (2. generation) Evaluation of new business models for grid services

German D-Grid ProjectPart of 100 Mio Euro e-Science in Germany


e-Science Projects

Generic Grid Middleware and Grid Services

Integration Project

As

tro

-Gri

d

C3

-Gri

d

HE

P-G

rid

IN-G

rid

Me

diG

rid

ON

TO

VE

RS

E

WIK

ING

ER

WIN

-EM

Te

xtg

rid

VIOLA eSciDoc

. . .

D-Grid Knowledge Management

Im W

iss

en

sne

tz


DGI D-Grid Middleware Infrastructure

Nutzer

ApplicationDevelopment

and User Access

GAT API

Data/Software

Resourcesin D-Grid

High-levelGrid

Services

Basic Grid Services

DistributedData Archive

User

NetworkInfrastructur

LCG/gLite

Globus 4.0.1

AccountingBilling

User/VO-Mngt

SchedulingWorkflow Management

Data management

Security

Plug-In

UNICORE

DistributedCompute Resources

GridSphere

Monitoring


Key Characteristics of D-Grid

Generic Grid infrastructure for German research communities

Focus on Sciences and Scientists, not industry

Strong influence of international projects: EGEE, Deisa, CrossGrid, CoreGrid, GridLab, GridCoord, UniGrids, NextGrid, …

Application-driven (80% of funding), not infrastructure-driven

Focus on implementation, not research

Phase 1 & 2: 50 MEuro, 100 research organizations


Conclusion:

moving towardsSustainable Grid Infrastructures

OR

Why Grids are here to stay !


• Resource Utilization: increase from 20% to 80+%• Productivity: more work done in shorter time • Agility: flexible actions and re-actions • On Demand: get resources, when you need them• Easy Access: transparent, remote, secure• Sharing: enable collaboration over the network• Failover: migrate/restart applications automatically• Resource Virtualization: access compute services, not servers• Heterogeneity: platforms, OSs, devices, software• Virtual Organizations: build & dismantle on the fly

Reason #1: Benefits


Reason #2: StandardsThe Global Grid Forum

• Community-driven set of working groups that are developing standards and best practices for distributed computing efforts

• Three primary functions: community, standards, and operations

• Standards Areas: Infrastructure, Data, Compute, Architecture, Applications, Management, Security, and Liaison

• Community Areas: Research Applications, Industry Applications, Grid Operations, Technology Innovations, and Major Grid Projects

• Community Advisory Board represents the different communities and provides input and feedback to GGF


Reason #3: Industry EGA, Enterprise Grid Alliance

• Industry-driven consortium to implement standards in industry products and make them interoperable

• Founding members: EMC, Fujitsu Siemens Computers, HP, NEC, Network Appliance, Oracle and Sun, plus 20+ Associate Members

• May 11, 2005: Enterprise Grid Reference Model v1.0


Reason #3: Industry EGA, Enterprise Grid Alliance

• Industry-driven consortium to implement standards in industry products and make them interoperable

• Founding members: EMC, Fujitsu Siemens Computers, HP, NEC, Network Appliance, Oracle and Sun, plus 20+ Associate Members

• May 11, 2005: Enterprise Grid Reference Model v1.0

Feb06: GGF & EGF signed a letter of intent to merge. A joint team is planning the transition, expected to be complete this summer


Reason #4: OGSAONE Open Grid Services Architecture

OGSA

Web ServicesGrid Technologies

OGSA Open Grid Service Architecture

Integrates grid technologies with Web Services (OGSA => WS-RF)

Defines the key components of the grid

OGSA enables the integration of services and resources across distributed, heterogeneous, dynamic, virtual organizations – whether within a single enterprise

or extending to external resource-sharing and service-provider relationships.”


Reason #5: Quasi-Standard Tools Example: The Globus Toolkit

2. discover resource, MDS

3. submit job, GRAM

4. transfer data, GridFTP

1. secure environment, GSI

• Globus Toolkit provides four major functions for building grids

Courtesy Gridwise Technologies


• Seamless, secure, intuitive access to distributed resources & data• Available as Open Source • Features: intuitive GUI with single sign-on, X.509 certificates for

AA, workflow engine for multi-site, multi-step workflows, job monitoring, application support, secure data transfer, resource management, and more

• In production

Courtesy: Achim Streit, FZJ

. . . . and


Glo

bus

2.4

UN

ICO

RE

Globus 2

UNICORE

TSIGridFTP Client

Client

NJSUUDB

Uspace

IDB

GRAM Client

GRAM Job-Manager GridFTP Server

RMS

GRAM Gatekeeper

Gateway

MDS

Workflow Engine

FileTransfer

UserManagement

(AAA)

MonitoringResource

ManagementApplication

Support

WS-RF WS-RFWS-RF

WS-RF WS-RFWS-RF

Network Job Network Job SupervisorSupervisor

Gateway + Service RegistryWS-RF

Client PortalCommand

LineWS-RF WS-RFWS-RF

WS-Resource based Resource Management Framework for dynamic resource information and resource negotiation

Gateway

Courtesy: Achim Streit, FZJ


Reason #6: Global Grid Community


#7: Projects/Initiatives Testbeds Companies Altair Avaki Axceleon Cassatt Datasynapse Egenera Entropia eXludus GridFrastructure GridIron GridSystems Gridwise GridXpert HP Utility Data Center IBM Grid Toolbox Kontiki Metalogic Noemix Oracle 10g Parabon Platform Popular Power Powerllel/Aspeed Proxima Softricity Sun N1 TurboWorx United Devices Univa . . .

ActiveGrid BIRN Condor-G Deisa Dame EGA EnterTheGrid GGF Globus Globus Alliance GridBus GridLab GridPortal GRIDtoday GriPhyN I-WAY Knowledge Grid Legion MyGrid NMI OGCE OGSA OMII PPDG Semantic Grid TheGridReport UK eScience Unicore . . .

CO Grid Compute-against-Cancer D-Grid DeskGrid DOE Science Grid EEGE EuroGrid European DataGrid FightAIDS@home Folding@home GRIP NASA IPG NC BioGrid NC Startup Grid NC Statewide Grid NEESgrid NextGrid Nimrod Ninf NRC-BioGrid OpenMolGrid OptIPuter Progress SETI@home TeraGrid UniGrids Virginia Grid WestGrid White Rose Grid . . .

44

Information Society and Media Directorate-General – European CommissionUnit Grid Technologies

GGF16 – Athens, 15 February 2006

#8: FP6 Grid Technologies Projects#8: FP6 Grid Technologies Projects

DataminingGrid

OntoGrid

InteliGridK-WF GridCoreGRIDsix virtual laboratories

UniGrids HPC4U

Provenance

GridCoord Grid@Asia

NextGRIDservice

architecture

Akogrimomobile

services

SIMDATindustrial

simulations

data, knowledge, data, knowledge, semantics, miningsemantics, mining

KnowArc Chemomen tum

A-Ware Sorma

platforms, user platforms, user environmentsenvironments

Specific support action Integrated project Network of excellence Specific targeted research project

g-Eclipse

Gredia

GridComp

QosCosGrid

Grid4all

AssessGridGridTrust

trust, securitytrust, security

Grid services, Grid services, business modelsbusiness models

ArguGrid Edutain@ Grid

GridEcon

Nessi-GridChallengers Degree

BREINagents &

semanticsXtreemOS

Linux based Grid

operating system

supporting the NESSI ETP & Grid communitysupporting the NESSI ETP & Grid community

BeinGridbusiness

experiments

EU Funding: 124 M€EU Funding: 124 M€ Call 5 start: Summer 2006Call 5 start: Summer 2006


Reason #9: Enterprise Grids

Servers,

Blades,

& VIZ

Workstations

Grid Manager

Linux Racks

Optional Control Network (Gbit-E)

Data Network (Gbit-E)

HA NFSScalable QFS/NFS

NAS/NFS

Myrinet

Myrinet Myrinet

Sun Fire Link

Gbit-E switch Gbit-E switch

V880 QFS/NFS Server V880 QFS/NFS Server

FC Switch

V240 / V880 NFSV240 / V880 NFS

Gbit-E switch

Simple NFS

V240 / V880 NFS


V240 / V880 NFS

SunRay Access

Browser Accessvia GEP

Workstation Access


Enterprise Grid Reference Architecture

Servers,

Blades,

& VIZ

Workstations

Grid Manager

Linux Racks

Optional Control Network (Gbit-E)

Data Network (Gbit-E)

HA NFSScalable QFS/NFS

NAS/NFS

Myrinet

Myrinet Myrinet

Sun Fire Link


V880 QFS/NFS Server V880 QFS/NFS Server

FC Switch

V240 / V880 NFSV240 / V880 NFS

Gbit-E switch

Simple NFS

V240 / V880 NFS


V240 / V880 NFS

SunRay Access

Browser Accessvia GEP

Workstation Access

Access

Compute

Data


1000s of Enterprise Grids in Industry

• Life SciencesStartup and cost efficient

Custom research or limited use applications

Multi-day application runs (BLAST)

Exponential Combinations

Limited administrative staff

Complementary techniques

● Electronic DesignTime to Market

Fastest platforms, largest Grids

License Management

Well established application suite

Large legacy investment

Platform Ownership issues

● Financial ServicesMarket simulations

Time IS Money

Proprietary applications

Multiple Platforms

Multiple scenario execution

Need instant results & analysis tools

● High Performance Computing

Parallel Reservoir Simulations

Geophysical Ray Tracing

Custom in-house codes

Large scale, multi-platform execution


Reason #10 : Grid Service Providers Example: BT

• Inside data center, within Firewall• Virtual use of own IT assets• The GRID virtualiser engine inside

Firewall:– Opens up under-used ICT assets– improves TCO, ROI and Apps

performance

BUT• Intra-enterprise GRID is self limiting

– Pool of virtualised assets is restricted by firewall

– Does not support Inter-Enterprise usage

• BT is focussing on managed Grid solution

WANS LANS

ENTERPRISE

Pre-GRIDIT asset usage 10-15 %

WANSLANS

ENTERPRISE

Post-GRIDIT asset usage 70-75 %

GRID EngineVirtualised

assets

Courtesy: Piet Bel, BT


BT’s Virtual Private Grid ( VPG )

Virtualised IT assets

GRID Engine

WANS LANS

ENTERPRISE

WANS LANS

ENTERPRISE A

GRID ENGINEBT NETWORK

Courtesy: Piet Bel, BT


Reason #11: There will be a Market for Grids


• Today, there are 100s of important grid projects around the world • GGF identifies about 15 research projects which have major impact• Most research grids focus on HPC and collaboration, most industry

grids focus on utilization and automation• Many grids are driven by user / application needs, few grid projects are

driven by infrastructure research • Few projects focus on performance / benchmarks where performance

is mostly seen at the job / computation / application level• Need for metrics and measurements that help us understand grids• In a grid, application performance has 3 major areas of concern:

system capabilities, network, and software infrastructure• Evaluating performance in a grid is different from classic benchmarking,

because grids are dynamically changing systems incorporating new components.

General Observations on Grid Performance


The Grid Engine

[email protected]

Thank You !

HPGC 2006 Workshop on High-Performance Grid Computing

Documents