HPGC 2006 Workshop on High-Performance Grid Computing at IPDPS 2006, Rhodes Island, Greece, April 25-29, 2006

Major HPC Grid Projects: From Grid Testbeds to Sustainable High-Performance Grid Infrastructures

Wolfgang Gentzsch, D-Grid, RENCI, GGF GFSG, e-IRG ([email protected])
Thanks to: Eric Aubanel, Virendra Bhavsar, Michael Frumkin, Rob F. Van der Wijngaart
• Resources
  – 4 core clusters
  – UK's national HPC services
  – A range of partner contributions
• Access
  – Support UK academic researchers
  – Lightweight peer review for limited "free" resources
• Central help desk: www.grid-support.ac.uk
Neil Geddes
CCLRC e-Science
NGS Overview: Organisational view
• Management
  – GOSC Board: strategic direction
  – Technical Board: technical coordination and policy
• Grid Operations Support Centre
  – Manages the NGS
  – Operates the UK CA + over 30 RAs
  – Operates the central helpdesk
  – Policies and procedures
  – Manages and monitors partners
[Chart: Number of Registered NGS Users vs. date, from 14 January 2004 to 14 December 2005, rising from 0 towards roughly 300, with a linear trend line fitted to the NGS user registrations]
NGS Usage Statistics (Total Hours for all 4 Core Nodes)
• Tools to easily access Grid resources through high-level Grid middleware (gLite):
  – VO management (VOMS etc.)
  – Workload management
  – Data management
  – Information and monitoring
• Applications can
  – interface directly to gLite, or
  – use higher-level services such as portals, application-specific workflow systems, etc.
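Interfacing "directly to gLite" typically means describing the job in JDL, the EDG/gLite Job Description Language, and handing it to the workload management system. A minimal sketch is shown below; the attribute names are standard JDL, but the specific values (executable, VO name, requirement threshold) are invented for illustration:

```
[
  Type = "Job";
  Executable = "/bin/echo";
  Arguments = "hello grid";
  StdOutput = "std.out";
  StdError = "std.err";
  OutputSandbox = {"std.out", "std.err"};
  VirtualOrganisation = "biomed";
  Requirements = other.GlueCEPolicyMaxCPUTime > 60;
]
```

The workload management system matches the Requirements expression against the information system and selects a computing element, which is exactly the role of the information and monitoring services listed above.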
Enabling Grids for E-sciencE
INFSO-RI-508833
EGEE Performance Measurements
• Information about resources (static & dynamic)
  – Computing: machine properties (CPUs, memory architecture, ...)
• Information about applications
  – Static: computing and data requirements, to reduce the search space
  – Dynamic: changes in computing and data requirements (might need re-scheduling)
• Plus: information about Grid services (static & dynamic)
  – Which services are available, their status, and their capabilities
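The combination of static and dynamic information above feeds a matchmaking step. The following is an illustrative sketch only; the attribute names are invented for the example and are not a gLite schema:

```python
# Static properties change rarely (hardware); dynamic ones change per query.
resource = {
    "static":  {"cpus": 64, "memory_gb": 128, "arch": "x86_64"},
    "dynamic": {"free_cpus": 12, "load": 0.81},
}
application = {
    "static":  {"min_cpus": 8, "input_gb": 40},   # static needs narrow the search space
    "dynamic": {"cpus_needed_now": 16},           # a change here may force re-scheduling
}

def matches(res, app):
    """Basic admission test: does this resource satisfy the application right now?"""
    return (res["static"]["cpus"] >= app["static"]["min_cpus"]
            and res["dynamic"]["free_cpus"] >= app["dynamic"]["cpus_needed_now"])

print(matches(resource, application))  # False: only 12 of the needed 16 CPUs are free
```

A real broker evaluates such predicates over many resources and then ranks the survivors; the point here is only the static/dynamic split the slide describes.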
Sustainability: Beyond EGEE-II
• Need to prepare for a permanent Grid infrastructure
  – Maintain Europe's leading position in global science Grids
  – Ensure reliable and adaptive support for all sciences
  – Independent of project funding cycles
  – Modelled on the success of GÉANT
• Infrastructure managed centrally, in collaboration with national bodies
Permanent Grid Infrastructure
e-Infrastructures Reflection Group (e-IRG) Mission:
... to support, on a political, advisory and monitoring level, the creation of a policy and administrative framework for the easy and cost-effective shared use of electronic resources in Europe (focusing on Grid computing, data storage, and networking resources) across technological, administrative and national domains.
DEISA Perspectives: Towards Cooperative Extreme Computing in Europe
• To enable Europe's terascale science by integrating Europe's most powerful supercomputing systems
• Enabling scientific discovery across a broad spectrum of science and technology is the only criterion for success
• DEISA is a European supercomputing service built on top of existing national services
• Integration of national facilities and services, together with innovative operational models
• Main focus is HPC and Extreme Computing applications that cannot be supported by the isolated national services
• The service-providing model is the transnational extension of national HPC centers:
  – Operations
  – User support and applications enabling
  – Network deployment and operation
  – Middleware services
Fourth EGEE Conference, Pisa, October 23-28, 2005
V. Alessandrini, IDRIS-CNRS
About HPC
• Dealing with large, complex systems requires exceptional computational resources. For algorithmic reasons, resource needs grow much faster than system size and complexity.
• Dealing with huge datasets, involving large files. Typical datasets are several PBytes.
• Little usage of commercial or public-domain packages. Most applications are corporate codes incorporating specialized know-how. Specialized user support is important.
• Codes are fine-tuned and targeted for a relatively small number of well-identified computing platforms. They are extremely sensitive to the production environment.
• The main requirement for high performance is bandwidth (processor to memory, processor to processor, node to node, system to system).
HPC and Grid Computing
• Problem: the speed of light is not fast enough
• Finite signal propagation speed boosts message-passing latencies in a WAN from a few microseconds to tens of milliseconds (if A is in Paris and B in Helsinki)
• Grid computing works best for embarrassingly parallel applications, or coupled software modules with limited communications.
• Example: A is an ocean code, and B an atmospheric code. There is no bulk interaction.
• Large, tightly coupled parallel applications should be run in a single platform. This is why we still need high end supercomputers.
• DEISA implements this requirement by rerouting jobs and balancing the computational workload at a European scale.
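The Paris-Helsinki figure above is easy to verify with a back-of-the-envelope calculation. The inputs are assumptions, not slide data: signal speed in optical fibre is roughly 200,000 km/s (about two thirds of c), and Paris-Helsinki is about 1,900 km great-circle (real fibre routes are longer):

```python
# Propagation delay alone, ignoring switching and protocol overheads.
FIBRE_KM_PER_S = 200_000   # ~2/3 of the speed of light, in glass
distance_km = 1_900        # approximate Paris-Helsinki great-circle distance

one_way_ms = distance_km / FIBRE_KM_PER_S * 1_000
round_trip_ms = 2 * one_way_ms
cluster_latency_ms = 0.005  # a few microseconds, typical tightly coupled interconnect

print(f"one-way propagation delay: {one_way_ms:.1f} ms")    # 9.5 ms
print(f"round trip:                {round_trip_ms:.1f} ms") # 19.0 ms
print(f"WAN vs. cluster latency:   {round_trip_ms / cluster_latency_ms:.0f}x")
```

Even this lower bound puts WAN latency three to four orders of magnitude above a cluster interconnect, which is the "tens of milliseconds" the slide refers to.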
Applications for Grids
• Single-CPU Jobs: job mix, many users, many serial applications; suitable for grid (e.g. in universities and research centers)
• Array Jobs: 100s/1000s of jobs, one user, one serial application, varying input parameters; suitable for grid (e.g. parameter studies in optimization, CAE, genomics, finance)
• Massively Parallel Jobs, loosely coupled: one job, one user, one parallel application, no/low communication, scalable; fine-tune for grid (time-explicit algorithms, film rendering, pattern recognition)
• Parallel Jobs, tightly coupled: one job, one user, one parallel application, high interprocess communication; not suitable for distribution over the grid, but for a parallel system in the grid (time-implicit algorithms, direct solvers, large linear-algebra equation systems)
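The taxonomy above can be sketched with a toy efficiency model. This is not from the slides: it assumes each time step does t_comp seconds of computation and exchanges n_msgs messages, each paying `latency` seconds, and ignores bandwidth entirely:

```python
# Fraction of time spent computing rather than waiting for messages.
def step_efficiency(t_comp: float, n_msgs: int, latency: float) -> float:
    return t_comp / (t_comp + n_msgs * latency)

CLUSTER = 5e-6   # ~5 microseconds, tightly coupled interconnect
WAN     = 20e-3  # ~20 milliseconds, wide-area grid link

# Loosely coupled: 1 s of compute per step, a single message exchange
print(f"loose on cluster: {step_efficiency(1.0, 1, CLUSTER):.3f}")
print(f"loose on WAN:     {step_efficiency(1.0, 1, WAN):.3f}")
# Tightly coupled: 10 ms of compute per step, 100 message exchanges
print(f"tight on cluster: {step_efficiency(0.01, 100, CLUSTER):.3f}")
print(f"tight on WAN:     {step_efficiency(0.01, 100, WAN):.3f}")
```

With these assumed figures the loosely coupled job keeps about 98% efficiency even on the WAN, while the tightly coupled job drops from roughly 95% on a cluster to under 1% on the WAN, matching the slide's advice to keep such jobs on a single parallel system.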
Objectives of e-Science Initiative
• Building one Grid infrastructure in Germany: combine existing German grid activities
• Development of e-science services for the research community: Science Service Grid, "Services for Scientists"
• Important: sustainability
  – Production grid infrastructure after the funding period
  – Integration of new grid communities (2nd generation)
  – Evaluation of new business models for grid services
German D-Grid Project: part of the 100 Million Euro e-Science Initiative in Germany
e-Science Projects

[Diagram: the D-Grid e-Science projects. An Integration Project provides generic Grid middleware and Grid services; on top of it sit the community projects AstroGrid, C3-Grid, HEP-Grid, InGrid, MediGrid and Textgrid, the knowledge-management projects ONTOVERSE, WIKINGER, WIN-EM and Im Wissensnetz (D-Grid Knowledge Management), and related projects such as VIOLA and eSciDoc.]
DGI D-Grid Middleware Infrastructure

[Diagram: users reach the resources in D-Grid through an application-development and user-access layer (GAT API, GridSphere, plug-ins); high-level Grid services provide scheduling and workflow management, monitoring, accounting/billing, user/VO management, data management and security; the basic Grid services are Globus 4.0.1, LCG/gLite and UNICORE; underneath sit the distributed compute resources, the distributed data archive, data/software, and the network infrastructure.]
Key Characteristics of D-Grid
Generic Grid infrastructure for German research communities
Focus on Sciences and Scientists, not industry
Strong influence of international projects: EGEE, DEISA, CrossGrid, CoreGrid, GridLab, GridCoord, UniGrids, NextGrid, …
Application-driven (80% of funding), not infrastructure-driven
Focus on implementation, not research
Phase 1 & 2: 50 MEuro, 100 research organizations
Conclusion: Moving towards Sustainable Grid Infrastructures
OR: Why Grids are here to stay!
Reason #1: Benefits

• Resource Utilization: increase from 20% to 80+%
• Productivity: more work done in shorter time
• Agility: flexible actions and re-actions
• On Demand: get resources when you need them
• Easy Access: transparent, remote, secure
• Sharing: enable collaboration over the network
• Failover: migrate/restart applications automatically
• Resource Virtualization: access compute services, not servers
• Heterogeneity: platforms, OSs, devices, software
• Virtual Organizations: build & dismantle on the fly
Reason #2: Standards. The Global Grid Forum
• Community-driven set of working groups that are developing standards and best practices for distributed computing efforts
• Three primary functions: community, standards, and operations
• Community Areas: Research Applications, Industry Applications, Grid Operations, Technology Innovations, and Major Grid Projects
• Community Advisory Board represents the different communities and provides input and feedback to GGF
Reason #3: Industry. EGA, the Enterprise Grid Alliance
• Industry-driven consortium to implement standards in industry products and make them interoperable
• Founding members: EMC, Fujitsu Siemens Computers, HP, NEC, Network Appliance, Oracle and Sun, plus 20+ Associate Members
• May 11, 2005: Enterprise Grid Reference Model v1.0
Feb 2006: GGF & EGA signed a letter of intent to merge. A joint team is planning the transition, expected to be complete this summer
Reason #4: OGSA, ONE Open Grid Services Architecture

[Diagram: OGSA combines Web Services and Grid technologies]

OGSA, the Open Grid Services Architecture:
• Integrates grid technologies with Web Services (OGSA => WS-RF)
• Defines the key components of the grid
• "OGSA enables the integration of services and resources across distributed, heterogeneous, dynamic, virtual organizations – whether within a single enterprise or extending to external resource-sharing and service-provider relationships."
Reason #5: Quasi-Standard Tools. Example: The Globus Toolkit

• The Globus Toolkit provides four major functions for building grids:
  1. secure environment: GSI
  2. discover resources: MDS
  3. submit jobs: GRAM
  4. transfer data: GridFTP

Courtesy Gridwise Technologies
• Seamless, secure, intuitive access to distributed resources & data
• Available as Open Source
• Features: intuitive GUI with single sign-on, X.509 certificates for AA, workflow engine for multi-site multi-step workflows, job monitoring, application support, secure data transfer, resource management, and more
• In production
Courtesy: Achim Streit, FZJ
. . . and:

[Diagram: Globus 2.4 / UNICORE interoperability, with the UNICORE Client, Gateway, NJS (Network Job Supervisor), UUDB, Uspace, IDB, TSI and GridFTP client on the UNICORE side, and GRAM client, GRAM Gatekeeper, GRAM Job-Manager, GridFTP Server, MDS and RMS on the Globus side]

[Diagram: WS-RF-based architecture, with clients (Portal, Command Line) connecting through a Gateway + Service Registry to WS-RF services: Workflow Engine, File Transfer, User Management (AAA), Monitoring, Resource Management and Application Support]

• WS-Resource-based Resource Management Framework for dynamic resource information and resource negotiation

Courtesy: Achim Streit, FZJ
Reason #6: Global Grid Community
Reason #7: Projects, Initiatives, Testbeds, Companies
Altair, Avaki, Axceleon, Cassatt, Datasynapse, Egenera, Entropia, eXludus, GridFrastructure, GridIron, GridSystems, Gridwise, GridXpert, HP Utility Data Center, IBM Grid Toolbox, Kontiki, Metalogic, Noemix, Oracle 10g, Parabon, Platform, Popular Power, Powerllel/Aspeed, Proxima, Softricity, Sun N1, TurboWorx, United Devices, Univa, . . .
• Inside the data center, within the firewall
• Virtual use of own IT assets
• The GRID virtualiser engine inside the firewall:
  – opens up under-used ICT assets
  – improves TCO, ROI and application performance
BUT
• Intra-enterprise GRID is self-limiting:
  – the pool of virtualised assets is restricted by the firewall
  – does not support inter-enterprise usage
• BT is focussing on a managed Grid solution

[Diagram: an enterprise with WANs and LANs, pre-GRID IT asset usage 10-15%; the same enterprise post-GRID, with a GRID engine and virtualised assets, IT asset usage 70-75%]

Courtesy: Piet Bel, BT
BT's Virtual Private Grid (VPG)

[Diagram: virtualised IT assets in enterprises (WANs, LANs) connected through a GRID engine running in the BT network]

Courtesy: Piet Bel, BT
Reason #11: There will be a Market for Grids
• Today, there are 100s of important grid projects around the world
• GGF identifies about 15 research projects which have major impact
• Most research grids focus on HPC and collaboration; most industry grids focus on utilization and automation
• Many grids are driven by user/application needs; few grid projects are driven by infrastructure research
• Few projects focus on performance/benchmarks, where performance is mostly seen at the job/computation/application level
• Need for metrics and measurements that help us understand grids
• In a grid, application performance has 3 major areas of concern: system capabilities, network, and software infrastructure
• Evaluating performance in a grid is different from classic benchmarking, because grids are dynamically changing systems incorporating new components.