Distributed Data Management at DKRZ
Wolfgang Sell, Deutsches Klimarechenzentrum GmbH, sell@dkrz.de
27-Oct-2004, 11th HPC Workshop at ECMWF, Reading
Table of Contents
• DKRZ - a German HPC Center
• HPC System Architecture Suited for Earth System Modeling
• The HLRE Implementation at DKRZ
• Some Results
• Some Lessons Learnt
• Summary
DKRZ - a German HPC Center
• Mission of DKRZ
• DKRZ and its Organization
• DKRZ Services
• DKRZ Restructuring
Mission of DKRZ
In 1987 DKRZ was founded with the mission to
• Provide state-of-the-art supercomputing and data services to the German scientific community for top-of-the-line Earth System and Climate Modelling.
• Provide associated services, including high-level visualization.
DKRZ and its Organization (1)
Deutsches KlimaRechenZentrum (DKRZ) = German Climate Computer Center
• organised under private law (GmbH) with 4 shareholders
• investments funded by the federal government, operations funded by the shareholders
DKRZ and its Organization (2)
DKRZ internal structure
• 3 departments for
  • systems and networks
  • visualisation and consulting
  • administration
• 20 staff in total
• until the restructuring at the end of 1999, a fourth department supported climate model applications and climate data management
DKRZ Services
• operations center: DKRZ
• technical organization of computational resources (compute, data and network services, infrastructure)
• advanced visualisation
• assistance for parallel architectures (consulting and training)
Model & Data Services
Application center: Model & Data
• professional handling of community models
• specific scenario runs, e.g. IPCC
• scientific data handling
Model & Data Group external to DKRZ, administered by MPI for Meteorology, funded by BMBF
HPC System Architecture Suited for Earth System Modeling
• Principal HPC System Configuration
• Configuration Variants
• Links between Different Services
• The Data Problem
• Pros and Cons of Shared Filesystems
Generic HPC System Configuration
[Figure: global system architecture with CS and DS (compute and data servers), shares labelled 80% and 20%, connected to the rest of the world]
Variants of System Configuration (1)
[Figure: CS and DS coupled through a shared filesystem, shares labelled 80% and 20%, connected to the rest of the world]
Variants of System Configuration (2)
[Figure: CS and DS with classical LAN coupling, shares labelled 80% and 20%, connected to the rest of the world]
Link between Compute Power and Non-Computing Services
• Functionality and Performance Requirements for Data Service
• Transparent Access to Migrated Data
• High Bandwidth for Data Transfer
• Shared Filesystem
• Possibility of Adaptation in Upgrade Steps due to Changes in Usage Profile
• Balance between Computational and Data Management Capabilities
Evolution of Computing Power at DKRZ
Adaptation Problem for Data Server
[Figure: "Data Problem in HPC". Data generation rate (TByte/year, 0 to 3,000) versus effective compute power P (GFlops, 0 to 500) for three laws of data increase: linear (P^1), P^(3/4) and P^(2/3)]
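The three curves in the chart differ only in the exponent of a power law, D(P) = d_ref · (P / p_ref)^alpha with alpha = 1, 3/4 or 2/3. The sketch below assumes that form and a hypothetical reference point (p_ref, d_ref), neither of which is given on the slide; it shows how strongly the data-server load grows for a given upgrade in compute power.

```python
"""Power-law sketch of the chart's data-increase curves.

Assumption (not on the slide): each curve is D(P) = d_ref * (P / p_ref) ** alpha
for some hypothetical reference point (p_ref, d_ref).
"""

# Exponents of the three data-increase laws shown in the chart.
EXPONENTS = {"linear (P^1)": 1.0, "P^(3/4)": 3 / 4, "P^(2/3)": 2 / 3}


def data_rate(p: float, p_ref: float, d_ref: float, alpha: float) -> float:
    """Data generation rate at compute power p, in the units of d_ref."""
    if p <= 0 or p_ref <= 0:
        raise ValueError("compute power must be positive")
    return d_ref * (p / p_ref) ** alpha


def growth_factor(compute_factor: float, alpha: float) -> float:
    """Factor by which the data rate grows when compute power grows by compute_factor."""
    return compute_factor ** alpha


if __name__ == "__main__":
    # A tenfold compute upgrade, e.g. 50 -> 500 GFlops on the chart's x-axis.
    for name, alpha in EXPONENTS.items():
        print(f"{name:14s}: 10x compute -> {growth_factor(10.0, alpha):.2f}x data")
```

A tenfold compute upgrade thus multiplies the data rate by 10, about 5.6 or about 4.6, depending on the law. One common reading of the exponents (an assumption, not stated on the slide): refining only the horizontal grid by a factor n raises compute cost by roughly n^3 (two horizontal dimensions plus a shorter time step) but output volume by n^2, giving P^(2/3); refining the vertical as well gives n^4 versus n^3, i.e. P^(3/4).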
Pros of Shared Filesystem Coupling
• High Bandwidth between the Coupled Servers
• Scalability Supported by the Operating System
• No Need for Multiple Copies
• Record-Level Access to Data with High Performance
• Minimized Data Transfers
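The record-level access and minimized-transfer points can be illustrated with a toy model. This is a sketch under stated assumptions, not DKRZ's setup: a local file of fixed-size records stands in for a model output file; with a shared filesystem a post-processing client seeks directly to the one record it needs, whereas with classical LAN coupling the whole file is first copied to the client. RECORD_SIZE, N_RECORDS and the file names are hypothetical.

```python
import os
import shutil
import tempfile

RECORD_SIZE = 4096   # hypothetical record length in bytes
N_RECORDS = 256      # hypothetical number of records in the output file


def write_output_file(path: str) -> None:
    """Write N_RECORDS fixed-size records; record i is filled with byte i % 256."""
    with open(path, "wb") as f:
        for i in range(N_RECORDS):
            f.write(bytes([i % 256]) * RECORD_SIZE)


def read_record_shared(path: str, index: int) -> tuple[bytes, int]:
    """Shared filesystem: seek straight to one record; returns (record, bytes moved)."""
    with open(path, "rb") as f:
        f.seek(index * RECORD_SIZE)
        data = f.read(RECORD_SIZE)
    return data, len(data)


def read_record_after_copy(path: str, index: int, workdir: str) -> tuple[bytes, int]:
    """LAN coupling: copy the whole file to the client first, then read the record."""
    local = os.path.join(workdir, "local_copy.dat")
    shutil.copyfile(path, local)  # creates a second copy of the data on the client
    moved = os.path.getsize(local)
    data, _ = read_record_shared(local, index)
    return data, moved


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "model_output.dat")
        write_output_file(src)
        rec_a, moved_a = read_record_shared(src, 42)
        rec_b, moved_b = read_record_after_copy(src, 42, tmp)
        assert rec_a == rec_b
        print(f"shared FS: {moved_a} bytes, copy first: {moved_b} bytes")
```

Both paths return the same record, but the copy-first path moves the whole file (here 256 times more bytes) and leaves a second copy behind, which is exactly what the "No Need for Multiple Copies" and "Minimized Data Transfers" points avoid.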
Cons of Shared Filesystem Coupling
• Proprietary Software needed
• Standardisation still missing
• Limited Number of Systems that can be connected
HLRE Implementation at DKRZ
HLRE = HöchstLeistungsRechnersystem für die Erdsystemforschung (High Performance Computer System for Earth System Research)
• Principal HLRE System Configuration
• Requirements and Constraints
• Links between Different Services
• Option for System Operation
Principal HLRE System Configuration
Hardware at DKRZ (October 2004)
• 24 SX-6 Nodes (192 Vector CPUs, 1.5 TByte CM and 1.5 TFlops peak)
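The configuration totals are internally consistent, as a quick back-of-the-envelope check shows. The sketch only divides the totals given on the slide; it assumes 1 TByte = 1024 GByte for the memory split and treats 1.5 TFlops as a rounded figure.

```python
# Totals from the slide (October 2004).
NODES = 24
CPUS = 192
MEMORY_TBYTE = 1.5
PEAK_TFLOPS = 1.5

cpus_per_node = CPUS // NODES                          # vector CPUs per node
memory_per_node_gbyte = MEMORY_TBYTE * 1024 / NODES    # assumes binary TByte
peak_per_cpu_gflops = PEAK_TFLOPS * 1000 / CPUS        # total is rounded, so approximate

print(f"{cpus_per_node} CPUs/node, {memory_per_node_gbyte:.0f} GByte/node, "
      f"~{peak_per_cpu_gflops:.1f} GFlops/CPU")
```

This gives 8 CPUs and 64 GByte per node and roughly 7.8 GFlops per CPU, consistent with the nominal 8 GFlops peak of an SX-6 vector CPU: 192 × 8 GFlops = 1,536 GFlops, which rounds to the 1.5 TFlops quoted.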
Some Lessons Learnt
• Current Implementation of Non-Computing Services needs a Significant Amount of Local Disk Space, e.g. HSM and DBMS need their Own Cache
• Lack of Standardisation for Shared Filesystems leads to Dependence on Co-operativeness, e.g.
  • Graphics Server Integration
  • Pre/Post-Processing Servers from Different Vendors
• Fail-over Solutions needed in Complex Distributed Systems
Some Lessons Learnt, cont.
• Server Scalability needed, but no Problem; Client Scalability may be a Problem, e.g. 128 LUN Limitation for Linux 2.4
• Distributed Servers may Generate Intricate Dependencies, i.e. clearly Structured High-Level Services do not Guarantee Easy, Performant Operation
Invocation Period and Lifetime of Dirty Pages for kupdated
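The two quantities in this title are how often the Linux 2.4 kupdated flush daemon wakes up (invocation period) and how long a page may stay dirty before it is written back (lifetime). A minimal model, assuming a strictly periodic daemon and ignoring other write-back triggers such as memory pressure or an explicit sync, gives the delay between dirtying a page and its write-back. The values of 5 s and 30 s in the example are an assumption about typical 2.4 defaults, not taken from the talk; the function names are hypothetical.

```python
import math


def flush_time(dirtied_at: float, period: float, lifetime: float) -> float:
    """First periodic wakeup (a multiple of `period`) at which a page dirtied at
    `dirtied_at` has been dirty for at least `lifetime` seconds."""
    if period <= 0 or lifetime < 0:
        raise ValueError("period must be positive and lifetime non-negative")
    earliest = dirtied_at + lifetime
    return math.ceil(earliest / period) * period


def writeback_delay_bounds(period: float, lifetime: float) -> tuple[float, float]:
    """Best-case delay and (exclusive) worst-case delay between dirtying and write-back."""
    return lifetime, lifetime + period


if __name__ == "__main__":
    PERIOD, LIFETIME = 5.0, 30.0  # assumed defaults, in seconds
    for t in (0.0, 1.0, 2.5, 4.0):
        delay = flush_time(t, PERIOD, LIFETIME) - t
        print(f"dirtied at {t:3.1f} s -> written back after {delay:4.1f} s")
    print("delay range:", writeback_delay_bounds(PERIOD, LIFETIME))
```

Under this model the write-back delay always lies between the lifetime and the lifetime plus one invocation period. In a shared-filesystem setting, that range bounds how long freshly written data may remain only in one client's page cache, assuming no other flush is triggered.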