Distributed Data Management at DKRZ
CAS2003, Annecy, France, 9-Sept-2003
Wolfgang Sell, Hartmut Fichtel
Deutsches Klimarechenzentrum GmbH
[email protected], [email protected]

Distributed Data Management

• DKRZ - a German HPC Center
• HPC System Architecture suited for Earth System Modeling
• The HLRE Implementation at DKRZ
• Implementing IA64/Linux based Distributed Data Management
• Some Results
• Summary

DKRZ - a German HPCC

• Mission of DKRZ
• DKRZ and its Organization
• DKRZ Services
• Model and Data Services

Mission of DKRZ

In 1987, DKRZ was founded with the mission to
• provide state-of-the-art supercomputing and data services to the German scientific community to conduct top-of-the-line Earth System and Climate Modelling, and
• provide associated services, including high-level visualization.

DKRZ and its Organization (1)

Deutsches KlimaRechenZentrum = DKRZ (German Climate Computer Center)
• organised under private law (GmbH) with 4 shareholders
• investments funded by the federal government, operations funded by the shareholders
• usage: 50 % shareholders, 50 % community

DKRZ and its Organization (2)

DKRZ internal structure:
• 3 departments for
  • systems and networks
  • visualisation and consulting
  • administration
• 20 staff in total
• until restructuring at the end of 1999, a fourth department supported climate model applications and climate data management

DKRZ Services

Operations center: DKRZ
• technical organization of computational resources (compute, data, and network services; infrastructure)
• advanced visualisation
• assistance for parallel architectures (consulting and training)

Model & Data Services

Competence center: Model & Data
• professional handling of community models
• specific scenario runs
• scientific data handling
The Model & Data group is external to DKRZ, administered by the MPI for Meteorology, and funded by the BMBF.

HPC System Architecture suited for Earth System Modeling

• Principal HPC System Configuration
• Links between Different Services
• The Data Problem

Principal HPC System Configuration

Link between Compute Power and Non-Computing Services

• Functionality and Performance Requirements for Data Service
  • Transparent Access to Migrated Data (see the sketch below)
  • High Bandwidth for Data Transfer
  • Shared Filesystem
  • Possibility for Adaptation in Upgrade Steps due to Changes in Usage Profile

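In practice, "transparent access to migrated data" means ordinary POSIX I/O works unchanged on files whose blocks the HSM has moved to tape; the first read simply triggers a recall. A minimal sketch from the user's point of view, assuming a hypothetical path on an HSM-managed filesystem (the slides name no specific product or path):

```python
import os

# Hypothetical file on an HSM-managed filesystem; the name is illustrative.
path = "/hsm/archive/experiment/output.grb"

# stat() reports the full logical size even if the data currently
# resides on tape rather than on disk.
print(f"logical size: {os.stat(path).st_size} bytes")

# Plain open/read is all a program needs: if the file was migrated,
# the HSM recalls it transparently and the read just blocks longer.
with open(path, "rb") as f:
    header = f.read(4096)
print(f"read {len(header)} bytes with no explicit staging step")
```
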
Compute Server Power

[Chart: installed compute power (peak) at DKRZ over time, logarithmic scale from 0.1 to 10,000 GFlops]

Adaptation Problem for Data Server

[Chart "Dataproblem in HPC": data generation rate in TByte/year (0 to 3,000) versus effective compute power P in GFlops (0 to 500), with curves for data increase proportional to P (linear), P^(3/4), and P^(2/3)]

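The chart's scaling argument is easy to reproduce. A small sketch of the three curves; the proportionality constant c is an assumption (c = 6 makes the linear curve reach the chart's 3,000 TByte/year at P = 500 GFlops), since the slide gives only the exponents:

```python
# Data generation rate as a function of effective compute power P,
# for the three scaling assumptions in the chart: P^1, P^(3/4), P^(2/3).
def data_rate(p_gflops: float, alpha: float, c: float = 6.0) -> float:
    """Return TByte/year generated at effective compute power P (GFlops)."""
    return c * p_gflops ** alpha

for p in (50, 100, 250, 500):
    linear, p34, p23 = (data_rate(p, a) for a in (1.0, 3 / 4, 2 / 3))
    print(f"P = {p:3d} GFlops: linear = {linear:6.0f}, "
          f"P^(3/4) = {p34:5.0f}, P^(2/3) = {p23:5.0f} TByte/year")
```

The sublinear exponents capture the hope that smarter post-processing keeps data growth below compute growth; even so, the data server must be upgradable in steps, as required above.
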
Pros of Shared Filesystem Coupling

• High Bandwidth between the Coupled Servers
• Scalability supported by the Operating System
• No Need for Multiple Copies
• Record-Level Access to Data with High Performance (see the sketch below)
• Minimized Data Transfers

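Record-level access is what separates shared-filesystem coupling from staging whole files between servers: any client can seek to one record and read just that, so no second copy of a multi-gigabyte file ever exists. A minimal sketch, assuming a hypothetical fixed-length record file on a GFS mount (path and record size are illustrative):

```python
RECORD_SIZE = 512                       # assumed fixed record length
path = "/gfs/shared/model_output.dat"   # hypothetical GFS-mounted file

def read_record(n: int) -> bytes:
    """Read record n directly from the shared file.

    Compute and data servers mount the same filesystem, so no file
    copy is transferred first; only the requested record is read.
    """
    with open(path, "rb") as f:
        f.seek(n * RECORD_SIZE)
        return f.read(RECORD_SIZE)

print(f"record 1000: {len(read_record(1000))} bytes")
```
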
Cons of Shared Filesystem Coupling

• Proprietary Software needed
• Standardisation still missing
• Limited Number of Vendors whose Systems can be connected

HLRE Implementation at DKRZ

HöchstLeistungsRechnersystem für die Erdsystemforschung = HLRE (High Performance Computer System for Earth System Research)

• Principal HLRE System Configuration
• HLRE Installation Phases
• IA64/Linux based Data Services
• Final HLRE Configuration

Principal HLRE System Configuration

HLRE Phases

                                           Phase 1    Phase 2    Phase 3
Date                                       Feb 2002   4Q 2002    3Q 2003
Nodes                                      8          16         24
CPUs                                       64         128        192
Expected Sustained Performance [GFlops]    ca. 200    ca. 350    ca. 500
Mass Storage Capacity [TBytes]             >720       >1400      >3400
Expected Increase in Throughput compared to CRAY C916

Diagram annotations:
• CS performance increase: f = 63/100
• expected data-side increase: F = f^(3/4) = 22.4/31.6 (checked below)
• minimal component performance indicated in diagram
• implicit user access, local UFS commands
• CS disks with local copies
• shared disks (GFS)
• DS disks for IO buffer cache
• Intel/Linux platforms, homogeneous HW
• technological challenge

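The two F values follow directly from the P^(3/4) data-scaling rule shown earlier; a quick check:

```python
# F = f^(3/4): expected growth of the data side for a compute-power
# growth factor f (relative to the CRAY C916), per the P^(3/4) rule.
for f in (63, 100):
    print(f"f = {f:3d}  ->  F = f^(3/4) = {f ** 0.75:.1f}")
# prints 22.4 for f = 63 and 31.6 for f = 100, as on the slide
```
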
[Diagram: CS client(s), DS, and other clients coupled over GE; transfer rates of 70/80, 225/270, 270/325, and 560/675 MB/s; disk pools of 11 TB, 16.5 TB, and FC 25/30 TB; archive capacity in the ~PB range]

Implementing IA64/Linux based Distributed Data Management

• Overall Phase 1 Configurations
• Introducing Linux based Distributed HSM
• Introducing Linux based Distributed DBMS
• Final Overall Phase 3 Configuration

Proposed final phase 3 configuration

[Diagram: HS/MS LAN with GE (x 48) and FE (2/node, for PolestarLite) links; AsAmA 16-way GFS servers running UVDM; AsAmA 4-way GFS clients running Oracle (UDSN/UDNL); SX-6 compute nodes; Silkworm 12000 FC switches (FC x 72); GFS disk (Polestar) 0.28 TB x 53 = 14.8 TB; disk cache (Polestar) 0.57 TB x 15 = 8.5 TB; disk cache (DDN) 0.69 TB x 12 = 8.3 TB; local disk FC-RAID 0.28 TB x 20 = 5.6 TB]