-
HP XC4000 at SSCK
xc2.rz.uni-karlsruhe.de
hwwxc2.hww.de
Organization – Infrastructure – Architecture
Nikolaus Geers, Rudolf Lohner
Rechenzentrum, SSCK, Universität Karlsruhe (TH)
[email protected], [email protected]
-
hkz-bw
» High Performance Computing Competence Center of the State of Baden-Württemberg
» Founded in 2003 by the Universities of Karlsruhe and Stuttgart; the University of Heidelberg joined in 2004
» Coordinates HPC competencies to build a center that is competitive at an international level
» HPC system for nationwide usage: HLRS; HPC system for state-wide usage: SSCK
» Grid computing across both sites
» Research activities
– Cooperation with end users in the development of new HPC applications
• Life sciences
• Environment research
• Energy research
– Grid computing
-
hww: Cooperation with Industry
» Höchstleistungsrechner für Wissenschaft und Wirtschaft (hww) GmbH: High Performance Computing for Science and Industry
» Joint operation and management of HPC systems within hww GmbH
– Universities of Stuttgart, Karlsruhe and Heidelberg
– T-Systems SfR
– Porsche
» End user support
– Academic users: HLRS / SSCK
– Industry and research labs: T-Systems
» Through hww, the new HP XC system will be available to customers from universities as well as from industry and research labs.
-
High Performance Computing Competence Center (HPTC3)
» Cooperation of SSC Karlsruhe, HP and Intel
– A similar cooperation is planned with AMD
» Extending the XC system and testing of new features
– Integration of XC and Lustre
– Integration of different node types into the XC system
– High availability of critical resources
– Monitoring
» Training and education
– Usage of the XC system
– Optimization and tuning of application codes
» Porting and tuning of ISV codes
» Program development tools
-
Development of HPC Systems xc1 and xc2 at SSCK
[Timeline chart, Q1/04 to Q1/07: Q1/04 Phase 0 (Landes-HLR); Q4/04 and Q1/05 Phases 1 and 1a (Landes-HLR); Q3/06 start of installation of Phase 2; Nov./Dec. 06 start of test operation of Phase 2; 15.1.2007 start of production operation of Phase 2; xc1 additionally serving as the university's HLR after shutdown of the IBM SP]
-
HP XC – Installation Schedule (Phase 2)
» Phase 0 (Q1 2004)
– 12 2-way nodes (Intel Itanium 2)
– 4 file server nodes, 2 TB shared storage
– Single-rail Quadrics interconnect
» Phase 1 (Q4 2004)
– 108 2-way nodes (Intel Itanium 2)
– 8 file server nodes, approx. 11 TB storage system
– Single-rail Quadrics interconnect
» Phase 1a (Q1 2005)
– 6 16-way nodes (Intel Itanium 2), 2 partitions with 8 CPUs each
– Single-rail Quadrics interconnect
» Phase 2 (Q3 2006)
– 750 4-way nodes: two sockets, dual-core AMD Opteron, 2.6 GHz, 16 GB
– 10 server nodes
– InfiniBand DDR interconnect
– 56 TB storage system
» Total of 3,000 processor cores
» Total of 15.6 TFlop/s peak performance
» Total of 12 TB of main memory
» Test system: ~300 processors | ~2 TFlop/s | ~2 TB memory
-
Time Schedule of xc2
» September 2006
– Delivery and assembly of racks
» October 2006
– Cabling of admin network and InfiniBand interconnect
– Software installation
– First internal testing
» November 2006
– Further internal testing
– Early ‘friendly’ users
– Start of acceptance test
» January 2007
– End of acceptance test
» January 15, 2007
– Start of production service
-
Challenges: Room Layout
[Floor plan: 20 numbered compute-node racks, most with an attached MCS cooling unit, plus the racks holding the InfiniBand edge switches IBE1–IBE5, the core switches IBR1–IBR3 and the SFS storage]
» 20 racks for compute nodes
» 8 racks for network switches
» Maximum cable length for IB DDR is 8 m
-
Challenges: Cabling
[Floor plan as on the previous slide, with the InfiniBand switch racks (IBE1–IBE5, IBR1–IBR3) and the SFS racks marked]
» Cable ducts on top of racks for InfiniBand cables
-
Challenges: Cabling
» Cable ducts on top of racks
» Cable ducts under raised floor
[Floor plan as on the previous slides]
-
Challenges: Cooling
» Water cooling of each rack
-
Challenge: Cooling
» HP Modular Cooling System added to each rack
-
Challenges: Hardware Installation
-
HP XC4000 at a Glance
xc2.rz.uni-karlsruhe.de
hwwxc2.hww.de
-
HP XC4000@SSCK: The Key Figures
» 750 four-way compute nodes, 3000 cores
» 2 eight-way login nodes
» 10 service nodes
» 10 file server nodes
» InfiniBand DDR interconnect
» 15.6 TFlop/s peak performance (see the arithmetic below)
» 12 TB main memory
» 56 TB shared storage
» 110 TB local storage
xc2.rz.uni-karlsruhe.de
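The headline figures follow directly from the node count: 750 nodes × 4 cores per node = 3000 cores; at 5.2 GFlop/s per core this gives 3000 × 5.2 GFlop/s = 15.6 TFlop/s peak; and 750 nodes × 16 GB per node = 12 TB of main memory.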
-
XC2 in Detail
xc2.rz.uni-karlsruhe.de
hwwxc2.hww.de
-
Compute Nodes for MPI Applications
» 750 four-way nodes HP DL145 G2
– Two dual-core CPUs
• 2.6 GHz, 5.2 GFlop/s per core
• 1 MB L2 cache per core
» 16 GB main memory per node
– 4 GB per core
» 146 GB local disk space
» Fast InfiniBand DDR interconnect
– Latency: ~3 µsec
– Bandwidth: 1600 MB/s at application (MPI) level
» Parallel MPI applications, up to O(1000) tasks
-
HP DL145 G2 Block Diagram
[Block diagram: two Opteron CPUs coupled by a HyperTransport link; each CPU has its own PC3200 DDR1 memory at 400 MHz with 6.4 GB/s; further HT links connect to PCI-Express for the InfiniBand adapter (2.0 GB/s) and to the peripherals]
-
AMD Dual-Core Opteron Processor
[Block diagram: two cores, each with a 64 KB I-cache, a 64 KB D-cache and 1 MB of L2 cache; a system request queue and crossbar connect the cores to the integrated memory controller (two 64-bit channels) and to three HT links]
-
InfiniBand DDR Network
» Full fat tree structure
» 2 GB/s peak bandwidth (bidirectional)
» 3 µsec latency (a simple check is sketched below)
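These figures can be verified with a small MPI ping-pong run between two nodes. A minimal sketch, assuming HP MPI's mpicc wrapper and SLURM's srun launcher as on a standard XC installation; pingpong.c stands for your own benchmark source:

    module add intel                # select the back-end compiler
    mpicc -o pingpong pingpong.c    # build with the HP MPI wrapper
    srun -N 2 -n 2 ./pingpong       # one task on each of two nodes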
-
InfiniBand DDR Network
[Fat-tree topology: 750 compute nodes, 10 service nodes and 10 file server nodes are attached to 65 leaf switches, which are in turn connected to 3 core switches]
-
InfiniBand DDR Network – SKaMPI: Non-Duplex Ping-Pong
[Plot: bandwidth in MB/s (0 to 1800) versus message length in bytes (0 to 100,000,000), for one, two and four MPI processes per node]
-
InfiniBand DDR Network – SKaMPI: Non-Duplex Ping-Pong
[Plot: bandwidth in MB/s (0 to 1800) versus message length in bytes (0 to 100,000), for one, two and four MPI processes per node]
-
InfiniBand DDR Network – SKaMPI: Duplex Ping-Pong
[Plot: total bandwidth in MB/s (0 to 3000) versus message length in bytes (0 to 10,000,000), for one, two and four MPI processes per node]
-
InfiniBand DDR Network – SKaMPI: Duplex Ping-Pong
[Plot: total bandwidth in MB/s (0 to 3000) versus message length in bytes (0 to 100,000), for one, two and four MPI processes per node]
-
Login Nodes
» 2 eight-way nodes HP DL585 G2
– Four dual-core CPUs
• 2.6 GHz, 5.2 GFlop/s per core
• 1 MB L2 cache per core
» 32 GB main memory per node
– 4 GB per core
» 292 GB local disk space
» Interactive access
– File management, job submission
– Program development (compilation, short test runs, etc.)
– Debugging
– Pre- and postprocessing
-
Parallel File System HP SFS
» Shared storage for all nodes of the XC system
» 10 file server nodes
– 2 MDS / admin
– 2 OSS for $HOME
– 6 OSS for $WORK
» 56 TB file space
– 8 TB $HOME
– 48 TB $WORK
» Expected bandwidth (a rough single-node check is sketched below)
– Read / write from one node: 340 MB/s / 340 MB/s
– Total read / write bandwidth of $HOME: 600 MB/s / 360 MB/s
– Total read / write bandwidth of $WORK: 3600 MB/s / 2200 MB/s
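A rough way to sanity-check the single-node figure is a large streaming write into $WORK; a minimal sketch (the file name is arbitrary, and dd measures only one node's streaming rate, not the aggregate bandwidth):

    # write 4 GB in 1 MB blocks; dd reports the achieved rate
    dd if=/dev/zero of=$WORK/ddtest.$$ bs=1M count=4096
    rm $WORK/ddtest.$$    # remove the test file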
-
File Systems of XC2
[Diagram: the shared $WORK and $HOME file systems are mounted on all nodes, while each node has its own local $TMP]
-
Characteristics of File Systems
» $HOME
– Shared, i.e. identical view on all nodes
– Permanent files
– Regular backup
– File space limited by quotas
– Suited for many small files
» $WORK
– Shared, i.e. identical view on all nodes
– Semi-permanent files with a lifetime of one week
– Best used for large files and sequential file access
» $TMP
– Local, i.e. different nodes see different $TMP
– Temporary files, discarded at job end
– Best used for temporary scratch files (see the job sketch below)
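A typical batch job therefore stages its scratch data through the node-local $TMP and keeps only the results in $WORK. A minimal sketch; input.dat and solver are placeholders for your own files and program:

    cp $WORK/input.dat $TMP/           # stage input to local scratch
    cd $TMP
    ./solver input.dat > result.out    # heavy scratch I/O stays local
    cp result.out $WORK/               # save results before $TMP is discarded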
-
Software Environment
» HP XC version 3.0 software stack
– HP XC Linux for HPC (based on Red Hat Enterprise Linux Advanced Server version 3.0)
– nagios, syslog-ng, …
– SLURM, local add-on JMS (job_submit …)
– HP MPI
– Modules package (module add …; see below)
» HP SFS file system (based on Lustre)
» Compilers
– GNU, Intel, PGI, PathScale
» Debuggers
– gdb, ddt
» Profilers
» Applications
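The Modules package is the entry point to this stack; the standard subcommands show what is installed and what is currently loaded (the module names themselves are site-specific):

    module avail    # list all available modules (compilers, tools, ...)
    module list     # show the modules loaded in the current session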
-
XC2 in Comparison with XC1
» Identical software environment
» Different processor architecture
» 4-way nodes instead of 2-way nodes
» Similar ratios of
– Memory size : floating point performance
– Communication bandwidth : floating point performance
» Number of CPUs (cores) increased by a factor of 10
– Much larger jobs, O(1000) MPI processes
• Fine-grain parallelization
• Finer resolution
– More jobs in parallel
-
XC2 in Comparison with XC1
[Bar chart: per-application performance ratios of xc2 to xc1 with the factors 1.21, 1.47, 1.07, 3.44 and 1.10 for the codes SPARC, PLESOCC, IMD, METRAS and FDEM/LINSOL]
-
Early ‘Friendly’ Users
» You will help us to stabilize and improve the system.
» You may get a lot of CPU cycles for your research work.
» But:
– We cannot guarantee the high stability of a production system.
– We may have to shut down the system without warning.
– Not all software components may work as desired.
– Scalability of some tools may be a problem.
» If you can work with these restrictions and want to become an early user of the xc2, please send an email to [email protected]
-
Thank You
-
Compilers on XC2
» GNU compilers and third-party compilers
-
Compilers and the module Command
» module add compiler
where compiler stands for: gnu/3, gnu/4, intel, pgi or pathscale (usage example below)
» Environment variables modified by this command:
– PATH
– LD_LIBRARY_PATH
– MANPATH
– FC, F77, F90, CC, CXX
– MPI_F77, MPI_F90, MPI_CC, MPI_CXX
– CFLAGS, FFLAGS
– ACMLPATH
– Some compiler-specific variables, e.g. LM_LICENSE_FILE
» By default the command module add intel is executed during login.
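Switching compilers and using the exported variables in a build looks like this (prog.f90 is a placeholder for your own source file):

    module add pgi               # select the PGI compiler environment
    echo $F90                    # prints pgf90, as listed below
    $F90 -O2 -o prog prog.f90    # compile with the selected Fortran compiler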
-
module add intel
» Environment variables modified by this command:
– PATH, LD_LIBRARY_PATH and MANPATH: the corresponding subdirectories of the compiler installation directories are added
– FC = ifort
– F77 = ifort, MPI_F77 = ifort
– F90 = ifort, MPI_F90 = ifort
– CC = icc, MPI_CC = icc
– CXX = icpc, MPI_CXX = icpc
– CFLAGS =
– FFLAGS =
– ACMLPATH =
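The MPI_* variables are presumably read by HP MPI's compiler wrappers to select the back-end compiler, so MPI builds follow the loaded module automatically; a minimal sketch under that assumption (hello.c is a placeholder):

    module add intel          # MPI_CC is now icc
    mpicc -o hello hello.c    # the HP MPI wrapper invokes icc underneath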
-
module add pgi
» Environment variables modified by this command:
– PATH, LD_LIBRARY_PATH and MANPATH: the corresponding subdirectories of the compiler installation directories are added
– FC = pgf77
– F77 = pgf77, MPI_F77 = pgf77
– F90 = pgf90, MPI_F90 = pgf90
– CC = pgcc, MPI_CC = pgcc
– CXX = pgcc, MPI_CXX = pgcc
– CFLAGS =
– FFLAGS =
– ACMLPATH =
-
module add pathscale
» Environment variables modified by this command:
– PATH, LD_LIBRARY_PATH and MANPATH: the corresponding subdirectories of the compiler installation directories are added
– FC = pathf77
– F77 = pathf77, MPI_F77 = pathf77
– F90 = pathf90, MPI_F90 = pathf90
– CC = pathcc, MPI_CC = pathcc
– CXX = pathcc, MPI_CXX = pathcc
– CFLAGS =
– FFLAGS =
– ACMLPATH =
-
module add gnu/3
» Environment variables modified by this command:
– PATH, LD_LIBRARY_PATH and MANPATH: the corresponding subdirectories of the compiler installation directories are added
– FC = gf77
– F77 = gf77, MPI_F77 = gf77
– F90 = , MPI_F90 =  (no Fortran 90 compiler in GNU 3)
– CC = gcc, MPI_CC = gcc
– CXX = g++, MPI_CXX = g++
– CFLAGS =
– FFLAGS =
– ACMLPATH =
-
module add gnu/4 or module add gnu
» Environment variables modified by this command:
– PATH, LD_LIBRARY_PATH and MANPATH: the corresponding subdirectories of the compiler installation directories are added
– FC = gfortran
– F77 = gfortran, MPI_F77 = gfortran
– F90 = gfortran, MPI_F90 = gfortran
– CC = gcc, MPI_CC = gcc
– CXX = g++, MPI_CXX = g++
– CFLAGS =
– FFLAGS =
– ACMLPATH =