400G Demonstrator for ISC ’13
HPCN Workshop, Braunschweig, 7 May 2013
Wolfgang Wünsch, Technische Universität Dresden
Eduard Beier, T-Systems International
– public – E. Beier / W. Wünsch, 400G Demonstrator for ISC ’13
Agenda
• Partner
• Purpose
• Project Structure
• Topology
• Turbine Development
• Climate Computing
• Service Recipient Relations
• Data Path
• The Big Picture
• Project Lifetime
• Timeline
• DATE
• Test Items
Partner
Purpose
The purpose of the project is to demonstrate that bandwidth beyond 100 Gbit/s is both feasible and useful.
Project Structure
Project Board: Prof. Dr. A. Bode / Prof. Dr. W. Nagel
Dr. A. Kluge, F. Schneider, Prof. Dr. W. Gentzsch, R. Wieneke
M. Zappolino, Dr. A. Geiger, M. Roosen, M. Fuchs
A. Clauberg, T. Weselowski, Jan Heichler
Topology
[Map. Legend: 100G Sachsen operational; 100G Sachsen planning; 400G Demonstrator p…; 10GbE for Demonstrator. Labeled locations: Chemnitz, Computing Center, Euro Industriepark, München, DT PoP]
Turbine Development
• Cooperation with DLR
• Workflow demonstration:
  - Preprocessing
  - Solver 1
  - Solver 2
  - Postprocessing
• Turbine model calculation with n eigenmodes and m phase angles
Details:
• Data volume: ~1 TB
• Overall workflow:
  - Multitude of independent simulation runs (HTC)
  - Simulations running on HPC resources at different sites
  - Every simulation produces input data for subsequent simulations
  - Subsequent simulations again run at different sites
Thus, to avoid knock-on delays in workflow execution, data should be available instantly at the different sites.
GPFS:
• Adopted features: Active File Management (AFM) and stretched cluster
• Cross-site data replication allows simulations to run without prior copying
• Implicit, data-consistent backup via AFM data replication
Turbine Development: Benefits of GPFS Usage on 400G
400G:
• Possible job distributions on HPC resources:
  - n · m jobs (n eigenmodes, m phase angles)
  - a · b jobs on the cluster (a parallel jobs running on c cores, b serial jobs) with a · b ≥ n · m
• Bandwidth requirements:
  $$B = \frac{N_{\text{jobs}} \cdot S_{\text{file}}}{\Delta t}$$
  where $S_{\text{file}}$ = average file size written per job, $N_{\text{jobs}}$ = number of jobs running in parallel (here: corresponds to a), $\Delta t$ = average time for disk access
Turbine Development: Benefits of GPFS Usage on 400G
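[Figure: example job distribution, number of cores over time; a = 6 parallel jobs, b = 5 serial jobs, so a · b = 30 ≥ n · m = 28; phases Solver 1 and Solver 2; Δt marked; 240 min marked on the time axis]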
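The figure's numbers can be checked with a short sketch: with a jobs running in parallel, the number of serial waves b is the ceiling of n · m / a, and the layout is sufficient when a · b ≥ n · m. This assumes all jobs take roughly the same time; the function names `serial_waves` and `covers` are hypothetical, and only the values a = 6 and n · m = 28 come from the figure.

```python
import math


def serial_waves(total_jobs: int, parallel_slots: int) -> int:
    """Number of serial job waves (b) needed when at most `parallel_slots` (a)
    jobs can run at the same time; assumes all jobs take about the same time."""
    if parallel_slots <= 0:
        raise ValueError("parallel_slots must be positive")
    return math.ceil(total_jobs / parallel_slots)


def covers(a: int, b: int, total_jobs: int) -> bool:
    """True if an a x b layout provides a slot for each of the n * m jobs."""
    return a * b >= total_jobs


# Values from the figure: n * m = 28 jobs, a = 6 jobs in parallel.
total_jobs = 28
a = 6
b = serial_waves(total_jobs, a)
print(b, a * b, covers(a, b, total_jobs))  # 5 30 True
```

The result b = 5 matches the figure, and 30 ≥ 28 confirms the layout leaves two slots spare.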
400G: Bandwidth requirements for different job distribution setups
• Extreme/HTC setup with a = n · m = 300, b = 1:
  - Assuming all jobs write an average file size of 150 MB to disk within 15 min (i.e., write peak)
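Plugging the extreme/HTC numbers into the bandwidth formula $B = N_{\text{jobs}} \cdot S_{\text{file}} / \Delta t$ gives a quick estimate. This is a minimal sketch under two assumptions: MB means decimal megabytes ($10^6$ bytes), and Δt is taken as the full 15-minute write window. The function name `required_bandwidth` is hypothetical.

```python
MB = 10**6  # decimal megabyte (assumption; the slides do not say MB or MiB)


def required_bandwidth(file_size_bytes: float, parallel_jobs: int, dt_seconds: float) -> float:
    """B = N_jobs * S_file / dt, returned in bytes per second."""
    if dt_seconds <= 0:
        raise ValueError("dt_seconds must be positive")
    return parallel_jobs * file_size_bytes / dt_seconds


# Extreme/HTC setup: N_jobs = a = 300, S_file = 150 MB, all writes within 15 min.
bw = required_bandwidth(150 * MB, 300, 15 * 60)
print(f"{bw / MB:.0f} MB/s = {bw * 8 / 1e9:.1f} Gbit/s")  # 50 MB/s = 0.4 Gbit/s
```

Because B scales with 1/Δt, the same 45 GB written within 10 s would already require 4.5 GB/s (36 Gbit/s), so the assumed length of the write peak dominates the result.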
Climate Computing
Extremely High Bandwidth Requirements: ‘Very Big Data’
• On the order of 30 different models are used worldwide
• Data movement should be completed within months*

Transfer rate    Time to transport 1 PB of data
10 Mbps          ~27 years
1 Gbps           ~97 days
100 Gbps         ~23 hours

* Otherwise the questions will be forgotten ;-)
Statistics taken from: “BER Network Requirements Workshop”, LBNL report LBNL-4089E, 2010, p. 33. Recommended reading.
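The order of magnitude in the table can be reproduced with a short calculation. This sketch assumes 1 PB = $10^{15}$ bytes and ideal transfer at full line rate with no protocol overhead, and adds a 400 Gbit/s row for comparison with the demonstrator; the function name `transfer_time_s` is hypothetical.

```python
SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY
PB_BITS = 8 * 10**15  # 1 PB taken as 10^15 bytes (decimal; assumption)


def transfer_time_s(data_bits: float, rate_bps: float) -> float:
    """Ideal transfer time at full line rate, ignoring protocol overhead."""
    return data_bits / rate_bps


rates = [("10 Mbps", 10e6), ("1 Gbps", 1e9), ("100 Gbps", 100e9), ("400 Gbps", 400e9)]
for label, rate in rates:
    t = transfer_time_s(PB_BITS, rate)
    print(f"{label:>8}: {t / SECONDS_PER_YEAR:7.2f} years, "
          f"{t / SECONDS_PER_DAY:8.1f} days, {t / 3600:9.1f} hours")
```

The ideal values (about 25 years, 93 days and 22 hours) are in the same range as the cited figures, which are somewhat higher; this may reflect overhead or a different definition of a petabyte. At 400 Gbit/s the ideal time drops to about 5.6 hours.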
Model post-processing and analysis / Visualization @ ISC ’13
[Diagram labels: Leipzig; Folder 3; CMIP]
CCA & GPFS & iRODS
[Diagram labels: GPFS and/or iRODS; Global Namespace]
Turbine Development
[Diagram labels: TRACE on HPC Resources (×2); client evaluating results, e.g. TECPLOT]
Connection RZG Router – RZG WDM and Connection TUD Router – TUD WDM
Coherent super channel (2 × 16QAM @ 50 GHz grid / 2 × 200 Gbit/s)
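The super channel figures combine into the 400 Gbit/s line rate with simple arithmetic. This sketch assumes that 200 Gbit/s is the net rate per carrier and that each carrier occupies exactly one 50 GHz slot of the grid; under those assumptions the net spectral efficiency is about 4 bit/s/Hz.

```python
# Assumptions: 200 Gbit/s is the net rate per carrier, and each carrier
# occupies exactly one 50 GHz slot of the grid.
carriers = 2
rate_per_carrier_gbps = 200
slot_width_ghz = 50

total_rate_gbps = carriers * rate_per_carrier_gbps    # 400 Gbit/s
occupied_ghz = carriers * slot_width_ghz              # 100 GHz
spectral_efficiency = total_rate_gbps / occupied_ghz  # bit/s per Hz

print(f"{total_rate_gbps} Gbit/s in {occupied_ghz} GHz -> {spectral_efficiency:.1f} bit/s/Hz")
```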
Connection TUD Server – TUD IB Switch and Connection RZG Server – RZG IB Switch
Type: MPO (InfiniBand FDR, 56 Gbit/s)
Length: TUD 10 m, RZG 10 m
Volume: 12 on each side (24 total)
InfiniBand connections:
• Mellanox Connect-IB, volume: 12 on each side (24 total)
• Mellanox active cable (incl. QSFP), volume: 12 on each side (24 total)
Connection TUD Server – TUD Router and Connection RZG Server – RZG Router
Type: MPO (40GBASE-SR4)
Length: TUD 10 m; RZG 10 m
Volume: 24 × 10 m
• Mellanox ConnectX-3, volume: 12 on each side (24 total)
• Mellanox active cable (incl. QSFP), volume: 12 on each side (24 total)
• Alcatel-Lucent 6-port 40GbE IMM, volume: 2 @ TUD
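Putting the link counts from the server connection slides side by side shows whether the server-side links can fill the 400G super channel. This is a sketch using the nominal per-link rates listed on the slides (56 Gbit/s for FDR, 40 Gbit/s for 40GBASE-SR4); it ignores line-coding and protocol overhead, so usable throughput is lower in practice. The `LinkGroup` class is hypothetical.

```python
from dataclasses import dataclass

WAN_GBPS = 400  # capacity of the 400G WDM super channel


@dataclass
class LinkGroup:
    name: str
    links_per_side: int
    nominal_gbps: float  # nominal per-link rate as listed on the slides

    def aggregate_gbps(self) -> float:
        return self.links_per_side * self.nominal_gbps


groups = [
    LinkGroup("InfiniBand FDR, server to IB switch", 12, 56),
    LinkGroup("40GBASE-SR4, server to router", 12, 40),
]

for g in groups:
    total = g.aggregate_gbps()
    print(f"{g.name}: {total:.0f} Gbit/s per side, >= {WAN_GBPS}G: {total >= WAN_GBPS}")

# TUD router: 2 x 6-port 40GbE IMMs = 12 ports, matching the 12 server links per side.
tud_40g_ports = 2 * 6
print(tud_40g_ports == groups[1].links_per_side)  # True
```

Both groups exceed 400 Gbit/s per side on paper (672 and 480 Gbit/s), leaving some headroom for overhead.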
• IBM iDataPlex dx360 M4, volume: 12 @ RZG
• Alcatel-Lucent 2-port 100GbE IMM, volume: 3 @ TUD
• Alcatel-Lucent 1-port 100GbE IMM, volume: 4 @ RZG
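The 100GbE module counts translate into router port counts per site. This sketch multiplies modules by ports per module and compares the result with the 4 × 100GbE feeding the 400G WDM super channel (as labeled on the DATE Phase 1 slide). That each of those four interfaces needs one router port per site is an assumption; the slides do not describe the port allocation.

```python
# 100GbE router ports per site, computed from the module counts on the slide.
sites = {
    "TUD": {"modules": 3, "ports_per_module": 2},  # 2-port 100GbE IMM
    "RZG": {"modules": 4, "ports_per_module": 1},  # 1-port 100GbE IMM
}
# The super channel carries 4 x 100GbE; assumption: each needs one router port per site.
SUPER_CHANNEL_CLIENTS = 4

ports = {site: cfg["modules"] * cfg["ports_per_module"] for site, cfg in sites.items()}
for site, count in ports.items():
    print(f"{site}: {count} x 100GbE ports, enough for {SUPER_CHANNEL_CLIENTS}: {count >= SUPER_CHANNEL_CLIENTS}")
```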
Connection TUD Router – TUD 10GbE Cluster
Type: LC duplex multimode (10GBASE-SR)
Length: ?
Volume: 17
Volume: 36 PCIe cards on each side (72 total)
Project Lifetime
[Timeline diagram labels: Demonstrator Setup; Getting through Test Item List (TIL); closing WS 21 June – ?]
DATE: Demonstrator Application Test Environment
Objective
• The 3 weeks between RfS IP and the start of ISC are by far not enough time to get two highly sophisticated applications running at 400G.
• Therefore, supporting the applications as early as possible is an integral part of the project; the application teams get access to new building blocks of the ‘big picture’ as soon as possible.
1 April – 17 May
DATE Phase 1
[Diagram labels: Router / Switch; 400G WDM super channel (4 × 100GbE); each customer configures their own firewall (FW) entity]
NFV Objectives & Comments
Objectives
• Are 5 GByte/s of firewalling / encryption / compression feasible with standard hardware (2 × E5-2670 & PCIe 3.0)?
• Even when running under a hypervisor?
• What is the impact on the application?
Comments
• In ‘real life’, network functions such as encryption, firewalling and compression can become very important, even in HPC environments.
• There is no need for 400 Gbit/s; 100 Gbit/s is equivalent.
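The 5 GByte/s target is easier to relate to the link rates when converted to bits: 5 GByte/s × 8 = 40 Gbit/s, the line rate of one 40GbE port. This sketch assumes the target applies per server and that throughput adds up linearly across servers, which is a simplification; the function names are hypothetical.

```python
import math


def gbyte_to_gbit(gbyte_per_s: float) -> float:
    """Convert GByte/s to Gbit/s (8 bits per byte)."""
    return gbyte_per_s * 8


def servers_needed(link_gbps: float, per_server_gbps: float) -> int:
    """Servers required to fill a link, assuming throughput adds up linearly."""
    return math.ceil(link_gbps / per_server_gbps)


per_server = gbyte_to_gbit(5)  # NFV target, assumed to be per server
for link in (100, 400):
    print(f"{link} Gbit/s: {servers_needed(link, per_server)} servers at {per_server:.0f} Gbit/s each")
```

Under these assumptions, 3 servers would saturate 100 Gbit/s and 10 servers 400 Gbit/s.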
Load Balancer / Bundling / Performance / CoS / FCAPS Setup
Test Item GPFS Network
• How does GPFS behave when data is mirrored not between 2 but between 3 sites (option: 4 sites, hub at TUD)? (Server overload?)
• What happens if, e.g., one site is connected with lower bandwidth? Can GPFS still keep up?
Comments
• There is no need for 400 Gbit/s; 100 Gbit/s is equivalent.
Test Item SDN
Comments
• Set up a virtual SDN environment between ZIH & RZG using vSwitch
• Integrating active network elements with OpenFlow support (e.g., also Barracuda's SDN gateway) would be desirable
• Comparison of various available OpenFlow controllers (Beacon, Floodlight, FlowER, OpenDaylight, ...)
• Time frame: 1 month, but can run in parallel with other investigations
Since there will probably not be much time available, I believe this is already very ambitious. I will have to see how many controllers can be set up and tested. However, I would build the environment so that I can continue working with it after the 400G showcase.
Test Item RDMA over Ethernet
Comments
• Analysis of existing protocols for RDMA over Ethernet
• Comparison with RDMA over InfiniBand
• Performance analysis and optimization within a 40GbE test environment
• Comparison of the results with FDR InfiniBand