Page 1
Program Systems Institute of the Russian Academy of Sciences
Supercomputer Projects SKIF and SKIF-GRID of Russia and Belorussia
Sergei Abramov
Workshop introducing the AURORA project
4 June 2009, Conference Room FBK, via Sommarive, 18 - Povo, Trento, Italy
SKIF-AURORA Project
Page 2
Program Systems Institute of RAS — Supercomputer Project «SKIF-GRID»
Outline
Pereslavl-Zalessky and Program Systems Institute of the RAS: Short introduction
Supercomputer Projects SKIF and SKIF-GRID of Russia and Belorussia
2008–2010: Series 4 of SKIF supercomputers == SKIF-AURORA
Selected Topics
• Management Subsystem
• 3D-torus Interconnect
• Combining standard CPUs and FPGA-accelerators
Conclusion
April 19, 2023 © PSI RAS, SKIF-GRID, 2009 All rights reserved 2
Page 3
Pereslavl-Zalessky and Program Systems Institute of the RAS: Short introduction
Page 4
Pereslavl-Zalessky
Beautiful ancient Russian town, 860 years old
A city at the center of the Russian Golden Ring
Hometown of Grand Dukes of Russia
The first building site of Peter the Great's navy
Ancient capital of the Russian Orthodox Church
[Map: Moscow to Pereslavl-Zalessky, 120 km]
Page 5
PSI RAS, Pereslavl-Zalessky
Page 6
Foundation of the Institute
The Program Systems Institute was founded in 1984 by a decree of the USSR government. The foundation was aimed at the development of computer science in the country.
The first director of the Institute (1984–2003) was Prof. A. Ailamazyan.
Page 7
2009: Organization of the Institute
Artificial Intelligence Research Center
Medical Informatics Research Center
Research Center for Multiprocessor Systems
System Analysis Research Center
Control Processes Research Center
Scientific and Educational Center: International Children’s Computer Center
Page 8
Ailamazyan University of Pereslavl
Page 9
Supercomputer Projects SKIF and SKIF-GRID of Russia and Belorussia
Page 10
SKIF and SKIF-GRID Supercomputing Projects
Joint Supercomputing Projects of the Russian Federation and the Republic of Belarus
R&D across all areas and levels of supercomputer and grid technologies: hardware, operating systems, parallel programming systems, applications, etc.
SKIF: 2000–2004, 10 + 10 = 20 organizations
SKIF-GRID: 2007–2010, 12 + 23 = 35 organizations
PSI RAS is the lead organization from the Russian Federation
Page 11
SKIF-GRID Project organization
Project directions
1. Grid technology
2. Supercomputers
• SW
• HW
3. Security
4. Pilot projects: applications of HPC and grid technology
Page 12
Series 1, 2, and 3 of the SKIF supercomputers
Performance is given as peak/Linpack, in Tflops.
Series 1 (2000–2003)
• 2000: SKIF Firstborn, 0.02/0.011
• 2001: SKIF VM-5100, 0.048/0.026
• 2003: SKIF ES1710.03, 0.04/0.023
Series 2 (2003–2007)
• 2003: SKIF Forge-32, 0.1/0.074
• 2003: SKIF K-500, 0.717/0.417
• 2004: SKIF K-1000, 2.53/2.03
Series 3 (2007–2008)
• 2007: SKIF Cyberia, 12/9.01
• 2008: SKIF Ural, 15.94/12.2
• 2008: SKIF MSU, 60/47.17
Page 13
Flagship of SKIF supercomputers: SKIF MSU (March 2008)
June 2008: #36 in Top500
Peak performance: 60 Tflops, Linpack: 47 Tflops
Original blade design
CPU model: 4-core Intel Xeon E5472, 3.0 GHz
Nodes (dual CPU): 625
CPU cores total: 5,000
Interconnect: Infiniband DDR, Fat Tree
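The core count and peak figure for SKIF MSU can be cross-checked from the node count and clock rate. The sketch below does that arithmetic. The value of 4 double-precision floating-point operations per core per cycle is an assumption about the Xeon E5472 that is not stated on the slide, and the 47.17 Tflops Linpack result is the more precise figure quoted elsewhere in this deck.

```python
# Figures taken from the SKIF MSU slide.
NODES = 625
CPUS_PER_NODE = 2
CORES_PER_CPU = 4
CLOCK_GHZ = 3.0
LINPACK_TFLOPS = 47.17

# Assumption (not on the slide): each core completes 4 double-precision
# floating-point operations per clock cycle.
FLOPS_PER_CYCLE = 4


def total_cores(nodes, cpus_per_node, cores_per_cpu):
    """Number of CPU cores in the whole machine."""
    return nodes * cpus_per_node * cores_per_cpu


def peak_tflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak: cores * GHz * flops/cycle is Gflops; /1000 gives Tflops."""
    return cores * clock_ghz * flops_per_cycle / 1000.0


cores = total_cores(NODES, CPUS_PER_NODE, CORES_PER_CPU)
peak = peak_tflops(cores, CLOCK_GHZ, FLOPS_PER_CYCLE)
efficiency = LINPACK_TFLOPS / peak  # fraction of peak reached on Linpack

if __name__ == "__main__":
    print(f"cores: {cores}")
    print(f"peak: {peak:.1f} Tflops")
    print(f"Linpack efficiency: {efficiency:.1%}")
```

Under that assumption the result agrees with the quoted 5,000 cores and 60 Tflops, and the Linpack efficiency comes out a little under 79%.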
Page 14
[Chart: Linpack performance in Gflops (logarithmic axis, 100 to 10,000,000) for Top500 ranks 1, 10, 100, 200, 300, 400 and 500, with six "Made in Russia" systems marked 1–6]
Made in Russia (Linpack/peak, Tflops):
• June 2002: MVS 1000M, 0.734/1.024
• November 2003: SKIF K-500, 0.423/0.717
• November 2004: SKIF K-1000, 2.032/2.534
• February 2007: SKIF Cyberia, 9.013/12.002
• May 2008: SKIF Ural, 12.2/15.9
• May 2008: SKIF MSU, 47.1/60
Only six supercomputers developed in Russia have been ranked in the Top500. Five of them are SKIFs.
Page 15
Series 1, 2, 3 and 4 of SKIF supercomputers
[Chart: Linpack performance in Gflops (logarithmic axis, 10 to 100,000,000) for Top500 ranks 1, 10, 100, 200, 300, 400 and 500 and for the top SKIF system (TopSKIF)]
Completed: Series 1–3
• Series 1: 11 Gflops Firstborn, 26 Gflops VM5100, 57 Gflops Firstborn-M
• Series 2: 472 Gflops SKIF K-500, 2032 Gflops SKIF K-1000
• Series 3: 9 Tflops SKIF Cyberia, 12.2 Tflops SKIF Ural, 47.17 Tflops SKIF MSU
Nearest plan: Series 4
• 3Q 2009: SKIF P-0.5
• 3Q 2010: SKIF P-1.0
• 1Q 2012: SKIF P~5.0
Page 16
2008–2010: Series 4 of SKIF supercomputers
Page 17
SKIF Series 4: Aims of R&D
Highest density of performance (the largest possible number of CPUs per 1U)
• Lower latency
• Fewer cables and connectors, hence better reliability
• More heat emitted per 1U
• We need a new cooling technology… How to?
Improved interconnect: we need better scalability, bandwidth and latency than the best available solutions (e.g. Infiniband QDR) provide
New approach to monitoring and management of the supercomputer
Combining standard CPUs and accelerators in computational nodes of the supercomputer
Page 18
Spring 2008, SKIF Series 4: How To?
How to increase the number of CPUs per 1U?
How to cool supercomputer nodes?
How to develop an improved interconnect?
How to combine standard CPUs and accelerators?
How to develop the management subsystem?
SKIF Series 4 is an extremely complex project. We need strong partners!
Page 19
Summer 2008, SKIF Series 4: Know How!
Italian-Russian Cooperation
«SKIF Series 4» == «SKIF-AURORA Project»
Designed by an alliance of Eurotech, PSI RAS and RSC SKIF, with support from Intel
Page 20
SKIF-AURORA: Designed by the alliance of Eurotech, PSI RAS and RSC SKIF
PCBs, schematics, mechanics, power supply, cooling, levels 1 and 2 of the management system
Level 3 of the management system, interconnect (3D-torus: firmware, routing, drivers, MPI-2…), FPGA as accelerator
Page 21
SKIF-AURORA: State of the Project
Page 22
Node Card
Page 23
PSU Card
Page 24
Root Card
Page 25
Chassis
Page 26
Chassis
Page 27
Chassis
Page 28
ISC’09, Hamburg, June 23–25, 2009
Page 29
Rack
Page 30
System: Project SKIF-Aurora, 500 Tflops
Page 31
SKIF-AURORA: Management Subsystem
Page 32
Subjects of Management Subsystem
1 Pflops = 42 racks == 10,752 nodes + 672 DC/DC trays + 672 root nodes
For scalability we need a robust and redundant management subsystem
Comprehensive monitoring and control in all situations
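The scale that the management subsystem has to cover follows from the totals quoted for a 1 Pflops system. The sketch below breaks those totals down per rack. It assumes the components are spread evenly over the racks, which the slide does not state explicitly, and the helper name `per_rack` is illustrative only.

```python
# Totals quoted for a 1 Pflops SKIF-AURORA system.
RACKS = 42
NODES = 10_752
DCDC_TRAYS = 672
ROOT_NODES = 672
SYSTEM_TFLOPS = 1000.0  # 1 Pflops


def per_rack(total, racks):
    """Split a total evenly over the racks; fail if it does not divide."""
    quotient, remainder = divmod(total, racks)
    if remainder:
        raise ValueError(f"{total} is not divisible by {racks}")
    return quotient


# Assumption: every rack holds the same number of each component.
nodes_per_rack = per_rack(NODES, RACKS)
trays_per_rack = per_rack(DCDC_TRAYS, RACKS)
root_nodes_per_rack = per_rack(ROOT_NODES, RACKS)

nodes_per_root_node = NODES // ROOT_NODES
managed_devices = NODES + DCDC_TRAYS + ROOT_NODES
tflops_per_rack = SYSTEM_TFLOPS / RACKS

if __name__ == "__main__":
    print(f"nodes per rack: {nodes_per_rack}")
    print(f"DC/DC trays per rack: {trays_per_rack}")
    print(f"root nodes per rack: {root_nodes_per_rack}")
    print(f"nodes per root node: {nodes_per_root_node}")
    print(f"devices to monitor: {managed_devices}")
    print(f"Tflops per rack: {tflops_per_rack:.1f}")
```

With an even split this gives 256 nodes, 16 DC/DC trays and 16 root nodes per rack, and just over 12,000 devices to monitor in total.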
Page 33
1st Level of Management Subsystem
Standard solution: IPMI over TCP/IP (Infiniband)
Available when nodes, root card, and IB network are powered on and working properly
Root cards and DC/DC trays are not covered by monitoring and control
Page 34
2nd Level of Management Subsystem
Catalyst module on the root card implements node power control and serial console for the nodes
Available when root card and IB network are powered on and working properly
Root cards and DC/DC trays are not covered by monitoring and control
Page 35
3rd Level of Management Subsystem
SKIF Servnet: independent sensor network
Always available: uses a dedicated power network; power consumption: 3 W per chassis
Accessible over a dedicated network: Ethernet + CANbus + I2C
Monitors temperature, humidity, and supply voltages on node cards, root card, and DC/DC tray, and transfers this information to the 2nd level (to Catalyst)
Can turn off the DC/DC PSU in case of emergency
The turn-off decision is made locally by an ARM microcontroller located on the root card
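The local turn-off decision described for the 3rd level can be pictured as a threshold check over the monitored quantities. The following is only a minimal sketch of that idea, not the Servnet firmware: the real logic runs on an ARM microcontroller, and every name and limit value below (`Reading`, `Limits`, 70 °C, 80 %, 11–13 V) is a hypothetical placeholder chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """One sample of the quantities the slide says are monitored."""
    temperature_c: float
    humidity_pct: float
    supply_voltage_v: float


@dataclass(frozen=True)
class Limits:
    """Hypothetical safe operating limits (not taken from the slides)."""
    max_temperature_c: float = 70.0
    max_humidity_pct: float = 80.0
    min_supply_voltage_v: float = 11.0
    max_supply_voltage_v: float = 13.0


def emergency_reasons(reading, limits=Limits()):
    """Return the names of all monitored quantities that are out of range."""
    reasons = []
    if reading.temperature_c > limits.max_temperature_c:
        reasons.append("temperature")
    if reading.humidity_pct > limits.max_humidity_pct:
        reasons.append("humidity")
    if not (limits.min_supply_voltage_v
            <= reading.supply_voltage_v
            <= limits.max_supply_voltage_v):
        reasons.append("supply voltage")
    return reasons


def should_turn_off_psu(reading, limits=Limits()):
    """Local decision: cut the DC/DC PSU if anything is out of range."""
    return bool(emergency_reasons(reading, limits))


if __name__ == "__main__":
    sample = Reading(temperature_c=85.0, humidity_pct=40.0, supply_voltage_v=12.0)
    print(should_turn_off_psu(sample), emergency_reasons(sample))
```

Keeping the decision a pure function of the latest local reading reflects the point made on the slide: the shutdown does not depend on the nodes, the Infiniband network, or a central controller being reachable.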
Page 36
SKIF-AURORA Management Subsystem: Total monitoring and control
3-way redundant
Designed for “dark datacenter”
Robust management subsystem
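The availability conditions stated for the three management levels can be written down as a small truth function, which makes the redundancy explicit. This is a sketch of the stated conditions only; the function name and its boolean parameters are illustrative and do not correspond to any real SKIF-AURORA interface.

```python
def available_levels(nodes_ok, root_card_ok, ib_network_ok):
    """Return the management levels that can be used in a given state.

    Level 1 (IPMI over TCP/IP) needs nodes, root card and IB network.
    Level 2 (Catalyst module) needs root card and IB network.
    Level 3 (SKIF Servnet) is described as always available, because it
    has its own dedicated power and network.
    """
    levels = []
    if nodes_ok and root_card_ok and ib_network_ok:
        levels.append(1)
    if root_card_ok and ib_network_ok:
        levels.append(2)
    levels.append(3)
    return levels


if __name__ == "__main__":
    for state in [(True, True, True), (False, True, True), (True, True, False)]:
        print(state, "->", available_levels(*state))
```

Whatever combination of components fails, the function never returns an empty list, which is the sense in which the subsystem suits an unattended "dark datacenter".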
Page 37
SKIF-AURORA Management Subsystem: Total monitoring and control
Page 38
3D-torus Interconnect
Page 39
3D-torus Interconnect
The Italian team has implemented only the QCD-specific functionality
Russian teams are to upgrade the network to a general-purpose interconnect (MPI 2.0)
Due to appear in fall 2009
Support and improvements in 2010–2012
Page 40
3D-torus Interconnect. Current status
Simple routing implemented on a prototype (SKIFino)
Routing on a single-FPGA prototype is working
MPI is based on the MPICH2 codebase (prototyped)
MPICH2 self-test implemented
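For readers unfamiliar with the topology, the sketch below shows what addressing and a simple routing scheme on a 3D torus look like. It uses dimension-order routing as a stand-in, which is an assumption: the slides do not say which algorithm the SKIFino prototype implements. The 4 × 4 × 4 size and all function names are likewise illustrative.

```python
def neighbors(node, dims):
    """The six direct neighbours of a node, with wrap-around on every axis.

    They are six distinct nodes when every dimension has at least 3 nodes.
    """
    x, y, z = node
    nx, ny, nz = dims
    return [
        ((x + 1) % nx, y, z), ((x - 1) % nx, y, z),
        (x, (y + 1) % ny, z), (x, (y - 1) % ny, z),
        (x, y, (z + 1) % nz), (x, y, (z - 1) % nz),
    ]


def ring_step(src, dst, size):
    """Direction (-1, 0 or +1) of the shorter way around one ring."""
    forward = (dst - src) % size
    if forward == 0:
        return 0
    return 1 if forward <= size - forward else -1


def route(src, dst, dims):
    """Dimension-order route: finish the x axis, then y, then z."""
    current = list(src)
    path = [tuple(current)]
    for axis in range(3):
        while current[axis] != dst[axis]:
            step = ring_step(current[axis], dst[axis], dims[axis])
            current[axis] = (current[axis] + step) % dims[axis]
            path.append(tuple(current))
    return path


def hop_distance(src, dst, dims):
    """Minimal number of hops between two nodes on the torus."""
    return sum(min((d - s) % n, (s - d) % n) for s, d, n in zip(src, dst, dims))


if __name__ == "__main__":
    dims = (4, 4, 4)  # hypothetical torus size
    print(neighbors((0, 0, 0), dims))
    print(route((0, 0, 0), (3, 2, 0), dims))
    print(hop_distance((0, 0, 0), (3, 2, 0), dims))
```

Because of the wrap-around links, the route from (0, 0, 0) to (3, 2, 0) takes one hop backwards along x rather than three hops forwards, so the path length equals the minimal hop distance.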
Page 41
R&D Directions Using FPGA Resources
Collective MPI operations using FPGA
FPGA to facilitate support of PGAS languages (UPC, Titanium, etc.)
FPGA+CPU hybrid computing
[Diagram: System Interconnect (3D-torus); Subsidiary Interconnect (Infiniband); a row of FPGAs and a row of CPUs; labels: standard part, non-standard part]
Page 42
Conclusions
Page 43
Conclusions
SKIF-AURORA project
• Is based on collaboration between international teams
• Harnesses shared expertise and results
• Aims to develop a family of top-level supercomputers with innovative techniques:
  • Higher density of CPUs (flops per volume)
  • Efficient water cooling system
  • Efficient power supply system
  • Scalable, powerful 3D-torus interconnect
  • The most modern standard CPUs for computation and FPGAs for its acceleration
  • Redundant, robust management subsystem
  • Etc.
Page 44
Conclusions
The collaboration between Italian and Russian teams
• Makes it possible to obtain world-class supercomputer technologies
• Provides leading positions in the supercomputer industry (at least in the near future) for all participants of the collaboration
• Makes all results available in reasonable time and with reasonable effort and resources
Page 45
Thank you for your attention!