Specialized Supercomputers Piero Vicini INFN Istituto Nazionale di Fisica Nucleare Italian National Institute for Nuclear Physics.

Specialized Supercomputers

Piero Vicini, INFN

Istituto Nazionale di Fisica Nucleare (Italian National Institute for Nuclear Physics)

Dedicated SuperComputing

• WHY
– The Scientific Case
– Custom vs Commodity
– Italian Experience

• APE project

• HOW TO
– The international scenario
– Petaflops machine

• Some ideas

• TOOLS
– EU funding
– National funding

SuperComputing: the Scientific Case

Large Scale numerical applications

– Astrophysics and Plasma Physics

• Today: 70-100 TF/s, 2009: >500 TF/s

• Dedicated architecture: Grape (Japan/Europe)

– High-Energy Physics (LQCD)

• Today: 10-50 TF/s, several projects; 2009: 500-1000 TF/s

• Dedicated architecture: APE (Europe), QCDOC(USA/UK)

– Weather, Climatology, Earth sciences

• Today: 10-30 TF/s, 2009: several projects for 200-300 TF/s aggregate power

• Dedicated architecture: Earth Simulator (Japan)

– Life Sciences (molecular dynamics, protein folding, in silico drug design,…)

• Today:…., 2009-2010: > N*Petaflops

• Dedicated architecture: IBM Blue/Gene (USA)

– .........

Dedicated vs General Purpose Parallel Machine

• Processor level -> very well balanced architecture
– Computing unit designed to be very efficient on the kernels of (several) classes of applications
– Integration of “unusual” memory interfaces based on large register file, huge multiport,….
– Integration of optimized interconnection network (low latency, high bandwidth)

[Table: QCD benchmarks; machine names in the column header not recovered]
Eff. (H): 0.56 0.53 0.27* 0.11 0.42 0.05
* Communication overhead not included

Dedicated vs General Purpose Parallel Machine(2)

• System level: dense, safe and cheap systems
– Very high ratio of Flops/Watt
– Very high ratio of Flops/Volume
– Cost effective systems
  • 0.5 €/Mflops
  • Very low cost maintenance

[Figure: two charts comparing apeNEXT with other systems; data labels 3670, 80, 46 and 3670, 72, 50972; axis names not recovered]

The Italian experience: ape project

Our line of Home Made Computers …

Teams:
– APE (1988): Italian research team
– APE100 (1993): Italian research team
– APEmille (1999): European research team + Industry (QSW, Eurotech)
– apeNEXT (2004): European research team + Industry (Eurotech)

Machine               APE (1988)    APE100 (1993)   APEmille (1999)   apeNEXT (2004)
Architecture          SIMD          SIMD            SIMD              SIMD++
Comp. nodes           16            2048            2048              4096
Interc. topology      flexible 1D   rigid 3D        flexible 3D       flexible 3D
Memory size           256 MB        8 GB            64 GB             1 TB
Registers (w. size)   64 (x32)      128 (x32)       512 (x32)         512 (x64)
Clock speed           8 MHz         25 MHz          66 MHz            200 MHz
Peak power            1 GFlops      100 GFlops      1 TFlops          7 TFlops

apeNEXT architecture

• 3D mesh of computing nodes
– Custom VLSI processor - 200 MHz (J&T)
– 1.6 GFlops per node (complex “normal” operation, a*b+c)
– 256 MB (1 GB) memory per node

• First neighbor communication network, “loosely synchronous”
– YZ internal, X on cables
– r = 8/16 => 200 MB/s per channel

• Scalable 25 GFlops -> 6 TFlops
– Processing Board 4 x 2 x 2 ~ 26 GF
– Crate (16 PB) 4 x 8 x 8 ~ 0.5 TF
– Rack (32 PB) 8 x 8 x 8 ~ 1 TF
– Large systems (8*n) x 8 x 8

• Linux PCs as Host system

[Figure: apeNEXT processing board with 16 nodes (0-15), each a J&T processor with DDR-MEM and links X+ … Z-; Y+ and Z+ links on the backplane (bp), X+ links on cables]
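The scaling figures on this slide can be checked with a few lines of arithmetic. The sketch below simply multiplies the lattice dimensions by the 1.6 GFlops per-node peak quoted above; the function and dictionary names are illustrative, not part of any APE software.

```python
# Peak performance of apeNEXT configurations from their 3D lattice dimensions.
# Dimensions and the per-node peak are taken from the slide; names are illustrative.
NODE_PEAK_GFLOPS = 1.6

CONFIGS = {
    "processing board": (4, 2, 2),
    "crate (16 PB)": (4, 8, 8),
    "rack (32 PB)": (8, 8, 8),
}


def node_count(dims):
    """Number of computing nodes in an x * y * z lattice."""
    x, y, z = dims
    return x * y * z


def peak_gflops(dims, node_peak=NODE_PEAK_GFLOPS):
    """Peak performance of the lattice in GFlops."""
    return node_count(dims) * node_peak


def large_system_dims(n):
    """Dimensions of a large system built as (8*n) x 8 x 8."""
    return (8 * n, 8, 8)


if __name__ == "__main__":
    for name, dims in CONFIGS.items():
        print(f"{name}: {node_count(dims)} nodes, {peak_gflops(dims):.1f} GFlops")
    dims = large_system_dims(8)
    print(f"n=8: {node_count(dims)} nodes, {peak_gflops(dims) / 1000:.2f} TFlops")
```

Straight multiplication gives slightly lower values for the crate and rack than the rounded "~ 0.5 TF" and "~ 1 TF" on the slide, so those figures are best read as approximate.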

Evaluating the success of APE(1)

APEmille (2000):
– Italy 1365 GF
– Germany 650 GF
– UK 65 GF
– France 16 GF
– Total 2 TF

apeNEXT (2005):
– Development costs = 2000 k€
  • 1100 k€ VLSI NRE
  • 250 k€ non-VLSI NRE
  • 650 k€ prototype procurement
– Manpower = 20 man-years
– Mass production cost ~ 0.5 €/Mflops
– Installations:
  • Italy 10.6 TF
  • Germany 8.0 TF
  • France 1.6 TF
  • Total 20.2 TF
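As a cross-check of the totals on this slide, the sketch below sums the installation and cost figures. The last function extrapolates a hardware cost from the 0.5 €/Mflops mass-production price; applying that price uniformly to every installed TFlops is an assumption made here for illustration, not a number given in the talk.

```python
# Consistency check of the APEmille / apeNEXT figures quoted on the slide.
APEMILLE_GF = {"Italy": 1365, "Germany": 650, "UK": 65, "France": 16}
APENEXT_TF = {"Italy": 10.6, "Germany": 8.0, "France": 1.6}
DEV_COSTS_KEUR = {
    "VLSI NRE": 1100,
    "non-VLSI NRE": 250,
    "prototype procurement": 650,
}
MASS_PRODUCTION_EUR_PER_MFLOPS = 0.5


def total(values):
    """Sum the values of a {label: amount} dictionary."""
    return sum(values.values())


def production_cost_meur(tflops, eur_per_mflops=MASS_PRODUCTION_EUR_PER_MFLOPS):
    """Hypothetical hardware cost in M€, assuming a uniform price per Mflops."""
    mflops = tflops * 1e6
    return mflops * eur_per_mflops / 1e6


if __name__ == "__main__":
    print("APEmille total (GF):", total(APEMILLE_GF))
    print("apeNEXT total (TF):", round(total(APENEXT_TF), 1))
    print("Development costs (k€):", total(DEV_COSTS_KEUR))
    print("Extrapolated production cost (M€):",
          round(production_cost_meur(total(APENEXT_TF)), 1))
```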

Evaluating the success of APE(2)

• Scientific, technological and social impacts:
– APE is the “de facto” standard in the European LQCD computing area
– Huge number of scientific and technological (HW, SW, Architecture) papers
– Establishment of an international computing facility fully dedicated to scientific numerical computing
  • Laboratorio di Calcolo apeNEXT: 12 TFs installed, opening on February 8th
– Strategic opportunities to increase national (European) industry capability
  • Eurotech: INFN collaboration -> HPC division, market expansion, international visibility
  • Finmeccanica/QSW
– Training, dissemination and establishment of spin-off companies
  • Atmel/Ipitec
  • Nergal
  • Digital Video
  • Venere

What’s next after apeNEXT?: scenario

• In the future (2010) the required computing platform for numerical “large-scale” applications will be of the order of PetaFlops

• The International scenario
– Today (www.top500.org):
  • IBM Blue/Gene: dedicated architecture (very similar to APE….), N*100 TFlops
  • Earth Simulator: N*10 TFlops
  • PC Clusters approach: N*10 TFlops
– Future (2010 and beyond):
  • USA: IBM, Blue/Gene evolution, N*Petaflops
  • Japan: NEC/Hitachi/University, 3 Petaflops for biotech and nanotech, custom silicon, custom interconnect
  • Japan: Fujitsu, 3 Petaflops, cluster approach with optical interconnection

• Europe?

Brainstorming

• Silicon shrink
– apeNEXT: 0.18 um
– Today: 0.13 um
– Next years: 0.09 – 0.065 um

[Figure: Die area per FP Node. Area (mm2, axis 0-90) vs Silicon Process (um) at 0.18, 0.13, 0.09, 0.06; data labels as extracted: 1319, 78, 39]

Worst case: 6 computing Nodes per chip (Tiled architecture)
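To give a feel for the shrink argument, here is a minimal sketch that assumes ideal scaling, i.e. die area proportional to the square of the feature size. That scaling law is a textbook approximation introduced here for illustration; the talk's own chart values may follow a different, more conservative model.

```python
# Ideal-shrink sketch: area of a fixed design scales with (feature size)^2.
# The quadratic law and the function names are assumptions, not from the slides.
REFERENCE_PROCESS_UM = 0.18  # apeNEXT process


def relative_area(process_um, reference_um=REFERENCE_PROCESS_UM):
    """Area of the same design at process_um, relative to the reference process."""
    return (process_um / reference_um) ** 2


def nodes_per_reference_area(process_um, reference_um=REFERENCE_PROCESS_UM):
    """How many shrunk nodes fit in the area of one reference-process node."""
    return 1.0 / relative_area(process_um, reference_um)


if __name__ == "__main__":
    for process in (0.18, 0.13, 0.09, 0.06):
        print(f"{process:.2f} um: relative area {relative_area(process):.2f}, "
              f"nodes per reference area {nodes_per_reference_area(process):.1f}")
```

Under this idealized law about nine nodes would fit at 0.06 um in the area one node occupies at 0.18 um, so the "worst case" of 6 nodes per chip quoted on the slide sits below the ideal bound.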

Brainstorming(2)

• Performance scaling
– Clock frequency scales with silicon process
– Power consumption decreases with silicon process (est. 0.3 W/Gflops)
– Architecture: Multi-Tiles versus Single-Tile

[Figure: performance (GFs, axis 0-30) vs Silicon Process (um)]

Silicon Process (um)      0.18   0.13   0.09   0.06
Multi-Tiles perf (GFs)     1.6    4.8   13.1   24.0
Single-Tile perf (GFs)     1.6    2.4    3.2    4.0
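The two curves in this chart can be compared directly. The sketch below uses the data labels read off the chart and computes the ratio of multi-tile to single-tile performance at each process step, plus the power implied by the estimated 0.3 W/Gflops. Interpreting the ratio as an effective number of tiles per chip is an assumption made here.

```python
# Multi-tile vs single-tile performance, using the data labels from the chart.
PROCESS_UM = [0.18, 0.13, 0.09, 0.06]
MULTI_TILE_GF = [1.6, 4.8, 13.1, 24.0]
SINGLE_TILE_GF = [1.6, 2.4, 3.2, 4.0]
WATT_PER_GFLOPS = 0.3  # estimate quoted on the slide


def implied_tiles(multi, single):
    """Ratio multi/single per process step (read here as effective tiles per chip)."""
    return [m / s for m, s in zip(multi, single)]


def power_watts(gflops, watt_per_gflops=WATT_PER_GFLOPS):
    """Power implied by a flat W/Gflops estimate."""
    return gflops * watt_per_gflops


if __name__ == "__main__":
    ratios = implied_tiles(MULTI_TILE_GF, SINGLE_TILE_GF)
    for process, ratio, gf in zip(PROCESS_UM, ratios, MULTI_TILE_GF):
        print(f"{process:.2f} um: {ratio:.2f} x single tile, "
              f"about {power_watts(gf):.1f} W at {WATT_PER_GFLOPS} W/Gflops")
```

At 0.06 um the ratio comes out at 6, which matches the "6 computing nodes per chip" worst case mentioned on the previous slide.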

Brainstorming(3)

• “Smart memory architecture” and new “3D Engineering”
– On chip large and hierarchical memory buffers -> reduction of components per board
– Processing board “sandwich” (stacked) -> surface distributed network connectors
– 512 FP Nodes per board

Worst case: Factor 100 in 5 years…

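[Figure: rack computing power (TFs, log axis 0.1-1000) vs Silicon Process; annotations: “3D Eng. Multi Tiles”, “apeNEXT rack”]

Silicon Process (um)                  0.18   0.13    0.09    0.06
3D Eng Rack Comp. Power (TFs)         26.2   39.3    52.4    65.5
apeNEXT Eq. Rack Comp. Power (TFs)     0.8    1.2     1.6     2.0
3D Eng. Multi Tiles (TFs)             26.2   78.6   215.2   393.2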
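The rack-level numbers in this chart appear to follow from simple multiplication. The sketch below assumes a rack of 32 boards (as in the apeNEXT rack) with 512 units per board, each delivering the per-chip performance from the previous slide; both the 32-board rack and this reading of the chart are assumptions, and one chart value (215.2) is only matched approximately.

```python
# Rack computing power under the "3D Engineering" assumptions.
# BOARDS_PER_RACK is an assumption carried over from the apeNEXT rack (32 PB).
UNITS_PER_BOARD = 512
BOARDS_PER_RACK = 32
APENEXT_NODES_PER_RACK = 512  # 8 x 8 x 8 lattice


def rack_tflops(gflops_per_unit, units_per_board=UNITS_PER_BOARD,
                boards_per_rack=BOARDS_PER_RACK):
    """Rack peak in TFlops for a given per-unit performance in GFlops."""
    return gflops_per_unit * units_per_board * boards_per_rack / 1000.0


def apenext_rack_tflops(gflops_per_node, nodes=APENEXT_NODES_PER_RACK):
    """apeNEXT-equivalent rack peak in TFlops."""
    return gflops_per_node * nodes / 1000.0


if __name__ == "__main__":
    for label, perf in (("single tile", [1.6, 2.4, 3.2, 4.0]),
                        ("multi tiles", [1.6, 4.8, 13.1, 24.0])):
        print(label, [round(rack_tflops(p), 1) for p in perf])
    print("apeNEXT eq.", [round(apenext_rack_tflops(p), 1)
                          for p in (1.6, 2.4, 3.2, 4.0)])
```

Under these assumptions the multi-tile rack at 0.13 um comes out roughly two orders of magnitude above a 0.18 um apeNEXT rack, which is in line with the "factor 100" quoted above.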

PetaFlops class computer proposal

• Leverage on European leadership in embedded processor technology

• European collaboration (research + industry) to design a new computing architecture for scientific and engineering numerical applications

• Parameters:
– (Less) dedicated architecture suitable for future grand-challenge applications
– 0.5/1 PetaFlops system (factor 50 better than apeNEXT)
– 300 W/TeraFlops
– 10 KEuro/Teraflops (factor 50 better than apeNEXT)
– Programming environment to produce parallel code with very high efficiency
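The target parameters translate into system-level numbers as follows. This is a back-of-the-envelope sketch: it scales the per-TeraFlops targets linearly to a full system and compares the cost target with the apeNEXT mass-production price of 0.5 €/Mflops quoted earlier; the function names are illustrative.

```python
# Back-of-the-envelope figures for the proposed PetaFlops-class system.
TARGET_W_PER_TFLOPS = 300.0
TARGET_KEUR_PER_TFLOPS = 10.0
APENEXT_EUR_PER_MFLOPS = 0.5


def system_power_kw(pflops, w_per_tflops=TARGET_W_PER_TFLOPS):
    """Total power in kW, assuming linear scaling with peak performance."""
    return pflops * 1000.0 * w_per_tflops / 1000.0


def system_cost_meur(pflops, keur_per_tflops=TARGET_KEUR_PER_TFLOPS):
    """Total cost in M€, assuming linear scaling with peak performance."""
    return pflops * 1000.0 * keur_per_tflops / 1000.0


def apenext_keur_per_tflops(eur_per_mflops=APENEXT_EUR_PER_MFLOPS):
    """apeNEXT price converted from €/Mflops to k€/TFlops."""
    return eur_per_mflops * 1e6 / 1e3


if __name__ == "__main__":
    for pflops in (0.5, 1.0):
        print(f"{pflops} PFlops: {system_power_kw(pflops):.0f} kW, "
              f"{system_cost_meur(pflops):.0f} M€")
    factor = apenext_keur_per_tflops() / TARGET_KEUR_PER_TFLOPS
    print(f"Cost improvement over apeNEXT: factor {factor:.0f}")
```

The cost ratio reproduces the "factor 50" on the slide: 0.5 €/Mflops is 500 k€ per TeraFlops, against a target of 10 k€.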

Tools(1)

• EU Level
– FP6 and beyond

• SHAPES (Scalable sw/Hw Architecture Platform for Embedded Systems)

– FP6-2004-IST-4 2.3.4(viii) Advanced Computing Architectures

– Partners: INFN-Roma, ATMEL-ROMA, ST, TIMA(FR), TARGET COMPILER(BE)….

– Target: technology R&D to study feasibility of 2TFs board in 4 years (Tiled architecture, NoC, Off-chip network and 3D Engineering multi board system)

• HPC Europe Initiative

– Joint action at EU level (France, Germany, UK, Spain + Netherlands, Finland, Italy) to consolidate the European role in supercomputing applications and to ensure the availability of the most advanced supercomputer systems in the EU

– Main target: in 2010, 4 computing centres in Europe equipped with “general purpose” (-> non-European) supercomputers

– 800 MEuro (!) partially funded by EU and national governments

Tools(2)

• National Level
– PNR
  • “High Performance Computing for scientific and engineering applications: architecture, hardware, development software and selected applications”
    – Partners: INFN, EUROTECH, CNR(MI), CILEA, SISSA, UNI MI BICOCCA, UNI PADOVA
    – Target: Petaflops supercomputer suitable for engineering and scientific applications
    – 200 W/Teraflops, 10 KEuro/Teraflops
    – Development cost (+ prototype procurement): 20 Meuro + 40 man-years
    – Project duration: 4 years

Backup slides

apeNET: status and future activities

• Successfully tested a cluster of 16 PCs interconnected via apeNET
– Measured performance >800 MB/s (send-receive) per direction

• SW/HW/FW optimizations
– RDMA, Network driver, LAM/MPI

• INFN Roma2 has funded a 128-node cluster (dual Xeon + apeNET)
– Delivery expected in September 2005

• Future activities:
– PCI-X -> PCI Express
– Integration of uP cores on FPGA
– Development of QCD and molecular dynamics applications

apeNEXT: the latest generation

• Loosely synchronous first-neighbor communication network

• Scalable system from 25 GF to 6 TF
– 16 processors per board (PB)
– Systems of 8x8x16=1024 nodes or 8x8x64=4096 nodes

• “Host system” built from PCs (Linux)

[Figure: apeNEXT processing board with 16 nodes (0-15), each a J&T processor with DDR-MEM and links X+ … Z-; Y+ and Z+ links on the backplane (bp), X+ links on cables]

• 3D lattice of 4096 “computing nodes” (6.5 TF)

– Custom VLSI processor - 200 MHz (J&T)

– 1.6 GFlops per node (a*b+c on complex data)

• 4Q03: !! apeNEXT running !!

apeNEXT

[Photos: J&T Chip layout; PCI Host interface; Processing board; Next Rack]

APENET

• Interconnection network for PC clusters with 3D torus topology
– apeLINK: PCI-X (133MHz) board
  • 6 LVDS links, bidirectional and full-duplex
  • 700 MB/s per link per direction (-> 8.4 GByte/s)
  • Links based on National Instr. SERDES
– Integrated routing and switching capability
– High bandwidth and low latency thanks to the adoption of a “lightweight” protocol
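The 3D torus topology and the aggregate bandwidth figure can be illustrated with a short sketch. It enumerates the six first neighbors of a node with wrap-around and reproduces the 8.4 GByte/s total from 6 links at 700 MB/s per direction. The 4 x 4 x 4 cluster size in the usage lines is a hypothetical example, and none of the function names come from the APENET software.

```python
# First neighbors on a 3D torus and aggregate link bandwidth per board.
# Function names and the example cluster size are illustrative assumptions.
LINKS_PER_BOARD = 6
MB_PER_S_PER_LINK_PER_DIRECTION = 700
DIRECTIONS = 2  # full-duplex links


def torus_neighbors(coord, dims):
    """Return the six first neighbors (X+, X-, Y+, Y-, Z+, Z-) with wrap-around."""
    x, y, z = coord
    nx, ny, nz = dims
    return [
        ((x + 1) % nx, y, z), ((x - 1) % nx, y, z),
        (x, (y + 1) % ny, z), (x, (y - 1) % ny, z),
        (x, y, (z + 1) % nz), (x, y, (z - 1) % nz),
    ]


def aggregate_bandwidth_gbyte_s(links=LINKS_PER_BOARD,
                                mb_per_s=MB_PER_S_PER_LINK_PER_DIRECTION,
                                directions=DIRECTIONS):
    """Total bandwidth of one board summed over all links and both directions."""
    return links * mb_per_s * directions / 1000.0


if __name__ == "__main__":
    dims = (4, 4, 4)  # hypothetical 64-node cluster
    print("Neighbors of (0, 0, 0):", torus_neighbors((0, 0, 0), dims))
    print("Aggregate bandwidth (GByte/s):", aggregate_bandwidth_gbyte_s())
```

Because of the wrap-around, every node has the same number of links regardless of its position, which is why a single 6-link board design can serve the whole cluster.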
