apeNEXT
The apeNEXT multi-TFlops LGT supercomputer: architecture description and project status report
The apeNEXT project
Roberto De Pietri ([email protected]), Università di Parma & INFN gruppo collegato di Parma
chep03, San Diego, March 27th 2003
The APE family: our line of home-made computers

                                 APE (1988)    APE100 (1993)   APEmille (1999)   apeNEXT (2003)
Architecture                     SIMD          SIMD            SIMD              SIMD++
# nodes                          16            2048            2048              4096
Topology                         flexible 1D   rigid 3D        flexible 3D       flexible 3D
Memory                           256 MB        8 GB            64 GB             1 TB
# registers (w. size)            64 (x32)      128 (x32)       512 (x32)         512 (x64)
Clock speed                      8 MHz         25 MHz          66 MHz            200 MHz
Total computing power of all …   ~1.5 GFlops   ~250 GFlops     ~2 TFlops         ~8-20 TFlops
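The totals in the family table can be turned into per-node figures with simple arithmetic. The sketch below does this for memory only. It assumes that each memory total refers to the node count listed in the same column and that the units are binary (1 GB = 1024 MB); the names `machines` and `memory_per_node_mb` are hypothetical.

```python
# Node counts and memory totals copied from the APE family table.
# Assumption: binary units (1 GB = 1024 MB, 1 TB = 1024 GB).
MB = 1
GB = 1024 * MB
TB = 1024 * GB

machines = {
    "APE":      {"nodes": 16,   "memory_mb": 256 * MB},
    "APE100":   {"nodes": 2048, "memory_mb": 8 * GB},
    "APEmille": {"nodes": 2048, "memory_mb": 64 * GB},
    "apeNEXT":  {"nodes": 4096, "memory_mb": 1 * TB},
}


def memory_per_node_mb(name):
    """Total memory divided by node count, in MB."""
    m = machines[name]
    return m["memory_mb"] / m["nodes"]


for name in machines:
    print(f"{name}: {memory_per_node_mb(name):g} MB per node")
```

Under these assumptions the per-node memory is 16 MB, 4 MB, 32 MB and 256 MB for the four generations.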
APE (‘88) 1 GFlops
The APE paradigm
• Very efficient for LQCD
• The normal operation as a basic operation
• Native implementation of the complex type
• a x b + c (complex numbers)
• Large number of registers
• Efficient optimizations
• VLIW (very long instruction word)
• Reliable and safe HW solution
• Easy to program: software tools
• APEse, TAO
• Machine simulator
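As a rough illustration of the normal operation named on this slide, the sketch below spells out a x b + c on complex numbers as real arithmetic: four real multiplications and four real additions or subtractions, i.e. eight floating-point operations per normal operation. The function name `normal_op` is hypothetical. This is plain Python for illustration only, not TAO code and not a description of how the hardware pipelines the operation.

```python
def normal_op(a, b, c):
    """Complex multiply-add a*b + c written out in real arithmetic.

    Counts 4 real multiplications and 4 real additions/subtractions.
    """
    re = a.real * b.real - a.imag * b.imag + c.real
    im = a.real * b.imag + a.imag * b.real + c.imag
    return complex(re, im)


# Cross-check against Python's built-in complex arithmetic.
a, b, c = 1 + 2j, 3 + 4j, 5 + 6j
assert normal_op(a, b, c) == a * b + c
```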
Since APE100
• Our own designed VLSI
• Pipelined normal operation on a chip (MAD)
• 3D topology
• Remote I/O and X-link on cable
• Y- and Z-link on the backplane
• Large number of APEmille installations in Europe
• 30 crates (~65 GFlops each)
• Almost 2 TeraFlops of computing power
✔ January 2001: VHDL design starts
✔ May 2001: Contract with Atmel established
✔ November 2001: First placement experiment started
✔ February 2002: Major rework on the network protocol (to increase robustness against transmission errors)
✔ April 2002: Network OK, re-start placement exercises
✔ June 2002: Good placement available
✔ June 2002 (end): Satisfactory routing available
✔ July 2002 (beginning): Power routing not OK and
✔ 5% of “random logic” removed
✔ July 2002 (end): Both problems solved
… Continues on next slides …
Key steps of the J&T design (2)
✔ September 2002: New placement available (with new power layout)
✔ September 2002: Excessive congestion .... OR
✔ October 2002: Very bad timing closure
✔ November 2002: Satisfactory placement OK
✔ Dec. 9th 2002: Successful routing completed
✔ January 2003: Timing analysis reasonably satisfactory
✔ January 2003: Simulations with back annotation OK
✔ January 2003: Analysis of critical path (dangerous and not)
✔ February 2003: Hammering down remaining timing problems
✔ February 2003: Careful analysis of all risky corners
✔ February 2003: Transfer of simulation data to Atmel
✔ End of March: Final sign off (Laura … is working on it …)
Timing
• J&T ready June 03
• We will receive between 300 and 600 chips
• We need 256 processors to assemble a crate!!
• We expect them to work!!
• The same team designed 7 ASICs of similar complexity
• Impressive full-detailed simulations of multiple J&T systems
• The more one simulates, the less one has to test!!
• Everything else ready and tested
• Within days/weeks the first working apeNEXT computer will operate
• September ’03: mass production will start (hopefully) at Neuricam
• INFN already funded 8 TFlops of computing power!!
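The chip counts on this slide translate directly into crates. The sketch below works out how many complete crates the first delivery could populate, under the optimistic assumption that every delivered chip works; the names `CHIPS_PER_CRATE` and `full_crates` are hypothetical.

```python
CHIPS_PER_CRATE = 256  # processors needed to assemble one crate (from the slide)


def full_crates(chips):
    """Return (complete crates, spare chips), assuming every chip works."""
    return divmod(chips, CHIPS_PER_CRATE)


for delivered in (300, 600):
    crates, spare = full_crates(delivered)
    print(f"{delivered} chips: {crates} crate(s), {spare} spare")
```

With 300 chips this gives one crate and 44 spares; with 600 chips, two crates and 88 spares.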
Mechanics
[Figure: apeNEXT PB, top view (local). Labels: DC/DC, J&T Module, Board-to-Board Connector, Frame, air-flow channels 1, 2 and 3.]
PB constraints:
• Power consumption: up to 340W
• PB-BP insertion force: 80-150 kg (!)
• Fully populated PB weight: 4-5 kg
Custom design of card frame and insertion tool
Detailed study of airflow
• T, V, I monitored
• Interfaced to I2C control network
PB Prototype
PB (preliminary) Test
• Next Test-Bed: metal frame with power supply
• I2C test, i.e. test of the “slow-control” I/O interface
• minimal set of components assembled
• simple/short test (1 week)
• done successfully (Dec 01)
• Clock distribution test
• PB LVDS characterization