GPU Supercomputing N.D. Hari Dass Indian Institute of Science, Bangalore Poornaprajna Institute, Bangalore Saturday, August 22, 2009

Building a Teraflop Supercomputer for India


GPU Supercomputing

N.D. Hari Dass

Indian Institute of Science, Bangalore

Poornaprajna Institute, Bangalore

Saturday, August 22, 2009


Supercomputing in Old Stone Age

• Long, long ago supercomputers had to be specially built.

• They required large memory blocks - expensive!!

• The interconnects were proprietary - also expensive, though with great performance!

• Additional features like large-scale vector processing.


Supercomputing in New Stone Age

• The idea was to use off-the-shelf desktops without monitors and connect them with networks of as high a bandwidth and as low a latency as possible.

• Distribute the memory

• Era of Clusters


KABRU – The Massive Cluster at IMSc


Supermicro Twin - 2 Nodes in 1U

Node 1

Node 2

1U Twin™ is Supermicro's innovatively designed 1U rack-mount system for increasing computing density, saving cost, and reducing energy and space requirements. It supports dual Xeon dual/quad-core CPUs (up to 16 cores in 1U, up to 672 cores in a 42U rack).

A 1U Twin system contains two independent symmetric motherboards!!!


Twin Motherboards


Supermicro Twin - Specifications

• Supports up to two Intel® Xeon® 51xx, 52xx, 53xx & 54xx processors per node; 1600/1333/1066 MHz system bus

• Supports up to 64GB memory per node; DDR2-667/800 (1.8V/1.5V) FBDIMMs (1.5V FBDIMMs consume less power and generate less heat)

• Available with GbE/DDR IB/10Gb Ethernet

• PCI-Express x16 expansion slot

• High-efficiency shared power supply (93% efficiency)


Supermicro Blade

• 90% cable reduction, resulting in better airflow and better cooling

• Easier and faster to deploy & troubleshoot

• Common, shared, redundant and high-efficiency power supply (90%-93% efficiency)

• 7U Blade chassis

• Can accommodate 10 dual-processor or quad-processor blades

• Up to 160 cores per 7U or 960 cores per 42U rack (using quad-processor blades)

• Up to 32GB/64GB memory per dual/quad-processor blade

• DDR InfiniBand available as option
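The rack-density figures quoted for the Twin and the Blade follow from simple multiplication. A minimal sketch, assuming quad-core CPUs throughout, that only whole chassis can be installed, and that the full 42U is available for compute; the function name `cores_per_rack` is illustrative and not from the slides:

```python
def cores_per_rack(cores_per_chassis: int, chassis_height_u: int,
                   rack_height_u: int = 42) -> int:
    """Cores that fit in a rack when only whole chassis can be installed."""
    chassis_count = rack_height_u // chassis_height_u
    return chassis_count * cores_per_chassis


# 1U Twin: 2 nodes x 2 sockets x 4 cores = 16 cores per 1U chassis
twin_cores = cores_per_rack(cores_per_chassis=2 * 2 * 4, chassis_height_u=1)

# 7U Blade chassis: 10 blades x 4 sockets x 4 cores = 160 cores per chassis
blade_cores = cores_per_rack(cores_per_chassis=10 * 4 * 4, chassis_height_u=7)

print(twin_cores, blade_cores)
```

Under these assumptions the sketch reproduces the 672 and 960 cores per 42U rack quoted on the slides.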


Clusters: Then & Now

              2003     Now (1U)   Now (Twin)   Now (Blade)
No. of CPUs   164      20         20           20
Rack space    82U      10U        5U           7U
Power         25 kW    4 kW       3.85 kW      3.85 kW


Twin-U Vs Blade

Twin 1U                              Blades
More compact / less space (0.5U)     0.7U
Cheaper                              Expensive
Std. PCI-Express expansion           Mezzanine expansion
Power supply not redundant           Redundant power
Cabling is a mess                    Less / neater cabling


Some of the problems..

• Slow PCI slot performance

• Memory access bottlenecks


Core Incompetence?

Intel 2xQuad Core @ 2.8 GHz

Cores          MB      s      µs      MB/s
Single         493     81.2   1.936   --
2 Cores        246.5   43.1   2.06    788
4 Cores        129     33.3   3.18    492
8 Cores 1-D    70.4    32.2   6.15    173
8 Cores 3-D    61.7    31.6   6.03    414


Core Incompetence?

AMD 2xQuad 2111 GHz

Cores      MB     s       µs
1 Core     492    147     3.5
2 Cores    246    72.32   3.448
4 Cores    129    47.8    4.56
8 Cores    70     29.3    5.6
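The two tables can be turned into speedup and parallel-efficiency figures. This is a rough sketch that assumes the seconds column is the wall-clock time for the same total problem on 1, 2, 4 and 8 cores; the slides do not label the columns, so that reading is an assumption. For Intel, the 8-core entry uses the 3-D row.

```python
# Timings (seconds) copied from the two "Core Incompetence?" tables.
# Assumption: each is the wall-clock time for the same total problem.
intel_times = {1: 81.2, 2: 43.1, 4: 33.3, 8: 31.6}  # 8 cores: the 3-D row
amd_times = {1: 147.0, 2: 72.32, 4: 47.8, 8: 29.3}


def scaling(times):
    """Return {cores: (speedup, efficiency)} relative to the single-core run."""
    t1 = times[1]
    return {n: (t1 / t, t1 / (t * n)) for n, t in sorted(times.items())}


intel_scaling = scaling(intel_times)
amd_scaling = scaling(amd_times)

for name, result in (("Intel", intel_scaling), ("AMD", amd_scaling)):
    for n, (speedup, efficiency) in result.items():
        print(f"{name} {n} cores: speedup {speedup:.2f}, "
              f"efficiency {efficiency:.0%}")
```

Read this way, eight cores give a speedup of about 2.6 on the Intel system and about 5.0 on the AMD system, even though the absolute 8-core times (31.6 s and 29.3 s) are close.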


Intel Nehalem

• This architecture has significantly overcome the FSB bottlenecks.

• The scaling from 1 to 2 and from 2 to 4 cores is excellent.

• The scaling from 4 to 8 cores is good, though not as good as in the case of AMD.

• But the overall performance of Nehalem is better than that of AMD.


Speed - Memory Issue

• As the number of cores goes up, the CPU performance (theoretical peak) increases.

• KABRU: 4.8 GFlops/CPU

• Intel Quad Core: 50 GFlops/CPU

• It becomes harder to maintain the ratio of ‘Memory to Performance’.

• Issues with increasing memory: different chipset, power consumption, ...
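To see why the memory-to-performance ratio is hard to maintain, one can ask how much the memory per CPU would have to grow to keep pace with the peak figures above. A minimal sketch; the function name is illustrative, and it assumes the ratio of interest is simply memory per unit of peak performance:

```python
KABRU_GFLOPS_PER_CPU = 4.8       # from the slide
QUAD_CORE_GFLOPS_PER_CPU = 50.0  # from the slide


def memory_scale_factor(old_gflops: float, new_gflops: float) -> float:
    """Factor by which memory per CPU must grow to keep memory/flops fixed."""
    return new_gflops / old_gflops


factor = memory_scale_factor(KABRU_GFLOPS_PER_CPU, QUAD_CORE_GFLOPS_PER_CPU)
print(f"Memory per CPU would have to grow about {factor:.1f}x")
```

On these numbers the memory attached to each CPU would need to grow roughly tenfold to hold the ratio constant.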


GPU Based Supercomputing

• A single Tesla C1060 card has a claimed peak performance of 1 Teraflops in single precision!

• Four such cards can sit in a single 1U box.

• Cost of such a GPGPU supercomputer is about 5 lakh rupees.

• Nearly 4 times as fast as Kabru but costing 50 times less!

• Power consumption about 800 W - 40 times less; no airconditioning/infrastructure
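The comparison with Kabru can be unpacked as back-of-envelope arithmetic. The sketch below assumes the peak of four cards simply adds up, and it works backwards from the quoted ratios. The "implied" Kabru values are therefore derived estimates, not figures stated on the slides:

```python
CARDS_PER_BOX = 4
PEAK_TFLOPS_PER_CARD = 1.0  # claimed single-precision peak, from the slide
BOX_COST_LAKH = 5.0         # from the slide
BOX_POWER_W = 800.0         # from the slide

# Assumption: aggregate peak is the sum over the four cards.
box_peak_tflops = CARDS_PER_BOX * PEAK_TFLOPS_PER_CARD

# Ratios quoted on the slide: about 4x faster, 50x cheaper, 40x less power.
implied_kabru_tflops = box_peak_tflops / 4
implied_kabru_cost_lakh = BOX_COST_LAKH * 50
implied_kabru_power_kw = BOX_POWER_W * 40 / 1000

print(box_peak_tflops, implied_kabru_tflops,
      implied_kabru_cost_lakh, implied_kabru_power_kw)
```

All of these are peak or approximate numbers ("nearly 4 times"), so the result is only an order-of-magnitude picture.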


A Tesla C1060 Card


4 Tesla In 1U


Issues with GPUs

• Codes should have a high degree of data parallelism.

• Available dedicated memory is rather low - even for Tesla C1060 cards it is 4 GB per card.

• Double precision performance is much poorer than single precision performance - a factor of 12 lower!!

• This is due to the register structure - an improvement by a factor of 3 is being talked about.
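As a rough illustration of the double precision penalty, the factors above can be applied to the claimed single precision peak of one card. This is a sketch on peak numbers only; the constant names are illustrative:

```python
SP_PEAK_GFLOPS = 1000.0  # claimed single-precision peak of one card (slides)
DP_PENALTY = 12.0        # "factor 12 lower" (slides)
TALKED_ABOUT_GAIN = 3.0  # improvement being talked about (slides)

dp_peak_now = SP_PEAK_GFLOPS / DP_PENALTY
dp_peak_improved = dp_peak_now * TALKED_ABOUT_GAIN

print(f"Double precision peak now: about {dp_peak_now:.0f} GFlops")
print(f"After a 3x improvement:    about {dp_peak_improved:.0f} GFlops")
```

Even after the hoped-for improvement, double precision would remain a factor of 4 below the single precision peak.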


Issues with GPUs

• If the code is a mixture of single and double precision, with the volume of the latter around 10%, it is still OK.

• Exploiting the host CPUs is an option.

• Transfers between CPU and GPU go through the PCI Express x16 Gen 2.0 link.

• Transfer speed is nowhere near that between, say, CPU and cache.

• Often better to perform a fresh calculation instead of fetching processed data.
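Two of these points can be made concrete with a simple cost model. The sketch below makes several assumptions: every double precision operation costs 12 times a single precision one, the GPU runs at its claimed 1 Teraflops peak, and the link delivers 8 GB/s one way. That last value is the nominal peak of PCI Express 2.0 x16 and is not a number from the slides. Function names are illustrative.

```python
def mixed_precision_slowdown(dp_fraction: float,
                             dp_penalty: float = 12.0) -> float:
    """Run time relative to an all-single-precision code.

    dp_fraction is the share of operations done in double precision;
    each of those is assumed to cost dp_penalty single-precision operations.
    """
    return (1.0 - dp_fraction) + dp_fraction * dp_penalty


def flops_per_transferred_value(peak_flops: float, link_bytes_per_s: float,
                                bytes_per_value: int = 4) -> float:
    """Operations the GPU could do while one value crosses the link."""
    return peak_flops * bytes_per_value / link_bytes_per_s


slowdown = mixed_precision_slowdown(0.10)

# Assumption: 8e9 bytes/s is the nominal one-way peak of PCIe 2.0 x16.
breakeven = flops_per_transferred_value(peak_flops=1e12, link_bytes_per_s=8e9)

print(f"10% double precision: about {slowdown:.1f}x slower than pure single")
print(f"About {breakeven:.0f} flops per single-precision value transferred")
```

In this model a code with 10% double precision runs about 2.1 times slower than a pure single precision one, which is still far better than the factor of 12 for pure double precision. Likewise, a value that can be recomputed in well under about 500 operations is cheaper to recompute than to fetch over the bus. Real transfer rates and sustained performance are lower than these peaks, so the figures are indicative only.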


Issues with GPUs

• Have to code using a new ‘language’ - CUDA in the case of NVIDIA cards.

• Not really a problem for moderate-sized codes, but can be an issue for large codes.

• Requires dexterous management of CPU and GPU resources.

• But considering the phenomenal performance improvements that are being talked about, worth the trouble!!

• Intel Larrabee??
