SGI’2000 Parallel Programming Tutorial Supercomputers 2 With the acknowledgement of Igor Zacharov and Wolfgang Mertz SGI European Headquarters
Dec 15, 2015

Ariel Seales
Page 1: SGI’2000 Parallel Programming Tutorial

Supercomputers 2

With the acknowledgement of Igor Zacharov and Wolfgang Mertz

SGI European Headquarters

Page 2: Classification of Computers

MIMD
• Multiprocessors: single address space, shared memory
    • UMA: central memory
        • PVP (Cray T90)
        • SMP (Intel SHV, SUN E10000, DEC 8400, SGI Power Challenge, IBM R60, etc.)
    • NUMA: distributed memory
        • COMA (KSR-1, DDM)
        • CC-NUMA (SGI Origin2000, SN1 (SGI3000), Cray T3E, HP Exemplar, Sequent NUMA-Q, Data General)
        • NCC-NUMA (Cray T3D, IBM SP3)
• Multicomputers: multiple address spaces
    • NORMA: no-remote memory access
        • Cluster (IBM SP2, DEC TruCluster, Microsoft Wolfpack, “Beowulf”, etc.): loosely coupled, multiple OS
        • “MPP” (Intel TFLOPS, TM-5): tightly coupled & single OS

Abbreviations:
MIMD: Multiple Instructions, Multiple Data
PVP: Parallel Vector Processor
UMA: Uniform Memory Access
SMP: Symmetric Multi-Processor
NUMA: Non-Uniform Memory Access
COMA: Cache Only Memory Architecture
NORMA: No-Remote Memory Access
CC-NUMA: Cache-Coherent NUMA
MPP: Massively Parallel Processor
NCC-NUMA: Non-Cache Coherent NUMA

Page 3: Design Space of Competing Computer Architecture

[Figure only; no text survives in the transcript.]

Page 4: Structure of an SMP System (1)

[Diagram: processors, each with its own cache, attached to a central bus together with four main memory banks and four I/O units.]

• Does NOT scale due to Bus-saturation

• Bus is a very complex Component

• High Memory-Latency due to the Complexity
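
The bus-saturation bullet can be illustrated with a toy model. This is a minimal sketch, not a measurement of any real machine: the function name and the bandwidth figures (1200 MB/s for the bus, 100 MB/s per processor) are hypothetical values chosen only to show the shape of the curve.

```python
def bus_speedup(processors, bus_bandwidth, demand_per_cpu):
    """Ideal speedup when every CPU shares a single bus (toy model)."""
    # Traffic the CPUs would generate if nothing held them back.
    wanted = processors * demand_per_cpu
    # The bus delivers at most bus_bandwidth; beyond that, CPUs stall.
    delivered = min(wanted, bus_bandwidth)
    # Speedup is the number of CPUs' worth of traffic actually served.
    return delivered / demand_per_cpu


# Hypothetical numbers: 1200 MB/s bus, 100 MB/s demand per processor.
for p in (1, 2, 4, 8, 16, 32):
    print(p, bus_speedup(p, bus_bandwidth=1200, demand_per_cpu=100))
```

In this model the speedup grows linearly until the summed demand reaches the bus bandwidth and stays flat afterwards, however many processors are added.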

Page 5: Structure of an SMP System (2)

[Diagram: processors, each with its own cache, attached to a central crossbar together with four main memory banks and four I/O units.]

• Scales very well

• Crossbar is a very complex Component

• High Memory-Latency due to the Complexity
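
One way to see why a crossbar scales well yet becomes complex: a full crossbar joining P processor ports to M memory ports needs P × M switch points, while allowing up to min(P, M) transfers at once. The sketch below just evaluates those two expressions; the function names are illustrative and the port counts are arbitrary.

```python
def crossbar_crosspoints(processors, memory_banks):
    """Switch points in a full crossbar: one per (processor, bank) pair."""
    return processors * memory_banks


def crossbar_parallel_transfers(processors, memory_banks):
    """Upper bound on simultaneous transfers (each port used by one transfer)."""
    return min(processors, memory_banks)


# Arbitrary sizes: the hardware cost grows quadratically, the concurrency linearly.
for n in (4, 8, 16, 32):
    print(n, crossbar_crosspoints(n, n), crossbar_parallel_transfers(n, n))
```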

Page 6: Structure of an SMP System (3): Origin SGI NUMA Architecture

[Diagram: SGI NUMA hypercube. Sixteen nodeboards (N), with I/O attached to the nodeboards, connect to eight routers (R); the routers form the global switch interconnect.]
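
A short sketch of how distances work in a hypercube interconnect such as the one pictured. It assumes a plain binary hypercube in which the eight routers are numbered 0 to 7 and two routers are linked when their numbers differ in exactly one bit; this numbering and the function names are illustrative assumptions, not SGI's actual routing scheme.

```python
def neighbours(router, dimensions):
    """Routers directly linked to `router`: flip one address bit at a time."""
    return [router ^ (1 << bit) for bit in range(dimensions)]


def hops(src, dst):
    """Minimum number of links between two routers (Hamming distance)."""
    return bin(src ^ dst).count("1")


DIMENSIONS = 3  # 2**3 = 8 routers, as in the diagram
routers = range(2 ** DIMENSIONS)

print(neighbours(0, DIMENSIONS))
print(max(hops(a, b) for a in routers for b in routers))
```

The worst-case hop count equals the number of dimensions, so doubling the number of routers adds only one hop to the longest path.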

Page 7: Systems are built from Modules

Deskside (Module): 2-8 CPUs
Rack (2 Modules): 16 CPUs
Multi-rack (4 Modules): 32 CPUs
Etc.: up to 128 CPUs

Page 8: New High-End Products: Origin 3000 Servers – Onyx 3 Systems

SGI Origin 3200, SGI Onyx 3200
SGI Origin 3400, SGI Onyx 3400
SGI Origin 3800, SGI Onyx 3800

IRIX 6.5

Page 9: SGI 3800 System (16-512p)

[Diagrams: minimum (16p) system; 128p system; 128p system topology. The topology spans Racks 1 to 4, each drawn as two groups of one router (R) and four C-bricks (C). The rack layouts are built from power bays, C-bricks, R-bricks (8-port routers), I-bricks, and P-, I-, or X-bricks.]

Page 10: ASCI Blue Mountain, Los Alamos National Laboratory

Origin 2000 with 3+ Tflops peak

1+ Tflop Application Performance

48 Systems with 128 CPUs each = 6144 CPUs

1536 Gbyte Memory

76 Tbyte Diskspace
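
The figures above can be cross-checked with a few lines of arithmetic. This is only a sketch: the constant names are made up for the example, 1 Gbyte is assumed to mean 1024 Mbyte, and the "3+ Tflops" peak is taken as exactly 3, so the per-CPU peak is a lower bound.

```python
SYSTEMS = 48
CPUS_PER_SYSTEM = 128
MEMORY_GBYTE = 1536
PEAK_TFLOPS = 3.0  # slide says "3+", so treat this as a lower bound

# Total processor count, as stated on the slide.
total_cpus = SYSTEMS * CPUS_PER_SYSTEM

# Memory per CPU, assuming 1 Gbyte = 1024 Mbyte.
memory_per_cpu_mbyte = MEMORY_GBYTE * 1024 / total_cpus

# Peak rate per CPU in Mflops (1 Tflops = 1e6 Mflops).
peak_per_cpu_mflops = PEAK_TFLOPS * 1e6 / total_cpus

print(total_cpus, memory_per_cpu_mbyte, round(peak_per_cpu_mflops, 1))
```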

Page 11: Memory hierarchy

[Figure: memory hierarchy, plotting speed of access (in units of 1/clock; ticks 1, 0.1, 0.01) against device capacity (size). Regions: cache subsystem, memory, disk. Capacity labels: 64 registers, 32 KB (L1), 8 MB (L2), ~1 to 100s of GB. Latency labels: ~2-3 cy, ~10 cy, ~100-300 cy (NUMA), ~4000 cy.]

[Chart: remote latency (ns), y-axis 0 to 1400, against system size (2p, 4p, 8p, 16p, 32p, 64p, 128p, 256p, 512p).
SN-MIPS latency: 175, 175, 235, 285, 335, 335, 435, 485, 585
Origin2000 latency (seven data points): 343, 554, 759, 759, 836, 1067, 1169]
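
To show why the hierarchy matters, the sketch below computes an average access time from the cycle counts on the memory hierarchy figure. It reads the labels as L1 ≈ 3 cycles, L2 ≈ 10 cycles and NUMA memory ≈ 100 to 300 cycles, which is an assumption about how the figure's labels pair up. The miss rates (5% in L1, 20% in L2) are hypothetical, and the function name is illustrative.

```python
def average_access_cycles(l1_cycles, l2_cycles, mem_cycles,
                          l1_miss_rate, l2_miss_rate):
    """Average access time for two cache levels in front of main memory."""
    # Every access pays the L1 time; L1 misses continue to L2;
    # L2 misses continue to main memory.
    return l1_cycles + l1_miss_rate * (l2_cycles + l2_miss_rate * mem_cycles)


# Hypothetical miss rates; memory latency at both ends of the NUMA range.
near = average_access_cycles(3, 10, 100, l1_miss_rate=0.05, l2_miss_rate=0.2)
far = average_access_cycles(3, 10, 300, l1_miss_rate=0.05, l2_miss_rate=0.2)
print(near, far)
```

With these assumed miss rates, tripling the memory latency raises the average access time by well under a factor of two, because most accesses are served by the caches.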

Page 12: Scale in Any and All Dimensions

NUMAflex™ Flexible Configuration

[Diagram: workloads placed along three scaling dimensions (I/O, CPU, storage): web serving, weather simulation, repository / archive, signal processing, media streaming, traditional big supercomputer.]