CS668 - Lecture 2 - Sept. 30
Today’s topics
• Parallel Architectures (Chapter 2)
• Memory Hierarchy
• Busses and Switched Networks
• Interconnection Network Topologies
• Multiprocessors / Multicomputers
• Flynn’s Taxonomy
• Analysis of Interconnection Networks
Theoretical Computer Architectures
• Turing Machine
• Von Neumann Architecture
• Fetch/Execute Cycle
• Memory Models
• RAM model
• PRAM model extension
• Shared Memory vs. Distributed Shared Memory vs. Distributed Memory
Processors and the Memory Hierarchy
• Registers (1 clock cycle, 100s of bytes)
• 1st level cache (3-5 clock cycles, 100s KBytes)
• 2nd level cache (~10 clock cycles, MBytes)
• Main memory (~100 clock cycles, GBytes)
• Disk (milliseconds, 100GB to ginormous)
[Figure: CPU with registers, separate 1st-level instruction and data caches, and a unified 2nd-level cache (instructions & data)]
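The cycle counts above can be combined into an average memory access time (AMAT): each level costs its hit time plus, on a miss, the average cost of the level below. A minimal sketch, assuming a two-level cache in front of main memory; the L1 figure is simply the midpoint of the 3-5 cycle range, and the miss rates are hypothetical values chosen for illustration, not measurements.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty

# Latencies in clock cycles from the list above (L1 uses the midpoint of 3-5).
L1_CYCLES, L2_CYCLES, MEM_CYCLES = 4, 10, 100

# Hypothetical miss rates, for illustration only.
L1_MISS_RATE, L2_MISS_RATE = 0.05, 0.20

# An L1 miss pays the average cost of an L2 access, which itself may miss to main memory.
l2_amat = amat(L2_CYCLES, L2_MISS_RATE, MEM_CYCLES)   # 10 + 0.2 * 100 = 30 cycles
overall = amat(L1_CYCLES, L1_MISS_RATE, l2_amat)      # 4 + 0.05 * 30 = 5.5 cycles
print(f"L2 AMAT: {l2_amat:.1f} cycles, overall AMAT: {overall:.1f} cycles")
```

Even though main memory is 25 times slower than L1 here, the average access stays close to the L1 cost as long as most accesses hit in the caches.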
IBM Dual Core
From Intel® 64 and IA-32 Architectures Optimization Reference Manual, http://www.intel.com/design/processor/manuals/248966.pdf
Shared Memory Multiprocessor
• One or more memories
• Global address space (all system memory visible to all processors)
• Transfer of data between processors is usually implicit: just read (write) to (from) a given address (OpenMP)
• Complex cache-coherency protocols maintain consistency between processors
[Figure: (UMA) Uniform-memory-access shared-memory system: CPUs and memories connected through an interconnection network]
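Implicit communication through a shared address space can be mimicked on one machine with threads: one thread writes to a location and another simply reads it, with no messages involved. A minimal sketch using Python's standard `threading` module; it only illustrates the programming model (Python threads are not OpenMP, and CPython's global interpreter lock limits real parallelism), and the names `data`, `partial_sums`, and `worker` are hypothetical.

```python
import threading

NUM_WORKERS = 4
data = list(range(1, 101))            # shared input, visible to every thread
partial_sums = [0] * NUM_WORKERS      # shared output: each thread writes only its own slot

def worker(tid):
    # Read a strided slice of the shared list directly; no data is sent or received.
    partial_sums[tid] = sum(data[tid::NUM_WORKERS])

threads = [threading.Thread(target=worker, args=(t,)) for t in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()   # synchronization point: all writes are complete before the main thread reads

# The main thread "receives" the results just by reading shared memory.
total = sum(partial_sums)
print(total)
```

Because each thread writes a distinct slot, no lock is needed; in hardware, keeping every CPU's cached copy of `partial_sums` consistent is the job of the cache-coherency protocol.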
Distributed Shared Memory
• Single address space with implicit communication
• Hardware support for reads/writes to non-local memories and for cache coherency
• Latency for a memory operation is greater when accessing non-local data than when accessing data within a CPU’s own memory
[Figure: (NUMA) Non-uniform-memory-access shared-memory system: CPU/memory pairs connected by an interconnection network]
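The cost of non-local accesses can be summarized as a weighted average over local and remote latencies. A minimal sketch, assuming a single fixed remote latency; the local cost reuses the ~100-cycle main-memory figure from the memory hierarchy slide, while the 200-cycle remote cost and the name `effective_latency` are hypothetical.

```python
def effective_latency(local_fraction, local_cycles, remote_cycles):
    """Expected cycles per access when local_fraction of accesses stay in the CPU's own memory."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_cycles + (1.0 - local_fraction) * remote_cycles

# Local cost from the ~100-cycle main-memory figure; remote cost is a hypothetical 2x penalty.
LOCAL_CYCLES, REMOTE_CYCLES = 100.0, 200.0
for f in (1.0, 0.9, 0.5):
    print(f"{f:.0%} local: {effective_latency(f, LOCAL_CYCLES, REMOTE_CYCLES):.0f} cycles")
```

Under these assumptions, placing data so that most accesses stay local keeps the average close to the local figure, which is why data placement matters on NUMA machines.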
Distributed Memory / Message Passing
• Each processor has access to its own memory only
• Data transfer between processors is explicit: the user calls message-passing functions
• Common libraries for message passing
– MPI, PVM
• User has complete control of, and responsibility for, data placement and management
[Figure: CPU/memory pairs, each memory private to its CPU, connected by an interconnection network]
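Explicit message passing can be sketched with threads that never touch each other's data and communicate only through per-receiver queues. This is a stand-in for MPI-style send/receive, not MPI itself: the `send`/`recv` helpers and the ring-shaped exchange are hypothetical, built on Python's standard `queue` and `threading` modules.

```python
import queue
import threading

NUM_PROCS = 4
# One inbox per "processor"; data moves only when some rank explicitly calls send().
inboxes = [queue.Queue() for _ in range(NUM_PROCS)]
results = [None] * NUM_PROCS   # recorded only so the demo can print what each rank received

def send(dest, msg):
    inboxes[dest].put(msg)

def recv(rank):
    return inboxes[rank].get()   # blocks until a message arrives

def process(rank):
    local_value = rank * 10                      # private data, owned by this rank only
    send((rank + 1) % NUM_PROCS, local_value)    # pass it to the right-hand neighbor
    results[rank] = recv(rank)                   # receive from the left-hand neighbor

threads = [threading.Thread(target=process, args=(r,)) for r in range(NUM_PROCS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)   # [30, 0, 10, 20]
```

Every transfer is visible in the code as a `send`/`recv` pair, which is the "complete control and responsibility" the slide refers to: if a rank forgets to send, its neighbor waits forever.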
Hybrid Systems
• Distributed memory system with multiprocessor shared-memory nodes
• Most common architecture for the current generation of parallel machines
[Figure: three nodes, each with four CPUs sharing one memory and a network interface, connected by an interconnection network]
Flynn’s Taxonomy (figure 2.20 from Quinn)
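• SISD (single instruction stream, single data stream): uniprocessor
• SIMD (single instruction stream, multiple data streams): processor arrays, pipelined vector processors
• MISD (multiple instruction streams, single data stream): systolic array
• MIMD (multiple instruction streams, multiple data streams): multiprocessors, multicomputers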
Analysis of Switch Network Topologies
• View switched network as a graph
– n vertices = processors or switches
– m edges = communication paths
• Two kinds of topologies
– Direct: ratio of switches to processors is 1:1
– Indirect: ratio is d:1
Evaluating Switch Topologies
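• Diameter (largest distance between two switch nodes)
• Bisection width (minimum number of edges that must be removed to divide the network into two halves)
• Number of edges / node (d = degree)
• Constant edge length? (yes/no)
– Layout area/wire length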
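The first three criteria can be computed directly once a topology is written down as a graph. A minimal sketch, assuming an undirected network stored as a dictionary that maps each node to its neighbors; the function names are hypothetical, every vertex is treated alike (for an indirect topology you would restrict the diameter to processor endpoints), and the bisection search is brute force, so it only suits small examples such as the 8-switch ring used here.

```python
from collections import deque
from itertools import combinations

def bfs_distances(adj, src):
    """Hop counts from src to every reachable node."""
    dist = {src: 0}
    frontier = deque([src])
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                frontier.append(v)
    return dist

def diameter(adj):
    """Largest shortest-path distance over all pairs of nodes."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)

def bisection_width(adj):
    """Fewest edges crossing any split of the nodes into two equal halves (exponential search)."""
    nodes = sorted(adj)
    best = None
    for half in combinations(nodes, len(nodes) // 2):
        side = set(half)
        cut = sum(1 for u in side for v in adj[u] if v not in side)
        best = cut if best is None else min(best, cut)
    return best

def max_degree(adj):
    """Largest number of edges at any node."""
    return max(len(adj[u]) for u in adj)

# Example: a ring of 8 switches, each linked to its two neighbors.
ring = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
print(diameter(ring), bisection_width(ring), max_degree(ring))   # 4 2 2
```

Edge length cannot be read off the graph alone; it depends on how the network is laid out physically.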
2-D Mesh Network
• Direct topology
• Switches arranged into a 2-D lattice
• Communication allowed only between neighboring switches
• Variants allow wraparound connections between switches on the edge of the mesh
Evaluating 2-D Meshes
• Diameter: $\Theta(n^{1/2})$
• Bisection width: $\Theta(n^{1/2})$
• Number of edges per switch: 4
• Constant edge length? Yes
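The square-root figures can be checked on a small instance. A minimal sketch, assuming a 4 x 4 mesh without wraparound links (n = 16 switches); the helper names are hypothetical and the bisection search is brute force, so it only suits toy sizes. For this mesh the diameter comes out as 2(√n - 1) and the bisection width as √n.

```python
from collections import deque
from itertools import combinations

def mesh_2d(side):
    """Adjacency of a side x side mesh without wraparound; switch (r, c) links to its 4-neighbors."""
    adj = {}
    for r in range(side):
        for c in range(side):
            adj[(r, c)] = [(r + dr, c + dc)
                           for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                           if 0 <= r + dr < side and 0 <= c + dc < side]
    return adj

def eccentricity(adj, src):
    """Largest BFS hop count from src to any other switch."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return max(dist.values())

def bisection_width(adj):
    """Brute force: fewest edges crossing any equal split (only feasible for small meshes)."""
    nodes = sorted(adj)
    best = len(nodes) ** 2          # safe upper bound on the number of edges
    for half in combinations(nodes, len(nodes) // 2):
        side_a = set(half)
        cut = sum(1 for u in side_a for v in adj[u] if v not in side_a)
        best = min(best, cut)
    return best

side = 4                                        # n = 16 switches, sqrt(n) = 4
adj = mesh_2d(side)
print(max(eccentricity(adj, s) for s in adj))   # 6 = 2 * (sqrt(n) - 1)
print(bisection_width(adj))                     # 4 = sqrt(n)
print(max(len(nbrs) for nbrs in adj.values()))  # 4 edges per interior switch
```

Corner and edge switches have fewer than 4 links in this variant; adding wraparound links would give every switch exactly 4.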
Binary Tree Network
• Indirect topology
• $n = 2^d$ processor nodes, $n - 1$ switches
Evaluating Binary Tree Network
• Diameter: 2 log n
• Bisection width: 1
• Edges / node: 3
• Constant edge length? Yes/No?
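The 2 log n diameter comes from the worst case: a message between leaves in opposite halves of the tree must climb all log n levels to the root and then descend log n levels again. A minimal sketch, assuming processors sit at the leaves of a heap-numbered complete binary tree (a numbering chosen here for convenience, not taken from the lecture) and counting every link traversed; the helper names are hypothetical.

```python
import math
from collections import deque

def binary_tree_network(n):
    """Heap-numbered tree: switches 1..n-1 are internal nodes, processors n..2n-1 are the leaves."""
    adj = {i: [] for i in range(1, 2 * n)}
    for child in range(2, 2 * n):
        parent = child // 2          # heap numbering: node i's parent is i // 2
        adj[child].append(parent)
        adj[parent].append(child)
    return adj

def hops_from(adj, src):
    """BFS hop counts from src to every node."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

n = 8                                        # processors; assumed to be a power of 2
adj = binary_tree_network(n)
processors = range(n, 2 * n)
diam = 0
for p in processors:
    dist = hops_from(adj, p)
    diam = max(diam, max(dist[q] for q in processors))
print(diam, 2 * int(math.log2(n)))               # 6 6
print(max(len(nbrs) for nbrs in adj.values()))   # 3: a switch's parent plus two children
```

Every path between the two halves passes through the root's two links, and cutting just one of them splits the tree, which is why the bisection width is only 1.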
Hypertree Network
• Indirect topology
• Shares low diameter of binary tree
• Greatly improves bisection width
• From “front” looks like k-ary tree of height d
• From “side” looks like upside-down binary tree of height d
Hypertree Network
Evaluating 4-ary Hypertree
• Diameter: log n
• Bisection width: n / 2
• Edges / node: 6
• Constant edge length? No
Butterfly Network
• Indirect topology
• $n = 2^d$ processor nodes connected by $n(\log n + 1)$ switching nodes
[Figure: butterfly network for n = 8: processors 0 through 7 attached to switches labeled (rank, column), ranks 0 through 3, columns 0 through 7]
Butterfly Network Routing
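Routing through a butterfly can be done by correcting one address bit per rank: take the straight edge when the current column already agrees with the destination in that bit, and the cross edge when it does not, so every route takes exactly log n hops. A minimal sketch, assuming that the cross edges leaving rank r flip bit log n - 1 - r (most significant bit first); the wiring in the lecture's figure may use a different bit order, and the function name is hypothetical.

```python
def butterfly_route(src, dest, n):
    """Switches (rank, column) visited from column src at rank 0 to column dest at rank log n.

    Assumed wiring: switch (r, c) links to (r + 1, c) ("straight") and to
    (r + 1, c ^ 2**(k - 1 - r)) ("cross"), so rank r fixes bit k-1-r.
    """
    k = n.bit_length() - 1                 # k = log n
    if n != 1 << k:
        raise ValueError("n must be a power of 2")
    path = [(0, src)]
    col = src
    for r in range(k):
        bit = 1 << (k - 1 - r)
        if (col ^ dest) & bit:             # bit differs from the destination: take the cross edge
            col ^= bit
        path.append((r + 1, col))          # switch reached at the next rank
    return path

n = 8
k = n.bit_length() - 1                     # log n = 3
print(butterfly_route(2, 5, n))            # [(0, 2), (1, 6), (2, 4), (3, 5)]
print(n * (k + 1))                         # 32 switching nodes = n(log n + 1)
```

The route is fully determined by the destination address, so each switch can make its decision locally by examining one bit.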
Evaluating Butterfly Network
• Diameter: log n
• Bisection width: n / 2
• Edges per node: 4
• Constant edge length? No
Hypercube
• Direct topology
• 2 x 2 x … x 2 mesh
• Number of nodes a power of 2
• Node addresses $0, 1, \ldots, 2^k - 1$
• Node i connected to k nodes whose addresses differ from i in exactly one bit position
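Because neighbors differ in exactly one bit, the number of hops between two nodes equals the number of bit positions in which their addresses differ (the Hamming distance), which is at most k = log n. A minimal sketch with hypothetical helper names:

```python
def hypercube_neighbors(i, k):
    """Nodes adjacent to i in a k-dimensional hypercube: flip each of the k address bits once."""
    return [i ^ (1 << b) for b in range(k)]

def hypercube_distance(a, b):
    """Hop count between a and b = number of differing address bits."""
    return bin(a ^ b).count("1")

k = 4                                  # n = 2**k = 16 nodes
n = 1 << k
print([format(v, "04b") for v in hypercube_neighbors(0b0101, k)])   # ['0100', '0111', '0001', '1101']
print(max(hypercube_distance(a, b) for a in range(n) for b in range(n)))   # 4 = log n
```

A shortest route simply flips the differing bits one at a time, in any order, which also shows that many alternative shortest paths exist.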
Hypercube Addressing
[Figure: 16-node hypercube with 4-bit node addresses 0000 through 1111]
Evaluating Hypercube Network
• Diameter: log n
• Bisection width: n / 2
• Edges per node: log n
• Constant edge length? No
Shuffle-exchange
• Direct topology
• Number of nodes a power of 2
• Nodes have addresses $0, 1, \ldots, 2^k - 1$
• Two outgoing links from node i
– Shuffle link to node LeftCycle(i)
– Exchange link to node xor(i, 1)
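LeftCycle(i) rotates the k-bit address one place to the left, moving the most significant bit into the least significant position, while the exchange link flips the lowest bit. A minimal sketch, with hypothetical helper names, that builds both links and measures the diameter by breadth-first search, following each shuffle link only in its outgoing direction:

```python
from collections import deque

def left_cycle(i, k):
    """Rotate the k-bit address i left by one position (the shuffle link)."""
    return ((i << 1) | (i >> (k - 1))) & ((1 << k) - 1)

def shuffle_exchange_links(i, k):
    """The two outgoing links of node i: shuffle to LeftCycle(i), exchange to xor(i, 1)."""
    return [left_cycle(i, k), i ^ 1]

def directed_diameter(k):
    """Longest shortest path over all ordered pairs of the 2**k nodes."""
    n = 1 << k
    worst = 0
    for src in range(n):
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in shuffle_exchange_links(u, k):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        worst = max(worst, max(dist.values()))
    return worst

k = 3
print([left_cycle(i, k) for i in range(1 << k)])   # [0, 2, 4, 6, 1, 3, 5, 7]
print(directed_diameter(k), 2 * k - 1)              # 5 5
```

The worst case is 0 to 7 (000 to 111): three exchanges are needed to set three bits, and a shuffle must separate each pair of them, giving 2 log n - 1 hops.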
Shuffle-exchange Illustrated
[Figure: shuffle-exchange network with nodes 0 through 7]
Shuffle-exchange Addressing
[Figure: 16-node shuffle-exchange network with 4-bit node addresses 0000 through 1111]
Evaluating Shuffle-exchange
• Diameter: 2 log n - 1
• Bisection width: n / log n
• Edges per node: 2
• Constant edge length? No
Comparing Networks
• All have logarithmic diameter except the 2-D mesh
• Hypertree, butterfly, and hypercube have bisection width n / 2
• All have a constant number of edges per node except the hypercube
• Only the 2-D mesh keeps edge lengths constant as network size increases