7/31/2019 LecILP
slide 1
Outline
Classification
ILP Architectures
Data Parallel Architectures
Process level Parallel Architectures
Issues in parallel architectures
Cache coherence problem
Interconnection networks
slide 2
Outline
Classification
- Flynn's [66]
- Feng's [72]
- Händler's [77]
- Modern (Sima, Fountain & Kacsuk)
ILP Architectures
Data Parallel Architectures
Process level Parallel Architectures
Issues in parallel architectures
Cache coherence problem
Interconnection networks
slide 3
Flynn's Classification
Architecture Categories
SISD SIMD MISD MIMD
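The four categories follow from counting instruction streams and data streams. A minimal sketch of that rule (the function name `flynn_category` is hypothetical, chosen only for illustration):

```python
def flynn_category(instruction_streams: int, data_streams: int) -> str:
    """Return Flynn's category name for the given numbers of streams."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    # 'S' for a single stream, 'M' for multiple streams.
    i = "S" if instruction_streams == 1 else "M"
    d = "S" if data_streams == 1 else "M"
    return f"{i}I{d}D"


if __name__ == "__main__":
    for streams in [(1, 1), (1, 8), (4, 1), (4, 4)]:
        print(streams, flynn_category(*streams))
```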
slide 4
SISD
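[Figure: control unit (C), processor (P) and memory (M), linked by one instruction stream (IS) and one data stream (DS)]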
slide 6
MISD
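[Figure: multiple control units (C) and processors (P) connected to memory (M); multiple instruction streams (IS), single data stream (DS)]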
slide 7
MIMD
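[Figure: multiple control units (C) and processors (P) connected to memory (M); multiple instruction streams (IS), multiple data streams (DS)]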
slide 8
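Feng's Classification
[Figure: machines plotted by word length (horizontal axis: 1, 16, 32, 64) against bit slice length (vertical axis: 1, 16, 64, 256, 16K). Machines shown: MPP, STARAN, C.mmP, PDP11, PEPE, IBM370, IlliacIV, CRAY-1]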
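Feng's scheme places a machine by its word length and bit slice length; the product of the two is its maximum degree of parallelism. A minimal sketch, assuming the usual four-way split into word-serial/word-parallel and bit-serial/bit-parallel. The function name is hypothetical, and the machine figures in the usage lines are commonly quoted values, not read off the slide:

```python
def feng_class(word_length: int, bit_slice_length: int):
    """Return (processing mode, maximum parallelism degree) in Feng's scheme."""
    if word_length < 1 or bit_slice_length < 1:
        raise ValueError("both lengths must be at least 1")
    # More than one word per bit slice means words are processed in parallel.
    word = "word-serial" if bit_slice_length == 1 else "word-parallel"
    # More than one bit per word means bits are processed in parallel.
    bit = "bit-serial" if word_length == 1 else "bit-parallel"
    return f"{word}, {bit}", word_length * bit_slice_length


if __name__ == "__main__":
    # Commonly quoted (word length, bit slice length) pairs; assumptions here.
    machines = {"PDP11": (16, 1), "IlliacIV": (64, 64), "MPP": (1, 16384)}
    for name, (n, m) in machines.items():
        print(name, feng_class(n, m))
```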
slide 9
Händler's Classification
< K x K', D x D', W x W' >
K: control, D: data, W: word
dash ('): degree of pipelining
TI - ASC
CDC 6600 x (I/O)
C.mmP + +
PEPE
Cray-1
slide 10
Modern Classification
Parallel architectures
- Data-parallel architectures
- Function-parallel architectures
slide 11
Data Parallel Architectures
Data-parallel architectures
- Vector architectures
- Associative and neural architectures
- SIMDs
- Systolic architectures
slide 12
Function Parallel Architectures
Function-parallel architectures
- Instruction level parallel architectures (ILPs)
  - Pipelined processors
  - VLIWs
  - Superscalar processors
- Thread level parallel architectures
- Process level parallel architectures (MIMDs)
  - Distributed memory MIMD
  - Shared memory MIMD
slide 13
Outline
Classification
ILP Architectures
- Pipelining
- VLIW
- Superscalar
Data Parallel Architectures
Process level Parallel Architectures
Issues in parallel architectures
Cache coherence problem
Interconnection networks
slide 14
Pipelining
Stages: IF, D, RF, EX/AG, M, WB
higher throughput with pipelining
resources shared across cycles
not all instructions take the same number of cycles
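The throughput gain can be illustrated with the standard cycle count for an ideal pipeline. This sketch assumes every stage takes one cycle, there are no stalls, and an unpipelined machine spends one cycle per stage on each instruction. The function names are hypothetical:

```python
def pipeline_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles to finish a run of instructions on an ideal pipeline."""
    if num_instructions == 0:
        return 0
    # The first instruction needs num_stages cycles to fill the pipeline;
    # every later instruction completes one cycle after its predecessor.
    return num_stages + num_instructions - 1


def ideal_speedup(num_instructions: int, num_stages: int) -> float:
    """Speedup over executing each instruction stage by stage, unpipelined."""
    unpipelined = num_instructions * num_stages
    return unpipelined / pipeline_cycles(num_instructions, num_stages)


if __name__ == "__main__":
    stages = 6  # IF, D, RF, EX/AG, M, WB
    for n in (1, 10, 100, 10000):
        print(n, pipeline_cycles(n, stages), round(ideal_speedup(n, stages), 2))
```

For long instruction runs the speedup approaches the number of stages, which is the best case before hazards are considered.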
slide 15
Hazards in Pipelining
Procedural dependencies => Control hazards
conditional and unconditional branches, calls/returns
Data dependencies => Data hazards
RAW (read after write)
WAR (write after read)
WAW (write after write)
Resource conflicts => Structural hazards
use of same resource in different stages
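The three data hazards can be detected by comparing the registers two instructions read and write. A minimal sketch, assuming each instruction writes one destination register and reads a set of source registers (the `Instr` type and register names are hypothetical):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    dest: str      # register written by the instruction
    srcs: tuple    # registers read by the instruction


def data_hazards(first: Instr, second: Instr) -> set:
    """Data hazards that arise when `second` follows `first` in program order."""
    hazards = set()
    if first.dest in second.srcs:
        hazards.add("RAW")  # second reads what first writes
    if second.dest in first.srcs:
        hazards.add("WAR")  # second overwrites what first still has to read
    if first.dest == second.dest:
        hazards.add("WAW")  # both write the same register
    return hazards


if __name__ == "__main__":
    add = Instr("r1", ("r2", "r3"))   # r1 = r2 + r3
    sub = Instr("r4", ("r1", "r5"))   # r4 = r1 - r5
    print(data_hazards(add, sub))
```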
slide 16
Pipeline Performance
CPI = 1 + (S - 1) * b
Time = CPI * T / S
S: number of pipeline stages
T: total delay through all S stages
b: frequency of interruptions
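A worked instance of the two formulas, assuming S is the number of stages, T is the total delay through all S stages (so one cycle lasts T / S), and b is the fraction of instructions that interrupt the pipeline. The function names and the sample values are illustrative assumptions:

```python
def pipeline_cpi(stages: int, b: float) -> float:
    """CPI = 1 + (S - 1) * b: each interruption costs S - 1 extra cycles."""
    return 1 + (stages - 1) * b


def time_per_instruction(stages: int, b: float, total_delay: float) -> float:
    """Time = CPI * T / S, in the same unit as total_delay."""
    return pipeline_cpi(stages, b) * total_delay / stages


if __name__ == "__main__":
    T = 12.0  # assumed total delay in ns
    b = 0.1   # assumed: one instruction in ten interrupts the pipeline
    for S in (1, 2, 6, 12, 24):
        print(S, round(pipeline_cpi(S, b), 2), round(time_per_instruction(S, b, T), 3))
```

Expanding the product gives Time = T / S + b * T * (S - 1) / S, so with b > 0 the time per instruction approaches b * T rather than zero as S grows.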
slide 17
ILP in VLIW processors
[Figure: cache/memory feeds a fetch unit, which delivers a single multi-operation instruction to several functional units (FU) sharing a register file]
slide 18
ILP in Superscalar processors
[Figure: cache/memory feeds a fetch unit with a sequential stream of instructions; a decode and issue unit sends multiple instructions to several functional units (FU) sharing a register file. Legend: instruction/control paths, data paths; FU = functional unit]
slide 19
Why are superscalars popular?
Binary code compatibility among scalar and superscalar processors of the same family
Same compiler works for all processors (scalars and superscalars) of the same family
Assembly programming of VLIWs is tedious
Code density in VLIWs is very poor - instruction encoding schemes
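The code density point can be made concrete by packing operations into fixed-width instruction words and counting the empty slots. This is a sketch under stated assumptions: an uncompressed encoding in which every unused slot holds an explicit NOP, unit latency for every operation, and a simple greedy placement. The function name and the sample dependence lists are hypothetical:

```python
def pack_vliw(deps, width):
    """Greedily pack operations into VLIW words of `width` slots.

    deps[i] lists the indices of earlier operations that operation i needs.
    Returns (number of instruction words, number of NOP slots).
    """
    cycle_of = []   # instruction word assigned to each operation
    used = {}       # word index -> slots already filled
    for needed in deps:
        # An operation can issue one word after its latest producer.
        word = max((cycle_of[j] + 1 for j in needed), default=0)
        while used.get(word, 0) >= width:
            word += 1
        used[word] = used.get(word, 0) + 1
        cycle_of.append(word)
    num_words = max(cycle_of) + 1 if cycle_of else 0
    return num_words, num_words * width - len(deps)


if __name__ == "__main__":
    # Two short dependence chains: 0 -> 1 -> 2 and 3 -> 4.
    print(pack_vliw([[], [0], [1], [], [3]], width=3))
```

In this example five operations occupy three words of three slots, so four of the nine slots carry NOPs. Encoding schemes that avoid storing those NOPs are one response to the density problem.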
slide 20
Issues in VLIW Architecture
[Figure: functional units (FU) sharing a register file]
Instruction encoding
Scalability: access time, area, power consumption sharply increase with number of register ports
slide 21
Tasks of superscalar processing
Parallel decoding
Superscalar instruction issue
Parallel instruction execution
Preserving the sequential consistency of execution
Preserving the sequential consistency of exception processing
slide 22
Outline
Classification
ILP Architectures
Data Parallel Architectures
- SIMD Processors
- Vector Processors
- Associative Processors
- Systolic Arrays
Process level Parallel Architectures
Issues in parallel architectures
Cache coherence problem
Interconnection networks
slide 23
Data Parallel Architectures
SIMD Processors
- Multiple processing elements driven by a single instruction stream
Vector Processors
- Uni-processors with vector instructions
Associative Processors
- SIMD like processors with associative memory
Systolic Arrays
- Application specific VLSI structures
slide 24
Systolic Arrays [H.T. Kung 1978]
Simplicity, Regularity, Concurrency, Communication
Example: band matrix multiplication
[Figure: band matrices A and B, each with nonzero entries (subscripts 11 through 66) clustered around the diagonal and zeros elsewhere, multiplied to give C]
[Figure: snapshot of the systolic array at T=0, showing elements of A (A11, A12, A21, A22, A31, A23) and B (B11, B12, B21, B31)]
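The rhythmic data flow of a systolic array can be simulated in a few lines. This sketch is not the hexagonal band-matrix array from the figures; it assumes a simpler square mesh in which each cell keeps one element of C, rows of A enter from the left and columns of B enter from the top, each skewed by one time step per row or column. The function name is hypothetical:

```python
def systolic_matmul(A, B):
    """Multiply two n x n matrices on a simulated n x n systolic mesh."""
    n = len(A)
    acc = [[0] * n for _ in range(n)]     # C element accumulated in each cell
    a_reg = [[0] * n for _ in range(n)]   # A value currently held by each cell
    b_reg = [[0] * n for _ in range(n)]   # B value currently held by each cell
    for t in range(3 * n - 2):            # last product occurs at t = 3n - 3
        new_a = [[0] * n for _ in range(n)]
        new_b = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if j == 0:
                    k = t - i             # row i of A is delayed by i steps
                    new_a[i][j] = A[i][k] if 0 <= k < n else 0
                else:
                    new_a[i][j] = a_reg[i][j - 1]   # A values move right
                if i == 0:
                    k = t - j             # column j of B is delayed by j steps
                    new_b[i][j] = B[k][j] if 0 <= k < n else 0
                else:
                    new_b[i][j] = b_reg[i - 1][j]   # B values move down
        a_reg, b_reg = new_a, new_b
        # Every cell performs one multiply-accumulate per time step.
        for i in range(n):
            for j in range(n):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc


if __name__ == "__main__":
    print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Each cell only ever talks to its left and upper neighbours, which is the regularity and local communication the slide lists. A band-matrix design exploits the zeros outside the band to use fewer cells.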
slide 26
Outline
Classification
ILP Architectures
Data Parallel Architectures
Process level Parallel Architectures
- MIMD Processors
  - Shared Memory
  - Distributed Memory
Issues in parallel architectures
Cache coherence problem
Interconnection networks
slide 28
MIMD Architectures
Design Space
Extent of address space sharing
Location of memory modules
Uniformity of memory access
slide 29
Outline
Classification
ILP Architectures
Data Parallel Architectures
Process level Parallel Architectures
Issues in parallel architectures
- User's perspective
- Architect's perspective
Cache coherence problem
Interconnection networks
slide 30
Issues from user's perspective
Specification / Program design
- explicit parallelism or implicit parallelism + parallelizing compiler
Partitioning / mapping to processors
Scheduling / mapping to time instants
- static or dynamic
Communication and Synchronization
slide 31
Parallel programming models
Concurrent control flow
Functional or logic program
Vector/array operations
Concurrent tasks/processes/threads/objects
- With shared variables or message passing
Relationship between programming model and architecture?
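The difference between shared variables and message passing shows up even in a small reduction. In this sketch the same partial sums are combined once through a lock-protected shared variable and once through a queue of messages. Python threads are used only to illustrate the two models, not to demonstrate speedup; the function names are hypothetical:

```python
import queue
import threading


def sum_shared(chunks):
    """Workers add their partial sums to one shared variable under a lock."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)
        with lock:               # synchronization guards the shared variable
            total += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def sum_message_passing(chunks):
    """Workers send their partial sums as messages; the receiver combines them."""
    mailbox = queue.Queue()

    def worker(chunk):
        mailbox.put(sum(chunk))  # communication replaces shared state

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(mailbox.get() for _ in chunks)


if __name__ == "__main__":
    data = [[1, 2, 3], [4, 5], [6]]
    print(sum_shared(data), sum_message_passing(data))
```

The first form maps naturally onto shared memory MIMD machines and the second onto distributed memory MIMD machines, although either model can be emulated on either architecture.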
slide 32
Issues from architect's perspective
Coherence problem in shared memory with caches
Efficient interconnection networks
slide 33
Outline
Classification
ILP Architectures
Data Parallel Architectures
Process level Parallel Architectures
Issues in parallel architectures
Cache coherence problem
- Coherence Protocols
  - Bus or directory based
  - Invalidate or update
  - Definition of states
Interconnection networks
slide 34
Cache Coherence Problem
Multiple copies of data may exist
Problem of cache coherence
Options for coherence protocols
What action is taken?
Invalidate or Update
Which processors/caches communicate?
Snoopy (broadcast) or directory based
Status of each block?
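One point in this design space is a snoopy, write-invalidate protocol with three block states. The sketch below assumes the common MSI states (Modified, Shared, Invalid) and tracks a single memory block. The class and method names are hypothetical, and a real protocol handles many more cases:

```python
class SnoopyBus:
    """Write-invalidate MSI protocol for one memory block on a shared bus."""

    def __init__(self, num_caches, initial=0):
        self.memory = initial
        self.state = ["I"] * num_caches    # status of the block in each cache
        self.data = [None] * num_caches    # cached copy, if any

    def read(self, cpu):
        if self.state[cpu] == "I":
            # Read miss: the request is broadcast. A Modified owner writes
            # the block back to memory and keeps a Shared copy.
            for other, s in enumerate(self.state):
                if s == "M":
                    self.memory = self.data[other]
                    self.state[other] = "S"
            self.data[cpu] = self.memory
            self.state[cpu] = "S"
        return self.data[cpu]

    def write(self, cpu, value):
        if self.state[cpu] != "M":
            # Broadcast an invalidate: every other copy becomes Invalid.
            for other in range(len(self.state)):
                if other != cpu and self.state[other] != "I":
                    if self.state[other] == "M":
                        self.memory = self.data[other]
                    self.state[other] = "I"
                    self.data[other] = None
        self.data[cpu] = value
        self.state[cpu] = "M"


if __name__ == "__main__":
    bus = SnoopyBus(num_caches=2, initial=7)
    bus.read(0)
    bus.read(1)
    bus.write(0, 9)
    print(bus.state, bus.read(1), bus.state)
```

An update protocol would instead send the new value to the other caches on each write, and a directory-based protocol would replace the broadcast with messages to only the caches recorded as holding the block.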
slide 36
Interconnection Networks
Architectural Variations:
- Topology
- Direct or Indirect (through switches)
- Static (fixed connections) or Dynamic (connections established as required)
- Routing type (store and forward / wormhole)
Efficiency:
- Delay
- Bandwidth
- Cost
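The effect of routing type on delay can be estimated with first-order formulas. This sketch assumes an uncontended path, no router processing delay, and equal bandwidth on every link; sizes are in bits and bandwidth in bits per cycle. The function names and sample values are illustrative assumptions:

```python
def store_and_forward_latency(hops, packet_bits, bandwidth):
    """Each node receives the whole packet before forwarding it."""
    return hops * packet_bits / bandwidth


def wormhole_latency(hops, packet_bits, flit_bits, bandwidth):
    """The header flit sets up the path; the remaining flits follow in a pipeline."""
    header = hops * flit_bits / bandwidth
    body = (packet_bits - flit_bits) / bandwidth
    return header + body


if __name__ == "__main__":
    # Assumed values: 1024-bit packet, 32-bit flits, 32 bits per cycle per link.
    for hops in (1, 4, 16):
        sf = store_and_forward_latency(hops, 1024, 32)
        wh = wormhole_latency(hops, 1024, 32, 32)
        print(hops, sf, wh)
```

Store and forward delay grows with the product of distance and packet length, while wormhole delay grows with their sum, so wormhole routing is far less sensitive to the number of hops.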
slide 37
Books
D. Sima, T. Fountain, P. Kacsuk, "Advanced Computer Architectures: A Design Space Approach", Addison Wesley, 1997.
M.J. Flynn, "Computer Architecture: Pipelined and Parallel Processor Design", Narosa Publishing House / Jones and Bartlett, 1996.
D.A. Patterson, J.L. Hennessy, "Computer Architecture: A Quantitative Approach", Morgan Kaufmann Publishers, 2002.
K. Hwang, "Advanced Computer Architecture: Parallelism, Scalability, Programmability", McGraw Hill, 1993.
H.G. Cragon, "Memory Systems and Pipelined Processors", Narosa Publishing House / Jones and Bartlett, 1998.
D.E. Culler, J.P. Singh and Anoop Gupta, "Parallel Computer Architecture: A Hardware/Software Approach", Harcourt Asia / Morgan Kaufmann Publishers, 2000.