LecILP

Apr 05, 2018

Gagandeep Kaur
Transcript
  • 7/31/2019 LecILP

    1/37

    slide 1

    Outline

    Classification

    ILP Architectures

    Data Parallel Architectures

    Process level Parallel Architectures

    Issues in parallel architectures

    Cache coherence problem

    Interconnection networks

    slide 2

    Outline

    Classification

    - Flynn's [66]
    - Feng's [72]
    - Händler's [77]
    - Modern (Sima, Fountain & Kacsuk)

    ILP Architectures

    Data Parallel Architectures

    Process level Parallel Architectures

    Issues in parallel architectures

    Cache coherence problem

    Interconnection networks

    slide 3

    Flynn's Classification

    Architecture Categories

    SISD, SIMD, MISD, MIMD (single or multiple instruction streams, single or multiple data streams)
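The four category names follow mechanically from the number of instruction streams and data streams. A minimal sketch of that naming rule, assuming a hypothetical helper `flynn_category` (not part of the slides):

```python
def flynn_category(instruction_streams: int, data_streams: int) -> str:
    """Name the Flynn category for the given stream counts (hypothetical helper)."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    # "S" for a single stream, "M" for multiple streams
    i = "S" if instruction_streams == 1 else "M"
    d = "S" if data_streams == 1 else "M"
    return f"{i}I{d}D"


if __name__ == "__main__":
    for streams in [(1, 1), (1, 64), (4, 1), (4, 4)]:
        print(streams, flynn_category(*streams))
```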

    slide 4

    SISD

    [Figure: SISD organization; labels C, P, M, IS, DS]

    slide 6

    MISD

    [Figure: MISD organization; labels C, P, M, IS, DS]

    slide 7

    MIMD

    [Figure: MIMD organization; labels C, P, M, IS, DS]

    slide 8

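    Feng's Classification

    [Figure: machines plotted by word length (axis ticks 1, 16, 32, 64) against bit slice length (axis ticks 1, 16, 64, 256, 16K). Machines shown: MPP, STARAN, C.mmP, PDP-11, PEPE, IBM 370, Illiac IV, CRAY-1.]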

    slide 9

    Händler's Classification

    < K x K', D x D', W x W' >

    K: control, D: data, W: word

    Dashed (primed) terms: degree of pipelining

    TI - ASC

    CDC 6600 x (I/O)

    C.mmP + +

    PEPE

    Cray-1

    slide 10

    Modern Classification

    Parallel architectures

    - Data-parallel architectures
    - Function-parallel architectures

    slide 11

    Data Parallel Architectures

    Data-parallel architectures

    - Vector architectures
    - Associative and neural architectures
    - SIMDs
    - Systolic architectures

    slide 12

    Function Parallel Architectures

    Function-parallel architectures

    - Instruction-level parallel architectures (ILPs): pipelined processors, VLIWs, superscalar processors
    - Thread-level parallel architectures
    - Process-level parallel architectures (MIMDs): distributed memory MIMD, shared memory MIMD

    slide 13

    Outline

    Classification

    ILP Architectures

    - Pipelining
    - VLIW
    - Superscalar

    Data Parallel Architectures

    Process level Parallel Architectures

    Issues in parallel architectures

    Cache coherence problem

    Interconnection networks

    slide 14

    Pipelining

    IF D RF EX/AG M WB

    faster throughput with pipelining

    resource sharing across cycles

    all instructions may not take same cycles
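The throughput gain can be seen in an idealized timing table: with S stages and no stalls, n instructions finish in S + n - 1 cycles instead of n * S. The sketch below assumes every instruction spends exactly one cycle in each stage (the slide notes this is not always true); `timing_table` is a hypothetical helper, and the stage names are taken from the slide.

```python
STAGES = ["IF", "D", "RF", "EX/AG", "M", "WB"]  # stage names from the slide


def timing_table(n_instructions, stages=STAGES):
    """Return one row per instruction; row[c] is the stage occupied in cycle c, or "-"."""
    total_cycles = len(stages) + n_instructions - 1
    table = []
    for i in range(n_instructions):
        row = ["-"] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name  # instruction i enters stage s in cycle i + s
        table.append(row)
    return table


if __name__ == "__main__":
    for i, row in enumerate(timing_table(4)):
        print(f"I{i}: " + " ".join(f"{cell:>5}" for cell in row))
```

Each column of the printed table shows several instructions occupying different stages in the same cycle, which is the resource sharing across cycles that the slide refers to.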

    slide 15

    Hazards in Pipelining

    Procedural dependencies => Control hazards

    conditional and unconditional branches, calls/returns

    Data dependencies => Data hazards

    RAW (read after write)

    WAR (write after read)

    WAW (write after write)

    Resource conflicts => Structural hazards

    use of same resource in different stages
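The three data hazard types can be detected by comparing the registers two instructions read and write. A minimal sketch, assuming each instruction is modelled as a hypothetical `(dest, sources)` pair and that `first` precedes `second` in program order:

```python
def data_hazards(first, second):
    """Classify data hazards between two instructions given as (dest, sources)."""
    dest1, srcs1 = first
    dest2, srcs2 = second
    found = []
    if dest1 in srcs2:
        found.append("RAW")  # second reads what first writes
    if dest2 in srcs1:
        found.append("WAR")  # second writes what first reads
    if dest1 == dest2:
        found.append("WAW")  # both write the same register
    return found


if __name__ == "__main__":
    add = ("r1", ["r2", "r3"])  # r1 = r2 + r3
    sub = ("r4", ["r1", "r5"])  # r4 = r1 - r5
    print(data_hazards(add, sub))
```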

    slide 16

    Pipeline Performance

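    CPI = 1 + (S - 1) * b

    Time = CPI * T / S

    S: number of pipeline stages

    T: time for one instruction to pass through all S stages (so T / S is the cycle time)

    b: frequency of interruptions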
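The two formulas can be evaluated directly to see how interruptions erode the benefit of a deeper pipeline. This is a sketch under the assumption that T is the time for one instruction to traverse all S stages, so that T / S is the cycle time; the function names are hypothetical.

```python
def pipeline_cpi(stages, b):
    """CPI = 1 + (S - 1) * b, where b is the frequency of interruptions."""
    return 1 + (stages - 1) * b


def time_per_instruction(stages, b, t):
    """Time = CPI * T / S, with T assumed to be the unpipelined instruction time."""
    return pipeline_cpi(stages, b) * t / stages


if __name__ == "__main__":
    # Compare pipeline depths for the same T and interruption frequency.
    for s in (1, 5, 10):
        print(s, pipeline_cpi(s, 0.25), time_per_instruction(s, 0.25, 10.0))
```

With b = 0 every extra stage divides the time per instruction; with b > 0 each interruption costs S - 1 cycles, so the gain from added stages flattens out.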

    slide 17

    ILP in VLIW processors

    [Figure: cache/memory feeds a fetch unit, which passes a single multi-operation instruction to several functional units (FU) that share a register file.]

    slide 18

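    ILP in Superscalar processors

    [Figure: cache/memory feeds a fetch unit with a sequential stream of instructions; a decode and issue unit passes multiple instructions to several functional units (FU) that share a register file. Legend: instruction/control paths, data paths; FU = Functional Unit.]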

    slide 19

    Why are superscalars popular?

    Binary code compatibility among scalar and superscalar processors of the same family

    Same compiler works for all processors (scalars and superscalars) of the same family

    Assembly programming of VLIWs is tedious

    Code density in VLIWs is very poor - Instruction encoding schemes
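Poor code density comes from unused operation slots: every VLIW instruction has a fixed number of slots, and slots with no independent operation available are filled with NOPs. The sketch below is a simple greedy packer written for illustration, not a real VLIW compiler. It assumes each register is written only once (so only RAW dependences matter) and unit latency; `pack_vliw` and the `(dest, sources)` operation format are hypothetical.

```python
def pack_vliw(ops, width):
    """Greedily pack (dest, sources) operations into bundles of `width` slots."""
    bundles = []
    ready = {}  # register -> index of the first bundle that may read it
    for idx, (dest, srcs) in enumerate(ops):
        # Earliest bundle in which all source operands are available.
        b = max((ready.get(s, 0) for s in srcs), default=0)
        # Skip bundles that are already full.
        while b < len(bundles) and len(bundles[b]) >= width:
            b += 1
        while len(bundles) <= b:
            bundles.append([])
        bundles[b].append(idx)
        ready[dest] = b + 1  # result is usable from the next bundle on
    return bundles


def nop_slots(bundles, width):
    """Count slots that would have to be filled with NOPs."""
    return sum(width - len(bundle) for bundle in bundles)


if __name__ == "__main__":
    ops = [
        ("r1", []),            # r1 = load
        ("r2", []),            # r2 = load
        ("r3", ["r1", "r2"]),  # r3 = r1 + r2
        ("r4", ["r3", "r3"]),  # r4 = r3 * r3
        ("r5", []),            # r5 = load
        ("r6", ["r4", "r5"]),  # r6 = r4 + r5
    ]
    bundles = pack_vliw(ops, width=3)
    print(bundles, nop_slots(bundles, 3))
```

In this example six operations occupy four 3-slot instructions, so half of the encoded slots are NOPs. Instruction encoding schemes that compress away those empty slots are one response to this problem.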

    slide 20

    Issues in VLIW Architecture

    [Figure: several functional units (FU) sharing one register file]

    Instruction encoding

    Scalability: access time, area, and power consumption increase sharply with the number of register ports

    slide 21

    Tasks of superscalar processing

    - Parallel decoding
    - Superscalar instruction issue
    - Parallel instruction execution
    - Preserving the sequential consistency of execution
    - Preserving the sequential consistency of exception processing

    slide 22

    Outline

    Classification

    ILP Architectures

    Data Parallel Architectures

    - SIMD Processors
    - Vector Processors
    - Associative Processors
    - Systolic Arrays

    Process level Parallel Architectures

    Issues in parallel architectures

    Cache coherence problem

    Interconnection networks

    slide 23

    Data Parallel Architectures

    SIMD Processors

    Multiple processing elements driven by a single instruction stream

    Vector Processors

    Uni-processors with vector instructions

    Associative Processors

    SIMD-like processors with associative memory

    Systolic Arrays

    Application-specific VLSI structures

    slide 24

    Systolic Arrays [H.T. Kung 1978]

    Simplicity, Regularity, Concurrency, Communication

    Example: band matrix multiplication

    [Figure: band matrix product C = A x B. A and B are band matrices whose nonzero entries carry subscripts 11 through 66 (rows 11 12; 21 22 23; 31 32 33 34; 42 43 44 45; 53 54 55 56; 64 65 66), with zeros outside the band.]

    [Figure: systolic array snapshot at T=0, showing elements B11, B12, B21, B31 and A11, A12, A21, A22, A31, A23]
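The rhythmic data movement of a systolic array can be sketched in software. The example below is not the band-matrix array shown on the slides; it is a simpler square array, chosen for illustration, in which each cell (i, j) keeps its own sum C[i][j], rows of A enter from the left and move right, and columns of B enter from the top and move down. Inputs are skewed by one cycle per row and per column so that matching operands meet in the right cell. The function name `systolic_matmul` is hypothetical.

```python
def systolic_matmul(A, B):
    """Cycle-by-cycle simulation of an n x n systolic array computing C = A x B."""
    n = len(A)
    a_reg = [[0] * n for _ in range(n)]  # A value currently held in each cell
    b_reg = [[0] * n for _ in range(n)]  # B value currently held in each cell
    C = [[0] * n for _ in range(n)]
    for t in range(3 * n - 2):  # enough cycles for the last operands to arrive
        # A values move one cell to the right; row i is fed with a delay of i cycles.
        for i in range(n):
            for j in range(n - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
            k = t - i
            a_reg[i][0] = A[i][k] if 0 <= k < n else 0
        # B values move one cell down; column j is fed with a delay of j cycles.
        for j in range(n):
            for i in range(n - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
            k = t - j
            b_reg[0][j] = B[k][j] if 0 <= k < n else 0
        # Every cell multiplies its two current operands and accumulates.
        for i in range(n):
            for j in range(n):
                C[i][j] += a_reg[i][j] * b_reg[i][j]
    return C


if __name__ == "__main__":
    print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Each cell talks only to its neighbours and performs the same multiply-accumulate step every cycle, which reflects the simplicity, regularity, and local communication listed on the slide.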

    slide 26

    Outline

    Classification

    ILP Architectures

    Data Parallel Architectures

    Process level Parallel Architectures

    MIMD Processors

    - Shared Memory
    - Distributed Memory

    Issues in parallel architectures

    Cache coherence problem

    Interconnection networks

    slide 28

    MIMD Architectures

    Design Space

    Extent of address space sharing

    Location of memory modules

    Uniformity of memory access

    slide 29

    Outline

    Classification

    ILP Architectures

    Data Parallel Architectures

    Process level Parallel Architectures

    Issues in parallel architectures

    - User's perspective
    - Architect's perspective

    Cache coherence problem

    Interconnection networks

    slide 30

    Issues from user's perspective

    Specification / Program design

    explicit parallelism or implicit parallelism + parallelizing compiler

    Partitioning / mapping to processors

    Scheduling / mapping to time instants

    static or dynamic

    Communication and Synchronization

    slide 31

    Parallel programming models

    Concurrent control flow

    Functional or logic program

    Vector/array operations

    Concurrent tasks/processes/threads/objects

    With shared variables or message passing

    Relationship between programming model and architecture?
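The difference between the shared-variable and message-passing styles can be illustrated with the same computation written both ways. This is a minimal sketch using Python threads from the standard library; the function names are hypothetical, and Python threads only illustrate the programming model, not real parallel speedup.

```python
import queue
import threading


def sum_shared(chunks):
    """Shared-variable style: workers update one total, protected by a lock."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)
        with lock:  # synchronization guards the shared variable
            total += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def sum_message_passing(chunks):
    """Message-passing style: workers send partial sums; nothing is shared."""
    mailbox = queue.Queue()

    def worker(chunk):
        mailbox.put(sum(chunk))  # communicate the result as a message

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(mailbox.get() for _ in chunks)


if __name__ == "__main__":
    data = [[1, 2], [3, 4], [5]]
    print(sum_shared(data), sum_message_passing(data))
```

The shared-variable version maps naturally onto shared memory MIMD machines and the message-passing version onto distributed memory MIMD machines, although either model can be implemented on either architecture.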

    slide 32

    Issues from architect's perspective

    Coherence problem in shared memory with caches

    Efficient interconnection networks

    slide 33

    Outline

    Classification

    ILP Architectures

    Data Parallel Architectures

    Process level Parallel Architectures

    Issues in parallel architectures

    Cache coherence problem

    Coherence Protocols

    - Bus or directory based
    - Invalidate or update
    - Definition of states

    Interconnection networks

    slide 34

    Cache Coherence Problem

    Multiple copies of data may exist

    Problem of cache coherence

    Options for coherence protocols

    What action is taken?

    Invalidate or Update

    Which processors/caches communicate?

    Snoopy (broadcast) or directory based

    Status of each block?
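One concrete combination of these options is a snoopy (broadcast), write-invalidate protocol with three block states. The slides do not name the states, so the sketch below assumes the common Modified / Shared / Invalid (MSI) scheme and tracks a single memory block; the class names `Cache` and `Bus` are hypothetical.

```python
class Cache:
    """One cache's copy of a single block: state is "M", "S" or "I"."""

    def __init__(self):
        self.state = "I"
        self.value = None


class Bus:
    """Shared bus on which every cache snoops reads and writes (MSI, write-invalidate)."""

    def __init__(self, n_caches, memory_value=0):
        self.memory = memory_value
        self.caches = [Cache() for _ in range(n_caches)]

    def read(self, cpu):
        cache = self.caches[cpu]
        if cache.state == "I":  # read miss: broadcast on the bus
            for other in self.caches:
                if other is not cache and other.state == "M":
                    self.memory = other.value  # owner writes the block back
                    other.state = "S"
            cache.value = self.memory
            cache.state = "S"
        return cache.value

    def write(self, cpu, value):
        cache = self.caches[cpu]
        for other in self.caches:  # invalidate every other copy
            if other is not cache and other.state != "I":
                if other.state == "M":
                    self.memory = other.value
                other.state = "I"
        cache.value = value
        cache.state = "M"


if __name__ == "__main__":
    bus = Bus(2)
    bus.read(0)
    bus.read(1)
    bus.write(0, 7)  # the copy in cache 1 becomes Invalid
    print(bus.caches[1].state, bus.read(1), bus.caches[0].state)
```

An update protocol would instead send the new value to the other caches on a write, and a directory-based scheme would replace the broadcast loop with a lookup of which caches hold the block.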

    slide 36

    Interconnection Networks

    Architectural Variations:

    Topology

    Direct or Indirect (through switches)

    Static (fixed connections) or Dynamic (connections established as required)

    Routing type (store and forward / wormhole)

    Efficiency:

    Delay

    Bandwidth

    Cost
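The effect of routing type on delay can be estimated with a first-order latency model. The sketch below assumes no contention and no per-router processing delay, and all parameter names are hypothetical: store-and-forward routing receives the whole packet at every hop before forwarding it, while wormhole routing forwards the header flit as soon as it arrives and pipelines the rest of the packet behind it.

```python
def store_and_forward_latency(hops, packet_bits, bandwidth):
    """Each hop waits for the full packet: hops * (packet_bits / bandwidth)."""
    return hops * (packet_bits / bandwidth)


def wormhole_latency(hops, packet_bits, flit_bits, bandwidth):
    """Header flit crosses the hops, then the packet body streams behind it."""
    return hops * (flit_bits / bandwidth) + packet_bits / bandwidth


if __name__ == "__main__":
    # Illustrative numbers only: 4 hops, 1024-bit packet, 32-bit flits, 1 bit per cycle.
    print(store_and_forward_latency(4, 1024, 1))
    print(wormhole_latency(4, 1024, 32, 1))
```

Under this model store-and-forward delay grows with the product of distance and packet length, whereas wormhole delay grows with their sum, so the gap widens as either increases.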

    slide 37

    Books

    D. Sima, T. Fountain, P. Kacsuk, "Advanced Computer Architectures: A Design Space Approach", Addison Wesley, 1997.

    M.J. Flynn, "Computer Architecture: Pipelined and Parallel Processor Design", Narosa Publishing House / Jones and Bartlett, 1996.

    D.A. Patterson, J.L. Hennessy, "Computer Architecture: A Quantitative Approach", Morgan Kaufmann Publishers, 2002.

    K. Hwang, "Advanced Computer Architecture: Parallelism, Scalability, Programmability", McGraw Hill, 1993.

    H.G. Cragon, "Memory Systems and Pipelined Processors", Narosa Publishing House / Jones and Bartlett, 1998.

    D.E. Culler, J.P. Singh, A. Gupta, "Parallel Computer Architecture: A Hardware/Software Approach", Harcourt Asia / Morgan Kaufmann Publishers, 2000.