  • [email protected]

    Enabling Technologies for System-on-Chip Development,

    November 19-20, 2001, Tampere, Finland http://www.cs.tut.fi/soc/

    Reconfigurable Computing Architectures and Methodologies for System-on-Chip;

    Reiner Hartenstein, Monday, November 19, 10:15 - 11:00 hrs.

    Reiner Hartenstein, University of Kaiserslautern, Germany http://hartenstein.de

    Enabling Technologies for

    Reconfigurable Computing

    Enabling Technologies for Reconfigurable Computing Part 2: Stream-based Computing for RC

    Wednesday, November 21, 10.30 – 12.00 hrs.

    Reiner Hartenstein

    University of Kaiserslautern

    November 21, 2001, Tampere, Finland

    © 2001, [email protected] http://www.fpl.uni-kl.de

    University of Kaiserslautern

    Xputer Lab

    2

    Schedule

    time slot

    08.30 – 10.00 Reconfigurable Computing (RC)

    10.00 – 10.30 coffee break

    10.30 – 12.00 Stream-based Computing for RC

    12.00 – 14.00 lunch break

    14.00 – 15.30 Resources for RC

    15.30 – 16.00 coffee break

    16.00 – 17.30 FPGAs: recent developments


    3

    >> EDA revolution

    • EDA revolution

    • Dead Supercomputer

    • Stream-based Computing

    • Stream-based Memory Architecture

    • Design Space Explorers

    • KressArray Xplorer

    • Machine paradigms

    • Co-Compilation http://www.uni-kl.de


    4

    EDA: where Electronics begins [Richard Newton]

    1k

    • Dataquest Initiative

    New book

    • NASDAQ index

    EDA index


    5

    [Richard Newton]


    6

    The End is near

    [Figure: transistors/chip (log scale, 10^0 to 10^15) versus year to market, 1960 to 2040]

    The end of hypergrowth?


    7

    Paradigm

    Shift

    Mainstream

    Tornado

    Development of Hypergrowth Markets

    Harper Business 1995


    8

    Makimoto’s 3rd wave

    The next EDA Industry Revolution

    EDA industry paradigm switching every 7 years [Keutzer / Newton]:

    • 1978: Transistor entry: Applicon, Calma, CV ...

    • 1985: Schematics entry: Daisy, Mentor, Valid ...

    • 1992: Synthesis: Cadence, Synopsys ...

    • 1999: (Co-) Compilation, stream-based DPU arrays [Hartenstein]

    • 2006


    9

    Biggest Mistake in History


    10

    Innovation Stalled ? [Richard Newton]

    What is next after VHDL ?


    11

    What is next after VHDL ?

    Motivations

    • HDL-savvy designers needed

    • New Business Model

    • Co-Design never ending

    • HDLs ?

    • Extended HDLs – how far ?

    • Automatic Partitioning


    12

    >> Dead Supercomputer

    • EDA revolution

    • Dead Supercomputer

    • Stream-based Computing

    • Stream-based Memory Architecture

    • Design Space Explorers

    • KressArray Xplorer

    • Machine paradigms

    • Co-Compilation http://www.uni-kl.de


    13

    Dead Supercomputer Society

    • 37 university and corporate R&D projects: 2 or 3 successes…

    • All the rest failed to work or to be successful (Research 1985-1995)


    14

    Dead Supercomputer Society

    • ACRI • Alliant • American Supercomputer • Ametek • Applied Dynamics • Astronautics • BBN • CDC • Convex • Cray Computer • Cray Research • Culler-Harris • Culler Scientific • Cydrome • Dana/Ardent/Stellar/Stardent • DAP (ICL) • Denelcor • Elexsi • ETA Systems • Evans and Sutherland Computer • Floating Point Systems • Galaxy YH-1 • Goodyear Aerospace MPP • Gould NPL • Guiltech • Intel Scientific Computers • International Parallel Machines • Kendall Square Research • Key Computer Laboratories • MasPar • Meiko • Multiflow • Myrias • Numerix • Prisma • Tera • Thinking Machines • Saxpy • Scientific Computer Systems (SCS) • Soviet Supercomputers • Supertek • Supercomputer Systems • Suprenum • Vitesse Electronics

    [Gordon Bell, keynote at ISCA 2000]


    15

    Dead Supercomputer Society • ACRI • Alliant • American Supercomputer • Ametek • Applied Dynamics • Astronautics • BBN • CDC • Convex • Cray Computer • Cray Research • Culler-Harris • Culler Scientific • Cydrome • Dana/Ardent/Stellar/Stardent • DAP (ICL) • Denelcor • Elexsi • ETA Systems • Evans and Sutherland Computer • Floating Point Systems • Galaxy YH-1

    • Goodyear Aerospace MPP • Gould NPL • Guiltech • Intel Scientific Computers • International Parallel Machines • Kendall Square Research • Key Computer Laboratories • MasPar • Meiko • Multiflow • Myrias • Numerix • Prisma • Tera • Thinking Machines • Saxpy • Scientific Computer Systems (SCS) • Soviet Supercomputers • Supertek • Supercomputer Systems • Suprenum • Vitesse Electronics


    16

    >> Stream-based Computing

    • EDA revolution

    • Dead Supercomputer

    • Stream-based Computing

    • Stream-based Memory Architecture

    • Design Space Explorers

    • KressArray Xplorer

    • Machine paradigms

    • Co-Compilation http://www.uni-kl.de


    17

    Coarse Grain Reconfigurable Arrays vs. Parallel Processes

    [Figure, left: an array of processors, each with its own instruction sequencer (I-Seq) and ALU. Parallelism at Process Level.]

    [Figure, right: a data sequencer driving an array of rALUs, reconfigurable or hardwired. Parallelism at Datapath Level.]

    no instruction sequencing !


    18

    Concurrent Computing

    [Figure: several CPUs, each a DPU with its own instruction sequencer, connected by bus(es) or a switch box]

    extremely inefficient


    19

    Stream-based Computing

    [Figure: a chain of four DPUs]

    • driven by data streams from / to memory or from / to a peripheral interface

    • transport-triggered execution

    • no instruction sequencer inside !
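A minimal sketch of the transport-triggered idea on this slide, not the Xputer implementation: each DPU is modeled as a Python generator that fires whenever an operand arrives on its input stream. The names `dpu` and `run_pipeline` and the three example operations are hypothetical and chosen only for illustration.

```python
from typing import Callable, Iterable, Iterator, List

def dpu(op: Callable[[int], int], stream: Iterable[int]) -> Iterator[int]:
    """One hypothetical DPU: a fixed operation, triggered by each arriving operand."""
    for operand in stream:      # no program counter, no instruction fetch
        yield op(operand)

def run_pipeline(ops: List[Callable[[int], int]], source: Iterable[int]) -> List[int]:
    """Chain DPUs so the output stream of one is the input stream of the next."""
    stream: Iterator[int] = iter(source)
    for op in ops:
        stream = dpu(op, stream)
    return list(stream)

# The operations are fixed before any data arrives (configuration, not instructions).
ops = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
result = run_pipeline(ops, range(5))
print(result)  # [-1, 1, 3, 5, 7]
```

Nothing in the chain decides what to execute next; the arrival of data alone drives the computation.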


    20

    Stream-based Computing: (r)DPU array

    [Figure: 3 x 3 array of DPUs, driven by data streams]

    for both reconfigurable and hardwired arrays


    21

    >>> extremely high efficiency

    • avoiding address computation overhead

    • avoiding instruction fetch and interpretation overhead

    • high parallelism, massively multiple deep pipelines

    • much less configuration memory

    • no routing areas to configure functions from CLBs


    22

    Systolic Stream-based Computing System

    Systolic Array [H. T. Kung, 1980]: an array of DPUs (Data Path Units)

    [Figure: systolic array. Input data streams: y1(0), y2(0), y3(0); x1, x2, x3; coefficients a11 ... a33 ("-" marks an empty slot in a stream). Output data streams: y1, y2, y3.]

    DPU architecture: inputs y, x, a; operators + and *

    equations → data streams and placement, by linear projection or algebraic mapping
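A hedged sketch of what such an array might compute, assuming (the slide does not say so explicitly) that the streams implement the matrix-vector update y = y(0) + A·x and that each DPU performs y + a * x. The skewing rule used here, x_j reaching DPU i at clock tick i + j, is one common arrangement and an assumption; `systolic_matvec` is a hypothetical name.

```python
def systolic_matvec(a, x, y0):
    """Time-stepped model of a linear array of n DPUs, one per result y_i."""
    n = len(x)
    y = list(y0)                       # y_i(0) enters DPU i
    for t in range(2 * n - 1):         # global clock ticks
        for i in range(n):             # all DPUs operate within the same tick
            j = t - i                  # assumed skew: x_j reaches DPU i at tick i + j
            if 0 <= j < n:             # otherwise the slot is empty ("-")
                y[i] = y[i] + a[i][j] * x[j]   # the DPU operation: y + a * x
    return y

a = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
result = systolic_matvec(a, x=[1, 0, 2], y0=[0, 0, 0])
print(result)  # [7, 16, 25]
```

The inner loop over `i` stands for hardware that works concurrently; only the loop over `t` represents time.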


    23

    Computing in space and time

    [Figure: systolic array with data streams y1(0) ... y3(0), x1 ... x3, a11 ... a33 and y1 ... y3]

    • computing in space: placement

    • computing in time: data streams

    • systolic arrays etc., and other transformations: migration by re-timing

    this dichotomy is completely ignored by our CS curricula


    24

    General Stream-based Computing System

    heterogeneous array of DPUs (data path units)

    [Figure: design flow in four numbered steps (1 to 4). Labels: expression tree; DPU architectures (inputs y, x, a; operators + and *); Mapper: simultaneous placement & routing; Scheduler: data streams. The mapped array contains +, *, sh and xf operators.]

    The same mapper for both: reconfigurable or hardwired

    Kress DPSS [1995]


    25

    Converging Design Flows

    this synthesis method is a generalization of systolic array synthesis: super-systolic synthesis

    the same synthesis method may be used for mapping an algorithm onto both:

    • rDPA [Kress, 1995], and

    • DPA [Brodersen, 2000]

    terms:

    DPU: datapath unit

    DPA: datapath array

    rDPU: reconfigurable DPU

    rDPA: reconfigurable DPA


    26

    Super Pipe Networks

    array: systolic array

      applications: regular data dependencies only

      pipeline properties: shape: linear only; resources: uniform only

      mapping and scheduling (data stream formation): linear projection or algebraic synthesis

    array: super-systolic rDPA *)

      applications and pipeline properties: no restrictions

      mapping: simulated annealing or P&R algorithm

      scheduling (data stream formation): (e.g. force-directed) scheduling algorithm

    *) KressArray [1995]
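To make "mapping by simulated annealing" concrete, here is a small sketch that places the operators of a dataflow graph on a grid of DPU positions and anneals by swapping two operators. It is an illustration under stated assumptions, not the KressArray DPSS algorithm: the cost function (total Manhattan wire length), the cooling schedule, and all names (`wire_length`, `anneal_placement`, the example graph) are hypothetical, and routing and scheduling are left out.

```python
import math
import random

def wire_length(placement, edges):
    """Sum of Manhattan distances between connected operators."""
    return sum(abs(placement[u][0] - placement[v][0]) +
               abs(placement[u][1] - placement[v][1]) for u, v in edges)

def anneal_placement(nodes, edges, rows, cols, steps=2000, seed=0):
    rng = random.Random(seed)
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    rng.shuffle(cells)
    placement = dict(zip(nodes, cells))          # random initial placement
    cost = initial_cost = wire_length(placement, edges)
    best, best_cost = dict(placement), cost
    temperature = 2.0
    for _ in range(steps):
        u, v = rng.sample(nodes, 2)              # propose swapping two operators
        placement[u], placement[v] = placement[v], placement[u]
        new_cost = wire_length(placement, edges)
        # Always accept improvements; accept worse moves with a shrinking probability.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temperature):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = dict(placement), cost
        else:                                    # rejected: undo the swap
            placement[u], placement[v] = placement[v], placement[u]
        temperature = max(0.01, temperature * 0.995)
    return best, best_cost, initial_cost

nodes = ["x", "a", "mul", "y", "add", "sh"]
edges = [("x", "mul"), ("a", "mul"), ("mul", "add"), ("y", "add"), ("add", "sh")]
best, best_cost, initial_cost = anneal_placement(nodes, edges, rows=2, cols=3)
print(initial_cost, "->", best_cost)
```

Because the search imposes no regularity on the graph, the same loop works for irregular data dependencies, which is the point of the "no restrictions" entry above.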


    27

    >> Stream-based Memory Architecture

    • EDA revolution

    • Dead Supercomputer

    • Stream-based Computing

    • Stream-based Memory Architecture

    • Design Space Explorers

    • KressArray Xplorer

    • Machine paradigms

    • Co-Compilation http://www.uni-kl.de


    28

    Hot Research Topic: Memory Architectures

    • High Performance Embedded Memory Architectures

    • High Performance Memory Communication Architectures [Herz]

    • Custom Memory Management Methodology [Catthoor]

    • Data Reuse Transformations [Kougia et al.]

    • Data Reuse Exploration [Soudris, Wuytack]


    29

    Processor Memory Performance Gap

    [Figure: performance (log scale, 1 to 1000) versus year, 1980 to 2000, with one curve for the CPU and one for DRAM]

    • µProc: 60% / year

    • DRAM: 7% / year

    • Processor-Memory Performance Gap: grows 50% / year
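The 50% figure follows from the two growth rates on the slide: the gap grows by the ratio of the yearly factors. A short check (variable names are illustrative only):

```python
cpu_growth = 1.60    # µProc: 60% per year
dram_growth = 1.07   # DRAM: 7% per year

# The gap is the ratio CPU / DRAM, so it grows by the ratio of the factors.
gap_growth = cpu_growth / dram_growth
gap_after_decade = gap_growth ** 10

print(f"gap grows by about {100 * (gap_growth - 1):.1f}% per year")
print(f"after ten years the gap has widened by a factor of about {gap_after_decade:.0f}")
```

The first line prints roughly 49.5%, which the slide rounds to 50% per year.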


    30

    RAs: Cache does not help

    • the memory bandwidth problem is often more dramatic than for microprocessors

    • interleaving is not practicable, since it is based on sequential instruction streams

    • classical caches do not help, since instruction sequencing is not used

    • the problem: throughput of parallel data streams, not instruction streams

    • super pipe networks, not parallel computers !

    • Stream-based arrays are a memory bandwidth problem


    31

    Efficient Memory Communication should be directly supported by the Mapper Tools

    [Figure: Synthesizable Memory Communication, an example by Nageldinger’s KressArray Xplorer (http://kressarray.de). Legend: sequencers; memory ports; application; not used. Label: Optimized Parallel Memory Controller.]


    32

    The Disk Farm? or a System On a Card?

    The 500GB disc card: LOTS of bandwidth

    A few disks replaced by >10s Gbytes RAM and a processor

    14"

    MicroDrive: 1.7” x 1.4” x 0.2”

    • 1999: 340 MB, 5400 RPM, 5 MB/s, 15 ms seek

    • 2006: 9 GB, 50 MB/s ? (1.6X/yr capacity, 1.4X/yr BW)

    Integrated IRAM processor, 2x height

    Connected via crossbar switch, growing like Moore’s law

    16 Mbytes; 1.6 Gflops; 6.4 Gops

    10,000+ nodes in one rack! 100/board = 1 TB; 0.16 Tflops

    [Gordon Bell, Jim Gray, ISCA2000]
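The 2006 projection can be reproduced from the 1999 figures and the yearly factors quoted on the slide. The sketch below only redoes that arithmetic; reading "0.16 Tflops" as 100 nodes of 1.6 Gflops each is an assumption.

```python
years = 2006 - 1999

capacity_1999_mb = 340
bandwidth_1999_mb_s = 5

capacity_2006_gb = capacity_1999_mb * 1.6 ** years / 1000   # 1.6X per year
bandwidth_2006_mb_s = bandwidth_1999_mb_s * 1.4 ** years    # 1.4X per year

# Assumed reading of "100/board": 100 nodes, each delivering 1.6 Gflops.
board_tflops = 100 * 1.6 / 1000

print(f"{capacity_2006_gb:.1f} GB, {bandwidth_2006_mb_s:.1f} MB/s, {board_tflops:.2f} Tflops")
```

The results, about 9.1 GB and 52.7 MB/s, match the "9 GB, 50 MB/s" on the slide.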


    33

    Memory Communication Architecture

    • hot research topic in embedded systems

    • storage context transformations [Herz, others]

    • for low power

    • for high performance

    • startups provide memory IP or generators


    34

    Stream-based Soft Machine

    [Figure: Compiler; Scheduler; Memory (data memory) with multiple memory banks; Sequencers (data stream generators); rDPA; “instructions”]


    35

    >> Design Space Explorers

    • EDA revolution

    • Dead Supercomputer

    • Stream-based Computing

    • Stream-based Memory Architecture

    • Design Space Explorers

    • KressArray Xplorer

    • Machine paradigms

    • Co-Compilation http://www.uni-kl.de


    36

    • domain-specific Reconfigurable Platforms will be suitable to cope with the 2nd Design Crisis

    • fully general purpose reconfigurable sometimes is ... an Illusion ...

    • just as the general purpose massively parallel computer system

    general purpose is unrealistic

    KressArray Explorer ...


    37

    Universal RAs: is it feasible?

    The General Purpose (coarse grain) Reconfigurable Array appears to be an Illusion ...

    ... such as obviously also the Universal Massively Parallel Computer Architecture

    ... counter-example: Application Domain of Image Processing


    38

    -> Design Space Exploration

    • Design Space Exploration

      – Design Space Explorers (DSEs)

      – Platform Space Explorers (PSEs)

      – Compiler / PSE symbiosis

      – Parallel computing vs. reconfigurable


    39

    Design Space Exploration Systems

    Explorer System | year | source | interactive | status evaluation | status generation

    DPE | 1991 | [66] | no | abstract models | rule-based

    Clio | 1992 | [67] | yes | prediction models | device generator

    DIA | 1998 | [68] | yes | prediction from library | rule-based

    DSE for RAW | 1998 | [49] | no | analytical models | analytical

    ICOS | 1998 | [76] | no | fuzzy logic | greedy search

    DSE for Multimedia | 1999 | [77] | no | simulation | branch and bound

    Xplorer | 1999 | [11] [50] | yes | fuzzy rule-based | simulated annealing


    40

    DSEs: an overview

    • For VLSI design in general

    • for parallel Computer Systems

    • Xplorer the only one for reconfigurable platforms (also MATRIX ?)


    41

    >> KressArray Xplorer

    • EDA revolution

    • Dead Supercomputer

    • Stream-based Computing

    • Stream-based Memory Architecture

    • Design Space Explorers

    • KressArray Xplorer

    • Machine paradigms

    • Co-Compilation http://www.uni-kl.de


    42

    KressArray DPSS

    KressArray Xplorer (Platform Design Space Explorer)

    [Figure: block diagram of the Xplorer around the DPSS (published at ASP-DAC 1995). Labels: User; User Interface; Application Set; ALEX Code; ALE-X Compiler; expr. tree; interm. form 2; interm. form 3; Architecture Editor; Mapping Editor; Architecture Estimator; Mapper; Scheduler; data stream Schedule; Analyzer; statist. Data; Delay Estim.; Power Estimator; Power Data; HDL Generator; VHDL; Verilog; Simulator; Datapath Generator; Kress rDPU Layout; Design Rules; Improvement Proposal Generator; Inference Engine (FOX); Suggestion; Suggestion Selection; KressArray family parameters.]


    43

    Architecture & Mapping Editor

    [Figure: Xplorer, a (Design Space) Platform Space Explorer (http://kressarray.de). Labels: User; Source Input; Application Set; Statistics; KressArray DPSS; Datastream Generator; HDL Generator; Simulator; Datapath Generator; Delay & Power Estimator; Improvement Proposal Generator.]


    44

    Design Flow of Domain-specific Architecture Optimization

    [Figure: flow chart. Steps shown: Application Compilation; Application Selection; Application Mapping; Mapping Analysis; Modification Suggestion; Architecture Modification; Architecture Verification; Optimized Architecture; Application Set or benchmark; Initial Arch. Estimation.]

    Nageldinger’s KressArray Design Space Xplorer, including a Fuzzy Logic Improvement Proposal Generator

    accessible via the internet: http://kressarray.de

    runs best with Netscape 4.6.1


    45

    KressArray Design Space Xplorer

    DPSS-N Data Path Synthesis System

    [Figure: tool flow with file types. Labels: User; Editor / User Interface; ALE-X Code (.alex); ALE-X Compiler; Intermediate Format (.map); Architecture Estimation; Mapper; Mapping: Technology Mapping, Placement & Routing; Interm. Format (.map), including configware code; Scheduler; Data Sequencing Code (.seq); Analyser; Statistical Data (.stat); Module Generator; Kress rDPU Layout (.krs); Kress IP Library (.krs); other IP; HDL Generator; HDL Description (.v), to Synthesis Environment.]


    46

    >> Machine paradigms

    • EDA revolution

    • Dead Supercomputer

    • Stream-based Computing

    • Stream-based Memory Architecture

    • Design Space Explorers

    • KressArray Xplorer

    • Machine paradigms

    • Co-Compilation http://www.uni-kl.de


    47

    Computer: the wrong Machine Paradigm

    [Figure, “von Neumann” Computer: Compiler; Memory; instructions; Sequencer with program counter (state register); hardwired Datapath. Tightly coupled by compact instruction code. Does not support soft data paths.]

    Xputer: The Soft Machine Paradigm

    [Figure, Xputer: Compiler; Scheduler; Memory; “instructions”; multiple sequencers with data counters; Datapath Array, reconfigurable (also for hardwired). Loosely coupled by decision data bits only.]


    48

    Soft Machine Paradigm

    [Figure, Xputer: Compiler; Scheduler; Memory; “instructions”; Sequencer with data counter; reconfigurable Datapath.]

    [Figure, Parallel Xputer: Compiler; Scheduler; multiple memories; “instructions”; multiple Sequencers with data counters; reconfigurable Datapath.]

    Decision data only, i.e. loose coupling


    49

    Computer: the wrong Machine Paradigm

    [Figure, “von Neumann” Computer: Compiler; Memory; instructions; Instruction Sequencer with program counter; Decoder; hardwired Datapath.]

    • tightly coupled by a compact instruction code

    • “von Neumann” does not support soft data paths: at run time, no instruction fetch


    50

    Machine Paradigms

    machine category | Computer (“v. Neumann”) | Xputer (no transputer!)

    driven by | control flow | data streams (no “dataflow”)

    engine principles | instruction sequencing | data sequencing

    state register | program counter | (multiple) data counter(s)

    communication path set-up | at run time | at load time

    datapath resource | single ALU | array of ALUs & other rDPUs

    datapath operation | sequential | parallel pipe network
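The "state register" row can be illustrated with a small sketch: instead of a program counter stepping through instructions, a data counter steps through data addresses, and a fixed datapath consumes whatever the resulting stream delivers. This is a simplified model under stated assumptions, not the Xputer hardware; `data_sequencer`, `sum_of_squares` and the stride-based scan pattern are hypothetical.

```python
def data_sequencer(base, stride, count):
    """Hypothetical data sequencer: its only state is a data counter."""
    data_counter = base
    for _ in range(count):
        yield data_counter          # emit the next data address
        data_counter += stride      # scan pattern fixed at load time

def sum_of_squares(memory, base, stride, count):
    """Fixed datapath (multiply-accumulate) fed by the address stream."""
    total = 0
    for address in data_sequencer(base, stride, count):
        operand = memory[address]
        total += operand * operand  # no instruction is fetched or decoded here
    return total

memory = [3, 9, 4, 9, 5, 9]
addresses = list(data_sequencer(base=0, stride=2, count=3))
result = sum_of_squares(memory, base=0, stride=2, count=3)
print(addresses, result)  # [0, 2, 4] 50
```

Several such sequencers running side by side, each feeding its own port of a datapath array, would correspond to the "(multiple) data counter(s)" entry.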


    51

    Machine Paradigms

    machine category | Computer (“v. Neumann”) | Xputer [8] (no transputer!)

    machine paradigm | procedural sequencing: deterministic

    driven by | control flow | data stream(s) (no dataflow [13])

    RA support | no | yes

    engine principles | instruction sequencing | data sequencing

    state register | program counter | (multiple) data counter(s)

    communication path set-up | at run time | at load time

    datapath resource | single ALU | array of ALUs

    datapath operation | sequential | parallel


    52

    Fundamental Ideas available

    • Data Sequencer Methodology

    • Data-procedural Languages (duality with “von Neumann”)

    • ... supporting memory bandwidth optimization

    • Soft Data Path Synthesis Algorithms

    • Parallelizing Loop Transformation Methods

    • Compilers supporting Soft Machines

    • SW / CW Partitioning Co-Compilers


    53

    >> Co-Compilation

    • EDA revolution

    • Dead Supercomputer

    • Stream-based Computing

    • Stream-based Memory Architecture

    • Design Space Explorers

    • KressArray Xplorer

    • Machine paradigms

    • Co-Compilation http://www.uni-kl.de


    54

    FPGA-Style Mapping for coarse grain reconfigurable arrays

    mapping: Kress DPSS | CHESS | RaPiD | Colt

    placement: simulated annealing; genetic algorithm

    routing: simulated annealing; Pathfinder; greedy algorithm

    [Figure: KressArray DPSS (Datapath Synthesis System): Compiler; Mapper; Scheduler. The Scheduler specifies and assembles the data streams from / to the array.]


    55

    Changing Models of Computing

    [Figure, “von Neumann”: (procedural) Software, downloading to RAM; instruction sequencer; data path; I / O.]

    [Figure, contemporary: host with hardwired accelerator(s); Software, downloading to RAM; CAD; ASICs.]

    [Figure, reconfigurable computing: host with reconfigurable accelerator(s); Software and Configware, each downloading to its own RAM.]


    56

    Changing Models of Computation

    [Figure, contemporary: host with hardwired accelerator(s); Software; Compiler; RAM; CAD; ASICs.]

    [Figure, reconfigurable computing: host with reconfigurable accelerator(s); Co-Compiler; Software and Configware, each with its own RAM.]

    *) even 80% hardware people hate their tools

    Co-Compilation

    [Diagram: Jürgen Becker’s Co-DE-X Co-Compiler [ASP-DAC’95]]

    • input: high level programming language source (X-C)

    • partitioning compiler: Partitioner with Analyzer / Profiler, using Resource Parameters (supporting different platforms)

    • Computer Machine Paradigm: GNU C compiler; Software running on µProcessor

    • Xputer “Soft” Machine Paradigm: X-C compiler and DPSS; Configware running on Reconfigurable Accelerators (KressArray)

    • interface between µProcessor and Reconfigurable Accelerators

    Co-Compilation

    We introduce: Co-Compilation

    [Diagram: a partitioning compiler splits a high level programming language source]

    • Computer Machine Paradigm: Software running on µProcessor

    • Xputer “Soft” Machine Paradigm: Configware running on Reconfigurable Accelerators, i.e. a Reconfigurable Architecture (RA) instead of hardwired

    • interface between µProcessor and Reconfigurable Accelerators

    Jürgen Becker’s Co-DE-X Co-Compiler

    [Diagram: Co-DE-X flow]

    • source: X-C (X-C is the C language extended by MoPL)

    • Partitioner with Analyzer / Profiler, using Resource Parameters

    • Computer machine paradigm: GNU C compiler, targeting the host

    • Xputer machine paradigm: X-C compiler and DPSS, targeting the KressArray

    • supporting different platforms

    • supporting platform-based design
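The role of the Resource Parameters in partitioning can be sketched with a toy example. This is not the Co-DE-X algorithm; it is a simple greedy heuristic under assumed inputs. The `Kernel` record, its fields (`host_cycles`, `array_cycles`, `cells`) and the function `partition` are hypothetical names introduced only for illustration, standing in for profiler output and for the size of the reconfigurable array.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Kernel:
    """Hypothetical profiler record for one loop nest."""
    name: str
    host_cycles: int    # profiled cost when run as software on the host
    array_cycles: int   # estimated cost when run as configware on the array
    cells: int          # array cells the kernel would occupy (must be > 0)

def partition(kernels, available_cells):
    """Greedy split: move the kernels with the best gain per cell to the array."""
    ranked = sorted(kernels,
                    key=lambda k: (k.host_cycles - k.array_cycles) / k.cells,
                    reverse=True)
    to_array, to_host = [], []
    free = available_cells
    for k in ranked:
        gain = k.host_cycles - k.array_cycles
        if gain > 0 and k.cells <= free:
            to_array.append(k.name)   # worth accelerating and it still fits
            free -= k.cells
        else:
            to_host.append(k.name)    # stays software on the host
    return to_array, to_host

kernels = [
    Kernel("fir", host_cycles=1000, array_cycles=100, cells=6),
    Kernel("init", host_cycles=50, array_cycles=80, cells=2),
    Kernel("dct", host_cycles=4000, array_cycles=500, cells=12),
]
small = partition(kernels, available_cells=16)
large = partition(kernels, available_cells=18)
```

Changing only `available_cells` changes the split, which is the sense in which one source can target different platforms: with 16 cells only `dct` fits on the array, with 18 cells `fir` moves over as well.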

    Loop Transformation Examples

    [Diagram: resource parameter driven Co-Compilation of sequential processes]

    • host: loop 1-16 body endloop

    • strip mining: fork; loop 1-8 body endloop; loop 9-16 body endloop; join

    • loop unrolling: loop 1-8 body body endloop

    • host with reconf. array: loop 1-8 trigger endloop; loop 1-4 trigger endloop; loop 1-2 trigger endloop
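The two transformations named on this slide can be written out for the 16-iteration loop. This is a minimal sketch: `body` is a placeholder computation chosen for the example, and the strips are executed one after another here, whereas the slide's fork / join suggests they could run in parallel.

```python
def body(i, out):
    """Placeholder loop body (an assumption for this example)."""
    out.append(i * i)

def original(n=16):
    # loop 1-16 body endloop
    out = []
    for i in range(1, n + 1):
        body(i, out)
    return out

def unrolled_by_2(n=16):
    # loop 1-8 body body endloop: half the iterations, two bodies per iteration
    assert n % 2 == 0, "this sketch assumes an even trip count"
    out = []
    for i in range(1, n + 1, 2):
        body(i, out)
        body(i + 1, out)
    return out

def strip_mined(n=16, strip=8):
    # fork: loop 1-8 and loop 9-16 are independent strips
    strips = []
    for start in range(1, n + 1, strip):
        out = []
        for i in range(start, min(start + strip, n + 1)):
            body(i, out)
        strips.append(out)
    # join: concatenate the strip results in order
    return [v for s in strips for v in s]

reference = original()
```

Both variants produce the same result as the original loop; they differ only in how the iterations are grouped, which is what a resource parameter (for example the number of available array cells) would select.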

    History of Loop Transformations

    • 1970s - 1980s, at Process Level: Sequential to Parallel Processes, incl. Vectorization. David Loveman, 1977, Allen and Kennedy, et al.: Loop Unrolling, Loop Fusion, Strip Mining ...

    • 1995/97 [Karin Schmidt / Jürgen Becker], down to Datapath Level: (Parameter-driven) Time to Time/Space Partitioning, e.g. Transformation from Sequential Process to Super-systolic

    • 2000 [Michael Herz], optimized RA to Memory Communication Bandwidth: Multi-dimensional Loop Unrolling / Storage Scheme Optimization supporting burst-mode & parallel Memory Banks

    History of Loop Transformations

    • For Sequential Programs on Parallel Computers: David Loveman, 1977, Allen and Kennedy, etc.:

    Loop Unrolling, Loop Fusion, Strip Mining ....

    • For memory communication: Michael Herz (2000): Multi-Level Loop Unrolling to reduce Memory Cycles needed to create RA Data Streams

    • For parallel Datapaths: Jürgen Becker (1997): Sequential to Super-Systolic Transformation, to optimize Throughput of Reconfigurable Arrays (RAs)
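Why unrolling can reduce the memory cycles needed to feed a data stream can be shown with a small count of reads. The sketch below is not Michael Herz's method; it is a generic illustration using an assumed 3-tap sum filter, where `fir3_naive`, `fir3_unrolled` and the `unroll` parameter are names invented for this example. Unrolled iterations share neighbouring operands, so each block is fetched once instead of three times per output.

```python
def fir3_naive(x):
    """One output per iteration; every iteration fetches its 3 operands again."""
    reads, y = 0, []
    for i in range(len(x) - 2):
        a, b, c = x[i], x[i + 1], x[i + 2]
        reads += 3
        y.append(a + b + c)
    return y, reads

def fir3_unrolled(x, unroll=4):
    """Unrolled loop: fetch one block, then compute `unroll` outputs from it."""
    reads, y = 0, []
    n_out = len(x) - 2
    i = 0
    while i < n_out:
        width = min(unroll, n_out - i)       # last block may be shorter
        window = x[i:i + width + 2]          # one block (burst-like) fetch
        reads += len(window)
        for j in range(width):               # operands are reused from the window
            y.append(window[j] + window[j + 1] + window[j + 2])
        i += width
    return y, reads

x = list(range(18))
y_naive, reads_naive = fir3_naive(x)
y_unrolled, reads_unrolled = fir3_unrolled(x, unroll=4)
```

For 16 outputs the naive loop issues 48 reads and the 4-way unrolled loop issues 24; larger unroll factors approach one read per output.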

    Future Coarse Grain RA Development

    • It is indispensable to operate within the Convergence Area of Compilers, Co-Compilers, Architecture and full-custom-style VLSI Design (array cells).

    • Products must come with a Development Platform that encourages users, especially those with a limited Hardware Background.

    >> Design Space Explorers

    • EDA revolution

    • Dead Supercomputer

    • Stream-based Computing

    • Stream-based Memory Architecture

    • Design Space Explorers

    • KressArray Xplorer

    • Machine paradigms

    • Co-Compilation http://www.uni-kl.de

    Schedule

    time slot

    08.30 – 10.00 Reconfigurable Computing (RC)

    10.00 – 10.30 coffee break

    10.30 – 12.00 Stream-based Computing for RC

    12.00 – 14.00 lunch break

    14.00 – 15.30 Resources for RC

    15.30 – 16.00 coffee break

    16.00 – 17.30 FPGAs: recent developments

    END