(keynote) (from HPC to) New Horizons of Very High Performance Computing (VHPC): Hurdles and Chances Reiner Hartenstein TU Kaiserslautern Rhodes Island, Greece, April 25-26, 2006

Dec 19, 2015

Page 1:

(keynote) (from HPC to) New Horizons of Very High Performance Computing (VHPC): Hurdles and Chances

Reiner Hartenstein

TU Kaiserslautern

Rhodes Island, Greece, April 25-26, 2006

Page 2:

© 2006, [email protected] http://hartenstein.de

Reconfigurable Supercomputing (VHPC) going commercial

Cray XD1

Silicon Graphics RASC

… and other vendors

… it's a paradigm shift!

Page 3:

The Pervasiveness of RC

[Figure: two charts of the number of Google hits for searches of the form “FPGA and ….”; labels: ECE-savvy scene, Math/SW-savvy scene. Values in the first chart: 162,000; 127,000; 158,000; 113,000; 171,000; 194,000. Values in the second chart: 1,620,000; 915,000; 398,000; 272,000; 647,000; 1,490,000. The search terms belonging to the individual values are not recoverable.]

unqualified for RC?

Page 4:

Methodology?

world-wide a mass movement

reminds me of the mass migration of lemmings

terminology chaos

not really a sense of direction

an urgent need to get organized

Page 5:

>> Outline <<

• Reconfigurable Computing Paradox

• The Supercomputing Paradox

• We are using the wrong model

• Coarse-grained Reconfigurable Devices

• Super Pentium for Desktop Supercomputer

http://www.uni-kl.de

Page 6:

The Reconfigurable Computing Paradox

poor FPGA technology:

very poor effective integration density

"very power-hungry" [Rick Kornfeld*]

lower clock frequencies, and more expensive

poor tools:

very poor application development support

languages and tools unacceptable for software people

most hardware experts (86%**) hate their tools

poor education:

RC education: extremely poor, or none

ignored by CS curricula

… teaching as if for a 50-year-old mainframe …

However, brilliant results everywhere

what paradox?

*) personal communication

**) DeHon '98

Page 7:

Education?

Joint Task Force for Computing Curricula 2004: fully ignores Reconfigurable Computing

FPGA & synonyms: 0 hits (Google: 10 million hits)

not even here

Page 8:

Completed?

Computing Curricula v.2005: no changes other than "… FPGA, etc." (not really mentioning that it's missing)

Task force activity completed?

Next task force in 2020 or later?

Page 9:

Tools?

End of this week: brainstorming session at DARPA

(urgently needed, overdue!)

Page 10:

fine-grained RC: 1st DeHon's Law [1996: Ph.D., MIT]

Technology: immense area inefficiency

reconfigurability overhead: >> 10 000 (wiring overhead, routing congestion)

[Figure: density in transistors / microchip versus year, 1980 to 2010, log scale from 10^0 to 10^9. Curves: Gordon Moore curve (microprocessor), FPGA physical, FPGA routed, FPGA logical.]

Page 11:

published speed-up factors

http://xputers.informatik.uni-kl.de/faq-pages/fqa.html

[Figure: relative performance versus year, 1980 to 2010, log scale from 10^0 to 10^9. Curve labels: Microprocessor (8080 to Pentium 4), Memory, FPGA. Growth-rate annotations: 7%/yr, 50%/yr, x1.25/yr (Moore), x2/yr. Domain labels: DSP and wireless; Image processing, Pattern matching, Multimedia; Bioinformatics; Astrophysics. Region label: pre-FPGA era (DPLA, MoM Xputer architecture).]

Speed-up factors shown in the chart:

Los Alamos traffic simulation: 47

real-time face detection: 6000

video-rate stereo vision: 900

pattern recognition: 730

SPIHT wavelet-based image compression: 457

Smith-Waterman pattern matching: 288

BLAST: 52

protein identification: 40

molecular dynamics simulation: 88

Reed-Solomon decoding: 2400

Viterbi decoding: 400

FFT: 100

MAC: 1000

crypto: 1000

GRAPE: 20

2-D FIR filter [TU-KL]: 39.4

Lee routing (by TU-KL): 160

Grid-based DRC (no FPGA: DPLA on MoM by TU-KL): 2000

Grid-based DRC ("fair comparison"): 15000

Page 12:

pre-FPGA era: Why DPLA* was so good

Large arrays of canonical boolean expressions

PLA layout similar to RAM / ROM layout

Close to Moore because of small overhead (wiring, programmability, routing)

Mid-1980s: first very tiny FPGAs available

GAG: Generic Address Generator, to avoid address computation overhead

ASM: Auto-Sequencing Memory [M. Herz et al.: ICECS 2003, Dubrovnik]

*) designed by TU-KL, fabricated by E.I.S., a German multi-university project

Reiner Hartenstein: ASM means: no instruction streams needed for address computation. Generalization of DMA. [M. Herz et al.: ICECS 2003, Dubrovnik]
Page 13:

ASM: Auto-Sequencing Memory (anti-von-Neumann machine paradigm)

Data Counter instead of Program Counter

Generalization of the DMA

[Figure: ASM block consisting of a GAG (data counter) and a RAM]

GAG & enabling technology: published 1989 [by TU-KL]**

Survey paper: [M. Herz et al.*: IEEE ICECS 2003, Dubrovnik]

Storage scheme optimization methodology, etc.

*) IMEC & TU-KL

**) patented by TI, 1995
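The idea of a data counter driving a memory can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the block-scan address pattern and its parameters (`base`, `row_stride`, `rows`, `cols`) are hypothetical and not taken from the GAG publications; a real GAG is a hardware unit, and the software loop here merely models the address sequence it would emit.

```python
from typing import Iterator, List


def generic_address_generator(base: int, row_stride: int,
                              rows: int, cols: int) -> Iterator[int]:
    """Yield the addresses of a rows x cols block scan (hypothetical pattern).

    The pair (r, c) plays the role of the data counter: the address sequence
    is fixed by the four parameters, not by a stream of instructions.
    """
    for r in range(rows):
        for c in range(cols):
            yield base + r * row_stride + c


def auto_sequencing_memory(ram: List[int],
                           addresses: Iterator[int]) -> Iterator[int]:
    """Pair a RAM with an address generator so that it emits a data stream."""
    for address in addresses:
        yield ram[address]


# A 4 x 8 array stored row by row; stream out the 2 x 3 window starting at address 9.
ram = list(range(100, 132))
window_addresses = list(generic_address_generator(base=9, row_stride=8, rows=2, cols=3))
window_stream = list(auto_sequencing_memory(ram, iter(window_addresses)))
print(window_addresses)
print(window_stream)
```

Changing the four parameters changes the scan pattern without touching any "program", which is the sense in which the slide calls ASM a generalization of DMA.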
Page 14:

Thousands or Millions of $ for free

Application migration [from supercomputer] results not only in massive speed-ups:

Electricity bills are reduced by an order of magnitude or even more, which you may get for free

… up to millions of dollars per year

(also a matter of national energy policy)

[Figure labels: Google, Amsterdam, NY]

Page 15:

Reconfigurable Scientific Computing

How do software types program the FPGAs? By hiring a good student from the EE Dept.?

Because of missing RC education: far away from optimum solutions?

Much higher speed-up achievable? 1 or 2 more orders of magnitude? 100,000? 1,000,000?

Page 16:

By education: better speed-up factors?

http://xputers.informatik.uni-kl.de/faq-pages/fqa.html

[Figure: the chart of published speed-up factors from Page 11 (relative performance versus year, 1980 to 2010), here with the added annotation "tools & edu available?"]

Page 17:

>> Outline <<

• Reconfigurable Computing Paradox

• The Supercomputing Paradox

• We are using the wrong model

• Coarse-grained Reconfigurable Devices

• Super Pentium for Desktop Supercomputer

Page 18:

The Supercomputing Paradox

promising technology:

Growing listed Teraflops

Increasing number of processors running in parallel

COTS processor decreasing cost

poor results:

Often limited sustained Teraflops

Almost stalled application implementation progress

Very high total cost of the Tera(?)flops

Scientists waiting for affordable compute capacity

The Law of More

Reiner Hartenstein: programmer productivity shrinking with growing number of processors
Page 19:

>> Outline <<

• Reconfigurable Computing Paradox

• The Supercomputing Paradox

• We are using the wrong model

• Coarse-grained Reconfigurable Devices

• Super Pentium for Desktop Supercomputer

Page 20:

Why traditional supercomputing / HPC failed

instruction-stream-based: memory-cycle-hungry

the wrong way, how the data are moved around

because of the wrong multi-core interconnect architecture

[Figure: CPU, labelled "extremely unbalanced"; stolen from Bob Colwell]

Page 21:

Earth Simulator

5120 processors, 5000 pins each

ES 20: TFLOPS

moving data around inside the Earth Simulator: crossbar weight 220 t, 3000 km of thick cable

Page 22:

Bringing together data and processor

by Software

Moving data to the processor: moving the grand piano

Page 23:

>> Outline <<

• Reconfigurable Computing Paradox

• The Supercomputing Paradox

• We are using the wrong model

• Coarse-grained Reconfigurable Devices

• Super Pentium for Desktop Supercomputer

Page 24:

coarse-grained RC: Hartenstein's Law [1996: ISIS, Austin, TX]

area efficiency very close to Moore's law

e.g. KressArray family

[Figure: density in transistors / microchip versus year, 1980 to 2010, log scale from 10^0 to 10^9. Curves: Gordon Moore curve, rDPA physical, rDPA logical, and for comparison FPGA routed (>> 10 000 below).]

Page 25:

higher speed-up factors by coarse-grained?

http://xputers.informatik.uni-kl.de/faq-pages/fqa.html

[Figure: the chart of published speed-up factors from Page 11 (relative performance versus year, 1980 to 2010), here with the added annotation "Coarse-grained arrays?"]

Page 26:

Coarse grain is about computing, not logic

SNN filter on KressArray (mainly a pipe network) [Ulrich Nageldinger]

array size: 10 x 16 = 160 rDPUs

rDPU: reconfigurable Data Path Unit, e.g. 32 bits wide

no CPU

Legend: rDPU not used; used for routing only (rout thru only); operator and routing; port location marker; backbus connect

Page 27:

SW to coarse-grained CW migration example

[Figure: array of rDPUs with the example mapped onto it; visible labels: S, +]

Page 28:

Compare it to software solution on CPU

S = R + (if C then A else B endif);

[Figure: data path with inputs A, B, R, C, a "=1" decision unit and an adder "+" producing S; clock: 200]

on a very simple CPU, C = 1 (columns: memory cycles, nanoseconds)

if C then read A: read instruction; instruction decoding; read operand*; operate & register transfers

if not C then read B: read instruction; instruction decoding

add & store: read instruction; instruction decoding; operate & register transfers; store result

total

Page 29:

hypothetical branching example to illustrate software-to-configware migration

S = R + (if C then A else B endif);

simple conservative CPU example, C = 1:

if C then read A:

read instruction: 1 memory cycle, 100 ns

instruction decoding

read operand*: 1 memory cycle, 100 ns

operate & reg. transfers

if not C then read B:

read instruction: 1 memory cycle, 100 ns

instruction decoding

add & store:

read instruction: 1 memory cycle, 100 ns

instruction decoding

operate & reg. transfers

store result: 1 memory cycle, 100 ns

total: 5 memory cycles, 500 ns

*) if no intermediate storage in register file

configware solution (inputs A, B, R, C; decision unit "=1"; adder "+"; output S): clock 200 MHz (5 nanosec), no memory cycles: speed-up factor = 100
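The arithmetic behind the speed-up factor can be reproduced as a small worked example. The cycle counts, the 100 ns memory cycle and the 200 MHz clock are taken from the slide; that the configware version delivers its result in one clock period is an assumption made here to match the stated factor of 100, and the step names in the dictionary are only descriptive labels.

```python
MEMORY_CYCLE_NS = 100   # from the slide: each memory cycle costs 100 ns
CLOCK_MHZ = 200         # from the slide: configware clock of 200 MHz

# Memory cycles per step of the software version, as listed on the slide.
software_memory_cycles = {
    "read instruction (if C then read A)": 1,
    "read operand": 1,
    "read instruction (if not C then read B)": 1,
    "read instruction (add & store)": 1,
    "store result": 1,
}

software_ns = sum(software_memory_cycles.values()) * MEMORY_CYCLE_NS
# Assumption: the configware data path needs one clock period and no memory cycles.
configware_ns = 1000 / CLOCK_MHZ
speed_up = software_ns / configware_ns

print(f"software: {software_ns} ns, configware: {configware_ns} ns, speed-up: {speed_up}")
```

The example counts only memory cycles for the CPU, as the slide does; decoding and register transfers are treated as free, so the factor is, if anything, conservative for this hypothetical case.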

Page 30:

Why the speed-up? What's the difference?

moving the locality of operation into the route of the data stream by P&R

instead of moving data by instruction streams

Page 31:

The wrong mind set ....

"but you can't implement decisions!"

S = R + (if C then A else B endif);

[Figure: section of a very large pipe network on the rDPU array [Ulrich Nageldinger]: inputs A, B, R, C; the decision is implemented by the "=1" unit feeding the adder "+". Legend: rDPU not used; used for routing only (rout thru only); operator and routing; port location marker; backbus connect.]

not knowing this solution: symptom of the hardware / software chasm and the configware / software chasm

We need Reconfigurable Computing Education
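How a decision can live inside a pipe network, rather than in a branch instruction, can be sketched with two chained stream stages. This is a software model only, under the assumption that the decision unit behaves like a multiplexer selecting A or B per data item; the stage names `decision_stage` and `adder_stage` and the sample values are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def decision_stage(stream: Iterable[Tuple[int, int, int, bool]]) -> Iterator[Tuple[int, int]]:
    """Multiplexer stage: for each (R, A, B, C) forward R and the selected operand."""
    for r, a, b, c in stream:
        yield r, (a if c else b)


def adder_stage(stream: Iterable[Tuple[int, int]]) -> Iterator[int]:
    """Adder stage: S = R + selected operand, one result per stream element."""
    for r, operand in stream:
        yield r + operand


# Each tuple is one element of the input data streams: (R, A, B, C).
inputs = [(10, 1, 2, True), (20, 3, 4, False), (30, 5, 6, True)]
results = list(adder_stage(decision_stage(inputs)))
print(results)
```

Both operands travel through the pipe and the selection happens in place, so no control flow interrupts the data stream.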

Page 32:

The new paradigm: how the data are traveling

no, not by instruction execution

not transport-triggered: old hat (vN Move Processor: instruction-driven) [Jack Lipovski, EUROMICRO, Nice, 1975]

pipeline, or chaining (DPU, DPU, DPU): instruction-driven

super systolic array

P&R: move locality of operation, not data!

Page 33:

Data streams (pipe network)

H. T. Kung paradigm (systolic array)

"data streams" define: ... which data item at which time at which port

implemented by distributed memory

ASM: Auto-Sequencing Memory (data counter, GAG, RAM)

50 & more on-chip ASM are feasible

[Figure: DPA with an input data stream and output data streams, each drawn as port # versus time, surrounded by ASM blocks]
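The definition of a data stream as "which data item at which time at which port" maps directly onto a small data structure. The sketch below is illustrative only: the function name and the one-step skew per port (row i enters through port i, its j-th item at time i + j) are assumptions typical of textbook systolic arrays, not details taken from the slide.

```python
from typing import Dict, List, Tuple


def skewed_stream_schedule(rows: List[List[int]]) -> Dict[Tuple[int, int], int]:
    """Map (time step, port number) to the data item entering the array.

    Row i is fed through port i; its j-th item arrives at time i + j.
    The one-step skew per port is an assumption of this sketch.
    """
    schedule: Dict[Tuple[int, int], int] = {}
    for port, row in enumerate(rows):
        for j, item in enumerate(row):
            schedule[(port + j, port)] = item
    return schedule


schedule = skewed_stream_schedule([[1, 2], [3, 4]])
for (time, port), item in sorted(schedule.items()):
    print(f"time {time}, port {port}: item {item}")
```

In a hardware setting each port's column of this schedule would be produced by one auto-sequencing memory.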

Page 34:

The Generalization of the Systolic Array

systolic array: only for applications with regular data dependencies

remedy?

discard algebraic synthesis methods

[R. Kress]: use optimization algorithms, e.g. simulated annealing

Achievement: also non-linear and non-uniform pipes, and even wilder pipe structures possible

reconfigurability makes sense

Kress-Kung paradigm: super systolic array
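Placement by simulated annealing, as opposed to algebraic synthesis, can be sketched as follows. This is a generic textbook annealer under stated assumptions, not R. Kress's mapper: the cost function (total Manhattan wire length), the linear cooling schedule, the swap move and all names and parameters are hypothetical choices for illustration.

```python
import math
import random
from typing import Dict, List, Optional, Tuple

Position = Tuple[int, int]


def wire_length(placement: Dict[str, Position], nets: List[Tuple[str, str]]) -> int:
    """Total Manhattan distance over all producer/consumer connections."""
    return sum(
        abs(placement[a][0] - placement[b][0]) + abs(placement[a][1] - placement[b][1])
        for a, b in nets
    )


def anneal_placement(operators: List[str], nets: List[Tuple[str, str]],
                     grid: Tuple[int, int], steps: int = 2000,
                     start_temp: float = 5.0, seed: int = 0):
    """Return (best placement, best cost) found by simulated annealing."""
    rng = random.Random(seed)
    cells = [(x, y) for x in range(grid[0]) for y in range(grid[1])]
    # slots[i] names the operator sitting in cells[i], or None if the cell is free.
    slots: List[Optional[str]] = list(operators) + [None] * (len(cells) - len(operators))
    rng.shuffle(slots)

    def as_placement() -> Dict[str, Position]:
        return {op: cell for op, cell in zip(slots, cells) if op is not None}

    cost = wire_length(as_placement(), nets)
    best, best_cost = as_placement(), cost
    for step in range(steps):
        temp = start_temp * (1 - step / steps) + 1e-9   # linear cooling, never zero
        i, j = rng.sample(range(len(cells)), 2)
        slots[i], slots[j] = slots[j], slots[i]          # propose a swap or a move
        new_cost = wire_length(as_placement(), nets)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost                              # accept the proposal
            if cost < best_cost:
                best, best_cost = as_placement(), cost
        else:
            slots[i], slots[j] = slots[j], slots[i]      # reject: undo the swap
    return best, best_cost


# A five-stage pipe to be placed on a 3 x 3 array (hypothetical example).
operators = ["in", "f1", "f2", "f3", "out"]
nets = [("in", "f1"), ("f1", "f2"), ("f2", "f3"), ("f3", "out")]
best, best_cost = anneal_placement(operators, nets, grid=(3, 3))
print(best, best_cost)
```

Because the annealer only needs a cost function, nothing restricts it to linear or uniform pipes, which is the point the slide makes about going beyond algebraic synthesis.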

Page 35:

>> Outline <<

• Reconfigurable Computing Paradox

• The Supercomputing Paradox

• We are using the wrong model

• Coarse-grained Reconfigurable Devices

• Super Pentium for Desktop Supercomputer

Page 36:

Here is the common model

[Figure: CPU running software code (instruction-stream-based) together with accelerators, hardwired or reconfigurable; configware code (data-stream-based)]

it's not von Neumann

the vN monopoly in our curricula is severely harmful

the tail is wagging the dog

we need dual paradigm education

Page 37:

A potential Pentium successor

Discard most caches

have 64* cores, 0.5 - 1 GHz

with clever interconnect for concurrent processes, for multithreading, and for a Kung-Kress pipe network

The Desk-top Supercomputer!

*) CPU mode / DPU mode capability

Page 38:

"Super Pentium" configuration example

[Figure: array of 64 cores, 60 of them labelled rDPU and 4 labelled CPU]

Page 39:

World TV & game console & multimedia center

e.g.: ~ 8 x 8 rDPA: all feasible under 500 MHz

[Figure: rDPA connected to: Games, Music, Videos, SMeXPP, Camera, Baseband Processor, Radio Interface, Audio Interface, SD/MMC Cards, LCD display]

• Variable resolutions and refresh rates

• Variable scan mode characteristics

• Noise reduction and artifact removal

• High performance requirements

• Variable file encoding formats

• Variable content security formats

• Variable displays

• Luminance processing

• Detail enhancement

• Color processing

• Sharpness enhancement

• Shadow enhancement

• Differentiation

• Programmable de-interlacing heuristics

• Frame rate detection and conversion

• Motion detection & estimation & compensation

• Different standards (MPEG2/4, H.264)

• A single device handles all modes

http://pactcorp.com

Page 40:

Dual Paradigm Application Development

[Figure: high level language, then software/configware co-compiler, which produces software code (instruction-stream-based) for the CPU and configware code (data-stream-based) for the reconfigurable accelerator; also shown: hardwired accelerator]

Page 41:

Software / Configware Co-Compilation

Juergen Becker's CoDe-X, 1996

[Figure: compilation flow: C language source, then Partitioner, which feeds the SW compiler (targeting the CPU) and the CW compiler (targeting the rDPU array). The CW path includes Placement & Routing (Move the Locality of Operation). Resource Parameters support different platforms.]

Page 42:

Bringing together data and processor

by Configware

Move the stool

Place the location of execution into the data pipe

Page 43:

>> Conclusions <<

• Reconfigurable Computing Paradox

• The Supercomputing Paradox

• We are using the wrong model

• Coarse-grained Reconfigurable Devices

• Super Pentium for Desktop Supercomputer

• Conclusions

Page 44:

Conclusions (1): Hurdles

Obstacles are:

unbelievably disastrous tools market

unbelievably ignorant curricula: … teaching as if for a 50-year-old mainframe …

enabling technologies available, partly decades old, but not used

transdisciplinary models neither available nor taught in CS, nor elsewhere

fragmentation into application-domain-specific cultures and trick boxes

Page 45:

Conclusions (2): Future Work

CS disciplines must recognize and accept their strategic role and their responsibility toward all their application disciplines: embedded and scientific computing.

The monopoly of the von-Neumann-based mind set in CS education:

heavily stalls progress in R&D, not only in HPC

causes high cost in R&D, not only in supercomputing

CS graduates are not qualified for our job market

The von-Neumann-only mind set in CS urgently needs to go, in favor of the dual paradigm common model

Page 46:

Conclusions (3): Chances

New horizons: chances are brilliant

Page 47:

thank you

Page 48:

END

Page 49:

thank you

Page 50:

Backup:

Page 51:

Co-Compiler Enabling Technology

is available from academia

only a small team needed for commercial re-implementation

on the road map to the Personal Supercomputer

Page 52:

Compilation: Software vs. Configware

source languages: C, FORTRAN, MATLAB

Software Engineering: source program, then software compiler, then software code

Configware Engineering: source "program", then configware compiler, consisting of a mapper (placement & routing) producing configware code and a scheduler producing flowware code (data)

Page 53:

Nick Tredennick's Paradigm Shifts explain the differences

Software Engineering (CPU):

resources: fixed

algorithm: variable (software)

1 programming source needed

Configware Engineering:

resources: variable (configware)

algorithm: variable (flowware)

2 programming sources needed

Page 54:

Co-Compilation

Software / Configware Co-Compiler

[Figure: source in C, FORTRAN or MATLAB, then automatic SW / CW partitioner (simulated annealing). SW path: software compiler producing software code. CW path: configware compiler with a mapper (simulated annealing) producing configware code and a scheduler producing flowware code (data).]

Page 55:

Co-Compiler for Hardwired Kress/Kung Machine [e.g. Brodersen]

Software / Flowware Co-Compiler

[Figure: source, then automatic SW / CW partitioner. SW path: software compiler producing software code. Flowware path: flowware compiler with a scheduler producing flowware code (data).]

Page 56:

The first archetype machine model

"von Neumann": instruction-stream-based mind set

mainframe, CPU

compile or assemble: procedural personalization

personalization: RAM-based

Software Industry's Secret of Success: simple basic Machine Paradigm

Page 57:

The 2nd archetype machine model

"Kress-Kung": data-stream-based mind set

accelerator reconfigurable

compile: structural personalization

personalization: RAM-based

Configware Industry's Secret of Success: simple basic Machine Paradigm

Page 58:

"Saves more than $10,000 in electricity bills per year (7¢ / kWh) - .... per 64-processor 19" rack" [Herb Riley, R. Associates]
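To get a feeling for the size of this claim, the quoted figures can be converted into energy and average power. The savings and the electricity price come from the quote; continuous operation over a 365-day year is an assumption of this sketch, and since the quote says "more than", the results are lower bounds.

```python
SAVINGS_USD_PER_YEAR = 10_000   # from the quote
PRICE_USD_PER_KWH = 0.07        # from the quote (7 cents per kWh)
PROCESSORS_PER_RACK = 64        # from the quote
HOURS_PER_YEAR = 24 * 365       # assumption: continuous operation, non-leap year

energy_saved_kwh = SAVINGS_USD_PER_YEAR / PRICE_USD_PER_KWH
average_power_saved_kw = energy_saved_kwh / HOURS_PER_YEAR
per_processor_w = average_power_saved_kw * 1000 / PROCESSORS_PER_RACK

print(f"energy saved: {energy_saved_kwh:.0f} kWh per year")
print(f"average power saved: {average_power_saved_kw:.1f} kW per rack")
print(f"average power saved: {per_processor_w:.0f} W per processor")
```

Under these assumptions the quote corresponds to roughly 143,000 kWh per year, or an average of about 16 kW per rack.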

Page 59:

modern FPGA bestsellers:

The new model is reality: FPGA fabrics, together with several µprocessors, many memory banks, and other IP cores, on the same COTS microchip

Page 60:

DSP platform FPGA [courtesy Xilinx Corp.]

500 MHz flexible soft logic architecture

200K logic cells

500 MHz programmable DSP execution units

0.6-11.1 Gbps serial transceivers

500 MHz PowerPC™ processors (680 DMIPS) with Auxiliary Processor Unit

1 Gbps differential I/O

500 MHz multi-port distributed 10 Mb SRAM

500 MHz DCM digital clock management