From Organic Computing to Reconfigurable Computing Reiner Hartenstein TU Kaiserslautern PASA, Frankfurt, March 16, 2006
Dec 19, 2015
From Organic Computing to Reconfigurable Computing
Reiner Hartenstein
TU Kaiserslautern
PASA, Frankfurt, March 16, 2006
© 2005, [email protected] http://hartenstein.de
Reconfigurable Computing (RC) and FPGA* in the media
Design starts until 2010: from 80,000 to 110,000 [Dataquest]
June 2005
Fastest-growing segment of the semiconductor market: ~6 billion US-$ [Dataquest]
Google: 10 million hits
*) Field-Programmable Gate Array
The Pervasiveness of RC
[Figure: # of hits by Google for the search “FPGA and ….”, one value per search term; the term labels did not survive extraction]
Values shown: 162,000; 127,000; 158,000; 113,000; 171,000; 194,000; 1,620,000; 915,000; 398,000; 272,000; 647,000; 1,490,000
>> Outline <<
• Reconfigurable Computing Paradox
• Von Neumann losing its dominance
• Software vs. Configware
• The dual paradigm approach
• Coarse-grained Reconfigurable Devices
• Conclusions
http://www.uni-kl.de
The RC Paradox
The awful technology of FPGAs:
• Effective integration density much worse than the Gordon Moore curve: by a factor of more than 10,000
• „very power-hungry“ [Rick Kornfeld*]
• Application development: until recently still logic design on a very strange platform
• FPGAs run at lower clock frequencies, draw more power and are more expensive.
*) personal communication
fine-grained RC: low effective integration density
• immense area inefficiency
• reconfigurability overhead
• routing congestion
• wiring overhead
• overhead: > 10,000
[Figure: transistors / microchip (10^0 to 10^9) over the years 1980 to 2010: the Gordon Moore curve (microprocessor) compared with FPGA physical, FPGA routed and FPGA logical density] [DeHon, Ph.D. 1996]
published speed-up factors
http://xputers.informatik.uni-kl.de/faq-pages/fqa.html
[Figure: speed-up factors (10^0 to 10^9, with a mark at 100 000) over the years 1980 to 2010, plotted against relative performance curves labelled Microprocessor (8080 to P4; 7%/yr, 50%/yr) and Memory (x 2/yr)]
Application areas: DSP and wireless; image processing, pattern matching, multimedia; bioinformatics; astrophysics; crypto
Speed-up factors shown:
• Los Alamos traffic simulation: 47
• real-time face detection: 6000
• video-rate stereo vision: 900
• pattern recognition: 730
• SPIHT wavelet-based image compression: 457
• Smith-Waterman pattern matching: 288
• BLAST: 52
• protein identification: 40
• molecular dynamics simulation: 88
• Reed-Solomon decoding: 2400
• Viterbi decoding: 400
• FFT: 100
• MAC: 1000
• Grid-based DRC (no FPGA: DPLA on MoM by TU-KL): 2000
• 2-D FIR filter (no FPGA: DPLA by TU-KL): 39.4
• Lee routing (DPLA by TU-KL): 160
• Grid-based DRC („fair comparison“): 15000
• GRAPE: 20
The DPLA entries refer to the MoM Xputer architecture.
DeHon‘s Law
[Figure: MOPS / milliWatt (1 to 1000) over feature size in µ (2, 1, 0.5, 0.25, 0.13, 0.1, 0.07), with curves for RISC and FPGA]
However ....
Application migration [from supercomputer] resulting in performance increase up to 4 orders of magnitude
Reducing electricity bill by an order of magnitude
Hits the memory wall from a different direction
People think that high-performance must mean expensive
why the RC paradigm shift is so important
Move the stool or the grand piano?
by Software
by Configware
vN paradigm losing its dominance
Cray XD1: Xilinx inside!
[Figure: Cray XD1 with Xilinx FPGA]
von Neumann is not the common model
[Figure: the von Neumann instruction-stream-based machine: CPU (DPU with program counter) and RAM memory, linked by the von Neumann bottleneck]
mainframe age: CPU, instruction-stream-based (software)
microprocessor age: CPU, instruction-stream-based (software), plus accelerator co-processors, data-stream-based (hardware)
vN paradigm dominance? The tail is wagging the dog.
Here is the common model
[Figure: the von Neumann instruction-stream-based machine: CPU (DPU with program counter) and RAM memory, linked by the von Neumann bottleneck]
mainframe age: CPU, instruction-stream-based (software)
microprocessor age: CPU (software) plus hardwired accelerator co-processors, data-stream-based (hardware)
configware age: CPU (software) plus reconfigurable accelerator (morphware), programmed through a software/configware co-compiler
Fundamentally different mind set
no program counter
non-von-Neumann
completely different OS principles
no instruction fetch at run time
it’s configware: definitely it is not software
Compilation: Software vs. Configware
[Figure: two compilation flows side by side]
Software Engineering: source program -> software compiler -> software code
Configware Engineering: source „program“ -> configware compiler, consisting of a mapper (placement & routing) that emits configware code and a data scheduler that emits flowware code
Source languages: C, FORTRAN, MATLAB
Nick Tredennick’s Paradigm Shifts explain the differences
Software Engineering (CPU): 1 programming source needed
• resources: fixed
• algorithm: variable (software)
Configware Engineering: 2 programming sources needed
• resources: variable (configware)
• algorithm: variable (flowware)
Co-Compilation
[Figure: Software / Configware Co-Compiler]
Source (C, FORTRAN, MATLAB) -> automatic SW / CW partitioner, which feeds
• the software compiler -> software code
• the configware compiler: mapper -> configware code; data scheduler -> flowware code
Figure label: simulated annealing
Organic Computing?
Bio-inspired use of FPGAs
• evolvable „hardware“ community:
• crossover of chromosomes
• in love with genetic algorithms: a Darwinian path to fitness through generations of populations
• inefficient, but unexpected results possible
• simulated annealing (genetic morphing) - fitness by synthesis: highly efficient
Software / Configware Co-Compilation
Juergen Becker’s CoDe-X, 1996
[Figure: C language source -> Analyzer / Profiler -> Partitioner (simulated annealing), guided by resource parameters supporting different platforms]
• SW compiler -> SW code („vN“ machine paradigm)
• CW compiler -> CW code and FW code (Kress/Kung machine paradigm)
Co-Compiler for Hardwired Kress/Kung Machine
[e. g. Brodersen]
[Figure: Software / Flowware Co-Compiler]
Source -> automatic SW / CW partitioner, which feeds
• the software compiler -> software code
• the flowware compiler (data scheduler) -> flowware code
The dual paradigm approach
[Figure: the two paradigms side by side]
von Neumann paradigm: Software Engineering, CPU
Kress-Kung paradigm: Configware Engineering, ASM
Data streams (flowware)
Flowware defines: ... which data item at which time at which port
[Figure: a DPA (pipe network) fed by input data streams and producing output data streams; each „data stream“ is drawn as data items over time and port #. ASM blocks surround the DPA.]
ASM: Auto-Sequencing Memory (RAM with GAG), implemented by distributed memory
algebraic synthesis algorithms: H. T. Kung paradigm (systolic array)
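The statement that flowware defines which data item arrives at which time at which port can be illustrated with a small sketch. It is only an illustration under assumptions: the function name `skewed_schedule` and the rule that stream number r enters port r delayed by r time steps (the skew commonly drawn for systolic arrays) are hypothetical and not taken from the slides.

```python
from typing import List, Tuple


def skewed_schedule(streams: List[List[int]]) -> List[Tuple[int, int, int]]:
    """Return (time, port, item) triples for a set of input data streams.

    Assumption for this sketch: stream r is fed into port r and is
    delayed by r time steps relative to stream 0.
    """
    schedule = []
    for port, stream in enumerate(streams):
        for k, item in enumerate(stream):
            # Item k of this stream reaches its port at time k + port.
            schedule.append((k + port, port, item))
    return sorted(schedule)


if __name__ == "__main__":
    for time, port, item in skewed_schedule([[1, 2, 3], [4, 5, 6], [7, 8, 9]]):
        print(f"time {time}: port {port} receives item {item}")
```

The schedule is pure data: no instruction stream decides at run time where an item goes, which is the point the slide makes about flowware.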
DSP platform FPGA [courtesy Xilinx Corp.]
• 500 MHz Flexible Soft Logic Architecture, 200K Logic Cells
• 500 MHz Programmable DSP Execution Units
• 0.6-11.1 Gbps Serial Transceivers
• 500 MHz PowerPC™ Processors (680 DMIPS) with Auxiliary Processor Unit
• 1 Gbps Differential I/O
• 500 MHz multi-port Distributed 10 Mb SRAM
• 500 MHz DCM Digital Clock Management
Generalization of the systolic array ....
discard algebraic synthesis methods
[Rainer Kress]
use optimization algorithms instead
for example: simulated annealing
the achievement: non-linear and non-uniform pipes, and even wilder pipe structures, become possible
now reconfigurability makes sense
remedy?
Coarse grain is about computing, not logic
Example: mapping onto rDPA by DPSS: based on simulated annealing
SNN filter on KressArray (mainly a pipe network) [Ulrich Nageldinger]
array size: 10 x 16 = 160 rDPUs
rDPU: reconfigurable function block, e. g. 32 bits wide; no CPU
[Figure legend: rDPU not used; rDPU used for routing only (route-through); operator and routing; port location marker; backbus connect]
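Mapping operators onto an array by simulated annealing can be sketched as follows. This is a minimal illustration, not the DPSS algorithm: the cost function (total Manhattan distance between connected operators), the move (swapping an operator with another array cell), the cooling schedule and all names such as `anneal_placement` are assumptions made for this sketch.

```python
import math
import random


def wire_length(placement, nets):
    """Total Manhattan distance over all connected operator pairs."""
    return sum(abs(placement[a][0] - placement[b][0]) +
               abs(placement[a][1] - placement[b][1]) for a, b in nets)


def anneal_placement(num_ops, nets, rows, cols,
                     steps=20000, t_start=5.0, t_end=0.01, seed=0):
    """Place num_ops operators on a rows x cols array (hypothetical sketch)."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    rng.shuffle(cells)            # cells[i] is the position of operator i for i < num_ops
    cost = wire_length(cells, nets)
    best, best_cost = cells[:num_ops], cost
    for step in range(steps):
        # Geometric cooling from t_start down towards t_end.
        temp = t_start * (t_end / t_start) ** (step / steps)
        i = rng.randrange(num_ops)        # an operator ...
        j = rng.randrange(len(cells))     # ... swaps with any cell (used or free)
        if i == j:
            continue
        cells[i], cells[j] = cells[j], cells[i]
        new_cost = wire_length(cells, nets)
        # Accept improvements always, deteriorations with Boltzmann probability.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = cells[:num_ops], cost
        else:
            cells[i], cells[j] = cells[j], cells[i]   # undo the swap
    return best, best_cost


if __name__ == "__main__":
    pipe = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]   # a linear pipe of 6 operators
    placement, cost = anneal_placement(6, pipe, rows=4, cols=4)
    print(placement, cost)
```

Because the optimizer only sees a cost function, nothing forces the pipe to be linear or uniform, which matches the slide's argument for replacing algebraic synthesis by optimization.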
The Reconfigurable Computing Paradox
coarse-grained RC: high integration density
[Figure: transistors / microchip (10^0 to 10^9) over the years 1980 to 2010: the Gordon Moore curve compared with rDPA physical and rDPA logical density, and with FPGA routed density (overhead > 10,000)] [Hartenstein, ISIS 1996]
Claassen‘s Law + Hartenstein‘s Amendment
[Figure: MOPS / milliWatt (0.001 to 1000) over feature size in µ (2, 1, 0.5, 0.25, 0.13, 0.1, 0.07)]
Curves shown: hardwired; coarse-grained reconf. (rDPA); FPGAs (fine grained reconf.); instruction set processors: DSP and standard microprocessor
commercial rDPA example: PACT XPP - XPU128
• XPP128 rDPA
• Full 32 or 24 Bit Design, working silicon
• 2 Configuration Hierarchies
• Evaluation Board available
• XDS Development Tool with Simulator
[Figure: (r)DPA built from rDPUs, each a PAE core with ALU and ALU Ctrl, plus CFG blocks; buses not shown]
© PACT AG, http://pactcorp.com
Conclusions
RC is reducing cost without loss of performance and flexibility.
FPGAs may be configured like a microprocessor executing C/C++ code.
An FPGA can perform a specific algorithm at very high speed.
Using a high-level language, the FPGA can be programmed for a wide variety of algorithms without any deep knowledge of the underlying architecture.
RC is reducing the electricity bill and the required building floor area.
Speed-up factors of up to 4 orders of magnitude have been reported.
Compared to ASICs, prototyping time is on the order of hours rather than months, with a cost less than a tenth of that for an ASIC.
The personal supercomputer is near
Conclusions (2)
We urgently need Reconfigurable Computing Education
An Update of CS curricula is overdue
The first archetype machine model
mainframe, CPU
Software Industry’s Secret of Success: a simple basic Machine Paradigm (“von Neumann”)
• compile or assemble: procedural personalization
• personalization: RAM-based
• instruction-stream-based mind set
An Archetype Common Model needed
Archetype common model should provide ....
• Guidance for organizing efficient solutions
• Make the project manageable
• Allow sharing lessons between applications and between application areas
Useful simple archetype not widely accepted
Progress stalled by the software/configware chasm
from the Configware Industry
The 2nd archetype machine model
accelerator, reconfigurable
Configware Industry’s Secret of Success: a simple basic Machine Paradigm (“Kress-Kung”)
• compile: structural personalization
• personalization: RAM-based
• data-stream-based mind set
configware solution: computing in space
[Figure: an rDPA drawn as a grid of rDPUs; for demo: a tiny section of the pipe network, with an adder (+) producing S]
inter-rDPU-communication: no memory cycles needed
Compare it to software solution on CPU
hypothetical branching example to illustrate software-to-configware migration
S = R + (if C then A else B endif);
simple conservative CPU example, C = 1:

                                memory cycles   nanoseconds
if C then read A
    read instruction                  1             100
    instruction decoding
    read operand*                     1             100
    operate & reg. transfers
if not C then read B
    read instruction                  1             100
    instruction decoding
add & store
    read instruction                  1             100
    instruction decoding
    operate & reg. transfers
    store result                      1             100
total                                 5             500
*) if no intermediate storage in register file

configware solution: section of a major pipe network on rDPU (inputs A, B, R, C; output S), clock 200 MHz (5 nanosec)
no memory cycles: speed-up factor = 100
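The speed-up factor of 100 follows from the figures on the slide: 5 memory cycles of 100 ns each on the CPU against one 5 ns clock period on the rDPU. The short sketch below only redoes this arithmetic; the constant and function names are hypothetical, and it assumes that the pipe network delivers one result per clock cycle.

```python
MEMORY_CYCLE_NS = 100      # from the slide: each memory cycle costs 100 ns
CPU_MEMORY_CYCLES = 5      # from the slide: 5 memory cycles in total
RDPU_CLOCK_MHZ = 200       # from the slide: 200 MHz clock


def cpu_time_ns(memory_cycles: int, cycle_ns: float) -> float:
    """Time of the software solution, dominated by memory cycles."""
    return memory_cycles * cycle_ns


def clock_period_ns(clock_mhz: float) -> float:
    """Clock period in nanoseconds (assumed: one result per clock)."""
    return 1000.0 / clock_mhz


software_ns = cpu_time_ns(CPU_MEMORY_CYCLES, MEMORY_CYCLE_NS)
configware_ns = clock_period_ns(RDPU_CLOCK_MHZ)
speed_up = software_ns / configware_ns

if __name__ == "__main__":
    print(f"software: {software_ns} ns, configware: {configware_ns} ns, "
          f"speed-up: {speed_up}")
```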
The wrong mind set ....
„but you can‘t implement decisions!“
S = R + (if C then A else B endif);
[Figure: section of a very large pipe network: the decision (C = 1) selects A or B as the second input of the adder (+), which adds R and produces S]
not knowing this solution: symptom of the hardware / software chasm and the configware / software chasm
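How a decision becomes part of the data path can be mimicked in a few lines. The sketch below is a software model under assumptions, not the rDPU implementation: it treats the decision as a multiplexer that selects A or B for every item of the data streams and feeds the adder, so no branch instruction is fetched. The generator name `pipe_section` is hypothetical.

```python
from typing import Iterable, Iterator


def pipe_section(r_stream: Iterable[int], a_stream: Iterable[int],
                 b_stream: Iterable[int], c_stream: Iterable[int]) -> Iterator[int]:
    """Model of S = R + (if C then A else B endif) as a stream operator."""
    for r, a, b, c in zip(r_stream, a_stream, b_stream, c_stream):
        selected = a if c == 1 else b   # the decision, modelled as a multiplexer
        yield r + selected              # the adder stage


if __name__ == "__main__":
    s = list(pipe_section([10, 20, 30], [1, 2, 3], [100, 200, 300], [1, 0, 1]))
    print(s)
```

In the configware version both A and B are already present at the selector, so the choice costs no extra memory cycle; the Python conditional merely stands in for that wiring.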
The hardware / software chasm
If I use the term "software", a variety of images might appear in the engineering audience's mind.
Still we have "hardware" engineers and "software" engineers that go to different schools, attend different conferences, avoid each other's cocktail parties, and almost never play on the same volleyball teams at the company picnic. System designers begin to plan their creations around the skill sets and development processes of hardware engineers and software engineers. The two become oil and water.
Blurred line between hardware and software
The line between "hardware" and "software" is rapidly blurring and even becoming irrelevant from a system design perspective. As this happens, the traditional roles and skillsets of hardware and software engineers are being challenged, and a new generation of designers is emerging as a result.
the obfuscation caused by the pervasiveness of softness.
We need Reconfigurable Computing Education
We need a unification in dealing with problems, which are shared across many different application domains
There is an urgent need to cure severe qualification deficiencies of our graduates.
We need new curricula in CS and CE for providing an integrating dual paradigm mind set instead of vN-only
Terminology clean-up
Programming sources:
• Software: for scheduling instruction streams (von Neumann)
• Flowware: for scheduling data streams (primarily non-von-Neumann)
• Configware: for configuring morphware (primarily non-von-Neumann)
Why coarse grain
Reconfigurable Computing (RC): instead of FPGA use rDPA
instead of rLB (~1 bit wide) use rDPU (e. g. 32 bits wide): reconfigurable Data Path Unit (e. g. rALU)
• much more MOPS/milliWatt
• much more area-efficient
• much less reconfigurability overhead
• mind set close to classical computing background
„data stream“: an ambiguous definition
Reconfigurable Computing is not instruction-stream-based
it‘s data-stream-based
it‘s different from the operation of the (indeterministic) „dataflow machine“
other definition also from multimedia area
usable definition from systolic array area
>> Outline <<
• Reconfigurable Devices
• Coarse-grained Reconfigurable Devices
• Data-stream-based Computing
• The contemporary Common Model
• Reconfigurable Supercomputing
• Conclusions
http://www.uni-kl.de
Why the speed-up ...
... although the FPGA clock is slower by a factor of 3 or even more
• decisions need neither memory cycles nor clock cycles
• most „data fetch“ without memory cycle
(most know-how from „high level synthesis“ discipline)
data moved around by software
i.e. by memory-cycle-hungry instruction streams which fully hit the memory wall
P&R: move locality of operation, not data!
[Figure: CPU chip layout, extremely unbalanced; stolen from Bob Colwell]
Replace Caches by ...
… by 16 x 16 reconfigurable data path array (rDPA) which fits on the same chip
[Figure: CPU with caches; stolen from Bob Colwell]
Similarly skilled
With hardware description languages, hardware engineers had to adopt the methodologies and techniques of software engineers. Increased softness has an impact on even our products themselves.
The required skills for your respective jobs are converging (against the grain in an age of increased specialization) and you'll soon be working with (and competing against) a new generation of embedded engineers that are similarly skilled in both disciplines.
Using FPGAs
Reducing cost without loss of performance and flexibility.
An FPGA may be configured like a general, flexible microprocessor executing conventional C/C++ code, or in a highly specific way. Programmability is what distinguishes FPGAs from ASICs.
An FPGA can perform a specific algorithm at very high speed. Compared to ASICs, prototyping time is on the order of hours rather than months, with a cost less than a tenth of that for an ASIC.
Using a high-level language, the FPGA can be programmed for a wide variety of algorithms without any deep knowledge of the underlying architecture.
Field-programmable FPGAs
Co-Compiler Enabling Technology
is available from academia
only a small team needed for commercial re-implementation
on the road map to the Personal Supercomputer
Conclusions (1)
We need a unification in dealing with problems, which are shared across many different application domains.
RC suffers from fragmentation into different cultures of the many application domains.
CS is the only domain qualified for such an effort
Conclusions (2)
IEEE Computer Society should advocate improved application development methodologies and a common educational approach useful for the wide variety of application domains.
Inside IEEE Computer Society, a TC on RC should lobby for more
Conclusions (3)
reverse the downtrend in CS enrolment
educate not only students …
increase membership
make CS more fascinating
Strategic issue for entire IEEE Computer Society
Conclusions (4)
The personal supercomputer is near, not only for the desktop, but also as a new road map to large-scale supercomputing with performance that was unthinkable until now.
IEEE-CS should accept this fascinating challenge by spearheading the paradigm shift.
IEEE-CS is needed as a translator to explain the impact to managers and to a wide public.
RC education last week at Karlsruhe
35 submissions from Australia, Brazil, India, USA, and throughout Europe
Attendees declared themselves ready to work for a task force
But education is just one of several facets ……
However ....
“What did you say again that your company does?” my father posed the question. “Gate arrays,” I replied. “They’re chips used to…”
“Oh yes, that’s right, Gatorade.” ….. “I used to give that to my marching band members so they wouldn’t get dehydrated on hot days. Don’t remember it coming in chip form …..”
Explain to your grandmother what it means if you’re one of the world’s leading experts on optical proximity correction (OPC) for nanometer-scale semiconductor lithography?
Could you perhaps relate it to some difficulty she has with needlepoint and her cataracts?
Even those with a scientific or technical background often won’t understand precisely what we do. A PhD in molecular biology won’t help to understand VHDL and Verilog synthesis for FPGAs.
Trying to relate DNA sequences to LUT truth tables might offer a starting point, but somebody has to be able to bridge the technology and terminology gap, even to initiate that analogy. Try explaining FPGAs with the consumer electronics approach. “People tend to relate when you tell them what your part goes into. Today, finally, ‘chip’ seems universally understood. I never get people asking about potato chips anymore.”
Abstract. Google’s jaw-dropping hit rates illustrate the pervasiveness of Reconfigurable Computing (RC), which has been mainstream in embedded systems for years and is now being adopted by supercomputing (Cray, SGI, etc.). From FPGA usage as accelerators, speed-up factors of up to two orders of magnitude are reported, as well as floor space requirements and electricity bills reduced by one order of magnitude. Speed-ups of about 3 orders of magnitude and more are obtained by using coarse-grained reconfigurable datapath arrays (rDPAs), available from a number of start-ups.
This is astonishing, since FPGAs and rDPAs have a substantially lower clock speed than microprocessors. Algorithmic cleverness is the secret of success, based on software-to-configware migration mechanisms that move away from memory-cycle-hungry instruction-stream-based computing paradigms.
The main benefit of RC platforms, which have replaced hardwired accelerators, is their flexibility through non-procedural programmability. This also contributes to those concepts of Organic Computing which rely on processes of evolution, self-organization, adaptation and fault tolerance. The main hurdles on the way to heart-stopping new horizons of cheap highest performance are CS-related educational deficits, which cause the configware / software chasm, and a methodology fragmentation between the different cultures of application domains. Current CS curricula do not sufficiently meet their transdisciplinary responsibility. The talk gives a survey of fundamental issues in RC and of new directions in CS-related curricula, focused on a dual paradigm organic computing approach.
However ....
Application migration [from supercomputer] resulting in performance increase up to 4 orders of magnitude
„Saves more than $10,000 in electricity bills per year (7¢ / kWh) - .... per 64-processor 19" rack“ [Herb Riley, R. Associates]
Reducing electricity bill by an order of magnitude
Hits the memory wall from a different direction
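The quoted saving can be turned into an energy figure with simple arithmetic. The sketch below uses only the two numbers from the quote ($10,000 per year at 7¢ / kWh) and additionally assumes, purely for illustration, that the rack runs around the clock all year; the function name `implied_energy_saving` is hypothetical.

```python
HOURS_PER_YEAR = 24 * 365   # assumption: continuous operation, non-leap year


def implied_energy_saving(dollars_per_year: float, dollars_per_kwh: float):
    """Return (kWh saved per year, average kW saved) implied by a cost saving."""
    kwh_per_year = dollars_per_year / dollars_per_kwh
    average_kw = kwh_per_year / HOURS_PER_YEAR
    return kwh_per_year, average_kw


kwh, kw = implied_energy_saving(10_000, 0.07)

if __name__ == "__main__":
    print(f"{kwh:,.0f} kWh per year, about {kw:.1f} kW on average")
```

Under these assumptions the quote corresponds to roughly 143,000 kWh per year for one 64-processor rack, or a little over 16 kW of average power.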
Conclusions
IEEE Computer Society should advocate to introduce a dual paradigm approach – away from the monopoly of the vN mind set
IEEE Computer Society should advocate a common model useful for the wide variety of application domains
Conclusions
We need a unification in dealing with problems, which are shared across many different application domains.
RC suffers from fragmentation into different cultures of the many application domains.
Each domain uses its own trick box.
We should teach the world to think outside the box
CS is the only domain qualified for this unification
An Archetype Common Model needed
from the Configware Industry
IEEE Computer Society should advocate introducing a dual paradigm transdisciplinary education, using Configware Engineering as the counterpart of Software Engineering. New curricula in CS and CE should provide an integrating dual paradigm mind set that supports a unification in dealing with problems shared across many different application domains, and so cure severe qualification deficiencies of our graduates.