Design Space Exploration for Application Specific FPGAs in System-on-a-Chip Designs
Mark Hammerquist, Roman Lysecky
Department of Electrical and Computer Engineering, University of Arizona, Tucson AZ, USA
[email protected], [email protected]
http://www.ece.arizona.edu/~embedded
Experimental Setup
Benchmarks: several MCNC benchmark circuits of varying complexity: alu4, apex6, bigkey, cordic, des, dsip, misex1, mult32a, s1423, s298
Design Metric Calculation
Delay: reported by VPR after routing
Power: power model of Poon et al. (TODAES 2005) used to estimate power consumption
Area: routing area is reported by VPR; a transistor-level estimation method was developed to determine CLB area requirements
Design Space Exploration for ASFPGAs
[Figure: design space exploration flow. HW Circuit (BLIF) -> Tech. Mapping (FlowMap) -> Mapped Circuit (BLIF) -> Packing/Activity Est. (T-VPack) -> Packed Circuit (Netlist) and Switching Activity -> Placement/Routing/Power Est. (VPR with Power Model) -> HW Bitstream and Design Metrics (Area, Delay, Energy). Explored parameters: LUT Size, CLB Size, Connectivity/Channel Width/FPGA Size. Output: FPGA Arch. & Bitstream.]
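The exploration loop implied by this flow can be sketched as an exhaustive sweep over the architecture parameters. This is only an illustrative sketch: the parameter ranges, the `evaluate` callback (standing in for the FlowMap, T-VPack, and VPR runs), and all function names are assumptions, not part of the presented framework.

```python
from itertools import product

# Hypothetical sweep ranges; the slides name the parameters but not the values swept.
LUT_SIZES = [3, 4, 5, 6]
CLB_SIZES = [2, 4, 6, 8]
CONNECTIVITY = [0.6, 0.7, 0.8, 0.9]
METRICS = ("area", "delay", "energy")


def design_space(lut_sizes=LUT_SIZES, clb_sizes=CLB_SIZES, connectivity=CONNECTIVITY):
    """Yield every candidate architecture as a dict of parameters."""
    for k, n, c in product(lut_sizes, clb_sizes, connectivity):
        yield {"lut_size": k, "clb_size": n, "connectivity": c}


def explore(circuit, evaluate, space=None):
    """Evaluate one circuit on every architecture.

    `evaluate(circuit, arch)` is a hypothetical callback that would wrap the
    mapping, packing, and place-and-route tools and return a dict with the
    keys in METRICS.
    """
    candidates = design_space() if space is None else space
    return [(arch, evaluate(circuit, arch)) for arch in candidates]


def best_per_metric(results):
    """Return the architecture that minimizes each metric for this circuit."""
    return {m: min(results, key=lambda r: r[1][m])[0] for m in METRICS}
```

With a real `evaluate`, `best_per_metric(explore("alu4", evaluate))` would give one delay-best, one energy-best, and one area-best architecture for that circuit.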
Experimental Results: ASFPGA vs Delay/Energy/Area-Optimized FPGA
ASFPGA: optimized for one particular hardware application. Design space exploration determined the three best architectures for each circuit.
Delay/Energy/Area-Optimized: best average delay, energy, or area across all hardware circuits.
Delay- and energy-optimized architecture: 5-input LUTs, 4 LUTs per CLB, 80% connectivity
Area-optimized architecture: 3-input LUTs, 2 LUTs per CLB, 90% connectivity
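The difference between the two selection rules can be illustrated with a small sketch. It assumes a hypothetical `results[circuit][arch][metric]` table of measured values, with architectures keyed by `(lut_size, clb_size, connectivity)` tuples, and it averages raw metric values across circuits; the slides do not say whether the average is taken over raw or normalized values.

```python
def asfpga_choice(results, circuit, metric):
    """ASFPGA rule: the best architecture for one circuit and one metric."""
    per_arch = results[circuit]
    return min(per_arch, key=lambda arch: per_arch[arch][metric])


def metric_optimized_choice(results, metric):
    """Metric-optimized rule: the architecture with the best average
    value of `metric` across all circuits (raw average, an assumption)."""
    # Only consider architectures that were evaluated for every circuit.
    common = set.intersection(*(set(per_arch) for per_arch in results.values()))

    def average(arch):
        return sum(results[c][arch][metric] for c in results) / len(results)

    return min(sorted(common), key=average)
```

The ASFPGA rule may return a different architecture for every circuit, whereas the metric-optimized rule returns a single architecture shared by all of them.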
Experimental Results: ASFPGA vs Delay/Energy/Area-Optimized FPGA
ASFPGA provides good reductions over delay-optimized, energy-optimized, and area-optimized FPGAs: 5% faster, 10% more energy efficient, or 17% smaller, on average.
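A percentage reduction of this kind is conventionally computed relative to the baseline FPGA. The sketch below assumes that convention and assumes the reported average is the mean of the per-circuit percentages; neither detail is stated on the slides, and the function names are illustrative.

```python
def percent_reduction(baseline, customized):
    """Reduction of `customized` relative to `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - customized) / baseline


def average_reduction(pairs):
    """Mean of per-circuit reductions for (baseline, customized) pairs."""
    reductions = [percent_reduction(b, c) for b, c in pairs]
    return sum(reductions) / len(reductions)
```

For example, a circuit with a 100 ns critical path on the baseline and 95 ns on the ASFPGA would count as a 5% delay reduction.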
[Figure: chart of Percentage Reduction (0% to 75%) in Delay, Energy, and Area per Benchmark Circuit (alu4, apex6, bigkey, cordic, des, dsip, misex1, mult32a, s1423, s298) and Average. Callouts: 26% faster, 67% less energy, 49% smaller.]
Experimental Results: ASFPGA vs Balance-Optimized FPGA
ASFPGA: optimized for one particular hardware application. Design space exploration determined the three best architectures for each circuit.
Balance-Optimized: FPGA balanced between delay, energy, and area. Selected the FPGA architecture with the best average area/delay/energy (ADE) cost.
ADE is the average of the individual area, delay, and energy costs for each FPGA across all benchmarks.
Each cost is calculated as the area/delay/energy for an architecture divided by the max area/delay/energy for that hardware circuit.
FPGA architecture with best average ADE cost across all circuits: 5-input LUTs, 2 LUTs per CLB, 60% connectivity
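The ADE cost described above can be sketched as follows. The `results[circuit][arch][metric]` table and the function names are hypothetical; the normalization (each value divided by the maximum for that circuit, then averaged over the three metrics and all circuits) follows the description on this slide, with equal weighting of the metrics as an assumption.

```python
METRICS = ("area", "delay", "energy")


def ade_cost(results, arch):
    """Average normalized area/delay/energy cost of `arch` over all circuits."""
    total = 0.0
    for per_arch in results.values():
        for metric in METRICS:
            # Normalize by the worst (max) value any architecture
            # achieves on this circuit for this metric.
            worst = max(values[metric] for values in per_arch.values())
            total += per_arch[arch][metric] / worst
    return total / (len(results) * len(METRICS))


def balance_optimized_choice(results):
    """Architecture with the lowest ADE cost across all circuits."""
    common = set.intersection(*(set(per_arch) for per_arch in results.values()))
    return min(sorted(common), key=lambda arch: ade_cost(results, arch))
```

Because every term is normalized to the range (0, 1], a large circuit does not dominate the average the way it would with raw area or energy values.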
Experimental Results: ASFPGA vs Balance-Optimized FPGA
ASFPGA can provide significant reductions in delay/energy/area over the balance-optimized FPGA: 25% faster, 36% more energy efficient, or 28% smaller, on average.
[Figure: chart of Percentage Reduction (0% to 100%) in Delay, Energy, and Area per Benchmark Circuit (alu4, apex6, bigkey, cordic, des, dsip, misex1, mult32a, s1423, s298) and Average. Callouts: 73% less energy, 49% less area, 39% shorter delay.]
Experimental Results: ASFPGA vs Fixed-Size Balance-Optimized FPGA
ASFPGA: optimized for one particular hardware application. Design space exploration determined the three best architectures for each circuit.
Fixed-Size Balance-Optimized: limited to a fixed size and balanced between area, delay, and energy.
Fixed size is the minimum size needed to support all hardware benchmarks considered: 63x63 CLBs
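One way to read "minimum size needed to support all hardware benchmarks" is as the largest per-circuit minimum array dimension. The sketch below assumes a square CLB array sized only by CLB count and ignores I/O pad and routing constraints, which a real place-and-route tool would also consider; the CLB counts in the usage note are made up for illustration.

```python
import math


def min_square_dim(num_clbs):
    """Smallest n such that an n x n array holds `num_clbs` CLBs."""
    if num_clbs <= 0:
        return 0
    return math.isqrt(num_clbs - 1) + 1  # integer ceil(sqrt(num_clbs))


def fixed_size_dim(clb_counts):
    """Array dimension large enough for every circuit in `clb_counts`."""
    return max(min_square_dim(n) for n in clb_counts.values())
```

For instance, hypothetical counts of 100, 3969, and 17 CLBs would give a 63x63 array, since the largest circuit alone needs 63x63 and the smaller ones leave most of the fabric unused.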
Experimental Results: ASFPGA vs Fixed-Size Balance-Optimized FPGA
ASFPGA can provide significant reductions in delay/energy/area over the fixed-size balance-optimized FPGA: 50% faster, 75% more energy efficient, or 82% smaller, on average.
[Figure: chart of Percentage Reduction (0% to 100%) in Delay, Energy, and Area per Benchmark Circuit (alu4, apex6, bigkey, cordic, des, dsip, misex1, mult32a, s1423, s298) and Average. Callouts: > 40% area savings for all circuits; > 60% energy savings for most circuits.]
Conclusions and Future Work
Conclusions
Presented an initial design space exploration framework for Application-Specific FPGAs.
Allows an FPGA architecture to be customized to a particular hardware circuit before manufacturing, yet remains flexible enough to support changes to the hardware after fabrication.
ASFPGAs are 5% faster, 10% more energy efficient, or 17% smaller than traditional metric-optimized FPGAs.
As much as 50% faster, 75% more energy efficient, or 82% smaller, on average, compared to a fixed-size balance-optimized FPGA.
Current/Future Work
FPGA architecture customization that constructs/optimizes an FPGA from the logic characteristics of the hardware circuit.
Potentially can provide significant additional savings by further customizing individual CLBs and routing resources, but yields an irregular FPGA fabric.
Requires new FPGA CAD tools to handle irregularity to support hardware