Design Space Exploration for Application Specific FPGAs in System-on-a-Chip Designs

Mark Hammerquist, Roman Lysecky
Department of Electrical and Computer Engineering
University of Arizona, Tucson, AZ
[email protected], [email protected]
http://www.ece.arizona.edu/~embedded

Introduction and Motivation: FPGAs vs. ASICs

FPGAs vs. ASICs in SoC Designs

Advantages of FPGAs
Programmed by downloading bits to the FPGA
Much like software executing on a microprocessor
Allows hardware modifications throughout the development cycle
And even after manufacturing: correct costly design errors without requiring a respin
Dynamically reconfigurable FPGAs can implement multiple hardware circuits throughout their execution

Disadvantages of FPGAs
10-40x larger than ASICs
5-12x more power than ASICs
3-4x longer delay than ASICs

Kuon et al. FPGA 2006


[Figure: Two SoC block diagrams, each with a µP, peripherals, I$, and D$; one integrates an FPGA, the other an ASIC]

How can we take advantage of FPGAs without the significant overheads?

Introduction and Motivation: Application-Specific FPGAs

SoCs require fabrication
Provides an opportunity to customize the FPGA architecture: reduce area, reduce energy, improve performance

Application-Specific FPGA (ASFPGA)
Create an FPGA architecture tailored to the specific hardware circuit
Flexible-optimized: optimized for one application, but flexible enough to implement other hardware circuits or additions
Fully-optimized: highly optimized for one application, only flexible enough to support minor changes; trades off flexibility for smaller area/power/delay


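[Figure: ASFPGA generation flow: HW Circuit → ASFPGA Generation → FPGA Architecture & Bitstream, targeting an SoC with a µP, peripherals, I$, D$, and an ASFPGA in place of a generic FPGA]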

Introduction and Motivation: Previous Work

Researchers have investigated various methods for optimizing reconfigurable fabrics
Levinthal et al. (DesignCon, 2005): coarse-grained reconfigurable logic cells with fixed routing
Aken’Ova et al. (IEEE Custom IC, 2005): FPGA-specific standard cells
Rose et al. (FPGA 2003, 2005): automatically generate a transistor-level implementation of an FPGA from an architectural description; an enabling technology
Holland et al. (FPL 2004, 2005; FPGA 2006): automated tool flow for creating domain-specific reconfigurable logic; domains: floating point, arithmetic, encryption, sorters

Application-Specific FPGAs (ASFPGAs): Traditional FPGA CAD Tool Flow

Traditional CAD tool flow
Utilizes academic FPGA CAD tools to map hardware circuits to the target FPGA: technology mapping (FlowMap), packing (T-VPack), and placement and routing (VPR)
The FPGA architecture is known a priori and represents the target FPGA

Application-Specific FPGA
The FPGA’s architectural features can be tuned to the target hardware circuit
FPGA CAD tools can be used to explore the available architectural options
Current focus: creating a flexible-optimized ASFPGA

[Figure: Traditional FPGA CAD tool flow: HW Circuit (BLIF) → Tech. Mapping (FlowMap) → Mapped Circuit (BLIF) → Packing (T-VPack) → Packed Circuit (Netlist) → Placement/Routing (VPR) → HW Bitstream and Design Metrics (Area, Delay, Energy). The FPGA Arch. input specifies LUT Size, CLB Size, and Connectivity/Channel Width/FPGA Size.]
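The packing step groups technology-mapped LUTs into CLBs. T-VPack itself is a timing-driven greedy clusterer with its own attraction function; the sketch below is a deliberately simplified greedy variant, assuming each LUT is described only by the nets it touches and that a CLB is limited to `clb_size` LUTs and `max_inputs` distinct external input nets. The names `Lut`, `pack_luts`, `external_inputs`, and `attraction`, and the example netlist, are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Lut:
    name: str
    inputs: frozenset  # nets that drive this LUT's inputs
    output: str        # net driven by this LUT

def external_inputs(cluster):
    """Distinct nets that must be routed into the CLB from outside it."""
    driven = {lut.output for lut in cluster}
    needed = set().union(*(lut.inputs for lut in cluster))
    return needed - driven

def attraction(lut, cluster):
    """Simplified attraction: number of nets shared with LUTs already in the CLB."""
    cluster_nets = set().union(*(m.inputs | {m.output} for m in cluster))
    return len((lut.inputs | {lut.output}) & cluster_nets)

def pack_luts(luts, clb_size, max_inputs):
    """Greedily fill CLBs with at most clb_size LUTs and max_inputs external inputs."""
    unpacked = list(luts)
    clbs = []
    while unpacked:
        # Seed each CLB with the unpacked LUT that uses the most inputs.
        seed = max(unpacked, key=lambda cand: len(cand.inputs))
        unpacked.remove(seed)
        cluster = [seed]
        while len(cluster) < clb_size:
            feasible = [cand for cand in unpacked
                        if len(external_inputs(cluster + [cand])) <= max_inputs]
            if not feasible:
                break  # no remaining LUT fits this CLB's input limit
            best = max(feasible, key=lambda cand: attraction(cand, cluster))
            unpacked.remove(best)
            cluster.append(best)
        clbs.append(cluster)
    return clbs

# Hypothetical four-LUT netlist: a feeds b, and c feeds d.
luts = [
    Lut("a", frozenset({"x1", "x2"}), "n1"),
    Lut("b", frozenset({"n1", "x3"}), "n2"),
    Lut("c", frozenset({"x4", "x5"}), "n3"),
    Lut("d", frozenset({"n3", "x6"}), "n4"),
]
clbs = pack_luts(luts, clb_size=2, max_inputs=4)
print([[lut.name for lut in clb] for clb in clbs])  # [['a', 'b'], ['c', 'd']]
```

Connected LUTs end up in the same CLB, which keeps their shared net off the general routing; changing `clb_size` is how the CLB-size option of the exploration would alter the packed netlist.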


Application-Specific FPGAs (ASFPGAs): Design Space Exploration Framework

Design space exploration framework
Explores a set of configurable options for the target FPGA
Goal: find the lowest area/delay/power FPGA architecture for the target application

Configurable FPGA options
LUT size: 3-, 4-, or 5-input LUTs
CLB size: 2 or 4 LUTs per CLB
Connection block connectivity: 100%, 90%, 80%, 70%, 60%
FPGA size: NxN fixed size
Channel width: 100%-130% of the minimum channel width

More configurable options exist but are not considered at this time
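The LUT-size, CLB-size, and connectivity options above combine into 3 × 2 × 5 = 30 candidate architectures before channel width is varied. A minimal sketch of the exploration loop for one circuit, assuming channel width is swept in 10% steps (the slide gives only the 100%-130% range): `evaluate` is a hypothetical stand-in for the T-VPack/VPR/power-model flow and returns toy numbers, not measured data.

```python
from itertools import product

# Option values from the slide; the channel-width steps are an assumption.
LUT_SIZES = [3, 4, 5]
CLB_SIZES = [2, 4]                      # LUTs per CLB
CONNECTIVITY = [1.0, 0.9, 0.8, 0.7, 0.6]
CHANNEL_SCALE = [1.0, 1.1, 1.2, 1.3]    # multiples of the minimum channel width

def architectures():
    """Yield every combination of the configurable options."""
    for k, n, fc, cw in product(LUT_SIZES, CLB_SIZES, CONNECTIVITY, CHANNEL_SCALE):
        yield {"lut_size": k, "clb_size": n, "connectivity": fc, "channel_scale": cw}

def evaluate(arch):
    """Hypothetical stand-in for packing, place-and-route, and power estimation."""
    k, n = arch["lut_size"], arch["clb_size"]
    area = n * 2 ** k * arch["channel_scale"]                # toy numbers only
    delay = 10.0 / (k * n) + (1.0 - arch["connectivity"])
    energy = 0.01 * area + delay
    return {"area": area, "delay": delay, "energy": energy}

def best_per_metric(results):
    """For one circuit, pick the architecture with the lowest value of each metric."""
    return {m: min(results, key=lambda r: r[1][m])[0] for m in ("area", "delay", "energy")}

archs = list(architectures())
print(len(archs))  # 3 * 2 * 5 * 4 = 120
results = [(arch, evaluate(arch)) for arch in archs]
best = best_per_metric(results)
```

Keeping `evaluate` as a separate function means the exhaustive sweep is independent of how the metrics are produced; in the real flow each call would be a full pack, place, route, and power-estimation run for the circuit.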


[Figure: Design space exploration for ASFPGAs: HW Circuit (BLIF) → Packing/Activity Est. (T-VPack) → Switching Activity → Placement/Routing/Power Est. (VPR with Power Model) → Design Metrics (Area, Delay, Energy). The exploration varies LUT Size, CLB Size, and Connectivity/Channel Width/FPGA Size and outputs the FPGA Arch. & Bitstream.]
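The channel-width option is relative to the minimum channel width at which a circuit routes. One common way to find that minimum, similar in spirit to VPR's search when no width is fixed, is a binary search, assuming routability is monotone in channel width. In this sketch `routes(width)` is a hypothetical predicate standing in for a routing attempt, and rounding the 100%-130% widths up to whole tracks is an assumption.

```python
from typing import Callable

def min_channel_width(routes: Callable[[int], bool], upper: int) -> int:
    """Smallest width W in [1, upper] with routes(W) True, assuming monotonicity."""
    if not routes(upper):
        raise ValueError("circuit does not route even at the upper bound")
    lo, hi = 1, upper  # invariant: routes(hi) is True
    while lo < hi:
        mid = (lo + hi) // 2
        if routes(mid):
            hi = mid      # mid routes: the minimum is at or below mid
        else:
            lo = mid + 1  # mid fails: the minimum is above mid
    return hi

def scaled_widths(w_min: int, percents=(100, 110, 120, 130)) -> list[int]:
    """Widths from 100% to 130% of the minimum, rounded up using integer arithmetic."""
    return [(w_min * p + 99) // 100 for p in percents]

# Hypothetical circuit that needs at least 37 tracks per channel.
w_min = min_channel_width(lambda w: w >= 37, upper=128)
print(w_min, scaled_widths(w_min))  # 37 [37, 41, 45, 49]
```

Integer percentages avoid floating-point surprises such as 10 × 1.1 evaluating slightly above 11 and being rounded up to 12.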

Application-Specific FPGAs (ASFPGAs): Experimental Setup

Experimental setup
Considers several MCNC benchmark circuits of varying complexity: alu4, apex6, bigkey, cordic, des, dsip, misex1, mult32a, s1423, s298

Design metric calculation
Delay: reported by VPR after routing
Power: estimated with the power model of Poon et al. (TODAES 2005)
Area: routing area is reported by VPR; CLB area requirements are determined with a transistor-level estimation method developed by the authors
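The authors' transistor-level CLB area method is not detailed on the slide. As a rough illustration of why LUT size dominates logic area, a minimal sketch, assuming a K-input LUT is built from 2^K configuration SRAM cells plus a tree of 2^K - 1 two-input multiplexers, and using hypothetical per-element transistor costs. This is not the authors' model: it ignores local CLB interconnect, buffering, and transistor sizing.

```python
# Hypothetical per-element transistor costs (assumptions, not the authors' values).
SRAM_CELL = 6   # 6T SRAM configuration cell
MUX2 = 2        # 2:1 pass-transistor multiplexer
FLIP_FLOP = 24  # rough cost of one register (assumed)

def lut_transistors(k: int) -> int:
    """K-input LUT: 2**k configuration bits and a (2**k - 1)-node 2:1 mux tree."""
    return (2 ** k) * SRAM_CELL + (2 ** k - 1) * MUX2

def clb_transistors(k: int, n: int) -> int:
    """N basic logic elements: LUT, flip-flop, and a configurable 2:1 output-select mux."""
    per_ble = lut_transistors(k) + FLIP_FLOP + MUX2 + SRAM_CELL
    return n * per_ble

for k in (3, 4, 5):
    for n in (2, 4):
        print(f"K={k}, N={n}: {clb_transistors(k, n)} transistors")
```

Under these assumptions the logic area roughly doubles with each extra LUT input (188 transistors for K=3, N=2 versus 1144 for K=5, N=4), which is consistent with an area-optimized architecture favoring small LUTs and small CLBs.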


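The slides do not give the power model's equations. As a minimal sketch of only the dynamic-power term that activity-based estimators build on, assuming each net i has switching activity α_i (average transitions per clock cycle) and switched capacitance C_i, dynamic power is P = ½ · Σ α_i · C_i · V_dd² · f_clk. The `Net` type, activities, capacitances, supply voltage, and clock frequency below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Net:
    activity: float     # average transitions per clock cycle (alpha)
    capacitance: float  # switched capacitance in farads

def dynamic_power(nets, vdd, f_clk):
    """P = 0.5 * sum(alpha_i * C_i) * Vdd**2 * f_clk, in watts."""
    switched = sum(net.activity * net.capacitance for net in nets)
    return 0.5 * switched * vdd ** 2 * f_clk

# Hypothetical nets: activities and capacitances are made up for illustration.
nets = [Net(0.2, 50e-15), Net(0.5, 120e-15), Net(0.1, 300e-15)]
p = dynamic_power(nets, vdd=1.8, f_clk=100e6)
print(f"{p * 1e6:.2f} uW")  # 16.20 uW
```

The switching activities are what the packing/activity-estimation stage supplies, while the capacitances depend on the routed architecture, which is why power must be re-estimated for every candidate FPGA.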

Experimental Results: ASFPGA vs. Delay/Energy/Area-Optimized FPGA

ASFPGA
Optimized for one particular hardware application
Design space exploration determined the three best architectures for each circuit (lowest delay, lowest energy, lowest area)

Delay/Energy/Area-optimized FPGA
Best average delay, energy, or area across all hardware circuits
Delay- and energy-optimized architecture: 5-input LUTs, 4 LUTs per CLB, 80% connectivity
Area-optimized architecture: 3-input LUTs, 2 LUTs per CLB, 90% connectivity
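The contrast between the two selections can be sketched directly: a metric-optimized FPGA is one architecture chosen by its average over all circuits, while the ASFPGA choice is made per circuit. A minimal sketch, assuming raw metric values are averaged directly (the slide does not say whether they are normalized first); the architecture labels and numbers are hypothetical, not measured data.

```python
from statistics import mean

# results[arch][circuit] -> one metric, e.g. delay in ns (hypothetical values)
results = {
    "k5_n4_fc80": {"alu4": 8.0, "des": 9.0},
    "k3_n2_fc90": {"alu4": 11.0, "des": 10.0},
    "k4_n4_fc70": {"alu4": 7.0, "des": 12.0},
}

def metric_optimized(results):
    """Single architecture with the lowest average metric over all circuits."""
    return min(results, key=lambda arch: mean(results[arch].values()))

def per_circuit_best(results):
    """ASFPGA-style choice: the best architecture separately for each circuit."""
    circuits = next(iter(results.values())).keys()
    return {c: min(results, key=lambda arch: results[arch][c]) for c in circuits}

print(metric_optimized(results))  # k5_n4_fc80
print(per_circuit_best(results))  # {'alu4': 'k4_n4_fc70', 'des': 'k5_n4_fc80'}
```

Here the averaged choice gives alu4 a delay of 8.0 while its own best architecture reaches 7.0; gaps of this kind are what the per-circuit reductions in the results measure.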


Experimental Results: ASFPGA vs. Delay/Energy/Area-Optimized FPGA

The ASFPGA provides reductions over the delay-optimized, energy-optimized, and area-optimized FPGAs: on average, 5% faster, 10% more energy efficient, or 17% smaller


[Chart: Percentage reduction in delay, energy, and area of the ASFPGA relative to the delay-, energy-, and area-optimized FPGAs for alu4, apex6, bigkey, cordic, des, dsip, misex, mult32, s1423, s298, and the average (y-axis 0%-75%). Annotations: 26% faster, 67% less energy, 49% smaller.]
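Each bar is a percentage reduction of the ASFPGA relative to a baseline FPGA. As a worked sketch, assuming the reduction is computed per circuit as (baseline - ASFPGA) / baseline and that the "Average" bar is the arithmetic mean of the per-circuit reductions (the slide defines neither), with hypothetical delay values:

```python
def reduction(baseline: float, asfpga: float) -> float:
    """Fractional reduction of the ASFPGA value relative to the baseline."""
    return (baseline - asfpga) / baseline

# Hypothetical delays in ns: (baseline FPGA, ASFPGA) per circuit.
delays = {"alu4": (10.0, 9.0), "des": (8.0, 8.0), "s298": (12.0, 9.0)}

per_circuit = {c: reduction(b, a) for c, (b, a) in delays.items()}
average = sum(per_circuit.values()) / len(per_circuit)
print({c: f"{r:.0%}" for c, r in per_circuit.items()}, f"average {average:.0%}")
# {'alu4': '10%', 'des': '0%', 's298': '25%'} average 12%
```

A circuit where the baseline architecture already happens to be the best one (des here) contributes a 0% reduction, which pulls the average below the larger per-circuit gains.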

Experimental Results: ASFPGA vs. Balance-Optimized FPGA

ASFPGA
Optimized for one particular hardware application
Design space exploration determined the three best architectures for each circuit

Balance-optimized FPGA
Balanced between delay, energy, and area
Selected as the FPGA architecture with the best (lowest) average area/delay/energy (ADE) cost
For each hardware circuit, an architecture's area, delay, and energy are each divided by the maximum area, delay, and energy for that circuit
The ADE cost averages these normalized area, delay, and energy costs, and is then averaged across all benchmarks
FPGA architecture with the best average ADE cost across all circuits: 5-input LUTs, 2 LUTs per CLB, 60% connectivity
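The ADE selection can be made concrete. A minimal sketch, assuming that for each circuit every metric is divided by its maximum over the candidate architectures (so the worst architecture scores 1.0 on that metric), that the three normalized costs are weighted equally, and that the lowest average over circuits wins. The architecture labels `A` and `B` and all numbers are hypothetical.

```python
from statistics import mean

METRICS = ("area", "delay", "energy")

# data[circuit][arch] -> raw metrics (hypothetical values)
data = {
    "alu4": {"A": {"area": 100, "delay": 10, "energy": 5},
             "B": {"area": 80, "delay": 12, "energy": 4}},
    "des": {"A": {"area": 200, "delay": 20, "energy": 8},
            "B": {"area": 250, "delay": 16, "energy": 10}},
}

def ade_costs(data):
    """Average over circuits of each architecture's mean normalized area, delay, energy."""
    archs = next(iter(data.values())).keys()
    per_arch = {a: [] for a in archs}
    for by_arch in data.values():
        # Normalize each metric by its worst (maximum) value for this circuit.
        worst = {m: max(v[m] for v in by_arch.values()) for m in METRICS}
        for a, v in by_arch.items():
            per_arch[a].append(mean(v[m] / worst[m] for m in METRICS))
    return {a: mean(costs) for a, costs in per_arch.items()}

costs = ade_costs(data)
best = min(costs, key=costs.get)
print(best)  # B
```

Normalizing per circuit keeps a large benchmark such as des from dominating the average simply because its absolute area, delay, and energy are bigger.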

Experimental Results: ASFPGA vs. Balance-Optimized FPGA

The ASFPGA provides significant reductions over the balance-optimized FPGA: on average, 25% faster, 36% more energy efficient, or 28% smaller


[Chart: Percentage reduction in delay, energy, and area of the ASFPGA relative to the balance-optimized FPGA for alu4, apex6, bigkey, cordic, des, dsip, misex, mult32, s1423, s298, and the average (y-axis 0%-100%). Annotations: 39% shorter delay, 73% less energy, 49% less area.]

Experimental Results: ASFPGA vs. Fixed-Size Balance-Optimized FPGA

ASFPGA
Optimized for one particular hardware application
Design space exploration determined the three best architectures for each circuit

Fixed-size balance-optimized FPGA
Limited to a fixed size and balanced between area, delay, and energy
The fixed size is the minimum size needed to support all hardware benchmarks considered: 63x63 CLBs
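The fixed size follows from the largest benchmark. A minimal sketch, assuming each circuit needs the smallest square NxN array whose N² CLB sites hold all of its packed CLBs (ignoring the I/O pad capacity that VPR also considers when sizing an array), and that the fixed-size FPGA takes the maximum of those per-circuit sizes. The CLB counts are hypothetical.

```python
import math

def min_square_size(num_clbs: int) -> int:
    """Smallest N such that an N x N array has at least num_clbs CLB sites."""
    return math.isqrt(num_clbs - 1) + 1 if num_clbs > 0 else 0

def fixed_size(clb_counts: dict) -> int:
    """One array size large enough for every benchmark circuit."""
    return max(min_square_size(n) for n in clb_counts.values())

# Hypothetical packed-CLB counts per circuit.
counts = {"alu4": 400, "des": 1500, "s298": 970}
print({c: min_square_size(n) for c, n in counts.items()}, fixed_size(counts))
# {'alu4': 20, 'des': 39, 's298': 32} 39
```

Every smaller circuit then leaves most of the fixed array unused, which is why the area savings of the ASFPGA are largest against this baseline.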


Experimental Results: ASFPGA vs. Fixed-Size Balance-Optimized FPGA

The ASFPGA provides significant reductions over the fixed-size balance-optimized FPGA: on average, 50% faster, 75% more energy efficient, or 82% smaller


[Chart: Percentage reduction in delay, energy, and area of the ASFPGA relative to the fixed-size balance-optimized FPGA for alu4, apex6, bigkey, cordic, des, dsip, misex, mult32, s1423, s298, and the average (y-axis 0%-100%). Annotations: more than 40% area savings for all circuits; more than 60% energy savings for most circuits.]

Conclusions and Future Work

Conclusions
Presented an initial design space exploration framework for Application-Specific FPGAs
Allows an FPGA architecture to be customized to a particular hardware circuit before manufacturing, yet remains flexible enough to support changes to the hardware after fabrication
On average, ASFPGAs are 5% faster, 10% more energy efficient, or 17% smaller than traditional metric-optimized FPGAs
On average, ASFPGAs are 50% faster, 75% more energy efficient, or 82% smaller than a fixed-size balance-optimized FPGA

Current/Future Work
FPGA architecture customization that constructs and optimizes an FPGA from the logic characteristics of the hardware circuit
Further customizing individual CLBs and routing resources could provide significant additional savings, but yields an irregular FPGA fabric
Supporting hardware modifications on such a fabric requires new FPGA CAD tools that handle the irregularity



Thanks

Questions?

