Design tools for Reconfigurable Computing in Wireless Communication Systems
Eli Bozorgzadeh
Center for Embedded Computing Systems
Computer Science Department
Univ. of California, Irvine
[email protected]
Virginia Tech Symposium on Wireless Communications, 6/4/2009
Reconfigurable Hardware
• Hardware on chip that could implement various functionalities
• Provides flexibility and adaptability
• Software Programmability
• Coarse Grained and Fine Grained
• Field Programmable Gate Arrays (FPGA)
• Cost Function
– Wirelength and Total Area
– Total Congestion
– Total Frames
Whitespace Requirement
[Figure: fixed blocks and whitespace allocation.]
FFPR Moves ‐ Whitespace Allocation
• Empty space is added in the design to allow global wires to pass without crossing modules.
• The four offset parameters, n, s, e, w, are changed during simulated annealing iterations.
[Figure: Floorplan with block offsets. A block surrounded by whitespace, with the offsets n, s, e, w marked on its four sides.]
Research Contributions
• Develop a floorplanner that maximizes reuse of frames by isolating components (RAW 2006)
• Multi‐layer floorplanning for handling multiple designs simultaneously (FPL 2006)
Representation: Sequence Pair
• Sequence Pair consists of two sequences of blocks in the design. <(A, B, C), (B, A, C)>
• Placement of blocks follows these relationships:
– {(.., a, .., b, ..), (.., a, .., b, ..)} => a is left of b
– {(.., a, .., b, ..), (.., b, .., a, ..)} => a is top of b
• Each block has one and only one relationship with each of the other blocks.
• O(n^2) time to calculate the placement
Working of Sequence Pair
• Sequence Pair <(1,3,2), (2,1,3)>
[Figure: Horizontal Graph (left‐right property) and Vertical Graph (top‐bottom property) over blocks 1, 3, 2, annotated with X = 0, X = 2, X = 0 and Y = 0, Y = 2, Y = 2.]
Working of Sequence Pair
• Sequence Pair <(1,3,2), (2,1,3)>
[Figure: the same horizontal and vertical graphs (X = 0, X = 2, X = 0; Y = 0, Y = 2, Y = 2) shown next to the resulting placement of blocks 1, 2, 3.]
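The sequence‐pair rules above can be turned into coordinates with a longest‐path pass over the two constraint graphs. The following is a minimal sketch, not the speaker's implementation: the function name, the 2 x 2 block sizes, and the choice of a y axis that grows upward are assumptions made for illustration. With those sizes, the example pair <(1,3,2), (2,1,3)> gives x coordinates 0, 2, 0 for blocks 1, 3, 2.

```python
def sequence_pair_placement(seq_first, seq_second, widths, heights):
    """Place blocks from a sequence pair with an O(n^2) longest-path pass."""
    pos1 = {b: i for i, b in enumerate(seq_first)}
    pos2 = {b: i for i, b in enumerate(seq_second)}
    x = {b: 0 for b in seq_first}
    y = {b: 0 for b in seq_first}

    # Horizontal graph: a is left of b when a precedes b in both sequences.
    # Visiting blocks in seq_first order means every left neighbour
    # already has its final x coordinate.
    for b in seq_first:
        for a in seq_first:
            if pos1[a] < pos1[b] and pos2[a] < pos2[b]:
                x[b] = max(x[b], x[a] + widths[a])

    # Vertical graph: a is on top of b when a precedes b in the first
    # sequence and follows it in the second. Assumption: y grows upward,
    # so the lower block is placed first; seq_second order ensures that.
    for a in seq_second:
        for b in seq_second:
            if pos1[a] < pos1[b] and pos2[b] < pos2[a]:
                y[a] = max(y[a], y[b] + heights[b])

    return {b: (x[b], y[b]) for b in seq_first}


# Hypothetical sizes: every block is 2 wide and 2 tall.
size = {1: 2, 2: 2, 3: 2}
placement = sequence_pair_placement((1, 3, 2), (2, 1, 3), size, size)
print(placement)
```

Block 1 is left of block 3, and both sit on top of block 2, so the bounding box in this sketch is 4 wide and 4 tall.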
Multi Layer Sequence Pairs
• One sequence pair for all designs
• Fixed blocks occur only once
• {(.., a, .., b, ..), (.., a, .., b, ..)} => a is left of b, if a and b belong to the same design
• {(.., a, .., b, ..), (.., b, .., a, ..)} => a is top of b, if a and b belong to the same design
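The same‐design condition can be sketched as a small lookup. This is an illustrative reading only: the function name, the block names, and the idea of tagging a fixed block with every design it appears in are assumptions, not details given on the slide.

```python
def relation(a, b, seq_first, seq_second, designs_of):
    """Return how block a relates to block b, or None if they never share a design."""
    # Blocks from different designs may overlap, so no constraint applies.
    if not (designs_of[a] & designs_of[b]):
        return None
    a_before_in_first = seq_first.index(a) < seq_first.index(b)
    a_before_in_second = seq_second.index(a) < seq_second.index(b)
    if a_before_in_first and a_before_in_second:
        return "left"
    if a_before_in_first and not a_before_in_second:
        return "top"
    if not a_before_in_first and not a_before_in_second:
        return "right"
    return "bottom"


# Hypothetical blocks: a1 belongs to design A, b1 to design B,
# and the fixed block f is shared by both designs (it occurs only once).
seq_first = ["a1", "f", "b1"]
seq_second = ["f", "a1", "b1"]
designs_of = {"a1": {"A"}, "b1": {"B"}, "f": {"A", "B"}}

print(relation("a1", "b1", seq_first, seq_second, designs_of))
print(relation("a1", "f", seq_first, seq_second, designs_of))
print(relation("f", "b1", seq_first, seq_second, designs_of))
```

In this sketch a1 and b1 are unconstrained relative to each other, which is what allows the layers to overlap, while both are still ordered against the shared fixed block.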
Working of Multi‐Layer Sequence Pair
• Theorems:
– All possible overlapping floorplans can be represented using a multi‐layer sequence pair.
– Any multi‐layer sequence pair represents a valid overlapping floorplan.
– Of all the floorplans defined by a given multi‐layer sequence pair, the longest path algorithm gives the area‐minimal floorplan for that sequence pair.
Experiment Results
• Effect of finding common components
• Could result in more than 4.8 times savings in frames.
Number of frames with and without common placement
Pair      Common Placement   No Common Placement
P1        962                1753
P2        2078               2978
P3        1994               2975
P4        3414               7662
P5        886                4329
Average   1869               2929
Reuse reduces Reconfiguration Delay
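The "more than 4.8 times" figure can be checked directly from the per‐pair rows of the table. A short sketch, using only the five pair rows as printed (the variable names are illustrative):

```python
# Frames per design pair: (common placement, no common placement).
frames = {
    "P1": (962, 1753),
    "P2": (2078, 2978),
    "P3": (1994, 2975),
    "P4": (3414, 7662),
    "P5": (886, 4329),
}

# Savings factor = frames without common placement / frames with it.
savings = {pair: no_common / common for pair, (common, no_common) in frames.items()}
best_pair = max(savings, key=savings.get)

for pair, factor in savings.items():
    print(f"{pair}: {factor:.2f}x")
print("largest savings:", best_pair)
```

The largest factor comes from pair P5, which is the case behind the 4.8 times claim.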
Dependent Floorplan: Direction
Direction of floorplanning could affect the savings in frames.
Number of Reconfiguration Frames using Opposite Flows in Dependent Mode
Pair   Design 1 -> 2   Design 2 -> 1
P1     962             698
P2     2078            821
P3     1994            1491
P4     3414            Unrouteable
P5     886             Unrouteable
Dependent Mode leads to Infeasibility and Limited Reuse
Design A
Timing Constraint (ns)   Combined Floorplan (seconds)   Dependent Floorplan (B A)
6.5                      3476                           X
7                        1662                           X
7.5                      1627                           5250
8                        1568                           2410
8.5                      1636                           2017

Design B
Timing Constraint (ns)   Combined Floorplan (seconds)   Dependent Floorplan (A B)
5.5                      X                              X
6                        525                            X
6.5                      479                            X
7                        469                            1318
7.5                      460                            853
Reconfiguration‐aware Floorplanner
Reconfigurable Physical Layer for SDR
• SDR base stations provide the capability of assimilating different communication technologies (e.g., in disaster scenarios).
• Because of their flexibility and DSP computation power, FPGAs are important platforms for SDR base stations.
[Figure: an SDR (PHY) base station connected to the Internet, with an Up/Down Converter, serving users U1 and U2 over a GPRS link and a WiFi link; labels: WiFi B, GPRS, ISM Band.]
Software Defined Radio Components on FPGA
• FFT Cores
– data processing of 200 Mega‐samples/sec and transform lengths between 16 and 16,384 points
• Viterbi Decoder/encoder
– decoding rates of 199 MSPS for a single channel and 273 MSPS for multi‐channel designs
• Serial vs. parallel implementations
• Turbo Decoder/encoder
• CORDIC, MAC, FIR, etc.
HW/SW Cross‐Layer Adaptation
[Figure: SDR Hardware Platform and Network Applications alongside an Off‐SDR Storage and Computing Server. Labels: Control Plane, Information Plane, Data Plane; Medium Access Controller; Configuration Manager; Static FPGA Configuration Generation; Implementation Library; Sequence of configuration; Configuration Controller; PowerPC; WiMAX, WiFi, GPRS; Configuration Memory; WiMAX Driver, GPRS Driver, WiFi Driver; Radio Front End; Data Sources; Network Mobility Manager; Medium Access Planner; Communication Software Server; Communication Software Library; Mobile Device Profiles.]
Joint NSF project with Prof. Luke Bao
Reconfiguration Aware Physical Layer Planning in SDR
• FPGAs today provide dynamic partial reconfiguration, but the reconfiguration time overhead is a deterrent given the QoS required for real‐time traffic such as VoIP
• The order in which communication protocols are reconfigured affects the total reconfiguration time overhead; solving the Sequence Generation Problem (SGP) finds the sequence that minimizes total reconfiguration time
• SGP can be embedded in a floorplanner to generate Sequence Aware Floorplans, which systematically reduce reconfiguration time overhead
Model of Reconfiguration
[Figure: an FPGA holding tasks Ti and Tj, with Cij marked between them.]
Sequence Matters
[Figure: graph over T1, T2, T3, T4 with edges labeled C12, C13, C14, C23, C34.]
T1‐>T2‐>T4‐>T3‐>T1
Reconfiguration Cost: C12 + 0 + C34 + C13
<
T1‐>T2‐>T3‐>T4‐>T1
Reconfiguration Cost: C12 + C23 + C34 + C14
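The comparison above can be reproduced with a brute‐force search over cyclic orders. This is only a sketch of the idea behind the Sequence Generation Problem: the numeric costs are made up for illustration, costs are assumed symmetric, and a pair with no edge in the graph (such as T2 and T4) is assumed to cost 0, as in the first sum above.

```python
from itertools import permutations


def cycle_cost(order, cost):
    """Total reconfiguration cost of visiting the tasks in order and returning to the start."""
    total = 0
    for a, b in zip(order, order[1:] + order[:1]):
        # Pairs without an edge cost nothing to switch between (assumption).
        total += cost.get(frozenset((a, b)), 0)
    return total


def best_cycle(tasks, cost):
    """Exhaustively search all cyclic orders that start at tasks[0]."""
    first, rest = tasks[0], tasks[1:]
    candidates = ([first] + list(p) for p in permutations(rest))
    return min(candidates, key=lambda order: cycle_cost(order, cost))


# Hypothetical values for C12, C13, C14, C23, C34.
cost = {
    frozenset(("T1", "T2")): 3,
    frozenset(("T1", "T3")): 4,
    frozenset(("T1", "T4")): 5,
    frozenset(("T2", "T3")): 6,
    frozenset(("T3", "T4")): 2,
}

tasks = ["T1", "T2", "T3", "T4"]
print(cycle_cost(["T1", "T2", "T4", "T3"], cost))  # C12 + 0 + C34 + C13
print(cycle_cost(["T1", "T2", "T3", "T4"], cost))  # C12 + C23 + C34 + C14
print(best_cycle(tasks, cost))
```

Exhaustive search is only practical for a handful of protocols; it is used here because it makes the effect of the order easy to see.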
Problem Modeling
[Figure: the task graph over T1, T2, T3, T4 (edges C12, C13, C14, C23, C34) modeled as a graph over vertices V1, V2, V3, V4 with edges labeled C12, C13, C14, C23, C24.]
Multiple Implementations per Design
[Figure: designs A and B, each with two implementations (I11, I12 and I21, I22), modeled as vertices V11, V12 and V21, V22 with edges labeled C11,21, C11,22, C12,21, C12,22.]
Parallel Implementation
[Figure: designs A and B with implementations I11, I12, I2, I3, modeled as vertices V11, V12, V11,2, V2, V3 with edges labeled C23, C2,12, C3,11, C3,12.]
Sequence‐Aware Floorplanner ‐ Methodology
[Flowchart: Input Netlist1, Netlist2, …, Netlistn; Multi‐Layer SP Generation; Initialize Temperature; Random Moves; Layout‐Compaction; Sequence Generation Problem; Cost Calculation; Move Evaluation; Move Accepted? (YES: Update Floorplan, Store the Best Floorplan; NO); Cool Down Temp.]
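The flow above is a simulated annealing loop. The sketch below shows the generic skeleton only (random move, cost calculation, accept or reject, store the best, cool down); it is not the floorplanner from the talk. The cooling schedule, its parameters, and the toy cost function standing in for the floorplan cost are all assumptions.

```python
import math
import random


def anneal(initial, cost, neighbor, t_start=10.0, t_end=0.01,
           alpha=0.9, moves_per_temp=20, seed=0):
    """Generic simulated annealing: returns the best state seen and its cost."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    t = t_start                                   # initialize temperature
    while t > t_end:
        for _ in range(moves_per_temp):
            candidate = neighbor(current, rng)    # random move
            candidate_cost = cost(candidate)      # cost calculation
            delta = candidate_cost - current_cost
            # Move evaluation: always accept improvements, and accept
            # worse moves with a probability that shrinks as t cools.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = candidate, candidate_cost
                if current_cost < best_cost:      # store the best state
                    best, best_cost = current, current_cost
        t *= alpha                                # cool down
    return best, best_cost


def swap_two(state, rng):
    """Hypothetical move: swap two positions in a sequence."""
    i, j = rng.sample(range(len(state)), 2)
    s = list(state)
    s[i], s[j] = s[j], s[i]
    return tuple(s)


def toy_cost(state):
    """Stand-in for the floorplan cost: distance of each item from its index."""
    return sum(abs(value - index) for index, value in enumerate(state))


start = (4, 3, 2, 1, 0)
best_state, best_value = anneal(start, toy_cost, swap_two)
print(best_state, best_value)
```

In a sequence‐aware floorplanner, the state would be the multi‐layer sequence pair and the cost would combine the terms listed earlier (wirelength, area, congestion, frames) with the result of the Sequence Generation Problem; here both are replaced by toy stand‐ins.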
Partial Reconfiguration Scheme
Number of Frames   Delay (ms)
                   Phase 1   Phase 2   Phase 3
1                  1.7       0.104     0.033
2                  3.2       0.218     0.065
[Figure: reconfiguration platform with MicroBlaze, ICAP, BRAM, SysAce Controller, Compact Flash, OPB_Timer, Local Memory, and LMB.]
Protocol Configuration Statistics
Wireless Protocols     Design Size (slices)   Similarity to A (%)   Similarity to B (%)   Similarity to C (%)   Similarity to D (%)
Protocol A (802.16a)   20640                  100                   67                    21                    33
Protocol B (802.11a)   20160                  67                    100                   25                    40
Protocol C (WCDMA)     14240                  21                    25                    100                   45
Protocol D (CDMA)      12640                  33                    40                    45                    100
Experiments Description
• Experiment 1: Comparing the best and worst sequence and cost from the floorplanner when configuration cost is not part of the cost function
• Experiment 2: Comparing two approaches: finding the best sequence after the designs are floorplanned vs. finding it while floorplanning the designs
• Comparing the Best/Worst sequence within the floorplanner
Experiment 1
(Cost in ms)
Device (row x column) (slices)   Best Seq/Cost
Custom FPGA (100 x 320)          ABCD/26.9
LX80 (112 x 320)                 ABCD/35.3
LX100 (128 x 384)                ABCD/6.7
LX160 (176 x 384)                ABCD/2.9
                        New Scheduling   EDF
Scenario a (19 tasks)   16               11
Scenario b (30 tasks)   24               17
Scenario c (34 tasks)   35               27
Scenario d (20 tasks)   20               11
Scenario e (36 tasks)   27               17
Best Effort Comparison between our scheduler and EDF algorithm
             New Scheduling   EDF
Scenario a   1.4              4.69
Scenario b   1.94             10.24
Scenario c   2.32             10.34
Scenario d   0.79             7.17
Scenario e   3.08             23.26
Reconfiguration Overhead comparison between our scheduler and EDF
Adaptive‐aware Synthesis for Reconfigurable Systems
Reconfigurable MPSoCs
• Reconfigurability for self‐adaptive systems to respond to unsupervised events
– Adaptivity to enhance performance/power (e.g. communication and DSP applications)
– Response to embedded sensor networks
• Reconfigurability for self‐healing systems
– Reliability concerns due to transient errors, thermal runaway, process variation, etc.
• Requirements
– Architectural support for dynamic and static adaptivity/healing management
– Early design planning with system‐level CAD tools
Dynamic Architecture Model (Virtex‐II like)
[Figure: CLB array with Width and Height axes, divided into Frames; tasks Ti and Tj placed on it (Computation), with On‐chip shared memory and Off‐chip memory (Memory + Communication).]
Key Concerns in Commercial Architectures with partial RTR
[Figure: device with On‐chip shared memory and Off‐chip memory.]
• Column‐based partial RTR
• Reconfiguration delay for convolution task (@100 MHz) greater than task execution time (for 256X256 image)!
• Sequential reconfiguration
• Placement constraints
• Exactly one task reconfigured in a single time‐instant
• Significant reconfiguration delay
• Delay‐hiding techniques such as configuration prefetch, configuration re‐use, etc.
Criticality of linear placement: simple example
[Figure: task graph of T1, T2, T3, T4 and two linear placements over columns C1 to C4 (axes: Width, Execution time), with time t2 marked; one placement is labeled Infeasible and the other Feasible.]
Maximize application performance by selecting parallelism granularity for individual data‐parallel tasks
Granularity: Determine workload of each task instance
Key Issues: Reconfiguration overhead, Load balancing
[Figure: four Width vs. Time schedules for one task. Sequential execution (E1); "Ideal" parallel execution with instances E11 to E14 (Ideal gain); Execution with reconfig overhead, with reconfigurations R12, R13, R14 before the later instances (Reduced gain); "Load‐balanced" Execution (Target gain).]
Simple equations for single task
Key issues: Precedence constraints
[Figure: Task chain of T1 and T2 and their instances, with Width vs. Time schedules showing execution blocks (E) and reconfiguration blocks (R) for Case A and Case B.]
Experiments: JPEG encoding
[Figure: three task graphs, each from Colour image to Compressed Image.]
• 256X256 / less area: RGB2YCrCB_1, RGB2YCrCB_2, DCT, Quantize, Huffman
• 256X256 / more area: RGB2YCrCB, DCT_1, DCT_2, Quantize, Huffman
• 512X512 / more area: RGB2YCrCB_1, RGB2YCrCB_2, DCT_1, DCT_2, Quantize_1, Quantize_2, Huffman
Problem Overview: Exploiting limited bandwidth
[Figure: System Architecture for on‐the‐fly computing. Platform FPGA with a statically configured part (PPC, Other Static, Local Memory), a run‐time (re)configured part (Task‐i, Task‐j), a high‐performance shared bus, and a Memory controller to OFF‐CHIP Memory.]
• Bandwidth is the key system resource
• Task execution time depends on bandwidth availability
• Problem Objective: Minimize application execution time with limited bandwidth
Task Microarchitecture
[Figure: task microarchitecture. Shared Communication Medium, Interface logic, Memory Access Logic, Rx Buffer and Tx Buffer in the Interface Clock Domain (Data PREFETCH); Core Logic in the Task Clock Domain (Task CORE).]
Theoretical principles for single task
[Figure: four Width vs. Time schedules, L1 to L4, of instances E11, E12, E13 with reconfigurations R12, R13, for BW = 3*BW_1 and Constraint = 2.5*BW_1; panel labels: Half Frequency, Equal Bandwidth, Different Bandwidth.]
Lemma: Assigning the remaining bandwidth to the last instance results in the fastest execution.
Experimental results for unsharp masking (512 X 512)
[Figure: two task graphs from Colour Image to Filtered Image.]
• Bandwidth = 630 MB/s, Schedule length = 18.8 ms: RGB2YCbCr, Blur, Sub, Add, YCbCr2RGB; node labels 140, 110, 210, 210, 140
• Constraint = 600 MB/s, Schedule length = 16.25 ms: RGB2YCbCr_1, RGB2YCbCr_2, Blur, Sub, Add, YCbCr2RGB_1, YCbCr2RGB_2; node labels 140, 110, 197, 197, 140, 60, 60
Challenges in System Design Tools
• Lack of design automation tools that are aware of hardware/software adaptation in the system
• No feedback and back‐end tool awareness to application layer
– Not aware of reconfiguration challenges at physical layer
• Need to develop automation tools to guide the designers both at physical layer and higher levels to converge and close the loop for effective adaptive systems
Adaptivity‐driven System Synthesis
[Figure label: Static Computation]