Architectural Optimization of Decomposition Algorithms for Wireless Communication Systems
Ali Irturk†, Bridget Benson†, Nikolay Laptev‡, Ryan Kastner†
† Department of Computer Science and Engineering, University of California, San Diego
{airturk, b1benson, kastner}@cs.ucsd.edu
‡ Department of Computer Science, University of California, Los Angeles
[email protected]
April 2009
Motivation
Matrix decompositions are essential computations for wireless communications;
Matrix decompositions simplify matrix inversion, which is used in:
• Equalization algorithms to remove the effect of the channel on the signal;
• Minimum mean square error algorithms for pre-coding in spatial multiplexing;
• Detection-estimation algorithms in space-time coding.
[Figure labels: QR, A^-1]
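As an illustration of how a decomposition simplifies inversion, the sketch below factors A = QR and then obtains A^-1 = R^-1 Q^T by back substitution on the triangular factor. It is a minimal software sketch, not the hardware architecture from the slides: the function names are hypothetical, and modified Gram-Schmidt is assumed here only because it is short (the slides do not say which QR method is used).

```python
# Illustrative sketch only: names and the choice of Gram-Schmidt are assumptions.

def qr_gram_schmidt(a):
    """Return (q, r) with a = q * r, q orthonormal, r upper triangular."""
    n = len(a)
    cols = [[a[i][j] for i in range(n)] for j in range(n)]
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j][:]
        # Remove the components along the already computed orthonormal columns.
        for k in range(j):
            r[k][j] = sum(q_cols[k][i] * v[i] for i in range(n))
            v = [v[i] - r[k][j] * q_cols[k][i] for i in range(n)]
        r[j][j] = sum(x * x for x in v) ** 0.5
        q_cols.append([x / r[j][j] for x in v])
    q = [[q_cols[j][i] for j in range(n)] for i in range(n)]
    return q, r


def back_substitute(r, b):
    """Solve r * x = b for upper triangular r."""
    n = len(b)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(r[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - s) / r[i][i]
    return x


def invert_via_qr(a):
    """Invert a square, non-singular matrix using A^-1 = R^-1 * Q^T."""
    n = len(a)
    q, r = qr_gram_schmidt(a)
    # Column i of Q^T is row i of Q, so each solve yields one column of A^-1.
    inv_cols = [back_substitute(r, q[i]) for i in range(n)]
    return [[inv_cols[j][i] for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    for row in invert_via_qr([[4.0, 3.0], [6.0, 3.0]]):
        print(row)
```

The only divisions left after the factorization are by the diagonal entries of R, which is the sense in which the triangular form makes the inversion simpler.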
There are a number of tools that translate Matlab algorithms to a hardware description language;
However, we believe that the majority of these tools take the wrong approach;
We take a more focused approach, developing a tool that specifically targets matrix computation algorithms.
Computing Platforms
Platforms: ASICs, DSPs, FPGAs, GPU, CELL BE
• ASICs: exceptional performance; long time to market; substantial costs.
• DSPs: ease of development; fast time to market; low performance.
• FPGAs: ease of development; fast time to market; ASIC-like performance.
Major Contributions
Design of a novel tool, GUSTO, for automatic generation and optimization of application specific matrix computation architectures from a given Matlab algorithm;
Comparison of different matrix decomposition methods in terms of different matrix dimensions, bit widths and parallelism;
Thorough study of area and throughput tradeoffs of matrix decomposition architectures using different parameterizations;
A case study: Implementation of Adaptive Weight Calculation Core using QRD-RLS algorithm.
GUSTO: General architecture design Utility and Synthesis Tool for Optimization
GUSTO:
• is an easy-to-use tool for more efficient design space exploration and development;
• automatically generates and optimizes application specific architectures;
• creates a prototype hardware system in just minutes instead of days or weeks.
[Figure labels: GUSTO; Algorithm (e.g. QR decomposition); Bit width (e.g. 19 bits of precision); Resource Allocation (e.g. 4 multipliers and 3 adders); Modes (e.g. heterogeneous cores connected using hierarchical datapaths); HDL files; Error Analysis.]
[Plot: average error (log scale, 10^0 down to 10^-15) versus number of bits used (16 to 64).]
Outline
Motivation
GUSTO: Design Tool and Methodology
Decomposition Methods
Results
• Inflection Point Analysis
• Architectural Design Alternatives
Conclusions
GUSTO Design Flow
[Figure: GUSTO design flow. Stages: Algorithm Analysis; Instruction Generation; Resource Allocation; Error Analysis; Architecture Generation; Collecting Scheduling Information; Resource Trimming for Hardware Optimization. Inputs shown: Algorithm; Type and # of Arithmetic Resources; Design Library; Data Representation. Outputs shown: Error Analysis; Area, Latency and Throughput Results; Simulation Results. Other labels: General Purpose Architecture; Application Specific Architecture.]
GUSTO Design Flow
Algorithm Analysis
GUSTO provides options to divide the given algorithm into smaller processing elements which are small in area and highly optimized for throughput.
[Figure: a processing element (PE) containing an instruction controller, a memory controller and arithmetic units labeled A and M; a software defined radio built from multiple PEs.]
GUSTO Design Flow
Instruction Generation and Resource Allocation
[Figure labels: Type and # of Arithmetic Resources; Design Library (+, -, *, /).]
GUSTO uses instruction scheduling for better resource utilization and provides different scheduling methods.
GUSTO generates resource constrained architectures, i.e. the user chooses the number and type of arithmetic units.
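To make the idea of resource constrained scheduling concrete, here is a minimal list-scheduling sketch. It is not GUSTO's scheduler: the instruction format, the function name `list_schedule`, the unit names `mul` and `add`, and the single-cycle latency are all assumptions made for illustration.

```python
# Illustrative sketch only: a greedy list scheduler under a fixed resource budget.
# Assumption: every operation completes in one cycle.

def list_schedule(instructions, resources):
    """Assign each instruction an issue cycle.

    instructions: list of (name, unit, deps) tuples in priority order.
    resources: dict mapping unit type to the number of available units.
    Returns a dict mapping instruction name to its issue cycle.
    """
    issue_cycle = {}
    pending = list(instructions)
    cycle = 0
    while pending:
        free = dict(resources)
        issued = []
        for name, unit, deps in pending:
            # An instruction is ready once all its inputs were produced earlier.
            ready = all(d in issue_cycle and issue_cycle[d] < cycle for d in deps)
            if ready and free.get(unit, 0) > 0:
                free[unit] -= 1
                issued.append(name)
        if not issued:
            raise ValueError("no instruction can be issued; check deps and resources")
        for name in issued:
            issue_cycle[name] = cycle
        pending = [ins for ins in pending if ins[0] not in issue_cycle]
        cycle += 1
    return issue_cycle


# Hypothetical instruction stream for a*b + c*d + e*f.
PROGRAM = [
    ("m1", "mul", []),
    ("m2", "mul", []),
    ("m3", "mul", []),
    ("s1", "add", ["m1", "m2"]),
    ("s2", "add", ["s1", "m3"]),
]

if __name__ == "__main__":
    print(list_schedule(PROGRAM, {"mul": 1, "add": 1}))
    print(list_schedule(PROGRAM, {"mul": 3, "add": 1}))
```

Running the same program with one multiplier and then with three shows the kind of area versus latency tradeoff that the choice of arithmetic units controls.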
GUSTO Design Flow
Error Analysis
GUSTO employs fixed point arithmetic in generated architectures;
GUSTO performs error analysis to find an appropriate fixed point representation that provides results with accuracy similar to that of a floating point implementation.
Error analysis metrics:
1) Mean error
2) Peak error
3) Standard deviation of error
4) Mean percentage error
[Figure: user defined input data is processed by GUSTO, giving fixed point arithmetic results (using variable bit width), and by MATLAB, giving floating point arithmetic results (single/double precision); the two sets of results are compared.]
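The comparison between fixed point and floating point results can be sketched in a few lines. This is an assumed, simplified model rather than GUSTO's error analysis: `to_fixed` rounds to a grid with a given number of fractional bits, the dot product stands in for a real decomposition, and the exact metric definitions (absolute error, population standard deviation) are my assumptions since the slides only name the metrics.

```python
# Illustrative sketch only: function names, test data and metric definitions
# are assumptions, not taken from GUSTO.
import statistics


def to_fixed(x, frac_bits):
    """Round x to the nearest value representable with frac_bits fractional bits."""
    scale = 1 << frac_bits
    return round(x * scale) / scale


def fixed_dot(xs, ys, frac_bits):
    """Dot product where inputs, products and sums are kept on the fixed point grid."""
    acc = 0.0
    for x, y in zip(xs, ys):
        prod = to_fixed(to_fixed(x, frac_bits) * to_fixed(y, frac_bits), frac_bits)
        acc = to_fixed(acc + prod, frac_bits)
    return acc


def error_metrics(reference, measured):
    """Compute the four metrics named on the slide (reference values must be nonzero)."""
    errors = [abs(r - m) for r, m in zip(reference, measured)]
    return {
        "mean_error": statistics.fmean(errors),
        "peak_error": max(errors),
        "std_error": statistics.pstdev(errors),
        "mean_percentage_error": 100.0
        * statistics.fmean(abs(e / r) for e, r in zip(errors, reference)),
    }


# Hypothetical user defined input data.
VECTORS = [
    ([0.1, 0.2, 0.3], [0.4, 0.5, 0.6]),
    ([0.7, 0.05, 0.9], [0.33, 0.81, 0.12]),
]


def analyze(frac_bits):
    """Compare fixed point results against double precision for one bit width."""
    reference = [sum(x * y for x, y in zip(xs, ys)) for xs, ys in VECTORS]
    measured = [fixed_dot(xs, ys, frac_bits) for xs, ys in VECTORS]
    return error_metrics(reference, measured)


if __name__ == "__main__":
    for bits in (8, 12, 16, 20):
        print(bits, analyze(bits)["mean_error"])
```

Sweeping the bit width in this way is the software analogue of the error versus number-of-bits plot: one picks the smallest width whose error is acceptable.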
GUSTO Design Flow
Architecture Generation
GUSTO generates a CPU-like architecture with:
• Dynamic instruction scheduling;
• Dynamic memory assignments;
• Full connectivity between functional units.
[Figure: instruction controller, memory controller and arithmetic units (adders and multipliers) with full connectivity, dynamic instruction scheduling and dynamic memory assignments.]
GUSTO Design Flow
Collecting Scheduling Information
GUSTO collects scheduling information from instruction and memory controllers.
GUSTO uses this information to eliminate unneeded resources, automatically creating a small, fast statically scheduled architecture.
[Figure: instruction controller, memory controller and arithmetic units (adders and multipliers) with full connectivity, static instruction scheduling and static memory assignments.]
GUSTO Design Flow
Resource Trimming for Hardware Optimization
GUSTO simulates the architecture to determine the usage of arithmetic units, multiplexers, register entries and input/output ports, and trims away the unused components together with their interconnects.
GUSTO's optimization provides tremendous silicon savings while ensuring the correctness of the solution.
[Figure: multiplier, adder and memory with full connectivity before trimming, and with only the required connectivity after trimming.]
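The trimming step can be pictured as a filter over the interconnect: start from full connectivity, record which source-to-input links the simulation actually exercises, and keep only those. The sketch below is a toy model under that assumption; the function name `trim_connectivity` and the usage trace are hypothetical, and only the port names are borrowed from the slides.

```python
# Illustrative sketch only: a toy model of usage-based trimming, not GUSTO's code.

SOURCES = ["Out_A", "Out_B", "Out_mem1", "Out_mem2"]
INPUTS = ["In_A1", "In_A2", "In_B1", "In_B2", "In_mem1"]

# Full connectivity: every input port can select any output port.
FULL = {dest: list(SOURCES) for dest in INPUTS}

# Hypothetical trace of (source, destination) transfers seen during simulation.
TRACE = [
    ("Out_mem1", "In_A1"),
    ("Out_mem2", "In_A2"),
    ("Out_A", "In_A1"),
    ("Out_A", "In_mem1"),
    ("Out_mem1", "In_A1"),
]


def trim_connectivity(full_connectivity, usage_trace):
    """Keep only the links that appear in the trace; drop inputs that are never driven."""
    used = set(usage_trace)
    trimmed = {}
    for dest, sources in full_connectivity.items():
        kept = [s for s in sources if (s, dest) in used]
        if kept:
            trimmed[dest] = kept
    return trimmed


if __name__ == "__main__":
    trimmed = trim_connectivity(FULL, TRACE)
    before = sum(len(v) for v in FULL.values())
    after = sum(len(v) for v in trimmed.values())
    print(trimmed)
    print("links before:", before, "after:", after)
```

In this toy trace unit B is never used, so both of its inputs disappear entirely, which mirrors the removal of unused components along with their interconnects.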
GUSTO Trimming Feature
[Figure: arithmetic units A and B and a memory (mem). Unit A has inputs In_A1 and In_A2 and output Out_A; unit B has inputs In_B1 and In_B2 and output Out_B; the memory has input In_mem1 and outputs Out_mem1 and Out_mem2. For each unit, the simulation runs record which of Out_A, Out_B, Out_mem1 and Out_mem2 drive its inputs: the bit pattern shown against In_A1 and In_A2 is 01011010, and the bit pattern shown against In_B1 and In_B2 is 00000000.]
Adaptive Weight Calculation (AWC) Core
• F. Edman, V. Öwall, “A Scalable Pipelined Complex Valued Matrix Inversion Architecture”, IEEE International Symposium on Circuits and Systems (2005).
• M. Karkooti, J.R. Cavallaro, C. Dick, “FPGA Implementation of Matrix Inversion Using QRD-RLS Algorithm”, Asilomar Conference on Signals, Systems and Computers (2005).
• C. Dick, F. Harris, M. Pajic, D. Vuletic, “Real-Time QRD-Based Beamforming on an FPGA Platform”, Asilomar Conference on Signals, Systems and Computers (2006).
Conclusions
GUSTO: General architecture design Utility and Synthesis Tool for Optimization
GUSTO is a tool that automatically generates and optimizes a variety of application specific processing elements (PEs) with different parameterization options;
Current projects include the implementation of:
• A short preamble processing unit for an OFDM receiver design.