RICE UNIVERSITY Flexible wireless communication architectures Sridhar Rajagopal Department of Electrical and Computer Engineering Rice University, Houston TX Faculty Candidate Seminar – Southern Methodist University April 23, 2003 This work has been supported in part by NSF, Nokia and Texas Instruments
Rapid, structured architectures with flexibility-performance tradeoffs
Scalable Wireless Application-specific Processors
- Family of flexible programmable processors
  - Clusters of ALUs
  - High performance by supporting 100’s of ALUs
  - Can provide customization for various algorithms
  - Adapts (“swaps”) architecture dynamically for power
[Figure: clusters of ALUs drawn as "+" and "*" units. Labels: Scale Clusters; Scale ALUs]
Rapid, structured design for SWAPs
[Figure: design flow. Labels: Low “complexity”, parallel, fixed point algorithms; Architecture Exploration; ASIC design (apply); DSP design (apply); SWAPs]
Research vision summary
- Provide a structured framework to rapidly explore:
  - flexible, high performance, low power architectures (SWAPs)
- Efficient algorithm design for mapping to SWAPs
- Understanding of algorithms, DSPs and ASICs used
- Flexibility-performance trade-offs
Inter-disciplinary research: Wireless communications, VLSI Signal Processing, Computer
- Multiple algorithms
  - Different ALU, cluster requirements
- Turning off ALUs (–add –mul compiler options)
  - Use the right #ALUs from “explore” tool
- Turning off clusters
  - Data across SRF of all clusters
  - Cluster only has access to its own SRF
  - Next kernel may need data from SRF of other clusters
  - Reconfiguration support needs to be provided
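The cluster turn-off problem described in the last bullet can be illustrated with a small model. This is only a sketch under assumptions that are not in the slides: the stream is taken to be striped round-robin across per-cluster SRF banks, and `stripe`, `gather` and `restripe` are hypothetical helper names. It shows why a kernel running on fewer clusters needs the data moved first, since each cluster can read only its own bank.

```python
def stripe(stream, num_clusters):
    """Distribute stream elements round-robin over per-cluster SRF banks.

    Assumption: round-robin striping; the real SRF layout is not given in the talk.
    """
    banks = [[] for _ in range(num_clusters)]
    for i, x in enumerate(stream):
        banks[i % num_clusters].append(x)
    return banks


def gather(banks):
    """Recover the original stream order from round-robin striped banks."""
    n = sum(len(b) for b in banks)
    k = len(banks)
    return [banks[i % k][i // k] for i in range(n)]


def restripe(banks, active_clusters):
    """Re-layout the data so that only `active_clusters` banks hold it."""
    return stripe(gather(banks), active_clusters)


stream = list(range(8))
banks4 = stripe(stream, 4)    # layout used by a 4-cluster kernel
banks2 = restripe(banks4, 2)  # layout needed once 2 clusters are turned off

print(banks4)
print(banks2)
```

In this toy model, elements 2 and 6 sit in bank 2 under the 4-cluster layout but must be in bank 0 for the 2-cluster kernel, which is the kind of cross-bank movement the reconfiguration support has to provide.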
SWAPs provide cluster reconfiguration
[Figure: SRF and clusters connected by a mux-demux network with stream buffers. Labels: MDX1; MDX2; LATCH]
Additional latency (few cycles) due to microcontroller stalls
- Minimal loss in performance
Cluster reconfiguration for Viterbi
- Packet 1: constraint length 7 (16 clusters)
- Packet 2: constraint length 9 (64 clusters)
- Packet 3: constraint length 5 (4 clusters)
DP can be turned OFF
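The cluster counts on this slide are consistent with the number of trellis states of a constraint-length-K convolutional code, 2^(K-1), divided by four. The sketch below makes that relationship explicit. The figure of four states per cluster is inferred here from the slide's numbers rather than stated in the talk, and `clusters_for` is a hypothetical helper.

```python
STATES_PER_CLUSTER = 4   # assumption: inferred from 64/16, 256/64 and 16/4
MAX_CLUSTERS = 64        # largest configuration shown in the talk


def num_states(constraint_length):
    """Trellis states of a convolutional code with the given constraint length."""
    return 1 << (constraint_length - 1)


def clusters_for(constraint_length):
    """Clusters to keep active, capped at the size of the machine."""
    wanted = num_states(constraint_length) // STATES_PER_CLUSTER
    return min(MAX_CLUSTERS, max(1, wanted))


for k in (7, 9, 5):
    print(f"K = {k}: {num_states(k)} states, {clusters_for(k)} clusters")
```

Keeping the states-per-cluster ratio fixed is what would let the same kernel code serve all three packets, with only the number of active clusters changing between them.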
[Figure: execution time (cycles) for 64-bit, rate ½ packets: Packet 1 (K = 7), Packet 2 (K = 9), Packet 3 (K = 5). Legend: Clusters; Memory. Labels: Kernels (computation); No data memory accesses]
SWAPs provide flexibility at negligible overhead
SWAP exploration for Viterbi decoding
[Figure: frequency needed to attain real-time (in MHz, 1 to 1000) versus number of clusters (1 to 100), for K = 9, K = 7 and K = 5. Labels: Different SWAPs (without reconfiguration); Same SWAP (with reconfiguration); DSP; Max DP]
Ideal C64x (w/o co-proc) needs ~200 MHz for real-time
SWAPs: Salient features
- 1-2 orders of magnitude better than a DSP
- Any constraint length ⇒ 10 MHz at 128 Kbps
- Same code for all constraint lengths
  - no need to re-compile or load another code
  - as long as parallelism/cluster ratio is constant
- Power savings due to dynamic cluster scaling
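The clock-rate figures can be restated as a cycle budget per decoded bit, which makes the comparison with the DSP concrete. This is a worked calculation rather than anything from the talk's tools: it assumes 128 Kbps means 128,000 bits per second, uses the ~200 MHz ideal C64x figure quoted earlier in the talk, and `cycles_per_bit` is a hypothetical name.

```python
BIT_RATE_BPS = 128_000  # assumption: 1 Kbps = 1000 bit/s


def cycles_per_bit(clock_hz, bit_rate_bps=BIT_RATE_BPS):
    """Clock cycles available for each decoded bit at real-time rate."""
    return clock_hz / bit_rate_bps


swap_budget = cycles_per_bit(10e6)   # SWAP at 10 MHz
dsp_budget = cycles_per_bit(200e6)   # ideal C64x at ~200 MHz
ratio = dsp_budget / swap_budget

print(f"SWAP: {swap_budget} cycles/bit")
print(f"DSP:  {dsp_budget} cycles/bit")
print(f"DSP needs {ratio:.0f}x the clock rate")
```

With these inputs the DSP needs 20 times the clock rate of the SWAP, which falls inside the 1-2 orders of magnitude claimed on the slide.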
Expected SWAP power consumption
- Power model based on [Khailany’03]
- 64 clusters and 1 multiplier per cluster:
  - 0.13 micron, 1.2 V
  - Peak Active Power: ~9 mW at 1 MHz (DSP ~1 mW)
  - Area: ~53.7 mm²
- 10 MHz, 128 Kbps with reconfiguration
[Khailany’03] Brucek Khailany et al., “Exploring the VLSI Scalability of Stream Processors,” Proceedings of the Ninth Symposium on High Performance Computer Architecture, February 8-12, 2003
[Figure: power (in mW, 0 to 90) versus active clusters (0 to 70, max 64)]
Viterbi       Clusters used   Peak power
K = 9         64              ~90 mW
K = 7         16              ~28.57 mW
K = 5         4               ~13.8 mW
overhead      0               ~8.1 mW
DSP, K = 9    1               ~200 mW
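As a rough cross-check of the table, one can draw a straight line between its two endpoints: the overhead with 0 active clusters and the figure for all 64 clusters. This linear interpolation is an assumption made here for illustration, not the power model of [Khailany’03], and `estimate_power_mw` is a hypothetical name.

```python
OVERHEAD_MW = 8.1     # table entry for 0 active clusters
FULL_MW = 90.0        # table entry for 64 active clusters (K = 9)
MAX_CLUSTERS = 64


def estimate_power_mw(active_clusters):
    """Straight-line estimate between the two endpoints of the table."""
    per_cluster = (FULL_MW - OVERHEAD_MW) / MAX_CLUSTERS
    return OVERHEAD_MW + per_cluster * active_clusters


# (constraint length, clusters used, peak power reported on the slide)
reported = [(9, 64, 90.0), (7, 16, 28.57), (5, 4, 13.8)]
for k, clusters, slide_mw in reported:
    est = estimate_power_mw(clusters)
    print(f"K = {k}: estimate {est:.2f} mW, slide ~{slide_mw} mW")
```

The line matches the K = 7 entry to within rounding and lands a little under the K = 5 entry, so the reported figures are close to, but not exactly, linear in the number of active clusters.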
Multiuser Estimation-Detection+Decoding
Real-time target: 128 Kbps per user
[Figure: frequency needed to attain real-time (in MHz, 10 to 100000) versus number of clusters (1 to 100), for FAST, MEDIUM and SLOW fading scenarios. Labels: 32-user base-station; Mobile; DSP]
Ideal C64x (w/o co-proc) needs ~15 GHz for real-time
Expected SWAP power: base-station
- 32 user base-station with 3 X’s per cluster and 64 clusters:
  - 0.13 micron, 1.2 V
  - Peak Active Power: ~18.19 mW for 1 MHz (increased X)
  - Area: ~93.4 mm²
- Total Peak Base-station power consumption:
  - ~18.19 W at 1 GHz for 32 users at 128 Kbps/user
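The total follows from scaling the per-MHz figure up to the clock rate. A minimal sketch, assuming peak active power scales linearly with clock frequency at a fixed supply voltage, which is the scaling the slide's own numbers imply. The per-user split is simple division added here for illustration, and `peak_power_mw` is a hypothetical name.

```python
def peak_power_mw(mw_per_mhz, clock_mhz):
    """Peak active power, assuming linear scaling with clock frequency."""
    return mw_per_mhz * clock_mhz


USERS = 32
base_station_w = peak_power_mw(18.19, 1000) / 1000  # 1 GHz, converted mW to W
per_user_w = base_station_w / USERS

print(f"Base-station peak power: {base_station_w:.2f} W")
print(f"Per user: {per_user_w:.3f} W")
```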
Talk Outline
- Research vision
- SWAP Background
- Algorithm design for SWAPs
- Architecture design for SWAPs
- Current and Future Research Goals
Current research: Flexibility vs. performance
SWAPs: 128 Kbps at ~10-100 mW for Viterbi
  - Borrow DP from ASICs!
- suitable for base-stations
  - Flexibility more important than power
- suitable for mobile devices
  - Power constraints tighter
  - can be customized for further power savings
Handset SWAPs (H-SWAPs)
  - Borrow Task pipelining from ASICs!
  - Application-specific units and specialized comm. network
Handset SWAPs: H-SWAPs
- Trade Data Parallelism for Task Pipelining
[Figure: three design points. SWAPs (max. clusters and reconfigure): an SRF and many "+++***" clusters, labeled DP. SWAPlet (limit clusters): four "+++*" clusters, labeled Limited DP. H-SWAPs (collection of customized SWAPlets): SWAPlets with different ALU mixes ("+++*", "++*", "++++"), each labeled Limited DP]
Sample points in architecture exploration
- DSPs (1 cluster): ILP, Subword
- SWAPs (multiple): ILP, Subword, DP
- H-SWAPs (optimized for handsets): ILP, Subword, DP, Task Pipelining, Custom ALUs
Programmable solutions with increased customization
Performance, Power benefits (with decreasing flexibility)
Future: Efficient algorithms and mapping
[Figure: receiver block diagram. Labels: Multipath Channel; Equalizer; MRC; Decoder; Detector; Demodulator; Non-Coherent STC; Beam-forming; Coherent STC; Channel Estimator; Channel; Turbo Equalizer]
Multiple antenna systems with 1-2 orders-of-magnitude higher complexity
Future research: Architectures
Generalized and structured framework and tools
  - Joint algorithm-architecture exploration
  - Area-time-power-flexibility tradeoffs
Potential applications: embedded systems
- Image and Video processing:
  - Cameras: variety of compression algorithms
- Biomedical applications:
  - Hearing aids: DSP running on body heat*
- Sensor networks
  - Compression of data before transmission
*Quote: Gene Frantz, TI Fellow
SWAPs: Flexibility, Performance, Power
- Need flexibility in future wireless devices
  - Algorithms and Architectures
- Rapid Exploration for Scalable, Wireless Application-specific Processors
  - Structured approach with flexibility-performance trade-offs
- SWAPs - flexibility, high performance and low power
  - Exploit data parallelism like ASICs
  - 1-2 orders better performance than DSPs
  - Turn off unused clusters and unused ALUs for low power