ISSCC 2000, DSP Tutorial. Ingrid Verbauwhede, Chris Nicol
DSP Architectures for Next-Generation Wireless Communications
Chris Nicol, Bell Laboratories Australia, Lucent Technologies, [email protected]
Ingrid Verbauwhede, Department of Electrical Engineering, University of California Los Angeles, [email protected]
Mobile Wireless Trends
[Chart: global wireline and global wireless subscribers (000), 1995-2010; vertical axis 0 to 1,600,000]
• Wireless: CAGR 21%, global penetration (2010) 21% (Cellular + PCS + WLAS + Other)
• Wireline: CAGR 5%, global penetration (2010) 20%
• Global population: 7 billion, CAGR 1995-2010 1.4%
World-wide deployment of mobile communications is exceeding expectations
Pipeline: RISC compared to DSP
RISC example:
r0 = *p0;       // load data
a0 = a0 + r0;   // execute
[Figure: pipeline timing diagram, three overlapped instructions; stage labels: Fetch, Decode, Memory Access, Execute]
Too expensive for DSP
DSP: memory-intensive applications
[Figure: pipeline timing diagram, four overlapped instructions; stage labels: Fetch, Decode, Memory Access, Execute]
Penalty: data-dependent branch is expensive
Other control features
Hardware looping:
• Because software branch is expensive
• “Zero overhead hardware loops” (for tight FIR loops)
hardware supported
Interrupts: hardware with shadow registers for extremely fast context switching.
Special instruction cache:
• Single instruction “repeat” buffer
• Multiple instruction cache: under programmer's control!
• E.g. Lucent DSP16210: 31 x 32 instruction cache
Predictable worst case execution time!
Low Power DSPs
C54x, 1V DSP (Texas Instruments - ISSCC 1997)
0.35µ 3LM CMOS
• 80 M 16b MAC/s at 3.3V
• 1.4 mW/MHz at 3.3V
• 30 µW stand-by power
0.25µ 3LM CMOS
• 65 M 16b MAC/s at 1.0V
• 0.21 mW/MHz at 1.0V
• 4.0 mW stand-by power
Dual Vt process
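The two mW/MHz figures on the low-power slide can be compared against first-order CMOS scaling. The sketch below assumes that dynamic energy per cycle scales with the square of the supply voltage and ignores the capacitance change between the 0.35µ and 0.25µ processes; the function name is illustrative and does not come from the tutorial.

```python
def scaled_energy_per_cycle(mw_per_mhz, v_from, v_to):
    """First-order estimate: dynamic energy per cycle scales as V^2."""
    return mw_per_mhz * (v_to / v_from) ** 2

# Figures quoted on the slide.
reported_3v3 = 1.4    # mW/MHz at 3.3V
reported_1v0 = 0.21   # mW/MHz at 1.0V

predicted_1v0 = scaled_energy_per_cycle(reported_3v3, 3.3, 1.0)
print(f"V^2 scaling alone predicts {predicted_1v0:.3f} mW/MHz at 1.0V")
print(f"slide reports {reported_1v0} mW/MHz, a {reported_3v3 / reported_1v0:.1f}x reduction")
```

Under this assumption, voltage scaling accounts for most of the reported reduction; the remaining gap would depend on process and circuit details that the slide does not give.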
BUT: DSP Software Development
• Complex DSP architecture not amenable to compiler technology
• Algorithms are modeled in high level language (e.g. C++)
• Solutions are implemented and debugged in hand-optimized assembler - large development effort with minimal tool support
[Figure: development flow from HLL algorithmic model to prototype code to production code; hand coded assembler; optimize & debug]
Long, frustrating time to market
Fragile legacy code
Still used in handhelds, but changing in basestations (see Part II)
Mobile Wireless Evolution
[Chart rows: SERVICE and TECHNOLOGY, by generation]
First Generation (Past)
• Mobile Telephone Service: Carphone
• Analog Cellular Technology
• Macrocellular Systems
• Standards: NMT, TACS, Analog AMPS
Second Generation (Now)
• Digital Voice and Messaging/Data Services
• Fixed Wireless Loop
• Digital Cellular Technology + IN emergence
• Microcellular & Picocellular: capacity, quality
• Enhanced Cordless Technology
• Standards: GSM, IS-54/136 TDMA, IS-95/cdmaOne, PDC, DECT
Third Generation (Year 2000-2005)
• Integrated High Quality Audio and Data. Narrowband and Broadband Multimedia Services + IN integration
• Broader Bandwidth Efficient Radio Transmission
• Information Compression
• Higher Frequency Spectrum Utilization
• IN + Network Management integration
• Standards: WCDMA, UWC-136 TDMA, cdma2000
Fourth Generation (Year 2010?)
• TelePresencing
• Education, training and dynamic information access
• Wireless-Wireline and Broadband Transparency
• Knowledge-Based Network Operations
• Unified Service Network
Global roaming
We are entering the decade of wireless data communications - and World-War 3G
Mobile Data Services
• Carriers invest >$500 per subscriber but subscriber voice calls (and therefore revenues) are reducing.
• Data currently 3% of wireless traffic - projected to >50% by 2005
• Wireless Internet: Average internet connection 30 mins
• Text Messaging: Saturating 2G voice networks
2.5 Generation Mobile Standards [1]
• GPRS: Packet Data over GSM - timeslot multiplexing, multi-slots per user.
• EDGE: 8-PSK modulation + GPRS, 384 Kbps max to 1 user.
Evolution of Mobile Wireless Network Architecture
[Figure: 2G Network compared with IP-based 3G Network. Labels: Radio Clients; Base Stations; BSC; MSC; Mobile Switches; Circuit Mode Servers (Voice, Low Speed Data, etc.); Packet Mode Servers (High Speed Data, Multimedia, Voice over IP, etc.); Wireless Control Servers (Feature Control, Network Management, Billing, etc.); Network Servers; Packet Connectivity (ATM / IP); PSTN; Internet / Advanced Services]
Mobile networks are being upgraded in preparation for the delivery of high speed data services.
Mobile Wireless Infrastructure
[Photos: Macro-cell GSM Basestation (6-12 TRX); Micro-cell GSM Basestation (2 TRX)]
2G Basestation Baseband Processing
• Multiple DSPs used for baseband processing.
• RISC Microcontroller for timing, framing, I/O control
• Software upgradable over the network
• DSPs dominate cost and power consumption
[Figure: Tx/Rx baseband processing board for 2-carrier GSM basestation. Labels: DSPs for Channel Equalization, Channel De/coding and Encryption; RISC Micro Controller; RAM; I/O; I/O ASIC; AFE (Tx, Rx); T1/E1]
3G Basestation Baseband Processing
• Increased Receiver Algorithm Sensitivity
• Antenna Arrays - Smart Antennas
• Multi-Standard Basestations using Software Radio Architecture
• 3G - constraint length 9, rate 1/2 convolutional coding for voice.
• 3G - constraint length 4, Turbo codes for data
Increased DSP performance needed in next-generation basestation
High Performance DSPs + Custom Logic needed for 3G (Viterbi decoding and Turbo decoding)
[Figure: 3G receiver block diagram, with the implementation target of each block]
• Sliding correlator: despreading (ASIC)
• RAKE combiner: reassemble multipath (DSP, ASIC)
• Deinterleaver (DSP)
• Decoder: Viterbi algorithm, Turbo decoding (DSP, ASIC)
• Code tracking: delay-lock-loop (ASIC, DSP)
• Channel estimation (DSP)
• Code generator (two instances): channelisation code, scrambling code (ASIC)
• Synchronisation: cell search, slot sync, frame sync (DSP)
• Path search (ASIC)
• SIR measurement: fast power control (DSP)
• Power control
Courtesy: Bing Xu: Bell Labs Australia
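The despreading step in the receiver chain above can be illustrated with a minimal correlator. This is only a sketch: it assumes chip timing is already known (the job of the path search and code tracking blocks), uses made-up 8-chip codes, and ignores noise, multipath and scrambling. The names `spread`, `despread`, `code_a` and `code_b` are hypothetical.

```python
def spread(symbols, code):
    """Multiply each +/-1 symbol by every chip of the spreading code."""
    return [s * chip for s in symbols for chip in code]

def despread(chips, code):
    """Correlate the received chips with the code, one output per code period."""
    n = len(code)
    symbols = []
    for start in range(0, len(chips) - n + 1, n):
        acc = sum(c * k for c, k in zip(chips[start:start + n], code))
        symbols.append(acc / n)
    return symbols

# Two hypothetical orthogonal 8-chip codes (their dot product is zero).
code_a = [1, -1, 1, 1, -1, 1, -1, -1]
code_b = [1, 1, -1, -1, 1, 1, -1, -1]

user_a = [1, -1, 1]
user_b = [-1, -1, 1]

# Both users transmit at once; the receiver sees the sum of their chip streams.
received = [x + y for x, y in zip(spread(user_a, code_a), spread(user_b, code_b))]
print(despread(received, code_a))
print(despread(received, code_b))
```

Each output costs one multiply-accumulate per chip at the chip rate, which is consistent with the slide assigning this block to an ASIC rather than a DSP.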
Receiver Algorithms for GSM Basestation
• Enhanced Receiver Sensitivity
• Larger Cells in Suburban Areas = Reduced network cost
• Mobile transmits with less power = Increased battery life
Existing Receiver: Estimating Wireless Channel, Equalizing Multi-path Effects, Channel Decoding, Speech Decoding
New Iterative Receiver: Estimating Wireless Channel, Equalizing Multi-path Effects, Channel Decoding, Speech Decoding, plus Speech Statistics (1.3dB improvement)
Challenge - requires 6x DSP MIPS of existing receiver in basestation
Courtesy: Magnus Sandell: Bell Labs UK
Smart Antennas
[Figure: Omnidirectional Cell Site; Three Sector Cell Site; Intelligent Antenna Cell Site]
• A multiple antenna element system
• Combined with a base station architecture and signal processing techniques designed to dynamically select or form the “optimum” beam pattern per user
Increased cost in RF electronics and enhanced DSP requirements.
Fixed Multi-Beam Versus Adaptive Beam
Fixed Multi-Beam: select from, or use, multiple “fixed” antenna beams to optimize performance.
Adaptive Beam: adaptively “weight” and combine multiple antenna elements to optimize performance.
[Figure: both schemes serving Mobile 1 and Mobile 2 in the presence of an interferer, showing direct and reflected rays]
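The "weight and combine" operation on the adaptive-beam slide can be sketched with a conventional (delay-and-sum) beamformer. This is a simplification under stated assumptions: a uniform linear array with half-wavelength spacing, a known direction for the wanted mobile, and fixed weights, whereas a truly adaptive system would update the weights from received data. All names and angles are illustrative.

```python
import cmath
import math

def steering_vector(theta_deg, n_elements, spacing_wavelengths=0.5):
    """Phase seen at each element for a plane wave arriving from theta_deg."""
    phase = 2 * math.pi * spacing_wavelengths * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * phase * n) for n in range(n_elements)]

def combine(weights, samples):
    """Weighted sum of the element samples: one complex MAC per element."""
    return sum(w.conjugate() * x for w, x in zip(weights, samples))

def array_gain(weights, theta_deg):
    """Magnitude of the array response toward theta_deg."""
    return abs(combine(weights, steering_vector(theta_deg, len(weights))))

N_ELEMENTS = 8
# Steer the beam toward a mobile assumed to be at +20 degrees.
weights = [a / N_ELEMENTS for a in steering_vector(20.0, N_ELEMENTS)]

print(f"gain toward mobile (+20 deg):     {array_gain(weights, 20.0):.3f}")
print(f"gain toward interferer (-40 deg): {array_gain(weights, -40.0):.3f}")
```

The combining step costs one complex multiply-accumulate per antenna element per sample for every user, which is one source of the enhanced DSP requirements noted on the Smart Antennas slide.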
Wideband Receiver Architecture
[Figure: receiver chain with RF-IF & Filter, High Speed A/D, Digital Channeliser, and Baseband Processing for each of channels CH1 ... CHM. Spectrum sketches show channels CH1, CH2, CH3, ..., CHM around fRF, fIF and fBB, and the individual channels CH1 ... CHM after channelisation.]
Increased DSP performance needed for Software Radio
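The digital channeliser in the wideband receiver can be sketched as a per-channel digital down-converter: mix the channel to baseband, low-pass filter, and decimate. The version below is a toy under stated assumptions: complex input samples, a block average standing in for the low-pass filter, and invented frequencies. The function name and parameters are hypothetical; practical channelisers often use more efficient structures such as polyphase filter banks.

```python
import cmath
import math

def channelise(samples, fs, centre_freqs, decimation):
    """Mix each channel to baseband, then average and decimate (crude low-pass)."""
    outputs = []
    for fc in centre_freqs:
        mixed = [x * cmath.exp(-2j * math.pi * fc * n / fs)
                 for n, x in enumerate(samples)]
        outputs.append([sum(mixed[i:i + decimation]) / decimation
                        for i in range(0, len(mixed) - decimation + 1, decimation)])
    return outputs

FS = 8000.0                          # assumed sample rate in Hz
CENTRES = [1000.0, 2000.0, 3000.0]   # assumed channel centre frequencies

# Wideband input containing a single carrier in the second channel.
wideband = [cmath.exp(2j * math.pi * 2000.0 * n / FS) for n in range(64)]

channels = channelise(wideband, FS, CENTRES, decimation=8)
for fc, ch in zip(CENTRES, channels):
    print(f"channel at {fc:.0f} Hz: mean magnitude {sum(abs(v) for v in ch) / len(ch):.3f}")
```

Every input sample is multiplied once per channel at the full A/D rate, which illustrates why the slide ties software radio to increased DSP performance.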
Turbo Codes
• Parallel concatenation of convolutional codes is used to give the codes structure so they can be decoded
• Pseudorandom interleaving is used to give the codes performance which approaches that for random coding
• Resulting encoder structure: Two Recursive Systematic Convolutional (RSC) Codes
[Figure: turbo encoder. The input feeds Encoder #1 directly and Encoder #2 through the interleaver; a MUX produces the parity output; the input is also passed through as the systematic output.]
For 3G Wireless (UMTS and CDMA2000)
• Voice service: BER requirement 10^-3
• Data service: BER requirement 10^-5
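The encoder structure on the Turbo Codes slide can be sketched in a few lines. The constituent code below assumes the constraint-length-4 polynomials commonly quoted for the 3GPP turbo code (feedback 1 + D^2 + D^3, feedforward 1 + D + D^3); the slide itself does not give them. The interleaver is an arbitrary fixed permutation, and trellis termination and puncturing are left out. For simplicity the output stream interleaves the systematic bit with both parity bits.

```python
def rsc_parity(bits):
    """Parity sequence of a recursive systematic convolutional (RSC) encoder."""
    s1 = s2 = s3 = 0                  # shift register contents: D, D^2, D^3
    parity = []
    for u in bits:
        a = u ^ s2 ^ s3               # feedback polynomial 1 + D^2 + D^3 (assumed)
        parity.append(a ^ s1 ^ s3)    # feedforward polynomial 1 + D + D^3 (assumed)
        s1, s2, s3 = a, s1, s2
    return parity

def turbo_encode(bits, interleaver):
    """Rate-1/3 parallel concatenation: systematic bit plus two parity bits."""
    parity1 = rsc_parity(bits)
    parity2 = rsc_parity([bits[i] for i in interleaver])
    out = []
    for u, p1, p2 in zip(bits, parity1, parity2):
        out += [u, p1, p2]
    return out

data = [1, 0, 1, 1, 0, 0, 1, 0]
permutation = [3, 7, 0, 5, 2, 6, 1, 4]   # hypothetical pseudorandom interleaver
codeword = turbo_encode(data, permutation)
print(codeword)
```

Because the encoders are recursive, a single input 1 produces a parity sequence that never dies out, which is part of what lets the interleaver break up low-weight codewords.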
Turbo Decoding
• Key idea: iterative decoding (up to 10 iterations for 3G)
• There is one decoder for each elementary encoder.
• Each decoder estimates the a-posteriori probability (APP) of each data bit.
• The APPs are used as a priori information by the other decoder.
[Figure: turbo decoder. A DeMUX separates systematic data and parity data; Decoder #1 and Decoder #2 exchange APP estimates through interleavers and a deinterleaver; the output is hard bit decisions.]
Soft-Output Decoding Algorithms
Requirements for Turbo:
– Accept Soft-Inputs in the form of a priori probabilities (the other decoder's APP estimates)
– Produce APP estimates of the data.
– “Soft-Input Soft-Output”
Trellis-Based Estimation Algorithms
• Sequence Estimation: Viterbi Algorithm, SOVA, Improved SOVA
• Symbol-by-symbol Estimation: MAP Algorithm, log-MAP, max-log-MAP
SOVA and log-MAP use modified Add-Compare-Select operations: they not only select the maximum path metric but also need to keep the difference.
Today’s High-performance DSPs are highly MAC-focussed (for filtering in modem applications). Some DSPs provide hardware support for efficient implementation of Viterbi - none support SOVA or log-MAP
Iterative channel estimation also uses Soft-Input Soft-Output decoders.
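The modified Add-Compare-Select mentioned on the Soft-Output Decoding slide can be sketched as follows. The first function is a plain ACS that also returns the metric difference; the second is the log-MAP combining step, which uses that difference in a correction term via the identity ln(e^a + e^b) = max(a, b) + ln(1 + e^-|a-b|). Dropping the correction term gives max-log-MAP. The function names and the "larger metric wins" convention are assumptions made for illustration.

```python
import math

def acs_with_difference(pm0, bm0, pm1, bm1):
    """Add path and branch metrics, compare, select the larger, keep the difference.

    Returns (survivor metric, decision bit, metric difference).
    """
    cand0 = pm0 + bm0
    cand1 = pm1 + bm1
    decision = 1 if cand1 > cand0 else 0
    return max(cand0, cand1), decision, abs(cand0 - cand1)

def max_star(a, b):
    """log-MAP combining: exact ln(e^a + e^b) from the max and the difference."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def max_log(a, b):
    """max-log-MAP approximation: the correction term is dropped."""
    return max(a, b)

print(acs_with_difference(3.0, 1.0, 2.0, 4.0))
print(max_star(1.0, 2.0), max_log(1.0, 2.0))
```

A Viterbi-only ACS unit discards the difference after the compare, which is consistent with the slide's remark that existing Viterbi hardware support does not cover SOVA or log-MAP.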
The Maximum A Posteriori (MAP) Algorithm
$$L(u_k) = \ln\frac{\Pr[u_k = 1 \mid \mathbf{y}]}{\Pr[u_k = 0 \mid \mathbf{y}]} = \ln\frac{\sum_{(s',s):\,u_k=1} p(s', s, \mathbf{y})}{\sum_{(s',s):\,u_k=0} p(s', s, \mathbf{y})}$$
where 1) u_k is the kth bit of the desired data sequence, 2) y is the observed sequence, 3) the state transitions from state s' at time k-1 to state s at time k, 4) we want to evaluate this LLR for every k.
Log-Likelihood Ratio:
$$L(d) = \ln\frac{\Pr[d = 1]}{\Pr[d = 0]}$$
$$L(d \mid y) = \ln\frac{\Pr(d = 1 \mid y)}{\Pr(d = 0 \mid y)} = \ln\frac{p(y \mid d = 1)}{p(y \mid d = 0)} + L(d)$$
• A priori value of Pr[d=1], Pr[d=0]
• Output of decoder contains additional extrinsic information
• The sum of the a priori information and the extrinsic information will be the a priori information for the next stage of decoding, for either the 2nd decoder or the 1st decoder in the next iteration
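The log-likelihood-ratio relation on this slide says that the a posteriori LLR is the channel term plus the a priori LLR. The sketch below checks that numerically for a single bit by applying Bayes' rule directly; the probabilities and function names are made up for illustration.

```python
import math

def llr(p_one):
    """L(d) = ln(Pr[d=1] / Pr[d=0]) for a binary variable."""
    return math.log(p_one / (1.0 - p_one))

def posterior_llr(lik_one, lik_zero, prior_one):
    """L(d|y) computed directly from Bayes' rule (the common factor p(y) cancels)."""
    post_one = lik_one * prior_one
    post_zero = lik_zero * (1.0 - prior_one)
    return math.log(post_one / post_zero)

# Hypothetical values: p(y|d=1), p(y|d=0) and the prior Pr[d=1].
lik_one, lik_zero, prior_one = 0.8, 0.3, 0.6

direct = posterior_llr(lik_one, lik_zero, prior_one)
as_sum = math.log(lik_one / lik_zero) + llr(prior_one)
print(direct, as_sum)
```

Working in the log domain turns the product of probabilities into a sum, which is why each decoder can hand its output to the next one as an additive a priori term.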
• All 8 exec units used in inner loop - maximum efficiency
– 2 MACs per cycle
Hand-coded assembly: 32-tap FIR filter
• Assembly syntax more difficult to learn.
• Hard to get full use of all 8 execution units at once.
• Software pipelining difficult to implement, and requires longer prolog/epilog (larger code size).
Courtesy: Gareth Hughes: Bell Labs Australia
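As a point of reference for the 32-tap FIR benchmark, the sketch below is a plain software model of the filter together with an idealised cycle count. It is not the hand-coded assembly from the tutorial; the function names are illustrative, and the cycle estimate counts only multiply-accumulates, ignoring the prolog/epilog overhead the slide mentions.

```python
def fir(samples, coeffs):
    """Direct-form FIR: y[n] = sum over k of coeffs[k] * x[n-k], zero history."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]   # one multiply-accumulate (MAC)
        out.append(acc)
    return out

def ideal_mac_cycles(n_outputs, n_taps, macs_per_cycle):
    """Lower bound on inner-loop cycles when every MAC unit is kept busy."""
    return n_outputs * n_taps / macs_per_cycle

print(fir([1, 0, 0, 0], [1, 2, 3]))        # impulse response reproduces the taps
print(ideal_mac_cycles(1, 32, 2))          # 32 taps at 2 MACs per cycle
```

At 2 MACs per cycle a 32-tap filter needs at least 16 inner-loop cycles per output sample, which is the figure a fully software-pipelined loop aims for.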
        shl   .s2    b14,2,b14
|| [a1] mpy   .m1    1,a11,a12
||      add   .s1    a7,a4,a10
||      sub   .l1x   b13,a4,a11
||      add   .l2    b13,b5,b11
||      mpy   .m2    1,b10,b12
||      ldh   .d2    *b4++[2],a7
||      ldh   .d1    *a5++[2],b13
; end of LOOP
[Table: functional-unit schedule, cycles 0-31. Empty slots were lost in extraction, so operations are listed in order per unit without cycle alignment.]
Even cycles (0, 2, ..., 30):
.D1: STH new 8 | LDH old0 | STH new 0 | STH new 8 | LDH old0 | STH new 0 | STH new 8 | LDH old0 | STH new 0 | STH new 8 | LDH old0 | STH new 0 | STH new 8 | LDH sd1 | STH m[2] | STH m[3]
.L1: CMPGT t0 | SUB b0 | CMPGT t0 | SUB b0 | CMPGT t0 | SUB b0 | CMPGT t0 | SUB b0 | ADD m0 | SUB -m0
.L2: CMPGT t8 | ADD b8 | SUB a8 | CMPGT t8 | ADD b8 | SUB a8 | CMPGT t8 | ADD b8 | SUB a8 | CMPGT t8 | ADD b8 | SUB a8 | SUB old | SUB -m1 | SUB m1 | SUB I
.S1: B JLOOP | ADD a0 | SUB k | B JLOOP | ADD a0 | SUB k | B JLOOP | ADD a0 | SUB k | B JLOOP | ADD a0 | SUB k
.S2: SUB j | SHL tr | *ADD t0,t8 | SUB j | SHL tr | *ADD t0,t8 | SUB j | SHL tr | *ADD t0,t8 | SUB j | SHL tr | *ADD t0,t8 | ADD tr | B JLOOP | MVK j
Odd cycles (1, 3, ..., 31):
.D1: STH new 0 | STH new 8 | LDH old0 | STH new 0 | STH new 8 | LDH old0 | STH new 0 | STH new 8 | LDH old0 | STH new 0 | STH new 8 | LDH old0 | STH new 0 | STH m[0] | STH m[1] | LDH old1
Separate Address and Data busses - each with pipelined protocol
Arbiter (round-robin)
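Round-robin arbitration, as named on the bus slide, can be modelled in a few lines. This is a behavioural sketch under assumptions not in the source: one grant per arbitration round, and the search for the next winner starts just after the most recent one. The class and method names are hypothetical.

```python
class RoundRobinArbiter:
    """Grants the bus to requesters in rotating priority order."""

    def __init__(self, n_requesters):
        self.n = n_requesters
        self.last = n_requesters - 1   # so requester 0 has first priority

    def grant(self, requests):
        """requests is a list of booleans; returns the granted index or None."""
        for offset in range(1, self.n + 1):
            idx = (self.last + offset) % self.n
            if requests[idx]:
                self.last = idx
                return idx
        return None

arbiter = RoundRobinArbiter(4)
everyone = [True, True, True, True]
print([arbiter.grant(everyone) for _ in range(5)])
```

Because the winner moves to the back of the queue, no requester can be starved, which matters when several DSPs contend for the same bus.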
Memory Hierarchy in MIMD DSPs
Multiple copies of 1 application (e.g. odd/even slot channel equalisation)
Mix of different applications (e.g. equalisation, convolutional decoding)
• Heterogeneous mix of applications
• Multiple copies of same software - Shared memory multiprocessing
Flat Memory Architecture vs. Hierarchical Memory Architecture
[Figure: flat architecture with an SRAM per DSP, 2 copies of software (inefficient); hierarchical architecture with a cache per DSP sharing one DRAM, 1 copy of software]
Shared Memory Multiprocessing
64 Semaphores provided for process synchronization
[Figure: four DSPs and the memory controller on a shared bus. One DSP issues an access to shared data as a coherent transaction; the other caches snoop it (miss, hit, miss).]
Access to shared data uses coherent transaction. Caches “snoop” the address and query their tag RAMs. A cache hit prevents the memory controller from servicing the request.
L-1 cache coherency using a snoopy protocol (modified MESI used)
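The snooping behaviour described on the Shared Memory Multiprocessing slide can be illustrated with a toy model. It makes several assumptions beyond the slide: a cache that hits on a snoop supplies the data itself, a dictionary stands in for the tag and data RAMs, and no MESI states are tracked. `Cache`, `coherent_read` and the addresses are hypothetical names and values.

```python
class Cache:
    """Toy L1 cache: a dict stands in for the tag RAM and data RAM."""

    def __init__(self, name):
        self.name = name
        self.lines = {}                     # address -> data

    def snoop(self, address):
        """Tag lookup performed when another processor's address is observed."""
        return address in self.lines

def coherent_read(requester, address, caches, memory):
    """Return (data, source) for a coherent read of shared data.

    A snoop hit in another cache supplies the data and stops the memory
    controller from servicing the request (assumed behaviour).
    """
    for cache in caches:
        if cache is not requester and cache.snoop(address):
            data, source = cache.lines[address], cache.name
            break
    else:
        data, source = memory[address], "memory"
    requester.lines[address] = data
    return data, source

memory = {0x100: 7, 0x200: 5}
caches = [Cache("dsp0"), Cache("dsp1"), Cache("dsp2"), Cache("dsp3")]
caches[2].lines[0x100] = 9                  # dsp2 holds a newer copy of 0x100

print(coherent_read(caches[0], 0x100, caches, memory))
print(coherent_read(caches[0], 0x200, caches, memory))
```

The first read is served by the cache holding the newer copy rather than by stale memory, which is the situation the coherent transaction exists to handle.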
Daytona Multiprocessor DSP Chip
Bell Laboratories Research Chip for 3G Wireless Base-stations / Head-end xDSL
[Figure: four processing elements, each a 32-b RISC with a 64-b 4-MAC SIMD DSP and Cache Memory, connected by a 128-b Split Transaction Bus to the Host Interface, I/O & Memory Controller, Test & JTAG Port, Arbiter and Semaphore]
Chip Characteristics
• Tech: 0.25um
• Core Area: 120 mm²
• Speed: 100 MHz
• Power: 4W
Paper 4.2, ISSCC2000
Photomicrograph of Daytona Test Chip
[Die photo labels: Processing Element (PE); SPARC; Vector Unit (RVU); 8KB Re-configurable Memory; DLL; Split Transaction Bus; I/O Subsystem; Arbiter; Semph; BUS INT; HDS; LRU]
Paper 4.2, ISSCC2000
Acknowledgements
The following people contributed to the work in this tutorial:
Low Power DSPs for Wireless
• Wanda Gass: Texas Instruments
• Mihran Touriguian: Atmel
High Performance DSPs for Wireless Infrastructure
• Bryan Ackland: Bell Labs US - High Perf. DSP Architecture
• Gareth Hughes: Bell Labs Australia - LU DSP16210, ‘C6x and Starcore benchmarks
• Bing Xu: Bell Labs Australia - SOVA, MAP, LOG-MAP
• Ran-Hong Yan: Bell Labs UK - 3G Wireless
• Daytona Team: (J Williams, K.J. Singh, J. Othmer, B. Ackland), Bell Labs US.
References
[1] P. Lapsley, J. Bier, A. Shoham, E. Lee, “DSP Processor Fundamentals,” IEEE Press, New York, 1997.
[2] D. Skillicorn, “A Taxonomy for Computer Architectures,” Computer Magazine, Nov. 1988.
[3] H. Kabuo, M. Okamoto, I. Tanaka, H. Yasoshima, S. Marui, M. Yamasaki, T. Sugimura, K. Ueda, T. Ishikawa, H. Suzuki, R. Asahi, “An 80 MOPS-Peak High-Speed and Low-Power-Consumption 16-b Digital Signal Processor,” IEEE Journal of Solid-State Circuits, Vol. 31, No. 4, April 1996, pg. 494-503.
[4] E. A. Lee, D. G. Messerschmitt, Digital Communication, Boston: Kluwer Academic Publishers, 1988.
[5] W. Lee et al., “A 1V DSP for Wireless Communications,” Proceedings IEEE International Solid-State Circuits Conference, pp. 92-93, February 1997.
[6] S. Lin and J. Costello Jr., Error Control Coding: Fundamentals and Applications, Prentice Hall, New Jersey, 1983.
[7] Lucent 16000, http://www.lucent.com/micro/ or http://www.lucent.dk/micro/dsp16000/
[8] Thomas Parsons, Voice and Speech Processing, McGraw-Hill Book Company, New York, 1987.
[9] TMS320C54x User’s Guide, available from the Texas Instruments Literature Response Center.
[10] I. Verbauwhede, M. Touriguian, “A Low Power DSP Engine for Wireless Communications,” Journal of VLSI Signal Processing 18, pg. 177-186, 1998, Kluwer Academic Publishers.
[11] I. Verbauwhede, M. Touriguian, “Wireless digital signal processors,” Chapter in Digital Signal Processing for Multimedia Systems, Edited by K.K. Parhi, T. Nishitani, Publisher: Marcel Dekker, New York, 1999.
[12] M. Okamoto, K. Stone, T. Sawai, H. Kabuo, S. Marui, M. Yamasaki, Y. Uto, Y. Sugisawa, Y. Sasagawa, T. Ishikawa, H. Suzuki, N. Minamida, R. Yamanaka, K. Ueda, “A High Performance DSP Architecture for Next Generation Mobile Phone Systems,” 1998 IEEE DSP Workshop.
[13] Lode specifications, available from www.atmel.com
[14] M.W. Oliphant, “The Mobile Phone meets the Internet”, IEEE Spectrum, pp. 20-28, Aug. 1999.
[15] L. C. Godara, “Application of Antenna Arrays to Mobile Communications: Part 1”, Proc. IEEE, Vol. 85, No. 7, pp. 1031-1060, July 1997.
[16] G. D. Forney, Jr., “Maximum Likelihood Sequence Estimation of Digital Sequences in the Presence of Intersymbol Interference”, IEEE Trans. Inform. Theory, V IT-18, pp. 363-378, May 1972.
[17] C. Berrou, A. Glavieux, P. Thitimajshima, “Near Shannon Limit Error-Correcting Coding and Decoding: Turbo-Codes (1)”, Proc. ICC’93, May 1993.
[18] J. Hagenauer, P. Hoeher, “A Viterbi Algorithm with Soft-Decision Outputs and its Applications”, Proc. Globecom 89, Nov. 1989, pp. 47.1.1-47.1.7.
[19] L. Bahl, J. Cocke, F. Jelinek, J. Raviv, “Optimal Decoding of Linear Codes for Minimizing Symbol Error Rate”, IEEE Trans. Inform. Theory, V IT-20, pp. 284-287, Mar. 1974.
[20] J. Turley, H. Hakkarainen, “TI’s new ‘C6x DSP Screams at 1600 MIPS”, Microprocessor Report, Vol. 11, No. 2, pp. 14, Feb 1997.
[21] “Starcore Launched First Architecture”, Microprocessor Report, V12, No. 14, pp. 22, Oct 1998.
[22] B. Ackland & P. D’Arcy, “A New Generation of DSP Architectures”, Proc. IEEE CICC99, Paper 25.1.1.
[23] J. Williams, K.J. Singh, C.J. Nicol, B. Ackland, “A 3.2 GOPs Multiprocessor DSP for Communication Applications”, Proc. IEEE ISSCC2000, Paper 4.2.