ISSCC 2010 / SESSION 15 / LOW-POWER PROCESSORS ...

276 • 2010 IEEE International Solid-State Circuits Conference

ISSCC 2010 / SESSION 15 / LOW-POWER PROCESSORS & COMMUNICATION / 15.2

15.2 A 4.5mW Digital Baseband Receiver for Level-A Evolved EDGE

Christian Benkeser¹,², Andreas Bubenhofer¹, Qiuting Huang¹,²

¹ETH Zürich, Zürich, Switzerland; ²Advanced Circuit Pursuit (ACP), Zollikon, Switzerland

The recent popularity of smartphones and other mobile broadband devices has given fresh impetus to 3G technology and beyond, a key enabler of the mobile industry's only current growth sector. Despite the high data rates of 3G-enabled devices, a good user experience still depends crucially on the availability of a fallback mode such as the GSM/EDGE network. While EDGE provides a respectable substitute where 3G is absent, enhancement is desirable, both to lessen the disparity between HSPA and legacy EDGE and to improve 2G-only service in regions where an upgrade to 3G is not imminent. Evolved EDGE (E-EDGE) is a recent standard [1] that aims to quintuple the EDGE rate to 1.2Mb/s by phasing in a set of additional technical features, including 32QAM and turbo coding. This contribution explores the challenges posed by higher-order modulation and describes an efficient digital receiver that preserves the low-cost/low-power attributes of EDGE-enabled devices.

As a narrowband system, GSM employs a Gaussian filter to fit each user channel into 200kHz, which is significantly below the symbol rate. This spectral efficiency comes at the cost of receiver complexity: equalization is required to correct the inherent inter-symbol interference (ISI) caused by the Gaussian filter. Equalization of GMSK signals has relied on maximum-likelihood sequence estimation (MLSE), where the channel impulse response (CIR) is estimated and then applied to all possible symbol sequences to emulate the effect of ISI. The sequences that most closely match the received signal are stored and used to backtrack the most probable transmitted sequence. As the GMSK alphabet is small, a DSP implementation of MLSE is not difficult for GSM.
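To make the MLSE procedure concrete, the following is a minimal Viterbi equalizer sketch, not the receiver's implementation. It assumes real-valued binary symbols (±1, a simplified stand-in for GMSK), a known CIR `h`, known leading symbols that fix the start state, and a squared Euclidean branch metric; the function name `viterbi_mlse` and these simplifications are ours. For brevity, each survivor stores its whole symbol path instead of traceback pointers, which yields the same decided sequence.

```python
import math
from itertools import product


def viterbi_mlse(rx, h, alphabet=(-1, 1)):
    """Return the most likely symbol sequence for received samples rx."""
    L = len(h)  # constraint length: newest symbol plus L-1 past symbols
    if L < 2:
        raise ValueError("need at least 2 CIR taps (otherwise there is no ISI)")
    # A state holds the L-1 most recent symbols, newest first: Z = M**(L-1).
    states = list(product(alphabet, repeat=L - 1))
    start = (alphabet[-1],) * (L - 1)  # hypothetical known leading symbols
    metric = {s: (0.0 if s == start else math.inf) for s in states}
    survivor = {s: [] for s in states}
    for y in rx:
        new_metric = {s: math.inf for s in states}
        new_survivor = {s: [] for s in states}
        for s in states:
            if metric[s] == math.inf:
                continue  # state not yet reachable from the start state
            isi = sum(hk * sk for hk, sk in zip(h[1:], s))  # ISI of past symbols
            for a in alphabet:  # M branches per state, so B = M**L per stage
                ref = h[0] * a + isi  # emulated noise-free received sample
                m = metric[s] + abs(y - ref) ** 2  # squared Euclidean metric
                nxt = (a,) + s[:-1]  # shift the new symbol into the state
                if m < new_metric[nxt]:  # keep the best path into each state
                    new_metric[nxt] = m
                    new_survivor[nxt] = survivor[s] + [a]
        metric, survivor = new_metric, new_survivor
    best = min(states, key=lambda s: metric[s])
    return survivor[best]
```

With a noiseless 3-tap channel and the leading symbols set to +1, the sketch recovers the transmitted sequence exactly. The dictionaries make the cost visible: storage grows with the number of states and work with the number of branches per stage.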

When higher-order modulation is introduced to improve throughput, MLSE quickly becomes a bottleneck due to the larger alphabet size M. MLSE complexity can be described in terms of the trellis diagram used in the Viterbi algorithm: the required storage is proportional to the number of trellis states per stage, $Z = M^{L-1}$, and the computation to the number of branches between stages, $B = M^{L}$. The constraint length L must be at least 8 for GSM to handle worst-case test channels. The introduction of EDGE, where M = 8 for 8PSK, increases the complexity to such an extent that MLSE becomes impractical. Figure 15.2.1 shows the required Z and B versus modulation order; both explode beyond L = 5 and M = 8.
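The two relations above can be tabulated directly. This short sketch (the function name `trellis_size` is hypothetical) evaluates Z and B for binary GMSK (M = 2), 8PSK (M = 8), 16QAM and 32QAM at a few constraint lengths.

```python
def trellis_size(M: int, L: int) -> tuple[int, int]:
    """Return (Z, B): states per stage M**(L-1) and branches per stage M**L."""
    states = M ** (L - 1)   # storage scales with the number of states
    branches = M ** L       # computation scales with the number of branches
    return states, branches


if __name__ == "__main__":
    for M in (2, 8, 16, 32):      # GMSK, 8PSK, 16QAM, 32QAM alphabet sizes
        for L in (2, 5, 8):       # a few constraint lengths
            Z, B = trellis_size(M, L)
            print(f"M={M:2d} L={L}: Z={Z:,} B={B:,}")
```

For example, M = 8 at L = 5 already needs 4,096 states and 32,768 branches per stage, whereas M = 2 at L = 8 needs only 128 states.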

To contain computational complexity, pre-filtering is typically used to shorten L by concentrating the energy toward the front of the resulting CIR. This is usually combined with suboptimal equalizers such as decision-feedback sequence estimators (DFSE), where per-survivor processing further prunes the required operations. Nevertheless, some 300MOPS are required for EDGE even when L is reduced to 2. Increasing M further to 32 for E-EDGE increases computation by 16× to 4.8GOPS. Even if this were possible with a 45nm DSP [3], the cost and power advantages that are typical of GSM/EDGE, and important for many regions, would be lost.
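The 16× factor follows if, as a rough assumption, the operation count scales with the number of trellis branches $B(M, L) = M^{L}$ at the reduced constraint length L = 2:

$$
\frac{B(32, 2)}{B(8, 2)} = \frac{32^{2}}{8^{2}} = \frac{1024}{64} = 16, \qquad 300\ \text{MOPS} \times 16 = 4.8\ \text{GOPS}.
$$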

In such performance/cost-limited scenarios, a custom ASIC provides the most versatile vehicle for improving die area and power efficiency at the architectural, algorithmic and circuit levels. Here we present an equalizer-based receiver ASIC that supports GSM/GPRS/EDGE and E-EDGE.

Figure 15.2.2 shows the receiver block diagram, in which the equalizer and decoder form the key functions. The equalizer consists mainly of a pre-filter and an adaptive DFSE. Once per GSM burst, the CIR is estimated and used to compute the pre-filter coefficients by linear prediction [2]. These coefficients are then used to transform the CIR and the received samples before both are fed to the DFSE. To handle 32QAM, the pre-filter must be ~50% longer for E-EDGE. This leads to 32 taps, whose adaptation requires the inversion of a 32×32 complex-valued matrix at 2×20b precision. To curtail the required chip area, our realization exploits the recursive nature of the Levinson algorithm [4], which lends itself to aggressive sharing of hardware resources. These resources have been reduced to only 2 complex multipliers, 1 sequential divider and 1.3kb of memory.
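The Levinson recursion solves the Toeplitz normal equations of linear prediction order by order, reusing the previous order's solution instead of inverting the full matrix. Below is a textbook Levinson-Durbin sketch for a complex Hermitian Toeplitz autocorrelation. The function name, the floating-point arithmetic, and the assumption that `r` holds autocorrelation lags derived from the CIR estimate are ours and are not details of the ASIC, which works in fixed point. Each order needs exactly one division and a short run of multiply-accumulates. This sequential structure is at least consistent with sharing a single divider and a few multipliers.

```python
def levinson_durbin(r: list[complex]) -> tuple[list[complex], float]:
    """Solve sum_k a[k] * r[i-k] = 0 for i = 1..p, with a[0] = 1, given a
    Hermitian Toeplitz autocorrelation r[0..p] where r[-k] = conj(r[k]).

    Returns the prediction-error filter a[0..p] and the final error power.
    """
    p = len(r) - 1
    a = [1 + 0j] + [0j] * p
    err = r[0].real  # order-0 prediction error power
    for m in range(1, p + 1):
        # Correlation of the current order-(m-1) filter with lag m.
        acc = sum(a[k] * r[m - k] for k in range(m))
        kappa = -acc / err  # reflection coefficient: the one division per order
        prev = a[:]  # the update uses the conjugate-reversed previous filter
        for k in range(1, m):
            a[k] = prev[k] + kappa * prev[m - k].conjugate()
        a[m] = kappa
        err *= 1.0 - abs(kappa) ** 2  # error power shrinks with each order
    return a, err
```

For order p the recursion costs on the order of p² multiply-accumulates and p divisions, compared with roughly p³ operations for a general matrix inversion.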

Taking advantage of the flexibility afforded by ASICs, we realize the DFSE with a parallel architecture that computes 8 trellis branches concurrently, as shown in Fig. 15.2.3. This enables the varying numbers of trellis states associated with all 4 GSM standards to be processed in a unified circuit. An area-optimized lookup table storing all possible constellation points is used to modulate the sequence of 8 symbols for GMSK, 8PSK, 16QAM and 32QAM. The resulting 8 complex samples are then convolved with the CIR taps. Processing 8 branches in parallel in this way requires 64 complex multiplications per cycle. Our DFSE (Fig. 15.2.3) drastically reduces this load by organizing the operations to exploit the fact that branches leaving the same state are mostly identical: they differ only in the newest symbol, so the common part can be pre-processed and buffered for multiple use. To complete the convolution with the CIR taps, only 1 multiplication and 1 addition are then required for each branch. Figure 15.2.4 illustrates the benefit of this approach, showing for EDGE and E-EDGE the relative increase in clock cycles due to pre-processing and the reduction in complex multiplications. The efficiency improves more than 5-fold for 32QAM.
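The pre-processing idea can be illustrated by comparing a naive per-branch convolution with a shared-tail version. In this sketch the 8-tap CIR, the 7 past symbols of the state, and the function names are hypothetical, and only complex multiplications are counted (not clock cycles, which Fig. 15.2.4 also accounts for).

```python
import cmath


def direct_references(h, tail, candidates):
    """Naive: every branch convolves its full symbol sequence with all taps."""
    mults = 0
    refs = []
    for c in candidates:
        acc = 0j
        for hk, sk in zip(h, [c] + tail):  # newest symbol first
            acc += hk * sk
            mults += 1
        refs.append(acc)
    return refs, mults


def preprocessed_references(h, tail, candidates):
    """Shared tail: the state's past symbols are convolved once and buffered,
    then each branch adds only h[0] times its own newest symbol."""
    shared = 0j
    mults = 0
    for hk, sk in zip(h[1:], tail):  # computed once per state
        shared += hk * sk
        mults += 1
    refs = []
    for c in candidates:  # 1 multiplication + 1 addition per branch
        refs.append(shared + h[0] * c)
        mults += 1
    return refs, mults


if __name__ == "__main__":
    psk8 = [cmath.exp(2j * cmath.pi * k / 8) for k in range(8)]    # 8PSK points
    h = [1.0, 0.5 - 0.2j, 0.3j, 0.1, -0.05j, 0.02, 0.01j, 0.005]   # hypothetical CIR
    tail = [psk8[k] for k in (3, 1, 4, 1, 5, 2, 6)]                # hypothetical state
    _, n_direct = direct_references(h, tail, psk8)
    _, n_pre = preprocessed_references(h, tail, psk8)
    print(n_direct, n_pre)
```

For 8 branches the count drops from 64 to 15 multiplications. With 32 candidate symbols per state, as in 32QAM, it drops from 256 to 39, because the buffered tail is reused across more branches. This simple count suggests why the gain grows with modulation order, although it is not the metric plotted in Fig. 15.2.4.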

The final part of branch metric computation compares the transformed reference symbols with the received samples, which requires the Euclidean distance in the complex plane to be computed. In our design, the branch metrics are approximated with the L1-distance (Fig. 15.2.3) at the expense of 1dB SNR loss, to avoid costly square and square-root operations.
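The difference between the two metrics is easy to see in code. This sketch (function names hypothetical) takes the L1-distance of a complex difference d to be |Re d| + |Im d|, which needs only absolute values and one addition, and compares it with the squared Euclidean distance on 8PSK points.

```python
import cmath
import math


def euclidean_sq(y: complex, ref: complex) -> float:
    """Squared Euclidean distance: needs two squarings."""
    d = y - ref
    return d.real * d.real + d.imag * d.imag


def l1_distance(y: complex, ref: complex) -> float:
    """L1 distance |Re d| + |Im d|: absolute values and one addition only."""
    d = y - ref
    return abs(d.real) + abs(d.imag)


def nearest(y, points, metric):
    """Index of the constellation point with the smallest metric."""
    return min(range(len(points)), key=lambda i: metric(y, points[i]))


if __name__ == "__main__":
    psk8 = [cmath.exp(2j * math.pi * k / 8) for k in range(8)]
    y = 0.9 + 0.1j  # a received sample close to the point 1 + 0j
    print(nearest(y, psk8, euclidean_sq), nearest(y, psk8, l1_distance))
```

Because |d| ≤ |Re d| + |Im d| ≤ √2·|d|, the L1 metric can mis-rank candidates whose differences point along different directions. This is one plausible source of the small SNR penalty quoted above.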

The parallel DFSE allows the equalization for 32QAM E-EDGE to be performed in only 23k cycles per burst, which translates to a 40MHz system clock at a 0.6V supply, and therefore to low dynamic power.
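As a plausibility check on the 40MHz figure, assume that the 23k cycles of one burst must complete within one GSM timeslot of 15/26ms ≈ 577µs. This deadline is our assumption and is not stated above. Under it:

$$
f_{\text{clk}} \geq \frac{23 \times 10^{3}\ \text{cycles}}{(15/26)\ \text{ms}} \approx \frac{23\,000}{576.9\ \mu\text{s}} \approx 39.9\ \text{MHz},
$$

which agrees with the 40MHz system clock.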

The channel-decoding half of the receiver ASIC comprises a flexible 64-state Viterbi decoder and a 3GPP turbo decoder, which presents implementation challenges no less severe than those outlined above for the equalizer. We have drawn extensively on our recent HSDPA turbo decoder work [5] and take advantage of the smaller block sizes of E-EDGE turbo codes to reduce chip area by a further 60%.

The receiver ASIC supports 23 different modulation and coding schemes. The sequential nature of its operations allows aggressive clock gating at higher decision levels, shutting down hardware resources in the DFSE or decoder that are not needed for a particular modulation or decoder phase, which saves power.

Block-error rate (BLER) performance is shown in Fig. 15.2.5 for the E-EDGE modes and the Hilly Terrain (HT) GSM test channel, whose large delay spread makes it the most difficult channel for equalization. In all modes, the required BLER < 0.1 [1] is met.

Other key characteristics of the implemented digital receiver are summarized in Fig. 15.2.6. The ASIC core occupies 2.0mm² and comprises 184k gates and 40kb of memory. The measured maximum clock frequency is 151MHz at 1.2V. At the 40MHz target frequency, the supply can be lowered to 0.6V. The corresponding average power during reception and the related GSM burst processing is less than 5mW, even for 32QAM signals.

Acknowledgements:
The authors gratefully acknowledge the help of A. Burg. This project was funded by CTI, Switzerland (project no. 8514.3 NMPP-NM) in collaboration with Advanced Circuit Pursuit (ACP) AG.

References:
[1] 3GPP TR45.005 V8.5.0, "Radio Transmission and Reception," June 2009.
[2] W. H. Gerstacker, R. Schober, "Equalization Concepts for EDGE," IEEE Transactions on Wireless Communications, pp. 190-199, Jan. 2002.
[3] Texas Instruments, "Two new TMS320C550x low power DSPs from Texas Instruments offer up to 40 percent additional battery life for voice, biometrics, medical and other portable devices," Press Release, June 2009.
[4] J. G. Proakis, "Digital Communications," 4th Edition, McGraw-Hill, 2001.
[5] C. Benkeser et al., "Design and Optimization of an HSDPA Turbo Decoder ASIC," IEEE Journal of Solid-State Circuits, vol. 44, no. 1, pp. 98-106, Jan. 2009.

978-1-4244-6034-2/10/$26.00 ©2010 IEEE

DIGEST OF TECHNICAL PAPERS • 277

ISSCC 2010 / February 9, 2010 / 2:00 PM

Figure 15.2.1: Complexity of trellis-based equalizers and the concept of ISI.

Figure 15.2.2: Block diagram of the implemented receiver ASIC.

Figure 15.2.3: Parallel DFSE architecture and low-complexity solution with pre-processing.

Figure 15.2.4: Efficiency gain achieved with pre-processing in the DFSE implementation (8 trellis branches processed in parallel).

Figure 15.2.5: Implemented receiver BLER of E-EDGE modulation and coding schemes after channel decoding for the HT channel profile.

Figure 15.2.6: Key characteristics of the receiver implementation.


Figure 15.2.7: Die micrograph.
