276 • 2010 IEEE International Solid-State Circuits Conference
ISSCC 2010 / SESSION 15 / LOW-POWER PROCESSORS & COMMUNICATION / 15.2

15.2 A 4.5mW Digital Baseband Receiver for Level-A Evolved EDGE

Christian Benkeser (1,2), Andreas Bubenhofer (1), Qiuting Huang (1,2)
(1) ETH Zürich, Zürich, Switzerland
(2) Advanced Circuit Pursuit (ACP), Zollikon, Switzerland

The recent popularity of smartphones and other mobile broadband devices has given fresh impetus to 3G technology and beyond, a key enabler of the mobile industry's only current growth sector. Despite the high data rates of 3G-enabled devices, a good user experience still depends crucially on the availability of a fallback mode such as the GSM/EDGE network. While EDGE provides a respectable substitute where 3G is absent, enhancement is desirable both to lessen the disparity between HSPA and legacy EDGE and to improve 2G-only service in regions where an upgrade to 3G is not imminent. Evolved EDGE (E-EDGE) is a recent standard [1] that aims to quintuple the EDGE rate to 1.2Mb/s by phasing in a set of extra technical features, including 32QAM and turbo coding. This contribution explores the challenges posed by higher-order modulation and describes an efficient digital receiver that preserves the low-cost/low-power attributes of EDGE-enabled devices.

As a narrowband system, GSM employs a Gaussian filter to fit each user channel into 200kHz, a bandwidth significantly below the symbol rate. This spectral efficiency comes at the cost of receiver complexity, because equalization is required to correct the inherent inter-symbol interference (ISI) caused by the Gaussian filter. Equalization of GMSK signals has traditionally relied on maximum-likelihood sequence estimation (MLSE), where the channel impulse response (CIR) is estimated and then applied to all possible symbol sequences to emulate the effect of ISI.
The closest-matching sequences to the received signals are stored and used to backtrack the most probable transmitted sequence. As the GMSK alphabet is small, a DSP implementation of MLSE is not difficult for GSM.

When higher-order modulation is introduced to improve throughput, MLSE quickly becomes a bottleneck due to the larger alphabet size M. MLSE complexity can be described in terms of the trellis diagram used in the Viterbi algorithm: the required storage is proportional to the number of trellis states per stage, $Z = M^{L-1}$, and the computation to the number of branches between stages, $B = M^{L}$. The constraint length L must be at least 8 for GSM to handle worst-case test channels. The introduction of EDGE, where M=8 for 8PSK, increases the complexity to such an extent as to make MLSE impractical. Figure 15.2.1 shows the required Z and B versus modulation order; both explode beyond L=5 and M=8.

To contain computational complexity, pre-filtering is typically used to shorten L by concentrating the energy towards the front of the resulting CIR. This is usually combined with suboptimal equalizers such as decision-feedback sequence estimators (DFSE), where per-survivor processing further prunes the required operations. Nevertheless, some 300MOPS are required for EDGE even when L is reduced to 2. Increasing M further to 32 for E-EDGE increases computation by 16× to 4.8GOPS. Even if this were possible with a 45nm DSP [3], the cost and power advantages that are typical of GSM/EDGE, and important for many regions, would be lost. In such performance/cost-limited scenarios, a custom ASIC provides the most versatile vehicle with which to improve die area and power efficiency at the architectural, algorithmic and circuit levels.

Here we present an equalizer-based receiver ASIC that supports GSM/GPRS/EDGE and E-EDGE. Figure 15.2.2 shows the receiver block diagram, in which the equalizer and decoder form the key functions of the receiver.
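The MLSE procedure outlined above (estimate the CIR, emulate the ISI for every candidate symbol sequence, keep the closest-matching survivors, then backtrack) can be sketched with the Viterbi algorithm. This is a minimal illustration, not the equalizer used on the chip: the function name `mlse_viterbi` is hypothetical, the CIR is assumed to be known, the starting state is assumed to be unknown, and survivors are stored as full symbol lists rather than in the traceback memory a hardware implementation would use.

```python
from itertools import product


def mlse_viterbi(r, h, alphabet):
    """Minimal MLSE equalizer via the Viterbi algorithm (illustrative sketch).

    r        : received samples
    h        : channel impulse response h[0..L-1] (assumed known/estimated)
    alphabet : constellation points (size M)

    A state is the tuple of the L-1 most recent symbols, newest first, so the
    trellis has M**(L-1) states and M**L branches per stage.
    """
    L = len(h)
    states = list(product(alphabet, repeat=L - 1))
    # Assumption: the initial state is unknown, so every state starts at 0.
    metric = {s: 0.0 for s in states}
    paths = {s: [] for s in states}
    for sample in r:
        new_metric, new_paths = {}, {}
        for s in states:
            # ISI contributed by the symbols held in this state.
            isi = sum(h[k] * s[k - 1] for k in range(1, L))
            for x in alphabet:
                # Emulate the channel for this hypothesis; squared distance.
                m = metric[s] + abs(sample - (h[0] * x + isi)) ** 2
                nxt = ((x,) + s)[:L - 1]
                # Keep only the best (survivor) path into each next state.
                if nxt not in new_metric or m < new_metric[nxt]:
                    new_metric[nxt] = m
                    new_paths[nxt] = paths[s] + [x]
        metric, paths = new_metric, new_paths
    # The survivor of the best final state is the decided sequence.
    best = min(metric, key=metric.get)
    return paths[best]


if __name__ == "__main__":
    h = [1.0, 0.5, 0.2]  # made-up 3-tap CIR for illustration
    sent = [1, -1, -1, 1, 1, -1, 1, -1]
    hist = [1, 1]  # symbols preceding the block: x[-1], x[-2]
    r = []
    for x in sent:  # noiseless channel output
        r.append(h[0] * x + h[1] * hist[0] + h[2] * hist[1])
        hist = [x, hist[0]]
    print(mlse_viterbi(r, h, [1, -1]) == sent)
```

Every state is expanded for every symbol of the alphabet, so the inner loop runs $M^{L-1} \cdot M = M^{L}$ times per received sample, which is the branch count B discussed above.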
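The trellis-size expressions $Z = M^{L-1}$ and $B = M^{L}$ can be tabulated directly to see how quickly they grow with modulation order, and to check the 16× figure quoted for moving from EDGE to E-EDGE at L=2. The sketch below assumes, as a simplification, that computation scales with the branch count B alone; `trellis_size` is a hypothetical helper name.

```python
# Trellis size of a Viterbi-based MLSE equalizer, using the expressions in
# the text: Z = M**(L-1) states per stage, B = M**L branches per stage.

def trellis_size(M: int, L: int) -> tuple[int, int]:
    """Return (states per stage Z, branches per stage B)."""
    Z = M ** (L - 1)
    B = M ** L
    return Z, B


MODULATIONS = {"GMSK": 2, "8PSK": 8, "16QAM": 16, "32QAM": 32}

if __name__ == "__main__":
    for L in (2, 5, 8):
        for name, M in MODULATIONS.items():
            Z, B = trellis_size(M, L)
            print(f"L={L} {name:>5}: Z={Z:>15,} B={B:>17,}")

    # Assumption: computation scales with B only. Going from 8PSK (EDGE) to
    # 32QAM (E-EDGE) at L=2 then multiplies the load by B(32,2)/B(8,2).
    scale = trellis_size(32, 2)[1] // trellis_size(8, 2)[1]
    print(f"scale = {scale}x, 300 MOPS -> {300 * scale} MOPS")
```

At L=2 the branch count rises from 8² = 64 for 8PSK to 32² = 1024 for 32QAM, a factor of 16, consistent with the increase from roughly 300MOPS to 4.8GOPS stated above.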
The equalizer consists mainly of a pre-filter and an adaptive DFSE. Once per GSM burst, the CIR is estimated and used to compute the pre-filter coefficients by linear prediction [2]. These coefficients are then used to transform the CIR and the received samples before both are fed to the DFSE. To handle 32QAM, the pre-filter needs to be ~50% longer for E-EDGE. This leads to 32 taps, whose adaptation requires the inversion of a 32×32 complex-valued matrix at 2×20b precision. To curtail the required chip area, our realization exploits the recursive nature of the Levinson algorithm [4], which lends itself to aggressive sharing of hardware resources; these have been reduced to only 2 complex multipliers, 1 sequential divider and 1.3kb of memory.

Taking advantage of the flexibility afforded by ASICs, we realize the DFSE with a parallel architecture that computes 8 trellis branches concurrently, as shown in Fig. 15.2.3. This enables the varying numbers of trellis states associated with all 4 GSM standards to be processed in a unified circuit. An area-optimized lookup table storing all possible constellation points is used to modulate the sequence of 8 symbols for GMSK/8PSK/16QAM/32QAM. The resulting 8 complex samples are then convolved with the CIR taps. The intensive computation, 64 complex multiplications per cycle to process 8 branches in parallel, has been drastically reduced in our DFSE (Fig. 15.2.3) by organizing operations to exploit the fact that branches from the same state are mostly identical and can therefore be pre-processed and buffered for multiple use. To complete the convolution with the CIR taps, only 1 multiplication and 1 addition are required for each branch. Figure 15.2.4 illustrates the superiority of this approach, showing the relative increase in clock cycles due to pre-processing and the reduction in complex multiplications for EDGE and E-EDGE. The efficiency improves more than 5-fold for 32QAM.
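The Levinson recursion referred to above solves a Hermitian Toeplitz system of order p with O(p²) operations instead of the O(p³) of general matrix inversion, building the order-m solution from the order-(m-1) one. The sketch below applies the Levinson-Durbin form to the linear-prediction normal equations. It assumes the system to be solved is in this autocorrelation (Toeplitz) form, which is an assumption about the chip's pre-filter computation rather than something stated here, and `levinson_durbin` and `autocorr` are hypothetical names. Each order update needs a single division plus a few multiply-accumulates, which illustrates why such a recursion suits a small, time-shared datapath.

```python
def levinson_durbin(r, order):
    """Solve the linear-prediction normal equations by Levinson-Durbin.

    r : autocorrelation r[0..order] with r[l] = E{x[n] conj(x[n-l])} and
        r[-l] = conj(r[l]), i.e. a Hermitian Toeplitz system that must be
        positive definite.
    Returns (a, err): the prediction-error filter a[0..order] with a[0] = 1,
    and the final prediction-error power.
    """
    a = [1.0 + 0j]
    err = r[0].real
    for m in range(1, order + 1):
        # Correlation of the current error filter with the next lag.
        delta = sum(a[k] * r[m - k] for k in range(m))
        kappa = -delta / err  # reflection coefficient: the only division
        a_ext = a + [0j]
        # Order update: add the reversed, conjugated previous filter.
        a = [a_ext[k] + kappa * a_ext[m - k].conjugate() for k in range(m + 1)]
        err *= 1.0 - abs(kappa) ** 2
    return a, err


def autocorr(x, max_lag):
    """Biased sample autocorrelation r[l] = sum_n x[n] * conj(x[n-l])."""
    return [sum(x[n] * x[n - l].conjugate() for n in range(l, len(x)))
            for l in range(max_lag + 1)]


if __name__ == "__main__":
    x = [1 + 0j, 0.5 + 0.5j, -0.25j, 0.3, -0.1 + 0.2j, 0.7 - 0.4j]
    a, err = levinson_durbin(autocorr(x, 3), 3)
    print([complex(round(c.real, 4), round(c.imag, 4)) for c in a], round(err, 4))
```

The returned filter satisfies the normal equations $\sum_{k=0}^{p} a_k\, r[i-k] = 0$ for $i = 1, \dots, p$, which is what a direct inversion of the p×p Toeplitz matrix would also deliver.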
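The branch-sharing idea can be illustrated with a toy model of one trellis state. All branches leaving a state share the state symbols and the survivor-path decisions, so the contribution of CIR taps 1..L-1 can be computed once and reused, leaving one multiplication and one addition per branch. The sketch assumes an 8-tap CIR (so that 8 branches × 8 taps reproduces the 64 multiplications mentioned above) and 8PSK candidate symbols; the function names are hypothetical, and the lookup-table modulator, buffering and parallel scheduling of the actual DFSE are not modeled.

```python
import cmath


def branch_refs_naive(h, history, alphabet):
    """Reference value of every branch leaving one trellis state, computed by
    full convolution of the L-symbol sequence with the L CIR taps.

    history : the L-1 previous symbols, newest first (state + survivor path)
    Returns (refs, number of complex multiplications).
    """
    refs, mults = [], 0
    for x in alphabet:
        seq = [x] + list(history)
        refs.append(sum(hk * sk for hk, sk in zip(h, seq)))
        mults += len(h)
    return refs, mults


def branch_refs_shared(h, history, alphabet):
    """Same values, but the part common to all branches of the state (taps
    1..L-1 applied to the shared history) is computed once and reused, so each
    branch costs 1 multiplication and 1 addition."""
    shared = sum(hk * sk for hk, sk in zip(h[1:], history))
    mults = len(h) - 1
    refs = []
    for x in alphabet:
        refs.append(shared + h[0] * x)
        mults += 1
    return refs, mults


if __name__ == "__main__":
    # 8PSK candidates and an arbitrary, made-up 8-tap CIR for illustration.
    alphabet = [cmath.exp(1j * cmath.pi * k / 4) for k in range(8)]
    h = [1.0, 0.6 - 0.2j, 0.3j, -0.2, 0.1 + 0.1j, 0.05, -0.03j, 0.01]
    history = [alphabet[i] for i in (3, 0, 5, 7, 1, 2, 6)]
    naive, n_mults = branch_refs_naive(h, history, alphabet)
    shared, s_mults = branch_refs_shared(h, history, alphabet)
    assert all(abs(a - b) < 1e-12 for a, b in zip(naive, shared))
    print(f"naive: {n_mults} mults, shared: {s_mults} mults")
```

For one state with 8 candidate symbols, the naive version uses 8 × 8 = 64 complex multiplications and the shared version 7 + 8 = 15. These counts describe only this toy model, not the measured figures in Fig. 15.2.4.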
The final part of the branch metric computation compares the transformed reference symbols with the received samples, which requires the Euclidean distance in the complex plane to be computed. To avoid costly squaring and square-root operations, our design approximates the branch metrics with the $L_1$-distance (Fig. 15.2.3), at the expense of a 1dB SNR loss. The parallel DFSE allows the equalization for 32QAM E-EDGE to be performed in only 23k cycles per burst, which, within the ~577µs GSM burst period, translates to a 40MHz system clock on a 0.6V supply and therefore low dynamic power.

The channel-decoding half of the receiver ASIC comprises a flexible 64-state Viterbi decoder and a 3GPP turbo decoder, which presents implementation challenges no less severe than those outlined above for the equalizer. We have drawn extensively on our recent HSDPA turbo decoder work [5] and take advantage of the smaller block sizes of E-EDGE turbo codes to reduce chip area by a further 60%.

The receiver ASIC supports 23 different modulation and coding schemes. The sequential nature of its operations allows aggressive clock gating at higher decision levels, which shuts down hardware resources in the DFSE or decoder that are not needed for a particular modulation or decoder phase and thereby saves power.

Block-error-rate (BLER) performance is shown in Fig. 15.2.5 for E-EDGE modes and the Hilly Terrain (HT) GSM test channel, whose large delay spread makes it the most difficult for equalization. In all modes the required BLER<0.1 [1] is met.

Other key characteristics of the implemented digital receiver are summarized in Fig. 15.2.6. The ASIC core occupies 2.0mm² and comprises 184k gates and 40kb of memory. The measured maximum clock frequency is 151MHz at 1.2V. At the 40MHz target frequency, the supply can be lowered to 0.6V. The corresponding average power during reception and the related GSM burst processing is less than 5mW, even for 32QAM signals.

Acknowledgements: The authors gratefully acknowledge the help of A. Burg.
This project was funded by CTI, Switzerland (project no. 8514.3 NMPP-NM) in collaboration with Advanced Circuit Pursuit (ACP) AG.

References:
[1] 3GPP TS 45.005 V8.5.0, "Radio Transmission and Reception," June 2009.
[2] W. H. Gerstacker and R. Schober, "Equalization Concepts for EDGE," IEEE Transactions on Wireless Communications, pp. 190-199, Jan. 2002.
[3] Texas Instruments, "Two new TMS320C550x low power DSPs from Texas Instruments offer up to 40 percent additional battery life for voice, biometrics, medical and other portable devices," Press Release, June 2009.
[4] J. G. Proakis, Digital Communications, 4th Edition, McGraw-Hill, 2001.
[5] C. Benkeser et al., "Design and Optimization of an HSDPA Turbo Decoder ASIC," IEEE Journal of Solid-State Circuits, vol. 44, no. 1, pp. 98-106, Jan. 2009.

978-1-4244-6034-2/10/$26.00 ©2010 IEEE