A Parallelized Layered QC-LDPC Decoder for IEEE 802.11ad

A. Balatsoukas-Stimming *, N. Preyss *, A. Cevrero *, A. Burg *, C. Roth †
* Department of Electrical Engineering, EPFL, Lausanne, Switzerland
† Integrated Systems Laboratory, ETHZ, Zurich, Switzerland
E-mail: {alexios.balatsoukas, nicholas.preyss, alessandro.cevrero, andreas.burg}@epfl.ch, [email protected]

IEEE 802.11ad: Multi-gigabit throughput for wireless LAN
>10x higher throughput offers new wireless opportunities:
• Raw HD streaming
• Instant media library sync
• Ultra-high throughput IP links
IEEE 802.11ad requires high-speed baseband signal processing at low power consumption

Challenges:
• Complex channel conditions due to high delay spread
• Large device variations of analog front-ends
• Gbit/s bit rates (1.54 Gbps mandatory, 3.08 and 6.16 Gbps optional)

Layered Decoding Schedule
• Performance is highly affected by the message-passing schedule
• Flooding schedule: all variable-to-check messages are updated, then all check-to-variable messages are updated. Highly parallelizable, slow convergence
• Layered schedule: variable-to-check and check-to-variable messages are updated for the 1st check node, then the 2nd, etc. Fast convergence, low parallelism
• Twofold reduction in number of iterations ≈ twofold reduction in energy consumption
• But: very challenging to achieve multi-gigabit throughput

Parallelized Decoder Architecture
802.11ad Channel Coding: QC-LDPC Codes

Application to IEEE 802.11ad requires:
• Very high throughput
• Low power
Highly conflicting requirements!

Solution:
• Layered decoding & early termination → low power
• Additional parallelization → high throughput

Control Sequence Optimization
• Z=42 and N=16 are fixed by IEEE 802.11ad
• I=5 (number of iterations) is fixed to satisfy QoS requirements
• L (sequence length) can be optimized

Optimization Method
• Re-arrange rows and columns of the parity-check matrix to minimize pipeline stalls → higher throughput
• (Almost) free lunch: only the LLR access order changes

Synthesis Results
• Parallelization overhead: ~10%
• Average length reduction: ~13%
• Reduction in max. length: ~13%
• Result: 3.12 Gbps min. throughput

Conclusion
Low-power layered LDPC decoder is feasible when multi-gigabit throughput is required
Careful assignment of processing units to parity-check matrix blocks leads to very efficient parallelization

References:
[1] Draft Standard for Information Technology, Draft Amendment 5, IEEE P802.11ad/D5.0, IEEE Std., Sep. 2011.
[2] R. G. Gallager, Low-Density Parity-Check Codes. Cambridge, MIT Press, 1963.
[3] M. Weiner, B. Nikolic, and Z. Zhang, “LDPC decoder architecture for high-data rate personal-area networks,” in Proc. IEEE Int. Symp. Circuits and Systems, 2011.
[4] M. P. C. Fossorier, “Quasi-cyclic low-density parity-check codes from circulant permutation matrices,” IEEE Trans. Inf. Theory, vol. 50, no. 8, 2004.
[5] C. Studer, N. Preyss, C. Roth, and A. Burg, “Configurable high throughput decoder architecture for quasi-cyclic LDPC codes,” in Proc. 42nd Asilomar Conf. on Signals, Systems and Computers, 2008.
[6] E. Sharon, S. Litsyn, and J. Goldberger, “Efficient serial message-passing schedules for LDPC decoding,” IEEE Trans. Inf. Theory, vol. 53, no. 11, Nov. 2007.
[7] H. Shirani-Mehr, T. Mohsenin, and B. Baas, “A reduced routing network architecture for partial parallel LDPC decoders,” in Proc. 45th Asilomar Conf. on Signals, Systems and Computers, 2011.

• Doubly parallelized architecture:
  1. Two blocks of every row of H processed simultaneously
  2. COMB unit combines partial results to ensure proper operation
  [Figure: Detailed view of COMB unit]
• Processing units and shifters doubled
• No additional memory required
• Simple routing preserved
• Throughput: [expression not preserved in the source text]

• Parity-check matrix
  1. Consists of 42x42 cyclic permutation matrices
  2. Illustrates parity constraints imposed on bits by the code
  3. Is used to decode codewords via Min-Sum (MS) message-passing
  4. Represents graph in which columns are variable nodes, rows are check nodes
  5. Various coding rates are used depending on channel conditions
  [Figure: Parity-check matrix of rate ½ code]

Reference Architecture
• MIN and SEL units perform the basic functions of MS decoding on Z independent rows of H simultaneously
• Parity-check matrix blocks are processed serially in a pipeline
• Memory reads/writes are dictated by the control sequence; data dependencies are avoided by pipeline stalling
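As a rough illustration of what the MIN and SEL units compute, the Python sketch below performs one layered Min-Sum update for a single check node (one row of H). The names `min_unit`, `sel_unit` and `layer_update`, the floating-point LLRs, the convention that a positive LLR means bit 0, and the toy input values are all assumptions made for this example. The poster does not specify quantization or any normalization/offset correction, so none is applied here.

```python
def min_unit(v2c):
    """MIN (assumed behavior): track the two smallest magnitudes,
    the position of the smallest, and the product of all signs."""
    min1 = min2 = float("inf")
    idx = -1
    sign = 1
    for i, m in enumerate(v2c):
        mag = abs(m)
        if mag < min1:
            min1, min2, idx = mag, min1, i
        elif mag < min2:
            min2 = mag
        if m < 0:
            sign = -sign
    return min1, min2, idx, sign


def sel_unit(v2c, min1, min2, idx, sign):
    """SEL (assumed behavior): for each edge, output the minimum magnitude
    over all *other* edges, signed by the product of the other signs."""
    out = []
    for i, m in enumerate(v2c):
        mag = min2 if i == idx else min1
        own_sign = -1 if m < 0 else 1
        out.append(sign * own_sign * mag)  # multiplying removes the edge's own sign
    return out


def layer_update(llr, c2v_old, cols):
    """One layered update for a check node connected to variable nodes `cols`.
    `llr` is updated in place; the new check-to-variable messages are returned."""
    # Variable-to-check messages: subtract this check's previous contribution.
    v2c = [llr[c] - r for c, r in zip(cols, c2v_old)]
    c2v_new = sel_unit(v2c, *min_unit(v2c))
    # Write the refreshed LLRs back immediately, so the next layer sees them.
    for c, q, r in zip(cols, v2c, c2v_new):
        llr[c] = q + r
    return c2v_new


# Toy example: four bits on one parity check, bit 1 received with the wrong sign.
llr = [2.0, -0.5, 3.0, 1.5]
c2v = layer_update(llr, [0.0, 0.0, 0.0, 0.0], [0, 1, 2, 3])
print(c2v)  # [-0.5, 1.5, -0.5, -0.5]
print(llr)  # [1.5, 1.0, 2.5, 1.0]
```

In the reference architecture, Z such updates would run side by side on the Z independent rows of a block row. The sketch handles a single row and leaves out the cyclic shifters and the pipeline.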